
Precoder Design for Massive MIMO Downlink with Matrix Manifold Optimization

Rui Sun, , Chen Wang, , An-An Lu, , Xiqi Gao, , and Xiang-Gen Xia This work was supported by the National Key R&D Program of China under Grant 2018YFB1801103, the Jiangsu Province Basic Research Project under Grant BK20192002, the Fundamental Research Funds for the Central Universities under Grant 2242022k60007, the Key R&D Plan of Jiangsu Province under Grant BE2022067, the Huawei Cooperation Project, and the National Natural Science Foundation of China under Grants 62394294 and 62371125. (Corresponding author: Xiqi Gao.)Rui Sun, Chen Wang, An-An Lu and Xiqi Gao are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China and are also with Purple Mountain Laboratories, Nanjing 211111, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).Xiang-Gen Xia is with the Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716 USA (e-mail: [email protected]).
Abstract

We investigate the design of weighted sum-rate (WSR) maximizing linear precoders for the massive multiple-input multiple-output (MIMO) downlink. We consider a single-cell system with multiple users and propose a unified matrix manifold optimization framework applicable to the total power constraint (TPC), the per-user power constraint (PUPC) and the per-antenna power constraint (PAPC). We prove that the precoders under TPC, PUPC and PAPC lie on distinct Riemannian submanifolds, and transform the constrained problems in Euclidean space into unconstrained ones on these manifolds. Accordingly, we derive the Riemannian ingredients, including the orthogonal projection, Riemannian gradient, Riemannian Hessian, retraction and vector transport, which are needed for precoder design in the matrix manifold framework. Riemannian design methods based on the Riemannian steepest descent, Riemannian conjugate gradient and Riemannian trust region are then provided to obtain the WSR-maximization precoders under TPC, PUPC or PAPC. The Riemannian methods avoid inverting large-dimensional matrices during the iterations, which reduces the computational complexity of the algorithms. Complexity analyses and performance simulations demonstrate the advantages of the proposed precoder design.

Index Terms:
Linear precoding, manifold optimization, per-antenna power constraint, per-user power constraint, total power constraint, weighted sum-rate.

I Introduction

Massive multiple-input multiple-output (MIMO) is one of the key techniques in fifth generation (5G) wireless networks and will play an important role in future 6G systems with even larger antenna arrays [1, 2]. In massive MIMO systems, the base station (BS) equipped with a large number of antennas can serve many user terminals (UTs) on the same time-frequency resource, providing enormous potential capacity gains and high energy efficiency [3, 4]. However, serving many users simultaneously causes serious inter-user interference, which may reduce the throughput. To suppress the interference and increase the system throughput, precoders should be properly designed for massive MIMO downlink (DL) transmission. In particular, linear precoders are attractive due to their low complexity [5, 6, 7].

There are several criteria for the DL precoder design, including minimum mean square error (MMSE), weighted sum-rate (WSR), quality of service (QoS), signal-to-interference-plus-noise ratio (SINR), signal-to-leakage-plus-noise ratio (SLNR), energy efficiency (EE), etc. [8, 9, 10, 11, 12, 13]. Among these criteria, WSR is of great practical significance and widely considered in massive MIMO systems, as it directly aims at increasing the system throughput, which is one of the main objectives of communication systems. In addition, different power constraints may be imposed on the designs. Generally, power constraints for massive MIMO systems can be classified into three main categories: total power constraint (TPC), per-user power constraint (PUPC) and per-antenna power constraint (PAPC). Designing precoders under different power constraints usually forms different problems, for which different approaches are proposed in the literature. To be specific, the precoder design problem under TPC is typically formulated as a nonconvex WSR-maximization problem, which is transformed into an MMSE problem iteratively solved by alternating optimization in [9]. Alternatively, [14] solves the WSR-maximization problem in the framework of the minorize-maximize (MM) method by finding a surrogate function of the objective function. When designing precoders under PUPC, the received SINR of each user is usually investigated with QoS constraints. With the received SINR, the sum-rate maximization problem is formulated and solved by the Lagrange dual function [15, 11]. Precoders under PAPC are deeply coupled with each other and thus challenging to design. In [16], precoders under PAPC are designed by minimizing the distance between the precoders under TPC and those under PAPC. The WSR-maximization precoder design under PAPC using the dual coordinate ascent method (DCAM) is proposed in [17], where the original nonconvex WSR-maximization problem is approximated as a sequence of convex MMSE-minimization problems subject to PAPC, and each convex problem is solved by the DCAM.

In general, the designs of WSR-maximization precoders under the power constraints mentioned above can be formulated as optimization problems with equality constraints. Recently, manifold optimization has been extensively studied and successfully applied in many domains [18, 19, 20, 21], showing a great advantage in dealing with smooth objective functions under challenging equality constraints. In mathematics, a manifold is a topological space that locally resembles Euclidean space near each point. By revealing the inherent geometric properties of the equality constraints, manifold optimization reformulates the constrained problems in Euclidean space as unconstrained ones on a manifold. By defining the Riemannian ingredients associated with a Riemannian manifold, several Riemannian methods become available for solving the unconstrained problems on the manifold. Moreover, manifold optimization often yields promising algorithms by combining insights from differential geometry, optimization, and numerical analysis. Specifically, by revealing that the precoders under TPC, PUPC or PAPC lie on different Riemannian submanifolds, we can leverage this insight to transform the constrained problems into unconstrained ones on these submanifolds. Therefore, manifold optimization provides a potential way for designing optimal WSR-maximization precoders under different power constraints in a unified framework.

In this paper, we focus on WSR-maximization precoder design for massive MIMO DL transmission and propose a matrix manifold framework applicable to TPC, PUPC and PAPC. We reveal the geometric properties of the precoders under different power constraints and prove that the precoder sets satisfying TPC, PUPC and PAPC form three different Riemannian submanifolds, respectively, transforming the constrained problems in Euclidean space into unconstrained ones on the Riemannian submanifolds. To facilitate a better understanding, we analyze the precoder designs under TPC, PUPC and PAPC in detail. All the ingredients required for optimization on the Riemannian submanifolds are derived for the three power constraints. Further, we present three Riemannian design methods using Riemannian steepest descent (RSD), Riemannian conjugate gradient (RCG) and Riemannian trust region (RTR), respectively. Since no large-dimensional matrix inverse is needed during the iterations, the Riemannian methods save computational cost, which is beneficial in practice. Complexity analysis shows that the method using RCG is computationally efficient. The numerical results confirm the advantages of the RCG method in convergence speed and WSR performance.

The remainder of the paper is organized as follows. In Section II, we introduce the preliminaries in the matrix manifold optimization. In Section III, we first formulate the WSR-maximization precoder design problem in Euclidean space. Then, we transform the constrained problems in Euclidean space under TPC, PUPC and PAPC to the unconstrained ones on Riemannian submanifolds and derive Riemannian ingredients in the matrix manifold framework. To solve the unconstrained problems on the Riemannian submanifolds, Section IV provides three Riemannian design methods and their complexity analyses. Section V presents numerical results and discusses the performance of the proposed precoder designs. The conclusion is drawn in Section VI.

Notations: Boldface lowercase and uppercase letters represent the column vectors and matrices, respectively. We write conjugate transpose of matrix 𝐀\mathbf{A} as 𝐀H\mathbf{A}^{H} while tr(𝐀)\mathrm{tr}\left(\mathbf{A}\right) and det(𝐀)\det\left(\mathbf{A}\right) denote the matrix trace and determinant of 𝐀\mathbf{A}, respectively. {𝐀}\Re\left\{\mathbf{A}\right\} means the real part of 𝐀\mathbf{A}. Let the mathematical expectation be 𝔼{}\mathbb{E}\left\{\cdot\right\}. 𝐈M\mathbf{I}_{M} denotes the M×MM\times M dimensional identity matrix, whose subscript may be omitted for brevity. 𝟎\mathbf{0} represents the vector or matrix whose elements are all zero. The Hadamard product of 𝐀\mathbf{A} and 𝐁\mathbf{B} is 𝐀𝐁\mathbf{A}\odot\mathbf{B}. Let 𝐞j\mathbf{e}_{j} denote the vector with the jj-th element equals 11 while the others equal 0. diag(𝐛)\mathrm{diag}\left(\mathbf{b}\right) represents the diagonal matrix with 𝐛\mathbf{b} along its main diagonal and diag(𝐀)\mathrm{diag}\left(\mathbf{A}\right) denotes the column vector of the main diagonal of 𝐀\mathbf{A}. Similarly, 𝐃=blkdiag{𝐀1,,𝐀K}\mathbf{D}=\mathrm{blkdiag}\left\{\mathbf{A}_{1},\cdots,\mathbf{A}_{K}\right\} denotes the block diagonal matrix with 𝐀1,,𝐀K\mathbf{A}_{1},\cdots,\mathbf{A}_{K} on the diagonal and [𝐃]i\left[\mathbf{D}\right]_{i} denotes the ii-th matrix on the diagonal, i.e., 𝐀i\mathbf{A}_{i}. A mapping FF from manifold 𝒩\mathcal{N} to manifold \mathcal{M} is F:𝒩:𝐗𝐘F:\mathcal{N}\rightarrow\mathcal{M}:\mathbf{X}\mapsto\mathbf{Y} denoted as F(𝐗)=𝐘F(\mathbf{X})=\mathbf{Y}. The differential of F(𝐗)F\left(\mathbf{X}\right) is represented as DF(𝐗)\mathrm{D}F\left(\mathbf{X}\right) while DF(𝐗)[𝝃𝐗]\mathrm{D}F\left(\mathbf{X}\right)\left[{\boldsymbol{\xi}}_{\mathbf{X}}\right] or DF[𝝃𝐗]\mathrm{D}F\left[{\boldsymbol{\xi}}_{\mathbf{X}}\right] means the directional derivative of FF at 𝐗\mathbf{X} along the tangent vector 𝝃𝐗{\boldsymbol{\xi}}_{\mathbf{X}}.

II Preliminaries for Matrix Manifold Optimization

In this section, we introduce some basic notions in the matrix manifold optimization and list some common notations in Table I, with \mathcal{M} representing a Riemannian manifold. For notational simplicity, the superscripts for distinguishing different manifolds can be neglected when no confusion will arise. To avoid overwhelmingly redundant preliminaries, we focus on a coordinate-free analysis and omit charts and differential structures. For a complete introduction to smooth manifolds, we refer to references [22, 23, 24].

TABLE I: Notations of the ingredients on a Riemannian submanifold \mathcal{M}
\mathbf{X} : a point on \mathcal{M}
T_{\mathbf{X}}\mathcal{M} : tangent space of \mathcal{M} at \mathbf{X}
{\boldsymbol{\xi}}_{\mathbf{X}} : tangent vector in T_{\mathbf{X}}\mathcal{M}
N_{\mathbf{X}}\mathcal{M} : normal space of \mathcal{M} at \mathbf{X}
f(\mathbf{X}) : smooth function defined on \mathcal{M}
\mathrm{D}f\left(\mathbf{X}\right)\left[\cdot\right] : differential operator of f\left(\mathbf{X}\right)
g_{\mathbf{X}}^{\mathcal{M}}\left(\cdot\right) : Riemannian metric operator on T_{\mathbf{X}}\mathcal{M}
R_{\mathbf{X}}^{\mathcal{M}}\left(\cdot\right) : retraction operator from T_{\mathbf{X}}\mathcal{M} to \mathcal{M}
\mathrm{grad}f(\mathbf{X}) : Riemannian gradient of f(\mathbf{X})
\mathrm{Hess}f(\mathbf{X})\left[\cdot\right] : Riemannian Hessian operator of f(\mathbf{X})
{\boldsymbol{\Pi}}_{T_{\mathbf{X}}{\mathcal{M}}}^{T_{\mathbf{Y}}{\mathcal{N}}}\left(\cdot\right) : orthogonal projection from T_{\mathbf{Y}}{\mathcal{N}} to T_{\mathbf{X}}{\mathcal{M}}
\mathcal{T}_{\eta_{\mathbf{X}}}^{\mathcal{M}}\left(\cdot\right) : vector transport operator to T_{R_{\mathbf{X}}^{\mathcal{M}}\left(\eta_{\mathbf{X}}\right)}\mathcal{M}

For a manifold 𝒩\mathcal{N}, a smooth mapping γ:𝒩:tγ(t)\gamma:\mathbb{R}\rightarrow\mathcal{N}:t\mapsto\gamma(t) is termed a curve in 𝒩\mathcal{N}. Let 𝐗\mathbf{X} be a point on 𝒩\mathcal{N} and 𝐗(𝒩)\mathcal{F}_{\mathbf{X}}(\mathcal{N}) denote the set of smooth real-valued functions defined on a neighborhood of 𝐗\mathbf{X}. A tangent vector 𝝃𝐗{\boldsymbol{\xi}}_{\mathbf{X}} to a manifold 𝒩\mathcal{N} at a point 𝐗\mathbf{X} is a mapping from 𝐗(𝒩)\mathcal{F}_{\mathbf{X}}(\mathcal{N}) to \mathbb{R} such that there exists a curve γ\gamma on 𝒩\mathcal{N} with γ(0)=𝐗\gamma(0)=\mathbf{X}, satisfying 𝝃𝐗f=d(f(γ(t)))dt|t=0{\boldsymbol{\xi}}_{\mathbf{X}}f=\left.\frac{\mathrm{d}(f(\gamma(t)))}{\mathrm{d}t}\right|_{t=0} for all f𝐗(𝒩)f\in\mathcal{F}_{\mathbf{X}}(\mathcal{N}). Such a curve γ\gamma is said to realize the tangent vector 𝝃𝐗{\boldsymbol{\xi}}_{\mathbf{X}}. For a manifold 𝒩\mathcal{N}, every point 𝐗\mathbf{X} on the manifold is attached to a unique and linear tangent space, denoted by T𝐗𝒩T_{\mathbf{X}}\mathcal{N}. T𝐗𝒩T_{\mathbf{X}}\mathcal{N} is a set of all the tangent vectors to 𝒩\mathcal{N} at 𝐗\mathbf{X}. The tangent bundle is the set of all tangent vectors to 𝒩\mathcal{N} denoted by T𝒩T\mathcal{N}, which is also a manifold. A vector field 𝝃{\boldsymbol{\xi}} on a manifold 𝒩\mathcal{N} is a smooth mapping from 𝒩\mathcal{N} to T𝒩T\mathcal{N} that assigns to each point 𝐗𝒩\mathbf{X}\in\mathcal{N} a tangent vector 𝝃𝐗T𝐗𝒩{\boldsymbol{\xi}}_{\mathbf{X}}\in T_{\mathbf{X}}\mathcal{N}. Particularly, every vector space \mathcal{E} forms a linear manifold naturally. Assuming 𝒩\mathcal{N} is a vector space \mathcal{E}, we have T𝐗𝒩=T_{\mathbf{X}}\mathcal{N}=\mathcal{E}.

Figure 1: The relationship between two manifolds [22]. FF is a smooth mapping from 𝒩1\mathcal{N}_{1} to 𝒩2\mathcal{N}_{2}, and DF(𝐗)\mathrm{D}F\left(\mathbf{X}\right) is a linear mapping from T𝐗𝒩1T_{\mathbf{X}}{\mathcal{N}_{1}} to T𝐘𝒩2T_{\mathbf{Y}}{\mathcal{N}_{2}}.

Let F:𝒩1𝒩2F:\mathcal{N}_{1}\rightarrow\mathcal{N}_{2} be a smooth mapping from manifold 𝒩1\mathcal{N}_{1} to manifold 𝒩2\mathcal{N}_{2}. The differential of FF at 𝐗\mathbf{X} is a linear mapping from T𝐗𝒩1T_{\mathbf{X}}\mathcal{N}_{1} to T𝐘𝒩2T_{\mathbf{Y}}\mathcal{N}_{2}, denoted by DF(𝐗)[]\mathrm{D}F\left(\mathbf{X}\right)\left[\cdot\right]. Denote 𝐘=F(𝐗)\mathbf{Y}=F\left(\mathbf{X}\right) as a point on 𝒩2\mathcal{N}_{2}. Similarly, 𝝃𝐘=DF(𝐗)[𝝃𝐗]{\boldsymbol{\xi}}_{\mathbf{Y}}=\mathrm{D}F\left(\mathbf{X}\right)\left[{\boldsymbol{\xi}}_{\mathbf{X}}\right] is a tangent vector to 𝒩2\mathcal{N}_{2} at 𝐘\mathbf{Y}, i.e., an element in T𝐘𝒩2T_{\mathbf{Y}}\mathcal{N}_{2}. In particular, if 𝒩1\mathcal{N}_{1} and 𝒩2\mathcal{N}_{2} are linear manifolds, DF(𝐗)\mathrm{D}F\left(\mathbf{X}\right) reduces to the classical directional derivative [25]

DF(𝐗)[𝝃𝐗]=limt0F(𝐗+t𝝃𝐗)F(𝐗)t.\displaystyle\mathrm{D}F\left(\mathbf{X}\right)\left[\boldsymbol{{\boldsymbol{\xi}}}_{\mathbf{X}}\right]=\lim_{t\rightarrow 0}\frac{F\left(\mathbf{X}+t\boldsymbol{{\boldsymbol{\xi}}}_{\mathbf{X}}\right)-F\left(\mathbf{X}\right)}{t}. (1)

The rank of FF at 𝐗\mathbf{X} is the dimension of the range of DF(𝐗)[𝝃𝐗]\mathrm{D}F\left(\mathbf{X}\right)\left[\boldsymbol{{\boldsymbol{\xi}}}_{\mathbf{X}}\right]. Fig. 1 is a simple illustration for geometric understanding.
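As a quick numerical illustration of (1), the following minimal NumPy sketch (randomly drawn matrices, not tied to the precoder problem) checks the differential of the mapping F(X) = X X^H, whose directional derivative is X ξ^H + ξ X^H, against a finite-difference quotient.

```python
import numpy as np

# Finite-difference check of the directional derivative (1) for the
# mapping F(X) = X X^H between linear manifolds; its differential is
# DF(X)[xi] = X xi^H + xi X^H.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
xi = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))

def F(X):
    return X @ X.conj().T

analytic = X @ xi.conj().T + xi @ X.conj().T
t = 1e-6
numeric = (F(X + t * xi) - F(X)) / t

print(np.linalg.norm(analytic - numeric))  # small residual, consistent with (1)
```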

In practice, the equality constraint F(\mathbf{X})=0 in many optimization problems is defined by a mapping F between two linear manifolds. The solutions satisfying the equality constraint constitute the set \mathcal{M}=\left\{\mathbf{X}\in\mathcal{N}\mid F(\mathbf{X})=0\right\}. The set \mathcal{M} may admit several manifold structures, while it admits at most one differential structure that makes it an embedded submanifold of \mathcal{N}. Whether \mathcal{M} forms an embedded submanifold mainly depends on the properties of F\left(\mathbf{X}\right) [22, Proposition 3.3.2]. We assume \mathcal{M} is an embedded submanifold of \mathcal{N} in the rest of this section to facilitate the introduction. Note that \mathbf{X}\in\mathcal{M} is still a point on \mathcal{N}. A level set of a function F is the set of points \mathbf{X} for which F\left(\mathbf{X}\right) equals a given constant. In particular, when \mathcal{M} is defined as a level set of a constant-rank function F, the tangent space of \mathcal{M} at \mathbf{X} is the kernel of the differential of F and a subspace of the tangent space of \mathcal{N}:

T𝐗=ker(DF(𝐗))T𝐗𝒩.\displaystyle T_{\mathbf{X}}{\mathcal{M}}=\mathrm{ker}\left(\mathrm{D}F\left(\mathbf{X}\right)\right)\subseteq T_{\mathbf{X}}\mathcal{N}. (2)

By endowing the tangent space with an inner product g_{\mathbf{X}}^{\mathcal{N}}\left(\cdot\right), we can define the length of tangent vectors in T_{\mathbf{X}}\mathcal{N}. Note that the subscript and the superscript in g_{\mathbf{X}}^{\mathcal{N}}\left(\cdot\right) are used to distinguish the metrics at different points on different manifolds for clarity, and may be omitted if no confusion arises. g_{\mathbf{X}}^{\mathcal{N}}\left(\cdot\right) is called a Riemannian metric if it varies smoothly with \mathbf{X}, and a manifold endowed with a Riemannian metric is called a Riemannian manifold. Since T_{\mathbf{X}}{\mathcal{M}} can be regarded as a subspace of T_{\mathbf{X}}{\mathcal{N}}, the Riemannian metric g_{\mathbf{X}}^{\mathcal{N}}\left(\cdot\right) on \mathcal{N} induces a Riemannian metric g_{\mathbf{X}}^{\mathcal{M}}\left(\cdot\right) on \mathcal{M} according to

g𝐗(𝝃𝐗,𝜻𝐗)=g𝐗𝒩(𝝃𝐗,𝜻𝐗),𝝃𝐗,𝜻𝐗T𝐗.\displaystyle g_{\mathbf{X}}^{\mathcal{M}}\left({\boldsymbol{\xi}}_{\mathbf{X}},{\boldsymbol{\zeta}}_{\mathbf{X}}\right)=g_{\mathbf{X}}^{\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{X}},{\boldsymbol{\zeta}}_{\mathbf{X}}\right),\forall{\boldsymbol{\xi}}_{\mathbf{X}},{\boldsymbol{\zeta}}_{\mathbf{X}}\in T_{\mathbf{X}}\mathcal{M}. (3)

Note that 𝝃𝐗{\boldsymbol{\xi}}_{\mathbf{X}} and 𝜻𝐗{\boldsymbol{\zeta}}_{\mathbf{X}} on the right hand side are viewed as elements in T𝐗𝒩T_{\mathbf{X}}\mathcal{N}. Endowed with the Riemannian metric (3), the embedded submanifold \mathcal{M} forms a Riemannian submanifold.

In practice, the orthogonal projection is used to obtain elements of T_{\mathbf{X}}{\mathcal{M}}. With the Riemannian metric g_{\mathbf{X}}^{\mathcal{N}}\left(\cdot\right), T_{\mathbf{X}}{\mathcal{N}} can be decomposed into two orthogonal subspaces as

T𝐗𝒩=T𝐗N𝐗,\displaystyle T_{\mathbf{X}}{\mathcal{N}}=T_{\mathbf{X}}{\mathcal{M}}\oplus N_{\mathbf{X}}{\mathcal{M}}, (4)

where the normal space N𝐗N_{\mathbf{X}}{\mathcal{M}} is the orthogonal complement of T𝐗T_{\mathbf{X}}{\mathcal{M}} defined as

N𝐗={𝝃𝐗T𝐗𝒩g𝐗(𝝃𝐗,𝜻𝐗)=0,𝜻𝐗T𝐗}.\displaystyle N_{\mathbf{X}}{\mathcal{M}}=\left\{{\boldsymbol{\xi}}_{\mathbf{X}}\in T_{\mathbf{X}}\mathcal{N}\mid g_{\mathbf{X}}\left({\boldsymbol{\xi}}_{\mathbf{X}},{\boldsymbol{\zeta}}_{\mathbf{X}}\right)=0,\ \forall{\boldsymbol{\zeta}}_{\mathbf{X}}\in T_{\mathbf{X}}{\mathcal{M}}\right\}. (5)

Thus, any 𝝃𝐗T𝐗𝒩{\boldsymbol{\xi}}_{\mathbf{X}}\in T_{\mathbf{X}}{\mathcal{N}} can be uniquely decomposed into the sum of an element in T𝐗T_{\mathbf{X}}{\mathcal{M}} and an element in N𝐗N_{\mathbf{X}}{\mathcal{M}}

𝝃𝐗=𝚷T𝐗T𝐗𝒩(𝝃𝐗)+𝚷N𝐗T𝐗𝒩(𝝃𝐗),\displaystyle{\boldsymbol{\xi}}_{\mathbf{X}}={\boldsymbol{\Pi}}_{T_{\mathbf{X}}{\mathcal{M}}}^{T_{\mathbf{X}}{\mathcal{N}}}\left({\boldsymbol{\xi}}_{\mathbf{X}}\right)+{\boldsymbol{\Pi}}_{N_{\mathbf{X}}{\mathcal{M}}}^{T_{\mathbf{X}}{\mathcal{N}}}\left({\boldsymbol{\xi}}_{\mathbf{X}}\right), (6)

where 𝚷𝐁𝐀(){\boldsymbol{\Pi}}_{\mathbf{B}}^{\mathbf{A}}\left(\cdot\right) denotes the orthogonal projection from 𝐀\mathbf{A} to 𝐁\mathbf{B}.

For a smooth real-valued function ff on a Riemannian submanifold \mathcal{M}, the Riemannian gradient of ff at 𝐗\mathbf{X}, denoted by gradf(𝐗)\mathrm{grad}f_{\mathcal{M}}(\mathbf{X}), is defined as the unique element in T𝐗T_{\mathbf{X}}{\mathcal{M}} that satisfies

g𝐗(gradf(𝐗),𝝃𝐗)=Df(𝐗)[𝝃𝐗],𝝃𝐗T𝐗.\displaystyle g_{\mathbf{X}}^{\mathcal{M}}\left(\mathrm{grad}f_{\mathcal{M}}(\mathbf{X}),\boldsymbol{{\boldsymbol{\xi}}}_{\mathbf{X}}\right)=\mathrm{D}f\left(\mathbf{X}\right)\left[\boldsymbol{{\boldsymbol{\xi}}}_{\mathbf{X}}\right],\forall{\boldsymbol{\xi}}_{\mathbf{X}}\in T_{\mathbf{X}}\mathcal{M}. (7)

Denote by \mathrm{grad}f_{\mathcal{M}} the vector field of the Riemannian gradient on \mathcal{M}, whose subscript can be omitted when no confusion arises. Note that \mathrm{grad}f_{\mathcal{N}}(\mathbf{X})\in T_{\mathbf{X}}\mathcal{N} can be decomposed as in (6), and its component in T_{\mathbf{X}}{\mathcal{M}} gives the Riemannian gradient \mathrm{grad}f_{\mathcal{M}}(\mathbf{X}).

When second-order derivative optimization algorithms, such as Newton method and trust-region method, are preferred, affine connection is indispensable [22]. Let 𝔛(𝒩)\mathfrak{X}(\mathcal{N}) denote the set of smooth vector fields on 𝒩\mathcal{N}, the affine connection on 𝒩\mathcal{N} is defined as a mapping 𝒩:𝔛(𝒩)×𝔛(𝒩)𝔛(𝒩):(𝜼,𝝃)𝜼𝒩𝝃\nabla^{\mathcal{N}}:\mathfrak{X}(\mathcal{N})\times\mathfrak{X}(\mathcal{N})\rightarrow\mathfrak{X}(\mathcal{N}):(\boldsymbol{\eta},\boldsymbol{{\boldsymbol{\xi}}})\mapsto\nabla_{\boldsymbol{\eta}}^{\mathcal{N}}\boldsymbol{{\boldsymbol{\xi}}}. When Riemannian manifold \mathcal{M} is an embedded submanifold of a vector space, 𝜼𝐗𝝃\nabla_{\boldsymbol{\eta}_{\mathbf{X}}}^{\mathcal{M}}{\boldsymbol{\xi}} is called Riemannian connection and given by [22, Proposition 5.3.1]

𝜼𝐗𝝃=\displaystyle\nabla_{\boldsymbol{\eta}_{\mathbf{X}}}^{\mathcal{M}}{\boldsymbol{\xi}}= 𝚷T𝐗T𝐗𝒩(𝜼𝐗𝒩𝝃)\displaystyle{\boldsymbol{\Pi}}_{T_{\mathbf{X}}{\mathcal{M}}}^{T_{\mathbf{X}}{\mathcal{N}}}\left(\nabla_{\boldsymbol{\eta}_{{\mathbf{X}}}}^{\mathcal{N}}{\boldsymbol{\xi}}\right) (8)
=\displaystyle= 𝚷T𝐗T𝐗𝒩(D𝝃[𝜼𝐗]),𝜼𝐗T𝐗,𝝃𝔛().\displaystyle{\boldsymbol{\Pi}}_{T_{\mathbf{X}}{\mathcal{M}}}^{T_{\mathbf{X}}{\mathcal{N}}}\left(\mathrm{D}\boldsymbol{{\boldsymbol{\xi}}}\left[\boldsymbol{\eta}_{{\mathbf{X}}}\right]\right),\forall\boldsymbol{\eta}_{\mathbf{X}}\in T_{\mathbf{X}}\mathcal{M},{\boldsymbol{\xi}}\in\mathfrak{X}(\mathcal{M}).

Given a Riemannian connection \nabla^{\mathcal{M}} defined on \mathcal{M}, the Riemannian Hessian of a real valued function ff at point 𝐗\mathbf{X} on \mathcal{M} is the linear mapping Hessf(𝐗)[]\operatorname{Hess}f(\mathbf{X})\left[\cdot\right] of T𝐗T_{\mathbf{X}}\mathcal{M} into itself defined as

Hessf(𝐗)[𝝃𝐗]=𝝃𝐗gradf.\displaystyle\mathrm{Hess}f(\mathbf{X})\left[\boldsymbol{{\boldsymbol{\xi}}}_{\mathbf{X}}\right]=\nabla_{\boldsymbol{{\boldsymbol{\xi}}}_{\mathbf{X}}}^{\mathcal{M}}\mathrm{grad}f. (9)

The notion of moving along a tangent vector while remaining on the manifold is generalized by retraction. The retraction R_{\mathbf{X}}^{\mathcal{N}}\left(\cdot\right), a smooth mapping from T_{\mathbf{X}}\mathcal{N} to \mathcal{N} [22, Definition 4.1.1], builds a bridge between the tangent space T_{\mathbf{X}}\mathcal{N} and the manifold \mathcal{N}. The Riemannian exponential mapping forms a retraction, but it is usually computationally expensive to implement. For a Riemannian submanifold \mathcal{M}, the geometric structure of the manifold is utilized to define the retraction R_{\mathbf{X}}^{\mathcal{M}}\left(\cdot\right) through projection, which is computationally affordable and efficient.

Figure 2: Geometric interpretation of orthogonal projection, retraction and vector transport.

Sometimes, we need to move the tangent vector from the current tangent space to another, which is not always straightforward as the tangent spaces are different in a nonlinear manifold. To address this difficulty, vector transport denoted by 𝒯𝜼𝐗𝒩(𝝃𝐗)TR𝐗(𝜼𝐗)𝒩\mathcal{T}_{\boldsymbol{\eta}_{\mathbf{X}}}^{\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{X}}\right)\in T_{R_{\mathbf{X}}\left(\boldsymbol{\eta}_{\mathbf{X}}\right)}\mathcal{N} is introduced, which specifies how to transport a tangent vector 𝝃𝐗{\boldsymbol{\xi}}_{\mathbf{X}} from a point 𝐗\mathbf{X} on 𝒩\mathcal{N} to another point R𝐗(𝜼𝐗)R_{\mathbf{X}}\left(\boldsymbol{\eta}_{\mathbf{X}}\right). Like Hessf(𝐗)[]\mathrm{Hess}f(\mathbf{X})\left[\cdot\right], 𝒯𝜼𝐗𝒩()\mathcal{T}_{\boldsymbol{\eta}_{\mathbf{X}}}^{\mathcal{N}}\left(\cdot\right) is an operator rather than a matrix. For the Riemannian submanifold, 𝒯𝜼𝐗(𝝃𝐗)\mathcal{T}_{\boldsymbol{\eta}_{\mathbf{X}}}^{\mathcal{M}}\left({\boldsymbol{\xi}}_{\mathbf{X}}\right) can be achieved by orthogonal projection [22]:

𝒯𝜼𝐗(𝝃𝐗)=ΠTR𝐗(𝜼𝐗)T𝐗(𝝃𝐗).\displaystyle\mathcal{T}_{\boldsymbol{\eta}_{\mathbf{X}}}^{\mathcal{M}}\left({\boldsymbol{\xi}}_{\mathbf{X}}\right)=\Pi_{T_{R_{\mathbf{X}}^{\mathcal{M}}\left(\boldsymbol{\eta}_{\mathbf{X}}\right)}\mathcal{M}}^{T_{\mathbf{X}}\mathcal{M}}\left({\boldsymbol{\xi}}_{\mathbf{X}}\right). (10)

For geometric understanding, Fig. 2 is a simple illustration.
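To make the ingredients in Table I concrete, the following minimal NumPy sketch (a toy example with an assumed quadratic cost, unrelated to the precoder design problem) implements the tangent-space projection, a retraction by normalization, and one Riemannian steepest-descent step on the unit sphere embedded in R^n.

```python
import numpy as np

# Unit sphere in R^n as a Riemannian submanifold: tangent-space
# projection, retraction by normalization, and one descent step.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
A = A + A.T                      # assumed symmetric cost matrix (toy example)

def cost(x):                     # f(x) = x^T A x
    return x @ A @ x

def egrad(x):                    # Euclidean gradient of f
    return 2 * A @ x

def proj(x, v):                  # orthogonal projection onto T_x M, cf. (6)
    return v - (x @ v) * x

def retract(x, v):               # retraction by renormalization, cf. Fig. 2
    y = x + v
    return y / np.linalg.norm(y)

x = rng.standard_normal(n)
x /= np.linalg.norm(x)
rgrad = proj(x, egrad(x))        # Riemannian gradient as a projection, cf. (7)
x_new = retract(x, -0.1 * rgrad) # one Riemannian steepest-descent step

print(cost(x), "->", cost(x_new))
print("still on sphere:", np.linalg.norm(x_new))
```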

III Problem Formulation and Riemannian Elements in Matrix Manifold framework

In this section, we first present the WSR-maximization precoder design problems under TPC, PUPC and PAPC, respectively, for massive MIMO DL in Euclidean space. Then, we show that the precoder set satisfying TPC forms a sphere and the precoder set satisfying PUPC or PAPC forms an oblique manifold. Further, we prove that the precoder sets form three different Riemannian submanifolds, which allows us to transform the constrained problems in Euclidean space into unconstrained ones on these Riemannian submanifolds. We then derive the Riemannian ingredients, including the orthogonal projection, Riemannian gradient, Riemannian Hessian, retraction and vector transport, which are needed for precoder design in the matrix manifold framework.

III-A WSR-maximization Precoder Design Problem in Euclidean Space

We consider a single-cell massive MIMO system as shown in Fig. 3. In the system, the BS equipped with MtM_{t} antennas serves UU UTs and the ii-th UT has MiM_{i} antennas. The user set is denoted as 𝒰={1,2,,U}\mathcal{U}=\left\{1,2,\cdots,U\right\} and the set of the antennas at the BS side is denoted as 𝒜={1,2,,Mt}\mathcal{A}=\left\{1,2,\cdots,M_{t}\right\}. Let 𝐱idi\mathbf{x}_{i}\in\mathbb{C}^{d_{i}} denote the signal transmitted to the ii-th UT satisfying 𝔼{𝐱i𝐱iH}=𝐈di\mathbb{E}\left\{\mathbf{x}_{i}\mathbf{x}_{i}^{H}\right\}=\mathbf{I}_{d_{i}}, where did_{i} is the dimension of the signal transmitted to the ii-th UT. 𝐱=i=1U𝐏i𝐱i\mathbf{x}=\sum_{i=1}^{U}\mathbf{P}_{i}\mathbf{x}_{i} is the transmitted signal for all the UTs. The received signal 𝐲i\mathbf{y}_{i} of the ii-th UT is given by

𝐲i=𝐇i𝐏i𝐱idesiredsignal+𝐇ii,=1U𝐏𝐱+𝐳iinterferenceplusnoise,\displaystyle\mathbf{y}_{i}=\overbrace{\mathbf{H}_{i}\mathbf{P}_{i}\mathbf{x}_{i}}^{\mathrm{desired\ signal}}+\overbrace{\mathbf{H}_{i}\sum_{\ell\neq i,\ell=1}^{U}\mathbf{P}_{\ell}\mathbf{x}_{\ell}+\mathbf{z}_{i}}^{\mathrm{interference\ plus\ noise}}, (11)

where 𝐇i\mathbf{H}_{i} is the complex channel matrix from the BS to the ii-th UT, 𝐏i\mathbf{P}_{i} is the corresponding precoding matrix, and 𝐳i\mathbf{z}_{i} denotes the independent and identically distributed (i.i.d.) complex circularly symmetric Gaussian noise vector distributed as 𝒞𝒩(0,σz2𝐈Mi)\mathcal{CN}\left(0,\sigma_{z}^{2}\mathbf{I}_{M_{i}}\right).

Figure 3: The illustration of the massive MIMO system.

For simplicity, we assume that the perfect channel state information (CSI) of the channel 𝐇i,i𝒰,\mathbf{H}_{i},\forall i\in\mathcal{U}, is known at the BS side, and the perfect CSI of the effective channel 𝐇i𝐏i\mathbf{H}_{i}\mathbf{P}_{i} is available for the ii-th UT via DL training. For the worst-case design, the aggregate interference plus noise 𝐳i=𝐇iliU𝐏𝐱+𝐳i\mathbf{z}_{i}^{\prime}=\mathbf{H}_{i}\sum_{l\neq i}^{U}\mathbf{P}_{\ell}\mathbf{x}_{\ell}+\mathbf{z}_{i} is treated as Gaussian noise. The covariance matrix of 𝐳i\mathbf{z}_{i}^{\prime} is as follows

𝐑i=𝔼{𝐳i(𝐳i)H}=σz2𝐈Mi+liU𝐇i𝐏𝐏H𝐇iH.\displaystyle\mathbf{R}_{i}=\mathbb{E}\left\{\mathbf{z}_{i}^{\prime}\left(\mathbf{z}_{i}^{\prime}\right)^{H}\right\}=\sigma_{z}^{2}\mathbf{I}_{M_{i}}+\sum_{l\neq i}^{U}\mathbf{H}_{i}\mathbf{P}_{\ell}\mathbf{P}_{\ell}^{H}\mathbf{H}_{i}^{H}. (12)

By assuming that 𝐑i\mathbf{R}_{i} is also known by UT ii, the rate of user ii can be written as

i=\displaystyle\mathcal{R}_{i}= logdet(𝐑i+𝐇i𝐏i𝐏iH𝐇iH)logdet(𝐑i)\displaystyle\log\det\left(\mathbf{R}_{i}+\mathbf{H}_{i}\mathbf{P}_{i}\mathbf{P}_{i}^{H}\mathbf{H}_{i}^{H}\right)-\log\det\left(\mathbf{R}_{i}\right) (13)
=\displaystyle= logdet(𝐈di+𝐏iH𝐇iH(𝐑i)1𝐇i𝐏i).\displaystyle\log\det\left(\mathbf{I}_{d_{i}}+\mathbf{P}_{i}^{H}\mathbf{H}_{i}^{H}\left(\mathbf{R}_{i}\right)^{-1}\mathbf{H}_{i}\mathbf{P}_{i}\right).
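As a numerical sanity check with small assumed dimensions (and the natural logarithm for the log-det), the following sketch evaluates the interference-plus-noise covariance (12) and the user rate (13) for randomly drawn channels and precoders.

```python
import numpy as np

rng = np.random.default_rng(2)
U, Mt, Mi, di, sigma2 = 3, 8, 2, 2, 1.0          # assumed dimensions and noise power
H = [rng.standard_normal((Mi, Mt)) + 1j * rng.standard_normal((Mi, Mt)) for _ in range(U)]
P = [rng.standard_normal((Mt, di)) + 1j * rng.standard_normal((Mt, di)) for _ in range(U)]

def rate(i):
    # R_i in (12): noise plus inter-user interference covariance
    R = sigma2 * np.eye(Mi, dtype=complex)
    for l in range(U):
        if l != i:
            R += H[i] @ P[l] @ P[l].conj().T @ H[i].conj().T
    # rate in (13), natural logarithm assumed
    M = np.eye(di) + P[i].conj().T @ H[i].conj().T @ np.linalg.solve(R, H[i] @ P[i])
    return np.log(np.linalg.det(M).real)

print([rate(i) for i in range(U)])
```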

The WSR-maximization precoder design problem can be formulated into a minimization problem as

argmin𝐏1,,𝐏Uf(𝐏1,,𝐏U)s.t.F(𝐏1,𝐏2,,𝐏U)=0,\displaystyle\mathop{\arg\min}\limits_{\mathbf{P}_{1},\cdots,\mathbf{P}_{U}}f\left(\mathbf{P}_{1},\cdots,\mathbf{P}_{U}\right)\ \ \mathrm{s.t.}\ F\left(\mathbf{P}_{1},\mathbf{P}_{2},\cdots,\mathbf{P}_{U}\right)=0, (14)

where f\left(\mathbf{P}_{1},\cdots,\mathbf{P}_{U}\right)=-\sum_{i=1}^{U}w_{i}\mathcal{R}_{i} is the objective function with w_{i} being the weight factor of user i, and F\left(\mathbf{P}_{1},\mathbf{P}_{2},\cdots,\mathbf{P}_{U}\right)=0 is the power constraint. We consider three specific power constraints in massive MIMO DL transmission, namely TPC, PUPC and PAPC. By stacking the precoding matrices of different users, we redefine the variable as

𝐏\displaystyle\mathbf{P} =(𝐏1,𝐏2,,𝐏U).\displaystyle=\left(\mathbf{P}_{1},\mathbf{P}_{2},\cdots,\mathbf{P}_{U}\right). (15)

Let PP, Pi,i𝒰P_{i},\forall i\in\mathcal{U}, and pj,j𝒜p_{j},\forall j\in\mathcal{A}, denote the total transmit power of the BS, the transmit power allocated for the ii-th UT and the power constraint of the jj-th transmit antenna, respectively. The WSR-maximization problem can be rewritten as

𝐏=(𝐏1,𝐏2,,𝐏U)=argmin𝐏f(𝐏)s.t.F(𝐏)=0,\displaystyle\mathbf{P}^{\star}=\left({\mathbf{P}_{1}^{\star},\mathbf{P}_{2}^{\star},\cdots,\mathbf{P}_{U}^{\star}}\right)=\mathop{\arg\min}\limits_{\mathbf{P}}f\left(\mathbf{P}\right)\ \ \mathrm{s.t.}\ F\left(\mathbf{P}\right)=0, (16)

where F(𝐏)F\left(\mathbf{P}\right) can be expressed as

F^(𝐏)\displaystyle\hat{F}\left(\mathbf{P}\right) =tr{𝐏H𝐏}P,\displaystyle=\mathrm{tr}\left\{\mathbf{P}^{H}\mathbf{P}\right\}-P, (17a)
F~(𝐏)\displaystyle\tilde{F}\left(\mathbf{P}\right) =tr{𝐏iH𝐏i}Pi,i𝒰,\displaystyle=\mathrm{tr}\left\{\mathbf{P}_{i}^{H}\mathbf{P}_{i}\right\}-P_{i},\forall\ i\in\mathcal{U}, (17b)
F¯(𝐏)\displaystyle\bar{F}\left(\mathbf{P}\right) =𝐞jH𝐏𝐏H𝐞jpj,j𝒜\displaystyle=\mathbf{e}_{j}^{H}\mathbf{P}\mathbf{P}^{H}\mathbf{e}_{j}-p_{j},\forall\ j\in\mathcal{A} (17c)

for TPC, PUPC and PAPC, respectively. For simplicity, we assume that the power allocation has already been performed in the PUPC case and that \sum_{i=1}^{U}P_{i}=P without loss of generality. Besides, an equal per-antenna power constraint is usually considered in practice to efficiently utilize the power amplifier capacity of each antenna [26] in the PAPC case, where the power constraint of each transmit antenna is p_{j}=\frac{P}{M_{t}},\forall j\in\mathcal{A}, and \bar{F}\left(\mathbf{P}\right) can be re-expressed as \bar{F}\left(\mathbf{P}\right)=\mathbf{I}_{M_{t}}\odot\left(\mathbf{P}\mathbf{P}^{H}\right)-\frac{P}{M_{t}}\mathbf{I}_{M_{t}}.
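The following minimal sketch (assumed dimensions and power budget) evaluates the three constraint functions in (17) for a randomly drawn stacked precoder; a feasible precoder would make each returned quantity zero.

```python
import numpy as np

rng = np.random.default_rng(3)
U, Mt, di, Ptot = 3, 8, 2, 10.0                  # assumed dimensions and power budget
P_list = [rng.standard_normal((Mt, di)) + 1j * rng.standard_normal((Mt, di)) for _ in range(U)]
P = np.hstack(P_list)                            # stacked precoder as in (15)
P_user = [Ptot / U] * U                          # assumed per-user budgets summing to P (PUPC)

F_tpc = np.trace(P.conj().T @ P).real - Ptot                         # (17a)
F_pupc = [np.trace(Pi.conj().T @ Pi).real - Pi_pow
          for Pi, Pi_pow in zip(P_list, P_user)]                      # (17b)
F_papc = np.real(np.diag(P @ P.conj().T)) - Ptot / Mt                 # (17c), equal per-antenna power

print(F_tpc, F_pupc, F_papc)   # all zero for a feasible precoder
```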

III-B Problem Reformulation on Riemannian Submanifold

In this subsection, we transform the constrained problems into unconstrained ones on three different Riemannian submanifolds.

From the perspective of manifold optimization, 𝐏iMt×di\mathbf{P}_{i}\in\mathbb{C}^{M_{t}\times d_{i}} belongs to the linear manifold Mt×di\mathbb{C}^{M_{t}\times d_{i}} naturally. Define the Riemannian metric as

g𝐏i(𝝃𝐏i,𝜻𝐏i)={tr(𝝃𝐏iH𝜻𝐏i)},\displaystyle g_{\mathbf{P}_{i}}\left({\boldsymbol{\xi}}_{\mathbf{P}_{i}},{\boldsymbol{\zeta}}_{\mathbf{P}_{i}}\right)=\Re\left\{\mathrm{tr}\left({\boldsymbol{\xi}}_{\mathbf{P}_{i}}^{H}{\boldsymbol{\zeta}}_{\mathbf{P}_{i}}\right)\right\}, (18)

where 𝝃𝐏i{\boldsymbol{\xi}}_{\mathbf{P}_{i}} and 𝜻𝐏i{\boldsymbol{\zeta}}_{\mathbf{P}_{i}} are tangent vectors in tangent space T𝐏iMt×di=Mt×diT_{\mathbf{P}_{i}}{\mathbb{C}^{M_{t}\times d_{i}}}=\mathbb{C}^{M_{t}\times d_{i}}. With the Riemannian metric (18), Mt×di\mathbb{C}^{M_{t}\times d_{i}} forms a Riemannian manifold. In fact, 𝐏=(𝐏1,𝐏2,,𝐏U)\mathbf{P}=\left(\mathbf{P}_{1},\mathbf{P}_{2},\cdots,\mathbf{P}_{U}\right) is a point on the product manifold [22]

Mt×d1×Mt×d2××Mt×dU:=𝒩.\displaystyle\mathbb{C}^{M_{t}\times d_{1}}\times\mathbb{C}^{M_{t}\times d_{2}}\times\cdots\times\mathbb{C}^{M_{t}\times d_{U}}:=\mathcal{N}. (19)

Similarly, the tangent space of 𝒩\mathcal{N} is given by [23]

T𝐏𝒩=T𝐏1Mt×d1×T𝐏2Mt×d2××T𝐏UMt×dU,\displaystyle T_{\mathbf{P}}\mathcal{N}=T_{\mathbf{P}_{1}}\mathbb{C}^{M_{t}\times d_{1}}\times T_{\mathbf{P}_{2}}\mathbb{C}^{M_{t}\times d_{2}}\times\cdots\times T_{\mathbf{P}_{U}}\mathbb{C}^{M_{t}\times d_{U}}, (20)

of which the product Riemannian metric is defined as a direct sum [23] :

g𝐏(𝝃𝐏,𝜻𝐏)=g𝐏1(𝝃𝐏1,𝜻𝐏1)g𝐏U(𝝃𝐏U,𝜻𝐏U).\displaystyle g_{\mathbf{P}}\left({\boldsymbol{\xi}}_{\mathbf{P}},{\boldsymbol{\zeta}}_{\mathbf{P}}\right)=g_{\mathbf{P}_{1}}\left({\boldsymbol{\xi}}_{\mathbf{P}_{1}},{\boldsymbol{\zeta}}_{\mathbf{P}_{1}}\right)\oplus\cdots\oplus g_{\mathbf{P}_{U}}\left({\boldsymbol{\xi}}_{\mathbf{P}_{U}},{\boldsymbol{\zeta}}_{\mathbf{P}_{U}}\right). (21)

𝝃𝐏=(𝝃𝐏1,𝝃𝐏2,,𝝃𝐏U){\boldsymbol{\xi}}_{\mathbf{P}}=\left({\boldsymbol{\xi}}_{\mathbf{P}_{1}},{\boldsymbol{\xi}}_{\mathbf{P}_{2}},\cdots,{\boldsymbol{\xi}}_{\mathbf{P}_{U}}\right) and 𝜻𝐏=(𝜻𝐏1,𝜻𝐏2,,𝜻𝐏U){\boldsymbol{\zeta}}_{\mathbf{P}}=\left({\boldsymbol{\zeta}}_{\mathbf{P}_{1}},{\boldsymbol{\zeta}}_{\mathbf{P}_{2}},\cdots,{\boldsymbol{\zeta}}_{\mathbf{P}_{U}}\right) are tangent vectors at point 𝐏\mathbf{P} in T𝐏𝒩T_{\mathbf{P}}\mathcal{N}. With the Riemannian metric defined in (21), 𝒩\mathcal{N} is also a Riemannian manifold. Let

^\displaystyle\widehat{\mathcal{M}} ={𝐏tr{𝐏H𝐏}=P},\displaystyle=\left\{\mathbf{P}\mid\mathrm{tr}\left\{\mathbf{P}^{H}\mathbf{P}\right\}=P\right\}, (22a)
~\displaystyle\widetilde{\mathcal{M}} ={𝐏tr(𝐏iH𝐏i)=Pi,i𝒰},\displaystyle=\left\{\mathbf{P}\mid\mathrm{tr}\left(\mathbf{P}_{i}^{H}\mathbf{P}_{i}\right)=P_{i},\forall\ i\in\mathcal{U}\right\}, (22b)
¯\displaystyle\overline{\mathcal{M}} ={𝐏𝐈Mt(𝐏𝐏H)=PMt𝐈Mt}\displaystyle=\left\{\mathbf{P}\mid\mathbf{I}_{M_{t}}\odot\left(\mathbf{P}\mathbf{P}^{H}\right)=\frac{P}{M_{t}}\mathbf{I}_{M_{t}}\right\} (22c)

denote the precoder sets satisfying TPC, PUPC and PAPC, respectively. We have the following theorem.

Theorem 1.

The Frobenius norm of \forall\hat{\mathbf{P}}\in\widehat{\mathcal{M}} is a constant and \widehat{\mathcal{M}} forms a sphere. The Frobenius norm of each component matrix \tilde{\mathbf{P}}_{i} of \forall\tilde{\mathbf{P}}\in\widetilde{\mathcal{M}} is a constant and \widetilde{\mathcal{M}} forms an oblique manifold composed of U spheres. The norm of each row of \forall\bar{\mathbf{P}}\in\overline{\mathcal{M}} is a constant and \overline{\mathcal{M}} forms an oblique manifold with M_{t} spheres. \widehat{\mathcal{M}}, \widetilde{\mathcal{M}} and \overline{\mathcal{M}} form three different Riemannian submanifolds of the product manifold \mathcal{N}.

Proof.

The proof is provided in Appendix A. ∎

From Theorem 1, the constrained problems under TPC, PUPC and PAPC can be converted into unconstrained ones on manifolds as

𝐏^\displaystyle\hat{\mathbf{P}}^{\star} =argmin𝐏^^f(𝐏^),\displaystyle=\mathop{\arg\min}\limits_{\hat{\mathbf{P}}\in\widehat{\mathcal{M}}}f\left(\hat{\mathbf{P}}\right), (23a)
𝐏~\displaystyle\tilde{\mathbf{P}}^{\star} =argmin𝐏~~f(𝐏~),\displaystyle=\mathop{\arg\min}\limits_{\tilde{\mathbf{P}}\in\widetilde{\mathcal{M}}}f\left(\tilde{\mathbf{P}}\right), (23b)
𝐏¯\displaystyle\bar{\mathbf{P}}^{\star} =argmin𝐏¯¯f(𝐏¯).\displaystyle=\mathop{\arg\min}\limits_{\bar{\mathbf{P}}\in\overline{\mathcal{M}}}f\left(\bar{\mathbf{P}}\right). (23c)

In the case that TPC, PUPC and PAPC need to be satisfied simultaneously, the problem can be reformulated on the intersection of \widetilde{\mathcal{M}} and \overline{\mathcal{M}} and solved with the help of von Neumann's method of alternating projections proposed in [27]. Precoder design with matrix manifold optimization is also investigated in [28], where manifold optimization is used to compute the smallest generalized eigenvalue and the corresponding eigenvector. There, the problem is reformulated as a Rayleigh quotient minimization of generalized eigenvalue problems on a Riemannian quotient manifold, whose solution is not locally optimal for the original problem.

III-C Riemannian Ingredients for Precoder Design

In this subsection, we derive all the ingredients needed in manifold optimization for the three precoder design problems on Riemannian submanifolds.

First of all, we derive the Euclidean gradient of f(𝐏)f\left(\mathbf{P}\right), which can be viewed as the Riemannian gradient of f(𝐏)f\left(\mathbf{P}\right) on 𝒩\mathcal{N}. Recall that the Riemannian gradient of f(𝐏)f\left(\mathbf{P}\right) on 𝒩\mathcal{N} is the unique element gradf(𝐏)T𝐏𝒩\mathrm{grad}f\left(\mathbf{P}\right)\in T_{\mathbf{P}}\mathcal{N} that satisfies (7). Thus the Riemannian gradient gradf(𝐏)\mathrm{grad}f\left(\mathbf{P}\right) is identified from the directional derivative Df(𝐏)[𝝃𝐏]\mathrm{D}f\left(\mathbf{P}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}}\right] with the Riemannian metric g𝐏()g_{\mathbf{P}}\left(\cdot\right). Now let us define

𝐀i=𝐑i1𝐇i𝐏iMi×di,\mathbf{A}_{i}=\mathbf{R}_{i}^{-1}\mathbf{H}_{i}\mathbf{P}_{i}\in\mathbb{C}^{M_{i}\times d_{i}}, (24)
𝐁i=𝐀i𝐂i𝐀iHMi×Mi,\mathbf{B}_{i}=\mathbf{A}_{i}\mathbf{C}_{i}\mathbf{A}_{i}^{H}\in\mathbb{C}^{M_{i}\times M_{i}}, (25)
𝐂i=(𝐈di+𝐏iH𝐇iH𝐀i)1di×di.\mathbf{C}_{i}=\left(\mathbf{I}_{d_{i}}+\mathbf{P}_{i}^{H}\mathbf{H}_{i}^{H}\mathbf{A}_{i}\right)^{-1}\in\mathbb{C}^{d_{i}\times d_{i}}. (26)
Theorem 2.

The Euclidean gradient of f(𝐏)f\left(\mathbf{P}\right) is

gradf(𝐏)=(gradf(𝐏1),gradf(𝐏2),,gradf(𝐏U)),\mathrm{grad}f\left(\mathbf{P}\right)=\left(\mathrm{grad}f\left(\mathbf{P}_{1}\right),\mathrm{grad}f\left(\mathbf{P}_{2}\right),\cdots,\mathrm{grad}f\left(\mathbf{P}_{U}\right)\right), (27)

where

gradf(𝐏i)=2(wi𝐇iH𝐀i𝐂iiw𝐇H𝐁𝐇𝐏i)\mathrm{grad}f\left(\mathbf{P}_{i}\right)=-2\left(w_{i}\mathbf{H}_{i}^{H}\mathbf{A}_{i}\mathbf{C}_{i}-\sum_{\ell\neq i}w_{\ell}\mathbf{H}_{\ell}^{H}\mathbf{B}_{\ell}\mathbf{H}_{\ell}\mathbf{P}_{i}\right) (28)

is the Euclidean gradient on the ii-th component submanifold Mt×di\mathbb{C}^{M_{t}\times d_{i}}.

Proof.

See the proof in Appendix B. ∎
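The following sketch (assumed small dimensions and unit weights) assembles the Euclidean gradient of Theorem 2 from the auxiliary matrices (24)-(26); it is only the gradient evaluation, not the full design algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
U, Mt, Mi, di, sigma2 = 3, 8, 2, 2, 1.0          # assumed dimensions
w = np.ones(U)                                    # weight factors w_i (assumed equal)
H = [rng.standard_normal((Mi, Mt)) + 1j * rng.standard_normal((Mi, Mt)) for _ in range(U)]
P = [rng.standard_normal((Mt, di)) + 1j * rng.standard_normal((Mt, di)) for _ in range(U)]

def aux(i):
    # R_i from (12), then A_i, C_i, B_i from (24)-(26)
    R = sigma2 * np.eye(Mi, dtype=complex)
    for l in range(U):
        if l != i:
            R += H[i] @ P[l] @ P[l].conj().T @ H[i].conj().T
    A = np.linalg.solve(R, H[i] @ P[i])                                 # (24)
    C = np.linalg.inv(np.eye(di) + P[i].conj().T @ H[i].conj().T @ A)   # (26)
    B = A @ C @ A.conj().T                                              # (25)
    return A, B, C

A, B, C = zip(*[aux(i) for i in range(U)])

def egrad(i):
    # Euclidean gradient (28) on the i-th component manifold
    G = w[i] * H[i].conj().T @ A[i] @ C[i]
    for l in range(U):
        if l != i:
            G -= w[l] * H[l].conj().T @ B[l] @ H[l] @ P[i]
    return -2 * G

grad = [egrad(i) for i in range(U)]               # grad f(P) as in (27)
print(grad[0].shape)                              # (Mt, di)
```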

With Theorem 2, we further derive the orthogonal projection, Riemannian gradient, retraction and vector transport for the three precoder design problems on Riemannian submanifolds.

III-C1 TPC

From (4), the tangent space T𝐏𝒩T_{\mathbf{P}}\mathcal{N} is decomposed into two orthogonal subspaces

T𝐏𝒩=T𝐏^^N𝐏^^.\displaystyle T_{\mathbf{P}}{\mathcal{N}}=T_{\hat{\mathbf{P}}}{\widehat{\mathcal{M}}}\oplus N_{\hat{\mathbf{P}}}{\widehat{\mathcal{M}}}. (29)

According to (2), the tangent space T𝐏^^T_{\hat{\mathbf{P}}}{\widehat{\mathcal{M}}} is given by

T𝐏^^=\displaystyle T_{\hat{\mathbf{P}}}{\widehat{\mathcal{M}}}= ker(DF^(𝐏))\displaystyle\mathrm{ker}\left(\mathrm{D}\hat{F}\left(\mathbf{P}\right)\right) (30)
=\displaystyle= {𝝃𝐏T𝐏𝒩tr(𝐏H𝝃𝐏+𝝃𝐏H𝐏)=0}.\displaystyle\left\{{\boldsymbol{\xi}}_{\mathbf{P}}\in T_{\mathbf{P}}\mathcal{N}\mid\mathrm{tr}\left(\mathbf{P}^{H}{\boldsymbol{\xi}}_{\mathbf{P}}+{\boldsymbol{\xi}}_{\mathbf{P}}^{H}\mathbf{P}\right)=0\right\}.

The normal space of ^\widehat{\mathcal{M}} is the orthogonal complement of T𝐏^^T_{\hat{\mathbf{P}}}{\widehat{\mathcal{M}}} and can be expressed as

N𝐏^^=\displaystyle N_{\hat{\mathbf{P}}}\widehat{\mathcal{M}}= {𝝃𝐏T𝐏𝒩g𝐏(𝝃𝐏,𝜻𝐏^)=0,𝜻𝐏^T𝐏^^}\displaystyle\left\{{\boldsymbol{\xi}}_{\mathbf{P}}\in T_{\mathbf{P}}\mathcal{N}\mid g_{\mathbf{P}}\left({\boldsymbol{\xi}}_{\mathbf{P}},{\boldsymbol{\zeta}}_{\hat{\mathbf{P}}}\right)=0,\ \forall{\boldsymbol{\zeta}}_{\hat{\mathbf{P}}}\in T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}}\right\} (31)
=\displaystyle= {λ^𝐏λ^}.\displaystyle\left\{\hat{\lambda}\mathbf{P}\mid\hat{\lambda}\in\mathbb{R}\right\}.

Therefore, any 𝝃𝐏T𝐏𝒩{\boldsymbol{\xi}}_{\mathbf{P}}\in T_{\mathbf{P}}\mathcal{N} can be decomposed into two orthogonal parts as

𝝃𝐏=ΠT𝐏^^T𝐏𝒩(𝝃𝐏)+ΠN𝐏^^T𝐏𝒩(𝝃𝐏),\displaystyle{\boldsymbol{\xi}}_{\mathbf{P}}=\Pi_{T_{\widehat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right)+\Pi_{N_{\widehat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right), (32)

where ΠT𝐏^^T𝐏𝒩(𝝃𝐏)\Pi_{T_{\widehat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right) and ΠN𝐏^^T𝐏𝒩(𝝃𝐏)\Pi_{N_{\widehat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right) represent the orthogonal projections of 𝝃𝐏{\boldsymbol{\xi}}_{\mathbf{P}} onto T𝐏^^T_{\hat{\mathbf{P}}}{\widehat{\mathcal{M}}} and N𝐏^^N_{\hat{\mathbf{P}}}{\widehat{\mathcal{M}}}, respectively.

Lemma 1.

For any 𝛏𝐏T𝐏𝒩{\boldsymbol{\xi}}_{\mathbf{P}}\in T_{\mathbf{P}}\mathcal{N}, the orthogonal projection ΠT𝐏^^T𝐏𝒩(𝛏𝐏)\Pi_{T_{\widehat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right) is given by

ΠT𝐏^^T𝐏𝒩(𝝃𝐏)=𝝃𝐏λ^𝐏,\displaystyle\Pi_{T_{\widehat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right)={\boldsymbol{\xi}}_{\mathbf{P}}-\hat{\lambda}\mathbf{P}, (33)

where

λ^=1P{tr(𝐏H𝝃𝐏)}.\displaystyle\hat{\lambda}=\frac{1}{P}\Re\left\{\mathrm{tr}\left(\mathbf{P}^{H}{\boldsymbol{\xi}}_{\mathbf{P}}\right)\right\}. (34)
Proof.

See Appendix C for the proof. ∎
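A minimal sketch of Lemma 1 (assumed sizes): an arbitrary ambient direction is projected onto the tangent space of the TPC sphere, and the tangent-space condition (30) is verified numerically.

```python
import numpy as np

rng = np.random.default_rng(5)
Mt, d, Ptot = 8, 6, 10.0                          # assumed sizes (d = sum of d_i)
P = rng.standard_normal((Mt, d)) + 1j * rng.standard_normal((Mt, d))
P *= np.sqrt(Ptot) / np.linalg.norm(P)            # place P on the TPC sphere (22a)

xi = rng.standard_normal((Mt, d)) + 1j * rng.standard_normal((Mt, d))

# Orthogonal projection (33)-(34) onto the tangent space of M-hat
lam = np.trace(P.conj().T @ xi).real / Ptot
xi_t = xi - lam * P

# Tangent-space condition (30): Re tr(P^H xi_t) = 0 up to numerical error
print(np.trace(P.conj().T @ xi_t).real)
```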

Given the orthogonal projection ΠT𝐏^^T𝐏𝒩(𝝃𝐏)\Pi_{T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right), we derive the Riemannian gradient and Riemannian Hessian of f(𝐏^)f\big{(}\hat{\mathbf{P}}\big{)} subsequently.

Theorem 3.

The Riemannian gradient of f(𝐏^)f\big{(}\hat{\mathbf{P}}\big{)} is

gradf(𝐏^)=gradf(𝐏)λ^1𝐏,\displaystyle\mathrm{grad}f\left(\hat{\mathbf{P}}\right)=\mathrm{grad}f\left(\mathbf{P}\right)-\hat{\lambda}_{1}\mathbf{P}, (35)

where

λ^1=1P{tr(𝐏Hgradf(𝐏))}.\hat{\lambda}_{1}=\frac{1}{P}\Re\left\{\mathrm{tr}\left(\mathbf{P}^{H}\mathrm{grad}f\left(\mathbf{P}\right)\right)\right\}. (36)

The Riemannian Hessian of f(𝐏^)f\big{(}\hat{\mathbf{P}}\big{)} on ^\widehat{\mathcal{M}} is given by

Hessf(𝐏^)[𝝃𝐏^]=\displaystyle\mathrm{Hess}f(\hat{\mathbf{P}})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}]= (Hessf(𝐏^1)[𝝃𝐏^1],,Hessf(𝐏^U)[𝝃𝐏^U]),\displaystyle\left(\mathrm{Hess}f(\hat{\mathbf{P}}_{1})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}_{1}}],\cdots,\mathrm{Hess}f(\hat{\mathbf{P}}_{U})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}_{U}}]\right), (37)

where

Hessf(𝐏^i)[𝝃𝐏^i]=Dgradf(𝐏^i)[𝝃𝐏^i]λ^2𝐏i\displaystyle\mathrm{Hess}f(\hat{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}_{i}}]=\mathrm{Dgrad}f(\hat{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}_{i}}]-\hat{\lambda}_{2}\mathbf{P}_{i} (38)
=Dgradf(𝐏i)[𝝃𝐏i]Dλ^1[𝝃𝐏i]𝐏iλ^1𝝃𝐏iλ^2𝐏i,\displaystyle=\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]-\mathrm{D}\hat{\lambda}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]\mathbf{P}_{i}-\hat{\lambda}_{1}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}-\hat{\lambda}_{2}\mathbf{P}_{i},
\hat{\lambda}_{2}=\frac{1}{P}\Re\left\{\mathrm{tr}\left(\mathbf{P}^{H}\mathrm{Dgrad}f(\hat{\mathbf{P}})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}]\right)\right\}. (39)
Proof.

The proof and the details of Riemannian gradient and Riemannian Hessian in the TPC case are provided in Appendix D. ∎

From Theorem 1, ^\widehat{\mathcal{M}} is a sphere, whose retraction can be defined as

R𝐏^(𝝃𝐏^):T𝐏^^^:\displaystyle R_{\hat{\mathbf{P}}}\left({\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\right):T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}}\to\widehat{\mathcal{M}}: (40)
𝝃𝐏^P(𝐏^+𝝃𝐏^)tr((𝐏^+𝝃𝐏^)H(𝐏^+𝝃𝐏^))=γ^(𝐏^+𝝃𝐏^).\displaystyle{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\mapsto\frac{\sqrt{P}\left(\hat{\mathbf{P}}+{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\right)}{\sqrt{\mathrm{tr}\left(\left(\hat{\mathbf{P}}+{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\right)^{H}\left(\hat{\mathbf{P}}+{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\right)\right)}}=\hat{\gamma}\left(\hat{\mathbf{P}}+{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\right).

In Riemannian submanifold, the vector transport 𝒯𝜼𝐏^(𝝃𝐏^)\mathcal{T}_{\boldsymbol{\eta}_{\widehat{\mathbf{P}}}}\left({\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\right) can be achieved like (10) by orthogonally projecting 𝝃𝐏^T𝐏^^{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\in T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}} onto TR𝐏^(𝜼𝐏^)^T_{R_{\hat{\mathbf{P}}}\left(\boldsymbol{\eta}_{\hat{\mathbf{P}}}\right)}\widehat{\mathcal{M}}. Let 𝐏^Re=R𝐏^(𝜼𝐏^)\hat{\mathbf{P}}^{\mathrm{Re}}=R_{\hat{\mathbf{P}}}\left(\boldsymbol{\eta}_{\hat{\mathbf{P}}}\right), we have

𝒯𝜼𝐏^(𝝃𝐏^)=ΠT𝐏^Re^T𝐏^^(𝝃𝐏^),\displaystyle\mathcal{T}_{\boldsymbol{\eta}_{\widehat{\mathbf{P}}}}\left({\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\right)=\Pi_{T_{\widehat{\mathbf{P}}^{\mathrm{Re}}}\widehat{\mathcal{M}}}^{T_{\widehat{\mathbf{P}}}\widehat{\mathcal{M}}}\left({\boldsymbol{\xi}}_{\hat{\mathbf{P}}}\right), (41)

where the projection can be derived in a similar way to Lemma 1.
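A minimal sketch of the retraction (40) and the projection-based vector transport (41) under TPC (assumed sizes); the retracted point stays on the power sphere.

```python
import numpy as np

rng = np.random.default_rng(6)
Mt, d, Ptot = 8, 6, 10.0                          # assumed sizes and power budget
P = rng.standard_normal((Mt, d)) + 1j * rng.standard_normal((Mt, d))
P *= np.sqrt(Ptot) / np.linalg.norm(P)            # point on M-hat

def proj_tpc(P, xi):                              # orthogonal projection, Lemma 1
    return xi - (np.trace(P.conj().T @ xi).real / Ptot) * P

def retract_tpc(P, xi):                           # retraction (40): rescale onto the sphere
    Y = P + xi
    return np.sqrt(Ptot) * Y / np.linalg.norm(Y)

eta = proj_tpc(P, rng.standard_normal((Mt, d)) + 1j * rng.standard_normal((Mt, d)))
xi = proj_tpc(P, rng.standard_normal((Mt, d)) + 1j * rng.standard_normal((Mt, d)))

P_new = retract_tpc(P, eta)                       # move along eta and retract
xi_new = proj_tpc(P_new, xi)                      # vector transport (41) by projection

print(np.linalg.norm(P_new) ** 2)                 # equals P: total power preserved
```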

III-C2 PUPC

From (2), the tangent space T𝐏~~T_{\tilde{\mathbf{P}}}{\widetilde{\mathcal{M}}} is

T𝐏~~=\displaystyle T_{\tilde{\mathbf{P}}}{\widetilde{\mathcal{M}}}= {𝝃𝐏T𝐏𝒩tr(𝐏iH𝝃𝐏i+𝝃𝐏iH𝐏i)=0,i𝒰}.\displaystyle\left\{{\boldsymbol{\xi}}_{\mathbf{P}}\in T_{\mathbf{P}}\mathcal{N}\mid\mathrm{tr}\left(\mathbf{P}_{i}^{H}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}+{\boldsymbol{\xi}}_{\mathbf{P}_{i}}^{H}\mathbf{P}_{i}\right)=0,\forall i\in\mathcal{U}\right\}. (42)

The normal space of ~\widetilde{\mathcal{M}} can be expressed as

N𝐏~~=\displaystyle N_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}}= {𝐏𝚲~𝚲~𝒟~},\displaystyle\left\{\mathbf{P}\tilde{{\boldsymbol{\Lambda}}}\mid\tilde{{\boldsymbol{\Lambda}}}\in\widetilde{\mathcal{D}}\right\}, (43)

where \widetilde{\mathcal{D}}=\left\{\mathrm{blkdiag}\left(\mathbf{D}_{1},\cdots,\mathbf{D}_{U}\right)\mid\mathbf{D}_{i}=a_{i}\mathbf{I}_{d_{i}},a_{i}\in\mathbb{R}\right\} is the set of block diagonal matrices of this form, whose dimension as a real vector space is U. With the normal space, we can derive the orthogonal projection of a tangent vector from T_{\mathbf{P}}\mathcal{N} onto T_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}}.

Lemma 2.

For any 𝛏𝐏T𝐏𝒩{\boldsymbol{\xi}}_{\mathbf{P}}\in T_{\mathbf{P}}\mathcal{N}, the orthogonal projection ΠT𝐏~~T𝐏𝒩(𝛏𝐏)\Pi_{T_{\widetilde{\mathbf{P}}}\widetilde{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right) is given by

ΠT𝐏~~T𝐏𝒩(𝝃𝐏)=𝝃𝐏𝐏𝚲~,\displaystyle\Pi_{T_{\widetilde{\mathbf{P}}}\widetilde{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right)={\boldsymbol{\xi}}_{\mathbf{P}}-\mathbf{P}\tilde{{\boldsymbol{\Lambda}}}, (44a)
\big[\tilde{{\boldsymbol{\Lambda}}}\big]_{i}=\frac{1}{P_{i}}\Re\left\{\mathrm{tr}\left(\mathbf{P}_{i}^{H}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right)\right\}\mathbf{I}_{d_{i}}. (44b)
Proof.

The proof is similar to that in Appendix C and is thus omitted for brevity. ∎
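A minimal sketch of the per-user projection in Lemma 2 (assumed sizes and per-user powers): each user block of an ambient direction is projected onto the tangent space of its own power sphere, and the block-wise condition (42) is checked.

```python
import numpy as np

rng = np.random.default_rng(7)
U, Mt, di = 3, 8, 2                               # assumed sizes
P_user = [4.0, 3.0, 3.0]                          # assumed per-user powers P_i
P = []
for i in range(U):
    Pi = rng.standard_normal((Mt, di)) + 1j * rng.standard_normal((Mt, di))
    P.append(np.sqrt(P_user[i]) * Pi / np.linalg.norm(Pi))   # point on M-tilde (22b)

xi = [rng.standard_normal((Mt, di)) + 1j * rng.standard_normal((Mt, di)) for _ in range(U)]

# Orthogonal projection (44): subtract a scaled copy of P_i per user block
xi_t = [xi[i] - (np.trace(P[i].conj().T @ xi[i]).real / P_user[i]) * P[i] for i in range(U)]

# Tangent-space condition (42) holds block-wise (values are ~0)
print([np.trace(P[i].conj().T @ xi_t[i]).real for i in range(U)])
```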

The Riemannian gradient and Hessian in T𝐏~~T_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}} can be obtained by projecting the Riemannian gradient and Hessian in T𝐏𝒩T_{\mathbf{P}}\mathcal{N} onto T𝐏~~T_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}}.

Theorem 4.

The Riemannian gradient of f(𝐏~)f\big{(}\tilde{\mathbf{P}}\big{)} is

gradf(𝐏~)=gradf(𝐏)𝐏𝚲~1,\displaystyle\mathrm{grad}f\big{(}\tilde{\mathbf{P}}\big{)}=\mathrm{grad}f\left(\mathbf{P}\right)-\mathbf{P}\tilde{{\boldsymbol{\Lambda}}}_{1}, (45)

where

[𝚲~1]i=1Pi{tr(𝐏iHgradf(𝐏i))}𝐈di.\big{[}\tilde{{\boldsymbol{\Lambda}}}_{1}\big{]}_{i}=\frac{1}{P_{i}}\Re\Big{\{}\mathrm{tr}\left(\mathbf{P}_{i}^{H}\mathrm{grad}f\left(\mathbf{P}_{i}\right)\right)\Big{\}}\mathbf{I}_{d_{i}}. (46)

The Riemannian Hessian of f(𝐏~)f\big{(}\tilde{\mathbf{P}}\big{)} is

Hessf(𝐏~)[𝝃𝐏]=(Hessf(𝐏~1)[𝝃𝐏~1],,Hessf(𝐏~U)[𝝃𝐏~U])\displaystyle\mathrm{Hess}f(\tilde{\mathbf{P}})[{\boldsymbol{\xi}}_{\mathbf{P}}]=\left(\right.\mathrm{Hess}f(\tilde{\mathbf{P}}_{1})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}_{1}}],\cdots,\mathrm{Hess}f(\tilde{\mathbf{P}}_{U})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}_{U}}]\left.\right) (47)
=Dgradf(𝐏)[𝝃𝐏]𝐏D𝚲~1[𝝃𝐏]𝝃𝐏𝚲~1𝐏𝚲~2,\displaystyle=\mathrm{Dgrad}f(\mathbf{P})[{\boldsymbol{\xi}}_{\mathbf{P}}]-\mathbf{P}\mathrm{D}\tilde{{\boldsymbol{\Lambda}}}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}}]-{\boldsymbol{\xi}}_{\mathbf{P}}\tilde{{\boldsymbol{\Lambda}}}_{1}-\mathbf{P}\tilde{{\boldsymbol{\Lambda}}}_{2},

where

[𝚲~2]i=1Pi{tr(𝐏H(Dgradf(𝐏~)[𝝃𝐏~]))}𝐈di.\displaystyle\big{[}\tilde{{\boldsymbol{\Lambda}}}_{2}\big{]}_{i}=\frac{1}{P_{i}}\Re\left\{\mathrm{tr}\left(\mathbf{P}^{H}\left(\mathrm{Dgrad}f(\tilde{\mathbf{P}})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}]\right)\right)\right\}\mathbf{I}_{d_{i}}. (48)
Proof.

The proof of the Riemannian gradient is similar to that in Appendix D and is thus omitted for brevity. The proof and details of the Riemannian Hessian are provided in Appendix E. ∎

From Theorem 1, \widetilde{\mathcal{M}} is a product of U spheres, i.e., an oblique manifold, whose retraction can be defined by scaling [29]. Define

γ~i\displaystyle\tilde{\gamma}_{i} =Pitr((𝐏~i+𝝃𝐏~i)H(𝐏~i+𝝃𝐏~i)),\displaystyle=\frac{\sqrt{P_{i}}}{\sqrt{\mathrm{tr}\left(\left(\tilde{\mathbf{P}}_{i}+{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}_{i}}\right)^{H}\left(\tilde{\mathbf{P}}_{i}+{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}_{i}}\right)\right)}}, (49)

and let 𝚪~i=γ~i𝐈di\tilde{{\boldsymbol{\Gamma}}}_{i}=\tilde{\gamma}_{i}\mathbf{I}_{d_{i}} and 𝚪~=blkdiag(𝚪~1,,𝚪~U)𝒟~\tilde{{\boldsymbol{\Gamma}}}=\mathrm{blkdiag}\left(\tilde{{\boldsymbol{\Gamma}}}_{1},\cdots,\tilde{{\boldsymbol{\Gamma}}}_{U}\right)\in\widetilde{\mathcal{D}}. For any 𝐏~~\tilde{\mathbf{P}}\in\widetilde{\mathcal{M}} and 𝝃𝐏~T𝐏~~{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}\in T_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}}, the mapping

R𝐏~(𝝃𝐏~):T𝐏~~~:𝝃𝐏~(𝐏~+𝝃𝐏~)𝚪~R_{\tilde{\mathbf{P}}}\left({\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}\right):T_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}}\to\widetilde{\mathcal{M}}:{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}\mapsto\left(\tilde{\mathbf{P}}+{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}\right)\tilde{{\boldsymbol{\Gamma}}} (50)

is a retraction for ~\widetilde{\mathcal{M}} at 𝐏~\tilde{\mathbf{P}}.
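A minimal sketch of the scaling retraction (49)-(50) under PUPC (assumed sizes): each user block of the updated precoder is rescaled back to its per-user power.

```python
import numpy as np

rng = np.random.default_rng(8)
U, Mt, di = 3, 8, 2                               # assumed sizes
P_user = [4.0, 3.0, 3.0]                          # assumed per-user powers P_i
P, xi = [], []
for i in range(U):
    Pi = rng.standard_normal((Mt, di)) + 1j * rng.standard_normal((Mt, di))
    Pi *= np.sqrt(P_user[i]) / np.linalg.norm(Pi)             # point on M-tilde
    v = rng.standard_normal((Mt, di)) + 1j * rng.standard_normal((Mt, di))
    v -= (np.trace(Pi.conj().T @ v).real / P_user[i]) * Pi    # tangent vector (Lemma 2)
    P.append(Pi)
    xi.append(v)

# Retraction (49)-(50): per-user rescaling back onto the oblique manifold
P_new = []
for i in range(U):
    Y = P[i] + xi[i]
    gamma = np.sqrt(P_user[i]) / np.linalg.norm(Y)            # (49)
    P_new.append(gamma * Y)                                   # (50)

print([np.linalg.norm(Pi) ** 2 for Pi in P_new])              # recovers each P_i
```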

Similarly, the vector transport 𝒯𝜼𝐏~(𝝃𝐏~)\mathcal{T}_{\boldsymbol{\eta}_{\tilde{\mathbf{P}}}}\left({\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}\right) can be achieved by projecting 𝝃𝐏~T𝐏~~{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}\in T_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}} onto TR𝐏~(𝜼𝐏~)~T_{R_{\tilde{\mathbf{P}}}\left(\boldsymbol{\eta}_{\tilde{\mathbf{P}}}\right)}\widetilde{\mathcal{M}} orthogonally. Let 𝐏~Re\tilde{\mathbf{P}}^{\mathrm{Re}} represent R𝐏~(𝜼𝐏~)R_{\tilde{\mathbf{P}}}\left(\boldsymbol{\eta}_{\tilde{\mathbf{P}}}\right), we have

𝒯𝜼𝐏~(𝝃𝐏~)=ΠT𝐏~Re~T𝐏~~(𝝃𝐏~),\displaystyle\mathcal{T}_{\boldsymbol{\eta}_{\widetilde{\mathbf{P}}}}\left({\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}\right)=\Pi_{T_{\widetilde{\mathbf{P}}^{\mathrm{Re}}}\widetilde{\mathcal{M}}}^{T_{\widetilde{\mathbf{P}}}\widetilde{\mathcal{M}}}\left({\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}\right), (51)

where the orthogonal projection operator can be obtained in a similar way to Lemma 2.

III-C3 PAPC

From (2), the tangent space T𝐏¯¯T_{\bar{\mathbf{P}}}{\overline{\mathcal{M}}} is

T𝐏¯¯\displaystyle T_{\bar{\mathbf{P}}}{\overline{\mathcal{M}}} ={𝝃𝐏T𝐏𝒩𝐈Mt(𝐏𝝃𝐏H+𝝃𝐏𝐏H)=0}.\displaystyle=\left\{{\boldsymbol{\xi}}_{\mathbf{P}}\in T_{\mathbf{P}}\mathcal{N}\mid\mathbf{I}_{M_{t}}\odot\left(\mathbf{P}{\boldsymbol{\xi}}_{\mathbf{P}}^{H}+{\boldsymbol{\xi}}_{\mathbf{P}}\mathbf{P}^{H}\right)=0\right\}. (52)

The normal space of ¯\overline{\mathcal{M}} is given by

N𝐏¯¯=\displaystyle N_{\bar{\mathbf{P}}}\overline{\mathcal{M}}= {𝚲¯𝐏𝚲¯𝒟¯},\displaystyle\left\{\bar{{\boldsymbol{\Lambda}}}\mathbf{P}\mid\bar{{\boldsymbol{\Lambda}}}\in\overline{\mathcal{D}}\right\}, (53)

where 𝒟¯={diag(𝐰)𝐰Mt}\overline{\mathcal{D}}=\left\{\mathrm{diag}\left(\mathbf{w}\right)\mid\mathbf{w}\in\mathbb{R}^{M_{t}}\right\} is the diagonal matrix set. Similarly, we can derive ΠT𝐏¯¯T𝐏𝒩(𝝃𝐏)\Pi_{T_{\overline{\mathbf{P}}}\overline{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right) with the closed-form normal space as follows.

Lemma 3.

For any 𝛏𝐏T𝐏𝒩{\boldsymbol{\xi}}_{\mathbf{P}}\in T_{\mathbf{P}}\mathcal{N}, the orthogonal projection ΠT𝐏¯¯T𝐏𝒩(𝛏𝐏)\Pi_{T_{\bar{\mathbf{P}}}\overline{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right) is given by

ΠT𝐏¯¯T𝐏𝒩(𝝃𝐏)=𝝃𝐏𝚲¯𝐏,\displaystyle\Pi_{T_{\overline{\mathbf{P}}}\overline{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right)={\boldsymbol{\xi}}_{\mathbf{P}}-\bar{{\boldsymbol{\Lambda}}}\mathbf{P}, (54a)
𝚲¯=MtP𝐈Mt{𝐏𝝃𝐏H}.\displaystyle\bar{{\boldsymbol{\Lambda}}}=\frac{M_{t}}{P}\mathbf{I}_{M_{t}}\odot\Re\left\{\mathbf{P}{\boldsymbol{\xi}}_{\mathbf{P}}^{H}\right\}. (54b)
Proof.

The proof is similar to that in Appendix C and is thus omitted for brevity. ∎
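A minimal sketch of the projection in Lemma 3 (assumed sizes): the correction is a per-antenna (row-wise) scaling of the precoder, and the diagonal condition (52) is verified numerically.

```python
import numpy as np

rng = np.random.default_rng(9)
Mt, d, Ptot = 8, 6, 10.0                          # assumed sizes and power budget
P = rng.standard_normal((Mt, d)) + 1j * rng.standard_normal((Mt, d))
row_norm = np.linalg.norm(P, axis=1, keepdims=True)
P = np.sqrt(Ptot / Mt) * P / row_norm             # point on M-bar (22c): equal row powers

xi = rng.standard_normal((Mt, d)) + 1j * rng.standard_normal((Mt, d))

# Orthogonal projection (54): Lambda-bar is diagonal with entries (Mt/P) Re[P xi^H]_jj
lam = (Mt / Ptot) * np.real(np.sum(P * xi.conj(), axis=1))    # diagonal of Re{P xi^H}
xi_t = xi - lam[:, None] * P

# Tangent-space condition (52): diagonal of Re{P xi_t^H} vanishes (values are ~0)
print(np.real(np.sum(P * xi_t.conj(), axis=1)))
```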

With ΠT𝐏¯¯T𝐏𝒩()\Pi_{T_{\overline{\mathbf{P}}}\overline{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left(\cdot\right), the Riemannian gradient and Hessian in T𝐏¯¯T_{\bar{\mathbf{P}}}\overline{\mathcal{M}} are provided in the following theorem.

Theorem 5.

The Riemannian gradient of f(𝐏¯)f\left(\bar{\mathbf{P}}\right) is

gradf(𝐏¯)=gradf(𝐏)𝚲¯1𝐏,\displaystyle\mathrm{grad}f\left(\bar{\mathbf{P}}\right)=\mathrm{grad}f\left(\mathbf{P}\right)-\bar{{\boldsymbol{\Lambda}}}_{1}\mathbf{P}, (55)

where

𝚲¯1=MtP𝐈Mt{𝐏(gradf(𝐏))H}.\bar{{\boldsymbol{\Lambda}}}_{1}=\frac{M_{t}}{P}\mathbf{I}_{M_{t}}\odot\Re\left\{\mathbf{P}\left(\mathrm{grad}f\left(\mathbf{P}\right)\right)^{H}\right\}. (56)

The Riemannian Hessian of f(𝐏¯)f\left(\bar{\mathbf{P}}\right) is

Hessf(𝐏¯)[𝝃𝐏¯]=\displaystyle\mathrm{Hess}f(\bar{\mathbf{P}})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}}]= (Hessf(𝐏¯1)[𝝃𝐏¯1],,Hessf(𝐏¯U)[𝝃𝐏¯U]),\displaystyle\left(\mathrm{Hess}f(\bar{\mathbf{P}}_{1})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{1}}],\cdots,\mathrm{Hess}f(\bar{\mathbf{P}}_{U})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{U}}]\right), (57)

where

Hessf(𝐏¯i)[𝝃𝐏¯i]=\displaystyle\mathrm{Hess}f(\bar{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{i}}]= (58)
Dgradf(𝐏i)[𝝃𝐏i]D𝚲¯1[𝝃𝐏i]𝐏i𝚲¯1𝝃𝐏i𝚲¯2𝐏i,\displaystyle\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]-\mathrm{D}\bar{{\boldsymbol{\Lambda}}}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]\mathbf{P}_{i}-\bar{{\boldsymbol{\Lambda}}}_{1}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}-\bar{{\boldsymbol{\Lambda}}}_{2}\mathbf{P}_{i},
\bar{{\boldsymbol{\Lambda}}}_{2}=\frac{M_{t}}{P}\mathbf{I}_{M_{t}}\odot\sum_{\ell=1}^{U}\Re\left\{\mathbf{P}_{\ell}\left(\mathrm{Dgrad}f(\bar{\mathbf{P}}_{\ell})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{\ell}}]\right)^{H}\right\}. (59)
Proof.

The proof of the Riemannian gradient is similar to that in Appendix D and is thus omitted for brevity. The proof and details of the Riemannian Hessian are provided in Appendix F. ∎

Like PUPC, the retraction here can also be defined by scaling [29]. Let

𝚪¯\displaystyle\bar{{\boldsymbol{\Gamma}}} =(MtP𝐈Mt(𝐏¯+𝝃𝐏¯)(𝐏¯+𝝃𝐏¯)H)12.\displaystyle=\left(\frac{M_{t}}{P}\mathbf{I}_{M_{t}}\odot\left(\bar{\mathbf{P}}+{\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\right)\left(\bar{\mathbf{P}}+{\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\right)^{H}\right)^{-\frac{1}{2}}. (60)

For any 𝐏¯¯\bar{\mathbf{P}}\in\overline{\mathcal{M}} and 𝝃𝐏¯T𝐏¯¯{\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\in T_{\bar{\mathbf{P}}}\overline{\mathcal{M}}, the mapping

R𝐏¯(𝝃𝐏¯):T𝐏¯¯¯:𝝃𝐏¯𝚪¯(𝐏¯+𝝃𝐏¯)R_{\bar{\mathbf{P}}}\left({\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\right):T_{\bar{\mathbf{P}}}\overline{\mathcal{M}}\to\overline{\mathcal{M}}:{\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\mapsto\bar{{\boldsymbol{\Gamma}}}\left(\bar{\mathbf{P}}+{\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\right) (61)

is a retraction for ¯\overline{\mathcal{M}} at 𝐏¯\bar{\mathbf{P}}.
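The scaling retraction (60)-(61) likewise acts row by row: each row of \bar{\mathbf{P}}+{\boldsymbol{\xi}}_{\bar{\mathbf{P}}} is rescaled so that its squared norm equals P/M_{t}, which keeps every per-antenna power constraint satisfied with equality. A minimal sketch, with our own naming, is given below.

```python
import numpy as np

def retract_papc(P_bar, xi, P_tot):
    # Scaling retraction (60)-(61): Gamma_bar is diagonal, so it rescales each
    # row (antenna) of P_bar + xi to squared norm P_tot / Mt.
    Mt = P_bar.shape[0]
    Y = P_bar + xi
    row_power = np.sum(np.abs(Y) ** 2, axis=1)        # diagonal of Y Y^H
    return Y / np.sqrt((Mt / P_tot) * row_power)[:, None]
```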

The vector transport \mathcal{T}_{\boldsymbol{\eta}_{\bar{\mathbf{P}}}}\left({\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\right) can likewise be achieved by orthogonally projecting {\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\in T_{\bar{\mathbf{P}}}\overline{\mathcal{M}} onto T_{R_{\bar{\mathbf{P}}}\left(\boldsymbol{\eta}_{\bar{\mathbf{P}}}\right)}\overline{\mathcal{M}}. Letting \bar{\mathbf{P}}^{\mathrm{Re}} denote R_{\bar{\mathbf{P}}}\left(\boldsymbol{\eta}_{\bar{\mathbf{P}}}\right), we have

𝒯𝜼𝐏¯(𝝃𝐏¯)=ΠT𝐏¯Re¯T𝐏¯¯(𝝃𝐏¯),\displaystyle\mathcal{T}_{\boldsymbol{\eta}_{\overline{\mathbf{P}}}}\left({\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\right)=\Pi_{T_{\overline{\mathbf{P}}^{\mathrm{Re}}}\overline{\mathcal{M}}}^{T_{\overline{\mathbf{P}}}\overline{\mathcal{M}}}\left({\boldsymbol{\xi}}_{\bar{\mathbf{P}}}\right), (62)

which can be derived in a way similar to Lemma 3.

IV Riemannian Methods for Precoder Design

With the Riemannian ingredients derived in Section III, we propose three precoder design methods in this section, using the RSD, RCG and RTR, respectively. The proposed Riemannian methods involve no inverses of large dimensional matrices during the iterations, thereby enabling significant savings in computational resources. For the same power constraint, the RSD and RCG methods have nearly the same computational complexity, which is lower than that of the RTR method and of the comparable methods.

IV-A Riemannian Gradient Methods

The gradient descent method is one of the most well-known and efficient line search methods for unconstrained problems in Euclidean space. On a Riemannian manifold, we update the point through retraction, which keeps the updated point on the manifold while preserving the search direction. For notational clarity, we use the superscript k to denote the outer iteration. From (40), (50) and (61), the update formulas on \widehat{\mathcal{M}}, \widetilde{\mathcal{M}} and \overline{\mathcal{M}} are given, respectively, by

𝐏^ik+1\displaystyle\hat{\mathbf{P}}_{i}^{k+1} =γ^k(𝐏^ik+αk𝜼^ik),\displaystyle=\hat{\gamma}^{k}\left(\hat{\mathbf{P}}_{i}^{k}+\alpha^{k}\hat{\boldsymbol{\eta}}_{i}^{k}\right), (63a)
𝐏~ik+1\displaystyle\tilde{\mathbf{P}}_{i}^{k+1} =γ~ik(𝐏~ik+αk𝜼~ik),\displaystyle=\tilde{\gamma}_{i}^{k}\left(\tilde{\mathbf{P}}_{i}^{k}+\alpha^{k}\tilde{\boldsymbol{\eta}}_{i}^{k}\right), (63b)
𝐏¯ik+1\displaystyle\bar{\mathbf{P}}_{i}^{k+1} =𝚪¯k(𝐏¯ik+αk𝜼¯ik),\displaystyle=\bar{{\boldsymbol{\Gamma}}}^{k}\left(\bar{\mathbf{P}}_{i}^{k}+\alpha^{k}\bar{\boldsymbol{\eta}}_{i}^{k}\right), (63c)

where \alpha^{k} is the step length and \hat{\boldsymbol{\eta}}_{i}^{k}, \tilde{\boldsymbol{\eta}}_{i}^{k} and \bar{\boldsymbol{\eta}}_{i}^{k} are the search directions of user i. With (63) and the Riemannian gradient, the RSD method is readily obtained, but it converges slowly. The RCG method remedies this drawback by modifying the search direction, which combines the current negative Riemannian gradient with the previous search direction [28, 30]. To add elements lying in different tangent spaces, the vector transport defined in (10) is utilized. We use \mathcal{M} to represent \widehat{\mathcal{M}}, \widetilde{\mathcal{M}} or \overline{\mathcal{M}}. The search direction of the RCG method in the k-th iteration is

𝜼k=gradf(𝐏k)+βk𝒯αk1𝜼k1(𝜼k1),\displaystyle\boldsymbol{\eta}^{k}=-\mathrm{grad}f\left(\mathbf{P}^{k}\right)+\beta^{k}\mathcal{T}^{\mathcal{M}}_{\alpha^{k-1}{{\boldsymbol{\eta}}}^{k-1}}\left({{\boldsymbol{\eta}}}^{k-1}\right), (64)

where βk{{\beta}}^{k} is chosen as the Fletcher-Reeves parameter:

βk=g𝐏k(gradf(𝐏k),gradf(𝐏k))g𝐏k1(gradf(𝐏k1),gradf(𝐏k1)).\displaystyle{{\beta}}^{k}=\frac{g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left(\mathrm{grad}f\left({{\mathbf{P}}}^{k}\right),\mathrm{grad}f\left({{\mathbf{P}}}^{k}\right)\right)}{g^{\mathcal{M}}_{{{\mathbf{P}}}^{k-1}}\left(\mathrm{grad}f\left({{\mathbf{P}}}^{k-1}\right),\mathrm{grad}f\left({{\mathbf{P}}}^{k-1}\right)\right)}. (65)
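In code, one RCG direction update is a two-liner once the metric and the projection-based vector transport are available. The sketch below assumes the metric g_{\mathbf{P}}(\mathbf{X},\mathbf{Y})=\Re\left\{\mathrm{tr}\left(\mathbf{X}^{H}\mathbf{Y}\right)\right\} inherited from the ambient space and treats `transport` as the orthogonal projection onto the new tangent space; the names are ours.

```python
import numpy as np

def metric(X, Y):
    # Riemannian metric inherited from the ambient space (assumed Re tr(X^H Y)).
    return np.real(np.vdot(X, Y))

def rcg_direction(grad_k, grad_km1, eta_km1, transport):
    # Fletcher-Reeves coefficient (65) and RCG search direction (64).
    beta = metric(grad_k, grad_k) / metric(grad_km1, grad_km1)
    return -grad_k + beta * transport(eta_km1)
```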

The RSD and RCG methods both need to compute the Riemannian gradient, which is made up of the Euclidean gradient (27) and the orthogonal projection. Letting {{\mathbf{V}}}_{i,\ell}^{k}=\mathbf{H}_{i}{{\mathbf{P}}}_{\ell}^{k} and {{\mathbf{U}}}_{i,\ell}^{k}=\mathbf{H}_{i}{{\boldsymbol{\eta}}}_{\ell}^{k},\forall i,\ell\in\mathcal{U}, \mathbf{R}_{i} in the k-th iteration can be written as

𝐑ik=\displaystyle\mathbf{R}_{i}^{k}= σz2𝐈Mi+iU𝐕i,k(𝐕i,k)H,i𝒰.\displaystyle\sigma_{z}^{2}\mathbf{I}_{M_{i}}+\sum_{\ell\neq i}^{U}{{\mathbf{V}}}_{i,\ell}^{k}\left({{\mathbf{V}}}_{i,\ell}^{k}\right)^{H},\forall i\in\mathcal{U}. (66)

Then the Euclidean gradient of the ii-th UT, i𝒰\forall i\in\mathcal{U}, in the kk-th iteration can be written as

gradf(𝐏ik)=2wi𝐇iH𝐑i1𝐕i,ik(𝐈di+(𝐕i,ik)H𝐑i1𝐕i,ik)1\displaystyle\mathrm{grad}f\left({{\mathbf{P}}}_{i}^{k}\right)=-2w_{i}\mathbf{H}_{i}^{H}\mathbf{R}_{i}^{-1}{{\mathbf{V}}}_{i,i}^{k}\left(\mathbf{I}_{d_{i}}+\left({{\mathbf{V}}}_{i,i}^{k}\right)^{H}\mathbf{R}_{i}^{-1}{{\mathbf{V}}}_{i,i}^{k}\right)^{-1} (67)
+2iw𝐇H𝐑1𝐕,k(𝐈d+(𝐕,k)H𝐑1𝐕,k)1×\displaystyle+2\sum_{\ell\neq i}w_{\ell}\mathbf{H}_{\ell}^{H}\mathbf{R}_{\ell}^{-1}{{\mathbf{V}}}_{\ell,\ell}^{k}\left(\mathbf{I}_{d_{\ell}}+\left({{\mathbf{V}}}_{\ell,\ell}^{k}\right)^{H}\mathbf{R}_{\ell}^{-1}{{\mathbf{V}}}_{\ell,\ell}^{k}\right)^{-1}\times
(𝐕,k)H𝐑1𝐕,ik,\displaystyle\left({{\mathbf{V}}}_{\ell,\ell}^{k}\right)^{H}\mathbf{R}_{\ell}^{-1}{{\mathbf{V}}}_{\ell,i}^{k},

where 𝐕i,k,i,𝒰{{\mathbf{V}}}_{i,\ell}^{k},\forall i,\ell\in\mathcal{U}, can be obtained from (63) and expressed as

𝐕i,k\displaystyle{{\mathbf{V}}}_{i,\ell}^{k} =γ^k1(𝐕^i,k1+αk1𝐔^i,k1)for^,\displaystyle=\hat{\gamma}^{k-1}\left(\hat{\mathbf{V}}_{i,\ell}^{k-1}+\alpha^{k-1}\hat{\mathbf{U}}_{i,\ell}^{k-1}\right)\ \ \ \text{for}\ \widehat{\mathcal{M}}, (68a)
𝐕i,k\displaystyle{{\mathbf{V}}}_{i,\ell}^{k} =γ~k1(𝐕~i,k1+αk1𝐔~i,k1)for~,\displaystyle=\tilde{\gamma}_{\ell}^{k-1}\left(\tilde{\mathbf{V}}_{i,\ell}^{k-1}+\alpha^{k-1}\tilde{\mathbf{U}}_{i,\ell}^{k-1}\right)\ \ \ \text{for}\ \widetilde{\mathcal{M}}, (68b)
𝐕i,k\displaystyle{{\mathbf{V}}}_{i,\ell}^{k} =𝐇i𝚪¯k1(𝐏¯k1+αk1𝜼¯k1)for¯.\displaystyle=\mathbf{H}_{i}\bar{{\boldsymbol{\Gamma}}}^{k-1}\left(\bar{\mathbf{P}}_{\ell}^{k-1}+\alpha^{k-1}\bar{\boldsymbol{\eta}}_{\ell}^{k-1}\right)\ \text{for}\ \overline{\mathcal{M}}. (68c)

We can see that {{\mathbf{V}}}_{i,\ell}^{k},\forall i,\ell\in\mathcal{U}, can be obtained directly from {{\mathbf{V}}}_{i,\ell}^{k-1} and {{\mathbf{U}}}_{i,\ell}^{k-1} for \widehat{\mathcal{M}} and \widetilde{\mathcal{M}}. In contrast, {{\mathbf{V}}}_{i,\ell}^{k} needs to be recomputed from the channel in each iteration for \overline{\mathcal{M}}.
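As an illustration of how (66)-(67) are evaluated without any M_{t}\times M_{t} inversion, the sketch below computes all Euclidean gradients from the cached products \mathbf{V}_{i,\ell}^{k}=\mathbf{H}_{i}\mathbf{P}_{\ell}^{k}; only M_{i}\times M_{i} and d_{i}\times d_{i} matrices are inverted. It is a plain NumPy rendition under our own naming (H, V, w, sigma2), not the authors' implementation.

```python
import numpy as np

def euclidean_gradients(H, V, w, sigma2):
    # H[i]: M_i x M_t channel of user i; V[i][l] = H[i] @ P[l], cached as in (66)-(68).
    U = len(H)
    A, C = [], []
    for i in range(U):
        Mi = H[i].shape[0]
        Ri = sigma2 * np.eye(Mi) + sum(V[i][l] @ V[i][l].conj().T
                                       for l in range(U) if l != i)       # (66)
        Ai = np.linalg.solve(Ri, V[i][i])                                  # R_i^{-1} V_{i,i}
        Ci = np.linalg.inv(np.eye(V[i][i].shape[1]) + V[i][i].conj().T @ Ai)
        A.append(Ai)
        C.append(Ci)
    grads = []
    for i in range(U):
        g = -2.0 * w[i] * H[i].conj().T @ A[i] @ C[i]                      # first term of (67)
        for l in range(U):
            if l != i:                                                     # interference terms of (67)
                g += 2.0 * w[l] * H[l].conj().T @ A[l] @ C[l] @ A[l].conj().T @ V[l][i]
        grads.append(g)
    return grads
```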

The step length \alpha^{k} in (68) can be obtained through the backtracking method [31], where the objective function is evaluated to ensure a sufficient decrease. Compared with the projected steepest descent method [32], only one step length needs to be searched in our proposed RSD and RCG methods. For notational clarity, we use the superscript pair (k,n) to denote the n-th inner iteration of the step-length search during the k-th outer iteration. As the precoder \mathbf{P}^{k} and the search direction \boldsymbol{\eta}^{k} are fixed during the step-length search, the objective function in the (k,n)-th iteration can be viewed as a function of \alpha^{k,n-1} and written as

ϕ(αk,n1)=f(R𝐏k,n1(αk,n1𝜼k,n1))=\displaystyle\phi\left(\alpha^{k,n-1}\right)=f\left(R_{{{\mathbf{P}}}^{k,n-1}}\left(\alpha^{k,n-1}{{\boldsymbol{\eta}}}^{k,n-1}\right)\right)= (69)
i=1Ulogdet(σz2𝐈Mi+U𝐕i,k,n(𝐕i,k,n)H)\displaystyle-\sum_{i=1}^{U}\log\det\left(\sigma_{z}^{2}\mathbf{I}_{M_{i}}+\sum_{\ell}^{U}{{\mathbf{V}}}_{i,\ell}^{k,n}\left({{\mathbf{V}}}_{i,\ell}^{k,n}\right)^{H}\right)
+i=1Ulogdet(σz2𝐈Mi+iU𝐕i,k,n(𝐕i,k,n)H),\displaystyle+\sum_{i=1}^{U}\log\det\left(\sigma_{z}^{2}\mathbf{I}_{M_{i}}+\sum_{\ell\neq i}^{U}{{\mathbf{V}}}_{i,\ell}^{k,n}\left({{\mathbf{V}}}_{i,\ell}^{k,n}\right)^{H}\right),

where 𝐕i,k,n=𝐕i,k,n(αk,n1),i,𝒰{{\mathbf{V}}}_{i,\ell}^{k,n}={{\mathbf{V}}}_{i,\ell}^{k,n}\left(\alpha^{k,n-1}\right),\forall i,\ell\in\mathcal{U}, is viewed as a function of αk,n1\alpha^{k,n-1}. Note that 𝐕i,k,n(αk,n1){{\mathbf{V}}}_{i,\ell}^{k,n}\left(\alpha^{k,n-1}\right) can be derived in the same way as 𝐕i,k{{\mathbf{V}}}_{i,\ell}^{k} in (68) and are given by

𝐕i,k,n(αk,n1)\displaystyle{{\mathbf{V}}}_{i,\ell}^{k,n}\left(\alpha^{k,n-1}\right) =γ^(αk,n1)(𝐕^i,k+αk,n1𝐔^i,k)for^,\displaystyle=\hat{\gamma}\left(\alpha^{k,n-1}\right)\left(\hat{\mathbf{V}}_{i,\ell}^{k}+\alpha^{k,n-1}\hat{\mathbf{U}}_{i,\ell}^{k}\right)\ \ \text{for}\ \widehat{\mathcal{M}}, (70a)
𝐕i,k,n(αk,n1)\displaystyle{{\mathbf{V}}}_{i,\ell}^{k,n}\left(\alpha^{k,n-1}\right) =γ~(αk,n1)(𝐕~i,k+αk,n1𝐔~i,k)for~,\displaystyle=\tilde{\gamma}_{\ell}\left(\alpha^{k,n-1}\right)\left(\tilde{\mathbf{V}}_{i,\ell}^{k}+\alpha^{k,n-1}\tilde{\mathbf{U}}_{i,\ell}^{k}\right)\ \text{for}\ \widetilde{\mathcal{M}}, (70b)
𝐕i,k,n(αk,n1)\displaystyle{{\mathbf{V}}}_{i,\ell}^{k,n}\left(\alpha^{k,n-1}\right) =𝐇i𝚪¯(αk,n1)(𝐏k+αk,n1𝜼¯k)for¯.\displaystyle=\mathbf{H}_{i}\bar{{\boldsymbol{\Gamma}}}\left(\alpha^{k,n-1}\right)\left(\mathbf{P}_{\ell}^{k}+\alpha^{k,n-1}\bar{\boldsymbol{\eta}}_{\ell}^{k}\right)\ \ \text{for}\ \overline{\mathcal{M}}. (70c)

Similarly, 𝐕i,k,n(αk,n1){{\mathbf{V}}}_{i,\ell}^{k,n}\left(\alpha^{k,n-1}\right) can be obtained directly from 𝐕i,k{{\mathbf{V}}}_{i,\ell}^{k} and 𝐔i,k{{\mathbf{U}}}_{i,\ell}^{k} for ^\widehat{\mathcal{M}} and ~\widetilde{\mathcal{M}}, while it needs to be computed once per inner iteration for ¯\overline{\mathcal{M}}. Note that γ^(αk,n1)\hat{\gamma}\left(\alpha^{k,n-1}\right), γ~(αk,n1)\tilde{\gamma}_{\ell}\left(\alpha^{k,n-1}\right) and 𝚪¯(αk,n1)\bar{{\boldsymbol{\Gamma}}}\left(\alpha^{k,n-1}\right) defined in (40), (49) and (60) are viewed as functions of αk,n1\alpha^{k,n-1} here.
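The backtracking loop itself is cheap because \phi\left(\alpha\right) in (69) is evaluated from the cached \mathbf{V}_{i,\ell}^{k} and \mathbf{U}_{i,\ell}^{k} via (70). The sketch below illustrates this for the TPC manifold; it takes \hat{\gamma}\left(\alpha\right) to be the Frobenius-norm rescaling to total power P (consistent with the scaling retraction on the sphere) and assumes unit rate weights, as in the simulations. All names are ours.

```python
import numpy as np

def neg_wsr_from_V(V, sigma2):
    # Objective (69) with unit weights, evaluated from V[i][l] = H_i P_l only.
    f = 0.0
    for i, Vi in enumerate(V):
        S = sum(v @ v.conj().T for v in Vi)            # signal plus interference
        I = S - Vi[i] @ Vi[i].conj().T                 # interference only
        Mi = Vi[i].shape[0]
        f -= np.linalg.slogdet(sigma2 * np.eye(Mi) + S)[1]
        f += np.linalg.slogdet(sigma2 * np.eye(Mi) + I)[1]
    return f

def backtrack_tpc(P_k, eta_k, V, Uc, f_k, slope, P_tot, sigma2,
                  alpha0=1e-3, r=0.5, c=1e-4):
    # Armijo backtracking (Steps 6-10 of Algorithm 1) on the TPC manifold;
    # slope = g(grad f, eta) < 0 for a descent direction.
    U = len(V)
    alpha = alpha0
    while True:
        Y = P_k + alpha * eta_k
        gamma = np.sqrt(P_tot / np.real(np.vdot(Y, Y)))          # gamma_hat(alpha), cf. (40)
        V_new = [[gamma * (V[i][l] + alpha * Uc[i][l]) for l in range(U)]
                 for i in range(U)]                               # (70a): no channel products
        phi = neg_wsr_from_V(V_new, sigma2)
        if phi - f_k < c * alpha * slope:                         # sufficient decrease achieved
            return alpha, V_new, phi
        alpha *= r
```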

The procedure of the RSD and RCG methods for precoder design is provided in Algorithm 1. The RSD method is obtained by updating the search direction {{\boldsymbol{\eta}}}^{k+1} with -\mathrm{grad}f\left({{\mathbf{P}}}^{k+1}\right), while the RCG method is obtained by updating the search direction {{\boldsymbol{\eta}}}^{k+1} with (64). Typically, r=0.5 and c=10^{-4}. The analyses of the convergence properties of the RSD and RCG methods can be found in Appendix G.

Algorithm 1 RSD and RCG methods for precoder design
0:  Riemannian submanifold \mathcal{M}; Riemannian metric g𝐏()g_{{{\mathbf{P}}}}\left(\cdot\right); Real-valued function ff; Retraction R𝐏()R_{{{\mathbf{P}}}}\left(\cdot\right); Vector transport 𝒯𝜼()\mathcal{T}_{{{\boldsymbol{\eta}}}}\left(\cdot\right); initial step length α0>0\alpha^{0}>0; r(0,1)r\in\left(0,1\right); c(0,1)c\in\left(0,1\right)
0:  Initial point 𝐏0{{\mathbf{P}}}^{0};
1:  repeat
2:     Get 𝐕i,k,i,𝒰{{\mathbf{V}}}_{i,\ell}^{k},\forall i,\ell\in\mathcal{U} with (68).
3:     Compute the Euclidean gradients \mathrm{grad}f({{\mathbf{P}}}_{i}^{k}),\forall i\in\mathcal{U} with (67).
4:     Get Riemannian gradient gradf(𝐏k)\mathrm{grad}f({{\mathbf{P}}}^{k}) with (35), (45) or (55).
5:     Update the search direction 𝜼k{{\boldsymbol{\eta}}}^{k} and compute 𝐔i,k,i,𝒰{{\mathbf{U}}}_{i,\ell}^{k},\forall i,\ell\in\mathcal{U}.
6:     while ϕ(αk,n1)f(𝐏k)\phi\left(\alpha^{k,n-1}\right)-f\left({{\mathbf{P}}}^{k}\right) c×g𝐏k(gradf(𝐏k),\geq c\times g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left(\mathrm{grad}f\left({{\mathbf{P}}}^{k}\right),\right. αk,n1𝜼k)\left.\alpha^{k,n-1}{{\boldsymbol{\eta}}}^{k}\right) do
7:        Set αk,nrαk,n1\alpha^{k,n}\leftarrow r\alpha^{k,n-1} with αk,0=α0\alpha^{k,0}=\alpha^{0}.
8:        Get 𝐏k,n+1{{\mathbf{P}}}^{k,n+1} with (63) and 𝐕i,k,n(αk,n),i,𝒰{{\mathbf{V}}}_{i,\ell}^{k,n}\left(\alpha^{k,n}\right),\forall i,\ell\in\mathcal{U} with (70).
9:        Get ϕ(αk,n)\phi\left(\alpha^{k,n}\right) with (69), nn+1n\leftarrow n+1.
10:     end while
11:     Set 𝐏k+1𝐏k,n{{\mathbf{P}}}^{k+1}\leftarrow{{\mathbf{P}}}^{k,n}, kk+1k\leftarrow k+1.
12:  until convergence

IV-B Riemannian Trust Region Method

To implement the trust region method on a Riemannian submanifold, a quadratic model, i.e., an approximation of f in a neighborhood of {{\mathbf{P}}} defined on T_{{{\mathbf{P}}}}\mathcal{M}, is indispensable [33]. Typically, the basic choice of the quadratic model in the k-th iteration is

m𝐏k(𝝃𝐏k)=\displaystyle m_{{{\mathbf{P}}}^{k}}\left({\boldsymbol{\xi}}_{{{\mathbf{P}}}^{k}}\right)= f(𝐏k)+g𝐏k(gradf(𝐏k),𝝃𝐏k)\displaystyle f\left({{\mathbf{P}}}^{k}\right)+g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left(\mathrm{grad}f\left({{\mathbf{P}}}^{k}\right),{\boldsymbol{\xi}}_{{{\mathbf{P}}}^{k}}\right) (71)
+12g𝐏k(Hessf(𝐏k)[𝝃𝐏k],𝝃𝐏k),\displaystyle+\frac{1}{2}g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left(\mathrm{Hess}f\left({{\mathbf{P}}}^{k}\right)\left[{\boldsymbol{\xi}}_{{{\mathbf{P}}}^{k}}\right],{\boldsymbol{\xi}}_{{{\mathbf{P}}}^{k}}\right),

where Hessf(𝐏k)[𝝃𝐏k]\mathrm{Hess}f\left({{\mathbf{P}}}^{k}\right)\left[{\boldsymbol{\xi}}_{{{\mathbf{P}}}^{k}}\right] is used to form a quadratic model around the current iteration 𝐏k\mathbf{P}^{k} and is not necessarily positive semidefinite. Hessf(𝐏k)[𝝃𝐏k]\mathrm{Hess}f\left({{\mathbf{P}}}^{k}\right)\left[{\boldsymbol{\xi}}_{{{\mathbf{P}}}^{k}}\right] can be obtained from (38), (47) and (58). The search direction and the step length are obtained simultaneously by solving the trust-region subproblem

min𝜼kT𝐏km𝐏k(𝜼k)s.t.g𝐏k(𝜼k,𝜼k)(δk)2,\displaystyle\min_{{{\boldsymbol{\eta}}}^{k}\in T_{{{\mathbf{P}}}^{k}}\mathcal{M}}m_{{{\mathbf{P}}}^{k}}\left({{\boldsymbol{\eta}}}^{k}\right)\ \ \mathrm{s.t.}\ g^{\mathcal{M}}_{\mathbf{P}^{k}}\left({{\boldsymbol{\eta}}}^{k},{{\boldsymbol{\eta}}}^{k}\right)\leq\left(\delta^{k}\right)^{2}, (72)

where \delta^{k} is the trust-region radius in the k-th iteration. Problem (72) can be solved by the truncated conjugate gradient (tCG) method [34, 22], where the Riemannian Hessian is computed according to (38), (47) or (58). The most common stopping criterion of tCG is to truncate after a fixed number of iterations, denoted by N_{\text{sub}} [35]. Once {{\boldsymbol{\eta}}}^{k} is obtained, the quality of the quadratic model m_{{{\mathbf{P}}}^{k}} is evaluated by the quotient

ρk=f(𝐏k)f(𝐏k+1)m𝐏k(𝟎)m𝐏k(𝜼k).\displaystyle\rho^{k}=\frac{f\left({{\mathbf{P}}}^{k}\right)-f\left({{\mathbf{P}}}^{k+1}\right)}{m_{\mathbf{P}^{k}}\left(\boldsymbol{0}\right)-m_{\mathbf{P}^{k}}\left({{\boldsymbol{\eta}}}^{k}\right)}. (73)
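In code, evaluating the model (71) and the ratio (73) only needs the metric, the Riemannian gradient and one Hessian-vector product. A minimal sketch under our own naming, with `hess_vec` returning \mathrm{Hess}f\left(\mathbf{P}^{k}\right)\left[{\boldsymbol{\xi}}\right] from (38), (47) or (58), is:

```python
def model_value(f_k, grad_k, hess_vec, eta, metric):
    # Quadratic model m_{P^k}(eta) of (71).
    return f_k + metric(grad_k, eta) + 0.5 * metric(hess_vec(eta), eta)

def rho_quotient(f_k, f_next, m_eta):
    # Model-quality ratio rho^k of (73); note that m_{P^k}(0) = f(P^k).
    return (f_k - f_next) / (f_k - m_eta)
```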

The convergence analyses of the RTR method are available in Appendix G.

The RTR method for precoder design is summarized in Algorithm 2, where we use the superscript pair (k,d) for the iterations of solving the trust-region subproblem to distinguish them from (k,n) used for the step-length search. N_{\text{sub}} is the maximum number of inner iterations. In Step 7 of Algorithm 2, t is found by computing the positive root of the quadratic equation [22]

t2g𝐏k(𝐐k,d1,𝐐k,d1)+2tg𝐏k(𝜼k,d1,𝐐k,d1)\displaystyle t^{2}g_{\mathbf{P}^{k}}^{\mathcal{M}}\left({{\mathbf{Q}}}^{k,d-1},{{\mathbf{Q}}}^{k,d-1}\right)+2tg_{\mathbf{P}^{k}}^{\mathcal{M}}\left({{\boldsymbol{\eta}}}^{k,d-1},{{\mathbf{Q}}}^{k,d-1}\right) (74)
=(δk)2g𝐏k(𝜼k,d1,𝜼k,d1).\displaystyle=\left(\delta^{k}\right)^{2}-g_{\mathbf{P}^{k}}^{\mathcal{M}}\left({{\boldsymbol{\eta}}}^{k,d-1},{{\boldsymbol{\eta}}}^{k,d-1}\right).
Algorithm 2 RTR method for precoder design
0:  Riemannian submanifold \mathcal{M}; Riemannian metric g𝐏()g_{{{\mathbf{P}}}}\left(\cdot\right); Real-valued function ff; Retraction R𝐏()R_{{{\mathbf{P}}}}\left(\cdot\right); initial region δ0(0,Δ]\delta_{0}\in(0,\Delta]; threshold ρ\rho^{\prime}; inner iteration number NsubN_{\text{sub}}.
0:  Initial point 𝐏0{{\mathbf{P}}}^{0}, 𝐐1,0=𝐆1,0=gradf(𝐏0){{\mathbf{Q}}}^{1,0}={{\mathbf{G}}}^{1,0}=-\mathrm{grad}f({{\mathbf{P}}}^{0}), 𝜼1,0=𝟎{{\boldsymbol{\eta}}}^{1,0}=\boldsymbol{0},
1:  repeat
2:     for d=1,2,,Nsubd=1,2,\cdots,N_{\text{sub}} do
3:        Compute Hessf(𝐏k)[𝐐k,d1]\mathrm{Hess}f({{\mathbf{P}}}^{k})\left[{{\mathbf{Q}}}^{k,d-1}\right] with (38), (47) or (58).
4:        \tau^{k,d}=\frac{g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left({{\mathbf{G}}}^{k,d-1},{{\mathbf{G}}}^{k,d-1}\right)}{g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left({{\mathbf{Q}}}^{k,d-1},\mathrm{Hess}f({{\mathbf{P}}}^{k})\left[{{\mathbf{Q}}}^{k,d-1}\right]\right)}.
5:        {{\boldsymbol{\eta}}}^{k,d}={{\boldsymbol{\eta}}}^{k,d-1}+\tau^{k,d}{{\mathbf{Q}}}^{k,d-1}.
6:        if \tau^{k,d}\leq 0 or g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left({{\boldsymbol{\eta}}}^{k,d},{{\boldsymbol{\eta}}}^{k,d}\right)\geq\left(\delta^{k}\right)^{2} then
7:           𝜼k,d=𝜼k,d1+t𝐐k,d1{{\boldsymbol{\eta}}}^{k,d}={{\boldsymbol{\eta}}}^{k,d-1}+t{{\mathbf{Q}}}^{k,d-1} with t>0t>0 and g(𝜼k,d,𝜼k,d)=(δk)2g({{\boldsymbol{\eta}}}^{k,d},{{\boldsymbol{\eta}}}^{k,d})=\left(\delta^{k}\right)^{2}, break.
8:        end if
9:        𝐆k,d=𝐆k,d1τk,dHessf(𝐏k)[𝐐k,d1]{{\mathbf{G}}}^{k,d}={{\mathbf{G}}}^{k,d-1}-\tau^{k,d}\mathrm{Hess}f({{\mathbf{P}}}^{k})\left[{{\mathbf{Q}}}^{k,d-1}\right].
10:        𝐐k,d=𝐆k,d+g𝐏k(𝐆k,d,𝐆k,d)g𝐏k(𝐆k,d1,𝐆k,d1)𝐐k,d1{{\mathbf{Q}}}^{k,d}={{\mathbf{G}}}^{k,d}+\frac{g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left({{\mathbf{G}}}^{k,d},{{\mathbf{G}}}^{k,d}\right)}{g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left({{\mathbf{G}}}^{k,d-1},{{\mathbf{G}}}^{k,d-1}\right)}{{\mathbf{Q}}}^{k,d-1}.
11:     end for
12:     𝜼k=𝜼k,d{{\boldsymbol{\eta}}}^{k}={{\boldsymbol{\eta}}}^{k,d}, ν=g𝐏k(𝜼k,𝜼k)\nu=g^{\mathcal{M}}_{{{\mathbf{P}}}^{k}}\left({{\boldsymbol{\eta}}}^{k},{{\boldsymbol{\eta}}}^{k}\right).
13:     Evaluate ρk\rho^{k} with (73).
14:     Update 𝐏k+1{{\mathbf{P}}}^{k+1} and δk+1\delta^{k+1} according to ρk\rho^{k} with
𝐏k+1={(63),ifρk>ρ𝐏k,ifρkρ,\displaystyle{{\mathbf{P}}}^{k+1}=\begin{cases}\eqref{Retraction},\ \text{if}\ \rho^{k}>\rho^{\prime}\\ {{\mathbf{P}}}^{k},\ \text{if}\ \rho^{k}\leq\rho^{\prime}\end{cases},
δk+1={14δk,ifρk<14min(2δk,Δ),ifρk>34&ν=δkδk,otherwise.\displaystyle\delta^{k+1}=\begin{cases}\frac{1}{4}\delta^{k},\text{if}\ \rho^{k}<\frac{1}{4}\\ \min\left(2\delta^{k},\Delta\right),\text{if}\ \rho^{k}>\frac{3}{4}\ \&\ \nu=\delta^{k}\\ \delta^{k},\text{otherwise}\end{cases}.
15:     Set kk+1k\leftarrow k+1.
16:  until convergence
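Step 14 of Algorithm 2 is the standard accept/reject and radius-update rule of trust-region methods. The sketch below (our naming) makes it explicit, with `boundary_hit` indicating that the tCG step stopped on the trust-region boundary in Step 7:

```python
def tr_update(P_k, P_cand, rho_k, delta_k, boundary_hit, Delta, rho_thresh):
    # Accept the retracted candidate only if the model was reliable enough,
    # then shrink, enlarge or keep the trust-region radius.
    P_next = P_cand if rho_k > rho_thresh else P_k
    if rho_k < 0.25:
        delta_next = 0.25 * delta_k
    elif rho_k > 0.75 and boundary_hit:
        delta_next = min(2.0 * delta_k, Delta)
    else:
        delta_next = delta_k
    return P_next, delta_next
```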

IV-C Computational Complexity of Riemannian Methods

Let Nr=i=1UMiN_{r}=\sum_{i=1}^{U}M_{i} and Nd=i=1UdiN_{d}=\sum_{i=1}^{U}d_{i} denote the total antenna number of the users and the total data stream number, respectively. Let NoutN_{\text{out}} and NinN_{\text{in}} denote the numbers of outer iteration and inner iteration of searching for the step length, respectively. The computational costs of the Riemannian metric, orthogonal projection, retraction and vector transport are the same, i.e., MtNdM_{t}N_{d}. Although they need to be called several times per outer iteration in the proposed Riemannian methods, the total computational costs can still be neglected as they are much lower than that of the Riemannian gradient or Riemannian Hessian. For RSD and RCG methods on ^\widehat{\mathcal{M}} and ~\widetilde{\mathcal{M}}, 𝐕i,k{{\mathbf{V}}}_{i,\ell}^{k} and 𝐕i,k,n,i,𝒰{{\mathbf{V}}}_{i,\ell}^{k,n},\forall i,\ell\in\mathcal{U}, in Algorithm 1 can be obtained directly from the elements computed in the previous iteration with (68a), (68b), (70a) and (70b). Moreover, the dimension of the input of the cost function logdet()\log\det\left(\cdot\right) is MiM_{i} for the ii-th UT. Thus, the computational cost of the whole inner iteration per outer iteration is O(2Nini=1UMi3)O\left(2N_{\text{in}}\sum_{i=1}^{U}M_{i}^{3}\right), which is very low and can be omitted. So the computational cost mainly comes from Step 3 and Step 5 in Algorithm 1 during the iterations, and the complexity is O(MtNrNd+MtNd2)NoutO(M_{t}N_{r}N_{d}+M_{t}N_{d}^{2})N_{\text{out}}. For RSD and RCG methods on ¯\overline{\mathcal{M}}, 𝐕¯i,k,n,i,𝒰\bar{\mathbf{V}}_{i,\ell}^{k,n},\forall i,\ell\in\mathcal{U}, have to be computed once for each inner iteration according to (70c), with the total complexity of O((Nin+1)MtNrNd+MtNd2)NoutO((N_{\text{in}}+1)M_{t}N_{r}N_{d}+M_{t}N_{d}^{2})N_{\text{out}}. The main computational cost of the RTR method depends on Step 3 in Algorithm 2, where the Riemannian Hessian needs to be computed once in each inner iteration. The computational cost of the Riemannian Hessian mainly comes from the matrix multiplication according to Appendix D, Appendix E and Appendix F. The maximum complexity of the RTR method is O(8NsubMtNrNd)NoutO(8N_{\text{sub}}M_{t}N_{r}N_{d})N_{\text{out}} on ^\widehat{\mathcal{M}} or ¯\overline{\mathcal{M}} and O(4NsubMtNrNd)NoutO(4N_{\text{sub}}M_{t}N_{r}N_{d})N_{\text{out}} on ~\widetilde{\mathcal{M}}. Typically, we have Nin10N_{\text{in}}\leq 10 with r=0.5r=0.5, c=104c=10^{-4} and α0=103\alpha^{0}=10^{-3}, and Nsub10N_{\text{sub}}\leq 10.

The computational complexities of different Riemannian design methods are summarized in Table II. For the same power constraint, we can see that RSD and RCG methods have the same computational complexity, which is lower than that of the RTR method.

TABLE II: Computational complexity of different Riemannian design methods
RSD RCG RTR
TPC O(MtNrNd+MtNd2)NoutO(M_{t}N_{r}N_{d}+M_{t}N_{d}^{2})N_{\text{out}} O(MtNrNd+MtNd2)NoutO(M_{t}N_{r}N_{d}+M_{t}N_{d}^{2})N_{\text{out}} O(8NsubMtNrNd)NoutO(8N_{\text{sub}}M_{t}N_{r}N_{d})N_{\text{out}}
PUPC O(MtNrNd+MtNd2)NoutO(M_{t}N_{r}N_{d}+M_{t}N_{d}^{2})N_{\text{out}} O(MtNrNd+MtNd2)NoutO(M_{t}N_{r}N_{d}+M_{t}N_{d}^{2})N_{\text{out}} O(4NsubMtNrNd)NoutO(4N_{\text{sub}}M_{t}N_{r}N_{d})N_{\text{out}}
PAPC O((Nin+1)MtNrNd+MtNd2)NoutO((N_{\text{in}}+1)M_{t}N_{r}N_{d}+M_{t}N_{d}^{2})N_{\text{out}} O((Nin+1)MtNrNd+MtNd2)NoutO((N_{\text{in}}+1)M_{t}N_{r}N_{d}+M_{t}N_{d}^{2})N_{\text{out}} O(8NsubMtNrNd)NoutO(8N_{\text{sub}}M_{t}N_{r}N_{d})N_{\text{out}}

The popular WMMSE method proposed in [9] has the same objective function as our proposed Riemannian methods for precoder design. The computational complexity of the WMMSE method is O(2M_{t}N_{r}N_{d}+M_{t}N_{d}^{2}+\frac{3}{2}N_{d}^{3})N_{\text{out}}, which is higher than that of the RCG method under TPC when the same number of outer iterations is used. The SLNR precoder also satisfies PUPC, where the eigenvalue decomposition needs to be performed U times [36]. Thus, the computational complexity of the SLNR precoder is O\left(M_{t}^{3}U\right), which is higher than that of the RCG method when N_{\text{out}}<\frac{M_{t}^{2}U}{N_{r}N_{d}+N_{d}^{2}}. In addition, the precoder under PUPC can also be designed by the WMMSE method, where U Lagrange multipliers are obtained by the bisection method [37]. The complexity of WMMSE under PUPC is O(M_{t}^{3}U+2M_{t}N_{r}N_{d}+M_{t}N_{d}^{2}+\frac{3}{2}N_{d}^{3})N_{\text{out}}, which is much higher than that of the RCG method. WSR-maximization precoders under PAPC are difficult to design. [16] provides an alternative method that finds the linear precoder closest to the optimal ZF-based precoder under TPC, whose complexity is O(M_{t}^{3}+(M_{t}-N_{r})^{3}U)N_{\text{Z}}, where N_{\text{Z}} is the number of iterations needed. Although it requires no inner iterations, this method has a higher computational complexity than the RCG method in massive MIMO systems when N_{\text{Z}}=N_{\text{out}}. [17] designs the WSR-maximization precoder under PAPC via the DCAM-based method, whose computational complexity is O\left(N_{\text{D}}\left(M_{t}^{3}+M_{t}^{2}N_{d}\right)\right)N_{\text{out}}, with N_{\text{D}} being the number of iterations needed for the DCAM to converge. Typically, the optimality tolerance of the duality gap in [17] is \epsilon=10^{-4} and N_{\text{D}}\propto\log_{2}\left(\frac{1}{\epsilon}\right), indicating that the computational complexity of the DCAM-based method per outer iteration is higher than that of the RCG design method. The numerical results in the next section will demonstrate that the Riemannian design methods converge faster than the comparable iterative methods under the different power constraints.
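As a quick sanity check of the crossover with the SLNR precoder, plugging in the configuration used in the simulations of Section V (M_{t}=128, U=20, M_{i}=d_{i}=2, so N_{r}=N_{d}=40) gives the following; the snippet is purely illustrative arithmetic.

```python
Mt, U, Mi, di = 128, 20, 2, 2
Nr, Nd = U * Mi, U * di                       # both equal 40
print(Mt**2 * U / (Nr * Nd + Nd**2))          # 102.4: the RCG method is cheaper than
                                              # the SLNR precoder whenever N_out < 102
```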

V Numerical Results

In this section, we provide simulation results to demonstrate that the precoders designed in the matrix manifold framework are numerically superior and computationally efficient under different power constraints, especially when the RCG method is used. We adopt the prevalent QuaDRiGa channel model [38], where the “3GPP 38.901 UMa NLOS” scenario is considered. The BS, which is 25 m high and equipped with M_{t}=128 antennas, serves U=20 UTs of height 1.5 m randomly distributed in a cell with a radius of 250 m. The antenna type is “3gpp-3d” at the BS side and “ula” at the user side. For simplicity, we set M_{i}=d_{i}=2 and P_{i}=\frac{P}{U},\forall i\in\mathcal{U}. The center frequency is 4.8 GHz. The weighting factor of each user is set to 1. The powers of the channels of different users are normalized. The SNR is defined as \frac{P}{\sigma_{z}^{2}}, where the noise power \sigma_{z}^{2} is set to 1 and different SNRs are obtained by adjusting the transmit power P. For a fair comparison, the Riemannian methods and the iterative methods used for comparison are all initialized by the regularized zero-forcing (RZF) precoders [39], which are forced to satisfy the TPC, PUPC and PAPC, respectively.
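The SNR sweep described above amounts to fixing \sigma_{z}^{2}=1 and scaling the transmit power. A minimal sketch of the parameter setup follows (the QuaDRiGa channel generation itself runs in MATLAB and is not reproduced here; the SNR range shown is assumed for illustration only):

```python
import numpy as np

Mt, U, Mi, di = 128, 20, 2, 2                  # BS antennas, users, antennas/streams per user
sigma2 = 1.0                                   # normalized noise power
snr_db = np.arange(0, 31, 5)                   # assumed sweep range, for illustration
P_tot = sigma2 * 10.0 ** (snr_db / 10.0)       # SNR = P / sigma_z^2, so each SNR point
                                               # corresponds to a total transmit power P
```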

Figure 4: Convergence comparison under TPC at SNR = 20 dB.
Figure 5: WSR performance when the RCG method converges under TPC.
Figure 6: Convergence comparison between RCG and WMMSE under TPC.

Fig. 4 depicts the convergence trajectories of the Riemannian design methods compared with the WMMSE method [9] under TPC at SNR = 20 dB for the massive MIMO DL. As shown in Fig. 4, the RCG method converges much faster than the RSD, WMMSE and RTR methods when N_{\text{sub}}=3. The RTR method converges the fastest at the very beginning when N_{\text{sub}}=6 and obtains 87% of the performance in the first three iterations. Besides, the complexity of the RCG method is lower than that of the WMMSE method. For a fair comparison, Fig. 5 compares the WSR performance of the RCG method with that of the WMMSE precoder under the same complexity when the RCG method converges. As we can see from Fig. 5, the RCG method has the same performance as the WMMSE method when both converge and shows an evident performance gain in the high SNR regime when they share the same computational complexity. The RZF method [39] requires the least computation among these methods but also exhibits the poorest performance. To investigate the convergence behavior of the RCG method, we compare the approximate numbers of iterations needed for the RCG and WMMSE methods to fully converge in Fig. 6. Compared with the WMMSE method, the RCG method needs far fewer iterations to converge, especially when the SNR is high, which clearly demonstrates the fast convergence and high computational efficiency of the RCG method.

Figure 7: Convergence comparison under PUPC at SNR = 20 dB.
Figure 8: WSR performance when the RCG method converges under PUPC.
Figure 9: Convergence comparison under PAPC at SNR = 20 dB.
Figure 10: Convergence comparison under PAPC at SNR = 10 dB.
Figure 11: WSR performance when the RCG method converges under PAPC.

The convergence behavior of the Riemannian design methods under PUPC at SNR = 20 dB is shown in Fig. 7. When N_{\text{sub}}=6, the RTR method needs nearly the same number of iterations to converge as RCG, at the cost of a much higher computational complexity. In addition, WMMSE achieves good performance and converges fast in the first thirty iterations. However, RCG performs better and converges faster than WMMSE when N_{\text{out}}>30, with a much lower computational complexity. In Fig. 8, we compare the WSR performance of the Riemannian methods after convergence under different SNRs with those of the SLNR precoder and the WMMSE precoder satisfying PUPC, whose computational complexities are much higher than that of the RCG method. The WSR performances of the proposed Riemannian methods after convergence are almost the same for different initializations. From Fig. 8, the RCG method has the same performance as the WMMSE method after convergence and performs better than the SLNR precoder in the whole SNR regime.

Fig. 9 shows the convergence behavior of the Riemannian design methods compared with the DCAM-based method in [17] under PAPC at SNR = 20 dB. From Fig. 9, we find that the RTR method converges fast in the first three iterations, but the RCG method converges faster when N_{\text{out}}>10. The RCG and RTR methods both converge much faster than the DCAM-based method, whose complexity per outer iteration is higher than that of the RCG design method. To further show the convergence behavior of the Riemannian methods in the PAPC case at a lower SNR, we plot the convergence trajectories of the Riemannian methods under PAPC at SNR = 10 dB in Fig. 10. From Fig. 10, we can see that the RTR method with N_{\text{sub}}=6 converges the fastest and has the same WSR performance as the RCG method after convergence. Fig. 11 compares the WSR performance of the RCG method after convergence with those of the ZF-based method in [16] and the DCAM-based method in [17]. As we can see in Fig. 11, the RCG method is numerically superior in designing precoders under PAPC, with a lower computational complexity than the algorithm proposed in [16]. The DCAM-based method exhibits performance similar to that of the RCG method in the low SNR regime at a higher computational cost, but performs worse than the RCG method when the SNR is high. In addition to the RZF precoder, we also tried random initialization, and the WSR performances of the Riemannian methods after convergence are nearly the same.

Finally, we compare the running times required for the proposed Riemannian methods and the comparable iterative methods to converge in Table III. All simulations are conducted in Matlab R2019b on a desktop with an Intel(R) Core(TM) i9-10900K CPU running at 3.70 GHz. We can see that the RCG method runs fastest in all cases. In addition, RTR runs faster than the comparable iterative methods in all cases, especially under PUPC.

TABLE III: Running times of different methods (s)
RSD RCG RTR Iterative methods
TPC 100.5072 4.0844 44.8081    WMMSE-TPC: 73.6469
PUPC 90.2711 8.9171 17.3877 WMMSE-PUPC: 29.8341
PAPC 175.0686 16.5449 77.3072       DCAM-based: 145.9009

VI Conclusion

In this paper, we have investigated linear precoder design in the matrix manifold framework for massive MIMO DL transmission. We have focused on the WSR-maximization precoder design and demonstrated that the precoders under TPC, PUPC and PAPC lie on different Riemannian submanifolds, so that the constrained problems are transformed into unconstrained ones on these submanifolds. Furthermore, the RSD, RCG and RTR methods have been proposed for the optimization on the Riemannian submanifolds, and no inverses of large dimensional matrices are involved during their iterations. Besides, the complexities of implementing these Riemannian design methods on the different Riemannian submanifolds have been investigated. Simulation results show the numerical superiority and computational efficiency of the RCG method.

Appendix A Proof of Theorem 1

A-A Proof in the TPC case

With the fact that the Frobenius norm of \hat{\mathbf{P}} is constant, the constraint \mathrm{tr}\left(\mathbf{P}^{H}\mathbf{P}\right)=P naturally defines a sphere. Clearly, \widehat{\mathcal{M}} is a subset of the set \mathbb{C}^{M_{t}\times d_{1}}\times\mathbb{C}^{M_{t}\times d_{2}}\times\cdots\times\mathbb{C}^{M_{t}\times d_{U}}. Consider the differentiable function \hat{F}:\mathcal{N}\rightarrow\mathbb{R}:\mathbf{P}\mapsto\mathrm{tr}\left(\mathbf{P}\mathbf{P}^{H}\right)-P, and clearly \widehat{\mathcal{M}}=\hat{F}^{-1}\left(0\right), where 0\in\mathbb{R}. Based on the submersion theorem [22, Proposition 3.3.3], to show that \widehat{\mathcal{M}} is a closed embedded submanifold of \mathcal{N}, we need to prove that \hat{F} is a submersion at each point of \widehat{\mathcal{M}}. In other words, we should verify that the rank of \hat{F} is equal to the dimension of \mathbb{R}, i.e., 1, at every point of \widehat{\mathcal{M}}. Since the rank of \hat{F} at a point \mathbf{P}\in\mathcal{N} is defined as the dimension of the range of \mathrm{D}\hat{F}\left(\mathbf{P}\right), we simply need to show that for all w\in\mathbb{R}, there exists \mathbf{Z}=\left(\mathbf{Z}_{1},\mathbf{Z}_{2},\cdots,\mathbf{Z}_{U}\right)\in\mathcal{N} such that \mathrm{D}\hat{F}\left(\mathbf{P}\right)\left[\mathbf{Z}\right]=w. Since the differential operation at \mathbf{P} is equivalent to the component-wise differential at each of \mathbf{P}_{1},\mathbf{P}_{2},\cdots,\mathbf{P}_{U}, we have

DF^(𝐏)[𝐙]=tr(𝐏H𝐙+𝐙H𝐏).\mathrm{D}\hat{F}\left(\mathbf{P}\right)\left[\mathbf{Z}\right]=\mathrm{tr}\left(\mathbf{P}^{H}\mathbf{Z}+\mathbf{Z}^{H}\mathbf{P}\right). (75)

It is easy to see that if we choose 𝐙i=12wP𝐏i\mathbf{Z}_{i}=\frac{1}{2}\frac{w}{P}\mathbf{P}_{i}, we will have DF^(𝐏)[𝐙]=w\mathrm{D}\hat{F}\left(\mathbf{P}\right)\left[\mathbf{Z}\right]=w. This shows that F^\hat{F} is full rank as well as a submersion on ^\widehat{\mathcal{M}}.

Because every tangent space T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}} is a subspace of T_{\mathbf{P}}\mathcal{N}, the Riemannian metric g_{\mathbf{P}}\left(\cdot\right) of \mathcal{N} is naturally introduced to \widehat{\mathcal{M}} as g_{\hat{\mathbf{P}}}\left(\cdot\right)=g_{\mathbf{P}}\left(\cdot\right). With this metric, \widehat{\mathcal{M}} is a Riemannian embedded submanifold of \mathcal{N}.

A-B Proof in the PUPC case

Note that each 𝐏~i\tilde{\mathbf{P}}_{i} in 𝐏~\tilde{\mathbf{P}} forms a sphere. Thus ~\widetilde{\mathcal{M}} forms an oblique manifold composed of UU spheres. To show ~\widetilde{\mathcal{M}} is a Riemannian submanifold of the product manifold 𝒩\mathcal{N}, we need to show that for 𝐃i=ai𝐈di,i𝒰\mathbf{D}_{i}=a_{i}\mathbf{I}_{d_{i}},\forall i\in\mathcal{U}, where aia_{i}\in\mathbb{R}, there exists 𝐙=(𝐙1,,𝐙U)𝒩\mathbf{Z}=\left(\mathbf{Z}_{1},\cdots,\mathbf{Z}_{U}\right)\in\mathcal{N} such that DF~(𝐏)[𝐙]=blkdiag(𝐃1,,𝐃U)=𝐃\mathrm{D}\tilde{F}\left(\mathbf{P}\right)\left[\mathbf{Z}\right]=\mathrm{blkdiag}\left(\mathbf{D}_{1},\cdots,\mathbf{D}_{U}\right)=\mathbf{D}. Since the differential operation at 𝐏\mathbf{P} is equivalent to the component-wise differential at each of 𝐏1,𝐏2,,𝐏U\mathbf{P}_{1},\mathbf{P}_{2},\cdots,\mathbf{P}_{U}, we have

DF~(𝐏)[𝐙]=\displaystyle\mathrm{D}\tilde{F}\left(\mathbf{P}\right)\left[\mathbf{Z}\right]= (76)
(tr(𝐏1H𝐙1+𝐙1H𝐏1)𝐈d1,,tr(𝐏UH𝐙U+𝐙UH𝐏U)𝐈dU).\displaystyle\left(\mathrm{tr}\left(\mathbf{P}_{1}^{H}\mathbf{Z}_{1}+\mathbf{Z}_{1}^{H}\mathbf{P}_{1}\right)\mathbf{I}_{d_{1}},\cdots,\mathrm{tr}\left(\mathbf{P}_{U}^{H}\mathbf{Z}_{U}+\mathbf{Z}_{U}^{H}\mathbf{P}_{U}\right)\mathbf{I}_{d_{U}}\right).

It is easy to see that if we choose 𝐙i=12UP𝐏i𝐃i\mathbf{Z}_{i}=\frac{1}{2}\frac{U}{P}\mathbf{P}_{i}\mathbf{D}_{i}, we will have DF~(𝐏)[𝐙]=𝐃\mathrm{D}\tilde{F}\left(\mathbf{P}\right)\left[\mathbf{Z}\right]=\mathbf{D}. This shows that F~\tilde{F} is full rank as well as a submersion on ~\widetilde{\mathcal{M}}.

Because every tangent space T_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}} is a subspace of T_{\mathbf{P}}\mathcal{N}, the Riemannian metric g_{\mathbf{P}}\left(\cdot\right) of \mathcal{N} is naturally introduced to \widetilde{\mathcal{M}} as g_{\tilde{\mathbf{P}}}\left(\cdot\right)=g_{\mathbf{P}}\left(\cdot\right). With this metric, \widetilde{\mathcal{M}} is a Riemannian embedded submanifold of \mathcal{N}.

A-C Proof in the PAPC case

Every column of \bar{\mathbf{P}}^{H} forms a sphere, and thus \bar{\mathbf{P}}^{H} forms a standard oblique manifold composed of M_{t} spheres. Similarly, to show that \overline{\mathcal{M}} is a Riemannian submanifold of the product manifold \mathcal{N}, we need to show that for all \mathbf{w}=\left(w_{1},w_{2},\cdots,w_{M_{t}}\right)\in\mathbb{R}^{M_{t}}, there exists \mathbf{Z}=\left(\mathbf{Z}_{1},\mathbf{Z}_{2},\cdots,\mathbf{Z}_{U}\right)\in\mathcal{N} such that \mathrm{D}\bar{F}\left(\mathbf{P}\right)\left[\mathbf{Z}\right]=\mathrm{diag}\left(\mathbf{w}\right). Since the differential operation at \mathbf{P} is equivalent to the component-wise differential at each of \mathbf{P}_{1},\mathbf{P}_{2},\cdots,\mathbf{P}_{U}, we have

DF¯(𝐏)[𝐙]=𝐈Mti=1U(𝐏i𝐙iH+𝐙i𝐏iH).\mathrm{D}\bar{F}\left(\mathbf{P}\right)\left[\mathbf{Z}\right]=\mathbf{I}_{M_{t}}\odot\sum_{i=1}^{U}\left(\mathbf{P}_{i}\mathbf{Z}_{i}^{H}+\mathbf{Z}_{i}\mathbf{P}_{i}^{H}\right). (77)

It is easy to see that if we choose 𝐙i=12MtPdiag(𝐰)𝐏i\mathbf{Z}_{i}=\frac{1}{2}\frac{M_{t}}{P}\mathrm{diag}\left(\mathbf{w}\right)\mathbf{P}_{i}, we will have DF¯(𝐏)[𝐙]=diag(𝐰)\mathrm{D}\bar{F}\left(\mathbf{P}\right)\left[\mathbf{Z}\right]=\mathrm{diag}\left(\mathbf{w}\right). This shows that F¯\bar{F} is full rank as well as a submersion on ¯\overline{\mathcal{M}}.

Because every tangent space T𝐏¯¯T_{\bar{\mathbf{P}}}\overline{\mathcal{M}} is a subspace of T𝐏𝒩T_{\mathbf{P}}\mathcal{N}, the Riemannian metric g𝐏()g_{\mathbf{P}}\left(\cdot\right) of 𝒩\mathcal{N} is naturally introduced to ¯\overline{\mathcal{M}} as g𝐏¯()=g𝐏()g_{\bar{\mathbf{P}}}\left(\cdot\right)=g_{\mathbf{P}}\left(\cdot\right). With this metric, ¯\overline{\mathcal{M}} is a Riemannian embedded submanifold of 𝒩\mathcal{N}.

Appendix B Proof of Theorem 2

To simplify the notation, we use f\left(\mathbf{P}_{i}\right) to represent f\left(\mathbf{P}\right) regarded as a function of \mathbf{P}_{i} only, with \mathbf{P}_{\ell} fixed for \ell\neq i.

For any 𝝃𝐏iT𝐏iMt×Mi{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\in T_{\mathbf{P}_{i}}\mathbb{C}^{M_{t}\times M_{i}}, the directional derivative of f(𝐏i)f\left(\mathbf{P}_{i}\right) along 𝝃𝐏i{\boldsymbol{\xi}}_{\mathbf{P}_{i}} is

Df(𝐏i)[𝝃𝐏i]=wiDi[𝝃𝐏i]iUwD[𝝃𝐏i].\mathrm{D}f\left(\mathbf{P}_{i}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]=-w_{i}\mathrm{D}\mathcal{R}_{i}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]-\sum_{\ell\neq i}^{U}w_{\ell}\mathrm{D}\mathcal{R}_{\ell}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]. (78)

We derive Di[𝝃𝐏i]\mathrm{D}\mathcal{R}_{i}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right] and D[𝝃𝐏i]\mathrm{D}\mathcal{R}_{\ell}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right] separately as

Di[𝝃𝐏i]=\displaystyle\mathrm{D}\mathcal{R}_{i}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]= tr(𝐂i(𝝃𝐏iH𝐇iH𝐀i+𝐀iH𝐇i𝝃𝐏i)),\displaystyle\mathrm{tr}\left(\mathbf{C}_{i}\left({\boldsymbol{\xi}}_{\mathbf{P}_{i}}^{H}\mathbf{H}_{i}^{H}\mathbf{A}_{i}+\mathbf{A}_{i}^{H}\mathbf{H}_{i}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right)\right), (79)
D[𝝃𝐏i]=\displaystyle\mathrm{D}\mathcal{R}_{\ell}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]= tr(𝐇H𝐁𝐇(𝝃𝐏i𝐏iH+𝐏i𝝃𝐏iH)).\displaystyle-\mathrm{tr}\bigg{(}\mathbf{H}_{\ell}^{H}\mathbf{B}_{\ell}\mathbf{H}_{\ell}\left({\boldsymbol{\xi}}_{\mathbf{P}_{i}}\mathbf{P}_{i}^{H}+\mathbf{P}_{i}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}^{H}\right)\bigg{)}. (80)

Thus, we have

D\displaystyle\mathrm{D} f(𝐏i)[𝝃𝐏i]=\displaystyle f\left(\mathbf{P}_{i}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]= (81)
g𝐏i(2wi𝐇iH𝐀i𝐂i+2iw𝐇H𝐁𝐇𝐏i,𝝃𝐏i)\displaystyle g_{\mathbf{P}_{i}}\Big{(}-2w_{i}\mathbf{H}_{i}^{H}\mathbf{A}_{i}\mathbf{C}_{i}+2\sum_{\ell\neq i}w_{\ell}\mathbf{H}_{\ell}^{H}\mathbf{B}_{\ell}\mathbf{H}_{\ell}\mathbf{P}_{i},{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\Big{)}

and gradf(𝐏i)\mathrm{grad}f\left(\mathbf{P}_{i}\right) is

gradf(𝐏i)=2(wi𝐇iH𝐀i𝐂iiw𝐇H𝐁𝐇𝐏i).\mathrm{grad}f\left(\mathbf{P}_{i}\right)=-2\Big{(}w_{i}\mathbf{H}_{i}^{H}\mathbf{A}_{i}\mathbf{C}_{i}-\sum_{\ell\neq i}w_{\ell}\mathbf{H}_{\ell}^{H}\mathbf{B}_{\ell}\mathbf{H}_{\ell}\mathbf{P}_{i}\Big{)}. (82)

Appendix C Proof of Lemma 1

From the decomposition (29) and the definition of the normal space (31), it is easy to have ΠT𝐏^^T𝐏𝒩(𝝃𝐏)=𝝃𝐏λ^1𝐏\Pi_{T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left({\boldsymbol{\xi}}_{\mathbf{P}}\right)={\boldsymbol{\xi}}_{\mathbf{P}}-\hat{\lambda}_{1}\mathbf{P}. Meanwhile, from the definition of the tangent space (30), 𝝃𝐏λ^1𝐏{\boldsymbol{\xi}}_{\mathbf{P}}-\hat{\lambda}_{1}\mathbf{P} should satisfy the equation

tr(𝐏H(𝝃𝐏λ^1𝐏)+(𝝃𝐏λ^1𝐏)H𝐏)=0.\mathrm{tr}\left(\mathbf{P}^{H}\left({\boldsymbol{\xi}}_{\mathbf{P}}-\hat{\lambda}_{1}\mathbf{P}\right)+\left({\boldsymbol{\xi}}_{\mathbf{P}}-\hat{\lambda}_{1}\mathbf{P}\right)^{H}\mathbf{P}\right)=0. (83)

After some algebra, we can get λ^1=1P{tr(𝐏H𝝃𝐏)}\hat{\lambda}_{1}=\frac{1}{P}\Re\left\{\mathrm{tr}\left(\mathbf{P}^{H}{\boldsymbol{\xi}}_{\mathbf{P}}\right)\right\}.
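A short numerical check of Lemma 1 is immediate: after removing \hat{\lambda}_{1}\mathbf{P}, the projected direction satisfies the tangency condition (83). The sketch below (our naming) verifies this for a random point on the TPC sphere:

```python
import numpy as np

rng = np.random.default_rng(0)
Mt, Nd, P_tot = 8, 4, 10.0
P = rng.standard_normal((Mt, Nd)) + 1j * rng.standard_normal((Mt, Nd))
P *= np.sqrt(P_tot / np.real(np.vdot(P, P)))       # enforce tr(P^H P) = P_tot
xi = rng.standard_normal((Mt, Nd)) + 1j * rng.standard_normal((Mt, Nd))
lam1 = np.real(np.vdot(P, xi)) / P_tot             # lambda_hat_1 from Lemma 1
proj = xi - lam1 * P
print(np.real(np.vdot(P, proj)))                   # ~1e-16: condition (83) holds
```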

Appendix D Proof of Theorem 3

D-A Proof for Riemannian gradient in Theorem 3

From [22, Section 3.6.1] and Lemma 1, the Riemannian gradient of f(𝐏^)f\big{(}\hat{\mathbf{P}}\big{)} is

gradf(𝐏^)=ΠT𝐏^^T𝐏𝒩(gradf(𝐏^))=gradf(𝐏)λ^1𝐏,\mathrm{grad}f\big{(}\hat{\mathbf{P}}\big{)}=\Pi_{T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left(\mathrm{grad}f\big{(}\hat{\mathbf{P}}\big{)}\right)=\mathrm{grad}f\left(\mathbf{P}\right)-\hat{\lambda}_{1}\mathbf{P}, (84)

where λ^1=1P{tr(𝐏(gradf(𝐏))H)}\hat{\lambda}_{1}=\frac{1}{P}\Re\left\{\mathrm{tr}\left(\mathbf{P}\big{(}\mathrm{grad}f\left(\mathbf{P}\right)\big{)}^{H}\right)\right\}.

D-B Proof for Riemannian Hessian in Theorem 3

Combining (9) and (4), we get

Hessf(𝐏^)[𝝃𝐏^]\displaystyle\mathrm{Hess}f(\hat{\mathbf{P}})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}] =𝝃𝐏^^gradf\displaystyle=\nabla_{{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}}^{\widehat{\mathcal{M}}}\mathrm{grad}f (85)
=ΠT𝐏^^T𝐏𝒩(𝝃𝐏^𝒩gradf)\displaystyle=\Pi_{T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left(\nabla_{{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}}^{\mathcal{N}}\mathrm{grad}f\right)
=ΠT𝐏^^T𝐏𝒩(Dgradf(𝐏^)[𝝃𝐏^]),\displaystyle=\Pi_{T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left(\mathrm{Dgrad}f(\hat{\mathbf{P}})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}]\right),

which belongs to T𝐏^^T_{\hat{\mathbf{P}}}\widehat{\mathcal{M}}. So the following relationship holds:

{tr(𝐏HDgradf(𝐏^)[𝝃𝐏^]λ^2𝐏)}=0\displaystyle\Re\left\{\mathrm{tr}\left(\mathbf{P}^{H}\mathrm{Dgrad}f(\hat{\mathbf{P}})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}]-\hat{\lambda}_{2}\mathbf{P}\right)\right\}=0 (86)
λ^2=1P{tr(𝐏HDgradf(𝐏^)[𝝃𝐏^])}.\displaystyle\Rightarrow\hat{\lambda}_{2}=\frac{1}{P}\Re\left\{\mathrm{tr}\left(\mathbf{P}^{H}\mathrm{Dgrad}f(\hat{\mathbf{P}})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}]\right)\right\}.

Recall that Hessf(𝐏^)[𝝃𝐏^]\mathrm{Hess}f(\hat{\mathbf{P}})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}] is an element in T𝐏^^T_{\hat{\mathbf{P}}}{\widehat{\mathcal{M}}}. Thus from (20), we can get

Hessf(𝐏^)[𝝃𝐏^]=(Hessf(𝐏^1)[𝝃𝐏^1],,Hessf(𝐏^U)[𝝃𝐏^U]),\displaystyle\mathrm{Hess}f(\hat{\mathbf{P}})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}}]=\left(\mathrm{Hess}f(\hat{\mathbf{P}}_{1})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}_{1}}],\cdots,\mathrm{Hess}f(\hat{\mathbf{P}}_{U})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}_{U}}]\right), (87)

where Hessf(𝐏^i)[𝝃𝐏^i]\mathrm{Hess}f(\hat{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}_{i}}] can be denoted as

Hessf(𝐏^i)[𝝃𝐏^i]=\displaystyle\mathrm{Hess}f(\hat{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\hat{\mathbf{P}}_{i}}]= Dgradf(𝐏i)[𝝃𝐏i]\displaystyle\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}] (88)
Dλ^1[𝝃𝐏i]𝐏iλ^1𝝃𝐏iλ^2𝐏i.\displaystyle-\mathrm{D}\hat{\lambda}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]\mathbf{P}_{i}-\hat{\lambda}_{1}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}-\hat{\lambda}_{2}\mathbf{P}_{i}.

\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}] and \mathrm{D}\hat{\lambda}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}] remain to be calculated. For notational simplicity, let us define

𝐌,i=𝐇𝝃𝐏i𝐏iH𝐇H+𝐇𝐏i𝝃𝐏iH𝐇H,\displaystyle\mathbf{M}_{\ell,i}=\mathbf{H}_{\ell}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\mathbf{P}_{i}^{H}\mathbf{H}_{\ell}^{H}+\mathbf{H}_{\ell}\mathbf{P}_{i}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}^{H}\mathbf{H}_{\ell}^{H}, (89)
𝐅,i=𝐑1𝐌,i𝐁,𝐄,i=𝐁𝐌,i𝐁,\displaystyle\mathbf{F}_{\ell,i}=-\mathbf{R}_{\ell}^{-1}\mathbf{M}_{\ell,i}\mathbf{B}_{\ell},\ \mathbf{E}_{\ell,i}=-\mathbf{B}_{\ell}\mathbf{M}_{\ell,i}\mathbf{B}_{\ell},

then we can get

D𝐁[𝝃𝐏i]\displaystyle\mathrm{D}\mathbf{B}_{\ell}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right] =𝐅,i+𝐅,iH+𝐄,i.\displaystyle=\mathbf{F}_{\ell,i}+\mathbf{F}_{\ell,i}^{H}+\mathbf{E}_{\ell,i}. (90)

Then Dgradf(𝐏i)[𝝃𝐏i]\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}] can be calculated as follows:

Dgradf(𝐏i)[𝝃𝐏i]=2wi𝐇iH𝐑i1𝐇i𝝃𝐏i𝐂i\displaystyle\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]=-2w_{i}\mathbf{H}_{i}^{H}\mathbf{R}_{i}^{-1}\mathbf{H}_{i}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\mathbf{C}_{i} (91)
+2wi𝐇iH𝐀i𝐂i(𝝃𝐏iH𝐇iH𝐀i+𝐀iH𝐇i𝝃𝐏i)𝐂i\displaystyle+2w_{i}\mathbf{H}_{i}^{H}\mathbf{A}_{i}\mathbf{C}_{i}\left({\boldsymbol{\xi}}_{\mathbf{P}_{i}}^{H}\mathbf{H}_{i}^{H}\mathbf{A}_{i}+\mathbf{A}_{i}^{H}\mathbf{H}_{i}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right)\mathbf{C}_{i}
+2iw𝐇H𝐁𝐇𝝃𝐏i+2iw𝐇HD𝐁[𝝃𝐏i]𝐇𝐏i.\displaystyle+2\sum_{\ell\neq i}w_{\ell}\mathbf{H}_{\ell}^{H}\mathbf{B}_{\ell}\mathbf{H}_{\ell}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}+2\sum_{\ell\neq i}w_{\ell}\mathbf{H}_{\ell}^{H}\mathrm{D}\mathbf{B}_{\ell}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]\mathbf{H}_{\ell}\mathbf{P}_{i}.

Dλ^1[𝝃𝐏i]\mathrm{D}\hat{\lambda}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}] can be calculated as follows:

\mathrm{D}\hat{\lambda}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]=\frac{1}{P}\Re\sum_{\ell\neq i}\left\{\mathrm{tr}\left(\mathbf{P}_{\ell}^{H}\mathrm{Dgrad}f\left(\mathbf{P}_{\ell}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]\right)\right\}+ (92)
\frac{1}{P}\Re\left\{\mathrm{tr}\left({\boldsymbol{\xi}}_{\mathbf{P}_{i}}^{H}\mathrm{grad}f(\mathbf{P}_{i})+\mathbf{P}_{i}^{H}\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]\right)\right\}.

Dgradf(𝐏)[𝝃𝐏i]\mathrm{Dgrad}f\left(\mathbf{P}_{\ell}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right] for i\forall\ell\neq i remains to be calculated:

Dgradf(𝐏)[𝝃𝐏i]=2jwj𝐇jHD𝐁j[𝝃𝐏i]𝐇j𝐏\displaystyle\mathrm{Dgrad}f\left(\mathbf{P}_{\ell}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]=2\sum_{j\neq\ell}w_{j}\mathbf{H}_{j}^{H}\mathrm{D}\mathbf{B}_{j}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]\mathbf{H}_{j}\mathbf{P}_{\ell} (93)
+2w𝐇H𝐑1𝐌,i𝐀𝐂2w𝐇H𝐀D𝐂[𝝃𝐏i],\displaystyle+2w_{\ell}\mathbf{H}_{\ell}^{H}\mathbf{R}_{\ell}^{-1}\mathbf{M}_{\ell,i}\mathbf{A}_{\ell}\mathbf{C}_{\ell}-2w_{\ell}\mathbf{H}_{\ell}^{H}\mathbf{A}_{\ell}\mathrm{D}\mathbf{C}_{\ell}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right],

where

D𝐂[𝝃𝐏i]=\displaystyle\mathrm{D}\mathbf{C}_{\ell}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]= 𝐂𝐀H𝐌,i𝐀𝐂,i,\displaystyle-\mathbf{C}_{\ell}\mathbf{A}_{\ell}^{H}\mathbf{M}_{\ell,i}\mathbf{A}_{\ell}\mathbf{C}_{\ell},\forall\ell\neq i, (94a)
D𝐁j[𝝃𝐏i]=\displaystyle\mathrm{D}\mathbf{B}_{j}\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]= {𝐅j,i+𝐅j,iH+𝐄j,i,j&ji,𝐑i1𝐇i(𝝃𝐏i𝐂i𝐏iH+𝐏i𝐂i𝝃𝐏iH)×𝐇iH𝐑i1+𝐄i,i,j=i.\displaystyle\begin{cases}\mathbf{F}_{j,i}+\mathbf{F}_{j,i}^{H}+\mathbf{E}_{j,i},j\neq\ell\&j\neq i,\\ \mathbf{R}_{i}^{-1}\mathbf{H}_{i}\big{(}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\mathbf{C}_{i}\mathbf{P}_{i}^{H}+\mathbf{P}_{i}\mathbf{C}_{i}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}^{H}\big{)}\times\\ \mathbf{H}_{i}^{H}\mathbf{R}_{i}^{-1}+\mathbf{E}_{i,i},j=i.\end{cases} (94b)

Appendix E Proof of Riemannian Hessian in Theorem 4

Like (85), we get

Hessf(𝐏~)[𝝃𝐏~]=ΠT𝐏~~T𝐏𝒩(Dgradf(𝐏~)[𝝃𝐏~]),\displaystyle\mathrm{Hess}f(\tilde{\mathbf{P}})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}]=\Pi_{T_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\left(\mathrm{Dgrad}f(\tilde{\mathbf{P}})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}]\right), (95)

which belongs to T𝐏~~T_{\tilde{\mathbf{P}}}\widetilde{\mathcal{M}}, so the following relationship holds:

{tr(𝐏~iH(Dgradf(𝐏~i)[𝝃𝐏~i]𝐏~i[𝚲~2]i))}=0\displaystyle\Re\left\{\mathrm{tr}\Bigg{(}\tilde{\mathbf{P}}_{i}^{H}\Big{(}\mathrm{Dgrad}f(\tilde{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}_{i}}]-\tilde{\mathbf{P}}_{i}\big{[}\tilde{{\boldsymbol{\Lambda}}}_{2}\big{]}_{i}\Big{)}\Bigg{)}\right\}=0 (96)
[𝚲~2]i={tr(𝐏iH(Dgradf(𝐏~i)[𝝃𝐏~i]))}𝐈di.\displaystyle\Rightarrow\big{[}\tilde{{\boldsymbol{\Lambda}}}_{2}\big{]}_{i}=\Re\left\{\mathrm{tr}\Bigg{(}\mathbf{P}_{i}^{H}\left(\mathrm{Dgrad}f(\tilde{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}_{i}}]\right)\Bigg{)}\right\}\mathbf{I}_{d_{i}}.

From (20), we can get

Hessf(𝐏~)[𝝃𝐏~]=\displaystyle\mathrm{Hess}f(\tilde{\mathbf{P}})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}}]= (Hessf(𝐏~1)[𝝃𝐏~1],,Hessf(𝐏~U)[𝝃𝐏~U]),\displaystyle\left(\mathrm{Hess}f(\tilde{\mathbf{P}}_{1})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}_{1}}],\cdots,\mathrm{Hess}f(\tilde{\mathbf{P}}_{U})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}_{U}}]\right), (97)

where

Hessf(𝐏~i)[𝝃𝐏~i]=Dgradf(𝐏i)[𝝃𝐏i]\displaystyle\mathrm{Hess}f(\tilde{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\tilde{\mathbf{P}}_{i}}]=\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}] (98)
𝐏iD[𝚲~1]i[𝝃𝐏i]𝝃𝐏i[𝚲~1]i𝐏i[𝚲~2]i.\displaystyle-\mathbf{P}_{i}\mathrm{D}\big{[}\tilde{{\boldsymbol{\Lambda}}}_{1}\big{]}_{i}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]-{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\big{[}\tilde{{\boldsymbol{\Lambda}}}_{1}\big{]}_{i}-\mathbf{P}_{i}\big{[}\tilde{{\boldsymbol{\Lambda}}}_{2}\big{]}_{i}.

\mathrm{D}\left[\tilde{{\boldsymbol{\Lambda}}}_{1}\right]_{i}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}] remains to be calculated; it is given by

D[𝚲~1]i[𝝃𝐏i]=\displaystyle\mathrm{D}\left[\tilde{{\boldsymbol{\Lambda}}}_{1}\right]_{i}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]= (99)
1Pi{tr(𝝃𝐏iHgradf(𝐏i)+𝐏iHDgradf(𝐏i)[𝝃𝐏i])}𝐈di,\displaystyle\frac{1}{P_{i}}\Re\left\{\mathrm{tr}\left({\boldsymbol{\xi}}_{\mathbf{P}_{i}}^{H}\mathrm{grad}f(\mathbf{P}_{i})+\mathbf{P}_{i}^{H}\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]\right)\right\}\mathbf{I}_{d_{i}},

where Dgradf(𝐏i)[𝝃𝐏i]\mathrm{Dgrad}f\left(\mathbf{P}_{i}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right] has been calculated in Appendix D.

Appendix F Proof of Riemannian Hessian in Theorem 5

Like (85), we get

Hessf(𝐏¯)[𝝃𝐏¯]=ΠT𝐏¯¯T𝐏𝒩(Dgradf(𝐏¯)[𝝃𝐏¯]),\displaystyle\mathrm{Hess}f(\bar{\mathbf{P}})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}}]=\Pi_{T_{\bar{\mathbf{P}}}\overline{\mathcal{M}}}^{T_{\mathbf{P}}\mathcal{N}}\Big{(}\mathrm{Dgrad}f(\bar{\mathbf{P}})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}}]\Big{)}, (100)

which belongs to T𝐏¯¯T_{\bar{\mathbf{P}}}\overline{\mathcal{M}}, so the following relationship holds:

\mathbf{I}_{M_{t}}\odot\sum_{\ell=1}^{U}\Re\left\{\bar{\mathbf{P}}_{\ell}\left(\mathrm{Dgrad}f(\bar{\mathbf{P}}_{\ell})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{\ell}}]-\bar{{\boldsymbol{\Lambda}}}_{2}\mathbf{P}_{\ell}\right)^{H}\right\}=0 (101)
\Rightarrow\bar{{\boldsymbol{\Lambda}}}_{2}=\frac{M_{t}}{P}\mathbf{I}_{M_{t}}\odot\sum_{\ell=1}^{U}\Re\left\{\mathbf{P}_{\ell}\left(\mathrm{Dgrad}f(\bar{\mathbf{P}}_{\ell})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{\ell}}]\right)^{H}\right\}.

From (20), we can get

Hessf(𝐏¯)[𝝃𝐏¯]=(Hessf(𝐏¯1)[𝝃𝐏¯1],,Hessf(𝐏¯U)[𝝃𝐏¯U]),\displaystyle\mathrm{Hess}f(\bar{\mathbf{P}})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}}]=\left(\mathrm{Hess}f(\bar{\mathbf{P}}_{1})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{1}}],\cdots,\mathrm{Hess}f(\bar{\mathbf{P}}_{U})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{U}}]\right), (102)

where Hessf(𝐏¯i)[𝝃𝐏¯i]\mathrm{Hess}f(\bar{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{i}}] can be denoted as:

Hessf(𝐏¯i)[𝝃𝐏¯i]=\displaystyle\mathrm{Hess}f(\bar{\mathbf{P}}_{i})[{\boldsymbol{\xi}}_{\bar{\mathbf{P}}_{i}}]= (103)
Dgradf(𝐏i)[𝝃𝐏i]D𝚲¯1[𝝃𝐏i]𝐏i𝚲¯1𝝃𝐏i𝚲¯2𝐏i.\displaystyle\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]-\mathrm{D}\bar{{\boldsymbol{\Lambda}}}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]\mathbf{P}_{i}-\bar{{\boldsymbol{\Lambda}}}_{1}{\boldsymbol{\xi}}_{\mathbf{P}_{i}}-\bar{{\boldsymbol{\Lambda}}}_{2}\mathbf{P}_{i}.

\mathrm{Dgrad}f\left(\mathbf{P}_{i}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right] has been calculated in Appendix D. \mathrm{D}\bar{{\boldsymbol{\Lambda}}}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}] remains to be calculated; it is given by

D𝚲¯1[𝝃𝐏i]=MtP𝐈Mtli{𝐏(Dgradf(𝐏)[𝝃𝐏i])H}+\displaystyle\mathrm{D}\bar{{\boldsymbol{\Lambda}}}_{1}[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]=\frac{M_{t}}{P}\mathbf{I}_{M_{t}}\odot\Re\sum_{l\neq i}\left\{\mathbf{P}_{\ell}\Big{(}\mathrm{Dgrad}f\left(\mathbf{P}_{\ell}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right]\Big{)}^{H}\right\}+ (104)
MtP𝐈Mt{𝝃𝐏i(gradf(𝐏i))H+𝐏iDgradf(𝐏i)[𝝃𝐏i]H}.\displaystyle\frac{M_{t}}{P}\mathbf{I}_{M_{t}}\odot\Re\left\{{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\big{(}\mathrm{grad}f(\mathbf{P}_{i})\big{)}^{H}+\mathbf{P}_{i}\mathrm{Dgrad}f(\mathbf{P}_{i})[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}]^{H}\right\}.

Dgradf(𝐏)[𝝃𝐏i]\mathrm{Dgrad}f\left(\mathbf{P}_{\ell}\right)\left[{\boldsymbol{\xi}}_{\mathbf{P}_{i}}\right] for i\forall\ell\neq i has been calculated in Appendix D.

Appendix G Convergence analyses of the proposed Riemannian methods

For the RSD method, gradf(𝐏k)-\mathrm{grad}f\left(\mathbf{P}^{k}\right) is naturally a descent direction as g𝐏k(gradf(𝐏k),g_{\mathbf{P}^{k}}\big{(}-\mathrm{grad}f(\mathbf{P}^{k}), gradf(𝐏k))<0\mathrm{grad}f(\mathbf{P}^{k})\big{)}<0. When the strong Wolfe curvature condition (3.31) in [40] holds, the search direction of the RCG method defined in (64) is a descent direction according to [40, Lemma 4.1]. We can reduce the initial step length α0\alpha^{0} to ensure the descent property of the search direction. Then, let {𝐏GDk}\left\{\mathbf{P}^{k}_{\mathrm{GD}}\right\} be an infinite sequence of iterations generated by our proposed RSD or RCG method. {𝐏GDk}\left\{\mathbf{P}^{k}_{\mathrm{GD}}\right\} has at least one accumulation point and all the accumulation points of {𝐏GDk}\left\{\mathbf{P}^{k}_{\mathrm{GD}}\right\} are critical points as long as ={𝐏f(𝐏)f(𝐏0)}\mathcal{L}=\left\{\mathbf{P}\in\mathcal{M}\mid f\left(\mathbf{P}\right)\leq f\left(\mathbf{P}^{0}\right)\right\} is a compact set from [22, Theorem 4.3.1] and [22, Corollary 4.3.2], which holds naturally if the sets formed by TPC, PUPC and PAPC are all compact. From [41, Theorem 1.4.8], a subset of a finite dimensional Euclidean space is compact if and only if it is closed and bounded. ^\widehat{\mathcal{M}}, ~\widetilde{\mathcal{M}} and ¯\overline{\mathcal{M}} defined in (22) are all subsets of 𝒩=Mt×i𝒰di\mathcal{N}=\mathbb{C}^{M_{t}\times\sum_{i\in\mathcal{U}}d_{i}}. They are all closed and bounded according to [41, Definition 1.1.6.] and [41, Definition 1.2.15.]. Therefore, ^\widehat{\mathcal{M}}, ~\widetilde{\mathcal{M}} and ¯\overline{\mathcal{M}} are all compact manifolds, and the proposed RSD and RCG methods can converge to a critical point.

Let {𝐏TRk}\left\{\mathbf{P}^{k}_{\mathrm{TR}}\right\} be an infinite sequence of iterations generated by the proposed RTR method for precoder design. From [23, Corollary 6.33], {𝐏TRk}\left\{\mathbf{P}^{k}_{\mathrm{TR}}\right\} has at least one accumulation point as \mathcal{L} is compact and all the accumulation points of {𝐏TRk}\left\{\mathbf{P}^{k}_{\mathrm{TR}}\right\} are critical points, indicating that the RTR method can converge to a critical point with a superlinear local convergence rate.

More analyses of the convergence of the Riemannian methods with any initialization can be found in [40] and [42].

References

  • [1] E. Björnson, L. Sanguinetti, H. Wymeersch, J. Hoydis, and T. L. Marzetta, “Massive MIMO is a reality—What is next?: Five promising research directions for antenna arrays,” Digit. Signal Process., vol. 94, pp. 3–20, Nov. 2019.
  • [2] E. D. Carvalho, A. Ali, A. Amiri, M. Angjelichinoski, and R. W. Heath, “Non-stationarities in extra-large-scale massive MIMO,” IEEE Wireless Commun., vol. 27, pp. 74–80, Aug. 2020.
  • [3] T. L. Marzetta and H. Yang, Fundamentals of Massive MIMO.   Cambridge University Press, 2016.
  • [4] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive MIMO for next generation wireless systems.” IEEE Commun. Mag., vol. 52, no. 2, pp. 186–195, Feb. 2014.
  • [5] T. Ketseoglou, M. C. Valenti, and E. Ayanoglu, “Millimeter wave massive MIMO downlink per-group communications with hybrid linear precoding,” IEEE Trans. Veh. Technol., vol. 70, no. 7, pp. 6841–6854, Jul. 2021.
  • [6] S. Jing and C. Xiao, “Linear MIMO precoders with finite alphabet inputs via stochastic optimization and deep neural networks (DNNs),” IEEE Trans. Signal Process., vol. 69, pp. 4269–4281, 2021.
  • [7] Y. Zhang, P. Mitran, and C. Rosenberg, “Joint resource allocation for linear precoding in downlink massive MIMO systems,” IEEE Trans. Commun., vol. 69, no. 5, pp. 3039–3053, May 2021.
  • [8] V. M. T. Palhares, A. R. Flores, and R. C. de Lamare, “Robust MMSE precoding and power allocation for cell-free massive MIMO systems,” IEEE Trans. Veh. Technol., vol. 70, no. 5, pp. 5115–5120, May 2021.
  • [9] S. S. Christensen, R. Agarwal, E. De Carvalho, and J. M. Cioffi, “Weighted sum-rate maximization using weighted MMSE for MIMO-BC beamforming design,” IEEE Trans. Wireless Commun., vol. 7, no. 12, pp. 4792–4799, Dec. 2008.
  • [10] T. X. Vu, S. Chatzinotas, and B. Ottersten, “Dynamic bandwidth allocation and precoding design for highly-loaded multiuser MISO in beyond 5G networks,” IEEE Trans. Wireless Commun., vol. 21, no. 3, pp. 1794–1805, Mar. 2022.
  • [11] J. Zhang, C.-K. Wen, C. Yuen, S. Jin, and X. Q. Gao, “Large system analysis of cognitive radio network via partially-projected regularized zero-forcing precoding,” IEEE Trans. Wireless Commun., vol. 14, no. 9, pp. 4934–4947, Sep. 2015.
  • [12] T. X. Tran and K. C. Teh, “Spectral and energy efficiency analysis for SLNR precoding in massive MIMO systems with imperfect CSI,” IEEE Trans. Wireless Commun., vol. 17, no. 6, pp. 4017–4027, Jun. 2018.
  • [13] L. You, X. Qiang, K.-X. Li, C. G. Tsinos, W. Wang, X. Q. Gao, and B. Ottersten, “Massive MIMO hybrid precoding for LEO satellite communications with twin-resolution phase shifters and nonlinear power amplifiers,” IEEE Trans. Commun., vol. 70, no. 8, pp. 5543–5557, Aug. 2022.
  • [14] A.-A. Lu, X. Q. Gao, W. Zhong, C. Xiao, and X. Meng, “Robust transmission for massive MIMO downlink with imperfect CSI,” IEEE Trans. Commun., vol. 67, no. 8, pp. 5362–5376, Aug. 2019.
  • [15] R. Muharar, R. Zakhour, and J. Evans, “Optimal power allocation and user loading for multiuser MISO channels with regularized channel inversion,” IEEE Trans. Commun., vol. 61, no. 12, pp. 5030–5041, Dec. 2013.
  • [16] J. Choi, S. Han, and J. Joung, “Low-complexity multiuser MIMO precoder design under per-antenna power constraints,” IEEE Trans. Veh. Technol., vol. 67, no. 9, pp. 9011–9015, Sep. 2018.
  • [17] X. Hu and X. Dai, “Low-complexity WSRMax precoder design using the dual coordinate ascent method,” IEEE Wireless Commun. Lett., vol. 12, no. 2, pp. 361–365, Feb. 2023.
  • [18] J. Chen, Y. Yin, T. Birdal, B. Chen, L. J. Guibas, and H. Wang, “Projective manifold gradient layer for deep rotation regression,” Proc. IEEE Conf. Comput. Vis. Pattern Recog., pp. 6646–6655, 2022.
  • [19] K. Li and R. Chen, “Batched data-driven evolutionary multiobjective optimization based on manifold interpolation,” IEEE Trans. Evol. Comput., vol. 27, no. 1, pp. 126–140, Feb. 2023.
  • [20] C. Feres and Z. Ding, “A Riemannian geometric approach to blind signal recovery for grant-free radio network access,” IEEE Trans. Signal Process., vol. 70, pp. 1734–1748, 2022.
  • [21] J. Dong, K. Yang, and Y. Shi, “Blind demixing for low-latency communication,” IEEE Trans. Wireless Commun., vol. 18, no. 2, pp. 897–911, Feb. 2019.
  • [22] P.-A. Absil, R. Mahony, and R. Sepulchre, Optimization Algorithms on Matrix Manifolds.   Princeton, NJ, USA: Princeton Univ. Press, 2009.
  • [23] N. Boumal, An Introduction to Optimization on Smooth Manifolds.   Cambridge University Press, 2023.
  • [24] J. M. Lee, Introduction to Smooth Manifolds.   Springer, 2012.
  • [25] R. Abraham, J. E. Marsden, and T. Ratiu, Manifolds, Tensor Analysis, and Applications.   Springer Science & Business Media, 2012, vol. 75.
  • [26] W. Guo, A.-A. Lu, X. Meng, X. Q. Gao, and N. Ma, “Broad coverage precoding design for massive MIMO with manifold optimization,” IEEE Trans. Commun., vol. 67, no. 4, pp. 2792–2806, Apr. 2019.
  • [27] J. Von Neumann, Functional Operators: The Geometry of Orthogonal Spaces.   Princeton University Press, 1951, vol. 2.
  • [28] C. Wang, A.-A. Lu, X. Q. Gao, and Z. Ding, “Robust precoding for 3D massive MIMO configuration with matrix manifold optimization,” IEEE Trans. Wireless Commun., vol. 21, no. 5, pp. 3423–3437, May 2022.
  • [29] P.-A. Absil and K. Gallivan, “Joint diagonalization on the oblique manifold for independent component analysis,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2006, pp. 945–948.
  • [30] J. Li, G. Liao, Y. Huang, Z. Zhang, and A. Nehorai, “Riemannian geometric optimization methods for joint design of transmit sequence and receive filter on MIMO radar,” IEEE Trans. Signal Process., vol. 68, pp. 5602–5616, 2020.
  • [31] J. Nocedal and S. J. Wright, Numerical Optimization.   Springer, 1999.
  • [32] P. H. Calamai and J. J. Moré, “Projected gradient methods for linearly constrained problems,” Math. Program., vol. 39, no. 1, pp. 93–116, 1987.
  • [33] J. Sun, Q. Qu, and J. Wright, “Complete dictionary recovery over the sphere II: Recovery by Riemannian trust-region method,” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 885–914, Feb. 2017.
  • [34] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust Region Methods.   Philadelphia: SIAM, 2000, vol. 1.
  • [35] P.-A. Absil, C. G. Baker, and K. A. Gallivan, “Trust-region methods on Riemannian manifolds.” Found. Comput. Math., vol. 7, no. 3, pp. 303–330, Jul. 2007.
  • [36] P. Patcharamaneepakorn, A. Doufexi, and S. Armour, “Equivalent expressions and performance analysis of SLNR precoding schemes: a generalisation to multi-antenna receivers,” IEEE Commun. Lett., vol. 17, no. 6, pp. 1196–1199, 2013.
  • [37] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, Sep. 2011.
  • [38] S. Jaeckel, L. Raschkowski, K. Borner, and L. Thiele, “QuaDRiGa: A 3-D multi-cell channel model with time evolution for enabling virtual field trials,” IEEE Trans. Antennas Propag., vol. 62, no. 6, pp. 3242–3256, Jun. 2014.
  • [39] C. Peel, B. Hochwald, and A. Swindlehurst, “A vector-perturbation technique for near-capacity multiantenna multiuser communication-Part I: Channel inversion and regularization,” IEEE Trans. Commun., vol. 53, no. 1, pp. 195–202, Jan. 2005.
  • [40] H. Sato, Riemannian Optimization and Its Applications.   Springer, 2021, vol. 670.
  • [41] J. B. Conway, A Course in Point Set Topology.   Springer, 2014.
  • [42] N. Boumal, P.-A. Absil, and C. Cartis, “Global rates of convergence for nonconvex optimization on manifolds,” IMA J. Numer. Anal., vol. 39, no. 1, pp. 1–33, 2019.