This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Learning on a Grassmann Manifold:
CSI Quantization for Massive MIMO Systems

Keerthana Bhogi, Chiranjib Saha, and Harpreet S. Dhillon K. Bhogi, C. Saha, and H. S. Dhillon are with Wireless@VT, Department of ECE, Virginia Tech, Blacksburg, VA, USA. Email: {kbhogi, csaha, hdhillon}@vt.edu. The support of the US National Science Foundation (Grants ECCS-1731711 and CNS-1923807) is gratefully acknowledged.
Abstract

This paper focuses on the design of beamforming codebooks that maximize the average normalized beamforming gain for any underlying channel distribution. While the existing techniques use statistical channel models, we utilize a model-free data-driven approach with foundations in machine learning to generate beamforming codebooks that adapt to the surrounding propagation conditions. The key technical contribution lies in reducing the codebook design problem to an unsupervised clustering problem on a Grassmann manifold where the cluster centroids form the finite-sized beamforming codebook for the channel state information (CSI), which can be efficiently solved using KK-means clustering. This approach is extended to develop a remarkably efficient procedure for designing product codebooks for full-dimension (FD) multiple-input multiple-output (MIMO) systems with uniform planar array (UPA) antennas. Simulation results demonstrate the capability of the proposed design criterion in learning the codebooks, reducing the codebook size and producing noticeably higher beamforming gains compared to the existing state-of-the-art CSI quantization techniques.

Index Terms:
Massive MIMO, full-dimension (FD) MIMO, FDD, beamforming, codebook, machine learning, Grassmann manifold, KK-means clustering.

I Introduction

Transmit beamforming with receive combining is one of the simplest approaches to achieve full diversity in a MIMO system. It just requires CSI at the transmitter in the form of the transmit beamforming vector. The frequency division duplex (FDD) large-scale MIMO systems cannot utilize channel reciprocity to acquire CSI at the transmitter using uplink transmission. This necessitates channel estimation using downlink pilots and the subsequent feedback of the channel estimates (in this case the beamforming vector) to the transmitter over a dedicated feedback channel with limited capacity. This results in a significant overhead when the number of antennas is large. One way to overcome this problem is to construct a set of beamforming vectors constituting a codebook, which is known to both the transmitter and the receiver. The problem then reduces to determining the best beamforming vector at the receiver and conveying its index to the transmitter over the feedback channel [1]. A key step in this procedure is to construct the codebook, which is a classical problem in MIMO communications [2]. A common feature of all classical works in this direction is the assumption of a statistical channel model (such as Rayleigh) for which the optimal codebooks are constructed to optimize system performance.

With the recent interest in data-driven approaches to wireless system design, it is quite natural to wonder whether machine learning has any role to play in this classical problem. Since the fundamental difficulty in this problem is the dimensionality of the channel, the natural tendency is to think in terms of obtaining a low dimensional representation of the channel using deep learning techniques, such as autoencoders [3], [4], which can be used for codebook construction. An autoencoder operates on the hypothesis that the data possesses a representation on a lower dimensional manifold of the feature space, albeit unknown, and tries to learn the embedded manifold by training over the dataset. In contrast, for MIMO beamforming, the underlying manifold is known to be a Grassmann manifold. This removes the requirement of “learning” the manifold from the dataset which often times can be extremely complicated. Once the manifold is known, we can leverage the “shallow” learning techniques like the clustering algorithms on the manifold to find the optimal codebook for beamforming.

Prior Art. As is the case with any communication theory problem, almost all existing works on limited feedback assume some analytical model for the channel to enable tractable analyses, e.g., i.i.d Rayleigh fading [5], spatial correlation [6], temporal correlation [7] or both [8]. Specifically, the problem of quantized maximum ratio transmission (MRT) beamforming can be interpreted as a Grassmannian line packing problem for both uncorrelated [9] and spatially correlated [10] Rayleigh fading channels and has been extensively studied. The idea of connecting Grassmann manifolds to wireless communications is not new and has been used in other aspects of MIMO systems, such as non-coherent communication [11] and limited feedback unitary precoding [12]. Coming to the context of the limited feedback FDD MIMO, the codebook based on Grassmannian line packing is strictly dependent on the assumption of Rayleigh fading and hence cannot be extended to more realistic scenarios. On the other hand, the discrete Fourier transform (DFT) based beamforming exploits the second order statistics of the channel (such as the direction of departure of the dominant path) and offers a simple yet robust solution to the codebook construction. Owing to the direct connection with the spatial parameters of the channel, the DFT codebook can be extended to the Kronecker product (KP) codebook for 3D beamforming in FD MIMO scenarios. A major drawback of DFT codebook is that it scans all possible directions even though many of them may not be used and thus the available feedback bits are not used efficiently. Finally, since we will be proposing a clustering based solution, it is useful to note that clustering has already found applications in many related problems, such as MIMO detection [13], automatic modulation recognition [14], and radio resource allocation in a heterogeneous network [15].

Contributions. The key technical contribution of this paper lies in the novel formulation of transmit beamforming codebook design for any arbitrary channel distribution as the Grassmannian KK-means clustering problem. First, we develop the algorithm for KK-means clustering on the Grassmann manifold that finds the KK centroids of the clusters. Leveraging the fact that optimal MRT beamforming vectors lie on a Grassmann manifold, we then develop the design criterion for optimal beamforming codebooks. We then formally establish the connection between the Grassmannian KK-means algorithm and the codebook design problem and show that the optimal codebook is nothing but the set of centroids given by the KK-means algorithm. This approach is further extended to develop product codebooks for FD-MIMO systems employing UPA antennas. In particular, we show that under the rank{\rm rank}-11 approximation of the channel, the optimal codebook can be decomposed as the Cartesian product of two Grassmannian codebooks of smaller dimensions. We discuss the optimality and performance of the codebooks using both the proposed techniques in terms of average normalized beamforming gain.

Notation. We use boldface small case (upper case) letters, e.g. 𝐚(𝐀){\bf a}(\bf A), to designate column vectors (matrices) with entries in {\mathbb{C}}. We use M×N{\mathbb{C}}^{M\times N} to represent M×NM\times N dimensional complex space, 𝒰(M,N){\cal U}(M,N) to represent the set of all M×NM\times N orthonormal matrices, 𝒰M{\cal U}_{M} to represent the set of all M×MM\times M unitary matrices. Further, a(𝐚)a^{*}({\bf a}^{*}) denotes complex conjugate of aa\in{\mathbb{C}} (𝐚M×1({\bf a}\in{\mathbb{C}}^{M\times 1}), 𝐀T{\bf A}^{T} denotes transpose, 𝐀H{\bf A}^{H} denotes hermitian, svd(𝐀){\rm svd}({\bf A}) denotes the singular value decomposition, vec(𝐀){\rm vec}({\bf A}) denotes the vectorization of a matrix 𝐀{\bf A}. Also, 𝔼𝐚[]{\mathbb{E}}_{\bf a}[\cdot] represents the expectation over the distribution of 𝐚{\bf a}, |||\cdot| denotes the absolute value, 2\left\lVert\cdot\right\rVert_{2} denotes the matrix two-norm and j=1j=\sqrt{-1}.

II System Overview

We consider a narrow-band point-to-point Mt×MrM_{t}\times M_{r} MIMO communication scenario, where the transmitter and receiver are equipped with MtM_{t} and MrM_{r} antennas, respectively. In this paper, we focus on the transmit beamforming operation, where the transmitter sends one data stream over a flat fading channel. The discrete-time baseband input-output relation for this system can be expressed as

𝐲\displaystyle{\bf y} =𝐇𝐟s+𝐧,\displaystyle={\bf{H}}{\bf f}s+{\bf n}, (1)

where 𝐲Mr×1{\bf y}\in\mathbb{C}^{M_{r}\times 1} is the received baseband signal, 𝐇Mr×Mt{\bf H}\in{\mathbb{C}}^{M_{r}\times M_{t}} is the block fading MIMO channel, ss\in{\mathbb{C}} is the transmitted symbol, 𝐧Mr×1{\bf n}\in{{\mathbb{C}}}^{M_{r}\times 1} is the additive noise at the receiver, and 𝐟Mt×1{\bf f}\in{{\mathbb{C}}}^{M_{t}\times 1} is the beamforming vector. The symbol energy is given by 𝔼[|s|2]=t\mathbb{E}[|s|^{2}]={\cal E}_{t} and the total transmitted energy is 𝔼[𝐟s22]=t𝐟22\mathbb{E}[\left\lVert{\bf f}s\right\rVert_{2}^{2}]={\cal E}_{t}\left\lVert{\bf f}\right\rVert_{2}^{2}. The additive noise 𝐧{\bf n} is Gaussian, i.e., entries in 𝐧{\bf n} are i.i.d according to 𝒞𝒩(0,No)\mathcal{CN}(0,N_{o}). It is assumed that perfect channel knowledge is always available at the receiver. With the combining vector 𝐳Mr×1{\bf z}\in{\mathbb{C}}^{M_{r}\times 1}, the estimated transmitted symbol is obtained as s^=𝐳H𝐲\hat{s}={\bf z}^{H}{\bf y}. The receive SNR is

γr\displaystyle\gamma_{r} =t|𝐳H𝐇𝐟|2|𝐳H𝐧𝐧H𝐳|=γt|𝐳H𝐇𝐟|2𝐳22𝐟22,\displaystyle=\frac{{\cal E}_{t}|{\bf z}^{H}{{\bf H}{\bf f}}|^{2}}{|{\bf z}^{H}{\bf n}{\bf n}^{H}{\bf z}|}=\gamma_{t}\frac{|{\bf z}^{H}{{\bf H}{\bf f}}|^{2}}{\|{\bf z}\|_{2}^{2}\|{\bf f}\|_{2}^{2}},

where γt=t𝐟2/No\gamma_{t}={\cal E}_{t}\|{\bf f}\|^{2}/N_{o} is the transmit SNR. Without loss of generality, it is assumed that 𝐳22=1\left\lVert{\bf z}\right\rVert^{2}_{2}=1. Under this assumption

γr\displaystyle\gamma_{r} =γt|𝐳H𝐇𝐟|2𝐟2=γtΓ(𝐟,𝐳),\displaystyle={\gamma_{t}}\frac{{|{\bf z}^{H}{{\bf H}{\bf f}}|^{2}}}{\|{\bf f}\|^{2}}={\gamma_{t}}{\Gamma}({\bf f},{\bf z}), (2)

where Γ(𝐟,𝐳){\Gamma}({\bf f},{\bf z}) is the effective channel gain or the beamforming gain. The MIMO beamfoming problem is to choose 𝐟{\bf f} and 𝐳{\bf z} such that Γ(𝐟,𝐳){\Gamma}({\bf f},{\bf z}) is maximized, which would in turn maximize the SNR and consequently minimize the average probability of error and maximize the capacity [16]. A receiver that employs maximum ratio combining (MRC) chooses 𝐳{\bf z} such that Γ(𝐟,𝐳){\Gamma}({\bf f},{\bf z}) for a given 𝐟{\bf f} is maximized [9]. Under the assumption that receiver always uses MRC, 𝐳{\bf z} is given by

𝐳\displaystyle{\bf z} =𝐇𝐟/𝐇𝐟2,\displaystyle={\bf H}{\bf f}/\left\lVert{\bf H}{\bf f}\right\rVert_{2},

and Γ(𝐟,𝐳){\Gamma}({\bf f},{\bf z}) can be simplified as

Γ:=Γ(𝐟)=𝐇𝐟22.\displaystyle{\Gamma}:={\Gamma}({\bf f})={\left\lVert{\bf H}{\bf f}\right\rVert^{2}_{2}}. (3)

Therefore the MIMO beamforming problem is to find the optimal beamforming vector 𝐟{\bf f} that maximizes Γ(𝐟)\Gamma({\bf f}) and can be formally posed as

𝐟\displaystyle{\bf f} =argmax𝐱Mt×1Γ(𝐱).\displaystyle=\underset{{\bf x}\in{\mathbb{C}}^{M_{t}\times 1}}{\operatorname{arg~{}max}}\ {\Gamma}({\bf x}). (4)

To constrain transmit power, we assume that 𝐟22=1\left\lVert{\bf f}\right\rVert^{2}_{2}=1 without loss of generality. We consider maximum ratio transmission (MRT), which selects 𝐟{\bf f} to maximize Γ(𝐟,𝐳){\Gamma}({\bf f},{\bf z}) for a given 𝐳{\bf z} [17]. Under the assumptions of MRT, receive MRC, and no other design constraints on 𝐟{\bf f}, for a given NoN_{o} and t{\cal E}_{t}, the optimal beamforming vector 𝐟{\bf f} that maximizes Γ{\Gamma} is

𝐟\displaystyle{\bf f} =argmax𝐱Mt×1𝐇𝐱22 subjected to 𝐱22=1\displaystyle=\underset{{\bf x}\in{\mathbb{C}}^{M_{t}\times 1}}{\operatorname{arg~{}max}}\ \left\lVert{\bf H}{\bf x}\right\rVert^{2}_{2}\text{ subjected to }\left\lVert{\bf x}\right\rVert_{2}^{2}=1
=argmax𝐱𝒰(Mt,1)𝐇𝐱22.\displaystyle=\underset{{\bf x}\in{\cal U}(M_{t},1)}{\operatorname{arg~{}max}}\ \left\lVert{\bf H}{\bf x}\right\rVert^{2}_{2}. (5)

Note that argmax\operatorname{arg~{}max} of any function returns only one out of its possibly many global maximizers and thus the output may not necessarily be unique. For an MRT system, 𝐟{\bf f} is the orthonormal eigenvector associated with the maximum eigenvalue of 𝐇H𝐇{\bf H}^{H}{\bf H} [18]. Let λ1λMt{\lambda}_{1}\geq\dots\geq{\lambda}_{M_{t}} be the eigenvalues of 𝐇H𝐇{\bf H}^{H}{\bf H} and 𝐯1,,𝐯Mt{\bf v}_{1},\dots,{\bf v}_{M_{t}} be the corresponding eigenvectors. One possible solution of (5) is 𝐟=𝐯1{\bf f}={\bf v}_{1} and the corresponding beamforming gain is

Γ(𝐯1)\displaystyle{\Gamma}({\bf v}_{1}) =𝐇𝐯122=λ1.\displaystyle=\left\lVert{\bf H}{\bf v}_{1}\right\rVert_{2}^{2}=\lambda_{1}. (6)

For a given 𝐇{\bf H}, let the solution space of (5) be denoted as 𝒮𝐇𝒰(Mt,1){\cal S}_{\bf H}\subset{\cal U}(M_{t},1). Then 𝐯1𝒮𝐇{\bf v}_{1}\in{\cal S}_{\bf H} and for every 𝐟𝒮𝐇{\bf f}\in{\cal S}_{\bf H}, Γ(𝐟)=λ1\Gamma({\bf f})=\lambda_{1}.

Quite obviously, MRT beamforming requires CSIT. In particular, in an FDD system, the receiver estimates the channel 𝐇{\bf H} and sends 𝐯1{\bf v}_{1} back to the transmitter over a feedback channel. Thus the feedback overhead increases as MtM_{t} increases. Since the feedback channel is typically assumed to be a low-rate reliable channel, it is not always possible to transmit 𝐯1{\bf v}_{1} over this channel without any data compression [1]. One way to model this feedback bottleneck is to assume the feedback channel to be a zero-delay, error-free, and the capacity being limited to BB bits per channel use. Thus, it is necessary to introduce some method of quantization for 𝐯1{\bf v}_{1}. The most well-known approach for the quantization is to construct a dictionary of beams [1], also known as the beam codebook. In particular, the transmitter and receiver agree upon a finite set of possible beamforming vectors, say ={𝐟1,,𝐟2B}{\cal F}=\{{\bf f}_{1},\dots,{\bf f}_{2^{B}}\} of cardinality 2B2^{B}. The receiver chooses the appropriate vector 𝐟{\bf f}\in{\cal F} that maximizes Γ\Gamma and feeds the index of the codeword back to the transmitter. The system-level diagram of a limited feedback FDD-MIMO system, as discussed so far, is provided in Fig. 1.

Refer to caption
Figure 1: Block diagram of an FDD-MIMO system with limited feedback channel of capacity BB bits per channel use.

Therefore for a given codebook \cal F, the optimal beamforming vector as stated in (4) is

𝐟\displaystyle{\bf f} =argmax𝐟i𝐇𝐟i22.\displaystyle=\underset{{{\bf f}_{i}\in{\cal F}}}{\operatorname{arg~{}max}}\ \left\lVert{\bf H}{\bf f}_{i}\right\rVert^{2}_{2}. (7)

It is important to note that the original problem of finding the optimal MRT solution for (5) is a constrained optimization problem on the Euclidean space Mt×1{\mathbb{C}}^{M_{t}\times 1} and does not have unique solution on Mt×1{\mathbb{C}}^{M_{t}\times 1}. This problem can be reformulated as a manifold optimization problem as follows. As argued in [9], it can be shown that the optimal MRT beamformers for every 𝐇Mr×Mt{\bf H}\in{{\mathbb{C}}}^{M_{r}\times M_{t}} lie on a special kind of Riemann manifold embedded in Mt×1{\mathbb{C}}^{M_{t}\times 1}, known as the Grassmann manifold. This will be discussed in Section III. As it will be evident later, this manifold structure of the search domain of 𝐟{\bf f} in (5) is the key enabler for our data-driven codebook design.

III Clustering on a Grassmann Manifold

In this section, we provide a brief introduction to the Grassmann manifold, which is instrumental for the design of the proposed beamforming codebook design. The reader is referred to the foundational texts in differential geometry, such as [19], for a comprehensive and rigorous treatment of this manifold. A Grassmann manifold refers to a set of subspaces embedded in a higher-dimensional space (such as the surface of a sphere in 3\mathbb{R}^{3}). More formally, the complex Grassmann manifold 𝒢(Mt,M){\cal G}(M_{t},M) is defined as

𝒢(Mt,M)\displaystyle{\cal G}(M_{t},M) :={span(𝐘):𝐘Mt×M,𝐘H𝐘=𝐈M}.\displaystyle:=\{\text{span}({\bf Y}):{\bf Y}\in{\mathbb{C}}^{M_{t}\times M},{\bf Y}^{H}{\bf Y}={\bf I}_{M}\}. (8)

Any element 𝒴{\cal Y} in 𝒢(Mt,M){\cal G}(M_{t},M) is typically represented by an orthonormal matrix 𝐘{\bf Y} whose columns span 𝒴{\cal Y}. It is to be noted that there exists no unique representation of a subspace 𝒴{\cal Y}. This can be explained as follows. Let 𝐘\bf Y be the orthonormal basis that spans 𝒴\cal Y, then 𝒴{\cal Y} can also be spanned by some other orthonormal matrix 𝐘=𝐘𝐑{\bf Y}^{\prime}={\bf Y}{\bf R} for some 𝐑𝒰M{\bf R}\in{\cal U}_{M}. Thus 𝐘{\bf Y} and 𝐘{\bf Y}^{\prime} span the same subspace, which is represented by an equivalence relation 𝐘𝐘{\bf Y}^{\prime}\equiv{\bf Y}. Each of these MM-dimensional linear subspaces can be regarded as a single point on the Grassmann manifold, which is represented by its orthonormal basis. Since a linear subspace can be specified by an arbitrary basis, each point on 𝒢(Mt,M){\cal G}(M_{t},M) is an equivalence classes of orthonormal matrices. Specifically, 𝐘{\bf Y} and 𝐘𝐑{\bf Y}{\bf R} correspond to the same point on 𝒢(Mt,M){\cal G}(M_{t},M).

Refer to caption
Figure 2: Example of a Grassmann manifold 𝒢(3,1){\cal G}(3,1) embedded in 3\mathbb{R}^{3}: 1{\cal L}_{1}, 2{\cal L}_{2} are two arbitrary lines passing through the origin in 3\mathbb{R}^{3}. They are represented by unit vectors 𝐟1{\bf f}_{1}, 𝐟2{\bf f}_{2}, at the intersections of the lines 1{\cal L}_{1}, 2{\cal L}_{2} and the surface of the unit sphere in 3\mathbb{R}^{3}.

Now consider the case of M=1M=1, i.e. 𝒢(Mt,1){\cal G}(M_{t},1), which is the set of all one-dimensional subspaces in Mt×1{\mathbb{C}}^{M_{t}\times 1}. In other words, one can visualize 𝒢(Mt,1){\cal G}(M_{t},1) as the collection of all lines passing through the origin of the space Mt×1{\mathbb{C}}^{M_{t}\times 1}. A line {\mathcal{L}} passing through origin in Mt×1{\mathbb{C}}^{M_{t}\times 1} is represented in 𝒢(Mt,1){\cal G}(M_{t},1) by a unit vector 𝐟{\bf f} that spans the line. It can also be generated by any other unit vector 𝐟{\bf f}^{\prime} if 𝐟𝐟{\bf f}^{\prime}\equiv{\bf f}. Also note, 𝐟{\bf f} and 𝐟{\bf f}^{\prime} correspond to the same point on 𝒢(Mt,1){\cal G}(M_{t},1). Since a complex Grassmann of arbitrary dimensions is difficult to visualize, for illustration purpose, we present the real Grassmann manifold 𝒢(3,1){\cal G}(3,1) in 3\mathbb{R}^{3} in Fig. 2.

The notion of “distance” between the lines in 𝒢(Mt,1){\cal G}(M_{t},1) generated by two unit vectors 𝐟1,𝐟2{\bf f}_{1},{\bf f}_{2} can be defined as the sine of the angle between the lines [20]. In particular, the distance is expressed as

d(𝐟1,𝐟2)\displaystyle d({\bf f}_{1},{\bf f}_{2}) =sin(θ1,2)=1|𝐟1H𝐟2|2.\displaystyle=\sin(\theta_{1,2})=\sqrt{1-|{\bf f}^{H}_{1}{\bf f}_{2}|^{2}}. (9)

The connection between 𝒢(Mt,1){\cal G}(M_{t},1) and optimal MRT beamforming is established next.

Lemma 1.

For a given channel realization 𝐇Mr×Mt{\bf H}\in{\mathbb{C}}^{M_{r}\times M_{t}} and 𝐟𝒮𝐇{\bf f}\in{\cal S}_{\bf H}, every 𝐟{\bf f}^{\prime} such that 𝐟𝐟{\bf f}^{\prime}\equiv{\bf f} is also an optimal MRT beamformer.

Proof:

Given 𝐟𝒮𝐇{\bf f}\in{\cal S}_{\bf H} and 𝐟𝐟{\bf f}^{\prime}\equiv{\bf f} i.e. 𝐟=𝐟ejθ{\bf f}^{\prime}={\bf f}e^{j\theta} for some θ[0,2π)\theta\in[0,2\pi), then

𝐇𝐟22\displaystyle\left\lVert{\bf H}{\bf f}\right\rVert_{2}^{2} =|𝐟H𝐇H𝐇𝐟|\displaystyle=|{\bf f}^{H}{\bf H}^{H}{\bf H}{\bf f}|
=|ejθ𝐟H𝐇H𝐇𝐟ejθ|\displaystyle=|e^{-j\theta}{\bf f}^{H}{\bf H}^{H}{\bf H}{\bf f}e^{j\theta}|
=|𝐟H𝐇H𝐇𝐟|\displaystyle=|{\bf f}^{\prime H}{\bf H}^{H}{\bf H}{\bf f}^{\prime}|
=𝐇𝐟22.\displaystyle=\left\lVert{\bf H}{\bf f}^{\prime}\right\rVert_{2}^{2}. (10)

Therefore, 𝐟{\bf f}^{\prime} is also an element in 𝒮𝐇{\cal S}_{\bf H}. ∎

Remark 1.

According to Lemma 1, every 𝐟,𝐟𝒮𝐇{\bf f},{\bf f}^{\prime}\in{\cal S}_{\bf H} such that 𝐟𝐟{\bf f}^{\prime}\equiv{\bf f} correspond to the same point on 𝒢(Mt,1){\cal G}(M_{t},1). Therefore, any probability distribution on channel 𝐇Mr×Mt{\bf H}\in{\mathbb{C}}^{M_{r}\times M_{t}} will impose a probability distribution on 𝐯1{\bf v}_{1} in 𝒢(Mt,1){\cal G}(M_{t},1). Thus we need a quantizer defined on 𝒢(Mt,1){{\cal G}(M_{t},1)} to encode the optimal MRT beamforming vectors and generate a BB-bit beamforming codebook 𝒢(Mt,1){\cal F}\subset{\cal G}(M_{t},1) for limited feeback beamforming.

The optimal MRT beamformers are points on 𝒢(Mt,1){\cal G}(M_{t},1), hence can be described as a point process whose characteristics depend on the underlying channel distribution.

Remark 2.

For a Rayleigh fading channel where the entries of 𝐇{\bf H} are i.i.d according to 𝒞𝒩(0,1){\cal CN}(0,1), it has been shown in [9] that 𝐯1{\bf v}_{1} is uniformly distributed on 𝒢(Mt,1){\cal G}(M_{t},1). Thus, the construction of {\cal F} is equivalent to finding the best packing of 2B{2^{B}} lines in 𝒢(Mt,1){\cal G}(M_{t},1) [20]. See [9] for more details.

For an arbitrary distribution of 𝐇Mr×Mt{\bf H}\in{\mathbb{C}}^{M_{r}\times M_{t}}, the distribution of 𝐯1{\bf v}_{1} will no longer be uniformly distributed on 𝒢(Mt,1){\cal G}(M_{t},1). As an illustrative example, using the distance function defined in (9)\eqref{eq::def::distance}, we compare the Ripley’s KK function K(d)K(d) [21] of the optimal MRT beamforming vectors on 𝒢(Mt,1){\cal G}(M_{t},1) for a realistic scenario (see Section VI for more details on the experimental setup) with Rayleigh fading channels for the same system model. Ripley’s KK function is a spatial descriptive statistic that measures the deviation of a point process from spatial homogeneity. From Fig. 3, we see that the distribution of the optimal MRT beamforming vectors for the realistic channel is significantly different from the uniform distribution on 𝒢(Mt,1){\cal G}(M_{t},1) which is equivalent to complete randomness on the manifold. We can also infer that optimal MRT beamformers of the channel for the considered scenario exhibit clustering tendency on 𝒢(Mt,1){\cal G}(M_{t},1). Therefore, a reasonable codebook construction scheme is to simply deploy an unsupervised clustering method (such as KK-means clustering) on 𝒢(Mt,1){\cal G}(M_{t},1) that can identify optimal K=2BK=2^{B} cluster centroids and form the codebook \cal F. In what follows, we will establish that KK-means clustering on 𝒢(Mt,1){\cal G}(M_{t},1) actually yields an optimal codebook.

III-A Grassmannian KK-means clustering

KK-means clustering on a given metric space is a method of vector quantization to partition a set of nn data points into KK non-overlapping clusters in which each data point belongs to the cluster with the nearest cluster centroid. The centroids are the quantized representations of the data points that belong to the respective clusters. A quantizer on the given metric space maps the data points to one of the KK centroids. The KK centroids are chosen such that the average distortion due to the quantization according to a pre-defined distortion measure is minimized. Before we formally introduce the main steps of the clustering algorithm on 𝒢(Mt,1){\cal G}(M_{t},1), we first define the notion of a distortion measure and a quantizer as follows.

Definition 1 (Distortion measure).

The distortion caused by representing 𝐟𝒢(Mt,1){\bf f}\in{\cal G}(M_{t},1) with 𝐟𝒢(Mt,1){\bf f}^{\prime}\in{\cal G}(M_{t},1) is defined as the distortion measure dod_{o} which is given by do(𝐟,𝐟)=d2(𝐟,𝐟)d_{o}({\bf f},{\bf f}^{\prime})=d^{2}({\bf f},{\bf f}^{\prime}).

Definition 2 (Grassmann quantizer).

Let 𝒢(Mt,1){\cal F}\subset{\cal G}(M_{t},1) be a BB-bit codebook such that ={𝐟1,.,𝐟2B}{\cal F}=\{{\bf f}_{1},....,{\bf f}_{2^{B}}\}, then a Grassmann quantizer QQ_{\cal F} is defined as a function mapping elements of 𝒢(Mt,1){\cal G}(M_{t},1) to elements of {\cal F} i.e. Q:𝒢(Mt,1)Q_{\cal F}:{\cal G}(M_{t},1)\mapsto{\cal F}.

A performance measure of a Grassmann quantizer is the average distortion D(Q)D(Q_{\cal F}), where

D(Q)\displaystyle D(Q_{\cal F}) :=𝔼𝐱[do(𝐱,Q(𝐱))]\displaystyle:=\mathbb{E}_{\bf x}\ \left[d_{o}\big{(}{\bf x},Q_{\cal F}({\bf x})\big{)}\right]
=𝔼𝐱[d2(𝐱,Q(𝐱))].\displaystyle=\mathbb{E}_{\bf x}\ \left[d^{2}\big{(}{\bf x},Q_{\cal F}({\bf x})\big{)}\right]. (11)

In most practical settings, we may have access to a set of nn data points 𝒳={𝐱}{\cal X}=\{{\bf x}\} in lieu of the probability distribution p(𝐱)p({\bf x}). Then the expectation w.r.t 𝐱{\bf x} in (11) means averaging over the set 𝒳{\cal X}. Therefore the objective of KK-means clustering with K=2BK=2^{B} is to find the set of KK centroids, i.e. K{\cal F}^{K}, that minimizes D(Q)D(Q_{\cal F}) and can be expressed as

K\displaystyle{\cal F}^{K} =argmin𝒢(Mt,1)||=2BD(Q)\displaystyle=\underset{{\begin{subarray}{c}{\cal F}\subset{\cal G}(M_{t},1)\\ |{\cal F}|=2^{B}\end{subarray}}}{\operatorname{arg~{}min}}\ D(Q_{\cal F})
=argmin𝒢(Mt,1)||=2B𝔼𝐱[d2(𝐱,Q(𝐱))],\displaystyle=\underset{{\begin{subarray}{c}{\cal F}\subset{\cal G}(M_{t},1)\\ |{\cal F}|=2^{B}\end{subarray}}}{\operatorname{arg~{}min}}\ \mathbb{E}_{{\bf x}}\bigg{[}d^{2}({\bf x},Q_{\cal F}({\bf x}))\bigg{]}, (12)

and the associated quantizer is

QK(𝐱)\displaystyle Q_{{\cal F}^{K}}({\bf x}) =argmin𝐟ido(𝐱,𝐟i)\displaystyle=\underset{{\bf f}_{i}\in{\cal F}}{\operatorname{arg~{}min}}\ d_{o}({\bf x},{\bf f}_{i})
=argmin𝐟d2(𝐱,𝐟i).\displaystyle=\underset{{\bf f}\in{\cal F}}{\operatorname{arg~{}min}}\ d^{2}({\bf x},{\bf f}_{i}). (13)

However, finding the optimal solution for KK-means clustering is an NP-hard problem. Therefore, we use the Linde-Buzo-Gray algorithm [22] (outlined in Alg. 1) which is a heuristic algorithm that iterates between updating the cluster centroids and mapping a data point to the corresponding centroid that guarantees convergence to a local optimum. In Alg. 1, the only non-trivial step is the centroid calculation for a set of points. In contrast to the squared distortion measure in the Euclidean domain, the centroid of nn elements in a general manifold with respect to an arbitrary distortion measure does not necessarily exist in a closed form. However, the centroid computation on 𝒢(Mt,1){\cal G}(M_{t},1) is feasible because of the following Lemma.

Algorithm 1 Grassmannian KK-means Algorithm for (12)
1:procedure Codebook(𝒳,K{{\cal X}},K)
2:     Initialize random ={𝐟1,,𝐟K}{\cal F}=\{{\bf f}_{1},\cdots,{\bf f}_{K}\} on 𝒢(Mt,1){\cal G}(M_{t},1)
3:     𝒱k{𝐱:d(𝐱,𝐟k)d(𝐱,𝐟j),𝐱𝒳,kj}k{1,,K}{\cal V}_{k}\leftarrow\{{\bf x}:d({\bf x},{\bf f}_{k})\leq d({\bf x},{\bf f}_{j}),\forall{\bf x}\in{\cal X},k\neq j\}\ \forall k\in\{1,\cdots,K\}
4:     Q(𝐱Q_{\cal F}({\bf x}) \leftarrow argmin𝐟d2(𝐱,𝐟)𝐱𝒳\underset{{\bf f}\in{\cal F}}{\operatorname{arg~{}min}}\ d^{2}({\bf x},{\bf f})\ \forall{\bf x}\in{\cal X}
5:     while ! stopping criteria do
6:         Solve for {\cal F}:
𝐟kargmin𝐟𝐝𝟐(𝐱,𝐟)𝐱𝒱𝐤,𝐤{𝟏,,𝐊}\displaystyle{\bf f}_{k}\leftarrow\underset{\bf f}{\operatorname{arg~{}min}}\sum d^{2}({\bf x},{\bf f})\ \forall{\bf x}\in{\cal V}_{k},\forall k\in\{1,\cdots,K\}
Q(𝐱)argmin𝐟d2(𝐱,𝐟)𝐱𝒳\displaystyle Q_{\cal F}({\bf x})\leftarrow\underset{{\bf f}\in{\cal F}}{\operatorname{arg~{}min}}\ d^{2}({\bf x},{\bf f})\ \forall{\bf x}\in{{\cal X}}
𝒱k{𝐱:d(𝐱,𝐟k)d(𝐱,𝐟j),𝐱𝒳,kj}k\displaystyle{\cal V}_{k}\leftarrow\{{\bf x}:d({\bf x},{\bf f}_{k})\leq d({\bf x},{\bf f}_{j}),\forall{\bf x}\in{\cal X},k\neq j\}\ \forall k\in
{1,,K}\displaystyle\{1,\cdots,K\}
     return {\cal F}
Lemma 2 (Centroid computation).

For a set of points 𝒱k={𝐱i}i=1Nk{\cal V}_{k}=\{{{\bf x}_{i}}\}^{N_{k}}_{i=1}, 𝐱i𝒢(Mt,1){\bf x}_{i}\in{\cal G}(M_{t},1), that form the kk-th Voronoi partition, the centroid 𝐟k{\bf f}_{k} is

𝐟k\displaystyle{\bf f}_{k} =argmin𝐟𝒢(Mt,1)i=1Nkd2(𝐱i,𝐟)=eig(i=1Nk𝐱i𝐱iH),\displaystyle=\underset{{\bf f}\in{\cal G}(M_{t},1)}{\operatorname{arg~{}min}}\sum^{N_{k}}_{i=1}d^{2}({\bf x}_{i},{\bf f})={\rm eig}\bigg{(}\sum^{N_{k}}_{i=1}{\bf x}_{i}{\bf x}^{H}_{i}\bigg{)}, (15)

where eig(𝐘){\rm eig}({\bf Y}) is the dominant eigenvector of the matrix 𝐘{\bf Y}.

IV Grassmannian Codebook Design

In this section, we formally establish the connection between Grassmannian KK-means clustering and the optimal codebook construction. The transmitter and receiver use a BB-bit codebook 𝒢(Mt,1){\cal F}\subset{\cal G}(M_{t},1) to map the channel matrix 𝐇{\bf H} to a codeword in 𝐟{\bf f}\in{\cal F} according to (7). In order to define the optimality of the codebook, we first introduce the average normalized beamforming gain for {\cal F} as

Γav:\displaystyle{\Gamma}_{av}: =𝔼𝐇[Γ(𝐟)Γ(𝐯1)]\displaystyle=\mathbb{E}_{{\bf H}}\ \bigg{[}\frac{{\Gamma({\bf f})}}{\Gamma({\bf v}_{1})}\bigg{]}
=𝔼𝐇[𝐇𝐟22λ1]\displaystyle=\mathbb{E}_{{\bf H}}\ \bigg{[}\frac{\left\lVert{\bf H}{\bf f}\right\rVert_{2}^{2}}{\lambda_{1}}\bigg{]}
=𝔼𝐇[i=1Mtλi|𝐯iH𝐟|2λ1]\displaystyle=\mathbb{E}_{{\bf H}}\ \bigg{[}\sum^{M_{t}}_{i=1}\frac{{\lambda}_{i}|{\bf v}^{H}_{i}{\bf f}|^{2}}{{\lambda_{1}}}\bigg{]}
𝔼𝐯1[|𝐯1H𝐟|2].\displaystyle\geq\mathbb{E}_{{\bf v}_{1}}\ \bigg{[}|{\bf v}^{H}_{1}{\bf f}|^{2}\bigg{]}.

To measure the average distortion introduced by the quantization using {\cal F}, we use the loss in Γav{\Gamma}_{av} as given below:

L()\displaystyle L({\cal F}) :=𝔼𝐇[1Γav]\displaystyle:=\mathbb{E}_{{\bf H}}\ \big{[}1-{\Gamma}_{av}\big{]}
𝔼𝐯1[1|𝐯1H𝐟|2]:=Lub(),\displaystyle\leq\mathbb{E}_{{\bf v}_{1}}\ \big{[}1-|{\bf v}^{H}_{1}{\bf f}|^{2}\big{]}:=L_{\rm ub}({\cal F}), (16)

where Lub()L_{\rm ub}({\cal F}) is an upper bound of L()L({\cal F}) and the sufficient condition for equality in (16) is rank(𝐇)=1{\rm rank}({\bf H})=1. The optimal codebook intends to minimize L()L({\cal F}) by minimizing its upper bound Lub()L_{\rm ub}({\cal F}) as given in (16). Since the current limited feedback approach quantizes 𝐯1{\bf v}_{1} as 𝐟{\bf f}, it is reasonable to minimize Lub()L_{\rm ub}({\cal F}) which depends only on 𝐯1{\bf v}_{1} and 𝐟{\bf f}. This yields the following codebook design criterion.

Definition 3 (Codebook design criterion).

Over all of the BB-bit codebooks 𝒢(Mt,1){\cal F}\subset{\cal G}(M_{t},1), the Grassmannian codebook {\cal F}^{*} is the one that minimizes Lub()L_{\rm ub}(\cal{F}). Therefore

\displaystyle{\cal F}^{*} :=argmin𝒢(Mt,1)||=2BLub().\displaystyle:=\underset{{\begin{subarray}{c}{\cal F}\subset{\cal G}(M_{t},1)\\ |{\cal F}|=2^{B}\end{subarray}}}{\operatorname{arg~{}min}}\ L_{\rm ub}({\cal F}). (17)

Building on this discussion, we now state the method to find the optimal codebook in 𝒢(Mt,1){\cal G}(M_{t},1) as follows.

Theorem 1.

For a feedback channel with capacity BB bits per channel use, the Grassmannian codebook as defined in Definition 3 is the same as the set of cluster centroids found by the KK-means algorithm with K=2BK=2^{B} that minimizes D(Q)D(Q_{\cal F}) for a given distribution of optimal MRT beamforming vector through its training dataset as given in (11), i.e.

\displaystyle{\cal F}^{*} =K,\displaystyle={\cal F}^{K}, (18)

where K{\cal F}^{K} is given by (12).

Proof:

The optimal codebook is given by

\displaystyle{\cal F}^{*} =argmin𝒢(Mt,1)||=2BLub()\displaystyle=\underset{{\begin{subarray}{c}{\cal F}\subset{\cal G}(M_{t},1)\\ |{\cal F}|=2^{B}\end{subarray}}}{\operatorname{arg~{}min}}\ L_{\rm ub}({\cal F})
=argmin𝒢(Mt,1)||=2BE𝐯1[1|𝐯1H𝐟|2]\displaystyle=\underset{{\begin{subarray}{c}{\cal F}\subset{\cal G}(M_{t},1)\\ |{\cal F}|=2^{B}\end{subarray}}}{\operatorname{arg~{}min}}\ E_{{\bf v}_{1}}\ \big{[}1-|{\bf v}^{H}_{1}{\bf f}|^{2}\big{]}
=argmin𝒢(Mt,1)||=2BE𝐯1[min𝐟i(1|𝐯1H𝐟i|2)]\displaystyle=\underset{{\begin{subarray}{c}{\cal F}\subset{\cal G}(M_{t},1)\\ |{\cal F}|=2^{B}\end{subarray}}}{\operatorname{arg~{}min}}\ E_{{\bf v}_{1}}\ \bigg{[}\underset{{{\bf f}_{i}\in{\cal F}}}{\min}\big{(}1-|{\bf v}^{H}_{1}{\bf f}_{i}|^{2}\big{)}\bigg{]}
=argmin𝒢(Mt,1)||=2BE𝐯1[min𝐟id2(𝐯1,𝐟i)]=K,\displaystyle=\underset{{\begin{subarray}{c}{\cal F}\subset{\cal G}(M_{t},1)\\ |{\cal F}|=2^{B}\end{subarray}}}{\operatorname{arg~{}min}}\ E_{{\bf v}_{1}}\ \bigg{[}\underset{{\bf f}_{i}\in{\cal F}}{\min}\ d^{2}({{\bf v}_{1},{\bf f}_{i}})\bigg{]}={{\cal F}^{K}},

which completes the proof. ∎

Theorem 1 states the equivalence of the optimal codebook design with the KK-means clustering on 𝒢(Mt,1){\cal G}(M_{t},1). The benefit of making this connection is that it provides an approach for finding the optimal codebooks leveraging existing work on KK-means clustering on 𝒢(Mt,1){\cal G}(M_{t},1). We are now in a position to state the key steps in designing the optimal codebooks based on the Grassmanian KK-means clustering.

IV-A Codebook Construction

We assume a stationary distribution of the channel for a given coverage area of a transmitter. In order to construct the Grassmannian codebook, we construct ={𝐇}{\cal H}=\{{\bf H}\}, a set of channel realizations sampled for different user locations. The available channel dataset {\cal H} is split into training and testing datasets, train{\cal H}_{\rm train} and test{\cal H}_{\rm test} for generating beamforming codebooks and evaluating their performance, respectively. We assume that the size of the training set is large enough so that the sampling distribution closely approximates the original distribution. The training procedure yields the optimal codebook whose performance is evaluated by measuring Γav{\Gamma}_{av} for the channel realizations in the test set test{\cal H}_{\rm test}. Further details and benchmarking results are outlined in Section VI. The codebook design and performance evaluation processes are illustrated in Alg. 2.

Algorithm 2 Training and testing of the Grassmannian codebook
1:procedure TrainCodebook(train\cal H_{\rm train}, BB)
2:     Initialize training set 𝒳train={\cal X}_{\rm train}=\varnothing on 𝒢(Mt,1){\cal G}(M_{t},1)
3:     for 𝐇train{\bf H}\in{\cal H}_{\rm train} do
4:         𝐔𝚺𝐕Hsvd(𝐇){\bf U}{\bf\Sigma}{\bf V}^{H}\leftarrow{\rm svd}({\bf H})
5:         𝒳train𝒳train𝐯1{\cal X}_{\rm train}\leftarrow{\cal X}_{\rm train}\cup{\bf v}_{1}      
6:     {\cal F}^{*}\leftarrow CodeBook(𝒳train{\cal X}_{\rm train},K=2BK=2^{B}) return {\cal F}^{*}
7:procedure TestCodebook(test\cal H_{\rm test}, {\cal F}^{*})
8:     Initialize Γav=0\Gamma_{\rm av}=0
9:     for 𝐇test{\bf H}\in{\cal H}_{\rm test} do
10:         𝐔𝚺𝐕Hsvd(𝐇){\bf U}{\bf\Sigma}{\bf V}^{H}\leftarrow{\rm svd}({\bf H})
11:         𝐟argmin𝐟d2(𝐯1,𝐟){\bf f}\leftarrow\underset{{{\bf f}^{\prime}\in{\cal F}^{*}}}{\operatorname{arg~{}min}}\ d^{2}({\bf v}_{1},{\bf f}^{\prime})
12:         ΓavΓav+1#test(Γ(𝐟)Γ(𝐯1))\Gamma_{\rm av}\leftarrow\Gamma_{\rm av}+\frac{1}{\#{\cal H}_{\rm test}}\bigg{(}\frac{\Gamma({\bf f})}{\Gamma({\bf v}_{1})}\bigg{)}      return Γav{\Gamma}_{\rm av}

V Grassmannian Product Codebook for FD-MIMO

After discussing the general notion of the codebook construction for transmit beamforming for an Mt×MrM_{t}\times M_{r} MIMO system, we focus our attention to a special case of FD MIMO communication. We assume that the transmitter is equipped with a UPA with dimensions Mv×MhM_{v}\times M_{h} (MhMv=MtM_{h}M_{v}=M_{t}) while the receiver has one antenna, i.e. Mr=1M_{r}=1. Let 𝐇~\tilde{\bf H} represent the Mv×MhM_{v}\times M_{h} channel matrix where the (i,j)(i,j)-th element corresponds to the channel between the antenna element at the ii-th row and jj-th column of the UPA and the single receiver antenna at the user. Note that this system model appears as a special case of the general Mt×MrM_{t}\times M_{r} MIMO system discussed in the previous section where 𝐇T=vec(𝐇~T){\bf H}^{T}={\rm vec}(\tilde{\bf H}^{T}). Hence the codebook can be designed as given in Alg. 2 using the KK-means clustering in 𝒢(MvMh,1){\cal G}(M_{v}M_{h},1). Assuming Mv=Mh=O(n)M_{v}=M_{h}=O(n), the dimension of the codewords increase as O(n2)O(n^{2}). Naturally, the KK-means clustering will suffer from the curse of dimensionality as the difference in the maximum and minimum distances between two points in the dataset becomes less prominent as the dimension of the space increases [23]. In this section, we show that the codebook can be obtained by clustering on lower dimensional manifolds by exploring the geometry of the UPA. Considering the UPA channel as a matrix 𝐇~\tilde{\bf H}, we have the singular value decomposition of 𝐇~\tilde{\bf H} as follows.

𝐇~\displaystyle\tilde{\bf H} =𝐔𝚺𝐕H,\displaystyle={\bf U}{\bf\Sigma}{\bf V}^{H}, (19)

where 𝐔{\bf U} is the left singular matrix, 𝐕{\bf V} is the right singular matrix and 𝚺{\bf\Sigma} is the rectangular diagonal matrix with singular values σi\sigma_{i} in decreasing order. Let λi{\lambda}_{i} be the ii-th eigenvalue of 𝐇~H𝐇~\tilde{\bf H}^{H}\tilde{\bf H}, then λi=σi2{\lambda}_{i}={\sigma}^{2}_{i}. Further, 𝐮i{\bf u}_{i} and 𝐯i{\bf v}_{i} are the column vectors of 𝐔{\bf U} and 𝐕{\bf V} respectively with 𝐮iH𝐮i=1{\bf u}^{H}_{i}{\bf u}_{i}=1 and 𝐯iH𝐯i=1{\bf v}^{H}_{i}{\bf v}_{i}=1. Thus, 𝐮i𝒰(Mv,1){\bf u}_{i}\in{\cal U}(M_{v},1) and 𝐯i𝒰(Mh,1){\bf v}_{i}\in{\cal U}(M_{h},1). Then we have

𝐇T\displaystyle{\bf H}^{T} =vec(𝐇~T)\displaystyle={\rm vec}({\tilde{\bf H}}^{T})
=vec(𝐕𝚺𝐔T)\displaystyle={\rm vec}({\bf V}^{*}{\bf\Sigma}{\bf U}^{T})
=vec(i=1rank(𝐇~)σi𝐯i𝐮iT)\displaystyle={\rm vec}\bigg{(}\sum^{{\rm rank}(\tilde{\bf H})}_{i=1}\sigma_{i}{\bf v}_{i}^{*}{\bf u}^{T}_{i}\bigg{)}
=i=1rank(𝐇~)σi𝐮i𝐯i.\displaystyle=\sum^{{\rm rank}(\tilde{\bf H})}_{i=1}\sigma_{i}{\bf u}_{i}\otimes{\bf v}^{*}_{i}. (20)

From (20), we can represent 𝐇{\bf H} as the linear combination of 𝐮iT𝐯iH{\bf u}^{T}_{i}\otimes{\bf v}^{H}_{i} scaled with σi\sigma_{i}. Thus we have

𝐇\displaystyle{\bf H} =i=1rank(𝐇~)σi𝐮iT𝐯iH.\displaystyle=\sum^{{\rm rank}(\tilde{\bf H})}_{i=1}\sigma_{i}{\bf u}^{T}_{i}\otimes{\bf v}^{H}_{i}. (21)

Due to the finiteness of the physical paths between the transmitter to the receiver, it is well-known that rank(𝐇)<<min(Mv,Mh){\rm rank}({\bf H})<<\min(M_{v},M_{h}). For the sake of simplicity, we approximate the channel 𝐇{\bf H} with its dominant direction, i.e. 𝐮1T𝐯1H{\bf u}^{T}_{1}\otimes{\bf v}^{H}_{1}, which is called rank{\rm rank}-11 approximation and the approximated channel 𝐇¯\bar{\bf H} is given as

𝐇\displaystyle{\bf H} 𝐇¯=σ1𝐮1T𝐯1H.\displaystyle\approx\bar{\bf H}={\sigma_{1}}{\bf u}^{T}_{1}\otimes{\bf v}^{H}_{1}. (22)

Let 𝐱𝒰(Mt,1){\bf x}\in{\cal U}(M_{t},1) be a beamforming vector for 𝐇¯\bar{\bf H}. Then the KP form of 𝐇¯\bar{\bf H} naturally leads us to the idea of using 𝐱{\bf x} of the form 𝐱=𝐱v𝐱h{\bf x}={\bf x}_{v}\otimes{\bf x}_{h}. This motivates us to use separate codebooks h{\cal F}_{h} and v{\cal F}_{v} for the horizontal and vertical dimensions and enables to design product codebooks by clustering in lower dimensional manifolds. The beamforming gain Γ(𝐱){\Gamma}({\bf x}) for 𝐇¯\bar{\bf H} can now be simplified as

Γ(𝐱)\displaystyle{\Gamma}({\bf x}) =𝐇¯𝐱22\displaystyle=\left\lVert\bar{\bf H}{\bf x}\right\rVert^{2}_{2}
=σ1(𝐮1T𝐯1H)(𝐱v𝐱h)22\displaystyle=\left\lVert{\sigma_{1}}({\bf u}^{T}_{1}\otimes{\bf v}^{H}_{1})({\bf x}_{v}\otimes{\bf x}_{h})\right\rVert^{2}_{2}
=(a)σ12𝐮1T𝐱v22𝐯1H𝐱h22\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}{\sigma^{2}_{1}}\left\lVert{\bf u}^{T}_{1}{\bf x}_{v}\right\rVert_{2}^{2}\ \left\lVert{\bf v}^{H}_{1}{\bf x}_{h}\right\rVert_{2}^{2}
=σ12|𝐮1T𝐱v|2|𝐯1H𝐱h|2,\displaystyle={\sigma^{2}_{1}}\ |{\bf u}^{T}_{1}{\bf x}_{v}|^{2}\ |{\bf v}^{H}_{1}{\bf x}_{h}|^{2},

where step (a)(a) follows from the fact that 𝐀𝐁2=𝐀2𝐁2\left\lVert{\bf A}\otimes{\bf B}\right\rVert_{2}=\left\lVert{\bf A}\right\rVert_{2}\left\lVert{\bf B}\right\rVert_{2} for two matrices 𝐀{\bf A} and 𝐁{\bf B} of any dimensions. The optimal MRT beamforming vector 𝐟{\bf f} for 𝐇¯\bar{\bf H} can be simplified as

𝐟\displaystyle{\bf f} =argmax𝐱𝒰(Mt,1)Γ(𝐱)\displaystyle=\underset{{\bf x}\in{{\cal U}(M_{t},1)}}{\operatorname{arg~{}max}}\ {\Gamma({\bf x})}
=argmax𝐱v𝒰(Mv,1)𝐱h𝒰(Mh,1)|𝐮1T𝐱v|2|𝐯1H𝐱h|2\displaystyle=\underset{{\begin{subarray}{c}{\bf x}_{v}\in{\cal U}(M_{v},1)\\ {\bf x}_{h}\in{\cal U}(M_{h},1)\end{subarray}}}{\operatorname{arg~{}max}}\ |{\bf u}^{T}_{1}{\bf x}_{v}|^{2}\ |{\bf v}^{H}_{1}{\bf x}_{h}|^{2}
=argmax𝐱v𝒰(Mv,1)|𝐮1T𝐱v|2\displaystyle=\underset{{\bf x}_{v}\in{\cal U}(M_{v},1)}{\operatorname{arg~{}max}}\ |{\bf u}^{T}_{1}{\bf x}_{v}|^{2}
argmax𝐱h𝒰(Mh,1)|𝐯1H𝐱h|2\displaystyle\qquad\otimes\underset{{\bf x}_{h}\in{\cal U}(M_{h},1)}{\operatorname{arg~{}max}}\ |{\bf v}^{H}_{1}{\bf x}_{h}|^{2}
=𝐟v𝐟h,\displaystyle={\bf f}_{v}\otimes{\bf f}_{h}, (23)

where

𝐟v=argmax𝐱v𝒰(Mv,1)|𝐮1T𝐱v|2,𝐟h=argmax𝐱h𝒰(Mh,1)|𝐯1H𝐱h|2.\displaystyle{\bf f}_{v}=\underset{{\bf x}_{v}\in{\cal U}(M_{v},1)}{\operatorname{arg~{}max}}\ |{\bf u}^{T}_{1}{\bf x}_{v}|^{2},\ {\bf f}_{h}=\underset{{\bf x}_{h}\in{\cal U}(M_{h},1)}{\operatorname{arg~{}max}}\ |{\bf v}^{H}_{1}{\bf x}_{h}|^{2}. (24)

Observe that one possible solution for optimal MRT beamformer 𝐟{\bf f} in (23) is given by 𝐟v=𝐮1{\bf f}_{v}={\bf u}^{*}_{1} and 𝐟h=𝐯1{\bf f}_{h}={\bf v}_{1}. Following Remark 1, we can argue that 𝐟=𝐟v𝐟h{\bf f}={\bf f}_{v}\otimes{\bf f}_{h}, where 𝐟u𝒢(Mv,1){\bf f}_{u}\in{\cal G}(M_{v},1) and 𝐟v𝒢(Mh,1){\bf f}_{v}\in{\cal G}(M_{h},1).

The loss in average normalized beamforming gain Γav{\Gamma_{av}} with the codebook =vh{\cal F}={\cal F}_{v}\otimes{\cal F}_{h} can be bounded as

L()\displaystyle L({\cal F}) =𝔼𝐮1,𝐯1[1Γ(𝐟v𝐟h)Γ(𝐮1𝐯1)]\displaystyle=\mathbb{E}_{{\bf u}_{1},{\bf v}_{1}}\ \bigg{[}1-\frac{{\Gamma({\bf f}_{v}\otimes{\bf f}_{h})}}{\Gamma({\bf u}^{*}_{1}\otimes{\bf v}_{1})}\bigg{]}
=𝔼𝐮1,𝐯1[1|(𝐮1T𝐯1H)(𝐟v𝐟h)|2]\displaystyle=\mathbb{E}_{{\bf u}_{1},{\bf v}_{1}}\ \bigg{[}1-|({\bf u}^{T}_{1}\otimes{\bf v}^{H}_{1})({\bf f}_{v}\otimes{\bf f}_{h})|^{2}\bigg{]}
2𝔼𝐮1,𝐯1[1|(𝐮1T𝐯1H)(𝐟v𝐟h)|]\displaystyle\leq 2\mathbb{E}_{{\bf u}_{1},{\bf v}_{1}}\ \bigg{[}1-|({\bf u}^{T}_{1}\otimes{\bf v}^{H}_{1})({\bf f}_{v}\otimes{\bf f}_{h})|\bigg{]}
2𝔼𝐮1,𝐯1[minθ,ϕ((ejθ𝐮1ejϕ𝐯1)(𝐟v𝐟h))]\displaystyle\leq 2\mathbb{E}_{{\bf u}_{1},{\bf v}_{1}}\ \bigg{[}\underset{\theta,\phi}{\min}\big{(}\left\lVert(e^{j\theta}{\bf u}^{*}_{1}\otimes e^{j\phi}{\bf v}_{1})-({\bf f}_{v}\otimes{\bf f}_{h})\right\rVert\big{)}\bigg{]}
2𝔼𝐮1,𝐯1[minθ,ϕ(ejθ𝐮12ejϕ𝐯1𝐟h2+\displaystyle\leq 2\mathbb{E}_{{\bf u}_{1},{\bf v}_{1}}\ \bigg{[}\underset{\theta,\phi}{\min}\big{(}\left\lVert e^{j\theta}{\bf u}^{*}_{1}\right\rVert_{2}\left\lVert e^{j\phi}{\bf v}_{1}-{\bf f}_{h}\right\rVert_{2}+
ejθ𝐮1𝐟v2ejϕ𝐟h2)]\displaystyle\qquad\qquad\left\lVert e^{j\theta}{\bf u}^{*}_{1}-{\bf f}_{v}\right\rVert_{2}\left\lVert e^{j\phi}{\bf f}_{h}\right\rVert_{2}\big{)}\bigg{]}
=2𝔼𝐮1,𝐯1[minθ,ϕ(ejϕ𝐯1𝐟h2+ejθ𝐮1𝐟v2)]\displaystyle=2\mathbb{E}_{{\bf u}_{1},{\bf v}_{1}}\ \bigg{[}\underset{\theta,\phi}{\min}\big{(}\left\lVert e^{j\phi}{\bf v}_{1}-{\bf f}_{h}\right\rVert_{2}+\left\lVert e^{j\theta}{\bf u}^{*}_{1}-{\bf f}_{v}\right\rVert_{2}\big{)}\bigg{]}
=2𝔼𝐮1,𝐯1[(1|𝐯1H𝐟h|)1/2+(1|𝐮1T𝐟v|)1/2]\displaystyle=2\mathbb{E}_{{\bf u}_{1},{\bf v}_{1}}\ \bigg{[}(1-|{\bf v}^{H}_{1}{\bf f}_{h}|)^{1/2}+(1-|{\bf u}^{T}_{1}{\bf f}_{v}|)^{1/2}\bigg{]}
2𝔼𝐮1,𝐯1[(1|𝐯1H𝐟h|2)+(1|𝐮1T𝐟v|2)]=Lub().\displaystyle\leq\scalebox{0.9}{$2\mathbb{E}_{{\bf u}_{1},{\bf v}_{1}}\ \bigg{[}(1-|{\bf v}^{H}_{1}{\bf f}_{h}|^{2})+(1-|{\bf u}^{T}_{1}{\bf f}_{v}|^{2})\bigg{]}$}=L_{\rm ub}({\cal F}). (25)
Definition 4 (Grassmannian product codebook).

Under the rank-1 approximation of the channel, 𝐇𝐇¯=σ1𝐮1T𝐯1H{\bf H}\approx\bar{\bf H}=\sigma_{1}{\bf u}^{T}_{1}\otimes{\bf v}^{H}_{1}, the BB-bit Grassmannian product codebook =vh{\cal F}^{*}={\cal F}_{v}^{*}\otimes{\cal F}_{h}^{*} is the one that satisfies the codebook design criteria in Definition 3 for a given [Bv,Bh][B_{v},B_{h}] where |v|=2Bv|{\cal F}^{*}_{v}|=2^{B_{v}}, |h|=2Bh|{\cal F}^{*}_{h}|=2^{B_{h}} and Bh+Bv=BB_{h}+B_{v}=B.

We will now state the method to construct the product codebook {\cal F}^{*} as follows.

Lemma 3.

The Grassmannian product codebook =vh{\cal F}^{*}={\cal F}_{v}^{*}\otimes{\cal F}_{h}^{*} as defined in Definition 4 is constructed using the set of centroids vK{\cal F}^{K}_{v} and hK{\cal F}^{K}_{h} obtained from the independent KK-means clustering of the optimal MRT beamforming vectors 𝐯1{\bf v}_{1} and 𝐮1{\bf u}^{*}_{1} on 𝒢(Mv,1){\cal G}(M_{v},1) and 𝒢(Mh,1){\cal G}(M_{h},1) with K=2BvK=2^{B_{v}} and K=2BhK=2^{B_{h}}, respectively.

Proof:

From Definition 4,

\displaystyle{\cal F}^{*} =vh\displaystyle={\cal F}^{*}_{v}\otimes{\cal F}^{*}_{h}
=argminv,hLub()\displaystyle=\underset{{\cal F}_{v},{\cal F}_{h}}{\operatorname{arg~{}min}}\ L_{\rm ub}({\cal F})
=argminv,h𝔼𝐮1,𝐯1[(1|𝐯1H𝐟h|2)+(1|𝐮1𝐟v|2)]\displaystyle=\underset{{\cal F}_{v},{\cal F}_{h}}{\operatorname{arg~{}min}}\ \mathbb{E}_{{\bf u}_{1},{\bf v}_{1}}\ \bigg{[}(1-|{\bf v}^{H}_{1}{\bf f}_{h}|^{2})+(1-|{\bf u}^{*}_{1}{\bf f}_{v}|^{2})\bigg{]}
=argminv𝒢(Mv,1)|v|=2Bv𝔼𝐮1(1|𝐮1T𝐟v|2)\displaystyle=\underset{{\begin{subarray}{c}{\cal F}_{v}\subset{\cal G}(M_{v},1)\\ |{\cal F}_{v}|=2^{B_{v}}\end{subarray}}}{\operatorname{arg~{}min}}\mathbb{E}_{{\bf u}_{1}}\ (1-|{\bf u}^{T}_{1}{\bf f}_{v}|^{2})\ \otimes
argminh𝒢(Mh,1)|h|=2Bh𝔼𝐯1(1|𝐯1H𝐟h|2),\displaystyle\qquad\qquad\underset{{\begin{subarray}{c}{\cal F}_{h}\subset{\cal G}(M_{h},1)\\ |{\cal F}_{h}|=2^{B_{h}}\end{subarray}}}{\operatorname{arg~{}min}}\mathbb{E}_{{\bf v}_{1}}\ (1-|{\bf v}^{H}_{1}{\bf f}_{h}|^{2}),
where v\displaystyle\text{where }{\cal F}^{*}_{v} =argminv𝒢(Mv,1)||=2Bv𝔼𝐮1[1|𝐮1T𝐟v|2],\displaystyle=\underset{{\begin{subarray}{c}{\cal F}_{v}\subset{\cal G}(M_{v},1)\\ |{\cal F}|=2^{B_{v}}\end{subarray}}}{\operatorname{arg~{}min}}\ \mathbb{E}_{{\bf u}_{1}}\ \big{[}1-|{\bf u}^{T}_{1}{\bf f}_{v}|^{2}\big{]},
h\displaystyle{\cal F}^{*}_{h} =argminh𝒢(Mh,1)||=2Bh𝔼𝐯1[1|𝐯1H𝐟h|2].\displaystyle=\underset{{\begin{subarray}{c}{\cal F}_{h}\subset{\cal G}(M_{h},1)\\ |{\cal F}|=2^{B_{h}}\end{subarray}}}{\operatorname{arg~{}min}}\mathbb{E}_{{\bf v}_{1}}\ \big{[}1-|{\bf v}^{H}_{1}{\bf f}_{h}|^{2}\big{]}.

Following Theorem 1, we have h=hK,v=hK{\cal F}_{h}^{*}={\cal F}_{h}^{K},{\cal F}_{v}^{*}={\cal F}_{h}^{K}. Thus, the Grassmannian product codebook is given as

\displaystyle{\cal F}^{*} =vh=vKhK.\displaystyle={\cal F}^{*}_{v}\otimes{\cal F}^{*}_{h}={\cal F}^{K}_{v}\otimes{\cal F}^{K}_{h}. (26)

From Lemma 3, it is possible to perform KK-means clustering independently on 𝒢(Mv,1){\cal G}(M_{v},1), 𝒢(Mh,1){\cal G}(M_{h},1) and construct the product codebook with reduced complexity instead of performing KK-means clustering on 𝒢(MhMv,1){\cal G}(M_{h}M_{v},1). The training and testing procedure of the proposed product codebook design for a given set of channel realizations {\cal H} is given in the following remark.

Remark 3.

For a training set train{\cal H}_{\rm train}, the Grassmann product codebook {\cal F}^{*} defined as =vh{\cal F}^{*}={\cal F}^{*}_{v}\otimes{\cal F}^{*}_{h}, where v{\cal F}^{*}_{v} and h{\cal F}^{*}_{h} are obtained by the procedure TrainProdCodeBook(train{\cal H}_{\rm train},[Bv,Bh][B_{v},B_{h}]) using Alg. 3, where BvB_{v}, BhB_{h} are the number of bits used to encode 𝐮1{\bf u}^{*}_{1}, 𝐯1{\bf v}_{1} respectively (B=Bv+BhB=B_{v}+B_{h}). The performance of the codebook {\cal F}^{*} is evaluated by the procedure TestProdCodeBook(test{\cal H}_{\rm test},[v,h][{\cal F}_{v}^{*},{\cal F}_{h}^{*}]) as given in Alg. 3.

Algorithm 3 Training and testing of the Grassmannian product codebook
1:procedure TrainProdCodebook(train\cal H_{\rm train}, [Bv,Bh][B_{v},B_{h}])
2:     Initialize training sets 𝒳train={\cal X}_{\rm train}=\varnothing and 𝒴train={\cal Y}_{\rm train}=\varnothing on 𝒢(Mh,1){\cal G}(M_{h},1) and 𝒢(Mv,1){\cal G}(M_{v},1) respectively
3:     for 𝐇train{\bf H}\in{\cal H}_{\rm train} do
4:         Generate 𝐇~\tilde{\bf H} from 𝐇{\bf H}, 𝐔𝚺𝐕Hsvd(𝐇~){\bf U}{\bf\Sigma}{\bf V}^{H}\leftarrow{\rm svd}(\tilde{\bf H})
5:         𝒳train𝒳train𝐯1{\cal X}_{\rm train}\leftarrow{\cal X}_{\rm train}\cup{\bf v}_{1}, 𝒴train𝒴train𝐮1{\cal Y}_{\rm train}\leftarrow{\cal Y}_{\rm train}\cup{\bf u}^{*}_{1}      
6:     h{\cal F}_{h}^{*}\leftarrow CodeBook(𝒳train{\cal X}_{\rm train}, 2Bh2^{B_{h}})
7:     v{\cal F}_{v}^{*}\leftarrow CodeBook(𝒴train{\cal Y}_{\rm train}, 2Bv2^{B_{v}}) return [v[{\cal F}_{v}^{*}, h]{\cal F}_{h}^{*}]
8:procedure TestProdCodebook(test\cal H_{\rm test}, [v[{\cal F}_{v}^{*}, h]{\cal F}_{h}^{*}])
9:     Initialize Γav=0\Gamma_{\rm av}=0
10:     for 𝐇test{\bf H}\in{\cal H}_{\rm test} do
11:         Generate 𝐇~\tilde{\bf H} from 𝐇{\bf H}, 𝐔𝚺𝐕Hsvd(𝐇~){\bf U}{\bf\Sigma}{\bf V}^{H}\leftarrow{\rm svd}(\tilde{\bf H})
12:         𝐟hargmin𝐟hd2(𝐯1,𝐟){\bf f}_{h}\leftarrow\underset{{{\bf f}^{\prime}\in{\cal F}_{h}^{*}}}{\operatorname{arg~{}min}}\ d^{2}({\bf v}_{1},{\bf f}^{\prime}),
13:         𝐟vargmin𝐟vd2(𝐮1,𝐟){\bf f}_{v}\leftarrow\underset{{{\bf f}^{\prime}\in{\cal F}_{v}^{*}}}{\operatorname{arg~{}min}}\ d^{2}({\bf u}^{*}_{1},{\bf f}^{\prime})
14:         ΓavΓav+1#testΓ(𝐟v𝐟h)Γ(𝐮1𝐯1)\Gamma_{\rm av}\leftarrow\Gamma_{\rm av}+\frac{1}{\#{\cal H}_{\rm test}}\frac{\Gamma({\bf f}_{v}\otimes{\bf f}_{h})}{\Gamma({\bf u}^{*}_{1}\otimes{\bf v}_{1})}      return Γav{\Gamma}_{\rm av}

VI Simulations and Discussions

The proposed learning framework requires channel datasets to construct the corresponding beamforming codebooks. In this section, we describe the scenario adopted for the channel dataset generation and evaluate the performance of the generated codebooks in terms of Γav{\Gamma}_{av}.

VI-A Dataset generation

In this paper, we consider an indoor communication scenario between the base station and the users with Mr=1M_{r}=1 operating at a frequency of 2.5 GHz. The described scenario is a part of the DeepMIMO dataset [24]. The channel dataset with the parameters given in Table. I is generated using the dataset generation script of DeepMIMO.

Refer to caption
Figure 3: Comparison of Ripley’s K function K(d)K(d) of different channel models with Mt=2M_{t}=2 and Mr=1M_{r}=1.
TABLE I: Parameters of the DeepMIMO dataset
Name of scenario I1_2p5
Active BS 3
Active users 1 to 704
Number of antennas (x, y, z) (Mv,Mh,1M_{v},M_{h},1)
System bandwidth 0.2 GHz
Antennas spacing 0.5
Number of OFDM sub-carriers 1
OFDM sampling factor 1
OFDM limit 1

VI-B Results

We now demonstrate the performance of the proposed codebook design techniques through simulations. The dataset generated as described in Section VI-A is represented as ={𝐇}{\cal H}=\{{\bf H}\} where 𝐇1×Mt{\bf H}\in\mathbb{C}^{1\times M_{t}} and partitioned into training and testing datasets, train{\cal H}_{\rm train} and test{\cal H}_{\rm test} respectively. We refer to the Grassmannian codebook design in Section IV as Prop. I : [B][B] when BB bits are used for generating the codebook as illustrated in Alg. 2. The Grassmannian product codebook design described in Section V is referred as Prop. II : [Bv,Bh][B_{v},B_{h}] where [Bv,Bh][B_{v},B_{h}] is the allocation of the feedback bits for constructing the codebooks using Alg. 3 and B=Bv+BhB=B_{v}+B_{h}.

We compare the average normalized beamforming gain Γav{\Gamma}_{av} of the two proposed codebooks with that of the DFT structured KP codebooks [25] (referred as KP DFT) and the codebooks generated based on the Grassmannian line packings for correlated channel [10] (referred to as GLP). The Grassmannian line packings required for the codebook construction in [10] were obtained from [9], [26]. The channel correlation matrix 𝐑{\bf R} is calculated from the training channel dataset train{\cal H}_{\rm train} according to 𝐑=𝔼𝐇(𝐇H𝐇){\bf R}=\mathbb{E}_{\bf H}\big{(}{{\bf H}^{H}{\bf H}}\big{)}.

Refer to caption
Refer to caption
Figure 4: Average normalized beamforming gain for different transmit antenna configurations Mv×MhM_{v}\times M_{h} and feedback-bit allocations [Bv,Bh][B_{v},B_{h}].

In Fig. 4, the normalized beamforming gains for different transmit antenna configurations Mv×MhM_{v}\times M_{h} and feedback-bit allocations [Bv,Bh][B_{v},B_{h}] using the four codebook design techniques are plotted. The KP codebooks relying on DFT structure are simple to construct but the quantized beams may not be effective because the KP codebooks search only the beams lying in the direction of the right and left dominant singular vectors of the reshaped channel in (19). While the Grassmannian line packings were shown to be the optimal codebooks for uncorrelated Rayleigh fading channel and were extended to correlated Rayleigh fading channel, they do not necessarily work well with any arbitrary channel distribution. It was not possible to show the performance of the GLP codebooks for all Mv×MhM_{v}\times M_{h} because of the challenges in finding the best line packings. Simulation results in Fig. 4 demonstrate that the codebooks designed using the proposed techniques adapt well to the underlying channel distribution. They also outperform KP DFT codebooks [25], GLP codebooks [10] and provide beamforming gains comparable to that of optimal MRT beamforming. Equivalently, they reduce the feedback overhead because they can maintain the same quantization performance with less overhead compared to the other codebooks.

VII Conclusion

This paper considered the problem of codebook based MRT beamforming in an FDD-MIMO system operating under arbitrary channel conditions. Leveraging well-known connections of this problem with Grassmannian line packing, we identified that the problem of finding the optimal MRT beamforming codebooks for any arbitrary channel distribution can be constructed using the KK-means clustering of MRT beamforming vectors on 𝒢(Mt,1){\cal G}(M_{t},1). We presented the Grassmannian KK-means clustering algorithm to construct the beamforming codebooks. We showed that this approach can be extended to FD-MIMO systems with UPA antennas to design product codebooks with reduced computational complexity. As our future work, we will be investigating the design of optimal codebooks for limited feedback unitary precoding for spatial multiplexing in MIMO systems with arbitrary underlying channel. Overall, this paper provides a concrete example of a design problem with rigorous mathematical underpinnings that benefits from a classical shallow learning approach.

References

  • [1] A. Narula, M. J. Lopez, M. D. Trott, and G. W. Wornell, “Efficient use of side information in multiple-antenna data transmission over fading channels,” IEEE Journal on Sel. Areas in Commun., vol. 16, no. 8, pp. 1423–1436, Oct 1998.
  • [2] D. J. Love, R. W. Heath, V. K. N. Lau, D. Gesbert, B. D. Rao, and M. Andrews, “An overview of limited feedback in wireless communication systems,” IEEE Journal on Sel. Areas in Commun., vol. 26, no. 8, pp. 1341–1365, Oct 2008.
  • [3] C. Wen, W. Shih, and S. Jin, “Deep learning for massive MIMO CSI feedback,” IEEE Wireless Commun. Letters, vol. 7, no. 5, pp. 748–751, Oct 2018.
  • [4] T. Wang, C. Wen, S. Jin, and G. Y. Li, “Deep learning-based CSI feedback approach for time-varying massive MIMO channels,” IEEE Wireless Commun. Letters, vol. 8, no. 2, pp. 416–419, Apr 2019.
  • [5] K. K. Mukkavilli, A. Sabharwal, E. Erkip, and B. Aazhang, “On beamforming with finite rate feedback in multiple-antenna systems,” IEEE Trans. on Inf. Theory, vol. 49, no. 10, pp. 2562–2579, Oct 2003.
  • [6] Pengfei Xia and G. B. Giannakis, “Design and analysis of transmit-beamforming based on limited-rate feedback,” IEEE Trans. on Signal Proc., vol. 54, no. 5, pp. 1853–1863, May 2006.
  • [7] K. Huang, R. W. Heath, Jr., and J. G. Andrews, “Limited feedback beamforming over temporally-correlated channels,” IEEE Trans. on Signal Proc., vol. 57, no. 5, pp. 1959–1975, May 2009.
  • [8] B. Lee, J. Choi, J. Seol, D. J. Love, and B. Shim, “Antenna grouping based feedback compression for FDD-based massive MIMO systems,” IEEE Trans. on Commun., vol. 63, no. 9, pp. 3261–3274, Sep 2015.
  • [9] D. J. Love, R. W. Heath, and T. Strohmer, “Grassmannian beamforming for multiple-input multiple-output wireless systems,” IEEE Trans. on Inf. Theory, vol. 49, no. 10, pp. 2735–2747, Oct 2003.
  • [10] D. J. Love and R. W. Heath, “Limited feedback diversity techniques for correlated channels,” IEEE Trans. on Vehicular Tech., vol. 55, no. 2, pp. 718–722, Mar 2006.
  • [11] L. Zheng and D. N. C. Tse, “Communication on the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna channel,” IEEE Trans. on Inf. Theory, vol. 48, no. 2, pp. 359–383, Feb 2002.
  • [12] D. J. Love and R. W. Heath, “Limited feedback unitary precoding for spatial multiplexing systems,” IEEE Trans. on Inf. Theory, vol. 51, no. 8, pp. 2967–2976, Aug 2005.
  • [13] Y. Huang, P. P. Liang, Q. Zhang, and Y. Liang, “A machine learning approach to MIMO communications,” in Proc. IEEE ICC, 2018, pp. 1–6.
  • [14] Y. Du, G. Zhu, J. Zhang, and K. Huang, “Automatic recognition of space-time constellations by learning on the Grassmann manifold,” IEEE Trans. on Signal Proc., vol. 66, no. 22, pp. 6031–6046, Nov 2018.
  • [15] A. Abdelnasser, E. Hossain, and D. I. Kim, “Clustering and resource allocation for dense femtocells in a two-tier cellular OFDMA network,” IEEE Trans. on Wireless Commun., vol. 13, no. 3, pp. 1628–1641, Mar 2014.
  • [16] J. B. Andersen, “Antenna arrays in mobile communications: gain, diversity, and channel capacity,” IEEE Antennas and Propagation Magazine, vol. 42, no. 2, pp. 12–16, Apr 2000.
  • [17] T. K. Y. Lo, “Maximum ratio transmission,” IEEE Trans. on Commun., vol. 47, no. 10, pp. 1458–1461, Oct 1999.
  • [18] Ching-Hok Tse, Kun-Wah Yip, and Tung-Sang Ng, “Performance tradeoffs between maximum ratio transmission and switched-transmit diversity,” in Proc. IEEE PIMRC, vol. 2, 2000, pp. 1485–1489.
  • [19] W. M. Boothby, An introduction to differentiable manifolds and Riemannian geometry.   Academic press, 1986.
  • [20] J. H. Conway, R. H. Hardin, and N. J. Sloane, “Packing lines, planes, etc.: Packings in Grassmannian spaces,” Experimental mathematics, vol. 5, no. 2, pp. 139–159, Apr 1996.
  • [21] P. Dixon, A. H. El-Shaarawi, and W. W. Piegorsch, “Ripley‘s K function,” Encyclopedia of Environmetrics, vol. 3, 2012.
  • [22] Y. Linde, A. Buzo, and R. Gray, “An algorithm for vector quantizer design,” IEEE Trans. on Commun., vol. 28, no. 1, pp. 84–95, Jan 1980.
  • [23] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: data mining, inference and prediction, 2nd ed.   Springer, 2009.
  • [24] A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications,” in Proc. ITA, Feb 2019, pp. 1–8.
  • [25] J. Choi, K. Lee, D. J. Love, T. Kim, and R. W. Heath, “Advanced limited feedback designs for FD-MIMO using uniform planar arrays,” in proc. IEEE GLOBECOM, 2015, pp. 1–6.
  • [26] A. Medra and T. N. Davidson, “Flexible codebook design for limited feedback systems via sequential smooth optimization on the Grassmannian manifold,” IEEE Trans. on Signal Proc., vol. 62, no. 5, pp. 1305–1318, Mar 2014.