Hybrid Beamforming and Adaptive RF Chain Activation for Uplink Cell-Free Millimeter-Wave Massive MIMO Systems

Nhan Thanh Nguyen, Kyungchun Lee, and Huaiyu Dai

N. T. Nguyen and K. Lee are with the Department of Electrical and Information Engineering and the Research Center for Electrical and Information Technology, Seoul National University of Science and Technology, 232 Gongneung-ro, Nowon-gu, Seoul, 01811, Republic of Korea (e-mail: nhan.nguyen, kclee@seoultech.ac.kr). H. Dai is with the Department of Electrical and Computer Engineering, North Carolina State University, NC, USA (e-mail: Huaiyu_Dai@ncsu.edu).

Abstract

In this work, we investigate hybrid analog–digital beamforming (HBF) architectures for uplink cell-free (CF) millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems. We first propose two HBF schemes, namely, decentralized HBF (D-HBF) and semi-centralized HBF (SC-HBF). In the former, both the digital and analog beamformers are generated independently at each access point (AP) based on the local channel state information (CSI). In contrast, in the latter, only the digital beamformer is obtained locally at the AP, whereas the analog beamforming matrix is generated at the central processing unit (CPU) based on the global CSI received from all APs. We show that the analog beamformers generated in these two HBF schemes provide approximately the same achievable rates, even though D-HBF has lower complexity and requires no global CSI. Furthermore, to reduce the power consumption, we propose a novel adaptive radio frequency (RF) chain-activation (ARFA) scheme, which dynamically activates/deactivates RF chains and their connected analog-to-digital converters (ADCs) and phase shifters (PSs) at the APs based on the CSI. For the activation of RF chains, low-complexity algorithms are proposed, which can achieve significant improvement in energy efficiency (EE) with only a marginal loss in the total achievable rate.

Index Terms:

Cell-free massive MIMO, mmWave communication, hybrid beamforming, RF chain activation.

I Introduction

Recently, many attempts have been made to utilize millimeter waves (mmWaves) for high-data-rate mobile broadband communications. The main challenge of mmWave communication is the large path loss due to high carrier frequencies, which significantly limits the system performance and cell coverage [1, 2]. Fortunately, the short wavelength of mmWave systems facilitates the deployment of massive multiple-input multiple-output (MIMO) systems, which can provide large beamforming gains to compensate for the path loss in mmWave channels. Furthermore, Ngo et al. in [3, 4] introduce a cell-free (CF) massive MIMO architecture, in which a very large number of distributed access points (APs) connected to a single central processing unit (CPU) simultaneously serve a much smaller number of users over the same time/frequency resources. Therefore, the CF massive MIMO system is capable of providing good quality of service (QoS) uniformly to all served users, regardless of their locations in the coverage area, by using simple linear signal processing schemes [3, 4, 5]. From these aspects, the combination of mmWave and CF massive MIMO systems with the deployment of large numbers of antennas at the APs could be a symbiotic convergence of technologies that can significantly improve the performance of next-generation wireless communication systems [5].

I-A Related works

The performance of CF massive MIMO systems in conventional sub-6-GHz frequency bands has been analyzed intensively [3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. In particular, the closed-form expression of the achievable rate of CF massive MIMO systems is derived in [3, 4, 6, 7]. Furthermore, comparisons of CF massive MIMO and conventional small-cell MIMO systems in [3, 4, 8, 9, 7] show that the former is more robust to shadow fading correlation and significantly outperforms the latter in terms of throughput and coverage probability. In [10], a low-complexity power control technique with zero-forcing (ZF) precoding design is introduced for CF massive MIMO systems. In [11], an optimal power-allocation algorithm is proposed to maximize the total EE, which can double the total EE compared to the equal power control scheme. Furthermore, an AP selection (APS) scheme is proposed in [11], in which each user chooses and connects to only a subset of APs to reduce the power consumption caused by the backhaul links. Meanwhile, in [12], the EE maximization problem is considered under the effect of quantization distortion of the weighted received signals at the APs. In [13, 14], the authors propose AP-clustering approaches to overcome limitations on the scalability of CF massive MIMO systems. Specifically, the APs are grouped in clusters and multiple CPUs are used to manage these clusters, leading to a reduction in the data distribution and computational complexity involved in channel estimation, power control, and beamforming.

Another line of work has attempted to investigate the performance of CF massive MIMO systems in mmWave channels [21, 22, 23, 5]. In particular, Femenias et al. introduced a hybrid beamforming (HBF) framework for CF mmWave massive MIMO systems with limited fronthaul capacity in [5], and the eigen-beamforming scheme is applied to generate precoders/combiners. Specifically, the phases of analog beamformers are obtained by quantizing those of dominant eigenvectors of the channel covariance matrix known at the APs. In [21], a hybrid precoding algorithm leveraging antenna-array response vectors is applied to distributed MIMO systems with partially-connected HBF architecture. Although the partially-connected HBF structure has lower power consumption than the fully-connected one, it cannot fully exploit the beamforming gains [24]. In [22] and [23], Alonzo et al. introduce an uplink multi-user estimation scheme along with low-complexity HBF architectures. Specifically, the baseband and analog precoders at each AP are generated by decomposing the fully digital ZF precoding matrix using the block-coordinate descent algorithm. Moreover, the problems of pilot assignment and channel estimation are considered in [25] and [26], respectively.

In mmWave communications, a large number of antennas at the APs not only provide beamforming gains to compensate for the propagation loss in mmWave channels, but also enhance favorable propagation [27]. In particular, it is shown in [27] that given a fixed total number of antennas at the APs, employing more antennas at fewer APs is more beneficial than deploying more APs with fewer antennas in terms of favorable propagation and channel hardening. Furthermore, large antenna arrays at the APs can achieve high beamforming gains to overcome the severe path loss in mmWave communication. However, an excessively high power consumption is required in this deployment because signal processing in the conventional digital domain requires a dedicated RF chain and analog-to-digital converter (ADC) for each antenna. Therefore, energy-efficient HBF schemes dedicated to CF mmWave massive MIMO systems are required, but only limited works exist in the literature focusing on the optimization of HBF for CF mmWave massive MIMO systems. Specifically, in [21, 22, 5, 23], the analog beamformers are all separately generated at the APs based on the local CSI, which are similar to HBF schemes for small-cell mmWave massive MIMO systems when each AP in the CF mmWave massive MIMO system is considered as a base station in the small-cell system. The antenna-selection (AS) schemes, which are proposed for the transceiver with a limited number of RF chains in [28, 29], can be applied to CF mmWave massive MIMO systems to reduce power consumption. However, they can cause performance degradation, especially for HBF in the highly correlated channels of mmWave communication [30].

I-B Contributions

In this work, we investigate the HBF for uplink CF mmWave massive MIMO systems in two scenarios: the global CSI of all APs is available or unavailable at the CPU. Then, we propose an adaptive RF chain-activation (ARFA) scheme, which provides considerable power reduction while nearly maintaining the system’s total achievable rate; thus, the EE is remarkably improved. Our specific contributions can be summarized as follows:

•

We first propose the decentralized HBF (D-HBF) and semi-centralized HBF (SC-HBF) schemes. Both have digital beamformers generated at the AP, but their difference lies in the analog beamformer. Specifically, in SC-HBF, the analog beamforming matrices for all APs are generated at the CPU based on the global CSI. In contrast, that of the D-HBF is obtained at each AP based only on the local CSI. By exploiting the global CSI to jointly optimize the analog combiners at the CPU, SC-HBF is expected to outperform D-HBF. However, our analytical and numerical results show that D-HBF can perform approximately the same as SC-HBF while requiring substantially lower computational complexity and no global CSI.
•

In CF mmWave massive MIMO systems with $L$ APs and $N$ RF chains at each AP, the power consumption is approximately proportional to $LN$ . Because $L$ is large in CF massive MIMO systems, the total power consumption can be excessively high. To overcome this challenge, we propose an ARFA scheme. In this scheme, the RF chains are selectively activated at the APs based on partial CSI, and the number of active RF chains at the APs is optimized so that the proposed scheme can significantly reduce the total power consumption while causing only marginal performance loss. Our numerical analysis reveals that in CF mmWave massive MIMO systems, the proposed ARFA scheme with a relatively small number of active RF chains can exhibit performance comparable to that of the conventional fixed-activation HBF scheme, which activates all the available RF chains. As a result, a considerable improvement in the EE is achieved.
•

In the proposed ARFA scheme, finding the optimal numbers of active RF chains at numerous APs with an exhaustive search requires high computational complexity. To reduce this complexity, we propose low-complexity near-optimal algorithms for the ARFA with SC-HBF. Furthermore, ARFA is incorporated with D-HBF in the proposed D-ARFA schemes, creating a singular value-based and a path loss-based D-ARFA, wherein the CPU requires a very limited amount of information from the APs. Our simulation results show that the proposed algorithms perform very close to the conventional HBF scheme, in which all the available RF chains are turned on.

We note that an RF chain-selection (RFS) scheme is introduced in [31] for the conventional small-cell mmWave massive MIMO system. The RFS scheme and our proposed ARFA scheme are similar in exploiting a reduced number of RF chains for power reduction. However, the contributions in our work are novel in the following aspects. First, the number of active RF chains at the base station in the conventional small-cell system, which is a single integer, is optimized in the RFS scheme by solving the EE-optimization problem [31]. In contrast, we consider the CF mmWave massive MIMO system, and the numbers of active RF chains at numerous APs are jointly optimized by maximizing the system’s total achievable throughput. This results in unequal numbers of active RF chains at the APs, and an AP can even deactivate all RF chains, leading to significant power reduction. Second, [31] focuses on optimizing the number of RF chains at the transmitter rather than at the receiver to avoid the nontrivial integer programming problem. Our work fills this gap by optimizing the numbers of active RF chains at the receivers, which are the APs in the uplink of CF mmWave massive MIMO systems. Because of these systematic differences, the algorithm presented in [31] cannot be leveraged for our proposed ARFA scheme, which thus requires novel algorithms as presented in Section IV.

The remainder of this paper is organized as follows: Section II introduces the system and channel models, whereas Section III describes the D-HBF and SC-HBF schemes. In Section IV, low-complexity ARFA algorithms are presented, and the power consumption of the proposed ARFA scheme is analyzed in Section V. Section VI presents simulation results, and the conclusion follows in Section VII.

II System Model

We consider the uplink of a CF mmWave massive MIMO system, where $L$ APs, each equipped with $N_{r}$ receive antennas and $N(\leq N_{r})$ RF chains, and $K$ single-antenna user equipments (UEs) are distributed in a large area. All APs are connected simultaneously to a CPU via fronthaul links and jointly serve $K$ UEs. At each AP, a fully connected architecture is considered for analog combining, in which $N$ RF chains are connected to $N_{r}$ receive antennas via a network of $NN_{r}$ PSs. We adopt a narrowband block-fading channel model [32, 33].

Let $\textbf{{h}}_{kl}\in\mathbb{C}^{N_{r}\times 1}$ denote the channel between the $k$ th UE and $l$ th AP. In mmWave systems, $\textbf{{h}}_{kl}$ follows the geometric Saleh–Valenzuela channel model and is given by [34, 35, 32]

\displaystyle\textbf{{h}}_{kl}=\sqrt{\zeta_{kl}}\sum_{p=1}^{P_{kl}}\alpha_{kl}^{(p)}\textbf{{a}}_{r}(\phi_{kl}^{(p)}),

(1)

where $\zeta_{kl}=\frac{G_{\mathrm{a}}}{\beta_{kl}}\frac{N_{r}}{P_{kl}}$ . Here, $G_{\mathrm{a}}$ is the antenna gain, and $\beta_{kl}$ represents the path loss between the $k$ th UE and $l$ th AP, given by [36, 37]

\displaystyle\beta_{kl}\text{[dB]}=\beta_{0}+10\epsilon\log_{10}\left(\frac{d_{kl}}{d_{0}}\right)+A_{\xi},

(2)

where $\beta_{0}=10\log_{10}\left(\frac{4\pi d_{0}}{\lambda}\right)^{2}$ , $d_{0}=1$ m, $d_{kl}$ is the distance between the $k$ th UE and $l$ th AP, $\epsilon$ is the average path-loss exponent over distance, and $A_{\xi}\sim\mathcal{N}(0,\xi^{2})$ represents the effect of shadow fading. Furthermore, $P_{kl}$ is the number of effective channel paths; $\alpha_{kl}^{(p)}\sim\mathcal{CN}(0,1),\forall l,k$ is the gain of the $p$ th path; and $\phi_{kl}^{(p)}$ is the azimuth angle of arrival (AoA). In (1), $\textbf{{a}}_{r}(\cdot)$ represents the normalized receive array response vector at an AP, which depends on the structure of the antenna array. In this work, we consider a uniform linear array (ULA), where $\textbf{{a}}_{r}(\cdot)$ is given by $\textbf{{a}}_{r}(\phi)=\frac{1}{\sqrt{N_{r}}}[1,e^{j\frac{2\pi}{\lambda}d_{s}\sin(\phi)},\ldots,e^{j(N_{r}-1)\frac{2\pi}{\lambda}d_{s}\sin(\phi)}]^{T}$ , in which $\lambda$ denotes the wavelength of the signal and $d_{s}$ is the antenna spacing [32]. Let $\textbf{{A}}_{kl}=[\textbf{{a}}_{r}(\phi_{kl}^{(1)}),\ldots,\textbf{{a}}_{r}(\phi_{kl}^{(P_{kl})})]\in\mathbb{C}^{N_{r}\times P_{kl}}$ and $\boldsymbol{\alpha}_{kl}=[\alpha_{kl}^{(1)},\ldots,\alpha_{kl}^{(P_{kl})}]^{T}\in\mathbb{C}^{P_{kl}\times 1}$ . Then, $\textbf{{h}}_{kl}$ can be equivalently given as $\textbf{{h}}_{kl}=\sqrt{\zeta_{kl}}\textbf{{A}}_{kl}\boldsymbol{\alpha}_{kl}$ ; thus, $\textbf{{h}}_{kl}\sim\mathcal{CN}(\mathbf{0},\zeta_{kl}\boldsymbol{\Psi}_{kl})$ , where $\boldsymbol{\Psi}_{kl}=\mathbb{E}\left\{\textbf{{A}}_{kl}\textbf{{A}}^{H}_{kl}\right\}$ .
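The channel model in (1) and (2) can be illustrated with a minimal NumPy sketch. The function names, the half-wavelength spacing default, the uniform AoA distribution, and the treatment of $G_{\mathrm{a}}$ as a linear-scale gain are assumptions made here for illustration only; they are not specified in this section.

```python
import numpy as np

def ula_response(phi, n_r, spacing_over_lambda=0.5):
    """Normalized ULA response a_r(phi); spacing_over_lambda = d_s / lambda (assumed 0.5)."""
    idx = np.arange(n_r)
    phase = 2 * np.pi * spacing_over_lambda * idx * np.sin(phi)
    return np.exp(1j * phase) / np.sqrt(n_r)

def path_loss_db(d, wavelength, eps, xi, rng, d0=1.0):
    """Path loss beta_kl in dB as in (2), with shadowing A_xi ~ N(0, xi^2)."""
    beta0 = 20 * np.log10(4 * np.pi * d0 / wavelength)  # = 10*log10((4*pi*d0/lambda)^2)
    return beta0 + 10 * eps * np.log10(d / d0) + xi * rng.standard_normal()

def sv_channel(n_r, n_paths, beta_db, g_a, rng):
    """Draw one channel vector h_kl according to (1)."""
    beta = 10 ** (beta_db / 10)                  # path loss in linear scale
    zeta = g_a / beta * n_r / n_paths            # zeta_kl (g_a assumed linear)
    phis = rng.uniform(0, 2 * np.pi, n_paths)    # assumption: uniformly distributed AoAs
    A = np.stack([ula_response(p, n_r) for p in phis], axis=1)  # N_r x P
    alpha = (rng.standard_normal(n_paths)
             + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)  # CN(0, 1) path gains
    return np.sqrt(zeta) * A @ alpha
```

For example, `sv_channel(64, 4, path_loss_db(50.0, 0.0107, 2.0, 4.0, rng), 1.0, rng)` with `rng = np.random.default_rng(0)` draws one 64-antenna channel; the numerical values are illustrative, not the paper's simulation settings.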

II-A Uplink Channel Estimation

For channel estimation, all $K$ UEs simultaneously transmit their pilot sequences to the APs. Let $\sqrt{\tau_{p}}\boldsymbol{\varphi}_{k}\in\mathbb{C}^{\tau_{p}\times 1}$ be the pilot sequence of the $k$ th UE, where $\left\lVert\boldsymbol{\varphi}_{k}\right\rVert^{2}=1$ , $k=1,\ldots,K$ . Here, $\tau_{p}$ $(\tau_{p}<\tau_{c})$ is the length of $\boldsymbol{\varphi}_{k}$ , where $\tau_{c}$ denotes the length of each coherence interval (in samples). When all $K$ UEs send their pilots, the received signal at the $l$ th AP is given as:

\displaystyle\textbf{{Y}}_{l}=\sqrt{\tau_{p}\rho_{p}}\sum_{k=1}^{K}\textbf{{h}}_{kl}\boldsymbol{\varphi}_{k}^{H}+\textbf{{Z}}_{l},

(3)

where $\rho_{p}>0$ represents the average transmit power of each UE; $\textbf{{Z}}_{l}\in\mathbb{C}^{N_{r}\times\tau_{p}}$ is the receiver noise, whose entries are independent and identically distributed (i.i.d.) $\mathcal{CN}(0,\sigma^{2})$ random variables (RVs); and $\sigma^{2}$ is the noise power. To estimate $\textbf{{h}}_{kl}$ , $\textbf{{Y}}_{l}$ is projected onto $\boldsymbol{\varphi}_{k}$ , which yields

\displaystyle\textbf{{y}}_{kl}\triangleq\textbf{{Y}}_{l}\boldsymbol{\varphi}_{k}=\sqrt{\tau_{p}\rho_{p}}\textbf{{h}}_{kl}+\sqrt{\tau_{p}\rho_{p}}\sum_{i\neq k}^{K}\textbf{{h}}_{il}\boldsymbol{\varphi}_{i}^{H}\boldsymbol{\varphi}_{k}+\textbf{{Z}}_{l}\boldsymbol{\varphi}_{k}.

Thus, the minimum mean-square error (MMSE) estimate of $\textbf{{h}}_{kl}$ , denoted by $\hat{\textbf{{h}}}_{kl}$ , is given by

\displaystyle\hat{\textbf{{h}}}_{kl}=\mathbb{E}\left\{\textbf{{h}}_{kl}\textbf{{y}}_{kl}^{H}\right\}\left(\mathbb{E}\left\{\textbf{{y}}_{kl}\textbf{{y}}_{kl}^{H}\right\}\right)^{-1}\textbf{{y}}_{kl}=\sqrt{\tau_{p}\rho_{p}}\zeta_{kl}\boldsymbol{\Psi}_{kl}\left(\tau_{p}\rho_{p}\sum_{i=1}^{K}\zeta_{il}\boldsymbol{\Psi}_{il}\left|\boldsymbol{\varphi}_{i}^{H}\boldsymbol{\varphi}_{k}\right|^{2}+\sigma^{2}\textbf{{I}}_{N_{r}}\right)^{-1}\textbf{{y}}_{kl}.

(4)

Assume that knowledge of the correlation matrix $\boldsymbol{\Psi}_{kl}$ , $\forall k$ , is available at the $l$ th AP [9], from which $\hat{\textbf{{h}}}_{kl}$ can be determined. Note that in (4), we assume that the signals received at all the antennas of the AP are available for MMSE estimation. As a result, the estimate of the full CSI associated with all the antennas can be obtained via the low-complexity MMSE estimator. In a very slow-fading and sparse channel, the full CSI can be obtained by a compressed sensing-based approach [32], but with the high complexity required for a large number of compressed-sensing measurements [38].
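As a hedged illustration of (4), the following sketch computes the MMSE estimate at one AP from the projected pilot observation. The function name and argument layout are hypothetical, and the correlation matrices $\boldsymbol{\Psi}_{kl}$ are assumed to be supplied by the caller, in line with the assumption above that they are known at the AP.

```python
import numpy as np

def mmse_estimate(Y, pilots, k, zetas, Psis, tau_p, rho_p, sigma2):
    """MMSE estimate of h_kl at one AP, following (4).

    Y      : (N_r, tau_p) received pilot block Y_l
    pilots : (tau_p, K) matrix whose k-th column is the unit-norm pilot phi_k
    zetas  : length-K sequence of zeta_kl for this AP
    Psis   : length-K sequence of (N_r, N_r) correlation matrices Psi_kl
    """
    n_r = Y.shape[0]
    phi_k = pilots[:, k]
    y_k = Y @ phi_k                                   # projection of Y_l onto phi_k
    cov = sigma2 * np.eye(n_r, dtype=complex)         # noise part of E{y y^H}
    for i in range(pilots.shape[1]):
        overlap = np.abs(np.vdot(pilots[:, i], phi_k)) ** 2   # |phi_i^H phi_k|^2
        cov += tau_p * rho_p * zetas[i] * overlap * Psis[i]
    # sqrt(tau_p rho_p) * zeta_kl * Psi_kl * cov^{-1} * y_kl
    return np.sqrt(tau_p * rho_p) * zetas[k] * Psis[k] @ np.linalg.solve(cov, y_k)
```

With orthogonal pilots the sum over $i$ reduces to the single term $i=k$, so pilot contamination vanishes and only the noise term perturbs the estimate.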

Let $\hat{\textbf{{H}}}_{l}\triangleq[\hat{\textbf{{h}}}_{1l},\ldots,\hat{\textbf{{h}}}_{Kl}]\in\mathbb{C}^{N_{r}\times K}$ denote the estimated channel matrix between the $K$ UEs and $l$ th AP. Furthermore, we define $\hat{\textbf{{H}}}\triangleq\left[\hat{\textbf{{H}}}_{1}^{T},\ldots,\hat{\textbf{{H}}}_{L}^{T}\right]^{T}\in\mathbb{C}^{LN_{r}\times K}$ as the composite estimated channels between all the APs and UEs. In the next section, $\hat{\textbf{{H}}}_{l}$ and $\hat{\textbf{{H}}}$ are employed for the hybrid beamforming design of the uplink data transmission.

II-B Uplink Data Transmission

Denote by $x_{k}$ the symbol sent from the $k$ th UE to all the APs, such that $\mathbb{E}\{\left|x_{k}\right|^{2}\}=1$ , $\forall k$ . The signal input-output relationship at the $l$ th AP can be expressed as

\displaystyle\textbf{{r}}_{l}=\sqrt{\rho}\sum_{k=1}^{K}\textbf{{W}}_{l}^{H}\textbf{{F}}_{l}^{H}\textbf{{h}}_{kl}x_{k}+\textbf{{W}}_{l}^{H}\textbf{{F}}_{l}^{H}\textbf{{z}}_{l},

(5)

where $\rho$ represents the average transmit power, and $\textbf{{z}}_{l}$ is the noise vector, whose elements are i.i.d. $\mathcal{CN}(0,\sigma^{2})$ RVs. Furthermore, $\textbf{{F}}_{l}\in\mathbb{C}^{N_{r}\times N}$ is the analog combining matrix at the $l$ th AP. Its $n$ th column, i.e., $\textbf{{f}}_{ln}=[f_{ln}^{(1)},\ldots,f_{ln}^{(N_{r})}]^{T}$ , is the analog weight vector corresponding to the $n$ th RF chain at the $l$ th AP, and $f_{ln}^{(i)}=\frac{1}{\sqrt{N_{r}}}e^{j\theta_{ln}^{(i)}}$ is the $i$ th element of $\textbf{{f}}_{ln}$ . In addition, $\textbf{{W}}_{l}\in\mathbb{C}^{N\times K}$ denotes the digital combining matrix at the $l$ th AP. Then, the APs send $\textbf{{r}}_{l},\forall l$ to the CPU via a fronthaul network to perform the final signal detection. In this work, we assume a simple centralized decoding scheme at the CPU, which requires minimal information exchange between the APs and CPU. In this scheme, the final decoded signal at the CPU is given as the average of the local estimates, that is, $\frac{1}{L}\sum_{l=1}^{L}\textbf{{r}}_{l}$ [9].

The composite received signal available at the CPU can be expressed as

\displaystyle\begin{bmatrix}\textbf{{r}}_{1}\\ \vdots\\ \textbf{{r}}_{L}\end{bmatrix}=\sqrt{\rho}\sum_{k=1}^{K}\begin{bmatrix}\textbf{{W}}_{1}^{H}\textbf{{F}}_{1}^{H}\textbf{{h}}_{k1}\\ \vdots\\ \textbf{{W}}_{L}^{H}\textbf{{F}}_{L}^{H}\textbf{{h}}_{kL}\end{bmatrix}x_{k}+\begin{bmatrix}\textbf{{W}}_{1}^{H}\textbf{{F}}_{1}^{H}\textbf{{z}}_{1}\\ \vdots\\ \textbf{{W}}_{L}^{H}\textbf{{F}}_{L}^{H}\textbf{{z}}_{L}\end{bmatrix}.

(6)

Let $\textbf{{F}}\triangleq\text{diag}\left\{\textbf{{F}}_{1},\ldots,\textbf{{F}}_{L}\right\}\in\mathbb{C}^{LN_{r}\times LN}$ and $\textbf{{W}}\triangleq\text{diag}\left\{\textbf{{W}}_{1},\ldots,\textbf{{W}}_{L}\right\}\in\mathbb{C}^{LN\times LK}$ be block-diagonal matrices containing the analog and digital combiners for all $L$ APs. In this work, we refer to $\textbf{{F}}$ and $\textbf{{W}}$ as global combiners, whereas $\left\{\textbf{{F}}_{1},\ldots,\textbf{{F}}_{L}\right\}$ and $\left\{\textbf{{W}}_{1},\ldots,\textbf{{W}}_{L}\right\}$ for the signal combined at APs $\{1,\ldots,L\}$ are referred to as the local combiners. Furthermore, let $\textbf{{H}}_{l}\triangleq\left[\textbf{{h}}_{1l},\ldots,\textbf{{h}}_{Kl}\right]\in\mathbb{C}^{N_{r}\times K}$ denote the channel matrix between the $K$ UEs and the $l$ th AP. We define $\textbf{{r}}\triangleq[\textbf{{r}}_{1}^{T},\ldots,\textbf{{r}}_{L}^{T}]^{T}\in\mathbb{C}^{LK\times 1}$ , $\textbf{{x}}\triangleq[x_{1},\ldots,x_{K}]^{T}\in\mathbb{C}^{K\times 1}$ , $\textbf{{z}}\triangleq[\textbf{{z}}_{1}^{T},\ldots,\textbf{{z}}_{L}^{T}]^{T}\in\mathbb{C}^{LN_{r}\times 1}$ , and $\textbf{{H}}\triangleq[\textbf{{H}}_{1}^{T},\ldots,\textbf{{H}}_{L}^{T}]^{T}\in\mathbb{C}^{LN_{r}\times K}$ . Then, (6) can be rewritten in a more compact form as

\displaystyle\textbf{{r}}=\sqrt{\rho}\textbf{{W}}^{H}\textbf{{F}}^{H}\textbf{{H}}\textbf{{x}}+\textbf{{W}}^{H}\textbf{{F}}^{H}\textbf{{z}}=\sqrt{\rho}\textbf{{W}}^{H}\textbf{{F}}^{H}\textbf{{H}}\textbf{{x}}+\tilde{\textbf{{z}}},

(7)

where $\tilde{\textbf{{z}}}\triangleq\textbf{{W}}^{H}\textbf{{F}}^{H}\textbf{{z}}\sim\mathcal{CN}(0,\textbf{{R}}_{\tilde{z}})$ , and $\textbf{{R}}_{\tilde{z}}\triangleq\sigma^{2}\textbf{{W}}^{H}\textbf{{F}}^{H}\textbf{{F}}\textbf{{W}}$ .
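The equivalence between the stacked per-AP signals in (6) and the compact form in (7) can be checked numerically. The sketch below is only an illustration with hypothetical helper names; it builds the block-diagonal global combiners explicitly and compares both forms.

```python
import numpy as np

def block_diag(blocks):
    """Place the given matrices on the diagonal of a zero matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols), dtype=complex)
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

def stacked_signal(F_list, W_list, H_list, x, z_list, rho):
    """Per-AP signals r_l from (5), stacked as in (6)."""
    return np.concatenate([
        W.conj().T @ F.conj().T @ (np.sqrt(rho) * H @ x + z)
        for F, W, H, z in zip(F_list, W_list, H_list, z_list)
    ])

def compact_signal(F_list, W_list, H_list, x, z_list, rho):
    """Compact form (7) using the global block-diagonal combiners F and W."""
    F = block_diag(F_list)          # L*N_r x L*N
    W = block_diag(W_list)          # L*N   x L*K
    H = np.vstack(H_list)           # L*N_r x K
    z = np.concatenate(z_list)      # L*N_r
    return W.conj().T @ F.conj().T @ (np.sqrt(rho) * H @ x + z)
```

Both functions return a vector of length $LK$, and they agree up to floating-point error for any choice of local combiners.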

We note that the analog processing is separately performed at the APs because the ADCs, RF chains, and PSs are installed at the APs. However, $\left\{\textbf{{F}}_{1},\ldots,\textbf{{F}}_{L}\right\}$ can be generated either at the APs based on their local CSI or at the CPU based on the global CSI. Once the analog combiner is obtained, digital processing can be carried out at the corresponding AP. We follow the common assumption in [4, 19] that the digital combining is performed at the APs individually. Therefore, in this work, the D-HBF scheme refers to the HBF with analog combiners generated at each AP separately, whereas SC-HBF implies that the analog combiners are computed at the CPU based on the global CSI. In this regard, we note that perfect CSI is available neither at the APs (local CSI) nor at the CPU (global CSI). Therefore, to perform D-HBF, the AP, say the $l$ th AP, treats its own estimated channel $\hat{\textbf{{H}}}_{l}$ as the true channel and employs it to generate the hybrid combiners $\textbf{{F}}_{l}$ and $\textbf{{W}}_{l}$ . Similarly, in SC-HBF, the CPU exploits the global estimated CSI $\hat{\textbf{{H}}}$ to obtain $\left\{\textbf{{F}}_{1},\ldots,\textbf{{F}}_{L}\right\}$ .

III SC-HBF and D-HBF

Based on (7), the total achievable rate $\bar{\mathcal{R}}$ can be expressed as [33]

\displaystyle\bar{\mathcal{R}}=\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\left|\textbf{{I}}_{LK}+\rho\textbf{{R}}_{\tilde{z}}^{-1}\textbf{{W}}^{H}\textbf{{F}}^{H}\textbf{{H}}\textbf{{H}}^{H}\textbf{{F}}\textbf{{W}}\right|.

(8)

We aim to design hybrid combiners that maximize $\bar{\mathcal{R}}$ . The design of $\textbf{{F}}$ and $\textbf{{W}}$ can be decoupled by first designing the analog combiner assuming an optimal digital combiner and then finding the optimal digital combiner for the derived analog one [39]. Therefore, the analog beamforming design problem is formulated as


\displaystyle(\text{P}_{\mathrm{a}})\quad\underset{\{\textbf{{F}}_{1},\ldots,\textbf{{F}}_{L}\}}{\textrm{maximize}}\quad\log_{2}\left|\textbf{{I}}_{LN}+\gamma\left(\textbf{{F}}^{H}\textbf{{F}}\right)^{-1}\textbf{{F}}^{H}\textbf{{H}}\textbf{{H}}^{H}\textbf{{F}}\right|,

(9a)

\displaystyle\text{subject to}\quad\textbf{{F}}=\text{diag}\left\{\textbf{{F}}_{1},\ldots,\textbf{{F}}_{L}\right\},

(9b)

\displaystyle\textbf{{f}}_{ln}\in\mathcal{F},\forall l,n,

(9c)

where $\gamma=\frac{\rho}{\sigma^{2}}$ , and $\mathcal{F}$ is the set of feasible analog combining coefficients $f_{ln}^{(i)}=\frac{1}{\sqrt{N_{r}}}e^{j\theta_{ln}^{(i)}},\forall l,n,i$ . To simplify the objective function in $(\text{P}_{\mathrm{a}})$ , we assume $\textbf{{F}}^{H}_{l}\textbf{{F}}_{l}\approx\textbf{{I}}_{N}$ [33, 39], which is tight in the considered CF mmWave massive MIMO system with a sufficiently large number of antennas deployed at each AP. Consequently, we have $\textbf{{F}}^{H}\textbf{{F}}\approx\textbf{{I}}_{LN}$ , and the objective function in (9a), which is the sum rate achieved by analog combining, can be approximated by $\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\left|\textbf{{I}}_{LN}+\gamma\textbf{{F}}^{H}\textbf{{H}}\textbf{{H}}^{H}\textbf{{F}}\right|\triangleq\bar{\mathcal{R}}^{\mathrm{a}}$ . Therefore, the optimal analog combiners can be approximately obtained by solving

\displaystyle(\text{P}^{\prime}_{\mathrm{a}})\quad\underset{\{\textbf{{F}}_{1},\ldots,\textbf{{F}}_{L}\}}{\textrm{maximize}}\quad\bar{\mathcal{R}}^{\mathrm{a}}\quad\text{subject to}\quad\text{(9b), (9c)}.

(10)

The objective function $\bar{\mathcal{R}}^{\mathrm{a}}$ of $(\text{P}^{\prime}_{\mathrm{a}})$ is further investigated in the following theorem.

Theorem 1

In a CF mmWave massive MIMO system with $L$ APs, we have $\bar{\mathcal{R}}^{\mathrm{a}}=\sum_{l=1}^{L}\bar{\mathcal{R}}^{\mathrm{a}}_{l}$ , where

\displaystyle\bar{\mathcal{R}}^{\mathrm{a}}_{l}=\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\det\left(\textbf{{I}}_{N}+\gamma\textbf{{F}}_{l}^{H}\textbf{{H}}_{l}\textbf{{Q}}_{l-1}^{-1}\textbf{{H}}_{l}^{H}\textbf{{F}}_{l}\right),

(11)

with $\textbf{{Q}}_{0}=\textbf{{I}}_{K}$ and

\displaystyle\textbf{{Q}}_{l-1}=\textbf{{Q}}_{l-2}+\gamma\textbf{{H}}_{l-1}^{H}\textbf{{F}}_{l-1}\textbf{{F}}_{l-1}^{H}\textbf{{H}}_{l-1}.

(12)

Proof

See Appendix A. $\Box$
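The decomposition in Theorem 1 can be sanity-checked numerically. The sketch below is an illustration rather than the proof in Appendix A: it omits the common pre-log factor $\frac{\tau_{c}-\tau_{p}}{\tau_{c}}$ (which scales both sides equally), uses hypothetical function names, and compares the joint log-determinant with the sum of the per-AP terms in (11) under the recursion (12).

```python
import numpy as np

def rate_global(F_list, H_list, gamma):
    """log2 |I_LN + gamma F^H H H^H F| for block-diagonal F (pre-log factor omitted)."""
    # With F block diagonal, F^H H is the vertical stack of F_l^H H_l.
    E = np.vstack([F.conj().T @ H for F, H in zip(F_list, H_list)])   # L*N x K
    M = np.eye(E.shape[0], dtype=complex) + gamma * E @ E.conj().T
    return np.linalg.slogdet(M)[1] / np.log(2)

def rates_per_ap(F_list, H_list, gamma):
    """Per-AP terms of (11), with Q_l updated according to (12)."""
    K = H_list[0].shape[1]
    Q = np.eye(K, dtype=complex)                  # Q_0 = I_K
    rates = []
    for F, H in zip(F_list, H_list):
        E = F.conj().T @ H                        # N x K effective channel F_l^H H_l
        T = np.eye(E.shape[0], dtype=complex) + gamma * E @ np.linalg.solve(Q, E.conj().T)
        rates.append(np.linalg.slogdet(T)[1] / np.log(2))
        Q = Q + gamma * E.conj().T @ E            # Q_l = Q_{l-1} + gamma H_l^H F_l F_l^H H_l
    return rates
```

For any set of local combiners, `sum(rates_per_ap(...))` matches `rate_global(...)` up to floating-point error, which reflects the chain-rule structure behind the theorem.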

Based on Theorem 1, $\bar{\mathcal{R}}^{\mathrm{a}}$ can be maximized by optimizing $\{\bar{\mathcal{R}}^{\mathrm{a}}_{1},\ldots,\bar{\mathcal{R}}^{\mathrm{a}}_{L}\}$ corresponding to APs $\{1,\ldots,L\}$ . We recall that the $l$ th AP treats $\hat{\textbf{{H}}}_{l}$ as the true channel to obtain the combiners. Therefore, $\textbf{{F}}_{l}^{\star}$ in $(\text{P}^{\prime}_{\mathrm{a}})$ can be determined by maximizing $\mathcal{R}^{\mathrm{a}}_{l}\triangleq\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\det\left(\textbf{{I}}_{N}+\gamma\textbf{{F}}_{l}^{H}\hat{\textbf{{H}}}_{l}\hat{\textbf{{Q}}}_{l-1}^{-1}\hat{\textbf{{H}}}_{l}^{H}\textbf{{F}}_{l}\right)$ , where $\hat{\textbf{{Q}}}_{l}$ is defined similarly to $\textbf{{Q}}_{l}$ by replacing $\textbf{{H}}_{l}$ with $\hat{\textbf{{H}}}_{l}$ . Consequently, we can write

\displaystyle\textbf{{F}}_{l}^{\star}=\arg\max_{\textbf{{F}}_{l}}\mathcal{R}^{\mathrm{a}}_{l},\forall l,\hskip 4.26773pt\text{subject to}\hskip 2.84544pt\hskip 2.84544pt\textbf{{f}}_{l1},\ldots,\hskip 2.84544pt\textbf{{f}}_{lN}\in\mathcal{F}.

(13)

Let $\left\{{\textbf{{u}}}^{\star}_{l1},\ldots,{\textbf{{u}}}^{\star}_{lN}\right\}$ be the $N$ singular vectors corresponding to the $N$ largest singular values of $\hat{\textbf{{H}}}_{l}\hat{\textbf{{Q}}}_{l-1}^{-1}\hat{\textbf{{H}}}_{l}^{H}$ , sorted in decreasing order. Then, the columns $\{\textbf{{f}}_{l1}^{\star},\ldots,\textbf{{f}}_{lN}^{\star}\}$ of a near-optimal solution to (13) can be obtained by quantizing $\left\{{\textbf{{u}}}^{\star}_{l1},\ldots,{\textbf{{u}}}^{\star}_{lN}\right\}$ , respectively, to the nearest vectors in $\mathcal{F}$ [21], i.e.,

\displaystyle\hskip 2.84544pt\textbf{{f}}_{ln}^{\star}=\arg\min_{\hskip 2.84544pt\textbf{{f}}_{ln}\in\mathcal{F}}\left\lVert\textbf{{u}}_{ln}^{\star}-\hskip 2.84544pt\textbf{{f}}_{ln}\right\rVert^{2},\forall n.

(14)

At the $l$ th AP, once the analog combiner $\textbf{{F}}_{l}^{\star}$ is found, the optimal digital combiner is given as the MMSE solution, i.e.,

\displaystyle\textbf{{W}}_{l}^{\star}=\textbf{{J}}^{-1}\textbf{{F}}_{l}^{\star H}\hat{\textbf{{H}}}_{l},

(15)

where $\textbf{{J}}=\textbf{{F}}_{l}^{\star H}\hat{\textbf{{H}}}_{l}\hat{\textbf{{H}}}_{l}^{H}\textbf{{F}}_{l}^{\star}+\frac{1}{\gamma}\textbf{{F}}_{l}^{\star H}\textbf{{F}}_{l}^{\star}$ [39]. In the following subsections, we propose two HBF schemes in which the analog combiners are derived based on different assumptions for CSI.
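A minimal sketch of the quantization in (14) and the MMSE digital combiner in (15) follows. It assumes infinite-resolution phase shifters, in which case the nearest constant-modulus vector keeps the phase of each entry and fixes its modulus to $1/\sqrt{N_{r}}$; the function names are hypothetical.

```python
import numpy as np

def quantize_to_phases(u):
    """Nearest vector in F to u, as in (14): keep each phase, set the modulus to 1/sqrt(N_r).

    Accepts a length-N_r vector or an (N_r, N) matrix whose columns are quantized.
    Assumption: phase shifters have unlimited resolution.
    """
    return np.exp(1j * np.angle(u)) / np.sqrt(u.shape[0])

def mmse_digital_combiner(F, H_hat, gamma):
    """Digital combiner W_l of (15) for a given analog combiner F_l."""
    E = F.conj().T @ H_hat                          # N x K, F_l^H H_hat_l
    J = E @ E.conj().T + F.conj().T @ F / gamma     # N x N
    return np.linalg.solve(J, E)                    # J^{-1} F_l^H H_hat_l, N x K
```

The phase-only rule follows because $\lVert\textbf{u}-\textbf{f}\rVert^{2}$ separates over the entries, and each term is minimized when the phase of $f^{(i)}$ equals that of $u^{(i)}$.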

Algorithm 1 SC-HBF scheme

Output: $\left\{\textbf{{F}}_{1}^{\star},\ldots,\textbf{{F}}_{L}^{\star}\right\}$ , $\left\{\textbf{{W}}_{1}^{\star},\ldots,\textbf{{W}}_{L}^{\star}\right\}$ , and $\left\{\mathcal{R}^{\mathrm{a}}_{1},\ldots,\mathcal{R}^{\mathrm{a}}_{L}\right\}$

1: At the CPU: $\hat{\textbf{{Q}}}_{0}=\textbf{{I}}_{K}$
2: for $l=1\rightarrow L$
3:   for $n=1\rightarrow N$
4:     Set $\textbf{{u}}_{ln}^{\star}$ to the singular vector corresponding to the $n$ th largest singular value of $\hat{\textbf{{H}}}_{l}\hat{\textbf{{Q}}}_{l-1}^{-1}\hat{\textbf{{H}}}_{l}^{H}$
5:     $\textbf{{f}}_{ln}^{\star}=\frac{1}{\sqrt{N_{r}}}\mathcal{Q}(\textbf{{u}}_{ln}^{\star})$
6:   end for
7:   $\textbf{{F}}_{l}^{\star}=\left[\textbf{{f}}_{l1}^{\star},\ldots,\textbf{{f}}_{lN}^{\star}\right]$
8:   $\textbf{{G}}_{l}=\hat{\textbf{{H}}}_{l}^{H}\textbf{{F}}_{l}^{\star}{\textbf{{F}}_{l}^{\star}}^{H}\hat{\textbf{{H}}}_{l}$
9:   $\hat{\textbf{{Q}}}_{l}=\hat{\textbf{{Q}}}_{l-1}+\gamma\textbf{{G}}_{l}$
10:   $\mathcal{R}_{l}^{\mathrm{a}}=\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\det\left(\textbf{{I}}_{N}+\gamma{\textbf{{F}}_{l}^{\star}}^{H}\hat{\textbf{{H}}}_{l}\hat{\textbf{{Q}}}_{l-1}^{-1}\hat{\textbf{{H}}}_{l}^{H}\textbf{{F}}_{l}^{\star}\right)$
11: end for
12: At the $l$ th AP: compute $\textbf{{W}}_{l}^{\star}$ based on (15).

III-A SC-HBF

It is evident from (11) and (12) that $\mathcal{R}^{\mathrm{a}}_{l}$ depends not only on $\hat{\textbf{{H}}}_{l}$ , but also on $\hat{\textbf{{H}}}_{l-1},\hat{\textbf{{H}}}_{l-2},\ldots,\hat{\textbf{{H}}}_{1}$ . Therefore, finding $\textbf{{F}}_{l}^{\star}$ requires the estimated channels of the $l$ th AP and of all preceding APs. This is similar to the requirements for determining analog beamformers for sub-arrays in the partially-connected HBF architecture [40, 24]. As a result, solving for $\left\{\textbf{{F}}_{1}^{\star},\textbf{{F}}_{2}^{\star},\ldots,\textbf{{F}}_{L}^{\star}\right\}$ requires the CSI of the channels between all $L$ APs and $K$ UEs, i.e., $\left\{\hat{\textbf{{H}}}_{1},\hat{\textbf{{H}}}_{2},\ldots,\hat{\textbf{{H}}}_{L}\right\}$ , which can be made available at the CPU; hence, finding $\textbf{{F}}^{\star}$ based on (13) requires an SC-HBF scheme.

Algorithm 1 presents the proposed SC-HBF scheme to obtain $\left\{\textbf{{F}}_{1}^{\star},\textbf{{F}}_{2}^{\star},\ldots,\textbf{{F}}_{L}^{\star}\right\}$ . In particular, in steps 3–6, the combining vector $\hskip 2.84544pt\textbf{{f}}_{ln}^{\star}$ is obtained by quantizing $\textbf{{u}}_{ln}^{\star}$ based on (14), which ensures that the resultant analog combiners belong to the feasible set $\mathcal{F}$ . Then, $\textbf{{F}}_{l}^{\star}$ is found in step 7 and $\textbf{{G}}_{l}$ is computed in step 8, followed by $\hat{\textbf{{Q}}}_{l}$ being updated in step 9. In step 10, $\mathcal{R}^{\mathrm{a}}_{l}$ corresponding to the $l$ th AP is computed. Furthermore, the digital combiner is computed at each AP, as in step 12. We note that in Algorithm 1, steps 1–11 are performed at the CPU, whereas step 12 is performed at the APs.
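The CPU-side steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the quantizer is taken to be the phase-only rule for infinite-resolution phase shifters, and `prelog` is a hypothetical parameter standing for $\frac{\tau_{c}-\tau_{p}}{\tau_{c}}$ (set to 1 by default).

```python
import numpy as np

def sc_hbf_analog(H_hat_list, N, gamma, prelog=1.0):
    """Analog combiners and per-AP rates, following steps 1-11 of Algorithm 1."""
    K = H_hat_list[0].shape[1]
    Q = np.eye(K, dtype=complex)                      # step 1: Q_0 = I_K
    F_list, rates = [], []
    for H in H_hat_list:                              # step 2: loop over APs
        n_r = H.shape[0]
        M = H @ np.linalg.solve(Q, H.conj().T)        # H_l Q_{l-1}^{-1} H_l^H
        U, _, _ = np.linalg.svd(M)                    # singular values in descending order
        # steps 3-7: quantize the N dominant singular vectors (assumed phase-only rule)
        F = np.exp(1j * np.angle(U[:, :N])) / np.sqrt(n_r)
        E = F.conj().T @ H                            # N x K
        # step 10: per-AP rate, computed with Q_{l-1} before the update
        T = np.eye(N, dtype=complex) + gamma * E @ np.linalg.solve(Q, E.conj().T)
        rates.append(prelog * np.linalg.slogdet(T)[1] / np.log(2))
        Q = Q + gamma * E.conj().T @ E                # steps 8-9: Q_l = Q_{l-1} + gamma G_l
        F_list.append(F)
    return F_list, rates
```

The rate is evaluated before $\hat{\textbf{Q}}$ is updated because step 10 uses $\hat{\textbf{{Q}}}_{l-1}$ ; the digital combiners of step 12 would then be computed at each AP from the returned analog combiners.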

III-B D-HBF

Let $\left\{\tilde{\textbf{{u}}}^{\star}_{l1},\ldots,\tilde{\textbf{{u}}}^{\star}_{lN}\right\}$ be the $N$ singular vectors corresponding to the $N$ largest singular values of $\hat{\textbf{{H}}}_{l}$ , which are in decreasing order. Furthermore, define

\displaystyle\tilde{\textbf{{f}}}_{ln}^{\star}=\arg\min_{\textbf{{f}}_{ln}\in\mathcal{F}}\left\lVert\tilde{\textbf{{u}}}_{ln}^{\star}-\textbf{{f}}_{ln}\right\rVert^{2},\forall n.

(16)

Then, in the D-HBF scheme, the optimal local analog combiner generated at the $l$th AP based on $\hat{\textbf{{H}}}_{l}$ can be given as $\tilde{\textbf{{F}}}_{l}^{\star}=\left[\tilde{\textbf{{f}}}_{l1}^{\star},\ldots,\tilde{\textbf{{f}}}_{lN}^{\star}\right]$ [21]. Let $\tilde{\textbf{{F}}}^{\star}=\text{diag}\left\{\tilde{\textbf{{F}}}_{1}^{\star},\ldots,\tilde{\textbf{{F}}}_{L}^{\star}\right\}$. In the following remark, we state that the total achievable rate obtained by analog combining in the D-HBF scheme is approximately equal to that in SC-HBF.

Remark 1

In CF mmWave massive MIMO systems with large $L$ and low SNRs due to the significant pathloss in the mmWave channels, the total achievable rate achieved by the analog combining in the D-HBF scheme, denoted by $\tilde{\mathcal{R}}^{\mathrm{a}}$ , is approximately the same as that of the SC-HBF scheme, i.e.,

\displaystyle\tilde{\mathcal{R}}^{\mathrm{a}}=\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\det\left(\textbf{{I}}_{K}+\gamma\textbf{{H}}^{H}\tilde{\textbf{{F}}}^{\star}\tilde{\textbf{{F}}}^{{\star}H}\textbf{{H}}\right)\approx\bar{\mathcal{R}}^{\mathrm{a}},

(17)

where $\bar{\mathcal{R}}^{\mathrm{a}}$ is given in Theorem 1.

Proof

See Appendix B. $\Box$

It is observed that D-HBF can be performed with considerably lower computational complexity than SC-HBF. Specifically, only the $N$ singular vectors corresponding to the $N$ largest singular values of the channel matrix are required to form the analog combiner. In contrast, additional matrix inversions, multiplications, and additions are performed in steps 4, 8, and 9 of Algorithm 1 for the SC-HBF scheme. Notably, despite its simpler implementation and lower complexity, the D-HBF scheme approximately achieves the performance of SC-HBF, as stated in Remark 1. Furthermore, the D-HBF scheme requires less information exchange between the APs and the CPU. Specifically, only the $K$ complex numbers in $\textbf{{r}}_{l}$ are sent to the CPU over the fronthaul link to perform the final soft detection [4, 19], whereas the exchange of the CSI and the analog combining matrix is not required, in contrast to the SC-HBF scheme.
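The local computation in D-HBF can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: it assumes the feasible set $\mathcal{F}$ contains vectors with entries of modulus $1/\sqrt{N_{r}}$ and phases on a uniform $b$-bit grid, in which case the minimization in (16) reduces to rounding each entry's phase to the nearest grid point. The names `quantize_to_feasible` and `dhbf_analog_combiner` are hypothetical.

```python
import numpy as np

def quantize_to_feasible(U, b):
    """Entrywise nearest-phase projection onto modulus-1/sqrt(N_r) entries with b-bit phases."""
    n_r = U.shape[0]
    step = 2 * np.pi / 2**b
    phases = np.round(np.angle(U) / step) * step   # nearest grid phase (modulo 2*pi)
    return np.exp(1j * phases) / np.sqrt(n_r)

def dhbf_analog_combiner(H_l, N, b=4):
    """Local analog combiner built from the N dominant left singular vectors of H_l (N_r x K)."""
    U, _, _ = np.linalg.svd(H_l, full_matrices=False)  # singular values come sorted in decreasing order
    return quantize_to_feasible(U[:, :N], b)

rng = np.random.default_rng(0)
H_l = rng.standard_normal((32, 4)) + 1j * rng.standard_normal((32, 4))
F_l = dhbf_analog_combiner(H_l, N=4, b=4)
print(F_l.shape)
```

Only one SVD of the local channel estimate is needed per AP, and no quantity from other APs enters the computation.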

Remark 1 indicates that an efficient analog beamforming matrix can be designed based on the local CSI available at each AP of CF mmWave massive MIMO systems. However, this does not mean that further information exchange via the fronthaul links is useless. Indeed, the information exchange between the APs and the CPU in CF massive MIMO systems can also be exploited to improve the EE. In the next section, we show that adaptively activating RF chains, based on the global CSI in SC-HBF or on limited information in D-HBF, reduces the power consumption and thereby improves the EE.

IV Adaptive RF chain activation

Fig. 1: HBF architecture with the ARFA scheme. In the phase-shifter network, $n_{l}N_{r}$ out of $NN_{r}$ PSs are turned on at the $l$th AP.

IV-A Problem formulation and basic ideas

The global analog combiner $\textbf{{F}}$ can be expressed as

\displaystyle\textbf{{F}}=\text{diag}\{\underbrace{[\textbf{{f}}_{11},\ldots,\textbf{{f}}_{1N}]}_{\text{analog combiner at AP 1}},\ldots,\underbrace{[\textbf{{f}}_{L1},\ldots,\textbf{{f}}_{LN}]}_{\text{analog combiner at AP $L$}}\},

and the following facts are noted:

• Based on Theorem 1, the total achievable rate obtained by analog combining can be expressed as a sum of $\{\bar{\mathcal{R}}^{\mathrm{a}}_{1},\ldots,\bar{\mathcal{R}}^{\mathrm{a}}_{L}\}$ corresponding to APs $\{1,\ldots,L\}$. In CF mmWave massive MIMO systems, the APs are distributed over a large area, and their communication channels experience different path losses and shadowing effects. Therefore, the local analog combiners at different APs contribute unequally to the total achievable rate.

• In a local analog combiner $\textbf{{F}}_{l}$, the combining vectors $\{\textbf{{f}}_{l1},\ldots,\textbf{{f}}_{lN}\}$ contribute unequally to the sub-rate $\bar{\mathcal{R}}^{\mathrm{a}}_{l}$ given in (11). Specifically, the contribution of $\textbf{{f}}_{ln}$ is more significant than that of $\textbf{{f}}_{lm}$ if $n<m$ because $n$ and $m$ are the indices of the ordered singular values of $\hat{\textbf{{H}}}_{l}\hat{\textbf{{Q}}}_{l-1}^{-1}\hat{\textbf{{H}}}_{l}^{H}$ in SC-HBF and of $\hat{\textbf{{H}}}_{l}$ in D-HBF.

As a result, it is likely that a subset of the analog combining vectors in $\{\textbf{{f}}_{11},\ldots,\textbf{{f}}_{LN}\}$ is insignificant and can be removed from the global combiner $\textbf{{F}}$ without causing considerable performance loss. We note that at an AP, an analog combining vector represents the effect of $N_{r}$ PSs connected to an RF chain, followed by an ADC. Therefore, an insignificant analog combining vector can be removed from signal combining by turning off its corresponding RF chain, ADC, and PSs, which reduces the total power consumption. Motivated by this, we propose an ARFA scheme that selectively activates RF chains at the APs. Let $\textbf{{n}}=\{n_{1},\ldots,n_{L}\}$, where $n_{l}$ is the number of turned-on RF chains out of the $N$ RF chains installed at the $l$th AP, $0\leq n_{l}\leq N$. We note that for $n_{l}=0$, all the RF chains at the $l$th AP are turned off, and this AP does not consume any power for signal combining. It is also noted that in the uplink training phase of the next coherence interval, the deactivated RF chains are reactivated for channel estimation. The optimal activation of RF chains at the APs can be performed based on the following remark.

Remark 2

Because $\textbf{{f}}_{ln}$ is always more important at the $l$th AP than $\textbf{{f}}_{lm}$ with $n<m$ in terms of achievable rate, the problem of optimal activation of RF chains at an AP is equivalent to finding an optimal number of turned-on RF chains at that AP. Specifically, for the $l$th AP, if the ARFA scheme suggests using $n_{l}^{\star}$ RF chains, then the first $n_{l}^{\star}$ RF chains corresponding to $\{\textbf{{f}}_{l1},\ldots,\textbf{{f}}_{ln_{l}^{\star}}\}$ are selected for analog signal combining, whereas the others are deactivated to save power.

The HBF architecture with the proposed ARFA scheme is illustrated in Fig. 1. As an example, at the $l$th AP, only $n_{l}$ out of $N$ RF chains are turned on. Furthermore, the ADCs and PSs connected to the inactive RF chains are also turned off. Consequently, the local combiner $\textbf{{F}}_{l}$ consists of only $n_{l}$ analog combining vectors, i.e., $\textbf{{F}}_{l}=[\textbf{{f}}_{l1},\ldots,\textbf{{f}}_{ln_{l}}]$.

Unlike the conventional fixed-activation HBF, in the proposed ARFA scheme, the global combiner $\textbf{{F}}$, $\mathcal{R}^{\mathrm{a}}_{l}$, and $\mathcal{R}\triangleq\sum_{l=1}^{L}\mathcal{R}^{\mathrm{a}}_{l}$ depend on $\textbf{{n}}$. Therefore, in this section, they are expressed as functions of $\textbf{{n}}$, i.e., $\textbf{{F}}(\textbf{{n}})$, $\mathcal{R}^{\mathrm{a}}_{l}(\textbf{{n}})$, and $\mathcal{R}(\textbf{{n}})$, respectively. We limit the total number of turned-on RF chains in the ARFA scheme to $L\bar{n}$, i.e., $\sum_{l=1}^{L}n_{l}=L\bar{n}$, where $\bar{n}$ $(\leq N)$ is the average number of activated RF chains per AP. Based on Remark 2, the optimal activation of RF chains at the APs in the ARFA scheme can be performed by solving

\displaystyle\textbf{{n}}^{\star}=\arg\max_{\textbf{{n}}\in\mathcal{S}}\mathcal{R}(\textbf{{n}}),

(18)

where $\mathcal{S}=\left\{\textbf{{n}}:0\leq n_{l}\leq N,\sum_{l=1}^{L}n_{l}=L\bar{n}\right\}$ is the feasible set of $\textbf{{n}}$. The optimal $\textbf{{n}}^{\star}$ in (18) can be found by an exhaustive search over the entire feasible set $\mathcal{S}$. However, in CF mmWave massive MIMO systems, $L$ is large; thus, an excessively large number of candidates for $\textbf{{n}}$ would need to be examined, which is computationally prohibitive. In the following subsections, we propose three low-complexity algorithms to find $\textbf{{n}}^{\star}$. By abuse of notation, we use $\textbf{{F}}^{\star}$ for the global combiner found by the proposed ARFA algorithms. Furthermore, we note that Algorithm 1 can be easily modified for the ARFA scheme by replacing $N$ in steps 3, 7, and 10 with $n_{l}$, reflecting a dynamic number of analog combining vectors for the local combiner $\textbf{{F}}_{l}$ at the $l$th AP.
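To give a feel for the size of the exhaustive search in (18), the number of candidates in $\mathcal{S}$ can be counted with a short dynamic program. This is an illustrative sketch only; the function name `count_feasible` is hypothetical, and the example call uses the values $L=32$, $N=8$, $\bar{n}=2$ considered in the simulations.

```python
def count_feasible(L, N, total):
    """Number of integer vectors n with 0 <= n_l <= N for all l and sum(n) == total."""
    ways = [1] + [0] * total              # ways[s]: partial vectors whose entries sum to s
    for _ in range(L):                    # add one AP at a time
        new = [0] * (total + 1)
        for s in range(total + 1):
            if ways[s]:
                for add in range(min(N, total - s) + 1):
                    new[s + add] += ways[s]
        ways = new
    return ways[total]

# Size of the feasible set S for L = 32 APs, N = 8 RF chains per AP, and n_bar = 2.
print(count_feasible(32, 8, 32 * 2))
```

For these values the count exceeds $10^{12}$, and each candidate would require a full run of Algorithm 1, which motivates the low-complexity searches below.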

IV-B ARFA with SC-HBF (SC-ARFA)

Algorithm 2 HBF with SC-ARFA

Output: $\textbf{{F}}^{\star}$

1: Initialize $\textbf{{n}}=[n_{1},\ldots,n_{L}]$ with $n_{l}=\bar{n},\forall l$

2: Use Algorithm 1 to find $\textbf{{F}}(\textbf{{n}})$, $\mathcal{R}^{\mathrm{a}}_{l}(\textbf{{n}}),\forall l$, and $\mathcal{R}(\textbf{{n}})$

3: $\textbf{{F}}^{\star}=\textbf{{F}}(\textbf{{n}})$, $\mathcal{R}^{\star}=\mathcal{R}(\textbf{{n}})$

4: Obtain $\{n_{[1]},\ldots,n_{[L]}\}$ s.t. $\mathcal{R}^{\mathrm{a}}_{[1]}(\textbf{{n}})>\ldots>\mathcal{R}^{\mathrm{a}}_{[L]}(\textbf{{n}})$

5: $i=1$, $k=L$

6: while $i<k$

7: while $n_{[i]}=N$

8: $i=i+1$

9: end while

10: while $n_{[k]}=0$

11: $k=k-1$

12: end while

13: Update $\textbf{{n}}$: $n_{[i]}=n_{[i]}+1$, $n_{[k]}=n_{[k]}-1$

14: Use Algorithm 1 to find $\textbf{{F}}(\textbf{{n}})$ and $\mathcal{R}(\textbf{{n}})$ with the updated $\textbf{{n}}$.

15: Update $\textbf{{F}}^{\star}=\textbf{{F}}(\hat{\textbf{{n}}}^{\star})$ if $\mathcal{R}(\hat{\textbf{{n}}}^{\star})>\mathcal{R}^{\star}$

16: end while

17: At the $l$th AP, compute $\textbf{{W}}_{l}^{\star}$ based on (15), $\forall l$

In SC-ARFA, the ARFA scheme is combined with SC-HBF, and the optimal numbers of active RF chains at the APs are found at the CPU based on the global CSI. The idea for finding $\textbf{{n}}^{\star}$ is to turn on as many RF chains as possible at the APs with the largest $\mathcal{R}^{\mathrm{a}}_{l}$ and to turn off as many as possible at the APs with the smallest $\mathcal{R}^{\mathrm{a}}_{l}$, as presented in Algorithm 2. In steps 1–3, all elements of $\textbf{{n}}$ are set to $\bar{n}$; then, $\textbf{{F}}(\textbf{{n}})$, $\mathcal{R}^{\mathrm{a}}_{l}(\textbf{{n}}),\forall l$, and $\mathcal{R}(\textbf{{n}})$ are computed. In step 4, the elements of $\textbf{{n}}$ are ordered to obtain $\{n_{[1]},\ldots,n_{[L]}\}$ in decreasing order of the sub-rates $\{\mathcal{R}^{\mathrm{a}}_{1}(\textbf{{n}}),\ldots,\mathcal{R}^{\mathrm{a}}_{L}(\textbf{{n}})\}$. Therefore, $n_{[i]}$ is the number of turned-on RF chains at the AP with the $i$th largest sub-rate, i.e., $\mathcal{R}^{\mathrm{a}}_{[i]}(\textbf{{n}})$.

In step 5, we initialize $i=1$ and $k=L$. In steps 6–16, $n_{[i]}$ is increased by one, whereas $n_{[k]}$ is decreased by one, in each iteration. We note that in step 13, $n_{[i]}$ and $n_{[k]}$ are updated simultaneously to guarantee $\sum_{l=1}^{L}n_{[l]}=L\bar{n}$. The updates of $n_{[i]}$ and $n_{[k]}$ result in a new candidate $\textbf{{n}}$. Hence, $\textbf{{F}}(\textbf{{n}})$ and $\mathcal{R}(\textbf{{n}})$ are found in step 14, and $\textbf{{F}}^{\star}$ is updated if the performance is improved, as shown in step 15. Once $n_{[i]}$ reaches the maximum, i.e., $N$, the number of turned-on RF chains at the AP associated with the $(i+1)$th largest sub-rate, i.e., $n_{[i+1]}$, is considered, as shown in steps 7–9. In contrast, if $n_{[k]}$ reaches the minimum, i.e., zero, $n_{[k-1]}$ is considered next, as shown in steps 10–12. This iterative process is terminated if $i\geq k$, for which we have $\mathcal{R}^{\mathrm{a}}_{[i]}(\textbf{{n}})\leq\mathcal{R}^{\mathrm{a}}_{[k]}(\textbf{{n}})$ and the increase (decrease) in $n_{[i]}$ $(n_{[k]})$ is unlikely to provide performance improvement. Once all the analog combiners are determined and sent to the APs, the digital combiner at each AP is determined, as in step 17.
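The search in steps 1–16 of Algorithm 2 can be sketched as follows. This is a hedged illustration, not the authors' implementation: the callable `evaluate` is a hypothetical stand-in for running Algorithm 1 with a given $\textbf{{n}}$ and returning the sub-rates and the total rate, and the extra `i < k` guards inside the inner loops are an addition that keeps the indices within range. The toy rate model in the usage lines is purely illustrative.

```python
import math

def sc_arfa(L, N, n_bar, evaluate):
    """Greedy search over n; evaluate(n) -> (list of per-AP sub-rates, total rate)."""
    n = [n_bar] * L                                   # step 1: uniform activation
    sub_rates, best_rate = evaluate(n)                # steps 2-3
    best_n = list(n)
    # step 4: AP indices in decreasing order of sub-rate under uniform activation
    order = sorted(range(L), key=lambda l: sub_rates[l], reverse=True)
    i, k = 0, L - 1                                   # step 5 (0-based indices)
    while i < k:
        while i < k and n[order[i]] == N:             # steps 7-9: strongest AP is full
            i += 1
        while i < k and n[order[k]] == 0:             # steps 10-12: weakest AP is empty
            k -= 1
        if i >= k:
            break
        n[order[i]] += 1                              # step 13: move one RF chain
        n[order[k]] -= 1
        _, rate = evaluate(n)                         # step 14
        if rate > best_rate:                          # step 15: keep the best candidate
            best_rate, best_n = rate, list(n)
    return best_n, best_rate

# Toy stand-in for Algorithm 1: concave per-AP rates with hypothetical weights.
weights = [3.0, 2.0, 1.0, 0.5]
def toy_evaluate(n):
    subs = [w * math.log2(1 + x) for w, x in zip(weights, n)]
    return subs, sum(subs)

best_n, best_rate = sc_arfa(L=4, N=4, n_bar=2, evaluate=toy_evaluate)
print(best_n, best_rate)
```

Each iteration moves exactly one RF chain from a weak AP to a strong AP, so the constraint $\sum_{l}n_{l}=L\bar{n}$ holds for every candidate, and the number of calls to `evaluate` is bounded by the number of chains that can be moved.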

We note that the ARFA process needs to be performed at the CPU to jointly optimize the numbers of RF chains at all APs. In the SC-ARFA scheme, the global CSI is exploited to evaluate the candidates for $\textbf{{n}}$. However, the employment of SC-HBF in this scheme requires high computational complexity and a large amount of information exchange between the CPU and the APs, as discussed in Section III-B. This motivates us to propose an ARFA scheme combined with D-HBF in the next subsection.

IV-C ARFA with D-HBF (D-ARFA)

Without global CSI, the ARFA scheme can be performed if the CPU knows the qualities of the available combining vectors or the path loss corresponding to each AP. The former idea relies on the fact that a combining vector is obtained by quantizing a singular vector of the channel matrix, as shown in (16). Therefore, the quality of a combining vector can be evaluated based on its corresponding singular value. In contrast, the latter idea for D-ARFA is motivated by the observation that the AP with more significant path loss should have fewer activated RF chains because it is more likely to have a low sub-rate.

IV-C1 Singular values-based D-ARFA (SV-based D-ARFA)

The SV-based D-ARFA scheme is summarized in Algorithm 3. Specifically, in step 1, each AP finds and sends the $N$ largest singular values of the channel matrix to the CPU. Here, only the $N$ largest singular values are sent because, in the proposed ARFA scheme, only $n_{l}$ out of $N$ combining vectors are selected for signal combining at the $l$ th AP. As a result, the set of $LN$ singular values $\left\{\lambda_{1}^{(1)},\ldots,\lambda_{N}^{(1)},\ldots,\lambda_{1}^{(L)},\ldots,\lambda_{N}^{(L)}\right\}$ is available at the CPU, where $\lambda_{n}^{(l)}$ is the $n$ th largest singular value associated with the $l$ th AP. Then, the numbers of active RF chains at the APs are determined in steps 3–6. Specifically, an RF chain at an AP is suggested for activation if its corresponding singular value is not smaller than $\lambda_{L\bar{n}}$ found in step 2. In other words, the number of active RF chains at the $l$ th AP, that is, $n^{\star}_{l}$ , is set as the number of elements in $\left\{\lambda_{1}^{(l)},\ldots,\lambda_{N}^{(l)}\right\}$ that are not smaller than $\lambda_{L\bar{n}}$ . Finally, the CPU sends the value $n^{\star}_{l}$ back to the $l$ th AP, which is then used for signal combining based on the D-HBF scheme.

Algorithm 3 HBF with SV-based D-ARFA

Output: $\textbf{{F}}^{\star}$

1: Each AP finds the $N$ largest singular values of its channel matrix and sends them to the CPU. Specifically, the $l$th AP finds and sends a vector $\textbf{{e}}_{l}=\left[\lambda_{1}^{(l)},\ldots,\lambda_{N}^{(l)}\right]$, where $\lambda_{n}^{(l)}$ is the $n$th largest singular value of $\hat{\textbf{{H}}}_{l}$

2: The CPU finds $\lambda_{L\bar{n}}$, which is the $(L\bar{n})$th largest element in the singular value set $\{\lambda_{1}^{(1)},\ldots,\lambda_{N}^{(1)},\ldots,\lambda_{1}^{(L)},\ldots,\lambda_{N}^{(L)}\}$ received from all APs.

3: for $l=1\rightarrow L$

4: The CPU sets $n^{\star}_{l}$ to the number of elements in $\textbf{{e}}_{l}$ that are not smaller than $\lambda_{L\bar{n}}$ and sends $n^{\star}_{l}$ to the $l$th AP.

5: The $l$th AP determines its local analog combiner $\textbf{{F}}_{l}^{\star}$ for $n^{\star}_{l}$ RF chains, i.e., $\textbf{{F}}_{l}^{\star}=[\tilde{\textbf{{f}}}_{l1}^{\star},\ldots,\tilde{\textbf{{f}}}_{ln^{\star}_{l}}^{\star}]$, where $\tilde{\textbf{{f}}}_{ln}^{\star}$ is given by (16), and determines $\textbf{{W}}_{l}^{\star}$ based on (15).

6: end for
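The CPU-side thresholding in steps 2–4 of Algorithm 3 can be sketched in a few lines. This is a minimal illustration with a hypothetical function name, `sv_based_activation`; it assumes the singular values are distinct, since ties at the threshold would activate more than $L\bar{n}$ RF chains.

```python
def sv_based_activation(singular_values, n_bar):
    """singular_values[l] holds the N largest singular values reported by AP l."""
    L = len(singular_values)
    pooled = sorted((s for e in singular_values for s in e), reverse=True)
    threshold = pooled[L * n_bar - 1]                 # the (L*n_bar)-th largest value
    # n_l is the number of values at AP l that are not smaller than the threshold
    return [sum(1 for s in e if s >= threshold) for e in singular_values]

# Two APs with N = 3 reported values each and n_bar = 2 (threshold is the 4th largest value).
print(sv_based_activation([[5.0, 4.0, 1.0], [3.0, 0.5, 0.2]], n_bar=2))  # -> [3, 1]
```

In this toy case the AP with the stronger channel keeps all three RF chains while the weaker AP keeps one, and only $LN$ real numbers cross the fronthaul.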

IV-C2 Path loss-based D-ARFA (PL-based D-ARFA)

Algorithm 4 HBF with PL-based D-ARFA

Output: $\textbf{{F}}^{\star}$

1: Find $\textbf{{n}}=\{n_{1},\ldots,n_{L}\}$ based on (19).

2: Obtain $\{n_{[1]},\ldots,n_{[L]}\}$ s.t. $\alpha_{[1]}>\ldots>\alpha_{[L]}$

3: $t=1$

4: while $\textbf{{n}}\notin\mathcal{S}$

5: if $\sum_{l=1}^{L}n_{[l]}<L\bar{n}$ and $n_{[t]}<N$ then

6: $n_{[t]}=n_{[t]}+1$

7: end if

8: if $\sum_{l=1}^{L}n_{[l]}>L\bar{n}$ and $n_{[L-t+1]}>0$ then

9: $n_{[L-t+1]}=n_{[L-t+1]}-1$

10: end if

11: $t=t+1$

12: Reset $t=1$ if $t>L$

13: end while

14: Obtain $\textbf{{n}}^{\star}$ by reordering $\{n_{[1]},\ldots,n_{[L]}\}$ to the original order.

15: for $l=1\rightarrow L$

16: The CPU sends $n_{l}^{\star}$ to the $l$th AP.

17: The $l$th AP determines its local analog combiner $\textbf{{F}}_{l}^{\star}$ for $n^{\star}_{l}$ RF chains, i.e., $\textbf{{F}}_{l}^{\star}=[\tilde{\textbf{{f}}}_{l1}^{\star},\ldots,\tilde{\textbf{{f}}}_{ln^{\star}_{l}}^{\star}]$, where $\tilde{\textbf{{f}}}_{ln}^{\star}$ is given by (16), and determines $\textbf{{W}}_{l}^{\star}$ based on (15).

18: end for

In the SV-based D-ARFA scheme, the largest singular values of the channel matrices are required to find $\textbf{{n}}^{\star}$ . This entails high computational complexity, especially when $L$ and $N_{r}$ are large. To avoid this, we herein propose the PL-based D-ARFA scheme, in which $\textbf{{n}}^{\star}$ is obtained based on the total path losses associated with the APs. In CF massive MIMO systems, the APs are distributed in a large area. Therefore, the contribution of an AP to the total achievable rate considerably depends on its path loss.

Let $\beta_{l}=\sum_{k=1}^{K}\beta_{kl}$ be the sum of path loss of the $l$ th AP, with $\beta_{kl}$ given in (2), and let $\alpha_{l}=\frac{1}{\beta_{l}},\forall l$ . The number of activated RF chains at the $l$ th AP can be set to

\displaystyle n_{l}=\min\left\{N,\left\lfloor L\bar{n}\frac{\alpha_{l}}{\sum_{i=1}^{L}\alpha_{i}}\right\rceil\right\},\forall l,

(19)

where $\min\{N,\cdot\}$ is used to guarantee $n_{l}\leq N$ , and $\lfloor\cdot\rceil$ rounds a real number to its nearest integer. However, because of rounding, it is possible to obtain $\sum_{l=1}^{L}n_{l}\neq L\bar{n}$ , which leads to $\textbf{{n}}\notin\mathcal{S}$ . To solve this problem, we propose Algorithm 4.

In Algorithm 4, the elements of $\textbf{{n}}$ found in step 1 based on (19) are sorted in step 2 in decreasing order of $\{\alpha_{1},\ldots,\alpha_{L}\}$, i.e., in increasing order of the sums of path loss $\{\beta_{1},\ldots,\beta_{L}\}$, to generate $\{n_{[1]},\ldots,n_{[L]}\}$. Here, the order index $[t]$ indicates that $n_{[t]}$ RF chains are chosen to be activated at the AP associated with the $t$th smallest path loss. Therefore, if $\sum_{l=1}^{L}n_{[l]}<L\bar{n}$ and $n_{[t]}<N$, $n_{[t]}$ is increased by one. In contrast, if $\sum_{l=1}^{L}n_{[l]}>L\bar{n}$ and $n_{[L-t+1]}>0$, $n_{[L-t+1]}$ is decreased by one. This process is repeated until $\textbf{{n}}\in\mathcal{S}$ is satisfied, as shown in steps 3–13. In this procedure, by initializing $t=1$ and gradually increasing $t$, the numbers of turned-on RF chains for the APs with smaller path loss are chosen to increase first, whereas those for the APs with larger path loss are chosen to decrease first. In step 14, $\{n_{[1]},\ldots,n_{[L]}\}$ are reordered into the original order. In steps 15–18, the numbers of active RF chains are fed back to the APs, which then perform D-HBF, as in steps 3–6 of Algorithm 3.
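The allocation in (19) and the repair loop in steps 2–14 of Algorithm 4 can be sketched as follows. This is an illustrative sketch, assuming $\bar{n}\leq N$ so that a feasible $\textbf{{n}}$ exists; the function name `pl_based_activation` is hypothetical, and rounding to the nearest integer is implemented as `floor(x + 0.5)`.

```python
import math

def pl_based_activation(beta, N, n_bar):
    """beta[l]: summed path loss of AP l. Returns n in original AP order with sum(n) == L*n_bar."""
    L = len(beta)
    alpha = [1.0 / b for b in beta]
    total_alpha = sum(alpha)
    target = L * n_bar
    # (19): proportional share, rounded to the nearest integer and capped at N
    n = [min(N, math.floor(target * a / total_alpha + 0.5)) for a in alpha]
    # step 2: AP indices in decreasing order of alpha (smallest path loss first)
    order = sorted(range(L), key=lambda l: alpha[l], reverse=True)
    t = 0
    while sum(n) != target:                           # steps 4-13
        best, worst = order[t], order[L - 1 - t]
        if sum(n) < target and n[best] < N:
            n[best] += 1                              # add a chain at a low-path-loss AP
        if sum(n) > target and n[worst] > 0:
            n[worst] -= 1                             # remove a chain at a high-path-loss AP
        t = (t + 1) % L                               # steps 11-12: advance and wrap around
    return n

# One AP with small path loss and three with large path loss; N = 2, n_bar = 1.
print(pl_based_activation([1.0, 100.0, 100.0, 100.0], N=2, n_bar=1))  # -> [2, 1, 1, 0]
```

In the example, (19) initially yields $[2,0,0,0]$ because of the cap at $N$, and the loop then distributes the two missing RF chains to the next APs in the ordering. Because the list `n` is indexed by the original AP index throughout, the reordering of step 14 is implicit.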

V Power consumption analysis

In the considered uplink CF mMIMO system, the total power consumption is modeled as [41, 12, 42, 43]

\displaystyle P_{\text{total}}=\sum_{k=1}^{K}\left(P_{\text{TX},k}+P_{\text{UE},k}\right)+\sum_{l=1}^{L}\left(P_{\text{fix},l}+P_{\text{BF},l}+P_{\text{FH},l}\right),

(20)

where $P_{\text{TX},k}$ and $P_{\text{UE},k}$ represent the transmit power and the required power to run circuit components at the $k$ th UE, respectively; $P_{\text{fix},l}$ , $P_{\text{BF},l}$ , and $P_{\text{FH},l}$ respectively denote the fixed power consumption term, the variable power consumption for the beamforming structure, and the fronthaul power consumption for the $l$ th AP. $P_{\text{TX},k}$ is given as

\displaystyle P_{\text{TX},k}=\frac{\gamma\sigma^{2}}{\eta_{k}}\mathbb{E}\left\{\left\lVert\textbf{{x}}_{k}\right\rVert^{2}\right\}=\frac{\gamma\sigma^{2}}{\eta_{k}},

(21)

where $\eta_{k}\in(0,1]$ denotes the power amplifier efficiency of the $k$th UE, and the last equality is obtained by $\mathbb{E}\left\{\left\lVert\textbf{{x}}_{k}\right\rVert^{2}\right\}=1,\forall k$. In an HBF architecture, each antenna requires a low-noise amplifier (LNA) and two mixers, and each RF chain requires one ADC and $N_{r}$ PSs, as illustrated in Fig. 1 [24, 44, 45]. Therefore, $P_{\text{BF},l}$ depends linearly on the number of antennas $(N_{r})$ and the number of active RF chains at the $l$th AP $(n_{l})$ as follows:

\displaystyle P_{\text{BF},l}=N_{r}p_{\text{BF},1}+n_{l}p_{\text{BF},2},

(22)

where $p_{\text{BF},1}=p_{\text{LNA}}+2p_{\text{M}}$ , $p_{\text{BF},2}=N_{r}p_{\text{PS}}+p_{\text{RF}}+p_{\text{ADC}}$ , with $p_{\text{LNA}}$ , $p_{\text{M}}$ , $p_{\text{PS}}$ , $p_{\text{RF}}$ , and $p_{\text{ADC}}$ respectively denoting the power consumed by an LNA, mixer, PS, RF chain, and ADC. Furthermore, $P_{\text{FH},l}$ can be obtained by [41, 12]

\displaystyle P_{\text{FH},l}=P_{\text{FH,max}}\frac{R_{\text{FH},l}}{C_{\text{FH},l}}=\kappa_{l}R_{\text{FH},l},

(23)

where $P_{\text{FH,max}}$ is the maximum power required for the fronthaul traffic at the full capacity $C_{\text{FH},l}$, $R_{\text{FH},l}$ is the actual fronthaul rate between the $l$th AP and the CPU, and $\kappa_{l}=\frac{P_{\text{FH,max}}}{C_{\text{FH},l}}$. In the considered decentralized signal processing scheme, $2K\tau_{d}\alpha_{l}$ bits are required to quantize the signal vector $\textbf{{r}}_{l}\in\mathbb{C}^{K\times 1}$ at the $l$th AP during each coherence interval [12, 46] before it is sent to the CPU. Here, $\alpha_{l}$ is the number of quantization bits at the $l$th AP, and $\tau_{d}$ is the length (in symbols) of the uplink data. As a result, $R_{\text{FH},l}$ is given by [12, 46]

\displaystyle R_{\text{FH},l}=\frac{2K\tau_{d}\alpha_{l}}{T_{c}},

(24)

where $T_{c}$ is the coherence time (in seconds). Assume that all the UEs have the same power amplifier efficiency and circuit power consumption, i.e., $\eta_{k}=\eta$, $P_{\text{UE},k}=P_{\text{UE}},\forall k$, and that all APs have the same fixed power consumption, number of quantization bits, and fronthaul capacity, i.e., $P_{\text{fix},l}=P_{\text{fix}}$, $\alpha_{l}=\alpha$, $C_{\text{FH},l}=C_{\text{FH}}$, $\kappa_{l}=\kappa$, $\forall l$. Then, we have $P_{\text{FH},l}=P_{\text{FH}}$ and $R_{\text{FH},l}=R_{\text{FH}}$, $\forall l$. Furthermore, we note that AP $l$ requires $P_{\text{fix}}$ even when it is in sleep mode; in contrast, $P_{\text{FH}}$ and $P_{\text{BF},l}$ are only consumed when it is in active mode. Let $\mathbb{A}$ be the set of APs in active mode and $\left|\mathbb{A}\right|$ be the number of active APs. Then, from (20)–(24), the total power consumption can be expressed as

\displaystyle\begin{aligned}P_{\text{total}}&=\frac{K\gamma\sigma^{2}}{\eta}+KP_{\text{UE}}+LP_{\text{fix}}+\left|\mathbb{A}\right|P_{\text{FH}}+\sum_{l\in\mathbb{A}}\left(N_{r}p_{\text{BF},1}+n_{l}p_{\text{BF},2}\right)\\&=P_{0}+\left|\mathbb{A}\right|P_{\text{FH}}+\left|\mathbb{A}\right|N_{r}p_{\text{BF},1}+p_{\text{BF},2}\sum_{l\in\mathbb{A}}n_{l},\end{aligned}

(25)

where $P_{0}=\frac{K\gamma\sigma^{2}}{\eta}+KP_{\text{UE}}+LP_{\text{fix}}$ collects the fixed terms in $P_{\text{total}}$ for ease of exposition.

It is observed from (25) that $P_{\text{total}}$ varies depending on the number of active APs, i.e., $\left|\mathbb{A}\right|$; the total number of turned-on RF chains, i.e., $\sum_{l\in\mathbb{A}}n_{l}$; and the number of antennas $N_{r}$. More specifically, it is a linearly increasing function of each of these factors. Therefore, $P_{\text{total}}$ can be reduced by using only a subset of APs in the APS scheme [11], using a reduced number of antennas in the AS scheme [28], or optimizing both $\sum_{l\in\mathbb{A}}n_{l}$ and $\left|\mathbb{A}\right|$ in the proposed ARFA scheme. Next, we compare these schemes in terms of the total power consumption. Furthermore, conventional fixed-activation HBF schemes are also considered as benchmarks.

$\bullet$ ARFA scheme: When the ARFA scheme is employed, $n_{l}$ is different among the APs; however, the total number of RF chains is fixed to $L\bar{n}$ , i.e., $\sum_{l\in\mathbb{A}}n_{l}=L\bar{n}$ . By inserting this into (25), we obtain

\displaystyle P_{\text{total}}^{\text{ARFA}}=P_{0}+\left|\mathbb{A}\right|P_{\text{FH}}+\left|\mathbb{A}\right|N_{r}p_{\text{BF},1}+L\bar{n}p_{\text{BF},2},

(26)

where $\mathbb{A}$ contains only the APs with at least one activated RF chain. Therefore, we have $\left|\mathbb{A}\right|=\sum_{l=1}^{L}\delta_{l}$ with $\delta_{l}=1$ if $n_{l}>0$, and $\delta_{l}=0$ if $n_{l}=0$. We note that the proposed ARFA algorithms operate differently and can therefore result in different $\mathbb{A}$ and different power consumption.

$\bullet$ Fixed-activation HBF: We refer to SC-HBF and D-HBF without ARFA as fixed-activation HBF. In this scheme, the same number of RF chains is activated at all $L$ APs. For comparison with the proposed ARFA schemes, we consider two deployments: $n_{l}=N,\forall l$ and $n_{l}=\bar{n},\forall l$. We note that with fixed-activation HBF, all the APs are in active mode because they have a fixed nonzero number of RF chains for signal processing, i.e., $\left|\mathbb{A}\right|=L$. By inserting $n_{l}=N,\forall l$, and $n_{l}=\bar{n},\forall l$ into (25), we obtain

\displaystyle P_{\text{total}}^{\text{fix},N}=P_{0}+LP_{\text{FH}}+LN_{r}p_{\text{BF},1}+LNp_{\text{BF},2},

(27)

\displaystyle P_{\text{total}}^{\text{fix},\bar{n}}=P_{0}+LP_{\text{FH}}+LN_{r}p_{\text{BF},1}+L\bar{n}p_{\text{BF},2}.

(28)

$\bullet$ APS scheme: In this scheme, only a subset of the APs is selected based on received power [11], whereas the others are put into the sleep mode. For comparison with the proposed schemes in mmWave systems, we assume the conventional deployment of RF chains at the APs, i.e., each AP is equipped with $N$ RF chains, all of which are used for analog signal combining, i.e., $n_{l}=N,\forall l$ . For a fair comparison, the number of APs in active mode in this scheme is assumed to be $\frac{L\bar{n}}{N}$ . This guarantees that a total of $L\bar{n}$ RF chains are used at the selected APs, which is equal to the number of activated RF chains in the proposed ARFA scheme and fixed-activation HBF scheme with $n_{l}=\bar{n},\forall l$ . By inserting $n_{l}=N,\forall l$ , and $\left|\mathbb{A}\right|=\frac{L\bar{n}}{N}$ into (25), we have

\displaystyle P_{\text{total}}^{\text{APS}}=P_{0}+\frac{L\bar{n}}{N}P_{\text{FH}}+\frac{L\bar{n}}{N}N_{r}p_{\text{BF},1}+L\bar{n}p_{\text{BF},2}.

(29)

$\bullet$ AS scheme: In the AS scheme, at each AP, only $N_{r}^{\text{AS}}$ of $N_{r}$ antennas are activated, corresponding to $N_{r}^{\text{AS}}$ received signals put through the digital signal combining [28]. In other words, analog signal combining is conducted by a network of $N_{r}^{\text{AS}}$ switches rather than $NN_{r}$ PSs, in contrast to the other compared schemes. Therefore, at each AP, $N_{r}$ switches are required, whereas the numbers of antennas, RF chains, and ADCs are the same and as small as $N_{r}^{\text{AS}}$ , and the number of mixers is $2N_{r}^{\text{AS}}$ . Furthermore, in the AS scheme, all the APs are in the active mode, i.e., $\left|\mathbb{A}\right|=L$ . Let $p_{\text{SW}}$ be the power consumed by a switch. The total power consumption in this scheme is given as

\displaystyle P_{\text{total}}^{\text{AS}}=P_{0}+LP_{\text{FH}}+LN_{r}p_{\text{SW}}+LN_{r}^{\text{AS}}(p_{\text{RF}}+p_{\text{ADC}}+p_{\text{BF},1}).

(30)

By comparing (26) to (27)–(30), we observe that:

• The proposed ARFA scheme requires no higher power consumption than the fixed-activation HBF schemes with $n_{l}=N$ and $n_{l}=\bar{n},\forall l$, because $\bar{n}\leq N$ and $\left|\mathbb{A}\right|\leq L$. Furthermore, we note that a dominant part of the power consumed for beamforming is due to the RF chains and ADCs. Therefore, from (26) and (27), it is clear that a considerable reduction in power consumption can be obtained by the ARFA scheme if $\bar{n}\ll N$ is chosen.

• Both the power consumption and the total achievable rate of the proposed ARFA scheme depend significantly on $\bar{n}$. Specifically, a smaller $\bar{n}$ leads to a reduction in both power consumption and total achievable rate with respect to the fixed-activation HBF scheme with $n_{l}=N,\forall l$. This tradeoff is discussed further in the next section.

• It is observed from (29) and (26) that the APS and proposed ARFA schemes differ by $\left|\frac{L\bar{n}}{N}-\left|\mathbb{A}\right|\right|(P_{\text{FH}}+N_{r}p_{\text{BF},1})$ in power consumption, even though they have the same total number of active RF chains. Because each active AP hosts at most $N$ RF chains, we have $\left|\mathbb{A}\right|\geq\frac{L\bar{n}}{N}$; hence, the APS scheme requires slightly lower power consumption. However, its achievable rate is much lower than that of the ARFA scheme, as shown in the next section. Which of the AS and ARFA schemes consumes less power cannot be determined from (30) and (26) alone; this is examined through the simulation results in the next section.
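The comparison above can be reproduced numerically from (23), (24), and (26)–(30). The following is a rough sketch under stated assumptions, not the authors' simulation code: the component powers are those of Table I, $\tau_{d}=\tau_{c}-\tau_{p}=180$ symbols is assumed, the UE transmit power `p_tx` (i.e., $\gamma\sigma^{2}$) and the number of active APs under ARFA (`active_aps`) are hypothetical placeholders, and the function names are illustrative.

```python
def fronthaul_power(K, tau_d, alpha_bits, T_c, C_fh, P_fh_max):
    """(23)-(24): P_FH = P_FH,max * R_FH / C_FH with R_FH = 2*K*tau_d*alpha / T_c."""
    return P_fh_max * (2 * K * tau_d * alpha_bits / T_c) / C_fh

def power_totals(L, N, N_r, n_bar, N_r_as, active_aps, P0, P_fh, p):
    """Totals (26)-(30) in watts; p maps component names to per-component power in watts."""
    p_bf1 = p["LNA"] + 2 * p["M"]                     # per-antenna part of (22)
    p_bf2 = N_r * p["PS"] + p["RF"] + p["ADC"]        # per-RF-chain part of (22)
    per_ap = P_fh + N_r * p_bf1                       # cost of keeping one AP active
    return {
        "ARFA": P0 + active_aps * per_ap + L * n_bar * p_bf2,               # (26)
        "fix_N": P0 + L * per_ap + L * N * p_bf2,                           # (27)
        "fix_nbar": P0 + L * per_ap + L * n_bar * p_bf2,                    # (28)
        "APS": P0 + (L * n_bar / N) * per_ap + L * n_bar * p_bf2,           # (29)
        "AS": P0 + L * P_fh + L * N_r * p["SW"]
              + L * N_r_as * (p["RF"] + p["ADC"] + p_bf1),                  # (30)
    }

p = {"LNA": 0.02, "M": 0.0003, "PS": 0.03, "RF": 0.04, "ADC": 0.2, "SW": 0.005}  # Table I, in W
L, K, N, N_r, n_bar, N_r_as = 32, 8, 8, 64, 2, 32
P_fh = fronthaul_power(K, tau_d=180, alpha_bits=2, T_c=2e-3, C_fh=100e6, P_fh_max=50.0)
p_tx = 0.1                                            # hypothetical value of gamma*sigma^2 in W
P0 = K * p_tx / 0.3 + K * 1.0 + L * 0.825             # eta = 0.3, P_UE = 1 W, P_fix = 0.825 W
totals = power_totals(L, N, N_r, n_bar, N_r_as, active_aps=20, P0=P0, P_fh=P_fh, p=p)
for name, value in totals.items():
    print(f"{name}: {value:.1f} W")
```

With these inputs the fronthaul rate is 2.88 Mbps and $P_{\text{FH}}=1.44$ W per active AP. The ordering APS $\leq$ ARFA $\leq$ fixed ($\bar{n}$) $\leq$ fixed ($N$) holds for any `active_aps` between $L\bar{n}/N$ and $L$, whereas the position of the AS scheme depends on the component powers.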

VI Simulation results

VI-A Simulation parameters

Table I: Simulation parameters [12, 11, 24, 41]

Parameters	Values
Power amplifier efficiency	$\eta=0.3$
Coherence time and data length	$T_{c}=2$ ms, $\tau_{c}=200$ , $\tau_{p}=20$ symbols
No. of quantization bits	$\alpha=2$ bits
UE and fixed power term	$P_{\text{UE}}=1$ W, $P_{\text{fix}}=0.825$ W
Fronthaul capacity and required power	$C_{\text{FH}}=100$ Mbps, $P_{\text{FH,max}}=50$ W
Component power	$p_{\text{LNA}}=20$ mW, $p_{\text{ADC}}=200$ mW, $p_{\text{RF}}=40$ mW, $p_{\text{PS}}=30$ mW, $p_{\text{M}}=0.3$ mW, $p_{\text{SW}}=5$ mW, $\rho_{p}=100$ mW

Simulations are performed to evaluate the total achievable rates, power consumption, EEs, and computational complexities of the proposed SC-HBF, D-HBF, and ARFA schemes. In simulations, $K$ UEs and $L$ APs are uniformly distributed at random within a square coverage area of size $D\times D$ $\text{m}^{2}$ , where $D$ is set to $1000$ m [4]. The large-scale fading coefficients are computed based on (2) with $\epsilon=4.1$ , $\xi=7.6$ , and the antenna gain is set to $G_{\mathrm{a}}=15$ dBi [36]. Furthermore, we assume $f_{c}=28$ GHz, $B=100$ MHz, and $\text{NF}=9$ dB for the carrier frequency, system bandwidth, and noise figure, respectively. As a result, the noise power is given as $\sigma^{2}=-174\text{ dBm/Hz}+10\log_{10}(B)+\text{NF}$ .
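As a quick check of the noise power expression above, the following sketch evaluates it for $B=100$ MHz and $\text{NF}=9$ dB. The helper names are hypothetical; the only inputs are the values stated in the text.

```python
import math

def noise_power_dbm(bandwidth_hz, noise_figure_db, n0_dbm_per_hz=-174.0):
    """sigma^2 in dBm: thermal noise density + 10*log10(B) + noise figure."""
    return n0_dbm_per_hz + 10 * math.log10(bandwidth_hz) + noise_figure_db

def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)

sigma2_dbm = noise_power_dbm(100e6, 9.0)   # -174 + 80 + 9 = -85 dBm
sigma2_w = dbm_to_watt(sigma2_dbm)         # about 3.16e-12 W
print(sigma2_dbm, sigma2_w)
```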

The channel coefficients between each UE and AP are generated based on the geometric Saleh–Valenzuela channel model given in (1). For simplicity, we assume an identical number of effective channel paths between each UE and AP, which is set to $P_{kl}=20,\forall l,k$ [47, 32, 1], reflecting the limited scattering in mmWave channels. The AoAs are uniformly distributed in $\left[-\frac{\pi}{12},\frac{\pi}{12}\right]$ . The ULA model is employed for the antenna arrays at the APs and UEs with antenna spacing of half a wavelength, i.e., $\frac{d_{s}}{\lambda}=\frac{1}{2}$ [40, 48]. The phases in the analog combiner are selected from $\Theta=\left\{0,\frac{2\pi}{2^{b}},\frac{4\pi}{2^{b}},\ldots,\frac{2(2^{b}-1)\pi}{2^{b}}\right\}$ , where $b=4$ is set, implying 4-bit quantization of the PSs. The parameters in Table I are assumed to compute the total power consumption [12, 11, 24, 41].

VI-B Performance of the D-HBF and SC-HBF schemes

We numerically evaluate the total achievable rates of the D-HBF and SC-HBF schemes, which are analyzed in Section III. We assume the conventional RF chain deployment in this section, i.e., all $N$ available RF chains are active for analog combining. For comparison, we consider the beam selection scheme, in which the analog beamforming vectors are selected from a discrete Fourier transform (DFT) codebook based on an exhaustive search [49, 50]. Fig. 2 shows the total achievable rates of the SC-HBF, D-HBF, and beam selection schemes with $(N_{r},K)=\{(32,4),(64,8)\}$ , $N=K$ , and $L=32$ [21]. It is clear from Fig. 2 that for both the considered systems, the SC-HBF and D-HBF schemes have almost the same total achievable rates because they employ the same digital beamformer and their analog beamformers perform approximately the same, as stated in Remark 1. Specifically, the former performs only slightly better than the latter. It is also observed in Fig. 2 that the proposed SC-HBF and D-HBF schemes outperform the beam selection method for both the considered systems.

VI-C Performance of the proposed ARFA scheme

The total achievable rates, power consumption, and EEs of the proposed ARFA schemes, namely, SC-ARFA, PL-based D-ARFA, and SV-based D-ARFA, are compared to those of the fixed-activation HBF with $n_{l}=N$ and $n_{l}=\bar{n},\forall l$ , APS, and AS schemes discussed in Section V. In our simulations, SC-HBF is used for the fixed-activation HBF and APS schemes. We note that the SC-HBF and D-HBF schemes provide almost identical performance, as shown in Fig. 2, and for the same RF chain deployment, they have the same power consumption. We consider a CF mmWave massive MIMO system with $L=32$ , $K=8$ , $N_{r}=64$ , $N=8$ [5, 21], and $\bar{n}=2$ . In the AS scheme, the number of selected antennas at each AP is set to $N_{r}^{\text{AS}}=32$ , which ensures that the AS scheme achieves total achievable rates comparable to those of the proposed schemes, allowing us to compare them in terms of EE. In the simulations, the power consumption of the fixed-activation HBF schemes with $n_{l}=N$ and $n_{l}=\bar{n}$ , the APS scheme, and the AS scheme is computed based on (27)–(30), whereas that of the proposed ARFA schemes is obtained through simulations because it depends on $\delta_{n_{l}}$ , as indicated in (26). The EE of a scheme is calculated as the ratio between the total achievable rate and the total power consumption. Furthermore, for a fair comparison, we fix the total number of active RF chains for each compared scheme to $L\bar{n}$ , which ensures that an average of $\bar{n}$ RF chains are activated at each AP in all compared schemes.

In Fig. 3, we show the total achievable rates and EEs of the considered schemes versus the average transmit power $\rho$ for $L=32$ , $K=8$ , $N_{r}=64$ , $N=8$ , and $\bar{n}=2$ . From Fig. 3, the following observations are noted:

•

The fixed-activation HBF scheme with $n_{l}=N$ achieves the highest total achievable rate, as seen in Fig. 3(a), because it activates all the available RF chains at all the APs. However, in this scheme, high power is consumed by the $LN$ RF chains. Therefore, its EE is significantly lower than those of the other considered schemes, in which only $L\bar{n}$ $(\ll LN)$ RF chains are turned on, as seen in Fig. 3(b).
•

Among the proposed ARFA schemes, the SC-ARFA achieves the highest achievable rate and EE. However, all of these schemes achieve remarkable improvement in EE with a small loss in the total achievable rate with respect to the fixed-activation HBF scheme with $n_{l}=N$ .
•

It is also observed that the proposed ARFA schemes outperform the AS scheme in terms of both the total achievable rate and EE. The APS scheme is energy-efficient, attaining an EE comparable to that of the SC-ARFA scheme in Fig. 3(b), but its achievable rate is lower than those of all the proposed ARFA schemes. The fixed-activation HBF scheme with $n_{l}=\bar{n}$ is outperformed by the proposed SC-ARFA scheme in both achievable rate and EE.

In Fig. 4, we show the total achievable rates, EEs, and power consumption of the considered schemes versus the number of APs. In this figure, we use the same simulation parameters as in Fig. 3, except for the varying numbers of APs, i.e., $L=\{8,16,32,48,64\}$ , and $\rho=40$ dBm. In Figs. 4(a) and 4(b), the observations on the achievable rates and EEs of the considered schemes are similar to those from Fig. 3. In particular, it is seen that in the entire range of $L$ , the proposed ARFA schemes have small losses in total achievable rate but significant improvement in EE with respect to the fixed-activation HBF scheme with $n_{l}=N$ . Furthermore, the proposed ARFA schemes perform better than or comparably to the APS scheme in terms of both achievable rate and EE. Moreover, the AS scheme is clearly less efficient than the proposed schemes in terms of both spectral and energy efficiency. To further explain the EEs, we consider the total power consumption of these schemes in Fig. 4(c). It can be seen that the total power consumption of the fixed-activation schemes quickly increases with $L$ . Therefore, activating all $N$ RF chains at all the APs causes an extremely high power consumption for the CF mmWave massive MIMO system, motivating the ARFA in this work. Among the other schemes, the AS scheme consumes the highest power while achieving the lowest rates, making it energy-inefficient, as seen in Fig. 4(b). The proposed SC-ARFA and APS schemes have comparable and low power consumption.

To summarize, we can conclude from Figs. 3 and 4 that when a limited number of RF chains are used, equally activating the same number of RF chains at the APs, as in the fixed-activation HBF scheme with $n_{l}=N$ or $n_{l}=\bar{n}$ , is relatively energy-inefficient. Furthermore, although the APS scheme is an energy-efficient approach, it has losses in total achievable rate. In contrast, the proposed ARFA schemes achieve the highest or nearly highest performance in terms of both spectral and energy efficiency, which are both far better than those of the AS scheme.

VI-D Tradeoff between achievable rates and power consumption

The total achievable rate and power consumption of the considered schemes versus $\bar{n}$ are evaluated numerically in Fig. 5 for $L=32$ , $K=8$ , $N_{r}=64$ , $N=8$ , $\bar{n}=\{1,2,\ldots,8\}$ , and $\rho=40$ dBm. From Fig. 5, the following observations can be noted:

•

The total achievable rate and power consumption of the fixed-activation HBF with $n_{l}=N$ and those of the AS scheme remain unchanged with $\bar{n}$ because $N$ and $N_{r}^{\text{AS}}$ RF chains, respectively, are always active at every AP. In contrast, those of the other schemes depend on $\bar{n}$ . Specifically, as $\bar{n}$ increases, both the total achievable rate and power consumption of the fixed-activation HBF scheme with $n_{l}=\bar{n}$ , the proposed ARFA, and the APS schemes increase to approach those of the fixed-activation HBF with $n_{l}=N$ .
•

In Fig. 5(a), the proposed ARFA schemes perform closest to the fixed-activation HBF scheme with $n_{l}=N$ , especially for a small $\bar{n}$ . In terms of power consumption, they require slightly higher power than the APS scheme. However, the EEs of SC-ARFA and APS are comparable, as shown in Figs. 3 and 4, owing to the efficient use of RF chains. Furthermore, for $\bar{n}\geq 2$ , the AS scheme has the lowest power consumption at the cost of the smallest total achievable rate.
•

For the optimal performance–power consumption tradeoff in the assumed environment, $\bar{n}\in[2,4]$ can be chosen in the proposed ARFA schemes to achieve a significant power reduction with marginal performance loss. In particular, for $\bar{n}=4$ , the performance loss is only $1.4\%$ to $4.8\%$ . With $\bar{n}=1$ , only a single RF chain on average is turned on at each AP, and a significant loss in the achievable rate is observed for the proposed ARFA with respect to the fixed-activation HBF with $n_{l}=N$ .

VI-E Fronthauling load analysis

Table II: Comparison of the decentralized and semi-centralized schemes in terms of fronthauling load

Schemes	AP to CPU	CPU to AP
SC-HBF	$N_{r}K$ complex numbers	$N_{r}N$ real numbers
SC-ARFA	$N_{r}K$ complex numbers	$N_{r}\bar{n}$ real numbers
D-HBF	$K$ complex numbers	0
SV-based D-ARFA	$K$ complex numbers and $N$ real numbers	1 real number
PL-based D-ARFA	$K$ complex numbers	1 real number

In this section, we evaluate the amount of information exchange between the CPU and APs, which is presented in Table II. In the SC-HBF scheme, the CSI for $\hat{\textbf{{H}}}_{l}$ of size $N_{r}\times K$ is sent from the $l$ th AP to the CPU, which is used at the CPU to generate $\textbf{{F}}_{l}$ of size $N_{r}\times N$ . However, we note that all the entries of $\textbf{{F}}_{l}$ have constant amplitudes of $\frac{1}{\sqrt{N_{r}}}$ . Therefore, only $N_{r}N$ real numbers representing the phases are fed back on the reverse link. A similar analysis is valid for SC-ARFA, except that only an average of $N_{r}\bar{n}$ real numbers need to be fed back from the CPU to each AP because, in this scheme, only an average of $\bar{n}$ RF chains are activated at each AP. It is observed that the amount of information exchange between the CPU and APs is relatively large in the SC-HBF and SC-ARFA schemes.

In contrast, the information exchange in the decentralized schemes is small. Specifically, in the D-HBF scheme, only $K$ complex numbers representing the estimate of the transmitted signal, i.e., $\textbf{{r}}_{l}$ , are sent to the CPU for the final soft estimation. In the SV-based D-ARFA scheme, an additional $N$ real numbers representing the $N$ singular values are sent from an AP to the CPU for each channel variation to perform ARFA. In contrast, the transmission of path loss values in the PL-based D-ARFA scheme can be ignored because of their slow variations. On the reverse link from the CPU to an AP, only a single real number, which is the number of active RF chains, is fed back in the SV- and PL-based D-ARFA schemes, as demonstrated in Algorithms 3 and 4, respectively. Given that $N\ll N_{r}$ , the decentralized schemes require much less information exchange between the CPU and each AP compared to the semi-centralized schemes.
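To make the entries of Table II concrete, the following sketch evaluates them for the system size used in Section VI-C ( $N_{r}=64$ , $K=8$ , $N=8$ , $\bar{n}=2$ ). Counting each complex number as two real numbers is an assumption introduced here to put the schemes on a common scale. The function name is hypothetical, and the sketch does not model how often each quantity is exchanged.

```python
def fronthaul_load(n_r, k, n, n_bar):
    """Per-AP fronthaul load from Table II, expressed in real numbers.

    Assumption: one complex number counts as two real numbers.
    Returns a dict {scheme: (ap_to_cpu, cpu_to_ap)}.
    """
    return {
        "SC-HBF": (2 * n_r * k, n_r * n),
        "SC-ARFA": (2 * n_r * k, n_r * n_bar),   # n_bar is an average value
        "D-HBF": (2 * k, 0),
        "SV-based D-ARFA": (2 * k + n, 1),
        "PL-based D-ARFA": (2 * k, 1),
    }


loads = fronthaul_load(n_r=64, k=8, n=8, n_bar=2)
for scheme, (up, down) in loads.items():
    print(f"{scheme:16s} AP->CPU: {up:5d}   CPU->AP: {down:4d}")
```

For this configuration, the semi-centralized schemes send 1024 real numbers per AP on the AP-to-CPU link, whereas the decentralized schemes send at most 24.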

VII Conclusion

In this work, we propose two HBF schemes for CF mmWave massive MIMO systems, namely, SC-HBF and D-HBF, in which the analog combiners are generated at the CPU based on the global CSI and at each AP based on the local CSI, respectively. Notably, although the D-HBF requires substantially lower computational complexity and no information exchange between the CPU and APs, it achieves approximately the same total achievable rate as that obtained by the SC-HBF scheme. Furthermore, to reduce the power consumption in the CF mmWave massive MIMO system, we propose adaptive activation of RF chains at the APs. Low-complexity algorithms are developed to select the number of active RF chains at the APs such that the system’s power consumption is significantly reduced with only a marginal loss in the total achievable rate. The efficiency of the proposed schemes is verified by the simulation results, which show that the proposed ARFA scheme achieves significant improvement in EE while incurring only a small loss in total achievable rate. In this work, the assumption on the availability of the full CSI for all the antennas is adopted for the design of hybrid beamformers and ARFA schemes. Although such full CSI can be obtained by compressed sensing-based approaches, further studies are required to make it practical in CF mmWave massive MIMO systems. Further optimization of the proposed analog combining schemes for wideband systems can be considered for future research. Furthermore, the proposed ARFA scheme can be incorporated with low-resolution ADCs [51, 52] to further reduce the power consumption.

Appendix A Proof of Theorem 1

Let $\textbf{{Q}}=\textbf{{I}}_{K}+\gamma\textbf{{H}}^{H}\textbf{{F}}\textbf{{F}}^{H}\textbf{{H}}$ , which satisfies $\det\textbf{{Q}}=\det(\textbf{{I}}_{LN}+\gamma\textbf{{F}}^{H}\textbf{{H}}\textbf{{H}}^{H}\textbf{{F}})$ because $\det(\textbf{{I}}+\textbf{{A}}\textbf{{B}})=\det(\textbf{{I}}+\textbf{{B}}\textbf{{A}})$ . Because $\textbf{{F}}$ is a block-diagonal matrix, we have

$$\textbf{{H}}^{H}\textbf{{F}}=\left[\textbf{{H}}_{1}^{H}\textbf{{F}}_{1},\textbf{{H}}_{2}^{H}\textbf{{F}}_{2},\ldots,\textbf{{H}}_{L}^{H}\textbf{{F}}_{L}\right],$$

leading to $\textbf{{H}}^{H}\textbf{{F}}\textbf{{F}}^{H}\textbf{{H}}=\sum_{l=1}^{L}\textbf{{H}}_{l}^{H}\textbf{{F}}_{l}\textbf{{F}}_{l}^{H}\textbf{{H}}_{l}$ . Therefore, $\textbf{{Q}}$ can be expressed as $\textbf{{Q}}=\textbf{{I}}_{K}+\gamma\sum_{l=1}^{L}\textbf{{H}}_{l}^{H}\textbf{{F}}_{l}\textbf{{F}}_{l}^{H}\textbf{{H}}_{l}$ . By letting $\textbf{{G}}_{l}=\textbf{{H}}_{l}^{H}\textbf{{F}}_{l}\textbf{{F}}_{l}^{H}\textbf{{H}}_{l}$ , $\textbf{{Q}}$ can be further expanded as

$$\begin{aligned}\textbf{{Q}}&=\underbrace{\textbf{{I}}_{K}+\gamma\textbf{{G}}_{1}}_{\triangleq\textbf{{E}}_{1}}+\gamma\textbf{{G}}_{2}+\ldots+\gamma\textbf{{G}}_{L}=\textbf{{E}}_{1}(\underbrace{\textbf{{I}}_{K}+\gamma\textbf{{E}}_{1}^{-1}\textbf{{G}}_{2}}_{\triangleq\textbf{{E}}_{2}}+\ldots+\gamma\textbf{{E}}_{1}^{-1}\textbf{{G}}_{L})\\&=\textbf{{E}}_{1}\textbf{{E}}_{2}(\underbrace{\textbf{{I}}_{K}+\gamma\textbf{{E}}_{2}^{-1}\textbf{{E}}_{1}^{-1}\textbf{{G}}_{3}}_{\triangleq\textbf{{E}}_{3}}+\ldots+\gamma\textbf{{E}}_{2}^{-1}\textbf{{E}}_{1}^{-1}\textbf{{G}}_{L})=\ldots=\textbf{{E}}_{1}\textbf{{E}}_{2}\ldots\textbf{{E}}_{L},\end{aligned}\tag{31}$$

where $\textbf{{E}}_{l}=\textbf{{I}}_{K}+\gamma(\textbf{{E}}_{1}\ldots\textbf{{E}}_{l-1})^{-1}\textbf{{G}}_{l}$ , $l=2,3,\ldots,L$ . As a result, $\bar{\mathcal{R}}^{\mathrm{a}}$ can be expressed as

\begin{align}\bar{\mathcal{R}}^{\mathrm{a}}&=\log_{2}\det\textbf{{Q}}=\sum_{l=1}^{L}\log_{2}\det(\textbf{{E}}_{l})=\sum_{l=1}^{L}\log_{2}\det(\textbf{{I}}_{K}+\gamma(\underbrace{\textbf{{E}}_{1}\ldots\textbf{{E}}_{l-1}}_{\triangleq\textbf{{Q}}_{l-1}})^{-1}\textbf{{G}}_{l})\tag{32}\\&=\sum_{l=1}^{L}\log_{2}\det\left(\textbf{{I}}_{K}+\gamma\textbf{{Q}}_{l-1}^{-1}\textbf{{H}}_{l}^{H}\textbf{{F}}_{l}\textbf{{F}}_{l}^{H}\textbf{{H}}_{l}\right)=\sum_{l=1}^{L}\log_{2}\det\left(\textbf{{I}}_{N}+\gamma\textbf{{F}}_{l}^{H}\textbf{{H}}_{l}\textbf{{Q}}_{l-1}^{-1}\textbf{{H}}_{l}^{H}\textbf{{F}}_{l}\right),\tag{33}\end{align}

as given in Theorem 1. The last equality in (33) follows from $\det(\textbf{{I}}_{K}+\textbf{{A}}\textbf{{B}})=\det(\textbf{{I}}_{N}+\textbf{{B}}\textbf{{A}})$ with $\textbf{{A}}=\textbf{{Q}}_{l-1}^{-1}\textbf{{H}}_{l}^{H}\textbf{{F}}_{l}\in\mathbb{C}^{K\times N}$ and $\textbf{{B}}=\textbf{{F}}_{l}^{H}\textbf{{H}}_{l}\in\mathbb{C}^{N\times K}$ . Furthermore, from the definition of $\textbf{{Q}}_{l-1}$ in (32), we have $\textbf{{E}}_{l-1}=\textbf{{I}}_{K}+\gamma(\textbf{{E}}_{1}\ldots\textbf{{E}}_{l-2})^{-1}\textbf{{G}}_{l-1}=\textbf{{I}}_{K}+\gamma\textbf{{Q}}_{l-2}^{-1}\textbf{{G}}_{l-1}$ . Finally, recalling that $\textbf{{G}}_{l}=\textbf{{H}}_{l}^{H}\textbf{{F}}_{l}\textbf{{F}}_{l}^{H}\textbf{{H}}_{l}$ , we obtain the expression of $\textbf{{Q}}_{l-1}$ in (12), i.e.,

$$\textbf{{Q}}_{l-1}=\textbf{{Q}}_{l-2}\textbf{{E}}_{l-1}=\textbf{{Q}}_{l-2}(\textbf{{I}}_{K}+\gamma\textbf{{Q}}_{l-2}^{-1}\textbf{{G}}_{l-1})=\textbf{{Q}}_{l-2}+\gamma\textbf{{H}}_{l-1}^{H}\textbf{{F}}_{l-1}\textbf{{F}}_{l-1}^{H}\textbf{{H}}_{l-1},\tag{34}$$

with $\textbf{{Q}}_{0}=\textbf{{I}}_{K}$ , which completes the proof.
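The chain of equalities in (31)–(34) can be checked numerically. The sketch below is only an illustration under assumed values: the sizes, $\gamma$ , the i.i.d. Gaussian channels, and the random-phase combiners are arbitrary choices rather than the channel model or the beamformers of the paper. It compares the stacked block-diagonal form, the $K\times K$ sum form, and the per-AP decomposition with the recursion $\textbf{{Q}}_{l}=\textbf{{Q}}_{l-1}+\gamma\textbf{{G}}_{l}$ .

```python
import numpy as np

rng = np.random.default_rng(0)
L, K, N_r, N, gamma = 4, 3, 8, 2, 0.5   # small illustrative sizes (assumed)


def log2det(A):
    """log2 of |det(A)| via slogdet; the determinants here are real and positive."""
    return np.linalg.slogdet(A)[1] / np.log(2)


# Random channels H_l (N_r x K) and constant-modulus combiners F_l (N_r x N).
H = [(rng.standard_normal((N_r, K)) + 1j * rng.standard_normal((N_r, K))) / np.sqrt(2)
     for _ in range(L)]
F = [np.exp(1j * rng.uniform(0, 2 * np.pi, (N_r, N))) / np.sqrt(N_r) for _ in range(L)]
G = [H[l].conj().T @ F[l] @ F[l].conj().T @ H[l] for l in range(L)]   # each K x K

# (a) Stacked form with block-diagonal F: log2 det(I_LN + gamma F^H H H^H F).
H_all = np.vstack(H)
F_all = np.zeros((L * N_r, L * N), dtype=complex)
for l in range(L):
    F_all[l * N_r:(l + 1) * N_r, l * N:(l + 1) * N] = F[l]
B_all = F_all.conj().T @ H_all
rate_stacked = log2det(np.eye(L * N) + gamma * B_all @ B_all.conj().T)

# (b) K x K form: log2 det(I_K + gamma * sum_l G_l).
rate_sum = log2det(np.eye(K) + gamma * sum(G))

# (c) Per-AP decomposition (33) with the recursion (34), starting from Q_0 = I_K.
Q = np.eye(K, dtype=complex)
rate_per_ap = 0.0
for l in range(L):
    B = F[l].conj().T @ H[l]                                  # N x K
    rate_per_ap += log2det(np.eye(N) + gamma * B @ np.linalg.solve(Q, B.conj().T))
    Q = Q + gamma * G[l]

print(rate_stacked, rate_sum, rate_per_ap)
```

The three printed values should agree up to floating-point error, because each step of the proof is an exact determinant identity rather than an approximation.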

Appendix B Proof of Remark 1

From (34), $\textbf{{Q}}_{l-1}$ in (11) can be expressed as

$$\textbf{{Q}}_{l-1}=\textbf{{I}}_{K}+\gamma\sum_{i=1}^{l-1}\textbf{{H}}_{i}^{H}\textbf{{F}}_{i}\textbf{{F}}_{i}^{H}\textbf{{H}}_{i},\quad l=2,\ldots,L.\tag{35}$$

B-1 When $l$ is small

With the assumption of very low SNRs in CF mmWave massive MIMO, we have $\textbf{{Q}}_{l-1}\approx\textbf{{I}}_{K}$ for small $l$ , leading to

$$\bar{\mathcal{R}}^{\mathrm{a}}_{l}\approx\tilde{\mathcal{R}}^{\mathrm{a}}_{l}=\log_{2}\det\left(\textbf{{I}}_{N}+\gamma\textbf{{F}}_{l}^{H}\textbf{{H}}_{l}\textbf{{H}}_{l}^{H}\textbf{{F}}_{l}\right),\tag{36}$$

where $\bar{\mathcal{R}}^{\mathrm{a}}_{l}$ is the sub-rate associated with the $l$ th AP in SC-HBF, given in (11). The unconstrained combiner that maximizes $\tilde{\mathcal{R}}^{\mathrm{a}}_{l}$ in (36) is the matrix with columns being the $N$ left-singular vectors corresponding to the $N$ largest singular values of $\textbf{{H}}_{l}$ . As a result, the analog combining vectors in the D-HBF scheme can be determined as in (16).
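The claim about the unconstrained maximizer of (36) can be illustrated numerically. The sketch below assumes i.i.d. Gaussian channel entries and arbitrary sizes, and it interprets "unconstrained" as a combiner with orthonormal columns but no constant-modulus constraint. It does not implement the phase-constrained combiner of (16).

```python
import numpy as np

rng = np.random.default_rng(1)
N_r, K, N, gamma = 16, 4, 2, 0.1   # illustrative values (assumed)

# Random stand-in for the local channel H_l (N_r x K).
H_l = (rng.standard_normal((N_r, K)) + 1j * rng.standard_normal((N_r, K))) / np.sqrt(2)


def sub_rate(F):
    """log2 det(I_N + gamma * F^H H_l H_l^H F), i.e., the approximation in (36)."""
    M = np.eye(F.shape[1]) + gamma * F.conj().T @ H_l @ H_l.conj().T @ F
    return np.linalg.slogdet(M)[1] / np.log(2)


# Candidate optimum: the N dominant left-singular vectors of H_l.
U, s, Vh = np.linalg.svd(H_l)      # singular values are returned in descending order
F_svd = U[:, :N]

# Baseline: a random matrix with orthonormal columns, obtained via QR.
Q_rand, _ = np.linalg.qr(rng.standard_normal((N_r, N)) + 1j * rng.standard_normal((N_r, N)))

rate_svd = sub_rate(F_svd)
rate_rand = sub_rate(Q_rand)
closed_form = np.sum(np.log2(1 + gamma * s[:N] ** 2))
print(rate_svd, closed_form, rate_rand)
```

With the singular-vector combiner, the sub-rate reduces to $\sum_{i=1}^{N}\log_{2}(1+\gamma\sigma_{i}^{2})$ , where $\sigma_{i}$ denotes the $i$ th largest singular value of $\textbf{{H}}_{l}$ , and no other combiner with orthonormal columns can exceed this value.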

B-2 As $l$ increases

Because $\textbf{{F}}_{i}$ only depends on $\textbf{{H}}_{i}$ for small $l$ , $\{\textbf{{H}}_{i}^{H}\textbf{{F}}_{i}\textbf{{F}}_{i}^{H}\textbf{{H}}_{i}\},i=1,\ldots,l-1$ are independent of each other. As $l$ grows and becomes sufficiently large, by the law of large numbers, we have $\sum_{i=1}^{l-1}\textbf{{H}}_{i}^{H}\textbf{{F}}_{i}\textbf{{F}}_{i}^{H}\textbf{{H}}_{i}\rightarrow(l-1)\bar{\textbf{{E}}}$ , where $\bar{\textbf{{E}}}=\mathbb{E}\left\{\textbf{{H}}_{i}^{H}\textbf{{F}}_{i}\textbf{{F}}_{i}^{H}\textbf{{H}}_{i}\right\}$ has constant diagonal elements and zeros for off-diagonal elements. Therefore, $\textbf{{Q}}_{l-1}$ in (35) becomes approximately diagonal even when $l$ is large, as does $\textbf{{Q}}_{l-1}^{-1}$ .

Based on the ordered singular value decomposition, $\textbf{{H}}_{l}$ can be factorized as $\textbf{{H}}_{l}=\textbf{{U}}\mathbf{\Sigma}\textbf{{V}}^{H}$ , where $\mathbf{\Sigma}$ is an $N_{r}\times K$ rectangular diagonal matrix with the singular values of $\textbf{{H}}_{l}$ on the main diagonal in decreasing order, whereas U and V are unitary matrices of size $N_{r}\times N_{r}$ and $K\times K$ , whose columns are the left- and right-singular vectors of $\textbf{{H}}_{l}$ , respectively. Then, $\bar{\mathcal{R}}_{l}^{\mathrm{a}}$ in (11) can be expressed as

$$\bar{\mathcal{R}}_{l}^{\mathrm{a}}=\log_{2}\det(\textbf{{I}}_{N}+\gamma\textbf{{F}}_{l}^{H}\textbf{{U}}\underbrace{\mathbf{\Sigma}\textbf{{V}}^{H}\textbf{{Q}}_{l-1}^{-1}\textbf{{V}}\mathbf{\Sigma}^{H}}_{\triangleq\mathbf{\Lambda}}\textbf{{U}}^{H}\textbf{{F}}_{l}).\tag{37}$$

Because $\textbf{{Q}}_{l-1}^{-1}$ is approximately a diagonal matrix with constant diagonal elements, as shown above, $\mathbf{\Lambda}=\mathbf{\Sigma}\textbf{{V}}^{H}\textbf{{Q}}_{l-1}^{-1}\textbf{{V}}\mathbf{\Sigma}^{H}$ becomes approximately a diagonal matrix, and its diagonal elements are in decreasing order. Therefore, the solution that maximizes $\bar{\mathcal{R}}_{l}^{\mathrm{a}}=\log_{2}\det(\textbf{{I}}_{N}+\gamma\textbf{{F}}_{l}^{H}\textbf{{U}}\mathbf{\Lambda}\textbf{{U}}^{H}\textbf{{F}}_{l})$ over $\textbf{{F}}_{l}$ is approximately the matrix with columns being the first $N$ columns of $\textbf{{U}}$ , which are the left-singular vectors corresponding to the $N$ largest singular values of $\textbf{{H}}_{l}$ , implying the analog combining vectors given in (16) for D-HBF. This completes the proof.

References

[1] N. T. Nguyen and K. Lee, “Coverage and Cell-Edge Sum-Rate Analysis of mmWave Massive MIMO Systems With ORP Schemes and MMSE Receivers,” IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5349–5363, 2018.
[2] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S. Rappaport, and E. Erkip, “Millimeter wave channel modeling and cellular capacity evaluation,” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1164–1179, 2014.
[3] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO: Uniformly great service for everyone,” in IEEE 16th Int. Workshop Signal Process. Advances in Wireless Commun. (SPAWC), 2015, pp. 201–205.
[4] ——, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, 2017.
[5] G. Femenias and F. Riera-Palou, “Cell-Free Millimeter-Wave Massive MIMO Systems With Limited Fronthaul Capacity,” IEEE Access, vol. 7, pp. 44 596–44 612, 2019.
[6] G. Interdonato, H. Q. Ngo, P. Frenger, and E. G. Larsson, “Downlink training in cell-free massive MIMO: A blessing in disguise,” IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5153–5169, 2019.
[7] A. Papazafeiropoulos, P. Kourtessis, M. Di Renzo, S. Chatzinotas, and J. M. Senior, “Performance Analysis of Cell-Free Massive MIMO Systems: A Stochastic Geometry Approach,” IEEE Trans. Veh. Tech., vol. 69, no. 4, pp. 3523–3537, 2020.
[8] E. Nayebi, A. Ashikhmin, T. L. Marzetta, and H. Yang, “Cell-free massive MIMO systems,” in IEEE Asilomar Conf. Signals, Systems and Computers, 2015, pp. 695–699.
[9] E. Björnson and L. Sanguinetti, “Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,” IEEE Trans. Wireless Commun., 2019.
[10] L. D. Nguyen, T. Q. Duong, H. Q. Ngo, and K. Tourki, “Energy efficiency in cell-free massive MIMO with zero-forcing precoding design,” IEEE Commun. Lett., vol. 21, no. 8, pp. 1871–1874, 2017.
[11] H. Q. Ngo, L.-N. Tran, T. Q. Duong, M. Matthaiou, and E. G. Larsson, “On the total energy efficiency of cell-free massive MIMO,” IEEE Trans. Green Commun. and Network., vol. 2, no. 1, pp. 25–39, 2017.
[12] M. Bashar, K. Cumanan, A. G. Burr, H. Q. Ngo, E. G. Larsson, and P. Xiao, “Energy efficiency of the cell-free massive MIMO uplink with optimal uniform quantization,” IEEE Trans. Green Commun. Networking, vol. 3, no. 4, pp. 971–987, 2019.
[13] G. Interdonato, P. Frenger, and E. G. Larsson, “Scalability aspects of cell-free massive MIMO,” in IEEE Int. Conf. Commun. (ICC), 2019, pp. 1–6.
[14] E. Björnson and L. Sanguinetti, “Scalable cell-free massive MIMO systems,” IEEE Trans. Commun., 2020.
[15] S. E. Hajri, J. Denis, and M. Assaad, “Enhancing Favorable Propagation in Cell-Free Massive MIMO Through Spatial User Grouping,” in IEEE Int. Workshop Signal Process. Advances Wireless Commun. (SPAWC), 2018, pp. 1–5.
[16] X. Hu, C. Zhong, X. Chen, W. Xu, H. Lin, and Z. Zhang, “Cell-free massive MIMO systems with low resolution ADCs,” IEEE Trans. Communications, vol. 67, no. 10, pp. 6844–6857, 2019.
[17] E. Nayebi, A. Ashikhmin, T. L. Marzetta, H. Yang, and B. D. Rao, “Precoding and power optimization in cell-free massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 7, pp. 4445–4459, 2017.
[18] J. Zhang, Y. Wei, E. Björnson, Y. Han, and S. Jin, “Performance analysis and power control of cell-free massive MIMO systems with hardware impairments,” IEEE Access, vol. 6, pp. 55 302–55 314, 2018.
[19] S. Buzzi and C. D’Andrea, “Cell-free massive MIMO: User-centric approach,” IEEE Wireless Commun. Lett., vol. 6, no. 6, pp. 706–709, 2017.
[20] S.-H. Park, O. Simeone, Y. C. Eldar, and E. Erkip, “Optimizing Pilots and Analog Processing for Channel Estimation in Cell-Free Massive MIMO with One-Bit ADCs,” in IEEE Int. Workshop Signal Process. Advances Wireless Commun. (SPAWC), 2018, pp. 1–5.
[21] J. Li, D.-W. Yue, and Y. Sun, “Performance analysis of millimeter wave massive MIMO systems in centralized and distributed schemes,” IEEE Access, vol. 6, pp. 75 482–75 494, 2018.
[22] M. Alonzo and S. Buzzi, “Cell-free and user-centric massive MIMO at millimeter wave frequencies,” in IEEE Annual Int. Symposium Personal, Indoor, and Mobile Radio Communications (PIMRC), 2017, pp. 1–5.
[23] M. Alonzo, S. Buzzi, A. Zappone, and C. D’Elia, “Energy-Efficient Power Control in Cell-Free and User-Centric Massive MIMO at Millimeter Wave,” IEEE Trans. Green Commun. Network., 2019.
[24] N. T. Nguyen and K. Lee, “Unequally Sub-connected Architecture for Hybrid Beamforming in Massive MIMO Systems,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 1127–1140, Feb. 2020.
[25] H. Liu, J. Zhang, X. Zhang, A. Kurniawan, T. Juhana, and B. Ai, “Tabu-search-based pilot assignment for cell-free massive MIMO systems,” IEEE Trans. Veh. Tech., vol. 69, no. 2, pp. 2286–2290, 2019.
[26] Y. Jin, J. Zhang, S. Jin, and B. Ai, “Channel estimation for cell-free mmWave massive MIMO through deep learning,” IEEE Trans. Veh. Technol., vol. 68, no. 10, pp. 10 325–10 329, 2019.
[27] Z. Chen and E. Björnson, “Channel hardening and favorable propagation in cell-free massive MIMO with stochastic geometry,” IEEE Trans. Commun., vol. 66, no. 11, pp. 5205–5219, 2018.
[28] T.-H. Tai, W.-H. Chung, and T.-S. Lee, “A low complexity antenna selection algorithm for energy efficiency in massive MIMO systems,” in IEEE Int. Conf. Data Science and Data Intensive Systems, 2015, pp. 284–289.
[29] Z. Liu, W. Du, and D. Sun, “Energy and spectral efficiency tradeoff for massive MIMO systems with transmit antenna selection,” IEEE Trans. Veh. Tech., vol. 66, no. 5, pp. 4453–4457, 2016.
[30] X. Gao, L. Dai, and A. M. Sayeed, “Low RF-complexity technologies to enable millimeter-wave MIMO with large antenna array for 5G wireless communications,” IEEE Commun. Mag., vol. 56, no. 4, pp. 211–217, 2018.
[31] A. Kaushik, J. Thompson, E. Vlachos, C. Tsinos, and S. Chatzinotas, “Dynamic RF Chain Selection for Energy Efficient and Low Complexity Hybrid Beamforming in Millimeter Wave MIMO Systems,” IEEE Trans. Green Commun. Networking, vol. 3, no. 4, pp. 886–900, 2019.
[32] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, 2014.
[33] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499–1513, 2014.
[34] S. Han, I. Chih-Lin, Z. Xu, and C. Rowell, “Large-scale antenna systems with hybrid analog and digital beamforming for millimeter wave 5G,” IEEE Commun. Mag., vol. 53, no. 1, pp. 186–194, 2015.
[35] Y.-Y. Lee, C.-H. Wang, and Y.-H. Huang, “A hybrid RF/baseband precoding processor based on parallel-index-selection matrix-inversion-bypass simultaneous orthogonal matching pursuit for millimeter wave MIMO systems,” IEEE Trans. Signal Process., vol. 63, no. 2, pp. 305–317, 2015.
[36] T. S. Rappaport, G. R. MacCartney, M. K. Samimi, and S. Sun, “Wideband millimeter-wave propagation measurements and channel models for future wireless communication system design,” IEEE Trans. Commun., vol. 63, no. 9, pp. 3029–3056, 2015.
[37] T. S. Rappaport, E. Ben-Dor, J. N. Murdock, and Y. Qiao, “38 GHz and 60 GHz angle-dependent propagation for cellular & peer-to-peer wireless communications,” in IEEE ICC, 2012, pp. 4568–4573.
[38] A. Alkhateeb, G. Leus, and R. W. Heath, “Compressed sensing based multi-user millimeter wave systems: How many measurements are needed?” in IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), 2015, pp. 2909–2913.
[39] F. Sohrabi and W. Yu, “Hybrid digital and analog beamforming design for large-scale antenna arrays,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 501–513, 2016.
[40] X. Gao, L. Dai, S. Han, I. Chih-Lin, and R. W. Heath, “Energy-efficient hybrid analog and digital precoding for mmWave MIMO systems with large antenna arrays,” IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 998–1009, 2016.
[41] B. Dai and W. Yu, “Energy efficiency of downlink transmission strategies for cloud radio access networks,” IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 1037–1050, 2016.
[42] E. Björnson, L. Sanguinetti, J. Hoydis, and M. Debbah, “Optimal design of energy-efficient multi-user MIMO systems: Is massive MIMO the answer?” IEEE Trans. Wireless Commun., vol. 14, no. 6, pp. 3059–3075, 2015.
[43] S. Payami, N. M. Balasubramanya, C. Masouros, and M. Sellathurai, “Phase shifters versus switches: An energy efficiency perspective on hybrid beamforming,” IEEE Wireless Commun. Lett., vol. 8, no. 1, pp. 13–16, 2018.
[44] K. Roth and J. A. Nossek, “Achievable rate and energy efficiency of hybrid and digital beamforming receivers with low resolution ADC,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 2056–2068, 2017.
[45] K. Roth, H. Pirzadeh, A. L. Swindlehurst, and J. A. Nossek, “A comparison of hybrid beamforming and digital beamforming with low-resolution ADCs for multiple users and imperfect CSI,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 3, pp. 484–498, 2018.
[46] M. Bashar, H. Q. Ngo, K. Cumanan, A. G. Burr, P. Xiao, E. Björnson, and E. G. Larsson, “Uplink Spectral and Energy Efficiency of Cell-Free Massive MIMO with Optimal Uniform Quantization,” IEEE Trans. Commun., 2020.
[47] A. Alkhateeb, G. Leus, and R. W. Heath, “Limited feedback hybrid precoding for multi-user millimeter wave systems,” IEEE Trans. Wireless Commun., vol. 14, no. 11, pp. 6481–6494, 2015.
[48] T. E. Bogale, L. B. Le, A. Haghighat, and L. Vandendorpe, “On the number of RF chains and phase shifters, and scheduling design with hybrid analog–digital beamforming,” IEEE Trans. Wireless Commun., vol. 15, no. 5, pp. 3311–3326, 2016.
[49] Y. Han, S. Jin, J. Zhang, J. Zhang, and K.-K. Wong, “DFT-based hybrid beamforming multiuser systems: Rate analysis and beam selection,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 3, pp. 514–528, 2018.
[50] D. Yang, L.-L. Yang, and L. Hanzo, “DFT-based beamforming weight-vector codebook design for spatially correlated channels in the unitary precoding aided multiuser downlink,” in IEEE Int. Conf. Commun. (ICC), 2010, pp. 1–5.
[51] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and D. J. Love, “Prospective multiple antenna technologies for beyond 5G,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1637–1660, 2020.
[52] J. Zhang, L. Dai, X. Li, Y. Liu, and L. Hanzo, “On low-resolution ADCs in practical 5G millimeter-wave massive MIMO systems,” IEEE Commun. Mag., vol. 56, no. 7, pp. 205–211, 2018.
