Hybrid Beamforming and Adaptive RF Chain Activation for Uplink Cell-Free Millimeter-Wave Massive MIMO Systems

Nhan Thanh Nguyen, Kyungchun Lee, and Huaiyu Dai

N. T. Nguyen and K. Lee are with the Department of Electrical and Information Engineering and the Research Center for Electrical and Information Technology, Seoul National University of Science and Technology, 232 Gongneung-ro, Nowon-gu, Seoul, 01811, Republic of Korea (e-mail: nhan.nguyen, kclee@seoultech.ac.kr). H. Dai is with the Department of Electrical and Computer Engineering, North Carolina State University, NC, USA (e-mail: Huaiyu_Dai@ncsu.edu).

Abstract

In this work, we investigate hybrid analog–digital beamforming (HBF) architectures for uplink cell-free (CF) millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems. We first propose two HBF schemes, namely, decentralized HBF (D-HBF) and semi-centralized HBF (SC-HBF). In the former, both the digital and analog beamformers are generated independently at each AP based on the local channel state information (CSI). In contrast, in the latter, only the digital beamformer is obtained locally at the AP, whereas the analog beamforming matrix is generated at the central processing unit (CPU) based on the global CSI received from all APs. We show that the analog beamformers generated in these two HBF schemes provide approximately the same achievable rates, even though D-HBF has lower complexity and requires no global CSI. Furthermore, to reduce the power consumption, we propose a novel adaptive radio frequency (RF) chain-activation (ARFA) scheme, which dynamically activates/deactivates RF chains and their connected analog-to-digital converters (ADCs) and phase shifters (PSs) at the APs based on the CSI. For the activation of RF chains, low-complexity algorithms are proposed, which can achieve significant improvement in energy efficiency (EE) with only a marginal loss in the total achievable rate.

Index Terms:
Cell-free massive MIMO, mmWave communication, hybrid beamforming, RF chain activation.

I Introduction

Recently, many attempts have been made to utilize millimeter waves (mmWaves) for high-data-rate mobile broadband communications. The main challenge of mmWave communication is the large path loss due to high carrier frequencies, which significantly limits the system performance and cell coverage [1, 2]. Fortunately, the short wavelength of mmWave systems facilitates the deployment of massive multiple-input multiple-output (MIMO) systems, which can provide large beamforming gains to compensate for the path loss in mmWave channels. Furthermore, Ngo et al. in [3, 4] introduce a cell-free (CF) massive MIMO architecture, in which a very large number of distributed access points (APs) connected to a single central processing unit (CPU) simultaneously serve a much smaller number of users over the same time/frequency resources. Therefore, the CF massive MIMO system is capable of providing good quality of service (QoS) uniformly to all served users, regardless of their locations in the coverage area, by using simple linear signal processing schemes [3, 4, 5]. From these aspects, the combination of mmWave and CF massive MIMO systems with the deployment of large numbers of antennas at the APs could be a symbiotic convergence of technologies that can significantly improve the performance of next-generation wireless communication systems [5].

I-A Related works

The performance of CF massive MIMO systems in conventional sub-6-GHz frequency bands has been analyzed intensively [3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. In particular, the closed-form expression of the achievable rate of CF massive MIMO systems is derived in [3, 4, 6, 7]. Furthermore, comparisons of CF massive MIMO and conventional small-cell MIMO systems in [3, 4, 8, 9, 7] show that the former is more robust to shadow fading correlation and significantly outperforms the latter in terms of throughput and coverage probability. In [10], a low-complexity power control technique with zero-forcing (ZF) precoding design is introduced for CF massive MIMO systems. In [11], an optimal power-allocation algorithm is proposed to maximize the total EE, which can double the total EE compared to the equal power control scheme. Furthermore, an AP selection (APS) scheme is proposed in [11], in which each user chooses and connects to only a subset of APs to reduce the power consumption caused by the backhaul links. Meanwhile, in [12], the EE maximization problem is considered under the effect of quantization distortion of the weighted received signals at the APs. In [13, 14], the authors propose AP-clustering approaches to overcome limitations on the scalability of CF massive MIMO systems. Specifically, the APs are grouped in clusters and multiple CPUs are used to manage these clusters, leading to a reduction in the data distribution and computational complexity involved in channel estimation, power control, and beamforming.

Another line of work has attempted to investigate the performance of CF massive MIMO systems in mmWave channels [21, 22, 23, 5]. In particular, Femenias et al. introduced a hybrid beamforming (HBF) framework for CF mmWave massive MIMO systems with limited fronthaul capacity in [5], and the eigen-beamforming scheme is applied to generate precoders/combiners. Specifically, the phases of analog beamformers are obtained by quantizing those of dominant eigenvectors of the channel covariance matrix known at the APs. In [21], a hybrid precoding algorithm leveraging antenna-array response vectors is applied to distributed MIMO systems with partially-connected HBF architecture. Although the partially-connected HBF structure has lower power consumption than the fully-connected one, it cannot fully exploit the beamforming gains [24]. In [22] and [23], Alonzo et al. introduce an uplink multi-user estimation scheme along with low-complexity HBF architectures. Specifically, the baseband and analog precoders at each AP are generated by decomposing the fully digital ZF precoding matrix using the block-coordinate descent algorithm. Moreover, the problems of pilot assignment and channel estimation are considered in [25] and [26], respectively.

In mmWave communications, a large number of antennas at the APs not only provide beamforming gains to compensate for the propagation loss in mmWave channels, but also enhance favorable propagation [27]. In particular, it is shown in [27] that given a fixed total number of antennas at the APs, employing more antennas at fewer APs is more beneficial than deploying more APs with fewer antennas in terms of favorable propagation and channel hardening. However, this deployment requires excessively high power consumption because signal processing in the conventional digital domain requires a dedicated RF chain and analog-to-digital converter (ADC) for each antenna. Therefore, energy-efficient HBF schemes dedicated to CF mmWave massive MIMO systems are required, but only a few works in the literature focus on the optimization of HBF for CF mmWave massive MIMO systems. Specifically, in [21, 22, 5, 23], the analog beamformers are all separately generated at the APs based on the local CSI, which is similar to HBF schemes for small-cell mmWave massive MIMO systems when each AP in the CF mmWave massive MIMO system is considered as a base station in the small-cell system. The antenna-selection (AS) schemes, which are proposed for the transceiver with a limited number of RF chains in [28, 29], can be applied to CF mmWave massive MIMO systems to reduce power consumption. However, they can cause performance degradation, especially for HBF in the highly correlated channels of mmWave communication [30].

I-B Contributions

In this work, we investigate the HBF for uplink CF mmWave massive MIMO systems in two scenarios: the global CSI of all APs is available or unavailable at the CPU. Then, we propose an adaptive RF chain-activation (ARFA) scheme, which provides considerable power reduction while nearly maintaining the system’s total achievable rate; thus, the EE is remarkably improved. Our specific contributions can be summarized as follows:

  • We first propose the decentralized HBF (D-HBF) and semi-centralized HBF (SC-HBF) schemes. Both have digital beamformers generated at the AP, but their difference lies in the analog beamformer. Specifically, in SC-HBF, the analog beamforming matrices for all APs are generated at the CPU based on the global CSI. In contrast, that of the D-HBF is obtained at each AP based only on the local CSI. By exploiting the global CSI to jointly optimize the analog combiners at the CPU, SC-HBF is expected to outperform D-HBF. However, our analytical and numerical results show that D-HBF can perform approximately the same as SC-HBF while requiring substantially lower computational complexity and no global CSI.

  • In CF mmWave massive MIMO systems with $L$ APs and $N$ RF chains at each AP, the power consumption is approximately proportional to $LN$. Because $L$ is large in CF massive MIMO systems, the total power consumption can be excessively high. To overcome this challenge, we propose an ARFA scheme. In this scheme, the RF chains are selectively activated at the APs based on partial CSI, and the number of active RF chains at the APs is optimized so that the proposed scheme can significantly reduce the total power consumption while causing only marginal performance loss. Our numerical analysis reveals that in CF mmWave massive MIMO systems, the proposed ARFA scheme with a relatively small number of active RF chains can exhibit performance comparable to that of the conventional fixed-activation HBF scheme, which activates all the available RF chains. As a result, a considerable improvement in the EE is achieved.

  • In the proposed ARFA scheme, high computational complexity is required to find the optimal numbers of active RF chains at numerous APs with an exhaustive search. To reduce complexity, we propose low-complexity near-optimal algorithms for the ARFA with SC-HBF. Furthermore, ARFA is incorporated with D-HBF in the proposed D-ARFA schemes, yielding singular value-based and path loss-based D-ARFA schemes, wherein the CPU requires a very limited amount of information from the APs. Our simulation results show that the proposed algorithms perform very close to the conventional HBF scheme, in which all the available RF chains are turned on.

We note that an RF chain-selection (RFS) scheme is introduced in [31] for the conventional small-cell mmWave massive MIMO system. The RFS scheme and our proposed ARFA scheme are similar in exploiting a reduced number of RF chains for power reduction. However, the contributions in our work are novel in the following aspects. First, the number of active RF chains at the base station in the conventional small-cell system, which is a single integer, is optimized in the RFS scheme by solving the EE-optimization problem [31]. In contrast, we consider the CF mmWave massive MIMO system, and the numbers of active RF chains at numerous APs are jointly optimized by maximizing the system’s total achievable throughput. This results in unequal numbers of active RF chains at the APs, and an AP can even deactivate all RF chains, leading to significant power reduction. Second, [31] focuses on optimizing the number of RF chains at the transmitter rather than at the receiver to avoid the nontrivial integer programming problem. Our work fills this gap by optimizing the numbers of active RF chains at the receivers, which are the APs in the uplink of CF mmWave massive MIMO systems. Because of these systematic differences, the algorithm presented in [31] cannot be leveraged for our proposed ARFA scheme, which thus requires novel algorithms as presented in Section IV.

The remainder of this paper is organized as follows: Section II introduces the system and channel models, whereas Section III describes the D-HBF and SC-HBF schemes. In Section IV, low-complexity ARFA algorithms are presented, and the power consumption of the proposed ARFA scheme is analyzed in Section V. Section VI presents simulation results, and the conclusion follows in Section VII.

II System Model

We consider the uplink of a CF mmWave massive MIMO system, where $L$ APs, each equipped with $N_{r}$ receive antennas and $N$ $(\leq N_{r})$ RF chains, and $K$ single-antenna user equipments (UEs) are distributed in a large area. All APs are connected simultaneously to a CPU via fronthaul links and jointly serve the $K$ UEs. At each AP, a fully connected architecture is considered for analog combining, in which $N$ RF chains are connected to $N_{r}$ receive antennas via a network of $NN_{r}$ PSs. We adopt a narrowband block-fading channel model [32, 33].

Let $\mathbf{h}_{kl}\in\mathbb{C}^{N_{r}\times 1}$ denote the channel between the $k$th UE and $l$th AP. In mmWave systems, $\mathbf{h}_{kl}$ follows the geometric Saleh–Valenzuela channel model and is given by [34, 35, 32]

$$\mathbf{h}_{kl}=\sqrt{\zeta_{kl}}\sum_{p=1}^{P_{kl}}\alpha_{kl}^{(p)}\mathbf{a}_{r}(\phi_{kl}^{(p)}), \tag{1}$$

where $\zeta_{kl}=\frac{G_{\mathrm{a}}}{\beta_{kl}}\frac{N_{r}}{P_{kl}}$. Here, $G_{\mathrm{a}}$ is the antenna gain, and $\beta_{kl}$ represents the path loss between the $k$th UE and $l$th AP, given by [36, 37]

$$\beta_{kl}\,\text{[dB]}=\beta_{0}+10\epsilon\log_{10}\left(\frac{d_{kl}}{d_{0}}\right)+A_{\xi}, \tag{2}$$

where $\beta_{0}=10\log_{10}\left(\frac{4\pi d_{0}}{\lambda}\right)^{2}$, $d_{0}=1$ m, $d_{kl}$ is the distance between the $k$th UE and $l$th AP, $\epsilon$ is the average path-loss exponent over distance, and $A_{\xi}\sim\mathcal{N}(0,\xi^{2})$ represents the effect of shadow fading. Furthermore, $P_{kl}$ is the number of effective channel paths; $\alpha_{kl}^{(p)}\sim\mathcal{CN}(0,1)$, $\forall l,k$, is the gain of the $p$th path; and $\phi_{kl}^{(p)}$ is the azimuth angle of arrival (AoA). In (1), $\mathbf{a}_{r}(\cdot)$ represents the normalized receive array response vector at an AP, which depends on the structure of the antenna array. In this work, we consider a uniform linear array (ULA), where $\mathbf{a}_{r}(\cdot)$ is given by

$$\mathbf{a}_{r}(\phi)=\frac{1}{\sqrt{N_{r}}}\left[1,e^{j\frac{2\pi}{\lambda}d_{s}\sin(\phi)},\ldots,e^{j(N_{r}-1)\frac{2\pi}{\lambda}d_{s}\sin(\phi)}\right]^{T},$$

in which $\lambda$ denotes the wavelength of the signal and $d_{s}$ is the antenna spacing [32]. Let $\mathbf{A}_{kl}=[\mathbf{a}_{r}(\phi_{kl}^{(1)}),\ldots,\mathbf{a}_{r}(\phi_{kl}^{(P_{kl})})]\in\mathbb{C}^{N_{r}\times P_{kl}}$ and $\boldsymbol{\alpha}_{kl}=[\alpha_{kl}^{(1)},\ldots,\alpha_{kl}^{(P_{kl})}]^{T}\in\mathbb{C}^{P_{kl}\times 1}$. Then, $\mathbf{h}_{kl}$ can be equivalently given as $\mathbf{h}_{kl}=\sqrt{\zeta_{kl}}\mathbf{A}_{kl}\boldsymbol{\alpha}_{kl}$; thus, $\mathbf{h}_{kl}\sim\mathcal{CN}(\mathbf{0},\zeta_{kl}\boldsymbol{\Psi}_{kl})$, where $\boldsymbol{\Psi}_{kl}=\mathbb{E}\left\{\mathbf{A}_{kl}\mathbf{A}^{H}_{kl}\right\}$.
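As an illustration, the channel model in (1) and (2) can be simulated with a few lines of NumPy. This is only a sketch under stated assumptions: the function names, the half-wavelength antenna spacing, the uniform AoA distribution, the 28 GHz carrier, and the numerical values of the path-loss exponent and shadowing deviation are placeholders chosen for the example, not values taken from this paper.

```python
import numpy as np

def ula_response(phi, n_r, spacing_over_lambda=0.5):
    """Normalized ULA response a_r(phi); half-wavelength spacing is an assumption."""
    n = np.arange(n_r)
    return np.exp(1j * 2 * np.pi * spacing_over_lambda * n * np.sin(phi)) / np.sqrt(n_r)

def path_loss_db(d, wavelength, eps, xi_db, rng, d0=1.0):
    """Path loss (2) in dB, with shadow fading A_xi ~ N(0, xi^2)."""
    beta0 = 10 * np.log10((4 * np.pi * d0 / wavelength) ** 2)
    return beta0 + 10 * eps * np.log10(d / d0) + rng.normal(0.0, xi_db)

def sv_channel(n_r, n_paths, beta_db, g_a, rng):
    """One realization of h_kl in (1); g_a is the antenna gain on a linear scale."""
    beta = 10 ** (beta_db / 10)                      # path loss on a linear scale
    zeta = (g_a / beta) * (n_r / n_paths)
    # CN(0,1) path gains; the uniform AoA distribution is an assumption
    alpha = (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
    phis = rng.uniform(0, 2 * np.pi, n_paths)
    A = np.stack([ula_response(p, n_r) for p in phis], axis=1)   # N_r x P_kl
    return np.sqrt(zeta) * A @ alpha

rng = np.random.default_rng(0)
wavelength = 3e8 / 28e9                              # 28 GHz carrier (assumed)
beta_db = path_loss_db(d=100.0, wavelength=wavelength, eps=3.0, xi_db=4.0, rng=rng)
h = sv_channel(n_r=64, n_paths=3, beta_db=beta_db, g_a=1.0, rng=rng)
```

The matrix `A` plays the role of $\mathbf{A}_{kl}$ and `alpha` that of $\boldsymbol{\alpha}_{kl}$, so the last line of `sv_channel` is the compact form $\sqrt{\zeta_{kl}}\mathbf{A}_{kl}\boldsymbol{\alpha}_{kl}$.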

II-A Uplink Channel Estimation

For channel estimation, all $K$ UEs simultaneously transmit their pilot sequences to the APs. Let $\sqrt{\tau_{p}}\boldsymbol{\varphi}_{k}\in\mathbb{C}^{\tau_{p}\times 1}$ be the pilot sequence of the $k$th UE, where $\left\lVert\boldsymbol{\varphi}_{k}\right\rVert^{2}=1$, $k=1,\ldots,K$. Here, $\tau_{p}$ $(\tau_{p}<\tau_{c})$ is the length of $\boldsymbol{\varphi}_{k}$, where $\tau_{c}$ denotes the length of each coherence interval (in samples). When all $K$ UEs send their pilots, the received signal at the $l$th AP is given as

$$\mathbf{Y}_{l}=\sqrt{\tau_{p}\rho_{p}}\sum_{k=1}^{K}\mathbf{h}_{kl}\boldsymbol{\varphi}_{k}^{H}+\mathbf{Z}_{l}, \tag{3}$$

where $\rho_{p}>0$ represents the average transmit power of each UE; $\mathbf{Z}_{l}\in\mathbb{C}^{N_{r}\times\tau_{p}}$ is the receiver noise, whose entries are independent and identically distributed (i.i.d.) $\mathcal{CN}(0,\sigma^{2})$ random variables (RVs); and $\sigma^{2}$ is the noise power. To estimate $\mathbf{h}_{kl}$, $\mathbf{Y}_{l}$ is projected onto $\boldsymbol{\varphi}_{k}$, which yields

$$\mathbf{y}_{kl}\triangleq\mathbf{Y}_{l}\boldsymbol{\varphi}_{k}=\sqrt{\tau_{p}\rho_{p}}\mathbf{h}_{kl}+\sqrt{\tau_{p}\rho_{p}}\sum_{i\neq k}^{K}\mathbf{h}_{il}\boldsymbol{\varphi}_{i}^{H}\boldsymbol{\varphi}_{k}+\mathbf{Z}_{l}\boldsymbol{\varphi}_{k}.$$

Thus, the minimum mean-square error (MMSE) estimate of $\mathbf{h}_{kl}$, denoted by $\hat{\mathbf{h}}_{kl}$, is given by

\begin{align}
\hat{\mathbf{h}}_{kl}&=\mathbb{E}\left\{\mathbf{h}_{kl}\mathbf{y}_{kl}^{H}\right\}\left(\mathbb{E}\left\{\mathbf{y}_{kl}\mathbf{y}_{kl}^{H}\right\}\right)^{-1}\mathbf{y}_{kl} \notag\\
&=\sqrt{\tau_{p}\rho_{p}}\zeta_{kl}\boldsymbol{\Psi}_{kl}\left(\tau_{p}\rho_{p}\sum_{i=1}^{K}\zeta_{il}\boldsymbol{\Psi}_{il}\left|\boldsymbol{\varphi}_{i}^{H}\boldsymbol{\varphi}_{k}\right|^{2}+\sigma^{2}\mathbf{I}_{N_{r}}\right)^{-1}\mathbf{y}_{kl}. \tag{4}
\end{align}

Assume that knowledge of the correlation matrix $\boldsymbol{\Psi}_{kl}$, $\forall k$, is available at the $l$th AP [9], from which $\hat{\mathbf{h}}_{kl}$ can be determined. Note that in (4), we assume that the signals received at all the antennas of the AP are available for MMSE estimation. As a result, the estimate of the full CSI associated with all the antennas can be obtained via the low-complexity MMSE estimator. In a very slow-fading and sparse channel, the full CSI can be obtained by a compressed sensing-based approach [32], but at the cost of the high complexity required for a large number of compressed-sensing measurements [38].
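The estimator in (4) reduces to a single linear solve once the covariance matrices $\zeta_{il}\boldsymbol{\Psi}_{il}$ are known. The following NumPy sketch illustrates this; the function name and argument layout are hypothetical, and the toy check uses orthonormal pilots, identity covariances, and a noiseless observation purely for convenience.

```python
import numpy as np

def mmse_channel_estimate(Y, pilots, covs, k, tau_p, rho_p, sigma2):
    """MMSE estimate of h_kl from the pilot observation Y_l, following (3)-(4).

    Y      : (N_r, tau_p) received pilot matrix at one AP
    pilots : (tau_p, K) matrix whose k-th column is the unit-norm pilot phi_k
    covs   : list of K matrices zeta_il * Psi_il, each (N_r, N_r)
    """
    phi_k = pilots[:, k]
    y_kl = Y @ phi_k                                     # projection of Y_l onto phi_k
    n_r = Y.shape[0]
    overlap = np.abs(pilots.conj().T @ phi_k) ** 2       # |phi_i^H phi_k|^2 for all i
    # E{y y^H}: pilot-weighted sum of channel covariances plus noise
    C_y = sigma2 * np.eye(n_r, dtype=complex)
    for i, R_i in enumerate(covs):
        C_y += tau_p * rho_p * overlap[i] * R_i
    # E{h y^H} = sqrt(tau_p rho_p) * zeta_kl * Psi_kl
    C_hy = np.sqrt(tau_p * rho_p) * covs[k]
    return C_hy @ np.linalg.solve(C_y, y_kl)

# Toy check (all values assumed): with negligible noise the estimate recovers the channel
rng = np.random.default_rng(0)
n_r, K, tau_p, rho_p, sigma2 = 4, 2, 2, 1.0, 1e-9
pilots = np.eye(tau_p, K, dtype=complex)                 # orthonormal pilots
covs = [np.eye(n_r, dtype=complex) for _ in range(K)]
H_true = (rng.standard_normal((n_r, K)) + 1j * rng.standard_normal((n_r, K))) / np.sqrt(2)
Y = np.sqrt(tau_p * rho_p) * H_true @ pilots.conj().T    # noiseless pilot observation
h_hat = mmse_channel_estimate(Y, pilots, covs, k=0, tau_p=tau_p, rho_p=rho_p, sigma2=sigma2)
```

With non-orthogonal pilots, the `overlap` weights carry the pilot-contamination terms of (4) into the covariance of the projected observation.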

Let $\hat{\mathbf{H}}_{l}\triangleq[\hat{\mathbf{h}}_{1l},\ldots,\hat{\mathbf{h}}_{Kl}]\in\mathbb{C}^{N_{r}\times K}$ denote the estimated channel matrix between the $K$ UEs and $l$th AP. Furthermore, we define $\hat{\mathbf{H}}\triangleq\left[\hat{\mathbf{H}}_{1}^{T},\ldots,\hat{\mathbf{H}}_{L}^{T}\right]^{T}\in\mathbb{C}^{LN_{r}\times K}$ as the composite estimated channels between all the APs and UEs. In the next section, $\hat{\mathbf{H}}_{l}$ and $\hat{\mathbf{H}}$ are employed for the hybrid beamforming design of the uplink data transmission.

II-B Uplink Data Transmission

Denote by $x_{k}$ the symbol sent from the $k$th UE to all the APs, such that $\mathbb{E}\{\left|x_{k}\right|^{2}\}=1$, $\forall k$. The signal input-output relationship at the $l$th AP can be expressed as

$$\mathbf{r}_{l}=\sqrt{\rho}\sum_{k=1}^{K}\mathbf{W}_{l}^{H}\mathbf{F}_{l}^{H}\mathbf{h}_{kl}x_{k}+\mathbf{W}_{l}^{H}\mathbf{F}_{l}^{H}\mathbf{z}_{l}, \tag{5}$$

where $\rho$ represents the average transmit power, and $\mathbf{z}_{l}$ is the noise vector, whose elements are i.i.d. $\mathcal{CN}(0,\sigma^{2})$ RVs. Furthermore, $\mathbf{F}_{l}\in\mathbb{C}^{N_{r}\times N}$ is the analog combining matrix at the $l$th AP. Its $n$th column, i.e., $\mathbf{f}_{ln}=[f_{ln}^{(1)},\ldots,f_{ln}^{(N_{r})}]^{T}$, is the analog weight vector corresponding to the $n$th RF chain at the $l$th AP, and $f_{ln}^{(i)}=\frac{1}{\sqrt{N_{r}}}e^{j\theta_{ln}^{(i)}}$ is the $i$th element of $\mathbf{f}_{ln}$. The matrix $\mathbf{W}_{l}\in\mathbb{C}^{N\times K}$ denotes the digital combining matrix at the $l$th AP. Then, the APs send $\mathbf{r}_{l}$, $\forall l$, to the CPU via a fronthaul network to perform the final signal detection. In this work, we assume a simple centralized decoding scheme at the CPU, which requires minimal information exchange between the APs and CPU. In this scheme, the final decoded signal at the CPU is given as the average of the local estimates, that is, $\frac{1}{L}\sum_{l=1}^{L}\mathbf{r}_{l}$ [9].

The composite received signal available at the CPU can be expressed as

$$\begin{bmatrix}\mathbf{r}_{1}\\ \vdots\\ \mathbf{r}_{L}\end{bmatrix}=\sqrt{\rho}\sum_{k=1}^{K}\begin{bmatrix}\mathbf{W}_{1}^{H}\mathbf{F}_{1}^{H}\mathbf{h}_{k1}\\ \vdots\\ \mathbf{W}_{L}^{H}\mathbf{F}_{L}^{H}\mathbf{h}_{kL}\end{bmatrix}x_{k}+\begin{bmatrix}\mathbf{W}_{1}^{H}\mathbf{F}_{1}^{H}\mathbf{z}_{1}\\ \vdots\\ \mathbf{W}_{L}^{H}\mathbf{F}_{L}^{H}\mathbf{z}_{L}\end{bmatrix}. \tag{6}$$

Let $\mathbf{F}\triangleq\text{diag}\left\{\mathbf{F}_{1},\ldots,\mathbf{F}_{L}\right\}\in\mathbb{C}^{LN_{r}\times LN}$ and $\mathbf{W}\triangleq\text{diag}\left\{\mathbf{W}_{1},\ldots,\mathbf{W}_{L}\right\}\in\mathbb{C}^{LN\times LK}$ be block-diagonal matrices containing the analog and digital combiners for all $L$ APs. In this work, we refer to $\mathbf{F}$ and $\mathbf{W}$ as global combiners, whereas $\left\{\mathbf{F}_{1},\ldots,\mathbf{F}_{L}\right\}$ and $\left\{\mathbf{W}_{1},\ldots,\mathbf{W}_{L}\right\}$ for the signal combined at APs $\{1,\ldots,L\}$ are referred to as the local combiners. Furthermore, let $\mathbf{H}_{l}\triangleq\left[\mathbf{h}_{1l},\ldots,\mathbf{h}_{Kl}\right]\in\mathbb{C}^{N_{r}\times K}$ denote the channel matrix between the $K$ UEs and the $l$th AP. We define $\mathbf{r}\triangleq[\mathbf{r}_{1}^{T},\ldots,\mathbf{r}_{L}^{T}]^{T}\in\mathbb{C}^{LK\times 1}$, $\mathbf{x}\triangleq[x_{1},\ldots,x_{K}]^{T}\in\mathbb{C}^{K\times 1}$, $\mathbf{z}\triangleq[\mathbf{z}_{1}^{T},\ldots,\mathbf{z}_{L}^{T}]^{T}\in\mathbb{C}^{LN_{r}\times 1}$, and $\mathbf{H}\triangleq[\mathbf{H}_{1}^{T},\ldots,\mathbf{H}_{L}^{T}]^{T}\in\mathbb{C}^{LN_{r}\times K}$. Then, (6) can be rewritten in a more compact form as

$$\mathbf{r}=\sqrt{\rho}\mathbf{W}^{H}\mathbf{F}^{H}\mathbf{H}\mathbf{x}+\mathbf{W}^{H}\mathbf{F}^{H}\mathbf{z}=\sqrt{\rho}\mathbf{W}^{H}\mathbf{F}^{H}\mathbf{H}\mathbf{x}+\tilde{\mathbf{z}}, \tag{7}$$

where $\tilde{\mathbf{z}}\triangleq\mathbf{W}^{H}\mathbf{F}^{H}\mathbf{z}\sim\mathcal{CN}(\mathbf{0},\mathbf{R}_{\tilde{z}})$, and $\mathbf{R}_{\tilde{z}}\triangleq\sigma^{2}\mathbf{W}^{H}\mathbf{F}^{H}\mathbf{F}\mathbf{W}$.
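The equivalence between the per-AP signals in (5), stacked as in (6), and the compact block-diagonal form (7) can be checked numerically. The sketch below uses small assumed dimensions and random combiners; the helper names `block_diag` and `crandn` are introduced only for this example.

```python
import numpy as np

def block_diag(blocks):
    """Place the given matrices along the diagonal of a zero matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols), dtype=complex)
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

rng = np.random.default_rng(1)
L, N_r, N, K = 3, 8, 2, 2            # toy dimensions (assumed)
rho, sigma2 = 1.0, 0.1               # assumed transmit and noise powers

def crandn(*shape):
    """i.i.d. CN(0,1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H_l = [crandn(N_r, K) for _ in range(L)]
F_l = [np.exp(1j * rng.uniform(0, 2 * np.pi, (N_r, N))) / np.sqrt(N_r) for _ in range(L)]
W_l = [crandn(N, K) for _ in range(L)]
z_l = [np.sqrt(sigma2) * crandn(N_r) for _ in range(L)]
x = crandn(K)

# Per-AP signals (5), stacked as in (6)
r_stacked = np.concatenate([
    np.sqrt(rho) * W.conj().T @ F.conj().T @ H @ x + W.conj().T @ F.conj().T @ z
    for W, F, H, z in zip(W_l, F_l, H_l, z_l)
])

# Compact form (7) with the global combiners F and W
F, W = block_diag(F_l), block_diag(W_l)
H, z = np.vstack(H_l), np.concatenate(z_l)
r_compact = np.sqrt(rho) * W.conj().T @ F.conj().T @ H @ x + W.conj().T @ F.conj().T @ z
R_z = sigma2 * W.conj().T @ F.conj().T @ F @ W          # covariance of the combined noise
```

Both constructions give the same $LK\times 1$ vector, which is what allows the rate analysis to work with the global combiners directly.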

We note that the analog processing is separately performed at the APs because the ADCs, RF chains, and PSs are installed at the APs. However, $\left\{\mathbf{F}_{1},\ldots,\mathbf{F}_{L}\right\}$ can be generated either at the APs based on their local CSI or at the CPU based on the global CSI. Once the analog combiner is obtained, digital processing can be carried out at the corresponding AP. We follow the common assumption in [4, 19] that the digital combining is performed at the APs individually. Therefore, in this work, the D-HBF scheme refers to the HBF with analog combiners generated at each AP separately, whereas SC-HBF implies that the analog combiners are computed at the CPU based on the global CSI. In this regard, we note that perfect local and global CSI are not available at the APs and the CPU, respectively. Therefore, to perform D-HBF, the AP, say the $l$th AP, treats its own estimated channel $\hat{\mathbf{H}}_{l}$ as the true channel and employs it to generate the hybrid combiners $\mathbf{F}_{l}$ and $\mathbf{W}_{l}$. Similarly, in SC-HBF, the CPU exploits the global estimated CSI $\hat{\mathbf{H}}$ to obtain $\left\{\mathbf{F}_{1},\ldots,\mathbf{F}_{L}\right\}$.

III SC-HBF and D-HBF

Based on (7), the total achievable rate $\bar{\mathcal{R}}$ can be expressed as [33]

$$\bar{\mathcal{R}}=\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\left|\mathbf{I}_{LK}+\rho\mathbf{R}_{\tilde{z}}^{-1}\mathbf{W}^{H}\mathbf{F}^{H}\mathbf{H}\mathbf{H}^{H}\mathbf{F}\mathbf{W}\right|. \tag{8}$$

We aim to design hybrid combiners that maximize $\bar{\mathcal{R}}$. The design of $\mathbf{F}$ and $\mathbf{W}$ can be decoupled by first designing the analog combiner assuming an optimal digital combiner and then finding the optimal digital combiner for the derived analog one [39]. Therefore, the analog beamforming design problem is formulated as

\begin{align}
(\text{P}_{\mathrm{a}})\quad\underset{\{\mathbf{F}_{1},\ldots,\mathbf{F}_{L}\}}{\text{maximize}}\quad &\log_{2}\left|\mathbf{I}_{LN}+\gamma\left(\mathbf{F}^{H}\mathbf{F}\right)^{-1}\mathbf{F}^{H}\mathbf{H}\mathbf{H}^{H}\mathbf{F}\right| \tag{9a}\\
\text{subject to}\quad &\mathbf{F}=\text{diag}\left\{\mathbf{F}_{1},\ldots,\mathbf{F}_{L}\right\}, \tag{9b}\\
&\mathbf{f}_{ln}\in\mathcal{F},\ \forall l,n, \tag{9c}
\end{align}

where $\gamma=\frac{\rho}{\sigma^{2}}$, and $\mathcal{F}$ is the set of feasible analog combining vectors, whose coefficients are $f_{ln}^{(i)}=\frac{1}{\sqrt{N_{r}}}e^{j\theta_{ln}^{(i)}}$, $\forall l,n,i$. To simplify the objective function in $(\text{P}_{\mathrm{a}})$, we assume $\mathbf{F}^{H}_{l}\mathbf{F}_{l}\approx\mathbf{I}_{N}$ [33, 39], an approximation that is tight in the considered CF mmWave massive MIMO system with a sufficiently large number of antennas deployed at each AP. Consequently, we have $\mathbf{F}^{H}\mathbf{F}\approx\mathbf{I}_{LN}$, and the objective function in (9a), which is the sum rate achieved by analog combining, can be approximated by $\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\left|\mathbf{I}_{LN}+\gamma\mathbf{F}^{H}\mathbf{H}\mathbf{H}^{H}\mathbf{F}\right|\triangleq\bar{\mathcal{R}}^{\mathrm{a}}$. Therefore, the optimal analog combiners can be obtained approximately by solving

\begin{align}
(\text{P}^{\prime}_{\mathrm{a}})\quad\underset{\{\mathbf{F}_{1},\ldots,\mathbf{F}_{L}\}}{\text{maximize}}\quad &\bar{\mathcal{R}}^{\mathrm{a}} \notag\\
\text{subject to}\quad &\text{(9b), (9c)}. \tag{10}
\end{align}

The objective function $\bar{\mathcal{R}}^{\mathrm{a}}$ of $(\text{P}^{\prime}_{\mathrm{a}})$ is further investigated in the following theorem.

Theorem 1

In a CF mmWave massive MIMO system with $L$ APs, we have $\bar{\mathcal{R}}^{\mathrm{a}}=\sum_{l=1}^{L}\bar{\mathcal{R}}^{\mathrm{a}}_{l}$, where

$$\bar{\mathcal{R}}^{\mathrm{a}}_{l}=\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\det\left(\mathbf{I}_{N}+\gamma\mathbf{F}_{l}^{H}\mathbf{H}_{l}\mathbf{Q}_{l-1}^{-1}\mathbf{H}_{l}^{H}\mathbf{F}_{l}\right), \tag{11}$$

with $\mathbf{Q}_{0}=\mathbf{I}_{K}$ and

$$\mathbf{Q}_{l-1}=\mathbf{Q}_{l-2}+\gamma\mathbf{H}_{l-1}^{H}\mathbf{F}_{l-1}\mathbf{F}_{l-1}^{H}\mathbf{H}_{l-1}. \tag{12}$$

Proof

See Appendix A. $\Box$
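The decomposition in Theorem 1 can be sanity-checked numerically. The sketch below compares $\log_{2}\left|\mathbf{I}_{LN}+\gamma\mathbf{F}^{H}\mathbf{H}\mathbf{H}^{H}\mathbf{F}\right|$ with the sum of the per-AP terms in (11) using the recursion (12). The common pre-log factor $\frac{\tau_{c}-\tau_{p}}{\tau_{c}}$ is omitted on both sides, and the dimensions, the value of $\gamma$, and the random channels and combiners are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N_r, N, K = 4, 16, 2, 3          # toy dimensions (assumed)
gamma = 2.0                          # rho / sigma^2 (assumed value)

H_l = [(rng.standard_normal((N_r, K)) + 1j * rng.standard_normal((N_r, K))) / np.sqrt(2)
       for _ in range(L)]
F_l = [np.exp(1j * rng.uniform(0, 2 * np.pi, (N_r, N))) / np.sqrt(N_r) for _ in range(L)]

def log2det(M):
    """log2 of |det(M)|, computed in a numerically stable way."""
    _, logabs = np.linalg.slogdet(M)
    return logabs / np.log(2)

# Left-hand side: with block-diagonal F, the product F^H H is the stack of F_l^H H_l
FH_H = np.vstack([F.conj().T @ H for F, H in zip(F_l, H_l)])        # LN x K
lhs = log2det(np.eye(L * N) + gamma * FH_H @ FH_H.conj().T)

# Right-hand side: per-AP terms (11) with Q_0 = I_K and the update (12)
Q = np.eye(K, dtype=complex)
rhs = 0.0
for F, H in zip(F_l, H_l):
    G = F.conj().T @ H                                               # N x K
    rhs += log2det(np.eye(N) + gamma * G @ np.linalg.solve(Q, G.conj().T))
    Q = Q + gamma * G.conj().T @ G
```

In this sketch the two quantities agree up to floating-point error for any choice of the combiners, since the identity does not rely on the approximation $\mathbf{F}_{l}^{H}\mathbf{F}_{l}\approx\mathbf{I}_{N}$.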

Based on Theorem 1, $\bar{\mathcal{R}}^{\mathrm{a}}$ can be maximized by optimizing $\{\bar{\mathcal{R}}^{\mathrm{a}}_{1},\ldots,\bar{\mathcal{R}}^{\mathrm{a}}_{L}\}$ corresponding to APs $\{1,\ldots,L\}$. We recall that the $l$th AP treats $\hat{\mathbf{H}}_{l}$ as the true channel to obtain the combiners. Therefore, $\mathbf{F}_{l}^{\star}$ in $(\text{P}^{\prime}_{\mathrm{a}})$ can be determined by maximizing $\mathcal{R}^{\mathrm{a}}_{l}\triangleq\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\det\left(\mathbf{I}_{N}+\gamma\mathbf{F}_{l}^{H}\hat{\mathbf{H}}_{l}\hat{\mathbf{Q}}_{l-1}^{-1}\hat{\mathbf{H}}_{l}^{H}\mathbf{F}_{l}\right)$, where $\hat{\mathbf{Q}}_{l}$ is defined similarly to $\mathbf{Q}_{l}$ by replacing $\mathbf{H}_{l}$ with $\hat{\mathbf{H}}_{l}$. Consequently, we can write

$$\mathbf{F}_{l}^{\star}=\arg\max_{\mathbf{F}_{l}}\mathcal{R}^{\mathrm{a}}_{l},\ \forall l,\quad\text{subject to}\quad\mathbf{f}_{l1},\ldots,\mathbf{f}_{lN}\in\mathcal{F}. \tag{13}$$

Let $\left\{\mathbf{u}^{\star}_{l1},\ldots,\mathbf{u}^{\star}_{lN}\right\}$ be the $N$ singular vectors corresponding to the $N$ largest singular values of $\hat{\mathbf{H}}_{l}\hat{\mathbf{Q}}_{l-1}^{-1}\hat{\mathbf{H}}_{l}^{H}$, which are in decreasing order. Then, columns $\{\mathbf{f}_{l1}^{\star},\ldots,\mathbf{f}_{lN}^{\star}\}$ of a near-optimal solution to (13) can be obtained by quantizing $\left\{\mathbf{u}^{\star}_{l1},\ldots,\mathbf{u}^{\star}_{lN}\right\}$, respectively, to the nearest vector in $\mathcal{F}$ [21], i.e.,

$$\mathbf{f}_{ln}^{\star}=\arg\min_{\mathbf{f}_{ln}\in\mathcal{F}}\left\lVert\mathbf{u}_{ln}^{\star}-\mathbf{f}_{ln}\right\rVert^{2},\ \forall n. \tag{14}$$

At the $l$th AP, once the analog combiner $\mathbf{F}_{l}^{\star}$ is found, the optimal digital combiner is given as the MMSE solution, i.e.,

$$\mathbf{W}_{l}^{\star}=\mathbf{J}^{-1}\mathbf{F}_{l}^{\star H}\hat{\mathbf{H}}_{l}, \tag{15}$$

where $\mathbf{J}=\mathbf{F}_{l}^{\star H}\hat{\mathbf{H}}_{l}\hat{\mathbf{H}}_{l}^{H}\mathbf{F}_{l}^{\star}+\frac{1}{\gamma}\mathbf{F}_{l}^{\star H}\mathbf{F}_{l}^{\star}$ [39]. In the following subsections, we propose two HBF schemes in which the analog combiners are derived based on different assumptions for CSI.
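Because every entry of a feasible vector has modulus $\frac{1}{\sqrt{N_{r}}}$, the minimization in (14) decouples across entries and is solved by keeping the phase of each entry of $\mathbf{u}_{ln}^{\star}$. The NumPy sketch below illustrates this together with the digital combiner in (15). The function names and the toy dimensions are assumptions, and the left singular vectors of a random matrix are used as a stand-in for the vectors $\mathbf{u}_{ln}^{\star}$.

```python
import numpy as np

def quantize_to_feasible(u):
    """Nearest constant-modulus vector to u, as in (14): keep the phases, set modulus 1/sqrt(N_r)."""
    return np.exp(1j * np.angle(u)) / np.sqrt(u.shape[0])

def mmse_digital_combiner(F, H_hat, gamma):
    """Digital combiner (15): W = J^{-1} F^H H_hat with J = F^H H H^H F + F^H F / gamma."""
    G = F.conj().T @ H_hat                                # N x K effective channel
    J = G @ G.conj().T + (F.conj().T @ F) / gamma
    return np.linalg.solve(J, G)

rng = np.random.default_rng(0)
n_r, N, K, gamma = 16, 2, 3, 10.0                         # toy values (assumed)
H_hat = (rng.standard_normal((n_r, K)) + 1j * rng.standard_normal((n_r, K))) / np.sqrt(2)
U, _, _ = np.linalg.svd(H_hat, full_matrices=False)       # stand-in for the vectors u_ln
F = np.column_stack([quantize_to_feasible(U[:, n]) for n in range(N)])
W = mmse_digital_combiner(F, H_hat, gamma)
```

Solving the linear system with `np.linalg.solve` avoids forming $\mathbf{J}^{-1}$ explicitly, which is the usual choice for numerical stability.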

Algorithm 1 SC-HBF scheme
Output: $\left\{\mathbf{F}_{1}^{\star},\ldots,\mathbf{F}_{L}^{\star}\right\}$, $\left\{\mathbf{W}_{1}^{\star},\ldots,\mathbf{W}_{L}^{\star}\right\}$, and $\left\{\mathcal{R}^{\mathrm{a}}_{1},\ldots,\mathcal{R}^{\mathrm{a}}_{L}\right\}$.
1:  At the CPU: $\hat{\mathbf{Q}}_{0}=\mathbf{I}_{K}$
2:  for $l=1\rightarrow L$ do
3:     for $n=1\rightarrow N$ do
4:        Set $\mathbf{u}_{ln}^{\star}$ to the singular vector corresponding to the $n$th largest singular value of $\hat{\mathbf{H}}_{l}\hat{\mathbf{Q}}_{l-1}^{-1}\hat{\mathbf{H}}_{l}^{H}$.
5:        $\mathbf{f}_{ln}^{\star}=\frac{1}{\sqrt{N_{r}}}\mathcal{Q}(\mathbf{u}_{ln}^{\star})$
6:     end for
7:     $\mathbf{F}_{l}^{\star}=\left[\mathbf{f}_{l1}^{\star},\ldots,\mathbf{f}_{lN}^{\star}\right]$
8:     $\mathbf{G}_{l}=\hat{\mathbf{H}}_{l}^{H}\mathbf{F}_{l}^{\star}\mathbf{F}_{l}^{\star H}\hat{\mathbf{H}}_{l}$
9:     $\hat{\mathbf{Q}}_{l}=\hat{\mathbf{Q}}_{l-1}+\gamma\mathbf{G}_{l}$
10:     $\mathcal{R}_{l}^{\mathrm{a}}=\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\det\left(\mathbf{I}_{N}+\gamma\mathbf{F}_{l}^{\star H}\hat{\mathbf{H}}_{l}\hat{\mathbf{Q}}_{l-1}^{-1}\hat{\mathbf{H}}_{l}^{H}\mathbf{F}_{l}^{\star}\right)$
11:  end for
12:  At the $l$th AP: compute $\mathbf{W}_{l}^{\star}$ based on (15).

III-A SC-HBF

It is evident from (11) and (12) that $\mathcal{R}^{\mathrm{a}}_{l}$ depends not only on $\hat{\mathbf{H}}_{l}$, but also on $\hat{\mathbf{H}}_{l-1},\hat{\mathbf{H}}_{l-2},\ldots,\hat{\mathbf{H}}_{1}$. Therefore, finding $\mathbf{F}_{l}^{\star}$ requires not only $\hat{\mathbf{H}}_{l}$ but also $\hat{\mathbf{H}}_{l-1},\hat{\mathbf{H}}_{l-2},\ldots,\hat{\mathbf{H}}_{1}$. This is similar to the requirements for determining analog beamformers for sub-arrays in the partially-connected HBF architecture [40, 24]. As a result, solving for $\left\{\mathbf{F}_{1}^{\star},\mathbf{F}_{2}^{\star},\ldots,\mathbf{F}_{L}^{\star}\right\}$ requires the CSI of the channels between all $L$ APs and $K$ UEs, i.e., $\left\{\hat{\mathbf{H}}_{1},\hat{\mathbf{H}}_{2},\ldots,\hat{\mathbf{H}}_{L}\right\}$, which can be available at the CPU; hence, finding $\mathbf{F}^{\star}$ based on (13) requires an SC-HBF scheme.

Algorithm 1 presents the proposed SC-HBF scheme to obtain $\left\{\mathbf{F}_{1}^{\star},\mathbf{F}_{2}^{\star},\ldots,\mathbf{F}_{L}^{\star}\right\}$. In particular, in steps 3–6, the combining vector $\mathbf{f}_{ln}^{\star}$ is obtained by quantizing $\mathbf{u}_{ln}^{\star}$ based on (14), which ensures that the resultant analog combiners belong to the feasible set $\mathcal{F}$. Then, $\mathbf{F}_{l}^{\star}$ is formed in step 7 and $\mathbf{G}_{l}$ is computed in step 8, after which $\hat{\mathbf{Q}}_{l}$ is updated in step 9. In step 10, $\mathcal{R}^{\mathrm{a}}_{l}$ corresponding to the $l$th AP is computed. Finally, the digital combiner is computed at each AP in step 12. We note that in Algorithm 1, steps 1–11 are performed at the CPU, whereas step 12 is performed at the APs.
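The sequential structure of Algorithm 1 can be illustrated with a short NumPy sketch. It is not the authors' implementation: the initialization $\hat{\mathbf{Q}}_{0}=\mathbf{I}_{K}$, the channel shape $N_r\times K$, and the form of the quantizer $\mathcal{Q}(\cdot)$ (nearest phase in a uniform $2^{b}$-level set) are assumptions, since steps 1–2 and (14) are defined elsewhere in the paper. The names `quantize_phases` and `sc_hbf_analog` are hypothetical.

```python
import numpy as np

def quantize_phases(u, n_r, bits=4):
    """Assumed form of Q(.)/sqrt(N_r): snap each entry's phase to a 2^bits-level grid."""
    step = 2 * np.pi / 2**bits
    phases = np.round(np.angle(u) / step) * step
    return np.exp(1j * phases) / np.sqrt(n_r)

def sc_hbf_analog(H_list, n_rf, gamma, bits=4):
    """Sketch of steps 3-10 of Algorithm 1; H_list[l] is assumed to have shape (N_r, K)."""
    K = H_list[0].shape[1]
    Q = np.eye(K, dtype=complex)              # assumption: Q_0 = I_K
    F_list, rates = [], []
    for H in H_list:
        n_r = H.shape[0]
        A = H @ np.linalg.solve(Q, H.conj().T)   # H Q_{l-1}^{-1} H^H (Hermitian PSD)
        # For a Hermitian PSD matrix, eigenvectors coincide with singular vectors.
        _, vecs = np.linalg.eigh(A)              # eigenvalues in ascending order
        U = vecs[:, ::-1][:, :n_rf]              # n_rf dominant vectors (step 4)
        F = quantize_phases(U, n_r, bits)        # step 5
        M = np.eye(n_rf) + gamma * F.conj().T @ A @ F
        rates.append(np.linalg.slogdet(M)[1] / np.log(2))   # step 10 (uses Q_{l-1})
        Q = Q + gamma * H.conj().T @ F @ F.conj().T @ H      # steps 8-9
        F_list.append(F)
    return F_list, rates, Q
```

Under these assumptions the per-AP sub-rates telescope, so their sum equals $\log_{2}\det\hat{\mathbf{Q}}_{L}$, which offers a quick numerical sanity check of the recursion.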

III-B D-HBF

Let $\left\{\tilde{\mathbf{u}}^{\star}_{l1},\ldots,\tilde{\mathbf{u}}^{\star}_{lN}\right\}$ be the $N$ singular vectors corresponding to the $N$ largest singular values of $\hat{\mathbf{H}}_{l}$, sorted in decreasing order of singular value. Furthermore, define

$$\tilde{\mathbf{f}}_{ln}^{\star}=\arg\min_{\mathbf{f}_{ln}\in\mathcal{F}}\left\lVert\tilde{\mathbf{u}}_{ln}^{\star}-\mathbf{f}_{ln}\right\rVert^{2},\quad\forall n. \tag{16}$$

Then, in the D-HBF scheme, the optimal local analog combiner generated at the $l$th AP based on $\hat{\mathbf{H}}_{l}$ can be given as $\tilde{\mathbf{F}}_{l}^{\star}=\left[\tilde{\mathbf{f}}_{l1}^{\star},\ldots,\tilde{\mathbf{f}}_{lN}^{\star}\right]$ [21]. Let $\tilde{\mathbf{F}}^{\star}=\text{diag}\left\{\tilde{\mathbf{F}}_{1}^{\star},\ldots,\tilde{\mathbf{F}}_{L}^{\star}\right\}$. In the following remark, we show that the total achievable rate achieved by analog combining in the D-HBF scheme is approximately equal to that in SC-HBF.

Remark 1

In CF mmWave massive MIMO systems with large $L$ and low SNRs due to the significant path loss in the mmWave channels, the total achievable rate achieved by the analog combining in the D-HBF scheme, denoted by $\tilde{\mathcal{R}}^{\mathrm{a}}$, is approximately the same as that of the SC-HBF scheme, i.e.,

$$\tilde{\mathcal{R}}^{\mathrm{a}}=\frac{\tau_{c}-\tau_{p}}{\tau_{c}}\log_{2}\det\left(\mathbf{I}_{K}+\gamma\mathbf{H}^{H}\tilde{\mathbf{F}}^{\star}\tilde{\mathbf{F}}^{{\star}H}\mathbf{H}\right)\approx\bar{\mathcal{R}}^{\mathrm{a}}, \tag{17}$$

where $\bar{\mathcal{R}}^{\mathrm{a}}$ is given in Theorem 1.

Proof

See Appendix B. \Box

It is observed that D-HBF can be performed with considerably lower computational complexity than SC-HBF. Specifically, only the $N$ singular vectors corresponding to the $N$ largest singular values of the channel matrix are required to form the analog combiner. In contrast, additional matrix inversions, multiplications, and additions are performed in steps 4, 8, and 9 of Algorithm 1 for the SC-HBF scheme. Notably, despite its simpler implementation and lower complexity, the D-HBF scheme can approximately achieve the performance of SC-HBF, as stated in Remark 1. Furthermore, the D-HBF scheme requires less information exchange between the APs and the CPU. Specifically, only the $K$ complex numbers in $\mathbf{r}_{l}$ are sent to the CPU on the fronthaul link to perform the final soft detection [4, 19], whereas the exchange of the CSI and analog combining matrix is not required, in contrast to the SC-HBF scheme.
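For comparison, the local design in (16) needs only one SVD per AP. The sketch below is illustrative and rests on assumptions: $\hat{\mathbf{H}}_{l}$ is taken to have shape $N_r\times K$ (so the left singular vectors are used), and $\mathcal{F}$ is taken to be the set of constant-modulus vectors with entries $\frac{1}{\sqrt{N_r}}e^{j\theta}$ and $\theta$ on a uniform $2^{b}$-level grid, in which case the minimizer of (16) is obtained entry by entry by picking the nearest grid phase. The name `d_hbf_analog` is hypothetical.

```python
import numpy as np

def d_hbf_analog(H_l, n_rf, bits=4):
    """Local analog combiner from H_l alone; requires n_rf <= min(H_l.shape)."""
    n_r = H_l.shape[0]
    # np.linalg.svd returns singular values in descending order.
    U, s, _ = np.linalg.svd(H_l, full_matrices=False)
    step = 2 * np.pi / 2**bits
    phases = np.round(np.angle(U[:, :n_rf]) / step) * step   # nearest grid phase
    F_l = np.exp(1j * phases) / np.sqrt(n_r)
    return F_l, s[:n_rf]
```

The returned singular values are the same quantities that an AP would report to the CPU in a singular-value-based activation rule, so no additional decomposition is needed for that purpose.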

Remark 1 indicates that an efficient analog beamforming matrix can be designed based on the local CSI available at each AP of CF mmWave massive MIMO systems. However, this does not mean that further information exchange via the fronthaul links is useless. Indeed, the information exchange between the APs and CPU in CF massive MIMO systems can also be exploited to improve the EE. In the next section, we show that adaptively activating RF chains, based on global CSI in SC-HBF or on limited information in D-HBF, reduces the power consumption and thereby improves the EE.

IV Adaptive RF chain activation

Fig. 1: HBF architecture with the ARFA scheme. In the phase-shifter network, $n_{l}N_{r}$ out of $NN_{r}$ PSs are turned on at the $l$th AP.

IV-A Problem formulation and basic ideas

The global analog combiner $\mathbf{F}$ can be expressed as

$$\mathbf{F}=\text{diag}\{\underbrace{[\mathbf{f}_{11},\ldots,\mathbf{f}_{1N}]}_{\text{analog combiner at AP 1}},\ldots,\underbrace{[\mathbf{f}_{L1},\ldots,\mathbf{f}_{LN}]}_{\text{analog combiner at AP $L$}}\},$$

and the following facts are noted:

  • Based on Theorem 1, the total achievable rate obtained by analog combining can be expressed as a sum of $\{\bar{\mathcal{R}}^{\mathrm{a}}_{1},\ldots,\bar{\mathcal{R}}^{\mathrm{a}}_{L}\}$ corresponding to APs $\{1,\ldots,L\}$. In CF mmWave massive MIMO systems, the APs are distributed in a large area, and their communication channels experience different path losses and shadowing effects. Therefore, the local analog combiners at different APs contribute unequally to the total achievable rate.

  • In a local analog combiner $\mathbf{F}_{l}$, the combining vectors $\{\mathbf{f}_{l1},\ldots,\mathbf{f}_{lN}\}$ contribute unequally to the sub-rate $\bar{\mathcal{R}}^{\mathrm{a}}_{l}$ given in (11). Specifically, the contribution of $\mathbf{f}_{ln}$ is more significant than that of $\mathbf{f}_{lm}$ if $n<m$ because $n$ and $m$ are the indices of the ordered singular values of $\hat{\mathbf{H}}_{l}\hat{\mathbf{Q}}_{l-1}^{-1}\hat{\mathbf{H}}_{l}^{H}$ in SC-HBF and of $\hat{\mathbf{H}}_{l}$ in D-HBF.

As a result, it is likely that a subset of the analog combining vectors in $\{\mathbf{f}_{11},\ldots,\mathbf{f}_{LN}\}$ is insignificant and can be removed from the global combiner $\mathbf{F}$ without causing considerable performance loss. We note that at an AP, an analog combining vector represents the effect of $N_{r}$ PSs connected to an RF chain, followed by an ADC. Therefore, an insignificant analog combining vector can be removed from signal combining by turning off its corresponding RF chain, ADC, and PSs, which results in a reduction in the total power consumption. Motivated by this, we propose an ARFA scheme that selectively activates RF chains at the APs. Let $\mathbf{n}=\{n_{1},\ldots,n_{L}\}$, where $n_{l}$ is the number of turned-on RF chains out of the $N$ RF chains installed at the $l$th AP, $0\leq n_{l}\leq N$. We note that for $n_{l}=0$, all the RF chains at the $l$th AP are turned off, and this AP does not consume any power for signal combining. It is also noted that in the uplink training phase of the next coherence interval, the deactivated RF chains are reactivated for channel estimation. The optimal activation of RF chains at the APs can be performed based on the following remark.

Remark 2

Because $\mathbf{f}_{ln}$ is always more important at the $l$th AP than $\mathbf{f}_{lm}$ with $n<m$ in terms of achievable rate, the problem of optimal activation of RF chains at an AP is equivalent to finding an optimal number of turned-on RF chains at that AP. Specifically, for the $l$th AP, if the ARFA scheme suggests using $n_{l}^{\star}$ RF chains, then the first $n_{l}^{\star}$ RF chains corresponding to $\{\mathbf{f}_{l1},\ldots,\mathbf{f}_{ln_{l}^{\star}}\}$ are selected for analog signal combining, whereas the others are deactivated to save power.

The HBF architecture with the proposed ARFA scheme is illustrated in Fig. 1. As an example, at the $l$th AP, only $n_{l}$ out of $N$ RF chains are turned on. Furthermore, the ADCs and PSs connected to the inactive RF chains are also turned off. Consequently, the local combiner $\mathbf{F}_{l}$ consists of only $n_{l}$ analog combining vectors, i.e., $\mathbf{F}_{l}=[\mathbf{f}_{l1},\ldots,\mathbf{f}_{ln_{l}}]$.

Unlike the conventional fixed-activation HBF, in the proposed ARFA scheme, the global combiner $\mathbf{F}$, $\mathcal{R}^{\mathrm{a}}_{l}$, and $\mathcal{R}\triangleq\sum_{l=1}^{L}\mathcal{R}^{\mathrm{a}}_{l}$ depend on $\mathbf{n}$. Therefore, in this section, they are expressed as functions of $\mathbf{n}$, i.e., $\mathbf{F}(\mathbf{n})$, $\mathcal{R}^{\mathrm{a}}_{l}(\mathbf{n})$, and $\mathcal{R}(\mathbf{n})$, respectively. We limit the total number of turned-on RF chains in the ARFA scheme to $L\bar{n}$, i.e., $\sum_{l=1}^{L}n_{l}=L\bar{n}$, where $\bar{n}$ $(\leq N)$ is the average number of activated RF chains at each AP. Based on Remark 2, the optimal activation of RF chains at the APs in the ARFA scheme can be performed by solving

$$\mathbf{n}^{\star}=\arg\max_{\mathbf{n}\in\mathcal{S}}\mathcal{R}(\mathbf{n}), \tag{18}$$

where $\mathcal{S}=\left\{\mathbf{n}:0\leq n_{l}\leq N,\sum_{l=1}^{L}n_{l}=L\bar{n}\right\}$ is the feasible set of $\mathbf{n}$. The optimal $\mathbf{n}^{\star}$ in (18) can be found by exhaustive search over the entire feasible set $\mathcal{S}$. However, in CF mmWave massive MIMO systems, $L$ is large; thus, an excessively large number of candidates for $\mathbf{n}$ would need to be examined, which makes the exhaustive search computationally prohibitive. In the following subsections, we propose three low-complexity algorithms to find $\mathbf{n}^{\star}$. By abuse of notation, we use $\mathbf{F}^{\star}$ for the global combiner found by the proposed ARFA algorithms. Furthermore, we note that Algorithm 1 can be easily modified for the ARFA scheme by replacing $N$ in steps 3, 7, and 10 with $n_{l}$, reflecting a dynamic number of analog combining vectors for the local combiner $\mathbf{F}_{l}$ at the $l$th AP.
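To make the size of the exhaustive search concrete, the number of candidates $|\mathcal{S}|$ can be counted with a small dynamic program. This is only an illustration of the feasible set in (18); the function name `feasible_set_size` is hypothetical, and $L\bar{n}$ is assumed to be an integer.

```python
def feasible_set_size(L, N, n_bar):
    """Count vectors n with 0 <= n_l <= N for all l and sum(n) == L * n_bar."""
    target = L * n_bar
    # counts[s] = number of ways to reach a partial sum s with the APs handled so far
    counts = [1] + [0] * target
    for _ in range(L):
        new = [0] * (target + 1)
        for s, c in enumerate(counts):
            if c:
                for n_l in range(min(N, target - s) + 1):
                    new[s + n_l] += c
        counts = new
    return counts[target]
```

For instance, with three APs, $N=2$, and $\bar{n}=1$, the feasible vectors are the six permutations of $(0,1,2)$ plus $(1,1,1)$, and each additional AP multiplies the work further, which is why the search is avoided in practice.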

IV-B ARFA with SC-HBF (SC-ARFA)

Algorithm 2 HBF with SC-ARFA
Output: $\mathbf{F}^{\star}$
1:  Initialize $\mathbf{n}=[n_{1},\ldots,n_{L}]$ with $n_{l}=\bar{n},\forall l$.
2:  Use Algorithm 1 to find $\mathbf{F}(\mathbf{n})$, $\mathcal{R}^{\mathrm{a}}_{l}(\mathbf{n}),\forall l$, and $\mathcal{R}(\mathbf{n})$.
3:  $\mathbf{F}^{\star}=\mathbf{F}(\mathbf{n})$, $\mathcal{R}^{\star}=\mathcal{R}(\mathbf{n})$.
4:  Obtain $\{n_{[1]},\ldots,n_{[L]}\}$ s.t. $\mathcal{R}^{\mathrm{a}}_{[1]}(\mathbf{n})>\ldots>\mathcal{R}^{\mathrm{a}}_{[L]}(\mathbf{n})$.
5:  $i=1$, $k=L$
6:  while $i<k$ do
7:     while $n_{[i]}=N$ do
8:        $i=i+1$
9:     end while
10:     while $n_{[k]}=0$ do
11:        $k=k-1$
12:     end while
13:     Update $\mathbf{n}$: $n_{[i]}=n_{[i]}+1$, $n_{[k]}=n_{[k]}-1$.
14:     Use Algorithm 1 to find $\mathbf{F}(\mathbf{n})$ and $\mathcal{R}(\mathbf{n})$ with the updated $\mathbf{n}$.
15:     Update $\mathbf{F}^{\star}=\mathbf{F}(\mathbf{n})$ and $\mathcal{R}^{\star}=\mathcal{R}(\mathbf{n})$ if $\mathcal{R}(\mathbf{n})>\mathcal{R}^{\star}$.
16:  end while
17:  At the $l$th AP, compute $\mathbf{W}_{l}^{\star}$ based on (15), $\forall l$.

In SC-ARFA, the ARFA scheme is incorporated with SC-HBF, and the optimal numbers of active RF chains at the APs are found at the CPU based on the global CSI. The idea to find $\mathbf{n}^{\star}$ is to turn on/off as many RF chains as possible at the APs corresponding to the largest/smallest $\mathcal{R}^{\mathrm{a}}_{l}$, as presented in Algorithm 2. In steps 1–3, all elements of $\mathbf{n}$ are set to $\bar{n}$; then, $\mathbf{F}(\mathbf{n})$, $\mathcal{R}^{\mathrm{a}}_{l}(\mathbf{n}),\forall l$, and $\mathcal{R}(\mathbf{n})$ are computed. In step 4, the elements of $\mathbf{n}$ are ordered to obtain $\{n_{[1]},\ldots,n_{[L]}\}$ in decreasing order of the sub-rates $\{\mathcal{R}^{\mathrm{a}}_{1}(\mathbf{n}),\ldots,\mathcal{R}^{\mathrm{a}}_{L}(\mathbf{n})\}$. Therefore, $n_{[i]}$ is the number of turned-on RF chains at the AP with the $i$th largest sub-rate, i.e., $\mathcal{R}^{\mathrm{a}}_{[i]}(\mathbf{n})$.

In step 5, we initialize $i=1$ and $k=L$. In steps 6–16, $n_{[i]}$ is increased by one, whereas $n_{[k]}$ is decreased by one, in each iteration. We note that in step 13, $n_{[i]}$ and $n_{[k]}$ are updated simultaneously to guarantee $\sum_{l=1}^{L}n_{[l]}=L\bar{n}$. The updates of $n_{[i]}$ and $n_{[k]}$ result in a new candidate $\mathbf{n}$. Hence, $\mathbf{F}(\mathbf{n})$ and $\mathcal{R}(\mathbf{n})$ are found in step 14, and $\mathbf{F}^{\star}$ is updated if the performance is improved, as shown in step 15. Once $n_{[i]}$ reaches the maximum, i.e., $N$, the number of turned-on RF chains at the AP associated with the $(i+1)$th largest sub-rate, i.e., $n_{[i+1]}$, is considered, as shown in steps 7–9. In contrast, if $n_{[k]}$ reaches the minimum, i.e., zero, $n_{[k-1]}$ is considered next, as shown in steps 10–12. This iterative process is terminated if $i\geq k$, for which we have $\mathcal{R}^{\mathrm{a}}_{[i]}(\mathbf{n})\leq\mathcal{R}^{\mathrm{a}}_{[k]}(\mathbf{n})$, and the increase (decrease) in $n_{[i]}$ ($n_{[k]}$) is unlikely to provide performance improvement. Once all the analog combiners are determined and sent to the APs, the digital combiner at each AP is determined, as in step 17.

We note that the ARFA process needs to be performed at the CPU to jointly optimize the numbers of RF chains at all APs. In the SC-ARFA scheme, the global CSI is exploited to evaluate the candidates for $\mathbf{n}$. However, the employment of SC-HBF in this scheme requires high computational complexity and a large amount of information exchanged between the CPU and APs, as discussed in Section III-B. This motivates us to propose an ARFA scheme incorporated with D-HBF in the next subsection.
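The search in Algorithm 2 can be sketched as follows. The callable `evaluate` is a hypothetical stand-in for running Algorithm 1 with a given $\mathbf{n}$; it is assumed to return the total rate and the list of per-AP sub-rates. The explicit `i >= k` guard before the update is an addition of this sketch (the pseudocode leaves that corner case implicit), and indices are zero-based.

```python
import numpy as np

def sc_arfa(evaluate, L, N, n_bar):
    """Greedy chain reallocation; evaluate(n) -> (total_rate, sub_rates) is assumed."""
    n = np.full(L, n_bar, dtype=int)                  # step 1
    best_rate, sub_rates = evaluate(n)                # steps 2-3
    best_n = n.copy()
    # step 4: AP indices sorted by decreasing sub-rate at the initial allocation
    order = np.argsort(-np.asarray(sub_rates), kind="stable")
    i, k = 0, L - 1                                   # step 5
    while i < k:                                      # steps 6-16
        while i < k and n[order[i]] == N:
            i += 1
        while i < k and n[order[k]] == 0:
            k -= 1
        if i >= k:
            break                                     # guard added in this sketch
        n[order[i]] += 1                              # step 13: move one chain
        n[order[k]] -= 1
        rate, _ = evaluate(n)                         # step 14
        if rate > best_rate:                          # step 15
            best_rate, best_n = rate, n.copy()
    return best_n, best_rate
```

Because every update moves exactly one RF chain from a weak AP to a strong AP, the budget $\sum_{l}n_{l}=L\bar{n}$ is preserved throughout, and the number of calls to `evaluate` grows at most linearly with the total number of installed RF chains.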

IV-C ARFA with D-HBF (D-ARFA)

Without global CSI, the ARFA scheme can be performed if the CPU knows the qualities of the available combining vectors or the path loss corresponding to each AP. The former idea relies on the fact that a combining vector is obtained by quantizing a singular vector of the channel matrix, as shown in (16). Therefore, the quality of a combining vector can be evaluated based on its corresponding singular value. In contrast, the latter idea for D-ARFA is motivated by the observation that the AP with more significant path loss should have fewer activated RF chains because it is more likely to have a low sub-rate.

IV-C1 Singular values-based D-ARFA (SV-based D-ARFA)

The SV-based D-ARFA scheme is summarized in Algorithm 3. Specifically, in step 1, each AP finds and sends the $N$ largest singular values of the channel matrix to the CPU. Here, only the $N$ largest singular values are sent because, in the proposed ARFA scheme, only $n_{l}$ out of $N$ combining vectors are selected for signal combining at the $l$th AP. As a result, the set of $LN$ singular values $\left\{\lambda_{1}^{(1)},\ldots,\lambda_{N}^{(1)},\ldots,\lambda_{1}^{(L)},\ldots,\lambda_{N}^{(L)}\right\}$ is available at the CPU, where $\lambda_{n}^{(l)}$ is the $n$th largest singular value associated with the $l$th AP. Then, the numbers of active RF chains at the APs are determined in steps 3–6. Specifically, an RF chain at an AP is suggested for activation if its corresponding singular value is not smaller than $\lambda_{L\bar{n}}$ found in step 2. In other words, the number of active RF chains at the $l$th AP, that is, $n^{\star}_{l}$, is set as the number of elements in $\left\{\lambda_{1}^{(l)},\ldots,\lambda_{N}^{(l)}\right\}$ that are not smaller than $\lambda_{L\bar{n}}$. Finally, the CPU sends the value $n^{\star}_{l}$ back to the $l$th AP, which is then used for signal combining based on the D-HBF scheme.

Algorithm 3 HBF with SV-based D-ARFA
Output: $\mathbf{F}^{\star}$
1:  Each AP finds the $N$ largest singular values of its channel matrix and sends them to the CPU. Specifically, the $l$th AP finds and sends a vector $\mathbf{e}_{l}=\left[\lambda_{1}^{(l)},\ldots,\lambda_{N}^{(l)}\right]$, where $\lambda_{n}^{(l)}$ is the $n$th largest singular value of $\hat{\mathbf{H}}_{l}$.
2:  The CPU finds $\lambda_{L\bar{n}}$, which is the $(L\bar{n})$th largest element in the singular value set $\{\lambda_{1}^{(1)},\ldots,\lambda_{N}^{(1)},\ldots,\lambda_{1}^{(L)},\ldots,\lambda_{N}^{(L)}\}$ received from all APs.
3:  for $l=1\rightarrow L$ do
4:     The CPU sets $n^{\star}_{l}$ to the number of elements in $\mathbf{e}_{l}$ that are not smaller than $\lambda_{L\bar{n}}$ and sends $n^{\star}_{l}$ to the $l$th AP.
5:     The $l$th AP determines its local analog combiner $\mathbf{F}_{l}^{\star}$ for $n^{\star}_{l}$ RF chains, i.e., $\mathbf{F}_{l}^{\star}=[\tilde{\mathbf{f}}_{l1}^{\star},\ldots,\tilde{\mathbf{f}}_{ln^{\star}_{l}}^{\star}]$, where $\tilde{\mathbf{f}}_{ln}^{\star}$ is given by (16), and determines $\mathbf{W}_{l}^{\star}$ based on (15).
6:  end for
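The CPU-side part of Algorithm 3 (steps 2–4) reduces to a global threshold on the reported singular values. A minimal sketch, assuming the reports are stacked in an $L\times N$ array and that $L\bar{n}$ is an integer; the name `sv_based_d_arfa` is hypothetical.

```python
import numpy as np

def sv_based_d_arfa(sv, n_bar):
    """sv[l, n] is the (n+1)th largest singular value reported by AP l; shape (L, N)."""
    sv = np.asarray(sv, dtype=float)
    L = sv.shape[0]
    budget = L * n_bar
    if budget == 0:
        return np.zeros(L, dtype=int)
    # step 2: the (L * n_bar)th largest value over all APs acts as the threshold
    threshold = np.sort(sv, axis=None)[::-1][budget - 1]
    # step 4: per-AP count of singular values that are not smaller than the threshold
    return (sv >= threshold).sum(axis=1)
```

Note that if several singular values tie exactly at the threshold, this rule activates all of them, so the total can exceed $L\bar{n}$; with continuous-valued channels such ties are not expected in practice.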

IV-C2 Path loss-based D-ARFA (PL-based D-ARFA)

Algorithm 4 HBF with PL-based D-ARFA
Output: $\mathbf{F}^{\star}$
1:  Find $\mathbf{n}=\{n_{1},\ldots,n_{L}\}$ based on (19).
2:  Obtain $\{n_{[1]},\ldots,n_{[L]}\}$ s.t. $\alpha_{[1]}>\ldots>\alpha_{[L]}$.
3:  $t=1$
4:  while $\mathbf{n}\notin\mathcal{S}$ do
5:     if $\sum_{l=1}^{L}n_{[l]}<L\bar{n}$ and $n_{[t]}<N$ then
6:        $n_{[t]}=n_{[t]}+1$
7:     end if
8:     if $\sum_{l=1}^{L}n_{[l]}>L\bar{n}$ and $n_{[L-t+1]}>0$ then
9:        $n_{[L-t+1]}=n_{[L-t+1]}-1$
10:     end if
11:     $t=t+1$
12:     Reset $t=1$ if $t>L$.
13:  end while
14:  Obtain $\mathbf{n}^{\star}$ by reordering $\{n_{[1]},\ldots,n_{[L]}\}$ to the original order.
15:  for $l=1\rightarrow L$ do
16:     The CPU sends $n_{l}^{\star}$ to the $l$th AP.
17:     The $l$th AP determines its local analog combiner $\mathbf{F}_{l}^{\star}$ for $n^{\star}_{l}$ RF chains, i.e., $\mathbf{F}_{l}^{\star}=[\tilde{\mathbf{f}}_{l1}^{\star},\ldots,\tilde{\mathbf{f}}_{ln^{\star}_{l}}^{\star}]$, where $\tilde{\mathbf{f}}_{ln}^{\star}$ is given by (16), and determines $\mathbf{W}_{l}^{\star}$ based on (15).
18:  end for

In the SV-based D-ARFA scheme, the largest singular values of the channel matrices are required to find $\mathbf{n}^{\star}$. This entails high computational complexity, especially when $L$ and $N_{r}$ are large. To avoid this, we herein propose the PL-based D-ARFA scheme, in which $\mathbf{n}^{\star}$ is obtained based on the total path losses associated with the APs. In CF massive MIMO systems, the APs are distributed in a large area. Therefore, the contribution of an AP to the total achievable rate considerably depends on its path loss.

Let $\beta_{l}=\sum_{k=1}^{K}\beta_{kl}$ be the sum of path loss of the $l$th AP, with $\beta_{kl}$ given in (2), and let $\alpha_{l}=\frac{1}{\beta_{l}},\forall l$. The number of activated RF chains at the $l$th AP can be set to

$$n_{l}=\min\left\{N,\left\lfloor L\bar{n}\frac{\alpha_{l}}{\sum_{i=1}^{L}\alpha_{i}}\right\rceil\right\},\quad\forall l, \tag{19}$$

where $\min\{N,\cdot\}$ is used to guarantee $n_{l}\leq N$, and $\lfloor\cdot\rceil$ rounds a real number to its nearest integer. However, because of the rounding and the cap at $N$, it is possible to obtain $\sum_{l=1}^{L}n_{l}\neq L\bar{n}$, which leads to $\mathbf{n}\notin\mathcal{S}$. To solve this problem, we propose Algorithm 4.

In Algorithm 4, the elements of $\mathbf{n}$ found in step 1 based on (19) are sorted in step 2 in decreasing order of $\{\alpha_{1},\ldots,\alpha_{L}\}$, i.e., in increasing order of the sums of path loss $\{\beta_{1},\ldots,\beta_{L}\}$, to generate $\{n_{[1]},\ldots,n_{[L]}\}$. Here, the order index $[t]$ indicates that $n_{[t]}$ RF chains are chosen to be activated at the AP associated with the $t$th-smallest path loss. Therefore, if $\sum_{l=1}^{L}n_{[l]}<L\bar{n}$ and $n_{[t]}<N$, $n_{[t]}$ is increased by one. In contrast, if $\sum_{l=1}^{L}n_{[l]}>L\bar{n}$ and $n_{[L-t+1]}>0$, $n_{[L-t+1]}$ is decreased by one. This process is repeated until $\mathbf{n}\in\mathcal{S}$ is satisfied, as shown in steps 3–13. In this procedure, by initializing $t=1$ and gradually increasing $t$, the numbers of turned-on RF chains for the APs with smaller path loss are chosen to increase first, whereas those for the APs with larger path loss are chosen to decrease first. In step 14, $\{n_{[1]},\ldots,n_{[L]}\}$ are reordered into the original order. In steps 15–18, the numbers of active RF chains are fed back to the APs, which then perform D-HBF, as in steps 3–6 of Algorithm 3.
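The allocation in (19) followed by the correction loop of Algorithm 4 can be sketched as below. The sketch keeps $\mathbf{n}$ in the original AP order and walks through a sorted index array, so the reordering in step 14 is implicit. It assumes integer $\bar{n}$ with $0\le\bar{n}\le N$ (otherwise the loop may not terminate), and the name `pl_based_d_arfa` is hypothetical.

```python
import numpy as np

def pl_based_d_arfa(beta, N, n_bar):
    """beta[l] is the summed path loss of AP l; returns per-AP active-chain counts."""
    beta = np.asarray(beta, dtype=float)
    L = beta.size
    budget = L * n_bar
    alpha = 1.0 / beta
    share = budget * alpha / alpha.sum()
    # (19): round to the nearest integer, then cap at N
    n = np.minimum(N, np.floor(share + 0.5)).astype(int)
    order = np.argsort(-alpha, kind="stable")   # smallest path loss first (step 2)
    t = 0
    while n.sum() != budget:                    # steps 4-13
        if n.sum() < budget and n[order[t]] < N:
            n[order[t]] += 1                    # add a chain at a low-path-loss AP
        if n.sum() > budget and n[order[L - 1 - t]] > 0:
            n[order[L - 1 - t]] -= 1            # drop a chain at a high-path-loss AP
        t = (t + 1) % L                         # steps 11-12
    return n
```

As a small example under these assumptions, `pl_based_d_arfa([1, 10, 10, 10], N=2, n_bar=1)` first yields `[2, 0, 0, 0]` from (19) because of the cap at $N$, and the loop then hands the two missing chains to the next APs in the order, giving `[2, 1, 1, 0]`.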

V Power consumption analysis

In the considered uplink CF mMIMO system, the total power consumption is modeled as [41, 12, 42, 43]

$$P_{\text{total}}=\sum_{k=1}^{K}\left(P_{\text{TX},k}+P_{\text{UE},k}\right)+\sum_{l=1}^{L}\left(P_{\text{fix},l}+P_{\text{BF},l}+P_{\text{FH},l}\right), \tag{20}$$

where $P_{\text{TX},k}$ and $P_{\text{UE},k}$ represent the transmit power and the power required to run circuit components at the $k$th UE, respectively; $P_{\text{fix},l}$, $P_{\text{BF},l}$, and $P_{\text{FH},l}$ respectively denote the fixed power consumption term, the variable power consumption for the beamforming structure, and the fronthaul power consumption for the $l$th AP. $P_{\text{TX},k}$ is given as

$$P_{\text{TX},k}=\frac{\gamma\sigma^{2}}{\eta_{k}}\mathbb{E}\left\{\left\lVert\mathbf{x}_{k}\right\rVert^{2}\right\}=\frac{\gamma\sigma^{2}}{\eta_{k}}, \tag{21}$$

where $\eta_{k}\in(0,1]$ denotes the power amplifier efficiency of UE $k$, and the last equality is obtained by $\mathbb{E}\left\{\left\lVert\mathbf{x}_{k}\right\rVert^{2}\right\}=1,\forall k$. In an HBF architecture, each antenna requires a low-noise amplifier (LNA) and two mixers, and each RF chain requires one ADC and $N_{r}$ PSs, as illustrated in Fig. 1 [24, 44, 45]. Therefore, $P_{\text{BF},l}$ depends linearly on the number of antennas $(N_{r})$ and the number of active RF chains at the $l$th AP $(n_{l})$ as follows:

$$P_{\text{BF},l}=N_{r}p_{\text{BF},1}+n_{l}p_{\text{BF},2}, \tag{22}$$

where $p_{\text{BF},1}=p_{\text{LNA}}+2p_{\text{M}}$ and $p_{\text{BF},2}=N_{r}p_{\text{PS}}+p_{\text{RF}}+p_{\text{ADC}}$, with $p_{\text{LNA}}$, $p_{\text{M}}$, $p_{\text{PS}}$, $p_{\text{RF}}$, and $p_{\text{ADC}}$ respectively denoting the power consumed by an LNA, mixer, PS, RF chain, and ADC. Furthermore, $P_{\text{FH},l}$ can be obtained by [41, 12]

$$P_{\text{FH},l}=P_{\text{FH,max}}\frac{R_{\text{FH},l}}{C_{\text{FH},l}}=\kappa_{l}R_{\text{FH},l}, \tag{23}$$

where $P_{\text{FH,max}}$ is the maximum power required for the fronthaul traffic at the full capacity $C_{\text{FH},l}$, $R_{\text{FH},l}$ is the actual fronthaul rate between the $l$th AP and the CPU, and $\kappa_{l}=\frac{P_{\text{FH,max}}}{C_{\text{FH},l}}$. In the considered decentralized signal processing scheme, $2K\tau_{d}\alpha_{l}$ bits are required at the $l$th AP to quantize the signal vectors $\mathbf{r}_{l}\in\mathbb{C}^{K\times 1}$ in each coherence interval before they are sent to the CPU [12, 46]. Here, $\alpha_{l}$ is the number of quantization bits at the $l$th AP, and $\tau_{d}$ is the length (in symbols) of the uplink data. As a result, $R_{\text{FH},l}$ is given by [12, 46]

$$R_{\text{FH},l}=\frac{2K\tau_{d}\alpha_{l}}{T_{c}}, \tag{24}$$

where $T_{c}$ is the coherence time (in seconds). Assume that all the UEs have the same power amplifier efficiency and circuit power consumption, i.e., $\eta_{k}=\eta$, $P_{\text{UE},k}=P_{\text{UE}},\forall k$, and that all APs have the same fixed power consumption, number of quantization bits, and capacity, i.e., $P_{\text{fix},l}=P_{\text{fix}}$, $\alpha_{l}=\alpha$, $C_{\text{FH},l}=C_{\text{FH}}$, $\kappa_{l}=\kappa$, $\forall l$. Then, we have $P_{\text{FH},l}=P_{\text{FH}}$ and $R_{\text{FH},l}=R_{\text{FH}}$, $\forall l$. Furthermore, we note that AP $l$ requires $P_{\text{fix}}$ even when it is in sleep mode; in contrast, $P_{\text{FH}}$ and $P_{\text{BF},l}$ are only consumed when it is in the active mode. Let $\mathbb{A}$ be the set of APs in active mode and $\left|\mathbb{A}\right|$ be the number of active APs. Then, from (20)–(24), the total power consumption can be expressed as

$$\begin{aligned}P_{\text{total}}&=\frac{K\gamma\sigma^{2}}{\eta}+KP_{\text{UE}}+LP_{\text{fix}}+\left|\mathbb{A}\right|P_{\text{FH}}+\sum_{l\in\mathbb{A}}\left(N_{r}p_{\text{BF},1}+n_{l}p_{\text{BF},2}\right)\\&=P_{0}+\left|\mathbb{A}\right|P_{\text{FH}}+\left|\mathbb{A}\right|N_{r}p_{\text{BF},1}+p_{\text{BF},2}\sum_{l\in\mathbb{A}}n_{l},\end{aligned} \tag{25}$$

where $P_{0}\triangleq\frac{K\gamma\sigma^{2}}{\eta}+KP_{\text{UE}}+LP_{\text{fix}}$ collects the fixed terms of $P_{\text{total}}$ for simplicity of exposition.

It is observed from (25) that $P_{\text{total}}$ varies depending on the number of active APs, i.e., $\left|\mathbb{A}\right|$; the total number of turned-on RF chains, i.e., $\sum_{l\in\mathbb{A}}n_{l}$; and the number of antennas $N_{r}$. More specifically, it is a linearly increasing function of each of these factors. Therefore, $P_{\text{total}}$ can be reduced by using only a subset of APs in the APS scheme [11], using a reduced number of antennas in the AS scheme [28], or optimizing both $\sum_{l\in\mathbb{A}}n_{l}$ and $\left|\mathbb{A}\right|$ in the proposed ARFA scheme. Next, we compare these schemes in terms of the total power consumption. Furthermore, conventional fixed-activation HBF schemes are also considered as benchmarks.

• ARFA scheme: When the ARFA scheme is employed, $n_{l}$ differs among the APs; however, the total number of active RF chains is fixed to $L\bar{n}$, i.e., $\sum_{l\in\mathbb{A}}n_{l}=L\bar{n}$. By inserting this into (25), we obtain

$$P_{\text{total}}^{\text{ARFA}}=P_{0}+\left|\mathbb{A}\right|P_{\text{FH}}+\left|\mathbb{A}\right|N_{r}p_{\text{BF},1}+L\bar{n}p_{\text{BF},2}, \tag{26}$$

where $\mathbb{A}$ contains only the APs with at least one activated RF chain. Therefore, we have $\left|\mathbb{A}\right|=\sum_{l=1}^{L}\delta_{l}$ with $\delta_{l}=1$ if $n_{l}>0$, and $\delta_{l}=0$ if $n_{l}=0$. We note that the proposed ARFA algorithms operate differently, which can result in different $\mathbb{A}$. Therefore, they can have different power consumption.

• Fixed-activation HBF: We refer to the SC-HBF and D-HBF without the ARFA as the fixed-activation HBF. In this scheme, the same number of RF chains is activated at all $L$ APs. For comparison with the proposed ARFA schemes, we consider two deployments: $n_{l}=N,\forall l$ and $n_{l}=\bar{n},\forall l$. We note that with fixed-activation HBF, all the APs are in active mode because they have a fixed nonzero number of RF chains for signal processing, i.e., $\left|\mathbb{A}\right|=L$. By inserting $n_{l}=N,\forall l$, and $n_{l}=\bar{n},\forall l$ into (25), we obtain

$$P_{\text{total}}^{\text{fix},N}=P_{0}+LP_{\text{FH}}+LN_{r}p_{\text{BF},1}+LNp_{\text{BF},2}, \tag{27}$$
$$P_{\text{total}}^{\text{fix},\bar{n}}=P_{0}+LP_{\text{FH}}+LN_{r}p_{\text{BF},1}+L\bar{n}p_{\text{BF},2}. \tag{28}$$

• APS scheme: In this scheme, only a subset of the APs is selected based on received power [11], whereas the others are put into the sleep mode. For comparison with the proposed schemes in mmWave systems, we assume the conventional deployment of RF chains at the APs, i.e., each AP is equipped with $N$ RF chains, all of which are used for analog signal combining, i.e., $n_{l}=N,\forall l$. For a fair comparison, the number of APs in active mode in this scheme is assumed to be $\frac{L\bar{n}}{N}$. This guarantees that a total of $L\bar{n}$ RF chains are used at the selected APs, which is equal to the number of activated RF chains in the proposed ARFA scheme and in the fixed-activation HBF scheme with $n_{l}=\bar{n},\forall l$. By inserting $n_{l}=N,\forall l$, and $\left|\mathbb{A}\right|=\frac{L\bar{n}}{N}$ into (25), we have

$$P_{\text{total}}^{\text{APS}}=P_{0}+\frac{L\bar{n}}{N}P_{\text{FH}}+\frac{L\bar{n}}{N}N_{r}p_{\text{BF},1}+L\bar{n}p_{\text{BF},2}. \tag{29}$$

• AS scheme: In the AS scheme, at each AP, only $N_{r}^{\text{AS}}$ of the $N_{r}$ antennas are activated, corresponding to $N_{r}^{\text{AS}}$ received signals put through the digital signal combining [28]. In other words, analog signal combining is conducted by a network of $N_{r}$ switches rather than $NN_{r}$ PSs, in contrast to the other compared schemes. Therefore, at each AP, $N_{r}$ switches are required, whereas the numbers of active antennas, RF chains, and ADCs are the same and as small as $N_{r}^{\text{AS}}$, and the number of mixers is $2N_{r}^{\text{AS}}$. Furthermore, in the AS scheme, all the APs are in the active mode, i.e., $\left|\mathbb{A}\right|=L$. Let $p_{\text{SW}}$ be the power consumed by a switch. The total power consumption in this scheme is given as

$$P_{\text{total}}^{\text{AS}}=P_{0}+LP_{\text{FH}}+LN_{r}p_{\text{SW}}+LN_{r}^{\text{AS}}(p_{\text{RF}}+p_{\text{ADC}}+p_{\text{BF},1}). \tag{30}$$

By comparing (26) with (27)–(30), we observe that:

  • The proposed ARFA scheme requires no higher power consumption than the fixed-activation HBF schemes with $n_{l}=N$ and $n_{l}=\bar{n},\forall l$, because $\bar{n}\leq N$ and $\left|\mathbb{A}\right|\leq L$. Furthermore, we note that a dominant part of the power consumed for beamforming is due to the RF chains and ADCs. Therefore, from (26) and (27), it is clear that a considerable reduction in power consumption can be obtained by the ARFA if $\bar{n}\ll N$ is chosen.

  • Both the power consumption and total achievable rate of the proposed ARFA scheme significantly depend on $\bar{n}$. Specifically, a smaller $\bar{n}$ leads to a reduction in both power consumption and total achievable rate with respect to the fixed-activation HBF scheme with $n_{l}=N,\forall l$. This tradeoff is discussed further in the next section.

  • It is observed from (29) and (26) that the APS and proposed ARFA schemes have a difference of $\left|\frac{L\bar{n}}{N}-\left|\mathbb{A}\right|\right|(P_{\text{FH}}+N_{r}p_{\text{BF},1})$ in power consumption, even though they have the same total number of active RF chains. Specifically, the APS scheme requires slightly lower power consumption, but its achievable rate is much lower than that of the ARFA scheme, as is shown in the next section. It is not certain from (30) and (26) which of the AS and ARFA schemes has the lower power consumption; this is determined from the simulation results in the next section.
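The comparison above can be reproduced numerically by evaluating (23)–(25) directly. The sketch below is illustrative: the function names are hypothetical, an AP with $n_{l}=0$ is treated as asleep (so it contributes only through $P_{0}$), and the values of $K$, $N_r$, $P_{0}$, and $\tau_{d}$ used in the example are assumptions rather than settings taken from the paper.

```python
def fronthaul_power(K, tau_d, alpha_bits, T_c, P_fh_max, C_fh):
    """Per-AP fronthaul power from (23) and (24), in watts."""
    R_fh = 2 * K * tau_d * alpha_bits / T_c      # (24): fronthaul rate in bit/s
    return P_fh_max * R_fh / C_fh                # (23)

def total_power(P0, P_fh, N_r, n, p_bf1, p_bf2):
    """Total power from (25) for per-AP active-chain counts n."""
    active = [n_l for n_l in n if n_l > 0]       # the set A of awake APs
    return P0 + len(active) * (P_fh + N_r * p_bf1) + p_bf2 * sum(active)

# Example with assumed values: K = 8 UEs, N_r = 64 antennas, tau_d = 180 symbols.
P_fh = fronthaul_power(K=8, tau_d=180, alpha_bits=2, T_c=2e-3, P_fh_max=50.0, C_fh=100e6)
p_bf1 = 0.02 + 2 * 0.0003                        # p_LNA + 2 p_M, in W
p_bf2 = 64 * 0.03 + 0.04 + 0.2                   # N_r p_PS + p_RF + p_ADC, in W
p_arfa = total_power(10.0, P_fh, 64, [4, 3, 1, 0], p_bf1, p_bf2)   # (26), |A| = 3
p_fix = total_power(10.0, P_fh, 64, [2, 2, 2, 2], p_bf1, p_bf2)    # (28), |A| = 4
```

With the same budget of eight RF chains, the two totals differ by exactly $P_{\text{FH}}+N_{r}p_{\text{BF},1}$, i.e., the cost of the one AP that the adaptive allocation puts to sleep.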

VI Simulation results

VI-A Simulation parameters

Table I: Simulation parameters [12, 11, 24, 41]

| Parameters | Values |
| --- | --- |
| Power amplifier efficiency | $\eta=0.3$ |
| Coherence time and data length | $T_{c}=2$ ms, $\tau_{c}=200$, $\tau_{p}=20$ symbols |
| No. of quantization bits | $\alpha=2$ bits |
| UE and fixed power term | $P_{\text{UE}}=1$ W, $P_{\text{fix}}=0.825$ W |
| Fronthaul capacity and required power | $C_{\text{FH}}=100$ Mbps, $P_{\text{FH,max}}=50$ W |
| Component power | $p_{\text{LNA}}=20$ mW, $p_{\text{ADC}}=200$ mW, $p_{\text{RF}}=40$ mW, $p_{\text{PS}}=30$ mW, $p_{\text{M}}=0.3$ mW, $p_{\text{SW}}=5$ mW, $\rho_{p}=100$ mW |

Simulations are performed to evaluate the total achievable rates, power consumption, EEs, and computational complexities of the proposed SC-HBF, D-HBF, and ARFA schemes. In the simulations, $K$ UEs and $L$ APs are uniformly distributed at random within a square coverage area of size $D\times D$ $\text{m}^{2}$, where $D$ is set to $1000$ m [4]. The large-scale fading coefficients are computed based on (2) with $\epsilon=4.1$ and $\xi=7.6$, and the antenna gain is set to $G_{\mathrm{a}}=15$ dBi [36]. Furthermore, we assume $f_{c}=28$ GHz, $B=100$ MHz, and $\text{NF}=9$ dB for the carrier frequency, system bandwidth, and noise figure, respectively. As a result, the noise power in dBm is given as $\sigma^{2}=-174\text{ dBm/Hz}+10\log_{10}(B)+\text{NF}$, where $B$ is expressed in Hz.
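
For concreteness, the noise power expression can be evaluated for the stated bandwidth and noise figure. The sketch below uses hypothetical helper names and assumes the usual convention that $B$ enters the logarithm in Hz; with $B=100$ MHz and $\text{NF}=9$ dB the three terms are $-174$, $80$, and $9$.

```python
import math


def noise_power_dbm(bandwidth_hz, noise_figure_db, n0_dbm_per_hz=-174.0):
    """Noise power in dBm: thermal noise density + 10*log10(B) + noise figure."""
    return n0_dbm_per_hz + 10.0 * math.log10(bandwidth_hz) + noise_figure_db


def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)


sigma2_dbm = noise_power_dbm(bandwidth_hz=100e6, noise_figure_db=9.0)
sigma2_watt = dbm_to_watt(sigma2_dbm)
print(sigma2_dbm, sigma2_watt)
```

The dBm value is convenient for link-budget arithmetic, whereas the value in watts is what enters the SNR when the transmit power is also expressed in linear scale.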

The channel coefficients between each UE and AP are generated based on the geometric Saleh–Valenzuela channel model given in (1). For simplicity, we assume an identical number of effective channel paths between each UE and AP, which is set to $P_{kl}=20,\forall l,k$ [47, 32, 1], reflecting the limited scattering in mmWave channels. The AoAs are uniformly distributed in $\left[-\frac{\pi}{12},\frac{\pi}{12}\right]$. The ULA model is employed for the antenna arrays at the APs and UEs with an antenna spacing of half a wavelength, i.e., $\frac{d_{s}}{\lambda}=\frac{1}{2}$ [40, 48]. The phases in the analog combiner are selected from $\Theta=\left\{0,\frac{2\pi}{2^{b}},\frac{4\pi}{2^{b}},\ldots,\frac{2(2^{b}-1)\pi}{2^{b}}\right\}$ with $b=4$, implying 4-bit quantization of the PSs. The parameters in Table I are used to compute the total power consumption [12, 11, 24, 41].
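
The quantized phase set $\Theta$ can be illustrated with a short sketch. The helper names are hypothetical, and mapping an arbitrary phase to its nearest element of $\Theta$ is an assumption made here for illustration; the paper's own quantization rule is the one given with its analog combiner design. The amplitude $\frac{1}{\sqrt{N_{r}}}$ of each combiner entry follows the constant-modulus property noted in Section VI-E.

```python
import cmath
import math


def phase_codebook(b):
    """Return Theta = {2*pi*k / 2**b : k = 0, ..., 2**b - 1}."""
    levels = 2 ** b
    return [2.0 * math.pi * k / levels for k in range(levels)]


def quantize_phase(phi, b):
    """Map a phase in radians to the nearest element of Theta (assumed rule)."""
    levels = 2 ** b
    step = 2.0 * math.pi / levels
    k = round(phi / step) % levels  # wrap so that phases near 2*pi map to 0
    return k * step


def combiner_entry(phi, b, n_r):
    """Constant-modulus combiner entry with a b-bit quantized phase."""
    return cmath.exp(1j * quantize_phase(phi, b)) / math.sqrt(n_r)


theta = phase_codebook(4)            # 16 phases spaced by pi/8
entry = combiner_entry(1.0, 4, 64)   # modulus 1/sqrt(64)
```

With $b=4$, the worst-case phase error of nearest-element rounding is half a step, i.e., $\pi/16$.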

VI-B Performance of the D-HBF and SC-HBF schemes

We numerically evaluate the total achievable rates of the D-HBF and SC-HBF schemes, which are analyzed in Section III. We assume the conventional RF chain deployment in this section, i.e., all $N$ available RF chains are active for analog combining. For comparison, we consider the beam selection scheme, in which the analog beamforming vectors are selected from a discrete Fourier transform (DFT) codebook based on an exhaustive search [49, 50]. Fig. 2 shows the total achievable rates of the SC-HBF, D-HBF, and beam selection schemes with $(N_{r},K)=\{(32,4),(64,8)\}$, $N=K$, and $L=32$ [21]. It is clear from Fig. 2 that for both the considered systems, the SC-HBF and D-HBF schemes have almost the same total achievable rates because they employ the same digital beamformer and their analog beamformers perform approximately the same, as stated in Remark 1. Specifically, the former performs only slightly better than the latter. It is also observed in Fig. 2 that the proposed SC-HBF and D-HBF schemes outperform the beam selection method for both the considered systems.

Fig. 2: Total achievable rates of the D-HBF and SC-HBF schemes compared to those of the beam selection scheme with $L=32$, $K=8$, $N_{r}=64$, and $N=8$.

VI-C Performance of the proposed ARFA scheme

The total achievable rates, power consumption, and EEs of the proposed ARFA schemes, namely, SC-ARFA, PL-based D-ARFA, and SV-based D-ARFA, are compared to those of the fixed-activation HBF with $n_{l}=N$ and $n_{l}=\bar{n},\forall l$, APS, and AS schemes discussed in Section V. In our simulations, SC-HBF is used for the fixed-activation HBF and APS schemes. We note that the SC-HBF and D-HBF schemes provide almost identical performance, as shown in Fig. 2, and that for the same RF chain deployment, they have the same power consumption. We consider a CF mmWave massive MIMO system with $L=32$, $K=8$, $N_{r}=64$, $N=8$ [5, 21], and $\bar{n}=2$. In the AS scheme, the number of selected antennas at each AP is set to $N_{r}^{\text{AS}}=32$, which ensures that the AS scheme achieves total achievable rates comparable to those of the proposed schemes, allowing us to compare them in terms of EE. In the simulations, the power consumption of the fixed-activation HBF schemes with $n_{l}=N$ and $n_{l}=\bar{n}$, the APS scheme, and the AS scheme is computed based on (27)–(30), whereas that of the proposed ARFA schemes is obtained through simulations because it depends on $\delta_{n_{l}}$, as indicated in (26). The EE of a scheme is calculated as the ratio between the total achievable rate and the total power consumption. Furthermore, for a fair comparison, we fix the total number of active RF chains for each compared scheme to $L\bar{n}$, which ensures that an average of $\bar{n}$ RF chains are activated at each AP in all compared schemes.
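
The RF chain budget and the EE definition above can be made concrete with a small sketch. All names are hypothetical, and the allocation vector is an invented example rather than an output of the paper's algorithms; it only illustrates that an adaptive allocation may switch some APs off entirely while keeping the total at $L\bar{n}$.

```python
def active_ap_set(n_active):
    """Indices of the APs that keep at least one RF chain active."""
    return [l for l, n in enumerate(n_active) if n > 0]


def meets_budget(n_active, n_max, n_bar):
    """Check 0 <= n_l <= N for every AP and sum_l n_l == L * n_bar."""
    num_aps = len(n_active)
    within_range = all(0 <= n <= n_max for n in n_active)
    return within_range and sum(n_active) == num_aps * n_bar


def energy_efficiency(total_rate, total_power_w):
    """EE as the ratio of total achievable rate to total power consumption."""
    if total_power_w <= 0:
        raise ValueError("total power must be positive")
    return total_rate / total_power_w


# Invented allocation for L = 8 APs with N = 8 and n_bar = 2.
n_active = [4, 0, 3, 1, 0, 2, 5, 1]
print(meets_budget(n_active, n_max=8, n_bar=2), active_ap_set(n_active))
```

In this example two APs are inactive, so the active set has fewer than $L$ elements, which is the situation $\left|\mathbb{A}\right|\leq L$ referred to in Section V.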

Fig. 3: Total achievable rates and EEs of the proposed ARFA schemes compared to those of the fixed-activation HBF with $n_{l}=N$, $n_{l}=\bar{n},\forall l$, APS, and AS schemes: (a) total achievable rate; (b) energy efficiency. Simulation parameters are $L=32$, $K=8$, $N_{r}=64$, $N=8$, and $\bar{n}=2$.

In Fig. 3, we show the total achievable rates and EEs of the considered schemes versus the average transmit power $\rho$ for $L=32$, $K=8$, $N_{r}=64$, $N=8$, and $\bar{n}=2$. From Fig. 3, the following observations are noted:

  • The fixed-activation HBF scheme with $n_{l}=N$ achieves the highest total achievable rate, as seen in Fig. 3(a), because it activates all the available APs and RF chains. However, in this scheme, high power is consumed by the $LN$ RF chains. Therefore, its EE is significantly lower than those of the other considered schemes, in which only $L\bar{n}$ $(\ll LN)$ RF chains are turned on, as seen in Fig. 3(b).

  • Among the proposed ARFA schemes, the SC-ARFA achieves the highest achievable rate and EE. However, all of these schemes achieve a remarkable improvement in EE with a small loss in the total achievable rate with respect to the fixed-activation HBF scheme with $n_{l}=N$.

  • It is also observed that the proposed ARFA schemes outperform the AS scheme in terms of both the total achievable rate and EE. The APS scheme attains an EE comparable to that of the SC-ARFA scheme in Fig. 3(b), but its achievable rate is lower than those of all the proposed ARFA schemes. The fixed-activation HBF scheme with $n_{l}=\bar{n}$ is outperformed by the proposed SC-ARFA scheme in both achievable rate and EE.

Fig. 4: Total achievable rates, EEs, and power consumption of the proposed ARFA schemes compared to those of the fixed-activation HBF with $n_{l}=N$, $n_{l}=\bar{n},\forall l$, APS, and AS schemes: (a) total achievable rate; (b) energy efficiency; (c) total power consumption. Simulation parameters are $L=\{8,16,32,48,64\}$, $K=8$, $N_{r}=64$, $N=8$, $\bar{n}=2$, and $\rho=40$ dBm.

In Fig. 4, we show the total achievable rates, EEs, and power consumption of the considered schemes versus the number of APs. In this figure, we use the same simulation parameters as in Fig. 3, except that the number of APs varies, i.e., $L=\{8,16,32,48,64\}$, and $\rho=40$ dBm. In Figs. 4(a) and 4(b), the observations on the achievable rates and EEs of the considered schemes are similar to those from Fig. 3. In particular, over the entire range of $L$, the proposed ARFA schemes have small losses in total achievable rate but significant improvement in EE with respect to the fixed-activation HBF scheme with $n_{l}=N$. Furthermore, the proposed ARFA schemes perform better than or comparably to the APS scheme in terms of both achievable rate and EE. It is also clear that the AS scheme is less efficient than the proposed schemes in terms of both spectral and energy efficiency. To further explain the EEs, we consider the total power consumption of these schemes in Fig. 4(c). It can be seen that the total power consumption of the fixed-activation schemes increases quickly with $L$. Therefore, activating all $N$ RF chains at all the APs causes an extremely high power consumption for the CF mmWave massive MIMO system, motivating the ARFA in this work. Among the other schemes, the AS scheme consumes the highest power while achieving the lowest rates, making it energy-inefficient, as seen in Fig. 4(b). The proposed SC-ARFA and APS schemes have comparable and low power consumption.

To summarize, we can conclude from Figs. 3 and 4 that when a limited number of RF chains are used, activating the same number of RF chains at all the APs, as in the fixed-activation HBF scheme with $n_{l}=N$ or $n_{l}=\bar{n}$, is relatively energy-inefficient. Furthermore, although the APS scheme is an energy-efficient approach, it incurs losses in total achievable rate. In contrast, the proposed ARFA schemes achieve the highest or nearly highest performance in terms of both spectral and energy efficiency, both of which are far better than those of the AS scheme.

VI-D Tradeoff between achievable rates and power consumption

The total achievable rate and power consumption of the considered schemes versus $\bar{n}$ are evaluated numerically in Fig. 5 for $L=32$, $K=8$, $N_{r}=64$, $N=8$, $\bar{n}=\{1,2,\ldots,8\}$, and $\rho=40$ dBm. From Fig. 5, the following observations can be noted:

  • The total achievable rate and power consumption of the fixed-activation HBF with $n_{l}=N$ and those of the AS scheme remain unchanged with $\bar{n}$ because $N$ and $N_{r}^{\text{AS}}$ RF chains, respectively, are always active at every AP. In contrast, those of the other schemes depend on $\bar{n}$. Specifically, as $\bar{n}$ increases, both the total achievable rate and power consumption of the fixed-activation HBF scheme with $n_{l}=\bar{n}$, the proposed ARFA, and the APS schemes increase to approach those of the fixed-activation HBF with $n_{l}=N$.

  • In Fig. 5(a), the proposed ARFA schemes perform closest to the fixed-activation HBF scheme with $n_{l}=N$, especially for a small $\bar{n}$. In terms of power consumption, they require slightly higher power than the APS scheme. However, the EEs of SC-ARFA and APS are comparable, as shown in Figs. 3 and 4, owing to the efficient use of RF chains. Furthermore, for $\bar{n}\geq 2$, the AS scheme has the lowest power consumption at the cost of the smallest total achievable rate.

  • For a favorable performance–power consumption tradeoff in the assumed environment, $\bar{n}\in[2,4]$ can be chosen in the proposed ARFA schemes to achieve a significant power reduction with marginal performance loss. In particular, for $\bar{n}=4$, the performance loss is only $1.4$–$4.8\%$. With $\bar{n}=1$, only a single RF chain on average is turned on at each AP, and a significant loss in the achievable rate is observed for the proposed ARFA with respect to the fixed-activation HBF with $n_{l}=N$.

Fig. 5: Total achievable rates and power consumption of the proposed ARFA schemes compared to those of the fixed-activation HBF with $n_{l}=N$, $n_{l}=\bar{n},\forall l$, and APS schemes: (a) total achievable rate; (b) power consumption. Simulation parameters are $L=32$, $K=8$, $N_{r}=64$, $N=8$, $\bar{n}=\{1,2,\ldots,8\}$, and $\rho=40$ dBm.

VI-E Fronthauling load analysis

Table II: Comparison of the decentralized and semi-centralized schemes in terms of fronthauling load

| Schemes | AP to CPU | CPU to AP |
| --- | --- | --- |
| SC-HBF | $N_{r}K$ complex numbers | $N_{r}N$ real numbers |
| SC-ARFA | $N_{r}K$ complex numbers | $N_{r}\bar{n}$ real numbers |
| D-HBF | $K$ complex numbers | 0 |
| SV-based D-ARFA | $K$ complex numbers and $N$ real numbers | 1 real number |
| PL-based D-ARFA | $K$ complex numbers | 1 real number |

In this section, we evaluate the amount of information exchanged between the CPU and APs, which is presented in Table II. In the SC-HBF scheme, the CSI $\hat{\mathbf{H}}_{l}$ of size $N_{r}\times K$ is sent from the $l$th AP to the CPU, where it is used to generate $\mathbf{F}_{l}$ of size $N_{r}\times N$. However, we note that all the entries of $\mathbf{F}_{l}$ have a constant amplitude of $\frac{1}{\sqrt{N_{r}}}$. Therefore, only $N_{r}N$ real numbers representing the phases are fed back on the reverse link. A similar analysis is valid for SC-ARFA, with the note that only an average of $N_{r}\bar{n}$ real numbers need to be fed back from the CPU to each AP because in this scheme, only an average of $\bar{n}$ RF chains are activated. It is observed that the amount of information exchanged between the CPU and APs is relatively large in the SC-HBF and SC-ARFA schemes.

In contrast, the amounts for the decentralized schemes are small. Specifically, in the D-HBF scheme, only $K$ complex numbers representing the estimate of the transmitted signal, i.e., $\mathbf{r}_{l}$, are sent to the CPU for the final soft estimation. In the SV-based D-ARFA scheme, an additional $N$ real numbers for the $N$ singular values are sent from an AP to the CPU for each channel variation to perform ARFA. In contrast, the transmission of path loss values in the PL-based D-ARFA scheme can be ignored because of their slow variations. On the reverse link from the CPU to an AP, only a single real number, which is the number of active RF chains, is fed back in the SV- and PL-based D-ARFA schemes, as demonstrated in Algorithms 3 and 4, respectively. Given that $N\ll N_{r}$, the decentralized schemes require much less information exchange between the CPU and each AP compared to the semi-centralized schemes.
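
To give a feel for the magnitudes in Table II, the entries can be tabulated for the simulated configuration. The following is a minimal sketch with a hypothetical function name; counting each complex number as two real numbers is an assumption made here so that the schemes can be compared on a single scale, and the SC-ARFA downlink entry is an average.

```python
def fronthaul_load(scheme, n_r, k, n, n_bar):
    """Return (AP-to-CPU, CPU-to-AP) loads in real numbers, following Table II.

    Each complex number is counted as two real numbers (an assumption).
    """
    table = {
        "SC-HBF": (2 * n_r * k, n_r * n),
        "SC-ARFA": (2 * n_r * k, n_r * n_bar),  # downlink value is an average
        "D-HBF": (2 * k, 0),
        "SV-based D-ARFA": (2 * k + n, 1),
        "PL-based D-ARFA": (2 * k, 1),
    }
    return table[scheme]


for name in ("SC-HBF", "SC-ARFA", "D-HBF", "SV-based D-ARFA", "PL-based D-ARFA"):
    print(name, fronthaul_load(name, n_r=64, k=8, n=8, n_bar=2))
```

For $N_{r}=64$, $K=8$, $N=8$, and $\bar{n}=2$, the semi-centralized schemes exchange on the order of a thousand real numbers per AP, whereas the decentralized schemes exchange a few tens.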

VII Conclusion

In this work, we propose two HBF schemes for CF mmWave massive MIMO systems, namely SC-HBF and D-HBF, in which the analog combiners are generated at the CPU based on the global CSI and at each AP based on the local CSI, respectively. Notably, although the D-HBF requires substantially lower computational complexity and no CSI exchange between the CPU and APs, it achieves approximately the same total achievable rate as that obtained by the SC-HBF scheme. Furthermore, to reduce the power consumption in the CF mmWave massive MIMO system, we propose adaptive activation of RF chains at the APs. Low-complexity algorithms are developed to select the number of active RF chains at the APs such that the system’s power consumption is significantly reduced with only a marginal loss in the total achievable rate. The efficiency of the proposed schemes is verified by the simulation results, which show that the proposed ARFA scheme achieves significant improvement in EE with only a small loss in total achievable rate. In this work, the availability of the full CSI for all the antennas is assumed for the design of the hybrid beamformers and ARFA schemes. Although such full CSI can be obtained by compressed sensing-based approaches, further studies are required to make it practical in CF mmWave massive MIMO systems. Further optimization of the proposed analog combining schemes for wideband systems can be considered for future research. Furthermore, the proposed ARFA scheme can be combined with low-resolution ADCs [51, 52] to further reduce the power consumption.

Appendix A Proof of Theorem 1

Let $\mathbf{Q}=\mathbf{I}_{LN}+\gamma\mathbf{F}^{H}\mathbf{H}\mathbf{H}^{H}\mathbf{F}$. Because $\mathbf{F}$ is a block-diagonal matrix, we have

$$\mathbf{H}^{H}\mathbf{F}=\left[\mathbf{H}_{1}^{H}\mathbf{F}_{1},\mathbf{H}_{2}^{H}\mathbf{F}_{2},\ldots,\mathbf{H}_{L}^{H}\mathbf{F}_{L}\right],$$

leading to $\mathbf{H}^{H}\mathbf{F}\mathbf{F}^{H}\mathbf{H}=\sum_{l=1}^{L}\mathbf{H}_{l}^{H}\mathbf{F}_{l}\mathbf{F}_{l}^{H}\mathbf{H}_{l}$. By the identity $\det(\mathbf{I}+\mathbf{A}\mathbf{B})=\det(\mathbf{I}+\mathbf{B}\mathbf{A})$, $\det\mathbf{Q}$ is unchanged when $\mathbf{Q}$ is replaced by the $K\times K$ matrix $\mathbf{I}_{K}+\gamma\mathbf{H}^{H}\mathbf{F}\mathbf{F}^{H}\mathbf{H}$, which, with a slight abuse of notation, we also denote by $\mathbf{Q}$ in the following. Therefore, $\mathbf{Q}$ can be expressed as $\mathbf{Q}=\mathbf{I}_{K}+\gamma\sum_{l=1}^{L}\mathbf{H}_{l}^{H}\mathbf{F}_{l}\mathbf{F}_{l}^{H}\mathbf{H}_{l}$. By letting $\mathbf{G}_{l}=\mathbf{H}_{l}^{H}\mathbf{F}_{l}\mathbf{F}_{l}^{H}\mathbf{H}_{l}$, $\mathbf{Q}$ can be further expanded as

$$\begin{aligned}
\mathbf{Q}&=\underbrace{\mathbf{I}_{K}+\gamma\mathbf{G}_{1}}_{\triangleq\mathbf{E}_{1}}+\gamma\mathbf{G}_{2}+\ldots+\gamma\mathbf{G}_{L}=\mathbf{E}_{1}(\underbrace{\mathbf{I}_{K}+\gamma\mathbf{E}_{1}^{-1}\mathbf{G}_{2}}_{\triangleq\mathbf{E}_{2}}+\ldots+\gamma\mathbf{E}_{1}^{-1}\mathbf{G}_{L})\\
&=\mathbf{E}_{1}\mathbf{E}_{2}(\underbrace{\mathbf{I}_{K}+\gamma\mathbf{E}_{2}^{-1}\mathbf{E}_{1}^{-1}\mathbf{G}_{3}}_{\triangleq\mathbf{E}_{3}}+\ldots+\gamma\mathbf{E}_{2}^{-1}\mathbf{E}_{1}^{-1}\mathbf{G}_{L})=\ldots=\mathbf{E}_{1}\mathbf{E}_{2}\ldots\mathbf{E}_{L},
\end{aligned} \tag{31}$$

where $\mathbf{E}_{l}=\mathbf{I}_{K}+\gamma(\mathbf{E}_{1}\ldots\mathbf{E}_{l-1})^{-1}\mathbf{G}_{l}$, $l=2,3,\ldots,L$. As a result, $\bar{\mathcal{R}}^{\mathrm{a}}$ can be expressed as

\begin{align}
\bar{\mathcal{R}}^{\mathrm{a}}&=\log_{2}\det\mathbf{Q}=\sum_{l=1}^{L}\log_{2}\det(\mathbf{E}_{l})=\sum_{l=1}^{L}\log_{2}\det(\mathbf{I}_{K}+\gamma(\underbrace{\mathbf{E}_{1}\ldots\mathbf{E}_{l-1}}_{\triangleq\mathbf{Q}_{l-1}})^{-1}\mathbf{G}_{l}) \tag{32}\\
&=\sum_{l=1}^{L}\log_{2}\det\left(\mathbf{I}_{K}+\gamma\mathbf{Q}_{l-1}^{-1}\mathbf{H}_{l}^{H}\mathbf{F}_{l}\mathbf{F}_{l}^{H}\mathbf{H}_{l}\right)=\sum_{l=1}^{L}\log_{2}\det\left(\mathbf{I}_{N}+\gamma\mathbf{F}_{l}^{H}\mathbf{H}_{l}\mathbf{Q}_{l-1}^{-1}\mathbf{H}_{l}^{H}\mathbf{F}_{l}\right), \tag{33}
\end{align}

as given in Theorem 1. The last equality in (33) follows from $\det(\mathbf{I}_{K}+\mathbf{A}\mathbf{B})=\det(\mathbf{I}_{N}+\mathbf{B}\mathbf{A})$ with $\mathbf{A}=\mathbf{Q}_{l-1}^{-1}\mathbf{H}_{l}^{H}\mathbf{F}_{l}\in\mathbb{C}^{K\times N}$ and $\mathbf{B}=\mathbf{F}_{l}^{H}\mathbf{H}_{l}\in\mathbb{C}^{N\times K}$. Furthermore, from the definition of $\mathbf{Q}_{l-1}$ in (32), we have $\mathbf{E}_{l-1}=\mathbf{I}_{K}+\gamma(\mathbf{E}_{1}\ldots\mathbf{E}_{l-2})^{-1}\mathbf{G}_{l-1}=\mathbf{I}_{K}+\gamma\mathbf{Q}_{l-2}^{-1}\mathbf{G}_{l-1}$. Finally, recalling that $\mathbf{G}_{l}=\mathbf{H}_{l}^{H}\mathbf{F}_{l}\mathbf{F}_{l}^{H}\mathbf{H}_{l}$, we obtain the expression of $\mathbf{Q}_{l-1}$ in (12), i.e.,

$$\mathbf{Q}_{l-1}=\mathbf{Q}_{l-2}\mathbf{E}_{l-1}=\mathbf{Q}_{l-2}(\mathbf{I}_{K}+\gamma\mathbf{Q}_{l-2}^{-1}\mathbf{G}_{l-1})=\mathbf{Q}_{l-2}+\gamma\mathbf{H}_{l-1}^{H}\mathbf{F}_{l-1}\mathbf{F}_{l-1}^{H}\mathbf{H}_{l-1}, \tag{34}$$

with $\mathbf{Q}_{0}=\mathbf{I}_{K}$, which completes the proof.
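
The decomposition in Theorem 1 can be checked numerically on random matrices. The sketch below assumes NumPy is available; the function names, the i.i.d. Gaussian channels, and the random-phase constant-modulus combiners are illustrative choices only and are not the channel model or combiner design of the paper.

```python
import numpy as np


def rate_joint(H_list, F_list, gamma):
    """log2 det(I_K + gamma * sum_l H_l^H F_l F_l^H H_l)."""
    K = H_list[0].shape[1]
    Q = np.eye(K, dtype=complex)
    for H, F in zip(H_list, F_list):
        A = H.conj().T @ F                  # K x N
        Q = Q + gamma * (A @ A.conj().T)    # add gamma * G_l
    return np.linalg.slogdet(Q)[1] / np.log(2)


def rate_per_ap_sum(H_list, F_list, gamma):
    """Sum over APs of log2 det(I_N + gamma * F_l^H H_l Q_{l-1}^{-1} H_l^H F_l)."""
    K = H_list[0].shape[1]
    Q_prev = np.eye(K, dtype=complex)       # Q_0 = I_K
    total = 0.0
    for H, F in zip(H_list, F_list):
        B = F.conj().T @ H                  # N x K
        M = np.eye(B.shape[0]) + gamma * (B @ np.linalg.solve(Q_prev, B.conj().T))
        total += np.linalg.slogdet(M)[1] / np.log(2)
        Q_prev = Q_prev + gamma * (B.conj().T @ B)   # recursion (34)
    return total


rng = np.random.default_rng(0)
L, N_r, K, N, gamma = 4, 16, 3, 2, 0.5
H_list = [(rng.standard_normal((N_r, K)) + 1j * rng.standard_normal((N_r, K))) / np.sqrt(2)
          for _ in range(L)]
F_list = [np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (N_r, N))) / np.sqrt(N_r)
          for _ in range(L)]
print(rate_joint(H_list, F_list, gamma), rate_per_ap_sum(H_list, F_list, gamma))
```

The two printed values should agree up to floating-point error for any ordering of the APs, since the factorization in (31) does not depend on which AP is processed first.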

Appendix B Proof of Remark 1

From (34), $\mathbf{Q}_{l-1}$ in (11) can be expressed as

$$\mathbf{Q}_{l-1}=\mathbf{I}_{K}+\gamma\sum_{i=1}^{l-1}\mathbf{H}_{i}^{H}\mathbf{F}_{i}\mathbf{F}_{i}^{H}\mathbf{H}_{i},\quad l=2,\ldots,L. \tag{35}$$

B-1 When $l$ is small

With the assumption of very low SNRs in CF mmWave massive MIMO, we have $\mathbf{Q}_{l-1}\approx\mathbf{I}_{K}$ for small $l$, leading to

$$\bar{\mathcal{R}}^{\mathrm{a}}_{l}\approx\tilde{\mathcal{R}}^{\mathrm{a}}_{l}=\log_{2}\det\left(\mathbf{I}_{N}+\gamma\mathbf{F}_{l}^{H}\mathbf{H}_{l}\mathbf{H}_{l}^{H}\mathbf{F}_{l}\right), \tag{36}$$

where $\bar{\mathcal{R}}^{\mathrm{a}}_{l}$ is the sub-rate associated with the $l$th AP in SC-HBF, given in (11). The unconstrained combiner that maximizes $\tilde{\mathcal{R}}^{\mathrm{a}}_{l}$ in (36) is the matrix whose columns are the $N$ left singular vectors corresponding to the $N$ largest singular values of $\mathbf{H}_{l}$. As a result, the analog combining vectors in the D-HBF scheme can be determined as in (16).
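
The optimality of the dominant left singular vectors for (36) can be illustrated numerically. This sketch assumes NumPy and interprets "unconstrained" as a combiner with orthonormal columns but without the constant-modulus restriction, which is an assumption made here; the function names and the random test channel are likewise hypothetical.

```python
import numpy as np


def local_rate(H, F, gamma):
    """log2 det(I_N + gamma * F^H H H^H F), as in (36)."""
    B = F.conj().T @ H
    M = np.eye(B.shape[0]) + gamma * (B @ B.conj().T)
    return np.linalg.slogdet(M)[1] / np.log(2)


def svd_combiner(H, n):
    """First n left singular vectors of H (singular values come in descending order)."""
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    return U[:, :n]


def random_semi_unitary(n_r, n, rng):
    """Random n_r x n matrix with orthonormal columns, obtained via QR."""
    Z = rng.standard_normal((n_r, n)) + 1j * rng.standard_normal((n_r, n))
    Q_mat, _ = np.linalg.qr(Z)
    return Q_mat


rng = np.random.default_rng(1)
N_r, K, N, gamma = 16, 4, 2, 0.5
H = (rng.standard_normal((N_r, K)) + 1j * rng.standard_normal((N_r, K))) / np.sqrt(2)

rate_svd = local_rate(H, svd_combiner(H, N), gamma)
rate_rand = local_rate(H, random_semi_unitary(N_r, N, rng), gamma)
print(rate_svd, rate_rand)
```

With the singular-vector combiner, the rate reduces to $\sum_{i=1}^{N}\log_{2}(1+\gamma\sigma_{i}^{2})$, where $\sigma_{i}$ are the $N$ largest singular values of the channel, and no other combiner with orthonormal columns can exceed it.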

B-2 As $l$ increases

Because $\mathbf{F}_{i}$ only depends on $\mathbf{H}_{i}$ for small $l$, the matrices $\{\mathbf{H}_{i}^{H}\mathbf{F}_{i}\mathbf{F}_{i}^{H}\mathbf{H}_{i}\},i=1,\ldots,l-1$, are independent of each other. As $l$ grows and becomes sufficiently large, by the law of large numbers, we have $\sum_{i=1}^{l-1}\mathbf{H}_{i}^{H}\mathbf{F}_{i}\mathbf{F}_{i}^{H}\mathbf{H}_{i}\rightarrow(l-1)\bar{\mathbf{E}}$, where $\bar{\mathbf{E}}=\mathbb{E}\left\{\mathbf{H}_{i}^{H}\mathbf{F}_{i}\mathbf{F}_{i}^{H}\mathbf{H}_{i}\right\}$ has constant diagonal elements and zeros for off-diagonal elements. Therefore, $\mathbf{Q}_{l-1}$ in (35) becomes approximately diagonal even when $l$ is large, as does $\mathbf{Q}_{l-1}^{-1}$.

Based on the ordered singular value decomposition, $\mathbf{H}_{l}$ can be factorized as $\mathbf{H}_{l}=\mathbf{U}\mathbf{\Sigma}\mathbf{V}^{H}$, where $\mathbf{\Sigma}$ is an $N_{r}\times K$ rectangular diagonal matrix with the singular values of $\mathbf{H}_{l}$ on the main diagonal in decreasing order, whereas $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices of size $N_{r}\times N_{r}$ and $K\times K$, whose columns are the left- and right-singular vectors of $\mathbf{H}_{l}$, respectively. Then, $\bar{\mathcal{R}}_{l}^{\mathrm{a}}$ in (11) can be expressed as

$$\bar{\mathcal{R}}_{l}^{\mathrm{a}}=\log_{2}\det(\mathbf{I}_{N}+\gamma\mathbf{F}_{l}^{H}\mathbf{U}\underbrace{\mathbf{\Sigma}\mathbf{V}^{H}\mathbf{Q}_{l-1}^{-1}\mathbf{V}\mathbf{\Sigma}^{H}}_{\triangleq\mathbf{\Lambda}}\mathbf{U}^{H}\mathbf{F}_{l}). \tag{37}$$

Because $\mathbf{Q}_{l-1}^{-1}$ is approximately a diagonal matrix with constant diagonal elements, as shown above, $\mathbf{\Lambda}=\mathbf{\Sigma}\mathbf{V}^{H}\mathbf{Q}_{l-1}^{-1}\mathbf{V}\mathbf{\Sigma}^{H}$ becomes approximately a diagonal matrix, and its diagonal elements are in decreasing order. Therefore, the optimal solution of $\max_{\mathbf{F}_{l}}\bar{\mathcal{R}}_{l}^{\mathrm{a}}=\log_{2}\det(\mathbf{I}_{N}+\gamma\mathbf{F}_{l}^{H}\mathbf{U}\mathbf{\Lambda}\mathbf{U}^{H}\mathbf{F}_{l})$ is approximately the matrix whose columns are the first $N$ columns of $\mathbf{U}$, which are the singular vectors corresponding to the $N$ largest singular values of $\mathbf{H}_{l}$, implying the analog combining vectors given in (16) for D-HBF. This completes the proof.

References

  • [1] N. T. Nguyen and K. Lee, “Coverage and Cell-Edge Sum-Rate Analysis of mmWave Massive MIMO Systems With ORP Schemes and MMSE Receivers,” IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5349–5363, 2018.
  • [2] M. R. Akdeniz, Y. Liu, M. K. Samimi, S. Sun, S. Rangan, T. S. Rappaport, and E. Erkip, “Millimeter wave channel modeling and cellular capacity evaluation,” IEEE J. Sel. Areas Commun., vol. 32, no. 6, pp. 1164–1179, 2014.
  • [3] H. Q. Ngo, A. Ashikhmin, H. Yang, E. G. Larsson, and T. L. Marzetta, “Cell-free massive MIMO: Uniformly great service for everyone,” in IEEE 16th Int. Workshop Signal Process. Advances in Wireless Commun. (SPAWC), 2015, pp. 201–205.
  • [4] ——, “Cell-free massive MIMO versus small cells,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1834–1850, 2017.
  • [5] G. Femenias and F. Riera-Palou, “Cell-Free Millimeter-Wave Massive MIMO Systems With Limited Fronthaul Capacity,” IEEE Access, vol. 7, pp. 44 596–44 612, 2019.
  • [6] G. Interdonato, H. Q. Ngo, P. Frenger, and E. G. Larsson, “Downlink training in cell-free massive MIMO: A blessing in disguise,” IEEE Trans. Wireless Commun., vol. 18, no. 11, pp. 5153–5169, 2019.
  • [7] A. Papazafeiropoulos, P. Kourtessis, M. Di Renzo, S. Chatzinotas, and J. M. Senior, “Performance Analysis of Cell-Free Massive MIMO Systems: A Stochastic Geometry Approach,” IEEE Trans. Veh. Tech., vol. 69, no. 4, pp. 3523–3537, 2020.
  • [8] E. Nayebi, A. Ashikhmin, T. L. Marzetta, and H. Yang, “Cell-free massive MIMO systems,” in IEEE Asilomar Conf. Signals, Systems and Computers, 2015, pp. 695–699.
  • [9] E. Björnson and L. Sanguinetti, “Making cell-free massive MIMO competitive with MMSE processing and centralized implementation,” IEEE Trans. Wireless Commun., 2019.
  • [10] L. D. Nguyen, T. Q. Duong, H. Q. Ngo, and K. Tourki, “Energy efficiency in cell-free massive MIMO with zero-forcing precoding design,” IEEE Commun. Lett., vol. 21, no. 8, pp. 1871–1874, 2017.
  • [11] H. Q. Ngo, L.-N. Tran, T. Q. Duong, M. Matthaiou, and E. G. Larsson, “On the total energy efficiency of cell-free massive MIMO,” IEEE Trans. Green Commun. Networking, vol. 2, no. 1, pp. 25–39, 2017.
  • [12] M. Bashar, K. Cumanan, A. G. Burr, H. Q. Ngo, E. G. Larsson, and P. Xiao, “Energy efficiency of the cell-free massive MIMO uplink with optimal uniform quantization,” IEEE Trans. Green Commun. Networking, vol. 3, no. 4, pp. 971–987, 2019.
  • [13] G. Interdonato, P. Frenger, and E. G. Larsson, “Scalability aspects of cell-free massive MIMO,” in IEEE Int. Conf. Commun. (ICC), 2019, pp. 1–6.
  • [14] E. Björnson and L. Sanguinetti, “Scalable cell-free massive MIMO systems,” IEEE Trans. Commun., 2020.
  • [15] S. E. Hajri, J. Denis, and M. Assaad, “Enhancing Favorable Propagation in Cell-Free Massive MIMO Through Spatial User Grouping,” in IEEE Int. Workshop Signal Process. Advances Wireless Commun. (SPAWC), 2018, pp. 1–5.
  • [16] X. Hu, C. Zhong, X. Chen, W. Xu, H. Lin, and Z. Zhang, “Cell-free massive MIMO systems with low resolution ADCs,” IEEE Trans. Commun., vol. 67, no. 10, pp. 6844–6857, 2019.
  • [17] E. Nayebi, A. Ashikhmin, T. L. Marzetta, H. Yang, and B. D. Rao, “Precoding and power optimization in cell-free massive MIMO systems,” IEEE Trans. Wireless Commun., vol. 16, no. 7, pp. 4445–4459, 2017.
  • [18] J. Zhang, Y. Wei, E. Björnson, Y. Han, and S. Jin, “Performance analysis and power control of cell-free massive MIMO systems with hardware impairments,” IEEE Access, vol. 6, pp. 55 302–55 314, 2018.
  • [19] S. Buzzi and C. D’Andrea, “Cell-free massive MIMO: User-centric approach,” IEEE Wireless Commun. Lett., vol. 6, no. 6, pp. 706–709, 2017.
  • [20] S.-H. Park, O. Simeone, Y. C. Eldar, and E. Erkip, “Optimizing Pilots and Analog Processing for Channel Estimation in Cell-Free Massive MIMO with One-Bit ADCs,” in IEEE Int. Workshop Signal Process. Advances Wireless Commun. (SPAWC), 2018, pp. 1–5.
  • [21] J. Li, D.-W. Yue, and Y. Sun, “Performance analysis of millimeter wave massive MIMO systems in centralized and distributed schemes,” IEEE Access, vol. 6, pp. 75 482–75 494, 2018.
  • [22] M. Alonzo and S. Buzzi, “Cell-free and user-centric massive MIMO at millimeter wave frequencies,” in IEEE Annual Int. Symposium Personal, Indoor, and Mobile Radio Communications (PIMRC), 2017, pp. 1–5.
  • [23] M. Alonzo, S. Buzzi, A. Zappone, and C. D’Elia, “Energy-Efficient Power Control in Cell-Free and User-Centric Massive MIMO at Millimeter Wave,” IEEE Trans. Green Commun. Networking, 2019.
  • [24] N. T. Nguyen and K. Lee, “Unequally Sub-connected Architecture for Hybrid Beamforming in Massive MIMO Systems,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 1127–1140, Feb. 2020.
  • [25] H. Liu, J. Zhang, X. Zhang, A. Kurniawan, T. Juhana, and B. Ai, “Tabu-search-based pilot assignment for cell-free massive MIMO systems,” IEEE Trans. Veh. Tech., vol. 69, no. 2, pp. 2286–2290, 2019.
  • [26] Y. Jin, J. Zhang, S. Jin, and B. Ai, “Channel estimation for cell-free mmWave massive MIMO through deep learning,” IEEE Trans. Veh. Technol., vol. 68, no. 10, pp. 10 325–10 329, 2019.
  • [27] Z. Chen and E. Björnson, “Channel hardening and favorable propagation in cell-free massive MIMO with stochastic geometry,” IEEE Trans. Commun., vol. 66, no. 11, pp. 5205–5219, 2018.
  • [28] T.-H. Tai, W.-H. Chung, and T.-S. Lee, “A low complexity antenna selection algorithm for energy efficiency in massive MIMO systems,” in IEEE Int. Conf. Data Science and Data Intensive Systems, 2015, pp. 284–289.
  • [29] Z. Liu, W. Du, and D. Sun, “Energy and spectral efficiency tradeoff for massive MIMO systems with transmit antenna selection,” IEEE Trans. Veh. Tech., vol. 66, no. 5, pp. 4453–4457, 2016.
  • [30] X. Gao, L. Dai, and A. M. Sayeed, “Low RF-complexity technologies to enable millimeter-wave MIMO with large antenna array for 5G wireless communications,” IEEE Commun. Mag., vol. 56, no. 4, pp. 211–217, 2018.
  • [31] A. Kaushik, J. Thompson, E. Vlachos, C. Tsinos, and S. Chatzinotas, “Dynamic RF Chain Selection for Energy Efficient and Low Complexity Hybrid Beamforming in Millimeter Wave MIMO Systems,” IEEE Trans. Green Commun. Networking, vol. 3, no. 4, pp. 886–900, 2019.
  • [32] A. Alkhateeb, O. El Ayach, G. Leus, and R. W. Heath, “Channel estimation and hybrid precoding for millimeter wave cellular systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 831–846, 2014.
  • [33] O. El Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499–1513, 2014.
  • [34] S. Han, I. Chih-Lin, Z. Xu, and C. Rowell, “Large-scale antenna systems with hybrid analog and digital beamforming for millimeter wave 5G,” IEEE Commun. Mag., vol. 53, no. 1, pp. 186–194, 2015.
  • [35] Y.-Y. Lee, C.-H. Wang, and Y.-H. Huang, “A hybrid RF/baseband precoding processor based on parallel-index-selection matrix-inversion-bypass simultaneous orthogonal matching pursuit for millimeter wave MIMO systems,” IEEE Trans. Signal Process., vol. 63, no. 2, pp. 305–317, 2015.
  • [36] T. S. Rappaport, G. R. MacCartney, M. K. Samimi, and S. Sun, “Wideband millimeter-wave propagation measurements and channel models for future wireless communication system design,” IEEE Trans. Commun., vol. 63, no. 9, pp. 3029–3056, 2015.
  • [37] T. S. Rappaport, E. Ben-Dor, J. N. Murdock, and Y. Qiao, “38 GHz and 60 GHz angle-dependent propagation for cellular & peer-to-peer wireless communications,” in IEEE ICC, 2012, pp. 4568–4573.
  • [38] A. Alkhateeb, G. Leus, and R. W. Heath, “Compressed sensing based multi-user millimeter wave systems: How many measurements are needed?” in IEEE Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), 2015, pp. 2909–2913.
  • [39] F. Sohrabi and W. Yu, “Hybrid digital and analog beamforming design for large-scale antenna arrays,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 3, pp. 501–513, 2016.
  • [40] X. Gao, L. Dai, S. Han, I. Chih-Lin, and R. W. Heath, “Energy-efficient hybrid analog and digital precoding for mmWave MIMO systems with large antenna arrays,” IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 998–1009, 2016.
  • [41] B. Dai and W. Yu, “Energy efficiency of downlink transmission strategies for cloud radio access networks,” IEEE J. Sel. Areas Commun., vol. 34, no. 4, pp. 1037–1050, 2016.
  • [42] E. Björnson, L. Sanguinetti, J. Hoydis, and M. Debbah, “Optimal design of energy-efficient multi-user MIMO systems: Is massive MIMO the answer?” IEEE Trans. Wireless Commun., vol. 14, no. 6, pp. 3059–3075, 2015.
  • [43] S. Payami, N. M. Balasubramanya, C. Masouros, and M. Sellathurai, “Phase shifters versus switches: An energy efficiency perspective on hybrid beamforming,” IEEE Wireless Commun. Lett., vol. 8, no. 1, pp. 13–16, 2018.
  • [44] K. Roth and J. A. Nossek, “Achievable rate and energy efficiency of hybrid and digital beamforming receivers with low resolution ADC,” IEEE J. Sel. Areas Commun., vol. 35, no. 9, pp. 2056–2068, 2017.
  • [45] K. Roth, H. Pirzadeh, A. L. Swindlehurst, and J. A. Nossek, “A comparison of hybrid beamforming and digital beamforming with low-resolution ADCs for multiple users and imperfect CSI,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 3, pp. 484–498, 2018.
  • [46] M. Bashar, H. Q. Ngo, K. Cumanan, A. G. Burr, P. Xiao, E. Björnson, and E. G. Larsson, “Uplink Spectral and Energy Efficiency of Cell-Free Massive MIMO with Optimal Uniform Quantization,” IEEE Trans. Commun., 2020.
  • [47] A. Alkhateeb, G. Leus, and R. W. Heath, “Limited feedback hybrid precoding for multi-user millimeter wave systems,” IEEE Trans. Wireless Commun., vol. 14, no. 11, pp. 6481–6494, 2015.
  • [48] T. E. Bogale, L. B. Le, A. Haghighat, and L. Vandendorpe, “On the number of RF chains and phase shifters, and scheduling design with hybrid analog–digital beamforming,” IEEE Trans. Wireless Commun., vol. 15, no. 5, pp. 3311–3326, 2016.
  • [49] Y. Han, S. Jin, J. Zhang, J. Zhang, and K.-K. Wong, “DFT-based hybrid beamforming multiuser systems: Rate analysis and beam selection,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 3, pp. 514–528, 2018.
  • [50] D. Yang, L.-L. Yang, and L. Hanzo, “DFT-based beamforming weight-vector codebook design for spatially correlated channels in the unitary precoding aided multiuser downlink,” in IEEE Int. Conf. Commun. (ICC), 2010, pp. 1–5.
  • [51] J. Zhang, E. Björnson, M. Matthaiou, D. W. K. Ng, H. Yang, and D. J. Love, “Prospective multiple antenna technologies for beyond 5G,” IEEE J. Sel. Areas Commun., vol. 38, no. 8, pp. 1637–1660, 2020.
  • [52] J. Zhang, L. Dai, X. Li, Y. Liu, and L. Hanzo, “On low-resolution ADCs in practical 5G millimeter-wave massive MIMO systems,” IEEE Commun. Mag., vol. 56, no. 7, pp. 205–211, 2018.