This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Latency Minimization for mmWave D2D Mobile Edge Computing Systems: Joint Task Allocation and Hybrid Beamforming Design

Yanzhen Liu, Yunlong Cai, An Liu, Minjian Zhao, and Lajos Hanzo Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. The work of Y. Cai was supported in part by the National Natural Science Foundation of China under Grants 61971376 and 61831004, and the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholars under Grant LR19F010002. L. Hanzo would like to acknowledge the financial support of the Engineering and Physical Sciences Research Council projects EP/P034284/1 and EP/P003990/1 (COALESCE) as well as of the European Research Council’s Advanced Fellow Grant QuantCom (Grant No. 789028). (Corresponding author: Yunlong Cai.) Y. Liu, Y. Cai, A. Liu and M. Zhao are with the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). L. Hanzo is with the Department of ECS, University of Southampton, UK (e-mail: [email protected]).
Abstract

Mobile edge computing (MEC) and millimeter wave (mmWave) communications are capable of significantly reducing the network’s delay and enhancing its capacity. In this paper we investigate a mmWave and device-to-device (D2D) assisted MEC system, in which user A carries out some computational tasks and shares the results with user B with the aid of a base station (BS). We assume partial offloading model and the task can be partitioned into two portions: the first part is computed locally at user A, while the second part is transmitted to the BS and computed by the MEC server. The computational results are then sent to user B through a D2D link and via the link from the BS to user B, respectively. To support computation offloading, both the users and the BS are equipped with multiple antennas and employ analog and digital (A/D) hybrid beamforming. Moreover, we propose a novel two-timescale joint hybrid beamforming and task allocation algorithm to reduce the system latency whilst cut down the required signaling overhead. Specifically, the high-dimensional analog beamforming matrices are updated in a frame-based manner based on the channel state information (CSI) samples, where each frame consists of a number of time slots, while the low-dimensional digital beamforming matrices and the offloading ratio are optimized more frequently relied on the low-dimensional effective channel matrices in each time slot. A stochastic successive convex approximation (SSCA) based algorithm is developed to design the long-term analog beamforming matrices. As for the short-term variables, the digital beamforming matrices are optimized relying on the innovative penalty-concave convex procedure (penalty-CCCP) for handling the mmWave non-linear transmit power constraint, and the offloading ratio can be obtained via the derived closed-form solution. Simulation results verify the effectiveness of the proposed algorithm by comparing the benchmarks.

Index Terms:
Mobile edge computing, D2D, mmWave, latency minimization.

I Introduction

Given the rapid growth of computational-intensive mobile applications such as virtual reality (VR) [1], augmented reality (AR) [2], automatic driving [3], and face recognition [4], conventional remote cloud computing centers tend to struggle in meeting the stringent latency requirements of next-generation wireless systems [5]. Mobile edge computing (MEC) - which supports servers at the base station (BS) of cellular networks - has emerged as a promising solution [6, 7]. Thanks to the proximity of the mobile devices to the server, users can directly offload the computational-intensive tasks to the edge server without passing the back-haul networks, which significantly reduces the end-to-end delay and the network burden[8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. Specifically, the works in [8, 9, 10, 11, 12] considered the binary offloading in MEC systems. The authors of [8] studied an energy efficient binary offloading problem and designed optimal scheduling policies for both the mobile execution and cloud execution. An MEC system combined with energy harvesting techniques has been investigated in [9, 10]. In [11], the authors proposed a general framework for offloading tasks from a single user to multiple access points. Moreover, the authors of [12] investigated a joint design problem of the computation offloading decision, the resource allocation, and the content caching strategy. The partial offloading schemes have been proposed to further improve the performance of MEC systems  [13, 14, 15, 16, 17]. In [13], an optimal resource allocation scheme has been proposed for a multi-user MEC system based on time-division multiple access (TDMA) and orthogonal frequency-division multiple access (OFDMA), respectively. The authors of [14] studied a multi-user TDMA partial offloading MEC system, and derived the optimal solution to the delay minimization problem. By taking user cooperation into consideration, the authors of [15] investigated an energy-efficient problem for both binary offloading and partial offloading. To improve edge cloud efficiency with limited communication and computation capacities, the collaboration between cloud computing and edge computing was studied in [16, 17]. Furthermore, the authors of [18, 19, 20] investigated the intelligent reflecting surface (IRS) assisted MEC systems to improve the network efficiency.

However, MEC needs frequent data exchange between the mobile devices and the edge server, which requires a large communication capacity of the radio access network. Taking the 360360-degree immersive VR as an example, even under the 265265 HEVC 1:6001:600 video compression rate, a bit rate of up to 1 Gbps [21] is needed to match the 2×642\times 64 million pixel human-eye accuracy, which is challenging for the current 5th generation (5G) mobile communication technology [22]. Therefore, it is necessary to further enhance the system capacity for beyond 5G MEC systems. The millimeter wave (mmWave) and device-to-device (D2D) communications are exactly two promising techniques. MmWave has tremendous spectral resources and can achieve multi-gigabit transmission capacity. Moreover, at this short wavelength it is possible to integrate a large number of antenna elements in a compact space [23], thus achieving significant beamforming gain. D2D communication supports multiplex of the cellular spectrum, which allows mobile devices in proximity to communicate directly. It features high data rate, low latency, and high throughput [24], which fits the communication needs of MEC systems well. As a result, a number of solutions applying D2D techniques to MEC systems have been proposed to enable direct data transmission and computational resource sharing [25, 26, 28, 27].

To the best of our knowledge, the mmWave and D2D assisted MEC has not been well investigated in the literature. Although the D2D-aided MEC systems have been studied in the aforementioned works [25, 26, 28, 27], they assumed sub-6GHZ band and the tremendous mmWave spectral resource has not been considered to further enhance the capacity. Moreover, despite the tremendous benefits brought by the integration of mmWave, D2D and MEC, there are more challenges compared with existing works. To elaborate, 1) The challenges of the physical layer signal processing incurred in the mmWave frequency band, such as hybrid analog and digital (A/D) beamforming [31, 29, 33, 34, 32, 30], associated non-linear power consumption model, CSI acquisition etc. 2) The design of practical protocols to avoid the occurrence of transmission collisions that may happen in simultaneous uplink/downlink and D2D transmissions. 3) The design of efficient algorithms to solve the challenging non-convex optimization problems.

Hence, to fill this research blank and tackle the above challenges, we investigate a mmWave and D2D assisted MEC system, where user A processes the computational tasks to be solved and then shares the results with user B with the aid of a BS. The investigated model is general and its typical application scenarios include vehicle to vehicle communication [35], VR/AR gaming [36], and ultra high definition video transmission [37]. We assume partial offloading model as in [13, 14, 15, 16, 17], i.e., the task of user A can be partitioned into two portions: the first part is computed locally at user A, while the second part is transmitted to the BS and computed by the MEC server. The computation results are then sent to user B through a D2D link and the link from the BS to user B, respectively. In order to support computation offloading, both the users and the BS are equipped with multiple antennas and employ A/D hybrid beamforming. However, directly solving it by using the single-timescale algorithm requires very high complexity and a large amount of CSI feedback. Thus, we propose a novel two-timescale joint hybrid beamforming and task allocation algorithm to reduce the system latency whilst cut down the required signaling overhead. Specifically, the high-dimensional analog beamforming matrices are updated in a frame-based manner based on the channel state information (CSI) samples, where each frame consists of multiple time slots, while the low-dimensional digital beamforming matrices and the offloading ratio are optimized more frequently relied on the low-dimensional effective channel matrices in each time slot. We respectively formulate a long-term weighted ergodic channel capacity maximization problem and a short-term latency minimization problem for practical design. Our main contributions are summarized as follows:

  • We study a novel scenario that combines MEC with mmWave and D2D to significantly reduce the delay. We consider the raw data and result transmission in the uplink, the downlink, and the D2D link in details and make practical protocols to avoid collisions.

  • For the long-term weighted ergodic channel capacity maximization problem, a stochastic successive convex approximation (SSCA) based algorithm is developed for designing the analog beamforming matrices, which employs surrogate functions to approximate the original problem and converges to a stationary feasible solution.

  • Regarding the design of the digital beamforming matrices and offloading ratio, we equivalently decompose the short-term latency minimization problem into several decoupled subproblems. For the subproblems w.r.t. the digital beamforming matrices, an efficient penalty-CCCP based algorithm is proposed to tackle the nonlinear mmWave transmit power constraints. We also develop a low-complexity heuristic algorithm to design the digital beamforming matrices for performance-complexity trade-off.

  • The closed-form expressions of offloading ratio are derived based on classified discussion. We compare our proposed joint design algorithm of the task allocation and hybrid beamforming with the conventional algorithms in the simulation. The results verify the effectiveness of our proposed joint design algorithm.

The paper is structured as follows. Section II describes the system model. Section III formulates the two-timescale problem under investigation. The long-term analog beamforming design problem is solved in Section IV while the solutions to the design of the short-term digital beamforming matrices and the optimal offloading ratio are given in Section V. Section VI presents the simulation results and Section VII concludes this paper.

Notations: Scalars, vectors and matrices are denoted by lower case, boldface lower case and boldface upper case letters, respectively. 𝐈\mathbf{I} represents an identity matrix and 𝟎\mathbf{0} denotes an all-zero matrix. For a matrix 𝐀\mathbf{A}, 𝐀T{{\mathbf{A}}^{T}}, conj(𝐀)\textrm{conj}(\mathbf{A}), 𝐀H{{\mathbf{A}}^{H}}, 𝐀\mathbf{A}^{\dagger} and 𝐀\|\mathbf{A}\| denote its transpose, conjugate, conjugate transpose, Moore-Penrose inverse and Frobenius norm, respectively. For a square matrix 𝐀\mathbf{A}, Tr{𝐀}\textrm{Tr}\{\mathbf{A}\} and 𝐀1\mathbf{A}^{-1} denotes its trace and inverse, respectively, while 𝐀𝟎(𝐀𝟎){\mathbf{A}}\succeq{\mathbf{0}}~{}({\mathbf{A}}\preceq{\mathbf{0}}) means that 𝐀\mathbf{A} is positive (negative) semi-definite. For a vector 𝐚\mathbf{a}, 𝐚\|\mathbf{a}\| represents its Euclidean norm. {}\Re\{\cdot\} ({}\Im\{\cdot\}) denotes the real (imaginary) part of a variable. |||\cdot| denotes the absolute value of a complex scalar. m×n(m×n){\mathbb{C}^{m\times n}}\;({\mathbb{R}^{m\times n}}) denotes the space of m×n{m\times n} complex (real) matrices. \angle denotes the angle operator.

II System Model

Refer to caption

Figure 1: mmWave D2D MEC system.

In this section, we introduce the investigated system model. As shown in Fig. 1, we consider a system consisting of user A, user B and a BS with a MEC server. User A aims to process computation tasks and share the results with user B with the aid of the BS. We assume partial offloading model as as [13, 14, 15, 16, 17], i.e., user A has a total of LL bits task to be processed, and this task can be divided into two parts: ρL\rho L bits and (1ρ)L(1-\rho)L bits, where ρ\rho denotes the offloading ratio. The first part is transmitted to the BS and computed by the MEC server, and the second part is computed at the local CPU of user A. The computational results are transmitted to user B through the D2D link and the downlink (between the BS and user B), respectively.111It is worth mentioning that unlike the works in [28] where the authors utilize the computation resource of both the MEC server and the D2D users, we do not use the computation resource of user B because transmitting the raw data is time-consuming while the computing capacity at user B has no advantages over that of the BS. Both the users and the BS are equipped with A/D hybrid beamformers and work in mmWave band (The hybrid beamforming architecture of the BS is not plotted here since it is similar with that of the users).

II-A Computation model

In this paper, we adopt a general compression model and denote the computational results as αL\alpha L bits, where 0α10\leq\alpha\leq 1 denotes the compression ratio for the computation task and can be chosen as different values based on the category of the task and the adopted algorithm [38]. Defining KLLFLK_{L}\triangleq\frac{L}{F_{L}}, KELFEK_{E}\triangleq\frac{L}{F_{E}}, K1LR1K_{1}\triangleq\frac{L}{R_{1}}, K2αLR2K_{2}\triangleq\frac{\alpha L}{R_{2}} and K3αLR3K_{3}\triangleq\frac{\alpha L}{R_{3}} for convenience of notation, where FLF_{L} and FEF_{E} stand for the local computing capacity and the edge computing capacity (computing capacity of the MEC server), respectively, and R1R_{1}, R2R_{2} and R3R_{3} represent the transmission rates of the uplink (from user A to the BS), downlink and D2D link, respectively. We express different delays as follows,

  • The local computing time: TLc=(1ρ)KLT_{L}^{c}=(1-\rho)K_{L}.

  • The computing time at the MEC server: TEc=ρKET_{E}^{c}=\rho K_{E}.

  • The offloading time from user A to the BS: Tupt=ρK1T_{up}^{t}=\rho K_{1}.

  • The delay for transmitting the computational result from the BS to user B: Tdownt=ρK2T_{down}^{t}=\rho K_{2}.

  • The delay for transmitting the computational result from user A to user B: TD2Dt=(1ρ)K3T_{D2D}^{t}=(1-\rho)K_{3}.

Let us consider the process of computation and transmission more concretely. As shown in Fig. 2, there are four cases in total.

Refer to caption

Figure 2: The timeline of different offloading schemes.
  • Case 1: TuptTLcT_{up}^{t}\geq T_{L}^{c} and TEcTD2DtT_{E}^{c}\geq T_{D2D}^{t}. In this case, the local computing at user A finishes before the edge offloading. Thus user A has to wait until the task offloading is over to send the local computing result to user B through the D2D link. Moreover, in this case the transmission of the local computing result ends before the edge computing. Hence, the BS can send the edge computing results to user B directly without waiting until the transmission of D2D link is over.

  • Case 2: TuptTLcT_{up}^{t}\geq T_{L}^{c} and TEc<TD2DtT_{E}^{c}<T_{D2D}^{t}. In this case, the local computing at user A also finishes before the edge offloading. Hence, similar with Case 1, user A has to wait until the task offloading is over. However, we consider that the edge computing finishes before the transmission of the local computing result. Under this situation, the BS has to wait until the D2D link transmission is over to send the edge computing results to user B. Otherwise, collisions would happen at user B.

  • Case 3: Tupt<TLcT_{up}^{t}<T_{L}^{c} and Tupt+TEc<TLc+TD2DtT_{up}^{t}+T_{E}^{c}<T_{L}^{c}+T_{D2D}^{t}. In this case, the edge offloading ends before the local computing. Hence user A can send the local computing results to user B through the D2D link directly since the communication resource is available at the moment. Moreover, in this case the edge computing finishes before the D2D transmission of the local computing result from user A to user B. Thus, the BS has to wait until the D2D link transmission is over to send the edge computing result to user B.222It is also possible that the edge computing finishes before the local computing. However, if the BS transmits the edge computing results to user B immediately, collisions may happen because user A does not know when the BS finishes its transmission and may send the local computing results to user B simultaneously. Thus, we assume that user A has a priority to transmit results to user B compared to the BS, even if the computation at the BS ends earlier.

  • Case 4: Tupt<TLcT_{up}^{t}<T_{L}^{c} and Tupt+TEcTLc+TD2DtT_{up}^{t}+T_{E}^{c}\geq T_{L}^{c}+T_{D2D}^{t}. In this case, the edge offloading ends before the local computing, and the D2D link transmission finishes before the edge computing. As a result, no wait happens.

According to the four cases discussed above, we obtain the expression for the overall system delay as follows,

Ttotal={Tupt+max{TEc,TD2Dt}+Tdownt,TuptTLc,max{Tupt+TEc,TD2Dt+TLc}+Tdownt,Tupt<TLc.T_{total}=\begin{cases}T_{up}^{t}+max\{T_{E}^{c},T_{D2D}^{t}\}+T_{down}^{t},\\ \qquad\qquad\qquad\qquad\qquad T_{up}^{t}\geq T_{L}^{c},\\ max\{T_{up}^{t}+T_{E}^{c},T_{D2D}^{t}+T_{L}^{c}\}+T_{down}^{t},\\ \qquad\qquad\qquad\qquad\qquad T_{up}^{t}<T_{L}^{c}.\end{cases} (1)

II-B Communication model

Consider the three mmWave links that adopt hybrid A/D beamforming structures. User A and user B are equipped with NaN_{a}, NbN_{b} antennas, respectively, and NrfaN_{rfa} (NrfaNa)(N_{rfa}\leq N_{a}), NrfbN_{rfb} (NrfbNb)(N_{rfb}\leq N_{b}) RF chains, respectively, while the BS has NN antennas and NrfN_{rf} (NrfN)(N_{rf}\leq N) RF chains. Let 𝐬1d1×1\mathbf{s}_{1}\in\mathbb{C}^{d_{1}\times 1}, 𝐬2d2×1\mathbf{s}_{2}\in\mathbb{C}^{d_{2}\times 1} and 𝐬3d3×1\mathbf{s}_{3}\in\mathbb{C}^{d_{3}\times 1} 𝒞𝒩(𝟎,𝐈)\sim\mathcal{CN}(\mathbf{0},\mathbf{I}) denote the data symbols that transmitted from user A to the BS, the BS to user B and user A to user B, respectively. The received signal at the uplink, the downlink and the D2D link can be written as333 Here we adopt a single analog beamforming matrix 𝐅a\mathbf{F}_{a} at user A and 𝐅b\mathbf{F}_{b} at user B for different transmission phases, because this scheme can avoid frequent hand-off of the analog beamforming matrices with acceptable performance loss. Moreover, although we introduce the proposed algorithm under this case, it can be readily extended to the situation that there are independent analog beamforming matrices for different transmission stages.

𝐲1\displaystyle\mathbf{y}_{1} =𝐕1H𝐔1H𝐇1𝐅a𝐖a1𝐬1+𝐕1H𝐔1H𝐧1,\displaystyle=\mathbf{V}_{1}^{H}\mathbf{U}_{1}^{H}\mathbf{H}_{1}\mathbf{F}_{a}\mathbf{W}_{a1}\mathbf{s}_{1}+\mathbf{V}_{1}^{H}\mathbf{U}_{1}^{H}\mathbf{n}_{1},
𝐲2\displaystyle\mathbf{y}_{2} =𝐖b2H𝐅bH𝐇2𝐕2𝐔2𝐬2+𝐖b2H𝐅bH𝐧2,\displaystyle=\mathbf{W}_{b2}^{H}\mathbf{F}_{b}^{H}\mathbf{H}_{2}\mathbf{V}_{2}\mathbf{U}_{2}\mathbf{s}_{2}+\mathbf{W}_{b2}^{H}\mathbf{F}_{b}^{H}\mathbf{n}_{2},
𝐲3\displaystyle\mathbf{y}_{3} =𝐖b3H𝐅bH𝐇3𝐅a𝐖a3𝐬3+𝐖b3H𝐅bH𝐧3,\displaystyle=\mathbf{W}_{b3}^{H}\mathbf{F}_{b}^{H}\mathbf{H}_{3}\mathbf{F}_{a}\mathbf{W}_{a3}\mathbf{s}_{3}+\mathbf{W}_{b3}^{H}\mathbf{F}_{b}^{H}\mathbf{n}_{3},

respectively, where 𝐖a1Nrfa×d1\mathbf{W}_{a1}\in\mathbb{C}^{N_{rfa}\times d_{1}} and 𝐖a3Nrfa×d3\mathbf{W}_{a3}\in\mathbb{C}^{N_{rfa}\times d_{3}} represent the transmitting digital beamforming matrices of user A for the uplink and the D2D link, respectively. 𝐅aNa×Nrfa\mathbf{F}_{a}\in\mathbb{C}^{N_{a}\times N_{rfa}} represents the long-term transmitting analog beamforming matrices of user A. 𝐖b2Nrfb×d2\mathbf{W}_{b2}\in\mathbb{C}^{N_{rfb}\times d_{2}} and 𝐖b3Nrfb×d3\mathbf{W}_{b3}\in\mathbb{C}^{N_{rfb}\times d_{3}} represent the receiving digital beamforming matrices of user B for the downlink and the D2D link, respectively. 𝐅bNb×Nrfb\mathbf{F}_{b}\in\mathbb{C}^{N_{b}\times N_{rfb}} represents the long-term receiving analog beamforming matrices of user B. 𝐕1Nrf×d1\mathbf{V}_{1}\in\mathbb{C}^{N_{rf}\times d_{1}} and 𝐕2Nrf×d2\mathbf{V}_{2}\in\mathbb{C}^{N_{rf}\times d_{2}} represent the receiving and transmitting digital beamforming vectors at the BS, respectively, and 𝐔1N×Nrf\mathbf{U}_{1}\in\mathbb{C}^{N\times N_{rf}} and 𝐔2N×Nrf\mathbf{U}_{2}\in\mathbb{C}^{N\times N_{rf}} represent the long-term receiving and transmitting analog beamforming matrices at the BS, respectively. 𝐇1N×Na\mathbf{H}_{1}\in\mathbb{C}^{N\times N_{a}}, 𝐇2Nb×N\mathbf{H}_{2}\in\mathbb{C}^{N_{b}\times N}, and 𝐇3Nb×Na\mathbf{H}_{3}\in\mathbb{C}^{N_{b}\times N_{a}} denote the channel matrices of the uplink, downlink and D2D link, respectively, 𝔼{𝐧1𝐧1H}=σ12𝐈\mathbb{E}\{\mathbf{n}_{1}\mathbf{n}_{1}^{H}\}=\sigma_{1}^{2}\mathbf{I}, 𝔼{𝐧2𝐧2H}=σ22𝐈\mathbb{E}\{\mathbf{n}_{2}\mathbf{n}_{2}^{H}\}=\sigma_{2}^{2}\mathbf{I}, and 𝔼{𝐧3𝐧3H}=σ32𝐈\mathbb{E}\{\mathbf{n}_{3}\mathbf{n}_{3}^{H}\}=\sigma_{3}^{2}\mathbf{I} denote the zero mean additive white Gaussian noise of the uplink, downlink and D2D link, respectively.

With the above definitions, we write the transmission rate for the uplink R1R_{1}, downlink R2R_{2}, and D2D link R3R_{3}, respectively as (2)-(4)444We do not include the receiving digital beamformers in the rate expressions because it is well-known that the optimal digital receivers (i.e., minimum mean square error (MMSE) receivers) can achieve the maximum system rate, see [48] for more details.,

R1\displaystyle\small R_{1}\!\!\!\! =\displaystyle=\!\!\!\! B1logdet[𝐈+1σ12𝐔1H𝐇1𝐅a𝐖a1𝐖a1H𝐅aH𝐇1H𝐔1(𝐔1H𝐔1)1],\displaystyle B_{1}\log\det[\mathbf{I}+\frac{1}{\sigma_{1}^{2}}\mathbf{U}_{1}^{H}\mathbf{H}_{1}\mathbf{F}_{a}\mathbf{W}_{a1}\mathbf{W}_{a1}^{H}\mathbf{F}_{a}^{H}\mathbf{H}_{1}^{H}\mathbf{U}_{1}(\mathbf{U}_{1}^{H}\mathbf{U}_{1})^{-1}], (2)
R2\displaystyle R_{2}\!\!\!\! =\displaystyle=\!\!\!\! B2logdet[𝐈+1σ22𝐅bH𝐇2𝐔2𝐕2𝐕2H𝐔2H𝐇2H𝐅b(𝐅bH𝐅b)1],\displaystyle B_{2}\log\det[\mathbf{I}+\frac{1}{\sigma_{2}^{2}}\mathbf{F}_{b}^{H}\mathbf{H}_{2}\mathbf{U}_{2}\mathbf{V}_{2}\mathbf{V}_{2}^{H}\mathbf{U}_{2}^{H}\mathbf{H}_{2}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}], (3)
R3\displaystyle R_{3}\!\!\!\! =\displaystyle=\!\!\!\! B3logdet[𝐈+1σ32𝐅bH𝐇3𝐅a𝐖a3𝐖a3H𝐅aH𝐇3H𝐅b(𝐅bH𝐅b)1],\displaystyle B_{3}\log\det[\mathbf{I}+\frac{1}{\sigma_{3}^{2}}\mathbf{F}_{b}^{H}\mathbf{H}_{3}\mathbf{F}_{a}\mathbf{W}_{a3}\mathbf{W}_{a3}^{H}\mathbf{F}_{a}^{H}\mathbf{H}_{3}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}], (4)

where B1B_{1}, B2B_{2} and B3B_{3} represent the bandwidth of the uplink, downlink and D2D link, respectively.

In practice, the relationship between the circuit power and the output power may be non-linear due to the working mode of RF power amplifiers (PA) in mmWave band [39]. Hence, it is necessary to take the non-linear energy efficiency of PAs into consideration. Specifically, we consider the Doherty PA in this paper, which is one of the most widely used PA architecture in high frequency band that has enhanced energy efficiency and linearity [40]. The relationship between the output power PoutP_{out} and the actual PA power consumption PPAP_{PA} is given by [41]

PPA={2PoutPmax/π,0<Pout0.25Pmax,6PoutPmax/π2Pmax/π,0.25Pmax<Pout<Pmax,P_{PA}=\begin{cases}2\sqrt{P_{out}P_{max}}/\pi,\\ \qquad\qquad 0<P_{out}\leq 0.25P_{max},\\ 6\sqrt{P_{out}P_{max}}/\pi-2P_{max}/\pi,\\ \qquad\qquad 0.25P_{max}<P_{out}<P_{max},\end{cases} (5)

where PmaxP_{max} is the maximum output power of the PA. In order to compute the total power consumption of all PAs, we must calculate the output power of each PA first, which is given by

Pout1(i)=𝔼(|𝐅a(i,:)𝐖a1𝐬1|2)=𝐅a(i,:)𝐖a12,iP_{out1}(i)=\mathbb{E}(|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\mathbf{s}_{1}|^{2})=\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|^{2},\forall i (6)
Pout2(i)=𝔼(|𝐔2(i,:)𝐕2𝐬2|2)=𝐔2(i,:)𝐕22,iP_{out2}(i)=\mathbb{E}(|\mathbf{U}_{2}(i,:)\mathbf{V}_{2}\mathbf{s}_{2}|^{2})=\|\mathbf{U}_{2}(i,:)\mathbf{V}_{2}\|^{2},\forall i (7)
Pout3(i)=𝔼(|𝐅a(i,:)𝐖a3𝐬3|2)=𝐅a(i,:)𝐖a32,iP_{out3}(i)=\mathbb{E}(|\mathbf{F}_{a}(i,:)\mathbf{W}_{a3}\mathbf{s}_{3}|^{2})=\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a3}\|^{2},\forall i (8)

where Pout1(i)P_{out1}(i) and Pout3(i)P_{out3}(i) represent the output power of the iith PA of user A in the uplink and D2D link, respectively, and Pout2(i)P_{out2}(i) represents the output power of the iith PA of the BS in the downlink. By substituting (6)-(8) into (5), the power consumption of each PA, i.e. PPA1(i)P_{PA1}(i), PPA2(i)P_{PA2}(i) and PPA3(i)P_{PA3}(i) can be obtained.

III Problem Formulation

Based on the system model introduced above, we provide the two-timescale latency minimization problem in this section. We first introduce the proposed two-timescale scheme. Then, we formulate the long-term optimization problem and the short-term optimization problem, respectively. For the former, we seek to maximize the weighted ergodic channel capacity. As for the latter, we minimize the overall system latency. The details are given as follows.

III-A Two-timescale scheme

In a typical mobile radio environment, the channel matrices appearing in the system model of Section II will exhibit a random behavior and change more or less rapidly over time. The joint design of A/D hybrid beamforming matrices and offloading ratio for each channel instance is not realistic for implementation since as it requires repeated application of a search-based algorithm with extremely high computational complexity [42]. Moreover, this approach entails a huge amount of overhead in the estimation and exchange of real-time CSI information, and is likely to be very sensitive to CSI delays. Therefore, to circumvent these difficulties, we hereby propose a practical two-timescale hybrid beamforming scheme that takes into account changes in both the instantaneous CSI and their local statistics. Let us define the following concepts of timescales:

  • Long-timescale: The time interval over which the channel statistics555In this work, channel statistics refer to the moments or distribution of the channel fading realizations. The proposed long-term beamforming design only has to obtain a single (potentially outdated) channel sample at each frame. By observing one channel sample at each time, our proposed algorithm can automatically learn the channel statistics and converge to a stationary point of the considered stochastic optimization problem. are assumed constant.

  • Short-timescale: The time interval over which the channel gains are assumed constant, i.e. the channel coherence time.

Refer to caption

Figure 3: The two-timescale model.

As illustrated in Fig. 3, the time domain is divided into a number of super frames within which the channel statistics are invariant. Each super frame consist of TfT_{f} frames, and each frame is further divided into TsT_{s} time slots. In our proposed approach, to reduce CSI overhead, we only make use of one complete estimated CSI at the end of each frame, while we employ a so-called effective CSI (the multiplication of the analog beamforming matrices and the CSI matrices, take the uplink as an example, 𝐇ef1𝐔1H𝐇1𝐅aNrf×Nrfa\mathbf{H}_{ef1}\triangleq\mathbf{U}_{1}^{H}\mathbf{H}_{1}\mathbf{F}_{a}\in\mathbb{C}^{N_{rf}\times N_{rfa}} while 𝐇1N×Na\mathbf{H}_{1}\in\mathbb{C}^{N\times N_{a}}) with reduce dimension within each time slot. The short-timescale digital beamforming matrices and the offloading ratio are optimized in each time slot by using the effective real-time channel matrices with reduced dimension, and the long-timescale analog beamforming matrices are updated at the end of each frame based on estimated (possibly outdated) CSI. In the following, we formulate the long-term optimization problem and the short-term optimization problem, respectively.

Remark 1.

We assume the tasks can be finished within a channel coherence time. If the user has too many tasks and cannot finish in a single time slot, then he can allocate his tasks to multiple time slots and ensure as much tasks being finished in a time slot as possible. Hence we focus on the latency minimization in a single time slot in this paper.

III-B Problem formulation

Note that the long-timescale analog beamforming matrices should be optimized based on the CSI statistics over a long-term scale, and we cannot directly optimize them by minimizing the system latency that relies on the optimal digital beamforming matrices and offloading ratio for all channel realizations. To overcome this difficulty, we propose to optimize the analog beamforming matrices by maximizing the weighted ergodic sum capacity, that does not depend on the short-term variables. Then, we minimize the system latency by optimizing the digital beamforming matrices and offloading ratio in each time slot.666Note that it makes sense that the maximization of the channel capacity by designing long-term analog beamforming matrices can help minimize the system latency and this formulation is more suitable for practical design. Moreover, we validate the effectiveness of the proposed algorithm in our simulation.

1) The long-term master problem for designing analog beamforming matrices yields

𝒫1:max𝜽𝐔1,𝜽𝐔2,𝜽𝐅a,𝜽𝐅bf(𝜽𝐔1,𝜽𝐔2,𝜽𝐅a,𝜽𝐅b)𝔼{g(𝜽𝐔1,𝜽𝐔2,𝜽𝐅a,𝜽𝐅b)}w1C¯1+w2C¯2+w3C¯3,\begin{split}\mathcal{P}1:\max_{\bm{\theta}_{\mathbf{U}_{1}},\bm{\theta}_{\mathbf{U}_{2}},\bm{\theta}_{\mathbf{F}_{a}},\bm{\theta}_{\mathbf{F}_{b}}}&f(\bm{\theta}_{\mathbf{U}_{1}},\bm{\theta}_{\mathbf{U}_{2}},\bm{\theta}_{\mathbf{F}_{a}},\bm{\theta}_{\mathbf{F}_{b}})\\ &\triangleq\mathbb{E}\{g(\bm{\theta}_{\mathbf{U}_{1}},\bm{\theta}_{\mathbf{U}_{2}},\bm{\theta}_{\mathbf{F}_{a}},\bm{\theta}_{\mathbf{F}_{b}})\}\\ &\triangleq w_{1}\bar{C}_{1}+w_{2}\bar{C}_{2}+w_{3}\bar{C}_{3},\end{split} (9)

where we define 𝜽𝐔1=𝐔1\bm{\theta}_{\mathbf{U}_{1}}=\angle\mathbf{U}_{1}, 𝜽𝐔2=𝐔2\bm{\theta}_{\mathbf{U}_{2}}=\angle\mathbf{U}_{2}, 𝜽𝐅a=𝐅a\bm{\theta}_{\mathbf{F}_{a}}=\angle\mathbf{F}_{a} and 𝜽𝐅b=𝐅b\bm{\theta}_{\mathbf{F}_{b}}=\angle\mathbf{F}_{b} for convenience to meet the unit modulus constraint, and the weight w1,w2w_{1},w_{2} and w3w_{3} can be empirically chosen based on the transmission tasks of the corresponding link. Specifically, we choose the weight as wk=L¯ki=13L¯i,k=1,2,3w_{k}=\frac{\bar{L}_{k}}{\sum_{i=1}^{3}\bar{L}_{i}},k=1,2,3, where L¯1\bar{L}_{1}, L¯2\bar{L}_{2} and L¯3\bar{L}_{3} denote the total number of transmission bits of the uplink, the downlink and the D2D link in the last super frame. Please note that this formulation is quite similar with that of maximizing the queue-length-weighted sum rate, which is widely adopted in the area of wireless resource scheduling [43, 44, 45], and it is also reasonable to use the number of transmission data in the last super frame because this information is available at the BS and the statistics between two adjacent super frames are much alike. By defining

C1\displaystyle C_{1}\triangleq logdet[𝐈+1σ12𝐔1H𝐇1𝐅a𝐅aH𝐇1H𝐔1(𝐔1H𝐔1)1],\displaystyle\!\!\log\det[\mathbf{I}+\frac{1}{\sigma_{1}^{2}}\mathbf{U}_{1}^{H}\mathbf{H}_{1}\mathbf{F}_{a}\mathbf{F}_{a}^{H}\mathbf{H}_{1}^{H}\mathbf{U}_{1}(\mathbf{U}_{1}^{H}\mathbf{U}_{1})^{-1}], (10)
C2\displaystyle C_{2}\triangleq logdet[𝐈+1σ22𝐅bH𝐇2𝐔2𝐔2H𝐇2H𝐅b(𝐅bH𝐅b)1],\displaystyle\!\!\!\!\log\det[\mathbf{I}+\frac{1}{\sigma_{2}^{2}}\mathbf{F}_{b}^{H}\mathbf{H}_{2}\mathbf{U}_{2}\mathbf{U}_{2}^{H}\mathbf{H}_{2}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}], (11)
C3\displaystyle C_{3}\triangleq logdet[𝐈+1σ32𝐅bH𝐇3𝐅a𝐅aH𝐇3H𝐅b(𝐅bH𝐅b)1],\displaystyle\!\!\!\!\!\!\log\det[\mathbf{I}+\frac{1}{\sigma_{3}^{2}}\mathbf{F}_{b}^{H}\mathbf{H}_{3}\mathbf{F}_{a}\mathbf{F}_{a}^{H}\mathbf{H}_{3}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}], (12)

then, the expressions of the ergodic channel capacity for the link between user A and the BS, the link between the BS and user B, and the link between user A and user B, are given by C¯1𝔼{C1}\bar{C}_{1}\triangleq\mathbb{E}\{C_{1}\}, C¯2𝔼{C2}\bar{C}_{2}\triangleq\mathbb{E}\{C_{2}\}, and C¯3𝔼{C3}\bar{C}_{3}\triangleq\mathbb{E}\{C_{3}\}, respectively [46], and g(𝜽𝐔1,𝜽𝐔2,𝜽𝐅a,𝜽𝐅b)w1C1+w2C2+w3C3g(\bm{\theta}_{\mathbf{U}_{1}},\bm{\theta}_{\mathbf{U}_{2}},\bm{\theta}_{\mathbf{F}_{a}},\bm{\theta}_{\mathbf{F}_{b}})\triangleq w_{1}C_{1}+w_{2}C_{2}+w_{3}C_{3}.

2) The short-term optimization problem for designing the digital beamforming matrices and offloading ratio can be expressed as

𝒫2:min𝒮\displaystyle\mathcal{P}2:\min_{\mathcal{S}}\,\, Ttotal\displaystyle T_{total} (13a)
s.t. 0ρ1,\displaystyle 0\leq\rho\leq 1, (13b)
i=1NaPPA1(i)PUA,i=1NaPPA3(i)PUA,\displaystyle\sum_{i=1}^{N_{a}}P_{PA1}(i)\leq P_{UA},\sum_{i=1}^{N_{a}}P_{PA3}(i)\leq P_{UA}, (13c)
i=1NPPA2(i)PBS,\displaystyle\sum_{i=1}^{N}P_{PA2}(i)\leq P_{BS}, (13d)
0Poutk(i)Pmax,k=1,2,3,i,\displaystyle 0\leq P_{outk}(i)\leq P_{max},k=1,2,3,\forall i, (13e)

where 𝒮{ρ,𝐖a1,𝐖a3,𝐕2}\mathcal{S}\triangleq\{\rho,\mathbf{W}_{a1},\mathbf{W}_{a3},\mathbf{V}_{2}\} denotes the set of the short-term optimization variables. (13c) and (13d) denote the transmit power constraints at user A and the BS, respectively. (13e) denotes the output power constraints of the PAs.

IV Long-term analog beamforming design

In this section, we introduce the proposed long-term analog beamforming design algorithm for solving 𝒫1\mathcal{P}1. The original problem cannot be solved straightforwardly due to the stochastic and non-convex objective function. However, based on the theoretical framework exposed in [47], we seek to approximate the original objective function (9) by using a quadratic surrogate function. Specifically, at the end of each channel frame tt, the channel samples 𝐇1t\mathbf{H}_{1}^{t}, 𝐇2t\mathbf{H}_{2}^{t} and 𝐇3t\mathbf{H}_{3}^{t} are obtained and the surrogate objective function is updated based on these channel samples and the approximated gradients as follows,

f¯t(𝜽𝐔1,𝜽𝐔2,𝜽𝐅a,𝜽𝐅b)=f~t+(𝐟𝐔1t)T(𝜽𝐔1𝜽𝐔1t)+(𝐟𝐔2t)T(𝜽𝐔2𝜽𝐔2t)+(𝐟𝐅at)T(𝜽𝐅a𝜽𝐅at)+(𝐟𝐅bt)T(𝜽𝐅b𝜽𝐅bt)+ϖ𝜽𝐔1𝜽𝐔1t2+ϖ𝜽𝐔2𝜽𝐔2t2+ϖ𝜽𝐅a𝜽𝐅at2+ϖ𝜽𝐅b𝜽𝐅bt2,\begin{split}\bar{f}^{t}(\bm{\theta}_{\mathbf{U}_{1}},&\bm{\theta}_{\mathbf{U}_{2}},\bm{\theta}_{\mathbf{F}_{a}},\bm{\theta}_{\mathbf{F}_{b}})=\tilde{f}^{t}+(\mathbf{f}_{\mathbf{U}_{1}}^{t})^{T}(\bm{\theta}_{\mathbf{U}_{1}}-\bm{\theta}_{\mathbf{U}_{1}}^{t})\\ &+(\mathbf{f}_{\mathbf{U}_{2}}^{t})^{T}(\bm{\theta}_{\mathbf{U}_{2}}-\bm{\theta}_{\mathbf{U}_{2}}^{t})+(\mathbf{f}_{\mathbf{F}_{a}}^{t})^{T}(\bm{\theta}_{\mathbf{F}_{a}}-\bm{\theta}_{\mathbf{F}_{a}}^{t})\\ &+(\mathbf{f}_{\mathbf{F}_{b}}^{t})^{T}(\bm{\theta}_{\mathbf{F}_{b}}-\bm{\theta}_{\mathbf{F}_{b}}^{t})+\varpi\|\bm{\theta}_{\mathbf{U}_{1}}-\bm{\theta}_{\mathbf{U}_{1}}^{t}\|^{2}\\ &+\varpi\|\bm{\theta}_{\mathbf{U}_{2}}-\bm{\theta}_{\mathbf{U}_{2}}^{t}\|^{2}+\varpi\|\bm{\theta}_{\mathbf{F}_{a}}-\bm{\theta}_{\mathbf{F}_{a}}^{t}\|^{2}\\ &+\varpi\|\bm{\theta}_{\mathbf{F}_{b}}-\bm{\theta}_{\mathbf{F}_{b}}^{t}\|^{2},\end{split} (14)

where ϖ\varpi is a constant, and f~t\tilde{f}^{t}, 𝐟𝐔1t\mathbf{f}_{\mathbf{U}_{1}}^{t}, 𝐟𝐔2t\mathbf{f}_{\mathbf{U}_{2}}^{t}, 𝐟𝐅at\mathbf{f}_{\mathbf{F}_{a}}^{t} and 𝐟𝐅b\mathbf{f}_{\mathbf{F}_{b}} denote the approximation of the objective function ff, the partial derivatives f𝜽𝐔1\frac{\partial f}{\partial\bm{\theta}_{\mathbf{U}_{1}}}, f𝜽𝐔2\frac{\partial f}{\partial\bm{\theta}_{\mathbf{U}_{2}}}, f𝜽𝐅a\frac{\partial f}{\partial\bm{\theta}_{\mathbf{F}_{a}}} and f𝜽𝐅b\frac{\partial f}{\partial\bm{\theta}_{\mathbf{F}_{b}}}, respectively. The quantities can be updated based on the following expressions

f~t=(1εt)f~t1εtg(𝜽𝐔1t,𝜽𝐔2t,𝜽𝐅at,𝜽𝐅bt),\tilde{f}^{t}=(1-\varepsilon^{t})\tilde{f}^{t-1}-\varepsilon^{t}g(\bm{\theta}_{\mathbf{U}_{1}}^{t},\bm{\theta}_{\mathbf{U}_{2}}^{t},\bm{\theta}_{\mathbf{F}_{a}}^{t},\bm{\theta}_{\mathbf{F}_{b}}^{t}), (15)
𝐟𝐔1t=(1εt)𝐟𝐔1t1εtg𝜽𝐔1(𝜽𝐔1t,𝜽𝐔2t,𝜽𝐅at,𝜽𝐅bt),\mathbf{f}_{\mathbf{U}_{1}}^{t}=(1-\varepsilon^{t})\mathbf{f}_{\mathbf{U}_{1}}^{t-1}-\varepsilon^{t}\frac{\partial g}{\partial\bm{\theta}_{\mathbf{U}_{1}}}(\bm{\theta}_{\mathbf{U}_{1}}^{t},\bm{\theta}_{\mathbf{U}_{2}}^{t},\bm{\theta}_{\mathbf{F}_{a}}^{t},\bm{\theta}_{\mathbf{F}_{b}}^{t}), (16)
𝐟𝐔2t=(1εt)𝐟𝐔2t1εtg𝜽𝐔2(𝜽𝐔1t,𝜽𝐔2t,𝜽𝐅at,𝜽𝐅bt),\mathbf{f}_{\mathbf{U}_{2}}^{t}=(1-\varepsilon^{t})\mathbf{f}_{\mathbf{U}_{2}}^{t-1}-\varepsilon^{t}\frac{\partial g}{\partial\bm{\theta}_{\mathbf{U}_{2}}}(\bm{\theta}_{\mathbf{U}_{1}}^{t},\bm{\theta}_{\mathbf{U}_{2}}^{t},\bm{\theta}_{\mathbf{F}_{a}}^{t},\bm{\theta}_{\mathbf{F}_{b}}^{t}), (17)
𝐟𝐅at=(1εt)𝐟𝐅at1εtg𝜽𝐅a(𝜽𝐔1t,𝜽𝐔2t,𝜽𝐅at,𝜽𝐅bt),\mathbf{f}_{\mathbf{F}_{a}}^{t}=(1-\varepsilon^{t})\mathbf{f}_{\mathbf{F}_{a}}^{t-1}-\varepsilon^{t}\frac{\partial g}{\partial\bm{\theta}_{\mathbf{F}_{a}}}(\bm{\theta}_{\mathbf{U}_{1}}^{t},\bm{\theta}_{\mathbf{U}_{2}}^{t},\bm{\theta}_{\mathbf{F}_{a}}^{t},\bm{\theta}_{\mathbf{F}_{b}}^{t}), (18)
𝐟𝐅bt=(1εt)𝐟𝐅bt1εtg𝜽𝐅b(𝜽𝐔1t,𝜽𝐔2t,𝜽𝐅at,𝜽𝐅bt),\mathbf{f}_{\mathbf{F}_{b}}^{t}=(1-\varepsilon^{t})\mathbf{f}_{\mathbf{F}_{b}}^{t-1}-\varepsilon^{t}\frac{\partial g}{\partial\bm{\theta}_{\mathbf{F}_{b}}}(\bm{\theta}_{\mathbf{U}_{1}}^{t},\bm{\theta}_{\mathbf{U}_{2}}^{t},\bm{\theta}_{\mathbf{F}_{a}}^{t},\bm{\theta}_{\mathbf{F}_{b}}^{t}), (19)

with initial value f~1=0\tilde{f}^{-1}=0, 𝐟𝐔11=𝟎\mathbf{f}_{\mathbf{U}_{1}}^{-1}=\mathbf{0}, 𝐟𝐔21=𝟎\mathbf{f}_{\mathbf{U}_{2}}^{-1}=\mathbf{0}, 𝐟𝐅a1=𝟎\mathbf{f}_{\mathbf{F}_{a}}^{-1}=\mathbf{0} and 𝐟𝐅b1=𝟎\mathbf{f}_{\mathbf{F}_{b}}^{-1}=\mathbf{0}. The expressions of the partial derivatives are given in Appendix A, and {εt}\{\varepsilon^{t}\} is a sequence of the parameters to be properly chosen. Subsequently, we aim to solve the approximated problem at time frame tt, which is given by

min𝜽𝐔1,𝜽𝐔2,𝜽𝐅a,𝜽𝐅bf¯t(𝜽𝐔1,𝜽𝐔2,𝜽𝐅a,𝜽𝐅b).\min_{\bm{\theta}_{\mathbf{U}_{1}},\bm{\theta}_{\mathbf{U}_{2}},\bm{\theta}_{\mathbf{F}_{a}},\bm{\theta}_{\mathbf{F}_{b}}}\bar{f}^{t}(\bm{\theta}_{\mathbf{U}_{1}},\bm{\theta}_{\mathbf{U}_{2}},\bm{\theta}_{\mathbf{F}_{a}},\bm{\theta}_{\mathbf{F}_{b}}). (20)

It is readily seen that this is a convex problem and the solution is given by

𝜽¯𝐔1=𝜽𝐔1t𝐟𝐔1t2ϖ,𝜽¯𝐔2=𝜽𝐔2t𝐟𝐔2t2ϖ,𝜽¯𝐅a=𝜽𝐅at𝐟𝐅at2ϖ,𝜽¯𝐅b=𝜽𝐅bt𝐟𝐅bt2ϖ.\begin{split}&\bar{\bm{\theta}}_{\mathbf{U}_{1}}=\bm{\theta}_{\mathbf{U}_{1}}^{t}-\frac{\mathbf{f}_{\mathbf{U}_{1}}^{t}}{2\varpi},\bar{\bm{\theta}}_{\mathbf{U}_{2}}=\bm{\theta}_{\mathbf{U}_{2}}^{t}-\frac{\mathbf{f}_{\mathbf{U}_{2}}^{t}}{2\varpi},\\ &\bar{\bm{\theta}}_{\mathbf{F}_{a}}=\bm{\theta}_{\mathbf{F}_{a}}^{t}-\frac{\mathbf{f}_{\mathbf{F}_{a}}^{t}}{2\varpi},\bar{\bm{\theta}}_{\mathbf{F}_{b}}=\bm{\theta}_{\mathbf{F}_{b}}^{t}-\frac{\mathbf{f}_{\mathbf{F}_{b}}^{t}}{2\varpi}.\end{split} (21)

Then, the long-term variables are updated as

𝜽𝐔1t+1=(1γt)𝜽𝐔1t+γt𝜽¯𝐔1,𝜽𝐔2t+1=(1γt)𝜽𝐔2t+γt𝜽¯𝐔2,𝜽𝐅at+1=(1γt)𝜽𝐅at+γt𝜽¯𝐅a,𝜽𝐅bt+1=(1γt)𝜽𝐅bt+γt𝜽¯𝐅b,\begin{split}&\bm{\theta}_{\mathbf{U}_{1}}^{t+1}=(1-\gamma^{t})\bm{\theta}_{\mathbf{U}_{1}}^{t}+\gamma^{t}\bar{\bm{\theta}}_{\mathbf{U}_{1}},\bm{\theta}_{\mathbf{U}_{2}}^{t+1}=(1-\gamma^{t})\bm{\theta}_{\mathbf{U}_{2}}^{t}+\gamma^{t}\bar{\bm{\theta}}_{\mathbf{U}_{2}},\\ &\bm{\theta}_{\mathbf{F}_{a}}^{t+1}=(1-\gamma^{t})\bm{\theta}_{\mathbf{F}_{a}}^{t}+\gamma^{t}\bar{\bm{\theta}}_{\mathbf{F}_{a}},\bm{\theta}_{\mathbf{F}_{b}}^{t+1}=(1-\gamma^{t})\bm{\theta}_{\mathbf{F}_{b}}^{t}+\gamma^{t}\bar{\bm{\theta}}_{\mathbf{F}_{b}},\end{split} (22)

where similarly {γt}\{\gamma_{t}\} denotes a sequence of parameters. Based on [47], the convergence can be guaranteed if we choose εt\varepsilon^{t} and γt\gamma^{t} by following the conditions

limtεt=0,tεt=,t(εt)2<,limtγt=0,tγt=,t(γt)2<,limtγtεt=0.\begin{split}&\lim_{t\longrightarrow\infty}\varepsilon^{t}=0,\sum_{t}\varepsilon^{t}=\infty,\sum_{t}(\varepsilon^{t})^{2}<\infty,\\ &\lim_{t\longrightarrow\infty}\gamma^{t}=0,\sum_{t}\gamma^{t}=\infty,\sum_{t}(\gamma^{t})^{2}<\infty,\lim_{t\longrightarrow\infty}\frac{\gamma^{t}}{\varepsilon^{t}}=0.\end{split} (23)

Then the proposed SSCA-based algorithm can be guaranteed to converge to a stationary solution of 𝒫1\mathcal{P}1. We summarize the proposed long-term analog beamforming design algorithm in Algorithm 1, and its complexity is dominated by the procedure of updating the surrogate functions, which is given by 𝒪{Nrf3+NNrf(Na+Nb)}\mathcal{O}\{N_{rf}^{3}+NN_{rf}(N_{a}+N_{b})\}.

Algorithm 1 Proposed SSCA-based algorithm for the long-term analog beamforming design
1:  Initialize the optimization variables 𝜽𝐔10,𝜽𝐔20,𝜽𝐅a0,𝜽𝐅b0\bm{\theta}_{\mathbf{U}_{1}}^{0},\bm{\theta}_{\mathbf{U}_{2}}^{0},\bm{\theta}_{\mathbf{F}_{a}}^{0},\bm{\theta}_{\mathbf{F}_{b}}^{0} with a feasible point. Set an appropriate value for ϖ\varpi and let t=0t=0.
2:  repeat
3:     Obtain the CSI samples 𝐇1t\mathbf{H}_{1}^{t}, 𝐇2t\mathbf{H}_{2}^{t} and 𝐇3t\mathbf{H}_{3}^{t}. Compute the surrogate function (14) based on (15)-(19) and εt\varepsilon^{t}.
4:     Obtain the optimal solution via (21).
5:     Update 𝜽𝐔1t,𝜽𝐔2t,𝜽𝐅at,𝜽𝐅bt\bm{\theta}_{\mathbf{U}_{1}}^{t},\bm{\theta}_{\mathbf{U}_{2}}^{t},\bm{\theta}_{\mathbf{F}_{a}}^{t},\bm{\theta}_{\mathbf{F}_{b}}^{t} based on (22) and γt\gamma^{t}.
6:     Update the iteration number t=t+1t=t+1.
7:  until the convergence condition is satisfied or the maximum number of iterations is reached.

V short-term digital beamforming and offloading ratio design

In this section, we introduce the proposed algorithm for solving 𝒫\mathcal{P}2. We first decompose 𝒫\mathcal{P}2 into several subproblems that are easier to solve. Then, for the subproblems regarding the digital beamforming design, we propose a penalty-CCCP based algorithm to handle the non-linear power consumption constraints. As for the subproblems regarding the offloading ratio design, we derive closed-form solution via classification and discussion. The details are as follows.

V-A Problem decomposition

Due to the fact that the transmissions occur over orthogonal time in a single time slot, the transmission rates of the uplink, downlink and D2D link, i.e. R1R_{1}, R2R_{2}, R3R_{3} are independent of each other. Furthermore, since the overall system delay (13a) is nonincreasing with R1R_{1}, R2R_{2} and R3R_{3}, we can maximize the transmission rates first, and then optimize the offloading ratio. Hence the short-term latency minimization problem 𝒫\mathcal{P}2 can be equivalently decomposed into the following two parts.

  • The digital beamforming design subproblems for the uplink, the downlink and the D2D link, respectively, are provided as:

    𝒫3-1:max{𝐖a1}\displaystyle\mathcal{P}3\text{-}1:\max\limits_{\left\{\mathbf{W}_{a1}\right\}} R1\displaystyle\quad R_{1} (24a)
    s.t. i=1NaPPA1(i)PUA,\displaystyle\sum_{i=1}^{N_{a}}P_{PA1}(i)\leq P_{UA}, (24b)
    0𝐅a(i,:)𝐖a12Pmax,i,\displaystyle 0\leq\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|^{2}\leq P_{max},\forall i, (24c)
    𝒫3-2:max{𝐕2}\displaystyle\mathcal{P}3\text{-}2:\max\limits_{\left\{\mathbf{V}_{2}\right\}} R2\displaystyle\quad R_{2} (25a)
    s.t. i=1NPPA2(i)PBS,\displaystyle\sum_{i=1}^{N}P_{PA2}(i)\leq P_{BS}, (25b)
    0𝐔2(i,:)𝐕22Pmax,i,\displaystyle 0\leq\|\mathbf{U}_{2}(i,:)\mathbf{V}_{2}\|^{2}\leq P_{max},\forall i, (25c)
    𝒫3-3:max{𝐖a3}\displaystyle\mathcal{P}3\text{-}3:\max\limits_{\left\{\mathbf{W}_{a3}\right\}} R3\displaystyle\quad R_{3} (26a)
    s.t. i=1NaPPA3(i)PUA,\displaystyle\sum_{i=1}^{N_{a}}P_{PA3}(i)\leq P_{UA}, (26b)
    0𝐅a(i,:)𝐖a32Pmax,i,\displaystyle 0\leq\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a3}\|^{2}\leq P_{max},\forall i, (26c)
  • The offloading ratio optimization subproblem is given by:

    𝒫4:min0ρ1Ttotal.\mathcal{P}4:\min_{0\leq\rho\leq 1}\,\,T_{total}.\vspace{-0.5em} (27)

Although we have decomposed the original problem into several more tractable one, there are still some challenges, i.e. the non-linear power consumption constraint in 𝒫3\mathcal{P}3 and the multi-case piece-wise objective function in 𝒫4\mathcal{P}4. In the following two subsections, we introduce our proposed algorithms for tackling these issues.

V-B Short-term digital beamforming design

In this subsection, we introduce the proposed short-term digital beamformer design algorithm for solving 𝒫3-1\mathcal{P}3\text{-}1-𝒫3-3\mathcal{P}3\text{-}3. Since these subproblems are essentially the same, we focus on the uplink to introduce our proposed algorithm. First, we equivalently transform 𝒫3-1\mathcal{P}3\text{-}1 into a more tractable form based on the celebrated weighted minimum mean square error (WMMSE) method [48] as follows,

min𝐕1,𝐖𝐚𝟏,𝐙\displaystyle\min_{\mathbf{V}_{1},\mathbf{W_{a1}},\mathbf{Z}}\quad Tr(𝐙𝐄)logdet(𝐙)\displaystyle\text{Tr}(\mathbf{Z}\mathbf{E})-\log\det(\mathbf{Z}) (28a)
s.t. (24b),(24c),\displaystyle\eqref{uplink_PA_constraint},\eqref{uplink_PA_maxpower}, (28b)

where 𝐙\mathbf{Z} is an auxiliary variable satisfying 𝐙𝟎\mathbf{Z}\succeq\mathbf{0} and

𝐄(𝐕1H𝐇ef1𝐖a1𝐈)(𝐕1H𝐇ef1𝐖a1𝐈)H+σ12𝐕1H𝐔1H𝐔1𝐕1.\begin{split}\mathbf{E}\triangleq&(\mathbf{V}_{1}^{H}\mathbf{H}_{ef1}\mathbf{W}_{a1}-\mathbf{I})(\mathbf{V}_{1}^{H}\mathbf{H}_{ef1}\mathbf{W}_{a1}-\mathbf{I})^{H}\\ &+\sigma_{1}^{2}\mathbf{V}_{1}^{H}\mathbf{U}_{1}^{H}\mathbf{U}_{1}\mathbf{V}_{1}.\end{split} (29)

Then, to tackle the non-linear power constraint, we introduce two auxiliary variables PPA1P_{PA1} and Vout1V_{out1}, and equivalently convert (28) as

min𝒮¯\displaystyle\min_{\bar{\mathcal{S}}}\quad Tr(𝐙𝐄)logdet(𝐙)\displaystyle\text{Tr}(\mathbf{Z}\mathbf{E})-\log\det(\mathbf{Z}) (30a)
s.t. i=1NaPPA1(i)=P¯UA,\displaystyle\sum_{i=1}^{N_{a}}P_{PA1}(i)=\bar{P}_{UA}, (30b)
0<PPA1(i)4Pmax/π,i,\displaystyle 0<P_{PA1}(i)\leq 4P_{max}/\pi,\forall i, (30c)
0<Vout1(i)Pmax,i,\displaystyle 0<V_{out1}(i)\leq P_{max},\forall i, (30d)
PPA1(i)=h(Vout1(i)),i,\displaystyle P_{PA1}(i)=h(V_{out1}(i)),\forall i, (30e)
Vout1(i)=𝐅a(i,:)𝐖a1,i,\displaystyle V_{out1}(i)=\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|,\forall i, (30f)

where 𝒮¯{𝐕1,𝐖𝐚𝟏,𝐙,PPA1,Vout1}\bar{\mathcal{S}}\triangleq\{\mathbf{V}_{1},\mathbf{W_{a1}},\mathbf{Z},P_{PA1},V_{out1}\} is the set of optimization variables and P¯UAmin(PUA\bar{P}_{UA}\triangleq\min(P_{UA},4NaPmax/π)4N_{a}P_{max}/\pi), while h(Vout1(i))h(V_{out1}(i)) is defined as

h(Vout1(i))={2Vout1(i)Pmax/π,0<Vout1(i)0.25Pmax,6Vout1(i)Pmax/π2Pmax/π,0.25Pmax<Vout1(i)<Pmax.h(V_{out1}(i))=\begin{cases}2V_{out1}(i)\sqrt{P_{max}}/\pi,\\ \qquad\qquad 0<V_{out1}(i)\leq 0.25P_{max},\\ 6V_{out1}(i)\sqrt{P_{max}}/\pi-2P_{max}/\pi,\\ \qquad\qquad 0.25P_{max}<V_{out1}(i)<P_{max}.\end{cases} (31)

Then, we solve the transformed problem based on the penalty-CCCP framework, the detailed introduction of which can be found in [49]. By penalizing the equality constraints (30b), (30e) and (30f) into the objective function (30a), we obtain the penalized problem shown as follows.

𝒫5:min𝒮\displaystyle\mathcal{P}5:\min_{\mathcal{S}} Tr(𝐙𝐄)logdet(𝐙)\displaystyle\quad\text{Tr}(\mathbf{Z}\mathbf{E})-\log\det(\mathbf{Z})
+12ϱ(i=1Na(PPA1(i)h(Vout1(i)))2\displaystyle+\frac{1}{2\varrho}(\sum_{i=1}^{N_{a}}(P_{PA1}(i)-h(V_{out1}(i)))^{2}
+i=1Na(𝐅a(i,:)𝐖a1Vout1(i))2\displaystyle+\sum_{i=1}^{N_{a}}(\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|-V_{out1}(i))^{2}
+(i=1NaPPA1(i)P¯UA)2)\displaystyle+(\sum_{i=1}^{N_{a}}P_{PA1}(i)-\bar{P}_{UA})^{2}) (32a)
s.t. (30c),(30d),\displaystyle\quad\eqref{con_Ppai},\eqref{con_Vouti},

where ϱ\varrho denotes a penalty coefficient. Referring to the penalty-CCCP, the proposed algorithm contains two loops, where the penalty coefficient is adjusted in the outer loop, while in the inner loop the optimization variables are updated in a block coordinate descent fashion. In each block of the inner loop, we aim to decompose the resulting penalized problem into a number of subproblems, which can be solved easily in parallel. To this end, we divide the optimization variables into three blocks, i.e. {𝐕1,PPA1}\{\mathbf{V}_{1},P_{PA1}\}, {𝐙,Vout1}\{\mathbf{Z},V_{out1}\}, {𝐖a1}\{\mathbf{W}_{a1}\}. The detailed solutions within each block are given in Appendix B and we summarize the proposed penalty-CCCP based algorithm for uplink short-term digital beamforming design in Algorithm 2. The complexity is given by 𝒪{I1I2(Nrf3+Nrfa3+NNrfNa)}\mathcal{O}\{I_{1}I_{2}(N_{rf}^{3}+N_{rfa}^{3}+NN_{rf}N_{a})\}, where I1I_{1} and I2I_{2} denote the iteration numbers for the outer and inner loops, respectively.

Although the computational complexity in a single iteration is low, the required iteration number may be large for the double loop nature of the penalty-CCCP. Hence, we also propose a low-complexity heuristic algorithm for the short-term digital beamformer design. Ignoring the non-linear power constraint, the uplink digital beamforming design problem yields

max𝐖a1\displaystyle\max_{\mathbf{W}_{a1}}\quad logdet(𝐈+1σ12𝐇ef1𝐖a1𝐖a1H𝐇ef1H(𝐔1H𝐔1)1)\displaystyle\log\det(\mathbf{I}+\frac{1}{\sigma_{1}^{2}}\mathbf{H}_{ef1}\mathbf{W}_{a1}\mathbf{W}_{a1}^{H}\mathbf{H}_{ef1}^{H}(\mathbf{U}_{1}^{H}\mathbf{U}_{1})^{-1}) (33a)
s.t. Tr{𝐐Fa𝐖a1𝐖a1H}PUA,\displaystyle\text{Tr}\{\mathbf{Q}_{Fa}\mathbf{W}_{a1}\mathbf{W}_{a1}^{H}\}\leq P_{UA}, (33b)

where 𝐐Fa𝐅aH𝐅a\mathbf{Q}_{Fa}\triangleq\mathbf{F}_{a}^{H}\mathbf{F}_{a}. Since the long-term analog beamformer 𝐔1\mathbf{U}_{1} is fixed, we further view 𝐇¯ef1𝚺¯1𝐕¯𝐇ef1\bar{\mathbf{H}}_{ef1}\triangleq\bar{\mathbf{\Sigma}}^{-1}\bar{\mathbf{V}}\mathbf{H}_{ef1} as the effective channel matrix, where 𝚺¯\bar{\mathbf{\Sigma}} and 𝐕¯\bar{\mathbf{V}} are the diagonal singular value matrix and the right singular vector matrix of 𝐔1\mathbf{U}_{1}, respectively. This problem has a well-known water-filling solution which is given by

𝐖a1𝐐Fa1/2𝐔e𝚺e,\mathbf{W}_{a1}\triangleq\mathbf{Q}_{Fa}^{-1/2}\mathbf{U}_{e}\mathbf{\Sigma}_{e}, (34)

where 𝐔e\mathbf{U}_{e} is the set of the right singular vectors corresponding to the NaN_{a} largest singular values of 𝐇¯ef1𝐐Fa1/2\bar{\mathbf{H}}_{ef1}\mathbf{Q}_{Fa}^{-1/2}, and 𝚺e\mathbf{\Sigma}_{e} is the diagonal power allocation matrix. Since this low-complexity algorithm does not consider the nonlinear transmit power constraint directly, we scale the digital beamforming matrix 𝐖a1\mathbf{W}_{a1} to satisfy constraint (24b) and (24c), where the scaling factor can be conveniently found by using the bisection search. The computational complexity of the proposed low-complexity short-term digital beamforming design algorithm is given by 𝒪{NNaNrf+NrfNrfa2}\mathcal{O}\{NN_{a}N_{rf}+N_{rf}N_{rfa}^{2}\}.

Algorithm 2 Proposed penalty-CCCP based algorithm for uplink short-term digital beamformer design
1:  Initialize the optimization variables with a feasible point. Define the tolerance of accuracy δ1\delta_{1} and δ2\delta_{2}. Set iteration number i=0i=0 and j=0j=0. Set ϱ0>0\varrho^{0}>0 and c>1c>1.
2:  repeat
3:     repeat
4:        update 𝐕1\mathbf{V}_{1} and PPA1P_{PA1} according to (50) and (52), respectively.
5:        update 𝐙\mathbf{Z} and Vout1V_{out1} according to (54) and (59), respectively.
6:        update 𝐖a1\mathbf{W}_{a1} according to (63).
7:        update the iteration number i=i+1i=i+1.
8:     until the difference between two successive objective values is less than δ1\delta_{1}.
9:     ϱj+1=cϱj\varrho^{j+1}=c\varrho^{j}.
10:     update the iteration number j=j+1j=j+1.
11:  until the difference of successive objective function value is less than δ1\delta_{1} and the penalty term is less than δ2\delta_{2} or the maximum number of iterations is reached.

V-C Optimization of offloading ratio ρ\rho

In this subsection, we aim to optimize the offloading ratio by solving 𝒫\mathcal{P}4. Referring to the analysis of different cases for timelines shown in Fig. 2, it is readily seen that TtotalT_{total} is a linear piece-wise function of ρ\rho, and we can derive the optimal expression for ρ\rho through classification and analysis. Based on the four cases shown in Fig. 2, let us rewrite the expression of TtotalT_{total} with ρ\rho being the variable as (35). In order to derive the optimal ρ\rho, we need to analyze all possible situations. It is apparent that when ρ\rho grows from 0 to 1, Case 1 and Case 3 will happen for sure, while the occurrence of Case 2 depends on the condition of KLK1+KL<K3K3+KE\frac{K_{L}}{K_{1}+K_{L}}<\frac{K_{3}}{K_{3}+K_{E}}, which can be simplified to KLK1<K3KE\frac{K_{L}}{K_{1}}<\frac{K_{3}}{K_{E}}. Similar to Case 2, the occurrence of Case 4 depends on KLK1+KLK3+KLK3+KL+K1+KE\frac{K_{L}}{K_{1}+K_{L}}\geq\frac{K_{3}+K_{L}}{K_{3}+K_{L}+K_{1}+K_{E}}, which can be simplified to KLK1K3KE\frac{K_{L}}{K_{1}}\geq\frac{K_{3}}{K_{E}}. As we can see, the criteria of Case 2 and Case 4 contradict each other and thus only one of them can appear. Hence, there are two possible situations in general.

Ttotal=\displaystyle T_{total}= Case 1:K1ρ+KEρ+K2ρ,\displaystyle\!\!\textbf{Case 1}:\>K_{1}\rho+K_{E}\rho+K_{2}\rho, ifρ[KLK1+KL,1][K3K3+KE,1],\text{if}\,\,\rho\in[\frac{K_{L}}{K_{1}+K_{L}},1]\cap[\frac{K_{3}}{K_{3}+K_{E}},1], (35a)
Ttotal=\displaystyle T_{total}= Case 2:K1ρ+K3(1ρ)+K2ρ,\displaystyle\!\!\text{{Case 2}}:\>K_{1}\rho+K_{3}(1-\rho)+K_{2}\rho, ifρ[KLK1+KL,1][0,K3K3+KE),\text{if}\,\,\rho\in[\frac{K_{L}}{K_{1}+K_{L}},1]\cap[0,\frac{K_{3}}{K_{3}+K_{E}}), (35b)
Ttotal=\displaystyle T_{total}= Case 3:(KL+K3)(1ρ)+K2ρ,\displaystyle\!\!\text{{Case 3}}:\>(K_{L}+K_{3})(1-\rho)+K_{2}\rho, ifρ[0,KLK1+KL)[0,K3+KLK3+KL+K1+KE),\text{if}\,\,\rho\in[0,\frac{K_{L}}{K_{1}+K_{L}})\cap[0,\frac{K_{3}+K_{L}}{K_{3}+K_{L}+K_{1}+K_{E}}),\qquad\qquad\quad (35c)
Ttotal=\displaystyle T_{total}= Case 4:K1ρ+KEρ+K2ρ,\displaystyle\!\!\text{{Case 4}}:\>K_{1}\rho+K_{E}\rho+K_{2}\rho, ifρ[0,KLK1+KL)[K3+KLK3+KL+K1+KE,1].\text{if}\,\,\rho\in[0,\frac{K_{L}}{K_{1}+K_{L}})\cap[\frac{K_{3}+K_{L}}{K_{3}+K_{L}+K_{1}+K_{E}},1]. (35d)

i) Situation A: KLK1K3KE\frac{K_{L}}{K_{1}}\geq\frac{K_{3}}{K_{E}}

In this situation, Case 2 does not happen. Thus, the expression of TtotalT_{total} consists of (35c), (35d) and (35a) in order as ρ\rho increases from 0 to 1. However, note that the line function (35a) is the same as (35d). Therefore, the expression of TtotalT_{total} in situation A essentially consists of two line segments. It is apparent that (35a) or (35d) is nondecreasing, while the monotonicity of (35c) depends on whether K2KLK3K_{2}-K_{L}-K_{3} is positive or negative:

  • 1)

    K2KLK30K_{2}-K_{L}-K_{3}\geq 0: In this case, (35c) is a nondecreasing function of ρ\rho, thus the whole function is nondecreasing and we have ρ=0\rho^{*}=0.

  • 2)

    K2KLK3<0K_{2}-K_{L}-K_{3}<0: In this case, the expression of TtotalT_{total} consists of a decreasing line followed by an increasing one, thus the optimal ρ\rho should be at the turning point, i.e. ρ=K3+KLK3+KL+K1+KE\rho^{*}=\frac{K_{3}+K_{L}}{K_{3}+K_{L}+K_{1}+K_{E}}.

ii) Situation B: KLK1<K3KE\frac{K_{L}}{K_{1}}<\frac{K_{3}}{K_{E}}

In this situation, Case 4 does not happen. So the expression of TtotalT_{total} consists of (35c), (35b) and (35a) in order. Since (35a) is a nondecreasing line segment, we can conclude that ρ[0,K3K3+KE]\rho^{*}\in[0,\frac{K_{3}}{K_{3}+K_{E}}]. Then, let us focus on the monotonicity of (35c) and (35b). There are four kinds of situations in total:

  • 1)

    K2KLK3<0K_{2}-K_{L}-K_{3}<0 and K2+K1K3<0K_{2}+K_{1}-K_{3}<0: In this case, TtotalT_{total} is monotonically decreasing when ρ[0,K3K3+KE]\rho\in[0,\frac{K_{3}}{K_{3}+K_{E}}]. Thus we obtain ρ=K3K3+KE\rho^{*}=\frac{K_{3}}{K_{3}+K_{E}}.

  • 2)

    K2KLK3<0K_{2}-K_{L}-K_{3}<0 and K2+K1K30K_{2}+K_{1}-K_{3}\geq 0: In this case, TtotalT_{total} is decreasing first when ρ[0,KLK1+KL]\rho\in[0,\frac{K_{L}}{K_{1}+K_{L}}] and then increasing when ρ[KLK1+KL,K3K3+KE]\rho\in[\frac{K_{L}}{K_{1}+K_{L}},\frac{K_{3}}{K_{3}+K_{E}}]. Thus we obtain ρ=KLK1+KL\rho^{*}=\frac{K_{L}}{K_{1}+K_{L}}.

  • 3)

    K2KLK30K_{2}-K_{L}-K_{3}\geq 0 and K2+K1K3<0K_{2}+K_{1}-K_{3}<0: This case is impossible because from K2KLK30K_{2}-K_{L}-K_{3}\geq 0 we obtain K2K3K_{2}\geq K_{3} while from K2+K1K3<0K_{2}+K_{1}-K_{3}<0 we obtain K2<K3K_{2}<K_{3}.

  • 4)

    K2KLK30K_{2}-K_{L}-K_{3}\geq 0 and K2+K1K30K_{2}+K_{1}-K_{3}\geq 0: In this case, TtotalT_{total} is nondecreasing in the whole feasible region of ρ\rho. Thus we obtain ρ=0\rho^{*}=0.

By following the above discussion, we obtain the final result of the optimal ρ\rho. The above analysis is summarized in a flowchart given as Fig. 4.

The complexity of the proposed short-term variables design algorithm for solving 𝒫2\mathcal{P}2 is dominated by the penalty-CCCP algorithm, whose complexity is given above. Regarding the convergence, according to the detailed convergence analysis of the penalty-CCCP algorithm [49], the proposed short-term digital beamforming algorithms converge to the stationary solutions of 𝒫3-1\mathcal{P}3\text{-}1, 𝒫3-2\mathcal{P}3\text{-}2 and 𝒫3-3\mathcal{P}3\text{-}3. Moreover, considering the optimality of the derived offloading ratio and the fact that TtotalT_{total} is non-increasing with the transmission rate R1R_{1}, R2R_{2} and R3R_{3}. The proposed joint short-term digital beamforming and offloading ratio design algorithm converge to a stationary point of 𝒫2\mathcal{P}2.

Refer to caption

Figure 4: Offloading ratio optimization.
Remark 2.

We assume that the design algorithm is implemented at the BS. Specifically, in each time slot, the effective uplink CSI matrix 𝐇ef1\mathbf{H}_{ef1} is estimated at the BS. The effective downlink CSI matrix 𝐇ef2\mathbf{H}_{ef2} and D2D CSI matrix 𝐇ef3\mathbf{H}_{ef3} are estimated at user B and fed back to the BS. Moreover, user A sends the necessary information such as the local computing capacity FLF_{L} and the task compression ratio α\alpha to the BS and the BS conducts the short-term digital beamforming and task allocation algorithm. In each frame, the full channel samples 𝐇1\mathbf{H}_{1}, 𝐇2\mathbf{H}_{2} and 𝐇3\mathbf{H}_{3} are collected by the BS using the similar channel estimation strategy and Algorithm 1 is then performed.

VI Simulation results

In this section, we present simulation results to verify the effectiveness of our proposed algorithm. The simulation parameters are provided as follows unless otherwise stated. The number of antennas at the BS server is set as N=64N=64, while the number of antennas at the users is Na=Nb=8N_{a}=N_{b}=8. The number of RF chains at the BS is Nrf=4N_{rf}=4, with user RF chain numbers set as Nrfa=Nrfb=2N_{rfa}=N_{rfb}=2. We set the number of data streams as d1=min(Nrfa,Nrf)d_{1}=\min(N_{rfa},N_{rf}), d2=min(Nrf,Nrfb)d_{2}=\min(N_{rf},N_{rfb}) and d3=min(Nrfa,Nrfb)d_{3}=\min(N_{rfa},N_{rfb}), respectively. For the mmWave channel model, we employ the generally used extended Salch-Valenzuela geometric model [50]. Specifically, the channel matrix is given by

𝐇=N1N2Lpl=1Lpαlar(φlr)at(φlt)H×exp(j2πfdτcos(φlr)),\mathbf{H}=\sqrt{\frac{N_{1}N_{2}}{L_{p}}}\sum_{l=1}^{L_{p}}\alpha_{l}\textbf{\text{a}}_{r}(\varphi_{l}^{r})\textbf{\text{a}}_{t}(\varphi_{l}^{t})^{H}\times\exp(j2\pi f_{d}\tau\cos(\varphi_{l}^{r})), (36)

where N1N_{1} and N2N_{2} are the number of transmit and receive antennas, respectively. LpL_{p} is the number of distinguishable paths, αl𝒞𝒩(0,σpl2)\alpha_{l}\sim\mathcal{CN}(0,\sigma_{pl}^{2}) is the complex gain of the l-l\text{-}th path, ar(φlr)\textbf{\text{a}}_{r}(\varphi_{l}^{r}) and at(φlt)\textbf{\text{a}}_{t}(\varphi_{l}^{t}) are the receive and transmit antenna array response vectors, where φlr\varphi_{l}^{r} and φlt\varphi_{l}^{t} are the azimuth angles of arrival and departure, respectively. fdf_{d} is the maximum Doppler shift, and τ\tau is the delay. The expression of the response vector is given by

a(θ)=1N[1,ejk0daπsin(θ),,ejk0da(N1)πsin(θ)]T,\textbf{\text{a}}(\theta)=\frac{1}{N}\left[1,e^{jk_{0}d_{a}\pi\sin(\theta)},...,e^{jk_{0}d_{a}(N-1)\pi\sin(\theta)}\right]^{T}, (37)

where k0=2π/λ0k_{0}=2\pi/\lambda_{0}, λ0\lambda_{0} is the wavelength and dad_{a} is the antenna spacing set as da=λ0/2d_{a}=\lambda_{0}/2. We assume that there are 1 line-of-sight (LOS) path and 15 non-line-of-sight (NLOS) path, and the gain for the LOS path is σp12=1\sigma_{p1}^{2}=1, while the gain for the NLOS path is σpl2=0.1,l1\sigma_{pl}^{2}=0.1,\forall l\neq 1. We set the Doppler shift as fd=70Hzf_{d}=70\text{Hz} and the transmission delay as τ=4ms\tau=4\text{ms} according to [42].

The BS is located at [0,0,10m][0,0,10\text{m}] and the positions of user A and user B are set to [Dx,Dy,1m][D_{x},D_{y},1\text{m}] and [Dx,Dy,1m][-D_{x},D_{y},1\text{m}], respectively, with Dx=5mD_{x}=5\text{m} and Dy=50mD_{y}=50\text{m}. The path loss is modeled as Pls=C0(dlinkD0)βP_{ls}=C_{0}(\frac{d_{link}}{D_{0}})^{-\beta}, where C0C_{0} is the path loss at the reference distance D0=1mD_{0}=1\text{m} and is set to C0=30C_{0}=-30dB, dlinkd_{link} is the link distance, and β\beta is the path loss exponent where we set it for the uplink, the downlink and the D2D link as β1=3\beta_{1}=3, β2=3\beta_{2}=3 and β3=2.4\beta_{3}=2.4, respectively. The power of additive white Gaussian noise is assumed to be σ12=σ22=σ32=90dBm\sigma_{1}^{2}=\sigma_{2}^{2}=\sigma_{3}^{2}=-90\text{dBm} and the power budgets of the BS and user A are PBS=40dBmP_{BS}=40\text{dBm} and PUA=20dBmP_{UA}=20\text{dBm}, respectively. The maximum power output of the PA is set to 30dBm30\text{dBm} and we assume that the bandwidth of the three links are B1=B2=B3=100B_{1}=B_{2}=B_{3}=100MHz [51]. The number of computation tasks is L=106L=10^{6} bits and the compression ratio is chosen as α=0.01\alpha=0.01, which is a typical value for tasks like such as MPEG4 video (2D data and point cloud data) compression [52]. The computation capacities of the local CPU and the edge server are FL=200F_{L}=200Mbps and FE=1600F_{E}=1600Mbps, respectively. The number of frames in a channel statics coherence time and the number of time slots in a frame is set to Tf=100T_{f}=100 and Ts=100T_{s}=100, respectively. As for the algorithm parameters, the long-term analog beamforming design algorithm is updated based on εt=0.6t\varepsilon^{t}=0.6^{t} and γt=0.9t\gamma^{t}=0.9^{t}. For the short-term digital beamforming design, the tolerance of accuracy is set as δ1=103\delta_{1}=10^{-3}, and δ2=108\delta_{2}=10^{-8}. The initial value of the penalty coefficient is ϱ0=0.1\varrho^{0}=0.1 and the control parameter is c=0.8c=0.8.

We first study the convergence behavior of the proposed algorithm. Fig. 5 presents the convergence performance of the proposed long-term analog beamforming design Algorithm 1. The left y axis shows the value of the weighted ergodic channel capacity, which converges rapidly within 100 iterations. We also provide the value of the system latency, which is shown along the right y axis. As we can see, the system latency converges almost synchronously with the weighted ergodic capacity and decreases significantly when the iteration number increases, which validates the effectiveness of convergence for the proposed long-term algorithm. Fig. 6 presents the convergence performance of the proposed short-term digital beamforming design Algorithm 2. As shown in Fig. 6LABEL:sub@fig:3a, the objective function converges within about 40 iterations. Fig. 6LABEL:sub@fig:3b shows the penalty terms versus the number of iterations, which finally decreases to a level less than 101210^{-12}, indicating that the constraint is satisfied at the convergence point, thereby verifying the effectiveness of the proposed penalty-CCCP algorithm for handling the non-linear power constraint.

Refer to caption

Figure 5: Convergence performance of our proposed long-term analog beamforming design algorithm.
Refer to caption
(a)
Refer to caption
(b)
Figure 6: Convergence performance of our proposed short-term digital beamforming design algorithm in the uplink: (a) Objective function value versus the number of iterations; (b) Penalty terms versus the number of iterations.

In the following, we analyze the performance of our proposed two-timescale joint hybrid beamforming and offloading ratio design algorithm. We provide the following benchmarks for comparison,

  • Two-timescale heuristic beamforming: This scheme adopts Algorithm 1 for the long-term analog beamforming design and the derived optimal solution for the offloading ratio design, and the proposed heuristic low-complexity algorithm is employed for the short-term digital beamforming design.

  • Two-timescale binary offloading: This scheme adopts Algorithm 1 for the long-term analog beamforming design and Algorithm 2 for the short-term digital beamforming design. Moreover, the binary offloading strategy, i.e. selecting the one that has the lowest delay between the local computing scheme (ρ=0\rho=0) and the edge computing scheme (ρ=1\rho=1), is employed.

  • Single-timescale OMP: This scheme adopts the OMP algorithm [29] for the hybrid beamforming design and the optimal solution for the offloading ratio design in each time slot.

  • Single-timescale CM: Similar to the single-timescale OMP, however, it employs the CM algorithm [30] for the A/D hybrid beamforming design.

  • Single-timescale AO: Similar to the single-timescale OMP, however, it employs the AO algorithm [31] for the A/D hybrid beamforming design.

We assume that the CSI delay is proportional to the size of the required CSI matrix as [53]. Specifically, the size of the CSI overhead for the two-timescale algorithm in a frame is given by ζ[NNb+NaNb+Ts(NrfNrfb+NrfaNrfb)]\zeta[NN_{b}+N_{a}N_{b}+T_{s}(N_{rf}N_{rfb}+N_{rfa}N_{rfb})], where ζ\zeta is the number of quantization bits for each element of the CSI matrices, while that of the single-timescale algorithm is given by ζ[Ts(NNb+NaNb)]\zeta[T_{s}(NN_{b}+N_{a}N_{b})]. Fig. 7 illustrates the CSI overhead versus the number of antennas at the BS NN. We can see that the proposed two-timescale algorithm remarkably reduces the required signaling overhead, especially when NN is large. In the simulation, we set the CSI delay of the single-timescale algorithm as τ=4\tau=4ms. Then, the CSI delay of the two-timescale algorithm can be computed as τtts=NrfbNrfNbNτ=0.094\tau_{tts}=\frac{N_{rfb}N_{rf}}{N_{b}N}\tau=0.094ms.

Refer to caption

Figure 7: CSI overhead versus number of antennas at the BS NN.

Refer to caption

Figure 8: System delay TtotalT_{total} versus the CSI delay τ\tau (ms).

Fig. 8 shows the latency performance of different algorithms versus the CSI delay. As we can see, the proposed two-timescale algorithms vary slightly when the CSI delay increases, while the conventional single-timescale algorithms degrade dramatically with the CSI delay. We also observe that the proposed algorithm outperforms the other compared algorithms when the CSI delay is larger than 33ms. Fig. 9 compares the delay of different algorithms versus the transmit power of user A, i.e., PUAP_{UA}. We observe that our proposed algorithm provides evident superiority over the single-timescale algorithms and the binary offloading algorithm. When the transmit power is large, the proposed heuristic short-term beamforming algorithm achieves close performance as the proposed short-term penalty-CCCP based algorithm. However, when the transmit power is small, the gap between the heuristic low-complexity algorithm and the penalty-CCCP based algorithm becomes large for the impact of the non-linear PAs becomes evident.

Refer to caption

Figure 9: System delay TtotalT_{total} versus the transmit power of user A PUAP_{UA} (mW).

Fig. 10 presents the latency of various algorithms when the user position DyD_{y} changes. We observe that our proposed algorithm still achieves the lowest latency performance, and as the distance between the BS and the users increases, the latency of different schemes gradually converges to the small level. This is not hard to understand because when the users are far from the BS, the users will offload less tasks to the edge server, for transmitting the raw data through the uplink is time-consuming. When the distance is quite large, the offloading ratio will be close to 0, i.e. the local computing scheme. In fact, by observing the two-timescale binary offloading algorithm, we can find that the local computing scheme starts to outperform the edge computing scheme if DyD_{y} is larger than 80m.

Refer to caption

Figure 10: System delay TtotalT_{total} versus the user position DyD_{y} (Mbps).

Fig. 11 indicates the delay of different algorithms versus the computation resource ratio η\eta, where ηFEFL\eta\triangleq\frac{F_{E}}{F_{L}} and FLF_{L} is fixed to 200MHz. We present the latency of our proposed algorithm under two transmit power settings, i.e. PUA=100P_{UA}=100mW and PUA=200P_{UA}=200mW. As we can see, when PUA=100P_{UA}=100mW, the proposed algorithm and the binary offloading algorithm vary slightly, even when η\eta is large. However, when PUA=200P_{UA}=200mW, the proposed algorithm and the binary offloading algorithm decrease evidently with η\eta. This is because when the system latency is limited by the transmission rate, simply increasing the computing resource will not significantly reduce the delay, which motivates us to alleviate the system bottleneck instead of simply raising the edge computing capacity.

Refer to caption

Figure 11: System delay TtotalT_{total} versus the computation resource ratio η\eta.

Fig. 12 shows the latency performance of the analyzed algorithms versus different quantization bits of the analog beamformer. We observe that latency of all algorithms decreases with the increasing of quantization bits. Moreover, the proposed algorithm only needs 4 or 5 quantization bits to achieve near performance with that of infinite quantization level, which means that the developed algorithm is efficient in practice.

Refer to caption

Figure 12: System delay TtotalT_{total} under different quantization bits.

Fig. 13 illustrates the delay of different algorithms versus the Rician factor ψ\psi, which is defined as ψ=σp12l=2Lpσpl2\psi=\frac{\sigma_{p1}^{2}}{\sum_{l=2}^{L_{p}}\sigma_{pl}^{2}}. It is observed that when ψ\psi increases, the delay of the proposed two-timescale algorithm decreases significantly. This is because a larger Rician factor means a more deterministic channel. Hence the proposed stochastic optimization algorithm will perform better. We also observe that the latency of the single-timescale AO algorithm and the CM algorithm decreases with ψ\psi, which is due to the fact that as the channel approaches rank-1, these two algorithms can find near optimal solutions. However, because of the CSI delay, our proposed two-timescale algorithm still outperforms them. The performance of the single-timescale OMP algorithm degrades severely with ψ\psi because the OMP algorithm assumes that the LOS and NLOS components have the same gain and is not suitable for channels with large Rician factors. In contrast, our developed algorithm does not assume a specific channel model and can be applied to a variety of channels.

Refer to caption

Figure 13: System delay TtotalT_{total} versus the Rician factor (ψ\psi).

VII Conclusion

In this paper, we investigated a mmWave and D2D assisted MEC system, in which user A aims to process computation tasks and share the results with another user B with the aid of BS. We proposed a two-timescale algorithm where the analog beamforming matrices are updated at a long-timescale and the digital beamforming matrices and the offloading ratio are optimized at a short-timescale to reduce the required CSI overhead and minimize the system latency. We developed a SSCA-based algorithm to design the long-term analog beamforming matrices. The short-term digital beamforming matrices have been optimized relying on the concept of the penalty-CCCP for dealing with the mmWave non-linear transmit power constraint, and the offloading ratio has been obtained via the closed-form solution. We carried out the optimality and computational complexity analysis for the long-term and short-term design algorithms, respectively. Simulation results have been provided to verify the effectiveness of our proposed joint design algorithm. Extending the mmWave and D2D assisted MEC system to the multi-user case and more specific computation model is worthy of further investigation.

Appendix A Derivation of the gradients

Based on the rules of matrix computation, we express the derivatives associated with the long-term analog beamforming matrices as follows

g𝜽𝐔1=g𝐔11j𝐔1g𝐔11j𝐔1,\frac{\partial g}{\partial\bm{\theta}_{\mathbf{U}_{1}}}=\frac{\partial g}{\partial\mathbf{U}_{1}}\circ 1j\mathbf{U}_{1}-\frac{\partial g}{\partial\mathbf{U}_{1}^{*}}\circ 1j\mathbf{U}_{1}^{*}, (38)
g𝜽𝐔2=g𝐔21j𝐔2g𝐔21j𝐔2,\frac{\partial g}{\partial\bm{\theta}_{\mathbf{U}_{2}}}=\frac{\partial g}{\partial\mathbf{U}_{2}}\circ 1j\mathbf{U}_{2}-\frac{\partial g}{\partial\mathbf{U}_{2}^{*}}\circ 1j\mathbf{U}_{2}^{*}, (39)
g𝜽𝐅a=g𝐅a1j𝐅ag𝐅a1j𝐅a,\frac{\partial g}{\partial\bm{\theta}_{\mathbf{F}_{a}}}=\frac{\partial g}{\partial\mathbf{F}_{a}}\circ 1j\mathbf{F}_{a}-\frac{\partial g}{\partial\mathbf{F}_{a}^{*}}\circ 1j\mathbf{F}_{a}^{*}, (40)
g𝜽𝐅b=g𝐅b1j𝐅bg𝐅b1j𝐅b,\frac{\partial g}{\partial\bm{\theta}_{\mathbf{F}_{b}}}=\frac{\partial g}{\partial\mathbf{F}_{b}}\circ 1j\mathbf{F}_{b}-\frac{\partial g}{\partial\mathbf{F}_{b}^{*}}\circ 1j\mathbf{F}_{b}^{*}, (41)

and the derivatives associated with the analog beamforming matrices are given by

g𝐔1=w1σ12[𝐈𝐔1(𝐔1H𝐔1)1𝐔1H]×𝐘11𝐇1𝐅a𝐅aH𝐇1H𝐔1(𝐔1H𝐔1)1,\begin{split}\frac{\partial g}{\partial\mathbf{U}_{1}^{*}}=&\frac{w_{1}}{\sigma_{1}^{2}}[\mathbf{I}-\mathbf{U}_{1}(\mathbf{U}_{1}^{H}\mathbf{U}_{1})^{-1}\mathbf{U}_{1}^{H}]\\ &\times\mathbf{Y}_{1}^{-1}\mathbf{H}_{1}\mathbf{F}_{a}\mathbf{F}_{a}^{H}\mathbf{H}_{1}^{H}\mathbf{U}_{1}(\mathbf{U}_{1}^{H}\mathbf{U}_{1})^{-1},\end{split} (42)
g𝐔2=w2σ22𝐇2H𝐅b(𝐅bH𝐅b)1𝐅bH𝐘21𝐇2𝐔2,\frac{\partial g}{\partial\mathbf{U}_{2}^{*}}=\frac{w_{2}}{\sigma_{2}^{2}}\mathbf{H}_{2}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}\mathbf{F}_{b}^{H}\mathbf{Y}_{2}^{-1}\mathbf{H}_{2}\mathbf{U}_{2}, (43)
g𝐅a=w1σ12𝐇1H𝐔1(𝐔1H𝐔1)1𝐔1H𝐘11𝐇1𝐔1+w3σ32𝐇3H𝐅b(𝐅bH𝐅b)1𝐅bH𝐘31𝐇3𝐅b,\begin{split}\frac{\partial g}{\partial\mathbf{F}_{a}^{*}}=&\frac{w_{1}}{\sigma_{1}^{2}}\mathbf{H}_{1}^{H}\mathbf{U}_{1}(\mathbf{U}_{1}^{H}\mathbf{U}_{1})^{-1}\mathbf{U}_{1}^{H}\mathbf{Y}_{1}^{-1}\mathbf{H}_{1}\mathbf{U}_{1}\\ &+\frac{w_{3}}{\sigma_{3}^{2}}\mathbf{H}_{3}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}\mathbf{F}_{b}^{H}\mathbf{Y}_{3}^{-1}\mathbf{H}_{3}\mathbf{F}_{b},\end{split} (44)
g𝐅b=w2σ22[𝐈𝐅b(𝐅bH𝐅b)1𝐅bH]×𝐘21𝐇2𝐔2𝐔2H𝐇2H𝐅b(𝐅bH𝐅b)1+w3σ32[𝐈𝐅b(𝐅bH𝐅b)1𝐅bH]×𝐘31𝐇3𝐅a𝐅aH𝐇3H𝐅b(𝐅bH𝐅b)1,\begin{split}\frac{\partial g}{\partial\mathbf{F}_{b}^{*}}=&\frac{w_{2}}{\sigma_{2}^{2}}[\mathbf{I}-\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}\mathbf{F}_{b}^{H}]\\ &\times\mathbf{Y}_{2}^{-1}\mathbf{H}_{2}\mathbf{U}_{2}\mathbf{U}_{2}^{H}\mathbf{H}_{2}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}\\ &+\frac{w_{3}}{\sigma_{3}^{2}}[\mathbf{I}-\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}\mathbf{F}_{b}^{H}]\\ &\times\mathbf{Y}_{3}^{-1}\mathbf{H}_{3}\mathbf{F}_{a}\mathbf{F}_{a}^{H}\mathbf{H}_{3}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1},\end{split} (45)

where

𝐘1=𝐈+1σ12𝐇1𝐅a𝐅aH𝐇1H𝐔1(𝐔1H𝐔1)1𝐔1H,\mathbf{Y}_{1}=\mathbf{I}+\frac{1}{\sigma_{1}^{2}}\mathbf{H}_{1}\mathbf{F}_{a}\mathbf{F}_{a}^{H}\mathbf{H}_{1}^{H}\mathbf{U}_{1}(\mathbf{U}_{1}^{H}\mathbf{U}_{1})^{-1}\mathbf{U}_{1}^{H}, (46)
𝐘2=𝐈+1σ22𝐇2𝐔2𝐔2H𝐇2H𝐅b(𝐅bH𝐅b)1𝐅bH,\mathbf{Y}_{2}=\mathbf{I}+\frac{1}{\sigma_{2}^{2}}\mathbf{H}_{2}\mathbf{U}_{2}\mathbf{U}_{2}^{H}\mathbf{H}_{2}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}\mathbf{F}_{b}^{H}, (47)

and

𝐘3=𝐈+1σ32𝐇3𝐅a𝐅aH𝐇3H𝐅b(𝐅bH𝐅b)1𝐅bH.\mathbf{Y}_{3}=\mathbf{I}+\frac{1}{\sigma_{3}^{2}}\mathbf{H}_{3}\mathbf{F}_{a}\mathbf{F}_{a}^{H}\mathbf{H}_{3}^{H}\mathbf{F}_{b}(\mathbf{F}_{b}^{H}\mathbf{F}_{b})^{-1}\mathbf{F}_{b}^{H}. (48)

Moreover, we have g𝐔1=(g𝐔1)\frac{\partial g}{\partial\mathbf{U}_{1}}=(\frac{\partial g}{\partial\mathbf{U}_{1}^{*}})^{*}, g𝐔2=(g𝐔2)\frac{\partial g}{\partial\mathbf{U}_{2}}=(\frac{\partial g}{\partial\mathbf{U}_{2}^{*}})^{*}, g𝐅a=(g𝐅a)\frac{\partial g}{\partial\mathbf{F}_{a}}=(\frac{\partial g}{\partial\mathbf{F}_{a}^{*}})^{*} and g𝐅b=(g𝐅b)\frac{\partial g}{\partial\mathbf{F}_{b}}=(\frac{\partial g}{\partial\mathbf{F}_{b}^{*}})^{*} for the real value objective function. By substituting (42)-(45) into (38)-(41), we finally obtain the derivatives for the long-term analog beamforming matrices.

Appendix B Derivation of updating steps in the inner loop of Algorithm 1

In this appendix, we provide the detailed solutions for updating the block of variables in the proposed penalty-CCCP based short-term digital beamforming algorithm.

Block 1: We update 𝐕1\mathbf{V}_{1} and PPA1P_{PA1} in parallel with the other variables fixed. The subproblem with regard to 𝐕1\mathbf{V}_{1} is given by

min𝐕1Tr(𝐙𝐄)\min_{\mathbf{V}_{1}}\quad\text{Tr}(\mathbf{Z}\mathbf{E}) (49)

which is an unconstrained problem. By applying the first order optimality condition, the optimal solution to 𝐕1\mathbf{V}_{1} is given by

𝐕1=[σ12𝐔1H𝐔1+𝐇ef1𝐖a1𝐖a1H𝐇ef1H]1𝐇ef1𝐖a1.\mathbf{V}_{1}=[\sigma_{1}^{2}\mathbf{U}_{1}^{H}\mathbf{U}_{1}+\mathbf{H}_{ef1}\mathbf{W}_{a1}\mathbf{W}_{a1}^{H}\mathbf{H}_{ef1}^{H}]^{-1}\mathbf{H}_{ef1}\mathbf{W}_{a1}. (50)

The subproblem w.r.t. PPA1P_{PA1} is given by

minPPA1\displaystyle\min_{P_{PA1}}\,\,\, i=1Na(PPA1(i)h(Vout1(i)))2+(i=1NaPPA1(i)P¯UA)2\displaystyle\sum_{i=1}^{N_{a}}(P_{PA1}(i)-h(V_{out1}(i)))^{2}+(\sum_{i=1}^{N_{a}}P_{PA1}(i)-\bar{P}_{UA})^{2} (51a)
s.t. (30c).\displaystyle\eqref{con_Ppai}.

This is a convex problem and the optimal solution can be expressed as

PPA1(i)=max(0,min(4Pmax/π,(h(Vout1(i))+P¯UAjiNaPPA1(j))/2)).\begin{split}P_{PA1}(i)&=\max(0,\min(4P_{max}/\pi,\\ &(h(V_{out1}(i))+\bar{P}_{UA}-\sum_{j\neq i}^{N_{a}}P_{PA1}(j))/2)).\end{split} (52)

Block 2: We update 𝐙\mathbf{Z} and Vout1V_{out1} in parallel by fixing the other variables. The subproblem of 𝐙\mathbf{Z} is given by

min𝐙Tr(𝐙𝐄)logdet(𝐙)\min_{\mathbf{Z}}\text{Tr}(\mathbf{Z}\mathbf{E})-\log\det(\mathbf{Z}) (53)

By checking the first order optimality condition, the optimal 𝐙\mathbf{Z} can be expressed as

𝐙=𝐄1=(𝐈𝐕1H𝐇ef1𝐖a1)1,\mathbf{Z}=\mathbf{E}^{-1}=(\mathbf{I}-\mathbf{V}_{1}^{H}\mathbf{H}_{ef1}\mathbf{W}_{a1})^{-1}, (54)

where the last equality holds from substituting the optimal value of 𝐕1\mathbf{V}_{1}, i.e. (50) into (29).

The subproblem w.r.t. Vout1V_{out1} is given by

minVout1\displaystyle\min_{V_{out1}}\quad i=1Na(PPA1(i)h(Vout1(i)))2\displaystyle\sum_{i=1}^{N_{a}}(P_{PA1}(i)-h(V_{out1}(i)))^{2}
+i=1Na(𝐅a(i,:)𝐖a1Vout1(i))2\displaystyle\qquad+\sum_{i=1}^{N_{a}}(\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|-V_{out1}(i))^{2} (55a)
s.t. (30d).\displaystyle\eqref{con_Vouti}.

It is readily seen that the problem can be divided into NaN_{a} parallel subproblems, which yields

minVout1(i)\displaystyle\min_{V_{out1}(i)}\quad φ(Vout1(i))(PPA1(i)h(Vout1(i)))2\displaystyle\varphi(V_{out1}(i))\triangleq(P_{PA1}(i)-h(V_{out1}(i)))^{2}
+(𝐅a(i,:)𝐖a1Vout1(i))2\displaystyle+(\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|-V_{out1}(i))^{2} (56a)
s.t. Vout1(i)Pmax.\displaystyle V_{out1}(i)\leq\sqrt{P_{max}}. (56b)

Since the objective function is piecewise, we need to discuss different situations and make comparison. Defining x1x_{1}^{*} and x2x_{2}^{*} whose expressions are given by (57) and (58), respectively,

x1π2(2PmaxPPA1(i)/π+𝐅a(i,:)𝐖a1)π2+4Pmax|0Pmax/2x_{1}^{*}\triangleq\left.\frac{\pi^{2}(2\sqrt{P_{max}}P_{PA1}(i)/\pi+\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|)}{\pi^{2}+4P_{max}}\right|_{0}^{\sqrt{P_{max}}/2} (57)
x2π2(6PmaxPPA1(i)/π+𝐅a(i,:)𝐖a1+12PmaxPmax/π2)π2+36Pmax|Pmax/2Pmaxx_{2}^{*}\triangleq\left.\frac{\pi^{2}(6\sqrt{P_{max}}P_{PA1}(i)/\pi+\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|+12P_{max}\sqrt{P_{max}}/\pi^{2})}{\pi^{2}+36P_{max}}\right|_{\sqrt{P_{max}}/2}^{\sqrt{P_{max}}} (58)

where x|abmin(max(x,a),b)x|_{a}^{b}\triangleq\min(\max(x,a),b), we can express the optimal solution to Vout1(i)V_{out1}(i) as

Vout1(i)={x1,φ(x1)φ(x2),x2,φ(x1)φ(x2).V_{out1}(i)=\begin{cases}x_{1}^{*},\qquad\varphi(x_{1}^{*})\leq\varphi(x_{2}^{*}),\\ x_{2}^{*},\qquad\varphi(x_{1}^{*})\geq\varphi(x_{2}^{*}).\end{cases} (59)

Block 3 We update 𝐖a1\mathbf{W}_{a1} with the other variables fixed. The subproblem regarding 𝐖a1\mathbf{W}_{a1} is given by

min𝐖a1Tr(𝐙𝐄)+12ϱi=1Na(𝐅a(i,:)𝐖a1Vout(i))2.\min_{\mathbf{W}_{a1}}\quad\text{Tr}(\mathbf{Z}\mathbf{E})+\frac{1}{2\varrho}\sum_{i=1}^{N_{a}}(\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|-V_{out}(i))^{2}. (60)

By expanding the last term of (60) and ignoring the constant. We rewrite (60) as

min𝐖a1Tr(𝐙𝐄)+12ϱ𝐅a𝐖a12i=1NaVout(i)ϱ𝐅a(i,:)𝐖a1.\min_{\mathbf{W}_{a1}}\quad\text{Tr}(\mathbf{Z}\mathbf{E})+\frac{1}{2\varrho}\|\mathbf{F}_{a}\mathbf{W}_{a1}\|^{2}-\sum_{i=1}^{N_{a}}\frac{V_{out}(i)}{\varrho}\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|. (61)

Note that the last term of (61) is concave. Hence we can approximate the original problem using the CCCP [54]. Through the first order Taylor expansion, we provide a tight upper bound of (61) as follows

min𝐖a1Tr(𝐙𝐄)+12ϱ𝐅a𝐖a12i=1NaVout(i)ϱ{𝐖¯a1H𝐅a(i,:)H𝐅a(i,:)𝐖¯a1}𝐅a(i,:)𝐖¯a1,\begin{split}\min_{\mathbf{W}_{a1}}\quad&\text{Tr}(\mathbf{Z}\mathbf{E})+\frac{1}{2\varrho}\|\mathbf{F}_{a}\mathbf{W}_{a1}\|^{2}\\ &-\sum_{i=1}^{N_{a}}\frac{V_{out}(i)}{\varrho}\frac{\Re\{\bar{\mathbf{W}}_{a1}^{H}\mathbf{F}_{a}(i,:)^{H}\mathbf{F}_{a}(i,:)\bar{\mathbf{W}}_{a1}\}}{\|\mathbf{F}_{a}(i,:)\bar{\mathbf{W}}_{a1}\|},\end{split} (62)

where 𝐖¯a1\bar{\mathbf{W}}_{a1} is the current value of variable 𝐖a1\mathbf{W}_{a1}. By applying the first order optimality condition and setting 𝐖¯a1=𝐖a1\bar{\mathbf{W}}_{a1}=\mathbf{W}_{a1}, we express the solution to 𝐖a1\mathbf{W}_{a1} as

𝐖a1=(𝐇ef1H𝐕1𝐙𝐕1H𝐇ef1+12ϱ𝐅aH𝐅a)1×(𝐇ef1H𝐕1𝐙+i=1NaVout1(i)2ϱ𝐅a(i,:)𝐖a1𝐅a(i,:)H𝐅a(i,:)𝐖a1).\begin{split}&\mathbf{W}_{a1}=(\mathbf{H}_{ef1}^{H}\mathbf{V}_{1}\mathbf{Z}\mathbf{V}_{1}^{H}\mathbf{H}_{ef1}+\frac{1}{2\varrho}\mathbf{F}_{a}^{H}\mathbf{F}_{a})^{-1}\\ &\times(\mathbf{H}_{ef1}^{H}\mathbf{V}_{1}\mathbf{Z}+\sum_{i=1}^{N_{a}}\frac{V_{out1}(i)}{2\varrho\|\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}\|}\mathbf{F}_{a}(i,:)^{H}\mathbf{F}_{a}(i,:)\mathbf{W}_{a1}).\end{split} (63)

References

  • [1] F. Guo, F. R. Yu, H. Zhang, H. Ji, V. C. M. Leung, and X. Li, “An adaptive wireless virtual reality framework in future wireless networks: A distributed learning approach,” IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 8514-8528, Aug. 2020.
  • [2] J. Ren, Y. He, G. Huang, G. Yu, Y. Cai, and Z. Zhang, “An edge computing based architecture for mobile augmented reality,” IEEE Netw., vol. 33, no. 4, pp. 162-169, Aug. 2019.
  • [3] J. Zhang, H. Guo, J. Liu, and Y. Zhang, “Task offloading in vehicular edge computing networks: A load-balancing solution,” IEEE Trans. Veh. Technol., vol. 69, no. 2, pp. 2092-2104, Feb. 2020.
  • [4] T. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, “PCANet: A simple deep learning baseline for image classification?,” IEEE Trans. Image Process., vol. 24, no. 12, pp. 5017-5032, Dec. 2015.
  • [5] S. Abolfazli, Z. Sanaei, E. Ahmed, A. Gani, and R. Buyya, “Cloud-based augmentation for mobile devices: Motivation, taxonomies, and open challenges,” IEEE Commun. Surv. Tuts., vol. 16, no. 1, pp. 337-368, 1st Quart., 2014.
  • [6] Y. C. Hu, M. Patel, D. Sabella, N. Sprecher, and V. Young, “Mobile edge computing: A key technology towards 5G,” ETSI White Paper. no. 11, Sept. 2015.
  • [7] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge computing: The communication perspective,” IEEE Commun. Surv. Tuts., vol. 19, no. 4, pp. 2322-2358, 4th quart., 2017.
  • [8] W. Zhang, Y. Wen, K. Guan, D. Kilper, H. Luo, and D. O. Wu, “Energy-optimal mobile cloud computing under stochastic wireless channel,” IEEE Trans. Wireless Commun., vol. 12, no. 9, pp. 4569-4581, Sept. 2013.
  • [9] Y. Mao, J. Zhang, and K. B. Letaief, “Dynamic computation offloading for mobile-edge computing with energy harvesting devices,” IEEE J. Sel. Areas Commun., vol. 34, no. 12, pp. 3590-3605, Dec. 2016.
  • [10] C. You, K. Huang, and H. Chae, “Energy efficient mobile cloud computing powered by wireless energy transfer,” IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1757-1771, May 2016.
  • [11] T. Q. Dinh, J. Tang, Q. D. La, and T. Q. S. Quek, “Offloading in mobile edge computing: Task allocation and computational frequency scaling,” IEEE Trans. Commun., vol. 65, no. 8, pp. 3571-3584, Aug. 2017.
  • [12] C. Wang, C. Liang, F. R. Yu, Q. Chen, and L. Tang, “Computation offloading and resource allocation in wireless cellular networks with mobile edge computing,” IEEE Trans. Wireless Commun., vol. 16, no. 8, pp. 4924-4938, Aug. 2017.
  • [13] C. You, K. Huang, H. Chae, and B. Kim, “Energy-efficient resource allocation for mobile-edge computation offloading,” IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1397-1411, Mar. 2017.
  • [14] J. Ren, G. Yu, Y. Cai, and Y. He, “Latency optimization for resource allocation in mobile-edge computation offloading,” IEEE Trans. Wireless Commun., vol. 17, no. 8, pp. 5506-5519, Aug. 2018.
  • [15] X. Cao, F. Wang, J. Xu, R. Zhang, and S. Cui, “Joint computation and communication cooperation for energy-efficient mobile edge computing,” IEEE Int. Things J., vol. 6, no. 3, pp. 4188-4200, Jun. 2019.
  • [16] J. Ren, G. Yu, Y. He, and G. Y. Li, “Collaborative cloud and edge computing for latency minimization,” IEEE Trans. Veh. Technol., vol. 68, no. 5, pp. 5031-5044, May 2019.
  • [17] J. Zhao, Q. Li, Y. Gong, and K. Zhang, “Computation offloading and resource allocation for cloud assisted mobile edge computing in vehicular networks,” IEEE Trans. Veh. Technol., vol. 68, no. 8, pp. 7944-7956, Aug. 2019.
  • [18] X. Cao, B. Yang, H. Zhang, C. Huang, C. Yuen, and Z. Han, “Reconfigurable-intelligent-surface-assisted MAC for wireless networks: Protocol design, analysis, and optimization,” IEEE Int. Things J., vol. 8, no. 18, pp. 14171-14186, Sept. 2021.
  • [19] X. Cao, B. Yang, C. Huang, C. Yuen, Y. Zhang, D. Niyato, and Z. Han, “Converged reconfigurable intelligent surface and mobile edge computing for space information networks," IEEE Netw., vol. 35, no. 4, pp. 42-48, Jul./Aug. 2021.
  • [20] T. Bai, C. Pan, Y. Deng, M. Elkashlan, A. Nallanathan, and L. Hanzo, “Latency minimization for intelligent reflecting surface aided mobile edge computing,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2666-2682, Nov. 2020.
  • [21] M. S. Elbamby, C. Perfecto, M. Bennis, and K. Doppler, “Toward low-latency and ultra-reliable virtual reality,” IEEE Netw., vol. 32, no. 2, pp. 78-84, Mar./Apr. 2018.
  • [22] M. S. Elbamby, C. Perfecto, M. Bennis, and K. Doppler, “Edge computing meets millimeter-wave enabled VR: Paving the way to cutting the cord,” Proc. WCNC, 2018, pp. 1-6.
  • [23] S. A. Busari, K. M. S. Huq, S. Mumtaz, L. Dai, and J. Rodriguez, “Millimeter-wave massive MIMO communication for future wireless systems: A survey,” IEEE Commun. Surv. Tuts., vol. 20, no. 2, pp. 836-869, 2nd quart., 2018.
  • [24] A. Bazzi, B. M. Masini, A. Zanella, and I. Thibault, “On the performance of IEEE 802.11p and LTE-V2V for the cooperative awareness of connected vehicles,” IEEE Trans. Veh. Technol., vol. 66, no. 11, pp. 10419-10432, Nov. 2017.
  • [25] Y. Li, L. Sun, and W. Wang, “Exploring device-to-device communication for mobile cloud computing,” in Proc. ICC, Sydney, NSW, 2014, pp. 2239-2244.
  • [26] W. Hu, and G. Cao, “Quality-aware traffic offloading in wireless networks,” IEEE Trans. Mobile Comput., vol. 16, no. 11, pp. 3182-3195, Nov. 2017.
  • [27] Y. Tao, C. You, P. Zhang, and K. Huang, “Stochastic control of computation offloading to a helper with a dynamically loaded CPU,” IEEE Trans. Wireless Commun., vol. 18, no. 2, pp. 1247-1262, Feb. 2019.
  • [28] Y. He, J. Ren, G. Yu, and Y. Cai, “D2D communications meet mobile edge computing for enhanced computation capacity in cellular networks,” IEEE Trans. Wireless Commun., vol. 18, no. 3, pp. 1750-1763, Mar. 2019.
  • [29] O. E. Ayach, S. Rajagopal, S. Abu-Surra, Z. Pi, and R. W. Heath, “Spatially sparse precoding in millimeter wave MIMO systems,” IEEE Trans. Wireless Commun., vol. 13, no. 3, pp. 1499-1513, Mar. 2014.
  • [30] J. Zhang, M. Haardt, I. Soloveychik, and A. Wiesel, “A channel matching based hybrid analog-digital strategy for massive multi-user MIMO downlink systems,” in Proc. SAM, Jul. 2016, pp. 1-5.
  • [31] X. Yu, J. Shen, J. Zhang, and K. B. Letaief, “Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems,” IEEE J. Sel. Topics in Signal Process., vol. 10, no. 3, pp. 485-500, Apr. 2016.
  • [32] F. Sohrabi, and W. Yu, “Hybrid digital and analog beamforming design for large-scale antenna arrays,” IEEE J. Sel. Topics in Signal Process., vol. 10, no. 3, pp. 501-513, Apr. 2016.
  • [33] Q. Shi, and M. Hong, “Spectral efficiency optimization for millimeter wave multiuser MIMO systems,” IEEE J. Sel. Topics in Signal Process., vol. 12, no. 3, pp. 455-468, Jun. 2018.
  • [34] Y. Cai, Y. Xu, Q. Shi, B. Champagne, and L. Hanzo, “Robust joint hybrid transceiver design for millimeter wave full-duplex MIMO relay systems,” IEEE Trans. Wireless Commun., vol. 18, no. 2, pp. 1199-1215, Feb. 2019.
  • [35] J. Choi, V. Va, N. Gonzalez-Prelcic, R. Daniels, C. R. Bhat, and R. W. Heath, “Millimeter-wave vehicular communication to support massive automotive sensing,” IEEE Commun. Magazine, vol. 54, no. 12, pp. 160-167, Dec. 2016.
  • [36] H. Huang, B. Liu, L. Chen, W. Xiang, M. Hu, and Y. Tao, “D2D-assisted VR video pre-caching strategy,” IEEE Access, vol. 6, pp. 61886-61895, 2018.
  • [37] A. Asadi, Q. Wang, and V. Mancuso, “A survey on device-to-device communication in cellular networks,” IEEE Commun. Surv. Tuts., vol. 16, no. 4, pp. 1801-1819, 4th quart., 2014.
  • [38] J. Ren, Y. Ruan, and G. Yu, “Data transmission in mobile edge networks: Whether and where to compress?,” IEEE Commun. Lett., vol. 23, no. 3, pp. 490-493, Mar. 2019.
  • [39] S. C. Cripps, RF Power Amplifier for Wireless Communication, Norwod, MA, USA: Artech House, 2006.
  • [40] C. Fager, W. Hallberg, M. Özen, K. Andersson, K. Buisman, and D. Gustafsson, “Design of linear and efficient power amplifiers by generalization of the Doherty theory,” Proc. PAWR, Phoenix, AZ, 2017, pp. 29-32.
  • [41] C. Lin and G. Y. Li, “Energy-efficient design of indoor mmWave and sub-THz systems with antenna arrays,” IEEE Trans. Commun., vol. 15, no. 7, pp. 4660-4672, Jul. 2016.
  • [42] Y. Cai, K. Xu, A. Liu, M. Zhao, B. Champagne, and L. Hanzo, “Two-timescale hybrid analog-digital beamforming for mmWave full-duplex MIMO multiple-relay aided systems,” IEEE J. Sel. Areas Commun., vol. 38, no. 9, pp. 2086-2103, Sept. 2020.
  • [43] Y. Cui, V. K. N. Lau, R. Wang, H. Huang, and S. Zhang, “A survey on delay-aware resource control for wireless systems-large deviation theory, stochastic lyapunov drift, and distributed stochastic learning,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1677-1701, Mar. 2012.
  • [44] K. Kar, S. Sarkar, A. Ghavami, and X. Luo, “Delay guarantees for throughput-optimal wireless link scheduling,” IEEE Trans. Autom. Control, vol. 57, no. 11, pp. 2906-2911, Nov. 2012.
  • [45] M. J. Neely, “Delay analysis for maximal scheduling with flow control in wireless networks with bursty traffic,” IEEE/ACM Trans. Netw., vol. 17, no. 4, pp. 1146-1159, Aug. 2009.
  • [46] T. L. Marzetta and B. M. Hochwald, “Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading,” IEEE Trans. Inf. Theory, vol. 45, no. 1, pp. 139-157, Jan. 1999.
  • [47] A. Liu, V. K. N. Lau, and B. Kananian, “Stochastic successive convex approximation for non-convex constrained stochastic optimization,” IEEE Trans. Signal Process., vol. 67, no. 16, pp. 4189-4203, Aug. 15, 2019.
  • [48] Q. Shi, M. Razaviyayn, Z. Luo, and C. He, “An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331-4340, Sept. 2011.
  • [49] Y. Cai, Q. Shi, B. Champagne, and G. Y. Li, “Joint transceiver design for secure downlink communications over an amplify-and-forward MIMO relay,” IEEE Trans. Commun., vol. 65, no. 9, pp. 3691-3704, Sept. 2017.
  • [50] P. Smulders, and L. Correia, “Characterization of propagation in 60 GHz radio channels,” Electron. Commun. Eng. J., vol. 9, no. 2, pp. 73-80, Apr. 1997.
  • [51] M. E. Hajj, G. El Zein, G. Zaharia, H. Farhat, and S. Sadek, “Angular measurements and analysis of the indoor propagation channel at 60 GHz,” Proc. WiMob, Barcelona, Spain, 2019, pp. 121-126.
  • [52] S. Schwarz et al., “Emerging MPEG standards for point cloud compression,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 9, no. 1, pp. 133-148, Mar. 2019.
  • [53] A. Liu, and V. K. N. Lau, “Impact of CSI knowledge on the codebook-based hybrid beamforming in massive MIMO,” IEEE Trans. Signal Process., vol. 64, no. 24, pp. 6545-6556, Dec. 15, 2016.
  • [54] A. L. Yuille, and A. Rangarajan, “The concave-convex procedure,” Neural Computaion, vol. 15, no. 4, pp. 915-936, Apr. 2003.