
Learning Optimal Fronthauling and Decentralized Edge Computation in
Fog Radio Access Networks

Hoon Lee, Junbeom Kim, and Seok-Hwan Park H. Lee is with the Department of Smart Robot Convergence and Application Engineering and the Department of Information and Communications Engineering, Pukyong National University, Busan 48513, South Korea (e-mail: [email protected]). J. Kim and S.-H. Park are with the Division of Electronic Engineering, Jeonbuk National University, Jeonju 54896, South Korea (e-mail: {junbeom, seokhwan}@jbnu.ac.kr). © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract

Fog radio access networks (F-RANs), which consist of a cloud and multiple edge nodes (ENs) connected via fronthaul links, have been regarded as promising network architectures. The F-RAN entails a joint optimization of cloud and edge computing as well as fronthaul interactions, which is challenging for traditional optimization techniques. This paper proposes a Cloud-Enabled Cooperation-Inspired Learning (CECIL) framework, a structural deep learning mechanism for handling a generic F-RAN optimization problem. The proposed solution mimics cloud-aided cooperative optimization policies by including centralized computing at the cloud, distributed decisions at the ENs, and their uplink-downlink fronthaul interactions. A group of deep neural networks (DNNs) is employed for characterizing the computations of the cloud and ENs. The forward pass of the DNNs is carefully designed so that the impacts of practical fronthaul links, such as channel noise and signaling overheads, can be included in the training step. As a result, the operations of the cloud and ENs can be jointly trained in an end-to-end manner, whereas their real-time inferences are carried out in a decentralized manner by means of the fronthaul coordination. To facilitate fronthaul cooperation among multiple ENs, optimal fronthaul multiple access schemes are designed. Training algorithms robust to practical fronthaul impairments are also presented. Numerical results validate the effectiveness of the proposed approaches.

Index Terms:
Deep learning, fog radio access networks, fronthaul interaction.

I Introduction

Centralized coordination of distributed edge nodes (ENs) has brought great success in wireless communication networks [1, 2]. Such an architecture is realized with a cloud unit that schedules the communication and computation of the ENs by leveraging fronthaul interfaces. A particular example is the cloud radio access network (C-RAN) [3, 4, 5, 6, 7], where a cloud centrally performs the baseband signal processing while radio-frequency (RF) functionalities are carried out by ENs, e.g., remote radio heads. The performance can be further enhanced by fog radio access networks (F-RANs) [8, 9, 10], where the ENs are equipped with individual computing units. Measurements of the RF propagation environment, e.g., channel state information (CSI), are available only at the ENs due to the absence of RF circuitry at the cloud. To perform centralized computations with distributed data, the cloud collects the local measurements through uplink fronthaul links. The computation results of the cloud, which contain information regarding the networking policies of the ENs, e.g., beamforming vectors, are forwarded via downlink fronthaul links. In F-RAN systems, the data received from the cloud can be further processed at the ENs using local computing units. Hence, to optimize the F-RAN properly, we need to jointly design centralized cloud computing strategies, uplink-downlink fronthaul coordination, and distributed edge processing rules.

Recent studies [3, 4, 5, 6, 7, 8, 9, 10] have addressed various optimization tasks in C-RAN and F-RAN systems. The works in [3, 4, 5, 6, 7, 8] investigate the joint optimization of downlink fronthauling schemes at the cloud and multi-antenna signal processing at the ENs. Iterative algorithms are presented for tackling the nonconvexity of particular formulations. Assuming capacity-constrained fronthauls, compression strategies for the cloud computing results are determined along with the beamforming vectors at the ENs. Although the downlink fronthaul interactions from the cloud to the ENs are adequately studied, these works do not consider the imperfections that occur in uplink fronthaul coordination, such as the CSI update steps from the ENs to the cloud. Therefore, existing studies are suitable only for an ideal scenario where the global network state, e.g., the network CSI, is perfectly known to the cloud. Practical C-RAN systems should involve the joint optimization of downlink-uplink fronthauling protocols and centralized cloud computing strategies. This is, however, not trivial for traditional model-based optimization techniques [3, 4, 5, 6, 7, 8] since the fronthaul interactions typically invoke intractable features, including random noise and fronthaul signaling designs.

For the F-RAN architecture, we additionally need to identify decentralized edge computation rules for individual ENs. Distributed optimization methods in the F-RAN have been studied for cache-enabled networks [9] and tactile Internet applications [10]. Message-passing algorithms are employed in [9] to determine a decentralized cache deployment policy, where each EN iteratively updates messages for interaction with other ENs. These messages must be carefully designed for each network setup, so the approach lacks the adaptability of a general optimization framework. The alternating direction method of multipliers (ADMM) can be exploited for the design of distributed and cooperative fog computing [10]. To facilitate iterative interactions among the ENs and the cloud, a proper reformulation technique is necessary to split a global optimization variable. These model-based decentralized algorithms cannot be straightforwardly applied to other types of optimization formulations. In addition, they do not take practical fronthaul design issues into account, such as quantization, noisy channels, and signaling overheads.

To overcome the drawbacks of traditional model-based algorithms, the learning to optimize paradigm has been intensively examined in various wireless networking scenarios [11, 12, 13, 14, 15, 16, 17, 18, 19]. Deep neural networks (DNNs) are employed to replace unknown computation rules for solving network optimization problems. Arbitrarily formulated objectives can be maximized in a data-driven manner without handcrafted models, e.g., the convexity of functions or prior information about the optimal solution. DNNs have been exploited to learn efficient power control mechanisms [12, 13] and user association policies [14] in interfering wireless networks. Beamforming optimization problems for multi-antenna systems are addressed in [15]. These results reveal that deep learning (DL) approaches outperform existing suboptimal solutions with much reduced computational complexity. However, they are confined to centralized executions, which are not suitable for F-RAN systems.

Decentralized optimization has been investigated via unsupervised DL [17, 18, 19] and reinforcement learning (RL) techniques [20]. In [17, 18, 19], a distributed network setup is considered where direct interactions among ENs are allowed by leveraging backhaul interfaces. An interaction policy is autonomously optimized along with distributed computation rules. However, the setup in [17, 18, 19] differs from the F-RAN architecture, where the ENs can only be controlled by the cloud, and thus these methods cannot optimize the role of the cloud in F-RAN systems. A cloud-aided distributed RL strategy is presented in [20]. For the learning task to succeed, the RL framework requires a careful determination of state variables and rewards for individual ENs. The optimization of these hyperparameters typically incurs trial-and-error-based grid search procedures for each network setup. In addition, backhaul imperfections and signaling overheads are not addressed in [17, 18, 19, 20]. Federated learning (FL) algorithms have recently been studied for handling distributed machine learning problems [17, 21]. FL focuses on training a common DNN at the cloud with the aid of ENs having individual training datasets. Thus, FL is not suitable for the design of decentralized optimization inferences in wireless networks where the ENs wish to identify their own networking solutions from partially observable statistics.

This paper proposes an unsupervised DL method for designing a generic optimization framework in the F-RAN systems. Distributed ENs observe their local states, e.g., the CSI for local wireless links, and desire to determine individual solutions, e.g., transmit power and beamforming vectors, for maximizing the network performance. Since the ENs are typically deployed in a wide cell coverage area, the locally observable information of a certain EN is not directly available to others. A network cloud connecting the ENs through imperfect fronthaul links schedules decentralized edge processing. To optimize the operations at the cloud and the ENs jointly, we propose a Cloud-Enabled Cooperation-Inspired Learning (CECIL) mechanism, which is a structural DL solution developed for the F-RAN systems. The proposed method consists of three consecutive steps: uplink fronthauling at ENs, centralized computation and downlink fronthauling at a cloud, and distributed decision at ENs. A group of DNN units is employed for characterizing the operations of the cloud and ENs. A joint training algorithm of the DNNs is presented with arbitrary given fronthaul imperfections.

The uplink and downlink fronthaul interaction steps incur inter-EN interference. To handle this issue, we design multiple access fronthauling schemes that can be autonomously optimized by the DNNs. Two different protocols are investigated. First, following conventional distributed DL approaches [17, 18, 19, 20], an orthogonal multiple access (OMA) scheme is presented which assigns distinct fronthaul resources to each EN. Second, we propose a non-orthogonal multiple access (NOMA) fronthauling strategy where all ENs share the identical fronthaul resources. Non-orthogonal interaction policies among ENs have not yet been investigated in existing DL studies [17, 18, 19, 20], and thus their optimality in the design of cooperative DNN inference steps is not guaranteed. To this end, we rigorously prove the effectiveness of the OMA and NOMA schemes and analyze the amount of fronthaul resources required to achieve optimality. The superiority of the NOMA method is verified in terms of the fronthaul signaling overheads. In addition, for the imperfect fronthaul link case, we present a robust learning policy that trains the DNNs in the presence of practical fronthaul impairments such as additive noise and finite-capacity constraints. Finally, numerical results verify the effectiveness of the proposed framework in various F-RAN applications. Our main contributions are summarized as follows.

  • We propose the CECIL framework, a model-driven DL-based optimization mechanism for the F-RAN structure, which jointly determines the decentralized edge computations, centralized cloud calculations, and uplink-downlink fronthaul coordination strategies.

  • For managing inter-edge interference, fronthaul multiple access schemes are designed which bridge the computations of the DNNs at the cloud and ENs. The optimality of the proposed fronthauling strategies is verified rigorously.

  • To combat the fronthaul channel imperfections, robust training policies are presented which optimize DNNs in the presence of additive noise and fronthaul capacity constraints.

  • Extensive numerical results validating the optimality of the proposed method are provided for interfering networks. Efficient fronthaul resource allocation methods are identified from the numerical results.

The rest of the paper is organized as follows. Section II describes a generic F-RAN system. The inference of the CECIL framework is explained in Section III, and its training process is presented in Section IV. Optimal fronthaul interaction strategies are designed in Section V, and in Section VI, robust training policies for imperfect fronthaul channels are studied. Section VII assesses the performance of the proposed CECIL approach from numerical simulations. Finally, concluding remarks are given in Section VIII.

II Network Model and Problem Formulation

Figure 1: F-RAN system with a cloud and $N=4$ ENs.

Fig. 1 illustrates an F-RAN architecture which exploits both cloud and edge computing processes for the efficient management of wireless networks. A cloud is regarded as a central unit that coordinates multiple, say $N$, ENs by means of fronthaul links. The ENs are equipped with RF modules to provide networking services. We maximize a generic nonconvex network utility function $f(\cdot)$ by optimizing network policies at the ENs. Without loss of generality, states of the F-RAN are represented by a vector $\mathbf{a}\in\mathbb{R}^{A}$ of length $A$. The global state $\mathbf{a}$ can be any measurement values, such as a set of CSIs between the ENs and their intended mobile users. The ENs, equipped with the RF processors, are responsible for estimating the global state vector. The ENs are, in general, distributed over the coverage area to support reliable communication services. Therefore, the locally observable state at each EN $i$ ($i=1,\cdots,N$), denoted by $\mathbf{a}_{i}\in\mathbb{R}^{A_{i}}$, is a subset of $\mathbf{a}$ and is not known to the other ENs or the cloud. The global information vector can then be represented by $\mathbf{a}\triangleq\{\mathbf{a}_{i}:\forall i\}$ with size $A\triangleq\sum_{i=1}^{N}A_{i}$.

The fronthaul interface supports the cooperation among the cloud and ENs. For notational simplicity, the cloud is denoted by the $0$-th node. Let $M_{i0}$ and $M_{0i}$ be the numbers of uplink and downlink fronthaul resource blocks (RBs) assigned to EN $i$, respectively. Without loss of generality, one fronthaul RB is assumed to convey one real-valued scalar. In practice, the RBs correspond to orthogonal time-frequency channels, e.g., a resource element in LTE systems, which consists of one data symbol occupying 15 kHz of bandwidth. Thus, $M_{i0}$ and $M_{0i}$ reflect the fronthaul signaling overheads and the fronthaul resource constraints. The capacity constraints on the fronthaul RBs are addressed in Section VI-B. The total number of RBs available to the F-RAN is limited to $M$, i.e., $M_{i0}\leq M$ and $M_{0i}\leq M$. The number of RBs can be optimized in advance by the network operator and is assumed to be fixed; RB allocation schemes are discussed in Section VII-A. Both time division duplexing (TDD) and frequency division duplexing (FDD) protocols can be exploited for implementing the fronthaul coordination. For TDD systems, we have $M_{i0}=M_{0i}$ since the quality of the uplink and downlink fronthaul channels is identical due to channel reciprocity. The more general case of $M_{i0}\neq M_{0i}$ represents FDD systems, where the uplink and downlink fronthaul transmissions experience different radio propagation environments. As a result, the DL method presented in the following sections can be applied to arbitrary duplexing systems, including both TDD and FDD.

A decision of EN $i$ is characterized by a solution vector $\mathbf{x}_{i}\in\mathbb{R}^{X_{i}}$ of length $X_{i}$, which includes, e.g., the resource management policy and beamforming vector of EN $i$. The performance of the F-RAN is generally affected by both the global state $\mathbf{a}$ and the set of solutions $\mathbf{x}\triangleq\{\mathbf{x}_{i}:\forall i\}$. Thus, the utility function can be written as $f(\mathbf{a},\mathbf{x})$. We focus on maximizing the utility averaged over the global state vector $\mathbf{a}$, expressed as

$(\text{P1}):\quad \max_{\mathbf{x}}\ \mathbb{E}_{\mathbf{a}}[f(\mathbf{a},\mathbf{x})]$
$\text{subject to } \mathbf{x}_{i}\in\mathcal{D}_{i},\ \forall i,$

where $\mathbb{E}_{U}[\cdot]$ is the expectation over a random variable $U$ and $\mathcal{D}_{i}$ stands for the solution set of EN $i$. To tackle (P1) in the F-RAN system, along with the solution vector $\mathbf{x}$, we need to identify the fronthaul interaction policy subject to the fronthaul RB constraints $M_{i0}$ and $M_{0i}$, $\forall i$. The effects of fronthaul noise and inter-EN interference can also be included in (P1). These are distinct features of our formulation compared to existing studies on decentralized multi-agent architectures [20], which do not consider resource constraints on the coordination links.

In this paper, we develop an efficient solution for the generic formulation (P1) whose computational inference can be realized in F-RAN systems. The major challenges of (P1) arise from the distributed observations and imperfect fronthaul interfaces. Solving (P1) requires perfect knowledge of $\mathbf{a}$. One possible approach is to let the ENs upload their local measurements $\mathbf{a}_{i}$ to the cloud through the uplink fronthaul links. The cloud can then calculate the network solution $\mathbf{x}$, and the local decision variables $\mathbf{x}_{i}$ are transferred to the individual ENs over the downlink fronthaul links. Such a strategy is applicable only to an ideal scenario where the fronthaul links are perfect and have sufficient RBs, i.e., $M_{i0}\geq A_{i}$ and $M_{0i}\geq X_{i}$ for exchanging $\mathbf{a}_{i}\in\mathbb{R}^{A_{i}}$ and $\mathbf{x}_{i}\in\mathbb{R}^{X_{i}}$, respectively. Conventional approaches [3, 4, 5, 6, 7, 8] focus only on the downlink or the uplink compression, but not on their joint design. Existing FL algorithms [17, 21], where the ENs cooperatively find a common solution at the cloud via model-based iteration rules, are not suitable for addressing the F-RAN problem (P1) since it requires optimizing fronthaul interaction policies. To efficiently solve (P1) in the F-RAN system, we need a joint design of fronthaul communication policies and individual decisions at the ENs that applies to arbitrary utility functions.
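As a concrete instance of the abstract utility in (P1), one may think of a sum-rate objective for $N$ interfering EN-user links. The sketch below is purely illustrative and not part of the formulation itself: the channel-gain matrix `a`, the power vector `x`, and `noise_power` are hypothetical stand-ins for the global state, the solution set, and the noise level.

```python
import numpy as np

def sum_rate_utility(a, x, noise_power=1.0):
    """Hypothetical instance of f(a, x): the sum rate of N interfering
    EN-user links.  a[i, j] is the channel gain from EN j to the user
    served by EN i; x[i] is the transmit power of EN i."""
    N = len(x)
    total = 0.0
    for i in range(N):
        signal = a[i, i] * x[i]
        interference = sum(a[i, j] * x[j] for j in range(N) if j != i)
        total += np.log2(1.0 + signal / (interference + noise_power))
    return float(total)
```

With this choice, the locally observable state $\mathbf{a}_i$ of EN $i$ would be the $i$-th row of the gain matrix, matching the partial-observability assumption above.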

III Cooperative Learning Mechanism

(a) Uplink message generation
(b) Downlink message generation
(c) Distributed decision
Figure 2: Proposed cooperative inference.

This section presents the CECIL inference, which designs cooperative optimization mechanisms for the F-RAN system. The CECIL inference serves as the forward-pass computation of the DNN-based optimization framework in Section IV. We characterize the interactions among the cloud and ENs by abstracted computational inferences that will be replaced by DNNs. Fig. 2 describes the proposed cooperative inference structure, which consists of three sequential steps: uplink message generation at the ENs, downlink message generation at the cloud, and the decentralized decision at the ENs. In what follows, we describe each step in detail.

III-A Uplink message generation at ENs

As shown in Fig. 2(a), EN $i$ first sends the information regarding its local observation $\mathbf{a}_{i}$ to the cloud using the uplink fronthaul link assigned $M_{i0}$ RBs. A straightforward transmission of the $A_{i}$-dimensional raw data $\mathbf{a}_{i}$ is not possible when the fronthaul resources are insufficient, i.e., $M_{i0}\leq A_{i}$. Thus, EN $i$ needs to identify a low-dimensional representation of $\mathbf{a}_{i}$ without direct interaction with other ENs. This can be viewed as a decentralized edge compression step. The resulting representation $\mathbf{m}_{i0}\in\mathbb{R}^{M_{i0}}$ of length $M_{i0}$ is referred to as an uplink message that carries the local knowledge of EN $i$ to the cloud via the $M_{i0}$ fronthaul RBs. Let $\mathcal{M}_{i}(\cdot)$ be the computational inference performing the uplink message generation of EN $i$, i.e.,

$\mathbf{m}_{i0}=\mathcal{M}_{i}(\mathbf{a}_{i}).$  (1)

In (1), only the local observation $\mathbf{a}_{i}$ is accepted as an input, characterizing fully decentralized processing. As discussed in Section IV, the inference $\mathcal{M}_{i}(\cdot)$ is modeled by a DNN to be optimized for maximizing the utility.

III-B Downlink message generation at cloud

Practical fronthaul links are corrupted by channel impairments such as noise, and thus the cloud obtains a noisy observation of the uplink messages. To capture this, we introduce a channel transfer function $h_{i0}(\cdot)$ for the uplink fronthaul link from EN $i$ to the cloud, which can include any channel imperfection encountered in the uplink communication. Then, the received signal at the cloud $\mathbf{y}_{0}$ depends on all the noisy uplink messages $h_{i0}(\mathbf{m}_{i0})$, $\forall i$. It is written as

$\mathbf{y}_{0}=u\big(\{h_{i0}(\mathbf{m}_{i0}):\forall i\}\big)=u\big(\{h_{i0}(\mathcal{M}_{i}(\mathbf{a}_{i})):\forall i\}\big),$  (2)

where the function $u(\cdot)$, defined over the set of noisy messages $\{h_{i0}(\mathbf{m}_{i0}):\forall i\}$, describes the uplink transmission strategy of the ENs. The choice of $u(\cdot)$ depends on the fronthaul resource sharing policy. For instance, if each EN occupies distinct fronthaul RBs, $u(\cdot)$ is simply the concatenation operation. On the other hand, $u(\cdot)$ becomes the summation when all ENs share the entire uplink fronthaul RBs. The dimension of $\mathbf{y}_{0}$ depends on the number of uplink fronthaul RBs $M_{i0}$ and the uplink signaling strategy $u(\cdot)$. These are specified in Section V.
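The two resource-sharing choices of $u(\cdot)$ mentioned above can be sketched as follows. The function names are illustrative, and noiseless messages ($h_{i0}$ = identity) are assumed for brevity:

```python
import numpy as np

def uplink_combine_oma(messages):
    """Orthogonal sharing: each EN occupies distinct RBs, so the cloud
    observes the concatenation of all uplink messages (dimension is the
    sum of the per-EN message lengths)."""
    return np.concatenate(messages)

def uplink_combine_noma(messages):
    """Non-orthogonal sharing: all ENs transmit over the same RBs, so
    their messages superpose and the cloud observes the element-wise
    sum (dimension equals one message length)."""
    return np.sum(np.stack(messages), axis=0)
```

Note the dimension difference: with two length-2 messages, the orthogonal combiner yields a length-4 received vector while the non-orthogonal one yields length 2, which is the source of the signaling-overhead saving discussed in Section V.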

From (2), we observe that the received signal $\mathbf{y}_{0}$ conveys distorted information about the global observation $\mathbf{a}=\{\mathbf{a}_{i}:\forall i\}$ after the piecewise edge processing $\mathcal{M}_{i}(\cdot)$. A standard approach for processing $\mathbf{y}_{0}$ is to decompose the computations of the cloud into the following sequential steps. The cloud first recovers the global state $\mathbf{a}$ from the received signal $\mathbf{y}_{0}$. Then, the solution to (P1) is determined by centralized cloud computing strategies. The resulting solution $\mathbf{x}_{i}\in\mathbb{R}^{X_{i}}$ is sent back to EN $i$ via the downlink fronthaul link with $M_{0i}$ RBs. To handle the practical case of $M_{0i}\leq X_{i}$, $\mathbf{x}_{i}$ is encoded into a downlink message $\mathbf{m}_{0i}\in\mathbb{R}^{M_{0i}}$ whose dimension matches the number of downlink fronthaul RBs $M_{0i}$ assigned to EN $i$. As illustrated in Fig. 2(b), we integrate these cascaded procedures into a single computational inference $\mathcal{M}_{0}(\cdot)$ that creates the set of downlink messages $\{\mathbf{m}_{0i}:\forall i\}$ from $\mathbf{y}_{0}$. This can be written as

$\{\mathbf{m}_{0i}:\forall i\}=\mathcal{M}_{0}(\mathbf{y}_{0}).$  (3)

It is inferred from (3) that the downlink message $\mathbf{m}_{0i}$ encapsulates the local observations of the other nodes $\mathbf{a}_{j}$, $\forall j\neq i$, as well as an intermediate decision taken at the cloud. The inference $\mathcal{M}_{0}(\cdot)$ is also modeled by a DNN whose parameters are determined to maximize the utility function.

Remark 1.

The inference in (3) can be viewed as a two-way relaying strategy [22, 23] in which the cloud relays the signals received from the ENs after appropriate signal processing $\mathcal{M}_{0}(\cdot)$. Classical relaying protocols depend on man-made signaling strategies, e.g., amplify-and-forward and decode-and-forward [24], which might not be the optimum cooperation policy. The proposed DL approach can identify the optimal relaying protocol, provided that the relaying inference $\mathcal{M}_{0}(\cdot)$ is approximated by a properly constructed DNN.

III-C Distributed decision at ENs

We now describe the distributed decision process shown in Fig. 2(c). The cloud broadcasts the downlink messages to EN $i$ with a pre-designed downlink signaling strategy denoted by $d_{i}(\cdot)$. Similar to the uplink signaling strategy $u(\cdot)$ in (2), $d_{i}(\cdot)$ is defined over the set of downlink messages $\{\mathbf{m}_{0i}:\forall i\}$ and is a design factor to be specified in Section V. The downlink signal intended for EN $i$, denoted by $\mathbf{d}_{i}$, can be written as

$\mathbf{d}_{i}=d_{i}(\{\mathbf{m}_{0j}:\forall j\}).$  (4)

Defining $h_{0i}(\cdot)$ as the downlink fronthaul transfer function from the cloud to EN $i$, the received signal $\mathbf{y}_{i}$ at EN $i$ is given by

$\mathbf{y}_{i}=h_{0i}(\mathbf{d}_{i})=h_{0i}\big(d_{i}(\{\mathbf{m}_{0j}:\forall j\})\big).$  (5)

The dimensions of $\mathbf{d}_{i}$ and $\mathbf{y}_{i}$ depend on the message broadcasting strategy $d_{i}(\cdot)$ to be designed in Section V. Combining (1), (3), and (5), we can see that the received message $\mathbf{y}_{i}$ of EN $i$ contains the local statistics of all the ENs. This implies that all the sufficient, but possibly corrupted, information for solving (P1) is now available at each EN. Thereby, the solution $\mathbf{x}_{i}$ of EN $i$ can be obtained individually by means of a node-centric decision inference $\mathcal{X}_{i}(\cdot)$. The proposed solution computation rule at EN $i$ is expressed as

$\mathbf{x}_{i}=\mathcal{X}_{i}(\mathbf{a}_{i},\mathbf{y}_{i}).$  (6)

We use the local observation $\mathbf{a}_{i}$ as side information to refine the received signal $\mathbf{y}_{i}$ dedicated to EN $i$. This additional input forms a residual shortcut, which leads to an efficient training strategy for very deep networks [25].

Algorithm 1 Proposed CECIL inference for F-RAN
  1. Uplink message generation: EN $i$, $\forall i$, creates an uplink message $\mathbf{m}_{i0}$ from (1) and sends it to the cloud through the uplink fronthaul links (2).
  2. Downlink message generation: The cloud broadcasts the downlink messages $\mathbf{m}_{0i}$ generated from (3) using the downlink fronthaul links (5).
  3. Distributed decision: EN $i$, $\forall i$, computes an individual solution $\mathbf{x}_{i}$ from (6).

Algorithm 1 summarizes the inference of the CECIL framework. The uplink messages generated at the ENs are first transmitted to the cloud. Upon receiving the noisy signal $\mathbf{y}_{0}$, the centralized cloud computation yields the downlink messages to be broadcast to the ENs. The decision $\mathbf{x}_{i}$ is then taken at each EN $i$ individually. The proposed inference relies only on locally observable information, i.e., the local measurement $\mathbf{a}_{i}$ and the received messages, and not on the instantaneous states of other network entities. As a result, Algorithm 1 can be implemented in a distributed manner with optimized $\mathcal{M}_{i}(\cdot)$ and $\mathcal{X}_{i}(\cdot)$.
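The three steps of Algorithm 1 can be sketched as a single forward pass. This is a minimal illustration, assuming ideal fronthaul channels ($h$ = identity), concatenation-based signaling in both directions, and untrained random networks in place of $\mathcal{M}_i$, $\mathcal{M}_0$, and $\mathcal{X}_i$; all function names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """A random fully connected network standing in for one of the
    inferences M_i, M_0, or X_i (weights are untrained placeholders)."""
    Ws = [rng.standard_normal((o, i)) * 0.1
          for i, o in zip(sizes[:-1], sizes[1:])]
    def forward(u):
        for W in Ws[:-1]:
            u = np.tanh(W @ u)
        return Ws[-1] @ u
    return forward

def cecil_forward(local_states, enc, cloud, dec):
    """One pass of Algorithm 1 under the idealized assumptions above."""
    # Step 1: uplink messages m_{i0} = M_i(a_i), combined by u(.) = concat
    y0 = np.concatenate([enc[i](a) for i, a in enumerate(local_states)])
    # Step 2: the cloud emits all downlink messages {m_{0i}} from y0
    downlink = np.split(cloud(y0), len(local_states))
    # Step 3: each EN decides x_i = X_i(a_i, y_i) with y_i = m_{0i}
    return [dec[i](np.concatenate([a, downlink[i]]))
            for i, a in enumerate(local_states)]
```

Each EN only ever touches its own state and its own received message, which mirrors the decentralized execution property claimed for Algorithm 1.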

IV Deep Learning Formulation

Based on the formulations in (1), (3), and (6), the original problem (P1) can be transformed into

$(\text{P2}):\quad \max_{\{\mathcal{M}_{i}(\cdot),\mathcal{X}_{i}(\cdot):\forall i\}}\ \mathbb{E}_{\mathbf{a}}[f(\mathbf{a},\{\mathcal{X}_{i}(\mathbf{a}_{i},\mathbf{y}_{i}):\forall i\})]$
$\text{subject to } \mathcal{X}_{i}(\mathbf{a}_{i},\mathbf{y}_{i})\in\mathcal{D}_{i},\ \forall i,\ \forall\mathbf{a}.$

The targets of the optimization are the unstructured functions $\mathcal{M}_{i}(\cdot)$ in (1), $\mathcal{M}_{0}(\cdot)$ in (3), and $\mathcal{X}_{i}(\cdot)$ in (6), which cannot be tackled by traditional optimization techniques requiring analytical formulas. To this end, we employ the learning to optimize approach [11, 12, 13, 14, 15, 16, 17, 18, 19], which uses DNNs to replace the unknown mappings $\mathcal{M}_{i}(\cdot)$ and $\mathcal{X}_{i}(\cdot)$. Let $\mathcal{F}_{Q}(\cdot;\theta)$ be a $Q$-layer fully-connected DNN with trainable parameters $\theta$. For an input vector $\mathbf{u}\in\mathbb{R}^{U_{1}}$ of length $U_{1}$, the output of $\mathcal{F}_{Q}(\mathbf{u};\theta)$ is written as

$\mathcal{F}_{Q}(\mathbf{u};\theta)=\sigma_{Q}(\mathbf{W}_{Q}\times\cdots\times\sigma_{1}(\mathbf{W}_{1}\mathbf{u}+\mathbf{b}_{1})+\cdots+\mathbf{b}_{Q}),$  (7)

where $\sigma_{q}(\cdot)$ is the activation at layer $q$ $(q=1,\cdots,Q)$ and $\theta$ is the collection of the weight matrices $\mathbf{W}_{q}\in\mathbb{R}^{U_{q+1}\times U_{q}}$ and bias vectors $\mathbf{b}_{q}\in\mathbb{R}^{U_{q+1}}$ for all layers.
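Equation (7) is a standard layer-by-layer evaluation: each layer applies an affine map followed by its activation. A minimal sketch (the function name and argument layout are our own):

```python
import numpy as np

def fc_dnn_forward(u, weights, biases, activations):
    """Literal evaluation of (7):
    sigma_Q(W_Q ... sigma_1(W_1 u + b_1) ... + b_Q).
    weights[q], biases[q], activations[q] describe layer q+1; the
    matrices must be dimensionally conformable layer to layer."""
    for W, b, sigma in zip(weights, biases, activations):
        u = sigma(W @ u + b)
    return u
```

For instance, with $Q=2$, identity activations, $\mathbf{W}_q=[2]$, and $\mathbf{b}_q=[1]$, an input of $[1]$ passes through $2\cdot 1+1=3$ and then $2\cdot 3+1=7$.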

We replace the mappings in (1), (3), and (6) with DNNs as

$\mathbf{m}_{i0}=\mathcal{M}_{i}(\mathbf{a}_{i})=\mathcal{F}_{Q_{M_{i}}}(\mathbf{a}_{i};\theta_{M_{i}}),$  (8)
$\{\mathbf{m}_{0i}:\forall i\}=\mathcal{M}_{0}(\mathbf{y}_{0})=\mathcal{F}_{Q_{M_{0}}}(\mathbf{y}_{0};\theta_{M_{0}}),$  (9)
$\mathbf{x}_{i}=\mathcal{X}_{i}(\mathbf{a}_{i},\mathbf{y}_{i})=\mathcal{F}_{Q_{X_{i}}}(\mathbf{a}_{i}\oplus\mathbf{y}_{i};\theta_{X_{i}}),$  (10)

where $\mathbf{u}\oplus\mathbf{v}\triangleq[\mathbf{u}^{T},\mathbf{v}^{T}]^{T}$ stands for the concatenation of two vectors $\mathbf{u}$ and $\mathbf{v}$. The output dimensions of $\mathcal{F}_{Q_{M_{i}}}(\cdot;\theta_{M_{i}})$, $\mathcal{F}_{Q_{M_{0}}}(\cdot;\theta_{M_{0}})$, and $\mathcal{F}_{Q_{X_{i}}}(\cdot;\theta_{X_{i}})$ are set to the lengths of the respective desired outputs. The optimality of this DNN approximation is guaranteed by the universal approximation theorem [26]. It states that for any continuous mapping $z(\mathbf{u})$ defined on a compact set $\mathcal{U}$ and any arbitrarily small $\varepsilon>0$, there exists a finite $Q$ such that

$\sup_{\mathbf{u}\in\mathcal{U}}\|z(\mathbf{u})-\mathcal{F}_{Q}(\mathbf{u};\theta)\|\leq\varepsilon.$  (11)

From (11), we can identify a DNN arbitrarily close to any continuous function in terms of the worst-case Euclidean distance. Note that (11) also holds for the unknown optimal mappings $\mathcal{M}_{i}^{\star}(\cdot)$ and $\mathcal{X}_{i}^{\star}(\cdot)$. Therefore, the DNN approximations in (8)-(10) provide a tractable formulation of (P2) without loss of optimality.

Figure 3: End-to-end structure of proposed CECIL-based F-RAN system.

IV-A Training and Implementation

Fig. 3 illustrates the CECIL-based F-RAN system, where the computations of the ENs and cloud are carried out by the DNNs in (8)-(10). The forward-pass computations of the CECIL are given in Algorithm 1. Plugging (8)-(10) into (P2) results in

$(\text{P3}):\quad \max_{\boldsymbol{\Theta}}\ \mathbb{E}_{\mathbf{a}}[f(\mathbf{a},\{\mathcal{F}_{Q_{X_{i}}}(\mathbf{a}_{i}\oplus\mathbf{y}_{i};\theta_{X_{i}}):\forall i\})]$
$\text{subject to } \mathcal{F}_{Q_{X_{i}}}(\mathbf{a}_{i}\oplus\mathbf{y}_{i};\theta_{X_{i}})\in\mathcal{D}_{i},\ \forall i,\ \forall\mathbf{a},$

where 𝚯\boldsymbol{\Theta} accounts for the set of learnable parameters of the DNNs in (8)-(10) defined as

𝚯{θMi:i=0,1,,N}{θXi:i=1,,N}.\displaystyle\boldsymbol{\Theta}\triangleq\{\theta_{M_{i}}:\forall i=0,1,\cdots,N\}\bigcup\{\theta_{X_{i}}:\forall i=1,\cdots,N\}. (12)

To remove the constraint of (P3), the output activation of \mathcal{F}_{Q_{X_{i}}}(\cdot;\theta_{X_{i}}) can be designed as the projection operator \arg\min_{\mathbf{v}\in\mathcal{D}_{i}}\|\mathbf{u}-\mathbf{v}\| for a layer input \mathbf{u}. For a convex feasible set \mathcal{D}_{i}, this projection activation is given by a convex quadratic program (QP) whose gradient-based training rules can be obtained with the backpropagation algorithm [27]. A nonconvex projection problem can be tackled by the successive convex approximation mechanism [28], which solves a series of approximated convex QPs; the gradients of such an iterative procedure are obtained by integrating the gradients of the approximated convex QPs. As a consequence, (P3) is readily solved by the gradient descent method and its variants for stochastic optimization, e.g., the Adam algorithm [29]. We adopt the mini-batch stochastic gradient descent (SGD) method [30], where the expectation over the distribution of \mathbf{a} is estimated by the sample mean evaluated on mini-batch sets \mathcal{A}\triangleq\{\mathbf{a}\}. The SGD update at the t-th training epoch is given by

𝚯(t)=𝚯(t1)+α1|𝒜|𝐚𝒜𝚯f(𝐚,{QXi(𝐚i𝐲i;θXi(t1)):i}),\displaystyle\boldsymbol{\Theta}^{(t)}=\boldsymbol{\Theta}^{(t-1)}+\alpha\frac{1}{|\mathcal{A}|}\sum_{\mathbf{a}\in\mathcal{A}}\nabla_{\boldsymbol{\Theta}}f(\mathbf{a},\{\mathcal{F}_{Q_{X_{i}}}(\mathbf{a}_{i}\oplus\mathbf{y}_{i};\theta_{X_{i}}^{(t-1)}):\forall i\}), (13)

where q^{(t)} indicates a variable q attained at the t-th epoch, \alpha>0 is a learning rate, and \nabla_{q} denotes the gradient operator with respect to q. The sample gradient \nabla_{\boldsymbol{\Theta}}f(\mathbf{a},\{\mathcal{F}_{Q_{X_{i}}}(\mathbf{a}_{i}\oplus\mathbf{y}_{i};\theta_{X_{i}}):\forall i\}) can be numerically calculated by the backpropagation algorithm [30], provided that the gradients of the channels h_{i0}(\cdot) and h_{0i}(\cdot) as well as the transmission strategies u(\cdot) and d_{i}(\cdot) are available.
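As a concrete instance of the projection activation mentioned above, when the feasible set is a box, e.g., a power constraint \mathcal{D}_{i}=[0,P]^{n} of the kind used later in Section VII, the projection QP admits a closed form, namely elementwise clipping. A minimal sketch (the set and the input values are illustrative):

```python
import numpy as np

# Projection activation argmin_{v in D_i} ||u - v|| for a box feasible set
# D_i = [0, P]^n: the convex QP reduces to elementwise clipping, whose
# (sub)gradient is well-defined for backpropagation.
P = 10.0

def project(u):
    return np.clip(u, 0.0, P)

x = project(np.array([-1.0, 5.0, 12.0]))   # feasible vector in [0, P]^3
```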

Full knowledge of the global observation \mathbf{a} is required for computing the gradient of the utility function. This can be achieved by a centralized training procedure carried out offline before real-time optimization inferences [16, 17, 18, 19]. To this end, we can collect training samples, i.e., sets of the local observation vectors \mathbf{a}_{i}, from the ENs in advance. No labels, such as information regarding the optimal solution to (P1), are needed in the training. Thus, the proposed training strategy (13) is performed in a fully unsupervised manner. Once the parameter set \boldsymbol{\Theta} is determined, it is readily implemented at the cloud and the ENs. As discussed, the forward pass in Algorithm 1 can be carried out only with locally measurable statistics, thereby leading to a distributed realization of the online computations (8)-(10). Compared to existing decentralized F-RAN optimization algorithms [9, 10] that require iterative procedures, the proposed CECIL does not need any repetitions in the real-time inference step. Hence, the proposed approach saves both fronthaul signaling and computation overheads.
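To make the unsupervised update (13) concrete, the toy sketch below trains a one-parameter stand-in for the decision DNN of a single interference-free EN. The sigmoid power mapping, the single-user rate utility, and the finite-difference gradient estimate are illustrative simplifications rather than the paper's actual architecture; they only show the label-free gradient-ascent loop on the sampled utility.

```python
import numpy as np

rng = np.random.default_rng(0)

P = 10.0                                     # maximum transmit power

def power(a, w):
    # One-parameter stand-in for F_{Q_Xi}: sigmoid output scaled to [0, P].
    return P / (1.0 + np.exp(-w * a))

def utility(a, x):
    # Single-user rate log(1 + a x): no interference in this toy setup.
    return np.log(1.0 + a * x)

def sample_gradient(batch, w, eps=1e-5):
    # Mini-batch gradient estimate for (13) via central differences.
    up = np.mean(utility(batch, power(batch, w + eps)))
    dn = np.mean(utility(batch, power(batch, w - eps)))
    return (up - dn) / (2 * eps)

w, alpha = 0.0, 0.5
batch = rng.exponential(1.0, size=5000)      # unlabeled channel samples
before = np.mean(utility(batch, power(batch, w)))
for _ in range(100):                         # unsupervised gradient ascent
    w += alpha * sample_gradient(batch, w)
after = np.mean(utility(batch, power(batch, w)))
```

No optimal solution is ever supplied: the update direction comes solely from the sampled utility, mirroring the label-free training of (13).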

V Message Multiple Access Design

The uplink and downlink interaction steps involve the transmission of multiple messages over the fronthaul links, incurring inter-message interference at both the cloud and the ENs. To handle this issue, we propose efficient fronthaul multiple access schemes that design the uplink and downlink signaling strategies u(\cdot) in (2) and d_{i}(\cdot) in (4), respectively.

V-A OMA fronthauling

We first develop an OMA method in which distinct fronthaul resources are assigned to each of the uplink and downlink messages to avoid inter-message interference. The uplink messages \mathbf{m}_{i0}\in\mathbb{R}^{M_{i0}} for i=1,\cdots,N occupy N bundles of the fronthaul RBs, where the i-th resource bundle containing M_{i0} RBs is dedicated to the uplink message transmission of EN i. In this setup, the uplink signaling strategy u(\cdot) in (2) becomes a concatenation operation. Then, the received signal at the cloud in (2) is rewritten as

𝐲0OMA=i=1Nhi0(𝐦i0),\displaystyle\mathbf{y}_{0}^{\text{OMA}}=\bigoplus_{i=1}^{N}h_{i0}(\mathbf{m}_{i0}), (14)

where i=1N𝐪i[𝐪1T,,𝐪NT]T\bigoplus_{i=1}^{N}\mathbf{q}_{i}\triangleq[\mathbf{q}_{1}^{T},\cdots,\mathbf{q}_{N}^{T}]^{T} defines the concatenation of NN vectors 𝐪i\mathbf{q}_{i} for i=1,,Ni=1,\cdots,N. The dimension of 𝐲0OMA\mathbf{y}_{0}^{\text{OMA}} becomes MUi=1NMi0M_{U}\triangleq\sum_{i=1}^{N}M_{i0} where MUM_{U} indicates the total number of the uplink fronthaul RBs.

In the downlink, \mathbf{m}_{0i}\in\mathbb{R}^{M_{0i}} is sent on N orthogonal downlink fronthaul links, each having M_{0i} RBs. Hence, the downlink signaling strategy d_{i}(\cdot) in (4) can be specified as a masking operation that extracts \mathbf{m}_{0i} from the downlink message set \{\mathbf{m}_{0j}:\forall j\}, i.e., \mathbf{d}_{i}=\mathbf{m}_{0i}. Combining this with \mathcal{M}_{0}(\cdot) in (3), the downlink message generation of the OMA system can be refined as a procedure that creates the concatenation of the N downlink messages. It follows that

i=1N𝐦0i=[𝐦01T,,𝐦0NT]T=0(𝐲0OMA).\displaystyle\bigoplus_{i=1}^{N}\mathbf{m}_{0i}=[\mathbf{m}_{01}^{T},\cdots,\mathbf{m}_{0N}^{T}]^{T}=\mathcal{M}_{0}(\mathbf{y}_{0}^{\text{OMA}}). (15)

The downlink message 𝐦0i\mathbf{m}_{0i} is then received by EN ii through the corresponding downlink fronthaul channel h0i()h_{0i}(\cdot). Hence, we refine the received signal at EN ii in (5) as

𝐲iOMA=h0i(𝐦0i).\displaystyle\mathbf{y}_{i}^{\text{OMA}}=h_{0i}(\mathbf{m}_{0i}). (16)

Since M0iM_{0i} RBs are allocated for the transmission of 𝐦0i\mathbf{m}_{0i}, the length of 𝐲iOMA\mathbf{y}_{i}^{\text{OMA}} is given by M0iM_{0i}, resulting in MDi=1NM0iM_{D}\triangleq\sum_{i=1}^{N}M_{0i} downlink fronthaul RBs. Therefore, the total number of the RBs denoted by MM is written by M=MU+MD=i=1N(Mi0+M0i)M=M_{U}+M_{D}=\sum_{i=1}^{N}(M_{i0}+M_{0i}).
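The OMA resource accounting above can be illustrated with a short numpy sketch, assuming hypothetical message lengths M_{i0}=(2,3,4) and M_{0i}=2 for each EN and identity fronthaul channels:

```python
import numpy as np

# Uplink OMA (14): u(.) concatenates the per-EN messages, each occupying its
# own bundle of RBs, so the cloud input length equals M_U = sum_i M_i0.
uplink_msgs = [np.ones(2), 2 * np.ones(3), 3 * np.ones(4)]  # M_i0 = 2, 3, 4
y0_oma = np.concatenate(uplink_msgs)
M_U = sum(m.size for m in uplink_msgs)                      # = 9

# Downlink OMA (15)-(16): d_i(.) masks out m_0i from the stacked cloud output.
M_0i = [2, 2, 2]
stacked = np.concatenate([i * np.ones(n) for i, n in enumerate(M_0i, 1)])
offsets = np.cumsum([0] + M_0i)
y2_oma = stacked[offsets[1]:offsets[2]]                     # received by EN 2
```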

Remark 2.

The orthogonal interaction concept has been adopted in various decentralized optimization techniques such as the message-passing algorithms [9, 31], the ADMM framework [10, 32], and distributed learning systems [17, 18, 19, 20]. However, they do not consider the effects of practical fronthaul links, including channel imperfections and signaling overheads. Also, the effectiveness of the OMA interaction policy is not clearly addressed in the DNN-based optimization approaches [17, 18, 19, 20]. In the following sections, we investigate the optimality of the proposed CECIL approach implemented with the OMA fronthauling scheme.

V-B NOMA fronthauling

The OMA strategy may waste fronthaul resources by allocating distinct RBs to each EN. To improve resource efficiency, we propose a non-orthogonal message transmission scheme in which all ENs share the same fronthaul resources. Provided that M_{U} RBs are assigned for the uplink message transmission, EN i obtains its message \mathbf{m}_{i0} from (1) by setting M_{i0}=M_{U}, i.e., by utilizing all uplink fronthaul RBs. The uplink transmission strategy u(\cdot) then becomes the superposition of all the uplink messages since they interfere with each other. Therefore, the cloud receives the superposed signal \mathbf{y}_{0}^{\text{NOMA}}\in\mathbb{R}^{M_{U}} of length M_{U} expressed as

𝐲0NOMA=i=1Nhi0(𝐦i0).\displaystyle\mathbf{y}_{0}^{\text{NOMA}}=\sum_{i=1}^{N}h_{i0}(\mathbf{m}_{i0}). (17)

In the downlink, the cloud multicasts a common downlink message \mathbf{m}_{0}\in\mathbb{R}^{M_{D}} of length M_{D} to all the ENs by leveraging all the available M_{D} downlink fronthaul RBs. Then, the downlink signaling in (4) is simply fixed as \mathbf{d}_{i}=\mathbf{m}_{0}, \forall i, such that the cloud directly transmits the output of the cloud computation in (18). We thus modify (3) for the NOMA scheme as

𝐦0=0(𝐲0NOMA).\displaystyle\mathbf{m}_{0}=\mathcal{M}_{0}(\mathbf{y}_{0}^{\text{NOMA}}). (18)

Accordingly, the received signal \mathbf{y}_{i}^{\text{NOMA}}\in\mathbb{R}^{M_{D}} of length M_{D} at EN i can be rewritten as

𝐲iNOMA=h0i(𝐦0).\displaystyle\mathbf{y}_{i}^{\text{NOMA}}=h_{0i}(\mathbf{m}_{0}). (19)

V-C Discussions

We discuss the effectiveness of the OMA and NOMA schemes for the perfect fronthaul link case, i.e., h_{i0}(\cdot) and h_{0i}(\cdot) are given by identity functions. The received signals of the OMA and NOMA systems are respectively recast as

𝐲0OMA=i=1N𝐦i0,𝐲iOMA=𝐦0i,\displaystyle\mathbf{y}_{0}^{\text{OMA}}=\bigoplus_{i=1}^{N}\mathbf{m}_{i0},\ \mathbf{y}_{i}^{\text{OMA}}=\mathbf{m}_{0i}, (20)
𝐲0NOMA=i=1N𝐦i0,𝐲iNOMA=𝐦0,\displaystyle\mathbf{y}_{0}^{\text{NOMA}}=\sum_{i=1}^{N}\mathbf{m}_{i0},\ \mathbf{y}_{i}^{\text{NOMA}}=\mathbf{m}_{0}, (21)

which simplifies (15) and (18) as

i=1N𝐦0i\displaystyle\bigoplus_{i=1}^{N}\mathbf{m}_{0i} =0(i=1Ni(𝐚i)),\displaystyle=\mathcal{M}_{0}\left(\bigoplus_{i=1}^{N}\mathcal{M}_{i}(\mathbf{a}_{i})\right), (22)
𝐦0\displaystyle\mathbf{m}_{0} =0(i=1Ni(𝐚i)).\displaystyle=\mathcal{M}_{0}\left(\sum_{i=1}^{N}\mathcal{M}_{i}(\mathbf{a}_{i})\right). (23)

V-C1 Optimality of NOMA fronthauling

We first focus on the NOMA system. For a successful decision inference \mathcal{X}_{i}(\cdot) in (6), the optimal downlink message, denoted by \mathbf{m}_{0}^{\star}, needs to properly encode all local observations \mathbf{a}_{i}, \forall i. Also, since the NOMA downlink message \mathbf{m}_{0}^{\star} is common to all ENs, it should not be affected by permutations of the input features. In other words, the computation of the downlink message has to be independent of the ordering of \mathbf{a}_{i}, \forall i, so that each EN can leverage the downlink message for its individual decision \mathbf{x}_{i}=\mathcal{X}_{i}(\mathbf{a}_{i},\mathbf{m}_{0}) without knowing the index i assigned by the network. Notice that such a permutation-invariant property indeed holds for (23) due to the superposition signaling in (21).
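This permutation-invariance argument is easy to verify numerically: with a toy inner mapping, the superposed NOMA signal (21) is unchanged under any reordering of the ENs, while the OMA concatenation (20) is not. The shared weight matrix below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy inner mapping M_i, shared across ENs purely for illustration.
W = rng.standard_normal((4, 2))
inner = lambda a: np.tanh(W @ a)
obs = [rng.standard_normal(2) for _ in range(3)]   # local observations a_i

# NOMA: the superposed uplink signal (21) is invariant to EN ordering...
y_noma = sum(inner(a) for a in obs)
y_noma_perm = sum(inner(a) for a in [obs[2], obs[0], obs[1]])
assert np.allclose(y_noma, y_noma_perm)

# ...while the OMA concatenation (20) depends on the ordering.
y_oma = np.concatenate([inner(a) for a in obs])
y_oma_perm = np.concatenate([inner(a) for a in [obs[2], obs[0], obs[1]]])
assert not np.allclose(y_oma, y_oma_perm)
```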

Based on this intuition, we can model the optimal downlink message 𝐦0\mathbf{m}_{0}^{\star} of the NOMA by using a generic set operator g()g(\cdot), which is defined over a set of the local observations {𝐚i:i}\{\mathbf{a}_{i}:\forall i\}, to satisfy the permutation-invariant property. The corresponding formulation can be written as

𝐦0=g({𝐚i:i}).\displaystyle\mathbf{m}_{0}^{\star}=g(\{\mathbf{a}_{i}:\forall i\}). (24)

It is easy to see that (24) does not change with the ordering of the ENs since its input is given as a set. The NOMA system may lose optimality if the downlink message computation in (23) cannot accurately approximate the optimal one in (24). The following proposition states that (23) can be a universal approximator for an arbitrary set function.

Proposition 1.

Suppose that each local observation \mathbf{a}_{i} is drawn from a compact set \mathcal{A}_{i} and that all observations have identical dimensions. Let g(\cdot) be any continuous set function with the permutation-invariant property that maps N local observations to an M_{D}-dimensional output vector. Then, for an arbitrarily small \varepsilon>0, there exist an outer mapping \mathcal{M}_{0}(\cdot) and inner mappings \mathcal{M}_{i}(\cdot) satisfying

sup{𝐚i𝒜i,i}g({𝐚i:i})0(i=1Ni(𝐚i))<ε.\displaystyle\sup_{\{\mathbf{a}_{i}\in\mathcal{A}_{i},\forall i\}}\left\|g(\{\mathbf{a}_{i}:\forall i\})-\mathcal{M}_{0}\left(\sum_{i=1}^{N}\mathcal{M}_{i}(\mathbf{a}_{i})\right)\right\|<\varepsilon. (25)
Proof:

Let [\mathbf{u}]_{k} be the k-th element of a vector \mathbf{u}. Consider an arbitrary set function \lambda(\{\mathbf{a}_{i}:\forall i\}) whose output is a scalar. From [33, Thm. 9] and the Stone–Weierstrass theorem [34], there exist a continuous mapping m_{k}:\mathbb{R}^{M_{U}}\rightarrow\mathbb{R} and an arbitrarily small \varepsilon_{k}>0 that fulfill

sup{𝐚i𝒜i,i}|λ({𝐚i:i})mk(i=1Ni(𝐚i))|<εk.\displaystyle\sup_{\{\mathbf{a}_{i}\in\mathcal{A}_{i},\forall i\}}\left|\lambda(\{\mathbf{a}_{i}:\forall i\})\!\!-\!m_{k}\left(\sum_{i=1}^{N}\mathcal{M}_{i}(\mathbf{a}_{i})\right)\right|\!<\!\varepsilon_{k}. (26)

By setting \lambda(\{\mathbf{a}_{i}:\forall i\})=[g(\{\mathbf{a}_{i}:\forall i\})]_{k} in (26), it is concluded that m_{k}(\cdot) forms a universal approximator for the k-th element of the optimal message vector [\mathbf{m}_{0}^{\star}]_{k}=[g(\{\mathbf{a}_{i}:\forall i\})]_{k}. Stacking the M_{D} element-wise mappings m_{k}(\cdot) for k=1,\cdots,M_{D} leads to (25) with \mathcal{M}_{0}(\cdot)=\bigoplus_{k=1}^{M_{D}}m_{k}(\cdot) and \varepsilon_{k}=\frac{\varepsilon}{\sqrt{M_{D}}}. This completes the proof. ∎

Notice that the optimal downlink message generation (24) cannot be implemented in practical F-RAN systems since the cloud would need to know the local statistics of the ENs perfectly. Nevertheless, thanks to Proposition 1, it can alternatively be executed through the proposed computation rule in (23). Thus, although the uplink messages are independently created at the ENs, the superposition signaling and resource sharing policies of the uplink NOMA fronthauling strategy lead to successful distributed decisions at the ENs. Since Proposition 1 holds for any continuous functions \mathcal{M}_{0}(\cdot) and \mathcal{M}_{i}(\cdot), the universal approximation property is satisfied in the DL formulation with well-designed DNNs (8) and (9). As a result, the unknown optimal downlink message \mathbf{m}_{0}^{\star} can be obtained by optimizing the DNNs with the end-to-end training policy (13).

V-C2 Impact of MUM_{U}

We analyze the number of uplink fronthaul RBs M_{U} required to achieve the universal approximation property (25). For a scalar input u, the simple inner mapping \mathcal{M}_{i}(u)=[1,u,u^{2},\cdots,u^{N}]^{T} of length N+1 achieves the element-wise universal approximation property (26) [33, Thm. 7]. This implies that M_{U}=N+1 uplink fronthaul RBs are sufficient if all the local observations \mathbf{a}_{i}, \forall i, are scalars. An extension to the general vector input case is challenging. Instead, we may consider a trivial modification of (24) as

𝐦0=g({𝐚i:i})=g({[𝐚i]l:i,l=1,,Ai}),\displaystyle\mathbf{m}_{0}^{\star}=g(\{\mathbf{a}_{i}:\forall i\})=g(\{[\mathbf{a}_{i}]_{l}:\forall i,l=1,\cdots,A_{i}\}), (27)

where the observation vector \mathbf{a}_{i}\in\mathbb{R}^{A_{i}} is decoupled into its A_{i} elements [\mathbf{a}_{i}]_{l} for l=1,\cdots,A_{i}. The modified operator now converts a set of \sum_{i=1}^{N}A_{i} elements into an M_{D}-dimensional downlink message vector. This preserves the optimality since the resulting message still involves the global state \mathbf{a}=\{\mathbf{a}_{i}:\forall i\} essential for the individual decisions of the ENs. To implement (27), EN i can employ A_{i} different operators \mathcal{M}_{il}([\mathbf{a}_{i}]_{l})=[1,[\mathbf{a}_{i}]_{l},[\mathbf{a}_{i}]_{l}^{2},\cdots,[\mathbf{a}_{i}]_{l}^{A}]^{T} with A\triangleq\sum_{j=1}^{N}A_{j}, \forall l=1,\cdots,A_{i}. Then, (23) can be recast as

𝐦0=0(i=1Nl=1Aiil([𝐚i]l)).\displaystyle\mathbf{m}_{0}=\mathcal{M}_{0}\left(\sum_{i=1}^{N}\sum_{l=1}^{A_{i}}\mathcal{M}_{il}([\mathbf{a}_{i}]_{l})\right). (28)

The NOMA strategy in (28) is achieved with M_{U}=\sum_{i=1}^{N}A_{i}+1 uplink fronthaul RBs. Although (28) is provably effective, we adopt the vector-valued operator \mathcal{M}_{i}:\mathbb{R}^{A_{i}}\rightarrow\mathbb{R}^{M_{U}} as in (23) since it includes (28) as a special case obtained by restricting the weight matrices of the DNN in (8) to diagonal matrices. Numerical results confirm that (23) requires a much smaller number of uplink fronthaul RBs than the analytical result M_{U}=\sum_{i=1}^{N}A_{i}+1.
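The scalar-observation case can also be checked numerically: with the polynomial inner mapping \mathcal{M}_{i}(u)=[1,u,\cdots,u^{N}]^{T} from [33, Thm. 7], the superposed uplink signal carries the power sums of the observations, which identify the observation multiset regardless of the EN ordering. The observation values below are illustrative:

```python
import numpy as np

# Polynomial inner mapping M_i(u) = [1, u, ..., u^N] of length N + 1 for
# N = 3 ENs with scalar observations; the superposed uplink signal is the
# vector of power sums sum_i u_i^k for k = 0..N.
N = 3
inner = lambda u: np.array([u ** k for k in range(N + 1)], dtype=float)

obs = [0.3, 1.1, 2.0]
y = sum(inner(u) for u in obs)
y_perm = sum(inner(u) for u in [2.0, 0.3, 1.1])   # reordered ENs
y_diff = sum(inner(u) for u in [0.3, 1.1, 2.5])   # different observation set
```

The sum is unchanged by reordering but distinguishes distinct observation sets, which is why N + 1 RBs suffice in the scalar case.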

V-C3 Optimality of OMA fronthauling

We now discuss the optimality of the OMA scheme in (22). Thanks to the orthogonal transmission, the cloud can separate the uplink messages \mathbf{m}_{i0}=\mathcal{M}_{i}(\mathbf{a}_{i}), \forall i. Nevertheless, the universal approximation theorem (11) cannot be directly invoked for (22) since a simple concatenation of DNNs \bigoplus_{i=1}^{N}\mathcal{M}_{i}(\cdot) is far from the fully-connected DNN assumed in (11). To address this, we present a suitable transformation of (22) that removes the concatenation operations. Let \tilde{\mathbf{m}}_{i0}\triangleq\tilde{\mathcal{M}}_{i}(\mathbf{a}_{i})\in\mathbb{R}^{M_{U}} of length M_{U} be a zero-padded version of \mathbf{m}_{i0}\in\mathbb{R}^{M_{i0}}. All elements of \tilde{\mathbf{m}}_{i0} are zeros except the (\sum_{j=1}^{i-1}M_{j0}+1)-th to the (\sum_{j=1}^{i}M_{j0})-th elements, which are replaced with \mathbf{m}_{i0}. Similarly, the corresponding message generation operator \tilde{\mathcal{M}}_{i}(\cdot) is defined as the zero-padded version of the original inference \mathcal{M}_{i}(\cdot). Then, (22) can be refined as

𝐦~0=0(i=1N~i(𝐚i)),\displaystyle\tilde{\mathbf{m}}_{0}=\mathcal{M}_{0}\left(\sum_{i=1}^{N}\tilde{\mathcal{M}}_{i}(\mathbf{a}_{i})\right), (29)

where \tilde{\mathbf{m}}_{0}\triangleq\bigoplus_{i=1}^{N}\mathbf{m}_{0i}\in\mathbb{R}^{M_{D}} is the concatenation of the downlink messages. Unlike the NOMA case (23), the ordering of the ENs affects the downlink message computation of the OMA due to the concatenation operation. Therefore, the optimal OMA downlink message \tilde{\mathbf{m}}_{0}^{\star} is modeled as a generic inference \tilde{g}(\cdot) of the stacked local observation vectors, i.e., \tilde{\mathbf{m}}_{0}^{\star}=\tilde{g}(\bigoplus_{i=1}^{N}\mathbf{a}_{i}), rather than as the permutation-invariant set function in (24). Proposition 1, which relies on the permutation-invariance of the target set function, thus cannot be straightforwardly applied to the OMA method.

To address this, we leverage the Kolmogorov–Arnold representation theorem [35], which states that any continuous multivariate mapping can be represented as a superposition of continuous functions. Assuming M_{D}=1 and scalar local observations a_{i}\in\mathbb{R}, \forall i, a continuous function \tilde{g}(\bigoplus_{i=1}^{N}a_{i}) has the following representation [33, Thm. 8]:

g~(i=1Nai)=0(i=1N~i(ai))\displaystyle\tilde{g}\left(\bigoplus_{i=1}^{N}a_{i}\right)=\mathcal{M}_{0}\left(\sum_{i=1}^{N}\tilde{\mathcal{M}}_{i}(a_{i})\right) (30)

with some mappings \mathcal{M}_{0}:\mathbb{R}^{2N+1}\rightarrow\mathbb{R} and \tilde{\mathcal{M}}_{i}:\mathbb{R}\rightarrow\mathbb{R}^{2N+1}. The uplink message generation operator \tilde{\mathcal{M}}_{i}(\cdot) of the OMA thus requires M_{U}=2N+1 uplink fronthaul RBs for the universal approximation property. With approaches similar to those presented in Section V-C2, extensions of (30) to the general case with M_{D}>1 and vector inputs \mathbf{a}_{i}\in\mathbb{R}^{A_{i}}, \forall i, result in M_{U}=2\sum_{i=1}^{N}A_{i}+1 uplink RBs, which is about twice as large as that of the NOMA case in (28) achieved with M_{U}=\sum_{i=1}^{N}A_{i}+1. Thus, although the performance of the OMA method can approach that of the NOMA system, it might need more uplink fronthaul resources. This is verified by the numerical results.

VI Imperfect Fronthaul Links

This section investigates imperfect fronthaul links with random noise and finite-capacity constraints. A robust training strategy of the CECIL framework is proposed for each scenario.

VI-A Noisy fronthaul links

The imperfection of wireless fronthaul links can be modeled as random additive noise. We specify the fronthaul channel functions as h_{i0}(\mathbf{u})=h_{0i}(\mathbf{u})=\mathbf{u}+\boldsymbol{\eta}, where \boldsymbol{\eta} stands for a noise vector with an arbitrary distribution. In the OMA system, the received messages \mathbf{y}_{0}^{\text{OMA}} at the cloud in (14) and \mathbf{y}_{i}^{\text{OMA}} at EN i in (16) are respectively written as

𝐲0OMA=i=1N𝐦i0+𝜼0and𝐲iOMA=𝐦0i+𝜼i,\displaystyle\mathbf{y}_{0}^{\text{OMA}}=\bigoplus_{i=1}^{N}\mathbf{m}_{i0}+\boldsymbol{\eta}_{0}\ \text{and}\ \mathbf{y}_{i}^{\text{OMA}}=\mathbf{m}_{0i}+\boldsymbol{\eta}_{i}, (31)

where 𝜼i\boldsymbol{\eta}_{i} for i=0,1,,Ni=0,1,\cdots,N denotes the noise at node ii. We obtain similar formulations for the NOMA system as

𝐲0NOMA=i=1N𝐦i0+𝜼0and𝐲iNOMA=𝐦0+𝜼i.\displaystyle\mathbf{y}_{0}^{\text{NOMA}}=\sum_{i=1}^{N}\mathbf{m}_{i0}+\boldsymbol{\eta}_{0}\ \text{and}\ \mathbf{y}_{i}^{\text{NOMA}}=\mathbf{m}_{0}+\boldsymbol{\eta}_{i}. (32)

The noise hinders successful decisions at the ENs, thereby requiring robust message generation strategies at both the cloud and the ENs. To this end, we modify the training update in (13) by taking the noise into account. We include numerous realizations of the random noise vectors in the training set. A mini-batch set \mathcal{A} becomes a set of tuples (\mathbf{a},\{\boldsymbol{\eta}_{i}:\forall i\}) of the global observation \mathbf{a} and a collection of the uplink and downlink noise vectors \{\boldsymbol{\eta}_{i}:\forall i\}. The DNN parameter set \boldsymbol{\Theta} is then adjusted in the ascent direction of the gradient averaged over the noise distribution. Such a data-driven optimization enables a robust design of the CECIL by observing numerous noisy messages (31) and (32) in the training step.
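A small sketch of this noise augmentation for the NOMA uplink in (32): each forward pass draws a fresh noise realization, and averaging over many draws recovers the clean superposed signal, which is the effect that training over noise-augmented mini-batches implicitly exploits. The message dimension and noise level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy NOMA uplink (32): the cloud observes the superposed messages plus a
# fresh noise realization on every forward pass.
N, M_U = 3, 4

def noisy_uplink(messages, sigma):
    eta0 = rng.normal(0.0, sigma, size=M_U)   # cloud-side noise vector
    return sum(messages) + eta0

messages = [rng.standard_normal(M_U) for _ in range(N)]
clean = sum(messages)
y0 = noisy_uplink(messages, sigma=0.1)        # one noisy training observation

# Averaging over many noise draws recovers the clean superposition.
est = np.mean([noisy_uplink(messages, 0.1) for _ in range(20000)], axis=0)
```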

VI-B Finite-capacity fronthaul links

Up to Section V, we have assumed lossless fronthaul interactions where each RB conveys a real-valued scalar without distortion. In practical wired fronthaul links, however, the resolution of the messages is limited by the fronthaul capacity. In this subsection, we therefore design a robust training policy of the CECIL for the general case where the fronthaul links are subject to finite transmission capacity. The fronthaul channels h_{i0}(\cdot) and h_{0i}(\cdot) are given as rounding functions that output the nearest integers of the transmitted messages. In this configuration, only lossy coordination sharing discrete-valued messages is possible. To accommodate capacity-limited fronthaul links, we present a message quantization process that creates discrete representations of continuous-valued messages. We focus on the quantization of the uplink message \mathbf{m}_{i0}, but the proposed techniques readily apply to the downlink message quantization. Let \hat{\mathbf{m}}_{i0}\in\mathbb{R}^{M_{i0}} be the quantized version of \mathbf{m}_{i0}. The capacity of the uplink fronthaul link connecting EN i and the cloud is modeled by a set of integers C_{il}, \forall l=1,\cdots,M_{i0}, each of which indicates the alphabet size, or equivalently, the modulation level allowed for transferring the l-th element [\mathbf{m}_{i0}]_{l}. It is expressed as

[𝐦^i0]lψCil([𝐦i0]l){0,1,,Cil1},\displaystyle[\hat{\mathbf{m}}_{i0}]_{l}\triangleq\psi_{C_{il}}([\mathbf{m}_{i0}]_{l})\in\{0,1,\cdots,C_{il}-1\}, (33)

where ψC()\psi_{C}(\cdot) stands for the quantization function with the quantization level CC. It maps a continuous-valued input into a discrete set {0,1,,C1}\{0,1,\cdots,C-1\}. The received signals in (2) and (5) can then be refined as 𝐲0=u({hi0(𝐦^i0):i})\mathbf{y}_{0}=u\big{(}\{h_{i0}(\hat{\mathbf{m}}_{i0}):\forall i\}\big{)} and 𝐲i=h0i(di({𝐦^0j:j}))\mathbf{y}_{i}=h_{0i}\big{(}d_{i}(\{\hat{\mathbf{m}}_{0j}:\forall j\})\big{)}, respectively.

The quantization operator \psi_{C_{il}}(\cdot) is viewed as an activation function appended to \mathcal{M}_{i}(\cdot), i.e., the DNN \mathcal{F}_{Q_{M_{i}}}(\cdot;\theta_{M_{i}}) in (8). Our goal is to design the activation \psi_{C_{il}}(\cdot) such that \hat{\mathbf{m}}_{i0} acts as an accurate estimate of the original message \mathbf{m}_{i0}. In this way, the cloud and ENs can successfully recover the original messages from their quantized observations. A naive approach would be to employ the rounding function. However, the simple rounding activation exhibits zero gradient over almost the entire input regime, thereby preventing the DNN parameters from being optimized with the SGD method in (13). This is the well-known vanishing gradient problem, where the performance of the DNNs is no longer improved and possibly gets stuck at an unsatisfactory point [30]. In our case, the DNNs \mathcal{F}_{Q_{M_{i}}}(\cdot;\theta_{M_{i}}) in (8) and (9) would not be trained properly. To handle this difficulty, a novel quantization method has been provided in [19, 36], but it is only applicable to the special case of C_{il}=2.

We propose an integerization technique regarded as an extension of the binarization method in [19] to the general case of C_{il}>2. The l-th element of the continuous-valued message \mathbf{m}_{i0} is assumed to lie in the bounded region [0,C_{il}-1]. This can be achieved by applying a bounding activation, e.g., the sigmoid function, to the output layer of the DNN \mathcal{F}_{Q_{M_{i}}}(\cdot;\theta_{M_{i}}). The proposed quantization function \psi_{C_{il}}(\cdot) in (33) carries out a randomized rounding operation. It first identifies the two nearest integers c-1 and c of the input [\mathbf{m}_{i0}]_{l}, i.e., the c\in\{1,\cdots,C_{il}-1\} with [\mathbf{m}_{i0}]_{l}\in[c-1,c), as candidates of the quantization. For notational simplicity, we denote m\triangleq[\mathbf{m}_{i0}]_{l} and \hat{m}\triangleq[\hat{\mathbf{m}}_{i0}]_{l}. Provided that m\in[c-1,c), the rounding output \hat{m}=\psi_{C_{il}}(m) is either c-1 or c with probabilities

Pr{m^\displaystyle\Pr\{\hat{m} =c1|m[c1,c)}=(cm),\displaystyle=c-1|m\in[c-1,c)\}=(c-m), (34)
Pr{m^\displaystyle\Pr\{\hat{m} =c|m[c1,c)}=(m(c1)).\displaystyle=c|m\in[c-1,c)\}=(m-(c-1)). (35)

The probabilities in (34) and (35) can be interpreted as the distances from the continuous input m to the quantization points c and c-1, respectively. The probability \Pr\{\hat{m}=c|m\in[c-1,c)\} increases as m gets closer to c, so the resulting quantization \hat{m} is more likely to be c.

The proposed quantization activation ψCil(m)\psi_{C_{il}}(m) for an input m[0,Cil1)m\in[0,C_{il}-1) is given as

ψCil(m)={c1,with prob.(cm)𝟙m[c1,c),c,with prob.(m(c1))𝟙m[c1,c),\displaystyle\!\!\!\!\!\psi_{C_{il}}(m)\!=\!\!\begin{cases}c-1,&\!\!\!\!\!\text{with prob.}\ (c-m)\!\cdot\!\mathds{1}_{m\in[c-1,c)},\\ c,&\!\!\!\!\!\text{with prob.}\ (m-(c-1))\!\cdot\!\mathds{1}_{m\in[c-1,c)},\end{cases} (36)

where 𝟙Z{0,1}\mathds{1}_{Z}\in\{0,1\} denotes the indicator function which is 11 if the condition ZZ is true and 0 otherwise. The following proposition states the quality of the quantization m^=ψCil(m)\hat{m}=\psi_{C_{il}}(m) in terms of its estimation property for unavailable information mm.
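The randomized rounding in (36) can be sketched in a few lines; the quantization input and sample count below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Randomized rounding activation (36): round m up to c with probability equal
# to its fractional part m - (c - 1), and down to c - 1 otherwise.
def quantize(m, rng):
    floor = np.floor(m)
    return floor + (rng.random(np.shape(m)) < (m - floor))

m = 1.3                                      # lies in [c - 1, c) = [1, 2)
samples = quantize(np.full(100000, m), rng)
p_up = np.mean(samples == 2.0)               # empirically close to 0.3
```

The empirical mean of the samples approaches m itself, previewing the unbiasedness shown in Proposition 2.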

Proposition 2.

The quantization m^=ψCil(m)\hat{m}=\psi_{C_{il}}(m) with the probabilities (34) and (35) is an unbiased estimate of mm.

Proof:

To prove the unbiased estimation property, it suffices to show that the conditional expectation of m^\hat{m} given mm, denoted by 𝔼m^[m^|m]\mathbb{E}_{\hat{m}}[\hat{m}|m], is equal to mm. It follows

𝔼m^[m^|m]\displaystyle\mathbb{E}_{\hat{m}}[\hat{m}|m] =𝔼c[𝔼m^[m^|m[c1,c)]]\displaystyle=\mathbb{E}_{c}\big{[}\mathbb{E}_{\hat{m}}[\hat{m}|m\in[c-1,c)]\big{]} (37)
\displaystyle=\sum_{c=1}^{C_{il}-1}\Pr\{m\!\in\![c\!-\!1,c)\}\mathbb{E}_{\hat{m}}[\hat{m}|m\!\in\![c\!-\!1,c)] (38)
\displaystyle=\sum_{c=1}^{C_{il}-1}\Pr\{m\in[c-1,c)\}\cdot m=m, (39)

where (39) is obtained since

𝔼m^[m^|m[c1,c)]\displaystyle\mathbb{E}_{\hat{m}}[\hat{m}|m\in[c-1,c)] =(c1)Pr{m^=c1|m[c1,c)}\displaystyle=(c-1)\cdot\Pr\{\hat{m}=c-1|m\in[c-1,c)\} (40)
~{}~{}~{}~{}+c\cdot\Pr\{\hat{m}=c|m\in[c-1,c)\}=m. (41)

We thus have 𝔼m^[m^|m]=m\mathbb{E}_{\hat{m}}[\hat{m}|m]=m. This completes the proof. ∎

Proposition 2 reveals that the cloud and ENs can accurately recover the continuous-valued messages by taking expectations over the received quantized messages. This can be realized with numerous quantization samples observed in the training step. Therefore, the DNN at the cloud QM0(;θM0)\mathcal{F}_{Q_{M_{0}}}(\cdot;\theta_{M_{0}}) in (9), which processes the quantized uplink messages 𝐦^i0\hat{\mathbf{m}}_{i0}, can be trained to decode the original information 𝐦i0\mathbf{m}_{i0} successfully.

Now, we discuss an efficient training strategy for the DNNs implemented with the probabilistic activation (36), which, in general, has no closed-form expression for the gradient \nabla_{\boldsymbol{\Theta}}\psi_{C_{il}}(m). To address this, the gradient estimation techniques of [37, 38, 19, 36] are adopted, which approximate an intractable gradient with its average evaluated over the randomized operations. By leveraging Proposition 2, the gradient \nabla_{\boldsymbol{\Theta}}\psi_{C_{il}}(m) can be approximated as

𝚯ψCil(m)=𝚯m^𝚯𝔼m^[m^|m]=𝚯m.\displaystyle\nabla_{\boldsymbol{\Theta}}\psi_{C_{il}}(m)=\nabla_{\boldsymbol{\Theta}}\hat{m}\simeq\nabla_{\boldsymbol{\Theta}}\mathbb{E}_{\hat{m}}[\hat{m}|m]=\nabla_{\boldsymbol{\Theta}}m. (42)

It is inferred from (42) that the gradient of the proposed quantization activation can simply be replaced with that of the continuous-valued input message m. Since m is obtained with a bounding activation, e.g., the sigmoid function, whose derivative is well-defined over the entire input domain, the parameter set \boldsymbol{\Theta} can be efficiently trained with the SGD algorithm.

Combining (36) and (42), we conclude that the proposed quantization activation exhibits different behaviors in the forward and backward passes. In the forward pass, the actual quantized messages are computed with the randomized rounding operation (36), and the resulting quantization is forwarded through the capacity-limited fronthaul links. In contrast, to optimize the DNN parameter set \boldsymbol{\Theta}, the gradients are calculated through the backpropagation algorithm [30]. In this backward pass, the quantization activation \psi_{C_{il}}(\cdot) directly passes through its input variable m.
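This forward/backward split can be sketched with the standard straight-through trick; in TensorFlow-style code it is often written as m + stop_gradient(ψ(m) − m), so that the forward value is the integer message while the gradient with respect to m is the identity, matching (42). The function below is an illustrative stand-in, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(4)

# Straight-through sketch of (42): the forward pass emits the stochastic
# rounding of m, while the backward pass treats d m_hat / d m as 1.
def ste_quantize(m, rng):
    m_hat = np.floor(m) + (rng.random() < (m - np.floor(m)))
    grad = 1.0   # gradient surrogate handed to backpropagation, per (42)
    return m_hat, grad

m = 2.7
m_hat, grad = ste_quantize(m, rng)   # integer forward value, identity gradient
```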

VII Performance Evaluation

This section assesses the performance of the proposed CECIL framework for power control applications in F-RAN systems. EN i (i=1,\cdots,N) sends data symbols to its intended mobile receiver, referred to as user i. The ENs share identical time-frequency resources for the data transmission. To mitigate the multi-user interference, an appropriate power allocation mechanism is required at each EN. The decision variable of EN i becomes the transmit power x_{i}\in[0,P] with P being the maximum allowable power budget. Let a_{ji} (i,j=1,\cdots,N) be the channel gain from EN j to user i. EN i can only observe an N-dimensional local CSI vector \mathbf{a}_{i}\triangleq\{a_{ji}:\forall j\}\in\mathbb{R}^{N} that is reported from the corresponding user [12, 19]. The global network CSI is then defined as \mathbf{a}=\{\mathbf{a}_{i}:\forall i\}\in\mathbb{R}^{N^{2}}.

Two different utility functions are considered: average sum rate utility and average sum energy-efficiency (EE) utility. Defining 𝒟{𝐱|xi[0,P]:i}\mathcal{D}\triangleq\{\mathbf{x}|x_{i}\in[0,P]:\forall i\} as the feasible set of the concatenated solution vector 𝐱={xi:i}N\mathbf{x}=\{x_{i}:\forall i\}\in\mathbb{R}^{N}, the sum rate maximization (SRMax) and the sum EE maximization (EEMax) problems are respectively formulated as

$$\max_{\mathbf{x}\in\mathcal{D}}\ \mathbb{E}_{\mathbf{a}}\bigg[\sum_{i=1}^{N}r_{i}(\mathbf{a},\mathbf{x})\bigg]\quad\text{and}\quad\max_{\mathbf{x}\in\mathcal{D}}\ \mathbb{E}_{\mathbf{a}}\bigg[\sum_{i=1}^{N}\frac{r_{i}(\mathbf{a},\mathbf{x})}{x_{i}+P_{S}}\bigg],\tag{43}$$

where $r_{i}(\mathbf{a},\mathbf{x})\triangleq\log(1+\frac{a_{ii}x_{i}}{1+\sum_{j\neq i}a_{ji}x_{j}})$ stands for the rate of user $i$ and $P_{S}$ is the static power consumption at the ENs [39]. The capability of the proposed DL-based cooperative mechanisms, together with the intuitions presented in Section V, can be assessed through the power control problems in (43), which have been popular applications in DNN-assisted cooperative optimization studies [16, 19, 18, 20].
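For concreteness, the utilities in (43) can be evaluated as follows; a NumPy sketch with hypothetical helper names (`user_rates`, `sum_rate`, `sum_ee`):

```python
import numpy as np

def user_rates(A, x):
    """Per-user rates r_i = log(1 + a_ii x_i / (1 + sum_{j != i} a_ji x_j)).

    A[j, i] holds the channel gain a_ji from EN j to user i; x is the
    transmit-power vector.
    """
    signal = np.diag(A) * x                 # a_ii x_i
    interference = A.T @ x - signal         # sum_{j != i} a_ji x_j
    return np.log(1.0 + signal / (1.0 + interference))

def sum_rate(A, x):
    """SRMax objective: sum of the per-user rates."""
    return user_rates(A, x).sum()

def sum_ee(A, x, P_S=1.0):
    """EEMax objective: sum of rates divided by consumed power x_i + P_S."""
    return (user_rates(A, x) / (x + P_S)).sum()

# Channels are unit-mean exponential and powers lie in [0, P] with P = 10,
# matching the simulation setup described below
rng = np.random.default_rng(1)
N, P = 5, 10.0
A = rng.exponential(1.0, size=(N, N))
x = rng.uniform(0.0, P, size=N)
```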

The channel gains are generated as exponential random variables with unit mean. The transmit power constraint is set to $P=10$, and the static power consumption is fixed as $P_{S}=1$. A five-layer DNN with 100 hidden neurons per layer is employed for the cloud DNN in (9). The DNNs (8) and (10) at the ENs are constructed with three layers of 50 neurons each. The batch normalization technique [40] followed by the rectified linear unit (ReLU) activation is adopted at the hidden layers. Unless stated otherwise, we use linear activations at the output layers of the message-generating DNNs at the cloud (9) and at the ENs (8). To produce a feasible power level $x_{i}\in[0,P]$, the sigmoid function multiplied by $P$ is utilized at the output layer of the distributed optimizing DNN in (10). Each training epoch consists of 50 mini-batches, each of which contains 5000 independently generated random channel gain vectors $\mathbf{a}$. The Adam algorithm [29] with learning rate $\alpha=0.0001$ is exploited. The test performance is evaluated with $10^{4}$ test samples. The training and testing steps are implemented with TensorFlow 1.15.0 on a PC with an Intel i7-9700K CPU, 32 GB of RAM, and a GeForce RTX 2080 GPU.
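As a concrete illustration of this architecture, the end-to-end forward pass can be sketched in NumPy; the layer sizes follow the setup above, while `mlp`, `forward`, and the random-weight initialization are hypothetical stand-ins for the trained DNNs (8)-(10), here with NOMA-style superposed uplink messages:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """Random weights for a small MLP (a sketch; the paper's DNNs use
    batch-norm + ReLU hidden layers and task-specific output activations)."""
    return [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(params, v, out_act=lambda z: z):
    for k, (W, b) in enumerate(params):
        v = v @ W + b
        if k < len(params) - 1:
            v = np.maximum(v, 0.0)           # ReLU hidden layers
    return out_act(v)

N, P = 5, 10.0        # number of ENs and power budget
M_up, M_dn = 15, 5    # uplink/downlink fronthaul RBs (NOMA: shared by all ENs)

enc = [mlp([N, 50, 50, M_up]) for _ in range(N)]       # EN uplink encoders (8)
cloud = mlp([M_up, 100, 100, 100, M_dn])               # cloud DNN (9)
dec = [mlp([N + M_dn, 50, 50, 1]) for _ in range(N)]   # distributed optimizers (10)

a = [rng.exponential(1.0, N) for _ in range(N)]        # local CSI at each EN
uplink = sum(forward(enc[i], a[i]) for i in range(N))  # NOMA: superposed messages
downlink = forward(cloud, uplink)                      # cloud message to all ENs
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x = np.array([P * forward(dec[i], np.concatenate([a[i], downlink]),
                          out_act=sigmoid)[0] for i in range(N)])
```

Each EN's final power decision uses only its local CSI and the shared downlink message, matching the decentralized real-time inference described in the paper.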

VII-A Perfect Fronthaul Link Case

We first focus on the perfect fronthaul link case where the messages can be exchanged via the noiseless fronthaul channels (20) and (21). In this ideal scenario, we validate the optimality of the NOMA and OMA fronthauling methods. The following baseline schemes are considered.

  • Ideal cooperation (IC): The cloud is assumed to perfectly obtain the global CSI vector $\mathbf{a}$. The cloud centrally computes the solution $\mathbf{x}$ via a DNN with 12 layers and 100 hidden neurons, which has a similar number of trainable variables to the proposed CECIL. The resulting solution is then assumed to be perfectly known to the ENs.

  • No cooperation (NC): No message exchange is allowed. Each EN decides its power control solution with an individual DNN, which accepts only the local CSI $\mathbf{a}_{i}$ as input.

  • Projected gradient descent (PGD): The power control solution is optimized via the PGD method [41] over the feasible set $x_{i}\in[0,P]$. To facilitate GPU-enabled parallel computations, we utilize the Adam optimizer in TensorFlow with a convergence precision of $10^{-5}$. The PGD generates a locally optimum solution for both the SRMax and the EEMax.

To implement the IC and PGD methods, EN $i$ uploads its $N$-dimensional local CSI vector $\mathbf{a}_{i}$ to the cloud by using $M_{i0}=N$ RBs, resulting in $M_{U}=\sum_{i=1}^{N}M_{i0}=N^{2}$ uplink fronthaul RBs in total. Also, the cloud forwards the local decision variable $x_{i}$ to EN $i$ through the downlink fronthaul links with $M_{0i}=1$ RB, requiring $M_{D}=\sum_{i=1}^{N}M_{0i}=N$ downlink RBs. Therefore, the total number of fronthaul RBs is given as $M=M_{U}+M_{D}=N(N+1)$. On the other hand, the NC baseline allows no interactions among the cloud and the ENs, i.e., $M=0$.
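The PGD baseline can be sketched as follows; a minimal NumPy version using plain gradient-ascent steps (the paper's implementation uses the Adam optimizer), where `pgd_sum_rate` and its step size are illustrative choices:

```python
import numpy as np

def pgd_sum_rate(A, P, steps=2000, lr=0.1):
    """Projected gradient ascent on the SRMax objective (sketch of the PGD
    baseline; plain gradient steps instead of the paper's Adam optimizer).

    A[j, i] is the channel gain a_ji from EN j to user i. After each ascent
    step, the power vector is projected onto the feasible box [0, P].
    """
    x = np.full(A.shape[0], P / 2.0)          # start from the box center
    for _ in range(steps):
        s = np.diag(A) * x                    # desired signal a_ii x_i
        I = A.T @ x - s                       # interference at user i
        d = 1.0 + I + s
        # gradient of sum_i log(1 + s_i / (1 + I_i)) w.r.t. x_j:
        #   own term a_jj / d_j, cross term -sum_{i != j} a_ji s_i / (d_i (1 + I_i))
        w = s / (d * (1.0 + I))
        grad = np.diag(A) / d - (A @ w - np.diag(A) * w)
        x = np.clip(x + lr * grad, 0.0, P)    # projection onto [0, P]^N
    return x

rng = np.random.default_rng(3)
A = rng.exponential(1.0, size=(5, 5))
x_pgd = pgd_sum_rate(A, P=10.0)
```

With no interference (diagonal $\mathbf{A}$), the gradient is strictly positive, so the iterates climb to the full-power solution $x_i=P$, as expected.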

Figure 4: Average sum rate performance with respect to $M_{U}$: (a) $N=5$, (b) $N=7$, (c) $N=9$.

Fig. 4 exhibits the average sum rate performance obtained by varying the number of uplink fronthaul RBs $M_{U}$ for various choices of the total number of RBs $M$. For a fair comparison with the IC and PGD methods, the maximum of $M$ in the simulations is set to $N(N+1)$. For fixed $M_{U}$ and $M$, the number of downlink fronthaul RBs is determined as $M_{D}=M-M_{U}$. The OMA system evenly allocates the uplink and downlink RBs across the ENs, i.e., $M_{i0}=\frac{M_{U}}{N}$ and $M_{0i}=\frac{M_{D}}{N}$. Fig. 4(a) depicts the performance with $N=5$ ENs. The proposed CECIL outperforms the NC benchmark for all simulated $M_{U}$ and $M$, even with a small number of uplink fronthaul RBs, e.g., $M_{U}=5$. The CECIL with the NOMA fronthauling performs better than that with the OMA scheme. With sufficient $M_{U}$, the CECIL is superior to the locally optimum PGD method. As $M_{U}$ increases, the proposed schemes approach the upper-bound performance of the IC method. For a fixed $M_{U}$, the performance of the proposed schemes does not improve with increasing $M$, or, equivalently, with an increasing number of downlink RBs $M_{D}=M-M_{U}$. This means that the uplink coordination, which uploads the encoding of the local CSI $\mathbf{a}_{i}$ from the ENs to the cloud, is more crucial than the downlink interaction that forwards the results of the cloud computing to the ENs. Thus, for fixed $M$, the optimal fronthaul resource allocation policy is to assign as few downlink RBs as possible, e.g., $M_{D}=N$, and utilize the remaining ones for the uplink coordination as $M_{U}=M-M_{D}$. For the NOMA, $M=20$ RBs with the allocation $M_{U}=15$ and $M_{D}=5$ are sufficient to achieve the performance of the IC requiring $M=30$ RBs, thereby saving 10 RBs. As expected in Section V-C, more RBs are needed for the OMA, namely $M=30$, which is the same as the IC baseline.

Similar observations are made from Figs. 4(b) and 4(c), which present the sum rate with $N=7$ and $9$ ENs, respectively. We numerically find that $M=\frac{1}{2}N(N+3)$ RBs with the allocation $M_{U}=\frac{1}{2}N(N+1)$ and $M_{D}=N$ suffice for the NOMA method to get close to the upper-bound IC performance. This is much smaller than $M_{U}=N^{2}+1$ obtained from the analysis in Section V-C. Compared to the IC and PGD methods requiring $M=N(N+1)$ RBs, the proposed CECIL with the NOMA fronthauling can save $\frac{1}{2}N(N-1)$ RBs in total while achieving the same sum rate performance. In contrast, the OMA method still needs $M=N(N+1)$ RBs with $M_{U}=N^{2}$ and $M_{D}=N$. We can conclude that the NOMA fronthauling is more efficient than the OMA for any given $N$, both in terms of the performance and the fronthaul signaling overheads.
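The claimed saving follows directly from the RB accounting: with $M_{U}=\frac{1}{2}N(N+1)$ uplink and $M_{D}=N$ downlink RBs,

```latex
M_{\mathrm{IC}} - M_{\mathrm{NOMA}}
  = N(N+1) - \underbrace{\left[\tfrac{1}{2}N(N+1) + N\right]}_{=\,\frac{1}{2}N(N+3)}
  = \tfrac{1}{2}N^{2} - \tfrac{1}{2}N
  = \tfrac{1}{2}N(N-1).
```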

Figure 5: Average sum EE performance with respect to $M_{U}$: (a) $N=5$, (b) $N=7$, (c) $N=9$.

The EEMax problem is examined in Fig. 5, which presents the average sum EE with respect to $M_{U}$. Similar phenomena to the SRMax results are observed. The proposed approaches also work well in the EEMax formulation and outperform the other baselines. It remains beneficial to allocate more RBs to the uplink fronthaul interactions. The NOMA system with $M=\frac{1}{2}N(N+3)$ RBs, allocated as $M_{U}=\frac{1}{2}N(N+1)$ and $M_{D}=N$, achieves performance identical to the IC method. We can conclude that the CECIL generally performs well for arbitrary utility functions.

TABLE I: Average GPU running time [sec]

                    N=5               N=7               N=9
                SRMax   EEMax    SRMax    EEMax    SRMax    EEMax
PGD             6.142   1.210    10.812   1.641    14.134   2.348
CECIL (NOMA)    0.541   0.541    0.858    0.858    1.310    1.310
CECIL (OMA)     0.539   0.539    0.859    0.859    1.308    1.308

Table I compares the online time complexity in terms of the GPU running time for parallel execution of $10^{4}$ test samples. Both the proposed and PGD methods are implemented in the identical TensorFlow environment to exploit GPU-enabled parallel computations. Both the NOMA and OMA systems employ $M=N(N+1)$ RBs with $M_{U}=N^{2}$ and $M_{D}=N$, which is the same setting as the PGD method. The proposed approaches significantly reduce the GPU running time compared to the traditional PGD algorithm, which requires iterative calculations in the real-time inference. The execution time of the PGD varies across the formulations since its convergence speed highly relies on the structure of the utility function; the SRMax generally incurs a higher computational complexity than the EEMax. In contrast, the proposed schemes show identical time complexity regardless of the formulation since their online computations depend only on the structure of the DNNs. These results imply that the CECIL framework outperforms the traditional optimization algorithm in terms of the performance, signaling overhead, and computational complexity.

VII-B Imperfect Fronthaul Link Case

The rest of this section demonstrates the proposed CECIL method in the imperfect fronthaul link case. For simplicity, we focus on the SRMax with $N=5$ ENs. The noisy fronthaul channels in Section VI-A are considered first. The noise vectors in (31) and (32) are generated as Gaussian random vectors with zero mean and covariance $\sigma^{2}\mathbf{I}$. A peak power constraint is imposed on the message transmission of each RB. The elements of the message vectors are confined to the bounded range $[-1,+1]$ by applying the hyperbolic tangent activation $\tanh(x)\triangleq\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$ to the output layers of the DNNs in (8) and (9). The fronthaul signal-to-noise ratio (SNR) is then defined as $\text{SNR}=\frac{1}{\sigma^{2}}$.
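The peak-power constraint and the SNR definition above can be sketched as follows; a NumPy stand-in where `noisy_fronthaul` models one message transmission over the AWGN fronthaul:

```python
import numpy as np

def noisy_fronthaul(z, snr_db, rng):
    """Peak-power-bounded message over an AWGN fronthaul RB (a sketch).

    The raw DNN output z is squashed by tanh so that every RB symbol lies
    in [-1, 1]; zero-mean Gaussian noise with variance sigma^2 = 1 / SNR is
    then added, matching the definition SNR = 1 / sigma^2.
    """
    m = np.tanh(z)                       # peak-power constraint per RB
    sigma = 10.0 ** (-snr_db / 20.0)     # sigma = sqrt(1 / SNR)
    return m + sigma * rng.standard_normal(m.shape)

rng = np.random.default_rng(0)
z = rng.standard_normal(20)              # raw DNN outputs for M = 20 RBs
received = noisy_fronthaul(z, snr_db=10.0, rng=rng)
```

In the robust training policy, such noise realizations are drawn afresh for every mini-batch so that the DNNs learn messages that survive the perturbation.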

Figure 6: Average sum rate performance with respect to SNR for $M=20$: (a) NOMA, (b) OMA.

Fig. 6 illustrates the average sum rate performance with $M=20$ RBs as the fronthaul SNR varies. For comparison, the performance of the CECIL trained and tested without noise, i.e., $\sigma^{2}=0$, is plotted. The non-robust scheme refers to the case where the CECIL is trained with perfect fronthaul links, i.e., $\sigma^{2}=0$, but its test performance is evaluated in the presence of noise. Two naive power control policies, namely the max power scheme with $x_{i}=P$ and the random power method with uniformly generated power $x_{i}\in[0,P]$, are also depicted. In both the NOMA (Fig. 6(a)) and OMA (Fig. 6(b)) scenarios, the proposed method converges to the performance of the perfect cooperation case of $\sigma^{2}=0$ as the SNR grows. For all simulated $M_{U}$, the robust CECIL trained with the random noise presents a remarkable performance gain over the NC baseline even in the low SNR regime. This implies that the proposed cloud-aided coordination policy remains beneficial over practical noisy fronthaul channels. The non-robust design exhibits fairly degraded performance; in the low SNR regime, it becomes even worse than the NC method, meaning that the fronthaul cooperation is not helpful if the DNNs are not carefully trained. This verifies the importance of the proposed robust learning strategy, which includes random fronthaul noise in the training data set.

Figure 7: Average sum rate performance with respect to $M_{U}$.

Fig. 7 provides the sum rate performance as a function of $M_{U}$ for fronthaul SNRs of $5$ and $10\ \text{dB}$. The NOMA system remains superior to the OMA method in the presence of noise. In the high SNR regime ($\text{SNR}=15\ \text{dB}$), it is efficient to allocate more RBs to the uplink fronthaul link, as in the perfect fronthaul case. This is, however, not true at $\text{SNR}=0\ \text{dB}$: for fixed $M$, an increase in $M_{U}$ can lead to performance degradation. Hence, there is a nontrivial tradeoff in the uplink-downlink fronthaul RB allocation in the imperfect fronthaul link scenario.

Figure 8: Average sum rate performance with respect to $M_{U}$ with asymmetric fronthaul channel gains.

In Fig. 8, we examine the adaptivity of the CECIL framework in a more realistic setup where the fronthaul interactions undergo asymmetric channel gains. Each element of the message vectors is multiplied by a random channel coefficient drawn from the uniform distribution on $[0.1, 1]$. The NOMA fronthauling scheme still performs better than the OMA method, demonstrating the effectiveness of the resource-sharing nature of the NOMA method [42]. Both the NOMA and OMA require a fairly high fronthaul SNR to achieve the performance of the centralized PGD algorithm. A more sophisticated interaction policy would be needed at the cloud and ENs to capture the impact of the asymmetric fronthaul channel gains.

Figure 9: Average sum rate performance with respect to $B$ for $M=20$: (a) NOMA, (b) OMA.

Next, we investigate the finite-capacity fronthaul link case of Section VI-B. The capacity of each fronthaul link is fixed as $C_{il}=2^{B}$, $\forall i,l$, where $B$ reflects the fronthaul capacity in bits. Fig. 9 exhibits the sum rate performance of the finite-capacity fronthaul link case with respect to $B$ for $M=20$ in the NOMA (Fig. 9(a)) and OMA (Fig. 9(b)) systems. The performance of the perfect fronthaul link case, i.e., infinite $B$, is shown as a reference. The proposed quantization activation in (36) is not included in the non-robust design; hence, it trains the DNNs for the perfect fronthaul link case, and its test performance is measured with the rounding channel functions with finite $B$. The performance of the proposed message quantization steadily improves as $B$ gets larger and significantly outperforms the non-robust design. We can see that $B=4$ is sufficient for the NOMA system to achieve the upper-bound performance of the IC baseline with infinite $B$, whereas the OMA fails to get close to it even with $B=7$ bits.

Figure 10: Average sum rate performance with respect to $M_{U}$.

We plot the average sum rate of the finite-capacity fronthaul case in Fig. 10 as a function of $M_{U}$. Similar to the additive noise scenario in Fig. 7, the optimal fronthaul resource allocation strategy in the finite-capacity case is nontrivial when $B$ is small, i.e., when the F-RAN suffers from inaccurate fronthaul interactions. Regardless of $M$ and $B$, the NOMA outperforms the OMA fronthauling scheme in the finite-capacity fronthaul link case. Therefore, we can conclude that the NOMA system is robust to the imperfections incurred in the fronthaul interaction steps.

VIII Concluding Remarks

This paper studies a DL solution for addressing generic F-RAN optimization tasks in which a cloud schedules the decentralized computations of ENs through fronthaul links. A structural learning inference, termed the CECIL framework, is proposed, which mimics a cloud-aided cooperative optimization strategy. Three different types of DNN modules are applied at the cloud and the individual ENs, responsible for the uplink coordination, the downlink coordination, and the distributed optimization, respectively. We design message multiple access schemes to facilitate the multi-EN fronthaul interactions. A robust training policy is presented for practical imperfect fronthaul link scenarios. Numerical simulations validate the superiority of the proposed DL framework over existing optimization algorithms in terms of the performance, fronthaul signaling overheads, and computational complexity. To combat wireless fading fronthaul channels, it would be an interesting future direction to adopt channel autoencoder techniques [43, 44, 36] for the message-generating inferences. Extensions to more complicated application scenarios, such as multi-antenna coordinated beamforming problems, are also worth pursuing.

References

  • [1] S.-H. Park, O. Simeone, and S. Shamai (Shitz), “Fronthaul compression for cloud radio access networks: signal processing advances inspired by network information theory,” IEEE Signal Process. Mag., vol. 31, no. 6, pp. 69–79, Nov. 2014.
  • [2] R. Tandon and O. Simeone, “Harnessing cloud and edge synergies: toward an information theory of fog radio access networks,” IEEE Communications Magazine, vol. 54, no. 8, pp. 44–50, Aug. 2016.
  • [3] S. Park, O. Simeone, O. Sahin, and S. Shamai, “Joint precoding and multivariate backhaul compression for the downlink of cloud radio access networks,” IEEE Trans. Signal Process., vol. 61, no. 22, pp. 5646–5658, Nov. 2013.
  • [4] M. Tao, E. Chen, H. Zhou, and W. Yu, “Content-centric sparse multicast beamforming for cache-enabled cloud RAN,” IEEE Trans. Wireless Commun., vol. 15, no. 9, pp. 6118–6131, Sep. 2016.
  • [5] W. Lee, O. Simeone, J. Kang, and S. Shamai (Shitz), “Multivariate fronthaul quantization for downlink C-RAN,” IEEE Trans. Signal Process., vol. 64, no. 19, pp. 5025–5037, Oct. 2016.
  • [6] S.-H. Park, O. Simeone, and S. Shamai (Shitz), “Multi-tenant C-RAN with spectrum pooling: Downlink optimization under privacy constraints,” IEEE Trans. Veh. Technol., vol. 67, no. 11, pp. 10492–10503, Nov. 2018.
  • [7] J. Kim, S. Park, O. Simeone, I. Lee, and S. Shamai (Shitz), “Joint design of fronthauling and hybrid beamforming for downlink C-RAN systems,” IEEE Trans. Commun., vol. 67, no. 6, pp. 4423–4434, Jun. 2019.
  • [8] S.-H. Park, O. Simeone, and S. Shamai (Shitz), “Joint optimization of cloud and edge processing for fog radio access networks,” IEEE Trans. Wireless Commun., vol. 15, no. 11, pp. 7621–7632, Nov. 2016.
  • [9] J. Liu, B. Bai, J. Zhang, and K. B. Letaief, “Cache placement in fog-RANs: from centralized to distributed algorithms,” IEEE Trans. Wireless Commun., vol. 16, no. 11, pp. 7039–7051, Nov. 2017.
  • [10] Y. Xiao and M. Krunz, “Distributed optimization for energy-efficient fog computing in the tactile internet,” IEEE J. Sel. Areas Commun., vol. 36, no. 11, pp. 2390–2400, Nov. 2018.
  • [11] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to optimize: training deep neural networks for interference management,” IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5438–5453, Oct. 2018.
  • [12] W. Lee, D.-H. Cho, and M. Kim, “Deep power control: transmit power control scheme based on convolutional neural network,” IEEE Commun. Lett., vol. 22, no. 6, pp. 1276–1279, Jun. 2018.
  • [13] W. Lee, O. Jo, and M. Kim, “Intelligent resource allocation in wireless communications systems,” IEEE Commun. Mag., vol. 58, no. 1, pp. 100–105, Jan. 2020.
  • [14] D. Liu, C. Sun, C. Yang, and L. Hanzo, “Optimizing wireless systems using unsupervised and reinforced-unsupervised deep learning,” IEEE Netw., vol. 34, no. 4, pp. 270–277, Jul. 2020.
  • [15] J. Kim, H. Lee, S.-E. Hong, and S.-H. Park, “Deep learning methods for universal MISO beamforming,” IEEE Wireless. Commun. Lett., vol. 9, no. 11, pp. 1894–1898, Nov. 2020.
  • [16] P. de Kerret, D. Gesbert, and M. Filippone, “Team deep neural networks for interference channels,” in Proc. IEEE Int. Conf. Commun. (ICC), pp. 1–6, May 2018.
  • [17] D. Gunduz, P. de Kerret, N. D. Sidiropoulos, D. Gesbert, C. R. Murthy, and M. van der Schaar, “Machine learning in the air,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2184–2199, Sept. 2019.
  • [18] M. Kim, P. de Kerret, and D. Gesbert, “Learning to cooperate in decentralized wireless networks,” in Proc. IEEE Asilomar Conf. Signals, Syst. Comput. (ACSSC), Oct. 2018, pp. 281–285.
  • [19] H. Lee, S. H. Lee, and T. Q. S. Quek, “Deep learning for distributed optimization: applications to wireless resource management,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2251–2266, Oct. 2019.
  • [20] Y. S. Nasir and D. Guo, “Multi-agent deep reinforcement learning for dynamic power allocation in wireless networks,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2239–2250, Oct. 2019.
  • [21] J. Park, S. Samarakoon, M. Bennis, and M. Debbah, “Wireless network intelligence at the edge,” Proc. IEEE, vol. 107, no. 11, pp. 2204–2239, Nov. 2019.
  • [22] R. Zhang, Y.-C. Liang, C. C. Chai, and S. Cui, “Optimal beamforming for two-way multi-antenna relay channel with analogue network coding,” IEEE J. Sel. Areas Commun., vol. 27, no. 5, pp. 699–712, Jun. 2009.
  • [23] K. Lee, H. Sung, E. Park, and I. Lee, “Joint optimization for one and two-way MIMO AF multiple-relay systems,” IEEE Trans. Wireless Commun., vol. 9, no. 12, pp. 3671–3681, Dec. 2010.
  • [24] J. N. Laneman, D. N. C. Tse, and G. W. Wornell, “Cooperative diversity in wireless networks: efficient protocols and outage behavior,” IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3062–3080, Dec. 2004.
  • [25] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), pp. 1–9, Jun. 2016.
  • [26] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Netw., vol. 2, no. 5, pp. 359–366, Jan. 1989.
  • [27] B. Amos and J. Z. Kolter, “Optnet: differentiable optimization as a layer in neural networks,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2017.
  • [28] Y. Sun, P. Babu, and D. P. Palomar, “Majorization-minimization algorithms in signal processing, communications, and machine learning,” IEEE Trans. Signal Process., vol. 65, no. 3, pp. 794–816, Feb. 2017.
  • [29] D. Kingma and J. Ba, “Adam: a method for stochastic optimization,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015.
  • [30] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning.   MIT Press, 2016.
  • [31] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
  • [32] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundat. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2010.
  • [33] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, “Deep sets,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), pp. 3391–3401, Dec. 2017, [Online] Available: https://arxiv.org/abs/1703.06114.
  • [34] N. E. Cotter, “The stone-weierstrass theorem and its application to neural networks,” IEEE Trans. Neural Netw., vol. 1, no. 4, pp. 290–295, Dec. 1990.
  • [35] V. Kůrková, “Kolmogorov’s theorem and multilayer neural networks,” Neural Netw., vol. 5, no. 3, pp. 501–506, Jan. 1992.
  • [36] H. Lee, T. Q. S. Quek, and S. H. Lee, “A deep learning approach to universal binary visible light communication transceiver,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 956–969, Feb. 2020.
  • [37] Y. Bengio, N. Leonard, and A. Courville, “Estimating or propagating gradients through stochastic neurons for conditional computation,” arXiv preprint arXiv:1305.2982, Aug. 2013.
  • [38] T. Raiko, M. Berglund, G. Alain, and L. Dinh, “Techniques for learning binary stochastic feedforward neural networks,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2015.
  • [39] C. Isheden, Z. Chong, E. Jorswieck, and G. Fettweis, “Framework for link-level energy efficiency optimization with informed transmitter,” IEEE Trans. Wireless Commun., vol. 11, no. 8, pp. 2946–2957, Aug. 2012.
  • [40] S. Ioffe and C. Szegedy, “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. Mach. Learn. (ICML), pp. 448–456, July 2015.
  • [41] S. Boyd and L. Vandenberghe, Convex Optimization.   Cambridge University Press, 2004.
  • [42] W. Shin, M. Vaezi, B. Lee, D. J. Love, J. Lee, and H. V. Poor, “Non-orthogonal multiple access in multi-cell networks: Theory, performance, and practical challenges,” IEEE Commun. Mag., vol. 55, no. 10, pp. 176–183, Oct. 2017.
  • [43] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cog. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017.
  • [44] H. Lee, S. H. Lee, T. Q. S. Quek, and I. Lee, “Deep learning framework for wireless systems: applications to optical wireless communications,” IEEE Commun. Mag., vol. 57, no. 3, pp. 35–41, Mar. 2019.