Graph Neural Network-Based Scheduling for Multi-UAV-Enabled Communications in D2D Networks
Abstract
In this paper, we jointly design the power control and position dispatch for multi-unmanned aerial vehicle (UAV)-enabled communications in device-to-device (D2D) networks. Our objective is to maximize the total transmission rate of the downlink users (DUs) while satisfying the quality of service (QoS) requirements of all D2D users. We comprehensively consider the interference among D2D communications and downlink transmissions. The original problem is strongly non-convex, so traditional optimization methods incur high computational complexity and, to make matters worse, their results are not necessarily globally optimal. We propose a novel graph neural network (GNN)-based approach that maps the considered system into a specific graph structure and achieves the optimal solution with low complexity. In particular, we first construct a GNN-based model for the proposed network, in which the transmission links and interference links are formulated as vertices and edges, respectively. Then, taking the channel state information and the coordinates of ground users as inputs, and the locations of the UAVs and the transmission power of all transmitters as outputs, we obtain the mapping from inputs to outputs by training the parameters of the GNN. Simulation results verify that the way to maximize the total transmission rate of DUs can be extracted effectively via training on samples. Moreover, they show that the proposed GNN-based method outperforms traditional means.
keywords:
Unmanned aerial vehicle (UAV), D2D communication, graph neural network (GNN), power control, position planning.
© 2021 Published by Elsevier Ltd.
1 Introduction
Due to their fast and flexible deployment, unmanned aerial vehicles (UAVs) have attracted extensive attention from both academia and industry [1]. In the past few years, UAVs have been widely used in fields such as military operations, earth and environment monitoring, disaster management, goods delivery, precision agriculture and intelligent surveillance [2]. In the field of communications, UAVs are envisioned as a vital element of future wireless network technologies that will expand network coverage, improve deployment flexibility and increase system throughput [3, 4]. Compared with terrestrial base stations (BSs), whose locations are pre-determined and fixed, UAVs can adaptively control their positions to respond to requests for on-demand services in a flexible and low-cost way [5].
Despite the promising prospects for UAVs in many fields like those mentioned above, some key issues must be addressed before UAVs can be used effectively to realize seamless connectivity and ultra-reliable communication [6]. Among them, UAV trajectory design and position deployment for sensing and communications are two important issues that have received great attention. Trajectory design mainly concerns fixed-wing UAVs, which are always on the move [7], while position deployment mainly concerns rotary-wing UAVs, which remain relatively stationary [8]. For relatively static UAVs, the position is typically set by a pre-designed optimization algorithm. Under the assumption of a quasi-static channel, such an offline design method is effective to some extent. In practice, however, this method suffers from complex computation, poor real-time performance and unrealistic assumptions. How to adjust the UAVs’ positions in time to maintain reliable communication and coverage once the environment changes is therefore worth studying.
D2D communication is an effective way to increase the transmission range and throughput [9]. Since the channel between a UAV and a ground user is usually assumed to be a line-of-sight (LoS) channel, UAV-to-ground transmissions can cause serious interference to D2D communications and thus compromise the system performance. How to integrate UAV-enabled information dissemination with ground D2D transmission has therefore attracted much attention. The authors in [10] considered the power control of a single UAV in a D2D communication network and proposed a centralized processing algorithm to maximize the system throughput. The authors in [11] focused on the scenario in which D2D users act as relays for UAV communications; by jointly optimizing user association, UAV scheduling, transmission power and UAV trajectory, the minimum security rate between UAVs was maximized.
Moreover, the authors in [12] focused on a D2D-enhanced UAV non-orthogonal multiple access (NOMA) network architecture, in which the ground users can reuse the time-frequency resources assigned to NOMA links, and proposed a novel and efficient graph-based file dispatching method with a hypergraph-based grouping algorithm. D2D transmission was applied to extend the wireless coverage of a UAV in [13], which focused on the optimal set of UAV-served users rather than the transmission performance of D2D users. A UAV-assisted D2D network was studied in [14], in which a new tradeoff between the throughput and the channel switching cost was proposed. Instead of transforming the optimization problem into a convex form, that work introduced game theory and constructed a distributed framework, which greatly reduces the computational complexity and makes the approach more suitable for scenarios with rapidly changing channels. A random walk model was considered in [15], in which D2D users move continuously and the UAV flies around a central point, and deep reinforcement learning was applied to solve the real-time resource allocation problem.
We note that the solutions of the above-mentioned studies are not limited to convex optimization approaches. On the one hand, the mathematical models of many complex network scenarios are difficult to transform into an equivalent convex form, and most of the widely used convex approximation techniques can undermine optimality [16]. On the other hand, the popularity of smart devices gives users a certain degree of mobility, which makes acquiring real-time channel state information even harder. As a result, efficient and low-complexity methods such as graph theory, game theory and deep reinforcement learning have been explored and exploited [17].
In many fields, such as communication signal processing, image processing and natural language processing, machine learning has become a common method to solve large-scale complex non-convex problems, as long as the training samples are available [18]. Machine learning can be mainly divided into supervised learning, unsupervised learning and semi-supervised learning. The difference among them is reflected in the proportion of labeled samples [19].
Recently, graph neural networks (GNNs) have been introduced to solve large-scale wireless resource management problems [20]. Some important properties were established in [20], including the permutation equivariance property and the equivalence with distributed optimization algorithms, which can be used to analyze the performance and generalization of GNN-based methods. The authors in [21] studied an intelligent reflecting surface (IRS)-based wireless network. Instead of explicitly estimating the channel state information (CSI) of each communication link, a deep neural network was applied to parameterize the mapping from the received pilots to the optimized system configuration. The channel information that can be obtained was contained in the pilots, and the interaction between users was captured through a GNN. The authors in [22] considered a novel graph embedding-based method for link scheduling in D2D networks, in which the graph embedding process was based on the distances of both communication and interference links without requiring the CSI. Two ways of learning were used in the simulation experiments of [22], i.e., supervised learning and unsupervised learning. Supervised learning was used to approximate the optimal link scheduling strategy, while unsupervised learning was employed to maximize the sum throughput of the system.
From the above analysis, one may note that unsupervised learning is somewhat more popular than supervised learning. This is because the formulated mathematical objective can be used as the main component of the loss function, which suits unsupervised learning. Supervised learning, on the other hand, requires a large number of labeled samples, which are not readily available since the underlying problems are often highly non-convex and difficult to solve. Traditional convex optimization methods may obtain optimal or near-optimal solutions, but only at the expense of high computing overhead. As a consequence, unsupervised learning becomes a better alternative.
In this paper, we focus on the deployment of relatively static UAVs in D2D communication scenarios. Specifically, we consider the coexistence of UAV-to-ground transmissions and D2D communications. Taking into account the interference between the two patterns, we map the normal communication links to the vertices of a graph and the co-channel interference links to its edges. Based on the established heterogeneous graph, and taking the CSI and the coordinates of ground users as inputs and the locations of the UAVs and the transmission power of all transmitters as outputs, we obtain the mapping from inputs to outputs by training the parameters of the GNN. In addition, as wireless channels change with the movement of ground users, the UAVs’ positions need to be adjusted accordingly. Given real-time CSI feedback, the positions can be adjusted through a forward pass of the trained GNN rather than a costly recalculation, which helps reduce the flight energy consumption and the computing delay.
1.1 Contributions
To the best of the authors’ knowledge, this is the first work in this field to propose a GNN-based method for position optimization and power control in multi-UAV-enabled communications in D2D networks. The main contributions of this paper are summarized as follows.
1. We investigate the scenario of multi-UAV-enabled communications in D2D networks. The position arrangement and power control problem is studied to maximize the sum throughput of the UAV-to-DU links while satisfying the basic quality of service (QoS) requirement of D2D communications. A specific and efficient GNN model is proposed for the considered system and the formulated problem. We model the transmission links and interference links as the vertices and edges, respectively. The feature of each vertex consists of the transmission channel gain, the transmission power and the coordinate. The feature of each edge is the interference channel gain.
2. We carry out the training of the formulated GNN. The iteration of each vertex is divided into two stages. In the first stage, we use a multi-layer perceptron (MLP) to aggregate the information from all the neighbors of the vertex. The aggregated information contains the features of both neighboring vertices and neighboring edges. In the second stage, using another MLP, we further combine the output of the first stage with the feature of the vertex itself. The final output is then obtained with the help of a widely used activation function.
3. To demonstrate the superiority of our proposed design, we generate a large number of test samples, in which multiple D2D users and DUs are randomly distributed over a given wide area. Numerical results show that the performance of our proposed method is significantly better than that of the other benchmarks.
1.2 Organization
The remainder of this paper is organized as follows. Section 2 provides the system model and problem formulation. Section 3 proposes the GNN-based scheme, which includes the mapping from the considered physical communication network to the graph model and the training procedure. Section 4 provides the simulation results. Finally, Section 5 concludes this paper.
2 System Model and Problem Formulation
As shown in Fig. 1, we consider a heterogeneous wireless communication system in which multi-UAV-enabled downlink transmissions coexist with a D2D communication network. The system includes UAVs, downlink users (DUs) and D2D pairs. Each D2D pair consists of one D2D transmitter (DT) and one D2D receiver (DR). Due to their limited energy storage and computing ability, multiple UAVs in the air cooperatively send the same message (such as live high-definition video obtained by some of them) to all DUs on the ground. Meanwhile, each DT sends information to its corresponding DR. All the nodes in our system have a single antenna. It is assumed that the UAV-to-DU links share the same spectrum with the D2D communications. Therefore, the co-channel interference among them must be handled carefully to achieve globally optimal system utility.

Without loss of generality, we use the three dimensional Cartesian coordinate system [23], with which the location of DR can be denoted as and that of DU is , with and . For simplicity, we assume that all UAVs are flying at the same altitude, and the location of UAV is expressed by , where is the minimum required height to circumvent ground obstacles and .
Here, we assume that the UAV-to-DU channel is the LoS channel, and locations of all DUs are given. Then, the channel gain from UAV to DU can be expressed as [5, 24]
(1) |
where is the channel power gain at the reference distance meter.
Similarly, the channel gain from UAV to DR can be expressed as
(2) |
It is worth noting that expressions (1) and (2) have a similar structure. However, expression (1) refers to the transmission channel, while expression (2) denotes the interference channel. All air-to-ground channels depend strongly on the positions of the UAVs, which are among the main design variables of this paper. Taking into account the co-channel interference and the broadcast characteristics of the UAV-to-DU channels, the signal-to-interference-plus-noise ratio (SINR) of a DU is given by
(3) |
where the power variables are the transmission powers of the UAVs and the DTs, respectively. The interference channel from a DT to a DU is a Rayleigh fading channel determined by the wavelength, the outdoor reference distance, the path loss exponent, which depends on the transmission scenario, and the distance between the DT and the DU. The noise at the DU is Gaussian. Similarly, the SINR of a DR is expressed as
(4) |
where the direct channel between a DT and its DR depends on the distance between them, and the interference channel from any other DT to that DR is a Rayleigh fading channel. The noise at the DR is Gaussian. It is worth noting that the interference received at a DR consists of two parts: one from the other D2D pairs and the other from the UAVs.
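The channel and SINR definitions above can be sketched numerically as follows. This is a minimal illustration rather than the exact expressions (1) to (4): the function and argument names are hypothetical, the LoS gain is assumed to follow the common form of a reference gain divided by the squared UAV-to-user distance, and the signals of the cooperating UAVs are assumed to add in power at each DU.

```python
def los_gain(uav_xy, user_xy, altitude, beta0):
    """Assumed LoS model: reference gain beta0 over the squared 3D distance."""
    dx = uav_xy[0] - user_xy[0]
    dy = uav_xy[1] - user_xy[1]
    return beta0 / (dx * dx + dy * dy + altitude * altitude)


def du_sinr(k, uav_xy, uav_power, du_xy, dt_power, g_dt_du, altitude, beta0, noise):
    """SINR at DU k. g_dt_du[j][k] is the interference gain from DT j to DU k."""
    # Assumption: all UAVs send the same message, so their powers add up.
    signal = sum(p * los_gain(q, du_xy[k], altitude, beta0)
                 for q, p in zip(uav_xy, uav_power))
    interference = sum(p * g_dt_du[j][k] for j, p in enumerate(dt_power))
    return signal / (interference + noise)


def dr_sinr(i, uav_xy, uav_power, dr_xy, dt_power, g_dt_dr, altitude, beta0, noise):
    """SINR at DR i. g_dt_dr[j][i] is the gain from DT j to DR i."""
    signal = dt_power[i] * g_dt_dr[i][i]
    # Interference part 1: the other D2D transmitters.
    d2d_interf = sum(p * g_dt_dr[j][i] for j, p in enumerate(dt_power) if j != i)
    # Interference part 2: every UAV, through its LoS channel to the DR.
    uav_interf = sum(p * los_gain(q, dr_xy[i], altitude, beta0)
                     for q, p in zip(uav_xy, uav_power))
    return signal / (d2d_interf + uav_interf + noise)
```

Moving a UAV closer to the DUs raises the numerator of `du_sinr` but also raises `uav_interf` for nearby DRs, which is the coupling between position and power that the problem formulation has to balance.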
Our goal is to maximize the sum throughput of DUs while guaranteeing the basic QoS requirements of all the D2D pairs. In summary, the problem can be formulated as [5, 16]
(5) | ||||
where is the required minimum transmission rate of each D2D pair.
The first two constraints in (5) limit the transmission powers of the UAVs and the DTs, which cannot exceed their maximum values. The third constraint requires that, under the interference from the UAVs and the other D2D transmitters, the transmission rate of each D2D link is no lower than the required minimum rate. Problem (5) is non-convex and difficult to solve directly, since the variables are highly coupled in the objective and the constraints. In the next section, instead of resorting to traditional optimization techniques such as alternating optimization or block coordinate descent, we provide the details of a GNN-based approach to solve this problem optimally and efficiently.
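For orientation, the structure described above (a sum-rate objective over the DUs, two power constraints and one minimum-rate constraint per D2D pair) can be written in the following generic form. All symbols here are assumed for illustration: $p_m$ and $\mathbf{q}_m$ stand for the power and horizontal position of a UAV, $p_i$ for the power of a DT, $\gamma$ for the SINRs, and the rate is taken as $\log_2(1+\gamma)$.

```latex
\begin{aligned}
\max_{\{p_m\},\,\{p_i\},\,\{\mathbf{q}_m\}} \quad
  & \sum_{k} \log_2\!\left(1+\gamma_k^{\mathrm{DU}}\right) \\
\text{s.t.} \quad
  & 0 \le p_m \le P_{\max}^{\mathrm{UAV}}, \quad \forall m, \\
  & 0 \le p_i \le P_{\max}^{\mathrm{DT}}, \quad \forall i, \\
  & \log_2\!\left(1+\gamma_i^{\mathrm{DR}}\right) \ge R_{\min}, \quad \forall i.
\end{aligned}
```

Because every SINR depends on all powers and, through the air-to-ground gains, on all UAV positions, neither the objective nor the rate constraint is jointly convex in these variables.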
3 Graph Neural Network
In this section, we first convert the proposed multi-UAV D2D communication system into a graph model, and then provide the detailed training process.
3.1 Relational GNN model

To facilitate understanding, in this subsection we first show the construction of the relational GNN model with a single UAV. Based on this, we then present the GNN model with multiple UAVs, in which the relations are more complex. For the single-UAV case, as shown in Fig. 2, each UAV-to-DU link is treated as a node and is circled in green, and each D2D link is also treated as a node and is circled in yellow. There are multiple green nodes corresponding to the DUs and multiple yellow nodes corresponding to the D2D pairs. A directed edge denotes the interference between a UAV-to-DU link and a D2D link, or between two D2D links. The node that an arrow points to is the one whose transmission is interfered with by the communication at the node on the other end of the edge. A double-sided arrow means that the transmissions at both ends interfere with each other. There is no interference among green nodes because they broadcast the same message.
Thus, we formulate the relational GNN model of the proposed system as a graph consisting of a vertex set and an edge set, together with two mappings that assign the vertices and the edges to their corresponding features. The vertex feature is extracted from the transmission power, the channel gain and the coordinate of both the green and yellow nodes. Since the coordinate contains both the horizontal and vertical axis variables, the vertex feature dimension is 4. The edge feature is extracted from the interference channel. Since the ground-to-ground channels are given with fixed distances and the coordinate values have already been extracted above, the only edge feature is the channel gain, so the edge feature dimension is 1.

Based on the results in Fig. 2, we further extend the case to multiple UAVs. For display purposes, a graph topology corresponding to two UAVs is given in Fig. 3. It can be seen that the topology has a degree of symmetry between its upper and lower halves, where the top green nodes share one UAV and the bottom green nodes share the other. This symmetry remains when the number of UAVs increases. Compared with the graph structure of a single UAV, the multi-UAV case has more nodes, since each UAV contributes its own set of UAV-to-DU links. There is still no interference among different UAV-to-DU links.
According to the above discussions, we define the element of adjacency feature tensor as
(6) |
where the edge set contains the interference links. The value 0 means that the two nodes are not connected in the graph topology. A nonzero element denotes the interference channel gain between two D2D pairs or between a D2D pair and a UAV-to-DU link, whose specific definition is given as
(7) |
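The construction of the adjacency feature tensor in (6) can be sketched as follows. The node labels `"uav_du"` and `"d2d"`, the function name and the `gain(src, dst)` callback are hypothetical, and the convention that entry `[i][j]` holds the gain of the link from node `j` to node `i` is an assumption made for this illustration.

```python
def build_adjacency(node_types, gain):
    """Return A with A[i][j] = interference gain from node j to node i,
    or 0.0 when the two nodes are not connected in the graph."""
    n = len(node_types)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a link does not interfere with itself
            if node_types[i] == "uav_du" and node_types[j] == "uav_du":
                continue  # same broadcast message: no edge between green nodes
            adj[i][j] = gain(j, i)
    return adj


# Two UAV-to-DU links (green) and two D2D links (yellow), with a toy gain table.
types = ["uav_du", "uav_du", "d2d", "d2d"]
toy_gain = lambda src, dst: float(10 * src + dst + 1)
A = build_adjacency(types, toy_gain)
```

Keeping zeros for unconnected pairs lets the same tensor encode both the topology and the edge features, so a layer can skip a neighbor simply by testing for a zero entry.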
3.2 Training
Consider a convolutional graph neural network with multiple convolutional layers. Each convolutional layer updates the hidden representation of each vertex by gathering feature information from its neighbors. After the features are aggregated, a nonlinear transformation is applied to the result. By stacking multiple layers, the final hidden representation of each vertex receives messages from its multi-hop neighborhood.

The basic GNN update process is shown in Fig. 4. Each vertex only aggregates the information of its neighbors during each iteration. As the number of iterations increases, the vertex representations gradually converge and the training process eventually stabilizes. The process shown in Fig. 4 also applies when the graph is unweighted, i.e., when the edges only indicate connectivity and carry no information. According to the definition of the adjacency feature tensor in (6), the edges of our proposed graph model carry the information of the interference channels. Hence, we define the update rule of each layer at each vertex as
(8) |
where the first function is a set function that aggregates information from the node’s neighbors, and the second is a function that combines the aggregated information with the node’s own information. We apply a different multi-layer perceptron (MLP) for each of these two functions. The first MLP aggregates information from the neighbors and their edges and generates the aggregated neighborhood information, which becomes part of the input of the second MLP. The output of the second MLP is then processed by the activation function to obtain the final output. Based on the above procedure, expression (8) can be transformed as
(9) | ||||

Fig. 5 illustrates the update rule of the proposed GNN-based model. The gray dots on the left represent the information of the neighbor vertices, and the yellow dots represent the information of the edge corresponding to each neighbor vertex. The information of the neighbor vertices and edges is concatenated and fed into the first MLP. The output of the first MLP is then combined with the information of the vertex itself (by vector concatenation in the proposed system) to form the input of the second MLP, whose output passes through the activation function. This completes one round of information update for the vertex.
Since the original problem (5) is highly non-convex and difficult to solve directly, it is impossible to obtain samples with accurate labels. We therefore define a loss function from the problem itself to update the network parameters.
Since (5) is a maximization problem whereas training minimizes a loss, we take the negative of the objective function as the loss function:
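One round of the two-stage update can be sketched in plain Python as follows. This is an illustrative sketch, not the trained model: the helper names are hypothetical, summation is assumed as the set aggregation over neighbors, and ReLU is assumed as the activation function.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp(x, layers):
    """Apply dense layers with ReLU in between (none after the last layer)."""
    for idx, (w, b) in enumerate(layers):
        x = dense(x, w, b)
        if idx < len(layers) - 1:
            x = relu(x)
    return x


def gnn_layer(features, adj, mlp1, mlp2):
    """features[v] is the feature list of vertex v; adj[v][u] is the edge
    feature (interference gain) from u to v, with 0 meaning no edge."""
    n = len(features)
    out_dim = len(mlp1[-1][1])  # bias length of the last layer of mlp1
    updated = []
    for v in range(n):
        # Stage 1: neighbor feature and edge feature are concatenated and
        # passed through the first MLP, one message per neighbor.
        messages = [mlp(features[u] + [adj[v][u]], mlp1)
                    for u in range(n) if adj[v][u] != 0.0]
        # Assumed aggregation: element-wise sum over the neighbor messages.
        agg = [sum(col) for col in zip(*messages)] if messages else [0.0] * out_dim
        # Stage 2: own feature concatenated with the aggregate, second MLP,
        # then the (assumed) ReLU activation.
        updated.append(relu(mlp(features[v] + agg, mlp2)))
    return updated
```

Because the same two MLPs are shared by every vertex and the sum does not depend on neighbor order, the layer can be applied to graphs with a different number of D2D pairs or UAVs without changing its parameters.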
(10) |
Further, by adding constraints in (5) to the loss function (10) as a penalty, we have
(11) |
where the penalty coefficient can take any nonnegative value, and the penalty term is given as
(12) | |||
The large number of terms in (12) would greatly reduce the training speed, so we simplify it. Specifically, (12) mainly includes two parts: one related to the transmission power and the other to the data rate. By applying a sigmoid operation to the network outputs, the transmission powers of the UAVs and of the DTs can be re-expressed so that they always lie within their allowed ranges. Substituting these expressions into (12), the power-related penalty terms are always equal to zero, so they can be removed from (12). As a result, expression (11) can be simplified as
(13) | |||
The proposed GNN-based method applies the loss function (13) to the output of the last layer to perform the parameter update, which is a standard form of unsupervised learning. In the next section, we implement and validate the proposed scheme by simulation experiments.
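The simplified loss can be sketched as follows. The exact expressions (10) to (13) are not reproduced here: the function names are hypothetical, the rate is assumed to be log2(1 + SINR), and the penalty is assumed to be a hinge on the shortfall of each D2D rate below the required minimum.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bounded_power(raw_output, p_max):
    """Map an unconstrained network output into the interval (0, p_max),
    so the power constraints hold by construction."""
    return p_max * sigmoid(raw_output)


def rate(sinr):
    return math.log2(1.0 + sinr)


def penalized_loss(du_sinrs, dr_sinrs, r_min, penalty_coeff):
    """Negative DU sum rate plus a penalty for violated D2D rate constraints."""
    objective = sum(rate(s) for s in du_sinrs)
    # Assumed hinge penalty: only D2D links below r_min contribute.
    violation = sum(max(0.0, r_min - rate(s)) for s in dr_sinrs)
    return -objective + penalty_coeff * violation
```

With the powers bounded by the sigmoid, only the rate constraint needs a penalty term, which mirrors the simplification from (11) to (13). A larger `penalty_coeff` pushes training toward feasibility at the cost of a lower DU sum rate.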
4 Numerical Results
In this section, we provide numerical results to validate the effectiveness of the GNN-based scheme. We introduce three schemes for comparison: 1) an alternating optimization (AO) scheme for the original problem, 2) random deployment of the UAVs’ positions, and 3) fixed power allocation. The main parameters are listed in Table 1.
The graph neural network adopted includes one input layer, three hidden layers and one output layer. The number of parameters of the hidden layer is . In the training process, we take each user’s coordinates and CSI as inputs, and each UAV’s power allocation and coordinates as outputs.
Parameter | Value |
---|---|
Maximum power of UAVs | 30 dBm |
Maximum power of D2D transmitters | 10 dBm |
Noise power | -60 dBm |
Channel power gain at the reference distance | -30 dB |
Number of DUs | 4 |
Number of D2D users | 6 |
Number of UAVs | 4 |
Fixed altitude of UAVs | 10 m |
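Since the SINR expressions operate on linear quantities, the logarithmic values in Table 1 have to be converted before use. A small sketch of that conversion, with hypothetical helper names:

```python
def dbm_to_watts(dbm):
    """Convert a power level in dBm to watts (0 dBm = 1 mW)."""
    return 10.0 ** ((dbm - 30.0) / 10.0)


def db_to_linear(db):
    """Convert a gain in dB to a linear power ratio."""
    return 10.0 ** (db / 10.0)


uav_max_power = dbm_to_watts(30.0)    # 1 W
dt_max_power = dbm_to_watts(10.0)     # about 10 mW
noise_power = dbm_to_watts(-60.0)     # about 1 nW
reference_gain = db_to_linear(-30.0)  # about 1e-3
```

The 20 dB gap between the two power budgets means a UAV can transmit with up to 100 times the power of a D2D transmitter, which is why UAV placement matters so much for the D2D rate constraint.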

We use program simulation to generate 500 samples as the training set and another 200 samples as the test set. To update the network weights more efficiently, we use the Adam optimizer [25] instead of classic stochastic gradient descent. From Fig. 6, it can be seen that the UAV-to-DU total transmission rate becomes stable after about 100 iterations, which indicates that the proposed network converges.
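For reference, the Adam update rule of [25] can be sketched in a few lines. This is a generic illustration on a toy objective, not the training code used for the results; the function name, the step count and the hyperparameter defaults (the commonly used ones) are assumptions.

```python
import math


def adam_minimize(grad_fn, params, steps=100, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function given its gradient, using the Adam update rule."""
    params = list(params)
    m = [0.0] * len(params)  # running mean of the gradients
    v = [0.0] * len(params)  # running mean of the squared gradients
    for t in range(1, steps + 1):
        grads = grad_fn(params)
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params


# Toy example: minimize (x - 3)^2 + (y + 1)^2 starting from the origin.
toy_grad = lambda p: [2.0 * (p[0] - 3.0), 2.0 * (p[1] + 1.0)]
x, y = adam_minimize(toy_grad, [0.0, 0.0], steps=2000, lr=0.05)
```

The per-parameter scaling by the running second moment is what makes Adam less sensitive than plain gradient descent to the very different magnitudes of the rate term and the penalty term in the loss.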

Fig. 7 shows the UAV-to-DU total transmission rate versus the number of D2D pairs. It reveals that our proposed algorithm achieves higher downlink transmission rates than the other benchmark schemes. Specifically, as the number of D2D users increases, the total transmission rate of the system decreases. Since D2D users are randomly distributed in the area, increasing their number not only increases the interference to DUs and DRs, but also makes the deployment of UAVs more difficult. In this case, the UAVs tend to fly far away from the communication area or reduce their transmission power to protect the transmission rate of the D2D users. Compared to , when , the transmission rate will decrease by more than two thirds. Hence, in this situation, increasing the number of UAVs may be a feasible way to improve the performance, as discussed in the next simulation result.

In Fig. 8, we plot the total transmission rate versus the number of UAVs. As the number of UAVs increases, the transmission rate increases significantly. This means that increasing the number of UAVs does not significantly increase the interference to the DUs, and that the interference to the DRs from the newly added UAVs can be effectively suppressed by location deployment and power control. These results reveal the clear advantages of multi-UAV-enabled communications.
5 Conclusion
In this paper, the power control and position optimization problem was investigated for multi-UAV-enabled communications in D2D networks. A GNN-based method was proposed to tackle the highly non-convex fractional programming problem optimally. In particular, we formulated a graph-based model for the proposed network by mapping the transmission links and interference links to vertices and edges, respectively. Then, we trained the GNN with unsupervised learning. Finally, the trained GNN can be used to perform scheduling based on the input channel state information and the coordinates of ground users. Simulation results demonstrated the outstanding performance of UAVs in facilitating information broadcasting to ground downlink users. Moreover, the proposed GNN-based approach shows lower complexity and better system throughput than the benchmark schemes.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (61901231), in part by the National Natural Science Foundation of China (61971238), in part by the Natural Science Foundation of Jiangsu Province of China (BK20180757), in part by the open project of the Key Laboratory of Dynamic Cognitive System of Electromagnetic Spectrum Space, Ministry of Industry and Information Technology (KF20202102), in part by the China Postdoctoral Science Foundation under Grant (2020M671480), in part by the Jiangsu Planned Projects for Postdoctoral Research Funds (2020Z295).
References
- [1] Y. Zeng, R. Zhang, T. J. Lim, Wireless communications with unmanned aerial vehicles: opportunities and challenges, IEEE Communications Magazine 54 (5) (2016) 36–42. doi:10.1109/MCOM.2016.7470933.
- [2] M. Chen, X. Wei, J. Chen, L. Wang, L. Zhou, Integration and provision for city public service in smart city cloud union: Architecture and analysis, IEEE Wireless Communications 27 (2) (2020) 148–154. doi:10.1109/MWC.001.1900264.
- [3] B. Liu, Y. Wan, F. Zhou, Q. Wu, R. Q. Hu, Robust trajectory and beamforming design for cognitive MISO UAV networks, IEEE Wireless Communications Letters 10 (2) (2021) 396–400. doi:10.1109/LWC.2020.3032621.
- [4] Q. Wu, F. Shen, Z. Wang, G. Ding, 3D spectrum mapping based on ROI-driven UAV deployment, IEEE Network 34 (5) (2020) 24–31. doi:10.1109/MNET.011.2000076.
- [5] F. Zhou, Y. Wu, R. Q. Hu, Y. Qian, Computation rate maximization in UAV-enabled wireless-powered mobile-edge computing systems, IEEE Journal on Selected Areas in Communications 36 (9) (2018) 1927–1941. doi:10.1109/JSAC.2018.2864426.
- [6] X. Guan, Y. Huang, C. Dong, Q. Wu, User association and power allocation for UAV-assisted networks: A distributed reinforcement learning approach, China Communications 17 (12) (2020) 110–122. doi:10.23919/JCC.2020.12.008.
- [7] Y. Zeng, R. Zhang, Energy-efficient UAV communication with trajectory optimization, IEEE Transactions on Wireless Communications 16 (6) (2017) 3747–3760. doi:10.1109/TWC.2017.2688328.
- [8] S. U Rahman, G.-H. Kim, Y.-Z. Cho, A. Khan, Positioning of UAVs for throughput maximization in software-defined disaster area UAV communication networks, Journal of Communications and Networks 20 (5) (2018) 452–463. doi:10.1109/JCN.2018.000070.
- [9] A. Asadi, Q. Wang, V. Mancuso, A survey on device-to-device communication in cellular networks, IEEE Communications Surveys & Tutorials 16 (4) (2014) 1801–1819. doi:10.1109/COMST.2014.2319555.
- [10] H. Wang, J. Chen, G. Ding, S. Wang, D2D communications underlaying UAV-assisted access networks, IEEE Access 6 (2018) 46244–46255. doi:10.1109/ACCESS.2018.2865629.
- [11] J. Ji, K. Zhu, D. Niyato, R. Wang, Joint trajectory design and resource allocation for secure transmission in cache-enabled UAV-relaying networks with D2D communications, IEEE Internet of Things Journal 8 (3) (2021) 1557–1571. doi:10.1109/JIOT.2020.3013647.
- [12] B. Wang, R. Zhang, C. Chen, X. Cheng, L. Yang, H. Li, Y. Jin, Graph-based file dispatching protocol with D2D-enhanced UAV-NOMA communications in large-scale networks, IEEE Internet of Things Journal 7 (9) (2020) 8615–8630. doi:10.1109/JIOT.2020.2994549.
- [13] J. Miao, Q. Liao, Z. Zhao, Joint rate and coverage design for UAV-enabled wireless networks with underlaid D2D communications, in: 2020 IEEE 6th International Conference on Computer and Communications (ICCC), 2020, pp. 815–819. doi:10.1109/ICCC51575.2020.9345214.
- [14] T. Fang, D. Wu, M. Wang, J. Chen, Multi-stage hierarchical channel allocation in UAV-assisted D2D networks: A Stackelberg game approach, China Communications 18 (2) (2021) 13–26. doi:10.23919/JCC.2021.02.002.
- [15] K. K. Nguyen, N. A. Vien, L. D. Nguyen, M.-T. Le, L. Hanzo, T. Q. Duong, Real-time energy harvesting aided scheduling in UAV-assisted D2D networks relying on deep reinforcement learning, IEEE Access 9 (2021) 3638–3648. doi:10.1109/ACCESS.2020.3046499.
- [16] W. Wu, F. Zhou, R. Q. Hu, B. Wang, Energy-efficient resource allocation for secure NOMA-enabled mobile edge computing networks, IEEE Transactions on Communications 68 (1) (2020) 493–505. doi:10.1109/TCOMM.2019.2949994.
- [17] M. Chen, L. Wang, J. Chen, X. Wei, L. Lei, A computing and content delivery network in the smart city: Scenario, framework, and analysis, IEEE Network 33 (2) (2019) 89–95. doi:10.1109/MNET.2019.1800253.
- [18] Y. Bengio, A. Courville, P. Vincent, Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8) (2013) 1798–1828. doi:10.1109/TPAMI.2013.50.
- [19] Y. Zhou, F. Zhou, Y. Wu, R. Q. Hu, Y. Wang, Subcarrier assignment schemes based on Q-learning in wideband cognitive radio networks, IEEE Transactions on Vehicular Technology 69 (1) (2020) 1168–1172. doi:10.1109/TVT.2019.2953809.
- [20] Y. Shen, Y. Shi, J. Zhang, K. B. Letaief, Graph neural networks for scalable radio resource management: Architecture design and theoretical analysis, IEEE Journal on Selected Areas in Communications 39 (1) (2021) 101–115. doi:10.1109/JSAC.2020.3036965.
- [21] T. Jiang, H. V. Cheng, W. Yu, Learning to reflect and to beamform for intelligent reflecting surface with implicit channel estimation, IEEE Journal on Selected Areas in Communications 39 (7) (2021) 1931–1945. doi:10.1109/JSAC.2021.3078502.
- [22] M. Lee, G. Yu, G. Y. Li, Graph embedding-based wireless link scheduling with few training samples, IEEE Transactions on Wireless Communications 20 (4) (2021) 2282–2294. doi:10.1109/TWC.2020.3040983.
- [23] W. Wang, X. Li, R. Wang, K. Cumanan, W. Feng, Z. Ding, O. A. Dobre, Robust 3D-trajectory and time switching optimization for dual-UAV-enabled secure communications, IEEE Journal on Selected Areas in Communications (2021) 1–1. doi:10.1109/JSAC.2021.3088628.
- [24] Z. Wang, F. Zhou, Y. Wang, Q. Wu, Joint 3D trajectory and resource optimization for a UAV relay-assisted cognitive radio network, China Communications 18 (6) (2021) 184–200. doi:10.23919/JCC.2021.06.015.
- [25] D. Kingma, J. Ba, Adam: A method for stochastic optimization, Computer Science.