
Optimal Charging Profile Design for Solar-Powered Sustainable UAV Communication Networks
(Accepted by IEEE ICC 2023: Green Communication Systems and Networks Symposium.)

Longxin Wang1, Saugat Tripathi2, Ran Zhang2, Nan Cheng1, and Miao Wang2
1 School of Telecommunication Engineering, Xidian University, Xi'an, China; 2 Department of Electrical and Computer Engineering, Miami University, Oxford, USA. Email: 1wanglx19@stu.xidian.edu.cn, 1dr.nan.cheng@ieee.org, 2{tripats,zhangr43,wangm64}@miamioh.edu
Abstract

This work studies optimal solar charging for solar-powered self-sustainable UAV communication networks, considering the day-scale time-variability of solar radiation and user service demand. The objective is to optimally trade off between the user coverage performance and the net energy loss of the network by proactively assigning UAVs to serve, charge, or land. Specifically, the studied problem is first formulated as a time-coupled mixed-integer non-convex optimization problem, and further decoupled into two sub-problems for tractability. To address the challenge caused by time coupling, deep reinforcement learning (DRL) algorithms are respectively designed for the two sub-problems. In particular, a relaxation mechanism is put forward to overcome the "dimension curse" incurred by the large discrete action space in the second sub-problem. Finally, simulation results demonstrate the efficacy of our designed DRL algorithms in trading off the communication performance against the net energy loss, and the impact of different parameters on the tradeoff performance.

I Introduction

In future mobile communication networks, UAVs equipped with wireless transceivers can be exploited as mobile base stations to provide highly on-demand services to ground users, forming UAV-based communication networks (UCNs)[1]. With the advantages of flexible 3D mobility, a higher chance of Line-of-Sight communication channels, and lower deployment and operational costs, UCNs have received substantial research attention from various aspects[2]. However, the existing works mostly focus on UAV control for a fixed set of UAVs. Few works have investigated how the network should optimally respond when the UAV crew changes dynamically. On one hand, UAVs are powered by batteries: some UAVs will run out of battery during service and have to quit the network for charging. On the other hand, supplemental UAVs can easily be dispatched to join the existing crew to enhance the network performance. Therefore, it is indispensable to design novel responsive regulation strategies capable of optimally handling a UAV crew that may change dynamically.

To this end, we proposed in [3, 4] a responsive UAV trajectory control strategy to maximize the accumulated number of served users over a time horizon where at least one UAV quits or joins the network. Nevertheless, no matter how good such responsive strategies are, they are by nature passive reaction strategies which can only accept and react to the change rather than proactively control it. Solar charging makes proactive control possible. The opportunity lies in the fact that the user traffic demand in an area is usually time-varying. When the demand is low, some UAVs can be deliberately dispatched to climb high and get solar-charged even if they are not in dire need of charging; they can be called back later to replace other UAVs or to meet increased user demand. In this manner, the network is able to take charge of the changes in the serving UAV crew, and a solar-powered sustainable (SPS) UAV network can be established.

UAV communications with solar charging have been studied by some pioneering works. With a single solar-powered UAV, [5] developed an optimal 3D trajectory control and resource allocation strategy, [6] studied energy outage at the UAV and service outage at the users by modeling solar and wind energy harvesting, and [7] proposed a power cognition scheme that intelligently adjusts energy harvesting, information transmission, and trajectory to improve UAV communication performance. For multi-UAV networks, the work [8] studied joint dynamic UAV altitude control and multi-cell wireless channel access management to optimally balance solar charging against communication throughput improvement. The work [9] analytically characterized the user coverage performance of a UAV network based on a harvested-power model and 3D antenna radiation patterns. Although solar charging is elegantly integrated to fuel the UAV communications, most of the related works do not take into account the time-variability of solar radiation or user traffic demand, which is usually the case in practice. To achieve day-scale sustainability, it is essential to consider these time variations when designing the UAV control strategies.

Therefore, in this work, we investigate the optimal solar charging strategy design for a UCN, considering time-varying solar radiation and user data traffic demand. The strategy aims to optimally trade off over a time horizon between maximizing the accumulated user coverage and minimizing the net energy loss, subject to the constraints of UAV sustainability and user service requirements. The net energy loss is defined as the difference between the total energy consumption and the total energy harvested. As far as we know, this work is the first to jointly consider time-varying solar radiation and user service demand at a day scale in a solar-powered self-sustainable UAV network. Specifically, our contributions are three-fold.

  • The studied problem is first formulated into a time-coupled mixed-integer non-convex optimization problem. To make the problem tractable, the original problem is decoupled into two sub-problems, one obtaining the mapping between the number of serving UAVs and the number of served users in each time slot, and the other handling the time-variability of solar radiation and user service demand.

  • To tackle the challenge caused by time coupling, deep reinforcement learning (DRL) algorithms are developed to solve the two sub-problems. Particularly, a relaxation mechanism is designed to relieve the “dimension curse” caused by the large discrete action space in the second sub-problem.

  • Simulations are conducted to demonstrate the efficacy of the proposed learning algorithms and the impact of different parameters on the tradeoff performance.

The remainder of the paper is organized as follows. Section II depicts the system model. Section III presents the problem formulation and decomposition. Section IV details the proposed DRL algorithms and the relaxation mechanism. Section V provides the numerical results. Section VI concludes the paper.

II System Model

Figure 1: System model of SPS UAV communication network.

II-A Network Model

As shown in Fig. 1, we consider $N$ solar-chargeable UAVs, denoted as $\mathcal{S}_{UAV}$, providing communication services to (i.e., serving) a target area. All UAVs can communicate with a server via backhaul networks (e.g., a satellite or cellular network). Each UAV concentrates its transmission energy within an aperture underneath it, forming a ground coverage disk. UAVs operate mostly at three altitudes: the ground, the serving altitude $H_{Srv}$, and the charging altitude $H_{Chg}$. When on the ground, UAVs consume negligible power, only for messaging with the server. UAVs serve and get charged only at the fixed altitudes $H_{Srv}$ and $H_{Chg}$, respectively. $H_{Srv}$ is low to maintain good UAV-user communication quality, while $H_{Chg}$ is right above the upper boundary of the clouds to minimize the attenuation of solar radiation due to clouds. Charging UAVs only at $H_{Chg}$ is justified as follows. According to [10], solar radiation is attenuated exponentially with the thickness of clouds between the sun and the solar panel, dropping to about $\frac{1}{10}$ after the first 300 meters. Since it does not take long (e.g., a minute or two) for a UAV to move vertically through 300 meters, UAVs can reasonably be set to charge at a fixed altitude just above the clouds.
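The cloud-attenuation argument can be checked numerically. Below is a minimal sketch assuming the exponential attenuation form of [10], with the decay constant fitted to the quoted ~1/10 drop over the first 300 m; the constant and function names are ours, not from the paper.

```python
import math

# Assumed exponential cloud attenuation: I(d) = I0 * exp(-k * d).
# Fitting the ~1/10 drop over the first 300 m gives k = ln(10)/300 per meter.
K_CLOUD = math.log(10) / 300.0

def attenuated_intensity(I0, cloud_depth_m):
    """Solar intensity remaining after passing cloud_depth_m meters of cloud."""
    return I0 * math.exp(-K_CLOUD * cloud_depth_m)
```

Under this model, charging below the cloud top loses an order of magnitude of intensity per 300 m of cloud, which is why a fixed charging altitude just above the clouds is a reasonable simplification.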

The time horizon $T$ is equally divided into time slots indexed by $t$. In any time slot, a percentage $p$ of the users are randomly distributed in proximity to some hotspot centers while the remaining users are uniformly distributed throughout the area. The numbers and spatial distributions of the users and hotspots are deemed unchanged within a time slot but may vary with $t$. The dynamics of the user distribution are known to the server so that an offline UAV charging strategy can be obtained. The well-trained strategy is then executed relying on the server-UAV communications via the backhaul links.

II-B Spectrum Access

Users access the UAV spectrum following LTE Orthogonal Frequency Division Multiple Access (OFDMA)[11], which assigns different users of one UAV one or more orthogonal resource blocks (RBs) such that they do not interfere with each other. A heuristic two-stage user association policy is adopted. In each time slot, users send connection requests in stage I to the serving UAV that provides the best SINR (measurable via reference signaling), and each UAV admits the users with the best SINR values subject to its bandwidth. In stage II, each rejected user chooses the UAV with the next best SINR and is admitted if that UAV has available bandwidth. The stage-II procedure is repeated for each unassociated user until it is admitted or has no available UAV left to request. Each user has a minimum throughput requirement $r_u$. When a UAV $i$ admits a user $u$, the number of RBs assigned to the user, $n^{RB}_{iu}$, should satisfy

$$n^{RB}_{iu}W^{RB}\log_{2}\Big(1+\frac{P_{t}G_{iu}}{n_{0}+\sum_{j\in\mathbf{S}_{u}'\backslash\{i\}}P_{t}G_{ju}}\Big)\geq r_{u},\qquad(1)$$
where $G_{iu}=10^{-PL_{iu}/20}$ and $PL_{iu}=20\log_{10}\big(\frac{4\pi f_{c}d_{iu}}{c}\big)+\eta$ (dB).

In Eq. (1), $W^{RB}$ is the bandwidth per RB, $P_{t}$ is the transmit power spectral density (psd) of UAVs, $n_{0}$ is the noise psd, $\mathbf{S}_{u}'$ denotes the set of UAVs that can cover user $u$, and $G_{iu}$ is the UAV-to-user channel gain as a function of the center frequency $f_{c}$, the distance $d_{iu}$ between UAV $i$ and user $u$, the speed of light $c$, and a line-of-sight (LoS) related parameter $\eta$[12].
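To make the admission rule concrete, the sketch below computes the minimum RB count implied by Eq. (1) and then runs a simplified version of the two-stage association. Two caveats: the association here is greedy per user in arrival order, whereas the paper lets each UAV pick its best-SINR requesters first; and all numeric defaults ($W^{RB}$, $P_t$, $n_0$, $f_c$, $\eta$) are illustrative assumptions, not values from the paper.

```python
import math

C_LIGHT = 3e8  # speed of light (m/s)

def min_rbs(d_iu, interferer_dists, r_u, W_RB=180e3, P_t=1e-3, n0=4e-21,
            fc=2e9, eta=1.0):
    """Smallest n_RB satisfying Eq. (1). Distances in meters, r_u in bit/s.
    Channel gain follows G = 10^(-PL/20) with the free-space-style PL of Eq. (1)."""
    def gain(d):
        pl_db = 20 * math.log10(4 * math.pi * fc * d / C_LIGHT) + eta
        return 10 ** (-pl_db / 20)
    sinr = P_t * gain(d_iu) / (n0 + sum(P_t * gain(d) for d in interferer_dists))
    return math.ceil(r_u / (W_RB * math.log2(1 + sinr)))

def associate(sinr, rb_budget, rb_need):
    """Greedy sketch of the two-stage policy: each user tries UAVs in
    decreasing SINR order and is admitted by the first one with enough RBs."""
    assoc = {}
    for u in sinr:
        for i in sorted(sinr[u], key=sinr[u].get, reverse=True):
            need = rb_need(u, i)
            if rb_budget.get(i, 0) >= need:
                rb_budget[i] -= need  # reserve the RBs for this user
                assoc[u] = i
                break
        # a user with no admitting UAV simply stays unserved this slot
    return assoc
```

For example, `associate({'u1': {'A': 10, 'B': 5}}, {'A': 2}, lambda u, i: 1)` admits `u1` at UAV `A` and debits one RB from its budget.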

II-C Energy Model

We follow the work in [5] to model the kinematic power consumption of the UAVs. For a UAV flying at a level speed $v_{lv}$ and a vertical speed $v_{vt}$, the kinematic power consumption is modeled as Eq. (2), where $P_{lv}$, $P_{vt}$, and $P_{drag}$ denote the power consumption of level flight, vertical flight, and the blade drag profile, respectively, $W$ is the UAV weight, $\rho$ is the air density, $A$ is the total area of the UAV rotor disks, $C_{D0}$ is the profile drag coefficient, $\sigma A$ is the total blade area, and $v_{T}$ is the blade tip speed. Note that $v_{vt}$ is positive for climbing and negative for landing.

$$P_{kine}=P_{lv}+P_{vt}+P_{drag},\qquad(2)$$
where
$$P_{lv}=\frac{W^{2}}{\sqrt{2}\rho A}\frac{1}{\sqrt{v_{lv}^{2}+\sqrt{v_{lv}^{4}+4V_{h}^{4}}}},\quad P_{vt}=Wv_{vt},\quad P_{drag}=\frac{1}{8}C_{D0}\rho\,\sigma A\,v_{T}^{3},\quad V_{h}=\sqrt{\frac{W}{2\rho A}}.$$

In addition to the kinematics power consumption, UAVs spend power on communication and on-board operations like computing, which are denoted as PtxP_{tx} and PstaticP_{static}, respectively. Thus the total power consumption of a UAV is given as

$$P_{Tot}=P_{kine}+P_{tx}+P_{static}.\qquad(3)$$

Note that $P_{kine}$ is usually several hundred watts, whereas the transmit power of a small BS covering hundreds of meters typically falls between 0.25 W and 6 W[13], and the operational power consumption is also in single-digit watts. Thus, $P_{tx}$ and $P_{static}$ are usually neglected in practice.
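As a sanity check on the claim that the kinematic term dominates, the following sketch evaluates Eqs. (2)-(3) with the Table I parameters; the four-rotor disk area and the 4 W transmit power are our assumptions.

```python
import math

def total_power(v_lv, v_vt, W=5 * 9.8, rho=1.225, A=4 * math.pi * 0.3 ** 2,
                CD0=5e-4, sigma=0.056, vT=150.0, P_tx=4.0, P_static=5.0):
    """UAV power draw per Eqs. (2)-(3). Defaults follow Table I
    (W = 5x9.8 N, rho = 1.225 kg/m^3, r_d = 0.3 m with 4 rotors assumed)."""
    Vh = math.sqrt(W / (2 * rho * A))                  # induced-velocity scale
    P_lv = (W ** 2 / (math.sqrt(2) * rho * A)) / math.sqrt(
        v_lv ** 2 + math.sqrt(v_lv ** 4 + 4 * Vh ** 4))  # level-flight power
    P_vt = W * v_vt                                    # climb (+) / descent (-)
    P_drag = 0.125 * CD0 * rho * sigma * A * vT ** 3   # blade profile drag
    return P_lv + P_vt + P_drag + P_tx + P_static
```

With these numbers hover power lands in the low hundreds of watts, consistent with the statement that $P_{tx}$ and $P_{static}$ (single-digit watts) are negligible by comparison.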

The solar radiation intensity above the clouds varies with time in a day. We follow the model in [6] to characterize the average intensity as

$$I_{rad}(t)=\max\left\{0,\,I_{max}\left(-\tfrac{1}{36}t^{2}+\tfrac{2}{3}t-3\right)\right\},\quad 0\leq t<24,\qquad(4)$$

where $t$ represents the hour of the day, and $I_{max}$ denotes the maximum intensity during a day. The harvested solar power is then calculated as

$$P_{h}(t)=\begin{cases}A_{c}\frac{\eta_{c}}{K_{c}}I_{rad}(t)^{2},&0<I_{rad}(t)<K_{c},\\ A_{c}\eta_{c}I_{rad}(t),&I_{rad}(t)\geq K_{c},\end{cases}\qquad(5)$$

where $A_{c}$ is the solar panel area, $\eta_{c}$ is the charging efficiency coefficient, and $K_{c}$ is an intensity threshold.
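The radiation and harvesting model of Eqs. (4)-(5) translates directly into code; parameter defaults below follow Table I ($I_{max}=2000$ W/m², $K_c=150$ W/m², $A_c=1$ m², $\eta_c=0.25$).

```python
def I_rad(t, I_max=2000.0):
    """Solar intensity above the clouds at hour t in [0, 24), Eq. (4).
    The quadratic is zero at t = 6 and t = 18 and peaks at noon."""
    return max(0.0, I_max * (-t ** 2 / 36 + 2 * t / 3 - 3))

def P_h(t, A_c=1.0, eta_c=0.25, K_c=150.0, I_max=2000.0):
    """Harvested solar power, Eq. (5): quadratic below the threshold K_c
    (panel efficiency degrades at low intensity), linear at or above it."""
    I = I_rad(t, I_max)
    if I <= 0:
        return 0.0
    if I < K_c:
        return A_c * eta_c / K_c * I ** 2
    return A_c * eta_c * I
```

At noon this yields $P_h(12)=A_c\eta_c I_{max}=500$ W for the Table I values, and zero harvest outside the 6am-6pm window.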

III Problem Formulation and Decomposition

The objective is to achieve the optimal tradeoff among maximizing the total number of served users over the time horizon $T$, maximizing the total harvested solar energy, and minimizing the total energy consumption of the UAV network. The optimization is subject to the network sustainability constraints and user traffic demand requirements. With the above considerations, the problem formulation is given as $P_{1}$.

In Problem $P_{1}$, the decision variables include whether a UAV should land, go serving, or go charging at any time slot $t$, i.e., $\hat{\mathbf{a}}_{t}=(a_{1,t},\cdots,a_{N,t})$, and the horizontal positions of the UAVs that go serving at any time slot $t$, i.e., $\hat{\mathbf{p}}_{t}=(p_{k_{1}(t),t},\cdots,p_{k_{M}(t),t})$, where $k_{m}(t)$, $m\in\{1,\cdots,M\}$, indexes the serving UAVs at time slot $t$. Part $A_{1}$ denotes the amount of energy harvested from solar charging by UAV $i$ at time slot $t$. This part is determined by $a_{i,t}$ and $a_{i,t-1}$, since a UAV takes some time to move from the previous altitude to the current one, by the harvested solar power $P_{h}(t)$, and by the battery residue at the end of time slot $t-1$, i.e., $E^{res}_{i,t-1}$, since the battery capacity may be reached during charging. Part $A_{2}$ represents the energy consumption of UAV $i$ at $t$, which is also determined by $a_{i,t}$, $a_{i,t-1}$, and $E^{res}_{i,t-1}$. In Part $A_{3}$, $\hat{\mathcal{S}}^{u}_{t}(\cdot)$ is the set of users admitted and served by all the UAVs at $t$, which is a function of $\hat{\mathbf{a}}_{t}$ and $\hat{\mathbf{p}}_{t}$. The constant $C$ balances the weights between the user coverage and the energy gains and losses.

$$\max_{\hat{\mathbf{a}}_{t},\,\hat{\mathbf{p}}_{t}}\ \sum_{t=1}^{T}\Big\{C\sum_{i\in\mathcal{S}_{UAV}}\big[\underbrace{E_{h}(a_{i,t},a_{i,t-1},P_{h}(t),E^{res}_{i,t-1})}_{A_{1}}-\underbrace{E_{c}(a_{i,t},a_{i,t-1},E^{res}_{i,t-1})}_{A_{2}}\big]+\underbrace{|\hat{\mathcal{S}}^{u}_{t}(\hat{\mathbf{a}}_{t},\hat{\mathbf{p}}_{t})|}_{A_{3}}\Big\}\qquad(P_{1})$$
s.t.
$$E^{res}_{i,t}\geq E_{min}(a_{i,t}),\ \forall i\text{ and }\forall t;\qquad(C1.1)$$
$$|\hat{\mathcal{S}}^{u}_{t}(\hat{\mathbf{a}}_{t},\hat{\mathbf{p}}_{t})|\geq p_{min}|\mathcal{S}^{u}_{t}|,\ \forall t;\qquad(C1.2)$$
$$\text{Eq. (1)},\ \forall u\in\hat{\mathcal{S}}^{u}_{t}(\hat{\mathbf{a}}_{t},\hat{\mathbf{p}}_{t})\text{ and }\forall t.\qquad(C1.3)$$

Constraint C1.1 represents the network sustainability requirement: the battery residue of any UAV $i$ at any $t$ should be no smaller than an altitude-dependent threshold $E_{min}(a_{i,t})$. This ensures that at the end of each $t$, each UAV has enough energy to elevate to $H_{Chg}$ for charging in future slots, so that it never completely leaves the crew. Thus,

$$E_{min}(a_{i,t})=\Delta H/v_{up}\cdot\big(P_{Tot}\big|_{v_{lv}=0,\,v_{vt}=v_{up}}\big),\qquad(6)$$

where $\Delta H$ takes $H_{Chg}$ when $a_{i,t}=0$, $H_{Chg}-H_{Srv}$ when $a_{i,t}=1$, and $0$ when $a_{i,t}=2$. Constraints C1.2 and C1.3 represent the user data traffic demand requirements. C1.2 requires that the percentage of served users at any $t$ be no less than $p_{min}$ given the total number of users $|\mathcal{S}^{u}_{t}|$. C1.3 requires that the individual traffic demand $r_{u}$ be satisfied for every served user at any $t$.
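A sketch of the battery floor of Eq. (6), with the action encoding used later (0: ground, 1: serving, 2: charging). The climb power `P_climb` is an assumed constant standing in for $P_{Tot}|_{v_{lv}=0,\,v_{vt}=v_{up}}$.

```python
def E_min(a, H_chg=1400.0, H_srv=300.0, v_up=4.0, P_climb=260.0):
    """Battery floor of Eq. (6): the energy (Wh) needed to climb from the
    altitude implied by action a up to H_chg at vertical speed v_up.
    P_climb (watts) stands in for P_Tot at v_lv = 0, v_vt = v_up."""
    delta_h = {0: H_chg, 1: H_chg - H_srv, 2: 0.0}[a]  # meters left to climb
    return delta_h / v_up * P_climb / 3600.0           # J -> Wh
```

A UAV already at the charging altitude thus has a zero floor, while a grounded UAV needs the largest reserve, in proportion to the altitude gap.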

Problem $P_{1}$ is a mixed-integer nonlinear non-convex optimization problem with nonlinear constraints. The terms of different $t$ in the objective function are temporally coupled through the UAV battery residues. These facts make the sequential decision problem intractable. To this end, we decouple $P_{1}$ into two sub-problems, $P_{2}$ and $P_{3}$, each of which can be solved by means of DRL algorithms.

$$\max_{\hat{\mathbf{p}}_{t}}\ |\hat{\mathcal{S}}^{u}_{t}(\hat{\mathbf{p}}_{t},N^{Srv}_{UAV})|,\quad\forall t\qquad(P_{2})$$
$$\text{s.t.}\quad\text{Eq. (1)},\ \forall u\in\hat{\mathcal{S}}^{u}_{t}(\hat{\mathbf{p}}_{t},N^{Srv}_{UAV}).\qquad(C2.1)$$

In the first sub-problem $P_{2}$, given the user distribution at each $t$ and the number of UAVs in service $N^{Srv}_{UAV}$, the total number of users that can be served is maximized by determining the optimal positions $\hat{\mathbf{p}}^{*}_{t}(N^{Srv}_{UAV})$ as a function of $N^{Srv}_{UAV}$ and the user distribution.

$$\max_{\hat{\mathbf{a}}_{t}}\ \sum_{t=1}^{T}\Big\{C\sum_{i\in\mathcal{S}_{UAV}}\big[E_{h}(a_{i,t},a_{i,t-1},P_{h}(t),E^{res}_{i,t-1})-E_{c}(a_{i,t},a_{i,t-1},E^{res}_{i,t-1})\big]+|\hat{\mathcal{S}}^{u}_{t}(\hat{\mathbf{a}}_{t},\hat{\mathbf{p}}^{*}_{t}(N^{Srv}_{UAV}))|\Big\}\qquad(P_{3})$$
s.t.
$$E^{res}_{i,t}\geq E_{min}(a_{i,t}),\ \forall i\text{ and }\forall t;\qquad(C3.1)$$
$$|\hat{\mathcal{S}}^{u}_{t}(\hat{\mathbf{a}}_{t},\hat{\mathbf{p}}^{*}_{t}(N^{Srv}_{UAV}))|\geq p_{min}|\mathcal{S}^{u}_{t}|,\ \forall t;\qquad(C3.2)$$
$$N^{Srv}_{UAV}=\sum_{i\in\mathcal{S}_{UAV}}I(a_{i,t}=1),\ \forall t.\qquad(7)$$

In the second sub-problem $P_{3}$, the mapping obtained from $P_{2}$ between the maximum number of served users $|\hat{\mathcal{S}}^{u}_{t}(\hat{\mathbf{a}}_{t},\hat{\mathbf{p}}^{*}_{t}(N^{Srv}_{UAV}))|$ and $N^{Srv}_{UAV}$ is exploited. The same objective as in $P_{1}$ is maximized by optimizing only $\hat{\mathbf{a}}_{t}$. The relationship between $N^{Srv}_{UAV}$ and $\hat{\mathbf{a}}_{t}$ is given in Eq. (7), where $I(\cdot)$ is a binary indicator taking 1 if the inner condition is true and 0 otherwise.
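For reference, the indicator sum of Eq. (7) is straightforward (with 1 denoting the "go serving" action):

```python
def n_serving(a_t):
    """Eq. (7): N_srv = sum over the UAV set of I(a_{i,t} = 1)."""
    return sum(1 for a in a_t if a == 1)
```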

IV Design of Deep Reinforcement Learning Algorithms

In this section, the design of the DRL algorithms for solving $P_{2}$ and $P_{3}$ is elaborated. For $P_{2}$, we reuse the algorithm designed in our previous work [4] to obtain the mapping between the number of UAVs in service ($N^{Srv}_{UAV}$) and the total number of served users ($|\hat{\mathcal{S}}^{u}_{t}|$) given a certain user distribution in hour $t$. Based on the time-dependent mappings, the DRL algorithm design for solving $P_{3}$ is then presented in detail.

IV-A DRL for Solving $P_{2}$

Our previous work [4] considered a set of UAVs flying at a fixed altitude providing communication services to ground users with minimum throughput requirements, accounting for dynamic UAV crew changes due to battery depletion or supplementary UAV join-in. A DDPG algorithm was designed to maximize the user satisfaction score by obtaining the optimal UAV trajectories during both the steady periods without crew change and the transition periods when the crew changes. With the following simplifications, the proposed algorithm can be adapted to $P_{2}$. Crew change is not considered, so the state space can be cut down to include only the UAV positions. The action space remains unchanged, allowing a UAV to go in any direction with a maximum distance $d_{max}$ per step. The reward function changes from the step-wise user satisfaction score to the step-wise total number of served users. The closest-SINR based user association policy is replaced with that elaborated in Subsection II-B.

IV-B DRL for Solving $P_{3}$

$P_{3}$ exploits the mappings between $|\hat{\mathcal{S}}^{u}_{t}|$ and $N^{Srv}_{UAV}$ in different hours obtained via $P_{2}$, and aims to maximize its objective function by optimizing the UAVs' charging profiles over the considered time horizon. In each hour $t$, the DRL agent needs to determine whether each UAV should go charging, go serving, or go to the ground for energy saving, i.e., $\hat{\mathbf{a}}_{t}$, based on the UAVs' current battery residues and altitudes, the solar radiation intensity, and the user traffic demands. When designing the DRL algorithm, we treat the varying solar radiation and user traffic demands as the dynamics of the underlying environment. The key components of the algorithm are designed as follows.

IV-B1 State Space

The battery residue of a UAV is a critical factor in determining its next move; thus $\{E^{res}_{i,t}\}$, $\forall i\in\mathcal{S}_{UAV}$, denoting the residual battery of UAV $i$ at the beginning of hour $t$, is included in the state space. The current UAV altitude is another non-negligible factor, as altitude changes incur extra energy consumption which may accumulate and significantly impact the overall scheduling; minimizing unnecessary altitude changes while satisfying the constraints contributes positively to the optimization objective. Therefore, $\{H_{i,t}\}$, $\forall i\in\mathcal{S}_{UAV}$, which takes values $H_{Chg}$, $H_{Srv}$, or 0, is also included in the state space. Lastly, the hour index $t$ is needed to capture the dynamics of the environment (e.g., solar radiation and user traffic demand), so that different actions may be taken at different $t$ even if the rest of the state is the same. The complete state space is given below, with a dimension of $2N+1$.

$$\mathbf{S}_{t}=\{E^{res}_{i,t},H_{i,t},t\},\quad\forall i\in\mathcal{S}_{UAV}.\qquad(8)$$
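For concreteness, a minimal sketch of assembling the state vector of Eq. (8); the ordering within the vector is our choice.

```python
def make_state(E_res, H, t):
    """State of Eq. (8): N battery residues, N altitudes, and the hour index,
    flattened into a vector of dimension 2N + 1."""
    assert len(E_res) == len(H), "one residue and one altitude per UAV"
    return list(E_res) + list(H) + [t]
```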

IV-B2 Action Space

The decision variable of sub-problem $P_{3}$ is $\hat{\mathbf{a}}_{t}=(a_{1,t},\cdots,a_{N,t})$, denoting the altitudes that the UAVs will go to at the beginning of the current hour $t$. The action space is directly defined as $\mathbf{A}_{t}=\{a_{i,t}\}$, $\forall i\in\mathcal{S}_{UAV}$, where $a_{i,t}$ takes value 0 if the UAV goes to the ground, 1 if the UAV goes serving, and 2 if the UAV goes charging. The cardinality of the action space is $3^{N}$.

IV-B3 Reward Function Design

The reward function $r_{t}$ consists of three parts. The first part $r_{1,t}$ corresponds to the constraints of $P_{3}$. When any UAV breaks the sustainability constraint (C3.1), a constant penalty $p_{C_{1}}<0$ is applied. When constraint C3.2 is broken, i.e., the total number of serving UAVs $N^{Srv}_{UAV}$ cannot achieve the minimum user service rate $p_{min}$, a constant penalty $p_{C_{2}}\in(p_{C_{1}},0)$ is applied. In addition, when $N^{Srv}_{UAV}$ is larger than the minimum number of serving UAVs achieving a 100% user service rate, a reward of 0 is applied to prevent service over-provisioning.

The second part $r_{2,t}$ corresponds to maximizing the total number of served users over the entire time horizon; thus $r_{2,t}$ is set directly equal to $|\hat{\mathcal{S}}^{u}_{t}(\hat{\mathbf{a}}_{t},\hat{\mathbf{p}}^{*}_{t}(N^{Srv}_{UAV}))|$. The third part $r_{3,t}$ corresponds to maximizing the difference between the total harvested energy and the total consumed energy. Due to the time-varying solar radiation intensity, it is beneficial for a UAV that does not serve to land on the ground during some hours of a day (e.g., at night or around sunset/sunrise), while it is beneficial to go charging during other hours. In the former case, a positive reward is given for UAVs going to the ground to promote energy saving, whereas in the latter case, a positive reward is given for UAVs going charging to encourage energy harvesting. Therefore, $r_{3,t}$ is designed as

$$r_{3,t}=\begin{cases}c_{1}\cdot N^{Gnd}_{UAV},&\text{if landing is beneficial at }t;\\ c_{2}\cdot N^{Chg}_{UAV},&\text{if charging is beneficial at }t,\end{cases}\qquad(9)$$

where $c_{1}$ and $c_{2}$ are reward coefficients trading off between $A_{1}$ and $A_{3}-A_{2}$ in $P_{1}$, replacing the coefficient $C$ for $A_{1}$. The total instantaneous reward is $r_{t}=r_{1,t}+r_{2,t}+r_{3,t}$.
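The composite reward can be sketched as below. The penalty values, the coefficients, and the exact handling of over-provisioning (zeroing the coverage reward) are our assumptions within the rules stated above, not values from the paper.

```python
def reward(n_srv, n_min, n_full, users_served, n_gnd, n_chg,
           sustain_ok, landing_good, c1=0.1, c2=0.1, p1=-50.0, p2=-20.0):
    """Sketch of r_t = r1 + r2 + r3 (Sec. IV-B3).
    n_min: fewest serving UAVs meeting the p_min service rate (C3.2);
    n_full: fewest serving UAVs reaching a 100% service rate."""
    r1 = 0.0
    if not sustain_ok:          # some UAV fell below its battery floor (C3.1)
        r1 += p1
    if n_srv < n_min:           # minimum service rate broken (C3.2)
        r1 += p2
    # coverage reward, zeroed when over-provisioned beyond 100% coverage
    r2 = 0.0 if n_srv > n_full else users_served
    # energy shaping: reward grounding or charging depending on the hour
    r3 = c1 * n_gnd if landing_good else c2 * n_chg
    return r1 + r2 + r3
```

For instance, an hour with all constraints met, 100 served users, and two grounded UAVs during a "landing is beneficial" period yields 100 + 0.1·2 = 100.2.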

IV-B4 Relaxation of the Discrete Action Space

As the state space $\mathbf{S}_{t}$ mixes continuous and discrete components and the action space $\mathbf{A}_{t}$ is discrete, a Deep Q-learning (DQL) algorithm would typically be exploited. However, the cardinality of the action space, i.e., $3^{N}$, increases exponentially with the total number of UAVs $N$. For instance, with $N=15$, the total number of possible aggregate actions over all UAVs is $3^{15}\approx 1.4\times 10^{7}$. As the number of outputs of the deep Q network (DQN) equals the total number of possible actions, the resultant DQN would be prohibitively complicated, not to mention the number of hours in the considered time horizon. Therefore, DQL is technically feasible but practically infeasible.

Inspired by [14], we relax the original discrete action space into a continuous space and obtain the UAV charging profile $\hat{\mathbf{a}}_{t}$, $\forall t\in T$, by means of DDPG. Each action $a_{i,t}$ is relaxed from the discrete values $\{0,1,2\}$ to the continuous range $(-0.5,2.5)$. Hence, the relaxed action space becomes $\mathbf{\widetilde{A}}_{t}=\{a_{i,t}\}\in(-0.5,2.5)^{N}$. With DDPG, the number of outputs of the actor network equals the dimension of the action space, i.e., $N$, which increases only linearly with $N$ rather than exponentially as $3^{N}$ in DQN. Each time an aggregate action is determined by the actor network and perturbed with exploration noise, it is discretized to the closest value in $\{0,1,2\}$. The discretized action is the actual action applied to the current state and stored in the experience replay buffer. In this manner, the complexity of the exploration can be significantly reduced.
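The relaxation-and-discretization step can be sketched as follows; the Gaussian noise and its scale stand in for the DDPG exploration noise and are illustrative.

```python
import random

def discretize(raw):
    """Map a relaxed actor output in (-0.5, 2.5) to the nearest altitude
    action in {0, 1, 2} (0: ground, 1: serve, 2: charge); clamp handles
    noise that pushes the value outside the relaxed range."""
    return min(2, max(0, round(raw)))

def act(actor_output, sigma=0.3):
    """Perturb the continuous actor output with exploration noise, then
    discretize; the discretized vector is what gets executed on the
    environment and stored in the experience replay buffer."""
    noisy = [a + random.gauss(0.0, sigma) for a in actor_output]
    return [discretize(a) for a in noisy]
```

The actor thus needs only $N$ outputs regardless of how many UAVs there are, while the environment still receives valid discrete altitude decisions.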

V Numerical Results

V-A Simulation Setup

For sub-problem $P_{2}$, we reuse the simulation setup and parameter configurations of our previous work [4] to obtain the hourly mapping between the number of UAVs and the maximum number of served users. For sub-problem $P_{3}$, the environment parameters and the RL parameters are summarized in Tables I and II, respectively. A 24-hour time horizon is considered per episode, with each hour being a step. The training is conducted using the Reinforcement Learning Toolbox of Matlab 2022a on a Windows 10 server with an Intel Core i9-10920X CPU @ 3.50GHz, 64GB RAM, and a Quadro RTX 6000 GPU.

Note that considering only 24 hours does not guarantee that the same set of UAVs can work for consecutive days; it only ensures that the involved UAVs have sufficient battery residue to go charging on the next day. Once it is verified that a given set of UAVs can be sustained for a whole day, two sets of UAVs can serve on alternating days to achieve full sustainability.

UAV altitudes $(H_g, H_s, H_c)$: (0, 300, 1400) m
Max. solar radiation intensity above clouds $I_{max}$: 2000 W/m^2
Solar radiation intensity threshold $K_c$: 150 W/m^2
UAV charging efficiency $\eta_c$: 0.25
UAV maximum speeds $(v_{lv}, v_{up}, v_{dn})$: (6, 4, 4) m/s
UAV rotor disk radius $r_d$: 0.3 m
UAV charging panel area $A_{Chg}$: 1 m^2
UAV weight and air density $(W, \rho)$: (5x9.8 N, 1.225 kg/m^3)
UAV battery capacity $E_{cap}$: 600 Wh [15]
UAV static operational power $P_{static}$: 5 W
UAV drag profile $(C_{D0}, \sigma, v_T)$: (5e-4, 0.056, 150 m/s)
Minimum user service rate $p_{min}$: 0.85
TABLE I: Summary of Main Environment Parameters
Actor and critic networks: 2 hidden layers, each with 400 hidden nodes
Neural network learning rates: 0.0001 for critic and actor
Activation functions: tanh (actor output), ReLU (all remaining)
Regularization: L2 with $\lambda=0.0001$
Gradient threshold: 1
Smooth factor for target networks: 0.001
Update frequency for target networks: 1
Mini-batch size: 512
Action noise $(\sigma^2_{max}$, decay, $\sigma^2_{min})$: (1.5, 0.0001, 0.2)
Experience buffer capacity: $10^6$
Discount factor $\gamma$: 0.99
Max. steps per episode: 24
Max. episodes simulated: $10^5$
TABLE II: Reinforcement Learning Parameters

V-B Simulation Results

The major dynamics of the environment are presented in Fig. 2 first. The solar radiation is concentrated between 7am and 5pm, and the UAV charging rate above the clouds is positively related to the solar radiation. In Subfig. 2(b), more users request services in the late morning and afternoon, which is consistent with daily working hours. To satisfy a minimum user service rate of 85% per hour, the red bars show the minimum number of UAVs needed for serving.

Given the above dynamics and the constraints of sustainability and user demand, the convergence of the episodic rewards under our designed DRL algorithms is shown in Fig. 3. It can be seen that for the same number of UAVs, convergence is almost the same for different reward coefficients, yet a larger UAV set leads to a longer convergence time due to the increased dimensions of the state-action space.

Figure 2: Dynamics of Solar Radiation and User Demand in a Day: (a) solar radiation dynamics; (b) user demand dynamics.
Figure 3: Convergence of episodic reward for different numbers of UAVs and reward coefficients ($c_{1},c_{2}$). The episode rewards are averaged over a window size of 300 with 95% confidence intervals.
Figure 4: Performance measures of the proposed algorithm under different parameters: (a) hourly number of serving UAVs; (b) accumulated number of served users.

Fig. 4 details the achieved optimal charging profiles in terms of the hourly number of serving UAVs and the accumulated number of served users in one day. The baseline in Subfig. 4(a) gives the minimum number of serving UAVs required in each hour to satisfy the 85% user service rate. It can be observed that with smaller reward coefficients ($c_{1}$, $c_{2}$), the number of serving UAVs in each hour tends to increase. The reason is that smaller reward coefficients result in a smaller weight $C$ in optimization $P_{1}$; thus the RL agent tends to dispatch more UAVs to serve more users for more reward, rather than make UAVs go charging or idle. When more UAVs are available (e.g., 17 UAVs), more UAVs can serve in each hour when $c_{1}$ and $c_{2}$ are relatively low. More serving UAVs in each hour consequently bring more served users, which is confirmed by Subfig. 4(b).

VI Conclusions

In this paper, the optimal solar charging problem has been studied for a self-sustainable UAV communication network, considering dynamic solar radiation and user service demand. The problem has been formulated as a time-coupled optimization problem and decoupled into two sub-problems, and DRL algorithms have been designed to make the sub-problems tractable. Simulation results have demonstrated the efficacy of the designed algorithms in optimally trading off the communication performance against the net energy loss.

References

  • [1] A. Fotouhi, H. Qiang, M. Ding, M. Hassan, L. G. Giordano, A. Garcia-Rodriguez, and J. Yuan, “Survey on UAV cellular communications: Practical aspects, standardization advancements, regulation, and security challenges,” IEEE Communications Surveys & Tutorials, vol. 21, no. 4, pp. 3417–3442, 2019.
  • [2] M. Mozaffari, W. Saad, M. Bennis, Y.-H. Nam, and M. Debbah, “A tutorial on UAVs for wireless networks: Applications, challenges, and open problems,” IEEE Communications Surveys & Tutorials, vol. 21, no. 3, pp. 2334–2360, 2019.
  • [3] R. Zhang, M. Wang, and L. X. Cai, “SREC: Proactive self-remedy of energy-constrained UAV-based networks via deep reinforcement learning,” in GLOBECOM 2020 - 2020 IEEE Global Communications Conference. IEEE, 2020, pp. 1–6.
  • [4] R. Zhang, M. Wang, L. X. Cai, and X. Shen, “Learning to be proactive: Self-regulation of UAV-based networks with UAV and user dynamics,” IEEE Transactions on Wireless Communications, vol. 20, no. 7, pp. 4406–4419, 2021.
  • [5] Y. Sun, D. Xu, D. W. K. Ng, L. Dai, and R. Schober, “Optimal 3D-trajectory design and resource allocation for solar-powered UAV communication systems,” IEEE Transactions on Communications, vol. 67, no. 6, pp. 4281–4298, 2019.
  • [6] S. Sekander, H. Tabassum, and E. Hossain, “Statistical performance modeling of solar and wind-powered UAV communications,” IEEE Transactions on Mobile Computing, vol. 20, no. 8, pp. 2686–2700, 2020.
  • [7] J. Zhang, M. Lou, L. Xiang, and L. Hu, “Power cognition: Enabling intelligent energy harvesting and resource allocation for solar-powered UAVs,” Future Generation Computer Systems, vol. 110, pp. 658–664, 2020.
  • [8] S. Khairy, P. Balaprakash, L. X. Cai, and Y. Cheng, “Constrained deep reinforcement learning for energy sustainable multi-UAV based random access IoT networks with NOMA,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 4, pp. 1101–1115, 2020.
  • [9] E. Turgut, M. C. Gursoy, and I. Guvenc, “Energy harvesting in unmanned aerial vehicle networks with 3D antenna radiation patterns,” IEEE Transactions on Green Communications and Networking, vol. 4, no. 4, pp. 1149–1164, 2020.
  • [10] A. Kokhanovsky, “Optical properties of terrestrial clouds,” Earth-Science Reviews, vol. 64, no. 3-4, pp. 189–241, 2004.
  • [11] R. Zhang, M. Wang, X. Shen, and L.-l. Xie, “Probabilistic analysis on QoS provisioning for Internet of Things in LTE-A heterogeneous networks with partial spectrum usage,” IEEE Internet of Things Journal, vol. 3, no. 3, pp. 354–365, 2015.
  • [12] A. Al-Hourani, S. Kandeepan, and A. Jamalipour, “Modeling air-to-ground path loss for low altitude platforms in urban environments,” in 2014 IEEE Global Communications Conference. IEEE, 2014, pp. 2898–2904.
  • [13] “Small cells and health,” Available at https://www.gsma.com/publicpolicy/wp-content/uploads/2015/03/SmallCellForum_2015_small-cells_and_health_brochure.pdf, 2015.
  • [14] G. Dulac-Arnold, R. Evans, H. van Hasselt, P. Sunehag, T. Lillicrap, J. Hunt, T. Mann, T. Weber, T. Degris, and B. Coppin, “Deep reinforcement learning in large discrete action spaces,” arXiv preprint arXiv:1512.07679, 2015.
  • [15] “High power density light weight drone solid state lithium battery,” Available at https://unmannedrc.com/products/high-power-density-light-weight-drone-solid-state-lithium-battery.