BESS Aided Reconfigurable Energy Supply using Deep Reinforcement Learning for 5G and Beyond

Hao Yuan, Guoming Tang, Deke Guo, Kui Wu, Xun Shao, Keping Yu, Wei Wei

H. Yuan and D. Guo are with the Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, Hunan, China. G. Tang is with the Peng Cheng Laboratory, Shenzhen, Guangdong, China. K. Wu is with the Department of Computer Science, University of Victoria, Victoria, BC, Canada. X. Shao is with the School of Regional Innovation and Social Design Engineering, Kitami Institute of Technology, Kitami, Japan. K. Yu is with the Global Information and Telecommunication Institute, Waseda University, Shinjuku, Tokyo, Japan. W. Wei is with the School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, China. Corresponding authors: G. Tang and D. Guo.

Abstract

The year 2020 witnessed the unprecedented development of 5G networks, along with the widespread deployment of 5G base stations (BSs). Nevertheless, the enormous energy consumption of BSs and the resulting energy cost have become significant concerns for mobile operators. With the continuous decline in the cost of renewable energy, equipping the power-hungry BSs with renewable energy generators could be a sustainable solution. In this work, we propose an energy storage aided reconfigurable renewable energy supply solution for the BS, which supplies clean energy to the BS and stores surplus energy for backup usage. Specifically, to flexibly reconfigure the battery’s discharging/charging operations, we propose a deep reinforcement learning based reconfiguring policy, which can adapt to the dynamic renewable energy generation as well as the varying power demands. Our experiments using real-world data on renewable energy generation and power demands demonstrate that our reconfigurable power supply solution can achieve an energy saving ratio of $74.8\%$ compared to the case with traditional power grid supply.

Index Terms:

5G base stations, renewable energy, reconfigurable power supply, deep reinforcement learning

I Introduction

The 5G network is regarded as a promising technology that can significantly improve the way we live [1]. Compared to 4G/LTE, it offers users higher bandwidth and lower latency and thus enables various cutting-edge mobile services, such as the Internet of Vehicles [2, 3], Virtual Reality [4], and Smart Medical Home [5, 6]. Nevertheless, because 5G base stations (BSs) adopt high frequency bands, their signal coverage range is much shorter than that of 4G/LTE. Consequently, mobile operators need to deploy a large number of 5G BSs to tackle the problem of poor signal coverage. This results in an ultra-dense BS deployment, especially in “hotspot” areas, as illustrated in Fig. 1.

Building and operating BSs at such a scale requires an enormous investment and consumes many resources (e.g., electric power). According to field surveys in the cities of Guangzhou and Shenzhen, China, the full-load power consumption of a typical 5G BS is about $2\sim 3$ times that of a 4G BS [7]. Considering the ultra-dense deployment of 5G BSs, this could lead to a tenfold increase in energy consumption. In this regard, how to effectively reduce energy consumption becomes an urgent problem.

Figure 1: A vision of the future radio access network (RAN) in 5G and beyond, which consists of macro and small cells, and also includes the mobile and space BSs. For the purpose of green communication, all the BSs could be supplied by both the renewable energy and power grid.

Renewable energy sources such as solar and wind, as eco-friendly means of power supply with low $CO_{2}$ emissions, have been adopted in more and more scenarios in recent years. Owing to the continuing price decline of photovoltaic (PV) modules and wind turbines, the installation cost of renewable energy has dramatically decreased over the past decade; e.g., a 61% reduction in the cost of solar equipment from 2010 to 2017 is reported in [8]. Such cost reductions shorten the payback period of renewable energy investment from a couple of years to several months [9]. These observations indicate the great potential of renewable energy for replacing fossil fuels and reducing carbon emissions.

This has inspired mobile operators to utilize renewable energy as an auxiliary power supply to tackle the huge power demand at 5G BSs. In some developing countries, solar power has already been applied to supply the BSs, where it accounts for over $8\%$ of the total electricity usage in some cases [10]. With PV panels and wind turbines installed near the BSs, the maximum power from the solar and wind generators can reach up to 8.5 kW and 6.0 kW, respectively [10], which could remarkably cut down the energy drawn from the traditional power grid.

To maximize the utilization of renewable energy, energy storage can be used strategically so that energy is provided continuously, since renewable (e.g., solar or wind) energy is intermittent and unstable. Meanwhile, most BSs are equipped with backup batteries to safeguard the BS’s normal functioning against power outages, making them a natural choice for energy storage. Besides, with the continuous price decline in battery storage in recent years [11, 12], combining the battery storage with renewable energy generators could offer even greater cost-reduction potential. Specifically, i) when the generated renewable power is less than the power demand (e.g., during the peak hours), the battery can be discharged to flatten the peak power demands, and ii) when the generated renewable power is more than the power demand (e.g., during the off-peak hours), the battery can be charged to store the surplus renewable energy.

In this paper, we propose a battery energy storage system (BESS) aided renewable energy supply solution for the 5G network and beyond. Aiming at energy cost reduction for mobile operators, our solution maximizes the utilization of renewable energy and thus minimizes the utilization of the power grid (i.e., fossil energy). Specifically, the energy charge can be continuously reduced by the generated renewable power, and the demand charge can be reshaped and flattened through strategic battery discharging/charging operations.

When designing the optimal control strategy for battery discharging/charging operations, we are faced with several challenges. Firstly, the renewable energy generation and power demand vary greatly in both spatial and temporal dimensions and are thus hard to predict. Secondly, owing to the physical constraints of the battery discharging/charging operations (e.g., discharge/charge efficiency), it is complicated to design the optimal battery controlling policy. Thirdly, as the battery’s capacity and lifetime are limited and shrink with the discharge/charge cycles, it is necessary yet non-trivial to trade off the cost of battery degradation/replacement against the gain of renewable energy storage.

By tackling the above challenges, we make the following contributions in this work:

• We present the BESS aided reconfigurable renewable energy supply paradigm for 5G BS operations, in which the battery discharging/charging reconfiguration is modelled as an optimization problem. The model is comprehensive in that it takes into account the practical considerations of dynamic power demand and renewable energy generation, as well as battery specifications and physical constraints.
• To cope with the intermittent renewable energy generation and dynamic BS power demand, while keeping the computation complexity of the optimization problem under control, we propose a deep reinforcement learning (DRL) based battery discharging/charging reconfiguring policy, which can improve its decision-making efficiency through interacting with the environment.
• We conduct extensive evaluations using a real-world BS deployment scenario and BS traffic load traces. The results show that the proposed DRL-based battery discharging/charging reconfiguring policy can effectively utilize the renewable energy and cut down the energy cost.

The rest of the paper is organized as follows. In Sec. II, we introduce the background of this paper. In Sec. III, we give the system models and formulations of the problem, and then propose the BESS aided renewable energy supply solution in Sec. IV. We develop a DRL-based battery discharging/charging controlling policy in Sec. V. We evaluate the proposed method by experiments with a real-world dataset in Sec. VI. We present the related work in Sec. VII and conclude the paper in Sec. VIII.

II Background

II-A Base Station Power Demand

The power demand pattern of a BS is mainly determined by its location and associated with the behavior of users there. Usually the demand also shows a periodic pattern (e.g., with a one-day or one-week period). As shown in Fig. 2, in this paper we mainly consider three types of BSs, located in resident, office, and comprehensive areas, which account for nearly ninety percent of the total demands [13]. The characteristics of these power demand patterns are as follows.

• Power Demand of BSs at Resident Area: The power demands of this type of BSs increase rapidly in the evening, as most people stay at home after work. Compared with weekdays, the power demands stay at a high level on weekends.
• Power Demand of BSs at Office Area: The power demands of this type of BSs stay at a high level in the daytime, when most people are at work. Besides, since fewer people work on weekends, the weekend power demands are much lower than those on weekdays.
• Power Demand of BSs at Comprehensive Area: Due to the diversity of the requests, the power demand patterns of this type of BSs are more stable than those of the above two types: they constantly stay at a high level in the daytime and evening and drop to a valley in the late night and early morning.

The first two types of power demand patterns change relatively dramatically, leading to a huge energy-saving potential, especially for the demand charge, which will be discussed in the next section.

II-B Energy Cost of 5G BS

The energy cost of the mobile operator is typically made up of two components: i) the energy charge, which is based on the total amount of electricity consumed (in kWh) throughout the entire billing cycle (e.g., one month), and ii) the demand charge, which is based on the peak power demand (in kW) during the billing cycle. Specifically, the demand charge is regarded as a penalty for the extra load burden imposed on the power grid.

For example, for a commercial data center consuming 10 MW on peak and 6 MW on average, the monthly energy charge and demand charge amount to around \$24,000 and \$165,500, respectively [14]. The demand charge could be up to 8x the energy charge; therefore, effectively cutting down the demand charge could remarkably reduce the energy cost. However, there seems to be no practical way to flatten the peak power demands of 5G BSs, e.g., shifting the real-time demands of mobile users to the off-peak hours could lead to long delays for some classes of jobs [15].

III System Model

In this section, we present the system models, the basic assumptions, and the problem formulation. For clarity, the major notations used in this paper are explained in Table I.

III-A Scenario Overview

As illustrated in Fig. 3, the proposed BESS aided renewable energy supply solution deployed at each 5G BS mainly includes: i) a renewable energy generator, e.g., a PV panel and a wind turbine, which is deployed near the 5G BS system and generates renewable energy for the system, ii) a battery storage, which stores the surplus renewable energy and acts as a power source for the BS as needed, and iii) a controller, which obtains the environment state (i.e., the measurement data) and controls the battery discharging/charging operations through control signals. In addition to the standard meter, as shown in Fig. 3, a generation meter is installed in the BS power supply system to measure the renewable energy generation. Furthermore, following commands from the controller, the distribution panel is responsible for switching the power supply between the renewable energy and grid energy and ensures a continuous and stable electricity supply for the BS.

As the essential component of the BESS aided renewable energy supply solution, the controller determines how efficient this paradigm is. Specifically, at each scheduling point, the controller needs to decide the amount of power supply from either the battery or the power grid. The scheduling operations should be made upon the power demands and battery states in real-time, so that the utilization of renewable energy can be enhanced and the total energy cost can be minimized.

Note that the feasibility of such an implementation as illustrated by Fig. 3 has been preliminarily verified in practice. According to [16], small integrated renewable energy generators are provided by some commercial companies for the BS system, which are easily deployed in both open rural and crowded urban environments.

III-B BS Power Supply and Demand

The power of each 5G BS is mainly supplied by three parts: power grid, generated renewable energy, and storage energy. In particular, i) when the generated renewable energy is more than the power demand (e.g., during the off-peak hours), each 5G BS is only supplied by the renewable energy (i.e., off-grid) and the surplus renewable energy is stored in the battery storage, ii) when the generated renewable energy is less than the power demand (e.g., during the peak hours), each 5G BS is supplied by all three parts in a cooperative way.

In this paper, we consider a discrete time model, where the entire billing cycle (e.g., one month) is equally split into $T$ consecutive slots of length $\Delta t$, denoted by $\mathcal{T}=\{1,2,\cdots,T\}$. For an arbitrary 5G BS, the power demand during the entire billing cycle can be represented by a power demand vector:

d:=[d(1),d(2),\cdots,d(T)]

(1)

where $d(t)$ is the power demand in time slot $t$ , which can be obtained by power meter readings at each BS.

III-C Renewable Energy Generation

By harvesting energy from renewable energy resources, the BSs could be powered in an environmentally-friendly and cost-efficient way. In this paper, in order to make the model extensible, we denote the renewable energy generation vector as:

g:=[g(1),g(2),\cdots,g(T)]

(2)

In this work, we choose two typical renewable energy sources as the auxiliary power supply, i.e., solar energy (denoted by $g^{s}(t)$) and wind energy (denoted by $g^{w}(t)$). Accordingly, for an arbitrary time slot $t$, the renewable energy generation can be represented by:

g(t)=g^{s}(t)+g^{w}(t)

(3)

We assume that if the total generated renewable energy exceeds the power demand (i.e., $g(t)>d(t)$), the power is supplied in proportion to the renewable energy generated. The generation of both sources varies over a given period (e.g., one day) and is affected by similar factors, such as weather, temperature, and wind speed.

III-C1 Solar Energy Generation

Power generated by the solar PV system mainly depends on three factors: global horizontal irradiance ( $GHI(t)$ ), outdoor temperature ( $Temp(t)$ ), and time of day ( $ToD(t)$ ). By arranging solar PV cells in series/parallel, solar PV could harvest energy and convert it into DC to charge the battery storage and supply the power demand. The generated power by the solar PV at time slot $t$ can be measured by the following function:

g^{s}(t)=\mathbb{F}^{S}(GHI(t),Temp(t),ToD(t))

(4)

where $\mathbb{F}^{S}(\cdot)$ is a known, non-linear function defined in PVLIB [17]. Accordingly, the solar energy generation during the entire billing cycle can be represented by a vector:

g^{s}:=[g^{s}(1),g^{s}(2),\cdots,g^{s}(T)]

(5)

III-C2 Wind Energy Generation

Power generated by the wind turbine generator fluctuates randomly with time and mainly depends on the wind velocity ( $WV(t)$ ), weather system ( $WS(t)$ ), and hub height ( $HH(t)$ ). The wind turbine typically generates energy in two stages: it first converts the wind power into mechanical energy and then transforms the mechanical energy into electricity. The amount of power generated by the wind turbine at time slot $t$ can be calculated by the following function:

g^{w}(t)=\mathbb{F}^{W}(WV(t),WS(t),HH(t))

(6)

where $\mathbb{F}^{W}(\cdot)$ is a known, non-linear function defined in [18]. Accordingly, the wind energy generation during the entire billing cycle can be represented by a vector:

g^{w}:=[g^{w}(1),g^{w}(2),\cdots,g^{w}(T)]

(7)

III-D Battery Specification

At an arbitrary time slot $t$ , the state of the battery is modeled as follows:

\chi(t):=\langle SoE(t),SoC(t),DoD(t)\rangle

(8)

where SoE, SoC, and DoD represent the state of effective capacity, state of charge, and depth of discharge of the battery, respectively. Specifically, i) SoE indicates the current effective capacity of the battery, as a percentage of its initial capacity (denoted as $\pi$), ii) SoC indicates the current energy stored in the battery, as a percentage of the current effective capacity, and iii) DoD indicates how much energy the battery has released, as a percentage of the current effective capacity.

To simplify the optimization problem, we discretize the SoC of a battery into $M$ equal-spaced states (e.g., $M=10$, i.e., $\{10\%,20\%,\cdots,100\%\}$). Accordingly, the DoD is also discretized (e.g., releasing $10\%$ from $90\%$, i.e., $90\%\to 80\%$). Besides, for an arbitrary time slot $t$, in order to prevent the battery from over-discharging/charging, we use $SoC_{max}$ and $SoC_{min}$ to indicate the upper and lower bounds of the SoC, respectively:

SoC_{min}\leq{SoC}(t)\leq SoC_{max}

(9)
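The discretization and the bound in Eq. (9) can be illustrated with a short Python sketch. The function names and the default values $M=10$, $SoC_{min}=0.2$ and $SoC_{max}=1.0$ are assumptions chosen for illustration only; they are not taken from the paper.

```python
def soc_levels(m=10):
    """Return the M equally spaced SoC states {1/M, 2/M, ..., 1}."""
    return [k / m for k in range(1, m + 1)]


def snap_soc(soc, m=10):
    """Map a continuous SoC value (0..1) to the nearest discrete level."""
    k = min(max(round(soc * m), 1), m)  # keep the index inside 1..M
    return k / m


def within_bounds(soc, soc_min=0.2, soc_max=1.0):
    """Check constraint (9): SoC_min <= SoC(t) <= SoC_max (assumed bounds)."""
    return soc_min <= soc <= soc_max
```

For instance, a measured SoC of 0.87 would be mapped to the 90% state, and a state of 10% would violate the assumed lower bound of 20%.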

TABLE I: Summary of notations

Notation	Description
$d(t)$	power demand of 5G BS in time slot $t$
$g(t)$	renewable energy generation in time slot $t$
$b(t)$	battery discharging/charging operations in time slot $t$
$\chi(t)$	battery state in time slot $t$
$p(t)$	power supplied by the power grid in time slot $t$
$p_{max}$	peak power consumption supplied by the power grid
$\pi$	initial capacity of the battery
$\mathcal{C}^{e}(t)$	energy charge of 5G BS in time slot $t$
$\mathcal{C}^{d}(t)$	demand charge of 5G BS in time slot $t$
$\mathcal{C}^{u}(t)$	investment cost in time slot $t$
$\lambda_{e}$	prices of energy charge
$\lambda_{d}$	prices of demand charge
$\lambda_{u}$	prices of investment cost
$\alpha,\beta$	discharging and charging efficiencies, respectively
$R^{+},R^{-}$	maximum charging and discharging rates of the battery, respectively
$s(t)$	environment state in time slot $t$
$a(t)$	action taken by the agent in time slot $t$
$r(t)$	reward of the action in time slot $t$
$\psi$	mapping policy from environment states to actions
$R(a(t),s(t))$	reward function of the DQN
$Q,\tilde{Q}$	Q-values of the main net and target net, respectively
$\theta,\tilde{\theta}$	parameters of the main net and target net, respectively

IV BESS Aided Renewable Energy Supply

The battery storage is deployed at the 5G BS. It can be charged by the surplus renewable energy (generated by the solar PV and wind turbine systems) and discharged to reshape the power demand, so as to maximize the utilization of renewable energy (or minimize the utilization of fossil fuel) and reduce the electricity bill.

We define the battery discharging/charging operations by a battery operation vector:

{b}:=[{b}(1),{b}(2),\cdots,{b}(T)]

(10)

where $b(t)$ is a real-valued variable that indicates the amount of the discharging/charging operation. In detail, i) a positive value indicates discharging power from the battery storage to the 5G BS during time slot $t$, ii) a negative value indicates charging the battery storage from the renewable energy, and iii) a zero value indicates that no discharging/charging operation is performed.

Meanwhile, the discharging/charging operations are constrained by the maximum charging rate and the maximum discharging rate, denoted as $R^{+}$ and $R^{-}$, respectively. These are the largest powers with which the battery can be recharged and can supply the BS in a time slot:

-R^{+}\leq b(t)\leq R^{-}

(11)

Besides, the battery storage needs to meet the following conditions in discharging/charging operations:

b(t)\leq 0,\quad\text{if }g(t)-d(t)\geq 0

(12a)

b(t)>0,\quad\text{if }g(t)-d(t)<0

(12b)

These conditions state that the battery storage can only be charged when there is surplus renewable energy after supplying the 5G BS, and that the battery storage cannot be charged and discharged simultaneously in any time slot.

Due to the power loss (e.g., AC-DC conversion and battery leakage [19]) occurring during discharging from the battery storage to the power grid (or charging from renewable energy to the battery storage), we denote the actual discharging/charging operations from/to the battery by:

\tilde{b}(t)=\begin{cases}b(t)/\alpha,&\text{if }b(t)\leq 0\\ \beta\cdot b(t),&\text{if }b(t)>0\end{cases}

(13)

Given the power demand of the 5G BS (i.e., $d(t)$ ), the renewable energy generation (i.e., $g(t)$ ), and the battery discharging/charging operations (i.e., $b(t)$ ), we can derive the power consumption vector supplied by the power grid for an arbitrary time slot $t$ by:

p:=[p(1),p(2),\cdots,p(T)]

(14)

where $p(t)$ is denoted as:

p(t)=\begin{cases}\max\{0,\,d(t)-g(t)-\tilde{b}(t)\},&\text{if discharging}\\ \max\{0,\,d(t)-g(t)\},&\text{if charging}\end{cases}

(15)
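As a minimal sketch, Eqs. (11) to (15) can be chained in Python as follows. The function names are hypothetical, the efficiencies are passed in as plain numbers, and the idle case $b(t)=0$ is handled by the charging branch, which is an assumption since Eq. (15) does not list it separately.

```python
def actual_battery_power(b, alpha, beta):
    """Eq. (13): b/alpha when b <= 0 (charging or idle), beta*b when b > 0."""
    return b / alpha if b <= 0 else beta * b


def is_feasible(b, d, g, r_plus, r_minus):
    """Eqs. (11)-(12): rate limits, charge only on surplus, discharge only on deficit."""
    if not (-r_plus <= b <= r_minus):
        return False
    return b <= 0 if g - d >= 0 else b > 0


def grid_power(d, g, b, alpha, beta):
    """Eq. (15): power drawn from the grid in one slot (kW)."""
    if b > 0:  # discharging: the battery covers part of the deficit
        return max(0.0, d - g - actual_battery_power(b, alpha, beta))
    return max(0.0, d - g)  # charging or idle: the battery supplies nothing
```

With illustrative values $d=5$ kW, $g=2$ kW, $b=1$ kW and $\alpha=\beta=0.9$, the grid has to supply $5-2-0.9=2.1$ kW.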

IV-A Energy Cost

The billing policy for the energy cost of mobile operators over the entire billing cycle is typically made up of two components, the energy charge and the demand charge, and is widely applied in previous work [14, 15, 20]. We introduce them in detail as follows.

• Energy Charge: the cost of the total electricity consumed (in kWh) throughout the entire billing cycle, with the unit price (in \$/kWh) denoted by $\lambda_{e}$.
• Demand Charge: the cost of the peak power drawn from the power grid (in kW) during the entire billing cycle, with the unit price (in \$/kW) denoted by $\lambda_{d}$.

Therefore, the incurred cost of energy charge of the whole system in each time slot $t$ can be represented by:

\mathcal{C}^{e}(t)=\lambda_{e}\cdot p(t)\cdot\Delta t

(16)

Accordingly, the incurred cost of demand charge of the whole system in each time slot $t$ can be represented by:

\mathcal{C}^{d}(t)=\lambda_{d}\cdot\max\big\{0,\,p(t)-p_{max}\big\}

(17)

where $p_{max}$ records the peak power consumption during the past $t-1$ time slots. For any arbitrary time slot $t$ , if $p(t)-p_{max}>0$ , $p_{max}$ will be updated to $p(t)$ accordingly.
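The per-slot charges in Eqs. (16) and (17) can be accumulated over a billing cycle as sketched below. The function name, the prices and the initial value $p_{max}=0$ are illustrative assumptions. With that initial value, the incremental demand charges add up to $\lambda_{d}$ times the overall peak, which is the usual form of a demand charge.

```python
def bill(p, lambda_e, lambda_d, dt):
    """Accumulate Eqs. (16)-(17) over a grid-power series p (kW per slot)."""
    p_max = 0.0           # running peak, assumed to start at zero
    energy = demand = 0.0
    for p_t in p:
        energy += lambda_e * p_t * dt               # Eq. (16)
        demand += lambda_d * max(0.0, p_t - p_max)  # Eq. (17)
        p_max = max(p_max, p_t)                     # update the recorded peak
    return energy, demand, p_max
```

For the series `[2.0, 5.0, 3.0, 6.0]` only the first, second and fourth slots raise the peak, so only those slots contribute to the demand charge.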

IV-B Investment Cost

Every usage of this equipment (solar PV, wind turbine, and battery storage) incurs a certain reduction of its lifetime, which is essential for the investor. Therefore, it is significant to understand, detail and quantify the various factors influencing the performance loss curves. For the accuracy of our model, we quantify the investment cost in every time slot as follows.

IV-B1 Renewable Energy Generator Cost

As the modules of a renewable energy generation system age, they gradually lose some performance. In this paper, we assume that the decline of the system is linear and positively related to its usage time. We denote the lifetime of the renewable energy generator as $L$, which indicates the total time it can be used. For an arbitrary time slot $t$, the remaining lifetime of the renewable energy generator is denoted as $l(t)$, which is constrained by $0\leq l(t)\leq L$. The renewable energy generator has to be discarded and replaced by a new one if $l(t)\leq 0$. Given the remaining lifetime of the renewable energy generator at time $t-1$, the remaining lifetime at time $t$ is updated by:

l(t)=l(t-1)-\Delta t\cdot u(t)

(18)

where $u(t)$ is defined by:

u(t)=\begin{cases}1,&\text{if using}\\ 0,&\text{if not using}\end{cases}

(19)

We formulate the using cost of the renewable energy generator in each time slot $t$ as:

\mathcal{C}^{u}(t)=\lambda\cdot\frac{\Delta t\cdot u(t)}{L}

(20)

where $\lambda$ is the investment cost of a new renewable energy generator.
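A small Python sketch of Eqs. (18) to (20) is given below; the function name and the numbers in the example are hypothetical. Under this linear model, the per-slot costs of a generator that is used for its whole lifetime $L$ sum to the investment cost $\lambda$.

```python
def step_generator(l_prev, in_use, dt, lifetime, price):
    """Advance the remaining lifetime and return the using cost of one slot."""
    u = 1 if in_use else 0             # Eq. (19)
    l_now = l_prev - dt * u            # Eq. (18)
    cost = price * dt * u / lifetime   # Eq. (20)
    return l_now, cost
```

For example, one hour of use of a generator with a 1000-hour lifetime and a price of 5000 costs 5 under these assumptions.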

We extend the model of the renewable energy generator to specific systems, i.e., the solar PV system and the wind turbine system. In detail, i) for the solar PV system, we denote the lifetime, the using cost, and the investment as $l^{s}(t)$, $\mathcal{C}^{u_{s}}(t)$, and $\lambda_{s}$, respectively, and ii) for the wind turbine system, we denote the lifetime, the using cost, and the investment as $l^{w}(t)$, $\mathcal{C}^{u_{w}}(t)$, and $\lambda_{w}$, respectively. Accordingly, we can derive the using cost of the solar PV system and the wind turbine system by replacing the corresponding symbols in Eq. 20.

IV-B2 Battery Storage Degradation Cost

Every cycle of discharge/charge operation does some “harm” to the battery (typically lead-acid) and reduces its capacity and lifetime. In particular, a deep discharge severely affects its internal structure and can even permanently damage the battery (e.g., in the case of over-discharging). The battery has to be discarded and replaced by a new one when the effective capacity drops to the “ineffective” level, denoted by $SoE_{ine}$ in this paper.

As illustrated in Fig. 4, each level of DoD corresponds to a number of discharge/charge cycles; thus, we can formulate the battery storage degradation cost from the relationship between the two. Given the state of the battery at time slot $t$, i.e., $\langle SoE(t),SoC(t),DoD(t)\rangle$, the SoE decrease of the battery during this time slot can be measured by:

\Delta{SoE}(t)=\begin{cases}0,&\text{if }b(t)\leq 0\\ \frac{1-SoE_{ine}}{h\left(DoD(t-1)+\Delta{DoD}(t)\right)},&\text{if }b(t)>0\end{cases}

(21)

where $h(\cdot)$ maps from an input DoD level to the total number of discharge/charge cycles (exemplified in Fig. 4), and $\Delta{DoD}(t)$ gives the increase of DoD and can be calculated by:

\Delta{DoD}(t)=\frac{b(t)\Delta{t}}{\pi}

(22)

With the above expression of SoE decrease in each time slot $t$ , we can then formulate the degradation cost of the battery storage at each time slot $t$ as:

\mathcal{C}^{u_{b}}(t)=\lambda_{b}\cdot\Delta{SoE}(t)

(23)

where $\lambda_{b}$ is a coefficient converting the battery degradation to a monetary cost, with the unit of “$/SoE decrease”.
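Eqs. (21) to (23) can be sketched in Python as follows. The cycle-life curve `cycles_to_failure` is a purely hypothetical stand-in for $h(\cdot)$, since the actual curve is given only graphically in Fig. 4; the remaining names and the example values are likewise assumptions.

```python
def cycles_to_failure(dod):
    """Hypothetical stand-in for h(.): fewer cycles at deeper discharge."""
    return 500.0 / dod


def delta_dod(b, dt, capacity):
    """Eq. (22): DoD increase caused by discharging b kW for dt hours."""
    return b * dt / capacity


def degradation_cost(b, dod_prev, dt, capacity, soe_ine, lambda_b):
    """Eqs. (21) and (23): monetary cost of the SoE lost in one slot."""
    if b <= 0:  # charging or idle: no SoE decrease in this model
        return 0.0
    dod_now = dod_prev + delta_dod(b, dt, capacity)
    d_soe = (1.0 - soe_ine) / cycles_to_failure(dod_now)  # Eq. (21)
    return lambda_b * d_soe                               # Eq. (23)
```

Because the denominator shrinks as the DoD grows, the same discharge is charged a higher degradation cost when it starts from a deeper DoD.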

To sum up, the total investment cost in each time slot $t$ can be calculated as:

\mathcal{C}^{u}(t)=\mathcal{C}^{u_{s}}(t)+\mathcal{C}^{u_{w}}(t)+\mathcal{C}^{u_{b}}(t)

(24)

IV-C Optimization Formulation and Difficulty Analysis

The battery discharging/charging operations are controlled by the controller. Given the state of the battery storage in time slot $t-1$ (i.e., $\chi(t-1)$), the state in time slot $t$ can be updated by:

\chi(t)\leftarrow\begin{cases}SoE(t)=SoE(t-1)-\Delta SoE(t)\\ SoC(t)=SoC(t-1)-b(t)\Delta t/\pi\\ DoD(t)=DoD(t-1)+\Delta DoD(t)\end{cases}

(25)
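The state transition in Eq. (25) can be written as a small Python sketch. The `BatteryState` class and `update_state` function are hypothetical names; the SoE decrease is passed in as an argument so that any model of Eq. (21) can be plugged in. The update is applied literally, so a negative $b(t)$ (charging) raises the SoC and lowers the DoD.

```python
from dataclasses import dataclass


@dataclass
class BatteryState:
    soe: float  # effective capacity, fraction of the initial capacity
    soc: float  # stored energy, fraction of the effective capacity
    dod: float  # released energy, fraction of the effective capacity


def update_state(prev, b, dt, capacity, delta_soe):
    """Eq. (25): advance the battery state by one time slot."""
    delta_dod = b * dt / capacity  # Eq. (22)
    return BatteryState(
        soe=prev.soe - delta_soe,
        soc=prev.soc - b * dt / capacity,
        dod=prev.dod + delta_dod,
    )
```

For example, discharging 1 kW for one hour from a 10 kWh battery moves the SoC from 90% to 80% and the DoD from 10% to 20%.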

For the entire billing cycle $\mathcal{T}$ , we need to find the optimal battery discharging/charging controlling policy to solve the optimization problem, so as to minimize the total electricity bill during the entire billing cycle, which is defined as follows.


\min_{b(t)}\ \sum_{t=1}^{T}\big(\mathcal{C}^{e}(t)+\mathcal{C}^{d}(t)+\mathcal{C}^{u}(t)\big)

(26a)

\text{s.t. }(\ref{eq:8}),(\ref{con-21}),(\ref{eq:11}),\text{ and }(\ref{con-23}),\ \forall t\in\mathcal{T}

(26b)

When solving the above optimization problem, however, we are faced with the following three challenges.

IV-C1 Uncertainty of Renewable Energy

Renewable energy generation is affected by multiple factors such as outdoor temperature and wind velocity. Owing to the unpredictable and intermittent nature of these factors, it is hard to accurately forecast renewable energy generation (i.e., $g(t)$) and, without accurate information in advance, to make the optimal discharging/charging operations (i.e., $b(t)$) of the battery storage. Therefore, we need a method that tackles the uncertainty of renewable energy generation.

IV-C2 Dynamic of Power Demand

In our modeled problem, we assume the power demand (i.e., $d(t)$) is known in advance, so that the problem can essentially be optimized offline. However, such an assumption is unrealistic in practice. In fact, traditional offline optimization methods (e.g., dynamic programming [22, 23]) can hardly find the globally optimal solution, as the power demand can be obtained only when the workload arrives at the 5G BS. Thus, an online method that deals with the dynamic power demands (i.e., $d(t)$) and makes optimal discharging/charging operations (i.e., $b(t)$) is greatly needed.

IV-C3 High Computation Complexity

The optimization problem in Eq. 26 has embedded NP-hard subproblems. Firstly, in every time slot $t$, the controller needs to search the action space (mainly determined by $M$) to find the optimal discharging/charging operation (i.e., $b(t)$). To simplify the optimization problem, in this paper we discretize the SoC of the battery into $M$ equal-spaced states; in a real scenario, however, the state of the battery is continuous, which leads to an enormous search space. Secondly, during the entire billing cycle (i.e., $\mathcal{T}$), it is challenging for the controller to continuously make the optimal discharging/charging operation.

To tackle the above three challenges, we propose an online discharging/charging operation controlling method based on deep reinforcement learning (DRL) in the following section.

V A DRL-based Battery Operation Approach

Recent breakthroughs in deep reinforcement learning (DRL) [24] provide a promising technique for effective experience-driven control, which exploits past experience (e.g., historical battery discharging/charging operations) for better decision-making by adapting to the current state of the environment. We consider DRL particularly suitable for online discharging/charging operation control because: i) it is capable of handling a high-dimensional state space (as in AlphaGo [25]), which is an advantage over traditional Reinforcement Learning (RL) [26], and ii) it is able to deal with highly dynamic, time-variant environments such as time-varying power demand and renewable energy generation. Next, we introduce the basic components and concepts of DRL and the proposed DRL-based battery discharging/charging controlling policy in detail.

V-A Components & Concepts

A typical DRL framework consists of five key components: agent, state, action, policy, and reward. The concept and design of each component in our DRL-based battery discharging/charging controlling policy are explained as follows.

• Agent: The role of the agent is to make decisions in every episode by interacting with the environment. Specifically, at the beginning of each time slot, it determines the discharging/charging operations (i.e., $b(t)$) according to the current state (e.g., $d(t)$, $g(t)$ and $\chi(t)$) of the environment. The objective is to find an optimal battery discharging/charging controlling policy to minimize the total electricity bill during the entire billing cycle.
• State: At each episode, the agent first observes the state of the current environment in order to take an action. To take the optimal action at each episode, the current state should cover as much information as possible. In this paper, we define the state vector of the current environment as $s(t)=[d(t),g(t),\chi(t),p_{max}]$, which includes the current information on the power demand, the renewable energy generation, the battery storage and the peak power consumption.
• Action: After observing the state of the environment, the agent takes an action accordingly. In our problem, the action is to control the battery discharging/charging operation in each time slot, i.e., $b(t)$; specifically, i) whether the battery should be discharged or charged, and ii) how much energy should be discharged or charged. We denote the action taken at time $t$ by $a(t)$, which is equivalent to $b(t)$.
• Policy: The battery discharging/charging controlling policy $\psi(s(t)):\mathcal{S}\to\mathcal{A}$ defines the mapping from the state space to the action space, where $\mathcal{S}$ and $\mathcal{A}$ represent the state space and the action space, respectively. Specifically, the controlling policy can be represented by $a(t)=\psi(s(t))$, which maps the state of the environment to the action at time slot $t$.
• Reward: After interacting with the environment, the agent receives a reward $r(t)$ (calculated by the reward function $R(s(t),a(t))$), which indicates the effect of the action in this episode and is used to update the controlling policy. The objective of the agent is to find a policy $\psi$ that maximizes the total reward through continuous interaction with the environment. The design of the reward function significantly affects the performance of the DRL-based algorithm, and we introduce it in detail in the next subsection.

To sum up, in each time slot, the agent observes the state $s(t)$, takes an action $a(t)$ generated by the policy $\psi$, and receives a reward $r(t)$ calculated by the reward function $R(s(t),a(t))$. The objective of the proposed DRL-based battery discharging/charging controlling policy is to take the optimal action in every time slot so as to maximize the total reward.
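The components above can be written down as plain data types. The following is a minimal sketch rather than the authors' implementation: the field names, the discretized action set `ACTIONS`, the sign convention (negative means discharge, positive means charge), and the placeholder policy `idle_policy` are all assumptions made for illustration.

```python
from typing import Callable, NamedTuple, Sequence


class State(NamedTuple):
    """State vector s(t) = [d(t), g(t), chi(t), p_max]."""
    demand: float      # d(t): power demand of the BS
    generation: float  # g(t): renewable energy generation
    stored: float      # chi(t): energy currently stored in the battery
    peak: float        # p_max: peak power consumption observed so far


# Hypothetical discretized action set for b(t).
# Assumed convention: negative = discharge, positive = charge, 0 = idle.
ACTIONS: Sequence[float] = (-1.0, -0.5, 0.0, 0.5, 1.0)

# A policy psi maps a state to an action: psi: S -> A.
Policy = Callable[[State], float]


def idle_policy(state: State) -> float:
    """Placeholder policy that never touches the battery."""
    return 0.0


def act(policy: Policy, state: State) -> float:
    """Query the policy once and check that the action is admissible."""
    action = policy(state)
    if action not in ACTIONS:
        raise ValueError(f"inadmissible action: {action}")
    return action
```

A learned policy would replace `idle_policy` while keeping the same signature.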

V-B Reward Function Design

At the end of each time slot, the agent evaluates the performance of the action using a reward function, which transforms the performance statistics into a numerical utility value. For an arbitrary time $t$, the agent observes the state $s(t)$, takes the action $a(t)$, and adopts the following reward function to assess the performance of the controlling action:

$$R(s(t),a(t))=\exp\big(V^{e}(t)+V^{d}(t)+V^{u}(t)\big) \tag{27}$$

in which:

• $V^{e}(t)=-\mathcal{C}^{e}(t)$ measures the reward of the incremental energy charge caused by the action in time slot $t$.

• $V^{d}(t)=-\mathcal{C}^{d}(t)$ measures the reward of the incremental demand charge caused by the action in time slot $t$.

• $V^{u}(t)=-\mathcal{C}^{u}(t)$ measures the reward of the investment cost caused by the action in time slot $t$.
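As a minimal sketch of Eq. (27), the reward can be computed from the three incremental costs of a time slot. The cost terms $\mathcal{C}^{e}(t)$, $\mathcal{C}^{d}(t)$ and $\mathcal{C}^{u}(t)$ are defined elsewhere in the paper, so here they are simply taken as inputs; the function and argument names are illustrative.

```python
import math


def reward(energy_cost: float, demand_cost: float, investment_cost: float) -> float:
    """R = exp(V_e + V_d + V_u), with each V being the negated incremental cost."""
    v_e = -energy_cost      # V^e(t) = -C^e(t)
    v_d = -demand_cost      # V^d(t) = -C^d(t)
    v_u = -investment_cost  # V^u(t) = -C^u(t)
    return math.exp(v_e + v_d + v_u)
```

Because of the exponential, the reward lies in $(0,1]$ for non-negative costs and decreases monotonically as any of the three costs grows.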

At the end of each time slot, the agent evaluates the performance of the action by the reward $r(t)$ calculated by the reward function $R(s(t),a(t))$ . In the DRL-based framework, the objective is to maximize the expected cumulative discounted reward:

$$r(t)=\mathbb{E}\Big[\sum_{k=t}^{\infty}\gamma^{k-t}R(s(k),a(k))\Big] \tag{28}$$

where $\gamma\in(0,1]$ is a factor discounting future rewards.
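For a finite sequence of observed per-slot rewards, the discounted sum behind Eq. (28) can be accumulated backwards. This is an illustrative sketch: it assumes the exponent of $\gamma$ is counted from the current slot (the usual convention), and it works on one sampled trajectory rather than on the expectation.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_i gamma**i * rewards[i], where index 0 is the current slot."""
    total = 0.0
    # Backward accumulation: G_i = r_i + gamma * G_{i+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With $\gamma=0.9$, three consecutive rewards of 1 give $1+0.9+0.81=2.71$.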

V-C Learning Process Design

The learning process of the algorithm adopts a deep neural network (DNN), called the Deep Q-Network (DQN), to derive the correlation between each state-action pair $(s(t),a(t))$ and its value function $Q(s(t),a(t))$, which is the expected discounted cumulative reward. If the environment is in state $s(t)$ and the agent takes action $a(t)$, the value function of the state-action pair $(s(t),a(t))$ can be represented as:

$$Q(s(t),a(t))=\mathbb{E}\big[r(t)\,\big|\,s(t),a(t)\big] \tag{29}$$

After obtaining the value of each state-action pair $(s(t),a(t))$, the agent selects the action $a(t)$ with the $\epsilon$-greedy policy $\psi$. Following the convention of Alg. 1, it chooses the action with the maximum $Q(s(t),a(t))$, i.e., $\arg\max_{a(t)}Q(s(t),a(t))$, with probability $\epsilon$, and selects a random action with probability $1-\epsilon$.
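The selection rule can be sketched as follows. Since Alg. 1 assigns probability $\epsilon$ to the greedy branch, the sketch takes the greedy probability as an explicit argument (`greedy_prob`, an illustrative name) so that the convention is unambiguous; the list `q_values` stands in for the DQN outputs for one state.

```python
import random
from typing import Sequence


def select_action(q_values: Sequence[float], greedy_prob: float, rng=random) -> int:
    """Return the index of the chosen action.

    With probability `greedy_prob` the action with the largest Q-value is
    taken; otherwise an action index is drawn uniformly at random.
    """
    if rng.random() < greedy_prob:
        return max(range(len(q_values)), key=lambda a: q_values[a])
    return rng.randrange(len(q_values))
```

Passing a seeded `random.Random` instance as `rng` makes the exploration reproducible.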

As illustrated in Fig. 5, two effective techniques were introduced in [24] to improve stability: replay buffer and target network. Specifically,

• Replay Buffer: Unlike traditional reinforcement learning, DQN applies a replay buffer to store state transition samples in the form of $\langle s(t),a(t),r(t),s(t+1)\rangle$ collected during learning. Every $\kappa$ time steps, the DRL-based agent updates the DNN with a mini-batch of experiences from the replay buffer by means of stochastic gradient descent (SGD): $\theta_{i+1}=\theta_{i}-\sigma\nabla_{\theta}Loss(\theta)$, where $\sigma$ is the learning rate. Compared with Q-learning (which only uses immediately collected samples), randomly sampling from the replay buffer allows the DRL-based agent to break the correlation between sequentially generated samples and to learn from more independently and identically distributed past experiences. Thus, the replay buffer can smooth out learning and avoid oscillations or divergence.

• Target Network: There are two neural networks with the same structure but different parameters in DQN, the main net and the target net. $Q(s,a;\theta)$ and $Q(s,a;\tilde{\theta})$ represent the current Q-value and the target Q-value generated by the main net and the target net, respectively. The DRL-based agent uses the target net to estimate the target Q-value $\tilde{Q}$ for training the DQN. Every $\tau$ time steps, the target net copies the parameters from the main net, whose parameters are updated in real time. With the target net, the target Q-value remains unchanged for a period of time, which reduces the correlation between the current Q-value and the target Q-value and improves the stability of the algorithm.
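Both techniques are small data-structure operations, as the following sketch illustrates. It is not the authors' code: the class name `ReplayBuffer`, the helper `sync_target`, and the representation of network parameters as a dictionary of float lists are assumptions chosen to keep the example free of any deep learning library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of <s(t), a(t), r(t), s(t+1)> transitions."""

    def __init__(self, capacity: int):
        # A deque with maxlen drops the oldest transition once it is full.
        self.storage = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state) -> None:
        self.storage.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        """Draw a uniform random mini-batch without replacement."""
        k = min(batch_size, len(self.storage))
        return random.sample(list(self.storage), k)

    def __len__(self) -> int:
        return len(self.storage)


def sync_target(main_params: dict, target_params: dict) -> None:
    """Copy the main-net parameters into the target net (done every tau steps)."""
    target_params.clear()
    target_params.update({name: list(w) for name, w in main_params.items()})
```

Copying the weight lists, rather than sharing them, keeps the target net frozen while the main net continues to be updated.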

Accordingly, the DQN can be trained by the loss:

$$Loss(\theta)\leftarrow\mathbb{E}\big[(\tilde{Q}-Q(s(t),a(t);\theta))^{2}\big] \tag{30}$$

where $\theta$ denotes the network parameters of the main net, and $\tilde{Q}$ is the target Q-value, calculated by:

$$\tilde{Q}\leftarrow r(t)+\gamma\max_{a(t+1)}Q(s(t+1),a(t+1);\tilde{\theta}) \tag{31}$$

where $\tilde{\theta}$ denotes the network parameters of the target net, which are updated every $\tau$ time slots by copying from the main net.
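The target of Eq. (31) and the loss of Eq. (30) can be sketched for a mini-batch as follows. To stay library-free, the sketch assumes that the Q-values have already been evaluated: `q_main_sa` is the main-net estimate $Q(s(t),a(t);\theta)$ and `next_q_target` is the list of target-net values $Q(s(t+1),\cdot;\tilde{\theta})$ over all actions. These names, and the `terminal` flag taken from Alg. 1, are illustrative.

```python
from typing import List, Sequence, Tuple


def td_target(r: float, next_q_target: Sequence[float], gamma: float,
              terminal: bool = False) -> float:
    """Target Q-value: r if the episode ends, else r + gamma * max_a Q_target."""
    if terminal:
        return r
    return r + gamma * max(next_q_target)


def batch_loss(batch: List[Tuple[float, float, Sequence[float], bool]],
               gamma: float) -> float:
    """Mean squared error between targets and main-net Q-values.

    Each batch item is (q_main_sa, r, next_q_target, terminal).
    """
    errors = [(td_target(r, nq, gamma, term) - q) ** 2
              for q, r, nq, term in batch]
    return sum(errors) / len(errors)
```

In a DNN implementation, the gradient of this loss would be taken with respect to $\theta$ only, with the target treated as a constant.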

Algorithm 1: Battery Controlling Algorithm with DRL

Input: Power demand of BS $d(t)$ and renewable energy generation $g(t)$, $1\leq t\leq T$

Output: Discharging/charging actions $a(t)$, $1\leq t\leq T$

Initialize replay buffer (RB) to capacity $N$;
Initialize main net $Q$ with random weights $\theta$;
Initialize target net $\tilde{Q}$ with weights $\tilde{\theta}=\theta$;
for $episode=1:MaxLoop$ do
    for $t=1:T$ do
        Get environment state $s(t)$;
        Select $a(t)=\left\{\begin{array}{l}\mbox{argmax}_{a}Q(s(t),a;\theta),\mbox{ prob. }\epsilon\\ \mbox{random action,}\mbox{ prob. }1-\epsilon\end{array}\right.$;
        Execute action $a(t)$ and receive $r(t)$ and $s(t+1)$;
        Store $\langle s(t),a(t),r(t),s(t+1)\rangle$ into RB;
        Randomly sample a mini-batch of experiences $\langle s(i),a(i),r(i),s(i+1)\rangle$ from RB every $\kappa$ steps;
        Set $\tilde{Q}=\left\{\begin{array}{l}r(t),\mbox{ terminates at step }t+1\\ r(t)+\gamma\max_{a(t+1)}Q(s(t+1),a(t+1);\tilde{\theta}),\mbox{ else}\end{array}\right.$;
        Perform SGD on $(\tilde{Q}-Q(s,a;\theta))^{2}$ w.r.t. $\theta$;
        Set $\tilde{Q}=Q$ (i.e., $\tilde{\theta}=\theta$) every $\tau$ steps;
    end for
end for

To sum up, the learning process is depicted by the pseudo-code in Alg. 1. The controller first initializes the replay buffer and the parameters (i.e., $\theta$ and $\tilde{\theta}$) of the main net and the target net, respectively. After obtaining the value of each state-action pair $(s(t),a(t))$, the agent selects the action $a(t)$ with the $\epsilon$-greedy policy $\psi$, performs the action $a(t)$, and interacts with the environment. Next, the agent receives the reward $r(t)$, observes the next state $s(t+1)$ of the environment, and stores the transition $\langle s(t),a(t),r(t),s(t+1)\rangle$ into the RB. Every $\kappa$ time steps, the agent updates the DNN according to Eq. (30) with a mini-batch of experiences from the replay buffer by means of stochastic gradient descent (SGD). The target net copies the parameters of the main net every $\tau$ time steps. During the learning process, we set the learning rate $\sigma$ to 0.001, the $\epsilon$ of the $\epsilon$-greedy method to 0.9, the discount factor $\gamma$ to 0.9, and the step parameters $\tau$ and $\kappa$ both to 2000.
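The control flow of Alg. 1 can be sketched end to end in a few lines. This is a deliberately simplified illustration, not the authors' implementation: two lookup tables stand in for the main and target DNNs (so the "SGD" step is a tabular gradient step on the squared error), the environment is a hypothetical callable `env_step(state, action) -> (reward, next_state)`, and the small default values of `kappa` and `tau` are chosen only so that the toy example updates often.

```python
import random
from collections import deque


def train(env_step, initial_state, n_actions, horizon, episodes,
          gamma=0.9, lr=0.001, greedy_prob=0.9, kappa=4, tau=16,
          batch_size=8, capacity=1000, seed=0):
    """Tabular stand-in for Alg. 1; returns the main Q-table."""
    rng = random.Random(seed)
    main_q, target_q = {}, {}            # stand-ins for theta and theta-tilde
    buffer = deque(maxlen=capacity)      # replay buffer

    def q(table, s):
        # Unseen states start with all-zero Q-values.
        return table.setdefault(s, [0.0] * n_actions)

    step = 0
    for _ in range(episodes):
        s = initial_state
        for t in range(horizon):
            # Greedy with probability greedy_prob, random otherwise (Alg. 1).
            if rng.random() < greedy_prob:
                a = max(range(n_actions), key=lambda i: q(main_q, s)[i])
            else:
                a = rng.randrange(n_actions)
            r, s_next = env_step(s, a)
            buffer.append((s, a, r, s_next, t == horizon - 1))
            step += 1
            if step % kappa == 0:        # learn every kappa steps
                k = min(batch_size, len(buffer))
                for bs, ba, br, bn, done in rng.sample(list(buffer), k):
                    target = br if done else br + gamma * max(q(target_q, bn))
                    q(main_q, bs)[ba] += lr * (target - q(main_q, bs)[ba])
            if step % tau == 0:          # sync target every tau steps
                target_q = {key: list(v) for key, v in main_q.items()}
            s = s_next
    return main_q


def toy_env(state, action):
    """Hypothetical 3-state environment: action 1 pays 1, action 0 pays 0."""
    return (1.0 if action == 1 else 0.0), (state + 1) % 3
```

For instance, `train(toy_env, 0, 2, horizon=10, episodes=20, lr=0.1)` returns a table keyed by the three toy states, with one Q-value per action.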

VI Performance Evaluation

We evaluate the performance of the proposed DRL-based battery discharging/charging controlling policy through extensive numerical analysis.

VI-A Experiment Setup

VI-A1 BS and Power Consumption Data

To show the performance of the proposed method, we mainly consider 5G BSs deployed in three kinds of areas, i.e., resident, office, and comprehensive areas, whose power consumption over a one-week period is illustrated in Fig. 2. We assume that BSs of the same type consume the same power in different cities (e.g., Beijing, Shanghai and Guangzhou). For simplicity, we denote the BSs deployed in the resident, office, and comprehensive areas as type I, type II, and type III, respectively. We apply the BESS aided renewable energy supply solution to different types of BSs in different cities under different weather conditions and evaluate its performance through extensive simulation experiments.

VI-A2 Renewable Energy Generation Data

In Sec. III-C, we introduced the factors that impact the generation of renewable energy. For simplicity, we divide the weather conditions into three types for each generator, and the output power patterns of the solar PV and the wind turbine are divided accordingly. Specifically, for the solar PV, the weather conditions are divided into clear, partial cloudy, and cloudy days; for the wind turbine, they are divided into high, middle, and low wind velocity. The output power patterns of the solar PV and wind turbine under different weather conditions are illustrated in Fig. 6.

VI-A3 Equipment Parameter Settings

In this study, we use 15 Panasonic Sc330 solar modules, each with a power rating of 330 W, and a JFNH-5kW wind turbine from Qingdao Jinfan Energy Science and Technology Co., Ltd. For the battery storage, we consider the mainstream lithium-ion (LI) battery on the current market. We refer to [15, 27, 28] for the parameter settings of the electricity billing policy and battery configurations. The main parameter settings are summarized in Table II.

VI-A4 Scenario Settings

As the generation of renewable energy is significantly affected by the weather conditions, we choose three representative cities in China for this paper, i.e., Beijing, Shanghai, and Guangzhou, which have different weather patterns during the billing cycle window (i.e., from 1st June 2020 to 30th June 2020). We compare and analyze the overall energy cost (including energy charge, demand charge and investment cost), the detailed controlling results, and the return on investment (ROI) for three types of BSs (i.e., type I, type II, and type III BSs) in these cities. The day-by-day weather conditions in these cities during the billing cycle window are shown in Fig. 7. Specifically, i) Beijing has more clear days during the billing cycle window, ii) Shanghai is in the plum rain season during the billing cycle window and thus has more high-wind days but fewer clear days, and iii) Guangzhou has relatively more cloudy days and low-wind days than the other two cities.

TABLE II: Parameter Settings

Category	Parameter	Setting
Billing Policy	billing cycle window $\mathcal{W}$	one month (30 days)
	¹energy charge price $\lambda_{e}$	US$0.049/kWh
	¹demand charge price $\lambda_{d}$	US$16.08/kW
	²battery cost $\lambda_{b}$	US$271/kWh
Battery Config.	discharge efficiency $\alpha$	$85\%$
	charge efficiency $\beta$	$99.9\%$
	max charge rate $R+$	$16$ MW
	max discharge rate $R-$	$8$ MW
Solar PV	power rating $g^{s}$	4950 W
	price $\lambda_{s}$	US$3950
	lifetime $L^{s}$	25 years
Wind Turbine	power rating $g^{w}$	6000 W
	price $\lambda_{w}$	US$4500
	lifetime $L^{w}$	20 years

¹Prices of energy/demand charges in 2018, referring to the contract in [27].

²Battery capacity costs in 2018, referring to the data in [28].
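To illustrate how the two billing prices in Table II act on a grid power profile, the sketch below computes an energy charge and a demand charge. It assumes the common tariff structure in which the energy charge is proportional to the total energy drawn and the demand charge is proportional to the peak power within the billing cycle; the function name, the argument names, and the sample profile in the usage note are hypothetical.

```python
ENERGY_PRICE = 0.049  # US$/kWh, energy charge price lambda_e (Table II)
DEMAND_PRICE = 16.08  # US$/kW, demand charge price lambda_d (Table II)


def electricity_bill(grid_power_kw, slot_hours):
    """Return (energy_charge, demand_charge) in US$ for one billing cycle.

    grid_power_kw: power drawn from the grid in each time slot (kW).
    slot_hours:    duration of one time slot (hours).
    """
    energy_kwh = sum(p * slot_hours for p in grid_power_kw)
    energy_charge = ENERGY_PRICE * energy_kwh
    # The demand charge depends only on the highest power drawn.
    demand_charge = DEMAND_PRICE * max(grid_power_kw)
    return energy_charge, demand_charge
```

For a made-up three-slot profile `[1.0, 1.5, 0.5]` with one-hour slots, the energy charge is about US$0.147 and the demand charge about US$24.12, which shows why shaving a single peak can matter more than reducing the total energy.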

VI-B Performance under Different Weather Conditions

As is shown in Fig. 6, the output power patterns of the solar PV and wind turbine are both divided into three types under different weather conditions. Accordingly, the weather pattern can be divided into nine types: clear & high-wind day, clear & middle-wind day, clear & low-wind day, partial cloudy & high-wind day, partial cloudy & middle-wind day, partial cloudy & low-wind day, cloudy & high-wind day, cloudy & middle-wind day, and cloudy & low-wind day.
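The nine weather patterns are the Cartesian product of the three solar conditions and the three wind conditions, which a short sketch makes explicit (the tuple and function names are illustrative):

```python
from itertools import product

SOLAR = ("clear", "partial cloudy", "cloudy")
WIND = ("high-wind", "middle-wind", "low-wind")


def weather_patterns():
    """Enumerate all solar/wind combinations in the order used in the text."""
    return [f"{solar} & {wind} day" for solar, wind in product(SOLAR, WIND)]
```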

The one-day power supply patterns under different weather conditions for 5G BSs in the resident, office, and comprehensive areas are illustrated in Fig. 8, Fig. 9, and Fig. 10 (in the appendix), respectively. As we can see, the BESS aided renewable energy supply solution significantly reduces the power drawn from the grid, and thus the energy charge and the demand charge. Specifically, as the solar radiation and the wind velocity increase, the renewable energy generation increases accordingly, so that it covers most of the power demand and reduces the power supplied by the grid. In particular, on high-wind days, the power demand can be supplied entirely by the renewable energy and the battery storage, and no power is needed from the grid.

After calculating the power supply paradigm under different weather patterns, we can derive the electricity bill of these three types of BSs during the billing cycle in different cities (i.e., under the different weather patterns illustrated in Fig. 7). The results from all scenarios are summarized in Table III.

TABLE III: Results Summary (One Billing Cycle)

BS Type	Scenario	Energy Charge ($)	Demand Charge ($)	Investment Cost ($)	Cost Saving ($)	Saving Ratio (%)
Type I	No deployment	44.6	23.1	0	/	/
	Deployment in Beijing	5.0	12.0	0.4	50.4	74.4
	Deployment in Shanghai	4.7	12.0	0.4	50.7	74.8
	Deployment in Guangzhou	5.9	12.0	0.3	49.5	73.2
Type II	No deployment	40.1	20.2	0	/	/
	Deployment in Beijing	4.8	9.1	0.3	46.1	76.4
	Deployment in Shanghai	3.8	9.1	0.4	47.0	77.9
	Deployment in Guangzhou	5.3	9.1	0.3	45.6	75.6
Type III	No deployment	45.6	22.8	0	/	/
	Deployment in Beijing	6.8	13.9	0.3	47.4	69.3
	Deployment in Shanghai	5.7	13.9	0.4	48.4	70.8
	Deployment in Guangzhou	7.9	13.9	0.2	46.4	67.8

Specifically, for a single 5G BS without the proposed power supply paradigm, the energy charge and the demand charge reach up to $45.6 and $22.8 (type III), respectively. After applying the BESS aided renewable energy supply solution to the 5G BS, the electricity bill is significantly reduced. In Shanghai in particular, which has relatively more clear and high-wind days, the energy charge and the demand charge can be reduced to as low as $3.8 and $9.1 (type II), respectively. Although the equipment degrades during the discharge/charge cycles, the investment cost remains at an acceptable level (at most $0.4 per billing cycle in Table III). The highest cost savings for a BS using the proposed power supply paradigm in Beijing, Shanghai, and Guangzhou in one billing cycle are $50.4, $50.7, and $49.5, respectively (all for the type I BS). The corresponding saving ratios are 74.4%, 74.8%, and 73.2%, respectively.
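The last two columns of Table III can be recomputed from the first three. The sketch below assumes that the cost saving is the no-deployment bill minus the sum of the three cost terms after deployment, and that the saving ratio is this saving divided by the no-deployment bill; the function name is illustrative. The example uses the type II BS in Beijing.

```python
def cost_saving(baseline, deployed):
    """Return (saving in US$, saving ratio in %).

    Each argument is a tuple (energy_charge, demand_charge, investment_cost).
    """
    saving = sum(baseline) - sum(deployed)
    ratio = 100.0 * saving / sum(baseline)
    return saving, ratio


# Type II, no deployment vs. deployment in Beijing (values from Table III).
saving, ratio = cost_saving((40.1, 20.2, 0.0), (4.8, 9.1, 0.3))
```

Under these assumptions the result is a saving of about $46.1 and a ratio of about 76.4%, in line with the table. For some other rows the recomputed values differ by roughly 0.1, which is expected because the published entries are rounded.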

VI-C Performance under Different Types of BSs

Since different types of BSs have diverse power demands, which result in different energy charges and demand charges, the performance of the BESS aided renewable energy supply solution can differ across BS types.

Specifically, as shown in Table III, the type I BS has the highest cost saving among the three types of BSs, i.e., $50.4 in Beijing, $50.7 in Shanghai, and $49.5 in Guangzhou. This is because the type I BS has the largest power demand and peak value (near 1450 watts), which gives it great potential for energy saving and peak power shaving. Besides, as the type II BS’s power demands are relatively small, the generated and stored renewable energy can effectively reduce the power grid supply. Therefore, it has the highest saving ratio, i.e., 76.4% in Beijing, 77.9% in Shanghai, and 75.6% in Guangzhou.

VI-D ROIs of Different City and Type Deployment

The return on investment (ROI) is a financial metric defined as the benefit (the cost saving in our case) divided by the total investment. It indicates the probability of gaining a return from an investment and has been widely used to evaluate the efficiency of an investment [30]. Typically, a larger ROI value indicates a higher investment efficiency. With the costs of the renewable energy generators and the battery storage (given in Table II), the total investment can be calculated. The ROIs can then be derived from the results in Table III.
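The ROI definition translates directly into code. The sketch below is illustrative only: the battery capacity and the period over which the cost saving is accumulated are not stated in this section, so both are left to the caller, and the helper names are hypothetical. It is not claimed to reproduce the values reported for the evaluated scenarios.

```python
def total_investment(solar_price, wind_price, battery_price_per_kwh, battery_kwh):
    """Generator prices plus battery cost (price per kWh times capacity)."""
    return solar_price + wind_price + battery_price_per_kwh * battery_kwh


def roi_percent(cost_saving, investment):
    """ROI in percent: benefit (cost saving) divided by the total investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return 100.0 * cost_saving / investment
```

For example, with the prices of Table II and a hypothetical 10 kWh battery, `total_investment(3950, 4500, 271, 10)` gives 11160 (US$).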

TABLE IV: ROIs of Different Types of BSs in Different Cities

BS Type	Beijing	Shanghai	Guangzhou
Type I	5.43%	5.46%	5.33%
Type II	4.97%	5.06%	4.91%
Type III	5.11%	5.21%	5.00%

The ROIs of different types of BSs deployed in different cities are shown in Table IV. Specifically, the type I BS has the highest ROI, which reaches 5.43% in Beijing, 5.46% in Shanghai, and 5.33% in Guangzhou, indicating a relatively high investment efficiency for the operators. This is because the type I BS has the largest cost saving.

As the equipment cost is estimated to decrease dramatically in the future [31], the ROI could rise significantly in 5G and beyond. Additionally, as we can see, a city with more clear and high-wind days obtains a higher ROI; thus, the proposed solution is more suitable for cities with more sunny and windy days.

It is worth noting that we assume the deployed renewable energy generator and battery storage supply power to only one 5G BS, and thus the surplus renewable energy (when the battery is full) is discarded. This leads to a relatively low utilization in this work. In practice, the generated renewable energy could be supplied to multiple BSs [7], so that the ROI and the utilization of the renewable energy could be further improved.

VII Related Work

The most closely related literature can be divided into the following three categories.

VII-A Base Station Energy-saving Method

With the increase of BS power consumption, the energy-efficient design of cellular networks has recently received significant attention. Typically, BS energy-saving methods are divided into three levels: equipment-level, site-level, and network-level energy saving [32].

On the equipment level, researchers propose new schemes (e.g., the scheme in this paper), new materials (e.g., semiconductor materials), and new functions to achieve energy efficiency. Besides, liquid heat dissipation, high-efficiency power amplifiers, and highly integrated designs are applied in the BS to reduce the power consumption of the whole equipment year by year.

On the site level, one common scheme is to switch the BS on or off according to the traffic load [33, 34, 35]: the BS is switched on when the traffic load is high and switched off when the traffic load is low. In addition, combining this scheme with AI can improve the accuracy of traffic load prediction, so that the corresponding energy-saving policies can be formulated more precisely.

On the network level, with the deployment of the 5G network, multiple networks (e.g., 4G/LTE and 5G) coexist, so the energy efficiency can be improved through network-level energy-saving technology. Based on the basic data (e.g., configuration and performance) of the network and the built-in strategy, the multi-network cooperative energy-saving technology [36] reduces energy consumption by turning off BSs while ensuring service quality.

VII-B General System Peak Power Shaving

Peak power (which determines the demand charge) is a sensitive factor for the power grid, as it occurs occasionally and lasts only for a small percentage of the time in a day [37]. The traditional solution is to increase the grid capacity, which is an uneconomical and inefficient way to supply peak power. Peak power shaving is a preferable approach that overcomes these disadvantages: it flattens the load curve by reducing the peak load and shifting it to times of lower load [38]. Typically, the related works are divided into the following three categories:

Peak shaving using energy storage system (ESS): Integrating energy storage systems into the grid is the most potent peak shaving strategy due to its economic benefits [39, 40, 41]. Specifically, peak power shaving is achieved by charging the ESS when demand is low (off-peak period) and discharging it when demand is high (on-peak period).
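The charge-when-low, discharge-when-high principle can be illustrated with a simple threshold rule. This is a generic textbook-style sketch, not the method of [39, 40, 41] and not the DRL policy proposed in this paper; it assumes an ideal lossless battery, unit-length time slots, and a fixed hypothetical threshold.

```python
def shave_peaks(demand, threshold, capacity, initial=0.0):
    """Return the grid load per slot after threshold-based peak shaving.

    demand:    load in each time slot
    threshold: target grid load; charge below it, discharge above it
    capacity:  maximum energy the battery can hold
    initial:   energy stored at the start
    """
    stored = initial
    grid = []
    for d in demand:
        if d > threshold:
            # On-peak: cover the excess from the battery as far as possible.
            discharge = min(d - threshold, stored)
            stored -= discharge
            grid.append(d - discharge)
        else:
            # Off-peak: use the headroom below the threshold to charge.
            charge = min(threshold - d, capacity - stored)
            stored += charge
            grid.append(d + charge)
    return grid
```

For the demand profile `[1, 1, 3, 3]` with a threshold of 2 and a capacity of 2, the grid load becomes flat at 2 in every slot, so the peak drops from 3 to 2 while the total energy drawn is unchanged.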

Peak shaving using electric vehicles (EV): Since the stored energy of electric vehicles is usually not fully utilized each day, this technology has the potential to provide peak shaving service [42, 43, 44]. For example, Alam et al. [45] proposed an effective strategy to utilize PEV batteries for both traveling and peak shaving purposes.

Peak shaving using demand side management (DSM): Demand side management refers to programs that influence customers to balance their electricity consumption with the generation capacity of the power supply system [46, 47, 48, 49]. For example, Rozali et al. [50] aimed to achieve maximum peak shaving through load shifting based on power pinch analysis.

VII-C Battery Storage Optimal Control

The optimal control of energy storage has been extensively studied in the past. Most related works formulate an optimization problem that aims to maximize the revenue generated by battery storage co-located with a renewable energy generator.

Babacan et al. [51] proposed a convex program to minimize the electricity bill of operators. Ratnam et al. [52] aimed to maximize the daily operational savings that accrue to customers while penalizing large voltage swings stemming from reverse power flow and peak load. Kazhamiaka et al. [53] studied the profitability of residential PV-storage systems in three jurisdictions and set up an integer linear program to determine the battery controlling policy. These works assume that the renewable energy generation and the power demand are known in advance, so that the problem can be optimized offline. However, these assumptions are impractical in the real world.

Several papers study the optimal control of batteries under uncertainty and randomness. Guan et al. [54] utilized a reinforcement learning method to minimize the homeowner’s cost by taking the action that yields the best expected reward. EnergyBoost [19] provides the ability to predict the renewable energy generation and power demand. However, these works apply only to the home scenario, where the power demand is much smaller than that of a 5G BS. Therefore, we propose the DRL-based method to tackle the problem of large and constrained state and action spaces and the uncertainty of renewable energy generation and power demand.

VIII Conclusions

To cope with the ever-increasing electricity bill of mobile operators in the 5G era, we proposed a BESS aided reconfigurable energy supply solution for the 5G BS system, which models the battery discharging/charging reconfiguration as an optimization problem. With our proposed solution, a BS can be powered by renewable energy and battery storage in addition to the power grid, which cuts down the total energy cost. To solve the problem under dynamic power demands and renewable energy generation, we developed a DRL-based approach to the BESS operation that accommodates all factors in the modeling phase and makes decisions in real time. To evaluate the performance of our solution, we chose three cities with different weather patterns for experiments. The experimental results show that our reconfigurable power supply solution can significantly reduce the electricity bill and improve the renewable energy utilization.

References

[1] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. Soong, and J. C. Zhang, “What will 5G be?” IEEE Journal on Selected Areas in Communications, vol. 32, no. 6, pp. 1065–1082, 2014.
[2] M. Gerla, E.-K. Lee, G. Pau, and U. Lee, “Internet of vehicles: From intelligent grid to autonomous cars and vehicular clouds,” in 2014 IEEE world forum on internet of things (WF-IoT). IEEE, 2014, pp. 241–246.
[3] N. Kumar, S. Misra, J. J. Rodrigues, and M. S. Obaidat, “Coalition games for spatio-temporal big data in internet of vehicles environment: A comparative analysis,” IEEE Internet of Things Journal, vol. 2, no. 4, pp. 310–320, 2015.
[4] G. C. Burdea and P. Coiffet, Virtual reality technology. John Wiley & Sons, 2003.
[5] E. D. Muse, P. M. Barrett, S. R. Steinhubl, and E. J. Topol, “Towards a smart medical home,” The Lancet, vol. 389, no. 10067, p. 358, 2017.
[6] S. Misra, P. K. Bishoyi, and S. Sarkar, “i-mac: In-body sensor mac in wireless body area networks for healthcare iot,” IEEE Systems Journal, 2020.
[7] G. Tang, Y. Wang, and H. Lu, “Shiftguard: Towards reliable 5G network by optimal backup power allocation,” in IEEE SmartGridComm, 2020, pp. 1–6.
[8] R. Fu, D. Feldman, R. Margolis, M. Woodhouse, and K. Ardani, “US solar photovoltaic system cost benchmark: Q1 2017,” EERE Publication and Product Library, Tech. Rep., 2017.
[9] J. A. Turner, “A realizable renewable energy future,” Science, vol. 285, no. 5428, pp. 687–689, 1999.
[10] X. Wang, A. V. Vasilakos, M. Chen, Y. Liu, and T. T. Kwon, “A survey of green mobile networks: Opportunities and challenges,” Mobile Networks and Applications, vol. 17, no. 1, pp. 4–20, 2012.
[11] B. Nykvist and M. Nilsson, “Rapidly falling costs of battery packs for electric vehicles,” Nature climate change, vol. 5, no. 4, pp. 329–332, 2015.
[12] A. Mondal, S. Misra, and M. S. Obaidat, “Distributed home energy management system with storage in smart grid using game theory,” IEEE Systems Journal, vol. 11, no. 3, pp. 1857–1866, 2015.
[13] H. Wang, F. Xu, Y. Li, P. Zhang, and D. Jin, “Understanding mobile traffic patterns of large scale cellular towers in urban environment,” in Proceedings of the 2015 Internet Measurement Conference, 2015, pp. 225–238.
[14] H. Xu and B. Li, “Reducing electricity demand charge for data centers with partial execution,” in Proceedings of the 5th international conference on Future energy systems, 2014, pp. 51–61.
[15] M. Dabbagh, B. Hamdaoui, A. Rayes, and M. Guizani, “Shaving data center power demand peaks through energy storage and workload shifting control,” IEEE Transactions on Cloud Computing, 2017.
[16] Qingdao Jinfan Energy Science & Technology Co., Ltd., “Renewable energy generator,” http://www.jinfanenergy.cn, 2019.
[17] W. F. Holmgren, R. W. Andrews, A. T. Lorenzo, and J. S. Stein, “PVLIB Python 2015,” in 2015 IEEE 42nd Photovoltaic Specialist Conference (PVSC). IEEE, 2015, pp. 1–5.
[18] A. Jahid, M. S. Hossain, M. K. H. Monju, M. F. Rahman, and M. F. Hossain, “Techno-economic and energy efficiency analysis of optimal power supply solutions for green cellular base stations,” IEEE Access, vol. 8, pp. 43 776–43 795, 2020.
[19] B. Qi, M. Rashedi, and O. Ardakanian, “Energyboost: Learning-based control of home batteries,” in Proceedings of the Tenth ACM International Conference on Future Energy Systems, 2019, pp. 239–250.
[20] Y. Shi, B. Xu, B. Zhang, and D. Wang, “Leveraging energy storage to optimize data center electricity cost in emerging power markets,” in Proceedings of the Seventh International Conference on Future Energy Systems, 2016, pp. 1–13.
[21] B. Aksanli, T. Rosing, and E. Pettis, “Distributed battery control for peak power shaving in datacenters,” in IEEE IGCC, 2013, pp. 1–8.
[22] D. K. Maly and K.-S. Kwan, “Optimal battery energy storage system (bess) charge scheduling with dynamic programming,” IEE Proceedings-Science, Measurement and Technology, vol. 142, no. 6, pp. 453–458, 1995.
[23] A. Oudalov, R. Cherkaoui, and A. Beguin, “Sizing and optimal operation of battery energy storage system for peak shaving application,” in 2007 IEEE Lausanne Power Tech. IEEE, 2007, pp. 621–625.
[24] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[25] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[26] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT press, 2018.
[27] Dominion Energy South Carolina, Inc., “Rate 23 - industrial power service,” https://etariff.psc.sc.gov/Organization/TariffDetail/150?OrgId=411, 2020.
[28] US Department of Energy, “Energy storage technology and cost characterization report,” https://www.energy.gov/eere/water/downloads/energy-storage-technology-and-cost-characterization-report, 2019.
[29] China Meteorological Administration, “Historical weather forecast,” http://www.weather.com.cn/, 2020.
[30] Wikipedia, “Return on investment,” https://en.wikipedia.org/wiki/Return_on_investment, 2020.
[31] National Renewable Energy Laboratory (NREL), “Cost projections for utility-scale battery storage,” https://www.nrel.gov/docs/fy19osti/73222.pdf, 2019.
[32] China Mobile, “White paper on energy-saving technology of 5G base stations,” 2020.
[33] L. Chiaraviglio, D. Ciullo, M. Meo, M. A. Marsan, and I. Torino, “Energy-aware UMTS access networks,” 2008.
[34] E. Oh and B. Krishnamachari, “Energy savings through dynamic base station switching in cellular wireless access networks,” in IEEE Global Telecommunications Conference. IEEE, 2010, pp. 1–5.
[35] A. Kumar and C. Rosenberg, “Energy and throughput trade-offs in cellular networks using base station switching,” IEEE Transactions on Mobile Computing, vol. 15, no. 2, pp. 364–376, 2015.
[36] T. Chen, Y. Yang, H. Zhang, H. Kim, and K. Horneman, “Network energy saving technologies for green wireless access networks,” IEEE Wireless Communications, vol. 18, no. 5, pp. 30–38, 2011.
[37] M. Uddin, M. F. Romlie, M. F. Abdullah, S. Abd Halim, T. C. Kwang et al., “A review on peak load shaving strategies,” Renewable and Sustainable Energy Reviews, vol. 82, pp. 3323–3332, 2018.
[38] A. Nourai, V. Kogan, and C. M. Schafer, “Load leveling reduces t&d line losses,” IEEE Transactions on Power Delivery, vol. 23, no. 4, pp. 2168–2173, 2008.
[39] E. Reihani, M. Motalleb, R. Ghorbani, and L. S. Saoud, “Load peak shaving and power smoothing of a distribution grid with high renewable energy penetration,” Renewable energy, vol. 86, pp. 1372–1379, 2016.
[40] S. Son and H. Song, “Real-time peak shaving algorithm using fuzzy wind power generation curves for large-scale battery energy storage systems,” International Journal of Fuzzy Logic and Intelligent Systems, vol. 14, no. 4, pp. 305–312, 2014.
[41] O. Lavrova, F. Cheng, S. Abdollahy, H. Barsun, A. Mammoli, D. Dreisigmayer, S. Willard, B. Arellano, and C. Van Zeyl, “Analysis of battery storage utilization for load shifting and peak smoothing on a distribution feeder in new mexico,” in 2012 IEEE PES Innovative Smart Grid Technologies (ISGT). IEEE, 2012, pp. 1–6.
[42] C. G. Tse, B. A. Maples, and F. Kreith, “The use of plug-in hybrid electric vehicles for peak shaving,” Journal of Energy Resources Technology, vol. 138, no. 1, 2016.
[43] Y. Yao, W. Gao, and Y. Li, “Optimization of phev charging schedule for load peak shaving,” in 2014 IEEE Conference and Expo Transportation Electrification Asia-Pacific (ITEC Asia-Pacific). IEEE, 2014, pp. 1–6.
[44] N. Leemput, F. Geth, B. Claessens, J. Van Roy, R. Ponnette, and J. Driesen, “A case study of coordinated electric vehicle charging for peak shaving on a low voltage grid,” in 2012 3rd IEEE PES Innovative Smart Grid Technologies Europe (ISGT Europe). IEEE, 2012, pp. 1–7.
[45] M. J. E. Alam, K. M. Muttaqi, and D. Sutanto, “A controllable local peak-shaving strategy for effective utilization of PEV battery capacity for distribution network support,” IEEE Transactions on Industry Applications, vol. 51, no. 3, pp. 2030–2037, 2014.
[46] T. Crosbie, V. Vukovic, M. Short, N. Dawood, R. Charlesworth, and P. Brodrick, “Future demand response services for blocks of buildings,” in Smart Grid Inspired Future Technologies. Springer, 2017, pp. 118–135.
[47] S. Mohagheghi, J. Stoupis, Z. Wang, Z. Li, and H. Kazemzadeh, “Demand response architecture: Integration into the distribution management system,” in IEEE International Conference on Smart Grid Communications. IEEE, 2010, pp. 501–506.
[48] M. Muratori and G. Rizzoni, “Residential demand response: Dynamic energy management and time-varying electricity pricing,” IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 1108–1117, 2015.
[49] A. Samanta and S. Misra, “Energy-efficient and distributed network management cost minimization in opportunistic wireless body area networks,” IEEE Transactions on Mobile Computing, vol. 17, no. 2, pp. 376–389, 2017.
[50] N. E. M. Rozali, S. R. W. Alwi, Z. A. Manan, and J. J. Klemeš, “Peak-off-peak load shifting for hybrid power systems based on power pinch analysis,” Energy, vol. 90, pp. 128–136, 2015.
[51] O. Babacan, E. L. Ratnam, V. R. Disfani, and J. Kleissl, “Distributed energy storage system scheduling considering tariff structure, energy arbitrage and solar PV penetration,” Applied Energy, vol. 205, pp. 1384–1393, 2017.
[52] E. L. Ratnam, S. R. Weller, and C. M. Kellett, “An optimization-based approach to scheduling residential battery storage with solar PV: Assessing customer benefit,” Renewable Energy, vol. 75, pp. 123–134, 2015.
[53] F. Kazhamiaka, P. Jochem, S. Keshav, and C. Rosenberg, “On the influence of jurisdiction on the profitability of residential photovoltaic-storage systems: A multi-national case study,” Energy Policy, vol. 109, pp. 428–440, 2017.
[54] C. Guan, Y. Wang, X. Lin, S. Nazarian, and M. Pedram, “Reinforcement learning-based control of residential energy storage systems for electric bill minimization,” in 2015 12th Annual IEEE Consumer Communications and Networking Conference (CCNC). IEEE, 2015, pp. 637–642.

Hao Yuan received the B.S. degree in management science and engineering from National University of Defense Technology, Changsha, China, in 2019. He is currently working towards the M.S. degree in the same department. His main research interests include edge computing and green communication.

Guoming Tang is a research fellow at the Peng Cheng Laboratory, Shenzhen, Guangdong, China. He received his Ph.D. degree in Computer Science from the University of Victoria, Canada, in 2017, and the Bachelor’s and Master’s degrees from the National University of Defense Technology, China, in 2010 and 2012, respectively. He was also a visiting research scholar at the University of Waterloo, Canada, in 2016. His research mainly focuses on green computing, computing for green, and edge computing.

Deke Guo received the B.S. degree in industry engineering from the Beijing University of Aeronautics and Astronautics, Beijing, China, in 2001, and the Ph.D. degree in management science and engineering from the National University of Defense Technology, Changsha, China, in 2008. He is currently a Professor with the College of System Engineering, National University of Defense Technology, and is also with the College of Intelligence and Computing, Tianjin University. His research interests include distributed systems, software-defined networking, data center networking, wireless and mobile systems, and interconnection networks. He is a senior member of the IEEE and a member of the ACM.

Kui Wu received the BSc and the MSc degrees in computer science from Wuhan University, China, in 1990 and 1993, respectively, and the PhD degree in computing science from the University of Alberta, Canada, in 2002. He joined the Department of Computer Science, University of Victoria, Canada, in 2002, where he is currently a Full Professor. His research interests include smart grid, mobile and wireless networks, and network performance evaluation. He is a senior member of the IEEE.

Xun Shao received his Ph.D. in information science from the Graduate School of Information Science and Technology, Osaka University, Japan, in 2013. From 2013 to 2017, he was a researcher with the National Institute of Information and Communications Technology (NICT) in Japan. Currently, he is an Assistant Professor at the School of Regional Innovation and Social Design Engineering, Kitami Institute of Technology, Japan. His research interests include distributed systems and networking. He is a member of the IEEE and IEICE.

Keping Yu received the M.E. and Ph.D. degrees from the Graduate School of Global Information and Telecommunication Studies, Waseda University, Tokyo, Japan, in 2012 and 2016, respectively. He was a Research Associate and a Junior Researcher with the Global Information and Telecommunication Institute, Waseda University, from 2015 to 2019 and 2019 to 2020, respectively, where he is currently an Assistant Professor. His research interests include smart grids, information-centric networking, the Internet of Things, artificial intelligence, blockchain, and information security. He is a Member of the IEEE.

Wei Wei received the M.S. and Ph.D. degrees from Xi’an Jiaotong University, Xi’an, China, in 2005 and 2011, respectively. He is currently an Associate Professor with the School of Computer Science and Engineering, Xi’an University of Technology, Xi’an. He has worked on many funded research projects as a Principal Investigator and Technical Member. He has published over 100 research articles in international conferences and journals. His current research interests include wireless networks, wireless sensor network applications, image processing, mobile computing, distributed computing, pervasive computing, the Internet of Things, and sensor data clouds. He is a Senior Member of the China Computer Federation. He is an Editorial Board Member of Future Generation Computer Systems, IEEE Access, Ad Hoc & Wireless Sensor Network, Institute of Electronics, Information and Communication Engineers, and KSII Transactions on Internet and Information Systems.