Confidence Estimation Transformer for Long-term Renewable Energy Forecasting in Reinforcement Learning-based Power Grid Dispatching

Xinhang Li, Zihao Li, Nan Yang, Zheng Yuan, Qinwen Wang, Yiying Yang, Yupeng Huang, Xuri Song*, Lei Li*, and Lin Zhang

*Corresponding author.

This work was supported by the National Natural Science Foundation of China (No.62176024) and Open Fund of Beijing Key Laboratory of Research and System Evaluation of Power Dispatching Automation Technology (China Electric Power Research Institute) (No.DZB51202101268).

X. Li, Z. Li, Z. Yuan, Q. Wang, Y. Yang, L. Li and L. Zhang are with Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mail: {lixinhang, lizihao, yuanzheng, wangqinwen, yyying, leili, zhanglin}@bupt.edu.cn).

N. Yang, Y. Huang and X. Song are with Beijing Key Laboratory of Research and System Evaluation of Power Dispatching Automation Technology (China Electric Power Research Institute), Beijing 100192, China (e-mail: [email protected]; [email protected]; [email protected]).

Abstract

The expansion of renewable energy could help realize the goals of peaking carbon dioxide emissions and achieving carbon neutrality. Some existing grid dispatching methods that integrate short-term renewable energy prediction and reinforcement learning (RL) have been shown to alleviate the adverse impact of energy fluctuation risk. However, these methods omit long-term output prediction, which leads to stability and security problems in the optimal power flow. This paper proposes a confidence estimation Transformer for long-term renewable energy forecasting in reinforcement learning-based power grid dispatching (Conformer-RLpatching). Conformer-RLpatching predicts the long-term active output of each renewable energy generator with an enhanced Transformer to boost the performance of hybrid energy grid dispatching. Furthermore, a confidence estimation method is proposed to reduce the prediction error of renewable energy. Meanwhile, a dispatching necessity evaluation mechanism is put forward to decide whether the active output of a generator needs to be adjusted. Experiments carried out on the SG-126 power grid simulator show that Conformer-RLpatching improves the security score by 25.8% over the second-best algorithm, DDPG, and achieves a better total reward than the gold medal team in the power grid dispatching competition sponsored by State Grid Corporation of China under the same simulation environment. Code is open-sourced at https://github.com/buptlxh/Conformer-RLpatching.

Index Terms:
optimal power flow, reinforcement learning, renewable energy prediction, Conformer-RLpatching

I Introduction


Figure 1: Architecture of Conformer-RLpatching. Conformer (a) obtains accurate renewable energy long-term prediction, and is divided into the enhanced Transformer-based renewable energy prediction model (b) and the confidence estimation method (c). RLpatching (d) provides an appropriate dispatching strategy, and contains the DDPG-based power flow optimization algorithm (e) and the dispatching necessity evaluation mechanism (f).

The randomness and volatility of renewable energy sources (RESs) have highlighted the pressing need to address stability and security concerns in power grid dispatching [1, 2]. Meanwhile, the utilization rate of renewable energy (URRE), as another performance index in addition to security and economy, brings new challenges to the security constrained economic dispatching (SCED) in the hybrid energy grid [3]. [4] presented a modified version of multi-objective differential evolution by incorporating wind power plant into the dynamic economic dispatch system. Aiming to minimize cost and restrict risks, a novel SCED model [5] and a chance-constrained economic dispatch model [6] were proposed for wind integrated hybrid power system. [7] implemented a short-term control algorithm to smoothen the power dispatching by improving min-max dispatching method. [8] proposed a look-ahead stochastic unit commitment model for robust optimal dispatching. However, all the above dispatching methods did not consider the volatility of RESs and URRE simultaneously.

The emerging artificial intelligence (AI) technology is increasingly applied to hybrid energy grid dispatching [9]. In August 2021, State Grid Corporation of China (SGCC) and Baidu jointly sponsored the State Grid Dispatching AI Innovation Competition, which is authoritative in the smart grid field (footnote 1: https://aistudio.baidu.com/aistudio/competition/detail/111/0/introduction). SGCC provided the SG-126 power grid simulator for participating teams to realize multi-objective power grid dispatching and strive for the highest total reward. The total reward of the first-ranked team was 510.09 among nearly 100 teams. In addition, advanced AI algorithms were put forward to further improve the flexibility, controllability and observability of the hybrid grid [10]. An innovative dispatch optimization strategy with an uncertainty post-processing approach was proposed in [11], and [12] adopted a learning-based technique to search for the optimal joint control policy. [13] presented a manifold-learning-based Isomap algorithm to represent the low-dimensional hidden probabilistic structure of data. [14] proposed a learning-based decision-making framework for economic energy dispatching based on historical sequences. However, the optimization objectives of [11, 12, 13, 14] did not involve URRE. [15] modeled power dispatching as sequential decision-making and introduced Deep Reinforcement Learning (DRL). [16] applied deep deterministic policy gradient (DDPG) to microgrids with photovoltaic panels. [17] improved DRL to adaptively respond to renewable energy fluctuations. The improved decision tree [18, 19] was also able to provide feasible and optimal dispatch decisions for microgrids. [20] developed a new renewable energy management system with short-term forecasting for hourly dispatching. [21] proposed a data-driven RL approach to relieve branch overload in large power systems. However, none of the above AI-based dispatching algorithms considered the long-term fluctuation of new energy, and they were only suitable for real-time or short-term dispatching.

In order to improve the accuracy of forecasting, some prediction algorithms have been proposed. [22, 23] adopted the method of historical data mining to analyze data characteristics for regional wind power prediction and wind power ramp prediction, respectively. [24] introduced small-world neural network to reduce wind power forecasting errors. [25, 26] both used hybrid models for high-precision day-ahead short-term photovoltaic output forecasting. Although Informer proposed in [27] has been verified to be able to accurately predict long-term electricity consuming load, it has not been applied to renewable energy prediction.

In summary, the existing dispatching methods fail to achieve multi-objective dispatching for the hybrid power system under the long-term fluctuation of renewable energy. To address this problem, a confidence estimation Transformer for long-term renewable energy forecasting in reinforcement learning-based power grid dispatching (Conformer-RLpatching) is proposed, as shown in Fig. 1. In Conformer, the confidence estimation method weights the prediction results from the enhanced Transformer to reduce prediction error. RLpatching utilizes the long-term prediction from Conformer and the current power grid observations to provide an appropriate multi-objective dispatching strategy. The contributions of this paper are summarized as follows:

1. This paper proposes Conformer, an enhanced Transformer with a confidence estimation method, which provides accurate prediction to boost the performance of hybrid energy grid dispatching and abate the impact of renewable energy uncertainty.

2. This paper proposes Conformer-RLpatching to realize multi-objective dispatching under the long-term fluctuation of renewable energy. A dispatching necessity evaluation mechanism is put forward to reduce unnecessary dispatching, thereby improving the stability of the power grid.

3. This paper applies Conformer-RLpatching to the SG-126 power grid simulator provided by SGCC, bridging the ‘sim-to-real’ gap. Extensive experiments show that Conformer-RLpatching improves the security score by 25.8% over DDPG and achieves a total reward of up to 527.32, exceeding 510.09, the highest total reward in the State Grid Dispatching AI Innovation Competition (footnote 2: https://aistudio.baidu.com/aistudio/competition/detail/111/0/leaderboard).

The rest of this paper is organized as follows. Section II introduces the confidence estimation Transformer for long-term renewable energy forecasting. Section III presents the reinforcement learning-based dispatching framework and the dispatching process. Section IV gives the performance of proposed methods. Section V draws the conclusion.

II Confidence Estimation Transformer for Long-term Renewable Energy Forecasting

This section mainly introduces two main components of Conformer, the enhanced Transformer-based renewable energy prediction model and the confidence estimation, which are responsible for the long-term prediction of renewable energy and the synthesis of prediction results, respectively. Section II-C introduces the specific process of Conformer.


Figure 2: Flow Chart of Conformer. a, The enhanced Transformer-based renewable energy prediction model. b, The confidence estimation method.

II-A Enhanced Transformer for Renewable Energy Prediction

In order to deal with renewable energy fluctuations caused by external factors in advance, this paper adopts the enhanced Transformer [27] to efficiently predict the long-term maximum active output of every renewable energy generator. The enhanced Transformer-based renewable energy prediction model, shown in Fig. 2 a, is divided into encoder and decoder.

For all renewable energy generators, $\mathit{PR}_{t_1,t_2}$ represents the real maximum active power from $t_1$ to $t_2$, and the sequence of the maximum active power predicted by the enhanced Transformer at time $t$ is denoted as $\widehat{\mathit{PR}}_{t_1,t_2}^{t}$. $\widehat{\mathit{PR}}_{t+1,t+10}$ represents the final prediction result for the next 10 days output by Conformer.

The encoder mainly encodes known sequences and extracts features. Suppose that the current time is $t_0$. In the enhanced Transformer, the input of the encoder includes the real data of the past 56 days $\mathit{PR}_{t_0-55,t_0}$ and the corresponding four-dimensional time code, whose sizes are $56\times n_{new}$ and $56\times 4$, respectively. ProbSparse self-attention is adopted in the encoder, achieving $O(L\log L)$ time complexity and $O(L\log L)$ memory usage on dependency alignments. For tuple inputs, i.e., query $Q$, key $K$ and value $V$, ProbSparse self-attention allows each key to only attend to the $u$ dominant queries:

$$A(Q,K,V)=\mathrm{Softmax}\left(\frac{\overline{Q}K^{T}}{\sqrt{d}}\right)V \tag{1}$$

where $\overline{Q}$ is a sparse matrix of the same size as $Q$, and it only contains the Top-$u$ queries under the sparsity measurement. Meanwhile, a self-attention distilling operation is introduced. Conv1D layers and Maxpooling layers are added after each ProbSparse self-attention layer to privilege dominating attention scores and help receive long sequence input.
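The selection of dominant queries in (1) can be illustrated with a minimal single-head NumPy sketch. It rests on two assumptions that are not spelled out in this section: the sparsity measurement is taken to be the max-minus-mean score used by Informer [27], and the non-dominant queries are filled with the mean of $V$. The sketch also computes the full score matrix for readability, so it does not reproduce the $O(L\log L)$ cost, which relies on sampling keys. The function name `probsparse_attention` is hypothetical.

```python
import numpy as np

def probsparse_attention(Q, K, V, u):
    """Simplified single-head ProbSparse self-attention (illustrative only).

    Q: (L_q, d), K: (L_k, d), V: (L_k, d_v); u: number of dominant queries.
    """
    L_q, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                      # (L_q, L_k)
    # Assumed sparsity measurement (as in Informer [27]): max minus mean.
    sparsity = scores.max(axis=1) - scores.mean(axis=1)
    top = np.argsort(-sparsity)[:u]                    # indices of the Top-u queries
    # Assumption: non-dominant queries fall back to the mean of V.
    out = np.tile(V.mean(axis=0), (L_q, 1))
    # Dominant queries receive ordinary softmax attention.
    s = scores[top]
    s = s - s.max(axis=1, keepdims=True)               # numerical stability
    w = np.exp(s)
    w /= w.sum(axis=1, keepdims=True)
    out[top] = w @ V
    return out, top
```

Only the `u` selected rows require a softmax, which is where the saving comes from once the measurement itself is estimated cheaply.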

The decoder is responsible for predicting. The input of the decoder includes historical data and the corresponding time coding, in addition to the sequence characteristics output by the encoder. The data section includes the real data of the past 28 days $\mathit{PR}_{t_0-27,t_0}$ and that of the next 15 days replaced by 0. Given the input, the decoder predicts the sequence for the next 15 days $\widehat{\mathit{PR}}_{t_0+1,t_0+15}^{t_0}$. Therefore, the sizes of the input data and time encoding are $43\times n_{new}$ and $43\times 4$, respectively. A generative style decoder is adopted to acquire long sequence output with only one forward step, simultaneously avoiding cumulative error spreading during inference.
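The window sizes described above (56 days for the encoder, 28 known days plus 15 zero-filled days for the decoder) can be made concrete with a short sketch. The array layout, the helper name `build_inputs` and the way the time codes are supplied are assumptions made for illustration; only the shapes come from the text.

```python
import numpy as np

ENC_LEN, LABEL_LEN, PRED_LEN = 56, 28, 15   # window sizes stated in the text

def build_inputs(history, time_codes):
    """Assemble encoder and decoder inputs (illustrative layout).

    history:    (T, n_new) real maximum active power up to and including t0.
    time_codes: (T + PRED_LEN, 4) time codes for the history and the horizon.
    """
    T, n_new = history.shape
    enc_x = history[T - ENC_LEN:]                        # (56, n_new)
    enc_mark = time_codes[T - ENC_LEN:T]                 # (56, 4)
    placeholder = np.zeros((PRED_LEN, n_new))            # future values replaced by 0
    dec_x = np.concatenate([history[T - LABEL_LEN:], placeholder])   # (43, n_new)
    dec_mark = time_codes[T - LABEL_LEN:T + PRED_LEN]    # (43, 4)
    return enc_x, enc_mark, dec_x, dec_mark
```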

II-B Confidence Estimation Method

Combining the prediction results made at different times can offset the contingency of a single prediction result. This paper designs a confidence estimation method that quantifies the effectiveness of a prediction sequence and uses it as the combining weight.

$P(e_i\mid B)$ is the conditional probability that the $i^{th}$ result in the prediction sequence is correct given the condition $B$, which means that the accuracy of partial results is observed. $P(e_i\mid B)$ can be derived as

$$P(e_i\mid B)=\frac{P(e_i)P(B\mid e_i)}{\sum_{j=1}^{n}P(e_j)P(B\mid e_j)} \tag{2}$$

To facilitate the update of effectiveness, we use beta distribution to represent confidence of the prediction sequence according to the accuracy of historical prediction results, which is defined as

$$\begin{aligned} f(x;m,n) &= \frac{x^{m-1}(1-x)^{n-1}}{\int_{0}^{1}u^{m-1}(1-u)^{n-1}\,du}\\ &= \frac{\Gamma(m+n)}{\Gamma(m)\Gamma(n)}x^{m-1}(1-x)^{n-1}\\ &= \frac{1}{B(m,n)}x^{m-1}(1-x)^{n-1} \end{aligned} \tag{3}$$

where $m>0$ and $n>0$ represent the number of correct prediction results and incorrect prediction results, respectively. In this paper, a prediction result is judged to be correct when its root mean square error is less than $\mu$. $\Gamma(x)$ is the Gamma function, which can be written as

$$\Gamma(x)=\int_{0}^{+\infty}t^{x-1}e^{-t}\,\mathrm{d}t \quad (x>0) \tag{4}$$

When the prediction model is initialized without prior knowledge, the confidence of the prediction sequence is expressed as a uniform distribution on (0,1)(0,1):

$$P(x)=\operatorname{uni}(0,1)=\operatorname{Beta}(1,1) \tag{5}$$

Suppose that $t=t_0-d$ $(1\leqslant d\leqslant 5)$ and the prediction sequence at time $t$ is $\widehat{\mathit{PR}}_{t+1,t+15}^{t}$. As shown in Fig. 2 b, the confidence estimation method can evaluate the effectiveness of the prediction sequence at time $t$ by comparing $\widehat{\mathit{PR}}_{t+1,t_0}^{t}$ with $\mathit{PR}_{t+1,t_0}$. It is assumed that $m$ correct prediction results and $n$ incorrect prediction results are identified after comparison. The estimated confidence of the prediction sequence is defined as

$$Con_t=\log_{10}(d+1)\,E[\operatorname{Beta}(m,n)]=\frac{m\log_{10}(d+1)}{m+n} \tag{6}$$

Finally, the confidence is normalized as

$$\lambda_t=\frac{Con_t}{\sum\limits_{i=t_0-5}^{t_0-1}Con_i} \tag{7}$$
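Equations (6) and (7) can be sketched in a few lines of Python. Two details are assumptions rather than statements from the paper: the root mean square error is taken per day across generators, and the weights fall back to a uniform split when every confidence is zero (a case (7) leaves undefined). The function names are hypothetical.

```python
import math
import numpy as np

def estimated_confidence(pred, real, d, mu):
    """Eq. (6). pred and real are (d, n_new) arrays covering days t+1..t0."""
    # Assumption: one RMSE per day, taken across generators.
    rmse = np.sqrt(((pred - real) ** 2).mean(axis=1))
    m = int((rmse < mu).sum())     # number of correct results
    n = d - m                      # number of incorrect results
    return math.log10(d + 1) * m / (m + n)

def normalize(confidences):
    """Eq. (7). Uniform fallback when all confidences are zero (assumption)."""
    c = np.asarray(confidences, dtype=float)
    total = c.sum()
    if total == 0:
        return np.full(len(c), 1.0 / len(c))
    return c / total
```

For example, with $d=3$, $m=2$ and $n=1$, (6) gives $Con_t=\tfrac{2}{3}\log_{10}4\approx 0.401$.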

II-C Process of Conformer

The proposed Conformer algorithm is shown in Algorithm 1. Firstly, the enhanced Transformer-based renewable energy prediction model predicts the sequence for the next 15 days $\widehat{\mathit{PR}}_{t_0+1,t_0+15}^{t_0}$ on the basis of the real data of the past 56 days $\mathit{PR}_{t_0-55,t_0}$, and the prediction results are stored in the storage pool $D_s$. If $t_0<5$, $\widehat{\mathit{PR}}_{t_0+1,t_0+15}^{t_0}$ is directly truncated to $\widehat{\mathit{PR}}_{t_0+1,t_0+10}$. If $t_0\geqslant 5$, the past 5 days’ prediction sequences are taken out of the storage pool $D_s$, and the confidence of the sequences is calculated via (6) and (7). Finally, the predicted active output results from $t_0+1$ to $t_0+10$ in the past 5 days’ prediction sequences are multiplied by the corresponding estimated confidence and then accumulated to obtain the effective prediction $\widehat{\mathit{PR}}_{t_0+1,t_0+10}$.

In the prediction process of Conformer, this paper combines the enhanced Transformer prediction results made at different times to get the final prediction, thereby reducing the prediction error. The final prediction is delivered to RLpatching to assist the power grid dispatching and abate the impact of new energy uncertainty.

Input: Current time $t_0$, the enhanced Transformer-based renewable energy prediction model REPM and the storage pool $D_s$
Output: Prediction results $\widehat{\mathit{PR}}_{t_0+1,t_0+10}$
Predict $\widehat{\mathit{PR}}_{t_0+1,t_0+15}^{t_0}$ by REPM;
Store $\widehat{\mathit{PR}}_{t_0+1,t_0+15}^{t_0}$ to $D_s$;
if $t_0<5$ then
    Intercept $\widehat{\mathit{PR}}_{t_0+1,t_0+10}$ in $\widehat{\mathit{PR}}_{t_0+1,t_0+15}^{t_0}$;
else
    Get $\widehat{\mathit{PR}}_{i+1,i+15}^{i}$ $(i=t_0-5,\dots,t_0-1)$ from $D_s$;
    Calculate the estimated confidence $\lambda_i$ via (6), (7);
    Calculate the final prediction results $\widehat{\mathit{PR}}_{t_0+1,t_0+10}=\sum\limits_{i=t_0-5}^{t_0-1}\lambda_i\,\widehat{\mathit{PR}}_{t_0+1,t_0+10}^{i}$;
end if
return $\widehat{\mathit{PR}}_{t_0+1,t_0+10}$;
Algorithm 1 Conformer Prediction Algorithm
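The combination step of Algorithm 1 hinges on aligning forecasts made on different days to the same target window. The following sketch shows that alignment under stated assumptions: the storage pool is modeled as a dictionary keyed by forecast time, each stored forecast is a (15, n_new) array whose row j refers to day i+1+j, and the weights are taken as already computed via (6) and (7). The name `conformer_combine` is hypothetical.

```python
import numpy as np

HORIZON, OUT_LEN, LOOKBACK = 15, 10, 5   # values stated in the text

def conformer_combine(t0, pool, weights=None):
    """Weighted combination step of Algorithm 1 (illustrative).

    pool[i]:    (15, n_new) forecast made at time i for days i+1..i+15.
    weights[k]: lambda_i for i = t0-5+k, assumed precomputed via (6), (7).
    """
    if t0 < LOOKBACK:
        return pool[t0][:OUT_LEN]            # keep the first 10 days only
    final = 0.0
    for k, i in enumerate(range(t0 - LOOKBACK, t0)):
        d = t0 - i                           # age of the forecast in days
        # Row j of pool[i] is day i+1+j, so days t0+1..t0+10 are rows d..d+9.
        final = final + weights[k] * pool[i][d:d + OUT_LEN]
    return final
```

The oldest forecast used (d = 5) needs rows 5 to 14, which is one way to see why a 15-day horizon suffices to deliver a 10-day final prediction.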

III Reinforcement Learning-based Dispatching for the Hybrid Power Grid

This section mainly introduces RL-based multi-objective dispatching framework. Firstly, the optimization objectives and constraints are illustrated in Section III-A. DDPG-based power flow optimization, the core algorithm of RLpatching, is described in Section III-B. Next, Section III-C presents the dispatching necessity evaluation mechanism. Finally, Section III-D introduces the decision-making and training process of RLpatching with the assistance of Conformer.

III-A Optimization Objectives and Constraint Conditions

The optimization objectives consist of security, economic and environmental indicators. The security objective considers branch overflow $S_b^t$, reactive power output overrun $S_r^t$ and voltage overrun $S_v^t$. The economic objective aims at minimizing the cumulative cost of the power system $C^t$, which includes the operation cost and startup cost of the generators. The environmental objective is to maximize URRE $R^t$. The formula of the optimization objectives is

$$\max\left\{\sum\limits_{t=1}^{N_{steps}}\left(\zeta(S_r^t)+S_b^t+\zeta(S_v^t)+\zeta(-C^t)+\omega_R R^t\right)\right\} \tag{8}$$
in which,
$$S_r^t=\sum\limits_{i=1}^{n_{gen}}\begin{cases}u_i^t\left(1-\dfrac{q_i^t}{q_i^{\max}}\right) & q_i^t>q_i^{\max}\\ u_i^t\left(\dfrac{q_i^t}{q_i^{\min}}-1\right) & q_i^t<q_i^{\min}\\ 0 & \text{else}\end{cases} \tag{8a}$$
$$S_b^t=1-\frac{1}{n_{branch}}\sum\limits_{j=1}^{n_{branch}}\min\left(\frac{I_j^t}{T_j},1\right) \tag{8b}$$
$$S_v^t=\sum\limits_{k=1}^{n_{bus}}\begin{cases}1-\dfrac{v_k^t}{v_k^{\max}} & v_k^t>v_k^{\max}\\ \dfrac{v_k^t}{v_k^{\min}}-1 & v_k^t<v_k^{\min}\\ 0 & \text{else}\end{cases} \tag{8c}$$
$$C^t=\sum\limits_{i=1}^{n_{gen}}\left(u_i^t\left(a_i(P_i^t)^2+b_i P_i^t+c_i\right)+u_i^t(1-u_i^{t-1})c_i^{start}\right) \tag{8d}$$
$$R^t=\frac{\sum\limits_{i=1}^{n_{new}}P_i^t}{\sum\limits_{i=1}^{n_{new}}P_i^{\max}} \tag{8e}$$

Here, $N_{steps}$ is the maximum number of steps in which the dispatching strategy can make the simulator run safely under the constraints. $\omega_R$ is a weight coefficient greater than one to improve URRE. $n_{gen}$, $n_{branch}$, $n_{bus}$ and $n_{new}$ represent the number of generators, branches, busbars and renewable energy generators, respectively. $u_i^t$, $q_i^t$ and $P_i^t$ are the on/off status, reactive power output and active output of generator $i$ in period $t$, respectively. $q_i^{\max}$ and $q_i^{\min}$ represent the maximum and minimum reactive power output of generator $i$, and $P_i^{\max}$ is the maximum active output of generator $i$. $I_j^t$ and $T_j$ represent the current and the thermal limit of branch $j$, respectively. $v_k^t$ is the voltage of busbar $k$ in period $t$, and $v_k^{\max}$ and $v_k^{\min}$ represent the maximum and minimum voltage of busbar $k$, respectively. $a_i$, $b_i$ and $c_i$ are the operation cost factors of generator $i$. $c_i^{start}$, the startup cost factor, is a fixed value for generator $i$. In addition, we use a normalization function $\zeta(x)=e^{x}-1$ to limit the ranges of $S_r^t$, $S_v^t$ and $-C^t$ to $(-1,0]$.
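The terms of (8a) to (8e) and the per-step summand of (8) translate directly into vectorized code. The sketch below is illustrative only: the function names are hypothetical, inputs are assumed to be NumPy arrays with non-zero limits, and nothing here reflects how the SG-126 simulator computes its scores internally.

```python
import numpy as np

def zeta(x):
    """Normalization zeta(x) = e^x - 1, mapping non-positive x into (-1, 0]."""
    return np.exp(x) - 1.0

def reactive_score(u, q, q_min, q_max):            # Eq. (8a)
    over = np.where(q > q_max, 1.0 - q / q_max, 0.0)
    under = np.where(q < q_min, q / q_min - 1.0, 0.0)
    return float(np.sum(u * (over + under)))

def branch_score(I, T):                            # Eq. (8b)
    return 1.0 - float(np.mean(np.minimum(I / T, 1.0)))

def voltage_score(v, v_min, v_max):                # Eq. (8c)
    over = np.where(v > v_max, 1.0 - v / v_max, 0.0)
    under = np.where(v < v_min, v / v_min - 1.0, 0.0)
    return float(np.sum(over + under))

def cost(u, u_prev, P, a, b, c, c_start):          # Eq. (8d)
    running = u * (a * P**2 + b * P + c)
    startup = u * (1 - u_prev) * c_start           # charged only when switching on
    return float(np.sum(running + startup))

def utilization(P_new, P_new_max):                 # Eq. (8e)
    return float(np.sum(P_new) / np.sum(P_new_max))

def step_objective(S_r, S_b, S_v, C, R, omega_R):  # summand of Eq. (8)
    return zeta(S_r) + S_b + zeta(S_v) + zeta(-C) + omega_R * R
```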

In this paper, three constraint conditions are set to limit optimization.

III-A1 Voltage constraints

The actual value of the voltage of any generator should not be greater than the upper limit of the voltage, nor less than the lower limit. Otherwise, the security objective will be affected negatively.

III-A2 Power balance constraints

The total power generation should cover the total power demand. Hence,

$$\sum\limits_{m}^{n_t}P_{Tm}^{t}+\sum\limits_{n}^{n_{new}}P_{Rn}^{t}=\sum\limits_{j}^{n_l}P_{Lj}^{t} \tag{15}$$

where $n_t$ is the number of thermal generators, and $n_l$ is the number of loads.

III-A3 Ramp rate constraints

The active power adjustment value of each thermal power generator between any two continuous time steps must be smaller than the generator maximum adjustment value. Therefore, for generator ii in period tt, the ramp rate constraint is defined as

$$\left|P_i^{t}-P_i^{t+\Delta t}\right|<ramp\_rate\times P_i^{\max} \tag{16}$$
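The power balance constraint (15) and the ramp rate constraint (16) can be checked with two small helpers. This is a sketch: the tolerance `tol` used for the equality in (15) is an assumption introduced for floating-point comparison, and the function names are hypothetical.

```python
import numpy as np

def power_balanced(P_thermal, P_renewable, P_load, tol=1e-6):
    """Eq. (15), checked up to a numerical tolerance (tol is an assumption)."""
    gap = np.sum(P_thermal) + np.sum(P_renewable) - np.sum(P_load)
    return abs(gap) <= tol

def ramp_ok(P_now, P_next, P_max, ramp_rate):
    """Eq. (16), evaluated element-wise for each thermal generator."""
    return np.abs(P_now - P_next) < ramp_rate * P_max
```

With `ramp_rate = 0.05` and `P_max = 100`, a step from 50 to 54 passes while a step from 50 to 70 violates the constraint.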

III-B DDPG-based Power Flow Optimization

RLpatching, which adopts DDPG, determines the active power output of all generators for the next day according to the current power grid operation state and the long-term prediction of renewable energy from Conformer. It includes an actor and a critic, which are used for decision-making and scoring, respectively, as shown in Fig. 3.


Figure 3: Architecture of RLpatching

In RLpatching, the input of the actor includes the current power grid observations $obs_t$ and the final prediction $\widehat{\mathit{PR}}_{t+1,t+10}$ output by Conformer, which are marked as the state space $s_t$. $obs_t$ consists of 13 selected groups of current power grid parameters to represent the operation state, including the reactive power of all generators, the voltage, active power and reactive power at the starts and ends of branches, the load ratio and current of branches, the active power and reactive power of the loads, and the grid loss. The actor outputs the active power of all generators for the next day, $P^t$. The critic takes $s_t$ and $P^t$ as its input and outputs the value $Q$.

Since the grid operation control is modeled as a time-continuous decision-making process, the reward function is defined as the optimization objectives at each time step. The reward function at time tt is

$$r_t=S_b^t+\zeta(S_r^t)+\zeta(S_v^t)+\zeta(-C^t)+\omega_R R^t \tag{17}$$

The actor and critic, parameterized by $\theta^{\mu}$ and $\theta^{Q}$, are used to represent the deterministic policy $P^t=\mu(s_t\mid\theta^{\mu})$ and the critic function $Q(s_t,P^t\mid\theta^{Q})$. $\theta^{\mu}$ and $\theta^{Q}$ are optimized by the stochastic gradient method. The loss functions of the actor and critic are

$$J(\theta^{\mu})=\frac{1}{N}\sum\limits_{t}\left.Q(s_t,P^t\mid\theta^{Q})\right|_{P^t=\mu(s_t)}\mu(s_t\mid\theta^{\mu}) \tag{18}$$
$$J(\theta^{Q})=\frac{1}{N}\sum\limits_{t}\left[r_t+\gamma Q'(s_{t+1},\mu'_{t+1}\mid\theta^{Q'})-Q(s_t,P^t\mid\theta^{Q})\right]^{2} \tag{19}$$
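The critic loss in (19) is a mean squared temporal-difference error over a batch of $N$ transitions. A minimal numerical sketch follows; it assumes the target-network value $Q'(s_{t+1},\mu'_{t+1}\mid\theta^{Q'})$ has already been evaluated and is passed in as an array, so no network code is shown. The function name is hypothetical.

```python
import numpy as np

def critic_loss(r, q_target_next, q_current, gamma):
    """Eq. (19): mean squared TD error over a batch.

    r:             (N,) rewards r_t.
    q_target_next: (N,) values Q'(s_{t+1}, mu'_{t+1} | theta^{Q'}), precomputed.
    q_current:     (N,) values Q(s_t, P^t | theta^Q).
    gamma:         discount factor.
    """
    td_target = r + gamma * q_target_next
    return float(np.mean((td_target - q_current) ** 2))
```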

III-C Dispatching Necessity Evaluation Mechanism

In power grid dispatching, adjusting numerous generators would decrease the stability sharply. To deal with this problem, this paper designs a dispatching necessity evaluation mechanism to quantify the necessity of dispatching each generator and select $n_{dis}$ generators to dispatch. For generator $i$, three factors are considered to evaluate its necessity: the active power adjustment value for the next day $\Delta P_i$, the utilization rate $R_i=\frac{P_i^t}{P_i^{\max}}$, and the maximum load rate of all branches around it $L_i^{\max}$. $|\Delta P_i|$ is normalized as $|\Delta P_i|'$ by the min-max method. We select the $n_{dis}$ generators with the highest necessity to dispatch, and the active output of the other generators is kept consistent with that of the previous time step. The necessity formula of dispatching is defined as

$$D_i=\omega_P|\Delta P_i|'+(\varphi_R-R_i)\Delta P_i+(\varphi_L-L_i^{\max})\Delta P_i \tag{20}$$

Equation (20) is a polynomial composed of three parts. $\omega_P|\Delta P_i|'$ measures the adjustment value, with which the dispatching necessity is positively correlated. $(\varphi_R-R_i)\Delta P_i$ is designed to keep the utilization rate of generator $i$ within a reasonable range. When the utilization rate of the generator is less than $\varphi_R$, the necessity of adjustment increases with the increase of $\Delta P_i$. When the utilization rate of the generator is greater than $\varphi_R$, the increase of the active power of generator $i$ is restrained. $(\varphi_L-L_i^{\max})\Delta P_i$ maintains the load rate of the branches around generator $i$, featuring the same characteristics as $(\varphi_R-R_i)\Delta P_i$.
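The selection rule built on (20) can be sketched as follows. The handling of the degenerate case where all $|\Delta P_i|$ are equal (mapped to zero) is an assumption, as are the function names; the coefficient values in the usage note are arbitrary illustrations, not values from the paper.

```python
import numpy as np

def dispatch_necessity(dP, R, L_max, omega_P, phi_R, phi_L):
    """Eq. (20), evaluated for all generators at once."""
    abs_dP = np.abs(dP)
    span = abs_dP.max() - abs_dP.min()
    # Min-max normalization of |dP|; equal inputs map to zero (assumption).
    norm = (abs_dP - abs_dP.min()) / span if span > 0 else np.zeros_like(abs_dP)
    return omega_P * norm + (phi_R - R) * dP + (phi_L - L_max) * dP

def select_and_apply(P_prev, P_proposed, D, n_dis):
    """Dispatch the n_dis generators with the highest necessity; hold the rest."""
    chosen = np.argsort(-D)[:n_dis]
    P = P_prev.copy()
    P[chosen] = P_proposed[chosen]
    return P, chosen
```

With `dP = [10, -5, 0]`, `R = [0.5, 0.9, 0.2]`, `L_max = 0.6` for every generator, `omega_P = 1` and `phi_R = phi_L = 0.8`, the first generator obtains the highest necessity because its utilization and nearby load rate are both below the thresholds while its proposed increase is the largest.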

III-D RLpatching Decision-making and Training Process

The decision-making and training process of the proposed Conformer-RLpatching for the hybrid power grid is shown in Algorithm 2. At the beginning of each epoch, $obs_{t_0}$ is obtained from the initialized simulator and, together with the prediction from Conformer, initializes the state space $s_{t_0}$. In each step of dispatching, the active power output of all generators for the next step is determined by the actor, the dispatching necessity evaluation mechanism selects $n_{dis}$ generators to dispatch, and then the simulator executes the dispatching strategy. Next, $obs_{t_0+1}$ and $\widehat{\mathit{PR}}_{t_0+2,t_0+11}$ constitute $s_{t_0+1}$. Afterwards, $(s_{t_0},P^{t_0},r_{t_0},s_{t_0+1})$ is stored in $D_e$. Finally, samples are selected from $D_e$ to update all networks. The epoch ends when the power flow calculation of the simulator fails to converge or the operation of the grid does not meet the constraints mentioned in Section III-A.
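The experience pool $D_e$ described above stores transitions and returns random samples for network updates. A minimal sketch is given below; the fixed capacity, the uniform sampling and the class name `ExperiencePool` are assumptions, since the text does not specify how samples are selected.

```python
import random
from collections import deque

class ExperiencePool:
    """Bounded store of (s_t, P^t, r_t, s_{t+1}) transitions (illustrative)."""

    def __init__(self, capacity=100_000, seed=None):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def store(self, s, P, r, s_next):
        self.buffer.append((s, P, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement (assumed selection rule).
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```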

Initialize actor, critic and Conformer;
for epoch = 1 : R do
    Initialize experience pool $D_e$ and storage pool $D_s$;
    $t_0 = 0$;
    Initialize the SG-126 power grid simulator and obtain $obs_{t_0}$;
    Predict $\widehat{PR}_{t_0+1,t_0+10}$ by Algorithm 1;
    $[obs_{t_0}, \widehat{PR}_{t_0+1,t_0+10}] \rightarrow s_{t_0}$;
    while True do
        Input $s_{t_0}$ to actor and get $P^{t_0}$;
        Select $n_{dis}$ generators to dispatch via (20);
        Observe $r_{t_0}$, Done, $obs_{t_0+1}$ and $PR_{t_0+1}$ from the simulator;
        Predict $\widehat{PR}_{t_0+2,t_0+11}$ by Algorithm 1;
        $[obs_{t_0+1}, \widehat{PR}_{t_0+2,t_0+11}] \rightarrow s_{t_0+1}$;
        Store $(s_{t_0}, P^{t_0}, r_{t_0}, s_{t_0+1})$ to $D_e$;
        Select samples from $D_e$ and update networks;
        if Done then
            break;
        end if
        $t_0 = t_0 + 1$;
        $s_{t_0} = s_{t_0+1}$;
    end while
end for
Algorithm 2 RLpatching Decision-making and Online Training Algorithm
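
The control flow of Algorithm 2 can be sketched in Python as follows. This is a toy illustration only: `ToySimulator`, `forecast`, `actor`, `select_dispatch` and `update_networks` are hypothetical placeholders for the SG-126 simulator, Algorithm 1, the actor network, the dispatching necessity evaluation mechanism and the network update, respectively. None of them reproduces the authors' implementation.

```python
import random
from collections import deque


class ToySimulator:
    """Stand-in for the grid simulator (hypothetical interface)."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0]                        # obs at t0

    def step(self, dispatched):
        self.t += 1
        obs = [float(self.t)]
        reward = 1.0
        done = self.t >= self.max_steps     # stands in for divergence or a violated constraint
        return obs, reward, done


def forecast(obs, horizon=10):
    """Placeholder for Algorithm 1: repeats the last observation."""
    return [obs[0]] * horizon


def actor(state):
    """Placeholder policy: one target output per generator (two here)."""
    return [0.5, 0.5]


def select_dispatch(action, n_dis):
    """Placeholder for the necessity evaluation: keeps the first n_dis entries."""
    return action[:n_dis]


def update_networks(batch):
    """Placeholder for the actor and critic updates."""
    return None


def run_epoch(sim, n_dis=1, batch_size=2):
    experience = deque(maxlen=1000)         # experience pool, re-created every epoch
    obs = sim.reset()
    state = obs + forecast(obs)             # [obs, forecast] -> state
    steps = 0
    while True:
        action = actor(state)
        dispatched = select_dispatch(action, n_dis)
        next_obs, reward, done = sim.step(dispatched)
        next_state = next_obs + forecast(next_obs)
        experience.append((state, action, reward, next_state))
        batch = random.sample(list(experience), min(batch_size, len(experience)))
        update_networks(batch)
        steps += 1
        if done:
            break
        state = next_state
    return steps, experience


steps, experience = run_epoch(ToySimulator(max_steps=5))
```

Here the state is formed by concatenating the current observation with a 10-step forecast, and the epoch stops as soon as the simulator reports that it is done.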

IV Simulation and Results

IV-A The Simulator and Settings

Conformer-RLpatching is evaluated on the SG-126 power grid simulator. The simulator includes 54 generators and 117 branches, which conforms to the characteristics and operation mode of a provincial grid. The simulation parameters and generator parameters are listed in TABLE I and TABLE II, respectively.

TABLE I: Simulation parameters

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Renewable energy units | 18 | Thermal power units | 36 |
| Branches | 185 | Loads | 91 |
| Transformers | 9 | Ramp rate | 0.05 |
| $\omega_R$ | 2 | $\mu$ | 5 |

TABLE II: Generator Parameters

| Parameters / Units | U1-U18 | U19-U30 | U30-U40 | U40-U54 |
|---|---|---|---|---|
| Type | Renewable Energy | Thermal Power | Thermal Power | Thermal Power |
| $P_i^{max}/MW$ | - | 110 | 128 | 140 |
| $P_i^{min}/MW$ | 0 | 15 | 25 | 28 |
| $v_i^{max}/Mm^{3}$ | 105 | 105 | 105 | 105 |
| $v_i^{min}/Mm^{3}$ | 95 | 95 | 95 | 95 |
| $a_i$ | 0.0696 | 0.0285 | 0.0109 | 0.0097 |
| $b_i$ | 26.2438 | 17.82 | 22.9423 | 12.8875 |
| $c_i$ | 31.67 | 10.15 | 32.96 | 58.81 |
| $c_i^{start}$ | 80 | 100 | 200 | 880 |

TABLE III: Prediction RMSE

| Input data length | 48 | 48 | 48 | 48 | 56 | 56 | 56 | 56 | 64 | 64 | 64 | 64 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Output data length | 15 | 20 | 25 | 30 | 15 | 20 | 25 | 30 | 15 | 20 | 25 | 30 |
| Conformer | 5.0648 | 5.3919 | 5.8599 | 6.2192 | 4.9995 | 5.2718 | 5.9732 | 6.1780 | 5.0076 | 5.3985 | 6.0351 | 6.1866 |
| Informer | 5.9370 | 6.7637 | 7.2799 | 7.8346 | 5.9677 | 6.6830 | 7.3198 | 7.7981 | 6.0773 | 6.7428 | 7.3137 | 7.7993 |
| LSTMa | 10.9698 | 11.3034 | 12.1275 | 12.2250 | 10.8640 | 12.7792 | 13.0704 | 12.5671 | 11.1334 | 12.8136 | 13.0573 | 12.9230 |
| Prophet | 8.9671 | 9.7796 | 10.8414 | 11.8916 | 9.4211 | 10.3752 | 11.2745 | 12.3669 | 9.5111 | 10.3964 | 11.2522 | 12.5742 |
| CNN | 6.6045 | 7.5408 | 8.0617 | 8.5013 | 6.7521 | 7.4893 | 8.0038 | 8.5720 | 6.8096 | 7.5571 | 7.9928 | 8.6852 |
| CNN-LSTM | 7.8669 | 8.6973 | 9.2094 | 9.5646 | 8.2419 | 9.3553 | 9.6920 | 9.7102 | 8.3122 | 9.4028 | 9.4484 | 9.9086 |

Figure 4: Prediction Results

IV-B Comparison of Renewable Energy Prediction Models

This subsection compares the predictive effectiveness of Conformer with that of state-of-the-art methods, including Informer [27], LSTMa [28], Prophet [29], convolutional neural network (CNN), and CNN-LSTM [26]. The data set used in the experiment is the maximum active power output of 18 renewable energy units in 106820 days, which is provided by China Electric Power Research Institute. This paper selects 90% of the data as the training set and 10% as the testing set. The root mean square error (RMSE) is used to evaluate the models:

$$RMSE = \sqrt{\frac{1}{t_{pre}\, n_{new}} \sum_{t=1}^{t_{pre}} \sum_{i=1}^{n_{new}} \left(P_i^t - \hat{P}_i^t\right)^2} \qquad (21)$$
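
For reference, (21) can be evaluated with a few lines of Python. The sketch below assumes that the real and predicted outputs are stored as lists of $t_{pre}$ rows with $n_{new}$ values each. The function name `rmse` and the sample values are illustrative only and do not come from the data set.

```python
import math


def rmse(actual, predicted):
    """RMSE over t_pre time steps (rows) and n_new generators (columns)."""
    t_pre = len(actual)
    n_new = len(actual[0])
    squared_error = sum(
        (a - p) ** 2
        for row_a, row_p in zip(actual, predicted)
        for a, p in zip(row_a, row_p)
    )
    return math.sqrt(squared_error / (t_pre * n_new))


# Made-up example: 2 time steps, 2 generators.
actual = [[10.0, 20.0], [12.0, 18.0]]
predicted = [[11.0, 19.0], [12.0, 20.0]]
value = rmse(actual, predicted)   # squared errors 1, 1, 0, 4 -> sqrt(6 / 4)
```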

We set up a series of comparative experiments with varying input and output lengths to predict the long-term maximum active output of all renewable energy generators. In these experiments, historical data of the past 48, 56, 64, 72, 80, 88, and 96 days are fed into the models separately, and the length of the predicted values $t_{pre}$ is set to 15, 20, 25 and 30. Only the crucial results are shown in TABLE III; the complete results are given in APPENDIX A.

Conformer performs much better than all compared methods across the experiments. Specifically, the average RMSE of Conformer over the 12 groups of comparative experiments is 19.07% lower than that of Informer, the second-best method. The results show that the confidence estimation method, which offsets the contingency of a single prediction result, improves predictive effectiveness. In addition, the prediction accuracy of Conformer is 39.65% higher than that of the other algorithms on average.
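
The 19.07% figure can be reproduced from the TABLE III values. The short Python check below copies the 12 Conformer and Informer RMSE values from the table; the variable names are illustrative.

```python
# RMSE values from TABLE III (input lengths 48, 56, 64; output lengths 15, 20, 25, 30).
conformer = [5.0648, 5.3919, 5.8599, 6.2192,
             4.9995, 5.2718, 5.9732, 6.1780,
             5.0076, 5.3985, 6.0351, 6.1866]
informer = [5.9370, 6.7637, 7.2799, 7.8346,
            5.9677, 6.6830, 7.3198, 7.7981,
            6.0773, 6.7428, 7.3137, 7.7993]

mean_conformer = sum(conformer) / len(conformer)
mean_informer = sum(informer) / len(informer)

# Relative reduction of the average RMSE, in percent.
reduction = (mean_informer - mean_conformer) / mean_informer * 100
```

The average RMSE is about 5.63 for Conformer and about 6.96 for Informer, which gives a relative reduction of roughly 19.07%.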

When the input length is fixed, all methods show a consistent trend, that is, the gap between the prediction and real data widens slightly as the output length increases. For example, when the input length is 56 and the output length rises from 15 to 30, the RMSE of Conformer and CNN increases by 23.57% and 26.95%, respectively.

For a fixed output length, the prediction accuracy of all methods tends to increase as the input length rises and peaks at an input length of approximately 56. This illustrates that more historical data gives the model better predictive ability, while excessive input introduces more interference from invalid data and thus reduces the prediction accuracy. When the output length is fixed at 20, although the prediction error of Conformer increases by 2.40% as the input length rises from 56 to 64, Conformer still maintains its superiority, with an RMSE 19.94% and 28.56% lower than that of Informer and CNN, respectively.

This paper further examines the robustness of Conformer under different input and output lengths in comparison with Informer and CNN. Short-term prediction with more historical data is much easier for the models. Therefore, three scenarios of varying difficulty are set up, as shown in Fig. 4. It is clear that CNN has the worst performance in all scenarios. As depicted in Fig. 4 (a), both Conformer and Informer are able to describe the correct trend of the real data, and the prediction output by Conformer from day 9 to day 13 in particular achieves surprisingly little error. In the moderate scenario, Fig. 4 (b), Conformer performs more stably than Informer and achieves more accurate long-term prediction. Conformer significantly outperforms Informer in the hard scenario, Fig. 4 (c), where Informer deviates from the real data as the output length rises, whereas Conformer accurately describes the correct trend of the real data. In summary, Conformer still achieves the best performance even though the prediction accuracy decreases as the difficulty rises, which demonstrates its robustness.

TABLE IV: Simulation Results

| Method | $\varphi_R$ | $\varphi_L$ | $n_{dis}$ | Steps | Security Score | Average Cost (Thousand RMB Yuan) | Average Renewable Energy Utilization Rate | Total Reward |
|---|---|---|---|---|---|---|---|---|
| Conformer-RLpatching I | 0.8 | 0.8 | 40 | 426 | 261.145 | 65251.618 | 81.579% | 527.318 |
| Conformer-RLpatching II | 0.8 | 0.8 | 40 | 309 | 202.073 | 56499.215 | 73.826% | 346.117 |
| Conformer-RLpatching III | 0.7 | 0.8 | 40 | 417 | 250.544 | 65552.791 | 79.798% | 498.990 |
| Conformer-RLpatching IV | 0.9 | 0.8 | 40 | 415 | 254.921 | 65399.495 | 77.909% | 485.560 |
| Conformer-RLpatching V | 0.8 | 0.7 | 40 | 412 | 252.695 | 65651.850 | 81.822% | 510.402 |
| Conformer-RLpatching VI | 0.8 | 0.9 | 40 | 395 | 241.744 | 66163.359 | 82.754% | 497.888 |
| Conformer-RLpatching VII | 0.8 | 0.8 | 35 | 408 | 252.815 | 65538.547 | 80.055% | 498.064 |
| Conformer-RLpatching VIII | 0.8 | 0.8 | 45 | 417 | 245.887 | 66081.338 | 82.638% | 513.476 |
| A+B | - | - | - | 411 | 250.824 | 65512.642 | 79.198% | 490.008 |
| B+C | 0.8 | 0.8 | 40 | 206 | 129.021 | 68862.785 | 78.333% | 241.563 |
| DDPG | - | - | - | 325 | 207.656 | 63973.590 | 55.871% | 243.303 |
| DCR-TD3 | - | - | - | 250 | 159.899 | 59964.867 | 78.787% | 299.359 |
| PPO | - | - | - | 56 | 30.961 | 71204.500 | 99.451% | 82.139 |

A: Conformer, B: DDPG-based Power Flow Optimization Algorithm, C: Dispatching Necessity Evaluation Mechanism

Figure 5: Simulation Results of Each Step

IV-C Performance of Conformer-RLpatching With Different Hyper Parameters and Ablation Experiments

According to Section IV-B, the length of historical data input to Conformer-RLpatching is set to 56, which yields the lowest prediction error. Conformer outputs the prediction for the next 10 days in all experiments except for Conformer-RLpatching II, which only provides the prediction for the next day.

To compare the impact of long-term and short-term renewable energy prediction on the dispatching effect, this paper designs a comparative experiment between Conformer-RLpatching I and II. As shown in TABLE IV, Conformer-RLpatching I enables the simulator to run 117 more steps than Conformer-RLpatching II while satisfying the constraints, and the total reward of Conformer-RLpatching I is 52.352% higher.

Conformer-RLpatching I and III-VIII test the impact of the parameters of the dispatching necessity evaluation mechanism, namely $\varphi_R$, $\varphi_L$ and $n_{dis}$, on active power flow dispatching of the hybrid power grid. According to TABLE IV, Conformer-RLpatching performs best overall when $\varphi_R$ and $\varphi_L$ are both 0.8 and $n_{dis}$ is 40, with a total reward of 527. When $\varphi_R$ deviates from 0.8, the security score and renewable energy utilization rate show a downward trend, but the average cost of each step changes little. For example, when $\varphi_R$ increases from 0.8 to 0.9, the utilization rate of renewable energy decreases by 4.499%. When the critical value of branch current $\varphi_L$ increases, the security of the power grid is seriously affected. As $\varphi_L$ increases from 0.8 to 0.9, the security score decreases by 7.429%. As $n_{dis}$ rises to 40, the stability of the power grid reaches a peak. When $n_{dis}$ increases from 40 to 45, the number of operation steps and the security score decrease by 2.113% and 5.843%, respectively, although the utilization rate of renewable energy increases by 1.348%.

In addition, ablation experiments are carried out to explore the separate contributions of Conformer and the dispatching necessity evaluation mechanism. The security score decreases by 3.952% without C, which indicates that the dispatching necessity evaluation mechanism helps ensure the stable operation of the power grid system. Besides, the total reward drops significantly, by 54.190%, when A is discarded, which illustrates that Conformer can assist the grid dispatching and abate the negative impact of renewable energy fluctuations.

IV-D Comparison With Other Dispatching Methods

This paper compares Conformer-RLpatching with DDPG, distributed classification replay twin delayed deep deterministic policy gradient (DCR-TD3) [30] and proximal policy optimization (PPO) to further verify the dispatching effect of Conformer-RLpatching. The simulation results are shown in TABLE IV. The security score of Conformer-RLpatching is 25.758% and 63.319% higher than that of DDPG and DCR-TD3, respectively. The total reward is improved by 76.149% compared with DCR-TD3, which has the second-best total reward among these methods. The renewable energy utilization rate of Conformer-RLpatching is 4.540% higher than the average of the other methods.
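
These relative improvements follow directly from the TABLE IV entries. The Python sketch below recomputes them; the helper `relative_gain` and the dictionary layout are illustrative choices, while the numbers are copied from TABLE IV (Conformer-RLpatching I, DDPG, DCR-TD3 and PPO).

```python
def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100


# (security score, utilization rate in %, total reward) from TABLE IV.
results = {
    "Conformer-RLpatching I": (261.145, 81.579, 527.318),
    "DDPG": (207.656, 55.871, 243.303),
    "DCR-TD3": (159.899, 78.787, 299.359),
    "PPO": (30.961, 99.451, 82.139),
}

ours = results["Conformer-RLpatching I"]
security_vs_ddpg = relative_gain(ours[0], results["DDPG"][0])
security_vs_dcr = relative_gain(ours[0], results["DCR-TD3"][0])
reward_vs_dcr = relative_gain(ours[2], results["DCR-TD3"][2])

# Utilization rate compared with the mean of the three baselines.
baselines = [results[name][1] for name in ("DDPG", "DCR-TD3", "PPO")]
utilization_vs_mean = relative_gain(ours[1], sum(baselines) / len(baselines))
```

The four quantities evaluate to approximately 25.758, 63.319, 76.149 and 4.540, matching the percentages quoted above.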

More intuitively, Fig. 5 depicts the average renewable energy utilization rate, the average cost, and the cumulative reward at each step of the above dispatching methods in an epoch. As shown in Fig. 5 (a), although PPO reaches a relatively high renewable energy utilization rate at the beginning, its dispatching is highly unstable and ends at only step 50. In contrast, Conformer-RLpatching, which offers superior stability and an efficient renewable energy utilization rate, is noticeably the best performer on renewable energy utilization, albeit at a slightly higher cost, as presented in Fig. 5 (b). Fig. 5 (c) shows the overall dispatching performance. The rewards obtained by PPO and DCR-TD3 rise quickly at first but come to an end early, which indicates their instability. DDPG runs more steps than PPO and DCR-TD3, but it obtains the lowest rewards among them in the epoch. To sum up, Conformer-RLpatching achieves the most running steps and the highest cumulative reward, verifying its superiority among the state-of-the-art dispatching methods.

TABLE V: Comparison With the Top Three Teams

| Method | Total Reward |
|---|---|
| Conformer-RLpatching I | 527.318 |
| The first ranked team | 510.091 |
| The second ranked team | 509.206 |
| The third ranked team | 507.239 |

TABLE V compares the total reward of Conformer-RLpatching with those of the top three teams in the State Grid Dispatching AI Innovation Competition. Conformer-RLpatching achieves a total reward of 527.318, exceeding the 510.091 obtained by the first-ranked team.

V Conclusion

Emerging AI technology is gradually being integrated into the security-constrained economic dispatching of the hybrid energy grid. This paper proposes Conformer-RLpatching to achieve multi-objective dispatching under the long-term fluctuations of renewable energy. Equipped with the confidence estimation method, Conformer can provide accurate long-term renewable energy prediction to reduce the impact of new energy uncertainty on dispatching. Moreover, this paper designs RLpatching to realize active power flow optimization and puts forward a dispatching necessity evaluation mechanism to reduce unnecessary dispatching and improve the stability of the power grid. Extensive experiments based on the SG-126 power grid simulator demonstrate that the proposed Conformer-RLpatching is superior to other state-of-the-art methods. Future work will consider the impact of additional factors, such as weather and social activities, on renewable energy to improve the prediction accuracy, and will further study the multi-objective dispatching of the hybrid power grid under source-load side uncertainty.

Appendix A Prediction Results

As for prediction models, historical data of the past 48 days, 56 days, 64 days, 72 days, 80 days, 88 days, and 96 days are fed into models separately, and the length of prediction is set as 15, 20, 25 and 30. All experimental results are shown in the following table.

| Input data length | Output data length | Conformer | Informer | LSTMa | Prophet | CNN | CNN-LSTM |
|---|---|---|---|---|---|---|---|
| 48 | 15 | 5.0648 | 5.9370 | 10.9698 | 8.9671 | 6.6045 | 7.8669 |
| 48 | 20 | 5.3919 | 6.7637 | 11.3034 | 9.7796 | 7.5408 | 8.6973 |
| 48 | 25 | 5.8599 | 7.2799 | 12.1275 | 10.8414 | 8.0617 | 9.2094 |
| 48 | 30 | 6.2192 | 7.8346 | 12.2250 | 11.8916 | 8.5013 | 9.5646 |
| 56 | 15 | 4.9995 | 5.9677 | 10.8640 | 9.4211 | 6.7521 | 8.2419 |
| 56 | 20 | 5.2718 | 6.6830 | 12.7792 | 10.3752 | 7.4893 | 9.3553 |
| 56 | 25 | 5.9732 | 7.3198 | 13.0704 | 11.2745 | 8.0038 | 9.6920 |
| 56 | 30 | 6.1780 | 7.7981 | 12.5671 | 12.3669 | 8.5720 | 9.7102 |
| 64 | 15 | 5.0076 | 6.0773 | 11.1334 | 9.5111 | 6.8096 | 8.3122 |
| 64 | 20 | 5.3985 | 6.7428 | 12.8136 | 10.3964 | 7.5571 | 9.4028 |
| 64 | 25 | 6.0351 | 7.3137 | 13.0573 | 11.2522 | 7.9928 | 9.4484 |
| 64 | 30 | 6.1866 | 7.7993 | 12.9230 | 12.5742 | 8.6852 | 9.9086 |
| 72 | 15 | 5.0214 | 6.0810 | 11.1267 | 9.5745 | 6.8373 | 8.3324 |
| 72 | 20 | 5.4412 | 6.7233 | 12.9052 | 10.4459 | 7.5659 | 9.3964 |
| 72 | 25 | 5.9060 | 7.3482 | 13.1027 | 11.3085 | 8.0960 | 9.5547 |
| 72 | 30 | 6.2055 | 7.8026 | 12.9616 | 12.5302 | 8.6734 | 9.9164 |
| 80 | 15 | 5.1835 | 6.0975 | 11.3142 | 9.6998 | 6.8945 | 8.4128 |
| 80 | 20 | 5.5034 | 6.7255 | 12.8648 | 10.4943 | 7.5203 | 9.4645 |
| 80 | 25 | 5.9607 | 7.3541 | 12.9361 | 11.3496 | 8.1734 | 9.6077 |
| 80 | 30 | 6.2565 | 7.8440 | 13.1125 | 12.5463 | 8.7174 | 9.8437 |
| 88 | 15 | 4.9972 | 6.1014 | 11.3667 | 9.6289 | 6.8981 | 8.3959 |
| 88 | 20 | 5.4511 | 6.6902 | 12.8955 | 10.5025 | 7.6697 | 9.7730 |
| 88 | 25 | 5.9611 | 7.3878 | 13.0564 | 11.5041 | 8.3747 | 9.7416 |
| 88 | 30 | 6.2532 | 7.8635 | 13.2063 | 12.5195 | 8.7748 | 9.9718 |
| 96 | 15 | 5.1985 | 6.1397 | 11.4291 | 9.8016 | 6.9034 | 8.4526 |
| 96 | 20 | 5.5323 | 6.7316 | 13.0103 | 10.6395 | 7.6748 | 9.8253 |
| 96 | 25 | 5.9236 | 7.3796 | 13.1256 | 11.8777 | 8.4523 | 9.8562 |
| 96 | 30 | 6.3164 | 7.8826 | 13.1813 | 12.6051 | 8.8516 | 9.9868 |

References

  • [1] B. Khorramdel, A. Zare, C. Chung, and P. Gavriliadis, “A generic convex model for a chance-constrained look-ahead economic dispatch problem incorporating an efficient wind power distribution modeling,” IEEE Transactions on Power Systems, vol. 35, no. 2, pp. 873–886, 2019.
  • [2] C. Tang, J. Xu, Y. Sun, J. Liu, X. Li, D. Ke, J. Yang, and X. Peng, “Look-ahead economic dispatch with adjustable confidence interval based on a truncated versatile distribution model for wind power,” IEEE Transactions on Power Systems, vol. 33, no. 2, pp. 1755–1767, 2017.
  • [3] C. Yang, W. Sun, J. Yang, and D. Han, “Risk-averse two-stage distributionally robust economic dispatch model under uncertain renewable energy,” CSEE Journal of Power and Energy Systems, pp. 1–10, 2021.
  • [4] B. Qu, J. J. Liang, Y. Zhu, and P. N. Suganthan, “Solving dynamic economic emission dispatch problem considering wind power by multi-objective differential evolution with ensemble of selection method,” Natural Computing, vol. 18, no. 4, pp. 695–703, 2019.
  • [5] H. Huang, M. Zhou, S. Zhang, L. Zhang, G. Li, and Y. Sun, “Exploiting the operational flexibility of wind integrated hybrid ac/dc power systems,” IEEE Transactions on Power Systems, vol. 36, no. 1, pp. 818–826, 2020.
  • [6] Y. Yang, W. Wu, B. Wang, and M. Li, “Chance-constrained economic dispatch considering curtailment strategy of renewable energy,” IEEE Transactions on Power Systems, vol. 36, no. 6, pp. 5792–5802, 2021.
  • [7] W. Wang, B. Sun, H. Li, Q. Sun, and R. Wennersten, “An improved min-max power dispatching method for integration of variable renewable energy,” Applied Energy, vol. 276, p. 115430, 2020.
  • [8] E. Du, N. Zhang, B.-M. Hodge, Q. Wang, Z. Lu, C. Kang, B. Kroposki, and Q. Xia, “Operation of a high renewable penetrated power system with csp plants: A look-ahead stochastic unit commitment model,” IEEE Transactions on power systems, vol. 34, no. 1, pp. 140–151, 2018.
  • [9] L. Yin, Q. Gao, L. Zhao, and T. Wang, “Expandable deep learning for real-time economic generation dispatch and control of three-state energies based future smart grids,” Energy, vol. 191, p. 116561, 2020.
  • [10] Y. Gao and Q. Ai, “A novel optimal dispatch method for multiple energy sources in regional integrated energy systems considering wind curtailment,” CSEE Journal of Power and Energy Systems, pp. 1–10, 2022.
  • [11] A. C. do Amaral Burghi, T. Hirsch, and R. Pitz-Paal, “Artificial learning dispatch planning for flexible renewable-energy systems,” Energies, vol. 13, no. 6, p. 1517, 2020.
  • [12] K. Lv, H. Tang, Y. Li, and X. Li, “A learning-based optimization of active power dispatch for a grid-connected microgrid with uncertain multi-type loads,” Journal of Renewable and Sustainable Energy, vol. 9, no. 6, p. 065901, 2017.
  • [13] Z. Hu, Y. Xu, M. Korkali, X. Chen, L. Mili, and J. Valinejad, “A bayesian approach for estimating uncertainty in stochastic economic dispatch considering wind power penetration,” IEEE Transactions on Sustainable Energy, vol. 12, no. 1, pp. 671–681, 2020.
  • [14] W. Dong, Q. Yang, W. Li, and A. Y. Zomaya, “Machine-learning-based real-time economic dispatch in islanding microgrids in a cloud-edge computing environment,” IEEE Internet of Things Journal, vol. 8, no. 17, pp. 13 703–13 711, 2021.
  • [15] J. Guan, H. Tang, K. Wang, J. Yao, and S. Yang, “A parallel multi-scenario learning method for near-real-time power dispatch optimization,” Energy, vol. 202, p. 117708, 2020.
  • [16] L. Lei, Y. Tan, G. Dahlenburg, W. Xiang, and K. Zheng, “Dynamic energy dispatch based on deep reinforcement learning in iot-driven smart isolated microgrids,” IEEE Internet of Things Journal, vol. 8, no. 10, pp. 7938–7953, 2020.
  • [17] T. Yang, L. Zhao, W. Li, and A. Y. Zomaya, “Dynamic energy dispatch strategy for integrated energy system based on improved deep reinforcement learning,” Energy, vol. 235, p. 121377, 2021.
  • [18] R. Lu, T. Ding, B. Qin, J. Ma, X. Fang, and Z. Dong, “Multi-stage stochastic programming to joint economic dispatch for energy and reserve with uncertain renewable energy,” IEEE Transactions on Sustainable Energy, vol. 11, no. 3, pp. 1140–1151, 2019.
  • [19] Y. Huo, F. Bouffard, and G. Joós, “Decision tree-based optimization for flexibility management for sustainable energy microgrids,” Applied Energy, vol. 290, p. 116772, 2021.
  • [20] B. Mohandes, M. Wahbah, M. S. El Moursi, and T. H. El-Fouly, “Renewable energy management system: Optimum design and hourly dispatch,” IEEE Transactions on Sustainable Energy, vol. 12, no. 3, pp. 1615–1628, 2021.
  • [21] M. Kamel, R. Dai, Y. Wang, F. Li, and G. Liu, “Data-driven and model-based hybrid reinforcement learning to reduce stress on power systems branches,” CSEE Journal of Power and Energy Systems, vol. 7, no. 3, pp. 433–442, 2021.
  • [22] X. Peng, Y. Chen, K. Cheng, H. Wang, Y. Zhao, B. Wang, J. Che, C. Liu, J. Wen, C. Lu et al., “Wind power prediction for wind farm clusters based on the multifeature similarity matching method,” IEEE Transactions on Industry Applications, vol. 56, no. 5, pp. 4679–4688, 2020.
  • [23] M. Cui, J. Zhang, Q. Wang, V. Krishnan, and B.-M. Hodge, “A data-driven methodology for probabilistic wind power ramp forecasting,” IEEE Transactions on Smart Grid, vol. 10, no. 2, pp. 1326–1338, 2017.
  • [24] S. Wang, X. Zhao, H. Wang, and M. Li, “Small-world neural network and its performance for wind power forecasting,” CSEE Journal of Power and Energy Systems, vol. 6, no. 2, pp. 362–373, 2020.
  • [25] L. Ge, Y. Xian, J. Yan, B. Wang, and Z. Wang, “A hybrid model for short-term pv output forecasting based on pca-gwo-grnn,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1268–1275, 2020.
  • [26] G. Li, S. Xie, B. Wang, J. Xin, Y. Li, and S. Du, “Photovoltaic power forecasting with a hybrid deep learning approach,” IEEE Access, vol. 8, pp. 175 871–175 880, 2020.
  • [27] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proceedings of AAAI, 2021.
  • [28] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
  • [29] S. J. Taylor and B. Letham, “Forecasting at scale,” The American Statistician, vol. 72, no. 1, pp. 37–45, 2018.
  • [30] J. Li and T. Yu, “Deep reinforcement learning based multi-objective integrated automatic generation control for multiple continuous power disturbances,” IEEE Access, vol. 8, pp. 156 839–156 850, 2020.

References

  • [1] B. Khorramdel, A. Zare, C. Chung, and P. Gavriliadis, “A generic convex model for a chance-constrained look-ahead economic dispatch problem incorporating an efficient wind power distribution modeling,” IEEE Transactions on Power Systems, vol. 35, no. 2, pp. 873–886, 2019.
  • [2] C. Tang, J. Xu, Y. Sun, J. Liu, X. Li, D. Ke, J. Yang, and X. Peng, “Look-ahead economic dispatch with adjustable confidence interval based on a truncated versatile distribution model for wind power,” IEEE Transactions on Power Systems, vol. 33, no. 2, pp. 1755–1767, 2017.
  • [3] C. Yang, W. Sun, J. Yang, and D. Han, “Risk-averse two-stage distributionally robust economic dispatch model under uncertain renewable energy,” CSEE Journal of Power and Energy Systems, pp. 1–10, 2021.
  • [4] B. Qu, J. J. Liang, Y. Zhu, and P. N. Suganthan, “Solving dynamic economic emission dispatch problem considering wind power by multi-objective differential evolution with ensemble of selection method,” Natural Computing, vol. 18, no. 4, pp. 695–703, 2019.
  • [5] H. Huang, M. Zhou, S. Zhang, L. Zhang, G. Li, and Y. Sun, “Exploiting the operational flexibility of wind integrated hybrid ac/dc power systems,” IEEE Transactions on Power Systems, vol. 36, no. 1, pp. 818–826, 2020.
  • [6] Y. Yang, W. Wu, B. Wang, and M. Li, “Chance-constrained economic dispatch considering curtailment strategy of renewable energy,” IEEE Transactions on Power Systems, vol. 36, no. 6, pp. 5792–5802, 2021.
  • [7] W. Wang, B. Sun, H. Li, Q. Sun, and R. Wennersten, “An improved min-max power dispatching method for integration of variable renewable energy,” Applied Energy, vol. 276, p. 115430, 2020.
  • [8] E. Du, N. Zhang, B.-M. Hodge, Q. Wang, Z. Lu, C. Kang, B. Kroposki, and Q. Xia, “Operation of a high renewable penetrated power system with csp plants: A look-ahead stochastic unit commitment model,” IEEE Transactions on power systems, vol. 34, no. 1, pp. 140–151, 2018.
  • [9] L. Yin, Q. Gao, L. Zhao, and T. Wang, “Expandable deep learning for real-time economic generation dispatch and control of three-state energies based future smart grids,” Energy, vol. 191, p. 116561, 2020.
  • [10] Y. Gao and Q. Ai, “A novel optimal dispatch method for multiple energy sources in regional integrated energy systems considering wind curtailment,” CSEE Journal of Power and Energy Systems, pp. 1–10, 2022.
  • [11] A. C. do Amaral Burghi, T. Hirsch, and R. Pitz-Paal, “Artificial learning dispatch planning for flexible renewable-energy systems,” Energies, vol. 13, no. 6, p. 1517, 2020.
  • [12] K. Lv, H. Tang, Y. Li, and X. Li, “A learning-based optimization of active power dispatch for a grid-connected microgrid with uncertain multi-type loads,” Journal of Renewable and Sustainable Energy, vol. 9, no. 6, p. 065901, 2017.
  • [13] Z. Hu, Y. Xu, M. Korkali, X. Chen, L. Mili, and J. Valinejad, “A bayesian approach for estimating uncertainty in stochastic economic dispatch considering wind power penetration,” IEEE Transactions on Sustainable Energy, vol. 12, no. 1, pp. 671–681, 2020.
  • [14] W. Dong, Q. Yang, W. Li, and A. Y. Zomaya, “Machine-learning-based real-time economic dispatch in islanding microgrids in a cloud-edge computing environment,” IEEE Internet of Things Journal, vol. 8, no. 17, pp. 13 703–13 711, 2021.
  • [15] J. Guan, H. Tang, K. Wang, J. Yao, and S. Yang, “A parallel multi-scenario learning method for near-real-time power dispatch optimization,” Energy, vol. 202, p. 117708, 2020.
  • [16] L. Lei, Y. Tan, G. Dahlenburg, W. Xiang, and K. Zheng, “Dynamic energy dispatch based on deep reinforcement learning in iot-driven smart isolated microgrids,” IEEE Internet of Things Journal, vol. 8, no. 10, pp. 7938–7953, 2020.
  • [17] T. Yang, L. Zhao, W. Li, and A. Y. Zomaya, “Dynamic energy dispatch strategy for integrated energy system based on improved deep reinforcement learning,” Energy, vol. 235, p. 121377, 2021.
  • [18] R. Lu, T. Ding, B. Qin, J. Ma, X. Fang, and Z. Dong, “Multi-stage stochastic programming to joint economic dispatch for energy and reserve with uncertain renewable energy,” IEEE Transactions on Sustainable Energy, vol. 11, no. 3, pp. 1140–1151, 2019.
  • [19] Y. Huo, F. Bouffard, and G. Joós, “Decision tree-based optimization for flexibility management for sustainable energy microgrids,” Applied Energy, vol. 290, p. 116772, 2021.
  • [20] B. Mohandes, M. Wahbah, M. S. El Moursi, and T. H. El-Fouly, “Renewable energy management system: Optimum design and hourly dispatch,” IEEE Transactions on Sustainable Energy, vol. 12, no. 3, pp. 1615–1628, 2021.
  • [21] M. Kamel, R. Dai, Y. Wang, F. Li, and G. Liu, “Data-driven and model-based hybrid reinforcement learning to reduce stress on power systems branches,” CSEE Journal of Power and Energy Systems, vol. 7, no. 3, pp. 433–442, 2021.
  • [22] X. Peng, Y. Chen, K. Cheng, H. Wang, Y. Zhao, B. Wang, J. Che, C. Liu, J. Wen, C. Lu et al., “Wind power prediction for wind farm clusters based on the multifeature similarity matching method,” IEEE Transactions on Industry Applications, vol. 56, no. 5, pp. 4679–4688, 2020.
  • [23] M. Cui, J. Zhang, Q. Wang, V. Krishnan, and B.-M. Hodge, “A data-driven methodology for probabilistic wind power ramp forecasting,” IEEE Transactions on Smart Grid, vol. 10, no. 2, pp. 1326–1338, 2017.
  • [24] S. Wang, X. Zhao, H. Wang, and M. Li, “Small-world neural network and its performance for wind power forecasting,” CSEE Journal of Power and Energy Systems, vol. 6, no. 2, pp. 362–373, 2020.
  • [25] L. Ge, Y. Xian, J. Yan, B. Wang, and Z. Wang, “A hybrid model for short-term pv output forecasting based on pca-gwo-grnn,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 6, pp. 1268–1275, 2020.
  • [26] G. Li, S. Xie, B. Wang, J. Xin, Y. Li, and S. Du, “Photovoltaic power forecasting with a hybrid deep learning approach,” IEEE Access, vol. 8, pp. 175 871–175 880, 2020.
  • [27] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proceedings of AAAI, 2021.
  • [28] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
  • [29] S. J. Taylor and B. Letham, “Forecasting at scale,” The American Statistician, vol. 72, no. 1, pp. 37–45, 2018.
  • [30] J. Li and T. Yu, “Deep reinforcement learning based multi-objective integrated automatic generation control for multiple continuous power disturbances,” IEEE Access, vol. 8, pp. 156 839–156 850, 2020.


Xinhang Li received the B.E. degree in communication engineering from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2021. He is currently pursuing the Ph.D. degree in information and communication engineering from the School of Artificial Intelligence, BUPT. His research interests include deep reinforcement learning, optimal power flow and intelligent information processing.

Zihao Li is currently pursuing the B.E. degree in telecommunications engineering with management with Beijing University of Posts and Telecommunications, Beijing, China. His current research interest includes machine learning and artificial intelligence.

Nan Yang received the B.S. and M.S. degrees in electrical engineering from Beijing Institute of Technology (BIT), Beijing, China, in 2015 and 2018, respectively. She works for China Electric Power Research Institute, and her research interests include big data analysis and artificial intelligence application in the field of power dispatching automation.

Zheng Yuan received the B.E. degree in information engineering from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2021. He is currently pursuing an M.S. degree from the School of Artificial Intelligence, BUPT. His research interests are reinforcement learning, power dispatching and intelligent information processing.

Qinwen Wang received the B.E. degree in digital media technology from Communication University of China (CUC), Beijing, China, in 2020. She is currently pursuing an M.S. degree from the School of Artificial Intelligence, Beijing University of Posts and Telecommunications (BUPT), Beijing, China. Her research interests include reinforcement learning, smart grid and cooperative intelligent transportation systems.

Yiying Yang received the B.E. degree in communication engineering from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2021. She is currently pursuing the M.S. degree in information and communication engineering with BUPT. Her research interests include reinforcement learning, power dispatching and cooperative connected vehicles control.

Yupeng Huang received the B.S. degree in electrical engineering from Tsinghua University (THU), Beijing, China, in 2016, and the M.S. degree in electric power system and automation from China Electric Power Research Institute (CEPRI), Beijing, China, in 2019. He works for China Electric Power Research Institute, and his research interests include power dispatching and automation.

Xuri Song received the B.E. and M.S. degrees in electrical engineering from China Agricultural University, Beijing, China, in 2009 and 2011, respectively. He works for China Electric Power Research Institute. His research interests include power grid analysis and artificial intelligence application.

Lei Li is currently an Associate Professor with the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, China. Her research interests include intelligent information processing, deep learning, machine learning, and natural language processing.

Lin Zhang (Member, IEEE) received the B.S. and Ph.D. degrees from the Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 1996 and 2001, respectively. He is currently the Director of Beijing Bigdata Center and also a Professor of BUPT. He was a Postdoctoral Researcher with Information and Communications University, South Korea. He used to hold a Research Fellow position with Nanyang Technological University, Singapore. In 2004, he joined BUPT as a Lecturer, then became an Associate Professor in 2005 and a Professor in 2011. He has authored more than 120 papers in refereed journals and international conferences. His research interests include intelligent information processing, deep learning, mobile cloud computing and Internet of Things.