
Random vector functional link neural network based ensemble deep learning for short-term load forecasting

Ruobin Gao1, Liang Du1, P.N. Suganthan2, Qin Zhou1, Kum Fai Yuen1. Corresponding author: Ruobin Gao (email: [email protected]). 1School of Civil and Environmental Engineering, Nanyang Technological University, Singapore, Singapore 2School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Abstract

Electricity load forecasting is crucial for the planning and maintenance of power systems. However, the load's non-stationary and non-linear characteristics impose significant difficulties in anticipating future demand. This paper proposes a novel ensemble deep Random Vector Functional Link (edRVFL) network for electricity load forecasting. The weights of the hidden layers are randomly initialized and kept fixed during the training process. The hidden layers are stacked to enforce deep representation learning. The model then generates forecasts by ensembling the outputs of each layer. Moreover, we propose to augment the random enhancement features with features obtained by empirical wavelet transformation (EWT). The raw load data are decomposed by EWT in a walk-forward fashion, so no future information leaks into the decomposition process. Finally, all the sub-series generated by the EWT, together with the raw data, are fed into the edRVFL for forecasting. The proposed model is evaluated on twenty publicly available time series from the Australian Energy Market Operator for the year 2020. The simulation results demonstrate the proposed model's superior performance over eleven forecasting methods on three error metrics and statistical tests for electricity load forecasting tasks.

Index Terms:
Forecasting, random vector functional link network, deep learning, machine learning.

I Introduction

Forecasting electricity load accurately benefits electric power system planning for maintenance and construction. A reliable forecasting model established on raw historical demand data can approximate how much electricity will be required in the future. Accurate forecasts therefore help the supplier reduce excess generation and expenses and plan resources efficiently [1]. Furthermore, short-term load forecasting models assist electricity organizations in making timely, data-driven decisions. As a result, developing novel and accurate forecasting models for short-term load is beneficial.

Electricity load forecasting is one kind of time series forecasting task. Anticipating the future with intelligent forecasting models is a well-developed field, in which models established from historical data are used to extrapolate future values [2]. There are plentiful forecasting models, such as the auto-regressive integrated moving average (ARIMA) [3], fuzzy time series [4], support vector regression (SVR) [5], randomized neural networks [6], hybrid models [7, 8, 9, 10], ensemble learning [11, 12], and deep learning models [13]. Producing accurate and reliable forecasts of electricity load is a challenging and significant problem for the electric power domain. In the load forecasting domain, the methods can be classified into three categories: (i) statistical models, (ii) computational intelligence models, and (iii) hybrid models. The statistical models, such as ARIMA [3] and exponential smoothing [14], are computationally efficient and theoretically solid, but their performance is not outstanding. The second major branch comprises the computational intelligence models, including fuzzy systems [15, 7], SVR [5], shallow artificial neural networks (ANN) [6], and deep learning [13, 16, 17, 18, 19, 20]. In [16], a pooling deep recurrent neural network (RNN) is proposed to overcome the over-fitting problem caused by deep structures. A deep factored conditional restricted Boltzmann machine (FCRBM) whose parameters are optimized via genetic wind-driven optimization (GWDO) is proposed for load forecasting in [17]. In [18], online tuning is utilized to update a deep RNN when its performance degrades. Several deep RNNs are evaluated for load forecasting in [19], where the input is selected from various weather-related and schedule-related variables. The last category, hybrid models, combines feature extraction blocks with forecasting models to form a single model. For example, in [9], empirical mode decomposition (EMD) is utilized to extract modes from the load, and a deep belief network (DBN) then forecasts each mode. In [8], empirical wavelet transformation (EWT) is applied to decompose the load data into sub-series in a walk-forward fashion, and the concatenation of the raw data and the sub-series is fed into a random vector functional link (RVFL) network for forecasting.

Neural networks are popular models for load forecasting due to their high accuracy and strong ability to handle non-linearity. Deep learning models [13, 16, 17, 18, 19, 20] succeed in forecasting short-term load accurately because their hierarchical structures learn a meaningful representation of the input data. However, most fully trained deep learning models suffer from a huge computational burden. Therefore, this paper proposes a fast ensemble deep learning algorithm for short-term load forecasting. The proposed model inherits the advantages of ensemble learning and deep learning without imposing much computational burden. Specifically, this paper investigates the forecasting ability of a special kind of randomized deep neural network, the deep RVFL network, whose training is fast; the universal function approximation ability of the RVFL network was recently proved in [21]. Ensemble learning techniques are combined with the deep RVFL to reduce the uncertainty caused by a single model. Since the deep RVFL's hidden features are randomly generated and remain fixed during the training process, the EWT is utilized to extract features of different frequencies to augment the deep RVFL's random features. Unlike works that decompose the whole time series at once [9, 10, 22, 12], this paper uses the EWT to decompose the raw data in a walk-forward fashion, so future data are never involved in the decomposition process and there is no data leakage problem in terms of forecasting.

The novel characteristics of the proposed model are summarized as follows:

  1. This paper implements the edRVFL for short-term load forecasting for the first time. The mean and median are used as ensemble operators, which differs from the edRVFL for classification [23].

  2. The EWT is combined with the edRVFL as a feature engineering block to augment the random features. Furthermore, the EWT is conducted in a walk-forward fashion to avoid future data leakage. Finally, two novel hybrid forecasting models based on the walk-forward EWT and the edRVFL are proposed for short-term load forecasting.

  3. The hyper-parameters of the proposed model are optimized in a layer-wise fashion, with each succeeding layer built on the previous layer's optimized features. Therefore, each layer obtains its own suitable hyper-parameters and does not degrade the overall performance.

  4. The proposed model is compared with various benchmark models, from statistical ones to state-of-the-art models, on twenty load time series. Three error metrics and two statistical tests are used for precise comparisons. The statistical tests demonstrate the proposed model's superiority in both group-wise and pair-wise fashion.

The remainder of this paper is organized as follows: Section II describes the methodologies and the proposed model in detail; we first describe the EWT and the walk-forward decomposition, and then present the ensemble deep RVFL and its combination with the walk-forward EWT. Section III presents the experimental setup and the results. Finally, conclusions are drawn and potential future directions are discussed in Section IV.

II Methodology

This section describes the methodologies in detail. First, we introduce the EWT and the walk-forward decomposition procedure. Then, we describe the ensemble deep RVFL network and the proposed model.

II-A Empirical wavelet transformation

The EWT is an automatic signal decomposition algorithm with solid theoretical foundations and remarkable effectiveness in decomposing non-stationary time series data [24]. Unlike the discrete wavelet transform (DWT) and EMD [25], the EWT investigates the time series precisely in the Fourier domain after a fast Fourier transform (FFT). It realizes spectrum separation using band-pass filtering with data-driven filter banks.

Figure 1 shows the EWT's overall procedure. The EWT provides limited freedom in selecting wavelets: the algorithm employs Littlewood-Paley and Meyer wavelets because their closed-form expressions in the Fourier domain are analytically accessible [26]. Following [24], these band-pass filters are given by Equations 1 and 2,

\hat{\phi}_{n}(\omega)=\begin{cases}1&\text{if } |\omega|\leq(1-\gamma)\omega_{n}\\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_{n}}\big(|\omega|-(1-\gamma)\omega_{n}\big)\right)\right]&\text{if } (1-\gamma)\omega_{n}\leq|\omega|\leq(1+\gamma)\omega_{n}\\ 0&\text{otherwise}\end{cases} \quad (1)
\hat{\psi}_{n}(\omega)=\begin{cases}1&\text{if } (1+\gamma)\omega_{n}\leq|\omega|\leq(1-\gamma)\omega_{n+1}\\ \cos\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_{n+1}}\big(|\omega|-(1-\gamma)\omega_{n+1}\big)\right)\right]&\text{if } (1-\gamma)\omega_{n+1}\leq|\omega|\leq(1+\gamma)\omega_{n+1}\\ \sin\left[\frac{\pi}{2}\beta\left(\frac{1}{2\gamma\omega_{n}}\big(|\omega|-(1-\gamma)\omega_{n}\big)\right)\right]&\text{if } (1-\gamma)\omega_{n}\leq|\omega|\leq(1+\gamma)\omega_{n}\\ 0&\text{otherwise}\end{cases} \quad (2)
Refer to caption
Figure 1: EWT implementation.

with a transition bandwidth parameter $\gamma$ satisfying $\gamma\leq\min_{n}\frac{\omega_{n+1}-\omega_{n}}{\omega_{n+1}+\omega_{n}}$. The most common choice for the function $\beta(x)$ in Equations 1 and 2 is presented in Equation 3. This choice makes the formulated empirical scaling and wavelet functions $\{\hat{\phi}_{1}(\omega),\{\hat{\psi}_{n}(\omega)\}_{n=1}^{N}\}$ a tight frame of $L^{2}(\mathbb{R})$ [27].

\beta(x)=x^{4}(35-84x+70x^{2}-20x^{3}) \quad (3)

It can be observed that $\{\hat{\phi}_{1}(\omega),\{\hat{\psi}_{n}(\omega)\}_{n=1}^{N}\}$ act as band-pass filters centered at different center frequencies.
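To make the filter construction concrete, the following NumPy sketch implements the transition polynomial of Equation 3 and the magnitude profile of the scaling filter in Equation 1; the function names are ours for illustration and do not come from [24].

```python
import numpy as np

def beta(x):
    """Transition polynomial of Equation 3; maps [0, 1] smoothly onto [0, 1]."""
    return x**4 * (35 - 84*x + 70*x**2 - 20*x**3)

def scaling_filter(omega, omega_n, gamma):
    """Magnitude of the empirical scaling filter (Equation 1) on a frequency grid."""
    omega = np.abs(np.asarray(omega, dtype=float))
    out = np.zeros(omega.shape)
    out[omega <= (1 - gamma) * omega_n] = 1.0           # pass band
    trans = ((1 - gamma) * omega_n <= omega) & (omega <= (1 + gamma) * omega_n)
    out[trans] = np.cos(np.pi / 2 * beta(
        (omega[trans] - (1 - gamma) * omega_n) / (2 * gamma * omega_n)))
    return out                                          # zero elsewhere (stop band)
```

At the pass-band edge the cosine argument is zero, and at the stop-band edge it is $\pi/2$, so the filter decays smoothly from 1 to 0 over the transition band.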

II-B Walk-forward decomposition

Numerous works utilize signal decomposition techniques as a feature engineering block for forecasting algorithms [9, 10, 28, 7, 29, 8, 30]; however, most do not implement the decomposition properly [8, 30]. As mentioned in [7, 8, 30], directly applying a signal decomposition algorithm to the whole time series causes a data leakage problem in terms of forecasting. The decomposed series are the outputs of convolution operations, and future data are inevitably involved in the convolution. Therefore, decomposing the whole time series at once is incorrect and improper, especially when establishing forecasting models.

Several solutions have been proposed to avoid the future data leakage problem in decomposition-based forecasting models, such as data-driven padding [7], the moving window strategy [30], and walk-forward decomposition [8]. The data-driven padding approach trains a simple learning algorithm that pads its forecast to the end of the time series [7]. The moving window strategy decomposes only the data located in the window (order), and the decomposed series are then fed into the forecasting models [30]. Different from the moving window strategy, walk-forward decomposition uses only part of the decomposed sub-series as input. The moving window strategy is thus a special case of walk-forward decomposition: when the order equals the window length, the two are identical.

This paper adopts walk-forward decomposition for the EWT. The walk-forward EWT decomposes the data in a rolling window of length $w$, consisting of $x(t-1), x(t-2), \ldots, x(t-w)$, into $k$ scales with the aim of predicting $x(t)$. Then only the last order data points of each sub-series are used as input for the forecasting model. Therefore, only historical observations are involved both in the decomposition process and in the model's training.
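A minimal sketch of the walk-forward construction is given below. It assumes a generic `decompose(window, k)` routine (the EWT in this paper) returning `k` sub-series of the same length as the window; the function and argument names are hypothetical.

```python
import numpy as np

def walk_forward_features(x, w, order, decompose, k):
    """Build causal training pairs: at step t, decompose only the past window
    x[t-w:t] and keep the last `order` points of each sub-series as features."""
    X, y = [], []
    for t in range(w, len(x)):
        subs = decompose(x[t - w:t], k)       # repeated at every step (cf. Table X)
        feats = [s[-order:] for s in subs]    # last `order` points per sub-series
        feats.append(x[t - order:t])          # raw lags are concatenated as well
        X.append(np.concatenate(feats))
        y.append(x[t])                        # one-step-ahead target
    return np.array(X), np.array(y)
```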

II-C Ensemble deep RVFL

Inspired by deep representation learning, the deep RVFL extends the shallow RVFL [23]. The deep RVFL is established by stacking multiple enhancement layers to achieve deep representation learning. The raw input data are fed into each enhancement layer to guide the generation of the random features. In this fashion, the enhancement features of a hidden layer are generated from both the raw input and the features of the previous layer, and the hierarchical structure yields a diverse set of features. Ensemble learning is introduced into the deep RVFL architecture to form the ensemble deep RVFL (edRVFL). Different from popular deep learning models with a single output layer, the edRVFL trains multiple output layers based on all the hidden features. Finally, the forecasts from all output layers are combined.

For simplicity of presentation, we describe the edRVFL with $L$ enhancement layers and $N$ enhancement nodes in each layer. Figure 2 shows the architecture of the edRVFL network.

Refer to caption
Figure 2: Architecture of the edRVFL.

Suppose that the input data is $\mathbf{X}\in\mathbb{R}^{n\times d}$, where $n$ and $d$ represent the number of samples and the feature dimension, respectively; $d$ is the time lag (order) of the time series forecasting model. The features generated by the first enhancement layer are defined as

\mathbf{H}^{1}=g(\mathbf{X}\mathbf{W}_{1}), \quad (4)

where $\mathbf{W}_{1}\in\mathbb{R}^{d\times N}$ represents the weight matrix of the first enhancement layer, $\mathbf{H}^{1}\in\mathbb{R}^{n\times N}$ denotes the enhancement features, and $g(\cdot)$ is a non-linear activation function. The reader can refer to [31] for a comprehensive evaluation of different activation functions. Then, for a deeper enhancement layer $l$, the enhancement features are computed as

\mathbf{H}^{l}=g([\mathbf{H}^{l-1},\mathbf{X}]\mathbf{W}_{l}), \quad (5)

where $\mathbf{W}_{l}\in\mathbb{R}^{(d+N)\times N}$ and $\mathbf{H}^{l}\in\mathbb{R}^{n\times N}$. The enhancement weight matrices $\mathbf{W}_{1}$ and $\mathbf{W}_{l}$ are randomly initialized and remain fixed during training.
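The feature generation of Equations 4 and 5 can be sketched as follows; the uniform $[-1,1]$ initialization range is our assumption, since the paper does not specify the sampling distribution.

```python
import numpy as np

def init_weights(d, N, L, rng=np.random.default_rng(0)):
    """Draw the fixed random weights once: W_1 is d x N, deeper layers are
    (d + N) x N. The uniform [-1, 1] range is an assumption."""
    return [rng.uniform(-1, 1, (d, N))] + \
           [rng.uniform(-1, 1, (d + N, N)) for _ in range(L - 1)]

def enhancement_features(X, Ws, g=np.tanh):
    """Enhancement features of all L layers (Equations 4 and 5)."""
    feats = [g(X @ Ws[0])]                    # Eq. 4: first layer sees raw input only
    for W in Ws[1:]:
        H = g(np.hstack([feats[-1], X]) @ W)  # Eq. 5: previous features plus raw input
        feats.append(H)
    return feats
```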

The edRVFL computes the output weights by splitting the task into $L$ small tasks: the output weights are calculated separately for each layer. This differs from using either only the last layer's features or all layers' features at once for decisions. Most deep learning models use only the last layer's features, so the information in the intermediate features is lost, whereas using all layers' features at once requires computation on a feature matrix of huge dimension. Moreover, both of those architectures train only one network, while our method benefits from the ensemble approach, which reduces the uncertainty of a single model.

The loss function of the $l^{th}$ enhancement layer is defined as

Loss_{l}=\|[\mathbf{H}^{l},\mathbf{X}]\beta_{l}-Y\|^{2}+\lambda\|\beta_{l}\|^{2}, \quad (6)

where $\beta_{l}$ denotes the output weight vector of the $l^{th}$ layer and $\lambda$ is the regularization parameter. The minimization of $Loss_{l}$ admits a closed-form solution based on ridge regression [32]:

\beta_{l}=(\mathbf{D}^{T}\mathbf{D}+\lambda\mathbf{I})^{-1}\mathbf{D}^{T}Y, \quad (7)

where $\mathbf{D}=[\mathbf{H}^{l},\mathbf{X}]$. After computing all $\beta_{l}$, the deep network outputs $L$ forecasts, and the final forecast is an ensemble of all of them. Any forecast combination approach can be applied in this step [33]. According to the suggestions in [33], the mean or median operation is likely to improve the combined forecast's performance. Therefore, we use the mean and median as combination operators, which yields two edRVFL variants, the Mea-edRVFL and the Med-edRVFL.
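Continuing the sketch above, each layer's output weights follow the closed form of Equation 7, and the per-layer forecasts are combined by the mean or the median; at prediction time the features must be recomputed with the same stored random weights.

```python
def fit_output_layers(feats, X, Y, lam):
    """Ridge solution of Equation 7, one output vector beta_l per layer."""
    betas = []
    for H in feats:
        D = np.hstack([H, X])                 # direct link: raw input joins the features
        betas.append(np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ Y))
    return betas

def ensemble_predict(feats, X, betas, combine=np.mean):
    """Combine the L per-layer forecasts; combine=np.median gives Med-edRVFL."""
    preds = np.stack([np.hstack([H, X]) @ b for H, b in zip(feats, betas)])
    return combine(preds, axis=0)
```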

II-D EWT-edRVFL

The EWT-edRVFL model consists of two blocks, the walk-forward EWT decomposition and the edRVFL. The walk-forward EWT is first applied to the load data to extract features in a causal fashion. Then the raw data concatenated with the sub-series are fed into the edRVFL with $L$ enhancement layers for learning. The output weights $\beta_{l}$ of the $l^{th}$ enhancement layer are computed according to Equation 7. Finally, we ensemble the $L$ forecasts with the mean or median operation to obtain the output $\hat{y}$. Correspondingly, two EWT-edRVFL variants are proposed, the EWTMea-edRVFL and the EWTMed-edRVFL.
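A hypothetical end-to-end wiring of the sketches above might look as follows; `my_ewt`, the window length, and the layer sizes are placeholders rather than values from the paper (except the 48 lags and the decomposition level of 2).

```python
# Illustrative only: compose walk-forward EWT features with the edRVFL sketches.
X_tr, y_tr = walk_forward_features(train_series, w=336, order=48,
                                   decompose=my_ewt, k=2)
X_te, y_te = walk_forward_features(test_series, w=336, order=48,
                                   decompose=my_ewt, k=2)
Ws = init_weights(d=X_tr.shape[1], N=100, L=5)
betas = fit_output_layers(enhancement_features(X_tr, Ws), X_tr, y_tr[:, None], lam=2**-4)
y_hat = ensemble_predict(enhancement_features(X_te, Ws), X_te, betas, combine=np.mean)
test_rmse = np.sqrt(np.mean((y_hat.ravel() - y_te) ** 2))
```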

Since a higher enhancement layer's performance depends on the lower layers, the hyper-parameters of the whole model are tuned in a layer-wise fashion. Once a shallow layer's hyper-parameters are determined, they are fixed, and cross-validation is applied to the next layer. Layer-wise cross-validation offers a different set of hyper-parameters for each layer. Therefore, each enhancement layer has its own regularization parameter, which helps the edRVFL learn a diverse set of output layers.
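A sketch of this layer-wise selection, under the simplifying assumption that only the regularization parameter is tuned per layer and that validation RMSE is the selection criterion:

```python
def layerwise_lambda(train_feats, val_feats, X_tr, y_tr, X_val, y_val, grid):
    """Pick one regularization parameter per layer: each layer keeps the value
    minimizing its own validation RMSE and is frozen before the next layer."""
    chosen = []
    for H_tr, H_val in zip(train_feats, val_feats):
        D_tr, D_val = np.hstack([H_tr, X_tr]), np.hstack([H_val, X_val])
        def val_rmse(lam):
            b = np.linalg.solve(D_tr.T @ D_tr + lam * np.eye(D_tr.shape[1]),
                                D_tr.T @ y_tr)
            return np.sqrt(np.mean((D_val @ b - y_val) ** 2))
        chosen.append(min(grid, key=val_rmse))
    return chosen
```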

III Empirical study

This section presents the empirical study on twenty load time series collected from the Australian Energy Market Operator (AEMO). First, we briefly introduce the data's characteristics and the pre-processing steps. Then, the benchmark models and the hyper-parameter optimization are described. Finally, the simulation results are presented and discussed.

III-A Data and its nature

Table I summarizes the descriptive statistics of the twenty load time series. These load data are collected from the states of South Australia (SA), Queensland (QLD), New South Wales (NSW), Victoria (VIC), and Tasmania (TAS) for the year 2020, which was significantly affected by Covid-19. Four months, January, April, July, and October, are selected to reflect the four seasons' characteristics, as in [8, 34, 9]. The data are recorded every half hour, so there are 48 data points per day.

A suitable and correct data pre-processing approach helps the machine learning model generate accurate outputs. We utilize min-max normalization to pre-process the raw data. Let the maximum and minimum of the training set be $x_{max}$ and $x_{min}$, respectively. The data are transformed into the range [0, 1] using the following equation:

x_{normalized}=\frac{x-x_{min}}{x_{max}-x_{min}} \quad (8)

where $x_{normalized}$ and $x$ represent the normalized and the original values, respectively.
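In code, the scaling statistics must come from the training split only so that no test information leaks into the transform; a minimal sketch:

```python
def minmax_scale(train, series):
    """Scale `series` into [0, 1] via Equation 8 with training-set statistics."""
    x_min, x_max = train.min(), train.max()
    return (series - x_min) / (x_max - x_min)
```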

All datasets are split into three sets, training, validation, and test, to enable cross-validation [35]. The validation and test sets account for 10% and 20% of the dataset, respectively. The remaining data are used as the training set.

TABLE I: Descriptive statistics.
Location Month Max Min Median Mean Std Skewness Kurtosis
SA Jan 3085.49 440.54 1212.79 1268.80 427.93 1.26 2.60
Apr 1841.85 503.67 1177.78 1161.61 248.31 -0.33 -0.37
Jul 2383.18 765.27 1489.76 1514.57 338.45 0.26 -0.59
Oct 1955.46 288.92 1140.50 1095.25 266.31 -0.55 0.21
QLD Jan 9620.91 5407.70 6824.81 6941.23 949.16 0.44 -0.65
Apr 7722.78 4480.52 5783.49 5916.37 693.05 0.60 -0.48
Jul 8148.44 4216.62 5783.27 5925.44 812.46 0.35 -0.87
Oct 7646.61 3921.39 5503.29 5673.93 746.37 0.41 -0.59
NSW Jan 13330.14 5765.85 8053.13 8264.22 1535.24 0.85 0.42
Apr 9471.04 5384.58 6983.91 6926.61 792.43 0.20 -0.58
Jul 11739.02 5678.37 8670.19 8690.30 1247.70 0.17 -0.75
Oct 9324.77 5221.13 6999.92 6955.32 771.00 0.01 -0.62
VIC Jan 9507.26 3060.58 4565.41 4765.55 1017.14 1.82 4.39
Apr 6515.96 3094.45 4453.18 4485.45 632.63 0.29 -0.42
Jul 7354.11 3816.70 5497.73 5514.65 832.99 0.04 -0.92
Oct 6142.91 2975.43 4325.26 4379.82 587.84 0.27 -0.53
TAS Jan 1298.63 794.25 1036.17 1040.35 84.44 0.09 -0.26
Apr 1379.49 843.31 1087.11 1093.91 113.14 0.22 -0.71
Jul 1597.64 887.09 1240.32 1246.55 151.24 0.08 -0.86
Oct 1447.61 842.78 1068.39 1087.26 112.91 0.47 -0.33

III-B Results and discussion

Three forecasting error metrics are employed to appraise the accuracy of the models. The first is the root mean square error (RMSE), defined as

RMSE=\sqrt{\frac{1}{L_{test}}\sum_{j=1}^{L_{test}}(\hat{x}_{j}-x_{j})^{2}}, \quad (9)

where $L_{test}$ is the size of the test set, and $x_{j}$ and $\hat{x}_{j}$ are the raw data and the predictions, respectively. The second error metric is the mean absolute scaled error (MASE) [36], defined as

MASE=\mathrm{mean}\left(\frac{|\hat{x}_{j}-x_{j}|}{\frac{1}{L_{train}-1}\sum_{t=2}^{L_{train}}|x_{t}-x_{t-1}|}\right), \quad (10)

where $L_{train}$ represents the size of the training set. The denominator of the MASE is the mean absolute error of the in-sample naive forecast. The third error metric is the mean absolute percentage error (MAPE), defined as

MAPE=\frac{1}{L_{test}}\sum_{j=1}^{L_{test}}\left|\frac{\hat{x}_{j}-x_{j}}{x_{j}}\right|. \quad (11)
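The three metrics translate directly into NumPy; a minimal sketch:

```python
import numpy as np

def rmse(y_hat, y):                            # Equation 9
    return np.sqrt(np.mean((y_hat - y) ** 2))

def mase(y_hat, y_test, y_train):              # Equation 10
    scale = np.mean(np.abs(np.diff(y_train)))  # MAE of the in-sample naive forecast
    return np.mean(np.abs(y_hat - y_test)) / scale

def mape(y_hat, y):                            # Equation 11
    return np.mean(np.abs((y_hat - y) / y))
```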

We compare the proposed model with many classical and state-of-the-art models: the persistence model [2], ARIMA [3], SVR [5], MLP [41], LSTM [37], temporal CNN (TCN) [38], hybrid EWT fuzzy cognitive map (FCM) learned with SVR (EWTFCMSVR) [7], wavelet high-order FCM (WHFCM) [29], Laplacian ESN (LapESN) [39], EWTRVFL [8], and RVFL [6]. The previous day's 48 data points are used as input for all the models, as in [8]. To achieve a fair comparison, all models' hyper-parameters are optimized by cross-validation; the search space is presented in Table II. The decomposition level of the walk-forward EWT is set to 2 according to the conclusions and suggestions in [8]. Some parameters are not involved in the optimization and are set to the same values for all relevant models: a batch size of 32, a learning rate of 0.001, and 200 epochs.

Tables III, IV, and V summarize the performance on the test sets. The numbers in bold indicate the best performance on the corresponding time series. Figures 3, 4, 5, 6, and 7 compare the raw data with the forecasts generated by the proposed model; the proposed model clearly anticipates future trends, cycles, and fluctuations accurately. Statistical tests are implemented to further investigate the differences among the models. We first implement the Friedman test; the $p$-value is smaller than 0.05, which indicates that the forecasting models are significantly different on these twenty datasets. Therefore, the post-hoc Nemenyi test is utilized to distinguish them [40]. The critical distance of the Nemenyi test is calculated as:

CD=q_{\alpha}\sqrt{\frac{k(k+1)}{6N_{d}}} \quad (12)

where $q_{\alpha}$ is the critical value from the studentized range statistic divided by $\sqrt{2}$, $k$ represents the number of models, and $N_{d}$ is the number of datasets [40]. Figure 8 presents the Nemenyi test results; models with excellent performance appear at the top, whereas the worst-performing models appear at the bottom. Consistent conclusions can be drawn from the Nemenyi test results of the three error metrics. The persistence method ranks last because it learns nothing about the patterns, and ARIMA ranks second to last because of its simple linear structure. The LSTM model outperforms most benchmark models except the EWTRVFL and the models proposed in this paper. Figure 8 demonstrates the superiority of the proposed models, which are always at the top. Another finding is that the edRVFL with the mean ensemble operator is better than the one with the median operator. Pair-wise Nemenyi post-hoc comparisons are further conducted, and the $p$-values are shown in Tables VII, VIII, and IX. A $p$-value smaller than 0.05 indicates that the two corresponding models are significantly different; the value of -1 on the diagonal marks the meaningless comparison of a model with itself. The proposed EWTMea-edRVFL is significantly different from Persistence, ARIMA, SVR, MLP, LSTM, TCN, EWTFCMSVR, WHFCM, LapESN, and RVFL. The Mea-edRVFL and Med-edRVFL do not show significant superiority over LSTM, WHFCM, LapESN, RVFL, EWTRVFL, and the EWT-based edRVFL models.
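Equation 12 is a one-liner; $q_{\alpha}$ must be taken from published tables of the studentized range statistic divided by $\sqrt{2}$ [40].

```python
import math

def nemenyi_cd(q_alpha, k, n_datasets):
    """Critical distance of Equation 12."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))

# This study compares k = 15 models on N_d = 20 datasets.
```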

Refer to caption
Figure 3: Comparisons of raw data and forecasts for the SA dataset.
Refer to caption
Figure 4: Comparisons of raw data and forecasts for the QLD dataset.
Refer to caption
Figure 5: Comparisons of raw data and forecasts for the NSW dataset.
Refer to caption
Figure 6: Comparisons of raw data and forecasts for the VIC dataset.
Refer to caption
Figure 7: Comparisons of raw data and forecasts for the TAS dataset.
TABLE II: Hyper-parameter search space for the benchmark models.
Model Parameter Values
ARIMA p/q [1, 2, 3]
 d [0, 1]
SVR C [2^{-10}, 2^{0}]
 ϵ [0.001, 0.01, 0.1]
 Radius [0.001, 0.01, 0.1]
MLP Hidden nodes [2, 4, 8, 16, 32]
 Layers [1, 2, 3]
 Optimizer Adam
 Activation ReLU
LSTM Hidden nodes [2, 4, 8, 16, 32]
 Layers [1, 2, 3]
 Optimizer Adam
 Activation Tanh
TCN Filters [2, 4, 8, 16, 32]
 Kernel size 2
 Optimizer Adam
 Activation ReLU
EWTFCMSVR Concepts [2, 6]
WHFCM Regularization [2^{0}, 2^{-8}]
LapESN Reservoir size [50, 200, 50]
 Spectral radius [0.96, 0.98]
 Input scalings [0.001, 0.01, 0.1]
RVFL Enhancement nodes [50, 200, 50]
Proposed Regularization parameter [2^{-4}, 2^{-8}, 0]
TABLE III: Comparative results in terms of RMSE.
Location Month Persistence [2] ARIMA [3] SVR [5] MLP [41] LSTM [37] TCN [38] EWTFCMSVR [7] WHFCM [29] LapESN [39] RVFL [6] EWTRVFL [8] Med-edRVFL Mea-edRVFL EWTMed-edRVFL EWTMea-edRVFL
SA Jan 72.8870 55.6380 66.9695 54.0868 59.6167 56.8684 70.2112 45.6521 52.2596 56.9607 45.0455 48.6848 48.7193 43.3276 43.6756
Apr 67.9854 55.6386 52.7960 49.5188 55.7887 58.4276 111.5263 50.7253 53.0914 51.7494 47.3257 50.1305 49.8541 48.2897 47.9759
Jul 96.1804 61.5945 50.4778 56.4437 46.0799 58.9092 43.7050 45.9180 53.9056 50.9046 45.5713 49.1807 48.6929 45.0094 44.9430
Oct 66.4943 51.9143 48.7796 54.9934 44.5717 50.2702 63.0079 45.1588 46.1959 45.3673 44.2800 44.7918 44.6774 46.0017 45.7415
QLD Jan 149.3016 72.1442 101.8223 84.6203 63.4356 69.3515 99.0970 58.6258 58.4367 56.2622 54.1518 55.4811 55.4332 54.8145 54.5508
Apr 135.1574 72.4100 57.1544 60.1787 49.3209 75.5726 149.5419 58.7142 60.0888 56.8695 53.2141 54.6669 54.1989 51.6416 51.0361
Jul 217.7307 92.4658 73.7379 79.8500 63.6131 73.6105 64.9643 75.7170 69.3350 66.0357 62.2991 64.3635 64.1440 63.3489 63.0091
Oct 149.8869 101.2928 122.5457 106.1602 102.8157 102.7253 193.6453 92.7304 93.5727 93.7343 92.5209 91.4382 91.3391 91.4528 91.2787
NSW Jan 241.6598 128.5224 176.6366 177.7787 124.1342 123.5894 124.7843 125.0457 121.5456 119.2369 118.7250 119.3599 119.1618 120.6610 120.3967
Apr 170.5394 107.0291 123.2308 82.3968 109.8911 114.7059 262.1209 75.4579 86.3372 82.9668 68.5926 81.1944 80.5445 74.4880 73.7032
Jul 294.9618 131.8007 130.3933 119.4671 100.2549 98.0032 85.5482 95.0363 103.8141 105.9918 120.4936 96.4300 99.9890 94.2159 93.9689
Oct 179.3761 97.9060 147.0332 96.8127 105.0375 125.7355 99.5157 85.9031 87.3090 87.9294 83.3434 83.5984 83.4924 81.4568 81.1915
VIC Jan 166.0274 107.4523 476.5141 105.5127 160.8402 167.1842 80.5299 96.0986 96.8277 99.3172 89.0404 96.9804 98.4364 93.2445 93.6646
Apr 161.6524 95.0628 112.7885 91.8794 90.1323 103.5377 157.6802 79.5648 79.3548 85.3221 85.0635 77.2368 77.3312 76.7201 76.4103
Jul 202.3882 100.1305 76.3694 73.9571 66.8791 86.8470 234.3873 77.2782 77.6613 71.9779 68.2234 68.4402 67.8317 66.9922 66.4860
Oct 146.7197 93.5497 85.8821 84.7936 78.0583 95.2157 68.0958 77.1013 75.3540 76.0608 70.7279 72.0342 71.8692 71.1336 70.5075
TAS Jan 22.6897 18.8835 21.7117 20.3779 17.7090 19.9012 18.6469 18.7898 18.3987 18.3759 18.3235 18.2444 18.2500 18.2270 18.2175
Apr 29.5110 20.4644 17.2503 25.9389 18.1222 19.1709 17.5378 20.0259 18.1358 17.5151 17.6300 17.2185 17.1762 17.0570 17.0362
Jul 41.6062 24.1888 22.5443 22.4608 20.4853 22.6558 20.1255 24.9275 20.9395 21.7029 20.8809 20.2539 20.1882 19.6646 19.5957
Oct 30.8810 21.3638 19.4400 20.1470 19.3222 21.0611 20.0869 20.8855 19.9530 19.6488 19.4225 19.4212 19.3918 19.2082 19.1757
TABLE IV: Comparative results in terms of MASE.
Location Month Persistence [2] ARIMA [3] SVR [5] MLP [41] LSTM [37] TCN [38] EWTFCMSVR [7] WHFCM [29] LapESN [39] RVFL [6] EWTRVFL [8] Med-edRVFL Mea-edRVFL EWTMed-edRVFL EWTMea-edRVFL
SA Jan 1.2552 0.8463 0.9083 0.8405 0.8355 0.8471 1.0406 0.7138 0.7733 0.8153 0.6966 0.7188 0.7209 0.6720 0.6772
Apr 1.1203 0.8195 0.7782 0.7581 0.8278 0.8801 1.7619 0.8120 0.7971 0.8048 0.7271 0.7513 0.7477 0.7328 0.7283
Jul 1.1060 0.5701 0.4610 0.5598 0.4125 0.5624 0.4049 0.4437 0.5319 0.5103 0.4411 0.4705 0.4656 0.4322 0.4319
Oct 1.0056 0.7204 0.6215 0.8209 0.6088 0.6858 0.8455 0.6309 0.6299 0.6353 0.6303 0.6159 0.6148 0.6494 0.6454
QLD Jan 1.0560 0.4847 0.6849 0.6120 0.4403 0.4635 0.4630 0.3981 0.4017 0.3898 0.3776 0.3838 0.3835 0.3736 0.3718
Apr 1.0272 0.5268 0.3905 0.4328 0.3295 0.5529 1.1220 0.4356 0.4401 0.4121 0.3871 0.3892 0.3857 0.3745 0.3702
Jul 1.1261 0.4237 0.3345 0.3696 0.3015 0.3432 0.3085 0.3544 0.3248 0.3146 0.2898 0.3060 0.3052 0.2977 0.2957
Oct 1.0274 0.6096 0.7358 0.6492 0.6073 0.6036 1.3169 0.5511 0.5521 0.5646 0.5576 0.5444 0.5431 0.5432 0.5422
NSW Jan 1.4312 0.5671 0.8771 0.9933 0.5749 0.5624 0.5576 0.5445 0.5562 0.5439 0.5404 0.5363 0.5358 0.5426 0.5418
Apr 1.0095 0.6026 0.5571 0.4514 0.5353 0.6150 1.5863 0.4308 0.4851 0.4540 0.3834 0.4393 0.4358 0.4101 0.4056
Jul 0.9287 0.3917 0.3625 0.3415 0.3018 0.2910 0.2551 0.2842 0.3078 0.3074 0.3355 0.2750 0.2820 0.2693 0.2680
Oct 1.0979 0.5425 0.7185 0.5264 0.5644 0.6956 0.5590 0.4746 0.4696 0.4790 0.4571 0.4504 0.4497 0.4340 0.4326
VIC Jan 1.3105 0.7993 2.3222 0.8405 1.0803 1.0792 0.6126 0.7456 0.7330 0.7268 0.6341 0.7153 0.7193 0.6875 0.6874
Apr 1.1833 0.6260 0.7401 0.6363 0.5785 0.6515 0.9166 0.5284 0.5260 0.5611 0.5613 0.5078 0.5080 0.5014 0.4994
Jul 1.0659 0.4864 0.3698 0.3608 0.3264 0.4103 1.2698 0.3729 0.3774 0.3492 0.3332 0.3268 0.3246 0.3224 0.3201
Oct 0.9891 0.5518 0.5032 0.5154 0.4693 0.5786 0.4141 0.4647 0.4652 0.4763 0.4345 0.4460 0.4449 0.4383 0.4354
TAS Jan 1.1101 0.8751 1.0609 0.9565 0.8581 0.9171 0.8967 0.8819 0.8769 0.8633 0.8627 0.8594 0.8590 0.8601 0.8587
Apr 1.0463 0.6983 0.6081 0.9746 0.6298 0.6694 0.5870 0.6926 0.6143 0.5968 0.6045 0.5803 0.5793 0.5756 0.5745
Jul 1.1317 0.6349 0.5599 0.5926 0.5358 0.5890 0.5169 0.6721 0.5384 0.5500 0.5307 0.5078 0.5061 0.4932 0.4905
Oct 1.0218 0.6730 0.6162 0.6354 0.6145 0.6872 0.6295 0.6598 0.6269 0.6252 0.6210 0.6115 0.6106 0.6088 0.6083
TABLE V: Comparative results in terms of MAPE.
Location Month Persistence [2] ARIMA [3] SVR [5] MLP [41] LSTM [37] TCN [38] EWTFCMSVR [7] WHFCM [29] LapESN [39] RVFL [6] EWTRVFL [8] Med-edRVFL Mea-edRVFL EWTMed-edRVFL EWTMea-edRVFL
SA Jan 0.03832 0.02579 0.02579 0.02600 0.02413 0.02478 0.03112 0.02190 0.02313 0.02414 0.02143 0.02176 0.02178 0.02093 0.02101
Apr 0.04442 0.03280 0.03104 0.03127 0.03330 0.03411 0.07170 0.03389 0.03246 0.03296 0.03098 0.03080 0.03065 0.03048 0.03034
Jul 0.05192 0.02697 0.02229 0.02676 0.02013 0.02664 0.02053 0.02184 0.02584 0.02505 0.02228 0.02294 0.02270 0.02185 0.02185
Oct 0.04723 0.03363 0.02968 0.03962 0.02909 0.03218 0.04202 0.03016 0.03010 0.03037 0.03052 0.02932 0.02927 0.03116 0.03100
QLD Jan 0.01639 0.00747 0.01072 0.00951 0.00688 0.00712 0.00707 0.00617 0.00628 0.00606 0.00589 0.00597 0.00596 0.00580 0.00577
Apr 0.01848 0.00949 0.00714 0.00797 0.00600 0.01015 0.02166 0.00793 0.00811 0.00761 0.00725 0.00715 0.00709 0.00691 0.00683
Jul 0.03002 0.01125 0.00899 0.01001 0.00818 0.00911 0.00842 0.00952 0.00877 0.00853 0.00789 0.00828 0.00826 0.00806 0.00800
Oct 0.01990 0.01176 0.01393 0.01271 0.01159 0.01161 0.02650 0.01072 0.01072 0.01101 0.01087 0.01060 0.01057 0.01057 0.01055
NSW Jan 0.02287 0.00869 0.01373 0.01555 0.00865 0.00879 0.00859 0.00837 0.00854 0.00837 0.00833 0.00825 0.00824 0.00834 0.00833
Apr 0.01901 0.01117 0.01001 0.00843 0.00984 0.01138 0.03066 0.00810 0.00914 0.00846 0.00729 0.00823 0.00817 0.00774 0.00765
Jul 0.02753 0.01148 0.01074 0.01003 0.00914 0.00854 0.00765 0.00841 0.00917 0.00915 0.01012 0.00819 0.00842 0.00800 0.00797
Oct 0.02052 0.01015 0.01278 0.00988 0.01042 0.01277 0.01052 0.00887 0.00882 0.00891 0.00855 0.00843 0.00841 0.00813 0.00811
VIC Jan 0.02269 0.01423 0.03252 0.01552 0.01743 0.01748 0.01102 0.01345 0.01293 0.01271 0.01115 0.01249 0.01252 0.01203 0.01199
Apr 0.02973 0.01592 0.01832 0.01669 0.01457 0.01619 0.02435 0.01349 0.01348 0.01437 0.01442 0.01304 0.01305 0.01287 0.01283
Jul 0.03014 0.01395 0.01068 0.01062 0.00950 0.01192 0.03815 0.01070 0.01097 0.01014 0.00975 0.00948 0.00942 0.00938 0.00931
Oct 0.02657 0.01496 0.01364 0.01411 0.01282 0.01570 0.01144 0.01267 0.01274 0.01303 0.01198 0.01220 0.01217 0.01201 0.01193
TAS Jan 0.01633 0.01292 0.01551 0.01403 0.01267 0.01345 0.01326 0.01299 0.01294 0.01272 0.01272 0.01266 0.01265 0.01267 0.01265
Apr 0.02101 0.01420 0.01247 0.02014 0.01292 0.01377 0.01205 0.01407 0.01251 0.01222 0.01243 0.01186 0.01185 0.01182 0.01179
Jul 0.02673 0.01532 0.01356 0.01437 0.01299 0.01427 0.01260 0.01613 0.01304 0.01337 0.01292 0.01229 0.01226 0.01195 0.01189
Oct 0.02164 0.01442 0.01321 0.01363 0.01317 0.01464 0.01354 0.01409 0.01347 0.01344 0.01335 0.01314 0.01312 0.01308 0.01307
TABLE VI: Average ranking of all models.
Persistence [2] ARIMA [3] SVR [5] MLP [41] LSTM [37] TCN [38] EWTFCMSVR [7] WHFCM [29] LapESN [39] RVFL [6] EWTRVFL [8] Med-edRVFL Mea-edRVFL EWTMed-edRVFL EWTMea-edRVFL
RMSE 14.65 11.85 11.3 10.65 7.45 11.45 9.55 7.95 8.30 7.75 4.05 5.15 4.6 3.15 2.15
MASE 14.7 11.85 10.6 11.1 7.4 11.7 9.45 8.25 8.25 7.85 4.75 4.7 3.95 3.25 2.20
MAPE 14.70 11.80 10.35 11.40 7.20 11.7 9.55 8.1 8.25 7.90 5.35 4.5 3.9 3.15 2.15
Refer to caption
Refer to caption
Refer to caption
Figure 8: Nemenyi testing results for load forecasting based on: (a) RMSE, (b) MASE and (c) MAPE. The critical distance is 4.50. The Friedman p-values are: (a) 6.41e-45, (b) 3.02e-37 and (c) 8.48e-40.
TABLE VII: Pairwise comparisons using Nemenyi post-hoc test based on RMSE.
Persistence [2] ARIMA [3] SVR [5] MLP [41] LSTM [37] TCN [38] EWTFCMSVR [7] WHFCM [29] LapESN [39] RVFL [6] EWTRVFL [8] Med-edRVFL Mea-edRVFL EWTMed-edRVFL EWTMea-edRVFL
Persistence [2] -1.000 0.783 0.533 0.232 0.001 0.601 0.025 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
ARIMA [3] 0.783 -1.000 0.900 0.900 0.115 0.900 0.900 0.271 0.438 0.196 0.001 0.001 0.001 0.001 0.001
SVR [5] 0.533 0.900 -1.000 0.900 0.292 0.900 0.900 0.533 0.692 0.438 0.001 0.001 0.001 0.001 0.001
MLP [41] 0.232 0.900 0.900 -1.000 0.601 0.900 0.900 0.829 0.900 0.738 0.001 0.009 0.002 0.001 0.001
LSTM [37] 0.001 0.115 0.292 0.601 -1.000 0.232 0.900 0.900 0.900 0.900 0.510 0.900 0.760 0.138 0.015
TCN [38] 0.601 0.900 0.900 0.900 0.232 -1.000 0.900 0.463 0.624 0.361 0.001 0.001 0.001 0.001 0.001
EWTFCMSVR [7] 0.025 0.900 0.900 0.900 0.900 0.900 -1.000 0.900 0.900 0.900 0.009 0.115 0.035 0.001 0.001
WHFCM [29] 0.001 0.271 0.533 0.829 0.900 0.463 0.900 -1.000 0.900 0.900 0.271 0.783 0.533 0.050 0.004
LapESN [39] 0.001 0.438 0.692 0.900 0.900 0.624 0.900 0.900 -1.000 0.900 0.151 0.624 0.361 0.022 0.001
RVFL [6] 0.001 0.196 0.438 0.738 0.900 0.361 0.900 0.900 0.900 -1.000 0.361 0.874 0.624 0.077 0.007
EWTRVFL [8] 0.001 0.001 0.001 0.001 0.510 0.001 0.009 0.271 0.151 0.361 -1.000 0.900 0.900 0.900 0.900
Med-edRVFL 0.001 0.001 0.001 0.009 0.900 0.001 0.115 0.783 0.624 0.874 0.900 -1.000 0.900 0.900 0.692
Mea-edRVFL 0.001 0.001 0.001 0.002 0.760 0.001 0.035 0.533 0.361 0.624 0.900 0.900 -1.000 0.900 0.900
EWTMed-edRVFL 0.001 0.001 0.001 0.001 0.138 0.001 0.001 0.050 0.022 0.077 0.900 0.900 0.900 -1.000 0.900
EWTMea-edRVFL 0.001 0.001 0.001 0.001 0.015 0.001 0.001 0.004 0.001 0.007 0.900 0.692 0.900 0.900 -1.000
TABLE VIII: Pairwise comparisons using Nemenyi post-hoc test based on MASE.
Persistence [2] ARIMA [3] SVR [5] MLP [41] LSTM [37] TCN [38] EWTFCMSVR [7] WHFCM [29] LapESN [39] RVFL [6] EWTRVFL [8] Med-edRVFL Mea-edRVFL EWTMed-edRVFL EWTMea-edRVFL
Persistence [2] -1.000 0.760 0.196 0.412 0.001 0.692 0.017 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
ARIMA [3] 0.760 -1.000 0.900 0.900 0.104 0.900 0.900 0.412 0.412 0.232 0.001 0.001 0.001 0.001 0.001
SVR [5] 0.196 0.900 -1.000 0.900 0.601 0.900 0.900 0.900 0.900 0.806 0.003 0.003 0.001 0.001 0.001
MLP [41] 0.412 0.900 0.900 -1.000 0.361 0.900 0.900 0.760 0.760 0.578 0.001 0.001 0.001 0.001 0.001
LSTM [37] 0.001 0.104 0.601 0.361 -1.000 0.138 0.900 0.900 0.900 0.900 0.851 0.829 0.487 0.180 0.019
TCN [38] 0.692 0.900 0.900 0.900 0.138 -1.000 0.900 0.487 0.487 0.292 0.001 0.001 0.001 0.001 0.001
EWTFCMSVR [7] 0.017 0.900 0.900 0.900 0.900 0.900 -1.000 0.900 0.900 0.900 0.062 0.056 0.009 0.001 0.001
WHFCM [29] 0.001 0.412 0.900 0.760 0.900 0.487 0.900 -1.000 0.900 0.900 0.463 0.438 0.138 0.031 0.002
LapESN [39] 0.001 0.412 0.900 0.760 0.900 0.487 0.900 0.900 -1.000 0.900 0.463 0.438 0.138 0.031 0.002
RVFL [6] 0.001 0.232 0.806 0.578 0.900 0.292 0.900 0.900 0.900 -1.000 0.647 0.624 0.271 0.077 0.006
EWTRVFL [8] 0.001 0.001 0.003 0.001 0.851 0.001 0.062 0.463 0.463 0.647 -1.000 0.900 0.900 0.900 0.897
Med-edRVFL 0.001 0.001 0.003 0.001 0.829 0.001 0.056 0.438 0.438 0.624 0.900 -1.000 0.900 0.900 0.900
Mea-edRVFL 0.001 0.001 0.001 0.001 0.487 0.001 0.009 0.138 0.138 0.271 0.900 0.900 -1.000 0.900 0.900
EWTMed-edRVFL 0.001 0.001 0.001 0.001 0.180 0.001 0.001 0.031 0.031 0.077 0.900 0.900 0.900 -1.000 0.900
EWTMea-edRVFL 0.001 0.001 0.001 0.001 0.019 0.001 0.001 0.002 0.002 0.006 0.897 0.900 0.900 0.900 -1.000
TABLE IX: Pairwise comparisons using Nemenyi post-hoc test based on MAPE.
Persistence [2] ARIMA [3] SVR [5] MLP [41] LSTM [37] TCN [38] EWTFCMSVR [7] WHFCM [29] LapESN [39] RVFL [6] EWTRVFL [8] Med-edRVFL Mea-edRVFL EWTMed-edRVFL EWTMea-edRVFL
Persistence [2] -1.000 0.738 0.125 0.556 0.001 0.692 0.022 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001
ARIMA [3] 0.738 -1.000 0.900 0.900 0.077 0.900 0.900 0.361 0.438 0.271 0.001 0.001 0.001 0.001 0.001
SVR [5] 0.125 0.900 -1.000 0.900 0.624 0.900 0.900 0.900 0.900 0.900 0.031 0.003 0.001 0.001 0.001
MLP [41] 0.556 0.900 0.900 -1.000 0.166 0.900 0.900 0.556 0.624 0.463 0.002 0.001 0.001 0.001 0.001
LSTM [37] 0.001 0.077 0.624 0.166 -1.000 0.094 0.900 0.900 0.900 0.900 0.900 0.829 0.556 0.214 0.028
TCN [38] 0.692 0.900 0.900 0.900 0.094 -1.000 0.900 0.412 0.487 0.313 0.001 0.001 0.001 0.001 0.001
EWTFCMSVR [7] 0.022 0.900 0.900 0.900 0.900 0.900 -1.000 0.900 0.900 0.900 0.166 0.028 0.006 0.001 0.001
WHFCM [29] 0.001 0.361 0.900 0.556 0.900 0.412 0.900 -1.000 0.900 0.900 0.806 0.412 0.166 0.035 0.002
LapESN [39] 0.001 0.438 0.900 0.624 0.900 0.487 0.900 0.900 -1.000 0.900 0.738 0.336 0.125 0.025 0.002
RVFL [6] 0.001 0.271 0.900 0.463 0.900 0.313 0.900 0.900 0.900 -1.000 0.897 0.510 0.232 0.056 0.004
EWTRVFL [8] 0.001 0.001 0.031 0.002 0.900 0.001 0.166 0.806 0.738 0.897 -1.000 0.900 0.900 0.900 0.601
Med-edRVFL 0.001 0.001 0.003 0.001 0.829 0.001 0.028 0.412 0.336 0.510 0.900 -1.000 0.900 0.900 0.900
Mea-edRVFL 0.001 0.001 0.001 0.001 0.556 0.001 0.006 0.166 0.125 0.232 0.900 0.900 -1.000 0.900 0.900
EWTMed-edRVFL 0.001 0.001 0.001 0.001 0.214 0.001 0.001 0.035 0.025 0.056 0.900 0.900 0.900 -1.000 0.900
EWTMea-edRVFL 0.001 0.001 0.001 0.001 0.028 0.001 0.001 0.002 0.002 0.004 0.601 0.900 0.900 0.900 -1.000

Table X records the hyper-parameter optimization time and the training time. Note that the optimization time is the time of the grid-search cross-validation, and the training time is the time to train the model with the hyper-parameters selected by cross-validation. The times for the RVFL-related models are summed over twenty runs. Several observations follow from Table X. The most time-consuming model is the LSTM, because of its recurrent structure, which processes the data sequentially. The hybrid models with EWT are more time-consuming than their RVFL-only counterparts; for example, the EWTRVFL and EWT-edRVFL take longer than the RVFL and edRVFL, respectively. Hence, the main computational cost lies in the walk-forward EWT decomposition block, because the decomposition is repeated at every step.

TABLE X: Average computation time (in seconds) for all the models.
Optimization time Training time
ARIMA [3] 42.595 3.692
SVR [5] 4.058 0.109
MLP [41] 65.260 4.386
LSTM [37] 1561.631 150.642
TCN [38] 171.563 50.919
EWTFCMSVR [7] 40.528 26.531
WHFCM [29] 21.755 0.130
LapESN [39] 29.182 6.078
RVFL [6] 1.689 0.140
EWTRVFL [8] 42.518 2.060
edRVFL 31.859 7.307
EWT-edRVFL 75.620 14.067

IV Conclusion

This paper proposes a novel ensemble deep RVFL network combined with walk-forward decomposition for short-term load forecasting. The enhancement layers' weights are randomly initialized and kept fixed, as in the shallow RVFL network; only the output weights of each layer are computed, in closed form. Since the enhancement features are unsupervised and randomly initialized, the walk-forward EWT is implemented to augment the feature extraction. The walk-forward EWT differs from most of the literature, where the whole time series is decomposed at once; therefore, there is no data leakage during the decomposition process. Finally, the mean or median of all layers' forecasts is used as the final output. The experiments on twenty electricity load series demonstrate the superiority and efficiency of the proposed model. Moreover, the proposed model does not suffer from the colossal computational burden of fully trained deep learning models.

There are several reasons for the superiority of the proposed model:

  1. The edRVFL's structure benefits from ensemble learning. The edRVFL treats each enhancement layer as a single forecaster, and ensembling multiple forecasters reduces the uncertainty of any single one.

  2. The clean raw data are fed into all enhancement layers to calibrate the generation of the random features.

  3. The output layer learns both the linear patterns from the direct link and the non-linear patterns from the enhancement features.

  4. The walk-forward EWT is used as a feature engineering block to further boost the accuracy.

Although our model shows its superiority on these twenty datasets, there are still some limitations. For the walk-forward EWT, whether to discard the highest-frequency component is an open problem; it is challenging to determine how much valuable information the highest-frequency component contains. Moreover, other learning techniques, such as incremental learning and semi-supervised learning, could be considered to further boost the performance.

Acknowledgment

The authors thank the anonymous reviewers for providing valuable comments to improve this paper.

References

  • [1] A. Heydari, M. M. Nezhad, E. Pirshayan, D. A. Garcia, F. Keynia, and L. De Santoli, “Short-term electricity price and load forecasting in isolated power grids based on composite neural network and gravitational search optimization algorithm,” Applied Energy, vol. 277, p. 115503, 2020.
  • [2] S. Makridakis, S. C. Wheelwright, and R. J. Hyndman, Forecasting methods and applications.   John Wiley & Sons, 2008.
  • [3] J. Contreras, R. Espinola, F. J. Nogales, and A. J. Conejo, “ARIMA models to predict next-day electricity prices,” IEEE Transactions on Power Systems, vol. 18, no. 3, pp. 1014–1020, 2003.
  • [4] R. Gao and O. Duru, “Parsimonious fuzzy time series modelling,” Expert Systems with Applications, vol. 156, p. 113447, 2020.
  • [5] B.-J. Chen, M.-W. Chang et al., “Load forecasting using support vector machines: A study on eunite competition 2001,” IEEE transactions on power systems, vol. 19, no. 4, pp. 1821–1830, 2004.
  • [6] Y. Ren, P. N. Suganthan, N. Srikanth, and G. Amaratunga, “Random vector functional link network for short-term electricity load demand forecasting,” Information Sciences, vol. 367, pp. 1078–1093, 2016.
  • [7] R. Gao, L. Du, and K. F. Yuen, “Robust empirical wavelet fuzzy cognitive map for time series forecasting,” Engineering Applications of Artificial Intelligence, vol. 96, p. 103978, 2020.
  • [8] R. Gao, L. Du, K. F. Yuen, and P. N. Suganthan, “Walk-forward empirical wavelet random vector functional link for time series forecasting,” Applied Soft Computing, vol. 108, p. 107450, 2021.
  • [9] X. Qiu, Y. Ren, P. N. Suganthan, and G. A. Amaratunga, “Empirical mode decomposition based ensemble deep learning for load demand time series forecasting,” Applied Soft Computing, vol. 54, pp. 246–255, 2017.
  • [10] Y. Ren, P. Suganthan, and N. Srikanth, “A comparative study of empirical mode decomposition-based short-term wind speed forecasting methods,” IEEE Transactions on Sustainable Energy, vol. 6, no. 1, pp. 236–244, 2014.
  • [11] X. Qiu, L. Zhang, P. N. Suganthan, and G. A. Amaratunga, “Oblique random forest ensemble via least square estimation for time series forecasting,” Information Sciences, vol. 420, pp. 249–262, 2017.
  • [12] X. Qiu, P. N. Suganthan, and G. A. Amaratunga, “Ensemble incremental learning random vector functional link network for short-term electric load forecasting,” Knowledge-Based Systems, vol. 145, pp. 182–196, 2018.
  • [13] A. Almalaq and G. Edwards, “A review of deep learning methods applied on load forecasting,” in 2017 16th IEEE international conference on machine learning and applications (ICMLA).   IEEE, 2017, pp. 511–516.
  • [14] J. W. Taylor, “Short-term load forecasting with exponentially weighted methods,” IEEE Transactions on Power Systems, vol. 27, no. 1, pp. 458–464, 2011.
  • [15] M. Ali, M. Adnan, M. Tariq, and H. V. Poor, “Load forecasting through estimated parametrized based fuzzy inference system in smart grids,” IEEE Transactions on Fuzzy Systems, 2020.
  • [16] H. Shi, M. Xu, and R. Li, “Deep learning for household load forecasting—a novel pooling deep RNN,” IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 5271–5280, 2017.
  • [17] G. Hafeez, K. S. Alimgeer, and I. Khan, “Electric load forecasting based on deep learning and optimized by heuristic algorithm in smart grid,” Applied Energy, vol. 269, p. 114915, 2020.
  • [18] M. N. Fekri, H. Patel, K. Grolinger, and V. Sharma, “Deep learning for load forecasting with smart meter data: Online adaptive recurrent neural network,” Applied Energy, vol. 282, p. 116177, 2021.
  • [19] G. Chitalia, M. Pipattanasomporn, V. Garg, and S. Rahman, “Robust short-term electrical load forecasting framework for commercial buildings using deep recurrent neural networks,” Applied Energy, vol. 278, p. 115410, 2020.
  • [20] X. Qiu, L. Zhang, Y. Ren, P. N. Suganthan, and G. Amaratunga, “Ensemble deep learning for regression and time series forecasting,” in 2014 IEEE symposium on computational intelligence in ensemble learning (CIEL).   IEEE, 2014, pp. 1–6.
  • [21] D. Needell, A. A. Nelson, R. Saab, and P. Salanevich, “Random vector functional link networks for function approximation on manifolds,” arXiv preprint arXiv:2007.15776, 2020.
  • [22] Y. Ren, P. N. Suganthan, and N. Srikanth, “A novel empirical mode decomposition with support vector regression for wind speed forecasting,” IEEE transactions on neural networks and learning systems, vol. 27, no. 8, pp. 1793–1798, 2014.
  • [23] Q. Shi, R. Katuwal, P. Suganthan, and M. Tanveer, “Random vector functional link neural network based ensemble deep learning,” Pattern Recognition, vol. 117, p. 107978, 2021.
  • [24] J. Gilles, “Empirical wavelet transform,” IEEE transactions on signal processing, vol. 61, no. 16, pp. 3999–4010, 2013.
  • [25] P. Flandrin, G. Rilling, and P. Goncalves, “Empirical mode decomposition as a filter bank,” IEEE signal processing letters, vol. 11, no. 2, pp. 112–114, 2004.
  • [26] J. Spencer, Ten lectures on the probabilistic method.   SIAM, 1994, vol. 64.
  • [27] P. G. Casazza et al., “The art of frame theory,” Taiwanese Journal of Mathematics, vol. 4, no. 2, pp. 129–201, 2000.
  • [28] L. Ghelardoni, A. Ghio, and D. Anguita, “Energy load forecasting using empirical mode decomposition and support vector regression,” IEEE Transactions on Smart Grid, vol. 4, no. 1, pp. 549–556, 2013.
  • [29] S. Yang and J. Liu, “Time-series forecasting based on high-order fuzzy cognitive maps and wavelet transform,” IEEE Transactions on Fuzzy Systems, vol. 26, no. 6, pp. 3391–3402, 2018.
  • [30] Y. Huang and Y. Deng, “A new crude oil price forecasting model based on variational mode decomposition,” Knowledge-Based Systems, vol. 213, p. 106669, 2021.
  • [31] L. Zhang and P. N. Suganthan, “A comprehensive evaluation of random vector functional link networks,” Information sciences, vol. 367, pp. 1094–1105, 2016.
  • [32] C. Saunders, A. Gammerman, and V. Vovk, “Ridge regression learning algorithm in dual variables,” Proceedings of the 15th International Conference on Machine Learning, 1998.
  • [33] A. Timmermann, “Forecast combinations,” Handbook of economic forecasting, vol. 1, pp. 135–196, 2006.
  • [34] S. M. J. Jalali, S. Ahmadian, A. Khosravi, M. Shafie-khah, S. Nahavandi, and J. P. Catalao, “A novel evolutionary-based deep convolutional neural network model for intelligent load forecasting,” IEEE Transactions on Industrial Informatics, 2021.
  • [35] C. Bergmeir and J. M. Benítez, “On the use of cross-validation for time series predictor evaluation,” Information Sciences, vol. 191, pp. 192–213, 2012.
  • [36] R. J. Hyndman and A. B. Koehler, “Another look at measures of forecast accuracy,” International journal of forecasting, vol. 22, no. 4, pp. 679–688, 2006.
  • [37] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang, “Short-term residential load forecasting based on LSTM recurrent neural network,” IEEE Transactions on Smart Grid, vol. 10, no. 1, pp. 841–851, 2017.
  • [38] S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv preprint arXiv:1803.01271, 2018.
  • [39] M. Han and M. Xu, “Laplacian echo state network for multivariate time series prediction,” IEEE transactions on neural networks and learning systems, vol. 29, no. 1, pp. 238–244, 2017.
  • [40] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine learning research, vol. 7, no. Jan, pp. 1–30, 2006.
  • [41] N. Kandil, R. Wamkeue, M. Saad, and S. Georges, “An efficient approach for short term load forecasting using artificial neural networks,” International Journal of Electrical Power & Energy Systems, vol. 28, no. 8, pp. 525–530, 2006.