

Detecting Load Redistribution Attacks via Support Vector Models

Zhigang Chu1, Oliver Kosut1, Lalitha Sankar1 [email protected]
1School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA
Abstract

A machine learning-based detection framework is proposed to detect a class of cyber-attacks that redistribute loads by modifying measurements. The detection framework consists of a multi-output support vector regression (SVR) load predictor that predicts loads by exploiting both spatial and temporal correlations, and a subsequent support vector machine (SVM) attack detector that determines the existence of load redistribution (LR) attacks using the loads predicted by the SVR predictor. Historical load data for training the SVR are obtained from the publicly available PJM zonal loads and are mapped to the IEEE 30-bus system. The SVM is trained using normal data and randomly created LR attacks, and is tested against both random and intelligently designed LR attacks. The results show that the proposed detection framework can effectively detect LR attacks. Moreover, attack mitigation can be achieved by using the SVR predicted loads to re-dispatch generation.

1 Introduction

Leveraging information technology, the operation of modern electric power grids relies heavily on real-time sensing, monitoring, communication, and control. State estimation (SE) uses the power system measurements collected by the supervisory control and data acquisition (SCADA) system to estimate the operating states. These states are used by the energy management system (EMS) to allow for real-time control of the power system. In the last decade, the cyber-security of SE has received considerable attention. A class of false data injection (FDI) attacks that replace measurements with counterfeits has been shown to easily spoof SE and the traditional bad data detector (BDD) Liu2009 . This finding serves as the basis of a wide class of FDI attacks, called load redistribution (LR) attacks, which make it appear as if the loads are redistributed among load buses while the total load remains unchanged.

Worst-case consequences of LR attacks can be found by solving bi-level optimization problems. These attacks can be designed to cause physical or economic consequences. For physical consequences, Zhang2016TSG finds an attack that masks the outage of a transmission line, and Liang2015 designs attacks that cause physical overflows. For economic consequences, Moslemi2018 and Jia2014 show that LR attacks can change locational marginal prices and/or generate profit for the attacker. It is therefore crucial to develop techniques to detect and mitigate LR attacks.

Various attack detection techniques have been presented in the literature. In An2019 , the authors propose a multivariate Gaussian-based anomaly detector trained on correlation features of micro phasor measurement units (μPMUs), but this detector requires installing μPMUs in the system. Liu et al. Liu2018 detect and identify attacks using reactance perturbation, but this method only works when the attacker has limited resources. The authors of Che2019 attempt to mitigate LR attacks using a tri-level optimization approach, and the authors of Li2019 identify LR attacks by monitoring abnormal load deviations and suspicious branch flow changes; however, both works focus only on attacks that cause line overflows. In Liu2019 , a financially motivated FDI attack model is analyzed and a robust incentive-reduction strategy is proposed to deter such attacks by protecting a subset of meters. More generally, machine learning techniques have also been deployed to detect LR attacks. For example, Ozay2016 proposes supervised and semi-supervised machine learning algorithms to detect FDI attacks by exploiting the relationships between statistical and geometric properties of the attack vectors employed in the attack scenarios. A deep reinforcement learning-based approach to detect LR attacks is proposed in An2019a . In Pinceti2018 , three machine learning techniques are introduced for attack detection, namely nearest neighbor, semi-supervised one-class SVM, and replicator neural network. These three algorithms compare estimated loads with historical loads and use thresholding to determine the existence of LR attacks.

Estimation-Detection Framework: In this paper, we introduce an LR attack detection framework based on support vector models that leverages the historical load information commonly available to system operators. Unlike most existing approaches in the literature, our method determines the existence of LR attacks directly from the estimated loads, without requiring installation of new devices or protection of specific measurements. When an LR attack occurs, the estimated loads obtained from the SE results differ from the true loads, but the net loads are the same. Thus, if accurate load predictions are available, the existence of LR attacks can be determined by comparing the predicted and estimated loads. Moreover, if an LR attack is detected, the predicted loads can be used directly to re-dispatch generation instead of the estimated loads. By doing this, the attack consequences can be temporarily mitigated, giving operators time to perform other corrective actions.

Support Vector Models: In particular, we propose a support vector regression (SVR) Smola2004 based load predictor to accurately predict loads, and a subsequent support vector machine (SVM) Cortes1995 based attack detector that compares the predicted and observed loads to detect LR attacks. This modular design separates prediction from classification, so that each module can be independently enhanced (e.g., using additional features) or replaced by other methods, as seen fit. Support vector models are optimization-based machine learning approaches that can be used for both regression and classification. Among the many available machine learning methods, we choose support vector models for the following reasons: (i) they are mature methods that have proven effective for various regression/classification tasks in power systems, including transient stability assessment Yuanhang2015 , component outage estimation Eskandarpour2017 , and state estimation Kirincic2019 ; (ii) they are analytically developed models with fewer and easier-to-tune parameters than many other machine learning methods, e.g., neural networks.

SVR has been widely used for load prediction in electric power systems. In Qiang2019 , a short-term load forecasting algorithm is proposed that combines SVR and particle swarm optimization. The authors of Capuno2017 propose an SVR model that predicts very short term loads using weather data and day-ahead predicted loads as features. Similar features, along with additional time-related features, are used to train an SVR model that predicts short-term and mid-term loads in Su2017 . In Azad2018 , Azad et al. predict the daily peak load using the historical peak load consumption and the corresponding temperature and relative humidity. Chong et al. propose K-step-ahead prediction using SVR in Chong2017 .

Proposed SVR Load Predictor: The aforementioned references focus on predicting the net load utilizing temporal correlation. To the best of our knowledge, we are one of the first to predict loads at each bus using SVR, leveraging both spatial and temporal correlations between all the loads in the system. Features selected for the SVR predictor include historical load values of all loads chosen at distinct time intervals prior to the target time (e.g., one hour before, one day before, etc.) as well as the specific time information (e.g., month, weekday/weekend). This choice allows for conveniently using the same features to predict loads at different buses as the temporal features for all loads implicitly capture the spatial correlations among them.

Proposed SVM Detector: SVM is a supervised learning approach to solve classification problems, based on learning separating hyperplanes. Our use of an SVM to detect attacks largely mirrors existing approaches; our key contribution is in how we generate the training data needed to learn an SVM model that classifies accurately over a large class of attacks. We now describe the dataset and our approach to train and test the two models.

Dataset: We train and test our models using the publicly available PJM metered zonal load data PJM2019 . We map each of the 20 zones of the PJM data to a load bus in the IEEE 30-bus system, leveraging the fact that there are 20 loads in this system.

Training and Testing: To apply SVM on attack detection, it is necessary to create training data in both classes, namely normal and attacked data. The SVR predicted loads and the true loads (assuming trustworthy historical data) naturally form the normal data. For the attacked data, we propose a novel approach that generates random LR attacks in order to maximally explore the attack space, and thereby enhance accuracy in detecting any LR attack. Each of these attacks alters a random number of loads, and a Gaussian distribution is used to generate the deviation of each load from its true value. The severity of the attacks is controlled by varying the maximum deviation percentage over all loads. Our approach also guarantees the net load change is 0 to satisfy the constraints of LR attacks. We use 80% of the data for training, and the remaining 20% for testing.

In addition to the random attacks, we also generate two types of intelligently designed LR attacks, namely cost maximization (CM) and line overflow (LO) attacks, to test the effectiveness of our SVM attack detector. CM attacks aim to maximize the operation cost Yuan11 ; and LO attacks attempt to overflow a target transmission line Liang2015 . These two types of attacks are designed through optimizations to maximize their economic/physical consequences.

Our results show that the proposed attack estimation-detection framework can effectively predict and detect both random and intelligently designed LR attacks. Moreover, we illustrate that using the SVR predicted loads to re-dispatch when attacks are detected can significantly reduce the attack consequences.

Summary of Contributions: The key contributions of this paper are as follows:

1. We propose an LR attack detection framework consisting of an SVR load predictor and a subsequent SVM attack detector. This modular design enables separate enhancement of each block, and also provides sufficiently accurate predicted loads for attack mitigation purposes.

2. The SVR predictor leverages both temporal and spatial correlations within the historical load data to allow for prediction of bus-level loads. Through training and testing the proposed SVR predictor on the PJM metered load data PJM2019 , we show that it can accurately predict every load in the system.

3. Utilizing the SVR predicted loads, we train the SVM detector using normal data and random LR attacks designed to maximally explore the attack space.

4. The performance of the detection framework is tested on random attacks as well as two types of intelligently designed LR attacks. These attacks aim to cause economic/physical consequences. Our simulation results show that our detection framework can significantly reduce the impact of LR attacks.

The rest of this paper is organized as follows. Section 2 introduces LR attacks and existing approaches to create intelligently designed LR attacks. Section 3 describes the structure of the proposed attack detection framework, the formulations of SVR and SVM, and the random LR attack creation method used for SVM training. Section 4 illustrates the performance of the SVR load predictor and the SVM attack detector. Concluding remarks are presented in Section 5.

2 Load Redistribution Attacks

2.1 Load Redistribution (LR) Attacks and Unobservable False Data Injection (FDI) Attacks

Definition 1: LR attacks are a class of cyber-attacks that redistribute loads among the buses while keeping the net load unchanged. The false loads in an LR attack, $\bm{P}_{\text{Atk}}$, satisfy

\bm{P}_{\text{Atk}}=\bm{P}+\Delta\bm{P}, \quad (1)
\sum_{i}\Delta P_{i}=0, \quad (2)

where $\bm{P}$ is the true load vector, $\Delta\bm{P}$ is the load change caused by the attack, and $i$ is the load index.

Definition 2: The load shift $\tau$ is defined to be the largest load change in percentage of the true loads:

\tau=\underset{i}{\max}\left|\frac{\Delta P_{i}}{P_{i}}\right|\times 100\%. \quad (3)

We use $\tau$ as an intrinsic metric to characterize the detectability of LR attacks. Attacks with sufficiently large $\tau$ are trivial to detect, because load deviations far from the true values are suspicious. Thus, an attacker is likely to limit $\tau$ to avoid detection by this metric. In this paper, we only consider LR attacks with $\tau\leq 20\%$.

The most common way to generate LR attacks in the literature is through unobservable FDI attacks against power system state estimation (SE). FDI attacks are a class of cyber-attacks in which an attacker maliciously replaces power system measurements with counterfeits. Under the DC power flow assumption (for simplicity, we focus on the DC power flow setting, but our work can be generalized to AC cases as in Liang2015 ), the true measurement vector $\textbf{z}$, consisting of the line power flow and bus power injection measurements, is given by

\textbf{z}=\bm{H\theta}+\bm{e}, \quad (4)

where $\bm{\theta}$ is the state vector (voltage angles), $\bm{H}$ is the dependency matrix between measurements and states, and $\bm{e}$ is the noise vector.

Definition 3: A false measurement vector $\bar{\textbf{z}}$ created with state attack vector $\textbf{c}$,

\bar{\textbf{z}}=\bm{H}(\bm{\theta}+\textbf{c})+\bm{e}, \quad (5)

is unobservable to the conventional bad data detector (BDD) embedded in SE, because it is indistinguishable from the true measurements if the true states were $(\bm{\theta}+\textbf{c})$.

Let $\bm{B}$ be the dependency matrix between bus power injections and states, and let $\bm{G}$ be a given generation vector; then the bus power injections without attack can be expressed as

\bm{G}-\bm{P}=\bm{B\theta}. \quad (6)

With attack, the false injections are given by

\bm{G}-\bm{P}_{\text{Atk}}=\bm{B}(\bm{\theta}+\bm{c}). \quad (7)

Substituting (6) into (7) yields the load change vector

\Delta\bm{P}=\bm{P}_{\text{Atk}}-\bm{P}=-\bm{Bc}. \quad (8)

Note that since $\bm{1}^{T}\bm{B}=\bm{0}^{T}$, the net load change is $\sum_{i}\Delta P_{i}=-\bm{1}^{T}\bm{Bc}=0$. Thus, given a generation dispatch, an unobservable FDI attack leads to an LR attack.
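For illustration, the following minimal numpy sketch builds the DC injection matrix $\bm{B}$ for a hypothetical 3-bus network (placeholder data, not the paper's test case), applies an arbitrary attack vector $\bm{c}$, and verifies that the induced load change (8) sums to zero as required by (2).

```python
import numpy as np

# Hypothetical 3-bus network: susceptances of lines (1-2), (2-3), (1-3).
b12, b23, b13 = 10.0, 10.0, 5.0

# DC injection matrix B: bus injections = B * theta (voltage angles).
B = np.array([
    [ b12 + b13, -b12,       -b13      ],
    [-b12,        b12 + b23, -b23      ],
    [-b13,       -b23,        b13 + b23],
])

# Every column of B sums to zero, so 1^T B = 0^T.
assert np.allclose(np.ones(3) @ B, 0.0)

# Arbitrary state attack vector c (voltage-angle perturbation, in rad).
c = np.array([0.0, 0.02, -0.01])

# Load change induced by the unobservable FDI attack, eq. (8): dP = -B c.
dP = -B @ c

print("load change dP:", dP)
print("net load change:", dP.sum())  # numerically zero, as required by eq. (2)
```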

2.2 Intelligently Designed LR Attacks

Although an attacker can inject an arbitrary $\bm{c}$ as long as it controls the measurements corresponding to all non-zero entries of $\bm{Hc}$, its goal is to maliciously choose $\bm{c}$ so that the resulting false loads mislead the system re-dispatch and cause physical and/or economic consequences. We define these attacks as intelligent attacks, whose consequences can be maximized by solving optimization problems. In this paper, we consider two specific intelligent attacks to test the robustness of our proposed detector, namely cost maximization (CM) attacks Yuan11 and line overflow (LO) attacks Liang2015 .

CM attacks are a class of FDI attacks that aim to maximize the operation cost after re-dispatch. The attack vector $\bm{c}$ of CM attacks can be obtained through the following bi-level optimization problem:

\underset{\bm{c}}{\text{maximize}} \quad \bm{a}^{T}\bm{G}^{*} \quad (9a)
\text{subject to} \quad -\tau\bm{P}\leq\bm{Bc}\leq\tau\bm{P} \quad (9b)
\quad \{\bm{G}^{*},\bm{P_{L}}^{*}\}=\arg\left\{\underset{\bm{G},\bm{P_{L}}}{\text{min}}\ \bm{a}^{T}\bm{G}\right\} \quad (9c)
\quad\quad \text{subject to} \quad \sum\bm{G}=\sum\bm{P} \quad (9d)
\quad\quad\quad \bm{P_{L}}=\bm{R}(\bm{G}-\bm{P}+\bm{Bc}) \quad (9e)
\quad\quad\quad -\bm{P_{L}}^{\max}\leq\bm{P_{L}}\leq\bm{P_{L}}^{\max} \quad (9f)
\quad\quad\quad \bm{G}^{\min}\leq\bm{G}\leq\bm{G}^{\max} \quad (9g)

where $\bm{a}$ is the generation cost vector, $\bm{P_{L}}$ is the vector of cyber line power flows, $\bm{R}$ is the power transfer distribution factor (PTDF) matrix, $\bm{P_{L}}^{\max}$ is the vector of line power flow limits, and $\bm{G}^{\max}$ and $\bm{G}^{\min}$ are the generation upper and lower limits, respectively. In the upper level, (9a) models the attacker's objective of maximizing the operation cost, and (9b) models the load shift limit. The lower-level problem (9c)-(9g) is the system DCOPF under attack. This bi-level optimization problem can be converted to a single-level mixed-integer linear program (MILP) by replacing the lower-level DCOPF with its Karush-Kuhn-Tucker (KKT) conditions BoydBook , and then converting the complementary slackness conditions to mixed-integer constraints. The optimal $\bm{c}$ is obtained by solving the MILP.
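For fixed false loads, the lower-level problem (9c)-(9g) is an ordinary linear program. The sketch below solves this inner DCOPF-under-attack with scipy.optimize.linprog on hypothetical small-system data (the cost vector, PTDF matrix, and limits are placeholders); it is not the single-level MILP reformulation of the full bi-level attack problem.

```python
import numpy as np
from scipy.optimize import linprog

# --- Hypothetical data (illustrative only; not the IEEE 30-bus case) ---
a      = np.array([10.0, 20.0, 30.0])        # generation costs ($/MWh), one unit per bus
P_atk  = np.array([50.0, 80.0, 70.0])        # false loads P_Atk = P - Bc seen by the operator (MW)
R      = np.array([[ 0.4, -0.2, 0.0],        # PTDF matrix (lines x buses)
                   [ 0.3,  0.5, 0.0],
                   [-0.1,  0.2, 0.0]])
PL_max = np.array([60.0, 90.0, 60.0])        # line ratings (MW)
G_min  = np.zeros(3)
G_max  = np.array([120.0, 120.0, 120.0])

# DCOPF under attack, eqs. (9c)-(9g): min a^T G
#   s.t. sum(G) = sum(P_atk), -PL_max <= R (G - P_atk) <= PL_max, G_min <= G <= G_max
A_eq = np.ones((1, 3)); b_eq = [P_atk.sum()]
A_ub = np.vstack([R, -R])
b_ub = np.concatenate([PL_max + R @ P_atk, PL_max - R @ P_atk])
res  = linprog(c=a, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
               bounds=list(zip(G_min, G_max)), method="highs")

G_atk = res.x                                 # dispatch the operator would issue under attack
print("attacked dispatch:", G_atk, "cost:", a @ G_atk)
```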

LO attacks attempt to maximize the physical power flow on a target line $l$ after re-dispatch, and possibly cause overflows. The optimal $\bm{c}$ for LO attacks can be obtained by changing the objective function of (9) to maximizing the physical power flow:

\underset{\bm{c}}{\text{maximize}} \quad \left|\bm{P_{L}}^{l*}-\bm{R}_{l}\bm{Bc}\right| \quad (10)
\text{subject to} \quad \text{(9b)}-\text{(9g)},

where $\bm{P_{L}}^{l*}$ is the optimal cyber power flow on target line $l$, $\bm{R}_{l}$ is the $l^{\text{th}}$ row of $\bm{R}$, and the second term in (10) characterizes the impact of the false loads on the physical power flow of line $l$.

3 Proposed Attack Detection Framework

Figure 1 illustrates the structure of our proposed LR attack detection framework. During real-time operation, features are selected from the historical load data up to the current time step to capture both spatial and temporal correlations. Loads at the next time step are then predicted by the SVR load predictor using these features. One SVR model is trained for each load using the same features. Subsequently, the SVM attack detector takes the predicted loads and the loads estimated by SE to determine the existence of LR attacks.

For detection alone, it would suffice to skip the SVR load predictor and feed all SVR features directly into the SVM for classification. However, in this paper we include the SVR for two reasons. First, we aim not only to provide an attack detection technique, but also a corrective mechanism when attacks are detected. Using the (accurate) predicted loads to perform control actions when attacks are flagged provides time to locate the attacked measurements without causing severe consequences. Second, the SVR makes the proposed models easier to scale to large power systems. Without the SVR predictor, the number of features used in the SVM classifier would be very large, making it difficult to train and to perform real-time classification. With the SVR predictor in place, the SVM detector only needs the predicted and observed load values as features, making it practical for large-scale systems.

Figure 1: Structure of the proposed LR attack detection framework.

3.1 The SVR Load Predictor

Given data samples $\bm{x}_{j}\in\mathbb{R}^{p},\ j=1,2,\dots,m$, and target values $\bm{y}\in\mathbb{R}^{m}$, an SVR attempts to find the best parameters $\bm{w}_{r}$ and $b_{r}$ to fit $|y_{j}-\bm{w}_{r}^{T}\phi(\bm{x}_{j})-b_{r}|\leq\varepsilon$ by solving the following optimization problem Smola2004 :

\underset{\bm{w}_{r},b_{r},\zeta_{j},\zeta_{j}^{\prime}}{\text{minimize}} \quad \frac{1}{2}\bm{w}_{r}^{T}\bm{w}_{r}+M\sum_{j=1}^{m}(\zeta_{j}+\zeta_{j}^{\prime}) \quad (11a)
\text{subject to} \quad y_{j}-\bm{w}_{r}^{T}\phi(\bm{x}_{j})-b_{r}\leq\varepsilon+\zeta_{j} \quad (\alpha_{j}) \quad (11b)
\quad\quad \bm{w}_{r}^{T}\phi(\bm{x}_{j})+b_{r}-y_{j}\leq\varepsilon+\zeta_{j}^{\prime} \quad (\alpha_{j}^{\prime}) \quad (11c)
\quad\quad \zeta_{j},\zeta_{j}^{\prime}\geq 0,\ \forall j, \quad (11d)

where $\varepsilon$ is the regression tolerance, $\zeta_{j},\zeta_{j}^{\prime}$ are slack variables to allow for outliers, $M$ is the penalty factor for outliers, $\alpha_{j},\alpha_{j}^{\prime}$ are dual variables, and $\phi(\cdot)$ is a function that implicitly maps the data samples to a higher dimensional space. The dual formulation has a smaller number of variables and allows for applying the kernel trick:

\underset{\bm{\alpha},\bm{\alpha}^{\prime}}{\text{minimize}} \quad \frac{1}{2}(\bm{\alpha}-\bm{\alpha}^{\prime})^{T}\bm{Q}(\bm{\alpha}-\bm{\alpha}^{\prime})+\varepsilon\bm{1}^{T}(\bm{\alpha}+\bm{\alpha}^{\prime})-\bm{y}^{T}(\bm{\alpha}-\bm{\alpha}^{\prime}) \quad (12a)
\text{subject to} \quad \bm{1}^{T}(\bm{\alpha}-\bm{\alpha}^{\prime})=0 \quad (12b)
\quad\quad 0\leq\alpha_{j},\alpha_{j}^{\prime}\leq M,\ \forall j, \quad (12c)

where $\bm{Q}$ is a positive semi-definite matrix, and $Q_{ij}=Q(\bm{x}_{i},\bm{x}_{j})=\phi(\bm{x}_{i})^{T}\phi(\bm{x}_{j})$ is the kernel. Once the optimal solutions $(\bm{\alpha}^{*},\bm{\alpha}^{\prime*})$ are obtained, the regression value $y_{\text{new}}$ of a new data sample $\bm{x}_{\text{new}}$ can be computed as

y_{\text{new}}=\sum_{j=1}^{m}(\alpha_{j}^{*}-\alpha_{j}^{\prime*})Q(\bm{x}_{j},\bm{x}_{\text{new}}). \quad (13)

To accurately predict the load values, many different features can be used, including time, weather, temperature, location, and load type (residential/commercial/industrial). Intuitively, it would be best to use all of these features for prediction, but many of them are unavailable, and some may be redundant. The features used in the SVR load predictor also depend on the available dataset. For example, the time step of the prediction depends on how frequently the historical load data are recorded. For the specific dataset used in this paper, we select time information and historical load values at several time points relative to the target time to capture the temporal correlation, and load values at the same time points for all loads to capture the spatial correlation. Details of the selected features for the SVR load predictor are given in Section 4.1.

3.2 The SVM Attack Detector

Given data samples $\bm{u}_{j}\in\mathbb{R}^{q},\ j=1,2,\dots,n$, and a vector of class labels $\bm{v}\in\{1,-1\}^{n}$, an SVM attempts to find the decision boundary with the maximal margin to best classify $\bm{u}_{j}$ by solving the following optimization problem Cortes1995 :

\underset{\bm{w}_{m},b_{m},\lambda_{j}}{\text{minimize}} \quad \frac{1}{2}\bm{w}_{m}^{T}\bm{w}_{m}+C\sum_{j=1}^{n}\lambda_{j} \quad (14a)
\text{subject to} \quad v_{j}(\bm{w}_{m}^{T}\phi(\bm{u}_{j})+b_{m})\geq 1-\lambda_{j} \quad (\beta_{j}) \quad (14b)
\quad\quad \lambda_{j}\geq 0,\ \forall j. \quad (14c)

Similar to the SVR formulation in (11), $\lambda_{j}$ is a slack variable to allow for outliers, $C$ is its penalty factor, and $\beta_{j}$ is the dual variable. Again applying the kernel trick, the dual formulation is used:

\underset{\bm{\beta}}{\text{minimize}} \quad \frac{1}{2}\bm{\beta}^{T}\bm{Q\beta}-\bm{1}^{T}\bm{\beta} \quad (15a)
\text{subject to} \quad \bm{v}^{T}\bm{\beta}=0 \quad (15b)
\quad\quad 0\leq\beta_{j}\leq C,\ \forall j. \quad (15c)

Note that here $Q_{ij}=v_{i}v_{j}Q(\bm{u}_{i},\bm{u}_{j})=v_{i}v_{j}\phi(\bm{u}_{i})^{T}\phi(\bm{u}_{j})$. Once the optimal solution $\bm{\beta}^{*}$ is acquired, the label $v_{\text{new}}$ for a new input data sample $\bm{u}_{\text{new}}$ can be obtained by

v_{\text{new}}=\text{sgn}\Big(\sum_{j=1}^{n}v_{j}\beta_{j}^{*}Q(\bm{u}_{j},\bm{u}_{\text{new}})\Big), \quad (16)

where $\text{sgn}(\cdot)$ is the sign function. The features in $\bm{u}_{j}$ include the SVR predicted loads, the observed loads, and the same time information used in the SVR.
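As a minimal illustration of the decision rule (16), the sketch below evaluates the kernel expansion for a new sample with an RBF kernel; the support vectors, labels, and dual weights are placeholder values, not quantities learned from the paper's data.

```python
import numpy as np

def rbf_kernel(u1, u2, sigma=1.0):
    """RBF kernel Q(u1, u2) = exp(-sigma * ||u1 - u2||^2)."""
    return np.exp(-sigma * np.sum((u1 - u2) ** 2))

def svm_decision(U_sv, v_sv, beta_sv, u_new, sigma=1.0):
    """Decision rule (16): sign of the kernel expansion over the support vectors."""
    score = sum(v * b * rbf_kernel(u, u_new, sigma)
                for u, v, b in zip(U_sv, v_sv, beta_sv))
    return int(np.sign(score))  # +1: attack flagged, -1: normal

# Placeholder support vectors (2-feature toy example), labels, and dual weights.
U_sv    = np.array([[0.9, 1.1], [1.0, 1.0], [2.0, 0.2]])
v_sv    = np.array([-1, -1, 1])
beta_sv = np.array([0.5, 0.3, 0.8])

print(svm_decision(U_sv, v_sv, beta_sv, np.array([1.9, 0.3])))   # close to the attacked SV: +1
print(svm_decision(U_sv, v_sv, beta_sv, np.array([0.95, 1.05]))) # close to the normal SVs: -1
```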

3.3 Generating Random LR Attacks to Train the SVM

We train the SVM detector using normal data and randomly designed LR attacks. The SVM detector trained using random attacks is expected to maximally explore the space of LR attacks, and hence to perform well in detecting any LR attack. Given true loads $\bm{P}$, the false loads $\bm{P}_{\text{Atk}}$ in these random attacks are obtained using (1), $\bm{P}_{\text{Atk}}=\bm{P}+\Delta\bm{P}$. Thus, finding $\bm{P}_{\text{Atk}}$ is equivalent to finding $\Delta\bm{P}$. In each attack, we assume the attacker changes $K$ loads at random, whose indices form a set $\mathcal{K}$, so that $\Delta P_{\mathcal{K}(k)}$ indicates the load change of the $k^{\text{th}}$ attacked load, $k=1,2,\dots,K$. The load changes of these attacked loads, denoted $\bm{\gamma}$, can be arbitrary. However, according to the LR attack property (2), they must be constrained to sum to zero. Thus, we model $\bm{\gamma}$ with a joint Gaussian distribution with zero mean and covariance matrix $\bm{\Gamma}$:

\bm{\gamma}\sim\mathcal{N}(\bm{0},\bm{\Gamma}), \quad (17)
\gamma_{k}=\Delta P_{\mathcal{K}(k)}. \quad (18)

Given a load shift $\tau$, the diagonal entries of $\bm{\Gamma}$ must satisfy

\Gamma_{kk}=\text{Var}(\gamma_{k})=\left(\frac{1}{2}\tau P_{\mathcal{K}(k)}\right)^{2},\ \forall k, \quad (19)

to ensure that the probability of $|\gamma_{k}|\leq\tau P_{\mathcal{K}(k)}$ is approximately 95%, because the probability of deviating by more than two standard deviations in a Gaussian distribution is about 5%. Recall that the load changes caused by a valid LR attack must satisfy (2), which can be rewritten as

\sum_{i}\Delta P_{i}=\sum_{k}\Delta P_{\mathcal{K}(k)}=\bm{1}^{T}\bm{\gamma}=0. \quad (20)

Eq. (20) is equivalent to

E[(\bm{1}^{T}\bm{\gamma})^{2}]=E[\bm{1}^{T}\bm{\gamma}\bm{\gamma}^{T}\bm{1}]=\bm{1}^{T}\bm{\Gamma}\bm{1}=0. \quad (21)

Finding a valid $\bm{\gamma}$ is therefore equivalent to finding a covariance matrix $\bm{\Gamma}$ that satisfies (19) and (21). Since $\bm{\Gamma}$ is a covariance matrix, it must also be positive semidefinite:

\bm{\Gamma}\succeq 0. \quad (22)

Any $\bm{\Gamma}$ satisfying (19), (21) and (22) suffices for our application. Finding such a $\bm{\Gamma}$ is equivalent to solving a semidefinite program with an arbitrary objective, constrained by (19), (21) and (22). The procedure to acquire the false loads $\bm{P}_{\text{Atk}}$ is summarized in Alg. 1. Varying the attack hour $h$, load shift $\tau$, and number of attacked loads $K$, we can find a feasible $\bm{\Gamma}$, draw $\bm{\gamma}$ using (17), and subsequently create an arbitrary number of false load vectors $\bm{P}_{\text{Atk}}$ using (1). Note that for specific combinations of $h$, $\tau$, $K$, and $\mathcal{K}$, no feasible $\bm{\Gamma}$ may exist, in which case we simply re-run Alg. 1 with different inputs. Since (17) draws $\bm{\gamma}$ randomly from a Gaussian distribution, the resulting real load shift $\tau_{r}$ of $\bm{P}_{\text{Atk}}$ may differ from the input $\tau$. We keep drawing $\bm{\gamma}$ until $\tau_{r}\leq\tau$. The false loads created are then used to generate data samples to train and test the SVM detector.

Algorithm 1 Generating random LR attack false loads

Input: $h$, $K$, $\tau$
Output: $\bm{P}_{\text{Atk}}$

1. Obtain the true loads $\bm{P}$ at hour $h$.

2. Randomly select $K$ loads to attack and let $\mathcal{K}$ denote the set of indices of the attacked loads.

3. Find a $\bm{\Gamma}$ satisfying (19), (21) and (22) for the given $\tau$, $K$, $\mathcal{K}$, and $\bm{P}$. This can be done by solving a semidefinite program with an arbitrary objective, constrained by (19), (21) and (22). If no feasible $\bm{\Gamma}$ can be found, terminate.

4. Draw the non-zero load changes $\bm{\gamma}$ from $\mathcal{N}(\bm{0},\bm{\Gamma})$ and obtain the false loads $\bm{P}_{\text{Atk}}$ using (1).

5. Calculate the real load shift $\tau_{r}$ of $\bm{P}_{\text{Atk}}$ using (3). If $\tau_{r}>\tau$, go to step 4; otherwise, terminate.
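A minimal sketch of Alg. 1 in Python using numpy and cvxpy is given below (the implementation reported in Section 4 uses Matlab with Gurobi). The semidefinite program in step 3 is solved with a zero objective, and the true load vector in the example is placeholder data.

```python
import numpy as np
import cvxpy as cp

def random_lr_attack(P, K, tau, rng=np.random.default_rng(0)):
    """Algorithm 1: generate false loads P_atk for a random LR attack.

    P   : true load vector at hour h
    K   : number of attacked loads
    tau : target load shift as a fraction (e.g., 0.10 for 10%)
    """
    idx = rng.choice(len(P), size=K, replace=False)       # step 2: attacked load set

    # Step 3: find a PSD covariance Gamma satisfying (19), (21), (22).
    Gamma = cp.Variable((K, K), PSD=True)
    constraints = [cp.diag(Gamma) == (0.5 * tau * P[idx]) ** 2,   # (19)
                   cp.sum(Gamma) == 0]                            # (21): 1^T Gamma 1 = 0
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    if prob.status not in ("optimal", "optimal_inaccurate"):
        return None                                        # no feasible Gamma: terminate

    # Steps 4-5: redraw gamma until the realized load shift does not exceed tau.
    while True:
        gamma = rng.multivariate_normal(np.zeros(K), Gamma.value,
                                        check_valid="ignore")     # (17)
        dP = np.zeros(len(P))
        dP[idx] = gamma
        P_atk = P + dP                                             # (1)
        if np.max(np.abs(dP / P)) <= tau:                          # (3): tau_r <= tau
            return P_atk

# Example with placeholder loads (MW):
P_true = np.array([50.0, 80.0, 65.0, 40.0, 90.0, 55.0])
print(random_lr_attack(P_true, K=3, tau=0.10))
```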

4 Numerical Results

We use the publicly available PJM zonal hourly metered load data PJM2019 from 2015 through 2018 for 20 transmission zones as the historical data to train and test our LR attack detection framework. In order to conveniently create intelligently designed LR attacks as described in Section 2.2, we map each zone to a load bus in the IEEE 30-bus system, leveraging the fact that there are 20 loads in this system. The mapping is adopted from Pinceti2018 , and all load values are multiplied by a scaling factor of $1.308\times 10^{-3}$ to obtain a system with a moderate amount of congestion. Table 1 describes the mapping between load indices, PJM zones, and bus indices. The SVR and SVM models are implemented in Python using the Scikit-learn package sklearn . The random, CM, and LO attack creation procedures are implemented in Matlab with the Gurobi solver. All experiments are conducted on a 2.7 GHz CPU with 32 GB RAM.

Table 1: Mapping rules between load indices, PJM zones, and bus indices
Load  Zone   Bus      Load  Zone   Bus
1     DOM    2        11    PL     17
2     AE     3        12    PN     18
3     JC     4        13    PE     19
4     CE     7        14    RECO   20
5     AEP    8        15    ATSI   21
6     DPL    10       16    DUQ    23
7     PS     12       17    BC     24
8     DEOK   14       18    ME     26
9     PEP    15       19    EKPC   29
10    DAY    16       20    AP     30

4.1 The SVR Load Predictor Performance

In this section, we provide details on training and testing the SVR load predictor. As mentioned above, given the hourly load data, our SVR load predictor aims to accurately predict the load values at hour $h+1$ when the current hour is $h$. The features we use include time information and historical load values up to hour $h$. We select month ($mo$), hour ($hr$), and weekday/weekend ($wd$) as the time-information features, $\bm{t}=[mo,wd,hr]$. Note that $hr$ here is the wall clock time, e.g., $hr=14$ for 2 PM, and is different from $h$, which is a unique point in time. We only distinguish between weekdays and weekends since loads tend to be similar across weekdays, i.e., $wd=1$ for weekdays and $wd=2$ for weekends. The temporal correlation of each load is captured by including as features its historical values at hour $h$ and the $s$ previous hours, and at hours $hr$ and $hr+1$ of the $d$ previous days. For load $i$, the load value features $\bm{f}_{i}$ are given by

\bm{f}_{i}=[P_{i}^{h},P_{i}^{h-1},\dots,P_{i}^{h-s},P_{i}^{h-24d},P_{i}^{h-24d+1},\dots,P_{i}^{h-24},P_{i}^{h-23}]. \quad (23)

To capture the spatial correlations, we concatenate the load value features of all the loads.
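The sketch below shows how one such concatenated data sample could be assembled from an hourly load history for the $s=3$, $d=2$ setting of Model 2 introduced below; the load matrix and time stamps are placeholder arrays, not the PJM data.

```python
import numpy as np

def build_sample(loads, timestamps, h, s=3, d=2):
    """Build one SVR data sample x_j = [t, f_1, ..., f_nl] for target hour h+1.

    loads      : array of shape (num_hours, num_loads), hourly load history
    timestamps : list of (month, weekday_flag, hour_of_day) tuples, one per row of loads
    """
    mo, wd, hr = timestamps[h]
    features = [mo, wd, hr]                            # time-information features t
    for i in range(loads.shape[1]):                    # concatenate f_i for every load i
        recent = [loads[h - k, i] for k in range(s + 1)]          # hours h, h-1, ..., h-s
        daily = []
        for day in range(d, 0, -1):                               # hours hr and hr+1 of d previous days
            daily += [loads[h - 24 * day, i], loads[h - 24 * day + 1, i]]
        features += recent + daily
    return np.array(features)

# Placeholder history: 100 hours x 20 loads, plus matching time stamps.
rng = np.random.default_rng(1)
loads = 50 + 10 * rng.random((100, 20))
timestamps = [(1, 1, h % 24) for h in range(100)]

x = build_sample(loads, timestamps, h=60)
print(x.shape)   # (3 + 20 * (s + 1 + 2d),) = (163,)
```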

The multi-output SVR load predictor is obtained by solving one SVR optimization problem (11) for each load. In our experiments, we trained three SVR models to quantify the contribution of capturing spatial correlations and to examine the influence of different feature selections. Model 1 predicts each load using only the time information $\bm{t}$ and its own load value features. A data sample used in Model 1 to predict load $i$ is given by

\bm{x}_{j,i}=[\bm{t},\bm{f}_{i}],\ \forall i. \quad (24)

Models 2 and 3 use $\bm{t}$ and $\bm{f}_{i},\ \forall i$, as features to predict all loads. A data sample in these two models is given by

\bm{x}_{j}=[\bm{t},\bm{f}_{1},\bm{f}_{2},\dots,\bm{f}_{n_{l}}], \quad (25)

where $n_{l}$ is the number of loads in the system. In Model 2, $s=3$ and $d=2$; in Model 3, $s=4$ and $d=3$. The ground truth $y_{j,i}=P_{i}^{h+1}$ is the true load value at hour $h+1$ for load $i$. Table 2 presents some properties of the three tested SVR models. Comparing Models 1 and 2, we can see the influence of considering spatial correlations in addition to temporal correlations, as these two models use the same temporal features, but Model 2 additionally uses the features of all the loads to capture spatial correlations.

Table 2: Statistics of SVR models
Model   s   d   m       p     Training time (h)
1       3   2   35011   11    1.927
2       3   2   35011   163   4.234
3       4   3   34987   223   33.324

The dimensions of the data matrix $\bm{X}$ ($m\times p$) and the target value matrix $\bm{Y}$ ($m\times n_{l}$) depend on the values of $s$ and $d$. The derivations of $m$ and $p$ are described in the Appendix. For each model, the training data matrix $\bm{X}_{\text{train}}$ contains all data from 2015 to 2017, and data from 2018 are used as $\bm{X}_{\text{test}}$. Each column of $\bm{X}_{\text{train}}$ is scaled to zero mean and unit variance, and each column of $\bm{X}_{\text{test}}$ is scaled using the mean and variance of the corresponding column of $\bm{X}_{\text{train}}$. The same split and scaling are performed on $\bm{Y}$ to obtain $\bm{Y}_{\text{train}}$ and $\bm{Y}_{\text{test}}$ as well. The parameters for training the SVR models are chosen as $\varepsilon=10^{-2}$ and $M=100$. The radial basis function (RBF) kernel

Q(\bm{x}_{i},\bm{x}_{j})=\exp\left(-\sigma\|\bm{x}_{i}-\bm{x}_{j}\|^{2}\right) \quad (26)

is used with $\sigma=10^{-2}$. Applying the trained SVR predictor to $\bm{X}_{\text{train}}$ and $\bm{X}_{\text{test}}$ yields the predicted loads $\hat{\bm{Y}}_{\text{train}}$ and $\hat{\bm{Y}}_{\text{test}}$, respectively.
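A sketch of this training step with scikit-learn is shown below, using the parameters reported above ($\varepsilon=10^{-2}$, $M=100$, and the RBF kernel with $\sigma=10^{-2}$, which map to the epsilon, C, and gamma arguments of sklearn.svm.SVR). One SVR is fit per load via MultiOutputRegressor, and the feature/target matrices are placeholder arrays standing in for the scaled $\bm{X}$ and $\bm{Y}$ described in the text.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
from sklearn.preprocessing import StandardScaler

# Placeholder matrices standing in for X_train, Y_train, X_test.
rng = np.random.default_rng(2)
X_train, Y_train = rng.random((500, 163)), rng.random((500, 20))
X_test = rng.random((100, 163))

# Scale each column to zero mean and unit variance; reuse the training statistics on X_test.
x_scaler = StandardScaler().fit(X_train)
y_scaler = StandardScaler().fit(Y_train)
X_train_s, X_test_s = x_scaler.transform(X_train), x_scaler.transform(X_test)
Y_train_s = y_scaler.transform(Y_train)

# One RBF-kernel SVR per load (epsilon = 1e-2, C = M = 100, gamma = sigma = 1e-2).
svr = MultiOutputRegressor(SVR(kernel="rbf", C=100.0, epsilon=1e-2, gamma=1e-2))
svr.fit(X_train_s, Y_train_s)

# Predicted loads, mapped back to MW with the target scaler.
Y_hat_test = y_scaler.inverse_transform(svr.predict(X_test_s))
print(Y_hat_test.shape)   # (100, 20)
```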

Two metrics are used to evaluate the performance of the SVR load predictor, namely root mean square error (RMSE) and mean absolute percentage error (MAPE). RMSE measures the square root of the average squared error for each load, and hence has units of MW. MAPE measures, on average, by what percentage the predicted loads deviate from their true values. These metrics for each load $i$ are calculated as

\text{RMSE}_{\text{train},i}=\sqrt{\frac{1}{m}\sum_{j=1}^{m}(\bm{Y}_{\text{train},i,j}-\hat{\bm{Y}}_{\text{train},i,j})^{2}}, \quad (27)
\text{MAPE}_{\text{train},i}=\frac{1}{m}\sum_{j=1}^{m}\left|\frac{\bm{Y}_{\text{train},i,j}-\hat{\bm{Y}}_{\text{train},i,j}}{\bm{Y}_{\text{train},i,j}}\right|\times 100\%, \quad (28)

where $\bm{Y}_{\text{train},i}$ is the $i^{\text{th}}$ column of $\bm{Y}_{\text{train}}$. The same metrics are applied to $\bm{Y}_{\text{test}}$ to evaluate the performance of the SVR load predictor on the testing data.
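A short numpy sketch of the two metrics, computed column-wise so that each load receives its own RMSE (in MW) and MAPE (in percent); the arrays below are placeholders with the shape of $\bm{Y}_{\text{train}}$ and its prediction.

```python
import numpy as np

def rmse_per_load(Y_true, Y_hat):
    """Eq. (27): root mean square error of each load, in MW."""
    return np.sqrt(np.mean((Y_true - Y_hat) ** 2, axis=0))

def mape_per_load(Y_true, Y_hat):
    """Eq. (28): mean absolute percentage error of each load, in percent."""
    return 100.0 * np.mean(np.abs((Y_true - Y_hat) / Y_true), axis=0)

# Placeholder data: 1000 samples x 20 loads, with roughly 1% prediction error.
rng = np.random.default_rng(3)
Y_true = 100 + 20 * rng.random((1000, 20))
Y_hat = Y_true * (1 + 0.01 * rng.standard_normal((1000, 20)))

print(rmse_per_load(Y_true, Y_hat).round(2))
print(mape_per_load(Y_true, Y_hat).round(2))
```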

Figure 2: Performance of the SVR models under two metrics: (a) RMSE, and (b) MAPE. Model 1 does not capture spatial correlations. Model 2 uses temporal features of 3 previous hours and 2 previous days. Model 3 uses temporal features of 4 previous hours and 3 previous days. Both Models 2 and 3 capture spatial correlation.

Figure 2 illustrates the RMSE and MAPE for the SVR models. RMSE values largely depend on the load values themselves; for example, load 5 has the largest RMSE because it is the largest load in the system. From Figure 2(b) we can see that the MAPE for most loads is around 1%, and the MAPE for load 19, the most difficult load to predict, is around 2%. Comparing these quantities for Models 1 and 2, both are smaller for Model 2. Recall that the difference between Models 1 and 2 is that Model 2 considers all prior loads, while Model 1 only includes the prior data at the load of interest. This result indicates that considering spatial correlations does improve the performance of the SVR load predictor. Comparing Models 2 and 3, it can be concluded that including too much historical data as features decreases the accuracy of the SVR load predictor. Moreover, Table 2 shows that using too many features makes training the SVR model extremely slow. Thus, in the following sections, Model 2 is adopted to generate the predicted loads used by the SVM attack detector.

4.2 The SVM Attack Detector Performance on Random Attacks

The outputs of the SVR load predictor are used as input features of the SVM attack detector. Depending on whether an attack is present, the input data samples of the SVM are given by

\bm{u}_{j}=[mo,wd,hr,\hat{\bm{P}},\bm{P}], \quad \text{if } v_{j}=-1, \quad (29a)
\bm{u}_{j}=[mo,wd,hr,\hat{\bm{P}},\bm{P}_{\text{Atk}}], \quad \text{if } v_{j}=1, \quad (29b)

where $v_{j}=-1$ indicates that there is no attack, and $v_{j}=1$ otherwise. The predicted loads $\hat{\bm{P}}$ of $m=35011$ hours, along with their ground truth values $\bm{P}$ and time information, yield 35011 normal data samples for the SVM detector in the form of (29a). The length of each data sample is $q=3+20\times 2=43$. The normal data matrix $\bm{U}_{\text{normal}}$ is of size $35011\times 43$. We randomly select 80% of these samples for training and the remaining 20% for testing. We create $10^{5}$ attacked data samples in the form of (29b) using Alg. 1, resulting in $\bm{U}_{\text{attack}}$ of size $10^{5}\times 43$ with real load shift $\tau_{r}$ ranging from 1% to 20%. From now on, we omit the subscript in $\tau_{r}$ for easier presentation.

We obtain different SVM models and compare their performance by varying the penalty factor $C$ and $\tau_{\min}$ (the minimal $\tau$ used in the training data). The normal data in the training data matrix $\bm{U}_{\text{train}}$ are the same for all models, i.e., the same 80% of $\bm{U}_{\text{normal}}$. The attacked data in $\bm{U}_{\text{train}}$ include 80% of the attacked data samples with $\tau\geq\tau_{\min}$. The testing data $\bm{U}_{\text{test}}$ consist of the remaining 20% of attacked data, spanning all load shifts, that are not used in training, and are the same for all models. For each model, every column of the training data matrix $\bm{U}_{\text{train}}$ is scaled to zero mean and unit variance, and the same scaling is applied to the testing data. The kernel function used in the SVM detector is also the RBF kernel in the form of (26), but here $\sigma$ is calculated as $\sigma=1/q$ (the "scale" option in Scikit-learn).
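A sketch of the detector training with scikit-learn, mirroring the settings above (RBF kernel with the "scale" gamma option and penalty factor $C$); the normal and attacked feature matrices are placeholders in the shape of (29a) and (29b), not the actual SVR outputs.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Placeholder feature matrices in the form of (29a)/(29b): [mo, wd, hr, P_hat, P_or_P_atk].
rng = np.random.default_rng(4)
U_normal = rng.random((2000, 43))
U_attack = rng.random((1000, 43)) + 0.3          # attacked samples, shifted for illustration
U = np.vstack([U_normal, U_attack])
v = np.concatenate([-np.ones(len(U_normal)), np.ones(len(U_attack))])

U_train, U_test, v_train, v_test = train_test_split(U, v, test_size=0.2, random_state=0)

# Scale using the training statistics only, then fit the RBF-kernel SVM detector.
scaler = StandardScaler().fit(U_train)
detector = SVC(kernel="rbf", C=1000.0, gamma="scale")
detector.fit(scaler.transform(U_train), v_train)

v_pred = detector.predict(scaler.transform(U_test))
print(confusion_matrix(v_test, v_pred))          # rows: true normal/attack, cols: predicted
```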

Figure 3 illustrates the effect of $\tau_{\min}$ on the missed detection rate and the false alarm rate. The false alarm rate is calculated by applying the detector to all $m=35011$ normal data samples, including both training and testing. The parameter $C$ is fixed at 1000. $\tau_{\min}$ controls the amount of attacked training data: for instance, if $\tau_{\min}=3\%$, $\bm{U}_{\text{train}}$ contains 80% of attacks with $\tau\geq 3\%$, but does not contain any attack with $\tau<3\%$. Intuitively, attacks with higher $\tau$ are further from the normal data than those with lower $\tau$. Thus, a detector trained with a low $\tau_{\min}$ will have a high false alarm rate, as the SVM is trying to find a decision boundary between normal data and attacks with small load shift; however, it should perform better in detecting attacks with small $\tau$ than detectors trained with large $\tau_{\min}$. In Figure 3, the blue lines indicate the missed detection rate of attacks with a given load shift $\tau$, and the red line shows the false alarm rate. As $\tau_{\min}$ increases, the false alarm rate decreases, but the missed detection rate increases for attacks with small load shifts. This observation confirms the intuition above, indicating that $\tau_{\min}$ indeed trades off the false alarm rate against the detection probability for small attacks. Note that for attacks with large $\tau$, the effect of $\tau_{\min}$ is insignificant. For testing attacks with extremely small $\tau$, the missed detection rates are very high even with small $\tau_{\min}$, because these attacks are in principle very difficult to detect; however, such attacks are also unlikely to cause severe consequences. From Figure 3, we can see that $\tau_{\min}=3\%$ is a good choice for our dataset.

Figure 3: Effect of the minimum training load shift $\tau_{\min}$. The false alarm rate and the missed detection rate when testing random attacks are each plotted as a function of $\tau_{\min}$. Data is shown for $C=1000$.

The parameter $C$ trades off misclassification of training examples against simplicity of the decision boundary. A small $C$ makes the decision boundary smooth, while a large $C$ aims at classifying all training samples correctly. Therefore, a detector with large $C$ is expected to perform better. However, a large $C$ allows for fewer outliers, making the SVM optimization problem (14) harder to solve, so the training time increases. Figure 4 shows the performance of models trained with different $C$ on testing random attacks while fixing $\tau_{\min}=3\%$. The larger $C$ is, the higher the detection probability we can achieve. This model performs well on attacks with large $\tau$, and the detection probability reaches nearly 100% starting at $\tau=7\%$. System operators can similarly vary $\tau_{\min}$ and $C$ to obtain an SVM model with satisfactory performance in terms of false alarm rate and missed detection rate.

Figure 4: Effect of the outlier penalty factor $C$ on the detection probability for random attacks. Data is shown for $\tau_{\min}=3\%$.

4.3 The SVM Attack Detector Performance on Intelligently Designed LR Attacks

In this section, we evaluate the performance of the trained SVM detector on cost maximization (CM) and line overflow (LO) attacks. Based on the previous section, we choose the SVM parameters $C=2000$ and $\tau_{\min}=3\%$ to balance the false alarm rate and missed detection rate. The procedure to generate these attacks is as follows. On the IEEE 30-bus system, we first perform base-case DCOPF for each hour of 2015 through 2018 using the true loads. At hour $h$, if there are at least 2 lines whose power flows are greater than 80% of their ratings, we call those lines critical lines and $h$ a critical hour. The total number of critical hours is found to be 8038. We focus on critical hours because the false loads are likely to cause congestion at those times, which in turn changes the generation dispatch and leads to consequences. For each critical hour, we solve optimization problem (9) 20 times to obtain the attack vector $\bm{c}$ for CM attacks with $\tau=1\%,2\%,\dots,20\%$. For each critical line, we solve (10) 20 times to obtain $\bm{c}$ for LO attacks, also with $\tau=1\%,2\%,\dots,20\%$. Every non-zero $\bm{c}$ is used to construct a false load vector $\bm{P}_{\text{Atk}}$ as in (8). If a $\bm{P}_{\text{Atk}}$ makes the DCOPF infeasible, it is discarded. The total numbers of false load vectors for CM attacks and LO attacks are 113031 and 343135, respectively.

Figure 5: Detection probability for CM and LO attacks as a function of load shift $\tau$. Subplot (a) is for all attacks, and subplot (b) is only for attacks with consequences. Data is shown for $\tau_{\min}=3\%$ and $C=2000$.

Figure 5(a) illustrates the detection probability versus the load shift $\tau$ for CM and LO attacks. For both attack types, the detection probability is nearly 100% when $\tau\geq 4\%$. For attacks with $\tau=3\%$, the detector performance drops to 97% for LO attacks, but it is still perfect in detecting CM attacks. Compared with the performance on random attacks shown in Figure 4, intelligently designed attacks are easier to detect than random attacks.

Figure 5(b) illustrates the detection probability versus load shift $\tau$ for CM and LO attacks with consequences. CM attacks with consequences are those that increase the operating cost by more than 1%. LO attacks with consequences are those that result in physical overflows. Comparing Figures 5(a) and 5(b), it can be seen that the detector performs even better on attacks with consequences.

4.4 Attack Mitigation

If an LR attack is flagged by our detection framework, the simplest way to mitigate the attack is to temporarily use the loads output by the SVR load predictor for re-dispatch. To test the mitigation performance of this method, we compare the worst consequences of intelligently designed attacks with and without our detection framework.

In order to obtain the consequences, we run DCOPF three times using different loads. Under normal operation, running DCOPF with the true loads $\bm{P}_{\text{normal}}$ yields the attack-free generation dispatch $\bm{G}_{\text{normal}}$. Using attacked loads $\bm{P}_{\text{Atk}}$ to run DCOPF gives the attacked dispatch $\bm{G}_{\text{Atk}}$. Applying $\bm{G}_{\text{Atk}}$ to the true loads $\bm{P}_{\text{normal}}$ yields the attacked line flows $\bm{P}_{\bm{L},\text{Atk}}=\bm{R}(\bm{G}_{\text{Atk}}-\bm{P}_{\text{normal}})$. When an attack is detected, the system runs DCOPF using the SVR predicted loads $\bm{P}_{\text{SVR}}$, and the resulting dispatch is $\bm{G}_{\text{SVR}}$. The corresponding line flows are given by $\bm{P}_{\bm{L},\text{SVR}}=\bm{R}(\bm{G}_{\text{SVR}}-\bm{P}_{\text{normal}})$.
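Given the three dispatches, the consequence metrics reduce to a few vector operations, as in the sketch below; the cost vector, PTDF row, loads, and dispatches are placeholder values for illustration only.

```python
import numpy as np

# Placeholder quantities (illustrative only): cost vector, PTDF row of the target line,
# true loads, and the three dispatches obtained from the three DCOPF runs.
a        = np.array([10.0, 20.0, 30.0])
R_l      = np.array([0.4, -0.2, 0.0])            # PTDF row of target line l (per bus)
P_normal = np.array([50.0, 80.0, 70.0])
G_normal = np.array([120.0, 80.0, 0.0])          # DCOPF with true loads
G_atk    = np.array([60.0, 120.0, 20.0])         # DCOPF with false loads
G_svr    = np.array([118.0, 80.0, 2.0])          # DCOPF with SVR predicted loads

# Economic consequence (CM attacks): operating cost increase over the attack-free dispatch.
cost_increase_atk = a @ (G_atk - G_normal)
cost_increase_svr = a @ (G_svr - G_normal)

# Physical consequence (LO attacks): flow on the target line under each dispatch.
flow_atk = R_l @ (G_atk - P_normal)
flow_svr = R_l @ (G_svr - P_normal)

print(f"cost increase without detection: {cost_increase_atk:.1f} $/h")
print(f"cost increase with SVR re-dispatch: {cost_increase_svr:.1f} $/h")
print(f"target-line flow: attacked {flow_atk:.1f} MW vs mitigated {flow_svr:.1f} MW")
```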

Figure 6(a) illustrates the mitigation results for CM attacks. The word "maximum" on the y-axis indicates the worst consequence among all attacks with each load shift $\tau$. The red line indicates the maximum cost increase without our proposed detection framework, calculated as $\bm{a}^{T}(\bm{G}_{\text{Atk}}-\bm{G}_{\text{normal}})$ (recall that $\bm{a}$ is the generation cost vector). When an attack is detected, the resulting cost increase is $\bm{a}^{T}(\bm{G}_{\text{SVR}}-\bm{G}_{\text{normal}})$. When the detector fails to detect an attack, the cost increase is the attack consequence $\bm{a}^{T}(\bm{G}_{\text{Atk}}-\bm{G}_{\text{normal}})$. Thus, for each load shift, if all attacks are detected, the data point on the blue line is given by $\bm{a}^{T}(\bm{G}_{\text{SVR}}-\bm{G}_{\text{normal}})$; otherwise, it is $\max[\bm{a}^{T}(\bm{G}_{\text{Atk}}-\bm{G}_{\text{normal}}),\bm{a}^{T}(\bm{G}_{\text{SVR}}-\bm{G}_{\text{normal}})]$. A similar procedure is used to create Figure 6(b) for LO attacks. The red line is obtained by taking the maximum $\bm{P}_{\bm{L},\text{Atk}}^{l}$ for each load shift (line $l$ is the target line). The blue line is given by $\bm{P}_{\bm{L},\text{SVR}}^{l}$ if all attacks are detected, and $\max[\bm{P}_{\bm{L},\text{Atk}}^{l},\bm{P}_{\bm{L},\text{SVR}}^{l}]$ otherwise.

From Figure 6(a), we can see that for load shifts $\tau\geq 3\%$, the increases in operation cost are significantly reduced by using the SVR predicted loads when an attack is flagged. For LO attacks, the overflows are significantly reduced for load shifts $\tau\geq 4\%$. The largest cost increase caused by an undetected CM attack is 8.17% (at $\tau=2\%$), and the largest overflow caused by an undetected LO attack is 3.96% (at $\tau=3\%$). Thus, even though our detector fails to detect a small number of attacks, their consequences are minor. Note that at $\tau=1\%$, using the SVR predicted loads leads to a larger overflow due to inaccurate predictions, but the overflow is still very small. Therefore, the consequences of LR attacks can be successfully mitigated using the SVR predicted loads, which gives operators time to take other corrective actions.

Figure 6: Mitigation results for (a) CM attacks and (b) LO attacks. For each load shift, the points on the red lines indicate the worst consequence of the attack, and the points on the blue lines indicate the worst consequence with our attack detection framework. Points on the blue line are obtained by taking the maximum of two quantities: (i) the worst consequence when re-dispatching with the SVR predicted loads after an attack is flagged; and (ii) the worst consequence of attacks that the detector fails to detect.

5 Concluding Remarks

A machine learning-based load redistribution (LR) attack detection framework is proposed. This detection framework consists of a support vector regression (SVR)-based load predictor and a support vector machine (SVM)-based attack detector. The SVR load predictor is trained using features selected from historical load data to capture both spatial and temporal correlations. The prediction results indicate that the SVR load predictor can accurately predict the loads at all buses. The SVM attack detector is trained using randomly generated LR attacks, and is shown to be effective in detecting both randomly generated and intelligently designed attacks, especially those with consequences. Using the proposed attack detection framework, system operators can make control decisions based on the predicted loads when an attack is flagged to mitigate the consequences of the attack. This also gives operators time to find the source of the attack. Future work will include attack localization techniques that help system operators identify the loads and/or meters modified by the attacker.

Acknowledgment

This material is based on work supported by the National Science Foundation (NSF) under grant number CNS-1449080, and two grants from the Power System Engineering Research Center (PSERC) S-72 and S-74.

References

  • [1] Liu, Y., Ning, P., Reiter, M.K.: 'False data injection attacks against state estimation in electric power grids'. In: 16th ACM Conference on Computer and Communications Security (CCS '09), Chicago, Illinois, USA, 2009, pp. 21–32
  • [2] Zhang, J., Sankar, L.: 'Physical system consequences of unobservable state-and-topology cyber-physical attacks', IEEE Transactions on Smart Grid, 2016, 7, (4), pp. 2016–2025
  • [3] Liang, J., Sankar, L., Kosut, O.: 'Vulnerability analysis and consequences of false data injection attack on power system state estimation', IEEE Transactions on Power Systems, 2016, 31, (5), pp. 3864–3872
  • [4] Moslemi, R., Mesbahi, A., Velni, J.M.: 'Design of robust profitable false data injection attacks in multi-settlement electricity markets', IET Generation, Transmission & Distribution, 2018, 12, (6), pp. 1263–1270
  • [5] Jia, L., Kim, J., Thomas, R.J., Tong, L.: 'Impact of data quality on real-time locational marginal price', IEEE Transactions on Power Systems, 2014, 29, (2), pp. 627–636
  • [6] An, Y., Liu, D.: 'Multivariate Gaussian-based false data detection against cyber-attacks', IEEE Access, 2019, 7, pp. 119804–119812
  • [7] Liu, C., Wu, J., Long, C., Kundur, D.: 'Reactance perturbation for detecting and identifying FDI attacks in power system state estimation', IEEE Journal of Selected Topics in Signal Processing, 2018, 12, (4), pp. 763–776
  • [8] Che, L., Liu, X., Li, Z.: 'Mitigating false data attacks induced overloads using a corrective dispatch scheme', IEEE Transactions on Smart Grid, 2019, 10, (3), pp. 3081–3091
  • [9] Li, X., Hedman, K.W.: 'Enhancing power system cyber-security with systematic two-stage detection strategy', IEEE Transactions on Power Systems, 2019, pp. 1–1
  • [10] Liu, C., Zhou, M., Wu, J., Long, C., Kundur, D.: 'Financially motivated FDI on SCED in real-time electricity markets: Attacks and mitigation', IEEE Transactions on Smart Grid, 2019, 10, (2), pp. 1949–1959
  • [11] Ozay, M., Esnaola, I., Yarman Vural, F.T., Kulkarni, S.R., Poor, H.V.: 'Machine learning methods for attack detection in the smart grid', IEEE Transactions on Neural Networks and Learning Systems, 2016, 27, (8), pp. 1773–1786
  • [12] An, D., Yang, Q., Liu, W., Zhang, Y.: 'Defending against data integrity attacks in smart grid: A deep reinforcement learning-based approach', IEEE Access, 2019, 7, pp. 110835–110845
  • [13] Pinceti, A., Sankar, L., Kosut, O.: 'Load redistribution attack detection using machine learning: A data-driven approach'. In: 2018 IEEE Power & Energy Society General Meeting (PESGM), 2018, pp. 1–5
  • [14] Smola, A.J., Schölkopf, B.: 'A tutorial on support vector regression', Statistics and Computing, 2004
  • [15] Cortes, C., Vapnik, V.: 'Support-vector networks', Machine Learning, 1995, 20, (3), pp. 273–297
  • [16] Yuanhang, D., Lei, C., Weiling, Z., Yong, M.: 'Multi-support vector machine power system transient stability assessment based on relief algorithm'. In: 2015 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), 2015, pp. 1–5
  • [17] Eskandarpour, R., Khodaei, A.: 'Component outage estimation based on support vector machine'. In: 2017 IEEE Power & Energy Society General Meeting, 2017, pp. 1–5
  • [18] Kirincic, V., Ceperic, E., Vlahinic, S., Lerga, J.: 'Support vector machine state estimation', Applied Artificial Intelligence, 2019, 33, (6), pp. 517–530
  • [19] Qiang, S., Pu, Y.: 'Short-term power load forecasting based on support vector machine and particle swarm optimization', Journal of Algorithms & Computational Technology, 2019, 13
  • [20] Capuno, M., Kim, J.S., Song, H.: 'Very short-term load forecasting using hybrid algebraic prediction and support vector regression', Mathematical Problems in Engineering, 2017
  • [21] Su, F., Xu, Y., Tang, X.: 'Short- and mid-term load forecasting using machine learning models'. In: 2017 China International Electrical and Energy Conference (CIEEC), 2017, pp. 406–411
  • [22] Azad, M.K., Uddin, S., Takruri, M.: 'Support vector regression based electricity peak load forecasting'. In: 2018 11th International Symposium on Mechatronics and its Applications (ISMA), 2018, pp. 1–5
  • [23] Chong, L.W., Rengasamy, D., Wong, Y.W., Rajkumar, R.K.: 'Load prediction using support vector regression'. In: TENCON 2017 - 2017 IEEE Region 10 Conference, 2017, pp. 1069–1074
  • [24] PJM: 'PJM metered hourly zonal load data', 2019. PJM Data Miner 2, https://dataminer2.pjm.com/feed/hrl_load_metered/definition
  • [25] Yuan, Y., Li, Z., Ren, K.: 'Modeling load redistribution attacks in power systems', IEEE Transactions on Smart Grid, 2011, 2, (2), pp. 382–390
  • [26] Boyd, S., Vandenberghe, L.: 'Convex Optimization'. (Cambridge University Press, 2004)
  • [27] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al.: 'Scikit-learn: Machine learning in Python', Journal of Machine Learning Research, 2011, 12, pp. 2825–2830

Appendix

The parameters $s$ and $d$ in (23) determine the dimension of the SVR input data matrix $\bm{X}$ ($m\times p$). For example, for Model 2 with $s=3$ and $d=2$, the length of $\bm{f}_{i}$ is given by

n_{f}=s+1+2d=8. \quad (30)

The resulting data sample length is $p=3+20\times n_{f}=163$. Since we use load values of the previous $d=2$ days as features, the start hour of our data is 01/03/2015, 0 AM. The end hour is 12/31/2018, 10 PM, because for 12/31/2018, 11 PM we do not have the ground truth value of its next hour. In each of the four years, the hour at which daylight saving time ends has two load values with identical time stamps, and we approximate the load value at this hour by taking the average of those two values. As a result, the number of data samples for the SVR load predictor is

m=(365\times 3+366-d)\times 24-1-4=35011. \quad (31)
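A quick check of the counts in (30) and (31) for Model 2 ($s=3$, $d=2$):

```python
s, d, n_loads = 3, 2, 20
n_f = s + 1 + 2 * d                    # eq. (30): 8 load-value features per load
p = 3 + n_loads * n_f                  # 3 time features + 20 loads x 8 = 163
m = (365 * 3 + 366 - d) * 24 - 1 - 4   # eq. (31): usable hourly samples, 2015-2018
print(n_f, p, m)                       # 8 163 35011
```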

The target values for hour $h$ are the metered loads of the 20 zones at hour $h+1$. Thus, for each data sample of length $p=163$, the SVR outputs a vector of length 20 as the prediction. We use the first 26253 data samples from 2015 through 2017 to train the SVR load predictor and the remaining 8758 data samples from 2018 to test its performance. The resulting training data matrix $\bm{X}_{\text{train}}$ is of size $26253\times 163$, the training target value matrix $\bm{Y}_{\text{train}}$ is of size $26253\times 20$, the testing data matrix $\bm{X}_{\text{test}}$ is of size $8758\times 163$, and the testing target value matrix $\bm{Y}_{\text{test}}$ is of size $8758\times 20$. The dimensions of these matrices for the other models can be determined similarly.