

Detecting Load Redistribution Attacks via Support Vector Models

Zhigang Chu1, Oliver Kosut1, Lalitha Sankar1 [email protected]
1School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA
Abstract

A machine learning-based detection framework is proposed to detect a class of cyber-attacks that redistribute loads by modifying measurements. The detection framework consists of a multi-output support vector regression (SVR) load predictor that predicts loads by exploiting both spatial and temporal correlations, and a subsequent support vector machine (SVM) attack detector that determines the existence of load redistribution (LR) attacks using the loads predicted by the SVR predictor. Historical load data for training the SVR are obtained from the publicly available PJM zonal loads and are mapped to the IEEE 30-bus system. The SVM is trained using normal data and randomly created LR attacks, and is tested against both random and intelligently designed LR attacks. The results show that the proposed detection framework can effectively detect LR attacks. Moreover, attack mitigation can be achieved by using the SVR predicted loads to re-dispatch generation.

1 Introduction

Leveraging information technology, the operation of modern electric power grids relies heavily on real-time sensing, monitoring, communication, and control. State estimation (SE) uses the power system measurements collected by the supervisory control and data acquisition (SCADA) system to estimate the operating states. These states are used by the energy management system (EMS) to allow for real-time control of the power system. In the last decade, the cyber-security of SE has received considerable attention. A class of false data injection (FDI) attacks that replace measurements with counterfeits has been shown to easily spoof SE and the traditional bad data detector (BDD) Liu2009 . This finding serves as the basis of a wide class of FDI attacks, called load redistribution (LR) attacks, which make it appear as if the loads are redistributed among load buses while the total load remains unchanged.

Worst-case consequences of LR attacks can be found by solving bi-level optimization problems. These attacks can be designed to cause physical or economic consequences. For physical consequences, Zhang2016TSG finds an attack that masks the outage of a transmission line, and Liang2015 designs attacks that cause physical overflows. For economic consequences, Moslemi2018 and Jia2014 show that LR attacks can change locational marginal prices and/or generate profit for the attacker. It is therefore crucial to develop techniques to detect and mitigate LR attacks.

Various attack detection techniques have been presented in the literature. In An2019 , the authors propose a multivariate Gaussian-based anomaly detector trained on correlation features of micro phasor measurement units (μPMUs), but this detector requires installing μPMUs in the system. Liu et al. Liu2018 detect and identify attacks using reactance perturbation, but this method only works when the attacker has limited resources. The authors of Che2019 attempt to mitigate LR attacks using a tri-level optimization approach, and the authors of Li2019 identify LR attacks by monitoring abnormal load deviations and suspicious branch flow changes; however, both works focus only on attacks that cause line overflows. In Liu2019 , a financially motivated FDI attack model is analyzed and a robust incentive-reduction strategy is proposed to deter such attacks by protecting a subset of meters. More generally, machine learning techniques have also been deployed to detect LR attacks. For example, Ozay2016 proposes supervised and semi-supervised machine learning algorithms to detect FDI attacks by exploiting the relationships between statistical and geometric properties of the attack vectors employed in the attack scenarios. A deep reinforcement learning-based approach to detect LR attacks is proposed in An2019a . In Pinceti2018 , three machine learning techniques are introduced for attack detection, namely nearest neighbor, semi-supervised one-class SVM, and replicator neural network. These three algorithms compare estimated loads with historical loads and use thresholding to determine the existence of LR attacks.

Estimation-Detection Framework: In this paper, we introduce an LR attack detection framework based on support vector models that leverages the historical load information commonly available to system operators. Unlike most existing approaches in the literature, our method determines the existence of LR attacks directly from the estimated loads, without requiring installation of new devices or protection of specific measurements. When an LR attack occurs, the estimated loads obtained from the SE results differ from the true loads, but the net loads are the same. Thus, if accurate load predictions are available, the existence of LR attacks can be determined by comparing the predicted and estimated loads. Moreover, if an LR attack is detected, the predicted loads can be used directly to re-dispatch generation instead of the estimated loads. By doing this, the attack consequences can be temporarily mitigated, giving operators time to perform other corrective actions.

Support Vector Models: In particular, we propose a support vector regression (SVR) Smola2004 based load predictor to accurately predict loads, and a subsequent support vector machine (SVM) Cortes1995 based attack detector that compares the predicted and observed loads to detect LR attacks. This modular design separates prediction from classification, so that each module can be independently enhanced (e.g., using additional features) or replaced by other methods, as seen fit. Support vector models are optimization-based machine learning approaches that can be used for both regression and classification. Among the many available machine learning methods, we choose support vector models for the following reasons: (i) they are mature methods that have proven effective for various regression/classification tasks in power systems, including transient stability assessment Yuanhang2015 , component outage estimation Eskandarpour2017 , and state estimation Kirincic2019 ; (ii) they are analytically developed models with fewer and easier-to-tune parameters than many other machine learning methods, e.g., neural networks.

SVR has been widely used for load prediction in electric power systems. In Qiang2019 , a short-term load forecasting algorithm is proposed that combines SVR and particle swarm optimization. The authors of Capuno2017 propose an SVR model that predicts very short term loads using weather data and day-ahead predicted loads as features. Similar features, along with additional time-related features, are used to train an SVR model that predicts short-term and mid-term loads in Su2017 . In Azad2018 , Azad et al. predict the daily peak load using the historical peak load consumption and the corresponding temperature and relative humidity. Chong et al. propose K-step-ahead prediction using SVR in Chong2017 .

Proposed SVR Load Predictor: The aforementioned references focus on predicting the net load utilizing temporal correlation. To the best of our knowledge, we are one of the first to predict loads at each bus using SVR, leveraging both spatial and temporal correlations between all the loads in the system. Features selected for the SVR predictor include historical load values of all loads chosen at distinct time intervals prior to the target time (e.g., one hour before, one day before, etc.) as well as the specific time information (e.g., month, weekday/weekend). This choice allows for conveniently using the same features to predict loads at different buses as the temporal features for all loads implicitly capture the spatial correlations among them.

Proposed SVM Detector: SVM is a supervised learning approach to solve classification problems, based on learning separating hyperplanes. Our use of an SVM to detect attacks largely mirrors existing approaches; our key contribution is in how we generate the training data needed to learn an SVM model that classifies accurately over a large class of attacks. We now describe the dataset and our approach to train and test the two models.

Dataset: We train and test our models using the publicly available PJM metered zonal load data PJM2019 . We map each of the 20 zones of the PJM data to a load bus in the IEEE 30-bus system, leveraging the fact that there are 20 loads in this system.

Training and Testing: To apply SVM on attack detection, it is necessary to create training data in both classes, namely normal and attacked data. The SVR predicted loads and the true loads (assuming trustworthy historical data) naturally form the normal data. For the attacked data, we propose a novel approach that generates random LR attacks in order to maximally explore the attack space, and thereby enhance accuracy in detecting any LR attack. Each of these attacks alters a random number of loads, and a Gaussian distribution is used to generate the deviation of each load from its true value. The severity of the attacks is controlled by varying the maximum deviation percentage over all loads. Our approach also guarantees the net load change is 0 to satisfy the constraints of LR attacks. We use 80% of the data for training, and the remaining 20% for testing.

In addition to the random attacks, we also generate two types of intelligently designed LR attacks, namely cost maximization (CM) and line overflow (LO) attacks, to test the effectiveness of our SVM attack detector. CM attacks aim to maximize the operation cost Yuan11 ; and LO attacks attempt to overflow a target transmission line Liang2015 . These two types of attacks are designed through optimizations to maximize their economic/physical consequences.

Our results show that the proposed attack estimation-detection framework can effectively predict and detect both random and intelligently designed LR attacks. Moreover, we illustrate that using the SVR predicted loads to re-dispatch when attacks are detected can significantly reduce the attack consequences.

Summary of Contributions: The key contributions of this paper are as follows:

1. We propose an LR attack detection framework consisting of an SVR load predictor and a subsequent SVM attack detector. This modular design enables separate enhancement of each block, and also provides sufficiently accurate predicted loads for attack mitigation purposes.

2. The SVR predictor leverages both temporal and spatial correlations within the historical load data to allow for prediction of bus-level loads. Through training and testing the proposed SVR predictor on the PJM metered load data PJM2019 , we show that it can accurately predict every load in the system.

3. Utilizing the SVR predicted loads, we train the SVM detector using normal data and random LR attacks designed to maximally explore the attack space.

4. The performance of the detection framework is tested on random attacks as well as two types of intelligently designed LR attacks. These attacks aim to cause economic/physical consequences. Our simulation results show that our detection framework can significantly reduce the impact of LR attacks.

The rest of this paper is organized as follows. Section 2 introduces LR attacks and existing approaches to create intelligently designed LR attacks. Section 3 describes the structure of the proposed attack detection framework, the formulations of SVR and SVM, and the random LR attack creation method used for SVM training. Section 4 illustrates the performance of the SVR load predictor and the SVM attack detector. Concluding remarks are presented in Section 5.

2 Load Redistribution Attacks

2.1 Load Redistribution (LR) Attacks and Unobservable False Data Injection (FDI) Attacks

Definition 1: LR attacks are a class of cyber-attacks that redistribute loads among the buses while keeping the net load unchanged. The false loads in an LR attack, $\bm{P}_{\text{Atk}}$, satisfy

\bm{P}_{\text{Atk}}=\bm{P}+\Delta\bm{P}, \quad (1)
\sum_{i}\Delta P_{i}=0, \quad (2)

where $\bm{P}$ is the true load vector, $\Delta\bm{P}$ is the load change caused by the attack, and $i$ is the load index.

Definition 2: The load shift $\tau$ is defined to be the largest load change in percentage of the true loads:

\tau=\underset{i}{\max}\left|\frac{\Delta P_{i}}{P_{i}}\right|\times 100\%. \quad (3)

We use $\tau$ as an intrinsic metric to characterize the detectability of LR attacks. Attacks with sufficiently large $\tau$ are trivial to detect, because load deviations far from the true values are suspicious. Thus, an attacker is likely to limit $\tau$ to avoid detection by this metric. In this paper, we only consider LR attacks with $\tau\leq 20\%$.

The most common way to generate LR attacks in the literature is through unobservable FDI attacks against power system state estimation (SE). FDI attacks are a class of cyber-attacks in which an attacker maliciously replaces power system measurements with counterfeits. Under the DC power flow assumption (for simplicity, we focus on the DC power flow setting, but our work can be generalized to AC cases as in Liang2015 ), the true measurement vector $\textbf{z}$, consisting of the line power flow and bus power injection measurements, is given by

\textbf{z}=\bm{H\theta}+\bm{e}, \quad (4)

where $\bm{\theta}$ is the state vector (voltage angles), $\bm{H}$ is the dependency matrix between measurements and states, and $\bm{e}$ is the noise vector.

Definition 3: A false measurement vector $\bar{\textbf{z}}$ created with state attack vector $\textbf{c}$,

\bar{\textbf{z}}=\bm{H}(\bm{\theta}+\textbf{c})+\bm{e}, \quad (5)

is unobservable to the conventional bad data detector (BDD) embedded in SE, because it is indistinguishable from the true measurements if the true states were $(\bm{\theta}+\textbf{c})$.

Let $\bm{B}$ be the dependency matrix between bus power injections and states, and let $\bm{G}$ be a given generation vector; then the bus power injections without attack can be expressed as

\bm{G}-\bm{P}=\bm{B\theta}. \quad (6)

With attack, the false injections are given by

\bm{G}-\bm{P}_{\text{Atk}}=\bm{B}(\bm{\theta}+\bm{c}). \quad (7)

Substituting (6) into (7) yields the load change vector

\Delta\bm{P}=\bm{P}_{\text{Atk}}-\bm{P}=-\bm{Bc}. \quad (8)

Note that since $\bm{1}^{T}\bm{B}=\bm{0}^{T}$, the net load change is $\sum_{i}\Delta P_{i}=-\bm{1}^{T}\bm{Bc}=0$. Thus, given a generation dispatch, an unobservable FDI attack leads to an LR attack.
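For illustration, the following minimal numpy sketch builds the DC injection matrix $\bm{B}$ for a hypothetical 3-bus network (placeholder data, not the paper's test case), applies an arbitrary attack vector $\bm{c}$, and verifies that the induced load change (8) sums to zero as required by (2).

```python
import numpy as np

# Hypothetical 3-bus network: susceptances of lines (1-2), (2-3), (1-3).
b12, b23, b13 = 10.0, 10.0, 5.0

# DC injection matrix B: bus injections = B * theta (voltage angles).
B = np.array([
    [ b12 + b13, -b12,       -b13      ],
    [-b12,        b12 + b23, -b23      ],
    [-b13,       -b23,        b13 + b23],
])

# Every column of B sums to zero, so 1^T B = 0^T.
assert np.allclose(np.ones(3) @ B, 0.0)

# Arbitrary state attack vector c (voltage-angle perturbation, in rad).
c = np.array([0.0, 0.02, -0.01])

# Load change induced by the unobservable FDI attack, eq. (8): dP = -B c.
dP = -B @ c

print("load change dP:", dP)
print("net load change:", dP.sum())  # numerically zero, as required by eq. (2)
```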

2.2 Intelligently Designed LR Attacks

Although an attacker can inject an arbitrary $\bm{c}$ as long as it controls the measurements corresponding to all non-zero entries of $\bm{Hc}$, its goal is to maliciously choose $\bm{c}$ so that the resulting false loads mislead the system re-dispatch and cause physical and/or economic consequences. We define these attacks as intelligent attacks, whose consequences can be maximized by solving optimization problems. In this paper, we consider two specific intelligent attacks to test the robustness of our proposed detector, namely cost maximization (CM) attacks Yuan11 and line overflow (LO) attacks Liang2015 .

CM attacks are a class of FDI attacks that aim to maximize the operation cost after re-dispatch. The attack vector $\bm{c}$ of CM attacks can be obtained through the following bi-level optimization problem:

\underset{\bm{c}}{\text{maximize}} \quad \bm{a}^{T}\bm{G}^{*} \quad (9a)
\text{subject to} \quad -\tau\bm{P}\leq\bm{Bc}\leq\tau\bm{P} \quad (9b)
\quad \{\bm{G}^{*},\bm{P_{L}}^{*}\}=\arg\left\{\underset{\bm{G},\bm{P_{L}}}{\text{min}}\ \bm{a}^{T}\bm{G}\right\} \quad (9c)
\quad\quad \text{subject to} \quad \sum\bm{G}=\sum\bm{P} \quad (9d)
\quad\quad\quad \bm{P_{L}}=\bm{R}(\bm{G}-\bm{P}+\bm{Bc}) \quad (9e)
\quad\quad\quad -\bm{P_{L}}^{\max}\leq\bm{P_{L}}\leq\bm{P_{L}}^{\max} \quad (9f)
\quad\quad\quad \bm{G}^{\min}\leq\bm{G}\leq\bm{G}^{\max} \quad (9g)

where $\bm{a}$ is the generation cost vector, $\bm{P_{L}}$ is the vector of cyber line power flows, $\bm{R}$ is the power transfer distribution factor (PTDF) matrix, $\bm{P_{L}}^{\max}$ is the vector of line power flow limits, and $\bm{G}^{\max}$ and $\bm{G}^{\min}$ are the generation upper and lower limits, respectively. In the upper level, (9a) models the attacker's objective of maximizing the operation cost, and (9b) models the load shift limit. The lower-level problem (9c)-(9g) is the system DCOPF under attack. This bi-level optimization problem can be converted to a single-level mixed-integer linear program (MILP) by replacing the lower-level DCOPF with its Karush-Kuhn-Tucker (KKT) conditions BoydBook , and then converting the complementary slackness conditions to mixed-integer constraints. The optimal $\bm{c}$ is obtained by solving the MILP.
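For fixed false loads, the lower-level problem (9c)-(9g) is an ordinary linear program. The sketch below solves this inner DCOPF-under-attack with scipy.optimize.linprog on hypothetical small-system data (the cost vector, PTDF matrix, and limits are placeholders); it is not the single-level MILP reformulation of the full bi-level attack problem.

```python
import numpy as np
from scipy.optimize import linprog

# --- Hypothetical data (illustrative only; not the IEEE 30-bus case) ---
a      = np.array([10.0, 20.0, 30.0])        # generation costs ($/MWh), one unit per bus
P_atk  = np.array([50.0, 80.0, 70.0])        # false loads P_Atk = P - Bc seen by the operator (MW)
R      = np.array([[ 0.4, -0.2, 0.0],        # PTDF matrix (lines x buses)
                   [ 0.3,  0.5, 0.0],
                   [-0.1,  0.2, 0.0]])
PL_max = np.array([60.0, 90.0, 60.0])        # line ratings (MW)
G_min  = np.zeros(3)
G_max  = np.array([120.0, 120.0, 120.0])

# DCOPF under attack, eqs. (9c)-(9g): min a^T G
#   s.t. sum(G) = sum(P_atk), -PL_max <= R (G - P_atk) <= PL_max, G_min <= G <= G_max
A_eq = np.ones((1, 3)); b_eq = [P_atk.sum()]
A_ub = np.vstack([R, -R])
b_ub = np.concatenate([PL_max + R @ P_atk, PL_max - R @ P_atk])
res  = linprog(c=a, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
               bounds=list(zip(G_min, G_max)), method="highs")

G_atk = res.x                                 # dispatch the operator would issue under attack
print("attacked dispatch:", G_atk, "cost:", a @ G_atk)
```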

LO attacks attempt to maximize the physical power flow on a target line $l$ after re-dispatch, and possibly cause overflows. The optimal $\bm{c}$ for LO attacks can be obtained by changing the objective function of (9) to maximizing the physical power flow:

\underset{\bm{c}}{\text{maximize}} \quad \left|\bm{P_{L}}^{l*}-\bm{R}_{l}\bm{Bc}\right| \quad (10)
\text{subject to} \quad \text{(9b)}-\text{(9g)},

where $\bm{P_{L}}^{l*}$ is the optimal cyber power flow on target line $l$, $\bm{R}_{l}$ is the $l^{\text{th}}$ row of $\bm{R}$, and the second term in (10) characterizes the impact of the false loads on the physical power flow of line $l$.

3 Proposed Attack Detection Framework

Figure 1 illustrates the structure of our proposed LR attack detection framework. During real-time operation, features are selected from the historical load data up to the current time step to capture both spatial and temporal correlations. Loads at the next time step are then predicted by the SVR load predictor using these features. One SVR model is trained for each load using the same features. Subsequently, the SVM attack detector takes the predicted loads and the loads estimated by SE to determine the existence of LR attacks.

For detection alone, it would suffice to skip the SVR load predictor and feed all SVR features directly into the SVM for classification. However, in this paper we include the SVR for two reasons. First, we aim not only to provide an attack detection technique, but also a corrective mechanism when attacks are detected. Using the (accurate) predicted loads to perform control actions when attacks are flagged provides time to locate the attacked measurements without causing severe consequences. Second, the SVR makes the proposed models easier to scale to large power systems. Without the SVR predictor, the number of features used in the SVM classifier would be very large, making it difficult to train and to perform real-time classification. With the SVR predictor in place, the SVM detector only needs the predicted and observed load values as features, making it practical for large-scale systems.

Figure 1: Structure of the proposed LR attack detection framework.

3.1 The SVR Load Predictor

Given data samples $\bm{x}_{j}\in\mathbb{R}^{p},\ j=1,2,\dots,m$, and target values $\bm{y}\in\mathbb{R}^{m}$, an SVR attempts to find the best parameters $\bm{w}_{r}$ and $b_{r}$ to fit $|y_{j}-\bm{w}_{r}^{T}\phi(\bm{x}_{j})-b_{r}|\leq\varepsilon$ by solving the following optimization problem Smola2004 :

\underset{\bm{w}_{r},b_{r},\zeta_{j},\zeta_{j}^{\prime}}{\text{minimize}} \quad \frac{1}{2}\bm{w}_{r}^{T}\bm{w}_{r}+M\sum_{j=1}^{m}(\zeta_{j}+\zeta_{j}^{\prime}) \quad (11a)
\text{subject to} \quad y_{j}-\bm{w}_{r}^{T}\phi(\bm{x}_{j})-b_{r}\leq\varepsilon+\zeta_{j} \quad (\alpha_{j}) \quad (11b)
\quad\quad \bm{w}_{r}^{T}\phi(\bm{x}_{j})+b_{r}-y_{j}\leq\varepsilon+\zeta_{j}^{\prime} \quad (\alpha_{j}^{\prime}) \quad (11c)
\quad\quad \zeta_{j},\zeta_{j}^{\prime}\geq 0,\ \forall j, \quad (11d)

where $\varepsilon$ is the regression tolerance, $\zeta_{j},\zeta_{j}^{\prime}$ are slack variables to allow for outliers, $M$ is the penalty factor for outliers, $\alpha_{j},\alpha_{j}^{\prime}$ are dual variables, and $\phi(\cdot)$ is a function that implicitly maps the data samples to a higher dimensional space. The dual formulation has a smaller number of variables and allows for applying the kernel trick:

\underset{\bm{\alpha},\bm{\alpha}^{\prime}}{\text{minimize}} \quad \frac{1}{2}(\bm{\alpha}-\bm{\alpha}^{\prime})^{T}\bm{Q}(\bm{\alpha}-\bm{\alpha}^{\prime})+\varepsilon\bm{1}^{T}(\bm{\alpha}+\bm{\alpha}^{\prime})-\bm{y}^{T}(\bm{\alpha}-\bm{\alpha}^{\prime}) \quad (12a)
\text{subject to} \quad \bm{1}^{T}(\bm{\alpha}-\bm{\alpha}^{\prime})=0 \quad (12b)
\quad\quad 0\leq\alpha_{j},\alpha_{j}^{\prime}\leq M,\ \forall j, \quad (12c)

where $\bm{Q}$ is a positive semi-definite matrix, and $Q_{ij}=Q(\bm{x}_{i},\bm{x}_{j})=\phi(\bm{x}_{i})^{T}\phi(\bm{x}_{j})$ is the kernel. Once the optimal solutions $(\bm{\alpha}^{*},\bm{\alpha}^{\prime*})$ are obtained, the regression value $y_{\text{new}}$ of a new data sample $\bm{x}_{\text{new}}$ can be computed as

y_{\text{new}}=\sum_{j=1}^{m}(\alpha_{j}^{*}-\alpha_{j}^{\prime*})Q(\bm{x}_{j},\bm{x}_{\text{new}}). \quad (13)

To accurately predict the load values, many different features can be used, including time, weather, temperature, location, and load type (residential/commercial/industrial). Intuitively, it would be best to use all of these features for prediction, but many of them are unavailable, and some may be redundant. The features used in the SVR load predictor also depend on the available dataset. For example, the time step of the prediction depends on how frequently the historical load data are recorded. For the specific dataset used in this paper, we select time information and historical load values at several time points relative to the target time to capture the temporal correlation, and load values at the same time points for all loads to capture the spatial correlation. Details of the selected features for the SVR load predictor are given in Section 4.1.

3.2 The SVM Attack Detector

Given data samples $\bm{u}_{j}\in\mathbb{R}^{q},\ j=1,2,\dots,n$, and a vector of class labels $\bm{v}\in\{1,-1\}^{n}$, an SVM attempts to find the decision boundary with the maximal margin to best classify $\bm{u}_{j}$ by solving the following optimization problem Cortes1995 :

\underset{\bm{w}_{m},b_{m},\lambda_{j}}{\text{minimize}} \quad \frac{1}{2}\bm{w}_{m}^{T}\bm{w}_{m}+C\sum_{j=1}^{n}\lambda_{j} \quad (14a)
\text{subject to} \quad v_{j}(\bm{w}_{m}^{T}\phi(\bm{u}_{j})+b_{m})\geq 1-\lambda_{j} \quad (\beta_{j}) \quad (14b)
\quad\quad \lambda_{j}\geq 0,\ \forall j. \quad (14c)

Similar to the SVR formulation in (11), $\lambda_{j}$ is a slack variable to allow for outliers, $C$ is its penalty factor, and $\beta_{j}$ is the dual variable. Again applying the kernel trick, the dual formulation is used:

\underset{\bm{\beta}}{\text{minimize}} \quad \frac{1}{2}\bm{\beta}^{T}\bm{Q\beta}-\bm{1}^{T}\bm{\beta} \quad (15a)
\text{subject to} \quad \bm{v}^{T}\bm{\beta}=0 \quad (15b)
\quad\quad 0\leq\beta_{j}\leq C,\ \forall j. \quad (15c)

Note that here $Q_{ij}=v_{i}v_{j}Q(\bm{u}_{i},\bm{u}_{j})=v_{i}v_{j}\phi(\bm{u}_{i})^{T}\phi(\bm{u}_{j})$. Once the optimal solution $\bm{\beta}^{*}$ is acquired, the label $v_{\text{new}}$ for a new input data sample $\bm{u}_{\text{new}}$ can be obtained by

v_{\text{new}}=\text{sgn}\Big(\sum_{j=1}^{n}v_{j}\beta_{j}^{*}Q(\bm{u}_{j},\bm{u}_{\text{new}})\Big), \quad (16)

where $\text{sgn}(\cdot)$ is the sign function. The features in $\bm{u}_{j}$ include the SVR predicted loads, the observed loads, and the same time information used in the SVR.
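As a minimal illustration of the decision rule (16), the sketch below evaluates the kernel expansion for a new sample with an RBF kernel; the support vectors, labels, and dual weights are placeholder values, not quantities learned from the paper's data.

```python
import numpy as np

def rbf_kernel(u1, u2, sigma=1.0):
    """RBF kernel Q(u1, u2) = exp(-sigma * ||u1 - u2||^2)."""
    return np.exp(-sigma * np.sum((u1 - u2) ** 2))

def svm_decision(U_sv, v_sv, beta_sv, u_new, sigma=1.0):
    """Decision rule (16): sign of the kernel expansion over the support vectors."""
    score = sum(v * b * rbf_kernel(u, u_new, sigma)
                for u, v, b in zip(U_sv, v_sv, beta_sv))
    return int(np.sign(score))  # +1: attack flagged, -1: normal

# Placeholder support vectors (2-feature toy example), labels, and dual weights.
U_sv    = np.array([[0.9, 1.1], [1.0, 1.0], [2.0, 0.2]])
v_sv    = np.array([-1, -1, 1])
beta_sv = np.array([0.5, 0.3, 0.8])

print(svm_decision(U_sv, v_sv, beta_sv, np.array([1.9, 0.3])))   # close to the attacked SV: +1
print(svm_decision(U_sv, v_sv, beta_sv, np.array([0.95, 1.05]))) # close to the normal SVs: -1
```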

3.3 Generating Random LR Attacks to Train the SVM

We train the SVM detector using normal data and randomly designed LR attacks. The SVM detector trained using random attacks is expected to maximally explore the space of LR attacks, and hence to perform well in detecting any LR attack. Given true loads $\bm{P}$, the false loads $\bm{P}_{\text{Atk}}$ in these random attacks are obtained using (1), $\bm{P}_{\text{Atk}}=\bm{P}+\Delta\bm{P}$. Thus, finding $\bm{P}_{\text{Atk}}$ is equivalent to finding $\Delta\bm{P}$. In each attack, we assume the attacker changes $K$ loads at random, whose indices form a set $\mathcal{K}$, so that $\Delta P_{\mathcal{K}(k)}$ indicates the load change of the $k^{\text{th}}$ attacked load, $k=1,2,\dots,K$. The load changes of these attacked loads, denoted $\bm{\gamma}$, can be arbitrary. However, according to the LR attack property (2), they must be constrained to sum to zero. Thus, we model $\bm{\gamma}$ with a joint Gaussian distribution with zero mean and covariance matrix $\bm{\Gamma}$:

\bm{\gamma}\sim\mathcal{N}(\bm{0},\bm{\Gamma}), \quad (17)
\gamma_{k}=\Delta P_{\mathcal{K}(k)}. \quad (18)

Given a load shift $\tau$, the diagonal entries of $\bm{\Gamma}$ must satisfy

\Gamma_{kk}=\text{Var}(\gamma_{k})=\left(\frac{1}{2}\tau P_{\mathcal{K}(k)}\right)^{2},\ \forall k, \quad (19)

to ensure that the probability of $|\gamma_{k}|\leq\tau P_{\mathcal{K}(k)}$ is approximately 95%, because the probability of deviating by more than two standard deviations in a Gaussian distribution is about 5%. Recall that the load changes caused by a valid LR attack must satisfy (2), which can be rewritten as

\sum_{i}\Delta P_{i}=\sum_{k}\Delta P_{\mathcal{K}(k)}=\bm{1}^{T}\bm{\gamma}=0. \quad (20)

Eq. (20) is equivalent to

E[(\bm{1}^{T}\bm{\gamma})^{2}]=E[\bm{1}^{T}\bm{\gamma}\bm{\gamma}^{T}\bm{1}]=\bm{1}^{T}\bm{\Gamma}\bm{1}=0. \quad (21)

Finding a valid $\bm{\gamma}$ is therefore equivalent to finding a covariance matrix $\bm{\Gamma}$ that satisfies (19) and (21). Since $\bm{\Gamma}$ is a covariance matrix, it must also be positive semidefinite:

\bm{\Gamma}\succeq 0. \quad (22)

Any $\bm{\Gamma}$ satisfying (19), (21) and (22) suffices for our application. Finding such a $\bm{\Gamma}$ is equivalent to solving a semidefinite program with an arbitrary objective, constrained by (19), (21) and (22). The procedure to acquire the false loads $\bm{P}_{\text{Atk}}$ is summarized in Alg. 1. Varying the attack hour $h$, load shift $\tau$, and number of attacked loads $K$, we can find a feasible $\bm{\Gamma}$, draw $\bm{\gamma}$ using (17), and subsequently create an arbitrary number of false load vectors $\bm{P}_{\text{Atk}}$ using (1). Note that for specific combinations of $h$, $\tau$, $K$, and $\mathcal{K}$, no feasible $\bm{\Gamma}$ may exist, in which case we simply re-run Alg. 1 with different inputs. Since (17) draws $\bm{\gamma}$ randomly from a Gaussian distribution, the resulting real load shift $\tau_{r}$ of $\bm{P}_{\text{Atk}}$ may differ from the input $\tau$. We keep drawing $\bm{\gamma}$ until $\tau_{r}\leq\tau$. The false loads created are then used to generate data samples to train and test the SVM detector.

Algorithm 1 Generating random LR attack false loads

Input: $h$, $K$, $\tau$
Output: $\bm{P}_{\text{Atk}}$

1. Obtain the true loads $\bm{P}$ at hour $h$.

2. Randomly select $K$ loads to attack and let $\mathcal{K}$ denote the set of indices of the attacked loads.

3. Find a $\bm{\Gamma}$ satisfying (19), (21) and (22) for the given $\tau$, $K$, $\mathcal{K}$, and $\bm{P}$. This can be done by solving a semidefinite program with an arbitrary objective, constrained by (19), (21) and (22). If no feasible $\bm{\Gamma}$ can be found, terminate.

4. Draw the non-zero load changes $\bm{\gamma}$ from $\mathcal{N}(\bm{0},\bm{\Gamma})$ and obtain the false loads $\bm{P}_{\text{Atk}}$ using (1).

5. Calculate the real load shift $\tau_{r}$ of $\bm{P}_{\text{Atk}}$ using (3). If $\tau_{r}>\tau$, go to step 4; otherwise, terminate.
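A minimal sketch of Alg. 1 in Python using numpy and cvxpy is given below (the implementation reported in Section 4 uses Matlab with Gurobi). The semidefinite program in step 3 is solved with a zero objective, and the true load vector in the example is placeholder data.

```python
import numpy as np
import cvxpy as cp

def random_lr_attack(P, K, tau, rng=np.random.default_rng(0)):
    """Algorithm 1: generate false loads P_atk for a random LR attack.

    P   : true load vector at hour h
    K   : number of attacked loads
    tau : target load shift as a fraction (e.g., 0.10 for 10%)
    """
    idx = rng.choice(len(P), size=K, replace=False)       # step 2: attacked load set

    # Step 3: find a PSD covariance Gamma satisfying (19), (21), (22).
    Gamma = cp.Variable((K, K), PSD=True)
    constraints = [cp.diag(Gamma) == (0.5 * tau * P[idx]) ** 2,   # (19)
                   cp.sum(Gamma) == 0]                            # (21): 1^T Gamma 1 = 0
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    if prob.status not in ("optimal", "optimal_inaccurate"):
        return None                                        # no feasible Gamma: terminate

    # Steps 4-5: redraw gamma until the realized load shift does not exceed tau.
    while True:
        gamma = rng.multivariate_normal(np.zeros(K), Gamma.value,
                                        check_valid="ignore")     # (17)
        dP = np.zeros(len(P))
        dP[idx] = gamma
        P_atk = P + dP                                             # (1)
        if np.max(np.abs(dP / P)) <= tau:                          # (3): tau_r <= tau
            return P_atk

# Example with placeholder loads (MW):
P_true = np.array([50.0, 80.0, 65.0, 40.0, 90.0, 55.0])
print(random_lr_attack(P_true, K=3, tau=0.10))
```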

4 Numerical Results

We use the publicly available PJM zonal hourly metered load data PJM2019 from 2015 through 2018 for 20 transmission zones as the historical data to train and test our LR attack detection framework. In order to conveniently create intelligently designed LR attacks as described in Section 2.2, we map each zone to a load bus in the IEEE 30-bus system, leveraging the fact that there are 20 loads in this system. The mapping is adopted from Pinceti2018 , and all load values are multiplied by a scaling factor of $1.308\times 10^{-3}$ to obtain a system with a moderate amount of congestion. Table 1 describes the mapping between load indices, PJM zones, and bus indices. The SVR and SVM models are implemented in Python using the Scikit-learn package sklearn . The random, CM, and LO attack creation procedures are implemented in Matlab with the Gurobi solver. All experiments are conducted on a 2.7 GHz CPU with 32 GB RAM.

Table 1: Mapping rules between load indices, PJM zones, and bus indices
Load  Zone   Bus      Load  Zone   Bus
1     DOM    2        11    PL     17
2     AE     3        12    PN     18
3     JC     4        13    PE     19
4     CE     7        14    RECO   20
5     AEP    8        15    ATSI   21
6     DPL    10       16    DUQ    23
7     PS     12       17    BC     24
8     DEOK   14       18    ME     26
9     PEP    15       19    EKPC   29
10    DAY    16       20    AP     30

4.1 The SVR Load Predictor Performance

In this section, we provide details on training and testing the SVR load predictor. As mentioned above, given the hourly load data, our SVR load predictor aims to accurately predict the load values at hour $h+1$ when the current hour is $h$. The features we use include time information and historical load values up to hour $h$. We select month ($mo$), hour ($hr$), and weekday/weekend ($wd$) as the time-information features, $\bm{t}=[mo,wd,hr]$. Note that $hr$ here is the wall clock time, e.g., $hr=14$ for 2 PM, and is different from $h$, which is a unique point in time. We only distinguish between weekdays and weekends since loads tend to be similar across weekdays, i.e., $wd=1$ for weekdays and $wd=2$ for weekends. The temporal correlation of each load is captured by including as features its historical values at hour $h$ and the $s$ previous hours, and at hours $hr$ and $hr+1$ of the $d$ previous days. For load $i$, the load value features $\bm{f}_{i}$ are given by

\bm{f}_{i}=[P_{i}^{h},P_{i}^{h-1},\dots,P_{i}^{h-s},P_{i}^{h-24d},P_{i}^{h-24d+1},\dots,P_{i}^{h-24},P_{i}^{h-23}]. \quad (23)

To capture the spatial correlations, we concatenate the load value features of all the loads.
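The sketch below shows how one such concatenated data sample could be assembled from an hourly load history for the $s=3$, $d=2$ setting of Model 2 introduced below; the load matrix and time stamps are placeholder arrays, not the PJM data.

```python
import numpy as np

def build_sample(loads, timestamps, h, s=3, d=2):
    """Build one SVR data sample x_j = [t, f_1, ..., f_nl] for target hour h+1.

    loads      : array of shape (num_hours, num_loads), hourly load history
    timestamps : list of (month, weekday_flag, hour_of_day) tuples, one per row of loads
    """
    mo, wd, hr = timestamps[h]
    features = [mo, wd, hr]                            # time-information features t
    for i in range(loads.shape[1]):                    # concatenate f_i for every load i
        recent = [loads[h - k, i] for k in range(s + 1)]          # hours h, h-1, ..., h-s
        daily = []
        for day in range(d, 0, -1):                               # hours hr and hr+1 of d previous days
            daily += [loads[h - 24 * day, i], loads[h - 24 * day + 1, i]]
        features += recent + daily
    return np.array(features)

# Placeholder history: 100 hours x 20 loads, plus matching time stamps.
rng = np.random.default_rng(1)
loads = 50 + 10 * rng.random((100, 20))
timestamps = [(1, 1, h % 24) for h in range(100)]

x = build_sample(loads, timestamps, h=60)
print(x.shape)   # (3 + 20 * (s + 1 + 2d),) = (163,)
```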

The multi-output SVR load predictor is obtained by solving one SVR optimization problem (11) for each load. In our experiments, we trained three SVR models to quantify the contribution of capturing spatial correlations and to examine the influence of different feature selections. Model 1 predicts each load using only the time information $\bm{t}$ and its own load value features. A data sample used in Model 1 to predict load $i$ is given by

\bm{x}_{j,i}=[\bm{t},\bm{f}_{i}],\ \forall i. \quad (24)

Models 2 and 3 use $\bm{t}$ and $\bm{f}_{i},\ \forall i$, as features to predict all loads. A data sample in these two models is given by

\bm{x}_{j}=[\bm{t},\bm{f}_{1},\bm{f}_{2},\dots,\bm{f}_{n_{l}}], \quad (25)

where $n_{l}$ is the number of loads in the system. In Model 2, $s=3$ and $d=2$; in Model 3, $s=4$ and $d=3$. The ground truth $y_{j,i}=P_{i}^{h+1}$ is the true load value at hour $h+1$ for load $i$. Table 2 presents some properties of the three tested SVR models. Comparing Models 1 and 2, we can see the influence of considering spatial correlations in addition to temporal correlations, as these two models use the same temporal features, but Model 2 additionally uses the features of all the loads to capture spatial correlations.

Table 2: Statistics of SVR models
Model   s   d   m       p     Training time (h)
1       3   2   35011   11    1.927
2       3   2   35011   163   4.234
3       4   3   34987   223   33.324

The dimensions of the data matrix $\bm{X}$ ($m\times p$) and the target value matrix $\bm{Y}$ ($m\times n_{l}$) depend on the values of $s$ and $d$. The derivations of $m$ and $p$ are described in the Appendix. For each model, the training data matrix $\bm{X}_{\text{train}}$ contains all data from 2015 to 2017, and data from 2018 are used as $\bm{X}_{\text{test}}$. Each column of $\bm{X}_{\text{train}}$ is scaled to zero mean and unit variance, and each column of $\bm{X}_{\text{test}}$ is scaled using the mean and variance of the corresponding column of $\bm{X}_{\text{train}}$. The same split and scaling are performed on $\bm{Y}$ to obtain $\bm{Y}_{\text{train}}$ and $\bm{Y}_{\text{test}}$ as well. The parameters for training the SVR models are chosen as $\varepsilon=10^{-2}$ and $M=100$. The radial basis function (RBF) kernel

Q(\bm{x}_{i},\bm{x}_{j})=\exp\left(-\sigma\|\bm{x}_{i}-\bm{x}_{j}\|^{2}\right) \quad (26)

is used with $\sigma=10^{-2}$. Applying the trained SVR predictor to $\bm{X}_{\text{train}}$ and $\bm{X}_{\text{test}}$ yields the predicted loads $\hat{\bm{Y}}_{\text{train}}$ and $\hat{\bm{Y}}_{\text{test}}$, respectively.
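A sketch of this training step with scikit-learn is shown below, using the parameters reported above ($\varepsilon=10^{-2}$, $M=100$, and the RBF kernel with $\sigma=10^{-2}$, which map to the epsilon, C, and gamma arguments of sklearn.svm.SVR). One SVR is fit per load via MultiOutputRegressor, and the feature/target matrices are placeholder arrays standing in for the scaled $\bm{X}$ and $\bm{Y}$ described in the text.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
from sklearn.preprocessing import StandardScaler

# Placeholder matrices standing in for X_train, Y_train, X_test.
rng = np.random.default_rng(2)
X_train, Y_train = rng.random((500, 163)), rng.random((500, 20))
X_test = rng.random((100, 163))

# Scale each column to zero mean and unit variance; reuse the training statistics on X_test.
x_scaler = StandardScaler().fit(X_train)
y_scaler = StandardScaler().fit(Y_train)
X_train_s, X_test_s = x_scaler.transform(X_train), x_scaler.transform(X_test)
Y_train_s = y_scaler.transform(Y_train)

# One RBF-kernel SVR per load (epsilon = 1e-2, C = M = 100, gamma = sigma = 1e-2).
svr = MultiOutputRegressor(SVR(kernel="rbf", C=100.0, epsilon=1e-2, gamma=1e-2))
svr.fit(X_train_s, Y_train_s)

# Predicted loads, mapped back to MW with the target scaler.
Y_hat_test = y_scaler.inverse_transform(svr.predict(X_test_s))
print(Y_hat_test.shape)   # (100, 20)
```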

Two metrics are used to evaluate the performance of the SVR load predictor, namely root mean square error (RMSE) and mean absolute percentage error (MAPE). RMSE measures the square root of the average squared error for each load, and hence has units of MW. MAPE measures, on average, by what percentage the predicted loads deviate from their true values. These metrics for each load $i$ are calculated as

\text{RMSE}_{\text{train},i}=\sqrt{\frac{1}{m}\sum_{j=1}^{m}(\bm{Y}_{\text{train},i,j}-\hat{\bm{Y}}_{\text{train},i,j})^{2}}, \quad (27)
\text{MAPE}_{\text{train},i}=\frac{1}{m}\sum_{j=1}^{m}\left|\frac{\bm{Y}_{\text{train},i,j}-\hat{\bm{Y}}_{\text{train},i,j}}{\bm{Y}_{\text{train},i,j}}\right|\times 100\%, \quad (28)

where $\bm{Y}_{\text{train},i}$ is the $i^{\text{th}}$ column of $\bm{Y}_{\text{train}}$. The same metrics are applied to $\bm{Y}_{\text{test}}$ to evaluate the performance of the SVR load predictor on the testing data.
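A short numpy sketch of the two metrics, computed column-wise so that each load receives its own RMSE (in MW) and MAPE (in percent); the arrays below are placeholders with the shape of $\bm{Y}_{\text{train}}$ and its prediction.

```python
import numpy as np

def rmse_per_load(Y_true, Y_hat):
    """Eq. (27): root mean square error of each load, in MW."""
    return np.sqrt(np.mean((Y_true - Y_hat) ** 2, axis=0))

def mape_per_load(Y_true, Y_hat):
    """Eq. (28): mean absolute percentage error of each load, in percent."""
    return 100.0 * np.mean(np.abs((Y_true - Y_hat) / Y_true), axis=0)

# Placeholder data: 1000 samples x 20 loads, with roughly 1% prediction error.
rng = np.random.default_rng(3)
Y_true = 100 + 20 * rng.random((1000, 20))
Y_hat = Y_true * (1 + 0.01 * rng.standard_normal((1000, 20)))

print(rmse_per_load(Y_true, Y_hat).round(2))
print(mape_per_load(Y_true, Y_hat).round(2))
```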

Figure 2: Performance of the SVR models under two metrics: (a) RMSE, and (b) MAPE. Model 1 does not capture spatial correlations. Model 2 uses temporal features of 3 previous hours and 2 previous days. Model 3 uses temporal features of 4 previous hours and 3 previous days. Both Models 2 and 3 capture spatial correlation.

Figure 2 illustrates the RMSE and MAPE for the SVR models. RMSE values largely depend on the load values themselves; for example, load 5 has the largest RMSE because it is the largest load in the system. From Figure 2(b) we can see that the MAPE for most loads is around 1%, and the MAPE for load 19, the most difficult load to predict, is around 2%. Comparing these quantities for Models 1 and 2, both are smaller for Model 2. Recall that the difference between Models 1 and 2 is that Model 2 considers all prior loads, while Model 1 only includes the prior data at the load of interest. This result indicates that considering spatial correlations does improve the performance of the SVR load predictor. Comparing Models 2 and 3, it can be concluded that including too much historical data as features decreases the accuracy of the SVR load predictor. Moreover, Table 2 shows that using too many features makes training the SVR model extremely slow. Thus, in the following sections, Model 2 is adopted to generate the predicted loads used by the SVM attack detector.

4.2 The SVM Attack Detector Performance on Random Attacks

The outputs of the SVR load predictor are used as input features of the SVM attack detector. Depending on whether an attack is present, the input data samples of the SVM are given by

\bm{u}_{j}=[mo,wd,hr,\hat{\bm{P}},\bm{P}], \quad \text{if } v_{j}=-1, \quad (29a)
\bm{u}_{j}=[mo,wd,hr,\hat{\bm{P}},\bm{P}_{\text{Atk}}], \quad \text{if } v_{j}=1, \quad (29b)

where $v_{j}=-1$ indicates that there is no attack, and $v_{j}=1$ otherwise. The predicted loads $\hat{\bm{P}}$ of $m=35011$ hours, along with their ground truth values $\bm{P}$ and time information, yield 35011 normal data samples for the SVM detector in the form of (29a). The length of each data sample is $q=3+20\times 2=43$. The normal data matrix $\bm{U}_{\text{normal}}$ is of size $35011\times 43$. We randomly select 80% of these samples for training and the remaining 20% for testing. We create $10^{5}$ attacked data samples in the form of (29b) using Alg. 1, resulting in $\bm{U}_{\text{attack}}$ of size $10^{5}\times 43$ with real load shift $\tau_{r}$ ranging from 1% to 20%. From now on, we omit the subscript in $\tau_{r}$ for easier presentation.

We obtain different SVM models and compare their performance by varying the penalty factor $C$ and $\tau_{\min}$ (the minimal $\tau$ used in the training data). The normal data in the training data matrix $\bm{U}_{\text{train}}$ are the same for all models, i.e., the same 80% of $\bm{U}_{\text{normal}}$. The attacked data in $\bm{U}_{\text{train}}$ include 80% of the attacked data samples with $\tau\geq\tau_{\min}$. The testing data $\bm{U}_{\text{test}}$ consist of the remaining 20% of attacked data, spanning all load shifts, that are not used in training, and are the same for all models. For each model, every column of the training data matrix $\bm{U}_{\text{train}}$ is scaled to zero mean and unit variance, and the same scaling is applied to the testing data. The kernel function used in the SVM detector is also the RBF kernel in the form of (26), but here $\sigma$ is calculated as $\sigma=1/q$ (the "scale" option in Scikit-learn).
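A sketch of the detector training with scikit-learn, mirroring the settings above (RBF kernel with the "scale" gamma option and penalty factor $C$); the normal and attacked feature matrices are placeholders in the shape of (29a) and (29b), not the actual SVR outputs.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Placeholder feature matrices in the form of (29a)/(29b): [mo, wd, hr, P_hat, P_or_P_atk].
rng = np.random.default_rng(4)
U_normal = rng.random((2000, 43))
U_attack = rng.random((1000, 43)) + 0.3          # attacked samples, shifted for illustration
U = np.vstack([U_normal, U_attack])
v = np.concatenate([-np.ones(len(U_normal)), np.ones(len(U_attack))])

U_train, U_test, v_train, v_test = train_test_split(U, v, test_size=0.2, random_state=0)

# Scale using the training statistics only, then fit the RBF-kernel SVM detector.
scaler = StandardScaler().fit(U_train)
detector = SVC(kernel="rbf", C=1000.0, gamma="scale")
detector.fit(scaler.transform(U_train), v_train)

v_pred = detector.predict(scaler.transform(U_test))
print(confusion_matrix(v_test, v_pred))          # rows: true normal/attack, cols: predicted
```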

Figure 3 illustrates the effect of $\tau_{\min}$ on the missed detection rate and the false alarm rate. The false alarm rate is calculated by applying the detector to all $m=35011$ normal data samples, including both training and testing. The parameter $C$ is fixed at 1000. $\tau_{\min}$ controls the amount of attacked training data: for instance, if $\tau_{\min}=3\%$, $\bm{U}_{\text{train}}$ contains 80% of attacks with $\tau\geq 3\%$, but does not contain any attack with $\tau<3\%$. Intuitively, attacks with higher $\tau$ are further from the normal data than those with lower $\tau$. Thus, a detector trained with a low $\tau_{\min}$ will have a high false alarm rate, as the SVM is trying to find a decision boundary between normal data and attacks with small load shift; however, it should perform better in detecting attacks with small $\tau$ than detectors trained with large $\tau_{\min}$. In Figure 3, the blue lines indicate the missed detection rate of attacks with a given load shift $\tau$, and the red line shows the false alarm rate. As $\tau_{\min}$ increases, the false alarm rate decreases, but the missed detection rate increases for attacks with small load shifts. This observation confirms the intuition above, indicating that $\tau_{\min}$ indeed trades off the false alarm rate against the detection probability for small attacks. Note that for attacks with large $\tau$, the effect of $\tau_{\min}$ is insignificant. For testing attacks with extremely small $\tau$, the missed detection rates are very high even with small $\tau_{\min}$, because these attacks are in principle very difficult to detect; however, such attacks are also unlikely to cause severe consequences. From Figure 3, we can see that $\tau_{\min}=3\%$ is a good choice for our dataset.

Figure 3: Effect of the minimum training load shift $\tau_{\min}$. The false alarm rate and the missed detection rate when testing random attacks are each plotted as a function of $\tau_{\min}$. Data is shown for $C=1000$.

The parameter $C$ trades off misclassification of training examples against simplicity of the decision boundary. A small $C$ makes the decision boundary smooth, while a large $C$ aims at classifying all training samples correctly. Therefore, a detector with large $C$ is expected to perform better. However, a large $C$ allows for fewer outliers, making the SVM optimization problem (14) harder to solve, so the training time increases. Figure 4 shows the performance of models trained with different $C$ on testing random attacks while fixing $\tau_{\min}=3\%$. The larger $C$ is, the higher the detection probability we can achieve. This model performs well on attacks with large $\tau$, and the detection probability reaches nearly 100% starting at $\tau=7\%$. System operators can similarly vary $\tau_{\min}$ and $C$ to obtain an SVM model with satisfactory performance in terms of false alarm rate and missed detection rate.

Figure 4: Effect of the outlier penalty factor $C$ on the detection probability for random attacks. Data is shown for $\tau_{\min}=3\%$.

4.3 The SVM Attack Detector Performance on Intelligently Designed LR Attacks

In this section, we evaluate the performance of the trained SVM detector on cost maximization (CM) and line overflow (LO) attacks. Based on the previous section, we choose the SVM parameters $C=2000$ and $\tau_{\min}=3\%$ to balance the false alarm rate and missed detection rate. The procedure to generate these attacks is as follows. On the IEEE 30-bus system, we first perform base-case DCOPF for each hour of 2015 through 2018 using the true loads. At hour $h$, if there are at least 2 lines whose power flows are greater than 80% of their ratings, we call those lines critical lines and $h$ a critical hour. The total number of critical hours is found to be 8038. We focus on critical hours because the false loads are likely to cause congestion at those times, which in turn changes the generation dispatch and leads to consequences. For each critical hour, we solve optimization problem (9) 20 times to obtain the attack vector $\bm{c}$ for CM attacks with $\tau=1\%,2\%,\dots,20\%$. For each critical line, we solve (10) 20 times to obtain $\bm{c}$ for LO attacks, also with $\tau=1\%,2\%,\dots,20\%$. Every non-zero $\bm{c}$ is used to construct a false load vector $\bm{P}_{\text{Atk}}$ as in (8). If a $\bm{P}_{\text{Atk}}$ makes the DCOPF infeasible, it is discarded. The total numbers of false load vectors for CM attacks and LO attacks are 113031 and 343135, respectively.

Figure 5: Detection probability for CM and LO attacks as a function of load shift $\tau$. Subplot (a) is for all attacks, and subplot (b) is only for attacks with consequences. Data is shown for $\tau_{\min}=3\%$ and $C=2000$.

Figure 5(a) illustrates the detection probability versus the load shift $\tau$ for CM and LO attacks. For both attack types, the detection probability is nearly 100% when $\tau\geq 4\%$. For attacks with $\tau=3\%$, the detector performance drops to 97% for LO attacks, but it is still perfect in detecting CM attacks. Compared with the performance on random attacks shown in Figure 4, intelligently designed attacks are easier to detect than random attacks.

Figure 5(b) illustrates the detection probability versus load shift $\tau$ for CM and LO attacks with consequences. CM attacks with consequences are those that increase the operating cost by more than 1%. LO attacks with consequences are those that result in physical overflows. Comparing Figures 5(a) and 5(b), it can be seen that the detector performs even better on attacks with consequences.

4.4 Attack Mitigation

If an LR attack is flagged by our detection framework, the simplest way to mitigate the attack is to temporarily use the loads output by the SVR load predictor for re-dispatch. To test the mitigation performance of this method, we compare the worst consequences of intelligently designed attacks with and without our detection framework.

In order to obtain the consequences, we run DCOPF three times using different loads. Under normal operation, running DCOPF with the true loads $\bm{P}_{\text{normal}}$ yields the attack-free generation dispatch $\bm{G}_{\text{normal}}$. Using attacked loads $\bm{P}_{\text{Atk}}$ to run DCOPF gives the attacked dispatch $\bm{G}_{\text{Atk}}$. Applying $\bm{G}_{\text{Atk}}$ to the true loads $\bm{P}_{\text{normal}}$ yields the attacked line flows $\bm{P}_{\bm{L},\text{Atk}}=\bm{R}(\bm{G}_{\text{Atk}}-\bm{P}_{\text{normal}})$. When an attack is detected, the system runs DCOPF using the SVR predicted loads $\bm{P}_{\text{SVR}}$, and the resulting dispatch is $\bm{G}_{\text{SVR}}$. The corresponding line flows are given by $\bm{P}_{\bm{L},\text{SVR}}=\bm{R}(\bm{G}_{\text{SVR}}-\bm{P}_{\text{normal}})$.
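Given the three dispatches, the consequence metrics reduce to a few vector operations, as in the sketch below; the cost vector, PTDF row, loads, and dispatches are placeholder values for illustration only.

```python
import numpy as np

# Placeholder quantities (illustrative only): cost vector, PTDF row of the target line,
# true loads, and the three dispatches obtained from the three DCOPF runs.
a        = np.array([10.0, 20.0, 30.0])
R_l      = np.array([0.4, -0.2, 0.0])            # PTDF row of target line l (per bus)
P_normal = np.array([50.0, 80.0, 70.0])
G_normal = np.array([120.0, 80.0, 0.0])          # DCOPF with true loads
G_atk    = np.array([60.0, 120.0, 20.0])         # DCOPF with false loads
G_svr    = np.array([118.0, 80.0, 2.0])          # DCOPF with SVR predicted loads

# Economic consequence (CM attacks): operating cost increase over the attack-free dispatch.
cost_increase_atk = a @ (G_atk - G_normal)
cost_increase_svr = a @ (G_svr - G_normal)

# Physical consequence (LO attacks): flow on the target line under each dispatch.
flow_atk = R_l @ (G_atk - P_normal)
flow_svr = R_l @ (G_svr - P_normal)

print(f"cost increase without detection: {cost_increase_atk:.1f} $/h")
print(f"cost increase with SVR re-dispatch: {cost_increase_svr:.1f} $/h")
print(f"target-line flow: attacked {flow_atk:.1f} MW vs mitigated {flow_svr:.1f} MW")
```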

Figure 6(a) illustrates the mitigation results for CM attacks. The word "maximum" on the y-axis indicates the worst consequence among all attacks with each load shift $\tau$. The red line indicates the maximum cost increase without our proposed detection framework, calculated as $\bm{a}^{T}(\bm{G}_{\text{Atk}}-\bm{G}_{\text{normal}})$ (recall that $\bm{a}$ is the generation cost vector). When an attack is detected, the resulting cost increase is $\bm{a}^{T}(\bm{G}_{\text{SVR}}-\bm{G}_{\text{normal}})$. When the detector fails to detect an attack, the cost increase is the attack consequence $\bm{a}^{T}(\bm{G}_{\text{Atk}}-\bm{G}_{\text{normal}})$. Thus, for each load shift, if all attacks are detected, the data point on the blue line is given by $\bm{a}^{T}(\bm{G}_{\text{SVR}}-\bm{G}_{\text{normal}})$; otherwise, it is $\max[\bm{a}^{T}(\bm{G}_{\text{Atk}}-\bm{G}_{\text{normal}}),\bm{a}^{T}(\bm{G}_{\text{SVR}}-\bm{G}_{\text{normal}})]$. A similar procedure is used to create Figure 6(b) for LO attacks. The red line is obtained by taking the maximum $\bm{P}_{\bm{L},\text{Atk}}^{l}$ for each load shift (line $l$ is the target line). The blue line is given by $\bm{P}_{\bm{L},\text{SVR}}^{l}$ if all attacks are detected, and $\max[\bm{P}_{\bm{L},\text{Atk}}^{l},\bm{P}_{\bm{L},\text{SVR}}^{l}]$ otherwise.

From Figure 6(a), we can see that for load shifts $\tau\geq 3\%$, the increases in operation cost are significantly reduced by using the SVR predicted loads when an attack is flagged. For LO attacks, the overflows are significantly reduced for load shifts $\tau\geq 4\%$. The largest cost increase caused by an undetected CM attack is 8.17% (at $\tau=2\%$), and the largest overflow caused by an undetected LO attack is 3.96% (at $\tau=3\%$). Thus, even though our detector fails to detect a small number of attacks, their consequences are minor. Note that at $\tau=1\%$, using the SVR predicted loads leads to a larger overflow due to inaccurate predictions, but the overflow is still very small. Therefore, the consequences of LR attacks can be successfully mitigated using the SVR predicted loads, which gives operators time to take other corrective actions.

Figure 6: Mitigation results for (a) CM attacks and (b) LO attacks. For each load shift, the points on the red lines indicate the worst consequence of the attack, and the points on the blue lines indicate the worst consequence with our attack detection framework. Points on the blue line are obtained by taking the maximum of two quantities: (i) the worst consequence when re-dispatching with the SVR predicted loads after an attack is flagged; and (ii) the worst consequence of attacks that the detector fails to detect.

5 Concluding Remarks

A machine learning-based load redistribution (LR) attack detection framework is proposed. This detection framework consists of a support vector regression (SVR)-based load predictor and a support vector machine (SVM)-based attack detector. The SVR load predictor is trained using features selected from historical load data to capture both spatial and temporal correlations. The prediction results indicate that the SVR load predictor can accurately predict the loads at all buses. The SVM attack detector is trained using randomly generated LR attacks, and is shown to be effective in detecting both randomly generated and intelligently designed attacks, especially those with consequences. Using the proposed attack detection framework, system operators can make control decisions based on the predicted loads when an attack is flagged to mitigate the consequences of the attack. This also gives operators time to find the source of the attack. Future work will include attack localization techniques that help system operators identify the loads and/or meters modified by the attacker.

Acknowledgment

This material is based on work supported by the National Science Foundation (NSF) under grant number CNS-1449080, and two grants from the Power System Engineering Research Center (PSERC) S-72 and S-74.

References

  • [1] Liu, Y., Ning, P., Reiter, M.K.: 'False data injection attacks against state estimation in electric power grids'. In: 16th ACM Conference on Computer and Communications Security (CCS '09), Chicago, Illinois, USA, 2009, pp. 21–32
  • [2] Zhang, J., Sankar, L.: 'Physical system consequences of unobservable state-and-topology cyber-physical attacks', IEEE Transactions on Smart Grid, 2016, 7, (4), pp. 2016–2025
  • [3] Liang, J., Sankar, L., Kosut, O.: 'Vulnerability analysis and consequences of false data injection attack on power system state estimation', IEEE Transactions on Power Systems, 2016, 31, (5), pp. 3864–3872
  • [4] Moslemi, R., Mesbahi, A., Velni, J.M.: 'Design of robust profitable false data injection attacks in multi-settlement electricity markets', IET Generation, Transmission & Distribution, 2018, 12, (6), pp. 1263–1270
  • [5] Jia, L., Kim, J., Thomas, R.J., Tong, L.: 'Impact of data quality on real-time locational marginal price', IEEE Transactions on Power Systems, 2014, 29, (2), pp. 627–636
  • [6] An, Y., Liu, D.: 'Multivariate Gaussian-based false data detection against cyber-attacks', IEEE Access, 2019, 7, pp. 119804–119812
  • [7] Liu, C., Wu, J., Long, C., Kundur, D.: 'Reactance perturbation for detecting and identifying FDI attacks in power system state estimation', IEEE Journal of Selected Topics in Signal Processing, 2018, 12, (4), pp. 763–776
  • [8] Che, L., Liu, X., Li, Z.: 'Mitigating false data attacks induced overloads using a corrective dispatch scheme', IEEE Transactions on Smart Grid, 2019, 10, (3), pp. 3081–3091
  • [9] Li, X., Hedman, K.W.: 'Enhancing power system cyber-security with systematic two-stage detection strategy', IEEE Transactions on Power Systems, 2019, pp. 1–1
  • [10] Liu, C., Zhou, M., Wu, J., Long, C., Kundur, D.: 'Financially motivated FDI on SCED in real-time electricity markets: Attacks and mitigation', IEEE Transactions on Smart Grid, 2019, 10, (2), pp. 1949–1959
  • [11] Ozay, M., Esnaola, I., Yarman Vural, F.T., Kulkarni, S.R., Poor, H.V.: 'Machine learning methods for attack detection in the smart grid', IEEE Transactions on Neural Networks and Learning Systems, 2016, 27, (8), pp. 1773–1786
  • [12] An, D., Yang, Q., Liu, W., Zhang, Y.: 'Defending against data integrity attacks in smart grid: A deep reinforcement learning-based approach', IEEE Access, 2019, 7, pp. 110835–110845
  • [13] Pinceti, A., Sankar, L., Kosut, O.: 'Load redistribution attack detection using machine learning: A data-driven approach'. In: 2018 IEEE Power & Energy Society General Meeting (PESGM), 2018, pp. 1–5
  • [14] Smola, A.J., Schölkopf, B.: 'A tutorial on support vector regression', Statistics and Computing, 2004
  • [15] Cortes, C., Vapnik, V.: 'Support-vector networks', Machine Learning, 1995, 20, (3), pp. 273–297
  • [16] Yuanhang, D., Lei, C., Weiling, Z., Yong, M.: 'Multi-support vector machine power system transient stability assessment based on relief algorithm'. In: 2015 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), 2015, pp. 1–5
  • [17] Eskandarpour, R., Khodaei, A.: 'Component outage estimation based on support vector machine'. In: 2017 IEEE Power & Energy Society General Meeting, 2017, pp. 1–5
  • [18] Kirincic, V., Ceperic, E., Vlahinic, S., Lerga, J.: 'Support vector machine state estimation', Applied Artificial Intelligence, 2019, 33, (6), pp. 517–530
  • [19] Qiang, S., Pu, Y.: 'Short-term power load forecasting based on support vector machine and particle swarm optimization', Journal of Algorithms & Computational Technology, 2019, 13
  • [20] Capuno, M., Kim, J.S., Song, H.: 'Very short-term load forecasting using hybrid algebraic prediction and support vector regression', Mathematical Problems in Engineering, 2017
  • [21] Su, F., Xu, Y., Tang, X.: 'Short- and mid-term load forecasting using machine learning models'. In: 2017 China International Electrical and Energy Conference (CIEEC), 2017, pp. 406–411
  • [22] Azad, M.K., Uddin, S., Takruri, M.: 'Support vector regression based electricity peak load forecasting'. In: 2018 11th International Symposium on Mechatronics and its Applications (ISMA), 2018, pp. 1–5
  • [23] Chong, L.W., Rengasamy, D., Wong, Y.W., Rajkumar, R.K.: 'Load prediction using support vector regression'. In: TENCON 2017 - 2017 IEEE Region 10 Conference, 2017, pp. 1069–1074
  • [24] PJM: 'PJM metered hourly zonal load data', 2019. PJM Data Miner 2, https://dataminer2.pjm.com/feed/hrl_load_metered/definition
  • [25] Yuan, Y., Li, Z., Ren, K.: 'Modeling load redistribution attacks in power systems', IEEE Transactions on Smart Grid, 2011, 2, (2), pp. 382–390
  • [26] Boyd, S., Vandenberghe, L.: 'Convex Optimization'. (Cambridge University Press, 2004)
  • [27] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al.: 'Scikit-learn: Machine learning in Python', Journal of Machine Learning Research, 2011, 12, pp. 2825–2830

Appendix

The parameters $s$ and $d$ in (23) determine the dimension of the SVR input data matrix $\bm{X}$ ($m\times p$). For example, for Model 2 with $s=3$ and $d=2$, the length of $\bm{f}_{i}$ is given by

n_{f}=s+1+2d=8. \quad (30)

The resulting data sample length is $p=3+20\times n_{f}=163$. Since we use load values of the previous $d=2$ days as features, the start hour of our data is 01/03/2015, 0 AM. The end hour is 12/31/2018, 10 PM, because for 12/31/2018, 11 PM we do not have the ground truth value of its next hour. In each of the four years, the hour at which daylight saving time ends has two load values with identical time stamps, and we approximate the load value at this hour by taking the average of those two values. As a result, the number of data samples for the SVR load predictor is

m=(365\times 3+366-d)\times 24-1-4=35011. \quad (31)
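A quick check of the counts in (30) and (31) for Model 2 ($s=3$, $d=2$):

```python
s, d, n_loads = 3, 2, 20
n_f = s + 1 + 2 * d                    # eq. (30): 8 load-value features per load
p = 3 + n_loads * n_f                  # 3 time features + 20 loads x 8 = 163
m = (365 * 3 + 366 - d) * 24 - 1 - 4   # eq. (31): usable hourly samples, 2015-2018
print(n_f, p, m)                       # 8 163 35011
```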

The target values for hour $h$ are the metered loads of the 20 zones at hour $h+1$. Thus, for each data sample of length $p=163$, the SVR outputs a vector of length 20 as the prediction. We use the first 26253 data samples from 2015 through 2017 to train the SVR load predictor and the remaining 8758 data samples from 2018 to test its performance. The resulting training data matrix $\bm{X}_{\text{train}}$ is of size $26253\times 163$, the training target value matrix $\bm{Y}_{\text{train}}$ is of size $26253\times 20$, the testing data matrix $\bm{X}_{\text{test}}$ is of size $8758\times 163$, and the testing target value matrix $\bm{Y}_{\text{test}}$ is of size $8758\times 20$. The dimensions of these matrices for the other models can be determined similarly.