
Physics-Informed Graphical Neural Network
for Parameter & State Estimations in Power Systems

Laurent Pagnier and Michael Chertkov, Program in Applied Mathematics, University of Arizona, Tucson, USA.
Abstract

Parameter Estimation (PE) and State Estimation (SE) are among the most widespread tasks in system engineering. They need to be performed automatically, fast and frequently, as measurements arrive. Deep Learning (DL) holds the promise of tackling this challenge; however, as far as PE and SE in power systems are concerned, (a) DL has not yet won the trust of system operators because it lacks interpretations grounded in the physics of electricity, and (b) DL has remained elusive in operational regimes where data is scarce. To address this, we present a hybrid scheme which embeds physics modeling of power systems into Graphical Neural Networks (GNN), thereby empowering system operators with reliable and explainable real-time predictions which can then be used to control the critical infrastructure. To enable progress towards trustworthy DL for PE and SE, we build a physics-informed method, named Power-GNN, which reconstructs physical, thus interpretable, parameters within Effective Power Flow (EPF) models, such as admittances of effective power lines, together with NN parameters representing implicitly unobserved elements of the system. In our experiments, we test Power-GNN on different realistic power networks, including ones with thousands of loads and hundreds of generators. We show that Power-GNN outperforms a vanilla NN scheme unaware of the EPF physics.

I Introduction

Power System (PS) modeling is the primary tool for the system operator to maintain awareness of the current state of the system. It is also an enabler for critical decision making, especially with regard to overriding automatic systems at times when the system is stressed. Power engineers rely on physical modeling of the system expressed through a balanced set of the so-called Power Flow (PF) equations over the graph of the network, originating from Ohm's laws governing the physics of electricity flows and also including physical modeling of devices, such as generators, transformers and loads, and their control systems [26, 46]. Parameters in the PF equations were estimated and updated systematically. Uncertainty and delays of these updates were accounted for in State Estimation (SE) and Parameter Estimation (PE) [47, 54, 21, 57], but generally did not cause concerns in the past – largely because the PSs were overbuilt, guaranteeing a safe ride through almost any possible state and thus requiring infrequent corrections. However, this (last century) status quo has changed dramatically during the last two decades with the introduction of renewable resources pushing PSs to their limits. The emergence of demand response technology, allowing many small users to become active, has also contributed to the increase of uncertainty and fluctuations. It was recognized that to keep the system running one needs to improve observability – in response, Phasor Measurement Unit (PMU) technology was developed [40] and installed [50] massively in the majority of modern power systems during the last two decades. The PMUs record instantaneous and synchronized physical measurements (of voltages and power flows) at the points of installation. The technology is powerful, but also expensive, e.g. due to the massive amount of data which needs to be taken, communicated, processed in real time, stored and accessed.
As a result, only major nodes of the power system networks are actually monitored in real time. Given uncertainty in the system parameters, fluctuations in generation and consumption, limited observability and massive data flows, the primary challenge of the power system operation room remains – to generate a projected SE of the PS for the period of interest, say the next 15 min, fast and reliably. This task also comes hand in hand with the secondary challenge – to generate reliable PE within physical models of the power systems, actual – resolving all nodes of the system (up to hundreds of thousands in the Eastern Interconnect of the US), or reduced – effective physical models stated in terms of effective power lines connecting only the major nodes of the PS where PMU measurements are available.

Driven by the tremendous progress in the Data Science disciplines, power system engineers are now looking very actively into new Machine Learning (ML) approaches for SE and PE. Such approaches are largely split into three categories – PS-agnostic, PS-informed and hybrid – with the previous work on the subject reviewed in Section II.

The rest of this manuscript, devoted to further development of the hybrid approach, is organized as follows. Section III is devoted to the technical introduction to SE, PE, Power Flows and PS model reductions, as well as to expressing the problems in ML terms. We then discuss details of the PS data generation and introduce our new Power-Graphical Neural Network (Power-GNN) in Section IV. In this Section of the manuscript, we also present details of our experiments with Power-GNN and a vanilla NN used as a physics-blind benchmark. A special emphasis is given to the reconstruction in the demanding regime of partial observability. Section V is reserved for conclusions and discussions of the path forward.

II Related works

II.1 Physics-Informed Modeling and Machine Learning

The term “Physics Informed Machine Learning” (PIML) was coined in the series of Los Alamos “Physics Informed Machine Learning” PIML workshops which took place in 2016, 2018 and 2020 [1]. The term refers to two complementary aspects – (a) physics providing an intuitive and often constructive input into the design, analysis and implementation of various ML schemes, and (b) ML helping to solve a physical problem. This manuscript is concerned with the latter aspect of PIML. The approach often translates into embedding equations, describing the relevant physics of the application of interest, into a ML scheme. These ideas originate from early work, e.g. of [27], on tuning a NN to satisfy the output of an equation (algebraic or differential). Noticeable applications of the methodology came in the context of dynamical equations (ODEs and PDEs), including the so-called Sparse Identification of Nonlinear Dynamics (SINDy) [5], the Physics Informed Neural Network (PINN) [44], which was also applied to PS dynamics in [34], and Neural ODE [8] frameworks. See also the most recent review [53], and references therein, e.g. emphasizing different objectives in combining elements of PIML with state-of-the-art ML models to leverage their complementary strengths. Physics-informed NN modeling applied to learning SE under limited observability was also discussed in [35].

II.2 Graphical Neural Networks and Applications

A Graph (Convolutional) Neural Network (GNN) is a NN built by making relations between variables in the (hidden) layers based on a known graph associated with the application, e.g. a PS. In this regard, a GNN is informed, at least in part, about the physical laws and controls associated with power system operations. One of the first GNN references [25] focuses on citation networks; the approach then spread into many applications [58], including geo-location [42], infrastructure modeling [11], systems of interacting particles [45] and PSs (mentioned in the next Subsection).

II.3 State and Parameter Estimation in Power Systems

PS-agnostic techniques have focused on making SE based on historical and (limited) measurement data utilizing various modern tools of ML, such as Feed-Forward Neural Networks (FFNNs) [48, 33, 55, 56], autoencoders [3], and most recently Convolutional Neural Networks (CNN) [28] and Graph Neural Networks (GNN) [4, 17, 36, 30, 24, 29, 7] (some of these in the context of the related problem of fault detection). It is also worth mentioning the recent NeurIPS 2020 Learning To Run Power System (L2RPS) competition [2], won by teams utilizing PS-agnostic approaches. Physics-informed approaches to SE, and the related problem of PE in static and dynamic models of PS, actual or reduced, state the problem as a regression based on the PF equations [32, 23, 10, 59, 22, 60, 9, 6, 52, 31, 35], and also extend into the related areas of probing the proximity to instability and helping in the design of the corresponding emergency control actions [20, 51], optimization and resource allocation [41, 13]. Some of these approaches also dealt with partial observability [12, 14, 38, 15, 31], which is especially well pronounced on the lower voltage (distribution) side of the PS. The hybrid approach, attempting to mix the best of the two aforementioned approaches (and inspired by universal methodologies [27, 5, 43] suggesting to blend equations with modern ML), was also explored in [34] to estimate the PS dynamics. The ability to predict SE and learn physical models, i.e. to make PE, based primarily on measurements represents an attractive feature of the framework we advance in this manuscript. In addition to the applications mentioned above, validated SE and PE are significant for many other tasks of importance for system stability, protection and energy management, reviewed in many classical books on the subject, e.g. [26, 46].

II.4 Reduced Order Models in Power Systems

The importance of developing reduced order static and dynamic models, providing a computationally light equivalent representation of an actual PS, has long been recognized, see e.g. the review [19] and extensive historical references therein. We highlight in this brief Subsection only a small subset of the vast literature on the subject, directly related to the so-called Kron reduction methodology [16, 18] most relevant to this manuscript.

III Problem Formulation: State Estimation and Machine Learning

In this Section we set the stage for what follows (experiments with ML schemes) and introduce a few notions and notations from PS operations, PS analysis and PS state and parameter estimation in Sections III.1, III.2 and III.3, respectively. We discuss our model reduction approach in Section III.4, resulting in the Physics-Informed Machine Learning formulations described in Section III.5.

III.1 Power System Operations

A power system should be balanced on the time scale of seconds. This means that electricity generation ought to match consumption (i.e. the combined value of electric loads and electric/heating losses) at any time. Electricity demand varies over a range of scales – hourly, daily, weekly or seasonal variations are of concern in different settings. For example, in an area with a continental climate, winter demand is generally higher than summer demand. These variations may be linked to local specifics, e.g. dependence on electric heating, as in France. In warmer areas, such as the Southern regions of the US, the summer demand exceeds the winter demand, mainly due to air-conditioning. Shorter time scale fluctuations of loads, observed on the scale from seconds to minutes, are mainly due to the activities of many small appliances. Note also that renewables, such as wind and solar, even though they contribute generation, put additional pressure on the system by injecting uncertainty, which needs to be balanced by other generators or batteries. The operational redistribution of generation between flexible generators is implemented via energy market mechanisms. The basic optimization behind this task, the so-called Optimal Power Flow (OPF), is resolved frequently, in some energy markets as often as once every 3-5 minutes. OPF is a constrained optimization, where in addition to generation constraints (limits on the generation possible at any generator) and thermal constraints (limiting the power or current flowing through each line of the power system) we also add the so-called Power Flow (PF) equations, expressing Ohm's laws and relating injections/consumptions of power at all the nodes of the system to the phases and voltages at the nodes and to the power flows over the power lines. PF equations are defined and discussed in more detail in the following Subsection.

III.2 Power Flow Equations

Consider a Power System (PS) which operates over a grid-graph, ${\cal G}=({\cal V},{\cal E})$, where ${\cal V}$ and ${\cal E}$ are the set of nodes (generators or loads) and the set of lines, respectively. PF equations, governing the steady redistribution of power over the system, are stated in terms of the complex-valued (also called complex) powers, $\forall i\in{\cal V}:\ S_i\equiv p_i+{\bm i}q_i$, and in terms of the complex-valued (electric) potentials, $\forall i\in{\cal V}:\ V_i\equiv v_i\exp({\bm i}\theta_i)$, where $v_i$ and $\theta_i$ denote the voltage (magnitude) and the phase of the potential at node $i$, and ${\bm i}^2=-1$. In these notations, the PF equations become, see e.g. [26, 46], $\forall i\in{\cal V}$ (we simplify a bit and do not include in the PF equations shunt capacitors, associated with nodes of the system and representing effects of the self-inductance and self-conductance of the PS nodes/buses):

$p_i = \sum_{j;\,\{i,j\}\in{\cal E}} v_i v_j \big[ g_{ij}\cos(\theta_i-\theta_j) + b_{ij}\sin(\theta_i-\theta_j) \big]$,  (1)
$q_i = \sum_{j;\,\{i,j\}\in{\cal E}} v_i v_j \big[ g_{ij}\sin(\theta_i-\theta_j) - b_{ij}\cos(\theta_i-\theta_j) \big]$,  (2)

where $b_{ij}$ and $g_{ij}$ are the susceptance and conductance of the power lines, $\{i,j\}\in{\cal E}$. PF Eqs. (1,2) can be pictured as implicitly defining the PF map, ${\bm\Pi}:{\bm S}\equiv(S_i|i\in{\cal V})\mapsto{\bm V}\equiv(V_i|i\in{\cal V})$ (the map is implicit because it requires solving the PF equations, which may have no solution or many solutions). It can also be stated as an (explicit) inverse map, ${\bm\Pi}^{-1}:{\bm V}\mapsto{\bm S}$. To emphasize the parametric dependence of the PF map on the graph, ${\cal G}$, and on the matrix of power line admittances defined over the graph, ${\bm Y}\equiv{\bm g}+{\bm i}{\bm b}=(g_{ij}+{\bm i}b_{ij}|\{i,j\}\in{\cal E})$, we use the notation ${\bm S}={\bm\Pi}^{-1}_{\bm Y}({\bm V})$.
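For concreteness, the explicit inverse map of Eqs. (1,2) can be evaluated directly. Below is a minimal NumPy sketch (the function name `pf_injections` and the dense-matrix representation of $g$ and $b$ are our illustrative choices, not the paper's implementation); `g` and `b` are zero on the diagonal and off the graph edges, consistent with the shunt-free form of Eqs. (1,2).

```python
import numpy as np

def pf_injections(v, theta, g, b):
    """Evaluate the inverse PF map of Eqs. (1)-(2): given voltage
    magnitudes v, phases theta, and symmetric line conductance/susceptance
    matrices g, b (zero on the diagonal and off the graph edges), return
    active and reactive injections p, q at every node."""
    dtheta = theta[:, None] - theta[None, :]   # matrix of theta_i - theta_j
    vv = v[:, None] * v[None, :]               # matrix of v_i * v_j
    p = np.sum(vv * (g * np.cos(dtheta) + b * np.sin(dtheta)), axis=1)
    q = np.sum(vv * (g * np.sin(dtheta) - b * np.cos(dtheta)), axis=1)
    return p, q

# Two buses joined by one line with g_12 = 0.5, b_12 = -2: at a flat
# profile (v = 1, theta = 0), Eqs. (1)-(2) give p_i = g_12, q_i = -b_12.
g = np.array([[0.0, 0.5], [0.5, 0.0]])
b = np.array([[0.0, -2.0], [-2.0, 0.0]])
p, q = pf_injections(np.ones(2), np.zeros(2), g, b)
assert np.allclose(p, [0.5, 0.5]) and np.allclose(q, [2.0, 2.0])
```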

III.3 State and Parameter Estimations

SE – the main task at the center of this manuscript – consists in predicting, given available observations, other relevant physical characteristics of the system. For example, assuming that the PS network, i.e. ${\cal G}$ and ${\bm Y}$, is known, and that voltages and phases are measured by PMUs at all nodes of the system, the SE task may be to estimate the injected/consumed active and reactive powers. This task can obviously be resolved by an explicit (that is, simple) application of the inverse PF map, ${\bm\Pi}^{-1}$, set by the PF equations.

In the case of limited observability, i.e. when some nodes are unobserved, the task of SE can be viewed in two complementary ways. We can pose the question of reconstructing the missing power injections/consumptions only at the nodes where voltages and phases are actually measured. Alternatively, we can pose a more challenging version of SE under limited observability — in addition to completing measurements at the observed nodes, to also reconstruct injected/consumed powers, voltages and phases at all nodes of the system. In this manuscript we focus on the former, less complicated, but still sufficiently challenging version of SE.

The problem of PE consists in reconstructing from PMU measurements the structure of the PS graph and the line characteristics, i.e. ${\bm Y}$. This question is relevant because the parameters of power lines are known to change due to changing operational conditions, added or removed vegetation, and aging. These changes are generally infrequent but still significant, e.g. for making accurate state estimations under complete or partial observability.

III.4 Reduced Modeling

Motivated by the case of partial observability, we are interested in posing the question of finding equivalent models of power systems providing a description similar to the one given by the PF Eqs. (1,2), however stated over a reduced set of nodes. Given that the PF Eqs. (1,2) are nonlinear, the quest of finding an equivalent (reduced) model is nontrivial. One, admittedly phenomenological (empirical), approach consists in drawing useful conclusions from the current-voltage version of the problem (rather than the power-voltage version used in the PF setting), where the model reduction can be evaluated explicitly.

Specifically, let us discuss the so-called Kron reduction – see [18] and references therein. We divide the nodes of the system into two non-overlapping subsets of observed (labeled $o$) and unobserved (labeled $u$) nodes. Then the injected/consumed currents, $I_i=P_i/V_i$ (not powers!), and the potentials of the PS are related to each other according to Ohm's law, describing linear relations between currents and potentials (see Supplementary File for details). Linearity of the relation between the injected/consumed currents and the potentials allows us to get a closed relation between the currents and voltages associated solely with the observed nodes:

${\bm I}^{(o)}={\bm Y}^{(r)}{\bm V}^{(o)}.$  (3)

It is therefore suggestive to use the graph structure, ${\cal G}^{(r)}\equiv({\cal V}^{(o)},{\cal E}^{(r)})$, associated with the reduced admittance matrix ${\bm Y}^{(r)}$, where ${\cal E}^{(r)}\doteq\{\{i,j\}\,|\,Y^{(r)}_{ij}\neq 0\}$, for building an equivalent PF model – akin to Eqs. (1,2), $\forall i\in{\cal V}^{(o)}$, i.e.

${\bm S}^{(o)}={\bm\Pi}^{-1}_{{\bm Y}^{(r)}}({\bm V}^{(o)}),$  (4)

where ${\bm Y}^{(r)}$ denotes the reduced (we can also call it “equivalent”) admittance matrix associated with the effective (not necessarily real) power lines, $\{i,j\}\in{\cal E}^{(r)}$. ${\bm Y}^{(r)}$ is to be guessed, or, as we do in the following, to be learned.
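The Kron reduction behind Eq. (3) is the Schur complement of the nodal admittance matrix with respect to the unobserved block. A minimal sketch of this step (our own, assuming zero current injections at the eliminated nodes):

```python
import numpy as np

def kron_reduce(Y, obs):
    """Kron reduction: eliminate the unobserved nodes from the nodal
    admittance matrix Y via the Schur complement of the unobserved block,
    returning Y^(r) such that I^(o) = Y^(r) V^(o), as in Eq. (3).
    Assumes zero current injections at the eliminated nodes."""
    n = Y.shape[0]
    obs = np.asarray(obs)
    uno = np.setdiff1d(np.arange(n), obs)      # indices of unobserved nodes
    Yoo = Y[np.ix_(obs, obs)]
    You = Y[np.ix_(obs, uno)]
    Yuo = Y[np.ix_(uno, obs)]
    Yuu = Y[np.ix_(uno, uno)]
    return Yoo - You @ np.linalg.solve(Yuu, Yuo)

# Three buses in a line, unit admittance per line; eliminating the middle
# bus yields an effective line of admittance 1/2 (two unit lines in series).
Y = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
Yr = kron_reduce(Y, obs=[0, 2])
assert np.allclose(Yr, [[0.5, -0.5], [-0.5, 0.5]])
```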

III.5 Physics Informed ML

Eq. (4) sets up the backbone of the PIML scheme we develop for the case of partial observability. Specifically, we aim to solve the following joint SE and PE learning problem, which we call Power-GNN:

$\min\limits_{{\bm\varphi},{\bm Y}^{(r)}} L_{\text{Power-GNN}}\big({\bm\varphi},{\bm Y}^{(r)}\big),$  (5)
$L_{\text{Power-GNN}}\big({\bm\varphi},{\bm Y}^{(r)}\big) \equiv \frac{1}{N|{\cal V}^{(o)}|}\sum\limits_{n=1}^{N}\Big\|{\bm S}^{(o)}_{n}-{\bm\Pi}^{-1}_{{\bm Y}^{(r)}}\big({\bm V}^{(o)}_{n}\big)-\Sigma_{\bm\varphi}\big({\bm V}^{(o)}_{n},{\bm S}^{(o)}_{n}\big)\Big\|^{2}+{\cal R}({\bm\varphi}),$  (6)

where the $l_2$-norm, $\|\cdots\|^2$, assumes summation over the observed nodes of the PS; $|{\cal V}^{(o)}|$ is the number of observed nodes; ${\bm S}^{(o)}_n$ and ${\bm V}^{(o)}_n$, with $n=1,\cdots,N$, denote the $N$ samples of the vectors of complex power and electric potential, respectively, measured at the observed nodes, $i\in{\cal V}^{(o)}$. In Eq. (6), $\Sigma_{\bm\varphi}\big({\bm V}^{(o)}_n,{\bm S}^{(o)}_n\big)$ is introduced to represent the effects of the part of the system which is not observed; ${\bm\varphi}$ stands for the vector of parameters representing the hidden part, and ${\cal R}({\bm\varphi})=\alpha\|{\bm\varphi}\|^2$ is the regularization term, chosen to be $l_2$.

In the following, and specifically in Section IV.2, we introduce the Power-GNN scheme implementing the optimization (5). The scheme trains simultaneously the physics-loaded parameters (elements of the admittance matrix, ${\bm Y}^{(r)}$) and the physics-blind parameters representing the $\Sigma_{\bm\varphi}$ function, via a Graphical Neural Network (GNN) defined over ${\cal G}^{(r)}$, constructed according to the Kron reduction procedure introduced above in Section III.4.
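As an illustration, the loss of Eq. (6) can be sketched as below. The NN correction $\Sigma_{\bm\varphi}$ is passed in as a precomputed array, and the effective-PF prediction is written in the common bus-admittance form $S = V \odot \overline{(YV)}$ (our simplifying assumption, with shunt terms absorbed into the diagonal of ${\bm Y}^{(r)}$); this is a sketch, not the paper's implementation.

```python
import numpy as np

def epf_injections(V, Yr):
    """Complex injections of the effective PF model, in the common
    bus-admittance form S = V * conj(Yr V) (shunts absorbed in Yr)."""
    return V * np.conj(Yr @ V)

def power_gnn_loss(Yr, V_obs, S_obs, sigma, phi, alpha=1e-3):
    """Loss of Eq. (6): mean squared mismatch between measured powers
    S^(o)_n and the effective-PF prediction plus the NN correction Sigma
    (here precomputed per sample), with l2 regularization on phi."""
    pred = np.array([epf_injections(V, Yr) for V in V_obs])
    resid = S_obs - pred - sigma
    N, n_obs = S_obs.shape
    return np.sum(np.abs(resid) ** 2) / (N * n_obs) + alpha * np.sum(phi ** 2)

# Sanity check: if the measurements match the effective model exactly and
# the correction and NN parameters vanish, the loss is zero.
Yr = np.array([[1.0 - 3.0j, -1.0 + 3.0j], [-1.0 + 3.0j, 1.0 - 3.0j]])
V_obs = np.array([[1.0 + 0.1j, 0.98 - 0.05j], [1.02 + 0.0j, 0.97 + 0.02j]])
S_obs = np.array([epf_injections(V, Yr) for V in V_obs])
assert np.isclose(power_gnn_loss(Yr, V_obs, S_obs, 0.0, np.zeros(3)), 0.0)
```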

Figure 1: (a) IEEE 14-bus test case; (b) IEEE 118-bus test case; (c) PanTaGruEl, a model of the continental European grid [37, 49]. In each panel, generators and loads are displayed as black and white discs, respectively. For partial observability, we assume measurements come only from the generators.

IV Experiments

We start this Section by describing our data generation procedure in Section IV.1. We then introduce in Section IV.2 our custom physics-informed NN, coined Power-GNN, and a Vanilla NN used as a physics-blind benchmark. Finally, our experiments, juxtaposing the different learning schemes for the set of PS models in different observability regimes, are discussed in Section IV.3.

IV.1 Data Generation for testing SE algorithms

To mimic realistic market operations, the PF physics, as well as variability and uncertainty due to fluctuations of loads and renewables, we build the following procedure for generating data samples, which are used in the following to train the ML models we build for the SE.

We work with three exemplary networks, shown in Fig. 1, representing small, medium and large PSs, and generate synthetic data (to train the respective ML models) utilizing a popular OPF solver – MATPOWER [61]. For each of the networks we generate five data sets that we label as cases #1-5. (Later on we will also explain an additional case #6.) Our data generation procedure for case #1 is as follows:
\bullet Pick a reference configuration of loads and find the respective solution of the OPF problem with the generation cost fixed.
\bullet Devise a family of load configurations, each with the same proportion of loads at the different nodes (the same participation factors), but rescaled differently. The rescaling factor is chosen to guarantee that the system resides within the pre-defined interval between the reference (base) load and the peak load. For each load configuration from the family, solve the OPF (with the same cost of generation per generator as in the reference case).
\bullet We arrive at a family of samples, each represented by the complex potentials and injected/consumed powers known for all nodes of the system.

To produce the case #2 data set, we modify the procedure by injecting random load fluctuations at every node (distributed according to an i.i.d. Gaussian distribution with variance equal to a constant fraction of the load at the respective node). For the case #3 data set, active and reactive loads are allowed to vary independently. For case #4, one third of the generators are removed from the pool of “ready to generate” generators (this relates to the unit commitment problem); a new set of excluded generators is randomly selected for each sample. Finally, for the case #5 data set, the generation cost of each generator is randomly chosen in a predefined range, with new costs picked for each sample. Each of the data sets for each of the models consists of 2000 samples.
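The load-construction part of cases #1-#2 (common rescaling of a reference profile, plus optional proportional Gaussian fluctuations) can be sketched as below; the OPF step itself, which the paper delegates to MATPOWER, is omitted here, and the function name and the peak ratio are our illustrative choices:

```python
import numpy as np

def load_samples(base_load, n_samples, peak_ratio=1.3, noise=0.0, seed=0):
    """Load configurations as in cases #1-#2: rescale the reference
    profile by a common factor between the base load (1.0) and the peak
    load (peak_ratio), keeping the participation factors fixed; for
    case #2, add i.i.d. Gaussian fluctuations proportional to each node's
    load. Each row would then be passed to an OPF solver (the paper uses
    MATPOWER) to produce potentials and injections."""
    rng = np.random.default_rng(seed)
    scale = rng.uniform(1.0, peak_ratio, size=(n_samples, 1))
    loads = scale * base_load[None, :]
    if noise > 0.0:
        loads += rng.normal(0.0, noise, loads.shape) * loads
    return loads

base = np.array([10.0, 20.0, 5.0])
samples = load_samples(base, n_samples=100)        # case #1: no noise
# All samples share the participation factors of the reference profile.
assert np.allclose(samples / samples[:, :1], base / base[0])
```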

Table 1: Average mismatch of predicted power injections. For the vanilla NN, values obtained over the training set are given in parentheses.

|            | case #1 | case #2 | case #3 | case #4 | case #5 | case #6 |
| Vanilla NN | 4.9E-6 (4.2E-6) | 7.2E-5 (6.6E-5) | 6.3E-3 (5.0E-5) | 5.2E-2 (3.7E-5) | 6.3E-2 (1.2E-4) | 1.4E0 (4.2E-6) |
| Power-GNN  | 3.0E-6 | 5.8E-7 | 6.9E-7 | 1.3E-6 | 2.9E-7 | 3.0E-6 |

We conclude this Subsection with a methodological remark, emphasizing specifics of the datasets representing realistic PS. Actual states of a PS are distributed over the respective space in a highly structured way, which is certainly rather far from uniform. This is due to special correlations induced by the OPF, PF and other procedures, inheriting the structure and correlations of the underlying physics and controls. This highly nonuniform representation of the phase space makes the PS learning problem more challenging than in other applications, as it requires going beyond standard interpolation — here we ought to extrapolate into regimes which were either unseen or rarely visited.

IV.2 Neural Networks: Methods

In this brief Subsection we describe two NN schemes – our custom Power-GNN, combining elements of the PF equations and GNN, and a standard (one may call it “vanilla”) NN introduced for comparison (as a benchmark for Power-GNN).

Power-GNN: is the algorithm which solves Eq. (5) by minimizing the Loss Function (6) via gradient descent (we use the Adam algorithm) over the physical parameters representing the matrix of admittances, ${\bm Y}^{(r)}$ (we also include in this matrix the self-admittances of the nodes associated with shunt capacitors), and also over the parameters of the Neural Network representing the effect of unobserved degrees of freedom on the observed physical characteristics, expressed via the $\Sigma_{\bm\varphi}\big({\bm V}^{(o)}_n,{\bm S}^{(o)}_n\big)$ term in Eq. (6). The total number of physical parameters which we learn in the process of training the PIML model is twice the number of nodes plus twice the number of edges in ${\cal E}^{(r)}$. Even though we call the parameters of the NN, ${\bm\varphi}$, physics-blind, we still attempt to inject some physical meaning into them by building the NN based on the graph, ${\cal G}^{(r)}$, associated with the Kron reduction (see the discussion in Section III.4). Specifically, we build a Graphical Neural Network (GNN) where the graph convolutions, guiding the relations between neurons in different layers, follow ${\cal G}^{(r)}$. More information on the construction of the GNN is given below.

Figure 2: Reconstruction of the admittance matrix ${\bm Y}$ for the IEEE 14-bus [panel (a)], IEEE 118-bus [panel (b)] and PanTaGruEl [panel (c)] models. The min, mean and max values are displayed as circles, crosses and squares, respectively (over 10 realizations). Notice that the quality of the reconstruction delivered by Power-GNN is especially impressive in the case of the large network.

Vanilla NN: is presented as a benchmark. We choose a NN with four fully-connected (hidden) layers, each containing $2\cdot N_{\rm bus}$ neurons. We choose this simple architecture after checking, empirically, that it is the smallest layered fully-connected NN which is trained successfully on all the SE data sets described in Section IV.1. We tested both ReLU and SoftSign activation functions and observed similar results. We use the following Loss Function

$L_{\text{NN}}\doteq\frac{1}{N|{\cal V}^{(o)}|}\sum\limits_{n=1}^{N}\Big\|{\bm S}^{(o)}_{n}-\text{NN}_{\bm\varphi}\big({\bm V}^{(o)}_{n}\big)\Big\|^{2},$  (7)

to train the Vanilla NN, mapping samples of the electric potentials to samples of the complex powers at the observed nodes. We call training successful when $L_{\text{NN}}$ reaches $10^{-4}$ on the training set; the learning rate and the number of epochs are chosen to reach this goal. (See the supplementary file for additional details on our choice of the Vanilla NN, including the selection of hyper-parameters.)

We train both Power-GNN and the Vanilla NN using only a fifth of the respective dataset, and then validate them (also to verify that we do not overfit) on the complete dataset.

IV.3 Description and Discussion of the Results

IV.3.1 Full Observability

Full observability means that we assume access to the electric potentials and complex powers at every node of the system. This is obviously an idealized (gedanken-experiment) setting, and later in this Section we move on to discuss partial observability, where measurements are made over a reduced set of nodes.

Table 1 shows a comparison between Power-GNN and the Vanilla NN benchmark. Both methods perform similarly when the data set consists of relatively homogeneous system configurations, as represented by case #1. Adding small noise to the values of the injected/consumed complex powers in case #2 (in the process of sample generation, see Section IV.1 for details) does not alter the convergence/performance of either of the two methods. However, when we start to increase fluctuations – case #3 – the accuracy of the Vanilla NN starts to decrease. When we additionally vary the generator commitment (or alter the generation costs) – cases #4 and #5 – the accuracy of the Vanilla NN degrades even further. With case #6 we also test the sensitivity of the two schemes to the choice of the reference phase (the physics of the PS guarantees that predictions should be invariant with respect to a shift of all phases of the electric potentials in the system by an arbitrary constant value). Power-GNN passes the test successfully for any value of the phase shift, while the Vanilla NN fails in the case of a sufficiently large shift (of $20^{\circ}$ or larger).
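The case #6 invariance can be verified directly: shifting all phases by a constant multiplies every potential by the same unit-modulus factor, which cancels in the PF map since only phase differences enter it. A small self-contained check (using the bus-admittance form of the injections, our simplifying choice):

```python
import numpy as np

def injections(V, Y):
    """Complex injections S = V * conj(Y V) for a bus admittance matrix Y
    (a common PF form; only phase differences enter the result)."""
    return V * np.conj(Y @ V)

# Shifting all phases by a constant (here 20 degrees, the threshold at
# which the Vanilla NN starts to fail) leaves the injections unchanged.
Y = np.array([[1.0 - 3.0j, -1.0 + 3.0j], [-1.0 + 3.0j, 1.0 - 3.0j]])
V = np.array([1.02 * np.exp(0.10j), 0.98 * np.exp(-0.05j)])
shift = np.exp(1j * np.deg2rad(20.0))
assert np.allclose(injections(V, Y), injections(shift * V, Y))
```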

Figure 3: Effective line conductances and susceptances, in the reduced model of continental Europe, are displayed in panels (a) and (b), respectively. Their initial (pre-training) values are displayed in green, while their trained values and their Kron-reduction counterparts are displayed in red and blue, respectively. Panel (c) presents an alternative visualization of the reference-vs-predicted values of the line conductances (purple) and susceptances (black) by showing the respective alignment.

As mentioned above, the most significant advantage of Power-GNN is its ability to learn the physical, i.e. explainable/interpretable, parameters of the PS it is trained on. In Fig. 2, we compare the admittance matrix ${\bm Y}$ reconstructed through learning with Power-GNN against the actual/reference ${\bm Y}_{\rm ref}$, using the respective Frobenius/Euclidean norm as the test of the Power-GNN fitness. We observe that a good reconstruction is achieved with an impressively small number of samples (roughly ten). When the number of samples is increased, the reconstruction becomes even more reliable – as observed through the collapse of the difference between the respective min and max values. Normalizing the results by the number of nodes shows a universal asymptotic – independent of the number of nodes.

IV.3.2 Partial observability

Transitioning to partial observability, we observe that the Vanilla NN fails dramatically rather fast (with an increase in the number of unobserved nodes). Therefore, we limit the analysis of this regime to testing the performance of Power-GNN only.

Our main test of the reconstruction quality provided by Power-GNN consists in comparing the impedances of the effective lines predicted by the method with the respective actual/reference values, computed according to the Kron reduction described in Section III.4, setting an effective line as valid if the amplitude of the respective admittance, $|Y^{(r)}_{ij}|$, exceeds a predefined, system-dependent threshold. The results are shown in Fig. 3. We observe that Power-GNN is able to reconstruct the reduced admittance matrix well even in the case of the largest network tested. Notice that the reconstruction is good, but not perfect. For a small number of lines, we observe values beyond their physically sensible ranges (positive for conductances and negative for susceptances). We explain this minor deficiency of Power-GNN by the fact that these miscalculations in the Parameter Estimation (PE) have an insignificant effect on the State Estimation (SE) – which consists in predicting relations between the injected/consumed complex powers and the electric potentials at the observed nodes. In other words, these few lines with miscalculated admittances carry much less power than what flows through the rest of the system.
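The thresholding of effective lines described above can be sketched as follows (the function name, the list-of-pairs output and the example threshold are our illustrative choices):

```python
import numpy as np

def valid_lines(Yr, threshold):
    """Return the effective lines {i, j} whose admittance magnitude
    |Y^(r)_ij| exceeds a (system-dependent) threshold; the remaining
    entries are treated as numerical artifacts of the reconstruction."""
    i, j = np.triu_indices_from(Yr, k=1)       # upper-triangular pairs
    keep = np.abs(Yr[i, j]) > threshold
    return [(int(a), int(b)) for a, b in zip(i[keep], j[keep])]

# Of the two candidate lines below, only {0, 2} survives a unit threshold.
Yr = np.array([[0.0 + 0.0j, 0.1 + 0.0j, 2.0 + 1.0j],
               [0.1 + 0.0j, 0.0 + 0.0j, 0.0 + 0.0j],
               [2.0 + 1.0j, 0.0 + 0.0j, 0.0 + 0.0j]])
assert valid_lines(Yr, threshold=1.0) == [(0, 2)]
```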

We conclude this Section with a brief discussion of the effect of the NN-regularization, i.e. the (φ)=λ𝝋2{\cal R}(\varphi)=\lambda\|{\bm{\varphi}}\|^{2} term in Eq. (6), on the quality of the SE. We experimented with the scaling factor λ\lambda and observed that the SE reconstruction is optimal at an intermediate value of λ\lambda (neither too small nor too large), which we therefore choose empirically, i.e. via experiments, as illustrated in Fig. 4.
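The existence of an optimal intermediate λ\lambda can be illustrated on a toy problem (a 1-D ridge regression, not the paper's setup): with noisy training data, the validation error below is minimized at a λ\lambda that is neither the smallest nor the largest value in the sweep:

```python
# Hypothetical data: the training slope is corrupted (2.2 instead of the
# true 2.0), while the validation point is noiseless.
x_train, y_train = [1.0, 2.0, 3.0], [2.2, 4.4, 6.6]
x_val, y_val = [4.0], [8.0]

def ridge_fit(xs, ys, lam):
    # Closed-form 1-D ridge regression: w = sum(x*y) / (sum(x^2) + lambda).
    return sum(a * b for a, b in zip(xs, ys)) / (sum(a * a for a in xs) + lam)

def val_error(lam):
    w = ridge_fit(x_train, y_train, lam)
    return sum((b - w * a) ** 2 for a, b in zip(x_val, y_val))

lambdas = [0.0, 0.5, 1.0, 1.4, 2.0, 5.0]
best = min(lambdas, key=val_error)
# → 1.4: an intermediate value of the sweep wins.
```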

Figure 4: Dependence of the quality of the PE reconstruction on the regularization [i.e. the (φ){\cal R}(\varphi) term in Eq. (6)], illustrated on the example of the medium-size IEEE 118-bus system. (a) Dependence of the integrated (over all the nodes) quality indicator on the regularization parameter. (b) and (c) show, respectively, conductances and susceptances of the PS lines reconstructed at the optimal value of the regularization parameter, corresponding to the minimum of the curve shown in panel (a). [(b) and (c) also show, for reference, the respective values of their Kron-reduction counterparts.]

V Conclusions and Path Forward

We conclude by presenting a brief summary of the main highlights of the manuscript:
\bullet We address the problem of State Estimation (SE) and Parameter Estimation (PE) in Power Systems by constructing a novel NN scheme – Power-GNN.
\bullet Power-GNN borrows its physics-informed structure from the Power Flow equations – the main equations power engineers rely on when monitoring and controlling Power Systems.
\bullet Power-GNN is built to reconstruct SE and PE in the regime of limited observability. To achieve this goal, we design a model reduction procedure based on the Kron reduction approach (well known and developed in the PS community).
\bullet Power-GNN is characterized by both physics-informed parameters (admittances of the effective lines) and physics-blind parameters expressing the effect of unobserved degrees of freedom on the observed ones via a Graphical Neural Network (GNN).
\bullet Power-GNN is trained by minimizing a PF-aware, NN-dependent loss function. We utilize PyTorch [39], thus taking advantage of automatic differentiation over all the parameters (physics-meaningful and physics-blind).
\bullet The performance of Power-GNN was tested on synthetic power system models of small, medium and large size. We chose a Vanilla NN as the physics-blind benchmark. The performance is validated both in terms of the loss-function (LF) decay (confirming the quality of the underlying State Estimation) and in terms of the quality of the Parameter Estimation (reconstruction of the effective line admittances).
\bullet We show that Power-GNN allows reconstruction of parameters and state estimation in the demanding regime of limited observability, where physics-blind methods are condemned to fail. We argue that Power-GNN returns its output in physical terms native to PS practitioners, and that it is also capable of extrapolating into regimes of PS operation which are not contained (or are under-represented) in the training data.
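The joint minimization over physics-meaningful and physics-blind parameters, highlighted above, can be sketched on a toy least-squares problem; the linear model, data, and learning rate below are hypothetical stand-ins (in Power-GNN the gradients come from PyTorch's automatic differentiation rather than the hand-written ones here):

```python
# Synthetic "measurements": injections s generated from voltages v with a
# physics-meaningful slope y_true = 2.0 and a blind offset phi_true = 0.5.
v = [1.0, 2.0, 3.0]
s = [2.0 * vi + 0.5 for vi in v]

y, phi = 0.0, 0.0       # initial guesses for both parameter types
lr = 0.01
for _ in range(20000):
    # Hand-written gradients of L = sum((y*v_i + phi - s_i)^2).
    grad_y = sum(2 * (y * vi + phi - si) * vi for vi, si in zip(v, s))
    grad_phi = sum(2 * (y * vi + phi - si) for vi, si in zip(v, s))
    y -= lr * grad_y
    phi -= lr * grad_phi
# y → ~2.0 (physical parameter), phi → ~0.5 (blind correction)
```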

The following extensions of the methods/approaches and tests presented in the manuscript are in progress:
\bullet Exploring other ways of merging physical and engineering constraints and power system modeling with modern methods in machine learning, for better Parameter and State Estimation in PS.
\bullet Testing the methodology on actual (not synthetic) PMU measurements (subject to collaboration with PS utilities).
\bullet Generalization of the approach to Dynamic State and Parameter Estimation, with a special focus on online learning (returning predictions in real time, on the scale of seconds).
\bullet Extension to active learning, where SE and PE are complemented by recommendations to the PS operator on steps mitigating predicted inefficiencies and emergencies.
\bullet Extending the developed methodology to other energy infrastructures, such as natural gas and district heating/cooling systems, and also to related problems (of state and parameter estimation) in other physical infrastructures, e.g. in the domains of transportation and water management.

Broader Impact

This manuscript contributes to our general approach towards developing a methodology for Physics-Informed Machine Learning. As argued above, some of the principal steps put forward in the design of the Power-GNN allow extensions to other applications, in domains where observations are partial and where straightforward application of otherwise powerful modern methods of ML does not result in satisfactory predictions.

References

  • PIM [2020] 3rd Physics Informed Machine Learning Workshop, LANL, Santa Fe, 2020.
  • Marot et al. [2020] Marot, A., Guyon, I., Donnot, B., Dulac-Arnold, G., Panciatici, P., Awad, M., O'Sullivan, A., Kelly, A., and Hampel-Arias, Z. Learning To Run a Power Network Challenge. https://l2rpn.chalearn.org/, accessed 2021-01-31, 2020.
  • Barbeiro et al. [2014] Barbeiro, P. N. P., Krstulovic, J., Teixeira, H., Pereira, J., Soares, F. J., and Iria, J. P. State estimation in distribution smart grids using autoencoders. In 2014 IEEE 8th International Power Engineering and Optimization Conference (PEOCO2014), pp.  358–363, 2014. doi: 10.1109/PEOCO.2014.6814454.
  • Bolz et al. [2019] Bolz, V., Rueß, J., and Zell, A. Power flow approximation based on graph convolutional networks. In 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), pp.  1679–1686, 2019. doi: 10.1109/ICMLA.2019.00274.
  • Brunton et al. [2016] Brunton, S. L., Proctor, J. L., and Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016. ISSN 0027-8424. doi: 10.1073/pnas.1517384113. URL https://www.pnas.org/content/113/15/3932.
  • Chavan et al. [2017] Chavan, G., Weiss, M., Chakrabortty, A., Bhattacharya, S., Salazar, A., and Habibi-Ashrafi, F. Identification and predictive analysis of a multi-area wecc power system model using synchrophasors. IEEE Transactions on Smart Grid, 2017.
  • Chen et al. [2020] Chen, K., Hu, J., Zhang, Y., Yu, Z., and He, J. Fault location in power distribution systems via deep graph convolutional networks. IEEE Journal on Selected Areas in Communications, 38(1):119–131, Jan 2020. ISSN 1558-0008. doi: 10.1109/jsac.2019.2951964. URL http://dx.doi.org/10.1109/JSAC.2019.2951964.
  • Chen et al. [2018] Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. Neural ordinary differential equations. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 31, pp.  6571–6583. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf.
  • Chen et al. [2016] Chen, Y. C., Wang, J., Domínguez-García, A. D., and Sauer, P. W. Measurement-based estimation of the power flow jacobian matrix. IEEE Transactions on Smart Grid, 7(5):2507–2515, 2016.
  • Chiang [2011] Chiang, H.-D. Direct methods for stability analysis of electric power systems: theoretical foundation, BCU methodologies, and applications. John Wiley & Sons, 2011.
  • Cui et al. [2020] Cui, Z., Henrickson, K., Ke, R., and Wang, Y. Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting. IEEE Transactions on Intelligent Transportation Systems, 21(11):4883–4894, 2020. doi: 10.1109/TITS.2019.2950416.
  • Deka et al. [2016] Deka, D., Backhaus, S., and Chertkov, M. Estimating distribution grid topologies: A graphical learning based approach. In 2016 Power Systems Computation Conference (PSCC), pp. 1–7, 2016. doi: 10.1109/PSCC.2016.7541005.
  • Deka et al. [2017] Deka, D., Nagarajan, H., and Backhaus, S. Optimal topology design for disturbance minimization in power grids. In American Control Conference (ACC), pp.  2719–2724, May 2017. doi: 10.23919/ACC.2017.7963363.
  • Deka et al. [2018] Deka, D., Backhaus, S., and Chertkov, M. Structure learning in power distribution networks. IEEE Transactions on Control of Network Systems, 5(3):1061–1074, 2018. doi: 10.1109/TCNS.2017.2673546.
  • Deka et al. [2020] Deka, D., Chertkov, M., and Backhaus, S. Topology estimation using graphical models in multi-phase power distribution grids. IEEE Transactions on Power Systems, 35(3):1663–1673, 2020. doi: 10.1109/TPWRS.2019.2897004.
  • Dobson & Parashar [2010] Dobson, I. and Parashar, M. A cutset area concept for phasor monitoring. In IEEE PES General Meeting, pp.  1–8, 2010. doi: 10.1109/PES.2010.5589839.
  • Donon et al. [2019] Donon, B., Donnot, B., Guyon, I., and Marot, A. Graph neural solver for power systems. In 2019 International Joint Conference on Neural Networks (IJCNN), pp.  1–8, 2019. doi: 10.1109/IJCNN.2019.8851855.
  • Dorfler & Bullo [2013] Dorfler, F. and Bullo, F. Kron reduction of graphs with applications to electrical networks. IEEE Transactions on Circuits and Systems I: Regular Papers, 60(1):150–163, 2013. doi: 10.1109/TCSI.2012.2215780.
  • Dukic & Saric [2012] Dukic, S. D. and Saric, A. T. Serbian Journal of Electrical Engineering, 9(2):131–169, 2012.
  • Ghanavati et al. [2016] Ghanavati, G., Hines, P. D., and Lakoba, T. I. Identifying useful statistical indicators of proximity to instability in stochastic power systems. IEEE Transactions on Power Systems, 31(2):1360–1368, 2016.
  • Gomez-Exposito et al. [2011] Gomez-Exposito, A., Abur, A., de la Villa Jaen, A., and Gomez-Quiles, C. A multilevel state estimation paradigm for smart grids. Proceedings of the IEEE, 99(6):952–976, 2011. doi: 10.1109/JPROC.2011.2107490.
  • Guo et al. [2014] Guo, S., Norris, S., and Bialek, J. Adaptive parameter estimation of power system dynamic model using modal information. IEEE Transactions on Power Systems, 29(6):2854–2861, 2014.
  • Huang et al. [2009] Huang, Z., Du, P., Kosterev, D., and Yang, B. Application of extended kalman filter techniques for dynamic model parameter calibration. In Power & Energy Society General Meeting (PES’09), pp. 1–8. IEEE, 2009.
  • Kim et al. [2019] Kim, C., Kim, K., Balaprakash, P., and Anitescu, M. Graph convolutional neural networks for optimal load shedding under line contingency. In 2019 IEEE Power Energy Society General Meeting (PESGM), pp.  1–5, 2019. doi: 10.1109/PESGM40551.2019.8973468.
  • Kipf & Welling [2016] Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks, 2016.
  • Kundur et al. [1994] Kundur, P., Balu, N. J., and Lauby, M. G. Power system stability and control, volume 7. McGraw-hill New York, 1994.
  • Lagaris et al. [1998] Lagaris, I., Likas, A., and Fotiadis, D. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9(5):987–1000, 1998. ISSN 1045-9227. doi: 10.1109/72.712178. URL http://dx.doi.org/10.1109/72.712178.
  • Li et al. [2019] Li, W., Deka, D., Chertkov, M., and Wang, M. Real-time faulted line localization and pmu placement in power systems through convolutional neural networks. IEEE Transactions on Power Systems, 34(6):4640–4651, Nov 2019. ISSN 1558-0679. doi: 10.1109/TPWRS.2019.2917794.
  • Liao et al. [2020] Liao, W., Yang, D., Wang, Y., and Ren, X. Fault diagnosis of power transformers using graph convolutional network. CSEE Journal of Power and Energy Systems, pp.  1–9, 2020. doi: 10.17775/CSEEJPES.2020.04120.
  • Liao et al. [2021] Liao, W., Bak-Jensen, B., Pillai, J. R., Wang, Y., and Wang, Y. A review of graph neural networks and their applications in power systems. arXiv:2101.10025, 2021.
  • Lokhov et al. [2018] Lokhov, A. Y., Vuffray, M., Shemetov, D., Deka, D., and Chertkov, M. Online learning of power transmission dynamics. In 2018 Power Systems Computation Conference (PSCC), pp. 1–7, June 2018. doi: 10.23919/PSCC.2018.8442720.
  • Machowski et al. [1997] Machowski, J., Bialek, J., and Bumby, J. R. Power system dynamics and stability. John Wiley & Sons, 1997.
  • Mestav et al. [2018] Mestav, K. R., Luengo-Rozas, J., and Tong, L. State estimation for unobservable distribution systems via deep neural networks. In 2018 IEEE Power Energy Society General Meeting (PESGM), pp.  1–5, 2018. doi: 10.1109/PESGM.2018.8586649.
  • Misyris et al. [2020] Misyris, G. S., Venzke, A., and Chatzivasileiadis, S. Physics-informed neural networks for power systems. arXiv:1911.03737, 2020.
  • Ostrometzky et al. [2020] Ostrometzky, J., Berestizshevsky, K., Bernstein, A., and Zussman, G. Physics-informed deep neural network method for limited observability state estimation. 2020.
  • Owerko et al. [2019] Owerko, D., Gama, F., and Ribeiro, A. Optimal power flow using graph neural networks. arXiv:1910.09658, 2019.
  • Pagnier & Jacquod [2019] Pagnier, L. and Jacquod, P. Inertia location and slow network modes determine disturbance propagation in large-scale power grids. PloS one, 14(3):e0213550, 2019.
  • Park et al. [2018] Park, S., Deka, D., and Chertkov, M. Exact topology and parameter estimation in distribution grids with minimal observability. In 2018 Power Systems Computation Conference (PSCC), pp. 1–6, 2018. doi: 10.23919/PSCC.2018.8442881.
  • Paszke et al. [2019] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. Pytorch: An imperative style, high-performance deep learning library. In Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc., 2019. URL http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
  • Phadke [2002] Phadke, A. G. Synchronized phasor measurements-a historical overview. In IEEE/PES Transmission and Distribution Conference and Exhibition, volume 1, pp.  476–479 vol.1, 2002. doi: 10.1109/TDC.2002.1178427.
  • Poolla et al. [2017] Poolla, B. K., Bolognani, S., and Dorfler, F. Optimal placement of virtual inertia in power grids. IEEE Transactions on Automatic Control, 2017.
  • Rahimi et al. [2018] Rahimi, A., Cohn, T., and Baldwin, T. Semi-supervised user geolocation via graph convolutional networks. arXiv:1804.08049, 2018.
  • Raissi [2018] Raissi, M. Deep hidden physics models: Deep learning of nonlinear partial differential equations. J. Mach. Learn. Res., 19(1):932–955, January 2018. ISSN 1532-4435.
  • Raissi et al. [2019] Raissi, M., Perdikaris, P., and Karniadakis, G. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686 – 707, 2019. ISSN 0021-9991. doi: https://doi.org/10.1016/j.jcp.2018.10.045. URL http://www.sciencedirect.com/science/article/pii/S0021999118307125.
  • Sanchez-Gonzalez et al. [2020] Sanchez-Gonzalez, A., Godwin, J., Pfaff, T., Ying, R., Leskovec, J., and Battaglia, P. W. Learning to simulate complex physics with graph networks. arXiv:2002.09405, 2020.
  • Sauer et al. [2017] Sauer, P. W., Pai, M. A., and Chow, J. H. Power System Dynamics and Stability: With Synchrophasor Measurement and Power System Toolbox. John Wiley & Sons, 2017.
  • Schweppe & Handschin [1974] Schweppe, F. C. and Handschin, E. J. Static state estimation in electric power systems. Proceedings of the IEEE, 62(7):972–982, 1974. doi: 10.1109/PROC.1974.9549.
  • Silva et al. [1993] Silva, A. A., Silva, A. D., and Souza, J. C. State forecasting based on artificial neural networks. 1993.
  • Tyloo et al. [2019] Tyloo, M., Pagnier, L., and Jacquod, P. The key player problem in complex oscillator networks and electric power grids: Resistance centralities identify local vulnerabilities. Science advances, 5(11):eaaw8359, 2019.
  • [50] U.S. Energy Information Administration. New technology can improve electric power system efficiency and reliability – Today in Energy. https://www.eia.gov/todayinenergy/detail.php?id=5630, accessed 2021-01-31.
  • Van Cutsem & Vournas [1998] Van Cutsem, T. and Vournas, C. Voltage stability of electric power systems, volume 441. Springer Science & Business Media, 1998.
  • Wang et al. [2017] Wang, X., Bialek, J., and Turitsyn, K. Pmu-based estimation of dynamic state jacobian matrix and dynamic system state matrix in ambient conditions. IEEE Transactions on Power Systems, 2017.
  • Willard et al. [2020] Willard, J., Jia, X., Xu, S., Steinbach, M., and Kumar, V. Integrating physics-based modeling with machine learning: A survey. arXiv:2003.04919, 2020.
  • Wu & Liu [1989] Wu, F. F. and Liu, W. . E. Detection of topology errors by state estimation (power systems). IEEE Transactions on Power Systems, 4(1):176–183, 1989. doi: 10.1109/59.32475.
  • Zamzam et al. [2019] Zamzam, A. S., Fu, X., and Sidiropoulos, N. D. Data-driven learning-based optimization for distribution system state estimation. arXiv:1807.01671, 2019.
  • Zhang et al. [2019] Zhang, L., Wang, G., and Giannakis, G. B. Real-time power system state estimation and forecasting via deep unrolled neural networks. IEEE Transactions on Signal Processing, 67(15):4069–4077, 2019. doi: 10.1109/TSP.2019.2926023.
  • Zhao et al. [2019] Zhao, J., Gómez-Expósito, A., Netto, M., Mili, L., Abur, A., Terzija, V., Kamwa, I., Pal, B., Singh, A. K., Qi, J., Huang, Z., and Meliopoulos, A. P. S. Power system dynamic state estimation: Motivations, definitions, methodologies, and future work. IEEE Transactions on Power Systems, 34(4):3188–3198, 2019. doi: 10.1109/TPWRS.2019.2894769.
  • Zhou et al. [2019] Zhou, J., Cui, G., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., and Sun, M. Graph neural networks: A review of methods and applications. arXiv:1812.08434, 2019.
  • Zhou et al. [2011] Zhou, N., Lu, S., Singh, R., and Elizondo, M. A. Calibration of reduced dynamic models of power systems using phasor measurement unit (pmu) data. In North American Power Symposium (NAPS), pp.  1–7. IEEE, 2011.
  • Zhou et al. [2015] Zhou, N., Meng, D., Huang, Z., and Welch, G. Dynamic state estimation of a synchronous machine using pmu data: A comparative study. IEEE Transactions on Smart Grid, 6(1):450–460, 2015.
  • Zimmerman et al. [2020] Zimmerman, R. D., Murillo-Sánchez, C. E., and Thomas, R. J. Matpower: Version 7, 2020.

S1 More details on the Kron reduction

As presented in the main text, in the case of partial observability, buses in the system are divided into two complementary sets: one consisting of the observed buses (labelled oo) and the other of the unobserved buses (labelled uu). With this reindexing, Ohm’s law reads

[𝑰(o)𝑰(u)]=[𝒀(oo)𝒀(ou)𝒀(uo)𝒀(uu)][𝑽(o)𝑽(u)],\left[\begin{array}[]{c}\!\!\bm{I}^{(o)}\!\!\\ \!\!\bm{I}^{(u)}\!\!\end{array}\right]=\left[\begin{array}[]{cc}\!\!\bm{Y}^{(oo)}\!\!&\!\!\bm{Y}^{(ou)}\!\!\\ \!\!\bm{Y}^{(uo)}\!\!&\!\!\bm{Y}^{(uu)}\!\!\end{array}\right]\cdot\left[\begin{array}[]{c}\!\!\bm{V}^{(o)}\!\!\\ \!\!\bm{V}^{(u)}\!\!\end{array}\right]\,, (S1)

where 𝒀(oo)\bm{Y}^{(oo)}, 𝒀(ou)\bm{Y}^{(ou)}, 𝒀(uo)\bm{Y}^{(uo)} and 𝒀(uu)\bm{Y}^{(uu)} are the submatrices of the admittance 𝒀\bm{Y} defined by this reindexing of buses. Solving for 𝑽(u)\bm{V}^{(u)} from the second row of Eq. (S1) and using the obtained expression in the first row, one gets

𝑰(r)=𝒀(r)𝑽(o),\bm{I}^{(r)}=\bm{Y}^{(r)}\,\bm{V}^{(o)}\,, (S2)

where 𝑰(r)=𝑰(o)𝒀(ou)𝒀(uu)1𝑰(u)\bm{I}^{(r)}=\bm{I}^{(o)}-\bm{Y}^{(ou)}\bm{Y}^{(uu)-1}\bm{I}^{(u)} and 𝒀(r)=𝒀(oo)𝒀(ou)𝒀(uu)1𝒀(uo)\bm{Y}^{(r)}=\bm{Y}^{(oo)}-\bm{Y}^{(ou)}\bm{Y}^{(uu)-1}\bm{Y}^{(uo)}. (For the sake of simplicity and readability, we write 𝑰(r)\bm{I}^{(r)} as 𝑰(o)\bm{I}^{(o)} in the main text.) In general, the effective current injections 𝑰(r)\bm{I}^{(r)} differ from the measured ones 𝑰(o)\bm{I}^{(o)}. In power system analysis, one usually describes injections in terms of power rather than current; hence, from Eq. (S2), the effective power injections read

𝑺(r)𝑽(o)𝑰(r)=𝑽(o)𝑰(o)𝑽(o)(𝒀(ou)𝒀(uu)1𝑰(u))=𝑽(o)(𝒀(r)𝑽(o)),\bm{S}^{(r)}\equiv\bm{V}^{(o)}\circ\bm{I}^{(r)*}=\bm{V}^{(o)}\circ\bm{I}^{(o)*}-\bm{V}^{(o)}\circ\left(\bm{Y}^{(ou)}\bm{Y}^{(uu)-1}\bm{I}^{(u)}\right)^{*}=\bm{V}^{(o)}\circ\left(\bm{Y}^{(r)}\,\bm{V}^{(o)}\right)^{*}\,, (S3)

where * denotes the complex conjugate and \circ the Hadamard product (a.k.a. the entry-wise product). Rearranging terms in Eq. (S3), one finds that the power injections, as measured at the observed buses, are given by

𝑺(o)𝑽(o)𝑰(o)=𝑽(o)(𝒀(r)𝑽(o))+𝑽(o)(𝒀(ou)𝒀(uu)1𝑰(u)).\bm{S}^{(o)}\equiv\bm{V}^{(o)}\circ\bm{I}^{(o)*}=\bm{V}^{(o)}\circ\left(\bm{Y}^{(r)}\,\bm{V}^{(o)}\right)^{*}+\bm{V}^{(o)}\circ\left(\bm{Y}^{(ou)}\bm{Y}^{(uu)-1}\bm{I}^{(u)}\right)^{*}\,. (S4)

The right-hand side of Eq. (S4) consists of two terms. The first term depends only on the observed voltages and on the reduced admittance matrix 𝒀(r)\bm{Y}^{(r)}; it results in Eqs. (1,2) of the main text. The second term is a mixture of observed and unobserved quantities, and therefore it does not have a well-defined expression in terms of observed quantities alone. This observation motivates our hybrid method, consisting of two parts: (a) physics-informed learning of the grid parameters and (b) physics-blind learning of the (second) mixed term.
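The reduction of Eqs. (S1)-(S2) can be sketched for a 3-bus triangle network with a single unobserved bus, in which case 𝒀(uu)\bm{Y}^{(uu)} is a scalar and plain complex arithmetic suffices; the line admittances below are hypothetical:

```python
# Hypothetical line admittances of a 3-bus triangle network; bus 2 is
# unobserved, buses 0 and 1 are observed.
y01, y02, y12 = 1 - 3j, 2 - 6j, 1 - 2j

# Full bus admittance matrix (zero row sums: no shunt elements).
Y = [[y01 + y02, -y01,      -y02],
     [-y01,      y01 + y12, -y12],
     [-y02,      -y12,      y02 + y12]]

obs, u = [0, 1], 2
# Kron reduction: Y^(r) = Y^(oo) - Y^(ou) Y^(uu)^{-1} Y^(uo).
Y_r = [[Y[i][j] - Y[i][u] * Y[u][j] / Y[u][u] for j in obs] for i in obs]

# The reduced matrix is again a valid admittance matrix: symmetric,
# with (numerically) zero row sums.
row_sums = [sum(row) for row in Y_r]
```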

S2 Grid Parametrization

Transmission lines are usually described by an equivalent Pi-circuit model expressed in terms of the line impedance, zij=rij+𝒊xijz_{ij}=r_{ij}+\bm{i}x_{ij}, and two shunt admittances yijshy_{ij}^{\rm sh} (associated with the two end points of the line), see Fig. S1. The Pi-circuit model encodes the following relation between the currents and potentials at the end-points of the line

[IiIj]=[zij1+yijsh/2zij1zij1zij1+yijsh/2][ViVj].\left[\begin{array}[]{c}\!\!I_{i}\!\!\\ \!\!I_{j}\!\!\end{array}\right]=\left[\begin{array}[]{cc}\!\!z_{ij}^{-1}+y_{ij}^{\rm sh}/2\!\!&\!\!-z_{ij}^{-1}\!\!\\ \!\!-z_{ij}^{-1}\!\!&\!\!z_{ij}^{-1}+y_{ij}^{\rm sh}/2\!\!\end{array}\right]\cdot\left[\begin{array}[]{c}\!\!V_{i}\!\!\\ \!\!V_{j}\!\!\end{array}\right]\,. (S5)

The term zij1z_{ij}^{-1} can also be expressed as an admittance yij=gij+𝒊bijy_{ij}=g_{ij}+\bm{i}b_{ij}, where

gij\displaystyle g_{ij} =rij/(rij2+xij2),\displaystyle=r_{ij}\big{/}\big{(}r_{ij}^{2}+x_{ij}^{2}\big{)}\,, (S6)
bij\displaystyle b_{ij} =xij/(rij2+xij2).\displaystyle=-x_{ij}\big{/}\big{(}r_{ij}^{2}+x_{ij}^{2}\big{)}\,. (S7)

One can characterize a line by the combination of its conductance, gijg_{ij}, and susceptance, bijb_{ij}, or equivalently by the combination of its resistance, rijr_{ij}, and reactance, xijx_{ij}. Note, however, that in our experiments we choose to work with the resistance-reactance pairs, based on empirical evidence that this choice is more robust to the (training) initialization.
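The impedance-to-admittance conversion of Eqs. (S6)-(S7) can be checked numerically; the per-unit values below are hypothetical, and the susceptance is taken as the imaginary part of 1/z1/z, i.e. xij/(rij2+xij2)-x_{ij}/(r_{ij}^{2}+x_{ij}^{2}):

```python
# Hypothetical per-unit line parameters.
r, x = 0.01, 0.1

g = r / (r**2 + x**2)    # conductance, Eq. (S6)
b = -x / (r**2 + x**2)   # susceptance (imaginary part of 1/z)

# Consistency check: g + i*b must equal 1/z with z = r + i*x.
z = complex(r, x)
assert abs(complex(g, b) - 1 / z) < 1e-12
```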

Figure S1: Pi-model of a transmission line.

S3 Computing infrastructure

For small and medium size power system models, such as the IEEE 14-bus and 118-bus test cases, Power-GNN can be trained on a personal computer within a reasonable time; e.g., training it on the IEEE 118-bus test case with Nsample=400N_{\rm sample}=400 takes \sim1600s on a computer with an Intel Core i7 CPU (1.80GHz) and an NVIDIA GeForce MX150 GPU. For larger systems (e.g. PanTaGruEl), training the model within a reasonable time requires the use of a High-Performance Computing (HPC) cluster. We have access to the HPC at UArizona, with 28-core processors and 46 nodes equipped with Nvidia P100 GPUs. On this HPC, training of the PanTaGruEl model over Nsample=230N_{\rm sample}=230 samples takes \sim3200s.

S4 List of hyper-parameters

Table S1: Vanilla NN on IEEE 14-bus test case with full observability
Learning rate 2E-5
Number of epochs 8E5
Number of units per layer 28
Number of layers 3
Table S2: Vanilla NN on IEEE 118-bus test case with full observability
Learning rate 2E-5
Number of epochs 8E5
Number of units per layer 238
Number of layers 3
Table S3: Power-GNN on IEEE 118-bus test case with full observability
Learning rate 5E-4
Number of epochs 5E4
Initial line resistance 1E-2
Initial line reactance 1E-1
Initial bus shunt susceptance 1E-1
Initial bus shunt conductance 1E-1
Table S4: Power-GNN on IEEE 118-bus test case with partial observability
Learning rate 2E-5
Number of epochs 2E4
Initial line resistance 1E-1
Initial line reactance 6E-1
Initial bus shunt susceptance 1E-2
Initial bus shunt conductance 1E-1
Number of units per layer 4E2
Number of layers 3
Table S5: Vanilla NN on PanTaGruEl test case with full observability
Learning rate 2E-5
Number of epochs 8E5
Number of units per layer 7618
Number of layers 3
Table S6: Power-GNN on IEEE 14-bus test case with full observability
Learning rate 2E-4
Number of epochs 3E4
Initial line resistance 1
Initial line reactance 1
Initial bus shunt susceptance 1
Initial bus shunt conductance 1
Table S7: Power-GNN on PanTaGruEl with full observability
Learning rate 2E-5
Number of epochs 5E4
Initial line resistance 1E-2
Initial line reactance 1E-1
Initial bus shunt susceptance 1E-1
Initial bus shunt conductance 1E-1
Table S8: Power-GNN on PanTaGruEl with partial observability
Learning rate 2E-5
Number of epochs 5E4
Initial line resistance 1E-1
Initial line reactance 6E-1
Initial bus shunt susceptance 1E-2
Initial bus shunt conductance 1E-1
Number of units per layer 1E3
Number of layers 3