
End-to-End Demand Response Model Identification and Baseline Estimation with Deep Learning

   Yuanyuan Shi, Bolun Xu. Y. Shi is with University of California San Diego, USA (email: [email protected]); B. Xu is with Columbia University, USA (email: [email protected]).
Abstract

This paper proposes a novel end-to-end deep learning framework that jointly identifies demand baselines and the incentive-based agent demand response model from net demand measurements and incentive signals. The framework consists of two modules: 1) the decision-making process of a demand response participant is represented as a differentiable optimization layer, which takes the incentive signal as input and predicts the user's response; 2) the baseline demand forecast is represented as a standard neural network model, which takes relevant features as input and predicts the user's baseline demand. These two intermediate predictions are added to form the net demand forecast. We then propose a gradient-descent approach that backpropagates the net demand forecast errors to jointly update the weights of the agent model and of the baseline demand forecast. We demonstrate the effectiveness of our approach through computational experiments with synthetic demand response traces and a large-scale real-world demand response dataset. Our results show that the approach accurately identifies the demand response model, even without any prior knowledge about the baseline demand.

Index Terms:
Demand response, Deep learning, State estimation

I Introduction

Efficient utilization of demand-side resources in aiding grid operation is critical for a reliable and economic transition towards a low-carbon energy system. The Federal Energy Regulatory Commission (FERC) stated in its recent Order 2222 [1] that wholesale electricity markets must "level playing fields" for behind-the-meter demand response (DR) resources, allowing them to be aggregated as a market participant under one or more market models. Future power system operators, utilities, and aggregators must be equipped with new schemes and demand response models to fairly settle DR market participation, and to predict incentive-based behaviors associated with demand response instructions.

Yet, integration of DR resources comes with unique challenges [2]: the operator must be able to predict and measure how a DR participant responds to incentive signals, i.e., how it changes from its baseline demand, to ensure system reliability and fairly settle market payments. In practice, power system operators obtain direct output measurements from all participating resources and use these measurements to monitor operational reliability and settle market payments for participants. These on-the-meter grid resources react to grid instructions or price signals predictably, based on their internal marginal production costs and physical operation constraints. However, these tasks become non-trivial for a DR participant: 1) a DR participant's response to market instructions is mingled with its baseline demand, which cannot be metered exactly; 2) a DR participant may not have an explicit physical operation model or marginal production cost, as the flexibility comes from an aggregation of appliances with human behavior involved. Although submetering could improve visibility over DR by installing meters inside a household to provide room- or even appliance-specific measurements, it poses serious privacy concerns, and the overwhelming amount of data could lead to a significant increase in infrastructure investment that may outweigh the benefit of demand response [3, 4].

This paper proposes a novel DR parameter identification and baseline estimation approach using deep learning. Our end-to-end model structure contains two separate modules for agent DR optimization and baseline demand forecasting. For the agent DR optimization module, we incorporate differentiable optimization as a neural network layer, motivated by Amos and Kolter's work [5]. Different from prior baseline estimation approaches using clustering [6], control groups [7], decomposition [8], and self-report incentives [9], our method can identify the demand response model and the baseline demand directly from net demand measurements without any prior knowledge about historical baseline demands. Our work distinguishes itself from the existing literature in the following three aspects:

  1. We pioneer an end-to-end framework that jointly identifies a demand response agent model and its baseline demand. Our design leverages prior knowledge about the agent utility optimization model, reduces the computational complexity and the number of weights by orders of magnitude, and provides good model explainability. We believe this could be a general recipe for end-to-end models that solve power system operation problems involving both a prediction stage and a subsequent optimization stage.

  2. The proposed end-to-end learning framework identifies users' demand response without any prior knowledge about the baseline demand. Compared to existing approaches that require historical baseline demand information, our approach obtains similar performance in agent model identification.

  3. We evaluate our model performance on synthetic and real-world demand response datasets, showing immediate applicability to real-world DR identification problems. In addition, we provide various ablation studies that demonstrate robustness against measurement noise and time-varying agent models.

The remainder of the paper is organized as follows: Section II provides the background and literature review on demand response modeling. Section III introduces the learning model and Section IV explains the proposed training method. Section V presents computational experiment results and Section VI concludes the paper with a discussion on future directions.

II Background and Literature

II-A Baseline demand estimation

Estimation of demand baselines is a fundamental requirement for systematically incorporating demand response into power system operation and market settlements: the system operator must disaggregate the effective demand change in response to grid needs from the demand baseline in order to monitor the demand resource and settle compensations based on market prices or tariffs [10].

Existing baseline estimation approaches can be grouped into top-down and bottom-up approaches. Top-down approaches assume demand response "buyers" (system operators or utilities) estimate baselines, either by using conventional demand forecasts or specialized price-responsive demand models [11], or through ex-post baseline re-estimations based on realized system and weather conditions using methods such as clustering [6] and control groups [7]. Similar baseline estimation methods have been implemented in industry. The Scheduled Load Reduction Program (SLRP) operated by Pacific Gas & Electric Company (PG&E) calculates the baseline by averaging the load demand of the selected time periods over the ten previous normal operating days [12]. Bottom-up approaches, on the other hand, require demand response participants, the "providers" of demand response, to report baselines a priori to the dispatch period, accompanied by supporting tariff mechanisms that encourage participants to truthfully report baselines to the best of their knowledge [9, 13].
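As a rough illustration of such averaging baselines, the following sketch computes a baseline profile by averaging the last ten "normal" days; the exact day-selection and period rules are defined by the tariff, so the simple rule below is an illustrative assumption rather than the SLRP procedure itself.

```python
import numpy as np

def averaging_baseline(history, window=10):
    """Estimate a baseline demand profile as the hourly average of the
    `window` most recent normal (non-event) days.

    history: (num_days, 24) array of hourly demand on normal operating days.
    Returns a (24,) baseline profile. The day-selection rule here is an
    assumption; actual tariffs define "normal days" and averaged periods.
    """
    recent = history[-window:]
    return recent.mean(axis=0)

# Toy usage: 12 flat days of 2.0 kWh/h, with the last day 0.5 kWh/h higher.
hist = np.full((12, 24), 2.0)
hist[-1] += 0.5
baseline = averaging_baseline(hist)   # each hour: (9 * 2.0 + 2.5) / 10 = 2.05
```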

II-B Demand response models

The decision-making process of an incentive-based DR participant can be generalized as an optimization problem with corresponding objective functions and constraints. The goal of the optimization is to minimize the consumer's total cost of responding to DR incentives, which includes the DR reward and the disutility. Quadratic objective functions are often used for modeling disutilities, based on the intuition that the marginal "sacrifice" increases as a consumer reduces more demand, such as turning off lighting loads or reducing the HVAC comfort level [14, 15, 16]. The approach of representing DR decision-making with a disutility function has been widely adopted in DR coordination studies to design incentives that properly steer consumers' behavior [17, 18].

Constraints in the demand response model reflect the limits within which a consumer can change its electricity demand regardless of the disutility cost. Examples include shifting electricity consumption from peak-price periods to off-peak periods, such as rescheduling laundry, temporarily lowering water heater settings, or using behind-the-meter energy storage to purposefully arbitrage electricity price differences. Operation of behind-the-meter energy storage and electric vehicles, subject to more sophisticated state-of-charge evolution and efficiency constraints, can also be represented as linear constraints [19, 20].

III Modeling

In this section, we present the end-to-end learning framework. We first describe the DR agent model as a constrained quadratic optimization problem. With this definition, we pose the demand response agent identification problem as a joint learning problem that identifies both the baseline demand and the unknown parameters in the agent model.

Note on vector representation. For convenience of presentation, $D_i$, $y_i$, $z_i$ and $\lambda_i$ are all vectors whose size equals the considered agent optimization horizon $T$, rather than scalars. We add the subscript $t$ to denote the scalar value at a specific time period $t$; hence $y_i=\{y_{i,t}\,|\,t\in\{1,2,\dotsc,T\}\}$ denotes the $T$-dimensional demand response vector. Sometimes we drop the index $i$ and simply use $D_t, y_t, z_t, \lambda_t$ to denote the $t$-th element of the corresponding vector. In addition, $x$ is the input feature vector for the baseline demand forecast (e.g., temperature, calendar information), and $\beta$, $\theta$ are the model parameters to be learned.

III-A Demand response agent model

We assume a demand response agent conducts a private optimization over time horizon $T$ to determine its responses $\{y_t\}_{t=1}^T$ to a time-of-use price $\{\lambda_t\}_{t=1}^T$ on top of its baseline demand $\{D_t\}_{t=1}^T$. The objective of the demand response agent is to minimize its total electricity bill and personal discomfort as follows,

$$\min_{y}\; g(D_t,y_t;\lambda_t,\theta) := \sum_{t=1}^{T}\lambda_t y_t + \frac{\alpha(D_t)}{2}y_t^2\,, \quad (1a)$$
$$\text{s.t. } \underline{P}\leq y_t\leq\overline{P}\,,\ \forall t \quad :(\underline{\mu}_t,\overline{\mu}_t) \quad (1b)$$
$$\underline{E}\leq\sum_{\tau=1}^{t}y_\tau\leq\overline{E}\,,\ \forall t \quad :(\underline{\nu}_t,\overline{\nu}_t) \quad (1c)$$

where $y_t$ is the optimization decision variable, and $\theta=\{\alpha,\underline{P},\overline{P},\underline{E},\overline{E}\}$ collects the parameters of the agent model. We model the personal discomfort as a quadratic function with respect to the preferred consumption schedule (no demand response) [21], where $\{\alpha(D_t)\}_{t=1}^T$ is a time-varying comfort deviation coefficient. We model $\alpha$ as a function of $D_t$, indicating that the user comfort deviation coefficient depends on the baseline demand. For example, reducing the same amount of demand may incur a higher discomfort during low-demand periods. (1b) models the response limit at each time period, in which $\overline{P}$ is the maximum demand increase and $\underline{P}$ is the maximum decrease. (1c) is the demand response balance constraint, capping the total amount of demand response over the considered time periods between $\underline{E}$ and $\overline{E}$. This constraint loosely models resources such as thermal balancing or energy storage state-of-charge constraints [22]. We also list the dual variables $\{\underline{\mu}_t,\overline{\mu}_t,\underline{\nu}_t,\overline{\nu}_t\}$ associated with these constraints; these dual variables will later be utilized in designing our gradient-based training algorithm.
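To make the agent model (1) concrete, the sketch below solves a small instance with SciPy's general-purpose SLSQP solver; every numeric value is an illustrative assumption, and a dedicated QP solver would be the natural choice in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Toy instance of the agent problem (1); all values are illustrative.
T = 4
lam = np.array([5.0, 1.0, 4.0, 2.0])    # incentive prices lambda_t
alpha = 2.0                             # discomfort coefficient (constant here)
P_lo, P_hi = -1.0, 1.0                  # per-period response limits, (1b)
E_lo, E_hi = -1.5, 1.5                  # cumulative response limits, (1c)

Gamma = np.tril(np.ones((T, T)))        # Gamma @ y gives the running sums

def g(y):
    # bill plus quadratic disutility, objective (1a)
    return lam @ y + 0.5 * alpha * y @ y

res = minimize(
    g, np.zeros(T), method="SLSQP",
    bounds=[(P_lo, P_hi)] * T,
    constraints=[
        {"type": "ineq", "fun": lambda y: E_hi - Gamma @ y},  # (1c) upper
        {"type": "ineq", "fun": lambda y: Gamma @ y - E_lo},  # (1c) lower
    ],
)
y_star = res.x   # optimal response y*
```

With all prices positive, the response is a demand reduction (negative $y_t$) bounded by the per-period and cumulative limits.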

In our DR identification problem, we assume the parameters in $\theta$ are unknown to the system operator. For example, the system operator knows neither the discomfort coefficient of its DR participants (i.e., $\alpha$) nor the DR response capacity that a participant can provide (i.e., $\{\underline{P},\overline{P},\underline{E},\overline{E}\}$).

III-B Baseline demand forecast model

Baseline demand forecast is an important topic in power system planning and operation, and many methods have been proposed for both individual consumers' and aggregated users' load forecasting, such as linear models [23] and deep learning methods [24, 25]. Our framework is flexible to incorporate different baseline demand forecast methods as a plug-in module, as long as the method supports automatic differentiation [26].

In particular, we use a fully connected multi-layer perceptron (MLP) model for the baseline demand forecast in all our experiments. Inputs to the module include features such as weather data (e.g., temperature, humidity, wind speed), calendar information (e.g., month, day, weekday/weekend), and effects of human activities. The baseline demand forecast model is represented by the following formulation,

$$\hat{D}=f(x;\beta) \quad (2)$$

where $\hat{D}=\{\hat{D}_t\}_{t=1}^T\in\mathbb{R}^T$ is the predicted baseline demand, $x\in\mathbb{R}^m$ is the demand forecast feature vector, and $f(\cdot)$ is the baseline demand prediction network with model parameters $\beta$.
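A minimal PyTorch sketch of the forecast module (2) as a fully connected MLP; the feature dimension, hidden widths, and batch size are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

T, m = 24, 8   # horizon and feature dimension (illustrative choices)

# f(x; beta): maps forecast features to a T-dimensional baseline demand, Eq. (2)
baseline_net = nn.Sequential(
    nn.Linear(m, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, T),
)

x = torch.randn(5, m)       # a batch of 5 scenarios of forecast features
D_hat = baseline_net(x)     # predicted baselines, shape (5, T), differentiable
```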

III-C Joint baseline grounding and agent model identification

The goal of the system operator is to jointly identify the DR agent model $\theta=\{\alpha,\underline{P},\overline{P},\underline{E},\overline{E}\}$ and the baseline demand $\{D_t\}_{t=1}^T$. The challenge is that the system operator cannot directly observe the agent's response $\{y_t\}_{t=1}^T$ to the incentive price signal $\{\lambda_t\}_{t=1}^T$. Instead, the system operator only observes the agent's net demand $\{D_t+y_t\}_{t=1}^T$, the sum of the agent's baseline demand and demand response. Existing works usually take a two-step approach: first estimate the baseline demand using historical demand data, then subtract the baseline estimate from the observed net demand for agent DR model identification. Yet, as discussed in the literature review, in this approach the system operator must have access to a sufficient amount of "vanilla" demand data to ensure the baseline estimation is fair and accurate. Such data might not be available if the participant has been constantly responding to DR signals.

Different from the two-step approach, we propose an end-to-end differentiable learning approach that jointly minimizes the baseline prediction and agent model identification errors over a set of historical net demand measurements consisting of $N$ days or scenarios, each including $T$ time steps. In this case, the system operator solves the following problem

$$\min_{\theta,\beta}\; L := \frac{1}{2N}\sum_{i=1}^{N}\sum_{t=1}^{T}\Big(\hat{D}_{i,t}+y^*_{i,t}-z_{i,t}\Big)^2 \quad (3a)$$
$$\text{s.t. } \{\hat{D}_{i,t}\}_{t=1}^T = f(x_i;\beta) \quad (3b)$$
$$\{y^*_{i,t}\}_{t=1}^T \in \arg\min_{y}\; g(\{\hat{D}_{i,t},y_{i,t}\}_{t=1}^T;\{\lambda_{i,t}\}_{t=1}^T,\theta) \ \text{ s.t. (1b) and (1c)} \quad (3c)$$

in which the training / identification inputs include

  • $z_{i,t}$ is the net demand measurement for scenario $i$ and time period $t$;

  • $x_i$ are the baseline demand prediction features;

  • $\lambda_{i,t}$ are the demand response incentives for scenario $i$ and time period $t$;

  • $f(\cdot)$ is the MLP model used for baseline demand prediction;

  • $g(\cdot)$ is the agent demand response model;

and the problem has the following variables

  • $\theta$ are the DR agent model parameters;

  • $\beta$ are the MLP network weights used for the baseline demand model;

  • $\hat{D}_{i,t}$ is the baseline demand for scenario $i$ and time period $t$;

  • $y^*_{i,t}$ is the DR agent response decision for scenario $i$ and time period $t$.

The joint identification problem minimizes the mean squared loss between the predicted net demand $(\hat{D}_{i,t}+y^*_{i,t})$ and the observed net demand $z_{i,t}$ over $N$ training samples and horizon $T$. Fig. 1 visualizes the proposed end-to-end joint baseline demand estimation and agent model identification framework. Specifically, the top green block in Fig. 1 is the baseline demand forecast module, which maps the demand forecast features to a baseline demand prediction; the bottom green block is the demand response forecast, which takes DR signals and predicts the DR agent response. The sum of these two intermediate predictions constructs the net demand prediction $(\hat{D}_i+y^*_i)$, which is compared against the true net demand data $z_i$ in the model loss function (3a).

Figure 1: Illustration of the proposed end-to-end joint DR agent identification and baseline demand estimation framework.

IV Training Method

We propose a gradient-descent approach to solve the joint baseline and DR agent identification problem (3). We start by taking the gradient of the training objective $L$ with respect to the model parameters $\theta$ and $\beta$:

$$\frac{\partial L}{\partial\theta} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\Big(\hat{D}_{i,t}+y^*_{i,t}-z_{i,t}\Big)\Big(\frac{\partial y^*_t}{\partial\theta}\Big) \quad (4a)$$
$$\frac{\partial L}{\partial\beta} = \frac{1}{N}\sum_{i=1}^{N}\sum_{t=1}^{T}\Big(\hat{D}_{i,t}+y^*_{i,t}-z_{i,t}\Big)\Big(\frac{\partial\hat{D}_t}{\partial\beta}+\frac{\partial y^*_t}{\partial\beta}\Big) \quad (4b)$$

and we are now interested in calculating the three partial derivative terms $\frac{\partial y^*_t}{\partial\theta}$, $\frac{\partial\hat{D}_t}{\partial\beta}$, $\frac{\partial y^*_t}{\partial\beta}$. The demand response $y^*_t$ depends on the baseline demand coefficients $\beta$ because the discomfort coefficient $\alpha$ depends on the baseline $D_t$, as shown in (1). Also note that $\frac{\partial\hat{D}_t}{\partial\beta}$ is the gradient of the MLP, which can be easily computed via backpropagation. Calculating the gradients of the DR response $y^*_{i,t}$ is more involved and is the focus of our proposed method.

IV-A Gradients of the DR response

We use the OptNet architecture [5] to represent the agent identification problem (1). The OptNet architecture leverages prior knowledge about the optimization problem, thereby favoring sample efficiency. To obtain the demand response forecast, it solves the optimization problem in the forward pass with the learned parameters. It then compares the optimum of the optimization problem under the learned parameters with the optimum of the true optimization problem, and uses the difference as the loss function to update the unknown optimization parameters.

Our objective is to calculate $\frac{\partial y^*_t}{\partial\theta}$ and backpropagate the training loss with respect to the agent model parameters $\theta$. To this end, we ignore the dependency of $\alpha$ on $D_t$ for now and treat it as a parameter instead of a function; in the next subsection on end-to-end training, we incorporate the dependency of $\alpha$ on $D_t$ using the chain rule. We first convert the agent model into a set of equations using the Karush–Kuhn–Tucker (KKT) conditions. The KKT conditions of optimization problem (1) can be written as, for $t=1,\dotsc,T$,

$$\lambda_t + \alpha y_t^* + \overline{\mu}_t^* - \underline{\mu}_t^* + \sum_{\tau=t}^{T}\overline{\nu}_\tau^* - \sum_{\tau=t}^{T}\underline{\nu}_\tau^* = 0 \quad (5a)$$
$$\overline{\mu}_t^*(y_t^*-\overline{P}) = 0\,,\quad \overline{\mu}_t^* \geq 0 \quad (5b)$$
$$\underline{\mu}_t^*(\underline{P}-y_t^*) = 0\,,\quad \underline{\mu}_t^* \geq 0 \quad (5c)$$
$$\overline{\nu}_t^*\Big(\sum_{\tau=1}^{t}y_\tau^*-\overline{E}\Big) = 0\,,\quad \overline{\nu}_t^* \geq 0 \quad (5d)$$
$$\underline{\nu}_t^*\Big(\underline{E}-\sum_{\tau=1}^{t}y_\tau^*\Big) = 0\,,\quad \underline{\nu}_t^* \geq 0 \quad (5e)$$

where (5a) is the stationarity condition of the DR agent problem (1), and (5b)–(5e) are the complementary slackness conditions associated with constraints (1b) and (1c). $y^*,\underline{\mu}^*,\overline{\mu}^*,\underline{\nu}^*,\overline{\nu}^*$ are the optimal primal and dual variables.

Next, we calculate the derivative of the optimal response $y_t^*$ with respect to the model parameters $\theta=\{\alpha,\underline{P},\overline{P},\underline{E},\overline{E}\}$. We take the total derivatives of the KKT conditions and summarize them in the compact matrix form given in Eq. (6),

$$\underbrace{\begin{bmatrix}\Lambda(\mathbbm{1}\alpha)&I&-I&\Gamma^{\top}&-\Gamma^{\top}\\ \Lambda(\overline{\mu}^*)&\Lambda(y^*-\mathbbm{1}\overline{P})&0&0&0\\ -\Lambda(\underline{\mu}^*)&0&\Lambda(\mathbbm{1}\underline{P}-y^*)&0&0\\ \Lambda(\overline{\nu}^*)\Gamma&0&0&\Lambda(\Gamma y^*-\mathbbm{1}\overline{E})&0\\ -\Lambda(\underline{\nu}^*)\Gamma&0&0&0&\Lambda(\mathbbm{1}\underline{E}-\Gamma y^*)\end{bmatrix}}_{A}\begin{bmatrix}dy\\ d\overline{\mu}\\ d\underline{\mu}\\ d\overline{\nu}\\ d\underline{\nu}\end{bmatrix} = -\begin{bmatrix}\Lambda(y^*)\mathbbm{1}\,d\alpha\\ -\Lambda(\overline{\mu}^*)\mathbbm{1}\,d\overline{P}\\ \Lambda(\underline{\mu}^*)\mathbbm{1}\,d\underline{P}\\ -\Lambda(\overline{\nu}^*)\mathbbm{1}\,d\overline{E}\\ \Lambda(\underline{\nu}^*)\mathbbm{1}\,d\underline{E}\end{bmatrix} \quad (6)$$

where $\mathbbm{1}$ is an all-ones vector, $\Gamma$ is the lower triangular matrix of ones with appropriate dimension (so that $\Gamma y$ gives the running sums in (1c)), and $\Lambda(\cdot)$ creates a diagonal matrix from a vector. The derivation of Eq. (6) is provided in the Appendix.

Our goal is to obtain the derivative of the optimal primal solution $y^*$ with respect to the optimization parameters $\theta=\{\alpha,\underline{P},\overline{P},\underline{E},\overline{E}\}$ (the Jacobians of the optimal dual solutions with respect to the optimization parameters can be computed with the same approach). We divide both sides of (6) by the parameter differentials $d\alpha, d\underline{P}, d\overline{P}, d\underline{E}, d\overline{E}$ respectively, and multiply both sides by $A^{-1}$:

$$\begin{bmatrix}\frac{dy}{d\alpha}\;\frac{d\overline{\mu}}{d\alpha}\;\frac{d\underline{\mu}}{d\alpha}\;\frac{d\overline{\nu}}{d\alpha}\;\frac{d\underline{\nu}}{d\alpha}\end{bmatrix}^{\top} = -A^{-1}\begin{bmatrix}y^*\;\vec{0}\;\vec{0}\;\vec{0}\;\vec{0}\end{bmatrix}^{\top} \quad (7a)$$
$$\begin{bmatrix}\frac{dy}{d\overline{P}}\;\frac{d\overline{\mu}}{d\overline{P}}\;\frac{d\underline{\mu}}{d\overline{P}}\;\frac{d\overline{\nu}}{d\overline{P}}\;\frac{d\underline{\nu}}{d\overline{P}}\end{bmatrix}^{\top} = -A^{-1}\begin{bmatrix}\vec{0}\;-\overline{\mu}^*\;\vec{0}\;\vec{0}\;\vec{0}\end{bmatrix}^{\top} \quad (7b)$$
$$\begin{bmatrix}\frac{dy}{d\underline{P}}\;\frac{d\overline{\mu}}{d\underline{P}}\;\frac{d\underline{\mu}}{d\underline{P}}\;\frac{d\overline{\nu}}{d\underline{P}}\;\frac{d\underline{\nu}}{d\underline{P}}\end{bmatrix}^{\top} = -A^{-1}\begin{bmatrix}\vec{0}\;\vec{0}\;\underline{\mu}^*\;\vec{0}\;\vec{0}\end{bmatrix}^{\top} \quad (7c)$$
$$\begin{bmatrix}\frac{dy}{d\overline{E}}\;\frac{d\overline{\mu}}{d\overline{E}}\;\frac{d\underline{\mu}}{d\overline{E}}\;\frac{d\overline{\nu}}{d\overline{E}}\;\frac{d\underline{\nu}}{d\overline{E}}\end{bmatrix}^{\top} = -A^{-1}\begin{bmatrix}\vec{0}\;\vec{0}\;\vec{0}\;-\overline{\nu}^*\;\vec{0}\end{bmatrix}^{\top} \quad (7d)$$
$$\begin{bmatrix}\frac{dy}{d\underline{E}}\;\frac{d\overline{\mu}}{d\underline{E}}\;\frac{d\underline{\mu}}{d\underline{E}}\;\frac{d\overline{\nu}}{d\underline{E}}\;\frac{d\underline{\nu}}{d\underline{E}}\end{bmatrix}^{\top} = -A^{-1}\begin{bmatrix}\vec{0}\;\vec{0}\;\vec{0}\;\vec{0}\;\underline{\nu}^*\end{bmatrix}^{\top} \quad (7e)$$

Note that the derivative of a parameter with respect to itself (for example, $\frac{d\alpha}{d\alpha}$) is one, while the derivatives with respect to different parameters (for example, $\frac{d\overline{P}}{d\alpha}$) are zero. We thus obtain $\frac{\partial y^*}{\partial\alpha},\frac{\partial y^*}{\partial\overline{P}},\frac{\partial y^*}{\partial\underline{P}},\frac{\partial y^*}{\partial\overline{E}},\frac{\partial y^*}{\partial\underline{E}}$ as the first block of (7a)–(7e), respectively.
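As a sanity check on these implicit gradients, consider an interior solution where no constraint in (1b)–(1c) is active: the stationarity condition (5a) reduces to $\lambda_t+\alpha y_t^*=0$, so $y^*=-\lambda/\alpha$ and $dy^*/d\alpha=\lambda/\alpha^2$, consistent with (7a) when $A$ reduces to its stationarity block $\alpha I$. The sketch below (all numbers illustrative) checks this against central finite differences.

```python
import numpy as np

# Illustrative interior case: no constraint of (1b)-(1c) is active.
lam = np.array([3.0, -1.0, 2.0])
alpha = 2.0

def y_star(a):
    # closed-form interior response from stationarity: lam + a * y = 0
    return -lam / a

# Implicit gradient: differentiating lam + a*y = 0 in a gives
# dy/da = -y*/a = lam / a^2
grad_implicit = lam / alpha**2

# Central finite-difference check of the same derivative
eps = 1e-6
grad_fd = (y_star(alpha + eps) - y_star(alpha - eps)) / (2 * eps)
```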

Remark 1. An implicit assumption in the above derivation of the Jacobian terms $\frac{\partial y^*}{\partial\alpha},\frac{\partial y^*}{\partial\overline{P}},\frac{\partial y^*}{\partial\underline{P}},\frac{\partial y^*}{\partial\overline{E}},\frac{\partial y^*}{\partial\underline{E}}$ is that the optimal solution is continuously differentiable in a neighborhood of the optimization parameters. However, for a general convex optimization problem, the optimal solution set is not necessarily continuously differentiable in the optimization parameters. Paper [27] provides sufficient conditions under which these Jacobians are well-defined. We consider the DR agent optimization model as a Quadratic Program (QP), which satisfies Assumptions 1–3 in [27]; thus the Jacobians are well-defined. Beyond the quadratic DR agent model, our framework also applies to general convex DR agent models in which both the objective function and constraints are twice continuously differentiable in the optimization variable $y$ and continuously differentiable in the optimization parameters.

IV-B End-to-end training

Recall that we consider the problem setting where the system operator only has access to the historical demand response incentive signals $\lambda_i$, baseline demand features $x_i$, and net demand measurements $z_i$ (baseline demand + demand response). The system operator solves the joint learning problem (3) to jointly identify the baseline demand and agent DR model. To this end, we propose an end-to-end (E2E) training framework that combines the gradients of the baseline demand forecast module and the demand response agent module. The model parameters are updated as follows at iteration $k$:

$$\text{Baseline: }\beta^{(k+1)} \leftarrow \beta^{(k)} + \eta_1\Big(\frac{\partial L}{\partial D} + \frac{\partial L}{\partial y^*}\frac{\partial y^*}{\partial\alpha}\frac{\partial\alpha}{\partial D}\Big)\frac{\partial D}{\partial\beta}\,, \quad (8a)$$
$$\text{DR agent: }\theta^{(k+1)} \leftarrow \theta^{(k)} + \eta_2\frac{\partial L}{\partial y^*}\frac{\partial y^*}{\partial\theta}\,, \quad (8b)$$

where $\eta_1$ and $\eta_2$ are the learning rates for the baseline and DR agent modules. $\frac{\partial L}{\partial D}$ and $\frac{\partial L}{\partial y^*}$ are the gradients of the loss function with respect to the intermediate baseline demand and agent response predictions. $\frac{\partial D}{\partial\beta}$ is the gradient of $D$ with respect to the baseline demand forecast model weights; for the MLP network used in this paper, we compute it via backpropagation in PyTorch. The gradient update of the baseline module also includes $\frac{\partial y^*}{\partial\alpha}\frac{\partial\alpha}{\partial D}$ because we assume the DR agent's discomfort coefficient $\alpha$ depends on the baseline demand. $\frac{\partial y^*}{\partial\alpha}$ is calculated in (7a), and $\frac{\partial\alpha}{\partial D}$ depends on the agent model. Gradient updates of the DR agent module are based solely on $\frac{\partial y^*}{\partial\theta}$, which we calculate according to (7). Notably, this training step has a separable structure: the demand forecast neural network module shares no parameters with the differentiable optimization module, hence their gradients do not impact each other.

Compared to classic two-step methods that first predict the baseline demand and then estimate the agent DR model and response amount, our end-to-end framework alleviates error accumulation since both modules are optimized simultaneously. Compared with a fully connected network over all features, our design embeds prior knowledge about the agent behavior (i.e., the utility optimization problem), which reduces the sample and computational complexity by orders of magnitude while providing better model explainability.

The training process is summarized as follows for a given training dataset $\{x_i,\lambda_i,z_i\}$:

  1. Initialize the unknown parameters of the baseline forecast module $\beta^{(1)}$ and the DR agent module $\theta^{(1)}$; set the iteration number $k$ to 1.

  2. At the $k$-th iteration:

    (a) Forward the baseline demand prediction module (2) with feature input $x_i$ and parameters $\beta^{(k)}$; record the baseline prediction result $\hat{D}_i^{(k)}$.

    (b) Forward solve the DR problem (1) with price $\lambda_i$ and parameters $\theta^{(k)}$ using an optimization solver; record the primal results ${y_i^*}^{(k)}$ and the dual results ${\underline{\mu}_i^*}^{(k)}, {\overline{\mu}_i^*}^{(k)}, {\underline{\nu}_i^*}^{(k)}, {\overline{\nu}_i^*}^{(k)}$.

    (c) Add up the two predictions to form the net demand prediction $\hat{D}_i^{(k)}+{y_i^*}^{(k)}$; compare it against the observed net demand $z_i$ to obtain the loss $L=\|\hat{D}_i^{(k)}+{y_i^*}^{(k)}-z_i\|_2^2$.

    (d) Update $\beta^{(k+1)}$ and $\theta^{(k+1)}$ with Eq. (8a) and (8b). Proceed to the next iteration.
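The training steps above can be sketched compactly in PyTorch under the simplifying assumption that the agent's response is interior (no active constraints), so step (b) has the differentiable closed form $y^*=-\lambda/\alpha$ and no explicit QP solver is needed; a real implementation would replace this closed form with the differentiable optimization layer of Section IV-A. All sizes, learning rates, and the synthetic ground truth below are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, m, N = 6, 4, 256            # horizon, feature dim, scenarios (illustrative)

# Synthesize measurements from a known ground truth (alpha_true = 2):
# net demand z = baseline D + response y, with interior response y = -lam/alpha.
alpha_true = 2.0
W_true = torch.randn(m, T)
x = torch.randn(N, m)                  # baseline forecast features
lam = torch.rand(N, T)                 # DR incentive prices
z = x @ W_true - lam / alpha_true      # observed net demand

# Learnable modules: baseline MLP f(x; beta) and agent parameter theta = {alpha}
f = nn.Sequential(nn.Linear(m, 32), nn.ReLU(), nn.Linear(32, T))
log_alpha = torch.zeros((), requires_grad=True)    # parametrize alpha > 0

opt = torch.optim.Adam([*f.parameters(), log_alpha], lr=1e-2)
losses = []
for k in range(500):
    D_hat = f(x)                          # step (a): baseline forecast
    y = -lam / log_alpha.exp()            # step (b): interior agent response
    loss = ((D_hat + y - z) ** 2).mean()  # step (c): net-demand loss, cf. (3a)
    opt.zero_grad()
    loss.backward()                       # step (d): joint backpropagation
    opt.step()
    losses.append(loss.item())
```

Because the incentive $\lambda$ is independent of the forecast features $x$, the baseline network cannot absorb the response term, and the net-demand loss alone drives the estimated $\alpha$ toward its ground-truth value.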

Remark 2. Warm start. In experiments, we observed that warm starting the baseline demand forecast module using net demand measurements could improve performance. In particular, the warm start is stated as

$$\min_{\beta}\; \frac{1}{N}\sum_{i=1}^{N}\|\hat{D}_i-z_i\|^2 \ \text{ s.t. } \hat{D}_i=f(x_i;\beta) \quad (9)$$

We note that the warm start trains the baseline demand prediction model using net demand measurements $z_i$ rather than historical baseline demand (which the system operator may not have access to). The optimized model parameters $\beta$ are then used as a warm starting point for the end-to-end training stage, in which $\beta$ is optimized together with the DR agent model parameters $\theta$ following Eq. (8a) and (8b).
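A minimal sketch of the warm-start step (9), fitting the baseline network directly to net demand measurements; the sizes and the synthetic linear ground truth are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, m, T = 128, 4, 6                    # sizes are illustrative
x = torch.randn(N, m)                  # forecast features
z = x @ torch.randn(m, T)              # synthetic net-demand measurements

# Warm start: pretrain f(x; beta) against net demand z, Eq. (9)
f = nn.Sequential(nn.Linear(m, 32), nn.ReLU(), nn.Linear(32, T))
opt = torch.optim.Adam(f.parameters(), lr=1e-2)
losses = []
for _ in range(300):
    loss = ((f(x) - z) ** 2).mean()    # fit baseline prediction to net demand
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())
# f.parameters() now serve as the initialization beta for end-to-end training
```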

Remark 3. Demand response dependency on the baseline. We end this section by discussing how the proposed framework handles the dependency of the demand response on the baseline demand. In some demand response scenarios, such as load shedding and peak shaving, the response depends on the baseline demand. In other scenarios, such as behind-the-meter energy storage price response, the change in demand may not depend on the baseline. This difference is captured by the term \frac{\partial\alpha}{\partial D}, which is nonzero if the disutility parameter \alpha depends on the baseline demand estimate D and zero otherwise. If there is no dependency, we set this term to zero and the baseline gradient update reduces to \beta^{(k+1)}\leftarrow\beta^{(k)}+\eta_{1}\frac{\partial L}{\partial D}\frac{\partial D}{\partial\beta}.

V Experiment Results and Discussion

We demonstrate the performance of the proposed framework in identifying the DR agent model and baseline demand with both a synthetic dataset and a real-world demand response dataset. In the synthetic-data experiment, we compare the identification results with the ground-truth DR model; with the real-world dataset, we compare end-to-end training against a benchmark baseline estimation method that uses a control group. We use PyTorch to build the models and run training in Google Colab with GPU acceleration.

V-A Demand response identification with synthetic dataset

We generate DR data using real demand profiles as the baseline, with responses to price signals added. For the baseline demand, we assume a demand response aggregator works directly with the aggregated measurement, which is viewed as one responding agent. We use smart meter data from the Iowa Distribution Test Systems Dataset [28]. The system consists of 3 feeders, 240 nodes, and 1120 customers, all equipped with smart meters that measure hourly energy consumption (kWh) for the year 2017. The electricity consumption data is at the secondary distribution transformer level. In particular, we use the aggregated demand data from Feeder 1, Bus 1004 for the experiment. We also include the temperature data of Columbus from the Climate Data Online (CDO) portal of the National Climatic Data Center (NCDC) [29] as baseline load forecast features. We use 200 days of data for training and 60 days for testing the demand response predictions made with the identified parameters. Table I provides detailed statistics of the demand profile.

TABLE I: Summary of Statistics of the Iowa Distribution Bus 1004
Max (kW) Min (kW) Mean (kW) Std (kW)
41.534 2.401 11.332 6.166
Figure 2: Visualization example of training and testing data: (a) DR price; (b) baseline and net demand of the aggregation dataset (\alpha=25, M=4).

For demand response, we consider a demand-independent demand response agent model in this case study as follows

min{yt}t=1Tλtyt+α2yt2 s.t. Mt=1TytM\displaystyle\textstyle\min_{\{y_{t}\}}\sum_{t=1}^{T}\lambda_{t}y_{t}+\frac{\alpha}{2}y_{t}^{2}\text{ s.t. }-M\leq\sum_{t=1}^{T}y_{t}\leq M (10)

where \alpha is the comfort deviation coefficient and M represents the limit on the total demand change within the optimization period. To test performance across different agent models, we randomly sample the ground truth \alpha\sim\text{Uniform}[10,50] and M\sim\text{Uniform}[1,10] for each test. \lambda is the time-of-use tariff, for which we use day-ahead LMP data from NYISO [30]. We assume a day-ahead scheduled DR, so the optimization horizon is T=24 (24 hours).
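Problem (10) in fact admits a simple closed form, which we sketch below: the unconstrained optimum is y_{t}=-\lambda_{t}/\alpha, and when the total-change budget binds, the KKT stationarity condition shifts every period by a common multiplier \mu so that \sum_{t}y_{t}=\pm M. This is an illustrative sketch, not the OptNet layer used in the experiments.

```python
def dr_response(lam, alpha, M):
    """Closed-form solution of (10): min sum_t lam_t*y_t + (alpha/2)*y_t^2
    subject to -M <= sum_t y_t <= M."""
    T = len(lam)
    y = [-l / alpha for l in lam]          # unconstrained optimum
    s = sum(y)
    if s > M:                              # upper budget binds: sum_t y_t = M
        mu = -(sum(lam) + alpha * M) / T
        y = [-(l + mu) / alpha for l in lam]
    elif s < -M:                           # lower budget binds: sum_t y_t = -M
        mu = -(sum(lam) - alpha * M) / T
        y = [-(l + mu) / alpha for l in lam]
    return y
```

When the budget binds, the common shift \mu is the dual variable of the total-change constraint.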

We use the above agent model (10) and the corresponding day-ahead LMP [30] to generate synthetic demand response data for each sampled pair (\alpha, M). The baseline demand data and the synthetic demand response data are combined to form the net measurement data set. Fig. 2 visualizes an example of the baseline demand and net demand for one day. Throughout our experiments, all MLP baseline demand forecast models are feed-forward neural networks with three hidden layers of (200, 100, 100) hidden units, and the demand response module uses the OptNet architecture defined on optimization problem (10). We use the Adam optimizer with a learning rate of 1e-3 for the base load forecast module and 1e-1 for the OptNet layer. For evaluation, we repeat the experiment N=10 times with different draws of (\alpha,M) and report the average performance and standard deviations. Table II summarizes the mean absolute error (MAE) and standard deviations of the parameter identification results.

TABLE II: Agent Model Identification Errors (MAE).
Discomfort coefficient α\alpha Limit of total demand change MM
Mean Std Mean Std
1.178 0.574 2.164 0.948

In addition to agent demand response model identification, we compare the baseline forecast performance in two settings. The first is a-priori baseline prediction, in which we directly use the trained baseline forecast model to predict demand over a 24-hour horizon; a-priori prediction is useful for resource scheduling and estimating feeder loads. The second is ex-post baseline estimation, in which we estimate the baseline by subtracting the demand response component (computed from the identified response model parameters) from the net demand; ex-post estimation is useful for identifying the effective demand response and settling payments.
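The ex-post estimate can be sketched as follows, assuming for brevity the unconstrained response y^{*}_{t}=-\lambda_{t}/\hat{\alpha} of (10) (budget constraint omitted); the names are illustrative, not the paper's implementation.

```python
def ex_post_baseline(z, lam, alpha_hat):
    """Ex-post baseline estimate D_hat_t = z_t - y*_t, where
    y*_t = -lam_t / alpha_hat is the identified (unconstrained) response."""
    return [z_t + lam_t / alpha_hat for z_t, lam_t in zip(z, lam)]
```

Because the identified response is subtracted from the measured net demand, ex-post errors depend only on the accuracy of the identified agent model, not on the a-priori demand forecast.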

TABLE III: Baseline demand estimation error metrics (Mean absolute error (MAE)/Mean absolute Percentage Error (MAPE)).
A-priori Baseline Estimation Ex-post Baseline Estimation
MAE MAPE MAE MAPE
2.93 27.10% 0.09 1.01%

Results in Table III show a relatively large a-priori baseline prediction error, reflecting the challenge of day-ahead demand prediction with limited training data and features (we train the model over 200 days and test over the remaining 60 days). The ex-post baseline estimation results, however, are substantially better. As a basis for comparison, the MAE and MAPE between the net demand (baseline + response) and the baseline are 0.44 kW / 4.77% for the aggregation case. Hence, our method provides a significant improvement in ex-post baseline estimation, reducing the baseline estimation error by about 80%. Finally, Fig. 3 shows an example of the end-to-end approach's performance in demand response identification, a-priori baseline estimation, and ex-post baseline estimation over 5 consecutive test days.

Figure 3: Visualization of DR identification and a-priori/ex-post baseline demand estimation results over 5 consecutive test days: (a) demand response prediction; (b) a-priori base load estimation; (c) ex-post base load estimation. [True M=5.039, predicted M=5.957; true \alpha=16.447, predicted \alpha=16.264.]

V-B Real world demand response dataset

We conduct identification experiments using real-world demand response data from the Low Carbon London project [31], which includes demand profiles at 30-minute granularity for the full year 2013 from 1,100 households on a time-of-use (ToU) tariff and, for comparison, 4,400 households on a non-ToU flat rate (14.23 pence/kWh), all located in the London area. The ToU tariff is 11.76 pence/kWh by default, 67.20 pence/kWh during high-price periods, and 3.99 pence/kWh during low-price periods. High-price periods were activated 69 times with a total duration of 394 hours, and low-price periods were activated 93 times with a total duration of 830 hours. Customers are grouped into three categories based on income level: affluent, comfortable, and adversity; each category contains both ToU and non-ToU customers. Fig. 4 shows an example of the average consumer demand from the ToU and non-ToU groups.

Figure 4: Example of average consumer demand from the non-ToU and ToU groups.

We design the agent optimization model as

min{yt}t=1Tλtyt+0.5α1DtDmin([yt]+)2+0.5α2DtDmin([yt]+)2\displaystyle\textstyle\min_{\{y_{t}\}}\sum_{t=1}^{T}\lambda_{t}y_{t}+\frac{0.5\alpha_{1}}{D_{t}-D_{\mathrm{min}}}([y_{t}]^{+})^{2}+\frac{0.5\alpha_{2}}{D_{t}-D_{\mathrm{min}}}([-y_{t}]^{+})^{2} (11)

where [x]^{+}=\max\{0,x\} is the positive part function, D_{t} is the baseline demand, and D_{\mathrm{min}} is the annual minimum demand, representing the non-responsive demand (also called the “ghost demand”). Hence, we assume the disutility coefficient is inversely proportional to the responsive baseline demand and takes different values for demand increments (\alpha_{1}) and demand reductions (\alpha_{2}). We do not include daily total response constraints in this design because there is at most one demand response event per day in the considered dataset.

The input features to the baseline demand forecast MLP model are 6 dimensional: temperature, humidity, demand of the previous day and calendar information including hour/day/month.

TABLE IV: Comparison between the E2E model identification result and benchmark model on London DR Dataset
Method Affluent Comfortable Adversity
α1\alpha_{1} α2\alpha_{2} α1\alpha_{1} α2\alpha_{2} α1\alpha_{1} α2\alpha_{2}
E2E 896 -42 730 -28 829 -32
Benchmark 896 -42 665 -34 819 -34

We compare our end-to-end model with a benchmark method that estimates \alpha_{1} and \alpha_{2} using the control group. The benchmark method follows a regression approach whose features include the non-ToU control group demand and temperature. The baseline prediction MAE of the benchmark method over the default tariff period is 0.009 kW and its MAPE is 5.8%; \alpha_{1} and \alpha_{2} in (11) are then estimated by least squares from the difference between the actual demand and the predicted baseline during the high- and low-tariff periods.
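The least-squares step of such a benchmark can be sketched as follows. Under model (11), the first-order condition for a period with a pure demand reduction (y_{t}<0) gives y_{t}=-\lambda_{t}(D_{t}-D_{\mathrm{min}})/\alpha_{2}, so the benchmark can fit k=1/\alpha by ordinary least squares and invert. The numbers and function name are hypothetical illustrations, not the benchmark's actual code.

```python
def estimate_alpha(responses, lams, Ds, D_min):
    """OLS fit of k = 1/alpha in y_t = -lam_t * (D_t - D_min) * k, then invert.
    responses: observed demand deviations (actual - predicted baseline)."""
    gs = [-l * (D - D_min) for l, D in zip(lams, Ds)]
    k = sum(g * y for g, y in zip(gs, responses)) / sum(g * g for g in gs)
    return 1.0 / k
```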

Table IV shows the estimates of \alpha_{1} and \alpha_{2} as formulated in (11) for the three aggregated demand groups. Our method generates results very similar to the benchmark method, which has access to historical baseline demand information through a control group; note that the end-to-end method assumes no prior knowledge about the baseline demand and does not use demand profiles from the non-ToU control group. Both sets of identification results are consistent with the study results from the London Project [31]. E2E training yields a higher \alpha_{1} in the Comfortable and Adversity groups, possibly due to different error biases in the baseline prediction compared to the benchmark method; note that the benchmark method still has a baseline prediction MAPE of 5.8%.

V-C Ablation studies: additive Gaussian noise

As observed in Section V-A, the agent DR model identification module is robust to baseline estimation errors: even though the a-priori baseline load estimate contains large errors, the demand response identification and the ex-post baseline estimation remain accurate. In this section, we conduct investigative studies to understand why the OptNet layer is robust to baseline estimation error, and how robust the model is to different levels of prediction error. To examine the effect of prediction errors, we consider a setting where the incentive signal \lambda_{t} and noisy demand response measurements \hat{y}^{*} are available. In particular, we inject zero-mean Gaussian noise with different standard deviations into the ground-truth demand response optima y^{*}. By varying the standard deviation of the noise, we simulate varying levels of prediction error, i.e., a larger standard deviation corresponds to a prediction model with higher error.
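The noise-injection setup, together with a least-squares recovery of \alpha, can be sketched as below for the unconstrained response y^{*}_{t}=-\lambda_{t}/\alpha (a simplification of (10) with the budget constraint dropped). All names and values are illustrative assumptions.

```python
import random

def noisy_responses(lams, alpha, sigma, seed=0):
    """Ground-truth unconstrained responses y*_t = -lam_t/alpha plus
    zero-mean Gaussian noise of standard deviation sigma."""
    rng = random.Random(seed)
    return [-l / alpha + rng.gauss(0.0, sigma) for l in lams]

def identify_alpha(lams, ys):
    """Least-squares fit of 1/alpha from y ~ -lam/alpha, then invert."""
    return sum(l * l for l in lams) / sum(-l * y for l, y in zip(lams, ys))
```

As in Table V, the recovered \alpha degrades gracefully as sigma grows, since the zero-mean noise averages out in the least-squares fit.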

TABLE V: Demand response model identification error under different additive Gaussian noise.
Parameter σ=0\sigma=0 σ=0.5\sigma=0.5 σ=1\sigma=1 σ=2\sigma=2 σ=3\sigma=3 σ=5\sigma=5
α\alpha (MAE) 1.23e-5 0.062 0.119 0.252 0.380 0.641
α\alpha (MAPE) 0 0.26% 0.48% 1.04% 1.57% 2.64%
MM (MAE) 1.021e-4 0.159 0.258 0.636 0.955 1.591
MM (MAPE) 0 2.57% 4.17% 10.28% 15.43% 25.7%

Table V shows the DR model identification results under different levels of Gaussian noise. Note that \sigma=0 corresponds to no Gaussian noise; in practice, this is the scenario where the users' true baseline demands are available. As the noise level \sigma increases, the parameter identification error increases. Overall, however, the results demonstrate that the implicit optimization layer is robust to erroneous input data and can recover the underlying optimization problem parameters even under high-magnitude noise.

In addition, we plot the residual error of the baseline demand forecast network trained with historical data in Section V-A. As shown in Fig. 5, the residual error roughly follows a normal distribution with near-zero mean (an unbiased estimator) and a standard deviation of 3.5 (relatively high variance). Together with the ablation results in Table V, this shows that the agent model can still achieve accurate parameter estimation as long as the residual error of the baseline estimate has near-zero mean. The robustness of the implicit differentiable layer is an attractive property, since it supports the applicability of the proposed approach under incomplete information (i.e., no true baseline) and under erroneous or noisy predictions and measurements. An interesting future direction is to formally study the robustness of OptNet for DR model identification.

Refer to caption
Figure 5: Visualization of residual error of the baseline estimation module trained with historical baseline data. The residual error can be roughly fitted by a Gaussian distribution with mean 0.1-0.1 and standard deviation 3.53.5.

V-D Ablation studies: time-varying agent model

In all previous experiments, the demand response and baseline models are trained on the collected data under the assumption that the agent model is time-invariant. In practice, however, users' preferences and flexibility for demand response may vary over time. For instance, a user might be less willing to reduce the energy usage of an air conditioner on an extremely hot day than on a mild day (a higher \alpha and a smaller M in the former case). This is challenging because the system operator does not know when or how the agent DR model changes.

We performed an experiment to mimic such effects, in which the data for training the demand response and baseline models are generated from a mixture of two agent models. As shown in Figure 6, the user model in the green area is \alpha=20, M=3 (more responsive), and in the red area it is \alpha=50, M=0.5 (less responsive). The model identified by our approach is \hat{\alpha}=28.57, \hat{M}=1.749, which is close to the mean of the two models. For future work, it would be interesting to investigate clustering or classification methods that first separate data from different user response modes and then run parameter identification per mode.

Refer to caption
Figure 6: Samples of baseline demand and net measurement with time-varying agent response model.

VI Conclusion

In this work, we developed an end-to-end deep learning model to jointly identify the demand response baseline and the agent decision-making model. We proposed a gradient-descent-based approach to solve the joint learning problem, and tested it on both synthetic and real-world demand response datasets. Results show that our approach provides estimation performance that is robust to baseline prediction errors and measurement noise. In future work, we plan to expand this learning framework with more sophisticated demand response agent models and apply it in dynamic identification settings such as rolling hour-ahead predictions and time-varying agent models. We are also interested in exploring learning stability, especially between the neural network and the implicit optimization model, and in investigating more efficient training algorithms for learning models with more complex structures.

References

  • [1] E. D. Cartwright, “FERC Order 2222 gives boost to DERs,” 2020.
  • [2] Z. Hu, J.-h. Kim, J. Wang, and J. Byrne, “Review of dynamic pricing programs in the us and europe: Status quo and policy recommendations,” Renewable and Sustainable Energy Reviews, vol. 42, pp. 743–751, 2015.
  • [3] Y. Ji, P. Xu, P. Duan, and X. Lu, “Estimating hourly cooling load in commercial buildings using a thermal network model and electricity submetering data,” Applied Energy, vol. 169, pp. 309–323, 2016.
  • [4] N. G. Paterakis, O. Erdinç, and J. P. Catalão, “An overview of demand response: Key-elements and international experience,” Renewable and Sustainable Energy Reviews, vol. 69, pp. 871–891, 2017.
  • [5] B. Amos and J. Z. Kolter, “Optnet: Differentiable optimization as a layer in neural networks,” in International Conference on Machine Learning.   PMLR, 2017, pp. 136–145.
  • [6] Y. Zhang, W. Chen, R. Xu, and J. Black, “A cluster-based method for calculating baselines for residential loads,” IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2368–2377, 2015.
  • [7] L. Hatton, P. Charpentier, and E. Matzner-Løber, “Statistical estimation of the residential baseline,” IEEE Transactions on Power Systems, vol. 31, no. 3, pp. 1752–1759, 2015.
  • [8] D. Hong, S. Lei, J. L. Mathieu, and L. Balzano, “Exploration of tensor decomposition applied to commercial building baseline estimation,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2019.
  • [9] D. Muthirayan, D. Kalathil, K. Poolla, and P. Varaiya, “Mechanism design for self-reporting baselines in demand response,” in 2016 American Control Conference (ACC).   IEEE, 2016, pp. 1446–1451.
  • [10] J. L. Mathieu, D. S. Callaway, and S. Kiliccote, “Examining uncertainty in demand response baseline models and variability in automated responses to dynamic pricing,” in 2011 50th IEEE Conference on Decision and Control and European Control Conference.   IEEE, 2011, pp. 4332–4339.
  • [11] R. Fernández-Blanco, J. M. Morales, and S. Pineda, “Forecasting the price-response of a pool of buildings via homothetic inverse optimization,” Applied Energy, vol. 290, p. 116791, 2021.
  • [12] “Electric schedule e-slrp scheduled load reduction program.” [Online]. Available: https://www.pge.com/en_US/large-business/save-energy-and-money/energy-management-programs/demand-response-programs/scheduled-load-reduction.page
  • [13] D. G. Dobakhshari and V. Gupta, “A contract design approach for phantom demand response,” IEEE Transactions on Automatic Control, vol. 64, no. 5, pp. 1974–1988, 2018.
  • [14] A. Hassan, S. Acharya, M. Chertkov, D. Deka, and Y. Dvorkin, “A hierarchical approach to multienergy demand response: From electricity to multienergy applications,” Proceedings of the IEEE, vol. 108, no. 9, pp. 1457–1474, 2020.
  • [15] G. Hug, S. Kar, and C. Wu, “Consensus+ innovations approach for distributed multiagent coordination in a microgrid,” IEEE Transactions on Smart Grid, vol. 6, no. 4, pp. 1893–1903, 2015.
  • [16] A. Papavasiliou, H. Hindi, and D. Greene, “Market-based control mechanisms for electric power demand response,” in 49th IEEE Conference on Decision and Control (CDC).   IEEE, 2010, pp. 1891–1898.
  • [17] A. Paudel, L. Sampath, J. Yang, and H. B. Gooi, “Peer-to-peer energy trading in smart grid considering power losses and network fees,” IEEE Transactions on Smart Grid, vol. 11, no. 6, pp. 4727–4737, 2020.
  • [18] W. Tushar, B. Chai, C. Yuen, D. B. Smith, K. L. Wood, Z. Yang, and H. V. Poor, “Three-party energy management with distributed energy resources in smart grid,” IEEE Transactions on Industrial Electronics, vol. 62, no. 4, pp. 2487–2498, 2014.
  • [19] C. Ziras, T. Sousa, and P. Pinson, “What do prosumer marginal utility functions look like? derivation and analysis,” IEEE Transactions on Power Systems, 2021.
  • [20] B. Xu, M. Korpås, A. Botterud, and F. O’Sullivan, “A lagrangian policy for optimal energy storage control,” in 2020 American Control Conference (ACC).   IEEE, 2020, pp. 224–230.
  • [21] P. Jacquot, O. Beaude, S. Gaubert, and N. Oudjane, “Demand response in the smart grid: The impact of consumers temporal preferences,” in 2017 IEEE International Conference on Smart Grid Communications (SmartGridComm).   IEEE, 2017, pp. 540–545.
  • [22] J. L. Mathieu, M. Kamgarpour, J. Lygeros, G. Andersson, and D. S. Callaway, “Arbitraging intraday wholesale energy market prices with aggregations of thermostatic loads,” IEEE Transactions on Power Systems, vol. 30, no. 2, pp. 763–772, 2014.
  • [23] P. Li, B. Zhang, Y. Weng, and R. Rajagopal, “A sparse linear model and significance test for individual consumption prediction,” IEEE Transactions on Power Systems, vol. 32, no. 6, pp. 4489–4500, 2017.
  • [24] H. Shi, M. Xu, and R. Li, “Deep learning for household load forecasting—a novel pooling deep rnn,” IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 5271–5280, 2017.
  • [25] Y. Wang, D. Gan, M. Sun, N. Zhang, Z. Lu, and C. Kang, “Probabilistic individual load forecasting using pinball loss guided lstm,” Applied Energy, vol. 235, pp. 10–20, 2019.
  • [26] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
  • [27] S. Barratt, “On the differentiability of the solution to convex optimization problems,” arXiv preprint arXiv:1804.05098, 2018.
  • [28] F. Bu, Y. Yuan, Z. Wang, K. Dehghanpour, and A. Kimber, “A time-series distribution test system based on real utility data,” in 2019 North American Power Symposium (NAPS).   IEEE, 2019, pp. 1–6.
  • [29] “Climate data online (cdo): Global historical weather and climate data.” [Online]. Available: https://www.ncdc.noaa.gov/cdo-web/datatools/findstation
  • [30] “New york independent system operator - energy market and operation data.” [Online]. Available: https://www.nyiso.com/energy-market-operational-data
  • [31] “Low carbon london project.” [Online]. Available: https://innovation.ukpowernetworks.co.uk/projects/low-carbon-london/

Appendix

To derive the Jacobian terms of y^{*} with respect to the agent DR model coefficients, we first re-write the DR agent optimization model (1) in compact matrix form,

minyg(D,y;λ,θ):=\displaystyle\textstyle\min_{y}\;g(D,y;\lambda,\theta):= λ(y+D)+α2yy,\displaystyle\lambda^{\top}(y+D)+\frac{\alpha}{2}y^{\top}y\,, (12a)
s.t. IyP¯,:μ¯\displaystyle Iy\leq\overline{P}\,,\quad:\overline{\mu} (12b)
IyP¯,:μ¯\displaystyle-Iy\leq-\underline{P}\,,\quad:\underline{\mu} (12c)
ΓyE¯,:ν¯\displaystyle\Gamma y\leq\overline{E}\,,\quad:\overline{\nu} (12d)
ΓyE¯,:ν¯\displaystyle-\Gamma y\leq-\underline{E}\,,\quad:\underline{\nu} (12e)

The Lagrangian of (12) is given by,

L(y,μ,ν)\displaystyle L(y,\mu,\nu) =λ(y+D)+α2yy+μ¯(y𝟙P¯)+μ¯(𝟙P¯y)\displaystyle=\lambda^{\top}(y+D)+\frac{\alpha}{2}y^{\top}y+\overline{\mu}^{\top}(y-\mathbbm{1}\overline{P})+\underline{\mu}^{\top}(\mathbbm{1}\underline{P}-y)
+ν¯(Γy𝟙E¯)+ν¯(𝟙E¯Γy)\displaystyle+\overline{\nu}^{\top}(\Gamma y-\mathbbm{1}\overline{E})+\underline{\nu}^{\top}(\mathbbm{1}\underline{E}-\Gamma y) (13)

where \mathbbm{1} is an all-ones vector and \Gamma is a lower triangular matrix, both of appropriate dimensions.

Then, we write out the KKT conditions for stationarity and complementary slackness conditions of the agent optimization problem in matrix form,

\displaystyle\lambda+\alpha y^{*}+\overline{\mu}^{*}-\underline{\mu}^{*}+\Gamma^{\top}\overline{\nu}^{*}-\Gamma^{\top}\underline{\nu}^{*} =0\displaystyle=0 (14a)
Λ(μ¯)(y𝟙P¯)\displaystyle\Lambda(\overline{\mu}^{*})(y^{*}-\mathbbm{1}\overline{P}) =0\displaystyle=0 (14b)
Λ(μ¯)(𝟙P¯y)\displaystyle\Lambda(\underline{\mu}^{*})(\mathbbm{1}\underline{P}-y^{*}) =0\displaystyle=0 (14c)
Λ(ν¯)(Γy𝟙E¯)\displaystyle\Lambda(\overline{\nu}^{*})(\Gamma y^{*}-\mathbbm{1}\overline{E}) =0\displaystyle=0 (14d)
Λ(ν¯)(𝟙E¯Γy)\displaystyle\Lambda(\underline{\nu}^{*})(\mathbbm{1}\underline{E}-\Gamma y^{*}) =0\displaystyle=0 (14e)

where Λ()\Lambda(\cdot) creates a diagonal matrix from a vector. Taking the total derivatives of these conditions gives the following equations,

dαy+αdy+dμ¯dμ¯+Γdν¯Γdν¯\displaystyle d\alpha y^{*}+\alpha dy+d\overline{\mu}-d\underline{\mu}+\Gamma^{\top}d\overline{\nu}-\Gamma^{\top}d\underline{\nu} =0\displaystyle=0 (15a)
Λ(y𝟙P¯)dμ¯+Λ(μ¯)(dy𝟙dP¯)\displaystyle\Lambda(y^{*}-\mathbbm{1}\overline{P})d\overline{\mu}+\Lambda(\overline{\mu}^{*})(dy-\mathbbm{1}d\overline{P}) =0\displaystyle=0 (15b)
Λ(𝟙P¯y)dμ¯+Λ(μ¯)(𝟙dP¯dy)\displaystyle\Lambda(\mathbbm{1}\underline{P}-y^{*})d\underline{\mu}+\Lambda(\underline{\mu}^{*})(\mathbbm{1}d\underline{P}-dy) =0\displaystyle=0 (15c)
Λ(Γy𝟙E¯)dν¯+Λ(ν¯)(Γdy𝟙dE¯)\displaystyle\Lambda(\Gamma y^{*}-\mathbbm{1}\overline{E})d\overline{\nu}+\Lambda(\overline{\nu}^{*})(\Gamma dy-\mathbbm{1}d\overline{E}) =0\displaystyle=0 (15d)
Λ(𝟙E¯Γy)dν¯+Λ(ν¯)(𝟙dE¯Γdy)\displaystyle\Lambda(\mathbbm{1}\underline{E}-\Gamma y^{*})d\underline{\nu}+\Lambda(\underline{\nu}^{*})(\mathbbm{1}d\underline{E}-\Gamma dy) =0\displaystyle=0 (15e)

We re-write Eq. (15) in compact matrix form to obtain (6) in the main paper.
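As a sanity check on these Jacobians, consider the unconstrained case of (12) with all dual variables at zero: the stationarity condition (14a) reduces to \lambda+\alpha y^{*}=0, and its total derivative (15a) to y^{*}d\alpha+\alpha\,dy=0, i.e., dy^{*}/d\alpha=\lambda/\alpha^{2}. The following minimal sketch verifies the implicit gradient against a finite difference (illustrative only):

```python
def y_star(lam, alpha):
    # Unconstrained optimum of (12): lam + alpha * y* = 0.
    return [-l / alpha for l in lam]

def dy_dalpha(lam, alpha):
    # From (15a) with duals at zero: y* dalpha + alpha dy = 0
    # => dy*/dalpha = -y*/alpha = lam / alpha**2.
    return [l / alpha**2 for l in lam]
```

When constraints are active, the full linear system (15) must be solved instead, which is exactly what the OptNet backward pass does.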