
Privacy Preserving in Non-Intrusive Load Monitoring: A Differential Privacy Perspective

Haoxiang Wang, Jiasheng Zhang, Chenbei Lu, and Chenye Wu. H. Wang, J. Zhang and C. Lu are with the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. C. Wu is with the School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen. C. Wu is the corresponding author. Email: [email protected].
Abstract

Smart meter devices enable a better understanding of demand at the potential risk of private information leakage. One promising solution to mitigating such risk is to inject noise into the meter data to achieve a certain level of differential privacy. In this paper, we cast one-shot non-intrusive load monitoring (NILM) in the compressive sensing framework, and bridge the gap between the theoretical accuracy of NILM inference and the parameters of differential privacy. We then derive valid theoretical bounds to offer insights into how the differential privacy parameters affect the NILM performance. Moreover, we generalize our conclusions by proposing a hierarchical framework to solve the multi-shot NILM problem. Numerical experiments verify our analytical results and offer better physical insights into differential privacy in various practical scenarios. This also demonstrates the significance of our work for general privacy preserving mechanism design.

Index Terms:
Differential Privacy, Non-Intrusive Load Monitoring, Compressive Sensing.

I Introduction

Pervasive intelligent devices are gathering huge amounts of data in our daily lives, spanning from our shopping lists to our favorite restaurants, from travel histories to personal social networks. These big data have eased our social lives and have dramatically changed our behaviors. In the electricity sector, the widely deployed smart meters are collecting users' energy consumption data in near real time. While these data could be rather valuable for achieving a more efficient power system, they raise significant public concern about private information leakage. Specifically, the big data in the electricity sector speed up the advance of non-intrusive load monitoring (NILM), which aims to infer a user's energy consumption pattern from meter data.

NILM is one of the most effective ways to conduct consumer behavior analysis [1, 2]. More comprehensive behavior analysis could benefit consumers by providing ambient assisted living [3], real-time energy saving [4], etc. It is evident that such analysis is crucial for active demand side management, which is believed to significantly improve the whole system efficiency [5]. However, the inferred consumption pattern often exposes individual lifestyles. This implies that the leakage of smart meter data may lead to the leakage of private information, which calls for a comprehensive privacy preservation scheme. The European Union is the pioneer in customer privacy protection: it legislated a comprehensive data protection regulation in 2018 [6]. Brazil has also set up the General Data Protection Law, which became effective in February 2020 [7]. Moreover, 11 states in the US have recently enacted privacy, data security, cybersecurity, and data breach notification laws [8].

To achieve privacy protection, the most commonly adopted technique is differential privacy (DP), first proposed by Dwork et al. in [9]. DP facilitates mathematical analysis and is also closely related to other privacy metrics such as mutual information [10]. However, despite being well investigated, the parameters in DP do not offer intuitive physical insights, which prevents this technique from wide deployment. Hence, it is a delicate task to design a practical privacy preserving mechanism for NILM.

In this work, we submit that DP indeed has physical implications in the electricity sector. We propose to understand DP through NILM and characterize how the parameters in DP affect the performance guarantee of NILM inference. Our work offers the end users a better idea of the different levels of privacy preserving services in gathering meter data.

I-A Related Works

NILM and DP are both well investigated. Since the seminal work by Hart [5], diverse techniques have been proposed to improve NILM inference performance. Our work focuses on the disaggregation process in NILM. The classical algorithm is Combinatorial Optimization (CO), proposed in Hart's seminal work, which combines heuristic methods with prior knowledge on the switching events. Zulfiquar et al. have furthered this research by introducing the aided linear integer programming technique to speed up CO in [11]. Probabilistic methods have also been proposed recently. For example, Kim et al. have applied the Factorial Hidden Markov Model (FHMM) to NILM, and have used the Viterbi algorithm for decoding [12]. The key challenge in this line of research is to model the appropriate states for the hidden Markov model. Makonin et al. have used machine learning techniques to solve this issue and have proposed the notion of super-state HMM [13]. The advance in deep learning warrants exploiting more temporal features in NILM. For example, Kelly et al. have deployed the long short-term memory (LSTM) framework for NILM in [14].

There is a recent interest in designing privacy preserving NILM, and our work also falls into this category. However, most of the literature has focused on thwarting privacy attacks on the meter data [15]. The major technique is to utilize the storage system to physically inject noise into the meter data, which does offer the end users a certain privacy guarantee. Backes et al. have theoretically examined the way to achieve different levels of privacy using storage injection manipulation in [16]. More recently, a variety of methods have been exploited to achieve privacy preservation.

Chen et al. have utilized the combined heat and privacy system to prevent occupancy detection in [17]. Cao et al. have investigated the practical implementation from a fog computing approach [18]. Rastogi et al. have further proposed a distributed implementation in [19].

To the best of our knowledge, we are the first to exploit the physical meaning of privacy parameters in the electricity sector.

I-B Our Contributions

In establishing the connection between DP parameters and the inference accuracy of NILM, our principal contributions can be summarized as follows:

  • NILM Formulation with DP: We use the compressive sensing framework to formulate NILM inference and incorporate the parameters of DP into the formulation.

  • Theoretical Characterization of NILM Inference: Based on the compressive sensing formulation, we theoretically characterize the asymptotic upper and lower bounds for the one-shot NILM inference accuracy.

  • Hierarchical Multi-shot NILM: We generalize the one-shot NILM solution to more practical multi-shot scenarios, and propose an effective hierarchical algorithm.

We envision that our proposed framework has at least three types of early adopters.

  • The first adopter could be the consumers themselves. To preserve privacy, the consumers could utilize local storage devices (provided by electric vehicles, photovoltaic panels, etc.) to inject noise to achieve a certain level of differential privacy. Our theoretical understanding provides an inference accuracy bound to decipher the privacy preserving guarantee.

  • The other adopter could be the ISOs or the utility companies. When recording the meter data from the consumers for potential behavior analysis, the ISOs or utility companies could directly inject noise into the recorded meter data. Due to the Law of Large Numbers, the injected noise will incur minimal impact on auditing (in fact, auditing before injecting the noise could solve this issue) and most classical tasks of utility companies. However, the injected noise could affect behavior analysis (see detailed discussion in Appendix -A). This is another way to decipher the physical meaning of DP parameters.

  • The last adopter could be the third-party privacy preserving entities. They could collect the data from the consumers for different purposes, and for different privacy preserving requirements, the consumers receive different compensations. This could enable new business models for the privacy preserving industry, and our theoretical results could offer a first-cut understanding of the cost-benefit analysis for the emerging business models.

Figure 1: The paradigm of our paper

The rest of the paper is organized as follows. Section II formulates NILM as a sparse optimization problem. Using the compressive sensing framework, we establish the lower and upper bounds of inference accuracy for NILM, and relate $\epsilon$-differential privacy to these asymptotic bounds in Section III to better understand the physical implication of DP. Then, in Section IV, we generalize the compressive sensing framework to the multi-shot scenario and introduce our hierarchical algorithmic framework. Numerical studies verify the effectiveness of our theoretical conclusions in Section V. Concluding remarks and several interesting future directions are given in Section VI. We provide all the necessary proofs and justifications in the Appendix. We visualize the paradigm of our paper in Fig. 1, which highlights the underlying logic between sections.

II Problem Formulation

This section first introduces the general NILM problem, and then formulates the one-shot NILM in the compressive sensing framework. We conclude by revisiting the generic compressive sensing algorithm to solve the resulting NILM problem.

II-A Non-intrusive Load Monitoring Problem

In essence, the goal of NILM is to infer user behaviors from meter data. Mathematically, each end user (or a building complex, a campus, etc.) may own $N$ appliances, denoted by a set $\mathcal{A}=\{A_{1},...,A_{N}\}$.

For simplicity, we assume the state space of each appliance is binary, i.e., on or off. For appliance $i\in\mathcal{A}$, when its state is on, its energy consumption at each time $t$ is a random variable $Z_{i}^{t}$ with mean $P_{i}$. Define the vector of mean energy consumption over all appliances, $\mathbf{P}$, as follows:

$\mathbf{P}=(P_{1},...,P_{N})^{\dagger}\in\mathbf{R}^{N}.$ (1)

Hence, at each time $t$, the energy consumption of the end user is the aggregate energy consumption of its appliances whose states are on. We denote this subset by $S_{t}\subseteq\{1,...,N\}$, and denote the meter data at time $t$ by $y_{t}$. Thus, by definition of $y_{t}$, we know

$y_{t}=\sum_{i\in S_{t}}Z_{i}^{t},\quad\forall t=1,...,T,$ (2)

where $T$ stands for the length of the observation period. If we consider the end user's hourly operation with a resolution of 6 seconds, then $T$ is 600. Advanced metering technology has made it possible to collect data at much finer resolutions, e.g., on a sub-second scale [20]. Such data could allow us to exploit more unique energy consumption patterns for each appliance. However, in this paper, we choose to focus on a stylized model to better clarify the connection between DP and NILM, without specifying the resolution.
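For concreteness, the following minimal sketch simulates the meter model in Eq. (2) for a synthetic household; the appliance ratings, switching probability, and fluctuation level are illustrative assumptions rather than values from our datasets.

```python
import numpy as np

# A minimal sketch of the meter model in Eq. (2); all numbers below are
# illustrative assumptions, not values taken from the paper's datasets.
rng = np.random.default_rng(0)
N, T = 8, 600                        # appliances; hourly horizon at 6 s resolution
P = rng.uniform(50, 500, size=N)     # assumed mean on-state consumptions P_i

X = np.zeros((T, N), dtype=int)      # on/off indicators X_t
X[0] = rng.integers(0, 2, size=N)
for t in range(1, T):
    flips = rng.random(N) < 0.005    # sparse switch events (Assumption 1)
    X[t] = np.where(flips, 1 - X[t - 1], X[t - 1])

Z = P + rng.normal(0.0, 1.0, size=(T, N))  # on-state draws Z_i^t around P_i
y = (X * Z).sum(axis=1)                    # meter data y_t aggregates the "on" appliances
```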

The classical NILM seeks to infer the on/off switch events of all appliances. We denote by $X_{t}\in\{0,1\}^{N}$ the on/off indicator of the $N$ appliances at time $t$ (with zero indicating off and one indicating on). NILM aims at inferring $X_{t}$ from $y_{t}$.

A variety of NILM algorithms have been proposed, as reviewed in Section I. In this work, we choose the compressive sensing framework to mathematically exploit the sparsity structure in the event vector $X_{t}$. However, conducting NILM inference can be delicate as the energy consumptions in $\mathcal{A}$ can be quite diverse, and some 'elephant' appliances (that consume a huge amount of power) may dominate the period of interest. In this case, it is almost impossible to accurately distinguish the on/off states of small appliances. In the following formulation, we devise a sufficient condition to enable accurate NILM inference, based on which we propose the hierarchical framework for more practical situations in the subsequent analysis.

II-B One-shot NILM Inference

To better characterize the relationship between differential privacy and NILM, we focus on the inference for a particular time $t$ and assume the knowledge of $X_{t-1}$. We term this problem the one-shot NILM inference.

To exploit the sparse structure, the compressive sensing formulation requires the following transforms:

$K_{t}=|y_{t}-y_{t-1}|\in\mathbf{R},$ (3)
$\Delta_{t}=|X_{t}-X_{t-1}|\in\mathbf{R}^{N}.$ (4)

With these transforms, we can mathematically characterize the sparsity assumption.

Assumption 1.

(Sparsity Assumption) The switch events are sparse across time. That is, the number of switch events in $\Delta_{t}$ (i.e., $\|\Delta_{t}\|_{0}$) is bounded by $U_{t}$, which satisfies

$\|\Delta_{t}\|_{0}\leq U_{t}\ll N.$ (5)

Remark: This assumption holds for most publicly available datasets with the resolution on the second scale. In Appendix -B, we characterize the sparsity of three most widely adopted public datasets and justify how the failure to meet this assumption may affect the performance of our proposed framework.

This sparsity assumption motivates us to formulate the following optimization problem for the one-shot NILM inference at each time tt:

(P1) $\min\ \|\Delta_{t}\|_{0}$ (6)
s.t. $\|\Delta_{t}\mathbf{P}-K_{t}\|_{2}<\delta,$
$\Delta_{t}\in\{0,1\}^{N},$

where $\delta$ is the parameter that characterizes the sensitivity of $y_{t}$.

The notion of sensitivity is defined for random variables:

Definition 1.

We define $\Delta f$ to be the sensitivity of a sequence of bounded random variables $y_{t}$:

$\Delta f=\max_{1\leq t\leq T}\left|\bar{y}_{t}-\underline{y}_{t}\right|=\frac{\delta}{2},$ (7)

where $\underline{y}_{t}$ and $\bar{y}_{t}$ are the lower and upper bounds of $y_{t}$.

Back to our problem (P1), the sensitivity constraint requires that the meter data be bounded, which yields our second assumption:

Assumption 2.

The meter data $y_{t}$ is bounded (we only consider the pure load meter data in this work; to handle net metering cases, we need to first resort to load disaggregation techniques [21, 22, 23], and then apply our framework to the disaggregated pure load data), i.e.,

$y_{t}\in\left[\underline{y}_{t},\bar{y}_{t}\right],\quad 1\leq t\leq T.$ (8)

This assumption is straightforward to justify: it simply states that when the appliances are running, their energy consumption levels are within a specific range. This range corresponds to the sensitivity parameter $\delta$.

Remark: This assumption also partially handles measurement error. Such error could be due to environmental issues, operational issues, or other issues. Practically, such error is not a major concern, for two reasons. First, as indicated in [24], the magnitude of such error is required to be bounded by $5\%$ of the total power. Second, if the data quality is too poor to infer any useful information, there won't be any privacy preserving requirement at all.

Note that (P1) is similar to the classical problem in the compressive sensing framework [25], but not exactly the same. We denote the optimal solution to (P1) by $\Delta_{t}^{0}$ and regard it as the ground truth. Deciphering this ground truth is quite challenging: the binary constraints, in addition to the $l_{0}$-norm objective function, make the problem intractable.

To tackle these challenges, we propose to first relax the binary constraints and then use the standard compressive sensing technique to handle the non-convexity induced by the $l_{0}$-norm objective function.

Specifically, we make the following technical alignment assumption to enable exact relaxation:

Assumption 3.

(Power Concentration Assumption) We assume the mean energy consumptions of all the $N$ appliances are on the same order. Mathematically, denoting by $\{P_{1}^{\epsilon},...,P_{N}^{\epsilon}\}$ the ascendingly ordered sequence of $\mathbf{P}$, we assume the following condition holds for all $U<U_{t}$:

$\sum_{k=1}^{U}P_{k}^{\epsilon}-\sum_{k=1}^{U-1}P_{N+1-k}^{\epsilon}>2\delta.$ (9)

Remark: Assumption 3 guarantees the effectiveness of the compressive sensing framework. This condition eliminates the task of distinguishing the lighting load from the refrigerator or the heating load, which allows us to focus on the inference of loads with similar energy consumption levels. For cases with diverse energy consumptions, we defer the detailed discussion to Section IV-B. The general idea is to design a hierarchical NILM, where in each hierarchy the NILM inference satisfies this technical alignment assumption.

This assumption allows us to relax the binary constraints without changing the structure of the optimal solution set:

(P2) $\min\ \|\Delta_{t}\|_{0}$ (10)
s.t. $\|\Delta_{t}\mathbf{P}-K_{t}\|_{2}<\delta,$
$\Delta_{t}\in[0,1]^{N}.$

We formally state the equivalence between (P1) and (P2) in the following Lemma.

Lemma 1.

If Assumptions 1 to 3 hold, problems (P1) and (P2) are equivalent in terms of the optimal solution set. Specifically, if we denote the optimal solution to (P1) by $\Delta_{t}^{0}$ and the optimal solution to (P2) by $\Delta_{t}$, the equivalence is indicated by the following equation:

$\|\Delta_{t}^{0}\|_{0}=\|\Delta_{t}\|_{0}.$ (11)

We provide the detailed proof in Appendix -C. This lemma allows us to focus on solving a more tractable problem, (P2). This is due to the key theoretical basis of compressive sensing: we can effectively use the $l_{1}$-norm to approximate the $l_{0}$-norm in (P2) [26], which yields (P3),

(P3) $\min\ \|\Delta_{t}\|_{1}$ (12)
s.t. $\|\Delta_{t}\mathbf{P}-K_{t}\|_{2}<\delta,$
$\Delta_{t}\in[0,1]^{N}.$

Denote its optimal solution by $\Delta_{t}^{*}$. To map $\Delta_{t}^{*}$ onto $\{0,1\}^{N}$, we conduct rounding. Denote the solution after rounding by $\bar{\Delta}_{t}$. The rounding process is as follows: we first set all elements in $\bar{\Delta}_{t}$ to be 0; then, for each non-zero element $\Delta_{t}^{*}(j)$ in $\Delta_{t}^{*}$, we set the corresponding element $\bar{\Delta}_{t}(j)$ to be 1 with probability $\Delta_{t}^{*}(j)$. To characterize the inaccuracy induced by the approximation of the $l_{0}$-norm, we compare $\mathbf{E}[\bar{\Delta}_{t}]$ with the ground truth $\Delta_{t}^{0}$. The comparison yields the following theorem:

Theorem 1.

On expectation, the difference between $\bar{\Delta}_{t}$ and $\Delta_{t}^{0}$ is bounded. Specifically,

$\|\mathbf{E}[\bar{\Delta}_{t}]-\Delta_{t}^{0}\|_{2}\leq C(\mathbf{P})\cdot\delta,$ (13)

where $C(\mathbf{P})$ is a constant uniquely determined by the vector $\mathbf{P}$.

We characterize the specific form of $C(\mathbf{P})$ in Appendix -E. To prove Theorem 1, we mainly make use of the important conclusion in compressive sensing established by Candes et al. in [26].
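To make the one-shot pipeline concrete, the sketch below solves the relaxation (P3) and applies the randomized rounding described above. It assumes the cvxpy and numpy packages; the function name is ours, and the strict inequality in (P3) is handled as a non-strict constraint for solver compatibility.

```python
import numpy as np
import cvxpy as cp

def one_shot_nilm(P, K_t, delta, rng=np.random.default_rng()):
    """Sketch: solve the l1 relaxation (P3) for one time step, then round.

    P     : (N,) mean appliance consumptions.
    K_t   : scalar |y_t - y_{t-1}|.
    delta : sensitivity parameter of the meter data.
    """
    N = len(P)
    d = cp.Variable(N)
    constraints = [cp.abs(d @ P - K_t) <= delta, d >= 0, d <= 1]
    cp.Problem(cp.Minimize(cp.norm1(d)), constraints).solve()
    d_star = np.clip(d.value, 0.0, 1.0)
    # Randomized rounding: entry j is set to 1 with probability Delta_t*(j).
    return (rng.random(N) < d_star).astype(int)
```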

III Encode DP into NILM

Based on the classical compressive sensing method for solving the one-shot NILM inference, we establish the connection between DP and NILM to exploit the physical meaning of the DP parameters in the context of NILM.

III-A Revisiting $\epsilon$-Differential Privacy

We adopt the notion of $\epsilon$-differential privacy [9] to connect NILM with DP. It uses the parameter $\epsilon$ to bound how well one can differentiate two datasets that differ in only one piece of data.

To put it more formally, we define a mapping $\mathscr{B}(D)$ from a dataset $D$ to $\mathbb{R}$. This mapping enables a query function $q:D\to\mathbb{R}$. To measure the difference between two datasets $D$ and $D'$, we adopt the distance metric $d(D,D')$, measuring the minimal number of sample changes required to make $D$ identical to $D'$. When $d(D,D')=1$, we say $D$ and $D'$ are neighbor datasets. This allows us to characterize the notion of $\epsilon$-DP: if for all neighbor datasets $D_{1}$ and $D_{2}$, and for all measurable subsets $Y\subset\mathbb{R}$, the mapping $\mathscr{B}$ satisfies

$\frac{Pr(\mathscr{B}(D_{1})\in Y)}{Pr(\mathscr{B}(D_{2})\in Y)}\leq e^{\epsilon},$ (14)

we say the mechanism $\mathscr{B}$ achieves $\epsilon$-DP [9].

To achieve $\epsilon$-DP, a simple mechanism is Laplace noise injection [9]. We show the detailed noise injection mechanism in the following theorem.

Theorem 2.

(Theorem 1 in [9]) A mechanism $\mathscr{B}$ achieves $\epsilon$-DP if $\mathscr{B}(D)$ satisfies

$\mathscr{B}(D)=q(D)+n,$ (15)

where $n$ is Laplace noise with probability density function $p(s)$:

$p(s)=\frac{1}{2\lambda}e^{-\frac{|s|}{\lambda}},$ (16)

in which $\lambda=\frac{\Delta f(D)}{\epsilon}$ combines the privacy parameter $\epsilon$, which describes the level of privacy, and the sensitivity $\Delta f(D)$, which satisfies $\Delta f(D)\geq\max\|q(D_{1})-q(D_{2})\|$ for all neighbor datasets $D_{1}$ and $D_{2}$.

Remark: In our setting, the dataset $D$ can be viewed as the appliance states. The query function $q$ can be viewed as the meter data derived from the appliance states and the appliances' mean energy consumption. The sensitivity $\Delta f$ is the meter data sensitivity defined in (7) for $y_{t}$; and $\epsilon$ describes the user's required level of privacy.
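As a concrete illustration of Theorem 2, the snippet below injects Laplace noise with scale $\lambda=\Delta f/\epsilon$ into a meter-reading sequence; the helper name and interface are ours.

```python
import numpy as np

def laplace_dp_meter(y, delta_f, epsilon, rng=np.random.default_rng()):
    """Inject Laplace noise with scale lambda = delta_f / epsilon (Theorem 2).

    y        : array of raw meter readings y_t.
    delta_f  : meter-data sensitivity, Eq. (7).
    epsilon  : privacy parameter; smaller epsilon means stronger privacy.
    """
    lam = delta_f / epsilon
    return y + rng.laplace(loc=0.0, scale=lam, size=len(y))
```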

Thus, the Laplace noise injection mechanism helps us establish the connection between $\epsilon$-DP and NILM inference.

Next, we construct a performance bound related to the privacy parameter $\epsilon$ and show how the level of privacy preservation connects to the performance bound of the NILM inference.

III-B Asymptotic Bounds for NILM with DP

Recall that $\Delta f$ is the sensitivity of the meter data $y_{t}$. After injecting the noise, instead of inferring $X_{t}$ from $y_{t}$, we can only rely on the noisy data $y_{t}+n_{t}$, which yields optimization problem (P4).

(P4) $\min\ \|\Delta_{t}\|_{1}$ (17)
s.t. $K_{t}^{\epsilon}=|y_{t}+n_{t}-y_{t-1}-n_{t-1}|,$
$\|\Delta_{t}\mathbf{P}-K_{t}^{\epsilon}\|_{2}<\delta,$
$\Delta_{t}\in[0,1]^{N}.$

Denote the solution to (P4) by $\Delta_{t}^{\epsilon}$, and the solution after rounding by $\bar{\Delta}_{t}^{\epsilon}$. Then, it is straightforward to apply Theorem 1 to (P4) and obtain the following proposition.

Proposition 1.

The difference between $\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]$ and the ground truth $\Delta_{t}^{0}$ is bounded. More precisely,

$\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}\leq(2+C(\mathbf{P}))\cdot\delta+|n_{t}-n_{t-1}|.$ (18)

Note that, again, $C(\mathbf{P})$ is a function of the vector $\mathbf{P}$, defined in Theorem 1.

This proposition not only dictates that the error’s upper bound increases almost linearly with the magnitude of noises, but also allows us to connect the DP parameters with the NILM inference accuracy.

Mathematically, we define the inference accuracy $\alpha$ as follows:

$\alpha=1-\frac{\mathbf{E}\|\bar{\Delta}_{t}^{\epsilon}-\Delta_{t}^{0}\|_{1}}{N}.$ (19)

To evaluate $\mathbf{E}[\alpha]$ over all possible injected noises $n_{t}$ and $n_{t-1}$, we denote

$b=2\left(N-(2+C(\mathbf{P}))\delta\right).$ (20)

We can now establish the lower bound for NILM inference accuracy as Theorem 3 suggests.

Theorem 3.

The expectation of the inference accuracy $\alpha$ can be lower bounded. Specifically,

$\mathbf{E}[\alpha]\geq 1-\frac{4C(\mathbf{P})\delta\epsilon+8\delta\epsilon+3\delta}{4\epsilon N}+\frac{A_{1}\epsilon+B_{1}}{4\epsilon N}e^{-\frac{2\epsilon b}{\delta}},$ (21)

where

$A_{1}=2N-4\delta-2C(\mathbf{P})\delta,$ (22)
$B_{1}=3\delta.$ (23)

Note that a smaller $\epsilon$ implies a higher level of differential privacy. In proving the lower bound of inference accuracy, the key is to identify that $\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}$ in Proposition 1 has a trivial upper bound, $N$. Hence, we can refine Proposition 1 to construct the lower bound for $\mathbf{E}[\alpha]$.

As indicated by Theorem 3, the derived lower bound decreases at the rate of $\epsilon^{-1}$ when $\delta$ is small enough (i.e., when the impact of the exponential term is negligible).

To construct the upper bound, we need to exploit the structure of problem (P4). Specifically, standard mathematical manipulations of constraints in (P4) yield the following upper bound characterization:

Theorem 4.

Denote $m=N\|\mathbf{P}\|_{2}+2\delta$ for notational simplicity. The expected inference accuracy $\alpha$ can be upper bounded:

$\mathbf{E}[\alpha]\leq 1+\frac{A_{2}\epsilon^{2}+B_{2}\epsilon+C_{2}}{8\delta\epsilon N\|\mathbf{P}\|_{2}}e^{-\frac{2\epsilon m}{\delta}}-\frac{16\delta\epsilon^{2}+4\delta\epsilon+3\delta}{16\epsilon N\|\mathbf{P}\|_{2}}e^{-\epsilon},$

where

$A_{2}=4m^{2}-8\delta m-4mN\|\mathbf{P}\|_{2},$ (24)
$B_{2}=6\delta m-8\delta^{2}-4\delta N\|\mathbf{P}\|_{2},$ (25)
$C_{2}=3\delta^{2}.$ (26)

Theorem 4 implies that the upper bound decreases at the rate of $o(\epsilon^{-1}e^{-\epsilon})$; it also decreases when $\delta$ becomes larger. Hence, there is a certain gap between the lower bound and the upper bound. We want to emphasize that, although the proof (in Appendix -G) is based on the compressive sensing framework, it can be generalized to many other NILM algorithms for the upper bound construction.

IV Hierarchical Multi-shot NILM

In this section, we analyze the multi-shot NILM and propose our hierarchical decomposition algorithm. Specifically, we first introduce the multi-shot NILM formulation, identify its challenges and conduct the classical treatment. Then, we explain how to relax Assumption 3 with hierarchical decomposition. We conclude this section by analyzing the accuracy bounds for the corresponding multi-shot NILM inference.

IV-A Multi-shot NILM Inference

The special difficulty in multi-shot NILM inference, compared with one-shot NILM, is the temporal coupling between inferences. This issue, if not well addressed, could lead to large cumulative errors and ultimately to cascading inference failure (see the cascading effect over networks in [27] for a more detailed discussion). Therefore, in this part, after the problem formulation, we design an effective algorithm to contain this error.

The multi-shot NILM inference problem is a straightforward generalization of the one-shot NILM inference. It seeks to infer the appliance switch events over a period of $T$ from the meter data sequence $\boldsymbol{Y}=[y_{1},...,y_{T}]$. The inference also relies on the vector of mean energy consumption $\mathbf{P}$, defined in Eq. (1). We denote by $\boldsymbol{X}=[X_{1},...,X_{T}]$ the state probability matrix, where each $X_{t}$ captures the probabilities of the appliance states at time $t$. To establish the asymptotic bounds for our multi-shot NILM inference accuracy, we denote by $\boldsymbol{X}^{0}=[X^{0}_{1},...,X^{0}_{T}]$ the ground truth of switching events over $T$, and by $\bar{\boldsymbol{X}^{\epsilon}}=[\bar{X_{1}^{\epsilon}},...,\bar{X_{T}^{\epsilon}}]$ the inference results returned by our proposed algorithm.

It is worth noting that the situation becomes even more complicated if we conduct rounding at each time slot. We submit that it is valuable to keep the inference results obtained in solving problem (P3) without rounding, as these results can be interpreted as the probabilities of switching. This allows us to derive the state probability matrix $\boldsymbol{X}=[X_{1},...,X_{T}]$. Note that, after obtaining the matrix $\boldsymbol{X}$, we can no longer conduct the rounding naively: the actual meter readings carry additional information. We can utilize simple error correcting methods to decipher this embedded information.

Specifically, the error correction aims to make binary adjustments that reduce the approximation error according to the actual meter data. Though it is possible to consider more complex methods for error correction than direct adjustment, they make little difference in terms of the worst-case bound analysis.

We describe the classical treatment of multi-shot NILM inference in Algorithm 1, where we adopt the notion of the Hadamard product, which is defined for two matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ as follows:

Definition 2.

For two matrices $A$ and $B$ of the same dimension $m\times n$, the Hadamard product $A\bigodot B$ is the $m\times n$ matrix $C$ with elements given by

$c_{ij}=a_{ij}b_{ij},\quad\forall 1\leq i\leq m,\ \forall 1\leq j\leq n.$ (27)

There is a simple way to construct rough inference accuracy bounds directly from Theorem 3 and Theorem 4. The trick is to examine the rounding procedure in Algorithm 1 and to connect it with the bounds for one-shot NILM inference. Formally, we define the accuracy of the classical algorithm for multi-shot NILM inference by $\alpha_{m}$:

$\alpha_{m}=1-\frac{\|\bar{\boldsymbol{X}^{\epsilon}}-\boldsymbol{X}^{0}\|_{1}}{NT}.$ (28)

Denote the lower bound and upper bound for the one-shot NILM inference accuracy by $b(\delta,\epsilon)$ and $B(\delta,\epsilon)$, respectively (see Theorems 3 and 4). Recall that $\epsilon$ is the DP parameter (see Theorem 2) and $\delta$ describes the sensitivity of the meter data $y_{t}$ for each period (see Eq. (7)).

With these definitions, we can characterize the lower and upper bounds for $\mathbf{E}[\alpha_{m}]$:

Theorem 5.

The expected inference accuracy $\mathbf{E}[\alpha_{m}]$ for the multi-shot NILM problem can be lower and upper bounded. Specifically, we have

$1-\frac{(T-1)(1-b(\delta,\epsilon)N)}{2}\leq\mathbf{E}[\alpha_{m}]\leq 1-\frac{1-B(\delta,\epsilon)}{T},$ (29)

where $b(\delta,\epsilon)$ and $B(\delta,\epsilon)$ are the lower bound and upper bound for the one-shot NILM inference accuracy.

Input: The initial state $X_{0}$; the meter reading sequence $\boldsymbol{Y}=y_{0},y_{1},...,y_{T}$; the number of appliances $N$; the appliances' mean power vector $\mathbf{P}$.
Output: The appliance state sequences $\bar{X^{\epsilon}_{1}},...,\bar{X^{\epsilon}_{T}}$.
1: Compute the differences between consecutive meter readings $K_{1},...,K_{T}$, where $K_{t}=y_{t}-y_{t-1}$, $t=1,...,T$;
2: Solve the optimization problem (12) for each $t$ and obtain the approximate solutions $\Delta_{1},...,\Delta_{T}$;
3: for time $t$ from 1 to $T$ do
4:   $X_{t}=X_{t-1}\bigodot(1-\Delta_{t-1})+(1-X_{t-1})\bigodot\Delta_{t-1}$;
5: end for
6: for time $t$ from 1 to $T$ do
7:   Round $X_{t}$ to derive the vector $\bar{X_{t}^{\epsilon}}$;
8:   if $\bar{X_{t}^{\epsilon}}\mathbf{P}>y_{t}$ then
9:     $k=0$;
10:     while $\bar{X_{t}^{\epsilon}}\mathbf{P}>y_{t}$ and $k<N$ do
11:       find the $k^{th}$ largest appliance $j$ and set $\bar{X_{t}^{\epsilon j}}=0$;
12:       $k=k+1$;
13:     end while
14:   else if $\bar{X_{t}^{\epsilon}}\mathbf{P}<y_{t}$ then
15:     $k=0$;
16:     while $\bar{X_{t}^{\epsilon}}\mathbf{P}<y_{t}$ and $k<N$ do
17:       find the $k^{th}$ smallest appliance $j$ and set $\bar{X_{t}^{\epsilon j}}=1$;
18:       $k=k+1$;
19:     end while
20:   end if
21: end for
22: return $\bar{X^{\epsilon}_{1}},...,\bar{X^{\epsilon}_{T}}$.
Algorithm 1: Multi-shot Inference
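To illustrate Algorithm 1, here is a compact Python sketch of the probability propagation, rounding, and error-correction steps. It assumes the per-slot relaxed solutions $\Delta_{t}$ have already been computed (e.g., by the one-shot routine sketched earlier), and it reads $\Delta_{t}$ as the switch-event probabilities between $t-1$ and $t$; the function name and interface are ours.

```python
import numpy as np

def multi_shot_round(X0, Y, P, deltas, rng=np.random.default_rng()):
    """Sketch of Algorithm 1: Hadamard state update, rounding, error correction.

    X0     : (N,) initial binary states.
    Y      : (T+1,) meter readings y_0, ..., y_T.
    P      : (N,) mean appliance consumptions.
    deltas : list of T relaxed solutions Delta_t in [0,1]^N.
    """
    N = len(P)
    order = np.argsort(P)                    # appliances sorted by mean power
    X_prob, states = X0.astype(float), []
    for t, d in enumerate(deltas, start=1):
        X_prob = X_prob * (1 - d) + (1 - X_prob) * d     # Hadamard update
        Xt = (rng.random(N) < X_prob).astype(int)        # rounding
        k = 0                                            # error correction
        while Xt @ P > Y[t] and k < N:                   # too much power: shed largest
            Xt[order[::-1][k]] = 0
            k += 1
        k = 0
        while Xt @ P < Y[t] and k < N:                   # too little power: add smallest
            Xt[order[k]] = 1
            k += 1
        states.append(Xt)
    return states
```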

IV-B Hierarchical Decomposition

Although we have deployed the error correction method, the two bounds in Theorem 5 are still rather loose. For example, as $T$ increases, the lower bound could quickly approach 0 while the upper bound quickly approaches 1. This is because we still need to take into account the cascading effects in the union bound, even though we have made some efforts to contain the error. Another reason is Assumption 3. We may relax this assumption for improved bounds.

The idea is simple and straightforward. Intuitively, to relax Assumption 3, we need to conduct a hierarchical decomposition into proper appliance sets. To obtain the proper sets, we first sort all the appliances in $\mathcal{A}$ according to their mean energy consumption levels. Then, starting with the least energy-consuming appliance, we form a number of proper sets that satisfy Assumption 3.

To construct the sets, we use a greedy policy. For a given set with $S$ appliances, the condition for the $(S+1)^{th}$ appliance to join the current set instead of creating a new set is as follows:

$\sum_{i=1}^{\lfloor\frac{S}{2}\rfloor+1}P_{i}-2\delta\geq\sum_{i=1}^{\lfloor\frac{S}{2}\rfloor-1}P_{S+1-i}+P_{S+1},$ (30)

where the subscript $i$ in $P_{i}$ also indicates its ascending order in the set.
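As a small illustration, the helper below checks criterion (30) for a candidate appliance before it joins the current hierarchy; the function name is hypothetical and the inputs are the mean powers already in the set.

```python
import numpy as np

def can_join_hierarchy(current_powers, P_next, delta):
    """Check criterion (30): may the next appliance (mean power P_next)
    join the current hierarchy without violating Assumption 3?"""
    c = np.sort(np.asarray(current_powers, dtype=float))
    S = len(c)
    if S <= 1:                               # sets of size 0 or 1 always accept
        return True
    half = S // 2
    smallest = c[:half + 1].sum()            # sum of the (floor(S/2)+1) smallest powers
    largest = c[::-1][:half - 1].sum()       # sum of the (floor(S/2)-1) largest powers
    return smallest - 2 * delta >= largest + P_next
```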

Proposition 2.

The set generation criterion (30) guarantees the validity of Assumption 3 for each generated proper set.

We call each set a hierarchy, which establishes the basis for our hierarchical multi-shot NILM inference. We apply Algorithm 1 to each hierarchy in the descending order. We characterize the whole procedure in Algorithm 2.

We submit that the hierarchical decomposition helps refine the asymptotic bounds for $\mathbf{E}[\alpha_{m}]$ in that it characterizes the connection between hierarchies. This allows us to derive a recursive formulation to estimate the overall inference accuracy. Define the inference accuracy for hierarchy $i$ as $\alpha_{i}$:

$\alpha_{i}=1-\frac{\|\bar{\boldsymbol{X}^{\epsilon}[i]}-\boldsymbol{X}[i]^{0}\|_{1}}{N_{i}T},$ (31)

where $N_{i}$ denotes the number of appliances in hierarchy $i$; $\boldsymbol{X}^{\epsilon}[i]$ and $\boldsymbol{X}[i]^{0}$ denote the inferred result and the ground truth of the appliances in hierarchy $i$, respectively. We also define the mean power vector for hierarchy $i$ as $\mathbf{P}_{i}$ and the minimum power in hierarchy $i$ as $P^{i}_{m}$. For simplicity, we further define the maximal summation of $U$ appliances whose power is smaller than $P^{i}_{m}$ as $P^{i}_{U}$, in which $U$ is the sparsity of the switching events as defined in Assumption 1.

Now we can formally introduce our bound characterization theorem for hierarchical multi-shot NILM inference, with the help of Theorem 5. For simplicity, define

$b_{m}(\delta,\epsilon)=1-\frac{(T-1)(1-b(\delta,\epsilon)N)}{2},$ (32)
$B_{M}(\delta,\epsilon)=1-\frac{1-B(\delta,\epsilon)}{T}.$ (33)

Then, we can show that

Theorem 6.

The expected inference accuracy for each hierarchy $i$, $\mathbf{E}[\alpha_{i}]$, can be lower bounded by $m_{i}$ and upper bounded by $M_{i}$. Specifically,

$m_{i}=b_{m}\left(\delta+\frac{2}{2+C(\mathbf{P})}\delta_{i}^{\prime},\epsilon\right),$ (34)
$M_{i}=B_{M}(\delta+\delta_{i}^{\prime},\epsilon),$

where

$\delta_{i}^{\prime}=\frac{P_{U}^{i}}{2}+\sum_{k=1}^{i-1}N_{k}T(1-m_{k})\|\mathbf{P}_{k}\|_{2}.$ (35)

Note that, by the convention that an empty summation is zero, $\delta_{1}^{\prime}=\frac{P_{U}^{1}}{2}$.

Input: The initial state $X_{0}$; the meter reading sequence $y_{0},y_{1},...,y_{T}$; the number of appliances $N$; the appliances' power vector $\mathbf{P}$.
Output: The appliance state sequences $\bar{\boldsymbol{X}^{\epsilon}}=\bar{X^{\epsilon}_{1}},...,\bar{X^{\epsilon}_{T}}$.
1: Decomposition:
2: Rank the power vector $\mathbf{P}$ ascendingly: $P_{1},...,P_{N}$;
3: $k=1$, $C=\varnothing$, $S=\varnothing$;
4: for $i$ from 1 to $N$ do
5:   if $|C|=0$ or $1$ then
6:     $C\leftarrow C\cup\{P_{i}\}$;
7:   else
8:     Rank the items in $C$ ascendingly: $c_{1},...,c_{|C|}$;
9:     if $\sum_{j=1}^{\lfloor\frac{|C|}{2}\rfloor+1}c_{j}-2\delta\geq\sum_{j=1}^{\lfloor\frac{|C|}{2}\rfloor-1}c_{|C|+1-j}+P_{i}$ then
10:       $C\leftarrow C\cup\{P_{i}\}$;
11:     else
12:       $S\leftarrow S\cup\{C\}$;
13:       $C=\{P_{i}\}$;
14:     end if
15:   end if
16: end for
17: Decoding:
18: Rank $S$ descendingly with respect to the largest element in each set;
19: for $S_{i}$ in $S$ do
20:   Construct the power vector $P_{S_{i}}$ of the appliances in $S_{i}$;
21:   Let $X_{0}[S_{i}]$ denote the initial states of the appliances in the set $S_{i}$;
22:   Conduct multi-shot compressive sensing (Algorithm 1) with $X_{0}[S_{i}]$, $\boldsymbol{Y}$, $|S_{i}|$, $P_{S_{i}}$ and derive $\bar{\boldsymbol{X}^{\epsilon}}[S_{i}]$;
23:   $\boldsymbol{Y}=\boldsymbol{Y}-\bar{\boldsymbol{X}^{\epsilon}}[S_{i}]P_{S_{i}}$;
24: end for
25: return $\bar{\boldsymbol{X}^{\epsilon}}=\bar{X_{1}^{\epsilon}},...,\bar{X_{T}^{\epsilon}}$.
Algorithm 2: Hierarchical Multi-shot Inference

Specifically, suppose we have $K$ hierarchies in the problem; the overall accuracy $\alpha_{H}$ can then be obtained recursively as follows:

$\mathbf{E}[\alpha_{H}]=\frac{\sum_{i=1}^{K}\mathbf{E}[\alpha_{i}]N_{i}}{\sum_{i=1}^{K}N_{i}}.$ (36)

The associated bounds for our multi-shot compressive sensing algorithm with hierarchical decomposition could also be derived straightforwardly with the help of Theorem 6.

Remark: It is possible to generalize our framework to the non-binary multiple discrete states case. The key modification is to characterize the state of each appliance with a vector of random variables, instead of a single random variable. However, in this paper, we choose to focus on a stylized model, which only considers the binary state. This stylized model, by simplifying the mathematical details, helps to sharpen our understanding of the physical insights into the DP parameters in the electricity sector.

V Numerical Studies

In this section, we design a sequence of numerical studies to highlight the effectiveness of our proposed bound characterization. The effectiveness lies in the consistency, in terms of magnitude, between our bounds and the empirical trends of the inference accuracy (in the DP parameter $\epsilon$).

We conclude this section by illustrating that our derived asymptotic bounds are effective not only for the compressive sensing framework, but also for characterizing the performance of many other NILM algorithms.

V-A Overview of the Datasets

In this paper, we use three publicly available datasets.

  • The UK-DALE dataset [28] records the electricity consumption of five households in the UK, with a sampling rate of 6 Hz for each individual appliance. For performance evaluation, we choose the data segment from 2013/05/26 21:03:14 to 2013/05/27 0:54:50.

  • The TEALD dataset [20] is collected by Rainforest EMU2 in Canada. With a sampling rate of 1 Hz, we use the data segment from 2016/02/09 16:00:00 to 2016/02/10 15:59:59 for performance evaluation.

  • The REDD dataset [29] is also commonly used for NILM performance evaluation. We use the data from 2011/04/18 21:22:13 to 2011/05/25 03:56:34 with a sampling rate of 3 Hz.

V-B One-shot NILM Inference

We use the data of building 4 in the UK-DALE dataset [28] to conduct the first numerical analysis. In this building, there are 8 appliances.

Using the one-shot NILM inference based on compressive sensing, we evaluate the inference performance with increasing levels of DP. Fig. 2 plots the theoretical lower bound, the actual performance traces, and the theoretical upper bound to evaluate our theorems. The lower bound is not tight, mainly due to the sensitivity and the parameter $C(\mathbf{P})$. The sensitivity is not tight in that it is time-varying: in some time slots, some appliances are simply idle, yet they all contribute to the sensitivity. The parameter $C(\mathbf{P})$ is not tight for exactly the same reason. The meter data suggests $\delta$ be 2, and $C(\mathbf{P})$ should be 0.015. Nevertheless, our lower bound is tight in terms of magnitude. The mean of the actual inference accuracy indeed decreases at the rate of $\epsilon^{-1}$.

On the other hand, our proposed upper bound seems tight for this case study. We want to emphasize that this upper bound may not always be tight, in that we assume the existence of an oracle in the upper bound construction. Hence, this upper bound is more suitable as a benchmark for all possible NILM algorithms. From Fig. 2, we can therefore conclude that compressive sensing is a good approach for one-shot NILM inference.

Figure 2: One-shot NILM Inference Accuracy in $\ln(\frac{1}{\epsilon})$.

V-C Multi-shot NILM Inference

We then use the UK-DALE dataset to validate our Algorithm 1, which relies on Assumption 3. The inference accuracy is shown in Fig. 3. In fact, our proposed approach performs remarkably well, and the inference accuracy trend shows a similar decreasing rate as Theorem 6 dictates. We want to emphasize that all of our figures are plotted on a half-log scale, which is ideal for verifying the decreasing rate of $\epsilon^{-1}$.

We use the TEALD dataset to further demonstrate the effectiveness of our hierarchical decomposition. There are 13 appliances in the dataset, with diverse energy consumptions. We group them into four hierarchies and conduct the NILM inference. Fig. 4 plots its inference accuracy trend in the DP parameter, which again highlights the remarkable performance of our proposed algorithm, and verifies the effectiveness of our derived bounds. During the simulation, we observe that our hierarchical algorithm seems to be quite robust against noises, which may lead to interesting theoretical results, though a detailed discussion is beyond the scope of our work.

Figure 3: Multi-shot NILM Inference Accuracy in $\ln(\frac{1}{\epsilon})$.
Figure 4: Hierarchical Multi-shot NILM Inference Accuracy in $\ln(\frac{1}{\epsilon})$.

V-D Robustness of Noise Injection Methods

We examine the robustness of our conclusion through numerical studies, specifically to understand whether our conclusion holds only for Laplace noise injection. We consider other noise injection methods that also achieve $\epsilon$-DP. Here, we employ the staircase mechanism, described in Algorithm 3, which achieves $\epsilon$-DP [30].

Input: $\epsilon$, $\Delta f$;
Output: the staircase noise $n$;
1: $\gamma\leftarrow 1/(1+e^{\epsilon/2})$;
2: Generate a discrete random variable $S$ with $\Pr(S=1)=\Pr(S=-1)=\frac{1}{2}$;
3: Generate a geometric random variable $G$ with $\Pr(G=i)=(1-b)b^{i}$ for integer $i\geq 0$, where $b=e^{-\epsilon}$;
4: Generate a random variable $U$ uniformly distributed on $[0,1]$;
5: Generate a binary random variable $B$ with $\Pr(B=0)=\frac{\gamma}{\gamma+(1-\gamma)b}$ and $\Pr(B=1)=\frac{(1-\gamma)b}{\gamma+(1-\gamma)b}$;
6: $n\leftarrow S\left((1-B)(G+\gamma U)\Delta f+B(G+\gamma+(1-\gamma)U)\Delta f\right)$;
7: return $n$.
Algorithm 3: Generation of Staircase Noise [30]
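For reference, a direct Python transcription of Algorithm 3 might look as follows; the helper name is ours, and the geometric variable is drawn via numpy's convention (support starting at 1) and shifted to start at 0.

```python
import numpy as np

def staircase_noise(epsilon, delta_f, rng=np.random.default_rng()):
    """Sketch of the staircase mechanism (Algorithm 3, [30]) for epsilon-DP."""
    gamma = 1.0 / (1.0 + np.exp(epsilon / 2.0))
    b = np.exp(-epsilon)
    sign = rng.choice([-1, 1])                  # S with Pr(+1) = Pr(-1) = 1/2
    G = rng.geometric(1.0 - b) - 1              # Pr(G=i) = (1-b) b^i, i >= 0
    U = rng.random()                            # uniform on [0, 1]
    B = rng.random() >= gamma / (gamma + (1.0 - gamma) * b)   # binary switch
    if B:
        return sign * (G + gamma + (1.0 - gamma) * U) * delta_f
    return sign * (G + gamma * U) * delta_f
```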
Figure 5: Robustness Analysis

We follow the same routine in Section V-C to conduct the numerical study and compare the results in Fig. 5. It is clear that different noise injection methods lead to similar relationships between the DP parameters and the inference accuracy. This further illustrates the robustness of our conclusion against noise injection methods.

V-E Beyond Compressive Sensing

We can now safely conclude that, in the compressive sensing framework, the inference accuracy heavily relies on the DP parameter $\epsilon$. Specifically, we observe that the inference accuracy decreases linearly with $\ln(\frac{1}{\epsilon})$ when $\ln(\frac{1}{\epsilon})$ is large, as our figures show. We now try to generalize this conclusion to other common frameworks in the NILM literature. Specifically, we conduct numerical studies to empirically compare how the DP parameter affects the accuracy of common NILM algorithms, including the sparse Viterbi algorithm for super-state HMM (SparseHMM), aided linear integer programming (ALIP), linear integer programming (IP), recurrent neural network (RNN), CO, FHMM, and our proposed algorithm (Multi-shot CS).

We conduct the numerical comparison on Building 1 in the REDD dataset. For supervised learning models, we divide the dataset into a training set (before 2011-04-03) and a validation set (after 2011-04-03). Then, we inject noise into the validation set for performance evaluation. For unsupervised models (including our approach), we directly inject noise into the meter data.

We explain the details about the hyperparameter settings as follows. In fact, most of the settings follow the seminal works.

  • When implementing the SparseHMM algorithm, we follow the hyperparameters in the seminal REDD dataset numerical study in [13]. Specifically, the maximal number of super states is set to 4, and the state number choice parameter $\epsilon$ is set to 0.00021.

  • As for the ALIP and IP algorithms, we follow all the parameters in [11], including the states, specific ratings, and the state transient values. We choose zero initialization for these two algorithms.

  • For the RNN implementation, we follow the RNN structure in [14]. Specifically, we first adopt a 1D convolution layer to handle the input data, and then use two bidirectional LSTM layers and a fully connected layer to derive the output. We adopt the Adam optimizer with an initial learning rate of 0.01 for training. The number of training epochs is set to 5 and the batch size is set to 128.

  • For CO and FHMM, we again adopt the classical settings in NILMTK [31]. The detailed settings are also provided in the online document [32].

  • For our proposed multi-shot compressive sensing algorithm, we evaluate the meter data and suggest $\delta$ should be 20. This is derived by observing the maximal fluctuation in the historical meter data during periods without any switch event.

As indicated by Fig. 6, among all the algorithms, SparseHMM achieves the best performance as it makes full use of the sparsity assumption and exploits the temporal correlations through the hidden Markov chain. These two features also underlie the remarkable performance of our proposed framework. However, the utilization of temporal correlation makes both algorithms vulnerable to noise injection due to cascading inference failure. While ALIP outperforms the conventional IP approach, both of them show strong robustness to noise injection. This is because they do not rely on the sparsity assumption, and the median filter process in the two algorithms helps smooth out the fluctuations incurred by noise injection. The ordinary performance of RNN is due to the limited training data and the overfitting issue. The fewer assumptions utilized by the classical methods (CO and FHMM) make them more robust to noise injection, at the cost of low inference accuracy.

The key observation is that the inference accuracy of each of the seven algorithms exhibits a piecewise linear relationship with $\ln(\frac{1}{\epsilon})$, though with different breaking points and slopes. The slopes can serve as indicators of each algorithm's robustness to Laplace noise (and hence of the difficulty of achieving privacy preservation with it).

Nonetheless, our proposed bounds provide the correct magnitude to explain the physical meaning of DP parameters in the most common frameworks for NILM, as can be inferred from Fig. 6 empirically. Therefore, in the practical utilization of DP, our theoretical performance bounds can offer a magnitude estimate of the effects of different DP levels. Moreover, these empirical results also inspire us to derive performance bounds for these NILM algorithms in further work.

Figure 6: NILM inference accuracy comparison among 7 algorithms.

VI Conclusion

In this paper, we seek to offer an explicit physical meaning of $\epsilon$ in $\epsilon$-DP. We first theoretically derive how the level of DP affects the performance guarantee of NILM inference. Then, we use extensive numerical studies to highlight that our theoretical results are effective not only for the compressive sensing framework, but also for many other algorithms. These results could be helpful for potential new business models for privacy preserving in the electricity sector.

This work can be extended in many ways. For example, it will be interesting to derive a tighter upper bound or a tighter lower bound for the inference accuracy. A more detailed analysis of the behavior of individual appliances under the DP framework could also be conducted. It is also an interesting topic to analyze the robustness of different NILM algorithms when privacy preservation is implemented.

References

  • [1] George W Hart. Residential energy monitoring and computerized surveillance via utility power flows. IEEE Technology and Society Magazine, 8(2):12–16, 1989.
  • [2] Frank Kreith and Raj P Chhabra. CRC handbook of thermal engineering. CRC press, 2017.
  • [3] Michael Fell, Harry Kennard, Gesche Huebner, Moira Nicolson, Simon Elam, and David Shipworth. Energising health: A review of the health and care applications of smart meter data. London, UK: SMART Energy GB, 2017.
  • [4] Jorge Revuelta Herrero, Álvaro Lozano Murciego, Alberto López Barriuso, Daniel Hernández de La Iglesia, Gabriel Villarrubia González, Juan Manuel Corchado Rodríguez, and Rita Carreira. Non intrusive load monitoring (nilm): A state of the art. In International Conference on Practical Applications of Agents and Multi-Agent Systems, pages 125–138. Springer, 2017.
  • [5] G. W. Hart. Residential energy monitoring and computerized surveillance via utility power flows. IEEE Technology and Society Magazine, 8(2):12–16, June 1989.
  • [6] A. Connolly. Freedom of encryption. IEEE Security Privacy, 16(1):102–103, January 2018.
  • [7] Renato Leite Monteiro. The new brazilian general data protection law — a detailed analysis, 2018.
  • [8] Cynthia Brumfield. 11 new state privacy and security laws explained: Is your business ready? https://www.csoonline.com/article/3429608/11-new-state-privacy-and-security-laws-explained-is-your-business-ready.html, 2018.
  • [9] Cynthia Dwork, Frank Mcsherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. Lecture Notes in Computer Science, pages 265–284, 2006.
  • [10] Weina Wang, Lei Ying, and Junshan Zhang. On the relation between identifiability, differential privacy, and mutual-information privacy. IEEE Trans. on Information Theory, 62(9):5018–5029, 2016.
  • [11] Md Zulfiquar Ali Bhotto, Stephen Makonin, and Ivan V Bajic. Load disaggregation based on aided linear integer programming. IEEE Trans. on Circuits and Systems Ii-express Briefs, 64(7):792–796, 2017.
  • [12] Hyungsul Kim, Manish Marwah, Martin Arlitt, Geoff Lyon, and Jiawei Han. Unsupervised disaggregation of low frequency power measurements, 2011.
  • [13] Stephen Makonin, Fred Popowich, Ivan V Bajic, Bob Gill, and Lyn Bartram. Exploiting hmm sparsity to perform online real-time nonintrusive load monitoring. IEEE Trans. on Smart Grid, 7(6):2575–2585, 2016.
  • [14] Jack Kelly and William J Knottenbelt. Neural nilm: Deep neural networks applied to energy disaggregation. arXiv: Neural and Evolutionary Computing, pages 55–64, 2015.
  • [15] Georgios Kalogridis, Costas Efthymiou, Stojan Z Denic, Tim A Lewis, and Rafael Cepeda. Privacy for smart meters: Towards undetectable appliance load signatures, 2010.
  • [16] M Backes and Sebastian Meiser. Differentially private smart metering with battery recharging. IACR Cryptology ePrint Archive, 2012:183, 2012.
  • [17] Dong Chen, Sandeep Kalra, David Irwin, Prashant Shenoy, and Jeannie Albrecht. Preventing occupancy detection from smart meters. IEEE Trans. on Smart Grid, 6(5):2426–2434, 2015.
  • [18] Hui Cao, Shubo Liu, Longfei Wu, Zhitao Guan, and Xiaojiang Du. Achieving differential privacy against non‐intrusive load monitoring in smart grid: A fog computing approach. Concurrency and Computation: Practice and Experience, 31(22), 2019.
  • [19] Vibhor Rastogi and Suman Nath. Differentially private aggregation of distributed time-series with transformation and encryption, 2010.
  • [20] Stephen Makonin, Z Jane Wang, and Chris Tumpach. Rae: The rainforest automation energy dataset for smart grid meter data analysis. international conference on data technologies and applications, 3(1):8, 2018.
  • [21] Emre C Kara, Ciaran M Roberts, Michaelangelo Tabone, Lilliana Alvarez, Duncan S Callaway, and Emma M Stewart. Disaggregating solar generation from feeder-level measurements. Sustainable Energy, Grids and Networks, 13:112–121, 2018.
  • [22] Michaelangelo Tabone, Sila Kiliccote, and Emre Can Kara. Disaggregating solar generation behind individual meters in real time. In Proceedings of the 5th Conference on Systems for Built Environments, pages 43–52, 2018.
  • [23] Fei Wang, Kangping Li, Xinkang Wang, Lihui Jiang, Jianguo Ren, Zengqiang Mi, Miadreza Shafie-khah, and João PS Catalão. A distributed pv system capacity estimation approach based on support vector machine with customer net load curve features. Energies, 11(7):1750, 2018.
  • [24] National Electrical Manufacturers Association et al. ANSI C12.20-2010: American national standard for electricity meters: 0.2 and 0.5 accuracy classes. American National Standards Institute, 2010.
  • [25] David L Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$ minimization. Proc. of the National Academy of Sciences, 100(5):2197–2202, 2003.
  • [26] Emmanuel J Candes, Justin Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
  • [27] David Easley, Jon Kleinberg, et al. Networks, crowds, and markets: Reasoning about a highly connected world. Significance, 9:43–44, 2012.
  • [28] Jack Kelly and William J Knottenbelt. The uk-dale dataset, domestic appliance-level electricity demand and whole-house demand from five uk homes. Scientific Data, 2(1):150007–150007, 2015.
  • [29] J Zico Kolter and Matthew J Johnson. Redd: A public data set for energy disaggregation research, 2011.
  • [30] Quan Geng and Pramod Viswanath. The optimal mechanism in differential privacy. In 2014 IEEE international symposium on information theory, pages 2371–2375. IEEE, 2014.
  • [31] Nipun Batra, Jack Kelly, Oliver Parson, Haimonti Dutta, William Knottenbelt, Alex Rogers, Amarjeet Singh, and Mani Srivastava. Nilmtk: an open source toolkit for non-intrusive load monitoring. In Proceedings of the 5th international conference on Future energy systems, pages 265–276, 2014.
  • [32] NILMTK Authors. Nilmtk’s api documentation. http://nilmtk.github.io/nilmtk/master/index.html.
  • [33] Yang Yu, Guangyi Liu, Wendong Zhu, Fei Wang, Bin Shu, Kai Zhang, Nicolas Astier, and Ram Rajagopal. Good consumer or bad consumer: Economic information revealed from demand profiles. IEEE Transactions on Smart Grid, 9(3):2347–2358, 2017.
  • [34] Pecan Street INC. Pecan street data. http://www.pecanstreet.org.

-A DP May Affect Behavior Analysis

It is believed that the load profile is an important pattern for characterizing consumer behavior. Yu et al. further show that the load profile can uniquely determine the system's marginal cost of serving each kind of consumer [33], and suggest that k-means clustering based on the $l_1$-norm is a suitable way to cluster users with similar marginal serving costs.

To empirically highlight how the DP parameters may blur the behavior analysis, we use the Pecan Street dataset [34] and randomly select 40 users' 3-month daily energy consumption profiles (from May 1 to Aug. 8, 2015). To obtain a larger dataset, we combine all the profiles together and conduct the behavior analysis directly on this combined dataset. We set the number of clusters to 30. The initial clustering results are shown in Fig. 7.

Figure 7: The profiles of the clustering centers

We inject Laplace noise into every single consumer's load profile. We set the DP parameter $\lambda$ to 0.01, 0.05, 0.1, and 0.5, respectively, to examine how this parameter changes the clustering result. Fig. 8 plots these impacts. Specifically, if a cluster has few users deviating to other clusters after noise injection, we call it a stable cluster. The color of each cluster represents this stability. The arrows in the figure show the trajectories of cluster interchanges. Clearly, a higher privacy requirement (larger $\lambda$) leads to a higher probability that users deviate from their original clusters.

Figure 8: The cluster interchanges induced by different levels of noise injection

To better connect the DP parameter with accuracy, we can estimate the probability that each consumer stays in its original cluster and derive an upper bound for this probability.
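To make this experiment concrete, the following is a minimal sketch of the noise-injection and clustering-stability procedure described above. The profile matrix and the helper name cluster_stability are placeholders we introduce for illustration; note also that scikit-learn's KMeans uses the Euclidean distance, whereas the clustering in [33] is based on the $l_1$-norm, so this is only an approximation of the actual setup.

import numpy as np
from sklearn.cluster import KMeans

def cluster_stability(profiles, lam, n_clusters=30, seed=0):
    """Cluster daily load profiles, inject Laplace noise with scale lam,
    and report the fraction of users that stay in their original cluster
    (a rough proxy for the stability shown in Fig. 8)."""
    rng = np.random.default_rng(seed)
    base = KMeans(n_clusters=n_clusters, random_state=seed).fit(profiles)

    # Laplace mechanism: i.i.d. noise on every meter reading of every profile.
    noisy = profiles + rng.laplace(scale=lam, size=profiles.shape)
    # Assign noisy profiles to the original centers to see who deviates.
    noisy_labels = base.predict(noisy)
    return np.mean(noisy_labels == base.labels_)

# Hypothetical usage: 40 users x 92 days, each profile with 24 hourly readings.
profiles = np.abs(np.random.default_rng(1).normal(1.0, 0.3, size=(40 * 92, 24)))
for lam in [0.01, 0.05, 0.1, 0.5]:
    print(lam, cluster_stability(profiles, lam))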

-B Justification of Sparsity Assumption

While it is hard to theoretically examine how a failure to meet this assumption affects the performance of the proposed inference framework, we provide the following numerical analysis. Define the degree of sparsity $s$ for a state matrix $X\in\{0,1\}^{N\times T}$ as follows:

s=1-\frac{\sum_{t=1}^{T-1}\|X_{t+1}-X_{t}\|_{0}}{N(T-1)}. (37)

This metric allows us to synthesize load data with different sparsity levels to verify our theoretical conclusions. Fig. 9 plots the relationship between the DP parameter and the inference accuracy for sparsity levels ranging from 0.3 to 0.9. The general trends remain the same across sparsity levels, while a higher sparsity level leads to better performance.
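For reference, the following is a small sketch of how the sparsity metric in Eq. (37) can be computed for a synthetic on/off state matrix; the generator random_states and its parameter values are illustrative placeholders, not part of the original experiments.

import numpy as np

def degree_of_sparsity(X):
    """Eq. (37): fraction of (appliance, time) pairs with no switching event.
    X is an N x T binary matrix of appliance on/off states."""
    N, T = X.shape
    switching_events = np.count_nonzero(X[:, 1:] != X[:, :-1])
    return 1.0 - switching_events / (N * (T - 1))

def random_states(N=10, T=1440, switch_prob=0.005, seed=0):
    """Synthetic states: each appliance flips with a small probability per step."""
    rng = np.random.default_rng(seed)
    flips = rng.random((N, T - 1)) < switch_prob
    X = np.zeros((N, T), dtype=int)
    X[:, 0] = rng.integers(0, 2, size=N)
    for t in range(1, T):
        X[:, t] = np.where(flips[:, t - 1], 1 - X[:, t - 1], X[:, t - 1])
    return X

print(degree_of_sparsity(random_states()))  # close to 1 - switch_prob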

Figure 9: Performance of the one-shot compressive sensing inference for different sparsity levels

We further report the sparsity levels of the three most widely adopted public datasets in Table I. All three datasets exhibit very high sparsity.

TABLE I: The Sparsity of Three Widely Adopted Public Datasets
Dataset Sparsity
UK-DALE 99.86%
TEALD 99.29%
REDD 98.97%

-C Proof of Lemma 1

Denote the optimal solution to (P1) by $\Delta_{t}^{0}$ and the optimal solution to (P2) by $\Delta_{t}$. By the equivalence of the optimal solution sets, we mean

\|\Delta_{t}^{0}\|_{0}=\|\Delta_{t}\|_{0}. (38)

Hence, it suffices to show the two inequalities $\|\Delta_{t}^{0}\|_{0}\leq\|\Delta_{t}\|_{0}$ and $\|\Delta_{t}^{0}\|_{0}\geq\|\Delta_{t}\|_{0}$. It is straightforward to see that every feasible solution to (P1) is also feasible to (P2); hence, the second inequality immediately follows. All that remains is to prove the first inequality.

We prove it by contradiction, utilizing the crucial technical alignment assumption, Assumption 3. It states that for the entries of $\mathbf{P}$ sorted in ascending order, $\{P^{\epsilon}_{1},\ldots,P^{\epsilon}_{N}\}$, and for all $U\leq U_{t}$, it holds that

\sum\nolimits_{k=1}^{U}P_{k}^{\epsilon}-\sum\nolimits_{k=1}^{U-1}P^{\epsilon}_{N+1-k}>2\delta. (39)

Substituting the sparsity assumption into the sensitivity constraint yields that there exist $U$ entries $P_{1},\ldots,P_{U}$ of $\mathbf{P}$ satisfying:

\sum\nolimits_{k=1}^{U}P_{k}\leq K_{t}+\delta. (40)

For all possible $\underline{U}<U$, it holds that

\sum\nolimits_{k=1}^{\underline{U}}P_{N+1-k}^{\epsilon}<\sum\nolimits_{k=1}^{\underline{U}+1}P_{k}^{\epsilon}-2\delta\leq\sum\nolimits_{k=1}^{U}P_{k}-2\delta\leq K_{t}-\delta, (41)

where the first inequality follows from Assumption 3 (applied with $\underline{U}+1\leq U_{t}$), the second from the fact that $\{P_{k}^{\epsilon}\}$ are the smallest entries of $\mathbf{P}$ and $\underline{U}+1\leq U$, and the last from Eq. (40).

Note that, Eqs. (40) and (41) hold for both problems (P1) and (P2).

Denote $\|\Delta_{t}^{0}\|_{0}$ by $U$ and $\|\Delta_{t}\|_{0}$ by $\underline{U}$. Suppose $\underline{U}<U$; then Eqs. (40) and (41) yield that

\Delta_{t}\mathbf{P}\leq\sum\nolimits_{k=1}^{\underline{U}}P_{N+1-k}^{\epsilon}<K_{t}-\delta. (42)

This indicates that $\Delta_{t}$ does not satisfy the sensitivity constraint in (P2), which contradicts our assumption that it is the optimal solution to (P2). Thus, $\|\Delta_{t}^{0}\|_{0}\leq\|\Delta_{t}\|_{0}$, which concludes our proof.
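As a quick sanity check, the following sketch verifies the alignment condition in Eq. (39) for a given appliance power vector and sensitivity bound; the example power values and the function name alignment_holds are placeholders we introduce purely for illustration.

import numpy as np

def alignment_holds(P, delta, U_t):
    """Check Assumption 3 (Eq. (39)): for every U <= U_t, the sum of the U
    smallest powers exceeds the sum of the U-1 largest powers by more than 2*delta."""
    asc = np.sort(np.asarray(P, dtype=float))      # P_1^eps <= ... <= P_N^eps
    desc = asc[::-1]                               # largest entries first
    for U in range(1, U_t + 1):
        if asc[:U].sum() - desc[:U - 1].sum() <= 2 * delta:
            return False
    return True

# Hypothetical appliance powers (watts) and a small sensitivity bound.
print(alignment_holds([100, 150, 160, 230, 235], delta=5, U_t=2))  # True
print(alignment_holds([100, 150, 160, 230, 235], delta=5, U_t=3))  # False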

-D Proof for Theorem 1

We first define the notations and present the lemma that we use to prove this theorem.

For a set $U\subseteq\{1,\cdots,N\}$, define the vector $\mathbf{P}_{U}$ as follows:

\mathbf{P}_{U}=\left\{P_{i}\,\middle|\,P_{i}\in\mathbf{P},\ i\in U\right\}. (43)

We say $\mathbf{P}_{U}$ is $S$-bounded by $\delta_{S}$ if the following holds for all $c$ and for all $U$ such that $|U|\leq S$:

(1-\delta_{S})\|c\|_{2}^{2}\leq\|\mathbf{P}_{U}\cdot c\|_{2}^{2}\leq(1+\delta_{S})\|c\|_{2}^{2}. (44)

For our problem, the sparsity assumption implies that $\Delta_{t}$ has at most $U_{t}$ non-zero entries. Assume $\mathbf{P}_{U}$ is both $3U_{t}$-bounded and $4U_{t}$-bounded, with corresponding bounds $\delta_{3U_{t}}$ and $\delta_{4U_{t}}$, respectively. We restate Theorem 1 in [26] as Lemma 2 for our subsequent proof.

Lemma 2.

(Theorem 1 in [26]): For $X_{t}$ with $U_{t}$ non-zero entries and for the matrix $P$ with $n$ columns, if $P$ is both $3U_{t}$-bounded and $4U_{t}$-bounded, then the difference between the solution to (P3) and the ground truth is upper bounded, i.e.,

\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}\leq\frac{4}{\sqrt{3(1-\delta_{4U_{t}})}-\sqrt{1+\delta_{3U_{t}}}}\cdot\delta. (45)

To apply Lemma 2 to our problem, we need to express the constraint $\|\Delta_{t}\boldsymbol{P}-K_{t}\|_{2}<\delta$ in the following equivalent form:

\left\|\Delta_{t}\frac{\mathbf{P}}{\|\mathbf{P}\|_{2}}-\frac{K_{t}}{\|\mathbf{P}\|_{2}}\right\|_{2}<\frac{\delta}{\|\mathbf{P}\|_{2}}. (46)

Thus, Lemma 2 directly dictates

\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}\leq C(\mathbf{P})\cdot\delta, (47)

where

C(\mathbf{P})=\frac{4}{\left(\sqrt{3(1-\delta_{4U_{t}})}-\sqrt{1+\delta_{3U_{t}}}\right)\|\mathbf{P}\|_{2}}. (48)

Note that $\mathbf{E}[\bar{\Delta}_{t}]=\Delta_{t}^{*}$, which yields

\|\mathbf{E}[\bar{\Delta}_{t}]-\Delta_{t}^{0}\|_{2}\leq\|\mathbf{E}[\bar{\Delta}_{t}]-\Delta_{t}^{*}\|_{2}+\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}\leq C(\mathbf{P})\cdot\delta. (49)

This completes our proof for Theorem 1.
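For readers who want to experiment with the one-shot recovery numerically, the following is a minimal sketch of the $\ell_1$-relaxation we assume underlies (P3): minimize the $\ell_1$-norm of the switching vector subject to the sensitivity constraint $|\Delta_{t}\mathbf{P}-K_{t}|\leq\delta$ and a box relaxation $\Delta_{t}\in[0,1]^{N}$, followed by probabilistic rounding. The exact formulation of (P3) is given in the main text; this linear-programming form and all numbers below are only one possible, illustrative instantiation.

import numpy as np
from scipy.optimize import linprog

def one_shot_decode(P, K, delta, rng=None):
    """Relaxed l1 recovery of the switching vector, then probabilistic rounding.
    P: appliance powers, K: observed (possibly noisy) consumption change."""
    P = np.asarray(P, dtype=float)
    N = P.size
    # minimize sum(Delta) s.t. K - delta <= Delta @ P <= K + delta, 0 <= Delta <= 1
    res = linprog(c=np.ones(N),
                  A_ub=np.vstack([P, -P]),
                  b_ub=np.array([K + delta, -(K - delta)]),
                  bounds=[(0.0, 1.0)] * N,
                  method="highs")
    if not res.success:
        return None
    rng = rng or np.random.default_rng(0)
    # Round each relaxed entry to {0,1} with probability equal to its value.
    return (rng.random(N) < res.x).astype(int)

# Hypothetical example: appliances of 60, 100, 1500, 2000 W; the 2000 W one toggles.
print(one_shot_decode(P=[60, 100, 1500, 2000], K=2003.0, delta=5.0))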

-E Proof for Proposition 1

Note that the following triangle inequality holds with $K_{t}^{\epsilon}$ defined in (P4):

|\Delta_{t}^{\epsilon}\mathbf{P}-\Delta_{t}^{0}\mathbf{P}|
\leq |\Delta^{\epsilon}_{t}\mathbf{P}-\Delta^{*}_{t}\mathbf{P}+\Delta^{*}_{t}\mathbf{P}-\Delta^{0}_{t}\mathbf{P}|
\leq |\Delta^{\epsilon}_{t}\mathbf{P}-\Delta^{*}_{t}\mathbf{P}|+|\Delta^{*}_{t}\mathbf{P}-\Delta^{0}_{t}\mathbf{P}|
\leq C(\mathbf{P})\cdot\delta+|\Delta^{\epsilon}_{t}\mathbf{P}-K_{t}^{\epsilon}+K_{t}^{\epsilon}-\Delta^{*}_{t}\mathbf{P}|
\leq C(\mathbf{P})\cdot\delta+\delta+\big||y_{t}-y_{t+1}+n_{t}-n_{t+1}|-\Delta^{*}_{t}\mathbf{P}\big|
\leq C(\mathbf{P})\cdot\delta+\delta+\big||y_{t}-y_{t+1}|+|n_{t}-n_{t+1}|-\Delta^{*}_{t}\mathbf{P}\big|
\leq C(\mathbf{P})\cdot\delta+\delta+\delta+|n_{t}-n_{t+1}|
\leq (C(\mathbf{P})+2)\cdot\delta+|n_{t}-n_{t+1}|. (50)

Proposition 1 immediately follows.

-F Proof for Theorem 3

Denote $l_{t}=n_{t}-n_{t-1}$. Since $n_{t}$ and $n_{t-1}$ are i.i.d. Laplace noises, we can show that

\mathbf{E}[|l_{t}|]=2\int^{\infty}_{0}x\left(\frac{x}{4\lambda^{2}}+\frac{1}{4\lambda}\right)e^{-\frac{x}{\lambda}}dx=\frac{3\lambda}{2}=\frac{3\delta}{4\epsilon}. (51)

Note that the last equality holds because $\Delta f=\frac{\delta}{2}$ and $\lambda=\frac{\Delta f}{\epsilon}$.
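As an informal check of Eq. (51), one can compare the closed form $\frac{3\lambda}{2}$ against a Monte Carlo estimate of $\mathbf{E}[|n_{t}-n_{t-1}|]$ for i.i.d. Laplace noise; the scale and sample size below are arbitrary illustrative choices.

import numpy as np

lam = 0.25                               # Laplace scale lambda
rng = np.random.default_rng(0)
n_t = rng.laplace(scale=lam, size=1_000_000)
n_prev = rng.laplace(scale=lam, size=1_000_000)

print(np.mean(np.abs(n_t - n_prev)))     # Monte Carlo estimate of E|l_t|
print(1.5 * lam)                         # closed form 3*lambda/2 from Eq. (51)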

This allows us to express $\mathbf{E}[\alpha]$ in terms of $\delta$ and $\epsilon$:

\mathbf{E}[\alpha]=\mathbf{E}_{l_{t}}\left[1-\frac{\mathbf{E}\|\bar{\Delta}_{t}^{\epsilon}-\Delta_{t}^{0}\|_{1}}{N}\right]=\mathbf{E}_{l_{t}}\left[1-\frac{\mathbf{E}\|\bar{\Delta}_{t}^{\epsilon}-\Delta_{t}^{0}\|_{2}}{N}\right]=\mathbf{E}_{l_{t}}\left[1-\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{N}\right]\geq\mathbf{E}_{l_{t}}\left[\max\left(1-\frac{(C(\mathbf{P})+2)\delta+|l_{t}|}{N},0\right)\right]. (52)

Expressing the expectation in the integral form further yields that

\mathbf{E}[\alpha]\geq 1-2\int_{N-(C(\mathbf{P})+2)\delta}^{\infty}\left(\frac{x}{4\lambda^{2}}+\frac{1}{4\lambda}\right)e^{-\frac{x}{\lambda}}dx-2\int_{0}^{N-(C(\mathbf{P})+2)\delta}\left(\frac{(C(\mathbf{P})+2)\delta+x}{N}\right)\left(\frac{x}{4\lambda^{2}}+\frac{1}{4\lambda}\right)e^{-\frac{x}{\lambda}}dx. (53)

The inequality in Eq. (52) holds because the accuracy has a trivial lower bound of 0, which means the error has a trivial upper bound of 1. Theorem 3 follows from further standard mathematical manipulations.
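The lower bound in Eq. (53) can also be evaluated numerically for given $N$, $C(\mathbf{P})$, $\delta$, and $\lambda$; the following sketch does so with standard quadrature, with all parameter values chosen purely for illustration.

import numpy as np
from scipy.integrate import quad

def accuracy_lower_bound(N, C_P, delta, lam):
    """Numerical evaluation of the lower bound on E[alpha] in Eq. (53)."""
    density = lambda x: x / (4 * lam**2) + 1 / (4 * lam)   # density factor of |l_t|
    tail = lambda x: density(x) * np.exp(-x / lam)
    a = N - (C_P + 2) * delta                               # split point of the integral
    term1, _ = quad(tail, a, np.inf)
    term2, _ = quad(lambda x: ((C_P + 2) * delta + x) / N * tail(x), 0, a)
    return 1 - 2 * term1 - 2 * term2

# Illustrative parameters only.
print(accuracy_lower_bound(N=20, C_P=1.0, delta=0.5, lam=0.25))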

-G Proof for Theorem 4

The accuracy is defined as follows

\alpha=1-\frac{\mathbf{E}\|\bar{\Delta}_{t}^{\epsilon}-\Delta_{t}^{0}\|_{1}}{N}. (54)

To find its upper bound, we need to find a lower bound on the error. With $K_{t}^{\epsilon}$ defined in Problem (P4), we define the event $\Omega$ as $\{K_{t}^{\epsilon}\not\in[0,\Delta^{0}_{t}\mathbf{P}+2\delta]\}\cap\{(y_{t}-y_{t+1})(n_{t}-n_{t+1})>0\}$.

This event $\Omega$ allows us to lower bound the error:

\mathbf{E}[1-\alpha]=\mathbf{E}_{K_{t}^{\epsilon}}\frac{\mathbf{E}\|\bar{\Delta}_{t}^{\epsilon}-\Delta_{t}^{0}\|_{1}}{N}=\mathbf{E}_{K_{t}^{\epsilon}}\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{N}\geq\mathbf{E}\left[\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{N}\mathbf{I}\{\Omega\}\right], (55)

where $\mathbf{I}\{\cdot\}$ is the indicator function.

For a given $K_{t}^{\epsilon}$, the definition of rounding implies that the expectation of the rounded solution equals the optimal solution to (P3), denoted by $\Delta_{t}^{*}$. Hence, we can substitute $\Delta_{t}^{*}$ for $\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]$.

When the event $\Omega$ happens, $K_{t}^{\epsilon}$ can be related to the ground truth $\Delta_{t}^{0}$, where $l_{t}$ denotes $n_{t}-n_{t-1}$. Since $(y_{t}-y_{t+1})(n_{t}-n_{t+1})>0$, we can derive that

K_{t}^{\epsilon}=|y_{t}-y_{t-1}|+|l_{t}|\geq\Delta_{t}^{0}\mathbf{P}-\delta+|l_{t}|. (56)

Thus, if $\Delta_{t}^{*}$ is feasible, i.e.,

|\Delta_{t}^{*}\mathbf{P}-K_{t}^{\epsilon}|\leq\delta, (57)

then we know

\Delta_{t}^{*}\mathbf{P}-K_{t}^{\epsilon}\geq-\delta. (58)

When the event $\Omega$ happens, we can conclude that $|l_{t}|>2\delta$. Therefore, combining Eqs. (56) and (58) yields

\big||l_{t}|-2\delta\big|\leq|(\Delta_{t}^{*}-\Delta_{t}^{0})\mathbf{P}|\leq\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}\|\mathbf{P}\|_{2}. (59)

Plugging Eq. (59) into Eq. (55) further implies that

\mathbf{E}[1-\alpha]\geq\mathbf{E}\left[\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{N}\mathbf{I}\{\Omega\}\right]\geq\mathbf{E}\left[\frac{\big||l_{t}|-2\delta\big|}{N\|\mathbf{P}\|_{2}}\mathbf{I}\{\Omega\}\right]=\int^{2n\|\mathbf{P}\|_{2}+2\delta}_{2\delta}\frac{\big||x|-2\delta\big|}{N\|\mathbf{P}\|_{2}}\left(\frac{x}{4\lambda^{2}}+\frac{1}{4\lambda}\right)e^{-\frac{x}{\lambda}}dx+\int_{2n\|\mathbf{P}\|_{2}+2\delta}^{\infty}\left(\frac{x}{4\lambda^{2}}+\frac{1}{4\lambda}\right)e^{-\frac{x}{\lambda}}dx. (60)

The inequalities again make use of the fact that the accuracy has a trivial lower bound of 0. With further mathematical manipulations, we can prove Theorem 4.

-H Proof for Theorem 5

For one-shot compressive sensing, the above proofs of Theorems 3 and 4 allow us to bound the difference between the solution to problem (P4), $\Delta_{t}^{*}$, and the ground truth $\Delta_{t}^{0}$ in the following form:

g(\delta,\epsilon)\leq\mathbf{E}\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}\leq G(\delta,\epsilon). (61)

Next, we construct $g(\delta,\epsilon)$ and $G(\delta,\epsilon)$. Following the rounding procedure, for appliance $k$, we denote its approximate and true states by $X_{t}^{k}$ and $X_{t}^{0k}$, respectively. We also denote its approximate and true switching events by $\Delta_{t}^{*k}$ and $\Delta_{t}^{0k}$. The procedure leads to the following equations:

X_{t+1}^{k}=\Delta_{t}^{*k}(1-X_{t}^{k})+(1-\Delta_{t}^{*k})X_{t}^{k}, (62)
X_{t+1}^{0k}=\Delta_{t}^{0k}(1-X_{t}^{0k})+(1-\Delta_{t}^{0k})X_{t}^{0k}. (63)

Together, they indicate

|X_{t+1}^{k}-X_{t+1}^{0k}|=|X_{t}^{k}-X_{t}^{0k}+\Delta_{t}^{*k}-\Delta_{t}^{0k}+2(X_{t}^{0k}\Delta_{t}^{0k}-X_{t}^{k}\Delta_{t}^{*k})|
=|(1-2\Delta_{t}^{*k})(X_{t}^{k}-X_{t}^{0k})+(1-2X_{t}^{0k})(\Delta_{t}^{*k}-\Delta_{t}^{0k})|
\leq|(1-2\Delta_{t}^{*k})(X_{t}^{k}-X_{t}^{0k})|+|(1-2X_{t}^{0k})(\Delta_{t}^{*k}-\Delta_{t}^{0k})|
\leq|X_{t}^{k}-X_{t}^{0k}|+|\Delta_{t}^{*k}-\Delta_{t}^{0k}|. (64)

The last inequality holds because both $\Delta_{t}^{*k}$ and $X_{t}^{0k}$ are in $[0,1]$. Hence, both $1-2\Delta_{t}^{*k}$ and $1-2X_{t}^{0k}$ are in $[-1,1]$.
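The state update in Eqs. (62)–(63) is simply an XOR between the previous state and the switching indicator. Below is a minimal sketch of how multi-shot decoding rolls the (rounded) switching events forward into states; the initial state vector and the event matrix are illustrative inputs only.

import numpy as np

def roll_states(X0, Delta):
    """Reconstruct on/off states from switching events via Eq. (62):
    X_{t+1} = Delta_t*(1 - X_t) + (1 - Delta_t)*X_t, i.e. X_{t+1} = X_t XOR Delta_t.
    X0: length-N initial state vector; Delta: N x (T-1) binary switching events."""
    N, Tm1 = Delta.shape
    X = np.zeros((N, Tm1 + 1), dtype=int)
    X[:, 0] = X0
    for t in range(Tm1):
        X[:, t + 1] = np.bitwise_xor(X[:, t], Delta[:, t])
    return X

# Illustrative example: two appliances, three switching periods.
print(roll_states(np.array([0, 1]), np.array([[1, 0, 1], [0, 1, 1]])))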

Taking the expectation, we have

\mathbf{E}|X_{t}^{k}-X_{t}^{0k}|\leq\mathbf{E}\sum\nolimits_{i=1}^{T-1}|\Delta_{i}^{*k}-\Delta_{i}^{0k}|=\mathbf{E}\sum\nolimits_{i=1}^{T-1}\|\Delta_{i}^{*k}-\Delta_{i}^{0k}\|_{2}\leq\mathbf{E}\sum\nolimits_{i=1}^{T-1}\|\Delta_{i}^{*}-\Delta_{i}^{0}\|_{2}=(T-1)G(\delta,\epsilon). (65)

Denoting the ground truth by $\boldsymbol{X}^{0}$ and the results before rounding by $\boldsymbol{X}$, we know

\mathbf{E}\|X-X^{0}\|_{1}=\mathbf{E}\sum_{k=1}^{N}\sum_{i=1}^{T}|X_{i}^{k}-X_{i}^{0k}|\leq\frac{T(T-1)}{2}G(\delta,\epsilon)N. (66)

Returning to lower bounding the accuracy, where $\bar{\boldsymbol{X}}^{\epsilon}$ denotes the actual decoding results after rounding, we can derive that:

\mathbf{E}[\alpha_{m}]=1-\frac{\mathbf{E}\|\bar{\boldsymbol{X}}^{\epsilon}-\boldsymbol{X}^{0}\|_{1}}{NT}=1-\frac{\mathbf{E}\|\mathbf{E}[\bar{\boldsymbol{X}}^{\epsilon}]-\boldsymbol{X}^{0}\|_{1}}{NT}=1-\frac{\mathbf{E}\|\boldsymbol{X}-\boldsymbol{X}^{0}\|_{1}}{NT}\geq 1-\frac{(T-1)}{2}G(\delta,\epsilon). (67)

We construct $G(\delta,\epsilon)$ in terms of the accuracy bound for the one-shot algorithm. If we denote the lower and upper accuracy bounds for the one-shot compressive sensing algorithm by $b(\delta,\epsilon)$ and $B(\delta,\epsilon)$, then from Eq. (52), we have

\mathbf{E}\left[1-\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{N}\right]=\mathbf{E}\left[1-\frac{\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}}{N}\right]=\mathbf{E}[\alpha]\geq b(\delta,\epsilon). (68)

This constructs $G(\delta,\epsilon)$:

G(\delta,\epsilon)=\left(1-b(\delta,\epsilon)\right)N. (69)

Specifically, we can derive the lower bound for multi-shot compressive sensing as

\mathbf{E}[\alpha_{m}]\geq 1-\frac{(T-1)\left(1-b(\delta,\epsilon)\right)N}{2}. (70)

As for the upper bound, we can construct an oracle to derive a simple bound. The oracle identifies all the true states for $t>1$; for $t=1$, it behaves the same as our multi-shot compressive sensing algorithm. Hence, its error comes entirely from time period 1. Since the initial states are known, this error comes from the first period's one-shot compressive sensing decoding, i.e., the error of $\bar{\Delta}_{1}^{\epsilon}$. This observation yields that

\mathbf{E}\left[1-\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{N}\right]\leq B(\delta,\epsilon). (71)

This further implies that

\mathbf{E}\left[\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{N}\right]\geq 1-B(\delta,\epsilon). (72)

Therefore, the upper bound could be derived with the help of Theorem 4 as follows:

\mathbf{E}[\alpha_{m}]=1-\frac{\mathbf{E}\|\bar{\boldsymbol{X}}^{\epsilon}-\boldsymbol{X}^{0}\|_{1}}{NT}\leq 1-\frac{\mathbf{E}\|\bar{X}_{1}^{\epsilon}-X_{1}^{0}\|_{1}}{NT}=\mathbf{E}\left[1-\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{NT}\right]\leq 1-\frac{1-B(\delta,\epsilon)}{T}. (73)

Thus, we complete our proof.

-I Proof for Proposition 2

We first define the notion of a good hierarchy. For a hierarchy $\mathscr{H}$ with $S$ items, we rank the items in ascending order: $P_{1},\ldots,P_{S}$. If $\mathscr{H}$ satisfies

\sum\nolimits_{i=1}^{U}P_{i}-2\delta>\sum\nolimits_{i=1}^{U-1}P_{S-i+1},\quad\forall U<S, (74)

we call it a good hierarchy. Notice that this condition is stronger than Assumption 3: for a good hierarchy with $S$ items, if $S\leq U_{t}$, then Assumption 3 immediately holds.

We prove the proposition by induction. The induction basis is clear: the single-item hierarchy is trivially a good hierarchy. The key step is the inductive step, where we want to show that for a good hierarchy with $S$ items, if the $(S+1)^{\mathrm{th}}$ appliance satisfies

\sum\nolimits_{i=1}^{\lfloor\frac{S}{2}\rfloor+1}P_{i}-2\delta\geq\sum\nolimits_{i=1}^{\lfloor\frac{S}{2}\rfloor-1}P_{S+1-i}+P_{S+1}, (75)

then we can include it in the set and form a larger good hierarchy. This can be proved by examining the following two cases.

For $U\leq\lfloor\frac{S}{2}\rfloor+1$, since

\sum\nolimits_{i=U+1}^{\lfloor\frac{S}{2}\rfloor+1}P_{i}<\sum\nolimits_{i=U-1}^{\lfloor\frac{S}{2}\rfloor-1}P_{S-i+1}, (76)

together with condition (75), we can obtain that

\sum\nolimits_{i=1}^{U}P_{i}-2\delta>\sum\nolimits_{i=1}^{U-2}P_{S-i+1}+P_{S+1}. (77)

For $\lfloor\frac{S}{2}\rfloor+1<U\leq S$, we can observe the following two facts:

\sum\nolimits_{i=S-U+3}^{U}P_{i}=\sum\nolimits_{i=S-U+1}^{U-2}P_{S-i+1}, (78)

and

S-U+2\leq\lfloor\frac{S}{2}\rfloor+1. (79)

Together, they imply that

\sum\nolimits_{i=1}^{S-U+2}P_{i}-2\delta>\sum\nolimits_{i=1}^{S-U}P_{S-i+1}+P_{S+1}. (80)

Combining the two cases, we conclude the induction. Following the same routine, we can show that condition (75) is also necessary. That is, if the $(S+1)^{\mathrm{th}}$ appliance does not satisfy this condition, it should not be included in the current set (hierarchy), as doing so would make the hierarchy no longer good; we should instead start a new set. Thus, we complete the proof of Proposition 2.
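Proposition 2 suggests a greedy construction: scan the appliances and open a new hierarchy whenever condition (75) fails. The following is a minimal sketch of that construction; the descending scan order, the function name build_hierarchies, and the example powers are all assumptions we introduce for illustration rather than the paper's exact procedure.

import numpy as np

def build_hierarchies(powers, delta):
    """Greedily partition appliance powers into hierarchies, adding a candidate
    to the current hierarchy only if condition (75) holds; here we scan in
    descending power order (an assumed, not stated, ordering)."""
    remaining = sorted(powers, reverse=True)
    hierarchies = [[remaining[0]]]        # a single-item hierarchy is trivially good
    for p in remaining[1:]:
        cur = sorted(hierarchies[-1])     # P_1 <= ... <= P_S of the current hierarchy
        S, half = len(cur), len(cur) // 2
        lhs = sum(cur[:half + 1]) - 2 * delta                      # smallest floor(S/2)+1 powers
        rhs = sum(cur[S - (half - 1):] if half >= 1 else []) + p   # largest floor(S/2)-1 powers + candidate
        if lhs >= rhs:
            hierarchies[-1].append(p)      # condition (75) holds: extend the hierarchy
        else:
            hierarchies.append([p])        # otherwise start a new hierarchy
    return hierarchies

# Hypothetical appliance powers (watts) and sensitivity bound.
print(build_hierarchies([50, 60, 70, 90, 400, 450, 500, 2000], delta=5))
# e.g. [[2000, 500, 450, 400], [90, 70, 60, 50]] under these illustrative powers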

-J Proof for Theorem 6

We first prove the following lemma, which is crucial to our proof.

Lemma 3.

For our one-shot compressive sensing algorithm, denote the lower and upper accuracy bounds by $b(\delta,\epsilon)$ and $B(\delta,\epsilon)$. Besides the Laplace noises, we inject additional disturbances $\xi_{t}$ and $\xi_{t+1}$ into $y_{t}+n_{t}$ and $y_{t+1}+n_{t+1}$, where $y_{t}$ and $y_{t+1}$ are the meter readings and $n_{t}$ is the injected Laplace noise defined in Eq. (16). If the disturbances satisfy $\mathbf{E}|\xi_{t}-\xi_{t+1}|\leq Y$, then the bounds associated with our algorithm under disturbances are $b(\delta+\frac{Y}{2+C(\mathbf{P})},\epsilon)$ and $B(\delta+\frac{Y}{2},\epsilon)$, respectively.

Proof: First we construct the new lower bound. Following the same routine as Eq. (50), we have

|\Delta_{t}^{\epsilon}\mathbf{P}-\Delta_{t}^{0}\mathbf{P}|\leq(C(\mathbf{P})+2)\cdot\delta+|n_{t}-n_{t+1}+\xi_{t}-\xi_{t+1}|\leq(C(\mathbf{P})+2)\cdot\delta+|n_{t}-n_{t+1}|+|\xi_{t}-\xi_{t+1}|. (81)

This implies that

\mathbf{E}[\alpha]\geq\mathbf{E}\left[1-\frac{(C(\mathbf{P})+2)\delta+|\xi_{t}-\xi_{t+1}|+|l_{t}|}{N}\right]\geq\mathbf{E}\left[1-\frac{(C(\mathbf{P})+2)\delta+Y+|l_{t}|}{N}\right]=\mathbf{E}\left[1-\frac{(C(\mathbf{P})+2)\left(\delta+\frac{Y}{2+C(\mathbf{P})}\right)+|l_{t}|}{N}\right], (82)

where $l_{t}$ is the difference between the Laplace noises, i.e., $l_{t}=n_{t}-n_{t+1}$. Clearly, the disturbance plays a similar role to the noise. Revisiting Eq. (56) yields that

K_{t}^{\epsilon}=|y_{t}-y_{t+1}+\xi_{t}-\xi_{t+1}|+|l_{t}|. (83)

Hence, we could establish the new upper bound as follows:

\big||l_{t}|-2\delta-Y\big|\leq\big|\mathbf{E}_{\xi}[|l_{t}|-2\delta-|\xi_{t}-\xi_{t+1}|]\big|\leq\mathbf{E}_{\xi}\big||l_{t}|-2\delta-|\xi_{t}-\xi_{t+1}|\big|\leq\mathbf{E}_{\xi}|(\Delta_{t}^{*}-\Delta_{t}^{0})\mathbf{P}|\leq\mathbf{E}_{\xi}\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}\|\mathbf{P}\|_{2}. (84)

The upper bound can then be derived following the same routine, which finishes our proof of Lemma 3.

The key to applying Lemma 3 to the proof of Theorem 6 is to characterize the disturbance connections between hierarchies. We characterize these connections by induction. The induction basis examines hierarchy 1, where the disturbances come entirely from the other hierarchies' consumption rather than from decoding errors.

Specifically, for all $y_{t}$, $y_{t+1}$, if we denote by $y^{01}_{t}$ hierarchy 1's meter reading at time $t$ and by $y^{out_{1}}_{t}$ the other hierarchies' total meter reading at time $t$, it holds that:

|y_{t}-y_{t+1}|=|y^{01}_{t}-y^{01}_{t+1}+l_{t}+y^{out_{1}}_{t}-y^{out_{1}}_{t+1}|. (85)

With the sparsity $U$, we can further derive that

|y^{out_{1}}_{t}-y^{out_{1}}_{t+1}|\leq|P_{U}^{1}|, (86)

where $P_{U}^{1}$ is defined as the maximum sum of $U$ appliances whose power is smaller than the smallest power in hierarchy 1, denoted by $P_{m}^{1}$. Note that $\xi_{t}=y^{out_{1}}_{t}$, i.e.,

|\xi_{t}-\xi_{t+1}|\leq P_{U}^{1}. (87)

Together with Lemma 3, this establishes our induction basis.

Next, we analyze the bounds for hierarchy $i$. We assume that for each hierarchy $k$, $1\leq k<i$, we have

m_{k}\leq\mathbf{E}[\alpha_{k}]\leq M_{k}, (88)

where $m_{k}$ is the lower bound on the multi-shot compressive sensing accuracy for hierarchy $k$ and $M_{k}$ is the corresponding upper bound, as indicated in Theorem 6.

Thus, denote hierarchy $i$'s meter reading at time $t$ by $y^{0i}_{t}$ and the decoding result by $\tilde{y}^{i}_{t}$. By the definition of $\alpha_{k}$, we can show that for all $k<i$,

\mathbf{E}|\tilde{y}^{k}_{t}-y_{t}^{0k}|=\mathbf{E}|(\bar{\boldsymbol{X}}^{\epsilon}[k]-\boldsymbol{X}^{0}[k])\mathbf{P}_{k}|\leq\mathbf{E}\|\bar{\boldsymbol{X}}^{\epsilon}[k]-\boldsymbol{X}^{0}[k]\|_{2}\|\mathbf{P}_{k}\|_{2}=N_{k}T(1-\mathbf{E}[\alpha_{k}])\|\mathbf{P}_{k}\|_{2}\leq N_{k}T(1-m_{k})\|\mathbf{P}_{k}\|_{2}. (89)

Also, if we denote the actual total meter reading of all hierarchies with index $m>i$ by $y^{out_{i}}_{t}$, it can be derived directly from the sparsity of the switching events that:

|y^{out_{i}}_{t}-y^{out_{i}}_{t+1}|\leq|P_{U}^{i}|. (90)

Thus, the total disturbance for hierarchy $i$ can be calculated as follows:

\xi_{t}=\sum\nolimits_{k=1}^{i-1}(y^{0k}_{t}-\tilde{y}^{k}_{t})+y^{out_{i}}_{t}. (91)

This allows us to construct the bound on $\mathbf{E}|\xi_{t}-\xi_{t+1}|$:

\mathbf{E}|\xi_{t}-\xi_{t+1}|\leq\mathbf{E}\left[\sum\nolimits_{k=1}^{i-1}\left(|y^{0k}_{t}-\tilde{y}^{k}_{t}|+|y^{0k}_{t+1}-\tilde{y}^{k}_{t+1}|\right)+|y^{out_{i}}_{t}-y^{out_{i}}_{t+1}|\right]\leq 2\sum\nolimits_{k=1}^{i-1}N_{k}T(1-m_{k})\|\mathbf{P}_{k}\|_{2}+|P_{U}^{i}|. (92)

Together with Lemma 3, this completes the induction and hence proves Theorem 6.
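To illustrate how the bounds cascade across hierarchies, the following sketch iterates Eq. (92) together with Lemma 3: given a hypothetical per-hierarchy accuracy lower-bound function lower_bound(delta, eps), it propagates the disturbance budget $Y_i$ from hierarchy 1 downward. The bound function, the placeholder value for $C(\mathbf{P})$, and all numbers are illustrative assumptions; the paper's actual $b(\cdot,\cdot)$ comes from Theorem 3.

import numpy as np

def propagate_bounds(hierarchies, delta, eps, T, P_U, lower_bound):
    """hierarchies: list of power lists (one per hierarchy, decoded in order);
    P_U[i]: bound on |y^{out_i}_t - y^{out_i}_{t+1}| for hierarchy i (Eq. (90));
    lower_bound(delta, eps): assumed accuracy lower bound b(delta, eps) of the
    multi-shot algorithm. Returns the per-hierarchy lower bounds m_i."""
    m = []
    for i, powers in enumerate(hierarchies):
        # Eq. (92): disturbance budget from already-decoded hierarchies + unseen ones.
        Y = 2 * sum(len(h) * T * (1 - m_k) * np.linalg.norm(h)
                    for h, m_k in zip(hierarchies[:i], m)) + P_U[i]
        # Lemma 3: a disturbance of size Y inflates the effective delta.
        C_P = 1.0   # placeholder for C(P) of this hierarchy
        m.append(lower_bound(delta + Y / (2 + C_P), eps))
    return m

# Purely illustrative bound function and inputs.
toy_bound = lambda d, e: max(0.0, 1 - 0.05 * d / e)
print(propagate_bounds([[2000, 500, 450], [90, 70, 60]], delta=0.5, eps=1.0,
                       T=24, P_U=[5.0, 0.0], lower_bound=toy_bound))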