\floatsetup

[table]capposition=top

Privacy Preserving in Non-Intrusive Load Monitoring: A Differential Privacy Perspective

Haoxiang Wang, Jiasheng Zhang, Chenbei Lu, and Chenye Wu H. Wang, J. Zhang and C. Lu are with the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. C. Wu is with School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen. C. Wu is the correspondence author. Email: [email protected].

Abstract

Smart meter devices enable a better understanding of the demand at the potential risk of private information leakage. One promising solution to mitigating such risk is to inject noises into the meter data to achieve a certain level of differential privacy. In this paper, we cast one-shot non-intrusive load monitoring (NILM) in the compressive sensing framework, and bridge the gap between theoretical accuracy of NILM inference and differential privacy’s parameters. We then derive the valid theoretical bounds to offer insights on how the differential privacy parameters affect the NILM performance. Moreover, we generalize our conclusions by proposing the hierarchical framework to solve the multi-shot NILM problem. Numerical experiments verify our analytical results and offer better physical insights of differential privacy in various practical scenarios. This also demonstrates the significance of our work for the general privacy preserving mechanism design.

Index Terms:

Differential Privacy, Non-Intrusive Load Monitoring, Compressive Sensing.

I Introduction

The pervasive intelligent devices are gathering huge amounts of data in our daily lives, spanning from our shopping lists to our favorite restaurants, from travel histories to personal social networks. These big data have eased our social lives and have dramatically changed our behaviors. In the electricity sector, the widely deployed smart meters are collecting user’s energy consumption data in near real time. While these data could be rather valuable to achieve a more efficient power system, they raise significant public concern on private information leakage. Specifically, the big data in the electricity sector speed up the advance in non-intrusive load monitoring (NILM), which aims to infer the user’s energy consumption pattern from meter data.

NILM is one of the most effective ways to conduct consumer behavior analysis [1, 2]. More comprehensive behavior analysis could benefit the consumers by providing ambient assisted living [3], real time energy saving [4], etc. It is evident that such analysis is crucial for the active demand side management, which is believed to be able to significantly improve the whole system efficiency [5]. However, the inferred consumption pattern often exposes individual lifestyles. This implies that the leakage of smart meter data may lead to the concern over the leakage of private information, which calls for a comprehensive privacy preservation scheme. The European Union is the pioneer in customer privacy protection: it has legislated an anti-privacy data protection regulation in 2018 [6]. Brazil also has also set up the General Data Protection Law which become effective in February 2020 [7]. Moreover, 11 states in the US have recently enacted privacy, data security, cybersecurity, and data breach notification laws [8].

To achieve the privacy protection, the most commonly adopted technique is differential privacy (DP), first proposed by Dwork et al. in [9]. DP facilitates the mathematical analysis and is also closely related to other privacy metrics such as mutual information [10]. However, despite well investigated, the parameters in DP do not offer intuitive physical insights, which prevents this technique from wide deployment. Hence, it is a delicate task to design a practical privacy preserving mechanism of NILM.

In this work, we submit that DP indeed has physical implications in the electricity sector. We propose to understand DP through NILM and characterize how the parameters in DP affect the performance guarantee of NILM inference. Our work offers the end users a better idea on the different levels of privacy preserving services in gathering meter data.

I-A Related Works

NILM and DP are both well investigated. Since the seminal work by Hart [5], diverse techniques for NILM have been proposed to solve NILM for improved inference performances. Our work focuses on the disaggregation process in NILM. The classical algorithm is Combinatorial Optimization (CO), proposed in Hart’s seminal work. This algorithm combines heuristic methods with prior knowledge on the switching events. Zulfiquar et al. have furthered this research by introducing the aided linear integer programming technique to speed up CO in [11]. Probabilistic methods have also been proposed recently. For example, Kim et al. have applied Factorial Hidden Markov Model (FHMM) to NILM, and have used Viterbi algorithm for decoding [12]. The key challenge in this line of research is to model the appropriate states for the hidden Markov model. Makonin et al. have used machine learning techniques to solve this issue and have proposed the notion of super-state HMM [13]. The advance in deep learning warrants exploiting more temporal features in NILM. For example, Kelly et al. have deployed the long short-time memory (LSTM) framework for NILM in [14].

There is a recent interest in designing the privacy preserving NILM, and our work also falls into this category. However, most literature have focused on discussing thwarting the privacy attacks on the meter data [15]. The major technique is to utilize the storage system to physically inject noises into the meter data, which does offer the end users a certain privacy guarantee. Backes et al. have theoretically examined the way to achieve different levels of privacy using storage injection manipulation in [16]. More recently, a variety of methods have been exploited to achieve privacy preservation.

Chen et al. have utilized the combined heat and privacy system to prevent occupancy detection in [17]. Cao et al. have investigated the practical implementation from a fog computing approach [18]. Rastogi et al. have further proposed a distributed implementation in [19].

To the best of our knowledge, we are the first to exploit the physical meaning of privacy parameters in the electricity sector.

I-B Our Contributions

In establishing the connection between DP parameters and the inference accuracy of NILM, our principal contributions can be summarized as follows:

•

NILM Formulation with DP: We use the compressive sensing framework to formulate NILM inference and incorporate the parameters of DP into the formulation.
•

Theoretical Characterization of NILM Inference: Based on the compressive sensing formulation, we theoretically characterize the asymptotic upper and lower bounds for the one-shot NILM inference accuracy.
•

Hierarchical Multi-shot NILM: We generalize the one-shot NILM solution to more practical multi-shot scenarios, and propose an effective hierarchical algorithm.

We imagine our proposed framework has at least three early adopters.

•

The first adopter could be the consumers themselves. To preserve privacy, the consumers could utilize the local storage devices (provided by electric vehicles, photovoltaic panels, etc.) to inject noises to achieve certain level of differential privacy. Our theoretical understanding provides an inference accuracy bound to decipher the privacy preserving guarantee.
•

The other adopter could be the ISOs or the utility companies. When recording the meter data from the consumers for potential behavior analysis, the ISOs or utility companies could directly inject noises into the recorded meter data. Due to the Law of Larger Numbers, the injected noise will incur minimal impact on auditing (in fact, auditing before injecting the noise could solve this issue) and most classical tasks of utility companies. However, the injected noise could affect behavior analysis (see detailed discussion in Appendix -A). This is another way to decipher the physical meaning of DP parameters.
•

The last adopter could be the third-party privacy preserving entities. They could collect the data from the consumers for different purposes. And for different privacy preserving requirements, the consumers are paid for different compensations. This could enable new business models for the privacy preserving industry, and our theoretical results could give the first cut understanding on the cost benefit analysis for the emerging business models.

Refer to caption — Figure 1: The paradigm of our paper

The rest of the paper is organized as follows. Section II formulates NILM as a sparse optimization problem. Using the compressive sensing framework, we establish the lower and upper bounds of inference accuracy for NILM, and relate the $\epsilon$ -differential privacy to these asymptotic bounds in Section III to better understand the physical implication of DP. Then, in Section IV, we generalize the compressive sensing framework to the multi-shot scenario and introduce our hierarchical algorithmic framework. Numerical studies verify the effectiveness of our theoretical conclusion in Section V. Concluding remarks and several interesting future directions are given in Section VI. We provide all the necessary proofs and justifications in Appendix. We visualize the paradigm of our paper in Fig. 1, which highlights the underlying logic between sections.

II Problem Formulation

This section first introduces the general NILM problem, and then formulates the one-shot NILM in the compressive sensing framework. We conclude by revisiting the generic compressive sensing algorithm to solve the resulting NILM problem.

II-A Non-intrusive Load Monitoring Problem

In essence, the goal of NILM is to infer user behaviors from meter data. Mathematically, for each end user (or a building complex, a campus, etc.), it may own $N$ appliances, denoted by a set $\mathcal{A}=\{A_{1},..,A_{N}\}$ .

For simplicity, we assume the state space for each appliance is binary, i.e., on or off. For appliance $i\in\mathcal{A}$ , when its state is on, its energy consumption at each time $t$ is a random variable $Z_{i}^{t}$ , with mean of $P_{i}$ . Define the vector of mean energy consumption over all appliances $\mathbf{P}$ as follows:

\displaystyle\mathbf{P}=(P_{1},..,P_{N})^{\dagger}\in\mathbf{R}^{N}.

(1)

Hence, at each time $t$ , the energy consumption of the end user is this aggregate energy consumption of its appliances whose states are on. We denote this subset by $S_{t}\subseteq\{1,...,N\}$ , and denote the meter data at time $t$ by $y_{t}$ . Thus, by definition of $y_{t}$ , we know

\displaystyle y_{t}=\sum\nolimits_{i\in S_{t}}Z_{i}^{t},\,\,\,\,\,\,\,\ \forall t=1,...,T,

(2)

where $T$ stands for the length of observation period. If we consider the end user’s hourly operation with a resolution of $6$ seconds, then $T$ is $600$ . The advanced metering technology has made it possible to collect data at much finer resolution, e.g., in a sub-second scale [20]. Such data could allow us to exploit more unique energy consumption patterns for each appliance. However, in this paper, we choose to focus on a stylized model to better clarify the connection between DP and NILM, without specifying the resolution.

The classical NILM seeks to infer the on/off switch events of all appliances. We denote $X_{t}\in\{0,1\}^{N}$ the on/off indicator of the $N$ appliances at time $t$ (with zero indicating off and one indicating on). NILM aims at inferring $X_{t}$ from $y_{t}$ .

A variety of NILM algorithms have been proposed as we have reviewed in Section I. In this work, we choose the compressive sensing framework for mathematically exploiting the sparsity structure in the event vector $X_{t}$ . However, conducting NILM inference can be delicate as the energy consumptions in $\mathcal{A}$ can be quite diverse, and some ‘elephant’ appliances (that consume a huge amount of power) may dominate the period of interest. In this case, it is almost impossible to accurately distinguish the on/off states of small appliances. In the following formulation, we devise a sufficient condition to enable accurate NILM inference, based on which, we propose the hierarchical framework for more practical situations in the subsequence analysis.

II-B One-shot NILM Inference

To better characterize the relationship between differential privacy and NILM, we focus on the inference for a particular time $t$ and assume the knowledge of $X_{t-1}$ . We term this problem the one-shot NILM inference.

To exploit the sparse structure, the compressive sensing formulation requires the following transforms:

\displaystyle K_{t}=|y_{t}-y_{t-1}|\in\mathbf{R},

(3)

\displaystyle\Delta_{t}=|X_{t}-X_{t-1}|\in\mathbf{R}^{N}.

(4)

With these transforms, we can mathematically characterize the sparsity assumption.

Assumption 1.

(Sparsity Assumption) The switch events are sparse across time. That is, the number of switch events in $\Delta_{t}$ (i.e., $\|\Delta_{t}\|_{0}$ ) is bounded by $U_{t}$ , which satisfies

\displaystyle\|\Delta_{t}\|_{0}\leq U_{t}\ll{N}.

(5)

Remark: This assumption holds for most publicly available datasets with the resolution on the second scale. In Appendix -B, we characterize the sparsity of three most widely adopted public datasets and justify how the failure to meet this assumption may affect the performance of our proposed framework.

This sparsity assumption motivates us to formulate the following optimization problem for the one-shot NILM inference at each time $t$ :

$\displaystyle(\text{P1})\,\,\ \min$	$\displaystyle\\|\Delta_{t}\\|_{0}$	(6)
$\displaystyle s.t.$	$\displaystyle\\|\Delta_{t}\mathbf{P}-K_{t}\\|_{2}<\delta,$
	$\displaystyle\Delta_{t}\in\{0,1\}^{N},$

where $\delta$ is the parameter to characterize the sensitivity of $y_{t}$ .

The notion of sensitivity is defined for random variables:

Definition 1.

We define $\Delta f$ to be the sensitivity of a sequence of bounded random variable $y_{t}$ :

\Delta f=\max_{1\leq t\leq T}\left|\bar{y}_{t}-\underline{y}_{t}\right|=\frac{\delta}{2},

(7)

where $\underline{y}_{t}$ and $\bar{y}_{t}$ are the lower and upper bounds of $y_{t}$ .

Back to our problem (P1), the sensitivity constraints require the meter data are bounded, which yields our second assumption:

Assumption 2.

The meter data¹¹1We only consider the pure load meter data in this work. To handle the net metering cases, we need to first resort to load disaggregation techniques [21, 22, 23], and then apply our framework to the disaggregation pure load data. $y_{t}$ is bounded, i.e.,

\displaystyle y_{t}\in\left[\underline{y}_{t},\bar{y}_{t}\right],1\leq t\leq T.

(8)

This assumption is straightforward to justify: it simply suggests that when the appliances are running, their energy consumption levels are within the specific range. And such range corresponds to the sensitivity parameter $\delta$ .

Remark: This assumption also partially handles the measurement error. Such error could be due to environmental issues, operational issues, or other issues. Practically, such error is not a major concern. The reason is two-folded. Firstly, as indicated in [24], the magnitude of such error is required to be bounded by $5\%$ of the total power. Secondly, if the data quality is too poor to infer any useful information, there won’t be any privacy preserving requirement at all.

Note that, (P1) is similar to the classical problem in the compressive sensing framework [25] but not exactly the same. We denote the optimal solution to (P1) by $\Delta_{t}^{0}$ and regard it as the ground truth. Deciphering this ground truth is quite challenging: the binary constraints in addition to the $l_{0}$ -norm objective function make the problem intractable.

To tackle these challenges, we propose to first relax the binary constraints and then use the standard compressive sensing technique to handle the non-convexity induced by the $l_{0}$ -norm objective function.

Specifically, we make the following technique alignment assumption to enable exact relaxation:

Assumption 3.

(Power Concentration Assumption) We assume the mean energy consumptions of all the $N$ appliances are on the same order. Mathematically, denote $\{P_{1}^{\epsilon},...,P_{N}^{\epsilon}\}$ the ascendingly ordered sequence of $P$ , we assume the following condition holds for all $U<U_{t}$ :

\displaystyle\sum\nolimits_{k=1}^{U}P_{k}^{\epsilon}-\sum\nolimits_{k=1}^{U-1}P_{N+1-k}^{\epsilon}>2\delta.

(9)

Remark: Assumption 3 guarantees the effectiveness of the compressive sensing framework. This condition tries to eliminate the task to distinguish the lightning load from the refrigerator or the heating load, which allows us to focus on the inference of loads of similar energy consumption levels. For cases with diverse energy consumptions, we defer the detailed discussion to Section IV-B. The general idea is to design a hierarchical NILM, where in each hierarchy, the NILM inference satisfies this technical alignment assumption.

This assumption allows us to relax the binary constraints without changing the structure of the optimal solution set:

$\displaystyle(\text{P2})\,\,\ \min$	$\displaystyle\\|\Delta_{t}\\|_{0}$	(10)
$\displaystyle s.t.$	$\displaystyle\\|\Delta_{t}\mathbf{P}-K_{t}\\|_{2}<\delta,$
	$\displaystyle\Delta_{t}\in[0,1]^{N}.$

We formally state the equivalence between (P1) and (P2) in the following Lemma.

Lemma 1.

If Assumptions 1 to 3 hold, problems (P1) and (P2) are equivalent in terms of the optimal solution set. Specifically, if we denote the optimal solution to (P1) by $\Delta_{t}^{0}$ and the optimal solution to (P2) by $\Delta_{t}$ , the equivalence is indicated by the following equation:

\|\Delta_{t}^{0}\|_{0}=\|\Delta_{t}\|_{0}.

(11)

We show detailed proof in Appendix -C. This lemma allows us to focus on solving a more tractable problem, (P2). This is due to the key theoretical basis of compressive sensing: we could effectively use $l_{1}$ -norm to approximate $l_{0}$ -norm in (P2) [26], which yields (P3),

$\displaystyle(\text{P3})\,\,\ \min$	$\displaystyle\\|\Delta_{t}\\|_{1}$	(12)
$\displaystyle s.t.$	$\displaystyle\\|\Delta_{t}\mathbf{P}-K_{t}\\|_{2}<\delta,$
	$\displaystyle\Delta_{t}\in[0,1]^{N}.$

Denote its optimal solution by $\Delta_{t}^{*}$ . To map $\Delta_{t}^{*}$ onto $\{0,1\}^{N}$ , we conduct rounding. Denote the solution after rounding by $\bar{\Delta}_{t}$ . The rounding process is as follows: we first set all elements in $\bar{\Delta}_{t}$ to be $0$ ; and then, for each non-zero element $\Delta_{t}^{*}(j)$ in $\Delta_{t}^{*}$ , with probability $\Delta_{t}^{*}(j)$ , we set the corresponding element $\bar{\Delta}_{t}(j)$ to be $1$ . To characterize the inaccuracy induced by the approximation of $l_{0}$ -norm, we compare $\mathbf{E}[\bar{\Delta}_{t}]$ and the ground truth, $\Delta_{t}^{0}$ . The comparison yields the following theorem:

Theorem 1.

On expectation, the difference between $\bar{\Delta}_{t}$ and $\Delta_{t}^{0}$ is bounded. Specifically,

\|\mathbf{E}[\bar{\Delta}_{t}]-\Delta_{t}^{0}\|_{2}\leq C(\mathbf{P})\cdot\delta,

(13)

where $C(\mathbf{P})$ is a constant uniquely determined by vector $\mathbf{P}$ .

We characterize the specific form of $C(\mathbf{P})$ in Appendix -E. To prove Theorem 1, we mainly make use of the important conclusion in compressive sensing proposed by Candes et al. in [26].

III Encode DP into NILM

Based on the classical compressive sensing methods to solve the one-shot NILM inference, we establish the connection between DP and NILM for exploiting the physical meaning of different DP parameters in the context of NILM.

III-A Revisit $\epsilon$ -Differential Privacy

We adopt the notion of $\epsilon$ -differential privacy [9] to connect NILM with DP. It uses the parameter $\epsilon$ to denote the probability that we are unable to differentiate the two datasets that have only one piece of data difference.

To put it more formally, we define a mapping $\mathscr{B}(D)$ from a dataset $D$ to $\mathbb{R}$ . This mapping enables a query function $q:D\to\mathbb{R}$ . To measure the difference between two datasets $D$ and $D^{\prime}$ , we adopt the distance metric $d(D,D^{\prime})$ , measuring the minimal number of sample changes that are required to make $D$ identical to $D^{\prime}$ . When $d(D,D^{\prime})=1$ , we say $D$ and $D^{\prime}$ are neighbor datasets. This allows us to characterize the notion of $\epsilon$ -DP: if for all neighbor datasets $D_{1}$ and $D_{2}$ , and for all measurable subsets $Y\subset\mathbb{R}$ , the mapping $\mathscr{B}$ satisfies,

\frac{Pr(\mathscr{B}(D_{1})\in Y)}{Pr(\mathscr{B}(D_{2})\in Y)}\leq e^{\epsilon},

(14)

we say the mechanism $\mathscr{B}$ achieves $\epsilon$ -DP [9].

To achieve $\epsilon$ -DP, a simple mechanism is Laplace noise injection [9]. We show the detailed noise injection mechanism in the following theorem.

Theorem 2.

(Theorem 1 in [9]) We say a mechanism $\mathscr{B}$ achieves $\epsilon$ -DP, if $\mathscr{B}(D)$ satisfies

\mathscr{B}(D)=q(D)+n,

(15)

and $n$ is Laplace noise with probability density function $p(s)$ :

p(s)=\frac{1}{2\lambda}e^{-\frac{|s|}{\lambda}},

(16)

in which $\lambda=\frac{\Delta f(D)}{\epsilon}$ combines the privacy parameter $\epsilon$ that describes the level of the privacy and the sensitivity $\Delta f(D)$ satisfying $\Delta f(D)\geq\max\|q(D_{1})-q(D_{2})\|$ for all neighbor datasets $D_{1}$ and $D_{2}$ .

Remark: In our setting, the dataset $D$ can be viewed as the appliance states. The query function $q$ could be viewed as the meter data derived from the appliance states and appliances’ mean energy consumption. The sensitivity $\Delta f$ is the meter data sensitivity defined in (7) for $y_{t}$ ; and $\epsilon$ describes user’s required level of the privacy.

Thus, the Laplace noise injection mechanism helps us establish the connection between $\epsilon$ -DP and NILM inference.

Next, we construct a performance bound related to the privacy parameter $\epsilon$ and show how the level of privacy preservation connects to the performance bound of the NILM inference.

III-B Asymptotic Bounds for NILM with DP

Recall that $\Delta f$ is the sensitivity for meter data $y_{t}$ . After injecting the noise, instead of inferring $X_{t}$ using $y_{t}$ , now we can only rely on the noisy data $y_{t}+n_{t}$ , which yields optimization problem (P4).

$\displaystyle(\text{P4})\,\,\ \min$	$\displaystyle\\|\Delta_{t}\\|_{1}$	(17)
$\displaystyle s.t.$	$\displaystyle K_{t}^{\epsilon}=\|y_{t}+n_{t}-y_{t-1}-n_{t-1}\|,$
	$\displaystyle\\|\Delta_{t}P-K_{t}^{\epsilon}\\|_{2}<\delta,$
	$\displaystyle\Delta_{t}\in[0,1]^{N}.$

Denote the solution to (P4) by $\Delta_{t}^{\epsilon}$ , and the solution after rounding by $\bar{\Delta}_{t}^{\epsilon}$ . Then, it is straightforward to apply Theorem 1 to (P4), and obtain the following proposition.

Proposition 1.

The difference between $\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]$ and the ground truth $\Delta_{t}^{0}$ is bounded. More precisely,

\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}\leq(2+C(\mathbf{P}))\cdot\delta+|n_{t}-n_{t-1}|.

(18)

Note that again $C(\mathbf{P})$ is a function of matrix $\mathbf{P}$ , defined in Theorem 1.

This proposition not only dictates that the error’s upper bound increases almost linearly with the magnitude of noises, but also allows us to connect the DP parameters with the NILM inference accuracy.

Mathematically, we define inference accuracy $\alpha$ as follows:

\alpha=1-\frac{\mathbf{E}\|\bar{\Delta_{t}^{\epsilon}}-\Delta_{t}^{0}\|_{1}}{N}.

(19)

To evaluate $\mathbf{E}[\alpha]$ over all possible injected noises $n_{t}$ and $n_{t-1}$ , we denote

\displaystyle b=2\left(N-(2+C(\mathbf{P}))\delta\right).

(20)

We can now establish the lower bound for NILM inference accuracy as Theorem 3 suggests.

Theorem 3.

The expectation of the inference accuracy $\alpha$ can be lower bounded. Specifically,

\displaystyle\mathbf{E}[\alpha]\geq 1-\frac{4C(\mathbf{P})\delta\epsilon+8\delta\epsilon+3\delta}{4\epsilon N}+\frac{A_{1}\epsilon+B_{1}}{4\epsilon N}e^{-\frac{2\epsilon b}{\delta}},

(21)

where

	$\displaystyle A_{1}=2N-4\delta-2C(\mathbf{P})\delta,$		(22)
	$\displaystyle B_{1}=3\delta.$		(23)

Note that a smaller $\epsilon$ implies a higher level of differential privacy. In proving the lower bound of inference accuracy, the key is to identify that for $\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}$ in Proposition 1, there is a trivial upper bound, $N$ . Hence, we can refine Proposition 1 to construct the lower bound for $\mathbf{E}[\alpha]$ .

As indicated by Theorem 3, the derived lower bound decreases at the rate of $\epsilon^{-1}$ when $\delta$ is small enough (i.e. the impact of the exponential term is neglectable).

To construct the upper bound, we need to exploit the structure of problem (P4). Specifically, standard mathematical manipulations of constraints in (P4) yield the following upper bound characterization:

Theorem 4.

Denote $m=N\|\mathbf{P}\|_{2}+2\delta$ for notational simplification. The expected inference accuracy $\alpha$ can be upper bounded:

\displaystyle\mathbf{E}[\alpha]\leq 1+\frac{A_{2}\epsilon^{2}+B_{2}\epsilon+C_{2}}{8\delta\epsilon N\|\mathbf{P}\|_{2}}e^{-\frac{2\epsilon m}{\delta}}-\frac{16\delta\epsilon^{2}+4\delta\epsilon+3\delta}{16\epsilon N\|\mathbf{P}\|_{2}}e^{-\epsilon},

where

	$\displaystyle A_{2}=4m^{2}-8\delta m-4mN\\|\mathbf{P}\\|_{2},$		(24)
	$\displaystyle B_{2}=6\delta m-8\delta^{2}-4\delta N\\|\mathbf{P}\\|_{2},$		(25)
	$\displaystyle C_{2}=3\delta^{2}.$		(26)

Theorem 4 implies that the upper bound decreases at the rate of $o({\epsilon}^{-1}e^{-\epsilon})$ . That is, the upper bound would also decrease when $\delta$ becomes larger. Hence, there is a certain gap between the lower bound and the upper bound. We want to emphasize that although the proof (in Appendix -G) is based on the compressive sensing framework, it can be generalized to many other NILM algorithms for the upper bound construction.

IV Hierarchical Multi-shot NILM

In this section, we analyze the multi-shot NILM and propose our hierarchical decomposition algorithm. Specifically, we first introduce the multi-shot NILM formulation, identify its challenges and conduct the classical treatment. Then, we explain how to relax Assumption 3 with hierarchical decomposition. We conclude this section by analyzing the accuracy bounds for the corresponding multi-shot NILM inference.

IV-A Multi-shot NILM Inference

The special difficulty in multi-shot NILM inference, compared with one-shot NILM, is the temporal coupling between inferences. This issue, if not well addressed, could lead to large cumulative errors, and ultimately lead to the cascading inference failure²²2See the cascading effect over network in [27] for more detailed discussion.. Therefore, in this part, after the problem formulation, we intend to design an effective algorithm to contain this error.

The multi-shot NILM inference problem is a straightforward generalization of the one-shot NILM inference. It seeks to infer the appliance switch events over a period of $T$ from the meter data sequence $\boldsymbol{Y}=[y_{1},..y_{T}]$ . The inference also relies on the vector of mean energy consumption $\mathbf{P}$ , defined in Eq. (1). We denote $\boldsymbol{X}=[X_{1},...,X_{T}]$ the state probability matrix, where each $X_{t}$ captures the probability of appliances states at time $t$ . To establish the asymptotic bounds for our multi-shot NILM inference accuracy, we denote $\boldsymbol{X}^{0}=[X^{0}_{1},...,X^{0}_{T}]$ the ground truth of switching events over $T$ , and $\bar{\boldsymbol{X}^{\epsilon}}=[\bar{X_{1}^{\epsilon}},..,\bar{X_{T}^{\epsilon}}]$ the inference results returned by our proposed algorithm.

It is worth noting that the situation becomes even more complicated if we conduct rounding at each time slot. We submit that it is valuable to keep the inference results obtained in solving problem (P3) without rounding as these results can be interpreted as the probability of switching. This allows us to derive the state probability matrix $\boldsymbol{X}=[X_{1},...,X_{T}]$ . Note that, after obtaining the matrix $\boldsymbol{X}$ , we can no longer conduct the rounding naively: there is more information about the actual meter readings. We could utilize the simple error correcting methods to decipher the embedded information.

Specifically, the error correcting aims to make the binary adjustments to reduce the approximation error according to the actual meter data. Though it could be possible for us to consider more complex methods for error correcting rather than direct adjustment, they make little difference in terms of the worst case bound analysis.

We describe the classical treatment to multi-shot NILM inference in Algorithm 1, where we adopt the notion of the Hadamard Product, which is defined for two matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ as follows:

Definition 2.

For two matrices $A$ and $B$ of the same dimension $m\times n$ , the Hadamard product $A\bigodot B$ is the $m\times n$ matrix $C$ with elements given by

\displaystyle c_{ij}=a_{ij}b_{ij},\ \forall 1\leq i\leq m,\ \forall 1\leq j\leq n.

(27)

There is a simple way to construct rough inference accuracy bounds directly from Theorem 3 and Theorem 4. The trick is to examine the rounding procedure in Algorithm 1 and to connect it with the bounds for one-shot NILM inference. Formally, we define the accuracy of the classical algorithm for multi-shot NILM inference by $\alpha_{m}$ :

{\alpha}_{m}=1-\frac{\|\bar{\boldsymbol{X}^{\epsilon}}-\boldsymbol{X}^{0}\|_{1}}{NT}.

(28)

Denote the lower bound and upper bound for the one-shot NILM inference accuracy by $b(\delta,\epsilon)$ and $B(\delta,\epsilon)$ , respectively (see Theorem 3 and 4). Recall that $\epsilon$ is the DP’s parameters (see Theorem 2) and $\delta$ describes the sensitivity of the meter data for each period $y_{t}$ (see Eq. (7)).

With these definitions, we can characterize the lower and upper bounds for $\mathbf{E}[\alpha_{m}]$ :

Theorem 5.

The expected inference accuracy $\mathbf{E}[\alpha_{m}]$ for the multi-shot NILM problem can be lower and upper bounded. Specifically, we have

1-\frac{(T-1)(1-b(\delta,\epsilon)N)}{2}\leq\mathbf{E}[\alpha_{m}]\leq 1-\frac{1-B(\delta,\epsilon)}{T},

(29)

where $b(\delta,\epsilon)$ and $B(\delta,\epsilon)$ are the lower bound and upper bound for the one-shot NILM inference accuracy.

0: The initial state

X_{0}

;

The meter reading sequence

\boldsymbol{Y}=y_{0},y_{1},...,y_{t}

;

The appliances’ number

N

;

The appliances’ mean power vector

\mathbf{P}

;

0: The appliance state sequences

\bar{X^{\epsilon}_{1}},...,\bar{X^{\epsilon}_{T}}

;

1: Get the differences between time points of the meter readings as

k_{1},...,k_{t}

K_{t}=y_{t}-y_{t-1},t=1,...,T;

2: Solve the optimization problem (12) for each

t

and get the approximation solution

\Delta_{1},...,\Delta_{T}

;

3: for time

t

from

1

T

X_{t}=X_{t-1}\bigodot(1-\Delta_{t-1})+(1-X_{t-1})\bigodot\Delta_{t-1}

;

5: end for

6: for time

t

from

1

T

7: Do rounding on

X_{t}

and derive the vector

\bar{X_{t}^{\epsilon}}

8: if

\bar{X_{t}^{\epsilon}}\mathbf{P}>y_{t}

then

k=0

;

10: while

\bar{X_{t}^{\epsilon}}\mathbf{P}>y_{t}

and

k<N

11: find

k^{th}

largest appliance

j

and set

\bar{X_{t}^{\epsilon j}}=0

;

12:

k=k+1

;

13: end while

14: else if

\bar{X_{t}^{\epsilon}}\mathbf{P}<y_{t}

then

15:

k=0

;

16: while

\bar{X_{t}^{\epsilon}}\mathbf{P}<y_{t}

and

k<N

17: find

k^{th}

smallest appliance

j

and set

\bar{X_{t}^{\epsilon j}}=1

;

18:

k=k+1

;

19: end while

20: end if

21: end for

22: return

\bar{X^{\epsilon}_{1}},...,\bar{X^{\epsilon}_{T}}

Algorithm 1 Multi-shot Inference

IV-B Hierarchical Decomposition

Albeit that we have deployed the error correction method, the two bounds in Theorem $5$ are still rather loose. For example, as $T$ increases, the lower bound could quickly approach $0$ while the upper bound would quickly approach $1$ . This is because we still need to take into account the cascading effects in the union bound though we have made some efforts to contain the error. Another reason is due to Assumption 3. We may relax this assumption for improved bounds.

The idea is simple and straightforward. Intuitively, to relax Assumption 3, we need to conduct a hierarchical decomposition for different proper appliance sets. To obtain the proper sets, we first sort all the appliances in $\mathcal{A}$ according to their mean energy consumption levels. Then, starting with the least energy-consuming appliance, we could form a number of proper sets that satisfy Assumption 3.

To construct the sets, we use the greedy policy. For a certain set with $S$ appliances, the condition that makes the $(S+1)^{th}$ appliance join the current set instead of creating a new set is as follows:

\displaystyle\sum\nolimits_{i=1}^{\lfloor\frac{S}{2}\rfloor+1}P_{i}-2\delta\geq\sum\nolimits_{i=1}^{\lfloor\frac{S}{2}\rfloor-1}P_{S+1-i}+P_{S+1},

(30)

where the subscript $i$ in $P_{i}$ also illustrates its ascending order in the set.

Proposition 2.

The set generation criteria (30) guarantees the validity of Assumption 3 for each proper generated set.

We call each set a hierarchy, which establishes the basis for our hierarchical multi-shot NILM inference. We apply Algorithm 1 to each hierarchy in the descending order. We characterize the whole procedure in Algorithm 2.

We submit that the hierarchical decomposition helps refine asymptotic bounds for $\mathbf{E}[\alpha_{m}]$ in that it characterizes the connection between hierarchies. This allows us to derive a recursive formulation to estimate the overall inference accuracy. Define inference accuracy for hierarchy $i$ as $\alpha_{i}$ :

\alpha_{i}=1-\frac{\|\bar{\boldsymbol{X}^{\epsilon}[i]}-\boldsymbol{X}[i]^{0}\|_{1}}{N_{i}T},

(31)

where $N_{i}$ denotes the number of appliances in hierarchy $i$ ; $\boldsymbol{X}^{\epsilon}[i]$ and $\boldsymbol{X}[i]^{0}$ denote the inferred result and the ground truth of the appliances in hierarchy $i$ , respectively. We also define the mean power matrix for hierarchy $i$ as $\mathbf{P}_{i}$ and the minimum power in hierarchy $i$ as $P^{i}_{m}$ . For simplicity, we also define the maximal summation of $U$ appliances whose power is smaller than $P^{i}_{m}$ as $P^{i}_{U}$ , in which $U$ is the sparsity of the switching events as we defined in Assumption 1.

Now we can formally introduce our bounds characterization theorem for hierarchical multi-shot NILM inference, with the help of Theorem 5. For simplicity, define

		$\displaystyle b_{m}(\delta,\epsilon)=1-\frac{(T-1)(1-b(\delta,\epsilon)N)}{2},$		(32)
		$\displaystyle B_{M}(\delta,\epsilon)=1-\frac{1-B(\delta,\epsilon)}{T}.$		(33)

Then, we can show that

Theorem 6.

The expected inference accuracy for each hierarchy $i$ , $\mathbf{E}[\alpha_{i}]$ , can be lower bounded by $m_{i}$ and upper bounded by $M_{i}$ . Specifically,

	$\displaystyle m_{i}=b_{m}\left(\delta+\frac{2}{2+C(\mathbf{P})}\delta_{i}^{\prime},\epsilon\right),$		(34)
	$\displaystyle M_{i}=B_{M}(\delta+\delta_{i}^{\prime},\epsilon),$		(34)

where

\delta_{i}^{\prime}=\frac{P_{U}^{i}}{2}+\sum_{k=1}^{i-1}N_{k}T(1-m_{k})\|\mathbf{P}_{k}\|_{2}.

(35)

Note that, by definition of summation, $\delta_{1}^{\prime}=\frac{P_{U}^{1}}{2}$ .

0: The initial state

X_{0}

;

The meter reading sequence

y_{0},y_{1},...,y_{T}

;

The number of appliances

N

;

The appliances power vector

\mathbf{P}

;

0: The appliances states sequences

\bar{\boldsymbol{X^{\epsilon}}}=\bar{X^{\epsilon}_{1}},...,\bar{X^{\epsilon}_{T}}

;

1: Decomposition:

2: Rank the power vector

\mathbf{P}

ascendingly:

P_{1},...,P_{N}

;

k=1

C=\varnothing

S=\varnothing

;

4: for

i

from 1 to

N

5: if

|C|=0

1

then

C\leftarrow{C\cup\{P_{i}\}}

;

7: else

8: For the items in

C

, rank them ascendingly:

c_{1},...,c_{|C|}

;

9: if

\sum_{j=1}^{\lfloor\frac{|C|}{2}\rfloor+1}c_{j}-2\delta\geq\sum_{j=1}^{\lfloor\frac{|C|}{2}\rfloor-1}c_{|C|+1-j}+P_{i}

then

10:

C\leftarrow{C\cup\{P_{i}\}}

;

11: else

12:

S\leftarrow{S\cup\{C\}}

;

13:

C={P_{i}}

;

14: end if

15: end if

16: end for

17: Decoding:

18: Rank

S

descendingly with respect to the largest element in each set;

19: for

S_{i}

S

20: Construct the appliances in

S_{i}

’s power vector

P_{S_{i}}

;

21:

X_{0}[S_{i}]

to denote the initial states of the appliances in the set

S

;

22: Conduct Multi-shot Compressive Sensing with

X_{0}[S_{i}]

\boldsymbol{Y}

|S_{i}|

P_{S_{i}}

and derive

\bar{\boldsymbol{X}^{\epsilon}}[S_{i}]

23:

\boldsymbol{Y}=\boldsymbol{Y}-\bar{\boldsymbol{X}^{\epsilon}}[S_{i}]P_{S_{i}}

;

24: end for

25: return

\bar{\boldsymbol{X}^{\epsilon}}=\bar{X_{1}^{\epsilon}},...,\bar{X_{T}^{\epsilon}}

Algorithm 2 Hierarchical Multi-shot Inference

Specifically, suppose we have $K$ hierarchies in the problem, the overall accuracy $\alpha_{H}$ can be recursively obtained as follows:

\mathbf{E}[\alpha_{H}]=\frac{\sum_{i=1}^{K}\mathbf{E}[\alpha_{i}]N_{i}}{\sum_{i=1}^{K}N_{i}}.

(36)

The associated bounds for our multi-shot compressive sensing algorithm with hierarchical decomposition could also be derived straightforwardly with the help of Theorem 6.

Remark: It is possible to generalize our framework to the non-binary multiple discrete states case. The key modification is to characterize the state of each appliance with a vector of random variables, instead of a single random variable. However, in this paper, we choose to focus on a stylized model, which only considers the binary state. This stylized model, by simplifying the mathematical details, helps to sharpen our understanding of the physical insights into the DP parameters in the electricity sector.

V Numerical Studies

In this section, we design a sequence of numerical studies to highlight the effectiveness of our proposed bound characterization. The effectiveness lies in the consistent magnitude with the empirical trends of the inference accuracy (in DP parameter $\epsilon$ ).

We conclude this section by illustrating that our derived asymptotic bounds are not only effective to the compressing sensing framework, but also effective to characterize the performance of many other NILM algorithms.

V-A Overview of the Datasets

In this paper, we use three publicly available datasets.

•

The UK-DALE dataset [28] records the electricity consumption from five households in UK, with a sampling rate of 6 Hz for each individual appliances. For performance evaluations, we choose the data segment from 2013/05/26 21:03:14 to 2013/05/27 0:54:50.
•

The TEALD dataset [20] is collected by Rainforest EMU2 in Canada. With a sampling rate of 1 Hz, we use the data segment from 2016/02/09 16:00:00 to 2016/02/10 15:59:59 for performance evaluation.
•

The Redd dataset [29] is also commonly used for NILM performance evaluation. We use the data from 2011/04/18 21:22:13 to 2011/05/25 03:56:34 with a sampling rate of 3 HZ.

V-B One-shot NILM Inference

We use the dataset of building 4 in UK-DALE dataset [28] to conduct the first numerical analysis. In this building, there are 8 appliances.

Using the one-shot NILM inference based on compressive sensing, we evaluate the inference performance with increasing level of DP. Fig. 2 plots the theoretical lower bound, the actual performance traces, and the theoretical upper bound to evaluate our theorems. The lower bound is not tight mainly due to the sensitivity and parameter $C(\mathbf{P})$ . The sensitivity is not tight in that it is time-varying. In some time slots, some appliances are simply idle. However, they all contribute to the sensitivity. The parameter $C(\boldsymbol{P})$ is not tight due to exactly the same reason. The meter data suggests $\delta$ be $2$ , and $C(\boldsymbol{P})$ should be $0.015$ . Nevertheless, our lower bound is tight in terms of magnitude. The mean of the actual inference accuracy indeed decreases at the rate of $\epsilon^{-1}$ .

On the other hand, our proposed upper bound seems tight for this case study. We want to emphasize that this upper bound may not always be tight in that we assume the existence of an oracle in the upper bound construction. Hence, this upper bound would be more ideal to serve as a benchmark for all possible NILM algorithms. Hence, from Fig. 2, we can conclude that compressive sensing is a good approach for one-shot NILM inference.

V-C Multi-shot NILM Inference

We then use the UK-DALE dataset to validate our Algorithm 1 which relies on Assumption 3. The inference accuracy is shown in Fig. 3. In fact, our proposed approach performs remarkably good, and the inference accuracy trend shows a similar decreasing rate as Theorem 6 dictates. We want to emphasize that, all of our figures are plotted on a half log scale, which is ideal to verify the decreasing rate of $\epsilon^{-1}$ .

We use the TEALD dataset to further demonstrate the effectiveness of our hierarchical decomposition. There are 13 appliances in the dataset, with diverse energy consumptions. We group them into four hierarchies and conduct the NILM inference. Fig. 4 plots its inference accuracy trend in the DP parameter, which again highlights the remarkable performance of our proposed algorithm, and verifies the effectiveness of our derived bounds. During the simulation, we observe that our hierarchical algorithm seems to be quite robust against noises, which may lead to interesting theoretical results, though a detailed discussion is beyond the scope of our work.

V-D Robustness of Noise Injection Methods

We examine the robustness of our conclusion through numerical studies. Specifically, to understand if our conclusion only holds for Laplace noise injection. We consider other noise injection methods, which also achieves $\epsilon$ -DP. Here, we employ the staircase mechanism, described in Algorithm 3, which achieves $\epsilon$ -DP [30].

\epsilon

\Delta f

;

0: the staircase noise

n

;

\gamma\leftarrow 1/(1+e^{\epsilon/2})

;

2: Generate a discrete random variable

S

\rm{Pr}(S=1)=\rm{Pr}(S=-1)=\frac{1}{2}

;

3: Generate a geometric random variable

G

with

\rm{Pr}(G=i)=(1-b)b^{i}

for integer

i\geq 0

where

b=e^{-\epsilon}

;

4: Generate a random variable

U

from a uniform distribution in

[0,1]

;

5: Generate a binary random variable

B

with

\rm{Pr}(B=0)=\frac{\gamma}{\gamma+(1-\gamma)b}

and

\rm{Pr}(B=1)=\frac{(1-\gamma)b}{\gamma+(1-\gamma)b}

;

n\!\leftarrow\!S((1\!-\!B)((G\!+\!\gamma U)\Delta f)\!+\!B((G\!+\!\gamma\!+\!(1\!-\!\gamma)U)\Delta f))

;

7: return

n

Algorithm 3 Generation of Staircase noise [30]

We follow the same routine in Section V-C to conduct the numerical study and compare the results in Fig. 5. It is clear that different noise injection methods lead to similar relationships between the DP parameters and the inference accuracy. This further illustrates the robustness of our conclusion against noise injection methods.

V-E Beyond Compressive Sensing

We can now safely conclude that, in the compressive sensing framework, the inference accuracy heavily relies on the DP parameter $\epsilon$ . Specifically, we observe the inference accuracy decreases linearly with $\ln{(\frac{1}{\epsilon})}$ as our figure shows when $\ln{(\frac{1}{\epsilon})}$ is large. We now try to generalize this conclusion to other common frameworks in NILM literature. Specifically, we conduct numerical studies to empirically compare how DP’s parameter affects the accuracy of common NILM algorithms, including sparse Viterbi algorithm for super-state HMM (Sparse-HMM), aided linear Integer Programming (ALIP), linear Integer Programming (IP), Recurrent Neural Network (RNN), CO, FHMM, and our proposed algorithm (Multi-shotCS).

We conduct numerical comparison on Building 1 in the Redd dataset. For supervised learning models, we divide the dataset for training (before 2011-04-03) and for validation (after 2011-04-03). Then, we inject noises to the validation set for performance evaluation. For unsupervised model (including our approach), we directly inject noises to the meter data.

We explain the details about the hyperparameter settings as follows. In fact, most of the settings follow the seminal works.

•

When implementing the SparseHMM algorithm, we follow the hyperparameters in the seminal Redd dataset numerical study in [13]. Specifically, the number of the maximal super state is set to be $4$ ; and the state number choice parameter $\epsilon$ is set to be $0.00021$ .
•

As for ALIP and IP algorithms, we follow all the parameters in [11], including states, specific ratings and the state transient values. We choose the zero initialization for these two algorithms.
•

For RNN implementation, we follow the RNN structure in [14]. Specifically, we first adopt the 1D convolution layer to handle the input data. And then use two bidirection LSTM model and a fully connected layer to derive the output. We adopt the Adam optimizer and the initial learning rate of $0.01$ for training. The number of training epochs is set to be $5$ and the batch size is set to be $128$ .
•

For CO and FHMM, we again adopt the classical settings in NILMTK [31]. The detailed settings are also provided in the online document [32].
•

For our proposed Multi-shot compressive sensing algorithm, we evaluate the meter data and suggest $\delta$ should be $20$ . This is derived by observing the maximal fluctuations in the historical meter data during the period without any switch event.

As indicated by Fig. 6, among all the algorithms, SparseHMM achieves the best performance as it makes full use of the sparsity assumption and exploits the temporal correlations through the hidden Markov chain. They are also the underlying guarantee for the remarkable performance of our proposed framework. However, the utilization of temporal correlation makes both algorithms vulnerable to the noise injection due to the cascading inference failure. While ALIP outperforms the conventional IP approach, both of them show strong robustness to noise injection. This is because they do not rely on the sparsity assumption, and the median filter process in the two algorithms contributes to smooth out the fluctuations incurred by noise injection. The ordinary performance of RNN is due to the limited training data and the overfitting issue. The fewer assumption utilized by classical methods (CO and FHMM) makes them more robust to noise injection at the cost of low inference accuracy.

The key observation is that the inference accuracy of each of the seven algorithms exhibits piecewise linear relationship with $\ln(\frac{1}{\epsilon})$ , though with different breaking points and slopes. The slopes can serve as the indicators to evaluate each algorithm’s robustness to Laplace noise (and hence its difficulty to achieve privacy preserving).

Nonetheless, our proposed bounds provide the correct magnitude to explain the physical meaning of DP parameters in most common frameworks for NILM, which could be inferred from Fig. 6 empirically. Therefore, in the practical utilization of DP, our theoretical performance bounds could offer magnitude estimation for the effects of the different DP levels. Moreover, these empirical results also inspire us to derive a performance bound for these NILM algorithms for further discussion.

VI Conclusion

In this paper, we seek to offer the explicit physical meaning of $\epsilon$ in $\epsilon$ -DP. We firstly theoretically derive how the level of DP affects the performance guarantee for NILM inference. Then, we use extensive numerical studies to highlight that our theoretical results are effective not only for in the compressive sensing framework, but also for many other algorithms. These results could be helpful for the potential new business models for privacy preserving in the electricity sector.

This work can be extended in many ways. For example, it will be interesting to derive a tighter upper bound, or a tighter lower bound for the inference accuracy. Then a more detailed analysis for the behavior individual appliance under DP framework could be conducted. It is also an interesting topic to analyze the robustness of different NILM algorithms when the privacy preservation is implemented.

References

[1] George W Hart. Residential energy monitoring and computerized surveillance via utility power flows. IEEE Technology and Society Magazine, 8(2):12–16, 1989.
[2] Frank Kreith and Raj P Chhabra. CRC handbook of thermal engineering. CRC press, 2017.
[3] Michael Fell, Harry Kennard, Gesche Huebner, Moira Nicolson, Simon Elam, and David Shipworth. Energising health: A review of the health and care applications of smart meter data. London, UK: SMART Energy GB, 2017.
[4] Jorge Revuelta Herrero, Álvaro Lozano Murciego, Alberto López Barriuso, Daniel Hernández de La Iglesia, Gabriel Villarrubia González, Juan Manuel Corchado Rodríguez, and Rita Carreira. Non intrusive load monitoring (nilm): A state of the art. In International Conference on Practical Applications of Agents and Multi-Agent Systems, pages 125–138. Springer, 2017.
[5] G. W. Hart. Residential energy monitoring and computerized surveillance via utility power flows. IEEE Technology and Society Magazine, 8(2):12–16, June 1989.
[6] A. Connolly. Freedom of encryption. IEEE Security Privacy, 16(1):102–103, January 2018.
[7] Renato Leite Monteiro. The new brazilian general data protection law — a detailed analysis, 2018.
[8] Cynthia Brumfield. 11 new state privacy and security laws explained: Is your business ready? https://www.csoonline.com/article/3429608/11-new-state-privacy-and-security-laws-explained-is-your-business-ready.html, 2018.
[9] Cynthia Dwork, Frank Mcsherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. Lecture Notes in Computer Science, pages 265–284, 2006.
[10] Weina Wang, Lei Ying, and Junshan Zhang. On the relation between identifiability, differential privacy, and mutual-information privacy. IEEE Trans. on Information Theory, 62(9):5018–5029, 2016.
[11] Md Zulfiquar Ali Bhotto, Stephen Makonin, and Ivan V Bajic. Load disaggregation based on aided linear integer programming. IEEE Trans. on Circuits and Systems Ii-express Briefs, 64(7):792–796, 2017.
[12] Hyungsul Kim, Manish Marwah, Martin Arlitt, Geoff Lyon, and Jiawei Han. Unsupervised disaggregation of low frequency power measurements, 2011.
[13] Stephen Makonin, Fred Popowich, Ivan V Bajic, Bob Gill, and Lyn Bartram. Exploiting hmm sparsity to perform online real-time nonintrusive load monitoring. IEEE Trans. on Smart Grid, 7(6):2575–2585, 2016.
[14] Jack Kelly and William J Knottenbelt. Neural nilm: Deep neural networks applied to energy disaggregation. arXiv: Neural and Evolutionary Computing, pages 55–64, 2015.
[15] Georgios Kalogridis, Costas Efthymiou, Stojan Z Denic, Tim A Lewis, and Rafael Cepeda. Privacy for smart meters: Towards undetectable appliance load signatures, 2010.
[16] M Backes and Sebastian Meiser. Differentially private smart metering with battery recharging. IACR Cryptology ePrint Archive, 2012:183, 2012.
[17] Dong Chen, Sandeep Kalra, David Irwin, Prashant Shenoy, and Jeannie Albrecht. Preventing occupancy detection from smart meters. IEEE Trans. on Smart Grid, 6(5):2426–2434, 2015.
[18] Hui Cao, Shubo Liu, Longfei Wu, Zhitao Guan, and Xiaojiang Du. Achieving differential privacy against non‐intrusive load monitoring in smart grid: A fog computing approach. Concurrency and Computation: Practice and Experience, 31(22), 2019.
[19] Vibhor Rastogi and Suman Nath. Differentially private aggregation of distributed time-series with transformation and encryption, 2010.
[20] Stephen Makonin, Z Jane Wang, and Chris Tumpach. Rae: The rainforest automation energy dataset for smart grid meter data analysis. international conference on data technologies and applications, 3(1):8, 2018.
[21] Emre C Kara, Ciaran M Roberts, Michaelangelo Tabone, Lilliana Alvarez, Duncan S Callaway, and Emma M Stewart. Disaggregating solar generation from feeder-level measurements. Sustainable Energy, Grids and Networks, 13:112–121, 2018.
[22] Michaelangelo Tabone, Sila Kiliccote, and Emre Can Kara. Disaggregating solar generation behind individual meters in real time. In Proceedings of the 5th Conference on Systems for Built Environments, pages 43–52, 2018.
[23] Fei Wang, Kangping Li, Xinkang Wang, Lihui Jiang, Jianguo Ren, Zengqiang Mi, Miadreza Shafie-khah, and João PS Catalão. A distributed pv system capacity estimation approach based on support vector machine with customer net load curve features. Energies, 11(7):1750, 2018.
[24] National Electrical Manufacturers Association et al. Ansi c12. 20-2010: American national standard for electricity meter: 0.2 and 0.5 accuracy classes. American National Standards Institute, 2010.
[25] David L Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via 1 minimization. Proc. of the National Academy of Sciences, 100(5):2197–2202, 2003.
[26] Emmanuel J Candes, Justin Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
[27] David Easley, Jon Kleinberg, et al. Networks, crowds, and markets: Reasoning about a highly connected world. Significance, 9:43–44, 2012.
[28] Jack Kelly and William J Knottenbelt. The uk-dale dataset, domestic appliance-level electricity demand and whole-house demand from five uk homes. Scientific Data, 2(1):150007–150007, 2015.
[29] J Zico Kolter and Matthew J Johnson. Redd: A public data set for energy disaggregation research, 2011.
[30] Quan Geng and Pramod Viswanath. The optimal mechanism in differential privacy. In 2014 IEEE international symposium on information theory, pages 2371–2375. IEEE, 2014.
[31] Nipun Batra, Jack Kelly, Oliver Parson, Haimonti Dutta, William Knottenbelt, Alex Rogers, Amarjeet Singh, and Mani Srivastava. Nilmtk: an open source toolkit for non-intrusive load monitoring. In Proceedings of the 5th international conference on Future energy systems, pages 265–276, 2014.
[32] NILMTK Authors. Nilmtk’s api documentation. http://nilmtk.github.io/nilmtk/master/index.html.
[33] Yang Yu, Guangyi Liu, Wendong Zhu, Fei Wang, Bin Shu, Kai Zhang, Nicolas Astier, and Ram Rajagopal. Good consumer or bad consumer: Economic information revealed from demand profiles. IEEE Transactions on Smart Grid, 9(3):2347–2358, 2017.
[34] Pecan Street INC. Pecan street data. http://www.pecanstreet.org.

-A DP May Affect Behavior Analysis

It is believed that the load profile is an important pattern to characterize consumer behavior. Yu et al. further show that the load profile can uniquely determine the system’s marginal cost in serving each kind of consumer [33], and suggest that the k-means clustering based on $l_{1}$ -norm is the suitable measure to cluster the users with similar system’s marginal serving cost.

To empirically highlight how the DP parameters may blur the behavior analysis, we use the Pecan Street dataset [34], and randomly select $40$ users’ $3$ -month daily energy consumption profiles (from May 1 to Aug. 8, 2015). For a larger dataset, we combine all the profiles together and directly conduct the behavior analysis on this combined dataset. We select the number of clusters to be $30$ . The initial clustering results are indicated in Fig. 7.

We inject the Laplace noise into every single consumer’s load profile. We set the DP-parameter $\lambda$ to be $0.01,0.05,0.1,0.5$ , respectively to examine how this parameter may change the clustering result. Fig. 8 plots such impacts. Specifically, if a cluster has little users deviating to other clusters after noise injection, we call it a stable cluster. The color of each cluster represents such stability. The arrows in the figure show the trajectories of cluster interchanges. Obviously, higher privacy requirement (higher $\lambda$ ) leads to higher probability that users deviate from the original cluster.

To better connect the DP parameter with accuracy, we can estimate the probability that each consumer may stay in the original cluster and derive the upper bound for this probability.

-B Justification of Sparsity Assumption

While it is hard to theoretically examine how the failure to meet this assumption may affect the performance of the proposed inference framework, we provide the following numerical analysis. Define the degree of sparsity $s$ for vector $X\in\{0,1\}^{N\times T}$ as follows:

s=1-\frac{\sum_{t=1}^{T-1}\|X_{t+1}-X_{t}\|_{0}}{N(T-1)}.

(37)

This metric allows us to synthetic load data with different sparsity levels to our theoretical conclusions. Fig. 9 plots the relationship between DP parameter and inference accuracy for different sparsity levels, from $0.3$ to $0.9$ . It is clear that the general trends remain the same for different sparsity levels while higher sparsity level leads to better performance.

We further illustrate the sparsity levels of three most widely adopted public datasets in Table I. The sparsity levels of these three datasets are all very high.

TABLE I: The Sparsity of Three Widely Adopted Public Datasets

Dataset	Sparsity
UK-DALE	99.86%
TEALD	99.29%
REDD	98.97%

-C Proof of Lemma 1

Denote the optimal solution to (P1) by $\Delta_{t}^{0}$ and the optimal solution to (P2) by $\Delta_{t}$ . By the equivalence of optimal solution set, we mean

\|\Delta_{t}^{0}\|_{0}=\|\Delta_{t}\|_{0}.

(38)

Hence, it suffices for us to show two cases: $\|\Delta_{t}^{0}\|_{0}\leq\|\Delta_{t}\|_{0}$ and $\|\Delta_{t}^{0}\|_{0}\geq\|\Delta_{t}\|_{0}$ . Note that, it is straightforward to see that all feasible solutions to (P1) are also feasible to (P2). Hence, the second inequality immediately follows. All that remains is to prove the first inequality.

We prove it by contradiction by utilizing the crucial technical alignment assumption: Assumption 3. It states that for the ascending order of $\mathbf{P}$ ’s entries $\{P^{\epsilon}_{1},..,P^{\epsilon}_{N}\}$ , and for all $U\leq U_{t}$ , it holds:

\displaystyle\sum\nolimits_{k=1}^{U}P_{k}^{\epsilon}-\sum\nolimits_{k=1}^{U-1}P^{\epsilon}_{N+1-k}>2\delta.

(39)

Substituting the sparsity assumption into the sensitivity constraint yields that there exists $P_{1}$ ,.., $P_{U}$ satisfying:

\displaystyle\sum\nolimits_{k=1}^{U}P_{i}\leq K_{t}+\delta.

(40)

It holds for all possible $\underline{U}<U$ ,

		$\displaystyle\sum\nolimits_{k=1}^{\underline{U}}P_{N+1-k}^{\epsilon}<\sum\nolimits_{k=1}^{\underline{U}}P_{i}-2\delta$		(41)
	$\displaystyle<$	$\displaystyle\sum\nolimits_{k=1}^{\underline{U}}P_{k}^{\epsilon}-2\delta\leq K_{t}-\delta.$		(41)

Note that, Eqs. (40) and (41) hold for both problems (P1) and (P2).

Denote $\|\Delta_{t}^{0}\|_{0}$ by $U$ , and denote $\|\Delta_{t}\|_{0}$ by $\underline{U}$ . Suppose $\underline{U}<U$ , Eqs. (40) and (41) yield that

\displaystyle\Delta_{t}\mathbf{P}\leq\sum\nolimits_{k=1}^{\underline{U}}P_{N+1-k}^{\epsilon}<K_{t}-\delta,

(42)

This indicates that $\Delta_{t}$ does not satisfy the sensitivity constraint in (P2), which contradicts to our assumption that it is the optimal solution to (P2). Thus, we could know that $\|\Delta_{t}^{0}\|_{0}\leq\|\Delta_{t}\|_{0}$ . This concludes our proof.

-D Proof for Theorem 1

We first define the notations and present the lemma that we use to prove this theorem.

For a set $U\subseteq\{1,\cdots,N\}$ , define vector $\mathbf{P}_{U}$ as follows:

\mathbf{P}_{U}=\left\{P_{i}\bigg{|}P_{i}\in\mathbf{P},i\in U\right\}.

(43)

We say $\mathbf{P}_{U}$ is $S$ -bounded by $\delta_{S}$ , if the following holds for all $c$ , and for all $U$ such that $|U|\leq S$ :

(1-\delta_{S})\|c\|_{2}^{2}\leq\|\mathbf{P}_{U}\cdot c\|_{2}^{2}\leq(1+\delta_{S})\|c\|_{2}^{2}.

(44)

For our problem, the sparsity assumption implies that $\Delta_{t}$ has at most $U_{t}$ non-zero entries. Assume $\mathbf{P}_{U}$ is both $3U_{t}$ -bounded and $4U_{t}$ -bounded, and the corresponding bounds are $\delta_{3U_{t}}$ and $\delta_{4U_{t}}$ , respectively. We use Theorem 1 in [26] as Lemma 2 for our subsequent proof.

Lemma 2.

(Theorem 1 in [26]): For $X_{t}$ with $U_{t}$ non-zero entries, and for the matrix $P$ with $n$ columns, if $P$ is both $3U_{t}$ -bounded and $4U_{t}$ -bounded, the difference between the solution for (P3) and ground truth is upper bounded, i.e.,

\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}\leq\frac{4}{(\sqrt{3(1-\delta_{4U_{t}})}-\sqrt{1+\delta_{3U_{t}}})}\cdot\delta.

(45)

To apply Lemma 2 to our problem, we need to express the constraint $\|\Delta_{t}\boldsymbol{P}-K_{t}\|_{2}<\delta$ in the following equivalent form:

\left|\left|\Delta_{t}\frac{\mathbf{P}}{\|\mathbf{P}\|_{2}}-\frac{K_{t}}{\|\mathbf{P}\|_{2}}\right|\right|_{2}<\frac{\delta}{\|\mathbf{P}\|_{2}}.

(46)

Thus, Lemma 2 directly dictates

\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}\leq C(\mathbf{P})\cdot\delta,

(47)

where

C(\mathbf{P})=\frac{4}{(\sqrt{3(1-\delta_{4U_{t}})}-\sqrt{1+\delta_{3U_{t}}})\|\mathbf{P}\|_{2}}.

(48)

Note that $\mathbf{E}[\bar{\Delta}_{t}]=\Delta_{t}^{*}$ , which yields

	$\displaystyle\\|\mathbf{E}[\bar{\Delta}_{t}]-\Delta_{t}^{0}\\|_{2}$	$\displaystyle\leq\\|\mathbf{E}[\bar{\Delta}_{t}]-\Delta_{t}^{}\\|_{2}+\\|\Delta_{t}^{}-\Delta_{t}^{0}\\|_{2}$		(49)
		$\displaystyle\leq C(\mathbf{P})\cdot\delta.$		(49)

This completes our proof for Theorem 1.

-E Proof for Proposition 1

Note that the following triangle inequality holds with $K_{t}^{\epsilon}$ defined in (P4):

	$\displaystyle\|\Delta_{t}^{\epsilon}\mathbf{P}-\Delta_{t}^{0}\mathbf{P}\|$	(50)
$\displaystyle\leq$	$\displaystyle\|\Delta^{\epsilon}_{t}\mathbf{P}-\Delta^{}_{t}\mathbf{P}+\Delta^{}_{t}\mathbf{P}-\Delta^{0}_{t}\mathbf{P}\|$
$\displaystyle\leq$	$\displaystyle\|\Delta^{\epsilon}_{t}\mathbf{P}-\Delta^{}_{t}\mathbf{P}\|+\|\Delta^{}_{t}\mathbf{P}-\Delta^{0}_{t}\mathbf{P}\|$
$\displaystyle\leq$	$\displaystyle C(\mathbf{P})\cdot\delta+\|\Delta^{\epsilon}_{t}-K_{t}^{\epsilon}+K_{t}^{\epsilon}-\Delta^{*}_{t}\mathbf{P}\|$
$\displaystyle\leq$	$\displaystyle C(\mathbf{P})\cdot\delta+\delta+\|\|y_{t}-y_{t+1}+n_{t}-n_{t+1}\|-\Delta^{*}_{t}\mathbf{P}\|$
$\displaystyle\leq$	$\displaystyle C(\mathbf{P})\cdot\delta+\delta+\|\|y_{t}-y_{t+1}\|+\|n_{t}-n_{t+1}\|-\Delta^{*}_{t}\mathbf{P}\|$
$\displaystyle\leq$	$\displaystyle C(\mathbf{P})\cdot\delta+\delta+\delta+\|n_{t}-n_{t+1}\|$
$\displaystyle\leq$	$\displaystyle(C(\mathbf{P})+2)\cdot\delta+\|n_{t}-n_{t+1}\|$

Proposition 1 immediately follows.

-F Proof for Theorem 3

Denote $l_{t}=n_{t}-n_{t-1}$ . Since $n_{t}$ and $n_{t-1}$ are i.i.d Laplace noises, we can show that

\displaystyle\mathbf{E}[|l_{t}|]=2\int^{\infty}_{0}\!\!x\left(\frac{x}{4\lambda^{2}}\!+\!\frac{1}{4\lambda}\right)e^{-\frac{x}{\lambda}}dx=\frac{3\lambda}{2}=\frac{3\delta}{4\epsilon}.

(51)

Note that the last equality holds because $\Delta{f}=\frac{\delta}{2}$ and $\lambda=\frac{\Delta{f}}{\epsilon}$ .

This allows us to express $\mathbf{E}[\alpha]$ in terms of $\delta$ and $\epsilon$ :

	$\displaystyle\mathbf{E}[\alpha]=\mathbf{E}_{l_{t}}\left[1-\frac{\mathbf{E}\\|\bar{\Delta}_{t}^{\epsilon}-\Delta_{t}^{0}\\|_{1}}{N}\right]$	(52)
$\displaystyle=$	$\displaystyle\mathbf{E}_{l_{t}}\left[1\!-\!\frac{\mathbf{E}\\|\bar{\Delta}_{t}^{\epsilon}\!-\!\Delta_{t}^{0}\\|_{2}}{N}\right]\!=\!\mathbf{E}_{l_{t}}\left[1\!-\!\frac{\\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]\!-\!\Delta_{t}^{0}\\|_{2}}{N}\right]$
$\displaystyle\geq$	$\displaystyle\ \mathbf{E}_{l_{t}}\left[\max(1-\frac{(C(\mathbf{P})+2)\delta+\|l_{t}\|}{N},0)\right].$

Expressing the expectation in the integral form further yields that

		$\displaystyle\mathbf{E}[\alpha]$		(53)
		$\displaystyle\geq 1-2\int_{N-(C(\mathbf{P})+2)\delta}^{\infty}\!\left(\frac{x}{4\lambda^{2}}\!+\!\frac{1}{4\lambda}\right)e^{-\frac{x}{\lambda}}dx$
		$\displaystyle-2\int_{0}^{N-(C(\mathbf{P})+2)\delta}\!\left(\frac{(C(\mathbf{P})+2)\delta+x}{N}\right)\left(\frac{x}{4\lambda^{2}}\!+\!\frac{1}{4\lambda}\right)e^{-\frac{x}{\lambda}}dx.$

From Eq. (52), the inequality holds due to the fact that accuracy has a trivial lower bound 0, which indicates the error has a trivial upper bound 1. Theorem 3 follows by further standard mathematical manipulations.

-G Proof for Theorem 4

The accuracy is defined as follows

\displaystyle\alpha=1-\frac{\mathbf{E}\|\bar{\Delta_{t}^{\epsilon}}-\Delta_{t}^{0}\|_{1}}{N}.

(54)

To find its upper bound, we need to find the lower bound of the error. With $K_{t}^{\epsilon}$ in Problem (P4), we define event $\Omega$ as $\{K_{t}^{\epsilon}\not\in[0,\Delta^{0}_{t}\mathbf{P}+2\delta]\}\cap\{(y_{t}-y_{t+1})(n_{t}-n_{t+1})>0\}$ .

This event $\Omega$ allows us to lower bound the error:

		$\displaystyle\mathbf{E}[1-\alpha]=\mathbf{E}_{K_{t}^{\epsilon}}\frac{\mathbf{E}\\|\bar{\Delta}_{t}^{\epsilon}-\Delta_{t}^{0}\\|_{1}}{N}$		(55)
	$\displaystyle=$	$\displaystyle\mathbf{E}_{K_{t}^{\epsilon}}\frac{\\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\\|_{2}}{N}\geq\mathbf{E}\left[\frac{\\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\\|_{2}}{N}\mathbf{I}\{\Omega\}\right],$		(55)

where $\mathbf{I}\{\cdot\}$ is the indicator function.

For a certain $K_{t}^{\epsilon}$ , from the definition of rounding, we know that the expectation of rounding is the same as the optimal solution to (P3), denoted by $\Delta_{t}^{*}$ . Hence, we could substitute $\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]$ with $\Delta_{t}^{*}$ .

When the event $\mathscr{\Omega}$ happens, $K_{t}^{\epsilon}$ could be related to the ground truth $\Delta_{t}^{0}$ , in which $l_{t}$ denotes the $n_{t}-n_{t-1}$ . Since the $(y_{t}-y_{t+1})(n_{t}-n_{t+1})>0$ , we could derive that,

\displaystyle K_{t}^{\epsilon}=|y_{t}-y_{t-1}|+|l_{t}|

\displaystyle\geq\Delta_{t}^{0}P-\delta+|l_{t}|.

(56)

Thus, if the $\Delta_{t}^{*}$ is feasible, i.e.,

\displaystyle|\Delta_{t}^{*}\mathbf{P}-K_{t}^{\epsilon}|\leq\delta,

(57)

then we know

\displaystyle\Delta_{t}^{*}\mathbf{P}-K_{t}^{\epsilon}\geq-\delta.

(58)

When the event $\mathscr{\Omega}$ happens, we could conclude that $|l_{t}|>2\delta$ . Therefore, combining with Eqs.(56) and (58) yields

	$\displaystyle\|\|l_{t}\|-2\delta\|$	$\displaystyle\leq\|(\Delta_{t}^{*}-\Delta_{t}^{0})\mathbf{P}\|$		(59)
		$\displaystyle\leq\\|\Delta_{t}^{*}-\Delta_{t}^{0}\\|_{2}\\|\mathbf{P}\\|_{2}.$		(59)

Plugging Eq. (59) into Eq. (LABEL:omegaerr) further implies that

$\displaystyle\mathbf{E}[1-\alpha]\geq$	$\displaystyle\mathbf{E}\left[\frac{\\|(\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0})\\|_{2}}{N}\mathbf{I}\{\Omega\}\right]$	(60)
$\displaystyle\geq$	$\displaystyle\mathbf{E}\left[\frac{\|\|l_{t}\|-2\delta\|}{N\\|\mathbf{P}\\|_{2}}\mathbf{I}\{\Omega\}\right]$
$\displaystyle=$	$\displaystyle\int^{2n\\|\mathbf{P}\\|_{2}+2\delta}_{2\delta}\frac{\|\|x\|-2\delta\|}{N\\|\mathbf{P}\\|_{2}}(\frac{x}{4\lambda^{2}}+\frac{1}{4\lambda})e^{-\frac{x}{\lambda}}dx$
	$\displaystyle+\int_{\infty}^{2n\\|\mathbf{P}\\|_{2}+2\delta}(\frac{x}{4\lambda^{2}}+\frac{1}{4\lambda})e^{-\frac{x}{\lambda}}dx.$

The inequalities are again making use of the fact that accuracy has a trivial lower bound $0$ . With mathematical manipulations, we can prove Theorem 4.

-H Proof for Theorem 5

For one-shot compressive sensing, with the above proofs of Theorem 3 and 4, we could derive the bounds for the difference between the solution to problem (P4) $\Delta_{t}^{*}$ and the ground truth $\Delta_{t}^{0}$ in the following form:

g(\delta,\epsilon)\leq\mathbf{E}\|\Delta_{t}^{*}-\Delta_{t}^{0}\|_{2}\leq G(\delta,\epsilon).

(61)

Next, we construct $g(\delta,\epsilon)$ and $G(\delta,\epsilon)$ . Following the rounding procedure, for the appliance $k$ , we denote its approximate and true states as $X_{t}^{k}$ and $X_{t}^{0k}$ , respectively. We also denote its approximate and true switching events by $\Delta_{t}^{*k}$ and $\Delta_{t}^{0k}$ . The procedure leads to the following equations:

	$\displaystyle X_{t+1}^{k}$	$\displaystyle=\Delta_{t}^{k}(1-X_{t}^{k})+(1-\Delta_{t}^{k})X_{t}^{k},$		(62)
	$\displaystyle X_{t+1}^{0k}$	$\displaystyle=\Delta_{t}^{0k}(1-X_{t}^{0k})+(1-\Delta_{t}^{0k})X_{t}^{0k}.$		(63)

Together, they indicate

	$\displaystyle\|X_{t+1}^{k}-X_{t+1}^{0k}\|$	(64)
$\displaystyle=$	$\displaystyle\|X_{t}^{k}-X_{t}^{0k}+\Delta_{t}^{k}-\Delta_{t}^{0k}+2(X_{t}^{0k}\Delta_{t}^{0k}-X_{t}^{k}\Delta_{t}^{k})\|$
$\displaystyle=$	$\displaystyle\|(1-2\Delta_{t}^{k})(X_{t}^{k}-X_{t}^{0k})+(1-2X_{t}^{0k})(\Delta_{t}^{k}-\Delta_{t}^{0k})\|$
$\displaystyle\leq$	$\displaystyle\|(1-2\Delta_{t}^{k})(X_{t}^{k}-X_{t}^{0k})\|+\|(1-2X_{t}^{0k})(\Delta_{t}^{k}-\Delta_{t}^{0k})\|$
$\displaystyle\leq$	$\displaystyle\|X_{t}^{k}-X_{t}^{0k}\|+\|\Delta_{t}^{*k}-\Delta_{t}^{0k}\|.$

The last inequality holds because either $\Delta_{t}^{*k}$ or $X_{t}^{0k}$ is in $[0,1]$ . Hence, both $1-2\Delta_{t}^{*k}$ and $1-2X_{t}^{0k}$ are in $[-1,1]$ .

Taking the expectation, we have

	$\displaystyle\mathbf{E}\|X_{t}^{k}-X_{t}^{0k}\|$	(65)
$\displaystyle\leq$	$\displaystyle\mathbf{E}\sum\nolimits_{i=1}^{T-1}\|\Delta_{i}^{k}-\Delta_{i}^{0k}\|=\mathbf{E}\sum\nolimits_{i=1}^{T-1}\\|\Delta_{i}^{k}-\Delta_{i}^{0k}\\|_{2}$
$\displaystyle\leq$	$\displaystyle\mathbf{E}\sum\nolimits_{i=1}^{T-1}\\|\Delta_{i}^{*}-\Delta_{i}^{0}\\|_{2}=(T-1)G(\delta,\epsilon).$

Denoting $\boldsymbol{X}^{0}$ as the ground truth and $\boldsymbol{X}$ as the results before rounding, we know

		$\displaystyle\mathbf{E}\\|X-X^{0}\\|_{1}$		(66)
	$\displaystyle=$	$\displaystyle\mathbf{E}\sum_{k=1}^{N}\sum_{i=1}^{T}\|X_{i}^{k}-X_{i}^{0k}\|\leq\mathbf{E}\frac{T(T-1)}{2}G(\delta,\epsilon)N.$		(66)

Then, back to lower bound the accuracy, in which $\bar{\boldsymbol{X}^{\epsilon}}$ denotes the actual decoding results after rounding, we could derive that:

	$\displaystyle\mathbf{E}[\alpha_{m}]$	$\displaystyle=1-\frac{\mathbf{E}\\|\bar{\boldsymbol{X}^{\epsilon}}-\boldsymbol{X}^{0}\\|_{1}}{NT}=1-\frac{\mathbf{E}\\|\mathbf{E}[\bar{\boldsymbol{X}^{\epsilon}}]-\boldsymbol{X}^{0}\\|_{1}}{NT}$		(67)
		$\displaystyle=1-\frac{\\|\boldsymbol{X}-\boldsymbol{X}^{0}\\|_{1}}{NT}\geq 1-\frac{(T-1)}{2}G(\delta,\epsilon).$		(67)

We construct $G(\delta,\epsilon)$ in terms of the accuracy bound for the one-shot algorithm. If we denote the lower and upper bounds for one-shot compressive sensing algorithm by $b(\delta,\epsilon)$ and $B(\delta,\epsilon)$ , then from Eq. (52), we have

	$\displaystyle\mathbf{E}\left[1-\frac{\\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\\|_{2}}{N}\right]$	$\displaystyle=\mathbf{E}\left[1-\frac{\\|\Delta_{t}^{*}-\Delta_{t}^{0}\\|_{2}}{N}\right]$		(68)
		$\displaystyle=\mathbf{E}[\alpha]\geq b(\delta,\epsilon).$		(68)

This constructs the lower bound:

G(\delta,\epsilon)=1-b(\delta,\epsilon)N.

(69)

Specifically, we could derive the lower bound for multi-shot compressive sensing as

\mathbf{E}[\alpha_{m}]\geq 1-\frac{(T-1)(1-b(\delta,\epsilon)N)}{2}.

(70)

As for the upper bound, we could construct an oracle to analyze a simple bound. The oracle could identify all the true states when $t>1$ . When $t=1$ , it does the same as our multi-shot compressive sensing algorithm. Hence, its error totally comes from the error for time period $1$ . Since we know the initial states, our error comes from the first period’s one-shot compressive sensing’s calculations, that is the error for $\bar{\Delta}_{1}^{\epsilon}$ . This observation yields that

\mathbf{E}\left[1-\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{N}\right]\leq B(\delta,\epsilon).

(71)

This further implies that

\mathbf{E}\left[\frac{\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\|_{2}}{N}\right]\geq 1-B(\delta,\epsilon).

(72)

Therefore, the upper bound could be derived with the help of Theorem 4 as follows:

	$\displaystyle\mathbf{E}[\alpha_{m}]$	$\displaystyle=1-\frac{\mathbf{E}\\|\bar{\boldsymbol{X}^{\epsilon}}-\boldsymbol{X}^{0}\\|_{1}}{NT}\leq 1-\frac{\mathbf{E}\\|\bar{X_{1}^{\epsilon}}-X_{1}^{0}\\|_{1}}{NT}$		(73)
		$\displaystyle=\mathbf{E}\left[1-\frac{\\|\mathbf{E}[\bar{\Delta}_{t}^{\epsilon}]-\Delta_{t}^{0}\\|_{2}}{NT}\right]\leq 1-\frac{1-B(\delta,\epsilon)}{T}.$		(73)

Thus, we complete our proof.

-I Proof for Proposition 2

We first define the notion of a good hierarchy. For a hierarchy $\mathscr{H}$ with $S$ items, we rank the items ascendingly: $P_{1},..,P_{S}$ . If $\mathscr{H}$ satisfies that

\sum\nolimits_{i=1}^{U}P_{i}-2\delta>\sum\nolimits_{i=1}^{U-1}P_{S-i+1},\forall U<S,

(74)

we call it a good hierarchy. Notice this characteristic is stronger than Assumption 3. For a good hierarchy with $S$ items, if $S\leq U_{t}$ , Assumption 3 immediately holds.

We prove the proposition by induction. The induction basis is clear: the single item hierarchy is a trivial good hierarchy. The key step is the induction process, where we want to show that for a good hierarchy with $S$ items, if the $(S+1)^{th}$ appliance satisfies,

\sum\nolimits_{i=1}^{\lfloor\frac{S}{2}\rfloor+1}P_{i}-2\delta\geq\sum\nolimits_{i=1}^{\lfloor\frac{S}{2}\rfloor-1}P_{S+1-i}+P_{S+1},

(75)

then we should include it to the set and form a larger good hierarchy. This can be proved by examining the following two cases.

For $U\leq\lfloor\frac{S}{2}\rfloor+1$ , since

\sum\nolimits_{i=U+1}^{\lfloor\frac{S}{2}\rfloor+1}P_{i}<\sum\nolimits_{i=U-1}^{\lfloor\frac{S}{2}\rfloor-1}P_{S-i+1},

(76)

together with condition (75), we could obtain that

\sum\nolimits_{i=1}^{U}P_{i}-2\delta>\sum\nolimits_{i=1}^{U-2}P_{S-i+1}+P_{S+1}.

(77)

For $\lfloor\frac{S}{2}\rfloor+1<U\leq S$ , we can observe the following two facts:

\sum\nolimits_{S-U+3}^{U}P_{i}=\sum\nolimits_{i=S-U+1}^{U-2}P_{S-i+1},

(78)

and

S-U+2\leq\lfloor\frac{S}{2}\rfloor+1.

(79)

Together, they imply that

\sum\nolimits_{i=1}^{S-U+2}P_{i}-2\delta>\sum\nolimits_{i=1}^{S-U}P_{S-i+1}+P_{S+1}.

(80)

Combining the two cases, we can conclude the induction. Following the same routine, we can show that, condition (75) is also necessary. That is, if the $(S+1)^{th}$ appliance does not satisfy this condition, it should not be included into the current set (hierarchy), as it will make the hierarchy not good. We should, instead, start a new set. Thus, we complete the proof for Proposition 2.

-J Proof for Theorem 6

We prove the following Lemma, which is crucial to our proof.

Lemma 3.

For our one-shot compressive sensing algorithm, we denote the lower and upper bound as $b(\delta,\epsilon)$ and $B(\delta,\epsilon)$ . Besides the Laplace noises, we inject new disturbances $\xi_{t}$ and $\xi_{t+1}$ into $y_{t}+n_{t}$ and $y_{t+1}+n_{t+1}$ , in which $y_{t}$ and $y_{t+1}$ is the meter readings and $n_{t}$ is the injecting Laplace noise as we defined in Eq. (16). If the disturbances satisfy that $\mathbf{E}|\xi_{t}-\xi_{t+1}|\leq Y$ , then the bounds associated with our algorithm with disturbances are $b(\delta+\frac{Y}{2+C(\mathbf{P})},\epsilon)$ and $B(\delta+\frac{Y}{2},\epsilon)$ , respectively.

Proof: First we construct the new lower bound. Following the same routine as Eq. (50), we have

	$\displaystyle\|\Delta_{t}^{\epsilon}\mathbf{P}-\Delta_{t}^{0}\mathbf{P}\|$	(81)
$\displaystyle\leq$	$\displaystyle(C(\mathbf{P})+2)\cdot\delta+\|n_{t}-n_{t+1}+\xi_{t}-\xi_{t+1}\|.$
$\displaystyle\leq$	$\displaystyle(C(\mathbf{P})+2)\cdot\delta+\|n_{t}-n_{t+1}\|+\|\xi_{t}-\xi_{t+1}\|.$

This implies that

$\displaystyle\mathbf{E}[\alpha]$	$\displaystyle\geq\mathbf{E}\left[1-\frac{(C(\mathbf{P})+2)\delta+\|\xi_{t}-\xi_{t+1}\|+\|l_{t}\|}{N}\right]$	(82)
	$\displaystyle\geq\mathbf{E}\left[1-\frac{(C(\mathbf{P})+2)\delta+Y+\|l_{t}\|}{N}\right]$
	$\displaystyle=\mathbf{E}\left[1-\frac{(C(\mathbf{P})+2)(\delta+\frac{Y}{2+C(\mathbf{P})})+\|l_{t}\|}{N}\right],$

where $l_{t}$ is the difference between the Laplace noises, i.e., $l_{t}=n_{t}-n_{t+1}$ . Clearly, the disturbance plays a similar role as the noises. Revisiting Eq. (56) yields that

K_{t}^{\epsilon}=|y_{t}-y_{t+1}+\xi_{t}-\xi_{t+1}|+|l_{t}|.

(83)

Hence, we could establish the new upper bound as follows:

	$\displaystyle\|\|l_{t}\|-2\delta-Y\|\leq\|\mathbf{E}_{\xi}[\|l_{t}\|-2\delta-\|\xi_{t}-\xi_{t+1}\|]\|$	(84)
$\displaystyle\leq$	$\displaystyle\mathbf{E}_{\xi}\|\|l_{t}\|-2\delta-\|\xi_{t}-\xi_{t+1}\|\|$
$\displaystyle\leq$	$\displaystyle\mathbf{E}_{\xi}\|(\Delta_{t}^{}-\Delta_{t}^{0})\mathbf{P}\|\leq\mathbf{E}_{\xi}\\|\Delta_{t}^{}-\Delta_{t}^{0}\\|_{2}\\|\mathbf{P}\\|_{2}.$

The upper bound can be derived following the same routine. Then we finish our proof of Lemma 3.

The key to apply Lemma 3 to the proof of Theorem 6 is to characterize the disturbance connections between hierarchies. We characterize such connections by induction. The induction basis is to examine hierarchy 1, where the disturbances come from all the other hierarchies’ instead of decoding error.

Specifically, for $\forall y_{t},y_{t+1}$ , if we denote $y^{01}_{t}$ as the hierarchy $1$ ’s meter reading at time $t$ and $y^{out_{1}}_{t}$ as the other hierarchies meter reading at time $t$ , it holds that:

|y_{t}-y_{t+1}|=|y^{01}_{t}-y^{01}_{t+1}+l_{t}+y^{out_{1}}_{t}-y^{out_{1}}_{t+1}|.

(85)

With the sparsity $U$ , we could further derive that

|y^{out_{1}}_{t}-y^{out_{1}}_{t+1}|\leq|P_{U}^{1}|,

(86)

where $P_{U}^{1}$ is defined as the maximum summation of $U$ appliances whose power is smaller than the smallest energy consumption in the hierarchy $1$ , which is denoted by $P_{m}^{1}$ . Note that $\xi_{t}=y^{out_{1}}_{t}$ , i.e.,

|\xi_{t}-\xi_{t+1}|\leq P_{U}^{1}.

(87)

This constructs our induction basis with the above Lemma 3.

Next, we analyze the bounds for hierarchy $i$ . We assume for each hierarchy $k$ , $1\leq k<i$ , we have

m_{k}\leq\mathbf{E}[\alpha_{k}]\leq M_{k},

(88)

where $m_{k}$ is the lower bound of the multi-shot compressive sensing algorithm accuracy for hierarchy $k$ and $M_{k}$ is the upper bound for this hierarchy as indicated in Theorem 6.

Thus, denote the hierarchy $i$ ’s meter reading as $y^{0i}_{t}$ at time $t$ and the decoding results as $\tilde{y}^{i}_{t}$ . By the definition of $\alpha_{k}$ , we can show that $\forall k<i$ ,

$\displaystyle\mathbf{E}\|\tilde{y}^{k}_{t}-y_{t}^{0k}\|$	$\displaystyle=\mathbf{E}\|(\bar{\boldsymbol{X}^{\epsilon}[k]}-\boldsymbol{X}^{0}[k])\mathbf{P}_{k}\|$	(89)
	$\displaystyle\leq\mathbf{E}\\|(\bar{\boldsymbol{X}^{\epsilon}[k]}-\boldsymbol{X}^{0}[k])\\|_{2}\\|\mathbf{P}_{k}\\|_{2}$
	$\displaystyle=N_{k}T(1-\mathbf{E}[\alpha_{i}])\\|\mathbf{P}_{k}\\|_{2}$
	$\displaystyle\leq N_{k}T(1-m_{k})\\|\mathbf{P}_{k}\\|_{2}.$

Also, if we denote all hierarchies whose index $m>i$ ’s actual total meter reading as $y^{out_{i}}_{t}$ , it could be directly derived from the sparsity of switching events that:

|y^{out_{i}}_{t}-y^{out_{i}}_{t+1}|\leq|P_{U}^{i}|

(90)

Thus, the total disturbance for hierarchy $i$ can be calculated as follows:

\xi_{t}=\sum\nolimits_{k=1}^{i-1}(y^{0k}_{t}-\tilde{y}^{k}_{t})+y^{out_{i}}_{t}.

(91)

This allows as to construct the bounds for $\mathbf{E}|\xi_{t}-\xi_{t+1}|$ :

	$\displaystyle\mathbf{E}\|\xi_{t}-\xi_{t+1}\|$	(92)
$\displaystyle\leq$	$\displaystyle\mathbf{E}[\sum_{k=1}^{i-1}(\|y^{0k}_{t}-\tilde{y}^{k}_{t}\|+\|y^{0k}_{t+1}-\tilde{y}^{k}_{t+1}\|)+\|y^{out_{i}}_{t}-y^{out_{i}}_{t+1}\|]$
$\displaystyle\leq$	$\displaystyle 2\sum\nolimits_{k=1}^{i-1}N_{k}T(1-m_{k})\\|\mathbf{P}_{k}\\|_{2}+\|P_{U}^{i}\|.$

Together with the Lemma 3, we complete the induction, and hence prove Theorem 6.

	$\displaystyle\|\|l_{t}\|-2\delta\|$	$\displaystyle\leq\|(\Delta_{t}^{*}-\Delta_{t}^{0})\mathbf{P}\|$		(59)
		$\displaystyle\leq\\|\Delta_{t}^{*}-\Delta_{t}^{0}\\|_{2}\\|\mathbf{P}\\|_{2}.$		(59)

$\displaystyle\mathbf{E}[\alpha]$	$\displaystyle\geq\mathbf{E}\left[1-\frac{(C(\mathbf{P})+2)\delta+\|\xi_{t}-\xi_{t+1}\|+\|l_{t}\|}{N}\right]$	(82)
	$\displaystyle\geq\mathbf{E}\left[1-\frac{(C(\mathbf{P})+2)\delta+Y+\|l_{t}\|}{N}\right]$
	$\displaystyle=\mathbf{E}\left[1-\frac{(C(\mathbf{P})+2)(\delta+\frac{Y}{2+C(\mathbf{P})})+\|l_{t}\|}{N}\right],$

$\displaystyle\mathbf{E}\|\tilde{y}^{k}_{t}-y_{t}^{0k}\|$	$\displaystyle=\mathbf{E}\|(\bar{\boldsymbol{X}^{\epsilon}[k]}-\boldsymbol{X}^{0}[k])\mathbf{P}_{k}\|$	(89)
	$\displaystyle\leq\mathbf{E}\\|(\bar{\boldsymbol{X}^{\epsilon}[k]}-\boldsymbol{X}^{0}[k])\\|_{2}\\|\mathbf{P}_{k}\\|_{2}$
	$\displaystyle=N_{k}T(1-\mathbf{E}[\alpha_{i}])\\|\mathbf{P}_{k}\\|_{2}$
	$\displaystyle\leq N_{k}T(1-m_{k})\\|\mathbf{P}_{k}\\|_{2}.$

Privacy Preserving in Non-Intrusive Load Monitoring: A Differential Privacy Perspective

Abstract

Index Terms:

I Introduction

I-A Related Works

I-B Our Contributions

II Problem Formulation

II-A Non-intrusive Load Monitoring Problem

II-B One-shot NILM Inference

Assumption 1.

Definition 1.

Assumption 2.

Assumption 3.

Lemma 1.

Theorem 1.

III Encode DP into NILM

III-A Revisit ϵ\epsilon-Differential Privacy

Theorem 2.

III-B Asymptotic Bounds for NILM with DP

Proposition 1.

Theorem 3.

Theorem 4.

IV Hierarchical Multi-shot NILM

IV-A Multi-shot NILM Inference

Definition 2.

Theorem 5.

IV-B Hierarchical Decomposition

Proposition 2.

Theorem 6.

V Numerical Studies

V-A Overview of the Datasets

V-B One-shot NILM Inference

V-C Multi-shot NILM Inference

V-D Robustness of Noise Injection Methods

V-E Beyond Compressive Sensing

VI Conclusion

References

-A DP May Affect Behavior Analysis

-B Justification of Sparsity Assumption

-C Proof of Lemma 1

-D Proof for Theorem 1

Lemma 2.

-E Proof for Proposition 1

-F Proof for Theorem 3

-G Proof for Theorem 4

-H Proof for Theorem 5

-I Proof for Proposition 2

-J Proof for Theorem 6

Lemma 3.

III-A Revisit $\epsilon$ -Differential Privacy