
A New Time Series Similarity Measure and Its Smart Grid Applications

Rui Yuan1, S. Ali Pourmousavi1, Wen L. Soong1, Andrew J. Black2, Jon A. R. Liisberg3 and Julian Lemos-Vinasco3
1 School of Electrical and Mechanical Engineering, The University of Adelaide, Adelaide, Australia
2 School of Computer and Mathematical Sciences, The University of Adelaide, Adelaide, Australia
3 Research & Department of Data Science, Watts A/S, Køge, Zealand, Denmark

Abstract

Many smart grid applications involve data mining, clustering, classification, identification, and anomaly detection, among others. These applications primarily depend on the measurement of similarity, which is the distance between different time series or subsequences of a time series. The commonly used time series distance measures, namely Euclidean Distance (ED) and Dynamic Time Warping (DTW), do not quantify the flexible nature of electricity usage data in terms of temporal dynamics. As a result, there is a need for a new distance measure that can quantify both the amplitude and temporal changes of electricity time series for smart grid applications, e.g., demand response and load profiling. This paper introduces a novel distance measure to compare electricity usage patterns. The method consists of two phases that quantify the effort required to reshape one time series into another, considering both amplitude and temporal changes. The proposed method is evaluated against ED and DTW using real-world data in three smart grid applications. Overall, the proposed measure outperforms ED and DTW in identifying the best load scheduling strategy, detecting anomalous days with irregular electricity usage, and determining electricity users’ behind-the-meter (BTM) equipment.

Index Terms:

Renewable energy, load profiling, data mining, multi-label classification, and smart grid.


This project is supported by the Australian Government Research Training Program (RTP) through the University of Adelaide, and a supplementary scholarship provided by Watts A/S, Denmark.

I Introduction

Modern power systems face serious challenges imposed by the integration of renewable power plants, the emergence of distributed generation resources, and the requirements for decarbonisation. To address these challenges, traditional power systems must evolve into smarter, more efficient, and more sustainable energy networks, that is, smart grids, with the help of grid digitalisation. Digitalisation refers to the use of digital and other advanced technologies to monitor and coordinate the needs and capabilities of generators, grid operators, end users and other stakeholders in the electricity market [1, 2]. With the increased usage of digital devices and the improvement of computational capacity on smart sensors, data mining has become an essential tool to realise the true benefits of grid digitalisation in extracting useful information from the vast amounts of data collected by data acquisition terminals in smart energy systems. This will ultimately lead to higher customer satisfaction and energy efficiency [3].

Data mining in this context refers to any mathematical techniques that help discover energy usage patterns, among other insights, to facilitate the implementation of smart grid applications, e.g., studying occupancy and occupant behaviour. In this way, energy providers can offer more tailored services to consumers, detect faults remotely, operate demand response programs, optimise energy use, and reduce green energy spillage by load shifting. Energy usage patterns contain trends and features that are indicative of long time series and can be used to improve decision making through knowledge discovery in smart grids [4, 5]. Discovering patterns requires approaches to measure the similarity between two or more time series, allowing the identification of patterns, trends, anomalies, and outliers in time series data. This problem is called range queries in the literature when identified patterns are extracted based on a given distance threshold [6]. Alternatively, it is referred to as nearest-neighbour queries when the extracted patterns are the ones with the smallest distance [6]. Consequently, distance measurements are critical in identifying featured patterns for smart grid applications.

In general, time series distance measurements fall into two categories: shape-based similarity and structure-based similarity [7]. The former compares the similarity of two time series pointwise, whereas the latter converts the data into higher-level structures for comparison [8, 5]. Shape-based similarity approaches are generally preferred in electricity data analysis because user behaviours affect the use of appliances, making it more accurate and interpretable [9, 7]. Within the shape-based similarity measures, two approaches are extensively used in the literature, namely Euclidean distance (ED) and dynamic time warping (DTW) [8, 7, 5]. In addition to ED and DTW, different studies apply other basic spatial measurements such as Manhattan or Chebyshev distances, but the performances are reported to be no different from ED [10].

Depending on consumers’ preferences and external factors, such as weather, day-to-day residential electricity consumption can deviate from expected values in time (by shifting loads) and amplitude (by changing consumption level). Therefore, comparing two consumption profiles makes more sense when both aspects are considered. However, ED only accounts for changes in electricity loads at the same time without temporal considerations. To address this issue, DTW, which considers the temporal dynamics of a time series by disregarding the indices (timestamps), was introduced for load signature detection [11, 5]. However, DTW cannot take into account the time shifting or the exchange of the temporal sequence of the electrical load, both of which significantly affect energy consumption. This issue is explained in depth in Section II with a discussion of measuring electric vehicle (EV) charging series. To address this challenge, our paper presents a new distance measure that considers both the temporal and the amplitude changes in electricity time series data. It takes into account load increments, decrements, shifts, and temporal sequence exchange, and can therefore quantify both temporal and amplitude differences in time series. We specifically developed this measure for the analysis of electricity data, but it can also be used in range queries and nearest-neighbour queries for other data mining and time series analysis tasks. The performance of the proposed measure is investigated using real-world data in three different applications, namely anomaly detection, load shifting, and behind-the-meter (BTM) equipment identification.

This paper is organised as follows: Section I provides background on applying data mining in power systems and identifies issues in distance measurement. Section II discusses these problems in more detail by outlining the issues with existing similarity measures. Section III presents the proposed methodology based on the requirements of an ideal similarity metric and compares the proposed method with ED and DTW. Simulation studies are given in Section IV, where the results are analysed. In Section V, limitations and future work are discussed. The article is concluded in Section VI.

II Problem Definition

Consider two time series of length $m$, $\mathbf{X}=\{X_{1},\dots,X_{m}\}$ and $\mathbf{Y}=\{Y_{1},\dots,Y_{m}\}$. The similarity of the two time series defined by ED is calculated by accumulating the squared point value differences at the same timestamps:

\displaystyle\text{ED}(\mathbf{X},\mathbf{Y})=\sqrt{\sum^{m}_{i=1}(X_{i}-Y_{i})^{2}}

(1)

On the other hand, DTW finds the minimum warping path and its associated distance of the two series [12]:

\text{DTW}(\mathbf{X},\mathbf{Y})=\sqrt{\Theta_{X_{m},Y_{m}}},

(2)

where $X_{m}$ and $Y_{m}$ are the $m$-th points in $\mathbf{X}$ and $\mathbf{Y}$, and $\Theta_{X_{m},Y_{m}}$ is the cumulative distance of $\mathbf{X}$ and $\mathbf{Y}$ from $1$ to $m$. The cumulative distance up to the $i$-th point in $\mathbf{X}$ and the $j$-th point in $\mathbf{Y}$ can be calculated recursively as:

\displaystyle\!\!\Theta_{X_{i},Y_{j}}\!\!=\!(X_{i}\!-\!Y_{j})^{2}\!\!+\!\min\{\Theta_{X_{i\!-\!1},Y_{j\!-\!1}},\!\Theta_{X_{i\!-\!1},Y_{j}},\!\Theta_{X_{i},Y_{j\!-\!1}}\!\}.

(3)
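As a minimal sketch of Eqs. (1) to (3), the two baseline measures can be written in plain Python. The function names, the boundary condition $\Theta_{0,0}=0$ (standard for DTW but not stated above), and the two hourly load series are assumptions made for illustration only.

```python
import math


def euclidean_distance(x, y):
    """Eq. (1): accumulate squared differences at identical timestamps."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def dtw_distance(x, y):
    """Eqs. (2)-(3): square root of the cumulative cost of the best warping path."""
    m, n = len(x), len(y)
    inf = float("inf")
    # theta[i][j] is the cumulative distance up to X_i and Y_j (1-indexed).
    # theta[0][0] = 0 is the usual boundary condition (an assumption here).
    theta = [[inf] * (n + 1) for _ in range(m + 1)]
    theta[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            theta[i][j] = d + min(theta[i - 1][j - 1],
                                  theta[i - 1][j],
                                  theta[i][j - 1])
    return math.sqrt(theta[m][n])


# Hypothetical hourly loads in kW: the same 12 kW block, one hour later.
a = [0, 0, 12, 0, 0, 0]
b = [0, 0, 0, 12, 0, 0]
ed_ab = euclidean_distance(a, b)
dtw_ab = dtw_distance(a, b)
print(f"ED = {ed_ab:.2f}, DTW = {dtw_ab:.2f}")
```

On this toy pair, ED penalises the shift as two independent amplitude errors, whereas DTW warps the shift away entirely and reports a zero distance.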

More specifically, ED only measures the amount of energy change at each interval, whereas DTW considers two time series similar if they have similar patterns, even if these occur at different times. This difference is illustrated in an example shown in Fig. 1, where we compare two data sets of EV charging between 10 am and 7 pm. ED compares electricity consumption at the same interval by comparing charging levels at the same points in time, disregarding any temporal delay or shift. However, DTW considers the temporal delay by comparing the point value of series $\mathbf{A}$ with the previous, current, and next counterparts in series $\mathbf{B}$ . For example, it compares the charging consumption at 12 pm in series $\mathbf{A}$ with the value at 11 am in series $\mathbf{B}$ , resulting in a zero difference. However, from a power system engineering point of view, they should not be considered identical, as it requires a certain amount of effort, incentive, or compromise in comfort to shift the load to an hour later. The ability to quantify temporal changes is referred to as temporal sensitivity hereafter. Furthermore, DTW finds the EV charging at 1 pm in time series $\mathbf{A}$ similar to the values from noon to 2 pm in time series $\mathbf{B}$ with zero difference, while these two patterns are not similar. In time series $\mathbf{A}$ , 12 kW is consumed in one hour, whereas in time series $\mathbf{B}$ , we see a shift in EV charging by one hour and an extension of charging for the next two hours. As a result, these two patterns show different levels of energy consumption, starting points, and duration, making them dissimilar. This characteristic that DTW cannot handle is later referred to as temporal uniqueness.

Figure 1: Distance measurement comparison with ED (top) and DTW (bottom) for a hypothetical electric vehicle (EV) charging time series $\mathbf{A}$ (blue) and time series $\mathbf{B}$ (orange). The dashed line represents the selected route $r$ to accumulate the point difference between two time series by ED (purple) and DTW (green).

Thus, given two time series $\mathbf{X}$ and $\mathbf{Y}$ , a distance matrix with elements $d_{i,j}=(X_{i}-Y_{j})^{2}$ contains the point value distance between $X_{i}$ and $Y_{j}$ . A route selection matrix with elements $r_{i,j}\in\{0,1\}$ , specifies the path for computing the similarity of $\mathbf{X}$ and $\mathbf{Y}$ by accumulating selected point value distances in $d_{i,j}$ . Hence, an ideal similarity measure, $M(\mathbf{X},\mathbf{Y})=\sum_{i=1}^{m}\sum_{j=1}^{m}r_{i,j}\cdot d_{i,j}$ should meet the following seven requirements:

1. Non-negativity: $M(\mathbf{X},\mathbf{Y})\geq 0$
2. Identity: $M(\mathbf{X},\mathbf{Y})=0$ if and only if $\mathbf{X}=\mathbf{Y}$
3. Symmetry: $M(\mathbf{X},\mathbf{Y})=M(\mathbf{Y},\mathbf{X})$
4. Triangle inequality: $M(\mathbf{X},\mathbf{Y})\leq M(\mathbf{X},\mathbf{Z})+M(\mathbf{Z},\mathbf{Y})$ for any time series $\mathbf{X},\mathbf{Y},\mathbf{Z}$
5. Temporal uniqueness: $\sum_{i=1}^{m}r_{i,j}=1\;\forall\;j\in\{1,\dots,m\}$
6. Temporal sensitivity: $d_{i,j}\neq 0$ if $i\neq j$
7. Optimal match: $M(\mathbf{X},\mathbf{Y})=\displaystyle\min_{r\in\mathcal{R}}\left\{\sum_{i=1}^{m}\sum_{j=1}^{m}r_{i,j}\,d_{i,j}\right\}$

Upon comparing ED and DTW with the ideal measurement requirements outlined above, it is apparent that ED fails to satisfy requirements 5 to 7. DTW fails to meet requirements 2, 5 and 6, while requirement 3 is partially fulfilled depending on whether a symmetric step pattern is applied [12]. A detailed comparison between the two methods is illustrated in Fig. 2 for two sample time series. Time series $\mathbf{A}$ and $\mathbf{B}$ in the time domain are displayed on top, while the distance matrix built from the pointwise distance $d_{i,j}=(A_{i}-B_{j})^{2}$ is shown on the bottom. The selected route where $r_{i,j}=1$ is marked in purple for ED and green for DTW. Following Eqs. (1) and (2), the ED and DTW distances between $\mathbf{A}$ and $\mathbf{B}$ are the square roots of the pointwise distances accumulated along the respective routes. As shown by the purple line, ED only accounts for pairwise value changes of the two series at the same interval, i.e., along the diagonal of the distance matrix. In contrast, DTW assesses the patterns’ dissimilarity by calculating the warping path that minimises the total distance between corresponding points (the green line in the distance matrix in Fig. 2). It allows “stretched” and “shifted” patterns to be identified as identical by comparing multiple points in one series, taken from different time steps, with a single point in the other series. In doing so, DTW inherently ignores the fact that we cannot compare one interval from series $\mathbf{A}$ with three intervals from series $\mathbf{B}$. Consequently, DTW fails to meet requirement 5.
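The route-matrix view can be made concrete with a short sketch. It builds the 0/1 route matrix $r$ for ED (the diagonal) and for an optimal DTW warping path recovered by backtracking, then checks temporal uniqueness. The helper names, the tie-breaking rule in the backtracking, and the toy series are hypothetical. The check tests row sums as well as the column sums of requirement 5, which matches the one-to-one constraints introduced later in Section III.

```python
def ed_route(m):
    """ED compares identical timestamps only: the route is the diagonal."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def dtw_route(x, y):
    """Return the 0/1 route matrix of one optimal DTW warping path."""
    m, n = len(x), len(y)
    inf = float("inf")
    theta = [[inf] * (n + 1) for _ in range(m + 1)]
    theta[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            theta[i][j] = d + min(theta[i - 1][j - 1],
                                  theta[i - 1][j],
                                  theta[i][j - 1])
    # Backtrack from (m, n) to (1, 1) via the cheapest predecessor.
    r = [[0] * n for _ in range(m)]
    i, j = m, n
    while True:
        r[i - 1][j - 1] = 1
        if i == 1 and j == 1:
            break
        steps = [(theta[i - 1][j - 1], i - 1, j - 1),
                 (theta[i - 1][j], i - 1, j),
                 (theta[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    return r


def is_temporally_unique(r):
    """True if every column and every row of r contains exactly one 1."""
    cols_ok = all(sum(row[j] for row in r) == 1 for j in range(len(r[0])))
    rows_ok = all(sum(row) == 1 for row in r)
    return cols_ok and rows_ok


# Hypothetical loads in kW: B is shifted by one hour and stretched.
a = [0, 12, 0, 0, 0]
b = [0, 0, 12, 12, 0]
print(is_temporally_unique(ed_route(len(a))))   # diagonal route
print(is_temporally_unique(dtw_route(a, b)))    # warping route
```

On this pair, any zero-cost warping path has to match the single 12 kW interval of the first series with both 12 kW intervals of the second, so the DTW route cannot be a one-to-one assignment.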

III Proposed Methodology

In reality, electricity usage patterns are influenced by various factors, resulting in load patterns that can shift, stretch, and swap in different temporal directions. Some examples are charging an EV during the day instead of at night, cooking an hour longer than usual, or watching TV before dinner instead of after dinner. These changes in the load patterns should be taken into account in the similarity measure because they impact the consumers’ welfare; this is requirement 6, which is fulfilled by neither ED nor DTW. Furthermore, reshaping one electricity usage pattern into another admits multiple possible solutions at different costs, and the solution with minimum cost should always be selected for optimal matching, i.e., requirement 7. This requirement cannot be met by ED and is only partially met by DTW, which attains a minimum cost but via a warping path, as discussed above. Therefore, we propose a novel distance measure, called Flexibility Distance (FD) hereafter, which consists of two phases to quantify the similarity of electricity usage profiles. The first phase builds a cost matrix that considers both the amplitude and the temporal differences. Its element $C_{i,j}$ is defined as an improved point value distance that represents the effort required to reshape the load $X_{i}$ to $Y_{j}$, considering the temporal and amplitude changes:

\displaystyle C_{i,j}=|X_{i}-Y_{j}|\cdot P_{i,j}+|i-j|\cdot T_{i,j}

(4)

where $P_{i,j}$ and $T_{i,j}$ are the amplitude and temporal weights, respectively. For illustration purposes, we use a constant amplitude weight of $1$ and a max-min scaler as the temporal weight, so that the amplitude and temporal effects are of similar magnitude. Hence, $C_{i,j}$ can be calculated using Eq. (5). In real-world applications, different weights can be set based on a better understanding of users.

\displaystyle C_{i,j}\!=\!|X_{i}-Y_{j}|\!+\!|i-j|\!\cdot\!\frac{(\max(\mathbf{X},\mathbf{Y})-\min(\mathbf{X},\mathbf{Y}))}{m}

(5)
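A minimal sketch of the first phase, assuming the illustrative weights of Eq. (5): amplitude weight $1$ and temporal weight $(\max-\min)/m$, with the maximum and minimum taken over the values of both series together (our reading of the notation). The function name and the example loads are hypothetical.

```python
def cost_matrix(x, y):
    """Eq. (5): C[i][j] = |X_i - Y_j| + |i - j| * (max - min) / m."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    m = len(x)
    # Spread of all values in both series (assumed meaning of max/min in Eq. (5)).
    spread = max(max(x), max(y)) - min(min(x), min(y))
    temporal_weight = spread / m
    return [[abs(x[i] - y[j]) + abs(i - j) * temporal_weight
             for j in range(m)]
            for i in range(m)]


# Hypothetical loads in kW: a 12 kW block moved one interval later.
x = [0, 0, 12, 0]
y = [0, 0, 0, 12]
c = cost_matrix(x, y)
for row in c:
    print(row)
```

Here the temporal weight is $12/4=3$, so matching $X_{3}$ with $Y_{4}$ costs $3$ (a pure one-step shift), while matching $X_{3}$ with $Y_{3}$ costs $12$ (a pure amplitude change). The squared pointwise distance $d_{3,4}$ would be zero for the same pair.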

To represent the possible route that fulfills temporal uniqueness of requirement 5, we define $r_{i,j}$ as binary matrix elements, where each row and column has only one non-zero element, i.e., a route selection matrix, subject to:


	$\displaystyle r_{i,j}\in\{0,1\}\;\forall\;i,j\in\{1,2,\dots,m\}$		(6a)
	$\displaystyle\sum_{j=1}^{m}r_{i,j}=1\;\forall\;i\in\{1,2,\dots,m\}$		(6b)
	$\displaystyle\sum_{i=1}^{m}r_{i,j}=1\;\forall\;j\in\{1,2,\dots,m\}$		(6c)

In the second phase, we select the optimal route to fulfill Requirement 7. The FD metric can be calculated as follows:

\text{FD}(\mathbf{X},\mathbf{Y})=\min\limits_{r_{i,j}}\left\{\sum_{i=1}^{m}\sum_{j=1}^{m}r_{i,j}\cdot C_{i,j}\right\}

(7)
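To make Eq. (7) and constraints (6a) to (6c) concrete, the sketch below enumerates every one-to-one route (every permutation of the $m$ intervals) and keeps the cheapest. This brute-force search costs $O(m!)$ and is only meant to illustrate the definition on tiny inputs; it is not the solver used in this paper. The function names and the example series are hypothetical, and the weights follow Eq. (5).

```python
from itertools import permutations


def cost_matrix(x, y):
    """Eq. (5) with amplitude weight 1 and temporal weight (max - min) / m."""
    m = len(x)
    spread = max(max(x), max(y)) - min(min(x), min(y))
    return [[abs(x[i] - y[j]) + abs(i - j) * spread / m for j in range(m)]
            for i in range(m)]


def flexibility_distance_bruteforce(x, y):
    """Eq. (7): minimum total cost over all routes satisfying (6a)-(6c)."""
    c = cost_matrix(x, y)
    m = len(c)
    best_cost, best_route = float("inf"), None
    # A permutation `route` encodes r: r[i][route[i]] = 1, all other entries 0.
    for route in permutations(range(m)):
        total = sum(c[i][route[i]] for i in range(m))
        if total < best_cost:
            best_cost, best_route = total, route
    return best_cost, best_route


# Hypothetical loads in kW: a 12 kW block moved one interval later.
x = [0, 0, 12, 0]
y = [0, 0, 0, 12]
fd, route = flexibility_distance_bruteforce(x, y)
print(fd, route)
```

For this pair the cheapest route swaps the third and fourth intervals, so the distance reflects one shifted block rather than two unrelated amplitude changes.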

The main idea of the proposed distance measure is to determine how one time series can be reshaped into the other at a minimum cost. In contrast to the pointwise distance matrix built with ED and DTW shown in Fig. 2, our proposed method considers the cost of both the amplitude and temporal changes in the first phase, i.e., Eqs. (4) and (5). For example, if we move the load consumption from time $i$ in the first time series to a different time $j$ in the other time series, ED and DTW are unable to tell the difference, i.e., $d_{3,4}=0$ when $X_{3}=Y_{4}$ even though $3\neq 4$, as shown by the number in boldface in Fig. 2. However, the FD metric can quantify this difference, as shown in Fig. 3. The second phase aims to minimise the cost of reshaping the time series $\mathbf{X}$ into $\mathbf{Y}$ by Eq. (7), which is a linear sum assignment problem (LSAP). The LSAP can be solved by the Kuhn-Munkres (Hungarian) algorithm in $O(m^{3})$ time and $O(m^{2})$ space [13]. This process involves considering all possibilities and combinations of increment, decrement, shift and temporal sequence exchange, as depicted by the red arrows derived from the improved distance matrix in Fig. 3.
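For readers who want to see the assignment step spelled out, the following is a compact pure-Python sketch of a textbook $O(m^{3})$ Hungarian (Kuhn-Munkres) solver based on row and column potentials. It is a generic formulation, not the implementation used for the experiments in this paper; in practice an off-the-shelf solver such as `scipy.optimize.linear_sum_assignment` could be used instead. The example cost matrix is the one Eq. (5) yields for the hypothetical series $\mathbf{X}=\{0,0,12,0\}$ and $\mathbf{Y}=\{0,0,0,12\}$.

```python
def hungarian(cost):
    """Solve the LSAP for a square cost matrix; return (total, assignment)."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (n + 1)      # column potentials (1-indexed)
    p = [0] * (n + 1)        # p[j]: row currently assigned to column j
    way = [0] * (n + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the assignments along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


# Cost matrix from Eq. (5) for X = {0, 0, 12, 0}, Y = {0, 0, 0, 12}.
c = [[0, 3, 6, 21],
     [3, 0, 3, 18],
     [18, 15, 12, 3],
     [9, 6, 3, 12]]
fd, route = hungarian(c)
print(fd, route)
```

The returned `route` lists, for each interval of the first series, the interval of the second series it is reshaped into, which is the by-product discussed in the next paragraph.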

Table I summarises the capability of ED, DTW and FD to meet the requirements listed in Section II. It is important to note that, under certain circumstances, e.g., if one of the two time series is flat, the cost of reshaping using FD will be equivalent to the sum of the diagonal elements of the cost matrix. This is similar to ED but with a different magnitude, as in Eq. (4). In addition, setting the temporal weights to zero makes FD similar to the Wasserstein distance. Furthermore, an important by-product of the proposed distance metric is the shortest path in Fig. 3, which provides an optimal solution to reshape time series $\mathbf{A}$ into time series $\mathbf{B}$. Hence, FD as a metric provides both a quantitative distance that represents the minimum cost of reshaping electricity time series and a strategy for optimal reshaping. To this end, FD could be used in many smart grid applications, e.g., to design better home energy management systems and demand response programs.
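The flat-series case can be checked in one line. The derivation below is a sketch under stated assumptions: $Y_{j}=c$ for all $j$, the weights of Eq. (5) with $\tau=(\max-\min)/m\geq 0$, and $\sigma$ as our shorthand for the permutation encoded by a route matrix, i.e., $r_{i,\sigma(i)}=1$.

```latex
% Assumptions: Y_j = c for all j; \tau \ge 0 is the temporal weight of Eq. (5);
% \sigma denotes the permutation encoded by r (notation introduced here).
\sum_{i=1}^{m} C_{i,\sigma(i)}
  = \sum_{i=1}^{m} |X_{i}-c| + \tau \sum_{i=1}^{m} |i-\sigma(i)|
  \;\geq\; \sum_{i=1}^{m} |X_{i}-c|
  = \sum_{i=1}^{m} C_{i,i}
```

Equality holds for the identity route $\sigma(i)=i$, so in this case $\text{FD}(\mathbf{X},\mathbf{Y})=\sum_{i=1}^{m}|X_{i}-c|$. This accumulates absolute differences along the diagonal, whereas Eq. (1) takes the square root of the accumulated squared differences, which is why the two values differ in magnitude.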

TABLE I: Feature comparison of different distance measures

Requirements	ED	DTW	FD
Non-negativity	✓	✓	✓
Identity	✓	✕	✓
Symmetry	✓	✗	✓
Triangle inequality	✓	✕	✓
Temporal uniqueness	✓	✕	✓
Temporal sensitivity	✕	✕	✓
Optimal match (minimum cost)	✕	✗	✓

✓: accomplished requirements, ✕: unfulfilled requirements, ✗: requirements partially fulfilled.

IV Simulation Studies

To further evaluate the effectiveness of the proposed method, we conducted three experiments, including load scheduling, anomaly detection, and multi-label classification to assess and compare the performance of the ED, DTW and FD metrics.

IV-A Load scheduling

The first simulation study compares different similarity measures in a load scheduling problem using real-world data from a residential consumer with a rooftop PV system in Sydney, Australia, obtained from the SolarHome dataset [14]. We selected one day (28 May 2013) [14], whose solar generation and load profile, with its maximum load during the evening and night, are shown in Fig. 4 (A). The ideal load profile seeks to reduce the consumer’s electricity bill by shifting the load from the on-peak evening hour at 7:30 pm to the off-peak hour at noon when solar generation is plentiful, as shown in Fig. 4 (A). As a result, the blue curve needs to be reshaped based on the consumer’s flexibility into the orange dashed line. To better show the effectiveness of FD in measuring the true flexibility provided by the user when the realised profile does not completely follow the ideal profile, we generated five scenarios, labelled #1-5, by arbitrarily shifting loads from different intervals, as in Fig. 4 (B). The amount of shifted load is determined by the solar generation peak hour, as in Fig. 4 (A). In particular, scenario #1 is a special case of moving the morning peak load to noon, which does not help achieve the ideal profile. This is because it leads to another gap in the morning that is temporally further from the peak loads and fails to reduce the peak load as intended, as shown in Fig. 4 (B). To measure how close each potential rescheduling scenario is to the ideal profile, we calculated the distance between each scenario and the ideal profile using the three different distance measures, as summarised in Table II. Note that the FD values are scaled to be in a similar range for ease of comparison. The first row shows the distance between the original load profile $\mathbf{O}$ and the ideal load profile $\mathbf{E}$, denoted as $M(\mathbf{O},\mathbf{E})$. This quantifies the extent of effort required to reschedule the original load profile to the ideal profile. A lower positive distance value for a rescheduling scenario therefore indicates that the shifted profile has become more similar to the ideal profile, suggesting a better rescheduling result.

TABLE II: Distance values for all scenarios obtained by ED, DTW, and FD (brighter blue indicates a smaller distance value for each column)

Distance	ED	DTW	FD
$M(\mathbf{O},\mathbf{E})$	1.45	1.17	1.44
$M(\mathbf{S_{1}},\mathbf{E})$	1.45	1.07	1.44
$M(\mathbf{S_{2}},\mathbf{E})$	1.45	0.82	0.46
$M(\mathbf{S_{3}},\mathbf{E})$	1.45	1.16	0.96
$M(\mathbf{S_{4}},\mathbf{E})$	1.04	0.78	0.62
$M(\mathbf{S_{5}},\mathbf{E})$	1.45	0.89	0.93

It can be seen in Table II that the distances measured by ED are identical for all scenarios, except for scenario #4, which has a partial overlap with the ideal profile. This indicates that ED lacks the temporal sensitivity needed to distinguish the load shift in the direction required by the ideal profile. For DTW, the distance varies for all five scenarios, while the DTW value, shown in scenario #1, does not agree with domain knowledge, as discussed above. In scenario #3, both ED and DTW show little effectiveness, as the $M(\mathbf{O},\mathbf{E})$ distance in these approaches is almost identical to $M(\mathbf{S_{3}},\mathbf{E})$ . However, scenario #3 represents shifts of a significant portion of the load from evening peak hours to noon, resembling a profile closer to the ideal profile, and should show positive effects in the measure. Scenario #2 is the second-best option following the ideal profile, as it shifts the second largest peak load to noon and ends with a similar one-peak profile. Furthermore, this scenario is similar to the ideal profile, both from a pattern and load rescheduling perspective. FD correctly identified this, while ED and DTW could not identify this profile as the second best. This shows that, in general, FD provides the most meaningful results.

IV-B Anomaly detection

Anomaly detection in smart metering data is an interesting problem that has been extensively discussed in the literature [15]. Therefore, we selected it as the second simulation study. An anomaly is defined as a time series discord in the field of data mining, which is calculated with distance measures [16, 17, 18]. It is defined as the subsequence with the maximum distance to its nearest neighbour [17]. In the context of residential user data analysis, the subsequence is the daily data identified by nearest-neighbour queries from $N$ days of the demand profile. Consequently, we look for the day $a$ with the largest dissimilarity to its most similar neighbour, that is, another day’s pattern $a_{\textit{NN}}$ with the smallest distance $M(a,a_{\textit{NN}})=\min_{i}M(a,i),\; i\in\{1,\dots,N\},\; i\neq a$. Hence, the discord is the day that attains $\max_{a}M(a,a_{\textit{NN}}),\; a\in\{1,\dots,N\}$.
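The discord definition above can be sketched as a direct $O(N^{2})$ search over daily profiles with a pluggable distance function. This brute-force loop is for illustration only and is not the Matrix Profile computation used in the experiment; the function names and the four toy "days" are hypothetical, and ED is used merely as a stand-in for any measure $M$.

```python
import math


def euclidean_distance(x, y):
    """Stand-in for any distance measure M (ED, DTW, or FD)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def find_discord(days, distance):
    """Return (index, distance) of the day farthest from its nearest neighbour."""
    n = len(days)
    if n < 2:
        raise ValueError("need at least two days")
    # nn[a] = M(a, a_NN): distance from day a to its most similar other day.
    nn = [min(distance(days[a], days[i]) for i in range(n) if i != a)
          for a in range(n)]
    # The discord is the day with the largest nearest-neighbour distance.
    discord = max(range(n), key=lambda a: nn[a])
    return discord, nn[discord]


# Hypothetical daily profiles (four intervals per day).
days = [
    [1, 2, 3, 1],
    [1, 2, 3, 1],
    [1, 2, 2, 1],
    [3, 0, 0, 3],   # unusual day
]
index, score = find_discord(days, euclidean_distance)
print(index, score)
```

Swapping `euclidean_distance` for another measure changes which day is flagged, which is the effect examined in this experiment.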

In this simulation study, we used anonymised data from a residential consumer in Adelaide, Australia. The consumer has a 10.56 kW PV system with a private meter installed to measure solar generation. The sampling interval is 5 minutes. The aim of this experiment is to detect the day with an anomaly in the user’s electricity consumption data in August 2021 by applying the three distance measures, i.e., ED, DTW and the proposed FD method. Using the Matrix Profile, a vector of the distances between all subsequences of a given time series and their nearest neighbours, introduced in [19], the discord is identified and shown in Fig. 5 for the three distance measures.

Based on the figure, each distance metric identified a different day as the anomalous day. Specifically, ED detected August 1st as the day with an anomaly, DTW flagged August 25th, and FD identified August 2nd. Digging deeper into the three anomalous days, shown at the bottom of Fig. 5, it can be observed that the anomaly (discord) detected by the FD measure is indeed unique, as it does not exhibit the signature duck curve pattern due to solar generation. In contrast, the anomalous patterns identified by ED and DTW still show the signature of the duck curve, making it harder to classify them as anomalous days. This is because, as we explained in Section III, the proposed method considers the rearrangement of the two time series rather than just the pointwise comparison (ED) or the stretching of one time series to fit the other (DTW).

IV-C BTM equipment identification

In modern power systems, the identification of BTM equipment is a highly desirable application for utilities to provide their customers with satisfactory services at a minimum cost. This challenge, known as time series classification in the realm of data mining, has garnered substantial research attention. Various methods have been proposed, for example, wavelet-based methods [20], deep neural network (DNN)-based methods [21], and conventional machine learning (ML)-based methods [22], where distance measures are vital for dimension reduction before the data is fed into the classifier. To compare the efficiency of the three distance measures, we apply the K-Nearest Neighbour (KNN) classifier, valued for its straightforward implementation, where the only factors that affect performance are the chosen distance measure and the value of the parameter $k$, which is set to $5$ based on related studies [22, 23]. In the simulation study, we used a state-of-the-art data set comprising 60,000 daily electricity usage patterns from six distinct types of residential users, namely PV users, PV & ESS (energy storage system) users, PV & EV users, PV & EV & ESS users, EV users, and conventional consumers [24]. The accuracy of each distance measure is summarised in the confusion matrices in Fig. 6.
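The role of the distance measure in this experiment can be sketched with a minimal KNN classifier that accepts any distance function and uses $k=5$ as above. The function names, labels, and toy profiles are hypothetical, and ED is used only as a placeholder for the measure under comparison.

```python
import math
from collections import Counter


def euclidean_distance(x, y):
    """Placeholder measure; DTW or FD could be passed in instead."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_x, train_y, query, distance, k=5):
    """Label the query by majority vote among its k nearest training profiles."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: distance(query, train_x[i]))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical daily net-load profiles (four intervals per day).
train_x = [
    [2, 0.0, -1.0, 2], [2, 0.5, -1.0, 2], [2, 0.0, -0.5, 2],   # midday export
    [2, 1.0, 1.0, 2], [2, 1.5, 1.0, 2], [2, 1.0, 1.5, 2],      # no export
]
train_y = ["PV", "PV", "PV",
           "conventional", "conventional", "conventional"]

query = [2, 0.2, -0.8, 2]
label = knn_predict(train_x, train_y, query, euclidean_distance, k=5)
print(label)
```

Because the classifier itself has no trainable parameters, any change in accuracy between runs with different `distance` arguments can be attributed to the distance measure.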

From the figure, FD shows the best overall accuracy of 65%, followed by ED (60%) and DTW (51%). It also outperforms the other two methods for each type of user identification according to the diagonal precision in Fig. 6. This shows that the proposed method is a robust distance measure compared to ED and DTW. It also demonstrates the potential of applying FD as the primary distance metric for energy consumer identification problems as an alternative to other state-of-the-art classification methods.

V Limitations and Future Work

One downside of the proposed distance measure is the higher complexity of solving an LSAP problem compared to ED and DTW. However, we believe that the effectiveness and accuracy of the proposed method outweigh the time requirements, particularly when it comes to electricity grids as critical systems. It should be noted that modern residential users’ data is typically measured at hourly or half-hourly intervals, which means that the amount of data to be processed for each computation is limited to a relatively small window of 24 or 48 data points. For example, it takes 11 ms, 6 ms, and 1 ms for FD, DTW, and ED, respectively, to compute the distance between two time series with half-hour intervals for two days. As a result, the computational requirements are not likely to be a major obstacle, especially for applications that do not require all subsequences’ similarity. Moreover, the increasing computational capacity and enhanced edge computing techniques available to modern smart grid systems can help compensate for the higher complexity of the proposed method when high-resolution data is required for certain applications.

In future work, the temporal weights can be further developed and elaborated by involving users’ preferences, e.g., decomposing the historical load demand and using the residuals of each time stamp to reflect their temporal effort more accurately in a user-centric manner.

The simulation studies conducted in this paper illustrate the effectiveness of the proposed method in three applications. In future work, we plan to apply our method to a large number of users with high-resolution data. Other smart grid applications, e.g., clustering, load disaggregation, and demand side management, could also benefit from FD and will be investigated in the future.

VI Conclusions

This paper proposed a new distance metric that can quantify the effort to reshape one time series into another. It can be used in time series analysis when users must consider reshaping and scheduling. Compared to ED and DTW, the proposed FD metric considers the pointwise changes as well as the temporal shifting and temporal sequence exchange in the data. Therefore, it can play a critical role in many smart grid applications when energy utilities and participants analyse data on residential electricity usage collected by smart devices. This could result in improved energy efficiency, reduced green energy spillage, and greater customer satisfaction.

References

[1] P. Verma, R. Savickas, S. M. Buettner, J. Strüker, O. Kjeldsen, and X. Wang, “Digitalization: enabling the new phase of energy efficiency,” Group of Experts on Energy Efficiency, no. Seventh session, pp. 1–16, 2020.
[2] European Commission. (2022) Digitalisation of the energy system. [Online]. Available: https://energy.ec.europa.eu/topics/energy-systems-integration/digitalisation-energy-system_en
[3] K. Zhou, C. Yang, and J. Shen, “Discovering residential electricity consumption patterns through smart-meter data mining: A case study from China,” Utilities Policy, vol. 44, pp. 73–84, 2017. [Online]. Available: http://dx.doi.org/10.1016/j.jup.2017.01.004
[4] R. Granell, C. J. Axon, and D. C. Wallom, “Clustering disaggregated load profiles using a Dirichlet process mixture model,” Energy Conversion and Management, vol. 92, pp. 507–516, 2015. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0196890414011194
[5] R. Yuan, S. A. Pourmousavi, W. L. Soong, G. Nguyen, and J. A. Liisberg, “Irmac: Interpretable refined motifs in binary classification for smart grid applications,” Engineering Applications of Artificial Intelligence, vol. 117, p. 105588, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0952197622005784
[6] Q. Wang and V. Megalooikonomou, “A dimensionality reduction technique for efficient time series similarity analysis,” Information Systems, vol. 33, no. 1, pp. 115–132, 2008.
[7] Z. Fang, P. Wang, and W. Wang, “Efficient learning interpretable shapelets for accurate time series classification,” Proceedings - IEEE 34th International Conference on Data Engineering, ICDE 2018, pp. 497–508, 2018.
[8] J. Lin, R. Khade, and Y. Li, “Rotation-invariant similarity in time series using bag-of-patterns representation,” Journal of Intelligent Information Systems, vol. 39, no. 2, pp. 287–315, 2012.
[9] N. A. Funde, M. M. Dhabu, A. Paramasivam, and P. S. Deshpande, “Motif-based association rule mining and clustering technique for determining energy usage patterns for smart meter data,” Sustainable Cities and Society, vol. 46, no. November 2018, p. 101415, 2019.
[10] C. Nichiforov and M. Alamaniotis, “Load-based classification of academic buildings using matrix profile and supervised learning,” pp. 01–05, 2021.
[11] G. Elafoudi, L. Stankovic, and V. Stankovic, “Power disaggregation of domestic smart meter readings using dynamic time warping,” in 2014 6th International Symposium on Communications, Control and Signal Processing (ISCCSP), 2014, pp. 36–39.
[12] E. Keogh and C. A. Ratanamahatana, “Exact indexing of dynamic time warping,” Knowledge and Information Systems, vol. 7, no. 3, pp. 358–386, 2005.
[13] S. Bougleux, B. Gaüzère, D. B. Blumenthal, and L. Brun, “Fast linear sum assignment with error-correction and no cost constraints,” Pattern Recognition Letters, vol. 134, pp. 37–45, 2020.
[14] Ausgrid. (2015) Solar home electricity data. [Online]. Available: https://www.ausgrid.com.au/Industry/Our-Research/Data-to-share/Solar-home-electricity-data
[15] T. Andrysiak, Ł. Saganowski, and P. Kiedrowski, “Anomaly Detection in Smart Metering Infrastructure with the Use of Time Series Analysis,” Journal of Sensors, vol. 2017, 2017.
[16] V. Chandola, D. Cheboli, and V. Kumar, “Detecting anomalies in a time series database,” Computer Science & Engineering (CS&E) Technical Reports, 2009.
[17] C. C. M. Yeh, Y. Zhu, L. Ulanova, N. Begum, Y. Ding, H. A. Dau, Z. Zimmerman, D. F. Silva, A. Mueen, and E. Keogh, “Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile,” Data Mining and Knowledge Discovery, vol. 32, no. 1, pp. 83–123, 2018.
[18] D. Furtado Silva and G. E. Batista, “Elastic Time Series Motifs and Discords,” Proceedings - 17th IEEE International Conference on Machine Learning and Applications, ICMLA 2018, pp. 237–242, 2019.
[19] S. Alaee, K. Kamgar, and E. Keogh, “Matrix profile XXII: Exact discovery of time series motifs under DTW,” Proceedings - IEEE International Conference on Data Mining, ICDM, vol. 2020-Novem, pp. 900–905, 2020.
[20] Y. S. Jeong, M. K. Jeong, and O. A. Omitaomu, “Weighted dynamic time warping for time series classification,” Pattern Recognition, vol. 44, no. 9, pp. 2231–2240, 2011. [Online]. Available: http://dx.doi.org/10.1016/j.patcog.2010.09.022
[21] J. Pöppelbaum, G. S. Chadha, and A. Schwung, “Contrastive learning based self-supervised time-series analysis,” Applied Soft Computing, vol. 117, p. 108397, 2022. [Online]. Available: https://doi.org/10.1016/j.asoc.2021.108397
[22] S. Zhang, X. Li, M. Zong, X. Zhu, and D. Cheng, “Learning k for kNN Classification,” ACM Transactions on Intelligent Systems and Technology, vol. 8, no. 3, 2017.
[23] M. M. R. Khan, M. A. B. Siddique, and S. Sakib, “Non-Intrusive Electrical Appliances Monitoring and Classification using K-Nearest Neighbors,” in ICIET 2019 - 2nd International Conference on Innovation in Engineering and Technology, 2019, pp. 23–24.
[24] R. Yuan, S. A. Pourmousavi, W. L. Soong, A. J. Black, J. A. R. Liisberg, and J. Lemos-vinasco, “A synthetic dataset of Danish residential electricity prosumers,” Scientific Data, pp. 1–15, 2023.