
LSTM-Autoencoder based Anomaly Detection for Indoor Air Quality Time Series Data

Yuanyuan Wei1, Julian Jang-Jaccard1, Wen Xu1, Fariza Sabrina2, Seyit Camtepe3, Mikael Boulic4
1Cybersecurity Lab, Computer Science/Information Technology, Massey University, Auckland, 0632, NEW ZEALAND
2School of Engineering and Technology, Central Queensland University, Sydney NSW 2000, AUSTRALIA
3CSIRO Data61, AUSTRALIA
4School of Built Environment, Massey University, Auckland, 0632, NEW ZEALAND
[email protected], [email protected], [email protected], [email protected],
[email protected], [email protected]
Abstract

Anomaly detection for indoor air quality (IAQ) data has become an important area of research as the quality of air is closely related to human health and well-being. However, traditional statistical and shallow machine learning-based approaches to anomaly detection in the IAQ area cannot detect anomalies that involve correlations across several data points (i.e., often referred to as long-term dependencies). To address this issue, we propose a hybrid deep learning model that combines LSTM with Autoencoder for anomaly detection tasks in IAQ. In our approach, the LSTM network is comprised of multiple LSTM cells that work with each other to learn the long-term dependencies of the data in a time-series sequence. The Autoencoder identifies the optimal threshold based on the reconstruction loss rates evaluated on every data point across all time-series sequences. Our experimental results, based on the Dunedin $CO_{2}$ time-series dataset obtained through a real-world deployment in schools in New Zealand, demonstrate a very high and robust accuracy rate (99.50%) that outperforms other similar models.

Index Terms:
Long short-term memory (LSTM), Autoencoder, Indoor air quality, $CO_{2}$, Time series, Anomaly detection

I Introduction

Indoor air quality (IAQ) is closely related to human health, productivity, and work efficiency [1]. Good air quality is even more important for children, who spend a vast majority of their time at school. Providing children with fresh air in their classroom environment is of high importance for their health and well-being. However, $CO_{2}$, which is considered one of the major constituents of indoor air pollutants, can easily build up as children study and play inside classrooms, accompanied by emissions from floors and other surfaces [2]. Such $CO_{2}$ build-up can become the basis for harmful mold and bacteria that could contribute to poor health and degraded academic performance.

Constant monitoring of indoor air quality, including the measurement of the level of $CO_{2}$ in school environments, has been problematic for many countries, including OECD members, as large-scale monitoring of indoor air quality has proven too expensive for budget-strapped schools. To address this concern, a team of researchers at Massey University developed a low-cost monitoring suite called SKOol MOnitoring BOx (SKOMOBO) with the National Institute of Water and Atmospheric Research (NIWA) [3, 4, 5]. A SKOMOBO unit is a small box, approximately 100 × 100 × 100 mm in size, designed to house a number of low-cost sensors that capture indoor air quality-related data such as particulate matter ($PM_{2.5}$ and $PM_{10}$), temperature, relative humidity, carbon dioxide ($CO_{2}$), and human occupancy in classrooms.

Monitoring measurable indoor air quality is challenging due to fluctuations in IAQ data readings and questions of data quality, for example, when sensors are deployed in non-stationary or uncontrolled environments [6]. In addition, rare or persistent contamination events (e.g., natural disasters such as fires, flooding, and thunderstorms) can also corrupt data quality. As a result, numerous proposals for detecting anomalous events in IAQ have been attempted by utilizing the latest advances in artificial intelligence-based techniques. For instance, many attempted to use statistical methods (e.g., using means, standard deviations, Gaussian q-distributions) [7, 8] and other combinations of shallow machine learning techniques (e.g., kNN, k-means, regression) [9, 10, 11] to find the patterns of normal IAQ behavior and use those as a basis to detect anomalous events or to improve forecasting capabilities such as [12, 13]. However, these existing methods could not detect anomalies where the observation of several correlated data points is necessary.

In this research, we propose a hybrid deep learning model that combines the capabilities of long short-term memory (LSTM) and Autoencoder (AE) for detecting anomalous data points in IAQ datasets based on the understanding of long-term dependencies that exist in data samples.

The main contributions of our proposed model are the following.

Summary of Original Contributions

  • In our proposed model, LSTM networks are comprised of multiple LSTM units that work with each other to learn the long-term correlations of data within a time-series sequence. The Autoencoder is used to identify the optimal threshold based on the reconstruction error rates evaluated on every data point across all time-series sequences. This threshold is used to identify anomalies.

  • We apply our proposed model to the Dunedin $CO_{2}$ Dataset obtained from a real-world deployment across multiple primary/secondary schools in New Zealand.

  • We compare the performance of the proposed model with other similar approaches that use different aspects of LSTM and/or AE. Our experimental results, obtained with a comprehensive set of evaluation criteria, demonstrate that our proposed model can effectively detect anomalies, reaching a detection accuracy that exceeds 99%.

The rest of this paper is structured as follows. Section II introduces related work in the field of indoor air quality. Section III provides preliminaries on LSTM and Autoencoder. Section IV introduces the details of our proposed model. Section V describes the dataset and data preprocessing. Section VI presents the experimental setup and the analysis of results evaluated on the Dunedin $CO_{2}$ dataset. Section VII concludes the paper with the planned future work.

II Related Work

We review the existing state-of-the-art techniques for detecting anomalies in indoor air quality datasets and other similar fields.

Ottosen et al. [9] introduced two anomaly detection techniques that use k-Nearest Neighbour (kNN) and AutoRegressive Integrated Moving Average (ARIMA) to detect point and contextual anomalies, respectively, in a low-cost air quality dataset. To detect point anomalies, they utilized the average Euclidean distance to compute the similarity of each point with the remaining points and assign it an anomaly score. ARIMA was used to detect contextual anomalies by calculating an anomaly score for each data point between the model and the measurement based on the absolute value of the residual. Both point and contextual anomalies are then classified into two clusters, normal and anomalous, by K-means clustering.
Wei et al. [7] proposed a hybrid MSD-Kmeans model to detect anomalies in an indoor $PM_{10}$ dataset. They first used the statistical method of Mean and Standard Deviation (MSD) to eliminate noisy data and reduce the impact of noise on clustering. Then they applied the K-means algorithm to achieve a better local optimal clustering. Their proposal achieved a detection accuracy of 97.6% and an F1-score of 91.9%.
Li et al. [10] proposed a clustering-based Fuzzy C-means approach. In their approach, the authors used a reconstruction criterion to reconstruct the optimal cluster centers and partition matrix based on multivariate subsequence data. They also used the reconstruction error as the fitness function of a Particle Swarm Optimization (PSO) algorithm to define a level for detecting anomalies in multivariate data. However, the proposed algorithm cannot reveal the structure of high-dimensional multivariate time series because the PSO algorithm tends to become trapped in local optima. Sharma et al. [12] proposed a low-cost framework named IndoAirSense to estimate and forecast indoor air quality in selected university classrooms. They first used a Multi-Layer Perceptron (MLP) and eXtreme Gradient Boosting Regression (XGBR) to estimate real-time indoor air quality. Then they used LSTM-wF (Long Short-Term Memory without the forget gate) to reduce the complexity of LSTM for forecasting indoor air pollutants. Because it omits the forget gate that maintains long-term memory, this model could not detect anomalies in time-series datasets.

Mumtaz et al. [14] proposed an LSTM-based model for predicting the concentration of different air pollutants to examine the overall quality of an indoor environment. In their research, they collected the base data through IoT sensors that measure different air pollutants (e.g., $NH_{3}$, $CO$, $NO_{2}$, $CH_{4}$, $CO_{2}$, $PM_{2.5}$). Their proposed system had the capability of sending alerts after detecting anomalies in the air quality.

Xu et al. [15] proposed an LSTM model with an added error correction model (ECM) for improving the prediction of indoor temperature in public buildings. In their approach, an ECM is built when the predicted and measured data are co-integrated in the same order, and it is then utilized to revise the predictions on the testing dataset. Jung et al. [16] utilized LSTM for predicting the conditions of indoor space for facility management based on three IoT sensor datasets that measure the temperature, humidity, and brightness of a space. LSTM is used to detect anomalous indoor space conditions where the readings on the combination of the three indoor air condition datasets deviate from a threshold obtained during training.

Hossain et al. [17] proposed a combined prediction scheme using two variations of the RNN model (i.e., GRU and LSTM) to forecast the daily air quality index (AQI) for two of the biggest cities in Bangladesh (Dhaka and Chattogram). In their proposal, they used GRU and LSTM as the first and second hidden layers, respectively, followed by two dense layers as the prediction model. The results showed that their model followed the actual AQI trends for both cities and demonstrated improved overall performance compared to using a single GRU or LSTM model.

Park et al. [18] introduced a hybrid LSTM-VAE (LSTM Variational Autoencoder) model to detect multimodal anomalies in robot-assisted feeding systems designed to help people with disabilities who often need physical assistance. They used the hybrid model to analyze the data collected through 155 robot-assisted feeding systems to find any anomalous behaviors exhibited by the robots. Anomalies are found against a state-based threshold by calculating a reconstruction score over the data describing the state of task execution. The ROC curve obtained by their model was higher than those of other similar approaches. Similarly, Lin et al. [19] proposed a VAE-LSTM combination model that combines a Variational Autoencoder and LSTM for identifying anomalies that span multiple time scales. They used the VAE to summarise local information over the short term while using the LSTM to analyze patterns over the longer term, which allowed their proposed model to detect anomalies that occur over both short and long periods.

Time dependency is significantly related to detecting anomalies in the field of the Industrial Internet of Things (IIoT) because anomalies are reflected in both past and current states. In order to detect time-series anomalies in IIoT, Wu et al. [8] proposed an LSTM-Gauss-NBayes approach. They first utilized a stacked LSTM to process the time-series data and obtain prediction errors, then used these prediction errors to detect anomalies with a Gaussian Naive Bayes model. The evaluation showed that their proposed method achieved higher accuracy (0.969) on the Power dataset, compared to proposals using stacked Bi-LSTM (0.924) [20], LSTM NN (0.905) [21], and MLP (0.873) [22].

In order to deal with high-dimensional anomaly problems in intelligent industrial applications, Zhou et al. [11] proposed VLSTM (variational LSTM) to handle high-dimensional and imbalanced industrial big data (IBD). Their VLSTM includes three parts: an LSTM encoder, a variational reparameterization module, and an LSTM decoder. The LSTM encoder and decoder are used to compress the raw input data from a high-dimensional to a low-dimensional representation without losing critical features. The variational reparameterization module reconstructs hidden variables for the low-dimensional feature representation using variational Bayes for network traffic classification. Similarly, Trinh et al. [23] proposed a hybrid model that combines LSTM with an Autoencoder as well as an Isolation Forest. They used the LSTM-AE to extract significant features and calculate the reconstruction error, and used iForest to detect anomalies based on the error vector.

III Preliminaries

III-A LSTM

LSTM stands for long short-term memory, which is often regarded as an extension of Recurrent Neural Networks (RNN). RNNs provide the capability of "short-term memory", which allows previous information (at a certain point only) to be used for the present task. Extending RNNs, the LSTM architecture provides the capability of "long-term memory", where a list of all of the previous information (as opposed to a single point in time) is available to the current neural node.

A common LSTM unit, depicted in Fig. 1, is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information in and out of the cell.

Figure 1: How LSTM Unit Works

Note that:

  • The Cell State refers to the current long-term memory of the network that stores the list of previous information.

  • The previous Hidden State refers to the output at the previous point in time which can be seen as short-term memory.

  • The input data contains the input value at the current time step.

Step 1: Forget Gate
The main purpose of the forget gate is to decide which bits of the cell state are useful given both the previous hidden state and the new input data. Towards this, the previous hidden state and the new input data are fed into the neural network, which generates a vector whose elements lie in the range [0, 1] using a sigmoid activation function.

The forget gate part of the network is trained so that it outputs a value close to 0 when a component of the input is irrelevant, and closer to 1 when it is relevant. These outputs are then pointwise multiplied with the previous cell state. Mathematically, the result ($f_{t}$) of the forget gate can be presented as:

f_{t} = \sigma(w_{f}[H_{t-1}, X_{t}] + b_{f})    (1)

where $\sigma$ is the sigmoid activation function, and $w_{f}$ and $b_{f}$ are the weight matrix and bias of the forget gate. $[H_{t-1}, X_{t}]$ denotes the concatenation of the previous hidden state $H_{t-1}$ and the current input $X_{t}$.

Step 2: Input Gate
The purpose of the input gate is twofold. The first is to check whether the new information (i.e., the previous hidden state and new input data) is worth keeping in the cell state. If so, it then decides what new information should be added to the cell state. Towards this, the input gate goes through two processes.

One process involves generating a new memory update vector, represented as $\tilde{C}_{t}$, by combining the previous hidden state and the new input data. A tanh activation function is used so that the elements of the memory update vector lie in the range [-1, 1], where negative values reduce the impact of a component in the cell state. This vector represents how much to update each component of the cell state given the new data. This process is depicted in Equation 2 as follows.

\tilde{C}_{t} = \tanh(w_{c}[H_{t-1}, X_{t}] + b_{c})    (2)

where $w_{c}$ and $b_{c}$ are the weight matrix and the bias of the input gate respectively, while tanh is used as the activation function.

Another process of the input gate involves identifying which components of the new input, given the context of the previous hidden state, are worth remembering. Similar to the forget gate, the input gate is trained to output a vector of values in [0, 1] using the sigmoid activation function. Any component close to 0 will not be updated in the cell state. This process is depicted in Equation 3 as follows.

i_{t} = \sigma(w_{i}[H_{t-1}, X_{t}] + b_{i})    (3)

where $w_{i}$ and $b_{i}$ are the weight matrix and the bias of the input gate.

These two processes are pointwise multiplied. This causes the magnitude of the new information from Equation 2 to be regulated by the gate values from Equation 3 and set to 0 if need be. The resulting combined vector is then added to the cell state, updating the long-term memory of the network, as shown in Equation 4.

C_{t} = f_{t}\odot C_{t-1} + i_{t}\odot\tilde{C}_{t}    (4)

Step 3: Output Gate
With the updates to the long-term memory done, it is time to work with the output gate. The main purpose of the output gate is to decide the new hidden state. Towards this, the output gate uses three pieces of information: the newly updated cell state, the previous hidden state, and the new input data.

It first passes the previous hidden state and the current input data through a sigmoid-activated network to obtain the filter vector $o_{t}$, as shown in Equation 5.

o_{t} = \sigma(w_{o}[H_{t-1}, X_{t}] + b_{o})    (5)

where $w_{o}$ and $b_{o}$ are the weight matrix and the bias of the output gate.

The cell state is passed through a tanh activation function to force the values into the interval [-1, 1], creating a squashed cell state that is applied to the filter vector by pointwise multiplication. A new hidden state $H_{t}$ is created and outputted, along with the new cell state $C_{t}$, as shown in Equation 6.

H_{t} = o_{t}\odot\tanh(C_{t})    (6)

The new cell state $C_{t}$ becomes the previous cell state $C_{t-1}$ for the next LSTM unit, while the new hidden state $H_{t}$ becomes the previous hidden state $H_{t-1}$ for the next LSTM unit. These steps are repeated until the input data from all time-series sequences have been processed by all LSTM cells involved.
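To make the gate equations above concrete, the following minimal NumPy sketch runs Equations (1)-(6) for a single LSTM step; the dimensions, random weights, and toy $CO_{2}$ window are illustrative assumptions, not values from our trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, p):
    """One LSTM step following Equations (1)-(6)."""
    z = np.concatenate([h_prev, x_t])                  # [H_{t-1}, X_t]
    f_t = sigmoid(p["w_f"] @ z + p["b_f"])             # forget gate, Eq. (1)
    c_tilde = np.tanh(p["w_c"] @ z + p["b_c"])         # candidate memory, Eq. (2)
    i_t = sigmoid(p["w_i"] @ z + p["b_i"])             # input gate, Eq. (3)
    c_t = f_t * c_prev + i_t * c_tilde                 # cell state update, Eq. (4)
    o_t = sigmoid(p["w_o"] @ z + p["b_o"])             # output gate, Eq. (5)
    h_t = o_t * np.tanh(c_t)                           # new hidden state, Eq. (6)
    return h_t, c_t

# Illustrative dimensions: 1 input feature, 16 hidden units (random weights for demonstration only).
rng = np.random.default_rng(0)
m, d = 1, 16
p = {k: rng.normal(scale=0.1, size=(d, d + m)) for k in ("w_f", "w_c", "w_i", "w_o")}
p.update({k: np.zeros(d) for k in ("b_f", "b_c", "b_i", "b_o")})

h, c = np.zeros(d), np.zeros(d)
for x_t in np.array([[420.0], [455.0], [470.0]]) / 1000.0:   # a toy normalized CO2 window
    h, c = lstm_cell_step(x_t, h, c, p)                      # h and c carry over to the next cell
```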

III-B Autoencoder (AE)

An autoencoder is a type of unsupervised neural network that is used to learn efficient codings of unlabelled data. It learns a representation of a set of input data by training the neural network to ignore insignificant data (i.e., often termed "noise"). A typical autoencoder is composed of an input layer, an output layer, and several hidden layers. The operation of an autoencoder can be divided into Encoding, Decoding, and Reconstruction Loss, as illustrated in Fig. 2.

Figure 2: How Autoencoder Works

Step 1: Encoding
In the encoding operation, the input data $x$ is an $m$-dimensional vector ($x\in\mathbb{R}^{m}$) that is mapped to a low-dimensional bottleneck-layer representation $h$ after removing any insignificant features, as shown in Equation (7).

h = f_{1}(w_{i}x + b_{i})    (7)

where $w_{i}$ is the weight matrix, $b_{i}$ is a bias, and $f_{1}$ is an activation function.

Step 2: Decoding
In the decoding operation, the bottleneck-layer representation $h$ is used to generate the output $\hat{x}$ that maps back to a reconstruction of $x$, as shown in Equation (8):

\hat{x} = f_{2}(w_{j}h + b_{j})    (8)

where $f_{2}$ is the activation function of the decoder, $w_{j}$ is the weight matrix, $b_{j}$ represents a bias, and $\hat{x}$ represents the reconstructed input sample. Note that $w_{j}$ and $b_{j}$ may be unrelated to the corresponding $w_{i}$ and $b_{i}$ of the encoder.

Step 3: Reconstruction Loss
In a standard autoencoder model, a reconstruction loss $L$ is calculated to minimize the difference between the output and the input, as shown in Equation 9. It is often this reconstruction loss that is used for anomaly detection tasks [24, 25].

L(x, \hat{x}) = \frac{1}{n}\sum_{t=1}^{n}|\hat{x}_{t} - x_{t}|    (9)

where $x$ represents the input data, $\hat{x}$ indicates the output data, and $n$ is the number of samples in the training dataset.

Figure 3: Overview of our proposed model

However, in our model this is extended to compute the reconstruction loss of an individual sample as follows:

x_{i} = \frac{1}{n}\sum|\hat{x}_{i} - x_{i}|    (10)

\text{s.t.}\quad n = \begin{cases} i & \text{if } i \leq \frac{N+1}{2} \\ N - i + 1 & \text{if } i > \frac{N+1}{2} \end{cases}

where $N$ is the total number of samples, $n$ is the number of time-series sequences that contain the $i$-th sample (so the sum runs over the $n$ reconstructions $\hat{x}_{i}$ of that sample), and $X_{i}=\{x_{1},\dots,x_{i}\}$.

Then, we compute the reconstruction loss over all samples in the time series as follows.

loss = \frac{1}{N}\sum_{i=1}^{N}x_{i}    (11)

where $N$ is the total number of samples and $x_{i}$ indicates the reconstruction loss computed for sample $i$.
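The following NumPy sketch shows one way to compute Equations (10) and (11): each sample's absolute error is accumulated over every overlapping window that contains it and divided by the number of such windows. The function and variable names are ours, and the window layout assumes the overlapping sequences described in Section IV.

```python
import numpy as np

def per_sample_loss(X, X_hat, n_samples, timesteps):
    """Eq. (10): average |x_hat_i - x_i| over every window that contains sample i."""
    err_sum = np.zeros(n_samples)
    err_cnt = np.zeros(n_samples)
    for k in range(X.shape[0]):                # window k covers samples k .. k+timesteps-1
        abs_err = np.abs(X_hat[k] - X[k]).ravel()
        err_sum[k:k + timesteps] += abs_err
        err_cnt[k:k + timesteps] += 1
    # Samples never covered by a window keep a loss of 0.
    return np.divide(err_sum, err_cnt, out=np.zeros(n_samples), where=err_cnt > 0)

def overall_loss(sample_losses):
    """Eq. (11): mean reconstruction loss over all samples."""
    return sample_losses.mean()
```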

IV Methodology

In this section, we introduce our proposed model that uses the combination of LSTM and Autoencoder to detect anomalies based on the analysis of time-series data. We first provide the overview of our proposed model based on the four steps of creating the input sequence, LSTM Encoder, LSTM Decoder, and anomaly detection. We also provide a detailed description of the algorithm our model uses in terms of training and testing phases.

IV-A LSTM-Autoencoder

The overview of our proposed model is illustrated in Fig. 3. Our LSTM-Autoencoder utilizes the capabilities of both the LSTM neural network and the Autoencoder by building LSTM networks into the encoder and decoder schemes of the Autoencoder. The encoder transforms the sequence of high-dimensional input data into a fixed-size vector. Using the memory cells of the LSTM, the data processed by the encoder retains the dependencies across multiple data points within a time-series sequence while progressively reducing the high-dimensional input representation into a low-dimensional representation until it reaches the latent space. The LSTM decoder reproduces the fixed-size input sequence from the reduced representation of the input data in the latent space, and the resulting reconstruction error rates are used to set a threshold. This threshold is used to detect anomalies.

Step 1: Input Sequence Data
The original dataset is arranged as a series of time sequences $[X_{1}, X_{2}, X_{3}, \dots, X_{n}]$. Each sequence $X$ is created from a fixed T-length time window of data $[x_{1}, x_{2}, x_{3}, \dots, x_{t}]$, where $x_{t}\in\mathbb{R}^{m}$ represents an $m$-feature input at time instance $t$. This is then reshaped into a 2-dimensional (2d) array representing samples and timesteps. For example, a sequence of our $CO_{2}$ data is converted into a 2d array where one dimension indicates the list of samples and the other the 10 timesteps.
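As an illustration, a sliding-window conversion along these lines can be sketched as follows; the helper name and toy values are ours, and we use the (samples, timesteps, features) layout that Keras LSTM layers expect.

```python
import numpy as np

def to_sequences(values, timesteps=10):
    """Slide a window of `timesteps` samples over the series one step at a time,
    returning an array of shape (n - timesteps, timesteps, 1) as in Algorithm 1."""
    windows = [values[i:i + timesteps] for i in range(len(values) - timesteps)]
    return np.array(windows).reshape(-1, timesteps, 1)

co2 = np.array([420.0, 431.0, 455.0, 470.0, 465.0, 480.0,
                495.0, 510.0, 505.0, 498.0, 515.0, 530.0])
X = to_sequences(co2, timesteps=10)    # shape (2, 10, 1): two overlapping sequences
```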

Step 2: LSTM Encoder
The main purpose of the LSTM encoder is to act like a sequence folding layer that converts features into a batch of time-based feature sequences, similar to performing convolution operations on the timesteps of the feature sequences independently. Fig. 4 describes the details of how the (AE) encoder interacts with the series of LSTM unit cells trained to recognize the most relevant features in the input sequence.

Figure 4: Details of LSTM Encoder

Each time series $X_{i}$ contains 10 samples collected at 10 timesteps (at a 1-minute interval). This one-dimensional dataset is reshaped into a two-dimensional dataset to feed to the encoder. For example, to unroll the input dataset based on timesteps, the input is converted into a 2d vector where one dimension contains the 10 timesteps and the other contains the feature (i.e., samples of the $CO_{2}$ reading), presented as a vector of $10\times 1$. This is now fed to the encoder. The encoder creates Layer 1, which contains an LSTM network with 10 LSTM cells. Each LSTM cell unit processes one sample. The 10 LSTM cells work in a sequential manner: the 1st LSTM unit passes the result of its sample to the 2nd LSTM. The 2nd LSTM unit decides whether to keep or forget the information from the 1st LSTM. If the 2nd LSTM decides to keep it, it writes it into the long-term memory and passes the information of the sample from the 1st LSTM, along with the feature information processed from the sample it is processing, to the 3rd LSTM, and so on. The last LSTM, the 10th in our model, has access to all samples worth keeping processed by the 9 previous LSTM cells. The information about all relevant samples is outputted by the last LSTM cell. This output is then converted into a $1\times 16$ vector of encoded features.

Note that we added a RepeatVector as Layer 2 to create copies of the $1\times 16$ vector, equal in number to the timesteps. For example, the number of timesteps in our model is 10, so Layer 2 creates 10 copies of the encoded features as a two-dimensional vector of size $10\times 16$.

Step 3: LSTM Decoder
The main purpose of the LSTM decoder is to act like a sequence unfolding layer that restores the sequence structure of the input data after the sequence folding on timesteps. Fig. 5 describes the details of how the decoder interacts with LSTM cells to reconstruct the outputs.

Figure 5: Details of LSTM Decoder

Each $1\times 16$ set is now fed as an input to the decoder, which creates a Layer 3 network with 10 LSTM cell units. Each LSTM cell unit processes one $1\times 16$ encoded feature. Each LSTM unit produces an output that represents the result of learning from the encoded feature, and this output is multiplied with the $1\times 16$ vector created by the additional TimeDistributed layer. At the same time, each LSTM cell unit produces a second output containing the state of what has been processed by the current LSTM cell, which is passed to the next LSTM, except for the last LSTM unit. Note that the matrix multiplication between the output of the LSTM layer ($10\times 16$) and the TimeDistributed layer ($16\times 1$) results in a vector of size $10\times 1$, which is the same as the size of the input.

Step 4: Anomaly Detection

Figure 6: Computing Reconstruction Loss on Time Series

An anomaly can be defined as an observation diverging from the majority of the data. A threshold can be set as a decision point to decide how much an observation deviates. Any observations that go beyond the threshold are defined as anomalies.

Applying this threshold-based anomaly detection technique, our model is trained on a dataset that contains $CO_{2}$ values within a normal range. This is to obtain the reconstruction error rates associated with normal $CO_{2}$ data points. Once training is done and the reconstruction errors are computed for all samples, the maximum reconstruction error rate is set as the threshold. Once the threshold is decided, we input the testing $CO_{2}$ dataset, which contains all ranges of $CO_{2}$ readings. A reconstruction error rate is computed for each sample in the testing set. If the reconstruction error rate goes beyond the threshold, the sample is considered an anomaly.

Fig. 6 illustrates how we calculate the reconstruction loss for each sample contained in different time-series sequences. Let us presume that there are 5 samples $[x_{1}, x_{2}, x_{3}, x_{4}, x_{5}]$ made into 3 time-series sequences $[X_{1}, X_{2}, X_{3}]$, each containing 3 samples at 3 different timesteps, where $X_{1}=[x_{1}, x_{2}, x_{3}]$, $X_{2}=[x_{2}, x_{3}, x_{4}]$, and $X_{3}=[x_{3}, x_{4}, x_{5}]$, as shown in the blue dotted blocks. Our model trains on these 3 time-series sequences as inputs and constructs the outputs that map to each sequence: $\hat{X}_{1}=[\hat{x}_{1}, \hat{x}_{2}, \hat{x}_{3}]$, $\hat{X}_{2}=[\hat{x}_{2}, \hat{x}_{3}, \hat{x}_{4}]$, and $\hat{X}_{3}=[\hat{x}_{3}, \hat{x}_{4}, \hat{x}_{5}]$.

Let us assume that the original values for the 3 sequences were $X_{1}=[x_{1}=1,\,x_{2}=2,\,x_{3}=3]$, $X_{2}=[x_{2}=2,\,x_{3}=3,\,x_{4}=4]$, and $X_{3}=[x_{3}=3,\,x_{4}=4,\,x_{5}=5]$, and that the mapped outputs for each time sequence came out as

$\hat{X}_{1}=[\hat{x}_{1}=1.1,\,\hat{x}_{2}=2.02,\,\hat{x}_{3}=3.01]$, $\hat{X}_{2}=[\hat{x}_{2}=1.99,\,\hat{x}_{3}=2.99,\,\hat{x}_{4}=3.99]$, and $\hat{X}_{3}=[\hat{x}_{3}=3.01,\,\hat{x}_{4}=4.02,\,\hat{x}_{5}=5.02]$.

The reconstruction loss for each sample can be calculated as:

x_{1} = |1.1 - 1| / 1 = 0.1
x_{2} = (|2.02 - 2| + |1.99 - 2|) / 2 = 0.015
x_{3} = (|3.01 - 3| + |2.99 - 3| + |3.01 - 3|) / 3 = 0.01
x_{4} = (|3.99 - 4| + |4.02 - 4|) / 2 = 0.015
x_{5} = |5.02 - 5| / 1 = 0.02

The maximum reconstruction loss is set as the threshold, which in this example is 0.1. During testing, any sample whose reconstruction loss goes beyond 0.1 is labeled as an anomaly.
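The worked example can be reproduced in a few lines: the threshold is the maximum per-sample training loss, and any test sample whose loss exceeds it is flagged. The test losses below are hypothetical values used only for illustration.

```python
import numpy as np

# Per-sample reconstruction losses from the worked example above.
train_losses = np.array([0.1, 0.015, 0.01, 0.015, 0.02])
threshold = train_losses.max()                            # eta = 0.1

# During testing, flag every sample whose reconstruction loss exceeds the threshold.
test_losses = np.array([0.03, 0.08, 0.45, 0.02, 1.90])    # hypothetical test losses
anomalies = test_losses > threshold                       # [False, False, True, False, True]
```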

IV-B Algorithm

The algorithm for the proposed model is shown in Algorithm 1. The goal of the training phase in our proposed model is twofold. Firstly, the focus of the training is to minimize the reconstruction error so that the outputs reconstructed from the reduced representation of the input resemble the input as closely as possible. Secondly, our model obtains the typical reconstruction error rates associated with normal-range $CO_{2}$ data points to find an optimal threshold to use for detection during the test phase. The main goal of the testing phase is to use this threshold to detect anomalies in the test dataset.

Algorithm 1 LSTM-AE Anomaly Detection

Input:
    Training set $\{x_{0}, x_{1}, x_{2}, \dots, x_{n-1}\}$,
    Test set $\{x'_{0}, x'_{1}, x'_{2}, \dots, x'_{m-1}\}$,
    Timesteps $t$
Output: A set of anomalies ($A_{t}$) or normal points ($N_{t}$)

begin
    /* Phase 1: To sequence */
    $X_{i}$, $X'_{i}$: sets of training and testing sequences based on timesteps ($t = 10$)
    for $i \in [0, n-t)$ do
        $X_{i} = [x_{i} :: x_{i+t}]$
    end for
    for $i \in [0, m-t)$ do
        $X'_{i} = [x'_{i} :: x'_{i+t}]$
    end for

    /* Phase 2: LSTM-AE training */
    Initialize the parameters of the LSTM-AE model ($M$)
    for $X_{i} \in [X_{0}, X_{1}, \dots, X_{n-t})$ do
        $\hat{X}_{i} = M(X_{i})$
        $L_{err} = \sum|X_{i} - \hat{X}_{i}|$
        Update the LSTM-AE to minimize $L_{err}$ by Eq. 9
    end for

    /* Phase 3: Threshold setting */
    Function RLOSS($X$):
        /* per-sequence reconstruction error calculation */
        for $i \in [0, n-t)$ do
            $\hat{X}_{i} = M(X_{i})$
            $Err_{arr}[i, i:i+t] = |\hat{X}_{i} - X_{i}|$
        end for
        /* per-sample reconstruction error calculation */
        for $i \in [0, n)$ do
            $l_{arr}[i] = \sum Err_{arr}[:, i] \,/\, \sum(Err_{arr}[:, i] \neq 0)$
        end for
        return $l_{arr}$
    End Function

    /* Max reconstruction loss from the training set */
    $threshold(\eta) = \max(RLOSS([X_{0}, X_{1}, \dots, X_{n-t})))$

    /* Phase 4: Anomaly detection on the testing set */
    $ltest_{arr} = RLOSS([X'_{0}, X'_{1}, \dots, X'_{m-t}))$
    for $i \in [0, m)$ do
        if $ltest_{arr}[i] > \eta$ then
            $x'_{i} \rightarrow A_{t}$
        else
            $x'_{i} \rightarrow N_{t}$
        end if
    end for
end

Training Phase
The first step in the training phase is to reshape the original dataset into time-series sequences, as shown in Phase 1: To sequence of Algorithm 1. Each $X_{i}$ represents a sequence in the training dataset. In our model, each sequence contains 10 $CO_{2}$ samples at 10 timesteps. As depicted in Phase 2: LSTM-AE training of Algorithm 1, training starts with each sequence fed to the encoder one at a time, where each sample in the sequence is processed by a single LSTM cell in a sequential manner. Once the processing of a sequence completes, the latent space of the encoder arranges the concatenation of the (relevant) data points as a 1-dimensional encoded feature representation. The RepeatVector layer then makes multiple copies of the encoded feature.

The decoder creates an LSTM network with a number of LSTM cells according to the timesteps (i.e., matching the number of copies of the encoded features). Each encoded feature is processed by a single LSTM cell. The results of the processing by all LSTM cells are combined into a single-dimensional vector at the TimeDistributed Dense layer, which produces the output. A reconstruction loss between the output and the input is then calculated, as in Phase 2 of Algorithm 1. A backpropagation strategy is applied to adjust the weights and parameters of the model. We use the Mean Absolute Error (MAE), as shown in Equation 12, as the reconstruction loss function.

Loss(MAE)=i=1n|xixi^|nLoss(MAE)=\frac{\sum_{i=1}^{n}\left|x_{i}-\hat{x_{i}}\right|}{n} (12)

where $n$ indicates the total number of samples, $x_{i}$ is the original input being fed to the encoder, and $\hat{x}_{i}$ is the output produced by the decoder.

The model trains on all time-series sequences until the reconstruction loss is minimized for all samples. Note that we use 16 neurons in the latent space of the encoder to capture the output from the 10th LSTM cell. We used "tanh" as the activation function in our proposed model. We also use two Dropout layers (0.2), one in the encoder and one in the decoder. We use a RepeatVector layer between the encoder and decoder, and an additional TimeDistributed Dense layer before the output layer. Once training is complete, the maximum reconstruction error is set as the threshold, as shown in Phase 3: Threshold setting.
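A minimal tf.keras sketch of this architecture (a 16-unit LSTM encoder with tanh activation, dropout of 0.2, a RepeatVector bridge, an LSTM decoder, and a TimeDistributed Dense output, trained with MAE and the hyperparameters of Table II) is given below; the exact layer arrangement is our reading of the description above rather than a released implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

timesteps, n_features = 10, 1

model = models.Sequential([
    # Encoder: fold the 10 timesteps into a 16-dimensional encoded feature vector.
    layers.LSTM(16, activation="tanh", input_shape=(timesteps, n_features)),
    layers.Dropout(0.2),
    # Repeat the encoded vector once per timestep so the decoder can unfold it.
    layers.RepeatVector(timesteps),
    # Decoder: reconstruct one value per timestep from the repeated encoding.
    layers.LSTM(16, activation="tanh", return_sequences=True),
    layers.Dropout(0.2),
    layers.TimeDistributed(layers.Dense(n_features)),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mae")

# X_train holds normal-range CO2 sequences shaped (samples, timesteps, features).
# history = model.fit(X_train, X_train, epochs=30, batch_size=64, validation_split=0.1)
```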

Testing Phase
The details of the testing phase of our model are shown in Phase 4: Anomaly detection on the testing set. A time-series sequence containing 10 data points over 10 timesteps is fed to the trained LSTM encoder. Note that this time the 10 data points contain all ranges of $CO_{2}$ values. The LSTM decoder produces a single time series, also containing 10 data points over 10 timesteps, using the encoded (and reduced) feature representation of the input sample. The reconstruction error rate of each data point is then compared with the threshold. The calculation of the reconstruction loss is the same as described for the training phase in Equation 12. If the reconstruction loss value is greater than the threshold $\eta$, the data point is labeled as an anomaly; otherwise it is labeled as normal. This is shown in Equation 13.

X'_{i} = \begin{cases}\text{anomaly}, & \text{if } ltest_{arr}[i] > \eta \\ \text{normal}, & \text{otherwise}\end{cases}    (13)

where $X'$ indicates a reconstructed time series, $X'_{i}$ is a data point contained in the time series, and $ltest_{arr}[i]$ is the result of the reconstruction loss function using MAE.

V Data and Data processing

We discuss the details of the dataset we use in our study along with the description of the data preprocessing strategies we adopted.

V-A Dunedin $CO_{2}$ Dataset

Figure 7: The projection of the original raw dataset

With the focus on understanding the relationship between the level of $CO_{2}$, weather conditions, and student performance, 74 SKOMOBO units were deployed in multiple primary/secondary schools in Dunedin, South Island, New Zealand. Records containing $CO_{2}$ readings were collected over a period of four months between 01/01/2018 and 30/04/2018 at a 1-minute interval. The projection of the $CO_{2}$ readings is shown in Fig. 7. As expected, changes in $CO_{2}$ are not observed during school breaks (e.g., during January 2018 and the first-term break in the last two weeks of April 2018). When students occupy classrooms, we observe the $CO_{2}$ readings fluctuating considerably, some of which could be anomalous. The total number of $CO_{2}$ readings in the dataset was 247,263.

V-B Data Preprocessing

We first cleaned up the original records by removing all duplicate records; for example, we removed records containing $CO_{2}$ readings with identical timestamps. We also removed records in which both the $CO_{2}$ reading and the timestamp were NaN values. We kept records where the $CO_{2}$ reading was an empty or NaN value but the timestamp was legitimate; in this case, we replaced the empty or NaN value with the numeric 0. After this clean-up, we had a total of 171,067 records.
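A pandas sketch of this clean-up is shown below, under the assumption that the raw export has a timestamp column and a co2 column; the file and column names are hypothetical.

```python
import pandas as pd

# Hypothetical file and column names for the raw SKOMOBO export.
df = pd.read_csv("dunedin_co2_raw.csv", parse_dates=["timestamp"])

# Remove duplicate CO2 readings that share an identical timestamp.
df = df.drop_duplicates(subset="timestamp", keep="first")

# Remove records without a usable timestamp (this covers records where both fields are NaN).
df = df.dropna(subset=["timestamp"])

# Keep records with a valid timestamp but a missing CO2 reading, replacing the reading with 0.
df["co2"] = df["co2"].fillna(0)
```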

V-C Training and Test dataset

Two separate datasets for the training and testing phase of our model were prepared as follows.

Training Dataset
According to [26], typical $CO_{2}$ values accepted as the normal range are between 0 and 968 ppm.

Figure 8: The distribution of $CO_{2}$ readings according to 3-sigma

Based on our analysis of the dataset, we found that the majority of the $CO_{2}$ readings fall within the 2-sigma rule of the normal distribution (i.e., $CO_{2}$ readings less than 968 when the mean of the readings is around 488), which can be considered the acceptable normal range, as shown in Fig. 8.

As the training of the model learns the typical reconstruction error associated with normal data points, we created a training dataset containing only $CO_{2}$ data points within the normal range. Towards this, we first set aside 3 months of the original dataset (01/01/2018 - 31/03/2018). We then applied the 2-sigma rule to check whether each sample sits in the normal range; any $CO_{2}$ readings beyond the 2-sigma rule were removed. The process of creating the training dataset is illustrated in Fig. 9, and a sketch of the filtering step follows the figure.

Figure 9: Creating Training Dataset
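The filtering step can be sketched as follows, continuing from the cleaned DataFrame in the preprocessing sketch above (the column names remain our assumptions).

```python
# Continuing from the cleaned DataFrame `df` in the preprocessing sketch above.
train = df[(df["timestamp"] >= "2018-01-01") & (df["timestamp"] <= "2018-03-31")]

# Keep only readings inside the 2-sigma band (roughly CO2 <= 968 ppm in our data).
mu, sigma = train["co2"].mean(), train["co2"].std()
train_normal = train[train["co2"] <= mu + 2 * sigma]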

Test Dataset
We used 1 month of the original data (01/04/2018 - 30/04/2018) as the testing dataset. Note that this dataset contains all ranges of $CO_{2}$ readings. We added the numeric label 0 if the $CO_{2}$ reading was within the 2-sigma range, otherwise we added the label 1. These labels are only used to evaluate the performance of our model, i.e., whether our model was good at distinguishing anomalous data points from normal data points. The process of adding labels to the test dataset is illustrated in Fig. 10, and a corresponding sketch is shown below.
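The test-month labeling uses the same 2-sigma cut-off, again continuing from the earlier sketches.

```python
# The test month keeps all readings; labels come from the same 2-sigma cut-off.
test = df[(df["timestamp"] >= "2018-04-01") & (df["timestamp"] <= "2018-04-30")]
test = test.assign(label=(test["co2"] > mu + 2 * sigma).astype(int))   # 1 = anomalous, 0 = normal
```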

V-D Data Normalization

We applied a data normalization technique to eliminate the impact of different scales across $CO_{2}$ readings, thus reducing the execution time and computational complexity of model training. We used standard score (z-score) normalization as depicted in Equation (14).

Z_{i} = \frac{X_{i} - \mu}{S}    (14)

where $Z_{i}$ denotes the normalized value of the data point $X_{i}$, while $\mu$ and $S$ refer to the mean and standard deviation respectively.
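This step maps directly onto scikit-learn's StandardScaler; a sketch, assuming the training and test frames from the previous sketches:

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the normal-range training readings, then apply the same scaling to the test set.
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_normal[["co2"]])
test_scaled = scaler.transform(test[["co2"]])
```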

Figure 10: Creating Test Dataset

VI Evaluations

In this section, we provide the details of the experiment including the environment setup, performance metrics we use, analysis of results, and discussion.

VI-A Experiment Setup

Our experiments were carried out using the following system setup shown in Table I.

TABLE I: Implementation environment specification
Unit Description
Processor 3.4GHz Intel Core i5
RAM 16GB
OS MacOS Big Sur 11.4
Packages used tensorflow 2.0.0, sklearn 0.24.1

The hyperparameters used in the training phase are illustrated with the values for each parameter along with the description in Table II.

TABLE II: LSTM-AE Training parameters
Hyperparameters Values Descriptions
Learning rate 0.001 Learning speed (within range 0.0 and 1.0)
Dropout 0.2 Fraction of neurons ignored
Batch size 64 No. of samples in one fwd/bwd pass
Epoch 30 No. of one fwd/bwd pass of all samples

VI-B Performance Metrics

To evaluate the performance of our model, we used the classification accuracy, precision, recall, and F1 score as performance metrics. Table III illustrates the confusion matrix.

TABLE III: Confusion Matrix
Total Population Predicted Condition
Normal Anomaly
Actual Condition Normal TN FP
Anomaly FN TP

where:

  • True Positive (TP) indicates anomalous data point correctly classified as anomalous.

  • True Negative (TN) indicates normal data point correctly classified as normal.

  • False Positive (FP) indicates normal data point incorrectly classified as anomalous.

  • False Negative (FN) indicates anomalous data point incorrectly classified as normal.

Based on the aforementioned terms, the evaluation metrics are calculated as follows:

TPR\ (True\ Positive\ Rate\ /\ Recall) = \frac{TP}{TP+FN}    (15)
FPR\ (False\ Positive\ Rate) = \frac{FP}{FP+TN}    (16)
Precision = \frac{TP}{TP+FP}    (17)
F1\text{-}score = 2\times\left(\frac{Precision\times Recall}{Precision+Recall}\right)    (18)
Accuracy = \frac{TP+TN}{TP+TN+FP+FN}    (19)

The area under the curve (AUC) computes the area under the receiver operating characteristic (ROC) curve, which is plotted based on the trade-off between the true positive rate on the y-axis and the false positive rate on the x-axis across different thresholds. Mathematically, the AUC is computed as shown in Equation (20).

AUC_{ROC} = \int_{0}^{1}\frac{TP}{TP+FN}\, d\!\left(\frac{FP}{TN+FP}\right)    (20)
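All of these metrics can be computed with scikit-learn; the sketch below uses toy labels and losses purely for illustration, with the per-sample reconstruction losses serving as anomaly scores for the ROC curve.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

# Toy illustration: y_true holds the 0/1 test labels, losses the per-sample reconstruction errors.
y_true = np.array([0, 0, 1, 0, 1, 1])
losses = np.array([0.02, 0.05, 0.30, 0.04, 0.90, 0.08])
y_pred = (losses > 0.1).astype(int)                  # threshold eta = 0.1

print(accuracy_score(y_true, y_pred))                # Eq. (19)
print(precision_score(y_true, y_pred))               # Eq. (17)
print(recall_score(y_true, y_pred))                  # TPR / Recall, Eq. (15)
print(f1_score(y_true, y_pred))                      # Eq. (18)
print(roc_auc_score(y_true, losses))                 # AUC-ROC, Eq. (20)
```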

VI-C Results

We provide the results of the performance of our proposed model observed from a number of different evaluation aspects.

VI-C1 Training

Fig. 11 shows the trend of the loss at different epoch intervals. The training loss (the blue line) assesses the error rate of the model during training. We can see that the training loss stabilizes quickly, at approximately 8 epochs. We set aside 10% of the training dataset as a validation set to assess the performance of our model during training. As expected, the validation loss does not stabilize before 8 epochs. However, after 8 epochs, the validation loss presents a loss rate similar to the training loss (i.e., an average of approximately 0.07). This can be regarded as a good fit, indicating that our proposed model works well (i.e., it does not overfit or underfit).

Figure 11: Training/Validation Loss

VI-C2 Impact of Model Architecture

We also tested the sensitivity of our model to the model architecture, varying the number of hidden layers and the number of LSTM cell units used. Three different model architectures, consisting of 1 hidden layer, 2 hidden layers, and 3 hidden layers at the encoder and decoder, were evaluated. The number of hidden layers at the encoder and the decoder varies while the number of LSTM units used at different model architectures is the same. The details of the three model architectures we evaluated are shown in Fig. 12.

Figure 12: Different Model Architecture

There were slight differences in model performance depending on the number of hidden layers. For example, the model architecture with 1 hidden layer worked best with an F1-score above 94.55%, while F1-scores of 93.48% and 93.31% were reached by the 2-layer and 3-layer models, respectively. The size of the output vector used by the different architectures had a higher impact on the model performance, with the smallest output vector used by the 1-layer model working best and reaching a 94.55% F1-score, as shown in Table IV.

TABLE IV: Performance of different model architectures
No. (Layers) No. (Units) Accuracy Precision Recall F1-score
1 128 99.29 100 85.71 92.31
1 16 99.50 100 89.90 94.68
2 64,16 99.39 100 87.76 93.48
3 128,64,16 99.38 100 87.47 93.31

VI-C3 Impact of The Size of Time Sliding Window

The size of the time sliding window, which determines the number of timesteps contained in a sequence, can impact the overall performance as it affects how the reconstruction error rate is computed. Thus, we also tested the sensitivity of our model to the size of the time sliding window. We tested our model with time sliding window sizes of 10, 15, 20, 25, 30, 35, and 40. As shown in Fig. 13, the time sliding window of size 10 performed best with the highest TPR, which contributed to the best F1-score and accuracy. Window sizes of 15 and 20 performed worst with the lowest TPR, just above 80%. Above a window size of 20, we observed that the TPR, and accordingly the F1-score, started decreasing as the window size increased.

Figure 13: Performance comparison of LSTM-AE under different time window lengths: (a) Precision, (b) Accuracy, (c) Recall, (d) F1-score, each plotted against the time window length.

VI-C4 Model Performance

Fig. 14 illustrates the performance of our model based on the confusion matrix. The total number of test samples was 42,797, containing 40,697 normal samples and 2,100 abnormal samples according to our labels. Our model detected 1,888 of the 2,100 abnormal data points correctly (i.e., 89.90%). Our model detected all 40,697 normal data points correctly (i.e., 100%). Our model produced no false positives (normal samples incorrectly classified as abnormal), while it produced 212 false negatives (abnormal samples incorrectly classified as normal). Accounting for all of these, our model achieved an accuracy of 99.50%, a precision of 100%, a recall of 89.90%, and an F1-score of 94.68%.

Figure 14: Detection Results Based on Confusion Matrix

Fig. 15 shows our model's performance in terms of the AUC-ROC curve, which clearly demonstrates the trade-off between the true positive rate and the false positive rate. The curve confirms that our proposed model is highly effective in accurately detecting anomalies, achieving an AUC-ROC score of 94.8%. This result is calculated based on the whole time-series testing dataset. To examine model efficiency, we also compared different time window lengths at 1-minute intervals. We observed that the AUC-ROC score starts decreasing as the time window length increases. The best performance was obtained when the time window length was 10, while the worst performance was obtained at window lengths of 15 and 20. Similar to the impact of the time sliding window on the other metrics, the AUC-ROC score decreased slightly as the time window size increased beyond 25.

Figure 15: AUC-ROC visualization (AUC by time window length: $T_{10}$ = 95.0, $T_{15}$ = 91.1, $T_{20}$ = 91.3, $T_{25}$ = 93.5, $T_{30}$ = 93.4, $T_{35}$ = 93.1, $T_{40}$ = 92.4)

Fig. 16 shows where anomalous data points were detected (i.e., the red dots) in the test dataset. In our observation, the threshold was set at 1.742 when training was done. During testing, we also observed that any $CO_{2}$ readings greater than 1,000 usually had a reconstruction error rate greater than 1.742 and were therefore considered anomalous.

Figure 16: Normal and anomalies distribution on the testing set

VI-C5 Comparison to Other Similar Models

Table V shows the performance comparison of our model evaluated on the Dunedin $CO_{2}$ dataset with other similar models that use different variations of the LSTM Autoencoder. As the results show, our approach achieves the best performance in terms of both accuracy (99.50%) and precision (100%). The similar model proposed by Yin et al. [27] shows the most competitive performance compared to ours, with similar accuracy and F1-score. The model proposed by Nguyen et al. [28] shows a higher F1-score (96.98%), though the accuracy of their model is lower than that of our method. On further investigation, they used a One-Class SVM as an additional classifier to reduce false positives.

TABLE V: Comparison to Other Similar Models
Paper Techniques Datasets Accuracy Precision Recall F1-score
Yin et al. [27] LSTM-AE Yahoo Webscope S5 99.25 97.84 94.16 95.97
Liu et al. [29] LSTM-AE ECG 98.57 97.55 97.55 -
Chander et al.  [30] LSTM-AE WSNs position at IBRL - 89.1 86.9 85.24
Lin et al. [19] VAE-LSTM Ambient temperature - 80.6 1.0 89.2
CPU utilization AWS - 69.4 1.0 81.9
Machine temperature - 55.9 1.0 71.7
Sharma et al. [31] LSTM-AE CERT insider threat dataset 90.17 - 91.03 -
Kieu et al. [32] LSTM-AE Numenta Anomaly Benchmark (NBA) - 90.8 98.8 94.6
Li et al. [33] LSTM-AE adversarial learning e-VDS 94.16 90.31 88.45 89.37
CCV 83.03 75.08 73.26 74.16
Kang et al. [34] LSTM-AE Break Operating Unit (BOU) data 94.44 97.94 85.77 91.45
Nguyen et al. [28] LSTM-AE-OCSVM Generated dataset 98.36 98.45 99.59 96.98
Tran et al. [35] LSTM-AE iforest simulated data in Fashion industry 95 100 94 87
Our Proposal LSTM-AE Dunedin $CO_{2}$ Dataset 99.50 100 89.90 94.68

VII Conclusion

We proposed an LSTM-Autoencoder based deep learning technique for detecting anomalies in indoor air quality datasets. In our proposed model, two LSTM networks, each consisting of multiple LSTM units, provide the learning ability to identify long-term correlational dependencies that exist in a time-series sequence. The Autoencoder is used to generate encoded features of the input representation while maintaining the long-term dependencies identified by the LSTM encoder and reconstructing outputs that resemble the input through the LSTM decoder. The maximum MAE from the trained model on the training set is set as a threshold and is used by the anomaly detector. The anomaly detector labels each observation in the testing set as an anomaly when its reconstruction loss is greater than the threshold.

Our proposed model was applied to a $CO_{2}$ time-series dataset obtained from a real-world deployment. The experimental results showed that our proposed model is highly effective at detecting anomalous $CO_{2}$ readings, providing a detection accuracy of 99.50% and outperforming other similar models.

We plan to apply our proposed model to detecting DDoS attacks [25] based on time-series analysis.

References

  • [1] J.-Y. Kim, C.-H. Chu, and S.-M. Shin, “Issaq: An integrated sensing systems for real-time indoor air quality monitoring,” IEEE Sensors Journal, vol. 14, no. 12, pp. 4230–4244, 2014.
  • [2] N. H. Motlagh, M. A. Zaidan, E. Lagerspetz, S. Varjonen, J. Toivonen, J. Mineraud, A. Rebeiro-Hargrave, M. Siekkinen, T. Hussein, P. Nurmi et al., “Indoor air quality monitoring using infrastructure-based motion detectors,” in 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), vol. 1.   IEEE, 2019, pp. 902–907.
  • [3] Y. Wang, M. Boulic, R. Phipps, C. Chitty, A. Moses, R. Weyers, J. Jang-Jaccard, G. Olivares, A. Ponder-Sutton, and C. Cunningham, “Integrating open-source technologies to build a school indoor air quality monitoring box (skomobo),” in 2017 4th Asia-Pacific World Congress on Computer Science and Engineering (APWC on CSE).   IEEE, 2017, pp. 216–223.
  • [4] R. Weyers, J. Jang-Jaccard, A. Moses, Y. Wang, M. Boulic, C. Chitty, R. Phipps, and C. Cunningham, “Low-cost indoor air quality (iaq) platform for healthier classrooms in new zealand: Engineering issues,” in 2017 4th Asia-Pacific World Congress on Computer Science and Engineering (APWC on CSE).   IEEE, 2017, pp. 208–215.
  • [5] Y. Wang, J. Jang-Jaccard, M. Boulic, R. Phipps, C. Chitty, R. Weyers, A. Moses, G. Olivares, A. Ponder-Sutton, and C. Cunningham, “Deployment issues for integrated open-source—based indoor air quality school monitoring box (skomobo),” in 2018 IEEE Sensors Applications Symposium (SAS).   IEEE, 2018, pp. 1–4.
  • [6] Y. Zeng, J. Chen, N. Jin, X. Jin, and Y. Du, “Air quality forecasting with hybrid lstm and extended stationary wavelet transform,” Building and Environment, p. 108822, 2022.
  • [7] Y. Wei, J. Jang-Jaccard, F. Sabrina, and H. Alavizadeh, “Large-scale outlier detection for low-cost pm10 sensors,” IEEE Access, vol. 8, pp. 229 033–229 042, 2020.
  • [8] D. Wu, Z. Jiang, X. Xie, X. Wei, W. Yu, and R. Li, “Lstm learning with bayesian and gaussian processing for anomaly detection in industrial iot,” IEEE Transactions on Industrial Informatics, vol. 16, no. 8, pp. 5244–5253, 2019.
  • [9] T.-B. Ottosen and P. Kumar, “Outlier detection and gap filling methodologies for low-cost air quality measurements,” Environmental Science: Processes & Impacts, vol. 21, no. 4, pp. 701–713, 2019.
  • [10] J. Li, H. Izakian, W. Pedrycz, and I. Jamal, “Clustering-based anomaly detection in multivariate time series data,” Applied Soft Computing, vol. 100, p. 106919, 2021.
  • [11] X. Zhou, Y. Hu, W. Liang, J. Ma, and Q. Jin, “Variational lstm enhanced anomaly detection for industrial big data,” IEEE Transactions on Industrial Informatics, vol. 17, no. 5, pp. 3469–3477, 2020.
  • [12] P. K. Sharma, A. Mondal, S. Jaiswal, M. Saha, S. Nandi, T. De, and S. Saha, “Indoairsense: A framework for indoor air quality estimation and forecasting,” Atmospheric Pollution Research, vol. 12, no. 1, pp. 10–22, 2021.
  • [13] K. Rastogi and D. Lohani, “An internet of things framework to forecast indoor air quality using machine learning,” in Symposium on Machine Learning and Metaheuristics Algorithms, and Applications.   Springer, 2019, pp. 90–104.
  • [14] R. Mumtaz, S. M. H. Zaidi, M. Z. Shakir, U. Shafi, M. M. Malik, A. Haque, S. Mumtaz, and S. A. R. Zaidi, “Internet of things (iot) based indoor air quality sensing and predictive analytic—a covid-19 perspective,” Electronics, vol. 10, no. 2, p. 184, 2021.
  • [15] C. Xu, H. Chen, J. Wang, Y. Guo, and Y. Yuan, “Improving prediction performance for indoor temperature in public buildings based on a novel deep learning method,” Building and Environment, vol. 148, pp. 128–135, 2019.
  • [16] Y. Jung, T. Kang, and C. Chun, “Anomaly analysis on indoor office spaces for facility management using deep learning methods,” Journal of Building Engineering, vol. 43, p. 103139, 2021.
  • [17] E. Hossain, M. A. U. Shariff, M. S. Hossain, and K. Andersson, “A novel deep learning approach to predict air quality index,” in Proceedings of International Conference on Trends in Computational and Cognitive Engineering.   Springer, 2021, pp. 367–381.
  • [18] D. Park, Y. Hoshi, and C. C. Kemp, “A multimodal anomaly detector for robot-assisted feeding using an lstm-based variational autoencoder,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1544–1551, 2018.
  • [19] S. Lin, R. Clark, R. Birke, S. Schönborn, N. Trigoni, and S. Roberts, “Anomaly detection for time series using vae-lstm hybrid model,” in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2020, pp. 4322–4326.
  • [20] M. Sun, G. Strbac, P. Djapic, and D. Pudjianto, “Preheating quantification for smart hybrid heat pumps considering uncertainty,” IEEE Transactions on Industrial Informatics, vol. 15, no. 8, pp. 4753–4763, 2019.
  • [21] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [22] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in International conference on machine learning.   PMLR, 2013, pp. 1310–1318.
  • [23] H. D. Trinh, L. Giupponi, and P. Dini, “Urban anomaly detection by processing mobile traffic traces with lstm neural networks,” in 2019 16th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON).   IEEE, 2019, pp. 1–8.
  • [24] W. Xu, J. Jang-Jaccard, A. Singh, Y. Wei, and F. Sabrina, “Improving performance of autoencoder-based network anomaly detection on nsl-kdd dataset,” IEEE Access, vol. 9, pp. 140 136–140 146, 2021.
  • [25] Y. Wei, J. Jang-Jaccard, F. Sabrina, A. Singh, W. Xu, and S. Camtepe, “Ae-mlp: A hybrid deep learning approach for ddos detection and classification,” IEEE Access, vol. 9, pp. 146 810–146 821, 2021.
  • [26] Y. Liu, Z. Pang, M. Karlsson, and S. Gong, “Anomaly detection based on machine learning in iot-based vertical plant wall for indoor climate control,” Building and Environment, vol. 183, p. 107212, 2020.
  • [27] C. Yin, S. Zhang, J. Wang, and N. N. Xiong, “Anomaly detection based on convolutional recurrent autoencoder for iot time series,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 1, pp. 112–122, 2020.
  • [28] H. Nguyen, K. P. Tran, S. Thomassey, and M. Hamad, “Forecasting and anomaly detection approaches using lstm and lstm autoencoder techniques with the applications in supply chain management,” International Journal of Information Management, vol. 57, p. 102282, 2021.
  • [29] P. Liu, X. Sun, Y. Han, Z. He, W. Zhang, and C. Wu, “Arrhythmia classification of lstm autoencoder based on time series anomaly detection,” Biomedical Signal Processing and Control, vol. 71, p. 103228, 2022.
  • [30] B. Chander and K. Gopalakrishnan, “Auto-encoder—lstm-based outlier detection method for wsns,” in Machine Learning, Deep Learning and Computational Intelligence for Wireless Communication.   Springer, 2021, pp. 109–119.
  • [31] B. Sharma, P. Pokharel, and B. Joshi, “User behavior analytics for anomaly detection using lstm autoencoder-insider threat detection,” in Proceedings of the 11th International Conference on Advances in Information Technology, 2020, pp. 1–9.
  • [32] T. Kieu, B. Yang, and C. S. Jensen, “Outlier detection for multidimensional time series using deep neural networks,” in 2018 19th IEEE International Conference on Mobile Data Management (MDM).   IEEE, 2018, pp. 125–134.
  • [33] S. Li and W. He, “Vidanomaly: Lstm-autoencoder-based adversarial learning for one-class video classification with multiple dynamic images,” in 2019 IEEE International Conference on Big Data (Big Data).   IEEE, 2019, pp. 2881–2890.
  • [34] J. Kang, C.-S. Kim, J. W. Kang, and J. Gwak, “Anomaly detection of the brake operating unit on metro vehicles using a one-class lstm autoencoder,” Applied Sciences, vol. 11, no. 19, p. 9290, 2021.
  • [35] P. H. Tran, C. Heuchenne, and S. Thomassey, “An anomaly detection approach based on the combination of lstm autoencoder and isolation forest for multivariate time series data,” in Developments of Artificial Intelligence Technologies in Computation and Robotics: Proceedings of the 14th International FLINS Conference (FLINS 2020).   World Scientific, 2020, pp. 589–596.