Forecasting in Non-stationary Environments with Fuzzy Time Series

Petrônio Cândido de Lima e Silva [email protected] Carlos Alberto Severiano Junior [email protected] Marcos Antonio Alves [email protected]
Rodrigo Silva [email protected] Miri Weiss Cohen [email protected] Frederico Gadelha Guimarães [email protected] [ Machine Intelligence and Data Science (MINDS) Laboratory, Federal University of Minas Gerais, Belo Horizonte, Brazil Federal Institute of Education Science and Technology of Northern Minas Gerais, Januária Campus, Brazil Federal Institute of Education Science and Technology of Minas Gerais, Sabará, Brazil Department of Computer Science, Federal University of Ouro Preto, Ouro Preto, Brazil Department of Software Engineering, Braude College of Engineering, Karmiel, Israel Department of Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Abstract

In this paper we introduce a Non-Stationary Fuzzy Time Series (NSFTS) method with time varying parameters adapted from the distribution of the data. In this approach, we employ Non-Stationary Fuzzy Sets, in which perturbation functions are used to adapt the membership function parameters in the knowledge base in response to statistical changes in the time series. The proposed method is capable of dynamically adapting its fuzzy sets to reflect the changes in the stochastic process based on the residual errors, without the need to retraining the model. This method can handle non-stationary and heteroskedastic data as well as scenarios with concept-drift. The proposed approach allows the model to be trained only once and remain useful long after while keeping reasonable accuracy. The flexibility of the method by means of computational experiments was tested with eight synthetic non-stationary time series data with several kinds of concept drifts, four real market indices (Dow Jones, NASDAQ, SP500 and TAIEX), three real FOREX pairs (EUR-USD, EUR-GBP, GBP-USD), and two real cryptocoins exchange rates (Bitcoin-USD and Ethereum-USD). As competitor models the Time Variant fuzzy time series and the Incremental Ensemble were used, these are two of the major approaches for handling non-stationary data sets. Non-parametric tests are employed to check the significance of the results. The proposed method shows resilience to concept drift, by adapting parameters of the model, while preserving the symbolic structure of the knowledge base.

keywords:

Time Series Forecasting , Fuzzy Time Series , Non-stationary Environment , Online learning.

^†^†journal: Applied Soft Computing, Elsevier

url]https://minds.eng.ufmg.br/

1 Introduction

The expanding utilization of smart sensors, the increasing availability of data storage, and the emergence of big data have led to an increasing amount of data being produced very often in the form of a stream [1, 2, 3]. In many real-world applications, this data is organized in the form of a time series. In a time series forecasting problem, the information available for the prediction is limited to the past values of the series [1]. Hence, the temporal relationships which describe the evolution of the series must be deduced exclusively from these values.

Generally, the characteristics of the processes which generate a time series are unknown [4]. In several cases of practical interest, such as stock indices [5] in finance, evapotranspiration in agriculture [6], among others, the variable of interest in the time series is in fact the fusion of many other data sources. Not by accident these kinds of time series tend to show highly non-linear and non-stationary patterns [5, 6].

Many forecasting methods assume that the data is generated from a fixed probability distribution. However, as mentioned before, many time-series related applications deal with non-stationary and heteroskedastic stochastic processes which may arise from phenomena such as: seasonality, periodicity, hardware and machine faults, aging sensors and components, and unexpected events. These changes modify the properties of the data generating process, then changing its underlying probability distribution over time. In such non-stationary environments, any non-adaptive model trained under the false stationarity assumption is deemed to present a progressively increasing error or simply fail at some point [7].

Fuzzy Time Series (FTS) was introduced by Song and Chissom in 1993 [8] to handle with vague and imprecise knowledge in time series data. In FTS, the domain of the variable of interest, called Universe of Discourse (UoD), is divided into sub-domains, and each of them is linked to a fuzzy set. After the construction of these fuzzy sets, temporal patterns of the type IF-THEN are extracted from the training data in order to identify a rule-base able to represent the generating function of the time series.

FTS forecasting methods have become attractive due to their simplicity, model transparency, forecasting accuracy and computational performance. Some examples of successful applications are found in tourism demand forecasting [9], energy load [10, 11, 12], stock index price predictions [13, 14, 15], and many more. However, when dealing with non-stationary stochastic processes, the values of the time series might go outside the UoD as defined from the training data. Furthermore, the initial setting of the fuzzy sets may become inadequate over time due to the lack of mechanisms that will allow the membership functions to adapt to the varying behavior of the time series [16].

In [8], Song and Chisom presented an approach to induce time-variant Fuzzy Time Series, by retraining the FTS model in a sliding window. Thus, every time a new data point is fed to the model, the FTS has to be retrained from scratch. In many cases, this requirement may render the methods impractical due to the related computational costs.

Recently, in [17], a first attempt was made to avoid retraining in FTS when they are applied to non-stationary environments. In this approach, Non-Stationary Fuzzy Sets (NSFS) [16] were employed to forecast heteroskedastic time series with unconditional variance, i.e., time series where the variance changes through time in a predictable way. Their approach, however, is not able to account for conditional variances and scenarios with concept-drift.

Given the need to produce adaptive models for forecasting in non-stationary environments [18], in this paper, we introduce a Non-Stationary Fuzzy Times Series (NSFTS) method with time varying parameters adapted from the data distribution.

In the proposed approach, the FTS is also built on Non-Stationary Fuzzy Sets [16]. Based on the residual errors, different perturbation mechanisms adapt the membership functions in response to statistical changes in the time series. Thus, the fuzzy sets will reflect changes in the stochastic data generating process without model retraining. Differently from [17], the proposed mechanisms give to the FTS the ability to handle non-stationary and heteroskedastic data as well as scenarios with concept-drift.

To validate the proposal, different data sets consisting of market indices, FOREX pairs, cryptocurrency exchange rates and synthetic data were used. These data sets were selected because they are all non-stationary and present different types of concept drift. The forecast accuracy of the proposed method was also compared with other methods.

The remainder of this work is organized as follows: in Section 2, the main concepts of Fuzzy Time Series and non-stationary Fuzzy Sets are introduced; in Section 3, time variant methods are discussed; in Section 4, the NSFTS method is presented and Section 5 discusses the results of the computational experiments performed to compare the proposed method against the others methods. Finally, in Section 6, the main findings of the research are synthesized.

2 Preliminaries

2.1 Non-stationary Time Series

Time series data observed in different real-world applications are often non-stationary. Given that a stationary time series is defined in terms of its mean and variance, non-stationarity can be detected if any (or both) of these components vary over time. Thus, in a non-stationary context, if the chosen forecasting model relies on a false assumption of stationarity, there is a significant risk that it will present a degrading performance over time and eventually become obsolete [7].

Some non-stationary time series are consequence of hidden contexts, not given in the form of predictive features [19]. In such situations, inferring a forecasting model becomes more difficult, since changes in the hidden context can induce unpredictable changes in the target concept. These changes are known as concept drift and since they usually cannot be identified explicitly [7], an effective forecasting model should be able to quickly adapt to the resulting changes in the time series data.

A symptom of concept drift in a time series is the change of the parameters that define its distribution [20]. Following [20], the change in the distribution can be classified with respect to the rate at which the drift occurs. For instance, a political event that suddenly causes a strong effect on the stock market is a case of abrupt concept drift observed in a time series. Another example is the aging effects of a sensor which gradually leads to lower performance in a device. Such case can be referred to as gradual concept drift. According to Ditzler et al. [7], the drifts, whether abrupt or gradual, can also be classified as permanent or transient. The former is related to the effect of variation, and is not limited in time, while the latter occurs in a limited time window followed by an effect of disappearance.

Due to the practical importance of problems with varying statistical properties, the literature has presented some alternatives to handle the issue. Popular approaches commonly used in forecasting methods, including FTS, are: (i) detrending [21, 22], in which a data transformation is applied to the time-series to remove the trend component and (ii) time-varying models derived from the distribution of the raw data [23].

2.2 Fuzzy Time Series

Symbol	Description
$\Omega$	Fuzzy time series order (lags)
$k$	Number of fuzzy sets for partitioning the universe of discourse
$\mathcal{M}$	Time series model
$t$	Time instant
$y(t)$	Time series value at time $t$
$\hat{y}(t)$	Estimated time series value at time $t$
$U$	Universe of discourse
$A_{j}$	j-th fuzzy set
$\mu_{A_{j}}$	Membership function of fuzzy set $A_{j}$
$T$	Number of time steps
$f(t)$	Fuzzified value of the series at time $t$
$c$	Fuzzy set midpoint
$W$	Window of observations
$\pi(\cdot)$	Perturbation function for the non-stationary fuzzy sets
$\mathcal{E}$	Set of residuals
$\tilde{A}$	Linguistic variable
$R$	Refreshing interval
$\delta$	Displacement applied to the fuzzy set
$\rho$	Scaling factor
$w$	Window size
$n$	Sample size
$LHS$	Set of fuzzy sets in the left hand side of a fuzzy logical relationship
$RHS$	Set of fuzzy sets in the right hand side of a fuzzy logical relationship

Table 1: Convention of symbols

Since the introduction of the Fuzzy Time Series in [8], several categories of FTS methods have been proposed, varying mainly by their order $\Omega$ , number of fuzzy sets $k$ and time-variance [18] – see Table 1 for the convention of symbols adopted here. The order is defined by the number $\Omega$ of time-delays (lags) that are used in modeling the time series. The time variance defines whether the FTS model changes over time, with the Time Invariant models $\mathcal{M}^{t}$ having the same parameters for all $t=0,\ldots,T$ , and the Time Variant models $\mathcal{M}^{t}$ having different parameters in different time instants $t$ .

Given an univariate time series $Y\in\mathbb{R}$ , where $y(t)\in Y$ are the instances of $Y$ for $t=0,1,\ldots,T$ , the UoD is limited by the known bounds of $Y$ , such that $U=[\min(Y),\max(Y)]$ . The training procedure of an FTS model $\mathcal{M}$ consists of the following three steps:

a)

Partitioning: split $U$ in $k$ overlapping intervals. For each interval a new fuzzy set $A_{j}$ is created, each one with its own membership function (MF) $\mu_{A_{j}}$ . A linguistic value $\tilde{A}$ is assigned to each fuzzy set and represents a region of $U$ . The computational cost of this step is $O(k)$ .
b)

Fuzzification: maps the crisp time series $Y$ onto the fuzzified time series $F$ , by replacing each $y(t)\in Y$ by the fuzzified value $f(t)=\mu_{A_{j}}(y(t))$ , $\forall A_{j}\in\tilde{A}$ , for $t=1,\ldots,T$ . The computational cost of this step is $O(T\cdot k)$ .
c)

Knowledge Extraction: creates a representation of the sequential patterns in the time series. In a rule-based FTS, as in [24], the rules have a format $A_{i}^{t-\Omega},\ldots,A_{i}^{t-1}\rightarrow A_{j},A_{k},\ldots$ , where $A_{i}^{t-\Omega},\ldots,A_{i}^{t-1}\in\tilde{A}$ is the precedent and $A_{j},A_{k},\ldots\in\tilde{A}$ is the consequent. The rule can be read as “IF $y(t-\Omega)$ is $A_{i}^{t-\Omega}$ AND $\ldots$ AND $y(t-1)$ is $A_{i}^{t-1}$ THEN $y(t)$ may be $A_{j},A_{k},\ldots$ ”. The computational cost of this step is $O(T\cdot k^{\Omega})$ .

Once the model $\mathcal{M}$ is trained it can be used to forecast values of $Y$ given a sample $y(t-\Omega),\ldots,y(t-1)$ with a three step procedure:

a)

Fuzzification: maps the crisp sample $y(t-\Omega),\ldots,y(t-1)$ onto the fuzzified values $f(t-\Omega),\ldots,f(t-1)$ , where each $f(t)=\mu_{A_{j}}(y(t))$ , $\forall A_{j}\in\tilde{A}$ , for $t=\Omega.\ldots,1$ . The computational cost of this step is $O(\Omega\cdot k)$ .
b)

Rule Matching: find in $\mathcal{M}$ the rules whose the precedent matches with the fuzzified values $f(t-\Omega),\ldots,f(t-1)$ . The activation $\mu_{j}$ of each rule $j$ is the minimum T-norm of the individual membership values of each fuzzified value. The computational cost of this step depends on the length of the sample ( $\Omega$ ) and the number of rules in $\mathcal{M}$ ( $k^{\Omega}$ ), then the complexity is $O(\Omega\cdot k^{\Omega})$ .
c)

Defuzzification: the estimated value of $\hat{y}(t)$ is calculated by finding the mean value $c_{j}$ of each matched rule $j$ , by averaging the midpoints of the consequent fuzzy sets, and then calculating the sum of the mean values of each rule weighted by its activation values

$\hat{y}(t)=\frac{\sum_{j}\mu_{j}\cdot c_{j}}{\sum_{j}\mu_{j}}$

Many improvements were proposed in the FTS literature. The High-Order FTS (HOFTS) [12] extended the classical method in [24] by using several lags in the forecast and it is able to recognize more complex patterns in the time series. In [25], [26] and [27] weights are added in the consequent of the rules in order to give more importance for certain fuzzy sets. More recently, the Probabilistic Weighted FTS (PWFTS) was proposed in [28], and made available in the pyFTS library [29], including weights in both precedent and consequent of the FTS rules, achieving high accuracy and outperforming traditional forecasting approaches.

In order to represent forecasting uncertainty, [14], [28] and [30] proposed approaches for probabilistic forecasting. In [31] and [32] distributed FTS variants are proposed for Big Data time series. Multivariate time series are explored in [33], which uses Fuzzy Information Granules (FIG) to propose a multivariate forecasting method. In [18] a survey on the design of FTS forecasting models was provided. A comprehensive review of those aforementioned models, focused on time-invariant and rule-based approaches can be found in [32].

The major hyper-parameters of FTS methods are the number of fuzzy sets $k$ and the order of the model $\Omega$ . These hyper-parameters conduce the model training and forecasting and are responsible for the accuracy and model parsimony (the number of rules). A method for hyperparameter tuning of FTS models is presented in [34].

The FTS approaches listed in this section are all time-invariant approaches which means that the models are trained only once and then its internal rule base does not change. To keep their accuracy, these models must make strong assumptions about the stationarity and homoskedasticity of the time series. In non-stationary environments, this is a major drawback.

To better understand the behavior of the FTS methods in non-stationary scenarios, Figure 1 shows the performances of several methods in the literature when the test data falls out of the known UoD. For this example, we used the NASDAQ dataset. It can be seen that this situation makes most of the trained models useless in the long run.

Refer to caption — Figure 1: Sample of models forecasts in a concept drift scenario. The dataset (Original) is NASDAQ. The models are: Fuzzy Time Series (FTS) [8]; Conventional FTS (CFTS) [24]; Weighted FTS (FTS) [25]; Improved Weight FTS (IWFTS) [35]; Trend-Weighted FTS (TWFTS) [36]; Exponentially Weighted FTS (EWFTS) [27]; High Order FTS (HOFTS) [12]; Weighted High Order FTS (WHOFTS) order 2 and 3 [32]; Hwang [37] order 2 and 3.

2.3 Non-stationary Fuzzy Sets

Non-stationary fuzzy reasoning and non-stationary fuzzy sets (NSFS) were introduced by Garibaldi and Ozen and Garibaldi, Jaroszewski and Musikasuwan, respectivelly in [38] and [16]. The main idea is to extend the traditional fuzzy set definition by introducing a dynamic component that changes the membership function $\mu$ over time. This change takes several forms: a) variation in location: by displacing the parameters of the $\mu$ function along the UoD without changing its shape; b) variation in width: changing the shape of $\mu$ by stretching or contracting its bounds; and c) noise variation: by adding random noise to the membership grade.

A NSFS is defined with two functions: the non-stationary membership function (NSMF), which considers time variations of the corresponding membership function (MF), and the perturbation function, which is the dynamic component responsible for altering the parameters of the membership function given some parameter set.

According to Garibaldi et al. [16] a non-stationary fuzzy set $\dot{A}$ can be formalized as follows:

\dot{A}=\int_{t\in T}\int_{x\in X}\mu_{\dot{A}}(t,x)dxdt

(1)

where $\dot{A}$ is a fuzzy set over a UoD $X$ characterized by a NSMF $\mu_{\dot{A}}(t,x)$ at a set of time points $T$ .

It is worth noting that the NSMF $\mu_{\dot{A}}$ varies throughout $U$ and the time interval $T$ . This means that the NSMF parameter set should vary over time. A regular MF, $\mu_{A}(x)$ , can be expressed as $\mu_{A}(x,p_{1},\ldots,p_{m})$ , where $p_{1},\ldots,p_{m}$ denote the parameters of $\mu_{A}(x)$ . Thus, a NSMF can be denoted in an analogous way as follows:

\mu_{\dot{A}}(t,x)=\mu_{A}(x,p_{1}(t),\ldots,p_{m}(t))

(2)

where each parameter can be varied over time by a perturbation function multiplied by a constant.

One of the constraints existing in several FTS models is the lack of ability to deal with conditional variances and concept-drift scenarios, in which the statistical characteristics of the time series change, sometimes drastically. Thus, through small variation in the MFs, we deploy NSFS in FTS models to deal with variability for decisions over time and contributing to the mitigation of this drawback.

3 Time Variant Approaches

Time-variant FTS models should be applied when the data is not compliant with the stationarity and homoskedasticity assumptions. They include incremental, flexible and evolving techniques for adapting the model to the input data [18, 39, 40].

In the seminal work of Song and Chissom on time variant FTS [39], time-variance can be seen as a meta-modeling technique. It is not a proper FTS model, it is a training policy for another FTS method which controls when this method will be retrained and how many lags will be used. More specifically, the time variant approach defines $W$ , the length of the memory window, and $R$ , the refreshing interval. Thus, the chosen FTS model is built from scratch every $R$ time instants using the most recent $W$ observations of the time series.

This first time-variant FTS approach [39] was followed by several other authors who have mixed its training policy with different knowledge models and weighting schemes [40, 41, 42, 43].

In Figure 2 (left) it is easy to see that all FTS time invariant approaches can be combined with the time variant method and used as the internal model. However, two major drawbacks of the classical Song and Chissom’s method can be highlighted. The first one is its limited memory. Once a new data point arrives the previous knowledge base is completely discarded. This, in turn, may lead to catastrophic forgetting of frequent patterns if the parameters $W$ and $R$ are not tuned correctly. The second one is its high computational cost. Using a binary search tree structure to organize the $k$ fuzzy sets, the time complexity for a search among them decreases from $O(k)$ to $O(\log k)$ . Thus, for a given input of size $T$ , the complexity is $O(T/R\cdot W\cdot(\log k)^{\Omega})$ , given that its internal model will be retrained $T/R$ times with a potential cost of $O(W\cdot(\log k)^{\Omega})$ .

The Incremental Ensemble, see Figure 2 (right), is an alternative to control the limited memory of the Song and Chissom method by balancing the learning of new behaviors with the memory of the old ones [2, 44, 4, 20]. The Incremental Ensemble is, in fact, a meta-model containing $M$ internal models. It is also controlled by the $W$ and $R$ parameters. At each interval of $R$ observations a new model $\mathcal{M}_{t}$ is built with the last $W$ observations. $\mathcal{M}_{t}$ is then appended to the ensemble while the oldest model $\mathcal{M}_{t-M}$ is discarded.

The generalization of Song and Chissom’s method as well as the Incremental Ensemble using FTS as internal models can be seen in Figure 2. As in the Song and Chissom’s case, the major drawback of the incremental ensemble techniques is its computational cost.

4 The Non-Stationary Fuzzy Time Series method

The proposed Non-Stationary Fuzzy Time Series method extends the concepts of the Conventional FTS method [24] to incorporate Non-Stationary Fuzzy Sets presented by Garibaldi et al. [16]. In the proposed forecasting procedure, the mean and variance of the residuals are used to translate and scale the sets to adapt them to the changes occurred in the data after the training process. The parameters of the fuzzy sets change towards canceling the mean of the residuals. If the variance of residuals is high, the range of the fuzzy sets should be reduced to reduce granularity. Moreover, if the bounds of U change, the sets are adapted to respond to this change as well.

By convention, it is assumed that $Y\in\mathbb{R}$ is a real valuated univariate time series and $y(t)\in Y$ a single data point indexed by time index $t\in\mathbb{N}$ . It is also assumed that all fuzzy sets have a membership function, $\mu$ , with triangular shapes, as defined in equation (3), where $x\in Y$ is the input value and $l,c,u\in Y$ are, respectively, the lower, midpoint and the upper basis of the triangle.

\mu(x,l,c,u)=\left\{\begin{array}[]{lcr}0&if&x<l\;or\;x>u\\ \frac{x-l}{c-l}&if&l\leq x\leq c\\ \frac{u-x}{u-c}&if&c\leq x\leq u\end{array}\right.

(3)

Furthermore, all NSFS have a perturbation function $\pi$ , defined in (5), over the parameter set $\{l,c,u\}$ of $\mu$ , where $\delta\in\mathbb{R}$ is the displacement and $\rho\in\mathbb{R}$ is the scale increment. $\pi$ shifts the triangular function across the domain of $Y$ and scales it by moving the triangle bounds with respect to the center as shown in Figure 3. Thus, the NSMF function is given by:

\mu(x,\pi(l,c,u,\delta,\rho))

(4)

where,

\pi(l,c,u,\delta,\rho)=\left\{\frac{\rho}{2}-(l+\delta)\;,\;c+\delta\;,\;\frac{\rho}{2}+(u+\delta)\right\}

(5)

The proposed method, depicted in Figure 4, consists of training, parameter adaptation and forecasting procedures.

In the training procedure, the $U$ partitioning creates static fuzzy sets. In a forecasting model, a normal distribution of the residuals suggests that its predictive ability is consistent over the entire data range. Therefore the objective of the training procedure is to generate a model that captures all the information in the data, leaving a residual $\mathcal{E}\sim N(0,1)$ . This is done by defining the $l$ , $c$ and $u$ parameters for each one of the $k$ fuzzy sets and using them to extract temporal patterns from the data. The $w$ most recent residuals of the model are stored in the set $\mathcal{E}$ which consists of the differences between the model forecasts and the new data collected in that window.

Given a non-stationary scenario, it is unlikely that a model with predefined and fixed parameters could manage to keep its residuals normally distributed. In the parameter adaption procedure, the mean and variance of the residuals are monitored and used to adapt the MFs. The NSFS is perturbed in order to keep the residuals as close to $\mathcal{E}\sim N(0,1)$ as possible. The goal of this procedure, is to find the best values for the parameters $\delta$ and $\rho$ of each NSFS $\pi$ function in order to adapt them to the changes in the data.

The forecasting procedure finds the rules that match a given numerical input and use them to compute a numerical forecast using the non-stationary fuzzy sets perturbed by the parameters $\delta$ and $\rho$ .

It can be noticed that the rationale behind the proposed method is to keep the residuals normally distributed. In this context, the main technical contribution of this paper is to introduce the adaptation procedure which will keep $\mathcal{E}\sim N(0,1)$ . This procedure is detailed in section 4.2. For completeness of presentation, the training and forescasting procedures defined in [24] are reproduced in subsections 4.1 and 4.3, respectively.

4.1 Training Procedure

Given the training data, $Y$ , the number of partitions, $k$ , and the length of the residuals window, $w$ :

Step 1

Defining the universe of discourse, $U$ :

$U=[lb,ub]$ (6)

where, $lb=\min(Y)-\min(Y)\times 0.2$ and $ub=\max(Y)+\max(Y)\times 0.2$ .

Notice that the data bounds are extrapolated to compensate for a possible underestimation of the bounds in the training set. The value $0.2$ is a typical value, but can be modified according to the problem.
Step 2

$U$ Partitioning: Define the midpoints $c_{i},i=0,...,k-1$ of the initial fuzzy sets;

$c_{i}=lb+i\times\frac{(ub-lb)}{(k-1)}$ (7)

Step 3

Define the linguistic variable $\tilde{A}$ : Create $k$ overlapping fuzzy sets $A_{i}$ , with triangular MF $\mu_{A_{i}}$ as defined in equation (8).

\mu_{A_{i}}(x)=\left\{\begin{array}[]{lcr}0&if&x<l\;or\;x>u\\ \frac{x-l_{i}}{c_{i}-l_{i}}&if&l_{i}\leq x\leq c_{i}\\ \frac{u_{i}-x}{u_{i}-c_{i}}&if&c_{i}\leq x\leq u_{i}\end{array}\right.

(8)

where $l_{i}=c_{i-1}$ and $u_{i}=c_{i+1}$ .

Each fuzzy set $A_{i}\in\tilde{A}$ is a linguistic term of the linguistic variable $\tilde{A}$ . Once the fuzzy sets are created, a function $\pi_{i}$ , as defined in equation (5), will be associated with each fuzzy set in order to transform it into a NSFS, initialized with $\delta=0$ and $\rho=0$ .

The number of sets $k$ defines the number of states and consequently the number of state transitions that the model can represent. The more complex the time series the greater the number of $k$ should be. One should, however, be careful to not overestimate $k$ since it may cause overfitting and make the model unnecessarily big.

Step 4

Fuzzification: Transform the original time series data $Y=\{y(0),y(1),\ldots,y(T)\}$ into a fuzzy time series $F=\{f(0),f(1),\ldots,f(T)\}$ , where $f(t)$ is defined as:

$f(t)=\{\mu_{A_{0}}(y(t)),\mu_{A_{1}}(y(t)),...,\mu_{A_{k-1}}(y(t))\}$ (9)
Step 5

Generate the temporal patterns: The fuzzy temporal patterns have format $A_{l}\rightarrow A_{r}$ , where:

The precedent (or Left Hand Side - LHS) is:

$A_{l}=\underset{A_{i}}{\arg\max}\left(\mu_{A_{i}}(y(t-1))\right)$ (10)

And the consequent (or Right Hand Side - RHS) is:

$A_{r}=\underset{A_{i}}{\arg\max}\left(\mu_{A_{i}}(y(t))\right)$ (11)
Step 6

Generate the rule base: Select all temporal patterns with the same precedent and group their consequent sets creating a rule with the format $A_{l}\rightarrow A_{a},A_{b},...$ . Thus the rule $RHS$ can be understood as the set of possibilities which may happen on time $t+1$ (the consequent) when a certain set $A_{l}$ is identified on time $t$ (the precedent).
Step 7

Compute the residuals $\epsilon$ : Invoke the Forecasting Procedure defined on Section 4.3 using the given training data, $Y=\{y(0),...,y(T)\}$ , as the input and forecast the last $w$ items to calculate the set of residuals $\mathcal{E}$ defined as:

$\mathcal{E}=\{\epsilon(t-w),\epsilon(t-(w-1)),...,\epsilon(t)\}$ (12)

where $\epsilon(t)=y(t)-\hat{y}(t)$ and $\hat{y}(t)$ is the forecast produced by the model.

4.2 Parameter Adaption Procedure

Given an input value $y(t)$ , the last forecast value $\hat{y}(t)$ , the length of the residuals vector $w$ , the residuals set $\mathcal{E}$ and the linguistic variable $\tilde{A}$ , do:

Step 1

Find the displacements of $y(t)$ on $U$ : If $y(t)$ is below the lower bound of $U$ store the difference in $d_{l}=lb-y(t)$ , otherwise $d_{l}=0$ . If $y(t)$ is above the upper bound of $U$ , store the difference in $d_{u}=y(t)-ub$ , otherwise $d_{u}=0$ . Then compute the displacement range $r$ as the sum of the UoD displacements $d_{u}$ and $d_{l}$ , according with (13), and the displacement midpoint $mp_{r}$ as $r$ divided by 2, according with (14)

$r=d_{u}-d_{l}$ (13)

$mp_{r}=r/2$ (14)
Step 2

Compute the mean $\overline{\mathcal{E}}$ and variance $\sigma_{\mathcal{E}}$ of the set $\mathcal{E}$ : The residual mean $\overline{\mathcal{E}}$ indicates a bias, a shift on the accuracy of the trained model that must be corrected. The variance, $\sigma_{\mathcal{E}}$ , represents the shift on the trained model variance, as a reflection to a change in the test data. These values will be used to adjust the position and length of the fuzzy sets.

Step 3

Compute the displacements, $\delta_{i}$ : The displacement is computed for each fuzzy set $A_{i}$ in order to equally move the $k$ fuzzy sets by the mean shift $\overline{\mathcal{E}}$ , one fraction of the displacement range $[d_{l},d_{u}]$ and proportionally to the expansion of the variance in the interval $[-\sigma_{\mathcal{E}},\sigma_{\mathcal{E}}]$ :

\delta_{i}=\overline{\mathcal{E}}+\left(i\frac{r}{k-1}-mp_{r}\right)+\left(i\frac{2\sigma_{\mathcal{E}}}{k-1}-\sigma_{\mathcal{E}}\right)

(15)

The displacements $\delta_{i}$ , for $i=0\ldots k-1$ , are equally distributed in the interval $[\overline{\mathcal{E}}-d_{l}-\sigma_{\mathcal{E}}\;,\;\overline{\mathcal{E}}+d_{u}+\sigma_{\mathcal{E}}]$ and indicate the new position of fuzzy set $A_{i}$ , given the deviations from the known UoD, whose range is $r$ and its midpoint $mp_{r}$ , and the error signal with the mean $\overline{\mathcal{E}}$ and variance $\sigma_{\mathcal{E}}$ . Indeed, while the term $i(r/(k-1))-mp_{r}$ is used to translate the midpoint of $A_{i}$ by a fraction of $r$ centered in $mp_{r}$ , the term $i(2\sigma_{\mathcal{E}}/(k-1))-\sigma_{\mathcal{E}}$ is used to offset the scaling of the fuzzy set bounds by a fraction of $\sigma_{\mathcal{E}}$ centered in $0$ .

Step 4

Compute the scaling factor $\rho_{i}$ , Eq. (16): For each fuzzy set $A_{i}\in\tilde{A}$ the scale increment is empirically calculated as the distance between the displacements $\delta_{i}$ , in order to adjust the fuzzy set lengths proportionally to their displacements, avoiding discontinuities between the sets (intervals on $U$ not covered by any fuzzy set) by setting $l_{i}=c_{i-1}$ and $u_{i}=c_{i+1}$ :

$\rho_{i}=|\delta_{i-1}-\delta_{i+1}|$ (16)
Step 5

Update $\epsilon$ : Get the last forecast value $\hat{y}(t+1)$ and the last known value $y(t+1)\in Y$ . Compute the error term $\epsilon(t+1)=y(t+1)-\hat{y}(t+1)$ , push it on to the residuals set $\mathcal{E}$ and delete the oldest value. That is:

$\mathcal{E}=\mathcal{E}\setminus\epsilon(t-w)\cup\epsilon(t+1)\$ (17)

4.3 Forecasting Procedure

Given an input value $y(t)$ , the linguistic variable $\tilde{A}$ , the inferred rule set and the perturbation parameters $\delta_{i}$ and $\rho_{i}$ for each fuzzy set $A_{i}\in\tilde{A}$ :

Step 1

Fuzzification: Compute the membership grade $\mu_{A_{i}}$ for each non-stationary fuzzy set $A_{i}$ , such that $\mu_{A_{i}}=\mu_{A_{i}}(y(t),\pi(l_{i},c_{i},u_{i},\delta_{i},\rho_{i}))$ ;
Step 2

Rule matching: Build a set, $\mathcal{S}$ with all the rules $A_{j}\rightarrow RHS_{j}$ where $\mu_{A_{j}}(y(t))>0$ , that is:

$\mathcal{S}=\{A_{j}\rightarrow RHS_{j}|\mu_{A_{j}}(y(t))>0\}$ (18)
Step 3

Defuzzification: Compute the forecast $\hat{y}(t+1)$ according to Equation (19) as the weighted sum of the rule mid-points, $mp$ , by their membership grades $\mu_{j}$ for each selected rule $j$ :

$\hat{y}(t+1)=\sum_{A_{j}\rightarrow RHS_{j}\in\mathcal{S}}\mu_{A_{j}}(y(t))\cdot mp(RHS_{j})$ (19)

where

$mp(RHS)=\dfrac{\sum_{A_{i}\in RHS}c_{A_{i}}}{|RHS|}$ (20)

4.4 Method Discussion

The NSFS has the ability to dynamically adapt membership functions as the statistical properties of the time series vary. The NSFTS method tries to detect these changes by using the displacements of the input data $y(t)$ in relation to the known $U$ and the residuals statistics to identify bias and variance shifts.

The major assumption of the method is that adjusting the fuzzy sets, without modifying the model structure represented by the learned rules, is enough to adapt the model to the new behavior of the time series. Such procedures, described in subsection 4.2, given an input of size $T$ , memory window length $W$ , refreshing interval $R$ and $k$ fuzzy sets, have a time complexity of $O(T/R\cdot W\cdot\log k)$ , while a retraining process would take $O(T/R\cdot W\cdot(\log k)^{\Omega})$ , as discussed in Section 3. Hence, this approach becomes much cheaper computationally, when compared to retraining a new model from scratch.

The $\pi$ function entails an additive approach for the translation and scaling of the parameters based on grid partitioning of the UoD. The procedure calculates the translation increment proportionally to the displacement of the input value in relation to the known $U$ and the variance of the residuals. The scaling increment works as a heuristic to keep the grid aspect of the fuzzy sets after the translation, connecting the lower and upper bounds with the centers of adjacent fuzzy sets.

Figure 5 depicts the forecasting method applied to a test sample with significant drift from the training data extracted from SP500 time series, later explained in Section 5.

In Figure 5a the partitioning of the UoD is shown with 15 partitions. The generated forecasts are presented in Figure 5b and their residuals in Figure 5c. The perturbations on the NSFS, in response to the previous residuals, are represented in Figure 5d, where the same fuzzy sets represented in Figure 5a are now colored over the vertical axis. It is possible to see that these fuzzy sets move, sometimes expanding sometimes contracting, in order to fit to the data. As the drift increases, the displacement and scaling of the fuzzy sets also grow.

The proposed model is a case of active learning, in which the learning algorithm is able to interactively obtain, from different sources of information, the necessary inputs for the generation or improvement of its learning. Basically, it uses residual error values to fit a predefined model. The version here presented focuses on highlighting its adaptability to changes observed over time. It uses first-order fuzzy rules, that is, considering only a previous observation, and is applied to forecasting problems in univariate time series. However, its time variant model characteristics can be expanded or adapted to other models for better prediction values. For instance, the model can be expanded to comprehend a high-order fuzzy rule generation. It also can be adapted to fit parameters of other FTS models, such as PWFTS, used as benchmark in this work.

5 Computational Experiments

5.1 Experimental Design

Different datasets were chosen for model validation. The datasets consisted of four stock market indices (Dow Jones, NASDAQ, SP500 and TAIEX), three FOREX pairs (EUR-USD, EUR-GBP, GBP-USD), two cryptocoins exchange rates (Bitcoin-USD and Ethereum-USD) illustrated in A (Fig. 6) and eight synthetic time series with concept drifts, see B (Fig. 7).

The market indexes data sets contain the daily averaged index by business day, such that the Dow Jones Industrial Average (Dow Jones)¹¹1https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC. Accessed in 11/11/2019 is sampled from 1985 to 2017 time window, the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX)²²2https://www.twse.com.tw/en/page/products/indices/series.html. Accessed in 11/11/2019 is sampled from 1995 to 2014, the National Association of Securities Dealers Automated Quotations - Composite Index (NASDAQ ÎXIC)³³3https://www.nasdaq.com/market-activity/index/ixic. Accessed in 12/11/2019 is sampled from 2000 to 2016 and the SP500 - Standard & Poor’s 500⁴⁴4https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC. Accessed in 12/11/2019 is sampled from 1950 to 2017. The FOREX data sets contain the daily averaged quotations, by business day, from 2016 to 2018, which pairs are the US Dollar to Euro (USD-EUR), Euro to Great British Pound (EUR-GBP) and Great British Pound to US Dollar (GBP-USD). The cryptocoin datasets contain the daily quotations, in US Dollar, of the Bitcoin (BTC-USD) and Ethereum (ETC-USD). The synthetic data aims to represent different types of concept drifts.

The Augmented Dickey-Fuller (ADF) test was used to check which time series are non-stationary. The Levene’s test was used to test whether the series are heteroskedastic or not. The detailed results of these tests are shown in C and D, respectively. In summary, except for the data sets, “Stationary Signal”, “Stationary Signal with blip” and “Sudden variance” (see B (Fig. 7)) all the benchmarks were shown to be non-stationary and only the “Stationary Signal with blip” homogeneity of variances with a confidence level of 95%.

The standard accuracy metrics used to evaluate point forecasting methods are the Root Mean Squared Error (RMSE) (21), Mean Absolute Percentage Error (MAPE) (22) and Theil’s U statistic (U) (23) where $y$ are the target values, $\hat{y}$ are the forecast values and $n$ the sample size. These metrics are commonly used in evaluating forecasting models [18].

RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}}

(21)

MAPE=\frac{1}{n}\sum_{i=1}^{n}\Big{|}\frac{y_{i}-\hat{y}}{y_{i}}\Big{|}

(22)

U=\frac{\sqrt{\sum_{i=1}^{n}(y_{i}-\hat{y})^{2}}}{\sqrt{\sum_{i=1}^{n}y_{i}^{2}}+\sqrt{\sum_{i=1}^{n}\hat{y}_{i}^{2}}}

(23)

The proposed NSFTS was tested against the Time Variant [39] and the Incremental Ensemble approaches, both using the PWFTS method [28] available in [29] as internal method. As can be seen in [32], the PWFTS outperformed a wide number of forecasting methods ranging from classic statistical ones such as ARIMA [45] to more recent ones such as the k-Nearest Neighbors with Kernel Density Estimation [46]. The parameters of all methods are presented in Table 2, and they were defined through grid search.

Method	Parameter	Value
All	k	35
All	$\Omega$	1
Song and Chissom & Incremental Ensemble	W	100
Song and Chissom & Incremental Ensemble	R	10
Incremental Ensemble	M	2

Table 2: Benchmarking parameters

The grid partitioning scheme was used for the initial generation of fuzzy sets, where the best number of partitions in the range $[5,100]$ was selected for each data set. The high order methods were tested with orders 2 and 3.

5.2 Results

The average RMSE, MAPE and U statistic results by method and data set are presented in Table 3. It can be seen that the NSFTS is superior or, at least, not worse than the other methods in all the selected real world benchmarks.

From the results on the artificial data sets, it can be seen that the NSFTS has more difficulties than the other methods in handling sudden changes in the series distribution mean. This behavior is expected from the NSFTS once it is an adaptive method and therefore presents a delay to respond to changes. On the other hand, it handles with relative ease incremental changes in the mean. Observing the Theil’s U statistic which is insensitive to the range of values of the series, one can see that changes in the variance alone do not seem to affect the NSFTS performance by much.

These observations explain the good performance of the NSFTS on the economic series. As one can see in A, these series present a gradual, though significant, variation of the mean along with different types of changes in variance. Apparently, the superior performance of the NSFTS when the mean varies gradually gave it the edge over the other methods.

Table 3: Results of the metrics RMSE, MAPE and U by approach and dataset

Dataset	RMSE			MAPE			U
Dataset	T. Variant	I. Ensemble	NSFTS	T. Variant	I. Ensemble	NSFTS	T. Variant	I. Ensemble	NSFTS
Stationary Signal	0.07	0.07	0.07	1.13	1.16	1.13	0.99	1.01	1.01
Stationary signal with blip	0.16	0.17	0.14	1.29	1.27	1.22	1.14	1.24	1.00
Sudden variance	0.12	0.12	0.12	1.80	1.81	1.73	0.98	1.00	0.98
Sudden mean	0.25	0.55	0.79	2.20	6.00	19.89	1.72	3.85	5.66
Sudden mean and variance	0.59	0.89	0.89	3.82	7.52	19.58	1.22	1.90	1.96
Incremental mean	0.39	7.81	0.11	1.02	14.92	0.69	3.45	71.08	1.04
Incremental variance	51.89	50.51	60.59	151.43	149.56	241.69	0.78	0.78	0.96
Incremental mean and variance	1.29	2.37	1.65	5.06	8.48	5.44	1.03	1.51	1.09
TAIEX	145.82	1130.86	116.82	1.27	9.60	1.34	1.52	11.90	1.24
SP500	9.29	61.22	7.86	0.64	2.66	0.57	1.16	7.73	1.01
NASDAQ	35.09	214.74	36.75	0.90	4.49	1.06	1.25	7.71	1.32
DowJones	71.88	519.55	63.26	0.64	2.80	0.60	1.13	8.25	1.02
BTC-USD	465.72	1775.53	180.90	4.97	32.81	3.40	2.96	11.50	1.19
ETH-USD	44.54	222.21	24.16	7.35	41.47	4.06	2.06	10.84	1.25
EUR-USD	0.01	0.04	0.01	0.43	0.98	0.41	1.22	6.75	1.13
EUR-GBP	0.00	0.01	0.00	0.37	0.48	0.32	1.24	3.61	1.08
GBP-USD	0.01	0.07	0.01	0.41	1.18	0.42	1.23	9.62	1.26

To assess the overall performance of the methods with respect to the RMSE we ran a Friedman Aligned Ranks (F-test), a non-parametric test for the equality of the means was used with $\alpha=0.05$ , where $H_{0}$ indicates that there is no significant difference between the means and $H_{1}$ indicates that there is at least one significant difference between the means. The F-value result was $13.1994$ with a p-value of $0.0013$ . Thus, $H_{0}$ was rejected at 5% confidence level.

A one-versus-all Finner paired post-hoc test was employed to check the method equivalence with NSFTS as control method, where $H_{0}$ indicates that there is no significant difference between the means and $H_{1}$ indicates that there is a significant difference between the means. The results are presented in Table 4 which shows that NSFTS is not equivalent to the Incremental Ensemble method and equivalent to the time variant FTS model, considering a 95% confidence level.

It is important to remind that the NSFTS is computationally cheaper than the Time Variant method since it does not require retraining. Therefore, even though they present equivalent RMSE performance, the NSFTS still has the computational edge.

Table 4: Post hoc multiple comparisons with the NSFTS as control method

Comparison

Z-value

P-value

Adjusted p-value

Result

NSFTS vs

Incremental Ensemble

3.680062

0.000233

0.000466

H_{0}

Rejected

NSFTS vs

Time Variant

0.144203

0.885340

H_{0}

Accepted

6 Conclusion

This paper proposed a Non-Stationary Fuzzy Time Series (NSFTS) method that is able to dynamically adapt its fuzzy sets to reflect the changes in the underlying stochastic processes based on the residual errors. The proposed approach enables the model to be trained only once and remain useful long after.

The parameter adaptation procedure developed here can be integrated into other FTS methods, extending them to deal with forecasting in non-stationary environments.

The method was tested with several non-stationary and heteroskedastic data sets consisting of four market indices, three FOREX pairs, two cryptocoin exchange rates and eight synthetic time-series that present different combinations of concept-drifts.

The main feature delivered by the proposed method is the capability of capturing a wide spectrum of unconditional heteroskedastic effects and time series trends by the variation of several parameters of its internal model. This is done without data pre-processing, without need of retraining and without changing the symbolic structure in the learned rules. In order to contribute to the replication of all the results in the paper, we provide full results and all source codes for this article in Github and in the MINDS Laboratory website. The link is https://github.com/PYFTS/NSFTS.

Acknowledgements

This work was partially financed by the Foundation for Research of the State of Minas Gerais (Fundação de Amparo à Pesquisa do Estado de Minas Gerais - FAPEMIG) and by the National Council for Scientific and Technological Development (CNPq), Brazil, Grants no. 405911/2017-3; no. 169392/2017-1; and no. 306850/2016-8.

References

[1] L. Cohen, G. Avrahami-Bakish, M. Last, A. Kandel, O. Kipersztok, Real-time data mining of non-stationary data streams from sensor networks, Information Fusion 9 (3) (2008) 344–353.
[2] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, M. Woźniak, Ensemble learning for data stream analysis: A survey, Information Fusion 37 (2017) 132–156.
[3] J. Maia, C. A. Severiano Junior, F. G. Guimarães, C. L. d. Castro, A. P. Lemos, J. C. F. Galindo, M. W. Cohen, Evolving clustering algorithm based on mixture of typicalities for stream data mining, Future Generation Computer Systems 106 (2020) 672–684.
[4] M. Assaad, R. Boné, H. Cardot, A new boosting algorithm for improved time-series forecasting with recurrent neural networks, Information Fusion 9 (1) (2008) 41 – 55, special Issue on Applications of Ensemble Methods. doi:https://doi.org/10.1016/j.inffus.2006.10.009.
[5] X. Qiu, P. N. Suganthan, G. A. Amaratunga, Fusion of multiple indicators with ensemble incremental learning techniques for stock price forecasting, Journal of Banking and Financial Technology 3 (1) (2019) 33–42.
[6] T. Zhao, Q. J. Wang, A. Schepen, M. Griffiths, Ensemble forecasting of monthly and seasonal reference crop evapotranspiration based on global climate model outputs, Agricultural and Forest Meteorology 264 (2019) 114 – 124. doi:https://doi.org/10.1016/j.agrformet.2018.10.001.
[7] G. Ditzler, M. Roveri, C. Alippi, R. Polikar, Learning in Nonstationary Environments: A Survey, IEEE Computational Intelligence Magazine 10 (4) (2015) 12–25. doi:10.1109/MCI.2015.2471196.
[8] Q. Song, B. S. Chissom, Fuzzy time series and its models, Fuzzy Sets and Systems 54 (3) (1993) 269–277.
[9] M. H. Lee, H. Javedani, et al., A weighted fuzzy integrated time series for forecasting tourist arrivals, in: International Conference on Informatics Engineering and Information Science, Springer, 2011, pp. 206–217.
[10] Y. H. Chen, W.-C. Hong, W. Shen, N. N. Huang, Electric load forecasting based on a least squares support vector machine with fuzzy time series and global harmony search algorithm, Energies 9 (2) (2016) 70.
[11] H. J. Sadaei, F. G. Guimarães, C. J. da Silva, M. H. Lee, T. Eslami, Short-term load forecasting method based on fuzzy time series, seasonality and long memory process, International Journal of Approximate Reasoning 83 (2017) 196–217.
[12] C. A. S. Junior, P. C. L. Silva, H. J. Sadaei, F. G. Guimarães, Very short-term solar forecasting using fuzzy time series, in: 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2017, pp. 1–6. doi:10.1109/FUZZ-IEEE.2017.8015732.
[13] S. M. Chen, C. D. Chen, TAIEX forecasting based on fuzzy time series and fuzzy variation groups, IEEE Transactions on Fuzzy Systems 19 (1) (2011) 1–12. doi:10.1109/TFUZZ.2010.2073712.
[14] P. C. L. Silva, H. J. Sadaei, F. G. Guimarães, Interval forecasting with fuzzy time series, in: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 2016, pp. 1–8. doi:10.1109/SSCI.2016.7850010.
[15] F. M. Talarposhti, H. J. Sadaei, R. Enayatifar, F. G. Guimarães, M. Mahmud, T. Eslami, Stock market forecasting by using a hybrid model of exponential fuzzy time series, International Journal of Approximate Reasoning 70 (2016) 79–98.
[16] J. M. Garibaldi, M. Jaroszewski, S. Musikasuwan, Nonstationary fuzzy sets, IEEE Transactions on Fuzzy Systems 16 (4) (2008) 1072–1086.
[17] M. A. Alves, P. C. d. L. Silva, C. A. Severiano Junior, G. L. Vieira, F. G. Guimarães, H. J. Sadaei, An extension of nonstationary fuzzy sets to heteroskedastic fuzzy time series, in: 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Springer LNCS Series, 2018, pp. 591–596.
[18] M. Bose, K. Mali, Designing fuzzy time series forecasting models: A survey, International Journal of Approximate Reasoning 111 (2019) 78–99.
[19] A. Tsymbal, The problem of concept drift: definitions and related work, Computer Science Department, Trinity College Dublin 106 (2).
[20] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM computing surveys (CSUR) 46 (4) (2014) 44.
[21] I. Kim, S.-R. Lee, A fuzzy time series prediction method based on consecutive values, in: FUZZ-IEEE’99. 1999 IEEE International Fuzzy Systems. Conference Proceedings (Cat. No.99CH36315), Vol. 2, 1999, pp. 703–707 vol.2. doi:10.1109/FUZZY.1999.793034.
[22] N. E. Huang, M.-L. Wu, W. Qu, S. R. Long, S. S. Shen, Applications of hilbert–huang transform to non-stationary financial time series analysis, Applied stochastic models in business and industry 19 (3) (2003) 245–268.
[23] Y. Liu, B. Wang, H. Zhan, Y. Fan, Y. Zha, Y. Hao, Simulation of nonstationary spring discharge using time series models, Water Resources Management 31 (15) (2017) 4875–4890.
[24] S.-M. Chen, Forecasting enrollments based on fuzzy time series, Fuzzy Sets and Systems 81 (3) (1996) 311–319.
[25] H.-K. Yu, Weighted fuzzy time series models for TAIEX forecasting, Physica A: Statistical Mechanics and its Applications 349 (3) (2005) 609–624. doi:10.1016/j.physa.2004.11.006.
[26] C.-H. Cheng, T. L. Chen, H. J. Teoh, C. H. Chiang, Fuzzy time-series based on adaptive expectation model for TAIEX forecasting, Expert Systems with Applications 34 (2008) 1126–1132. doi:10.1016/j.eswa.2006.12.021.
[27] H. J. Sadaei, R. Enayatifar, A. H. Abdullah, A. Gani, Short-term load forecasting using a hybrid model with a refined exponentially weighted fuzzy time series and an improved harmony search, International Journal of Electrical Power & Energy Systems 62 (2014) 118–129.
[28] P. C. L. Silva, H. J. Sadaei, R. Ballini, F. G. Guimaraes, Probabilistic Forecasting With Fuzzy Time Series, IEEE Transactions on Fuzzy Systems (2019) 1–1doi:10.1109/TFUZZ.2019.2922152.
[29] P. C. d. L. Silva, P. O. Lucas, H. J. Sadaei, F. G. Guimarães, pyFTS: Fuzzy Time Series for Python (2018). doi:10.5281/zenodo.597359.
URL https://doi.org/10.5281/zenodo.597359
[30] P. C. d. L. Silva, M. A. Alves, C. A. Severiano Jr, G. L. Vieira, F. G. Guimarães, H. J. Sadaei, Probabilistic Forecasting with Seasonal Ensemble Fuzzy Time-Series, in: XIII Brazilian Congress on Computational Intelligence, Rio de Janeiro, 2017. doi:10.21528/CBIC2017-54.
[31] P. C. L. Silva, P. O. Lucas, F. G. Guimarães, A Distributed Algorithm for Scalable Fuzzy Time Series, in: R. Miani, L. Camargos, B. Zarpelão, P. R. Rosas, E. (Eds.), Proceedings of 14th International Conference on Green, Pervasive, and Cloud Computing (GPC 2019), Springer Nature Switzerland, Uberlândia, 2019, pp. 1–15. doi:10.1007/978-3-030-19223-5{\_}4.
[32] P. C. d. L. e. Silva, Scalable Models For Probabilistic Forecasting With Fuzzy Time Series, Ph.D. thesis, Universidade Federal de Minas Gerais (9 2019). doi:10.5281/zenodo.3374641.
URL https://doi.org/10.5281/zenodo.3374641
[33] P. C. d. L. e Silva, C. A. Severiano Jr, M. A. Alves, M. W. Cohen, F. G. Guimarães, A new granular approach for multivariate forecasting, in: Latin American Workshop on Computational Neuroscience, Springer, 2019, pp. 41–58.
[34] P. C. L. Silva, P. d. O. e Lucas, H. J. Sadaei, F. G. Guimarães, Distributed evolutionary hyperparameter optimization for fuzzy time series, IEEE Transactions on Network and Service Management (2020) 1–1.
[35] R. Efendi, Z. Ismail, M. M. Deris, Improved weight fuzzy time series as used in the exchange rates forecasting of us dollar to ringgit malaysia, International Journal of Computational Intelligence and Applications 12 (01) (2013) 1350005.
[36] C.-H. Cheng, T.-L. Chen, C.-H. Chiang, Trend-weighted fuzzy time-series model for taiex forecasting, in: International Conference on Neural Information Processing, Springer, 2006, pp. 469–477.
[37] J.-R. Hwang, S.-M. Chen, C.-H. Lee, Handling forecasting problems using fuzzy time series, Fuzzy sets and systems 100 (1-3) (1998) 217–228.
[38] J. M. Garibaldi, T. Ozen, Uncertain fuzzy reasoning: A case study in modelling expert decision making, IEEE Transactions on Fuzzy Systems 15 (1) (2007) 16–30.
[39] Q. Song, B. S. Chissom, Forecasting enrollments with fuzzy time series—part ii, Fuzzy sets and systems 62 (1) (1994) 1–8.
[40] H.-T. Liu, M.-L. Wei, An improved fuzzy forecasting method for seasonal time series, Expert Systems with Applications 37 (9) (2010) 6310–6318.
[41] S. R. Singh, A simple time variant method for fuzzy time series forecasting, Cybernetics and Systems: An International Journal 38 (3) (2007) 305–321.
[42] T. A. Jilani, S. M. A. Burney, A refined fuzzy time series model for stock market forecasting, Physica A: Statistical Mechanics and its Applications 387 (12) (2008) 2857–2862.
[43] T. Vovan, An improved fuzzy time series forecasting model using variations of data, Fuzzy Optimization and Decision Making 18 (2) (2018) 151–173.
[44] J. R. Bertini Junior, M. d. C. Nicoletti, An iterative boosting-based ensemble for streaming data classification, Information Fusion 45 (2019) 66–78.
[45] D. Asteriou, S. G. Hall, Arima models and the box–jenkins methodology, Applied Econometrics 2 (2) (2011) 265–286.
[46] Y. Zhang, J. Wang, K-nearest neighbors and a kernel density estimator for gefcom2014 probabilistic wind power forecasting, International Journal of forecasting 32 (3) (2016) 1074–1080.

Appendix A Appendix

A: Stock market indices

Appendix B Appendix

B: Synthetic data sets

Appendix C Appendix

In order to evaluate the stationarity of the presented data sets, the Augmented Dickey-Fuller (ADF) test for unit root was employed considering a confidence level $\alpha=0.05$ , where $H_{0}$ indicates that the time series have a unit root and it is non-stationary and $H_{1}$ indicates that time series does not have a unit root and it is stationary. The Augmented Dickey-Fuller results for unit root are shown below.

Table 5: Stationarity evalution

Dataset	Augmented Dickey-Fuller
Dataset	Statistic	p-value	$H_{0}$ result
Stationary Signal	-7.427114	6.504708e-11	Rejected
Stationary signal with blip	-7.497758	4.334045e-11	Rejected
Sudden variance	-7.746345	1.029561e-11	Rejected
Sudden mean	-2.112067	2.396902e-01	Accepted
Sudden mean and variance	-1.165176	6.883938e-01	Accepted
Incremental mean	3.286850	1.000000e+00	Accepted
Incremental variance	-24.746787	0.000000e+00	Rejected
Incremental mean and variance	-2.217183	2.000681e-01	Accepted
TAIEX	-2.491767	1.174904e-01	Accepted
SP500	-0.943446	7.733287e-01	Accepted
NASDAQ	0.476224	9.841323e-01	Accepted
DowJones	-0.800893	8.188597e-01	Accepted
BTC-USD	-1.206100	6.709662e-01	Accepted
ETH-USD	-1.852677	3.546403e-01	Accepted
EUR-USD	-1.845986	3.578816e-01	Accepted
EUR-GBP	-1.845986	3.578816e-01	Accepted
GBP-USD	-1.131750	7.022457e-01	Accepted

Appendix D Appendix

To check the homoskedasticity the Levene’s test was employed, which checks for homogeneity of variances, with confidence level $\alpha=0.05$ , where $H_{0}$ indicates that the sub-samples variances of the time series are all equal and $H_{1}$ indicates that at least one variance of the time series sub-samples is different. the Levene’s results for homogeneity of variances are shown below.

Table 6: Homogeneity of variances evaluation

Dataset	Levene’s test
Dataset	Statistic	p-value	$H_{0}$ result
Stationary Signal	10.148665	1.466392e-03	Rejected
Stationary signal with blip	1.935753	1.642857e-01	Accepted
Sudden variance	25.163718	5.733279e-07	Rejected
Sudden mean	6.989502	8.263198e-03	Rejected
Sudden mean and variance	173.802832	4.097776e-38	Rejected
Incremental mean	2954.661000	0.000000e+00	Rejected
Incremental variance	520.174358	1.597361e-102	Rejected
Incremental mean and variance	521.736508	9.142922e-103	Rejected
TAIEX	62.530885	3.366934e-15	Rejected
SP500	64.954050	1.003282e-15	Rejected
NASDAQ	1851.123985	0.000000e+00	Rejected
DowJones	163.413938	1.053157e-36	Rejected
BTC-USD	524.853179	4.352254e-107	Rejected
ETH-USD	666.975156	9.700699e-116	Rejected
EUR-USD	401.104689	6.851098e-86	Rejected
EUR-GBP	401.104689	6.851098e-86	Rejected
GBP-USD	1340.896808	2.812862e-260	Rejected