This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Forecasting in Non-stationary Environments with Fuzzy Time Series

Petrônio Cândido de Lima e Silva [email protected] Carlos Alberto Severiano Junior [email protected] Marcos Antonio Alves [email protected]
Rodrigo Silva
[email protected]
Miri Weiss Cohen [email protected] Frederico Gadelha Guimarães [email protected] [ Machine Intelligence and Data Science (MINDS) Laboratory, Federal University of Minas Gerais, Belo Horizonte, Brazil Federal Institute of Education Science and Technology of Northern Minas Gerais, Januária Campus, Brazil Federal Institute of Education Science and Technology of Minas Gerais, Sabará, Brazil Department of Computer Science, Federal University of Ouro Preto, Ouro Preto, Brazil Department of Software Engineering, Braude College of Engineering, Karmiel, Israel Department of Electrical Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Abstract

In this paper we introduce a Non-Stationary Fuzzy Time Series (NSFTS) method with time varying parameters adapted from the distribution of the data. In this approach, we employ Non-Stationary Fuzzy Sets, in which perturbation functions are used to adapt the membership function parameters in the knowledge base in response to statistical changes in the time series. The proposed method is capable of dynamically adapting its fuzzy sets to reflect the changes in the stochastic process based on the residual errors, without the need to retraining the model. This method can handle non-stationary and heteroskedastic data as well as scenarios with concept-drift. The proposed approach allows the model to be trained only once and remain useful long after while keeping reasonable accuracy. The flexibility of the method by means of computational experiments was tested with eight synthetic non-stationary time series data with several kinds of concept drifts, four real market indices (Dow Jones, NASDAQ, SP500 and TAIEX), three real FOREX pairs (EUR-USD, EUR-GBP, GBP-USD), and two real cryptocoins exchange rates (Bitcoin-USD and Ethereum-USD). As competitor models the Time Variant fuzzy time series and the Incremental Ensemble were used, these are two of the major approaches for handling non-stationary data sets. Non-parametric tests are employed to check the significance of the results. The proposed method shows resilience to concept drift, by adapting parameters of the model, while preserving the symbolic structure of the knowledge base.

keywords:
Time Series Forecasting , Fuzzy Time Series , Non-stationary Environment , Online learning.
journal: Applied Soft Computing, Elsevier

url]https://minds.eng.ufmg.br/

1 Introduction

The expanding utilization of smart sensors, the increasing availability of data storage, and the emergence of big data have led to an increasing amount of data being produced very often in the form of a stream [1, 2, 3]. In many real-world applications, this data is organized in the form of a time series. In a time series forecasting problem, the information available for the prediction is limited to the past values of the series [1]. Hence, the temporal relationships which describe the evolution of the series must be deduced exclusively from these values.

Generally, the characteristics of the processes which generate a time series are unknown [4]. In several cases of practical interest, such as stock indices [5] in finance, evapotranspiration in agriculture [6], among others, the variable of interest in the time series is in fact the fusion of many other data sources. Not by accident these kinds of time series tend to show highly non-linear and non-stationary patterns [5, 6].

Many forecasting methods assume that the data is generated from a fixed probability distribution. However, as mentioned before, many time-series related applications deal with non-stationary and heteroskedastic stochastic processes which may arise from phenomena such as: seasonality, periodicity, hardware and machine faults, aging sensors and components, and unexpected events. These changes modify the properties of the data generating process, then changing its underlying probability distribution over time. In such non-stationary environments, any non-adaptive model trained under the false stationarity assumption is deemed to present a progressively increasing error or simply fail at some point [7].

Fuzzy Time Series (FTS) was introduced by Song and Chissom in 1993 [8] to handle with vague and imprecise knowledge in time series data. In FTS, the domain of the variable of interest, called Universe of Discourse (UoD), is divided into sub-domains, and each of them is linked to a fuzzy set. After the construction of these fuzzy sets, temporal patterns of the type IF-THEN are extracted from the training data in order to identify a rule-base able to represent the generating function of the time series.

FTS forecasting methods have become attractive due to their simplicity, model transparency, forecasting accuracy and computational performance. Some examples of successful applications are found in tourism demand forecasting [9], energy load [10, 11, 12], stock index price predictions [13, 14, 15], and many more. However, when dealing with non-stationary stochastic processes, the values of the time series might go outside the UoD as defined from the training data. Furthermore, the initial setting of the fuzzy sets may become inadequate over time due to the lack of mechanisms that will allow the membership functions to adapt to the varying behavior of the time series [16].

In [8], Song and Chisom presented an approach to induce time-variant Fuzzy Time Series, by retraining the FTS model in a sliding window. Thus, every time a new data point is fed to the model, the FTS has to be retrained from scratch. In many cases, this requirement may render the methods impractical due to the related computational costs.

Recently, in [17], a first attempt was made to avoid retraining in FTS when they are applied to non-stationary environments. In this approach, Non-Stationary Fuzzy Sets (NSFS) [16] were employed to forecast heteroskedastic time series with unconditional variance, i.e., time series where the variance changes through time in a predictable way. Their approach, however, is not able to account for conditional variances and scenarios with concept-drift.

Given the need to produce adaptive models for forecasting in non-stationary environments [18], in this paper, we introduce a Non-Stationary Fuzzy Times Series (NSFTS) method with time varying parameters adapted from the data distribution.

In the proposed approach, the FTS is also built on Non-Stationary Fuzzy Sets [16]. Based on the residual errors, different perturbation mechanisms adapt the membership functions in response to statistical changes in the time series. Thus, the fuzzy sets will reflect changes in the stochastic data generating process without model retraining. Differently from [17], the proposed mechanisms give to the FTS the ability to handle non-stationary and heteroskedastic data as well as scenarios with concept-drift.

To validate the proposal, different data sets consisting of market indices, FOREX pairs, cryptocurrency exchange rates and synthetic data were used. These data sets were selected because they are all non-stationary and present different types of concept drift. The forecast accuracy of the proposed method was also compared with other methods.

The remainder of this work is organized as follows: in Section 2, the main concepts of Fuzzy Time Series and non-stationary Fuzzy Sets are introduced; in Section 3, time variant methods are discussed; in Section 4, the NSFTS method is presented and Section 5 discusses the results of the computational experiments performed to compare the proposed method against the others methods. Finally, in Section 6, the main findings of the research are synthesized.

2 Preliminaries

2.1 Non-stationary Time Series

Time series data observed in different real-world applications are often non-stationary. Given that a stationary time series is defined in terms of its mean and variance, non-stationarity can be detected if any (or both) of these components vary over time. Thus, in a non-stationary context, if the chosen forecasting model relies on a false assumption of stationarity, there is a significant risk that it will present a degrading performance over time and eventually become obsolete [7].

Some non-stationary time series are consequence of hidden contexts, not given in the form of predictive features [19]. In such situations, inferring a forecasting model becomes more difficult, since changes in the hidden context can induce unpredictable changes in the target concept. These changes are known as concept drift and since they usually cannot be identified explicitly [7], an effective forecasting model should be able to quickly adapt to the resulting changes in the time series data.

A symptom of concept drift in a time series is the change of the parameters that define its distribution [20]. Following [20], the change in the distribution can be classified with respect to the rate at which the drift occurs. For instance, a political event that suddenly causes a strong effect on the stock market is a case of abrupt concept drift observed in a time series. Another example is the aging effects of a sensor which gradually leads to lower performance in a device. Such case can be referred to as gradual concept drift. According to Ditzler et al. [7], the drifts, whether abrupt or gradual, can also be classified as permanent or transient. The former is related to the effect of variation, and is not limited in time, while the latter occurs in a limited time window followed by an effect of disappearance.

Due to the practical importance of problems with varying statistical properties, the literature has presented some alternatives to handle the issue. Popular approaches commonly used in forecasting methods, including FTS, are: (i) detrending [21, 22], in which a data transformation is applied to the time-series to remove the trend component and (ii) time-varying models derived from the distribution of the raw data [23].

2.2 Fuzzy Time Series

Symbol Description
Ω\Omega Fuzzy time series order (lags)
kk Number of fuzzy sets for partitioning the universe of discourse
\mathcal{M} Time series model
tt Time instant
y(t)y(t) Time series value at time tt
y^(t)\hat{y}(t) Estimated time series value at time tt
UU Universe of discourse
AjA_{j} j-th fuzzy set
μAj\mu_{A_{j}} Membership function of fuzzy set AjA_{j}
TT Number of time steps
f(t)f(t) Fuzzified value of the series at time tt
cc Fuzzy set midpoint
WW Window of observations
π()\pi(\cdot) Perturbation function for the non-stationary fuzzy sets
\mathcal{E} Set of residuals
A~\tilde{A} Linguistic variable
RR Refreshing interval
δ\delta Displacement applied to the fuzzy set
ρ\rho Scaling factor
ww Window size
nn Sample size
LHSLHS Set of fuzzy sets in the left hand side of a fuzzy logical relationship
RHSRHS Set of fuzzy sets in the right hand side of a fuzzy logical relationship
Table 1: Convention of symbols

Since the introduction of the Fuzzy Time Series in [8], several categories of FTS methods have been proposed, varying mainly by their order Ω\Omega, number of fuzzy sets kk and time-variance [18] – see Table 1 for the convention of symbols adopted here. The order is defined by the number Ω\Omega of time-delays (lags) that are used in modeling the time series. The time variance defines whether the FTS model changes over time, with the Time Invariant models t\mathcal{M}^{t} having the same parameters for all t=0,,Tt=0,\ldots,T, and the Time Variant models t\mathcal{M}^{t} having different parameters in different time instants tt.

Given an univariate time series YY\in\mathbb{R} , where y(t)Yy(t)\in Y are the instances of YY for t=0,1,,Tt=0,1,\ldots,T, the UoD is limited by the known bounds of YY, such that U=[min(Y),max(Y)]U=[\min(Y),\max(Y)]. The training procedure of an FTS model \mathcal{M} consists of the following three steps:

  • a)

    Partitioning: split UU in kk overlapping intervals. For each interval a new fuzzy set AjA_{j} is created, each one with its own membership function (MF) μAj\mu_{A_{j}}. A linguistic value A~\tilde{A} is assigned to each fuzzy set and represents a region of UU. The computational cost of this step is O(k)O(k).

  • b)

    Fuzzification: maps the crisp time series YY onto the fuzzified time series FF, by replacing each y(t)Yy(t)\in Y by the fuzzified value f(t)=μAj(y(t))f(t)=\mu_{A_{j}}(y(t)), AjA~\forall A_{j}\in\tilde{A}, for t=1,,Tt=1,\ldots,T. The computational cost of this step is O(Tk)O(T\cdot k).

  • c)

    Knowledge Extraction: creates a representation of the sequential patterns in the time series. In a rule-based FTS, as in [24], the rules have a format AitΩ,,Ait1Aj,Ak,A_{i}^{t-\Omega},\ldots,A_{i}^{t-1}\rightarrow A_{j},A_{k},\ldots, where AitΩ,,Ait1A~A_{i}^{t-\Omega},\ldots,A_{i}^{t-1}\in\tilde{A} is the precedent and Aj,Ak,A~A_{j},A_{k},\ldots\in\tilde{A} is the consequent. The rule can be read as “IF y(tΩ)y(t-\Omega) is AitΩA_{i}^{t-\Omega} AND \ldots AND y(t1)y(t-1) is Ait1A_{i}^{t-1} THEN y(t)y(t) may be Aj,Ak,A_{j},A_{k},\ldots”. The computational cost of this step is O(TkΩ)O(T\cdot k^{\Omega}).

Once the model \mathcal{M} is trained it can be used to forecast values of YY given a sample y(tΩ),,y(t1)y(t-\Omega),\ldots,y(t-1) with a three step procedure:

  • a)

    Fuzzification: maps the crisp sample y(tΩ),,y(t1)y(t-\Omega),\ldots,y(t-1) onto the fuzzified values f(tΩ),,f(t1)f(t-\Omega),\ldots,f(t-1), where each f(t)=μAj(y(t))f(t)=\mu_{A_{j}}(y(t)), AjA~\forall A_{j}\in\tilde{A}, for t=Ω.,1t=\Omega.\ldots,1. The computational cost of this step is O(Ωk)O(\Omega\cdot k).

  • b)

    Rule Matching: find in \mathcal{M} the rules whose the precedent matches with the fuzzified values f(tΩ),,f(t1)f(t-\Omega),\ldots,f(t-1). The activation μj\mu_{j} of each rule jj is the minimum T-norm of the individual membership values of each fuzzified value. The computational cost of this step depends on the length of the sample (Ω\Omega) and the number of rules in \mathcal{M} (kΩk^{\Omega}), then the complexity is O(ΩkΩ)O(\Omega\cdot k^{\Omega}).

  • c)

    Defuzzification: the estimated value of y^(t)\hat{y}(t) is calculated by finding the mean value cjc_{j} of each matched rule jj, by averaging the midpoints of the consequent fuzzy sets, and then calculating the sum of the mean values of each rule weighted by its activation values

    y^(t)=jμjcjjμj\hat{y}(t)=\frac{\sum_{j}\mu_{j}\cdot c_{j}}{\sum_{j}\mu_{j}}

Many improvements were proposed in the FTS literature. The High-Order FTS (HOFTS) [12] extended the classical method in [24] by using several lags in the forecast and it is able to recognize more complex patterns in the time series. In [25], [26] and [27] weights are added in the consequent of the rules in order to give more importance for certain fuzzy sets. More recently, the Probabilistic Weighted FTS (PWFTS) was proposed in [28], and made available in the pyFTS library [29], including weights in both precedent and consequent of the FTS rules, achieving high accuracy and outperforming traditional forecasting approaches.

In order to represent forecasting uncertainty, [14], [28] and [30] proposed approaches for probabilistic forecasting. In [31] and [32] distributed FTS variants are proposed for Big Data time series. Multivariate time series are explored in [33], which uses Fuzzy Information Granules (FIG) to propose a multivariate forecasting method. In [18] a survey on the design of FTS forecasting models was provided. A comprehensive review of those aforementioned models, focused on time-invariant and rule-based approaches can be found in [32].

The major hyper-parameters of FTS methods are the number of fuzzy sets kk and the order of the model Ω\Omega. These hyper-parameters conduce the model training and forecasting and are responsible for the accuracy and model parsimony (the number of rules). A method for hyperparameter tuning of FTS models is presented in [34].

The FTS approaches listed in this section are all time-invariant approaches which means that the models are trained only once and then its internal rule base does not change. To keep their accuracy, these models must make strong assumptions about the stationarity and homoskedasticity of the time series. In non-stationary environments, this is a major drawback.

To better understand the behavior of the FTS methods in non-stationary scenarios, Figure 1 shows the performances of several methods in the literature when the test data falls out of the known UoD. For this example, we used the NASDAQ dataset. It can be seen that this situation makes most of the trained models useless in the long run.

Refer to caption
Figure 1: Sample of models forecasts in a concept drift scenario. The dataset (Original) is NASDAQ. The models are: Fuzzy Time Series (FTS) [8]; Conventional FTS (CFTS) [24]; Weighted FTS (FTS) [25]; Improved Weight FTS (IWFTS) [35]; Trend-Weighted FTS (TWFTS) [36]; Exponentially Weighted FTS (EWFTS) [27]; High Order FTS (HOFTS) [12]; Weighted High Order FTS (WHOFTS) order 2 and 3 [32]; Hwang [37] order 2 and 3.

2.3 Non-stationary Fuzzy Sets

Non-stationary fuzzy reasoning and non-stationary fuzzy sets (NSFS) were introduced by Garibaldi and Ozen and Garibaldi, Jaroszewski and Musikasuwan, respectivelly in [38] and [16]. The main idea is to extend the traditional fuzzy set definition by introducing a dynamic component that changes the membership function μ\mu over time. This change takes several forms: a) variation in location: by displacing the parameters of the μ\mu function along the UoD without changing its shape; b) variation in width: changing the shape of μ\mu by stretching or contracting its bounds; and c) noise variation: by adding random noise to the membership grade.

A NSFS is defined with two functions: the non-stationary membership function (NSMF), which considers time variations of the corresponding membership function (MF), and the perturbation function, which is the dynamic component responsible for altering the parameters of the membership function given some parameter set.

According to Garibaldi et al. [16] a non-stationary fuzzy set A˙\dot{A} can be formalized as follows:

A˙=tTxXμA˙(t,x)𝑑x𝑑t\dot{A}=\int_{t\in T}\int_{x\in X}\mu_{\dot{A}}(t,x)dxdt (1)

where A˙\dot{A} is a fuzzy set over a UoD XX characterized by a NSMF μA˙(t,x)\mu_{\dot{A}}(t,x) at a set of time points TT.

It is worth noting that the NSMF μA˙\mu_{\dot{A}} varies throughout UU and the time interval TT. This means that the NSMF parameter set should vary over time. A regular MF, μA(x)\mu_{A}(x), can be expressed as μA(x,p1,,pm)\mu_{A}(x,p_{1},\ldots,p_{m}), where p1,,pmp_{1},\ldots,p_{m} denote the parameters of μA(x)\mu_{A}(x). Thus, a NSMF can be denoted in an analogous way as follows:

μA˙(t,x)=μA(x,p1(t),,pm(t))\mu_{\dot{A}}(t,x)=\mu_{A}(x,p_{1}(t),\ldots,p_{m}(t)) (2)

where each parameter can be varied over time by a perturbation function multiplied by a constant.

One of the constraints existing in several FTS models is the lack of ability to deal with conditional variances and concept-drift scenarios, in which the statistical characteristics of the time series change, sometimes drastically. Thus, through small variation in the MFs, we deploy NSFS in FTS models to deal with variability for decisions over time and contributing to the mitigation of this drawback.

3 Time Variant Approaches

Time-variant FTS models should be applied when the data is not compliant with the stationarity and homoskedasticity assumptions. They include incremental, flexible and evolving techniques for adapting the model to the input data [18, 39, 40].

In the seminal work of Song and Chissom on time variant FTS [39], time-variance can be seen as a meta-modeling technique. It is not a proper FTS model, it is a training policy for another FTS method which controls when this method will be retrained and how many lags will be used. More specifically, the time variant approach defines WW, the length of the memory window, and RR, the refreshing interval. Thus, the chosen FTS model is built from scratch every RR time instants using the most recent WW observations of the time series.

This first time-variant FTS approach [39] was followed by several other authors who have mixed its training policy with different knowledge models and weighting schemes [40, 41, 42, 43].

In Figure 2 (left) it is easy to see that all FTS time invariant approaches can be combined with the time variant method and used as the internal model. However, two major drawbacks of the classical Song and Chissom’s method can be highlighted. The first one is its limited memory. Once a new data point arrives the previous knowledge base is completely discarded. This, in turn, may lead to catastrophic forgetting of frequent patterns if the parameters WW and RR are not tuned correctly. The second one is its high computational cost. Using a binary search tree structure to organize the kk fuzzy sets, the time complexity for a search among them decreases from O(k)O(k) to O(logk)O(\log k). Thus, for a given input of size TT, the complexity is O(T/RW(logk)Ω)O(T/R\cdot W\cdot(\log k)^{\Omega}), given that its internal model will be retrained T/RT/R times with a potential cost of O(W(logk)Ω)O(W\cdot(\log k)^{\Omega}).

The Incremental Ensemble, see Figure 2 (right), is an alternative to control the limited memory of the Song and Chissom method by balancing the learning of new behaviors with the memory of the old ones [2, 44, 4, 20]. The Incremental Ensemble is, in fact, a meta-model containing MM internal models. It is also controlled by the WW and RR parameters. At each interval of RR observations a new model t\mathcal{M}_{t} is built with the last WW observations. t\mathcal{M}_{t} is then appended to the ensemble while the oldest model tM\mathcal{M}_{t-M} is discarded.

The generalization of Song and Chissom’s method as well as the Incremental Ensemble using FTS as internal models can be seen in Figure 2. As in the Song and Chissom’s case, the major drawback of the incremental ensemble techniques is its computational cost.

Refer to caption
Figure 2: Song and Chissom method and Incremental Ensemble

4 The Non-Stationary Fuzzy Time Series method

The proposed Non-Stationary Fuzzy Time Series method extends the concepts of the Conventional FTS method [24] to incorporate Non-Stationary Fuzzy Sets presented by Garibaldi et al. [16]. In the proposed forecasting procedure, the mean and variance of the residuals are used to translate and scale the sets to adapt them to the changes occurred in the data after the training process. The parameters of the fuzzy sets change towards canceling the mean of the residuals. If the variance of residuals is high, the range of the fuzzy sets should be reduced to reduce granularity. Moreover, if the bounds of U change, the sets are adapted to respond to this change as well.

By convention, it is assumed that YY\in\mathbb{R} is a real valuated univariate time series and y(t)Yy(t)\in Y a single data point indexed by time index tt\in\mathbb{N}. It is also assumed that all fuzzy sets have a membership function, μ\mu, with triangular shapes, as defined in equation (3), where xYx\in Y is the input value and l,c,uYl,c,u\in Y are, respectively, the lower, midpoint and the upper basis of the triangle.

μ(x,l,c,u)={0ifx<lorx>uxlcliflxcuxucifcxu\mu(x,l,c,u)=\left\{\begin{array}[]{lcr}0&if&x<l\;or\;x>u\\ \frac{x-l}{c-l}&if&l\leq x\leq c\\ \frac{u-x}{u-c}&if&c\leq x\leq u\end{array}\right. (3)

Furthermore, all NSFS have a perturbation function π\pi, defined in (5), over the parameter set {l,c,u}\{l,c,u\} of μ\mu, where δ\delta\in\mathbb{R} is the displacement and ρ\rho\in\mathbb{R} is the scale increment. π\pi shifts the triangular function across the domain of YY and scales it by moving the triangle bounds with respect to the center as shown in Figure 3. Thus, the NSMF function is given by:

μ(x,π(l,c,u,δ,ρ))\mu(x,\pi(l,c,u,\delta,\rho)) (4)

where,

π(l,c,u,δ,ρ)={ρ2(l+δ),c+δ,ρ2+(u+δ)}\pi(l,c,u,\delta,\rho)=\left\{\frac{\rho}{2}-(l+\delta)\;,\;c+\delta\;,\;\frac{\rho}{2}+(u+\delta)\right\} (5)
Refer to caption
Figure 3: The effects of the δ\delta and ρ\rho perturbation parameters on a triangular membership function with l=14l=14, c=18c=18, u=22u=22

The proposed method, depicted in Figure 4, consists of training, parameter adaptation and forecasting procedures.

Refer to caption
Figure 4: NSFTS overview

In the training procedure, the UU partitioning creates static fuzzy sets. In a forecasting model, a normal distribution of the residuals suggests that its predictive ability is consistent over the entire data range. Therefore the objective of the training procedure is to generate a model that captures all the information in the data, leaving a residual N(0,1)\mathcal{E}\sim N(0,1). This is done by defining the ll, cc and uu parameters for each one of the kk fuzzy sets and using them to extract temporal patterns from the data. The ww most recent residuals of the model are stored in the set \mathcal{E} which consists of the differences between the model forecasts and the new data collected in that window.

Given a non-stationary scenario, it is unlikely that a model with predefined and fixed parameters could manage to keep its residuals normally distributed. In the parameter adaption procedure, the mean and variance of the residuals are monitored and used to adapt the MFs. The NSFS is perturbed in order to keep the residuals as close to N(0,1)\mathcal{E}\sim N(0,1) as possible. The goal of this procedure, is to find the best values for the parameters δ\delta and ρ\rho of each NSFS π\pi function in order to adapt them to the changes in the data.

The forecasting procedure finds the rules that match a given numerical input and use them to compute a numerical forecast using the non-stationary fuzzy sets perturbed by the parameters δ\delta and ρ\rho.

It can be noticed that the rationale behind the proposed method is to keep the residuals normally distributed. In this context, the main technical contribution of this paper is to introduce the adaptation procedure which will keep N(0,1)\mathcal{E}\sim N(0,1). This procedure is detailed in section 4.2. For completeness of presentation, the training and forescasting procedures defined in [24] are reproduced in subsections 4.1 and 4.3, respectively.

4.1 Training Procedure

Given the training data, YY, the number of partitions, kk, and the length of the residuals window, ww:

  1. Step 1

    Defining the universe of discourse, UU:

    U=[lb,ub]U=[lb,ub] (6)

    where, lb=min(Y)min(Y)×0.2lb=\min(Y)-\min(Y)\times 0.2 and ub=max(Y)+max(Y)×0.2ub=\max(Y)+\max(Y)\times 0.2.

    Notice that the data bounds are extrapolated to compensate for a possible underestimation of the bounds in the training set. The value 0.20.2 is a typical value, but can be modified according to the problem.

  2. Step 2

    UU Partitioning: Define the midpoints ci,i=0,,k1c_{i},i=0,...,k-1 of the initial fuzzy sets;

    ci=lb+i×(ublb)(k1)c_{i}=lb+i\times\frac{(ub-lb)}{(k-1)} (7)
  3. Step 3

    Define the linguistic variable A~\tilde{A}: Create kk overlapping fuzzy sets AiA_{i}, with triangular MF μAi\mu_{A_{i}} as defined in equation (8).

    μAi(x)={0ifx<lorx>uxliciliiflixciuixuiciifcixui\mu_{A_{i}}(x)=\left\{\begin{array}[]{lcr}0&if&x<l\;or\;x>u\\ \frac{x-l_{i}}{c_{i}-l_{i}}&if&l_{i}\leq x\leq c_{i}\\ \frac{u_{i}-x}{u_{i}-c_{i}}&if&c_{i}\leq x\leq u_{i}\end{array}\right. (8)

    where li=ci1l_{i}=c_{i-1} and ui=ci+1u_{i}=c_{i+1}.

    Each fuzzy set AiA~A_{i}\in\tilde{A} is a linguistic term of the linguistic variable A~\tilde{A}. Once the fuzzy sets are created, a function πi\pi_{i}, as defined in equation (5), will be associated with each fuzzy set in order to transform it into a NSFS, initialized with δ=0\delta=0 and ρ=0\rho=0.

    The number of sets kk defines the number of states and consequently the number of state transitions that the model can represent. The more complex the time series the greater the number of kk should be. One should, however, be careful to not overestimate kk since it may cause overfitting and make the model unnecessarily big.

  4. Step 4

    Fuzzification: Transform the original time series data Y={y(0),y(1),,y(T)}Y=\{y(0),y(1),\ldots,y(T)\} into a fuzzy time series F={f(0),f(1),,f(T)}F=\{f(0),f(1),\ldots,f(T)\}, where f(t)f(t) is defined as:

    f(t)={μA0(y(t)),μA1(y(t)),,μAk1(y(t))}f(t)=\{\mu_{A_{0}}(y(t)),\mu_{A_{1}}(y(t)),...,\mu_{A_{k-1}}(y(t))\} (9)
  5. Step 5

    Generate the temporal patterns: The fuzzy temporal patterns have format AlArA_{l}\rightarrow A_{r}, where:

    The precedent (or Left Hand Side - LHS) is:

    Al=argmaxAi(μAi(y(t1)))A_{l}=\underset{A_{i}}{\arg\max}\left(\mu_{A_{i}}(y(t-1))\right) (10)

    And the consequent (or Right Hand Side - RHS) is:

    Ar=argmaxAi(μAi(y(t)))A_{r}=\underset{A_{i}}{\arg\max}\left(\mu_{A_{i}}(y(t))\right) (11)
  6. Step 6

    Generate the rule base: Select all temporal patterns with the same precedent and group their consequent sets creating a rule with the format AlAa,Ab,A_{l}\rightarrow A_{a},A_{b},.... Thus the rule RHSRHS can be understood as the set of possibilities which may happen on time t+1t+1 (the consequent) when a certain set AlA_{l} is identified on time tt (the precedent).

  7. Step 7

    Compute the residuals ϵ\epsilon: Invoke the Forecasting Procedure defined on Section 4.3 using the given training data, Y={y(0),,y(T)}Y=\{y(0),...,y(T)\}, as the input and forecast the last ww items to calculate the set of residuals \mathcal{E} defined as:

    ={ϵ(tw),ϵ(t(w1)),,ϵ(t)}\mathcal{E}=\{\epsilon(t-w),\epsilon(t-(w-1)),...,\epsilon(t)\} (12)

    where ϵ(t)=y(t)y^(t)\epsilon(t)=y(t)-\hat{y}(t) and y^(t)\hat{y}(t) is the forecast produced by the model.

4.2 Parameter Adaption Procedure

Given an input value y(t)y(t), the last forecast value y^(t)\hat{y}(t), the length of the residuals vector ww, the residuals set \mathcal{E} and the linguistic variable A~\tilde{A}, do:

  1. Step 1

    Find the displacements of y(t)y(t) on UU: If y(t)y(t) is below the lower bound of UU store the difference in dl=lby(t)d_{l}=lb-y(t), otherwise dl=0d_{l}=0. If y(t)y(t) is above the upper bound of UU, store the difference in du=y(t)ubd_{u}=y(t)-ub, otherwise du=0d_{u}=0. Then compute the displacement range rr as the sum of the UoD displacements dud_{u} and dld_{l}, according with (13), and the displacement midpoint mprmp_{r} as rr divided by 2, according with (14)

    r=dudlr=d_{u}-d_{l} (13)
    mpr=r/2mp_{r}=r/2 (14)
  2. Step 2

    Compute the mean ¯\overline{\mathcal{E}} and variance σ\sigma_{\mathcal{E}} of the set \mathcal{E}: The residual mean ¯\overline{\mathcal{E}} indicates a bias, a shift on the accuracy of the trained model that must be corrected. The variance, σ\sigma_{\mathcal{E}}, represents the shift on the trained model variance, as a reflection to a change in the test data. These values will be used to adjust the position and length of the fuzzy sets.

  3. Step 3

    Compute the displacements, δi\delta_{i}: The displacement is computed for each fuzzy set AiA_{i} in order to equally move the kk fuzzy sets by the mean shift ¯\overline{\mathcal{E}}, one fraction of the displacement range [dl,du][d_{l},d_{u}] and proportionally to the expansion of the variance in the interval [σ,σ][-\sigma_{\mathcal{E}},\sigma_{\mathcal{E}}]:

    δi=¯+(irk1mpr)+(i2σk1σ)\delta_{i}=\overline{\mathcal{E}}+\left(i\frac{r}{k-1}-mp_{r}\right)+\left(i\frac{2\sigma_{\mathcal{E}}}{k-1}-\sigma_{\mathcal{E}}\right) (15)

    The displacements δi\delta_{i}, for i=0k1i=0\ldots k-1, are equally distributed in the interval [¯dlσ,¯+du+σ][\overline{\mathcal{E}}-d_{l}-\sigma_{\mathcal{E}}\;,\;\overline{\mathcal{E}}+d_{u}+\sigma_{\mathcal{E}}] and indicate the new position of fuzzy set AiA_{i}, given the deviations from the known UoD, whose range is rr and its midpoint mprmp_{r}, and the error signal with the mean ¯\overline{\mathcal{E}} and variance σ\sigma_{\mathcal{E}}. Indeed, while the term i(r/(k1))mpri(r/(k-1))-mp_{r} is used to translate the midpoint of AiA_{i} by a fraction of rr centered in mprmp_{r}, the term i(2σ/(k1))σi(2\sigma_{\mathcal{E}}/(k-1))-\sigma_{\mathcal{E}} is used to offset the scaling of the fuzzy set bounds by a fraction of σ\sigma_{\mathcal{E}} centered in 0.

  4. Step 4

    Compute the scaling factor ρi\rho_{i}, Eq. (16): For each fuzzy set AiA~A_{i}\in\tilde{A} the scale increment is empirically calculated as the distance between the displacements δi\delta_{i}, in order to adjust the fuzzy set lengths proportionally to their displacements, avoiding discontinuities between the sets (intervals on UU not covered by any fuzzy set) by setting li=ci1l_{i}=c_{i-1} and ui=ci+1u_{i}=c_{i+1}:

    ρi=|δi1δi+1|\rho_{i}=|\delta_{i-1}-\delta_{i+1}| (16)
  5. Step 5

    Update ϵ\epsilon: Get the last forecast value y^(t+1)\hat{y}(t+1) and the last known value y(t+1)Yy(t+1)\in Y. Compute the error term ϵ(t+1)=y(t+1)y^(t+1)\epsilon(t+1)=y(t+1)-\hat{y}(t+1), push it on to the residuals set \mathcal{E} and delete the oldest value. That is:

    =ϵ(tw)ϵ(t+1)\mathcal{E}=\mathcal{E}\setminus\epsilon(t-w)\cup\epsilon(t+1)\ (17)

4.3 Forecasting Procedure

Given an input value y(t)y(t), the linguistic variable A~\tilde{A}, the inferred rule set and the perturbation parameters δi\delta_{i} and ρi\rho_{i} for each fuzzy set AiA~A_{i}\in\tilde{A}:

  1. Step 1

    Fuzzification: Compute the membership grade μAi\mu_{A_{i}} for each non-stationary fuzzy set AiA_{i}, such that μAi=μAi(y(t),π(li,ci,ui,δi,ρi))\mu_{A_{i}}=\mu_{A_{i}}(y(t),\pi(l_{i},c_{i},u_{i},\delta_{i},\rho_{i}));

  2. Step 2

    Rule matching: Build a set, 𝒮\mathcal{S} with all the rules AjRHSjA_{j}\rightarrow RHS_{j} where μAj(y(t))>0\mu_{A_{j}}(y(t))>0, that is:

    𝒮={AjRHSj|μAj(y(t))>0}\mathcal{S}=\{A_{j}\rightarrow RHS_{j}|\mu_{A_{j}}(y(t))>0\} (18)
  3. Step 3

    Defuzzification: Compute the forecast y^(t+1)\hat{y}(t+1) according to Equation (19) as the weighted sum of the rule mid-points, mpmp, by their membership grades μj\mu_{j} for each selected rule jj:

    y^(t+1)=AjRHSj𝒮μAj(y(t))mp(RHSj)\hat{y}(t+1)=\sum_{A_{j}\rightarrow RHS_{j}\in\mathcal{S}}\mu_{A_{j}}(y(t))\cdot mp(RHS_{j}) (19)

    where

    mp(RHS)=AiRHScAi|RHS|mp(RHS)=\dfrac{\sum_{A_{i}\in RHS}c_{A_{i}}}{|RHS|} (20)

4.4 Method Discussion

The NSFS has the ability to dynamically adapt membership functions as the statistical properties of the time series vary. The NSFTS method tries to detect these changes by using the displacements of the input data y(t)y(t) in relation to the known UU and the residuals statistics to identify bias and variance shifts.

The major assumption of the method is that adjusting the fuzzy sets, without modifying the model structure represented by the learned rules, is enough to adapt the model to the new behavior of the time series. Such procedures, described in subsection 4.2, given an input of size TT, memory window length WW, refreshing interval RR and kk fuzzy sets, have a time complexity of O(T/RWlogk)O(T/R\cdot W\cdot\log k), while a retraining process would take O(T/RW(logk)Ω)O(T/R\cdot W\cdot(\log k)^{\Omega}), as discussed in Section 3. Hence, this approach becomes much cheaper computationally, when compared to retraining a new model from scratch.

The π\pi function entails an additive approach for the translation and scaling of the parameters based on grid partitioning of the UoD. The procedure calculates the translation increment proportionally to the displacement of the input value in relation to the known UU and the variance of the residuals. The scaling increment works as a heuristic to keep the grid aspect of the fuzzy sets after the translation, connecting the lower and upper bounds with the centers of adjacent fuzzy sets.

Figure 5 depicts the forecasting method applied to a test sample with significant drift from the training data extracted from SP500 time series, later explained in Section 5.

Refer to caption
(a) UoD initial partitioning
Refer to caption
(b) NSFS model forecasts
Refer to caption
(c) Residuals
Refer to caption
(d) Perturbations on the NSFS
Figure 5: Out of sample example of NSFTS forecasting

In Figure 5a the partitioning of the UoD is shown with 15 partitions. The generated forecasts are presented in Figure 5b and their residuals in Figure 5c. The perturbations on the NSFS, in response to the previous residuals, are represented in Figure 5d, where the same fuzzy sets represented in Figure 5a are now colored over the vertical axis. It is possible to see that these fuzzy sets move, sometimes expanding sometimes contracting, in order to fit to the data. As the drift increases, the displacement and scaling of the fuzzy sets also grow.

The proposed model is a case of active learning, in which the learning algorithm is able to interactively obtain, from different sources of information, the necessary inputs for the generation or improvement of its learning. Basically, it uses residual error values to fit a predefined model. The version here presented focuses on highlighting its adaptability to changes observed over time. It uses first-order fuzzy rules, that is, considering only a previous observation, and is applied to forecasting problems in univariate time series. However, its time variant model characteristics can be expanded or adapted to other models for better prediction values. For instance, the model can be expanded to comprehend a high-order fuzzy rule generation. It also can be adapted to fit parameters of other FTS models, such as PWFTS, used as benchmark in this work.

5 Computational Experiments

5.1 Experimental Design

Different datasets were chosen for model validation. The datasets consisted of four stock market indices (Dow Jones, NASDAQ, SP500 and TAIEX), three FOREX pairs (EUR-USD, EUR-GBP, GBP-USD), two cryptocoins exchange rates (Bitcoin-USD and Ethereum-USD) illustrated in A (Fig. 6) and eight synthetic time series with concept drifts, see B (Fig. 7).

The market indexes data sets contain the daily averaged index by business day, such that the Dow Jones Industrial Average (Dow Jones)111https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC. Accessed in 11/11/2019 is sampled from 1985 to 2017 time window, the Taiwan Stock Exchange Capitalization Weighted Stock Index (TAIEX)222https://www.twse.com.tw/en/page/products/indices/series.html. Accessed in 11/11/2019 is sampled from 1995 to 2014, the National Association of Securities Dealers Automated Quotations - Composite Index (NASDAQ ÎXIC)333https://www.nasdaq.com/market-activity/index/ixic. Accessed in 12/11/2019 is sampled from 2000 to 2016 and the SP500 - Standard & Poor’s 500444https://finance.yahoo.com/quote/%5EGSPC/history?p=%5EGSPC. Accessed in 12/11/2019 is sampled from 1950 to 2017. The FOREX data sets contain the daily averaged quotations, by business day, from 2016 to 2018, which pairs are the US Dollar to Euro (USD-EUR), Euro to Great British Pound (EUR-GBP) and Great British Pound to US Dollar (GBP-USD). The cryptocoin datasets contain the daily quotations, in US Dollar, of the Bitcoin (BTC-USD) and Ethereum (ETC-USD). The synthetic data aims to represent different types of concept drifts.

The Augmented Dickey-Fuller (ADF) test was used to check which time series are non-stationary. The Levene’s test was used to test whether the series are heteroskedastic or not. The detailed results of these tests are shown in C and D, respectively. In summary, except for the data sets, “Stationary Signal”, “Stationary Signal with blip” and “Sudden variance” (see B (Fig. 7)) all the benchmarks were shown to be non-stationary and only the “Stationary Signal with blip” homogeneity of variances with a confidence level of 95%.

The standard accuracy metrics used to evaluate point forecasting methods are the Root Mean Squared Error (RMSE) (21), Mean Absolute Percentage Error (MAPE) (22) and Theil’s U statistic (U) (23) where yy are the target values, y^\hat{y} are the forecast values and nn the sample size. These metrics are commonly used in evaluating forecasting models [18].

RMSE=1ni=1n(yiy^i)2RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}} (21)
MAPE=1ni=1n|yiy^yi|MAPE=\frac{1}{n}\sum_{i=1}^{n}\Big{|}\frac{y_{i}-\hat{y}}{y_{i}}\Big{|} (22)
U=i=1n(yiy^)2i=1nyi2+i=1ny^i2U=\frac{\sqrt{\sum_{i=1}^{n}(y_{i}-\hat{y})^{2}}}{\sqrt{\sum_{i=1}^{n}y_{i}^{2}}+\sqrt{\sum_{i=1}^{n}\hat{y}_{i}^{2}}} (23)

The proposed NSFTS was tested against the Time Variant [39] and the Incremental Ensemble approaches, both using the PWFTS method [28] available in [29] as internal method. As can be seen in [32], the PWFTS outperformed a wide number of forecasting methods ranging from classic statistical ones such as ARIMA [45] to more recent ones such as the k-Nearest Neighbors with Kernel Density Estimation [46]. The parameters of all methods are presented in Table 2, and they were defined through grid search.

Method Parameter Value
All k 35
Ω\Omega 1
Song and Chissom & Incremental Ensemble W 100
R 10
Incremental Ensemble M 2
Table 2: Benchmarking parameters

The grid partitioning scheme was used for the initial generation of fuzzy sets, where the best number of partitions in the range [5,100][5,100] was selected for each data set. The high order methods were tested with orders 2 and 3.

5.2 Results

The average RMSE, MAPE and U statistic results by method and data set are presented in Table 3. It can be seen that the NSFTS is superior or, at least, not worse than the other methods in all the selected real world benchmarks.

From the results on the artificial data sets, it can be seen that the NSFTS has more difficulties than the other methods in handling sudden changes in the series distribution mean. This behavior is expected from the NSFTS once it is an adaptive method and therefore presents a delay to respond to changes. On the other hand, it handles with relative ease incremental changes in the mean. Observing the Theil’s U statistic which is insensitive to the range of values of the series, one can see that changes in the variance alone do not seem to affect the NSFTS performance by much.

These observations explain the good performance of the NSFTS on the economic series. As one can see in A, these series present a gradual, though significant, variation of the mean along with different types of changes in variance. Apparently, the superior performance of the NSFTS when the mean varies gradually gave it the edge over the other methods.

Table 3: Results of the metrics RMSE, MAPE and U by approach and dataset
Dataset RMSE MAPE U
T. Variant I. Ensemble NSFTS T. Variant I. Ensemble NSFTS T. Variant I. Ensemble NSFTS
Stationary Signal 0.07 0.07 0.07 1.13 1.16 1.13 0.99 1.01 1.01
Stationary signal with blip 0.16 0.17 0.14 1.29 1.27 1.22 1.14 1.24 1.00
Sudden variance 0.12 0.12 0.12 1.80 1.81 1.73 0.98 1.00 0.98
Sudden mean 0.25 0.55 0.79 2.20 6.00 19.89 1.72 3.85 5.66
Sudden mean and variance 0.59 0.89 0.89 3.82 7.52 19.58 1.22 1.90 1.96
Incremental mean 0.39 7.81 0.11 1.02 14.92 0.69 3.45 71.08 1.04
Incremental variance 51.89 50.51 60.59 151.43 149.56 241.69 0.78 0.78 0.96
Incremental mean and variance 1.29 2.37 1.65 5.06 8.48 5.44 1.03 1.51 1.09
TAIEX 145.82 1130.86 116.82 1.27 9.60 1.34 1.52 11.90 1.24
SP500 9.29 61.22 7.86 0.64 2.66 0.57 1.16 7.73 1.01
NASDAQ 35.09 214.74 36.75 0.90 4.49 1.06 1.25 7.71 1.32
DowJones 71.88 519.55 63.26 0.64 2.80 0.60 1.13 8.25 1.02
BTC-USD 465.72 1775.53 180.90 4.97 32.81 3.40 2.96 11.50 1.19
ETH-USD 44.54 222.21 24.16 7.35 41.47 4.06 2.06 10.84 1.25
EUR-USD 0.01 0.04 0.01 0.43 0.98 0.41 1.22 6.75 1.13
EUR-GBP 0.00 0.01 0.00 0.37 0.48 0.32 1.24 3.61 1.08
GBP-USD 0.01 0.07 0.01 0.41 1.18 0.42 1.23 9.62 1.26

To assess the overall performance of the methods with respect to the RMSE we ran a Friedman Aligned Ranks (F-test), a non-parametric test for the equality of the means was used with α=0.05\alpha=0.05, where H0H_{0} indicates that there is no significant difference between the means and H1H_{1} indicates that there is at least one significant difference between the means. The F-value result was 13.199413.1994 with a p-value of 0.00130.0013. Thus, H0H_{0} was rejected at 5% confidence level.

A one-versus-all Finner paired post-hoc test was employed to check the method equivalence with NSFTS as control method, where H0H_{0} indicates that there is no significant difference between the means and H1H_{1} indicates that there is a significant difference between the means. The results are presented in Table 4 which shows that NSFTS is not equivalent to the Incremental Ensemble method and equivalent to the time variant FTS model, considering a 95% confidence level.

It is important to remind that the NSFTS is computationally cheaper than the Time Variant method since it does not require retraining. Therefore, even though they present equivalent RMSE performance, the NSFTS still has the computational edge.

Table 4: Post hoc multiple comparisons with the NSFTS as control method
Comparison Z-value P-value
Adjusted p-value
Result
NSFTS vs
Incremental Ensemble
3.680062 0.000233 0.000466 H0H_{0} Rejected
NSFTS vs
Time Variant
0.144203 0.885340 0.885340 H0H_{0} Accepted

6 Conclusion

This paper proposed a Non-Stationary Fuzzy Time Series (NSFTS) method that is able to dynamically adapt its fuzzy sets to reflect the changes in the underlying stochastic processes based on the residual errors. The proposed approach enables the model to be trained only once and remain useful long after.

The parameter adaptation procedure developed here can be integrated into other FTS methods, extending them to deal with forecasting in non-stationary environments.

The method was tested with several non-stationary and heteroskedastic data sets consisting of four market indices, three FOREX pairs, two cryptocoin exchange rates and eight synthetic time-series that present different combinations of concept-drifts.

The main feature delivered by the proposed method is the capability of capturing a wide spectrum of unconditional heteroskedastic effects and time series trends by the variation of several parameters of its internal model. This is done without data pre-processing, without need of retraining and without changing the symbolic structure in the learned rules. In order to contribute to the replication of all the results in the paper, we provide full results and all source codes for this article in Github and in the MINDS Laboratory website. The link is https://github.com/PYFTS/NSFTS.

Acknowledgements

This work was partially financed by the Foundation for Research of the State of Minas Gerais (Fundação de Amparo à Pesquisa do Estado de Minas Gerais - FAPEMIG) and by the National Council for Scientific and Technological Development (CNPq), Brazil, Grants no. 405911/2017-3; no. 169392/2017-1; and no. 306850/2016-8.

References

  • [1] L. Cohen, G. Avrahami-Bakish, M. Last, A. Kandel, O. Kipersztok, Real-time data mining of non-stationary data streams from sensor networks, Information Fusion 9 (3) (2008) 344–353.
  • [2] B. Krawczyk, L. L. Minku, J. Gama, J. Stefanowski, M. Woźniak, Ensemble learning for data stream analysis: A survey, Information Fusion 37 (2017) 132–156.
  • [3] J. Maia, C. A. Severiano Junior, F. G. Guimarães, C. L. d. Castro, A. P. Lemos, J. C. F. Galindo, M. W. Cohen, Evolving clustering algorithm based on mixture of typicalities for stream data mining, Future Generation Computer Systems 106 (2020) 672–684.
  • [4] M. Assaad, R. Boné, H. Cardot, A new boosting algorithm for improved time-series forecasting with recurrent neural networks, Information Fusion 9 (1) (2008) 41 – 55, special Issue on Applications of Ensemble Methods. doi:https://doi.org/10.1016/j.inffus.2006.10.009.
  • [5] X. Qiu, P. N. Suganthan, G. A. Amaratunga, Fusion of multiple indicators with ensemble incremental learning techniques for stock price forecasting, Journal of Banking and Financial Technology 3 (1) (2019) 33–42.
  • [6] T. Zhao, Q. J. Wang, A. Schepen, M. Griffiths, Ensemble forecasting of monthly and seasonal reference crop evapotranspiration based on global climate model outputs, Agricultural and Forest Meteorology 264 (2019) 114 – 124. doi:https://doi.org/10.1016/j.agrformet.2018.10.001.
  • [7] G. Ditzler, M. Roveri, C. Alippi, R. Polikar, Learning in Nonstationary Environments: A Survey, IEEE Computational Intelligence Magazine 10 (4) (2015) 12–25. doi:10.1109/MCI.2015.2471196.
  • [8] Q. Song, B. S. Chissom, Fuzzy time series and its models, Fuzzy Sets and Systems 54 (3) (1993) 269–277.
  • [9] M. H. Lee, H. Javedani, et al., A weighted fuzzy integrated time series for forecasting tourist arrivals, in: International Conference on Informatics Engineering and Information Science, Springer, 2011, pp. 206–217.
  • [10] Y. H. Chen, W.-C. Hong, W. Shen, N. N. Huang, Electric load forecasting based on a least squares support vector machine with fuzzy time series and global harmony search algorithm, Energies 9 (2) (2016) 70.
  • [11] H. J. Sadaei, F. G. Guimarães, C. J. da Silva, M. H. Lee, T. Eslami, Short-term load forecasting method based on fuzzy time series, seasonality and long memory process, International Journal of Approximate Reasoning 83 (2017) 196–217.
  • [12] C. A. S. Junior, P. C. L. Silva, H. J. Sadaei, F. G. Guimarães, Very short-term solar forecasting using fuzzy time series, in: 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2017, pp. 1–6. doi:10.1109/FUZZ-IEEE.2017.8015732.
  • [13] S. M. Chen, C. D. Chen, TAIEX forecasting based on fuzzy time series and fuzzy variation groups, IEEE Transactions on Fuzzy Systems 19 (1) (2011) 1–12. doi:10.1109/TFUZZ.2010.2073712.
  • [14] P. C. L. Silva, H. J. Sadaei, F. G. Guimarães, Interval forecasting with fuzzy time series, in: 2016 IEEE Symposium Series on Computational Intelligence (SSCI), 2016, pp. 1–8. doi:10.1109/SSCI.2016.7850010.
  • [15] F. M. Talarposhti, H. J. Sadaei, R. Enayatifar, F. G. Guimarães, M. Mahmud, T. Eslami, Stock market forecasting by using a hybrid model of exponential fuzzy time series, International Journal of Approximate Reasoning 70 (2016) 79–98.
  • [16] J. M. Garibaldi, M. Jaroszewski, S. Musikasuwan, Nonstationary fuzzy sets, IEEE Transactions on Fuzzy Systems 16 (4) (2008) 1072–1086.
  • [17] M. A. Alves, P. C. d. L. Silva, C. A. Severiano Junior, G. L. Vieira, F. G. Guimarães, H. J. Sadaei, An extension of nonstationary fuzzy sets to heteroskedastic fuzzy time series, in: 26th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Springer LNCS Series, 2018, pp. 591–596.
  • [18] M. Bose, K. Mali, Designing fuzzy time series forecasting models: A survey, International Journal of Approximate Reasoning 111 (2019) 78–99.
  • [19] A. Tsymbal, The problem of concept drift: definitions and related work, Computer Science Department, Trinity College Dublin 106 (2).
  • [20] J. Gama, I. Žliobaitė, A. Bifet, M. Pechenizkiy, A. Bouchachia, A survey on concept drift adaptation, ACM computing surveys (CSUR) 46 (4) (2014) 44.
  • [21] I. Kim, S.-R. Lee, A fuzzy time series prediction method based on consecutive values, in: FUZZ-IEEE’99. 1999 IEEE International Fuzzy Systems. Conference Proceedings (Cat. No.99CH36315), Vol. 2, 1999, pp. 703–707 vol.2. doi:10.1109/FUZZY.1999.793034.
  • [22] N. E. Huang, M.-L. Wu, W. Qu, S. R. Long, S. S. Shen, Applications of hilbert–huang transform to non-stationary financial time series analysis, Applied stochastic models in business and industry 19 (3) (2003) 245–268.
  • [23] Y. Liu, B. Wang, H. Zhan, Y. Fan, Y. Zha, Y. Hao, Simulation of nonstationary spring discharge using time series models, Water Resources Management 31 (15) (2017) 4875–4890.
  • [24] S.-M. Chen, Forecasting enrollments based on fuzzy time series, Fuzzy Sets and Systems 81 (3) (1996) 311–319.
  • [25] H.-K. Yu, Weighted fuzzy time series models for TAIEX forecasting, Physica A: Statistical Mechanics and its Applications 349 (3) (2005) 609–624. doi:10.1016/j.physa.2004.11.006.
  • [26] C.-H. Cheng, T. L. Chen, H. J. Teoh, C. H. Chiang, Fuzzy time-series based on adaptive expectation model for TAIEX forecasting, Expert Systems with Applications 34 (2008) 1126–1132. doi:10.1016/j.eswa.2006.12.021.
  • [27] H. J. Sadaei, R. Enayatifar, A. H. Abdullah, A. Gani, Short-term load forecasting using a hybrid model with a refined exponentially weighted fuzzy time series and an improved harmony search, International Journal of Electrical Power & Energy Systems 62 (2014) 118–129.
  • [28] P. C. L. Silva, H. J. Sadaei, R. Ballini, F. G. Guimaraes, Probabilistic Forecasting With Fuzzy Time Series, IEEE Transactions on Fuzzy Systems (2019) 1–1doi:10.1109/TFUZZ.2019.2922152.
  • [29] P. C. d. L. Silva, P. O. Lucas, H. J. Sadaei, F. G. Guimarães, pyFTS: Fuzzy Time Series for Python (2018). doi:10.5281/zenodo.597359.
    URL https://doi.org/10.5281/zenodo.597359
  • [30] P. C. d. L. Silva, M. A. Alves, C. A. Severiano Jr, G. L. Vieira, F. G. Guimarães, H. J. Sadaei, Probabilistic Forecasting with Seasonal Ensemble Fuzzy Time-Series, in: XIII Brazilian Congress on Computational Intelligence, Rio de Janeiro, 2017. doi:10.21528/CBIC2017-54.
  • [31] P. C. L. Silva, P. O. Lucas, F. G. Guimarães, A Distributed Algorithm for Scalable Fuzzy Time Series, in: R. Miani, L. Camargos, B. Zarpelão, P. R. Rosas, E. (Eds.), Proceedings of 14th International Conference on Green, Pervasive, and Cloud Computing (GPC 2019), Springer Nature Switzerland, Uberlândia, 2019, pp. 1–15. doi:10.1007/978-3-030-19223-5{\_}4.
  • [32] P. C. d. L. e. Silva, Scalable Models For Probabilistic Forecasting With Fuzzy Time Series, Ph.D. thesis, Universidade Federal de Minas Gerais (9 2019). doi:10.5281/zenodo.3374641.
    URL https://doi.org/10.5281/zenodo.3374641
  • [33] P. C. d. L. e Silva, C. A. Severiano Jr, M. A. Alves, M. W. Cohen, F. G. Guimarães, A new granular approach for multivariate forecasting, in: Latin American Workshop on Computational Neuroscience, Springer, 2019, pp. 41–58.
  • [34] P. C. L. Silva, P. d. O. e Lucas, H. J. Sadaei, F. G. Guimarães, Distributed evolutionary hyperparameter optimization for fuzzy time series, IEEE Transactions on Network and Service Management (2020) 1–1.
  • [35] R. Efendi, Z. Ismail, M. M. Deris, Improved weight fuzzy time series as used in the exchange rates forecasting of us dollar to ringgit malaysia, International Journal of Computational Intelligence and Applications 12 (01) (2013) 1350005.
  • [36] C.-H. Cheng, T.-L. Chen, C.-H. Chiang, Trend-weighted fuzzy time-series model for taiex forecasting, in: International Conference on Neural Information Processing, Springer, 2006, pp. 469–477.
  • [37] J.-R. Hwang, S.-M. Chen, C.-H. Lee, Handling forecasting problems using fuzzy time series, Fuzzy sets and systems 100 (1-3) (1998) 217–228.
  • [38] J. M. Garibaldi, T. Ozen, Uncertain fuzzy reasoning: A case study in modelling expert decision making, IEEE Transactions on Fuzzy Systems 15 (1) (2007) 16–30.
  • [39] Q. Song, B. S. Chissom, Forecasting enrollments with fuzzy time series—part ii, Fuzzy sets and systems 62 (1) (1994) 1–8.
  • [40] H.-T. Liu, M.-L. Wei, An improved fuzzy forecasting method for seasonal time series, Expert Systems with Applications 37 (9) (2010) 6310–6318.
  • [41] S. R. Singh, A simple time variant method for fuzzy time series forecasting, Cybernetics and Systems: An International Journal 38 (3) (2007) 305–321.
  • [42] T. A. Jilani, S. M. A. Burney, A refined fuzzy time series model for stock market forecasting, Physica A: Statistical Mechanics and its Applications 387 (12) (2008) 2857–2862.
  • [43] T. Vovan, An improved fuzzy time series forecasting model using variations of data, Fuzzy Optimization and Decision Making 18 (2) (2018) 151–173.
  • [44] J. R. Bertini Junior, M. d. C. Nicoletti, An iterative boosting-based ensemble for streaming data classification, Information Fusion 45 (2019) 66–78.
  • [45] D. Asteriou, S. G. Hall, Arima models and the box–jenkins methodology, Applied Econometrics 2 (2) (2011) 265–286.
  • [46] Y. Zhang, J. Wang, K-nearest neighbors and a kernel density estimator for gefcom2014 probabilistic wind power forecasting, International Journal of forecasting 32 (3) (2016) 1074–1080.

Appendix A Appendix

A: Stock market indices

Refer to caption
Figure 6: Stock market indices (TAIEX, Dow Jones, NASDAQ and SP500), FOREX pairs (EUR-USD, EUR-GBP, GBP-USD) and cryptocoin exchange rates (Bitcoin-USD and Ethereum-USD).

Appendix B Appendix

B: Synthetic data sets

Refer to caption
Figure 7: Synthetic time series with different combinations of concept drifts. These are: (a) stationary signal; (b) stationary signal with blip; (c) sudden variance; (d) sudden mean; (e) sudden mean and variance; (f) incremental mean; (g) incremental variance; (h) incremental mean and variance

.

Appendix C Appendix

In order to evaluate the stationarity of the presented data sets, the Augmented Dickey-Fuller (ADF) test for unit root was employed considering a confidence level α=0.05\alpha=0.05, where H0H_{0} indicates that the time series have a unit root and it is non-stationary and H1H_{1} indicates that time series does not have a unit root and it is stationary. The Augmented Dickey-Fuller results for unit root are shown below.

Table 5: Stationarity evalution
Dataset Augmented Dickey-Fuller
Statistic p-value H0H_{0} result
Stationary Signal -7.427114 6.504708e-11 Rejected
Stationary signal with blip -7.497758 4.334045e-11 Rejected
Sudden variance -7.746345 1.029561e-11 Rejected
Sudden mean -2.112067 2.396902e-01 Accepted
Sudden mean and variance -1.165176 6.883938e-01 Accepted
Incremental mean 3.286850 1.000000e+00 Accepted
Incremental variance -24.746787 0.000000e+00 Rejected
Incremental mean and variance -2.217183 2.000681e-01 Accepted
TAIEX -2.491767 1.174904e-01 Accepted
SP500 -0.943446 7.733287e-01 Accepted
NASDAQ 0.476224 9.841323e-01 Accepted
DowJones -0.800893 8.188597e-01 Accepted
BTC-USD -1.206100 6.709662e-01 Accepted
ETH-USD -1.852677 3.546403e-01 Accepted
EUR-USD -1.845986 3.578816e-01 Accepted
EUR-GBP -1.845986 3.578816e-01 Accepted
GBP-USD -1.131750 7.022457e-01 Accepted

Appendix D Appendix

To check the homoskedasticity the Levene’s test was employed, which checks for homogeneity of variances, with confidence level α=0.05\alpha=0.05, where H0H_{0} indicates that the sub-samples variances of the time series are all equal and H1H_{1} indicates that at least one variance of the time series sub-samples is different. the Levene’s results for homogeneity of variances are shown below.

Table 6: Homogeneity of variances evaluation
Dataset Levene’s test
Statistic p-value H0H_{0} result
Stationary Signal 10.148665 1.466392e-03 Rejected
Stationary signal with blip 1.935753 1.642857e-01 Accepted
Sudden variance 25.163718 5.733279e-07 Rejected
Sudden mean 6.989502 8.263198e-03 Rejected
Sudden mean and variance 173.802832 4.097776e-38 Rejected
Incremental mean 2954.661000 0.000000e+00 Rejected
Incremental variance 520.174358 1.597361e-102 Rejected
Incremental mean and variance 521.736508 9.142922e-103 Rejected
TAIEX 62.530885 3.366934e-15 Rejected
SP500 64.954050 1.003282e-15 Rejected
NASDAQ 1851.123985 0.000000e+00 Rejected
DowJones 163.413938 1.053157e-36 Rejected
BTC-USD 524.853179 4.352254e-107 Rejected
ETH-USD 666.975156 9.700699e-116 Rejected
EUR-USD 401.104689 6.851098e-86 Rejected
EUR-GBP 401.104689 6.851098e-86 Rejected
GBP-USD 1340.896808 2.812862e-260 Rejected