
Computing Semilinear Sparse Models for Approximately Eventually Periodic Signals

Fredy Vides Scientific Computing Innovation Center, Universidad Nacional Autónoma de Honduras, Tegucigalpa, Honduras, (e-mail: [email protected])
Abstract

Some elements of the theory and algorithmics for computing semilinear sparse models of discrete-time signals are presented. This study focuses on approximately eventually periodic discrete-time signals, that is, signals that may exhibit an aperiodic behavior for an initial amount of time and then become approximately periodic. The semilinear models considered here are obtained by combining sparse representation methods, linear autoregressive models and GRU neural network models. Each block model is first fitted independently using reference data corresponding to the signal under consideration; mixing parameters are then fitted, using the same reference data, so that the resulting signal model is a linear combination of the previously fitted blocks, and sparse representations of some of the matrix parameters of the resulting model are computed along the process. Some prototypical computational implementations are presented as well.

keywords:
Autoregressive models, neural-network models, parameter identification, least-squares approximation, time-series analysis.

1 Introduction

In this document, some elements of the theory and algorithmics for computing semilinear sparse models of discrete-time signals are presented. The study reported in this document is focused on approximately eventually periodic discrete-time signals, that is, signals that may exhibit an aperiodic behavior for an initial amount of time and then become approximately periodic.

The main contribution of the work reported in this document is the application of a collaborative scheme involving sparse representation methods, linear autoregressive models and GRU neural network models, where each block model is first fitted independently using some reference data corresponding to a given signal. Subsequently, some mixing parameters are fitted, using the aforementioned reference data, in order to obtain a signal model consisting of a linear combination of the previously fitted blocks. Along the process, some of the matrices of parameters of the resulting model are fitted using sparse representation methods. Some theoretical aspects of the aforementioned process are described in §3. As a byproduct of the work reported in this document, a toolset of Python programs for semilinear sparse signal model computation based on the ideas presented in §3 and §4 has been developed, and is available in Vides (2021a).

The applications of the sparse signal model identification technology developed as part of the work reported in this document range from numerical simulation for predictive maintenance of industrial equipment and structures to geological data analysis.

A prototypical algorithm for the computation of sparse semilinear autoregressors based on the ideas presented in §3 is presented in §4. Some illustrative computational implementations of the prototypical algorithm presented in §4 are documented in §5.

2 Preliminaries and Notation

Given $\delta>0$, let us consider the function defined by the expression

\[
H_{\delta}(x)=\begin{cases}1,&x>\delta\\ 0,&x\leq\delta.\end{cases}
\]

Given a matrix $A\in\mathbb{C}^{m\times n}$ with singular values $s_{j}(A)$, $j=1,\ldots,\min\{m,n\}$, we will write $\mathrm{rk}_{\delta}(A)$ to denote the number

\[
\mathrm{rk}_{\delta}(A)=\sum_{j=1}^{\min\{m,n\}}H_{\delta}(s_{j}(A)).
\]
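As an illustration, a minimal NumPy sketch (not part of the SPAAR toolset) of the computation of $\mathrm{rk}_{\delta}(A)$ via singular values is the following.

import numpy as np

# Minimal sketch: rk_delta(A) counts the singular values of A strictly greater than delta.
def rk_delta(A, delta):
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > delta))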

Given a time series $\Sigma=\{x_{t}\}_{t\geq 1}\subset\mathbb{C}$, a positive integer $L$ and any $t\geq L$, we will write $\mathbf{x}_{L}(t)$ to denote the vector

\[
\mathbf{x}_{L}(t)=\begin{bmatrix}x_{t-L+1}&x_{t-L+2}&\cdots&x_{t-1}&x_{t}\end{bmatrix}^{\top}\in\mathbb{C}^{L}.
\]

Given an ordered sample $\Sigma_{N}=\{x_{t}\}_{t=1}^{N}\subset\Sigma$ from a time series $\Sigma=\{x_{t}\}_{t\geq 1}$, we will write $\mathcal{H}_{L}(\Sigma_{N})$ to denote the Hankel-type trajectory matrix corresponding to $\Sigma_{N}$, which is determined by the following expression.

\[
\mathcal{H}_{L}(\Sigma_{N})=\begin{bmatrix}x_{1}&x_{2}&x_{3}&\cdots&x_{N-L+1}\\ x_{2}&x_{3}&x_{4}&\cdots&x_{N-L+2}\\ \vdots&\vdots&\vdots&\iddots&\vdots\\ x_{L}&x_{L+1}&x_{L+2}&\cdots&x_{N}\end{bmatrix}
\]
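A minimal NumPy sketch of the lag vector $\mathbf{x}_{L}(t)$ and the trajectory matrix $\mathcal{H}_{L}(\Sigma_{N})$, assuming the series is stored in a 0-based array x holding $x_{1},\ldots,x_{N}$, is the following.

import numpy as np

def lag_vector(x, L, t):
    # x_L(t) = [x_{t-L+1}, ..., x_t]^T, with t given in the 1-based indexing of the text.
    return x[t - L:t]

def trajectory_matrix(x, L):
    # H_L(Sigma_N): column j holds x_{j+1}, ..., x_{j+L}, for j = 0, ..., N-L.
    N = len(x)
    return np.column_stack([x[j:j + L] for j in range(N - L + 1)])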

We will write $I_{n}$ to denote the identity matrix in $\mathbb{C}^{n\times n}$, and we will write $\hat{e}_{j,n}$ to denote the matrices in $\mathbb{C}^{n\times 1}$ representing the canonical basis of $\mathbb{C}^{n}$ (each $\hat{e}_{j,n}$ equals the $j$-th column of $I_{n}$).

We will write $\mathbf{S}^{1}$ to denote the set $\{z\in\mathbb{C}:|z|=1\}$. Given any matrix $X\in\mathbb{C}^{m\times n}$, we will write $X^{\ast}$ to denote the conjugate transpose $\overline{X}^{\top}\in\mathbb{C}^{n\times m}$ of $X$. A matrix $P\in\mathbb{C}^{n\times n}$ will be called an orthogonal projector whenever $P^{2}=P=P^{\ast}$. Given any matrix $A\in\mathbb{C}^{n\times n}$, we will write $\sigma(A)$ to denote the spectrum of $A$, that is, the set of eigenvalues of $A$.

3 Semilinear Modeling of Approximately Eventually Periodic Signals

A discrete-time signal represented by a time series $\Sigma=\{x_{t}\}_{t\geq 1}$ is said to be approximately eventually periodic (AEP) if it can be aperiodic for an initial amount of time and then becomes approximately periodic. In other words, there are $\varepsilon>0$ and two integers $T,S>0$ such that $|x_{t+kT}-x_{t}|\leq\varepsilon$ for each $t\geq S$ and each integer $k\geq 0$. The integer $T$ will be called an approximate period of $\Sigma$.

Remark 1

Based on the notion of approximately eventually periodic signal considered in this study, it can be seen that given an AEP signal $\Sigma=\{x_{t}\}_{t\geq 1}$, there is a positive integer $S$ such that the tail $\{x_{t}\}_{t\geq S}$ of $\Sigma$ is approximately periodic.
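For illustration purposes only, the following synthetic signal (not one of the data sets used in §5) is AEP in the above sense: a decaying aperiodic transient plus a periodic component with period $T=40$, for which the condition $|x_{t+T}-x_{t}|\leq\varepsilon$ can be checked numerically on the tail.

import numpy as np

t = np.arange(1, 1201)
T = 40
transient = np.exp(-t / 60.0) * np.sin(0.37 * t)               # dies out as t grows
periodic = np.sin(2 * np.pi * t / T) + 0.3 * np.cos(4 * np.pi * t / T)
x = transient + periodic

# Check |x_{t+T} - x_t| <= eps for t >= S (the k = 1 case of the AEP condition).
S, eps = 400, 1e-2
tail = x[S - 1:]
print(np.max(np.abs(tail[T:] - tail[:-T])) <= eps)             # True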

3.1 Semilinear Sparse Autoregressors

Let $\Sigma=\{x_{t}\}_{t\geq 1}\subset\mathbb{C}$ be an AEP signal and let $L>0$ be a lag value. For the study reported in this document we will consider semilinear signal models of the form:

\[
x_{t+1}=\mathcal{S}(\mathbf{x}_{L}(t))=\mathcal{L}(\mathbf{x}_{L}(t))+\mathcal{G}(\mathbf{x}_{L}(t))+\mathcal{E}(\mathbf{x}_{L}(t)),\quad t\geq L, \tag{1}
\]

where $\mathcal{L}(\mathbf{x}_{L}(t))$ denotes a linear operation defined by the expression

\[
\mathcal{L}(\mathbf{x}_{L}(t))=c_{1}x_{t}+c_{2}x_{t-1}+c_{3}x_{t-2}+\cdots+c_{L}x_{t-L+1}, \tag{2}
\]

the term $\mathcal{G}(\mathbf{x}_{L}(t))$ represents a linear combination of neural networks based on gated recurrent unit (GRU) layers, whose structure is described by the block diagram

[Block diagram (3): the lag vector $\mathbf{x}_{L}(t)$ is fed to a $\mathbf{GRU}$ layer, whose hidden state $\mathbf{h}(t)$ is passed to an affine layer $\mathbf{A}$ producing the prediction $\hat{x}_{t+1}$.]

and $\mathcal{E}(\mathbf{x}_{L}(t))$ represents some suitable error term. For some given integer $m>0$, each GRU cell $\mathbf{G}_{j}$ in the GRU block in (3) is described for each $j=1,\ldots,m$ by the following equations:

\begin{align}
r_{j}(t)&=\sigma\left(\hat{e}_{j,m}^{\top}\left(W_{ir}\mathbf{x}_{L}(t)+W_{hr}\mathbf{h}(t-1)+b_{r}\right)\right)\nonumber\\
z_{j}(t)&=\sigma\left(\hat{e}_{j,m}^{\top}\left(W_{iz}\mathbf{x}_{L}(t)+W_{hz}\mathbf{h}(t-1)+b_{z}\right)\right)\nonumber\\
n_{j}(t)&=\tanh\left(\hat{e}_{j,m}^{\top}\left(W_{in}\mathbf{x}_{L}(t)+b_{n}\right)+r_{j}(t)\,\hat{e}_{j,m}^{\top}W_{hn}\mathbf{h}(t-1)\right)\nonumber\\
h_{j}(t)&=(1-z_{j}(t))\,n_{j}(t)+z_{j}(t)\,h_{j}(t-1)\tag{4}
\end{align}

with $\mathbf{h}(t)=\begin{bmatrix}h_{1}(t)&\cdots&h_{m}(t)\end{bmatrix}^{\top}$, and where $\sigma$ denotes the sigmoid function. The configuration considered in (3) is based on a GRU layer in order to prevent vanishing gradients, by taking advantage of the GRU structure presented in Cho et al. (2014). Although LSTM networks can also be used to prevent vanishing gradients, GRU networks were chosen for this study because they have fewer trainable parameters. The affine layer $\mathbf{A}$ of the neural network described in (3) is determined by the expression

\[
\mathbf{A}(\mathbf{h}(t))=\mathbf{w}_{A}^{\top}\mathbf{h}(t)+b_{A}.
\]

In order to compute models of the form (1), we can combine sparse autoregressive models of the form (2), which can be computed using the methods presented in Vides (2021b), with GRU neural network models of the form (3), which can be computed using the computational tools provided as part of TensorFlow, Keras and PyTorch, described in Chollet et al. (2015) and Paszke et al. (2019).
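For instance, a minimal Keras sketch of a single GRU block of the form (3) is given below; the layer sizes and training settings are illustrative and are not those used by the SPAAR toolset.

import tensorflow as tf

def build_gru_block(L, m):
    # Lag vector x_L(t) fed as a length-L sequence, GRU hidden state h(t) of size m,
    # followed by the affine layer A of (3) producing the scalar prediction x_{t+1}.
    model = tf.keras.Sequential([
        tf.keras.layers.GRU(m, input_shape=(L, 1)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model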

An approximate representation

\[
\tilde{\mathcal{L}}(\mathbf{x}_{L}(t))=\tilde{c}_{1}x_{t}+\tilde{c}_{2}x_{t-1}+\tilde{c}_{3}x_{t-2}+\cdots+\tilde{c}_{L}x_{t-L+1},
\]

of the linear component of (1) such that

\[
\mathcal{L}(\mathbf{x}_{L}(t))\approx\tilde{\mathcal{L}}(\mathbf{x}_{L}(t)),\quad t\geq L,
\]

can be computed using some sample $\Sigma_{N}=\{x_{t}\}_{t=1}^{N}$ and a corresponding subsample $\Sigma_{0}=\{x_{t}\}_{t=1}^{N-1}\subset\Sigma_{N}$, for some suitable $N>L$, by approximately solving the matrix equation

\[
\mathcal{H}_{L}(\Sigma_{0})^{\top}\begin{bmatrix}c_{L}\\ c_{L-1}\\ \vdots\\ c_{2}\\ c_{1}\end{bmatrix}=\begin{bmatrix}x_{L+1}\\ x_{L+2}\\ \vdots\\ x_{N-1}\\ x_{N}\end{bmatrix}, \tag{5}
\]

using sparse least-squares approximation methods.
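The sparse solver used in this work is (Vides, 2021b, Algorithm 1); as a rough stand-in, the following sketch solves (5) by ordinary least squares and then keeps only the $K$ largest coefficients, refitting on the selected lags.

import numpy as np

def fit_sparse_ar(x, L, K):
    # Returns (c_L, ..., c_1) with at most K nonzero entries, from x = (x_1, ..., x_N).
    N = len(x)
    H = np.column_stack([x[j:j + L] for j in range(N - L)])    # H_L(Sigma_0)
    A, y = H.T, x[L:]                                          # system (5)
    c = np.linalg.lstsq(A, y, rcond=None)[0]
    support = np.argsort(np.abs(c))[-K:]                       # K largest coefficients
    c_sparse = np.zeros_like(c)
    c_sparse[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
    return c_sparse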

Schematically, the semilinear autoregressors considered in this study can be described by a block diagram of the form,

[Block diagram (6): the lag vector $\mathbf{x}_{L}(t)$ is fed in parallel to the blocks $\mathfrak{L},\mathfrak{G}_{1},\ldots,\mathfrak{G}_{N}$, whose outputs are combined by the mixing block $\mathfrak{M}$ to produce $x_{t+1}$.]

where the block $\mathfrak{L}$ is represented by (2), each block $\mathfrak{G}_{j}$ is represented by (3), and where the block $\mathfrak{M}$ is a mixing block defined by the expression

\[
\mathfrak{M}(y_{1}(t),\ldots,y_{N+1}(t))=\sum_{j=1}^{N+1}w_{j}y_{j}(t),
\]

for some coefficients $w_{j}$ to be determined and some given $N$, with $y_{1}(t)=\mathfrak{L}(\mathbf{x}_{L}(t))$ and $y_{k+1}(t)=\mathfrak{G}_{k}(\mathbf{x}_{L}(t))$ for each $k=1,\ldots,N$ and each $t\geq L$.
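In other words, once the blocks have been fitted, the mixing weights can be obtained from a small linear least-squares problem; a minimal sketch (using plain least squares in place of (Vides, 2021b, Algorithm 1)) is the following.

import numpy as np

def fit_mixing_weights(block_preds, targets):
    # block_preds is a list of arrays; block_preds[j][i] is block j's prediction of targets[i].
    Y = np.column_stack(block_preds)            # columns y_1(t), ..., y_{N+1}(t)
    w, *_ = np.linalg.lstsq(Y, targets, rcond=None)
    return w                                    # mixing coefficients w_1, ..., w_{N+1}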

The details of the computation of the neural network blocks of model (1) are omitted for brevity; for details on the theory and computation of the GRU neural network models considered for this study, the reader is kindly referred to Cho et al. (2014), Chollet et al. (2015), Paszke et al. (2019) and Vides (2021a).

Several interesting papers have been written on the subject of hybrid time series models that combine ARIMA and ANN models; two important references on this matter are Zhang (2003) and Khandelwal et al. (2015). Besides using sparse AR models instead of ARIMA models, another important distinctive aspect of the modeling approach reported in this document is that the recurrent neural network components of (6), represented by $\mathcal{G}$ in (1), are not used to approximate the residuals $r_{t}=x_{t+1}-\mathcal{L}(\mathbf{x}_{L}(t))$. Instead, using some suitable training subsets $\Sigma_{I},\Sigma_{M}$ of a given data sample $\Sigma_{N}$ from an arbitrary AEP signal $\Sigma=\{x_{t}\}_{t\geq 1}$ under consideration, the parameters of the blocks $\mathfrak{L},\mathfrak{G}_{1},\ldots,\mathfrak{G}_{N}$ of (6) are first fitted independently using $\Sigma_{I}$, and then the coefficients of the block $\mathfrak{M}$ of (6) are fitted using $\Sigma_{M}$ and corresponding predicted values generated with $\mathfrak{L},\mathfrak{G}_{1},\ldots,\mathfrak{G}_{N}$, computing sparse representations of some of the matrix parameters of the resulting model along the process.

3.2 An Operator Theoretic Approach to the Computation of Linear Components of Semilinear Sparse Autoregressors

Let $\Sigma=\{x_{t}\}_{t\geq 1}$ be an AEP signal whose behavior can be approximately described by a model of the form (2), computed by approximately solving an equation of the form (5) for some suitable integers $N,L>0$ with $N>L$, and let $\Sigma_{0}=\{x_{t}\}_{t=1}^{N-1}$ be a given sample. If we consider any sample $\tilde{\Sigma}_{0}=\{\tilde{x}_{t}\}_{t=1}^{N-1}\subset\Sigma$ such that, for some positive integer $S$, the states in $\tilde{\Sigma}_{0}$ satisfy $\tilde{x}_{t}=x_{t+S}$ for each $t=1,\ldots,N-1$, then the matrix

\[
C_{L}=\begin{bmatrix}0&1&0&\cdots&\cdots&0\\ 0&\ddots&\ddots&\ddots&&\vdots\\ \vdots&\ddots&\ddots&\ddots&\ddots&\vdots\\ \vdots&&\ddots&\ddots&\ddots&0\\ 0&\cdots&\cdots&0&0&1\\ c_{L}&c_{L-1}&\cdots&\cdots&c_{2}&c_{1}\end{bmatrix}\in\mathbb{C}^{L\times L} \tag{7}
\]

will approximately satisfy the condition

\[
\mathcal{H}_{L}(\Sigma_{0})^{\top}\left(C_{L}^{S}\right)^{\top}=\mathcal{H}_{L}(\tilde{\Sigma}_{0})^{\top}. \tag{8}
\]

Using matrices of the form (7) one can express linear models of the form (2) as follows.

\[
\mathcal{L}(\mathbf{x}_{L}(t))=\hat{e}_{L,L}^{\top}C_{L}\mathbf{x}_{L}(t) \tag{9}
\]

One can observe that to each model of the form (2), there corresponds a matrix of the form (7). From here on, a matrix that satisfies the previous conditions will be called the matrix form of a linear model $\mathcal{L}$ of the form (2).
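A minimal NumPy sketch of the matrix form (7) and of the one-step prediction (9) is the following; the coefficient ordering $(c_{L},\ldots,c_{1})$ matches the one in (5).

import numpy as np

def companion_matrix(c):
    # c = (c_L, ..., c_1); the last row of C_L carries the model coefficients.
    L = len(c)
    C = np.zeros((L, L), dtype=complex)
    C[:-1, 1:] = np.eye(L - 1)
    C[-1, :] = c
    return C

# One-step prediction (9): x_{t+1} is approximated by C[-1, :] @ x_L(t).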

Given $\delta>0$ and two matrices $A\in\mathbb{C}^{m\times n}$ and $Y\in\mathbb{C}^{m\times p}$, let us write $AX\approx_{\delta}Y$ to represent the problem of finding $X\in\mathbb{C}^{n\times p}$, $\alpha,\beta\geq 0$ and an orthogonal projector $Q$ such that $\|AX-Y\|_{F}\leq\alpha\delta+\beta\|(I_{m}-Q)Y\|_{F}$. The matrix $X$ will be called a solution to the problem $AX\approx_{\delta}Y$.

Theorem 2

Given $\delta>0$, two integers $L,M>0$, a sample $\Sigma_{N}=\{x_{t}\}_{t=1}^{N}$ from an AEP signal $\Sigma=\{x_{t}\}_{t\geq 1}$ with $N>L$, and a matrix $A\in\mathbb{C}^{L\times M}$. If $r=\mathrm{rk}_{\delta}(\mathcal{H}_{L}(\Sigma_{N}))>0$, then there is a sparse matrix $\hat{A}\in\mathbb{C}^{L\times M}$ with at most $Mr$ nonzero entries such that $\mathcal{H}_{L}(\Sigma_{N})^{\top}\hat{A}\approx_{\delta}\mathcal{H}_{L}(\Sigma_{N})^{\top}A$.

Proof.

Since, by (Vides, 2021b, Lemma 3.2), $\mathrm{rk}_{\delta}(\mathcal{H}_{L}(\Sigma_{N})^{\top})=\mathrm{rk}_{\delta}(\mathcal{H}_{L}(\Sigma_{N}))>0$, the result is a consequence of the application of (Vides, 2021b, Theorem 3.6) to the problem $\mathcal{H}_{L}(\Sigma_{N})^{\top}\hat{A}\approx_{\delta}\mathcal{H}_{L}(\Sigma_{N})^{\top}A$.

Given an AEP signal $\Sigma=\{x_{t}\}_{t\geq 1}$ with approximate period $T$ and a signal model $\mathcal{S}$ for $\Sigma$ of the form (1), if for the linear component $\mathcal{L}$ of $\mathcal{S}$ the corresponding residuals $r_{t}=|x_{t+1}-\mathcal{L}(\mathbf{x}_{L}(t))|$ are small, then the significant contribution of $\mathcal{L}$ to the modeling of $\Sigma$ will be beneficial for interpretability purposes.

Let us consider an AEP signal $\Sigma$ with approximate period $T$ and approximately periodic tail $\tilde{\Sigma}=\{x_{t}\}_{t\geq S}$, together with a suitable integer lag value $L>0$ such that there is an approximate linear model $\mathcal{L}$ for $\Sigma$ of the form (2) with small residuals $r_{t}=|x_{t+1}-\mathcal{L}(\mathbf{x}_{L}(t))|$. Let $C_{L}$ be the matrix form of $\mathcal{L}$ and let $s=S+L-1$. By applying a Krylov subspace approach along the lines presented in (Saad, 2011, §6.1), and as a consequence of (Vides, 2021b, Theorem 4.3), one can find a matrix $W_{k}\in\mathbb{C}^{L\times k}$ whose columns form an orthonormal basis of the subspace $\mathcal{K}_{T}\subset\mathbb{C}^{L}$ determined by the expression

\[
\mathcal{K}_{T}=\mathrm{span}\left(\{\mathbf{x}_{L}(s),C_{L}\mathbf{x}_{L}(s),C_{L}^{2}\mathbf{x}_{L}(s),\ldots,C_{L}^{T-1}\mathbf{x}_{L}(s)\}\right),
\]

such that each $z\in\sigma(W_{k}^{\ast}C_{L}W_{k})$ satisfies the relation $|z^{T}-1|\leq\varepsilon$, for some $\varepsilon>0$. The matrix $W_{k}^{\ast}C_{L}W_{k}$ will be called an approximately periodic (AP) $\Sigma$-section of $C_{L}$ and will be denoted by $C_{L}|_{\Sigma}^{AP}$.
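A minimal numerical sketch of this construction, replacing the specific procedure of (Vides, 2021b, Theorem 4.3) with a generic orthonormalization of the Krylov matrix, is the following.

import numpy as np

def ap_section(C, x_s, T, delta=1e-8):
    # Build the Krylov matrix [x_L(s), C x_L(s), ..., C^{T-1} x_L(s)], orthonormalize it,
    # and compress C onto the resulting basis W_k, returning W_k^* C W_k.
    cols, v = [], x_s.astype(complex)
    for _ in range(T):
        cols.append(v)
        v = C @ v
    K = np.column_stack(cols)
    U, s, _ = np.linalg.svd(K, full_matrices=False)
    W = U[:, s > delta * s[0]]
    return W.conj().T @ C @ W

# The eigenvalues z of ap_section(C_L, x_L(s), T) should satisfy |z^T - 1| <= eps
# when the approximately periodic tail is well captured by the linear model.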

Remark 3

Based on the previous considerations, when a given AEP signal $\Sigma=\{x_{t}\}_{t\geq 1}$ with approximate period $T$ is well explained by the linear component of a semilinear model $\mathcal{S}$ of the form (1), that is, when the corresponding residuals are relatively small, the matrix $C_{L}|_{\Sigma}^{AP}$ corresponding to the model should mimic the approximate periodicity of the approximately periodic tail $\{x_{t}\}_{t\geq S}$ of $\Sigma$, in the sense that the number $\|(C_{L}|_{\Sigma}^{AP})^{T}-I_{k}\|$ should be relatively small for some suitable matrix norm $\|\cdot\|$ (in the sense of (Saad, 2011, §1.5)). Consequently, the points in $\sigma((C_{L}|_{\Sigma}^{AP})^{T})$ should cluster around $1$.

4 Algorithms

The ideas in §3 can be translated into a prototypical procedure, represented by algorithm 1, that relies on (Vides, 2021b, Algorithm 1), Theorem 2 and (Vides, 2021b, Theorem 4.3).

Data: $\Sigma_{N}=\{x_{t}\}_{t=1}^{N}\subset\mathbb{C}$
Result: $\mathbf{c},\mathfrak{G}_{1},\ldots,\mathfrak{G}_{N},\mathbf{w}_{M}=\mathbf{SpARSModel}(\Sigma_{N})$

  • 0: Estimate the lag value $L$ using autocorrelation-function-based methods (a sketch is given after Algorithm 1);

  • 1: Approximately solve (5) for $\mathbf{c}$ using the data in $\Sigma_{N}$, applying (Vides, 2021b, Algorithm 1);

  • 2: Fit the model blocks $\mathfrak{G}_{j}$ of (6) using the data in $\Sigma_{N}$;

  • 3: For the GRU layers of each $\mathfrak{G}_{j}$, compute sparse representations of the corresponding input weights $W_{ir},W_{iz},W_{in}$ in (4) when appropriate, applying Theorem 2;

  • 4: Compute the coefficients $\mathbf{w}_{M}=(w_{1},\ldots,w_{N+1})$ of the mixing block $\mathfrak{M}$ of (6) using the data in $\Sigma_{N}$ and (Vides, 2021b, Algorithm 1);

  • return $\mathbf{c},\mathfrak{G}_{1},\ldots,\mathfrak{G}_{N},\mathbf{w}_{M}$

Algorithm 1 SpARSModel: Semilinear Sparse Model Parameters Computation
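A minimal sketch of the lag estimation in step 0, assuming the lag is taken as the location of the largest sample autocorrelation among the first max_lag lags (one common heuristic; the SPAAR toolset may use a different criterion), is the following.

import numpy as np

def estimate_lag(x, max_lag):
    # Sample autocorrelation via np.correlate; lags 0, ..., len(x) - 1.
    x0 = x - x.mean()
    acf = np.correlate(x0, x0, mode="full")[len(x0) - 1:]
    acf = acf / acf[0]
    return int(np.argmax(acf[1:max_lag + 1]) + 1)   # lag of the largest autocorrelation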

    We can apply algorithm 1 to compute the model parameters needed for the computation of signal models of the form (6).

    5 Numerical Experiments

In this section, some computational implementations of the methods reported in this document are presented. The experimental results documented in this section can be replicated using the script NumericalExperiment.py or the Jupyter notebook SLSpAARModelsDemo.ipynb, which are available in Vides (2021a). The configuration required to replicate the results in this section is available as part of the aforementioned programs.

From here on, we will refer to the sparse semilinear models proposed in this document as SpARS models. The signal approximations computed using the SpARS models presented in this document are compared with the approximations obtained using standard AR models, which are computed using the AutoReg model included as part of the Python statsmodels module. In this section we will write nz to denote the number of nonzero elements. For the experiments documented in this section, two GRU RNN blocks were used: the block $\mathfrak{G}_{1}$ was computed using TensorFlow 2.6.0 and its input weights were replaced by their corresponding sparse representations, computed using (Vides, 2021b, Algorithm 1) along the lines of Theorem 2, while the block $\mathfrak{G}_{2}$ was computed using PyTorch 1.9.1+cpu and its input weights were left unchanged.

    5.1 Numerical Experiment 1

    In this section, algorithm 1 is applied to compute a SpARS model for the signal data sample recorded in the csv file VortexSheddingSignal.csv in the DataSets folder in Vides (2021a). The graphic representations of the results produced by the command line

    >>> NumericalExperiment(1)
    

    are shown in figures 1 and 2, respectively.

Figure 1: Reference and identified signals for the SpARS model (top left). Approximation errors (top right). Linear component parameters of the SpARS model with 1 nz (bottom left). Linear component parameters of the standard AR model with 215 nz (bottom right).

Figure 2: $\sigma(C_{L}|_{\Sigma}^{AP})$ for the linear component of the SpARS model (top left). $\sigma(C_{L}|_{\Sigma}^{AP})$ for the linear component of the standard AR model (top right). $\sigma((C_{L}|_{\Sigma}^{AP})^{T})$ for the linear component of the SpARS model (bottom left). $\sigma((C_{L}|_{\Sigma}^{AP})^{T})$ for the linear component of the standard AR model (bottom right).

In every figure like figure 2, the red dots represent the points in each considered spectrum, the blue lines represent $\mathbf{S}^{1}$, and the number $T$ represents the estimated approximate period for each signal considered.

    5.2 Numerical Experiment 2

    In this section, algorithm 1 is applied to compute a SpARS model for the signal data sample recorded in the csv file NLOscillatorSignal.csv in the DataSets folder in Vides (2021a). The graphic representations of the results produced by the command line

    >>> NumericalExperiment(2)
    

    are shown in figures 3 and 4, respectively.

Figure 3: Reference and identified signals for the SpARS model (top left). Approximation errors (top right). Linear component parameters of the SpARS model with 92 nz (bottom left). Linear component parameters of the standard AR model with 165 nz (bottom right).

Figure 4: $\sigma(C_{L}|_{\Sigma}^{AP})$ for the linear component of the SpARS model (top left). $\sigma(C_{L}|_{\Sigma}^{AP})$ for the linear component of the standard AR model (top right). $\sigma((C_{L}|_{\Sigma}^{AP})^{T})$ for the linear component of the SpARS model (bottom left). $\sigma((C_{L}|_{\Sigma}^{AP})^{T})$ for the linear component of the standard AR model (bottom right).

    5.3 Numerical Experiment 3

In this section, algorithm 1 is applied to compute SpARS models for the signal data samples recorded in the csv files:

    • art_daily_no_noise.csv

    • art_daily_small_noise.csv

    that are included as part of the datasets described in Ahmad et al. (2017). The graphic representations of the results produced by the command line

    >>> NumericalExperiment(3.1)
    

    for the periodic signal without noise are shown in figures 5 and 6, respectively. The graphic representations of the results produced by the command line

    >>> NumericalExperiment(3.2)
    

    for the periodic signal with noise are shown in figures 7 and 8.

Figure 5: Reference and identified signals for the SpARS model (top left). Approximation errors (top right). Linear component parameters of the SpARS model with 8 nz (bottom left). Linear component parameters of the standard AR model with 288 nz (bottom right).

Figure 6: $\sigma(C_{L}|_{\Sigma}^{AP})$ for the linear component of the SpARS model (top left). $\sigma(C_{L}|_{\Sigma}^{AP})$ for the linear component of the standard AR model (top right). $\sigma((C_{L}|_{\Sigma}^{AP})^{T})$ for the linear component of the SpARS model (bottom left). $\sigma((C_{L}|_{\Sigma}^{AP})^{T})$ for the linear component of the standard AR model (bottom right).

Figure 7: Reference and identified signals for the SpARS model (top left). Approximation errors (top right). Linear component parameters of the SpARS model with 5 nz (bottom left). Linear component parameters of the standard AR model with 288 nz (bottom right).

Figure 8: $\sigma(C_{L}|_{\Sigma}^{AP})$ for the linear component of the SpARS model (top left). $\sigma(C_{L}|_{\Sigma}^{AP})$ for the linear component of the standard AR model (top right). $\sigma((C_{L}|_{\Sigma}^{AP})^{T})$ for the linear component of the SpARS model (bottom left). $\sigma((C_{L}|_{\Sigma}^{AP})^{T})$ for the linear component of the standard AR model (bottom right).

    5.4 Approximation Errors

    The approximation root mean square errors (RMSE) are summarized in table 1.

Table 1: RMSE

Experiment        SpARS Model      AR Model
Experiment 1      0.0031265041     0.0000035594
Experiment 2      0.0162975118     0.3100591516
Experiment 3.1    0.0000000000     0.0000000074
Experiment 3.2    3.9894749119     4.0939437825

It is appropriate to mention that the root mean square errors can present small fluctuations across repeated numerical simulations, due primarily to the nature of the neural-network models, while the linear components tend to present very low or no variability from simulation to simulation.

    5.5 Data Availability

The Python programs that support the findings of this study are openly available in the SPAAR repository, reference number Vides (2021a). The time series data used for experiments 1 and 2, documented in §5.1 and §5.2 respectively, are available as part of Vides (2021a). The time series data used for experiment 3 in §5.3 are available as part of the Numenta Anomaly Benchmark (NAB) described in Ahmad et al. (2017).

    6 Conclusion

Although in some of the experiments in §5 the root mean square errors corresponding to the AR and SpARS models are similar, the AP $\Sigma$-sections corresponding to the SpARS models exhibit a better mimetic approximately periodic behavior in the sense of remark 3, as can be seen in figures 2, 4, 6 and 8. This mimetic behavior is interesting not just from a theoretical point of view, as it provides a criterion for how well the linear component of a given model mimics or captures the eventual approximately periodic behavior of the signal under study, but also for practical computational reasons, since long-term predictions or simulations can be affected when the eigenvalues of the AP $\Sigma$-section of the matrix form corresponding to the linear component of a signal model lie outside the set $\mathbf{D}^{1}=\{z\in\mathbb{C}:|z|\leq 1\}$, as can be observed in figures 3 and 4. Another advantage of the SpARS modeling technology is the reduction of the amount of data needed to fit the corresponding linear component, when compared to a nonsparse linear model, as illustrated in §5.1: although the RMSE for the standard AR model was smaller than the RMSE of the SpARS model, only $50\%$ of the reference data were needed to fit the sparse model, while $90\%$ of the reference data were needed to fit the standard AR model. This difference in the amounts of training data was only necessary for the experiment in §5.1, due to the relatively small data sample size with respect to the lag value; for the other experiments, $50\%$ of the reference data were used to train both the sparse and the nonsparse models.

Acknowledgements

The numerical simulations documented in this paper were performed with computational resources from the Scientific Computing Innovation Center of UNAH, as part of the research project PI-063-DICIHT. Some experiments were performed on Google Colab and IBM Quantum Lab as well.

    References

    • Ahmad et al. (2017) Ahmad, S., Lavin, A., Purdy, S., and Agha, Z. (2017). Unsupervised real-time anomaly detection for streaming data. Neurocomputing, 262, 134–147. https://doi.org/10.1016/j.neucom.2017.04.070. Online Real-Time Learning Strategies for Data Streams.
    • Cho et al. (2014) Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1724–1734. Association for Computational Linguistics, Doha, Qatar. 10.3115/v1/D14-1179. URL https://aclanthology.org/D14-1179.
    • Chollet et al. (2015) Chollet, F. et al. (2015). Keras. https://keras.io.
• Khandelwal et al. (2015) Khandelwal, I., Adhikari, R., and Verma, G. (2015). Time series forecasting using hybrid ARIMA and ANN models based on DWT decomposition. Procedia Computer Science, 48, 173–179. https://doi.org/10.1016/j.procs.2015.04.167. International Conference on Computer, Communication and Convergence (ICCC 2015).
• Paszke et al. (2019) Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (eds.), Advances in Neural Information Processing Systems 32, 8024–8035. Curran Associates, Inc.
    • Saad (2011) Saad, Y. (2011). Numerical Methods for Large Eigenvalue Problems. Society for Industrial and Applied Mathematics. 10.1137/1.9781611970739.
• Vides (2021a) Vides, F. (2021a). SPAAR: Sparse signal identification Python toolset. URL https://github.com/FredyVides/SPAAR.
    • Vides (2021b) Vides, F. (2021b). Sparse system identification by low-rank approximation. CoRR, abs/2105.07522. URL https://arxiv.org/abs/2105.07522.
• Zhang (2003) Zhang, G. (2003). Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing, 50, 159–175. https://doi.org/10.1016/S0925-2312(01)00702-0.