
An Efficient Sparse Identification Algorithm For Stochastic Systems With General Observation Sequences

Ziming Wang, Xinghua Zhu Z. M. Wang is with NCMIS, LSEC, Academy of Mathematics and Systems Science, Beijing, P. R. China. Emails: [email protected]. X. H. Zhu is with the Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, P. R. China. Emails: [email protected].* Corresponding author.
Abstract

This paper studies the identification of unknown sparse parameter vectors in stochastic dynamic systems. Firstly, a novel sparse identification algorithm is proposed, which generates sparse estimates from the least squares estimator by adaptively adjusting a threshold. Secondly, under a non-persistent excitation condition that is possibly the weakest available, we prove that the proposed algorithm correctly identifies the zero and nonzero elements of the sparse parameter vector with a finite number of observations, and that the estimates of the nonzero elements converge almost surely to the true values. Compared with related methods, e.g., LASSO, our method requires weaker assumptions and does not involve solving an additional optimization problem. Moreover, our theoretical results do not rely on any statistical assumptions on the regression signals, such as independence or stationarity, which makes the results applicable to stochastic feedback systems. Thirdly, the number of observations that guarantees correct identification of the zero-element set of the unknown sparse parameter of a Hammerstein system is derived for the first time. Finally, numerical simulations demonstrate the effectiveness of the proposed method. Since no additional optimization problem is solved, and hence no additional numerical error is introduced, the proposed algorithm performs much better than the related algorithms.

Index Terms:
Stochastic dynamic system; Sparse parameter identification; Weakest non-persistent excitation condition; Feedback control system; Strong consistency

I Introduction

Parameter estimation or filtering plays a very important role in system identification and control, signal processing, statistical learning, and other fields ([12; 20]). In recent decades, the classical identification and estimation theory has made great progress, and a relatively complete theoretical system has been formed ([10; 11]).

In many practical applications, such as the selection of effective basis functions in Hamiltonian systems, feature selection and robust inference of biomarkers from omics data in personalized healthcare, and channel estimation in ultra-wideband communication systems, many components of the unknown parameter vector contribute little or nothing to the system (these elements are zero or close to zero), so the unknown parameter vector is high-dimensional and sparse. A natural research direction is therefore sparse parameter identification, i.e., the precise identification of the zero and nonzero elements of the unknown parameter vector, in order to reduce model redundancy and obtain a leaner, better-performing, and more reliable prediction model.

Research on sparse parameter identification has made considerable progress. In the field of signal processing, sparse parameter identification is largely based on compressed sensing (CS), i.e., solving an L_{0} optimization problem (minimizing the number of nonzero elements) under an L_{2} (Euclidean metric) constraint. Since this optimization problem is NP-hard, researchers have developed sparse signal estimation algorithms by relaxing the L_{0} optimization into a more tractable L_{1} convex optimization problem. For example, using compressed sensing methods, Kalouptsidis et al. in [13] developed sparse identification algorithms based on Kalman filtering and Expectation-Maximization, and verified their efficiency via simulations. Candès, Tao and coauthors in [3; 2; 4] developed a fairly complete signal reconstruction theory under prior assumptions on the sparsity of the model. See [14; 5; 8] for more references. The variable selection problem plays a very important role in statistical learning. The well-known LASSO (least absolute shrinkage and selection operator) is a linear regression method with L_{1} regularization, and sparse identification algorithms represented by it and its variants have been extensively studied. Tibshirani in [18] first proposed the LASSO in 1996. Zhao and Yu in [23] established model selection consistency under regularity conditions. Zou in [26] proposed the adaptive LASSO and discussed its advantages in detail. Motivated by the ubiquitous sparsity in practical systems, CS and adaptive LASSO methodologies have been inherited and developed in the field of system identification and control, where a sparse parameter identification theory for stochastic dynamic systems has been established. For example, Perepu and Tangirala in [16] proposed a CS-based iterative basis pursuit denoising (BPDN) algorithm to estimate the unknown sparse parameters in the ARMAX model, where the number of zero elements of the unknown parameter is known. Tóth et al. in [19] studied the consistency of estimation for sparse linear regression models based on CS under the condition that the input signals are independent and identically distributed (i.i.d.). However, the regressors in stochastic dynamic systems are often generated by past input and output signals, so independence assumptions on the observation data are hard to satisfy. Motivated by this, Xie and Guo in [21] developed a compressed consensus normalized least mean squares algorithm and provided its stability analysis based on CS theory; strong statistical assumptions such as independence are not required, but a prior assumption on the sparsity of the model is. Zhao et al. in [25] proposed a LASSO-based adaptive sparse estimation algorithm with general observation sequences, where the zero elements in the unknown parameter are correctly identified with a finite number of observations under a non-persistent excitation condition, the weakest excitation condition in the existing literature on sparse parameter identification. Related research also exists in other fields, such as pruning in deep learning [1; 22].

Summarizing the existing research, we find that there are still some shortcomings in sparse parameter identification theory. Firstly, sparse estimation algorithms based on CS theory require prior knowledge about the sparsity of the unknown parameter and the regression vectors. Secondly, the theoretical analysis of LASSO-based sparse estimation algorithms requires a regularity condition on the regressor, which is in fact a persistent excitation condition and is difficult or almost impossible to satisfy in stochastic feedback systems. Even the non-persistent excitation condition on the regressor in [25] is much stronger than the weakest excitation condition proposed by Lai and Wei in [15] (see subsection III-B and Table I for details). Thirdly, for a specific system such as the Hammerstein system, it is very instructive to give an explicit finite number of observations under further assumptions on the regressor, but this is not done in [25].

The contributions of this paper are summarized as follows:

  1.

    We propose a novel sparse parameter identification algorithm to estimate unknown sparse parameters in stochastic regression systems. The principle of the algorithm is to generate a sparse estimate from the least squares estimator by adjusting a threshold adaptively. Compared with the existing LASSO method and its variants, which generate sparse estimates by optimizing a criterion function with a penalty term, our algorithm does not need to solve an optimization problem, so it is more concise and efficient.

  2.

    In contrast to the classical asymptotic theory of the LS estimator ([15]), under the same condition we prove that the proposed algorithm correctly identifies the sets of zero and nonzero elements of the unknown sparse parameter vector with a finite number of observations, i.e., set convergence. It is worth pointing out that this result parallels the convergence result in [25], while the required non-persistent excitation condition is much weaker than that of [25]. Furthermore, the estimates of the nonzero elements converge to the true values almost surely with a convergence rate of O\left(\sqrt{\frac{\log R_{N}}{\lambda^{N}_{\min}}}\right), i.e., parameter convergence, which, to the best of the authors' knowledge, has never been achieved by sparse parameter identification algorithms for stochastic regression models.

  3.

    For a classical system such as the Hammerstein system, whose strong convergence and convergence rate were established by Zhao in [24], we give an explicit finite number of observations under the same conditions, which has not been achieved before by sparse parameter identification in stochastic dynamic systems.

  4.

    Simulation experiments compare the proposed algorithm with the classical least squares (LS) estimator, the LASSO estimator, and the algorithm in [25]. In contrast to the proposed algorithm, the other algorithms cannot identify the zero elements exactly, which shows the great advantage of the algorithm proposed in this paper.

The rest of this paper is organized as follows. In Section II, we give the problem formulation. Section III presents the main results of this paper, including the parameter convergence, the set convergence, and a comparison of the conditions for consistency of the proposed algorithm with those of other algorithms. Section IV applies the algorithm to the sparse parameter estimation of the Hammerstein system. A simulation example is given in Section V, and concluding remarks are made in Section VI.

II Problem Formulation

II-A Some preliminaries

Let (\Omega,\mathcal{F},\mathcal{P}) be the probability space, \omega be an element of \Omega, and \mathbb{E}(\cdot) be the expectation operator. Denote by \|\cdot\| the 2-norm of vectors or matrices in this paper. For two positive sequences \left\{a_{k}\right\}_{k\geq 1} and \left\{b_{k}\right\}_{k\geq 1}, a_{k}=O\left(b_{k}\right) means a_{k}\leq cb_{k} for k\geq 1 and some c>0, while a_{k}=o\left(b_{k}\right) means a_{k}/b_{k}\rightarrow 0 as k\rightarrow\infty.

II-B Sparse identification algorithm

Consider the parameter identification problem of the following discrete-time stochastic regression model,

y_{k+1}=\varphi_{k}^{T}\theta+w_{k+1},\quad k\geq 0 (1)

where y_{k+1} is the scalar observation or output at time k+1, \varphi_{k}\in\mathbb{R}^{r} is the r-dimensional stochastic regression vector, which may be a function of current and past inputs and outputs, \theta\in\mathbb{R}^{r} is the unknown r-dimensional parameter to be estimated, and w_{k+1} is the stochastic noise.

The above model (1) includes many parameterized systems, such as the ARX system and the Hammerstein system. We further denote the parameter vector \theta and the index set of its zero elements by

\theta \triangleq(\theta(1),\cdots,\theta(r))^{T},
H^{*} \triangleq\{l\in\{1,\cdots,r\}\mid\theta(l)=0\}. (2)

Classical identification algorithms, such as the LS algorithm, can generate consistent estimates of the unknown parameter as the number of data points tends to infinity. However, due to system noises and the finite number of observations available in practice, it is hard to deduce from such estimates which parameters are exactly zero. In the existing literature, an L_{1} penalty term is added to the criterion function to be optimized in order to generate sparse estimates, so that the zero elements in the unknown parameter \theta can be identified. However, two issues arise. Firstly, the penalty term destroys the optimality of LS and makes the estimates inevitably biased, so a stronger excitation condition is needed to ensure the consistency of the algorithm. Secondly, solving the optimization problem with the penalty term brings great computational difficulty. Our problem is to design a new sparse adaptive estimation algorithm that, without penalty terms, infers the set H^{*} in a finite number of steps and identifies the unknown parameter \theta by using the stochastic regression vectors and observations \left\{\varphi_{k},y_{k+1}\right\}_{k=1}^{n}.

Algorithm 1 Sparse Identification Algorithm

Step 0: (Initialization) Choose a positive sequence \left\{\alpha_{n}\right\} satisfying \alpha_{n}\to 0 as n\to\infty and

\sqrt{\frac{\log R_{n}}{\lambda_{\min}^{n}}}=o\left(\alpha_{n}\right).

Step 1: Based on \left\{\varphi_{k},y_{k+1}\right\}_{k=1}^{n}, begin with an initial vector \theta_{0} and an initial matrix P_{0}>0, and compute the matrix P_{n+1}^{-1} and the estimate \theta_{n+1} of \theta by the following recursive LS algorithm,

P_{k+1} =P_{k}-a_{k}P_{k}\varphi_{k}\varphi^{T}_{k}P_{k}, (3)
a_{k} =\frac{1}{1+\varphi^{T}_{k}P_{k}\varphi_{k}}, (4)
\theta_{k+1} =\theta_{k}+a_{k}P_{k}\varphi_{k}\left\{y_{k+1}-\varphi^{T}_{k}\theta_{k}\right\}. (5)

Step 2: Obtain \beta_{n+1}(l) by the following thresholding mechanism,

\beta_{n+1}(l)=\begin{cases}0&\text{if }|\theta_{n+1}(l)|<\alpha_{n+1}\\ \theta_{n+1}(l)&\text{otherwise},\end{cases} (8)

then obtain

\beta_{n+1} =\left(\beta_{n+1}(1),\cdots,\beta_{n+1}(r)\right)^{T}, (9)
H_{n+1} \triangleq\left\{l=1,\cdots,r\mid\beta_{n+1}(l)=0\right\}. (10)
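
For readers who prefer the pseudocode in executable form, a minimal NumPy sketch of Algorithm 1 is given below; the function names, the default initialization P_{0}=100I, and the commented threshold sequence are illustrative choices rather than prescriptions of the paper.

```python
import numpy as np

def recursive_ls_step(theta, P, phi, y_next):
    """One step of the recursive LS recursion (3)-(5)."""
    Pphi = P @ phi
    a = 1.0 / (1.0 + phi @ Pphi)                       # (4)
    theta = theta + a * Pphi * (y_next - phi @ theta)  # (5), uses P_k, not P_{k+1}
    P = P - a * np.outer(Pphi, Pphi)                   # (3)
    return theta, P

def threshold(theta_est, alpha):
    """Thresholding step (8)-(10): zero out entries whose magnitude is below alpha."""
    beta = np.where(np.abs(theta_est) < alpha, 0.0, theta_est)
    H = set(np.flatnonzero(beta == 0.0) + 1)           # 1-based index set H_{n+1}
    return beta, H

def algorithm_1(Phi, y, alpha_fn, P0_scale=100.0):
    """Run Algorithm 1 on data {phi_k, y_{k+1}}_{k=1}^n and threshold the final LS estimate.

    Phi      : (n, r) array whose k-th row is phi_k
    y        : (n,) array whose k-th entry is y_{k+1}
    alpha_fn : callable n -> alpha_n implementing a threshold sequence as in Step 0
    """
    n, r = Phi.shape
    theta, P = np.zeros(r), P0_scale * np.eye(r)
    for k in range(n):
        theta, P = recursive_ls_step(theta, P, Phi[k], y[k])
    return threshold(theta, alpha_fn(n))

# Hypothetical usage; whether this particular alpha_n satisfies Step 0 depends on
# how strongly the data excite the system:
# beta, H = algorithm_1(Phi, y, alpha_fn=lambda n: (np.log(n + 2) / (n + 2)) ** 0.25)
```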

III Theoretical Properties of the Sparse Identification Algorithm

In this section, we investigate the asymptotic properties of the estimates of the unknown sparse parameter vector and the convergence of the set of zero elements with a finite number of observations. To this end, we introduce the following assumptions to be used in the theoretical analysis.

Assumption 1

The noise \left\{w_{k},\mathcal{F}_{k}\right\}_{k\geq 1} is a martingale difference sequence, i.e., \mathbb{E}\left[w_{k+1}\mid\mathcal{F}_{k}\right]=0, k\geq 1, and there exists some \gamma>2 such that \sup_{k}\mathbb{E}\left[\left|w_{k+1}\right|^{\gamma}\mid\mathcal{F}_{k}\right]<\infty a.s.

Assumption 2 (Non-Persistent Excitation Condition)

The growth rate of \log\big(\lambda_{\max}\big\{P_{n+1}^{-1}\big\}\big) is slower than that of \lambda_{\min}\big\{P_{n+1}^{-1}\big\}, that is,

\lim\limits_{n\to\infty}\frac{\log R_{n}}{\lambda^{n}_{\min}}=0\ \ \ \ a.s.

holds, where R_{n}=1+\sum_{k=0}^{n}\|\varphi_{k}\|^{2} and \lambda_{\min}^{n}=\lambda_{\min}\{P_{0}^{-1}+\sum_{k=0}^{n}\varphi_{k}\varphi_{k}^{T}\}.

Remark 1

A further explanation of this non-persistent excitation condition is provided in subsection III-B.
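
In practice, whether Assumption 2 is plausible for a given data set can be checked empirically by monitoring the ratio \frac{\log R_{n}}{\lambda_{\min}^{n}} along the observed regressors. A small diagnostic sketch, assuming the regressors are collected in an n-by-r array, is given below.

```python
import numpy as np

def excitation_ratio(Phi, P0_inv=None):
    """Sequence log(R_n) / lambda_min^n along the data, where
    R_n = 1 + sum_k ||phi_k||^2 and lambda_min^n = lambda_min(P_0^{-1} + sum_k phi_k phi_k^T).
    Assumption 2 asks that this sequence tend to 0."""
    n, r = Phi.shape
    G = np.eye(r) if P0_inv is None else np.array(P0_inv, dtype=float)
    R = 1.0
    ratios = np.empty(n)
    for k in range(n):
        phi = Phi[k]
        R += phi @ phi
        G += np.outer(phi, phi)
        ratios[k] = np.log(R) / np.linalg.eigvalsh(G)[0]
    return ratios
```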

III-A Set and parameter convergence of estimates

Assume that there are d (d is unknown) nonzero elements in the vector \theta. Without loss of generality, we assume \theta=[\theta(1),\ldots,\theta(d),\theta(d+1),\ldots,\theta(r)]^{T} with \theta(i)\neq 0, i=1,\ldots,d, and \theta(j)=0, j=d+1,\ldots,r. Before giving the main results, we first state a classical result in stochastic adaptive control.

Lemma 1 ([9])

Assume that Assumption 1 holds. Then as n\rightarrow\infty, the estimation error of the recursive LS algorithm (3)-(5) is bounded by

\left\|\theta_{n+1}-\theta\right\|^{2}\leq C_{0}\frac{\log R_{n}}{\lambda^{n}_{\min}}\ \text{ a.s., }

where C_{0} is a finite constant.

Remark 2

The specific value of C_{0} can be found in [9].

Remark 3

We remark that the recursive LS algorithm is strongly consistent under Assumptions 1-2, i.e., |\theta_{n+1}(l)-\theta(l)|\to 0\ a.s. for all l=1,\cdots,r.

For the estimate \beta_{n+1} generated by (3)-(9), we have the following main results.

Theorem 1 (Parameter Convergence)

Assume that Assumptions 1 - 2 hold. Then

\beta_{n+1}(l)\underset{n\rightarrow\infty}{\longrightarrow}\theta(l),\quad l=1,\ldots,r\ \text{ a.s. }
Proof:

By the definition of \beta_{n+1}, we have for all l\in\{1,\ldots,r\},

|\beta_{n+1}(l)-\theta(l)| \leq|\beta_{n+1}(l)-\theta_{n+1}(l)|+|\theta_{n+1}(l)-\theta(l)|
\leq\alpha_{n+1}+|\theta_{n+1}(l)-\theta(l)|.

Thus the proof is completed by the fact that \alpha_{n}\to 0 as n\to\infty and the convergence result |\theta_{n+1}(l)-\theta(l)|\to 0\ a.s. of Remark 3. ∎

Theorem 2 (Set Convergence)

Assume that Assumptions 1-2 hold. Then there exists an \omega-set \Omega_{0} with \mathcal{P}\left\{\Omega_{0}\right\}=1 such that for any \omega\in\Omega_{0}, there exists an integer N_{0}(\omega) such that

\beta_{N+1}(d+1)=\cdots=\beta_{N+1}(r)=0,\quad N\geq N_{0}(\omega),

i.e., H_{N+1}=H^{*}, where H_{N+1} and H^{*} are defined by (10) and (2), respectively.

Proof:

Noting that Assumption 2 holds almost surely, there exists an \Omega_{0} with \mathcal{P}\{\Omega_{0}\}=1 such that Assumption 2 holds for any \omega\in\Omega_{0}. In the following, we consider the estimate sequence \{\beta_{n+1}\} on a fixed sample path \omega\in\Omega_{0}.

Firstly, we have for i\in\{1,\ldots,d\},

|\beta_{n+1}(i)| \geq|\theta(i)|-|\beta_{n+1}(i)-\theta(i)|
\geq|\theta(i)|-|\beta_{n+1}(i)-\theta_{n+1}(i)|-|\theta_{n+1}(i)-\theta(i)|
\geq|\theta(i)|-\alpha_{n+1}-|\theta_{n+1}(i)-\theta(i)|.

By the fact that |\theta(i)|>0 for i\in\{1,\ldots,d\}, \alpha_{n}\to 0 as n\to\infty and the convergence result |\theta_{n+1}(i)-\theta(i)|\to 0, there exists a constant N_{1} such that for all N>N_{1},

\alpha_{N+1}+|\theta_{N+1}(i)-\theta(i)|\leq\frac{|\theta(i)|}{2},

thus, we have for all i\in\{1,\ldots,d\},

|\beta_{N+1}(i)|\geq\frac{|\theta(i)|}{2}>0.

Then for i\in\{d+1,\ldots,r\}, since \theta(i)=0, by Lemma 1 we have

|\theta_{n+1}(i)|=O\left(\sqrt{\frac{\log R_{n}}{\lambda^{n}_{\min}}}\right)\ \text{ a.s. }

By the property \sqrt{\frac{\log R_{n}}{\lambda_{\min}^{n}}}=o\left(\alpha_{n}\right), there exists N_{2} such that for all N>N_{2},

|\theta_{N+1}(i)|<\alpha_{N+1}.

Then by the definition of \beta_{N+1} we have \beta_{N+1}(i)=0 for all N>N_{2} and i\in\{d+1,\ldots,r\}.

Finally, the proof is completed by setting N_{0}(\omega)=\max(N_{1},N_{2}). ∎

Remark 4

Theorem 2 shows that the index set of the zero elements in \theta can be correctly identified with a finite number of observations, and that the estimates of the nonzero elements are also nonzero after a finite number of observations.

The following corollary follows immediately from Theorem 2 and Lemma 1.

Corollary 1

Under the assumptions of Theorem 2, Algorithm 1 has the following convergence rate as N\to\infty:

\|\beta_{N+1}-\theta\|=O\left(\sqrt{\frac{\log R_{N}}{\lambda^{N}_{\min}}}\right)\quad a.s.

To the best of the authors’ knowledge, this convergence rate is parallel to that of [15; 9] and has never been achieved in sparse parameter identification algorithms for stochastic regression models. Next, we compare the conditions of observation data that guarantee the convergence of Algorithm 1, LASSO, and its variants respectively.

III-B Comparison of conditions for consistency of LASSO, adaptive LASSO, the algorithm proposed in [25] and Assumption 2

TABLE I: Conditions for consistency of LASSO, its variants, and Algorithm 1
Algorithm | Conditions on the system
LASSO [23] | Regularity condition: D_{n}\to D as n\to\infty; strong irrepresentable condition: for some \eta>0, |D_{n}^{21}(D_{n}^{11})^{-1}\operatorname{sgn}(\theta_{1})|\leq 1-\eta
Adaptive LASSO [26] | Regularity condition: D_{n}\to D as n\to\infty
Algorithm in [25] | \frac{R_{n}}{\lambda_{\min}^{n}}\sqrt{\frac{\log R_{n}}{\lambda_{\min}^{n}}}\to 0 as n\to\infty
Algorithm 1 | \frac{\log R_{n}}{\lambda_{\min}^{n}}\to 0 as n\to\infty

For simplicity of notation, we still assume that the parameter vector is \theta=\left[\theta_{1}^{T}\ \theta_{2}^{T}\right]^{T} with \theta_{1}=[\theta(1)\ \ldots\ \theta(d)]^{T}, \theta_{2}=[\theta(d+1)\ \ldots\ \theta(r)]^{T}, such that \theta(i)\neq 0, i=1,\ldots,d, and \theta(j)=0, j=d+1,\ldots,r. Denote

D_{n}\triangleq\frac{1}{n}\sum_{k=1}^{n}\varphi_{k}\varphi_{k}^{T}=\left[\begin{array}{ll}D_{n}^{11}&D_{n}^{12}\\ D_{n}^{21}&D_{n}^{22}\end{array}\right]

where D_{n}^{11}\in\mathbb{R}^{d\times d} and D_{n}^{12}, D_{n}^{21}, D_{n}^{22} are of compatible dimensions. The comparison of conditions for consistency of LASSO and its variants as well as Algorithm 1 is made in Table I. The strong irrepresentable condition given in Table I is actually prior structural information on the sparsity of the parameter vector, which is not required in our paper. Furthermore, from Table I we can directly verify that Assumption 2 includes the regularity condition as a special case. Besides, the algorithm proposed in [25] requires an excitation condition equivalent to R_{n}\sqrt{\log R_{n}}=o\left((\lambda_{\min}^{n})^{3/2}\right), which fails when R_{n} is of the order (\lambda_{\min}^{n})^{\delta} for any \delta>3/2. However, from Theorem 2 and Corollary 1 we know that Algorithm 1 achieves consistency for R_{n}=O\left((\lambda_{\min}^{n})^{\delta}\right) with any \delta>0, which greatly improves the applicability of the algorithm. As far as the authors know, Assumption 2 is the weakest excitation condition among those required for sparse parameter identification in the existing literature.
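
To make the gap between the last two rows of Table I concrete, suppose \lambda_{\min}^{n}\to\infty and R_{n} is of exact order (\lambda_{\min}^{n})^{\delta} for some \delta>0 (an illustrative scaling, not an assumption of this paper). Then

\frac{\log R_{n}}{\lambda_{\min}^{n}}=O\left(\frac{\delta\log\lambda_{\min}^{n}}{\lambda_{\min}^{n}}\right)\to 0\quad\text{for every }\delta>0,

while

\frac{R_{n}}{\lambda_{\min}^{n}}\sqrt{\frac{\log R_{n}}{\lambda_{\min}^{n}}}\asymp(\lambda_{\min}^{n})^{\delta-\frac{3}{2}}\sqrt{\delta\log\lambda_{\min}^{n}}\to\infty\quad\text{for every }\delta>\frac{3}{2},

so Assumption 2 holds while the excitation condition of [25] is violated.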

IV Practical Applications

Many applications of sparse identification algorithms have been shown in [25], such as the identification of Hammerstein systems [6] and of linear stochastic systems under self-tuning regulation control [9]. However, [25] only establishes the existence of the constant N_{0}(\omega) as in Theorem 2 and does not give its specific range, which is not conducive to practical applications. In this section, we take the identification of Hammerstein systems as an example and provide a specific range for the constant N_{0}(\omega).

The Hammerstein system is a modular nonlinear dynamic system consisting of a static nonlinear function followed by a linear dynamic subsystem. Due to its simple structure and its good ability to model nonlinearities, the Hammerstein system has been widely used in many practical engineering scenarios, such as power amplifiers, manipulators, neutralization reactions in chemical processes, lithium-ion battery heating systems, and heat exchange systems (cf. [17], [7]).

Consider a Hammerstein system whose linear subsystem is an ARX system and whose nonlinear function is a combination of basis functions:

y_{k+1}= a_{1}zy_{k+1}+\cdots+a_{p}z^{p}y_{k+1}+b_{1}f\left(u_{k}\right)+\cdots+b_{q}f\left(z^{q-1}u_{k}\right)+w_{k+1}, (11)
f\left(u_{k}\right)=\sum_{j=1}^{m}c_{j}g_{j}\left(u_{k}\right) (12)

where \left\{g_{j}(\cdot)\right\}_{j=1}^{m} are the basis functions, z is the backward-shift operator (i.e., zy_{k+1}=y_{k}), y_{k+1} and u_{k} are the output and input, w_{k+1} is the system noise, and a_{1},\cdots,a_{p},b_{1},\cdots,b_{q},c_{1},\cdots,c_{m} are unknown coefficients to be estimated. Denote

\theta\triangleq\left(a_{1}\ \ldots\ a_{p}\ \left(b_{1}c_{1}\right)\ \ldots\ \left(b_{1}c_{m}\right)\ \ldots\ \left(b_{q}c_{1}\right)\ \ldots\ \left(b_{q}c_{m}\right)\right)^{T},
\varphi_{k}\triangleq\left(y_{k}\ \ldots\ y_{k+1-p}\ g_{1}\left(u_{k}\right)\ \ldots\ g_{m}\left(u_{k}\right)\ \ldots\ g_{1}\left(u_{k+1-q}\right)\ \ldots\ g_{m}\left(u_{k+1-q}\right)\right)^{T},

the Hammerstein system can be written in the form of (1).
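
For illustration, a minimal sketch of assembling the regressor \varphi_{k} above from past inputs and outputs is given below; the orders p, q and the polynomial basis in the commented usage are placeholders, not choices made in this paper.

```python
import numpy as np

def hammerstein_regressor(y, u, k, p, q, basis):
    """Assemble phi_k = (y_k, ..., y_{k+1-p}, g_1(u_k), ..., g_m(u_k), ...,
    g_1(u_{k+1-q}), ..., g_m(u_{k+1-q}))^T for the model (11)-(12).

    y, u  : 1-D arrays of outputs and inputs, indexed by time
    basis : list of callables [g_1, ..., g_m]
    Requires k + 1 - max(p, q) >= 0 so that all needed lags exist.
    """
    past_outputs = [y[k - i] for i in range(p)]                    # y_k, ..., y_{k+1-p}
    lagged_basis = [g(u[k - j]) for j in range(q) for g in basis]  # g_i(u_{k-j})
    return np.array(past_outputs + lagged_basis)

# Illustrative usage with placeholder polynomial basis functions g_j(x) = x^j:
# basis = [lambda x, j=j: x ** j for j in range(1, 4)]
# phi_k = hammerstein_regressor(y, u, k=10, p=2, q=2, basis=basis)
```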

In order to obtain a good approximation of the nonlinear function f(\cdot), a large number of basis functions (i.e., a very large m) are often used in practice, so the unknown parameter vector \theta is likely to be high-dimensional and sparse. To obtain a simpler but more precise model of the system, it is therefore crucial to determine the effective basis functions in \left\{g_{j}(\cdot)\right\}_{j=1}^{m}, i.e., to perform sparse identification of the unknown parameter vector \theta. Thus Algorithm 1 can be applied to infer the zero elements in \theta. Denote

M=\left[\begin{array}{lll}M(1)&\cdots&M(m)\end{array}\right]

with M(l)=\left[b_{1}c_{l}\ \cdots\ b_{q}c_{l}\right]^{T}, l=1,\ldots,m. Thus the noneffective basis functions in \left\{g_{j}(\cdot)\right\}_{j=1}^{m} correspond to the zero columns of the matrix M. Consequently, executing the algorithm (3)-(5) on the inputs and outputs \{u_{k},y_{k+1}\}_{k=1}^{n} of the Hammerstein system (11), we obtain \theta_{n+1}; then executing (8)-(10), we have

\beta_{n+1}=\left[a_{1,n+1}\ \ldots\ a_{p,n+1}\ \left(b_{1}c_{1}\right)_{n+1}\ \ldots\ \left(b_{1}c_{m}\right)_{n+1}\ \ldots\ \left(b_{q}c_{1}\right)_{n+1}\ \ldots\ \left(b_{q}c_{m}\right)_{n+1}\right]^{T}, (13)
M_{n+1}=\left[M_{n+1}(1)\ \cdots\ M_{n+1}(m)\right], (14)
\text{with}\ M_{n+1}(l)=\left[\left(b_{1}c_{l}\right)_{n+1}\ \ldots\ \left(b_{q}c_{l}\right)_{n+1}\right]^{T},\ l=1,\ldots,m,
H^{*}=\left\{l\in\{1,\ldots,m\}\mid c_{l}=0\right\}, (15)
H_{n+1}=\left\{l=1,\ldots,m\mid M_{n+1}(l)=0\right\}. (16)
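
The reshaping in (13)-(16) can be sketched as follows; the tolerance argument is only a numerical safeguard, and the returned index set is 1-based as in (16).

```python
import numpy as np

def basis_support(beta, p, q, m, tol=0.0):
    """Split beta_{n+1} as in (13): the first p entries estimate a_1, ..., a_p, and the
    remaining q*m entries are reshaped into the q-by-m matrix M_{n+1} of (14), whose
    column l holds ((b_1 c_l)_{n+1}, ..., (b_q c_l)_{n+1}).  H_{n+1} of (16) collects
    the indices of the zero columns."""
    beta = np.asarray(beta)
    a_hat = beta[:p]
    M_hat = beta[p:].reshape(q, m)   # row i <-> b_{i+1}, column l <-> c_{l+1}
    H_hat = {l + 1 for l in range(m) if np.all(np.abs(M_hat[:, l]) <= tol)}
    return a_hat, M_hat, H_hat
```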

Before presenting the main results, we need the following assumptions.

Assumption 3

\left\{1,g_{1}(x),\ldots,g_{m}(x)\right\} is linearly independent over some interval [a,b].

Assumption 4

The polynomial A(z)=1-a_{1}z-\cdots-a_{p}z^{p} is stable, i.e., A(z)\neq 0 for all |z|\leq 1, and b_{1}^{2}+\cdots+b_{q}^{2}>0.

Assumption 5

\left\{u_{k}\right\}_{k\geq 1} is an i.i.d. sequence with density p(x), which is positive and continuous on [a,b], and 0<\mathbb{E}g_{j}^{2}\left(u_{k}\right)<\infty, j=1,\ldots,m. Further, \left\{u_{k}\right\}_{k\geq 1} and \left\{w_{k}\right\}_{k\geq 1} are mutually independent.

Remark 5

Assumptions 3-5 are essential to ensure convergence of parameter estimates in the Hammerstein system, as described in detail in [24].

Lemma 2 ([24])

If Assumptions 1 and 3-5 hold, then for the maximal and minimal eigenvalues of \sum_{k=1}^{n}\varphi_{k}\varphi_{k}^{T}, the following inequalities

C_{1}n\leq R_{n}\leq C_{2}n,\ \text{ a.s. } (17)
C_{3}n\leq\lambda^{n}_{\min}\leq C_{4}n,\ \text{ a.s. } (18)

hold for some 0<C_{1}<C_{2}, 0<C_{3}<C_{4}. Further, the LS estimate \theta_{n+1} has the following convergence speed,

\left\|\theta_{n+1}-\theta\right\|=O\left(\sqrt{\frac{\log n}{n}}\right)\ \text{ a.s. }
Proposition 1

Set \alpha_{n}=\left(\frac{\log n}{n}\right)^{\epsilon} for any fixed \epsilon\in\left(0,\frac{1}{2}\right). If Assumptions 1 and 3-5 hold for the Hammerstein system (11)-(12), then there exists an \omega-set \Omega_{0} with \mathcal{P}\left\{\Omega_{0}\right\}=1 such that for any \omega\in\Omega_{0},

H_{N+1}=H^{*}\quad\forall N\geq N_{0}(\omega)

holds for some positive integer N_{0}(\omega), i.e., the effective basis functions in \left\{g_{j}(\cdot)\right\}_{j=1}^{m} can be correctly identified.

Proof:

By (17)-(18) and noticing \alpha_{n}=\left(\frac{\log n}{n}\right)^{\epsilon} with \epsilon\in\left(0,\frac{1}{2}\right), we can verify that Assumptions 1-2 hold for the regression model composed of (11)-(12); thus the desired result follows directly from Theorem 2. ∎

To give a specific selection method for the positive integer N_{0}(\omega), we need the following assumption.

Assumption 6

|\theta(i)|>C_{5}>0 for i=1,\ldots,d.

Remark 6

This assumption is necessary and natural: if some |\theta(i)|, i=1,\ldots,d, is very close to 0, then the lower bound C_{5} must be chosen small and the required constant N_{0}(\omega) will inevitably be very large.

Without loss of generality, set the threshold \alpha_{n} in the form \alpha_{n}=M\left(\frac{\log R_{n}}{\lambda_{\min}^{n}}\right)^{\epsilon} with M>0 and 0<\epsilon<\frac{1}{2}.

Theorem 3

Under Assumptions 1-6, there exists an \omega-set \Omega_{0} with \mathcal{P}\left\{\Omega_{0}\right\}=1 such that for any \omega\in\Omega_{0},

H_{N+1}=H^{*}\quad\forall N>N_{0} (19)

holds, where

N_{0}=\max\left\{\frac{47}{C_{2}},\ \frac{2k_{1}}{C_{3}}\log\left(\frac{C_{2}k_{1}}{C_{3}}\right),\ \frac{2k_{2}}{C_{3}}\log\left(\frac{C_{2}k_{2}}{C_{3}}\right)\right\},
k_{1}=\left(\frac{\sqrt{C_{0}}}{M}\right)^{\frac{2}{1-2\epsilon}},\quad k_{2}=\left(\frac{2M}{C_{5}}\right)^{\frac{1}{\epsilon}}. (20)
Proof:

From the proof of Theorem 2 we know that the following two formulas are sufficient conditions for H_{N+1}=H^{*}:

|\theta_{N+1}(i)-\theta(i)|+\alpha_{N+1}<|\theta(i)|,\quad i\in\{1,\ldots,d\}, (21)
|\theta_{N+1}(i)|<\alpha_{N+1},\quad i\in\{d+1,\ldots,r\}. (22)

Thus, if we can verify (21) and (22) for the above N, then the desired result (19) follows naturally. We first establish the following inequality,

\frac{\log N}{N}<\frac{1}{t},\quad\forall N>\max\{47,2t\log t\},\ t>0.

In fact, for 0<t<10, we have

\frac{\log N}{N}<\frac{\log 47}{47}<\frac{1}{10}<\frac{1}{t}.

For t\geq 10, we have

\frac{\log N}{N}<\frac{\log t+\log 2+\log\log t}{2t\log t}<\frac{2\log t}{2t\log t}=\frac{1}{t}.

On the one hand, by N>N_{0} and (20) we have

C_{2}N>\max\left\{47,\frac{2C_{2}k_{1}}{C_{3}}\log\left(\frac{C_{2}k_{1}}{C_{3}}\right)\right\},

thus \frac{\log(C_{2}N)}{C_{2}N}<\frac{C_{3}}{C_{2}}\left(\frac{M}{\sqrt{C_{0}}}\right)^{\frac{2}{1-2\epsilon}}, that is, \frac{\log(C_{2}N)}{C_{3}N}<\left(\frac{M}{\sqrt{C_{0}}}\right)^{\frac{2}{1-2\epsilon}}.

Now by Lemma 1 and (17) as well as (18) we have

\frac{|\theta_{N+1}(i)-\theta(i)|}{\alpha_{N+1}} \leq\frac{\sqrt{C_{0}}}{M}\left(\frac{\log(C_{2}N)}{C_{3}N}\right)^{\frac{1}{2}-\epsilon} (23)
<\frac{\sqrt{C_{0}}}{M}\left(\frac{M}{\sqrt{C_{0}}}\right)^{\frac{1-2\epsilon}{1-2\epsilon}}
=1.

Thus (22) is satisfied by the fact that \theta(i)=0 for i=d+1,\ldots,r. On the other hand, by

C_{2}N>\max\left\{47,\frac{2C_{2}k_{2}}{C_{3}}\log\left(\frac{C_{2}k_{2}}{C_{3}}\right)\right\},

we have \frac{\log(C_{2}N)}{C_{3}N}<\left(\frac{C_{5}}{2M}\right)^{\frac{1}{\epsilon}}. Then by (17) and (18) we have,

2\alpha_{N+1} \leq 2M\left(\frac{\log(C_{2}N)}{C_{3}N}\right)^{\epsilon} (24)
<2M\left(\frac{C_{5}}{2M}\right)^{\frac{\epsilon}{\epsilon}}
=C_{5}.

Combining (23), (24) and Assumption 6, for i\in\{1,\ldots,d\} we have |\theta_{N+1}(i)-\theta(i)|+\alpha_{N+1}<2\alpha_{N+1}<C_{5}<|\theta(i)|, so (21) is satisfied. This completes the proof of the theorem. ∎

In Theorem 3, the constant N_{0} is determined by k_{1} and k_{2}. To quickly identify the positions of the zero elements, we need to make \max\{k_{1},k_{2}\} as small as possible. Clearly, for fixed \epsilon, k_{1} decreases as M increases, while k_{2} increases as M increases. A straightforward calculation shows that

\max\{k_{1},k_{2}\}\geq k_{1}\geq\frac{4C_{0}}{C_{5}^{2}},\quad\text{if }M\leq 2^{2\epsilon-1}C_{0}^{\epsilon}C_{5}^{1-2\epsilon}, (25)
\max\{k_{1},k_{2}\}\geq k_{2}\geq\frac{4C_{0}}{C_{5}^{2}},\quad\text{if }M\geq 2^{2\epsilon-1}C_{0}^{\epsilon}C_{5}^{1-2\epsilon}.

Thus the lower bound of \max\{k_{1},k_{2}\} is 4C_{0}C_{5}^{-2}, which motivates the following corollary.

Corollary 2

Under the conditions of Theorem 3, with M=2^{2\epsilon-1}C_{0}^{\epsilon}C_{5}^{1-2\epsilon}, we have

H_{N+1}=H^{*},\ \forall N>\max\left\{\frac{47}{C_{2}},\frac{8C_{0}}{C_{3}C_{5}^{2}}\log\left(\frac{4C_{2}C_{0}}{C_{3}C_{5}^{2}}\right)\right\}. (26)
Proof:

The proof is a combination of (25) and Theorem 3. ∎
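
For completeness, a small helper that evaluates the bound (20), and hence (26) under the choice of M in Corollary 2, is sketched below; the constants C_{0}, C_{2}, C_{3}, C_{5} are treated as known inputs, although in practice they stem from Lemma 1, Lemma 2 and Assumption 6 and are not directly observable.

```python
import numpy as np

def n0_bound(C0, C2, C3, C5, M, eps):
    """Evaluate N_0 of (20) for the threshold alpha_n = M (log R_n / lambda_min^n)^eps,
    with 0 < eps < 1/2."""
    k1 = (np.sqrt(C0) / M) ** (2.0 / (1.0 - 2.0 * eps))
    k2 = (2.0 * M / C5) ** (1.0 / eps)
    return max(47.0 / C2,
               2.0 * k1 / C3 * np.log(C2 * k1 / C3),
               2.0 * k2 / C3 * np.log(C2 * k2 / C3))

# With M = 2**(2*eps - 1) * C0**eps * C5**(1 - 2*eps) as in Corollary 2,
# k1 = k2 = 4*C0 / C5**2 and n0_bound reduces to the bound in (26).
```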

Remark 7

Noticing that M\triangleq\left[b_{1}\cdots b_{q}\right]^{T}\left[c_{1}\ \cdots\ c_{m}\right], we can further infer the estimates of the nonzero elements in \left\{b_{i},i=1,\ldots,q\right\} and \left\{c_{l},l=1,\ldots,m\right\} by applying a singular value decomposition (SVD) to M_{N+1} defined by (14); see Chaoui et al. (2005) for more details.
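
A sketch of the rank-one factorization mentioned in Remark 7 is given below; since M=bc^{T} determines b and c only up to a common scale factor, a normalization (here \|b\|=1) has to be fixed, which is a convention of this sketch rather than of the paper.

```python
import numpy as np

def factor_bc(M_hat):
    """Recover estimates of b = (b_1, ..., b_q) and c = (c_1, ..., c_m) from
    M_hat ~ b c^T via its leading singular triple."""
    U, S, Vt = np.linalg.svd(M_hat)
    b_hat = U[:, 0]            # leading left singular vector, normalized to ||b_hat|| = 1
    c_hat = S[0] * Vt[0, :]    # absorb the leading singular value into c_hat
    return b_hat, c_hat        # np.outer(b_hat, c_hat) is the best rank-one approximation
```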

V Simulation Results

In this section, we provide an example to illustrate the performance of the novel sparse identification algorithm (i.e., Algorithm 1) proposed in this paper.

Example 1

Consider a discrete-time stochastic regression model (1) with dimension r=10. The noise sequence \{w_{k+1}\}_{k\geq 0} in (1) is independent and identically distributed with w_{k+1}\sim\mathcal{N}(0,0.1) (Gaussian distribution with zero mean and variance 0.1). Let the regression vector \varphi_{k}\in\mathbb{R}^{r} (k\geq 1) be generated by the following state space model,

x_{k} =Ax_{k-1}+\varepsilon_{k},
\varphi_{k} =B_{k}x_{k},

where x_{k}\in\mathbb{R}^{r} is the state of the above system with x_{0}=[1,\ldots,1]^{T}\in\mathbb{R}^{r}, and the matrix A, the matrices B_{k} and the vectors \varepsilon_{k} are chosen in the following way,

A=\operatorname{diag}\{1.01,\cdots,1.01\}\in\mathbb{R}^{r\times r},
B_{k}=\left\{b_{k,ij}\right\}\in\mathbb{R}^{r\times r}\text{ with }b_{k,ij}\sim\mathcal{N}(0,1),
\varepsilon_{k}=\left\{\varepsilon_{k,i}\right\}\in\mathbb{R}^{r}\text{ with }\varepsilon_{k,i}\sim\mathcal{N}(0,1).

The true value of the unknown parameter is

\theta =\left[\theta(1),\theta(2),\theta(3),\theta(4),\theta(5),\theta(6),\theta(7),\theta(8),\theta(9),\theta(10)\right]^{T}
=\left[0.8,1.6,-0.3,0.05,0,0,0,0,0,0\right]^{T}.

Ten simulations are performed. Fig. 1 shows the estimate sequences \left\{\theta_{n}(1),\theta_{n}(2),\ldots,\theta_{n}(10)\right\}_{n=1}^{600} generated by Algorithm 1 with \alpha_{n}=0.1\left(\frac{R_{n}}{\lambda_{\min}^{n}}\right)^{\frac{1}{4}} in one of the simulations. Table II compares the estimates, averaged over the ten simulations, generated by least squares, LASSO with \lambda_{n}=n^{0.75} (see [23]), the algorithm proposed in [25] with \lambda_{n}=(\lambda_{\min}^{n})^{0.75}, and Algorithm 1, for \theta(5),\theta(6),\theta(7),\theta(8),\theta(9),\theta(10) and different data lengths n. We adopt the Python CVX tools (http://cvxopt.org/) to solve the convex optimization problems in LASSO ([23]) and in the algorithm proposed in [25].
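
For reproducibility, the data generation of Example 1 and one run of Algorithm 1 can be sketched as follows; the random seed and the recursive-LS initialization P_{0}=100I are arbitrary choices not specified in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)                        # seed chosen arbitrarily
r, n = 10, 600
theta_true = np.array([0.8, 1.6, -0.3, 0.05, 0, 0, 0, 0, 0, 0])

# Regressors from the state space model of Example 1
A = 1.01 * np.eye(r)
x = np.ones(r)                                        # x_0 = [1, ..., 1]^T
Phi = np.empty((n, r))
for k in range(n):
    x = A @ x + rng.standard_normal(r)                # x_k = A x_{k-1} + eps_k
    Phi[k] = rng.standard_normal((r, r)) @ x          # phi_k = B_k x_k
y = Phi @ theta_true + np.sqrt(0.1) * rng.standard_normal(n)   # w_{k+1} ~ N(0, 0.1)

# Recursive LS (3)-(5), tracking R_n and lambda_min^n for the threshold
theta, P = np.zeros(r), 100.0 * np.eye(r)
G, R = np.linalg.inv(P), 1.0
for k in range(n):
    phi = Phi[k]
    Pphi = P @ phi
    a = 1.0 / (1.0 + phi @ Pphi)
    theta = theta + a * Pphi * (y[k] - phi @ theta)
    P = P - a * np.outer(Pphi, Pphi)
    R += phi @ phi
    G += np.outer(phi, phi)

alpha = 0.1 * (R / np.linalg.eigvalsh(G)[0]) ** 0.25  # threshold as stated above
beta = np.where(np.abs(theta) < alpha, 0.0, theta)    # thresholding step (8)
print(beta)
```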

From Fig. 1 and Table II, we find that, compared with the least squares estimates, the LASSO ([23]), the algorithm proposed in [25], and Algorithm 1 generate sparser and more accurate estimates of the system parameters and thus give valuable information for inferring the zero and nonzero elements of the unknown parameter. Moreover, compared with LASSO ([23]) and the algorithm proposed in [25], Algorithm 1 does not solve an additional optimization problem. Thus it avoids unnecessary numerical errors and makes the estimates of the true zeros exactly zero.

Figure 1: Estimate sequences generated by Algorithm 1 in one of the simulations
TABLE II: Averaged estimates by Algorithm 1, least squares, LASSO and the algorithm in [25] from ten simulations
                 N=100          N=200          N=300          N=400          N=500
Estimates for \theta(5)
By Algorithm 1   0              0              0              0              0
By least squares -3.8403×10^{-3} -1.5291×10^{-3} 1.9044×10^{-4} -7.8872×10^{-5} 1.1098×10^{-4}
By LASSO         5.0651×10^{-5} -2.7055×10^{-5} -3.1356×10^{-5} -2.5534×10^{-6} -1.4774×10^{-6}
By [25]          1.8805×10^{-6} -5.3767×10^{-7} -5.8867×10^{-7} -6.3817×10^{-7} -2.4387×10^{-7}
Estimates for \theta(6)
By Algorithm 1   0              0              0              0              0
By least squares 4.7193×10^{-3} 1.4462×10^{-3} 8.8818×10^{-4} 8.7922×10^{-4} 7.1250×10^{-4}
By LASSO         1.0038×10^{-4} 4.2274×10^{-5} 3.8088×10^{-5} -9.0536×10^{-6} -1.9497×10^{-5}
By [25]          1.9487×10^{-6} 9.5211×10^{-7} -5.2565×10^{-7} -4.8326×10^{-7} -6.0134×10^{-7}
Estimates for \theta(7)
By Algorithm 1   0              0              0              0              0
By least squares 1.6445×10^{-3} 1.4848×10^{-3} 1.0246×10^{-3} 8.8689×10^{-5} -3.0212×10^{-5}
By LASSO         8.3569×10^{-5} 1.1443×10^{-5} -3.9639×10^{-6} 2.9968×10^{-6} -1.8478×10^{-6}
By [25]          -3.8009×10^{-9} 1.4456×10^{-10} -1.3981×10^{-10} 2.8872×10^{-11} -7.8004×10^{-11}
Estimates for \theta(8)
By Algorithm 1   0              0              0              0              0
By least squares -3.6471×10^{-3} 3.6763×10^{-3} 1.1463×10^{-3} 9.2123×10^{-4} -9.6761×10^{-5}
By LASSO         3.4481×10^{-4} -7.1948×10^{-5} -4.0180×10^{-5} -4.0338×10^{-5} -1.0343×10^{-5}
By [25]          -7.8657×10^{-6} -8.4547×10^{-7} -3.2375×10^{-7} -3.5296×10^{-7} -4.4807×10^{-8}
Estimates for \theta(9)
By Algorithm 1   0              0              0              0              0
By least squares -4.6465×10^{-3} -2.1687×10^{-3} -8.9557×10^{-4} -4.8013×10^{-4} -6.1303×10^{-5}
By LASSO         1.2557×10^{-5} 1.5550×10^{-5} -2.0270×10^{-5} -2.7999×10^{-5} -3.2130×10^{-6}
By [25]          1.1125×10^{-8} 1.4665×10^{-8} -3.8394×10^{-9} -3.5668×10^{-9} -1.4756×10^{-9}
Estimates for \theta(10)
By Algorithm 1   0              0              0              0              0
By least squares 3.5405×10^{-3} 1.0033×10^{-3} 4.1614×10^{-4} -4.5427×10^{-4} -1.3878×10^{-4}
By LASSO         7.0112×10^{-5} 1.6159×10^{-4} 7.9057×10^{-5} 5.5963×10^{-5} -1.0325×10^{-5}
By [25]          -1.9684×10^{-5} 3.7883×10^{-5} -3.8752×10^{-6} -2.1050×10^{-6} -7.3450×10^{-6}

VI Conclusion

In this paper, the identification of sparse unknown parameters in stochastic regression models is studied. We proposed an efficient algorithm for generating sparse estimates without any penalty term. Under the weakest non-persistent excitation condition, we proved that the set of zero elements in the unknown sparse parameter vector can be correctly identified with a finite number of observations, and that the estimates of the nonzero elements converge almost surely to the true values. For the Hammerstein system, we gave an explicit number of finite observations. Some meaningful problems deserve further consideration, e.g., the application of the proposed algorithm to the identification of time-varying unknown sparse parameters, distributed algorithms for estimating the unknown parameter from local measurements, adaptive control based on the proposed estimation algorithm with sampled data, and the continued execution of the algorithm after the set of zero elements has been identified with finitely many observations, i.e., the influence of the initial value on the algorithm.

References

  • [1] Omar A Alzubi, Jafar A Alzubi, Mohammed Alweshah, Issa Qiqieh, Sara Al-Shami, and Manikandan Ramachandran. An optimal pruning algorithm of classifier ensembles: dynamic programming approach. Neural Computing and Applications, 32(20):16091–16107, 2020.
  • [2] Emmanuel J Candès et al. Compressive sampling. In Proceedings of the international congress of mathematicians, volume 3, pages 1433–1452. Citeseer, 2006.
  • [3] Emmanuel J Candès, Justin K Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.
  • [4] Emmanuel J Candes and Terence Tao. Decoding by linear programming. IEEE transactions on information theory, 51(12):4203–4215, 2005.
  • [5] Yilun Chen, Yuantao Gu, and Alfred O Hero. Sparse lms for system identification. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3125–3128. IEEE, 2009.
  • [6] Esref Eskinat, Stanley H Johnson, and William L Luyben. Use of hammerstein models in identification of nonlinear systems. AIChE Journal, 37(2):255–268, 1991.
  • [7] Esref Eskinat, Stanley H. Johnson, and William L. Luyben. Use of Hammerstein models in identification of nonlinear systems. AIChE Journal, 37(2):255–268, 1991.
  • [8] Yuantao Gu, Jian Jin, and Shunliang Mei. l_0 norm constraint LMS algorithm for sparse system identification. IEEE Signal Processing Letters, 16(9):774–777, 2009.
  • [9] Lei Guo. Convergence and logarithm laws of self-tuning regulators. Automatica, 31(3):435–450, 1995.
  • [10] Lei Guo. Time-varying stochastic systems, stability and adaptive theory, Second edition. Science Press, Beijing, 2020.
  • [11] Lei Guo and HanFu Chen. Identification and stochastic adaptive control. Springer Science, Boston, MA, 1991.
  • [12] S. Haykin. Radar signal processing. IEEE Signal Processing Magazine, 2(2):2–18, 1985.
  • [13] Nicholas Kalouptsidis, Gerasimos Mileounis, Behtash Babadi, and Vahid Tarokh. Adaptive algorithms for sparse system identification. Signal Processing, 91(8):1910–1919, 2011.
  • [14] Yannis Kopsinis, Konstantinos Slavakis, and Sergios Theodoridis. Online sparse system identification and signal reconstruction using projections onto weighted l_1 balls. IEEE Transactions on Signal Processing, 59(3):936–952, 2010.
  • [15] Tze Leung Lai and Ching Zong Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154–166, 1982.
  • [16] Satheesh K. Perepu and Arun K. Tangirala. Identification of equation error models from small samples using compressed sensing techniques. IFAC-PapersOnLine, 48(8):795–800, 2015. 9th IFAC Symposium on Advanced Control of Chemical Processes ADCHEM 2015.
  • [17] Jessy George Smith, Shivaram Kamat, and K.P. Madhavan. Modeling of ph process using wavenet based hammerstein model. Journal of Process Control, 17(6):551–561, 2007.
  • [18] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
  • [19] Roland Tóth, Borhan M. Sanandaji, Kameshwar Poolla, and Tyrone L. Vincent. Compressive system identification in the linear time-invariant framework. In 2011 50th IEEE Conference on Decision and Control and European Control Conference, pages 783–790, 2011.
  • [20] Masoud Vaezi and Afshin Izadian. Piecewise affine system identification of a hydraulic wind power transfer system. IEEE Transactions on Control Systems Technology, 23(6):2077–2086, 2015.
  • [21] Siyu Xie and Lei Guo. Analysis of compressed distributed adaptive filters. Automatica, 112:108707, 2020.
  • [22] Xia Xu, Bin Pan, Zongqing Chen, Zhenwei Shi, and Tao Li. Simultaneously multiobjective sparse unmixing and library pruning for hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 59(4):3383–3395, 2020.
  • [23] Peng Zhao and Bin Yu. On model selection consistency of lasso. The Journal of Machine Learning Research, 7:2541–2563, 2006.
  • [24] Wen-Xiao Zhao. Parametric identification of hammerstein systems with consistency results using stochastic inputs. IEEE Transactions on Automatic Control, 55(2):474–480, 2010.
  • [25] Wenxiao Zhao, George Yin, and Er-Wei Bai. Sparse system identification for stochastic systems with general observation sequences. Automatica, 121:109162, 2020.
  • [26] Hui Zou. The adaptive lasso and its oracle properties. Journal of the American statistical association, 101(476):1418–1429, 2006.