An Efficient Sparse Identification Algorithm For Stochastic Systems With General Observation Sequences
Abstract
This paper studies the sparse identification problem for unknown sparse parameter vectors in stochastic dynamic systems. Firstly, a novel sparse identification algorithm is proposed, which generates sparse estimates from least squares estimation by adaptively adjusting a threshold. Secondly, under a non-persistent excitation condition that is possibly the weakest available, we prove that the proposed algorithm correctly identifies the zero and nonzero elements of the sparse parameter vector using a finite number of observations, and that the estimates of the nonzero elements converge almost surely to the true values. Compared with related works, e.g., LASSO, our method requires only the weakest assumptions and does not require solving additional optimization problems. Besides, our theoretical results do not require any statistical assumptions on the regression signals, such as independence or stationarity, which makes our results promising for application to stochastic feedback systems. Thirdly, the finite number of observations that guarantees correct identification of the zero-element set of the unknown sparse parameters of the Hammerstein system is derived for the first time. Finally, numerical simulations are provided to demonstrate the effectiveness of the proposed method. Since no additional optimization problem is solved, and hence no additional numerical error is introduced, the proposed algorithm performs much better than other related algorithms.
Index Terms:
Stochastic dynamic system; Sparse parameter identification; Weakest non-persistent excitation condition; Feedback control system; Strong consistency

I Introduction
Parameter estimation and filtering play a very important role in system identification and control, signal processing, statistical learning, and other fields ([12; 20]). In recent decades, classical identification and estimation theory has made great progress, and a relatively complete theoretical framework has been established ([10; 11]).
In many practical application scenarios, such as the selection of effective basis functions in Hamiltonian systems, feature selection and robust inference of biomarkers from omics data in personalized healthcare, and channel estimation in ultra-wideband communication systems, many components of the unknown parameter vector contribute little or nothing to the system (these elements are zero or close to zero), so the unknown parameter vector in a large number of systems is high-dimensional and sparse. A natural and interesting research direction is therefore sparse parameter identification, i.e., the precise identification of the zero and nonzero elements in the unknown parameter vector, in order to reduce model redundancy and obtain a leaner, better-performing, and more reliable prediction model.
Research on sparse parameter identification has made considerable progress. In the field of signal processing, sparse parameter identification is largely based on compressed sensing (CS), i.e., solving an $\ell_0$ (the number of nonzero elements) optimization problem with $\ell_2$ (Euclidean metric) constraints. Since this optimization problem is NP-hard, researchers have developed sparse signal estimation algorithms by transforming it into a more tractable convex optimization problem using the norm equivalence principle. For example, by using compressed sensing methods, Kalouptsidis et al. in [13] developed sparse identification algorithms based on Kalman filtering and expectation-maximization, and verified the efficiency of the proposed algorithms via simulations. Candès, Tao, and their coauthors in [3; 2; 4] developed a fairly complete signal reconstruction theory under some prior assumptions on the sparsity of the model. See [14; 5; 8] for more references. The variable selection problem plays a very important role in the field of statistical learning. The well-known LASSO (least absolute shrinkage and selection operator) is a linear regression method with $\ell_1$ regularization, and sparse identification algorithms represented by it and its variants have been extensively studied. Tibshirani in [18] first proposed the LASSO in 1996. Zhao and Yu in [23] established model selection consistency under regularity conditions. Zou in [26] first proposed the adaptive LASSO and discussed its advantages in detail. Combined with the ubiquitous sparsity in practical systems, the CS and adaptive LASSO methodologies have been inherited and developed in the field of system identification and control, and a sparse parameter identification theory for stochastic dynamic systems has been established. For example, Perepu and Tangirala in [16] proposed a CS-based iterative basis pursuit denoising (BPDN) algorithm to estimate the unknown sparse parameters in the ARMAX model, where the number of zero elements of the unknown parameter is known. Tóth et al. in [19] studied the consistency of estimation for sparse linear regression models based on CS under the condition that the input signals are independent and identically distributed (i.i.d.). Since the regressors in a stochastic dynamic system are often generated from past input and output signals, the independence assumption on the observation data is difficult to satisfy. Motivated by this, Xie and Guo in [21] developed a compressed consensus normalized least mean squares algorithm and provided its stability analysis based on CS theory, where strong statistical assumptions such as independence are not required, although a prior assumption on the sparsity of the model is. Zhao et al. in [25] proposed a LASSO-based adaptive sparse estimation algorithm with general observation sequences, where the zero elements in the unknown parameter are correctly identified with a finite number of observations under a non-persistent excitation condition, which is the weakest excitation condition in the existing literature on sparse parameter identification. There is also related research in some other fields, such as pruning in deep learning [1; 22], etc.
Summarizing the existing research, we find that there are still some shortcomings in the sparse parameter identification theory. Firstly, sparse estimation algorithms based on CS theory require prior knowledge about the sparsity of the unknown parameter and the regression vectors. Secondly, the theoretical analysis of LASSO-based sparse estimation algorithms requires a regularity condition on the regressors, which is actually a persistent excitation condition and is difficult or almost impossible to satisfy in stochastic feedback systems. Even the non-persistent excitation condition on the regressors in [25] is much stronger than the weakest excitation condition proposed by Lai and Wei in [15] (see Subsection III-B and Table I for details). Thirdly, for a specific system, such as the Hammerstein system, it is very instructive to give a definite finite number of observations under further assumptions on the regressors, but this is not done in [25].
The contributions of this paper are summarized as follows:
1. We propose a novel sparse parameter identification algorithm to estimate unknown sparse parameters in stochastic regression systems. The principle of the algorithm is to generate a sparse estimate from the least squares estimator by adjusting a threshold adaptively. Compared with the existing LASSO-type methods, which generate sparse estimates by optimizing a criterion function with a penalty term, our algorithm does not need to solve an optimization problem, so it is more concise and efficient.
2. Unlike the asymptotic theory of the classical LS estimator ([15]), under the same condition we prove that the proposed algorithm correctly identifies the sets of zero and nonzero elements of the unknown sparse parameter vector with a finite number of observations, i.e., set convergence. It is worth pointing out that this result is parallel to the convergence results in [25], while the non-persistent excitation condition required here is much weaker than that in [25]. Furthermore, the estimates of the nonzero elements converge to the true values almost surely with a guaranteed convergence rate, i.e., parameter convergence, which, to the best of the authors' knowledge, has never been achieved by sparse parameter identification algorithms for stochastic regression models.
3. For a classical system such as the Hammerstein system, whose strong consistency and convergence rate are established by Zhao in [24], we give the required finite number of observations under the same conditions, which has not been achieved before in sparse parameter identification of stochastic dynamic systems.
4. Simulation experiments compare the proposed algorithm with classical least squares (LS) estimation, LASSO estimation, and the algorithm in [25]. In contrast to the proposed algorithm, the other algorithms cannot exactly identify the zero elements, which shows the great advantage of the algorithm proposed in this paper.
The rest of this paper is organized as follows. In Section II, we give the problem formulation. Section III presents the main results of this paper, including the parameter convergence, the set convergence, and a comparison of the conditions for consistency of the proposed algorithm with those of other algorithms. Section IV applies the algorithm to the sparse parameter estimation of the Hammerstein system. A simulation example is given in Section V, and the concluding remarks are made in Section VI.
II Problem Formulation
II-A Some preliminaries
Let $(\Omega, \mathcal{F}, P)$ be the underlying probability space, $\omega$ be an element in $\Omega$, and $E[\cdot]$ be the expectation operator. Denote by $\|\cdot\|$ the 2-norm of vectors or matrices in this paper. For two positive sequences $\{s_k\}$ and $\{t_k\}$, $s_k = O(t_k)$ means $s_k \le c\, t_k$ for all $k$ and some constant $c > 0$, while $s_k = o(t_k)$ means $s_k / t_k \to 0$ as $k \to \infty$.
II-B Sparse identification algorithm
Consider the parameter identification problem of the following discrete-time stochastic regression model,
(1)  $y_{k+1} = \phi_k^\top \theta + w_{k+1}, \qquad k \ge 0,$

where $y_{k+1}$ is the scalar observation or output at time $k+1$, $\phi_k$ is the $r$-dimensional stochastic regression vector which may be a function of current and past inputs and outputs, $\theta$ is the unknown $r$-dimensional parameter to be estimated, and $\{w_{k+1}\}$ is the stochastic noise sequence.
The above model (1) includes many parameterized systems, such as ARX systems and Hammerstein systems. We further denote the parameter vector and the index set of its zero elements by

(2)  $\theta = [\theta_1, \ldots, \theta_r]^\top, \qquad A^* \triangleq \{1 \le i \le r : \theta_i = 0\}.$
The classical identification algorithms, such as the LS algorithm, can generate consistent estimates of the unknown parameters as the number of data tends to infinity. However, due to the existence of system noises and the finite number of observations in practice, it is hard to deduce from such estimates which parameters are exactly zero. In the existing literature, a penalty term is added to the criterion function to be optimized in order to generate sparse estimates, so that the zero elements in the unknown parameter can be identified. However, there are two issues worth considering. Firstly, the existence of the penalty term destroys the optimality of LS and makes the estimation inevitably biased, so a stronger excitation condition is needed to ensure the consistency of the algorithm. Secondly, solving the optimization problem with the penalty term brings great computational difficulty. Our problem is to design a new sparse adaptive estimation algorithm that does not require penalty terms, infers the index set of zero elements in a finite number of steps, and identifies the unknown parameter by using the stochastic regression vectors and the observation signals.
Step 0 (Initialization): Choose a positive threshold sequence $\{\delta_k\}$ that converges to zero as $k \to \infty$, but more slowly than the estimation error bound of the LS estimator.
Step 1: Based on the observations $\{\phi_k, y_{k+1}\}$, begin with an initial vector $\hat{\theta}_0$ and an initial positive definite matrix $P_0$, and compute the matrix $P_{k+1}$ and the estimate $\hat{\theta}_{k+1}$ of $\theta$ by the following recursive LS algorithm,
(3)  $\hat{\theta}_{k+1} = \hat{\theta}_k + a_k P_k \phi_k \left( y_{k+1} - \phi_k^\top \hat{\theta}_k \right),$
(4)  $P_{k+1} = P_k - a_k P_k \phi_k \phi_k^\top P_k,$
(5)  $a_k = \left( 1 + \phi_k^\top P_k \phi_k \right)^{-1}.$
Step 2: Obtain the sparse estimate $\bar{\theta}_{k+1} = [\bar{\theta}_{k+1,1}, \ldots, \bar{\theta}_{k+1,r}]^\top$ by the following thresholding mechanism,

(8)  $\bar{\theta}_{k+1,i} = \begin{cases} 0, & \text{if } |\hat{\theta}_{k+1,i}| < \delta_{k+1}, \\ \hat{\theta}_{k+1,i}, & \text{otherwise}, \end{cases} \qquad i = 1, \ldots, r,$

then obtain the estimates of the index sets of zero and nonzero elements,

(9)   $\hat{A}_{k+1} = \{ 1 \le i \le r : \bar{\theta}_{k+1,i} = 0 \},$
(10)  $\hat{A}_{k+1}^{c} = \{ 1 \le i \le r : \bar{\theta}_{k+1,i} \neq 0 \}.$
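To make the procedure concrete, the following minimal sketch illustrates Steps 0-2 under our reading of (3)-(10): a standard recursive LS update followed by componentwise thresholding with a vanishing threshold sequence. The function and variable names (`algorithm1`, `lam`, etc.) are illustrative rather than taken from the paper, and the thresholding rule is an assumption consistent with the description above, not a verbatim transcription of (8)-(10).

```python
import numpy as np

def algorithm1(phi, y, lam, theta0=None, P0=None):
    """Sketch of Algorithm 1: recursive least squares followed by adaptive thresholding.

    phi : (N, r) array whose k-th row is the regressor phi_k
    y   : (N,)  array of observations y_{k+1}
    lam : callable mapping the sample size k to a positive threshold delta_k -> 0
    """
    N, r = phi.shape
    theta = np.zeros(r) if theta0 is None else np.asarray(theta0, dtype=float)
    P = np.eye(r) if P0 is None else np.asarray(P0, dtype=float)

    # Step 1: recursive LS, cf. (3)-(5)
    for k in range(N):
        p = phi[k]
        a = 1.0 / (1.0 + p @ P @ p)                       # gain a_k, cf. (5)
        theta = theta + a * (P @ p) * (y[k] - p @ theta)  # estimate update, cf. (3)
        P = P - a * np.outer(P @ p, p @ P)                # "covariance" update, cf. (4)

    # Step 2: zero out components whose magnitude falls below the threshold, cf. (8)-(10)
    sparse_theta = np.where(np.abs(theta) < lam(N), 0.0, theta)
    zero_set = set(np.flatnonzero(sparse_theta == 0.0).tolist())
    return sparse_theta, zero_set
```

Since no penalty term is introduced, the only computation beyond ordinary recursive LS is the componentwise comparison with the threshold, which is where the computational advantage over LASSO-type methods comes from.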
III Theoretical Properties of the Sparse Identification Algorithm
In this section, we investigate the asymptotic properties of the estimates of the unknown sparse parameter vector and the convergence of the set of zero elements within a finite number of observations. To this end, we introduce the following assumptions for the theoretical analysis.
Assumption 1
The noise $\{w_{k+1}, \mathcal{F}_{k+1}\}$ is a martingale difference sequence, i.e., $E[w_{k+1} \mid \mathcal{F}_k] = 0$ a.s. for all $k \ge 0$, and there exists some $\beta > 2$ such that $\sup_{k \ge 0} E[|w_{k+1}|^{\beta} \mid \mathcal{F}_k] < \infty$ a.s., where $\{\mathcal{F}_k\}$ is a non-decreasing sequence of $\sigma$-algebras such that $\phi_k$ is $\mathcal{F}_k$-measurable.
Assumption 2 (Non-Persistent Excitation Condition)
The growth rate of $\log \lambda_{\max}(k)$ is slower than that of $\lambda_{\min}(k)$, that is,
$\dfrac{\log \lambda_{\max}(k)}{\lambda_{\min}(k)} \to 0$ as $k \to \infty$
holds almost surely, where $\lambda_{\max}(k)$ and $\lambda_{\min}(k)$ denote the maximum and minimum eigenvalues of the matrix $P_0^{-1} + \sum_{i=0}^{k} \phi_i \phi_i^\top$, respectively.
Remark 1
A further explanation of this non-persistent excitation condition is provided in Subsection III-B.
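Assumption 2 can also be checked numerically on recorded data. The sketch below computes the ratio $\log \lambda_{\max}(k) / \lambda_{\min}(k)$ of the information matrix under our reconstruction of the condition above; small values over a long data record indicate that the non-persistent excitation condition is comfortably satisfied. The function name and interface are illustrative assumptions.

```python
import numpy as np

def excitation_ratio(phi, P0_inv=None):
    """Sketch: ratio log(lambda_max) / lambda_min of the information matrix
    P0^{-1} + sum_k phi_k phi_k^T, i.e., the quantity appearing in (our reading of)
    Assumption 2."""
    N, r = phi.shape
    M = np.eye(r) if P0_inv is None else np.asarray(P0_inv, dtype=float)
    M = M + phi.T @ phi
    eigs = np.linalg.eigvalsh(M)          # eigenvalues in ascending order
    lam_min, lam_max = eigs[0], eigs[-1]
    return np.log(lam_max) / lam_min
```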
III-A Set and parameter convergence of estimates
Assume that there are $d$ ($d$ is unknown) nonzero elements in the vector $\theta$. Without loss of generality, we assume $\theta_i \neq 0$ for $1 \le i \le d$ and $\theta_i = 0$ for $d+1 \le i \le r$. Before giving the main results, we first state a classical result in stochastic adaptive control.
Lemma 1 ([9])
Under Assumptions 1 and 2, there exists a constant $c > 0$ such that the LS estimate generated by (3)-(5) satisfies $\|\hat{\theta}_{k+1} - \theta\|^2 \le \dfrac{c \log \lambda_{\max}(k)}{\lambda_{\min}(k)}$ a.s. for all sufficiently large $k$.
Remark 2
The specific value of the constant $c$ can be found in [9].
Remark 3
We remark that the recursive LS algorithm is strongly consistent under Assumption 2, i.e., $\hat{\theta}_{k,i} \to \theta_i$ almost surely as $k \to \infty$ for all $1 \le i \le r$.
Proof:
By the definition of , we have for
Thus the proof is completed by the fact that as and the convergence results ∎
Theorem 2 (Set Convergence)
Proof:
Since Assumption 2 holds almost surely, there exists a set $\Omega_0 \subseteq \Omega$ with $P(\Omega_0) = 1$ such that Assumption 2 holds for any $\omega \in \Omega_0$. In the following, we consider the estimate sequence on a fixed sample path $\omega \in \Omega_0$.
Firstly, we have for ,
By the fact that for , as and the convergence results , there exists a constant such that ,
thus, we have for
Then for , since , by Lemma 1 we have
By using the property , there exists such that, for we have
Then by the definition of we have for and .
Finally, the proof is completed by setting . ∎
Remark 4
Theorem 2 shows that the index set of the zero elements in $\theta$ can be correctly identified with a finite number of observations, and that the estimates of the nonzero elements also become nonzero within a finite number of observations.
Corollary 1
Under the assumptions of Theorem 2, Algorithm 1 has the following convergence rate as $k \to \infty$: $\|\bar{\theta}_{k+1} - \theta\|^2 = O\!\left( \dfrac{\log \lambda_{\max}(k)}{\lambda_{\min}(k)} \right)$ a.s.
To the best of the authors' knowledge, this convergence rate is parallel to that of [15; 9] and has never been achieved by sparse parameter identification algorithms for stochastic regression models. Next, we compare the conditions on the observation data that guarantee the convergence of Algorithm 1, LASSO, and its variants, respectively.
III-B Comparison of conditions for consistency of LASSO, adaptive LASSO, the algorithm proposed in [25] and Assumption 2
| | Conditions on the system |
| LASSO [23] | Regularity condition (convergence of the sample covariance matrix to a positive definite limit); strong irrepresentable condition |
| Adaptive LASSO [26] | Regularity condition (convergence of the sample covariance matrix to a positive definite limit) |
| [25] | Non-persistent excitation condition of [25] |
| Algorithm 1 | Assumption 2 ($\log \lambda_{\max}(k) = o(\lambda_{\min}(k))$ as $k \to \infty$) |
For simplicity of notation, we still assume that the first $d$ elements of the parameter vector are nonzero and the remaining elements are zero, and we partition the limiting covariance matrix appearing in Table I into blocks of compatible dimensions accordingly. The comparison of the conditions for consistency of LASSO and its variants as well as Algorithm 1 is made in Table I. The strong irrepresentable condition given in Table I is actually prior structural information on the sparsity of the parameter vector, which is not required in our paper. Furthermore, from Table I we can directly verify that Assumption 2 includes the regularity condition as a special case. Besides, the algorithm proposed in [25] requires an excitation condition that is stronger than Assumption 2 and fails in some weakly excited cases; however, from Theorem 2 and Corollary 1 we know that Algorithm 1 still achieves consistency in such cases, which greatly improves the applicability of the algorithm. As far as the authors know, Assumption 2 is the weakest excitation condition among those required for sparse parameter identification in the existing literature.
IV Practical applications
Many applications of the sparse identification algorithm have been shown in [25], such as the identification of Hammerstein systems [6] and of linear stochastic systems under self-tuning regulation control [9]. However, [25] only established the existence of a finite number of observations, as in Theorem 2, without giving its specific range, which is not conducive to practical applications. In this section, we take the identification of Hammerstein systems as an example and provide the specific range of this number.
The Hammerstein system is a modular nonlinear dynamic system with a static nonlinear function followed by a linear dynamic subsystem. Due to its simple structure and its good ability to model nonlinear behavior, the Hammerstein system has been widely used in many practical engineering scenarios, such as power amplifiers, manipulators, neutralization reactions in chemical processes, lithium-ion battery heating systems, and heat exchange systems (cf. [17], [7]).
Consider a Hammerstein system whose linear subsystem is an ARX system and whose nonlinear function is a linear combination of basis functions:
(11)  $A(z) y_{k+1} = B(z) f(u_k) + w_{k+1},$
(12)  $f(u) = \sum_{j=1}^{m} c_j g_j(u),$

where $A(z) = 1 + a_1 z + \cdots + a_p z^p$ and $B(z) = b_1 + b_2 z + \cdots + b_q z^{q-1}$, $\{g_j(\cdot)\}_{j=1}^{m}$ are the basis functions, $z$ is the backward-shift operator (i.e., $z y_{k+1} = y_k$), $y_k$ and $u_k$ are the output and input, $w_k$ is the system noise, and $\{a_i\}_{i=1}^{p}$, $\{b_i\}_{i=1}^{q}$, $\{c_j\}_{j=1}^{m}$ are the unknown coefficients that need to be estimated. Denote
$\theta = [-a_1, \ldots, -a_p,\; b_1 c_1, \ldots, b_1 c_m,\; \ldots,\; b_q c_1, \ldots, b_q c_m]^\top$ and
$\phi_k = [y_k, \ldots, y_{k+1-p},\; g_1(u_k), \ldots, g_m(u_k),\; \ldots,\; g_1(u_{k+1-q}), \ldots, g_m(u_{k+1-q})]^\top$;
then the Hammerstein system can be written in the form of (1).
In order to obtain a good approximation of the nonlinear function $f(\cdot)$, a large number of basis functions (which leads to a very large $m$) is often used in practice, which makes the unknown parameter vector high-dimensional and likely sparse. To obtain a simpler but more precise model of the system, it is crucial to determine the effective basis functions in $f(\cdot)$, i.e., to carry out sparse identification of the unknown parameter vector $\theta$. Thus Algorithm 1 can be applied to infer the zero elements in $\theta$. Denote
the coefficient matrix $\Theta = b c^\top = (b_i c_j)_{1 \le i \le q,\, 1 \le j \le m}$ with $b = [b_1, \ldots, b_q]^\top$ and $c = [c_1, \ldots, c_m]^\top$. Thus the noneffective basis functions in $f(\cdot)$ correspond to zero columns of the matrix $\Theta$. Consequently, executing algorithm (3)-(5) with the inputs and outputs of the Hammerstein system (11), we obtain the LS estimates, and then executing (8)-(10) yields the corresponding sparse estimates in (13)-(16), including the estimate of the coefficient matrix defined in (14) and the estimated index set of the noneffective basis functions.
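The over-parameterized regressor of the Hammerstein-ARX system can be assembled directly from past outputs and lifted inputs. The sketch below follows the stacking described above; the exact ordering of the entries and the function name `hammerstein_regressor` are our own convention for illustration, not the paper's.

```python
import numpy as np

def hammerstein_regressor(y, u, basis, p, q, k):
    """Sketch: build the regressor phi_k for the Hammerstein-ARX system (11)-(12).

    y, u  : sequences of outputs and inputs (indexable by time)
    basis : list of basis functions [g_1, ..., g_m]
    p, q  : orders of the autoregressive and input parts
    k     : current time index, assumed k >= max(p, q) - 1
    """
    past_outputs = [y[k - i] for i in range(p)]                      # y_k, ..., y_{k-p+1}
    lifted_inputs = [g(u[k - i]) for i in range(q) for g in basis]   # g_j(u_{k-i})
    return np.array(past_outputs + lifted_inputs, dtype=float)
```

Running Algorithm 1 on such regressors yields a sparse estimate whose lifted-input block, reshaped into a $q \times m$ matrix, has zero columns exactly at the noneffective basis functions.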
Before presenting the main results, we need the following assumptions.
Assumption 3
The set of basis functions $\{g_j(\cdot)\}_{j=1}^{m}$ is linearly independent over some interval.
Assumption 4
The polynomial $A(z)$ is stable, i.e., $A(z) \neq 0$ for all $|z| \leq 1$.
Assumption 5
The input $\{u_k\}$ is an i.i.d. sequence with a density that is positive and continuous on the interval in Assumption 3. Further, $\{u_k\}$ and the noise sequence $\{w_k\}$ are mutually independent.
Remark 5
Lemma 2 ([24])
Proposition 1
Proof:
To give a specific selection method for this positive integer (the finite number of observations in Theorem 2), we need the following assumption.
Assumption 6
There exists a constant $\underline{\theta} > 0$ such that $\min_{1 \le i \le d} |\theta_i| \ge \underline{\theta}$.
Remark 6
This assumption is necessary and natural: if the smallest nonzero element of $\theta$ is very close to 0, the required number of observations is inevitably going to be very large.
Without loss of generality, set the threshold value in the form of with and .
Theorem 3
Proof:
From the proof of Theorem 2, we know that the following two formulas are sufficient conditions for the desired result:
(21) | |||
(22) |
Thus, if we can verify (21) and (22) under the above choice of the threshold, then the desired result (19) holds naturally. We first provide the following inequality:
In fact, for , we have
For , we have
On one hand, by (19) and (20) we have
thus , that is .
In Theorem 3, the constant depends on two competing quantities. To quickly identify the positions of the zero elements, we need to make this constant as small as possible. Clearly, for a fixed choice of the remaining parameters, one quantity decreases while the other increases as the design parameter increases. A straightforward calculation shows that
(25) | ||||
Thus the lower bound of is , which motivates the following corollary.
Corollary 2
Under conditions of Theorem 3, with , we have
(26) |
Proof:
The proof is a combination of the above calculation and Theorem 3. ∎
Remark 7
Notice that we can further infer the estimates of the nonzero elements in $b$ and $c$ by applying a singular value decomposition (SVD) algorithm to the coefficient matrix estimate defined by (14); see Chaoui et al. (2005) for more details.
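A minimal sketch of the SVD-based recovery mentioned in Remark 7 is given below, assuming the estimated coefficient matrix is approximately rank one, $\hat{\Theta} \approx b c^\top$. The normalization (unit-norm $c$) and sign convention are our own choices, since the product $b c^\top$ determines $b$ and $c$ only up to a scaling factor.

```python
import numpy as np

def recover_b_c(Theta_hat):
    """Sketch: recover the linear coefficients b and the nonlinear coefficients c
    from an (approximately) rank-one coefficient matrix estimate Theta_hat ~ b c^T."""
    U, S, Vt = np.linalg.svd(Theta_hat)
    b = S[0] * U[:, 0]      # left singular vector scaled by the leading singular value
    c = Vt[0].copy()        # right singular vector (unit norm)
    # sign convention: make the first sizeable entry of c positive
    idx = np.flatnonzero(np.abs(c) > 1e-12)
    if idx.size > 0 and c[idx[0]] < 0:
        b, c = -b, -c
    return b, c
```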
V Simulation Results
In this section, we provide an example to illustrate the performance of the novel sparse identification algorithm (i.e., Algorithm 1) proposed in this paper.
Example 1
Consider a discrete-time stochastic regression model (1). The noise sequence in (1) is independent and identically distributed as $N(0, 0.1)$ (Gaussian distribution with zero mean and variance 0.1). Let the regression vector be generated by the following state-space model,
where $x_k$ denotes the state of the above system, and the matrices and vector are chosen in the following way:
The true value of the unknown parameter is
Ten simulations are performed. Fig. 1 shows the estimate sequences generated by Algorithm 1 from one of the simulations. Table II compares the averaged estimates over the ten simulations generated by least squares, LASSO (see [23]), the algorithm proposed in [25], and Algorithm 1, with different data lengths $N$. We adopt the Python CVX tools (http://cvxopt.org/) to solve the convex optimization problems in LASSO ([23]) and in the algorithm proposed in [25].
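For completeness, the following sketch reproduces the structure of Example 1 with placeholder values: the dimension, the state-space matrices, and the true sparse parameter below are illustrative stand-ins, not the values used in the example, and `algorithm1` refers to the sketch given after Algorithm 1 in Section II.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative placeholders (NOT the values of Example 1)
r, N = 8, 500
A = 0.5 * np.eye(r) + 0.05 * rng.standard_normal((r, r))   # stable-ish state matrix
B = rng.standard_normal((r, 1))
theta_true = np.array([1.5, -0.8, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0])

# Regressors generated by the state-space model, observations by model (1)
x = np.zeros(r)
phi = np.zeros((N, r))
y = np.zeros(N)
for k in range(N):
    x = A @ x + (B * rng.standard_normal()).ravel()
    phi[k] = x
    y[k] = phi[k] @ theta_true + rng.normal(scale=np.sqrt(0.1))   # N(0, 0.1) noise

# Algorithm 1 with a slowly vanishing threshold
sparse_est, zero_set = algorithm1(phi, y, lam=lambda k: 1.0 / np.log(k + 2.0))
print(np.round(sparse_est, 3), sorted(zero_set))
```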
From Fig. 1 and Table II, we can find that, compared with the least squares estimates, LASSO ([23]), the algorithm proposed in [25], and Algorithm 1 generate sparser and more accurate estimates of the system parameters, and thus provide valuable information for inferring the zero and nonzero elements of the unknown parameter. Moreover, compared with LASSO ([23]) and the algorithm proposed in [25], Algorithm 1 does not solve an additional optimization problem; it therefore avoids unnecessary numerical errors and makes the estimates of the true zeros exactly zero.
Figure 1: Estimate sequences generated by Algorithm 1 in one simulation run of Example 1.
| | N=100 | N=200 | N=300 | N=400 | N=500 |
|---|---|---|---|---|---|
Estimates for | |||||
By Algorithm 1 | 0 | 0 | 0 | 0 | 0 |
By least squares | -3.8403 | -1.5291 | 1.9044 | -7.8872 | 1.1098 |
By LASSO | 5.0651 | -2.7055 | -3.1356 | -2.5534 | -1.4774 |
By [25] | 1.8805 | -5.3767 | -5.8867 | -6.3817 | -2.4387 |
Estimates for | |||||
By Algorithm 1 | 0 | 0 | 0 | 0 | 0 |
By least squares | 4.7193 | 1.4462 | 8.8818 | 8.7922 | 7.1250 |
By LASSO | 1.0038 | 4.2274 | 3.8088 | -9.0536 | -1.9497 |
By [25] | 1.9487 | 9.5211 | -5.2565 | -4.8326 | -6.0134 |
Estimates for | |||||
By Algorithm 1 | 0 | 0 | 0 | 0 | 0 |
By least squares | 1.6445 | 1.4848 | 1.0246 | 8.8689 | -3.0212 |
By LASSO | 8.3569 | 1.1443 | -3.9639 | 2.9968 | -1.8478 |
By [25] | -3.8009 | 1.4456 | -1.3981 | 2.8872 | -7.8004 |
Estimates for | |||||
By Algorithm 1 | 0 | 0 | 0 | 0 | 0 |
By least squares | -3.6471 | 3.6763 | 1.1463 | 9.2123 | -9.6761 |
By LASSO | 3.4481 | -7.1948 | -4.0180 | -4.0338 | -1.0343 |
By [25] | -7.8657 | -8.4547 | -3.2375 | -3.5296 | -4.4807 |
Estimates for | |||||
By Algorithm 1 | 0 | 0 | 0 | 0 | 0 |
By least squares | -4.6465 | -2.1687 | -8.9557 | -4.8013 | -6.1303 |
By LASSO | 1.2557 | 1.5550 | -2.0270 | -2.7999 | -3.2130 |
By [25] | 1.1125 | 1.4665 | -3.8394 | -3.5668 | -1.4756 |
Estimates for | |||||
By Algorithm 1 | 0 | 0 | 0 | 0 | 0 |
By least squares | 3.5405 | 1.0033 | 4.1614 | -4.5427 | -1.3878 |
By LASSO | 7.0112 | 1.6159 | 7.9057 | 5.5963 | -1.0325 |
By [25] | -1.9684 | 3.7883 | -3.8752 | -2.1050 | -7.3450 |
VI Conclusion
In this paper, the identification of sparse unknown parameters in stochastic regression models is studied. We proposed an efficient algorithm for generating sparse estimates without any penalty terms. Under the weakest non-persistent excitation condition, we proved that the set of zero elements in the unknown sparse parameter vector can be correctly identified with a finite number of observations, and that the estimates of the nonzero elements converge almost surely to the true values. For the Hammerstein system, we gave the specific number of finite observations. Some meaningful problems deserve further consideration, e.g., the application of the proposed algorithm to identifying time-varying unknown sparse parameters, distributed algorithms for estimating the unknown parameter using local measurements, adaptive control using the estimation algorithm based on sampled data, and the continued execution of the algorithm after the set of zero elements has been identified from finite observations, i.e., the influence of the initial value on the algorithm.
References
- [1] Omar A Alzubi, Jafar A Alzubi, Mohammed Alweshah, Issa Qiqieh, Sara Al-Shami, and Manikandan Ramachandran. An optimal pruning algorithm of classifier ensembles: dynamic programming approach. Neural Computing and Applications, 32(20):16091–16107, 2020.
- [2] Emmanuel J Candès et al. Compressive sampling. In Proceedings of the international congress of mathematicians, volume 3, pages 1433–1452. Citeseer, 2006.
- [3] Emmanuel J Candès, Justin K Romberg, and Terence Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 59(8):1207–1223, 2006.
- [4] Emmanuel J. Candès and Terence Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.
- [5] Yilun Chen, Yuantao Gu, and Alfred O. Hero. Sparse LMS for system identification. In 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3125–3128. IEEE, 2009.
- [6] Esref Eskinat, Stanley H. Johnson, and William L. Luyben. Use of Hammerstein models in identification of nonlinear systems. AIChE Journal, 37(2):255–268, 1991.
- [7] Esref Eskinat, Stanley H. Johnson, and William L. Luyben. Use of Hammerstein models in identification of nonlinear systems. AIChE Journal, 37(2):255–268, 1991.
- [8] Yuantao Gu, Jian Jin, and Shunliang Mei. $\ell_0$ norm constraint LMS algorithm for sparse system identification. IEEE Signal Processing Letters, 16(9):774–777, 2009.
- [9] Lei Guo. Convergence and logarithm laws of self-tuning regulators. Automatica, 31(3):435–450, 1995.
- [10] Lei Guo. Time-varying stochastic systems, stability and adaptive theory, Second edition. Science Press, Beijing, 2020.
- [11] Lei Guo and HanFu Chen. Identification and stochastic adaptive control. Springer Science, Boston, MA, 1991.
- [12] S. Haykin. Radar signal processing. IEEE Signal Processing Magazine, 2(2):2–18, 1985.
- [13] Nicholas Kalouptsidis, Gerasimos Mileounis, Behtash Babadi, and Vahid Tarokh. Adaptive algorithms for sparse system identification. Signal Processing, 91(8):1910–1919, 2011.
- [14] Yannis Kopsinis, Konstantinos Slavakis, and Sergios Theodoridis. Online sparse system identification and signal reconstruction using projections onto weighted $\ell_1$ balls. IEEE Transactions on Signal Processing, 59(3):936–952, 2010.
- [15] Tze Leung Lai and Ching Zong Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154–166, 1982.
- [16] Satheesh K. Perepu and Arun K. Tangirala. Identification of equation error models from small samples using compressed sensing techniques. IFAC-PapersOnLine, 48(8):795–800, 2015. 9th IFAC Symposium on Advanced Control of Chemical Processes ADCHEM 2015.
- [17] Jessy George Smith, Shivaram Kamat, and K. P. Madhavan. Modeling of pH process using wavenet based Hammerstein model. Journal of Process Control, 17(6):551–561, 2007.
- [18] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.
- [19] Roland Tóth, Borhan M. Sanandaji, Kameshwar Poolla, and Tyrone L. Vincent. Compressive system identification in the linear time-invariant framework. In 2011 50th IEEE Conference on Decision and Control and European Control Conference, pages 783–790, 2011.
- [20] Masoud Vaezi and Afshin Izadian. Piecewise affine system identification of a hydraulic wind power transfer system. IEEE Transactions on Control Systems Technology, 23(6):2077–2086, 2015.
- [21] Siyu Xie and Lei Guo. Analysis of compressed distributed adaptive filters. Automatica, 112:108707, 2020.
- [22] Xia Xu, Bin Pan, Zongqing Chen, Zhenwei Shi, and Tao Li. Simultaneously multiobjective sparse unmixing and library pruning for hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 59(4):3383–3395, 2020.
- [23] Peng Zhao and Bin Yu. On model selection consistency of Lasso. The Journal of Machine Learning Research, 7:2541–2563, 2006.
- [24] Wen-Xiao Zhao. Parametric identification of Hammerstein systems with consistency results using stochastic inputs. IEEE Transactions on Automatic Control, 55(2):474–480, 2010.
- [25] Wenxiao Zhao, George Yin, and Er-Wei Bai. Sparse system identification for stochastic systems with general observation sequences. Automatica, 121:109162, 2020.
- [26] Hui Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418–1429, 2006.