Unsupervised Learned Kalman Filtering
Abstract
In this paper we adapt KalmanNet, which is a recently proposed deep neural network (DNN)-aided system whose architecture follows the operation of the model-based (MB) Kalman filter (KF), to learn its mapping in an unsupervised manner, i.e., without requiring ground-truth states. The unsupervised adaptation is achieved by exploiting the hybrid MB/data-driven architecture of KalmanNet, which internally predicts the next observation as the KF does. These internal features are then used to compute the loss, rather than the state estimate at the output of the system. With the capability of unsupervised learning, one can use KalmanNet not only to track the hidden state, but also to adapt to variations in the state space (SS) model. We numerically demonstrate that when the noise statistics are unknown, unsupervised KalmanNet achieves performance similar to that of KalmanNet with supervised learning. We also show that, thanks to its unsupervised capabilities, a pre-trained KalmanNet can be adapted to changing SS models without providing additional labeled data.
Index Terms— Kalman filter, unsupervised learning.
1 Introduction
Real-time tracking of hidden state sequences from noisy observations plays a major role in many signal processing systems. Classic approaches are based on the Kalman filter (KF) [1] and its variants [2, Ch. 10]. These model-based (MB) techniques rely on accurate knowledge of an underlying statistical state space (SS) model capturing the system dynamics, which may not be available in some applications, and tend to notably degrade in the presence of model mismatch. To cope with missing model parameters, data is commonly used for parameter estimation, followed by plugging the missing parameters into the MB KF and its variants [3, 4].
The unprecedented success of deep learning has spurred a multitude of DNN-based approaches for SS model related tasks that are optimized in an end-to-end (E2E) manner. This allows improved accuracy to be achieved compared with MB algorithms when applied in complex, poorly understood, and partially known dynamics, by learning to carry out the task directly from data. Notable approaches include DNN feature extractors [5], variational inference techniques [6, 7, 8, 9, 10, 11], and the usage of recurrent neural networks (RNNs) [12, 13, 14, 15]. When the SS model is partially known, one can benefit from the available knowledge by using the hybrid MB/data-driven (DD) KalmanNet architecture proposed in [16] for the real-time filtering task, as a learned KF via MB deep learning [17].
A key challenge in applying E2E DNN-based filters stems from their need to be trained using labeled data, i.e., a large volume of pairs of noisy measurements and their corresponding ground-truth hidden state sequences from the underlying SS model. Obtaining such ground-truth sequences may be costly, particularly in setups where the underlying dynamics, i.e., the SS model, change over time. Previous works on DNN-based methods for SS model related tasks in the unsupervised setup focused mostly on the imputation task of filling in missing observations [10, 7, 8, 9]. This task notably differs from real-time state estimation, also known as filtering [2, Ch. 4].
In this work we extend KalmanNet [16] to learn its mapping in an unsupervised fashion by building upon its interpretable hybrid architecture, which learns to implement the KF while preserving its structure. Specifically, we define a loss measure that uses the noisy observations and their predictions, taken from an internal feature of KalmanNet. We also propose a semi-supervised training method, which first trains KalmanNet offline, and then adapts it in an unsupervised online manner to dynamics that differ from the offline trained model, without providing ground-truth data. This mechanism results in KalmanNet tracking not only the latent state, but also changes in the underlying SS model. Our numerical evaluations demonstrate that unsupervised KalmanNet, which does not have access to the noise statistics, approaches the KF with full domain knowledge. Furthermore, its semi-supervised implementation allows it to improve upon its supervised counterpart, due to the newly added ability to track variations in the SS model without requiring additional data.
2 System Model and Preliminaries
We review the SS model and briefly recall the supervised KalmanNet. For simplicity, we focus on linear SS models, though the derivations can also be applied to non-linear models in the same manner as the extended KF [2, Ch. 10], as we demonstrate in Section 4.
2.1 Problem Formulation
We consider state estimation in discrete-time (DT), linear, Gaussian SS models. Let $x_t \in \mathbb{R}^m$ denote the hidden state vector at time instance $t$, which evolves in time via

$$x_t = F x_{t-1} + w_t. \quad (1)$$

Here, $F$ is the state evolution matrix, while $w_t$ is additive white Gaussian noise (AWGN) with covariance $Q$. The corresponding observation $y_t \in \mathbb{R}^n$ is related to $x_t$ via

$$y_t = H x_t + v_t, \quad (2)$$

where $H$ is the measurement matrix, and $v_t$ is AWGN with covariance $R$. We focus on the filtering problem, where one needs to track the hidden state from a known initial state $x_0$. At each time instance $t$, the goal is to provide an instantaneous estimate $\hat{x}_t$, based on the observations seen so far, $\{y_\tau\}_{\tau \leq t}$. We consider scenarios where one has partial domain knowledge, such that the statistics of the noises $w_t$ and $v_t$, i.e., $Q$ and $R$, are not known, while the matrices $F$ and $H$ are known. To fill the information gap, we assume access to an unlabeled data set containing a sequence of observations, from which one has to learn to recover the hidden state.
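To make the setup concrete, the following is a minimal sketch (illustrative, not taken from the paper's released code) of drawing one trajectory from the linear Gaussian SS model (1)-(2); the function name and arguments are placeholders.

```python
# A minimal sketch of drawing one trajectory from the SS model (1)-(2).
import numpy as np

def simulate_ss(F, H, Q, R, x0, T, rng=None):
    """Return (X, Y) with x_t = F x_{t-1} + w_t and y_t = H x_t + v_t."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = F.shape[0], H.shape[0]
    X, Y = np.zeros((T, m)), np.zeros((T, n))
    x = x0
    for t in range(T):
        x = F @ x + rng.multivariate_normal(np.zeros(m), Q)    # state evolution (1)
        X[t] = x
        Y[t] = H @ x + rng.multivariate_normal(np.zeros(n), R) # observation (2)
    return X, Y
```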
2.2 Supervised KalmanNet
KalmanNet is a hybrid MB/DD implementation of the KF. The latter utilizes full knowledge of the SS model to estimate $x_t$, based on the current observation $y_t$ and the previous estimate $\hat{x}_{t-1}$. This is achieved by first predicting the next state and observation based solely on the previous estimate via

$$\hat{x}_{t|t-1} = F \hat{x}_{t-1}, \quad (3a)$$
$$\hat{y}_{t|t-1} = H \hat{x}_{t|t-1}, \quad (3b)$$

while computing the second-order moments of these estimates as $\Sigma_{t|t-1} = F \Sigma_{t-1} F^\top + Q$ and $S_{t|t-1} = H \Sigma_{t|t-1} H^\top + R$. Next, the KF computes the Kalman gain (KG) as $K_t = \Sigma_{t|t-1} H^\top S_{t|t-1}^{-1}$, which is used to update the estimation covariance $\Sigma_t = \Sigma_{t|t-1} - K_t S_{t|t-1} K_t^\top$, and to provide the state estimate via

$$\hat{x}_t = \hat{x}_{t|t-1} + K_t \Delta y_t, \quad (4)$$

where $\Delta y_t$ is the innovation process, computed as

$$\Delta y_t = y_t - \hat{y}_{t|t-1}. \quad (5)$$
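For reference, below is a minimal sketch of this MB KF recursion (3)-(5), assuming full knowledge of $F$, $H$, $Q$, and $R$; variable names mirror the equations above.

```python
# A minimal sketch of the MB KF recursion (3)-(5).
import numpy as np

def kalman_filter(Y, F, H, Q, R, x0, Sigma0):
    x_hat, Sigma, estimates = x0, Sigma0, []
    for y in Y:
        x_pred = F @ x_hat                       # (3a) state prediction
        y_pred = H @ x_pred                      # (3b) observation prediction
        Sigma_pred = F @ Sigma @ F.T + Q
        S = H @ Sigma_pred @ H.T + R
        K = Sigma_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
        dy = y - y_pred                          # (5) innovation
        x_hat = x_pred + K @ dy                  # (4) state update
        Sigma = Sigma_pred - K @ S @ K.T         # covariance update
        estimates.append(x_hat)
    return np.stack(estimates)
```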
Fig. 1: The KalmanNet architecture.
KalmanNet learns to implement the KF from labeled data in partially known SS models. This is achieved by noting that the available knowledge suffices to compute the predictions in (3), while the missing domain knowledge is needed only to compute the KG. Thus, KalmanNet augments the flow of the KF with an RNN, which estimates the KG and implicitly tracks the second-order moments computed by the KF, while the state estimate is obtained via (4) (see [16] for a detailed description). The KalmanNet architecture, depicted in Fig. 1, is trained to estimate the state in a supervised manner based on labeled data. The data set comprises $N$ pairs of hidden state trajectories and their corresponding observations of the form

$$\mathcal{D} = \left\{ \left( Y_i, X_i \right) \right\}_{i=1}^{N}, \quad Y_i = \left[ y_1^{(i)}, \ldots, y_{T_i}^{(i)} \right], \quad X_i = \left[ x_1^{(i)}, \ldots, x_{T_i}^{(i)} \right], \quad (6)$$

where $T_i$ is the length of the $i$th training trajectory. The training procedure aims at minimizing the regularized $\ell_2$ loss. Letting $\Theta$ be the trainable parameters of the RNN and $\hat{x}_t\left(y_t^{(i)}; \Theta\right)$ be the output of KalmanNet with parameters $\Theta$ at time $t$ when applied to $Y_i$, the loss is computed for the $i$th trajectory as

$$\mathcal{L}_i(\Theta) = \frac{1}{T_i} \sum_{t=1}^{T_i} \left\| \hat{x}_t\left(y_t^{(i)}; \Theta\right) - x_t^{(i)} \right\|^2 + \gamma \|\Theta\|^2, \quad (7)$$

where $\gamma$ is a regularization coefficient. The loss measure (7) is used to optimize $\Theta$ via stochastic gradient descent (SGD) optimization combined with the backpropagation through time (BPTT) algorithm [18].
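As an illustration, a PyTorch-style sketch of the supervised objective (7) is given below, assuming a hypothetical `knet` module that maps an observation trajectory `Y` (of shape T x n) to state estimates (T x m); the $\gamma \|\Theta\|^2$ term is realized through the optimizer's weight decay.

```python
# A PyTorch-style sketch of the supervised objective (7); `knet` is assumed.
import torch

def supervised_loss(knet, Y, X):
    x_hat = knet(Y)  # KalmanNet state estimates for the whole trajectory
    return torch.mean(torch.sum((x_hat - X) ** 2, dim=-1))

# The gamma * ||Theta||^2 term of (7) via weight decay:
# optimizer = torch.optim.Adam(knet.parameters(), lr=1e-3, weight_decay=gamma)
```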
3 Unsupervised KalmanNet
3.1 Unsupervised Training Algorithm
KalmanNet, described in Subsection 2.2, as well as other previously proposed DNN-based state estimators such as [14], is designed to estimate $x_t$, and is thus trained so that its output approaches the ground-truth hidden state sequence. KalmanNet admits an interpretable architecture owing to its hybrid MB/DD design, which preserves the flow of the KF. We exploit this fact to propose a training algorithm for KalmanNet that does not rely on ground-truth labels.
Unsupervised loss: KalmanNet uses its state estimates to predict the next observation via (3b) as an internal feature. While the accuracy of this prediction, e.g., the squared magnitude of the innovation process (5), depends on both the accuracy in estimating the state and the observation noise, (5) can be computed based solely on the observed sequence. Consequently, one can adapt KalmanNet in an unsupervised manner by training it to minimize $\|\Delta y_{t+1}\|^2$. This quantity can be used to compute the gradient with respect to the parameters of the RNN, which outputs the learned KG, via the derivative chain rule. Indeed,
$$\nabla_\Theta \left\| \Delta y_{t+1} \right\|^2 = \frac{\partial K_t}{\partial \Theta} \cdot \frac{\partial \left\| \Delta y_{t+1} \right\|^2}{\partial K_t} \stackrel{(a)}{=} -2\, \frac{\partial K_t}{\partial \Theta} \cdot (H F)^\top \Delta y_{t+1}\, \Delta y_t^\top, \quad (8)$$

where $\Delta y_{t+1} = y_{t+1} - \hat{y}_{t+1|t}$. In (8), $(a)$ holds since, by (3)-(4),

$$\hat{y}_{t+1|t} = H F \left( \hat{x}_{t|t-1} + K_t \Delta y_t \right). \quad (9)$$
The gradient in (8) indicates that the norm of the innovation process can be used to learn the computation of the KG, which involves the trainable parameters $\Theta$ of KalmanNet. Similarly to (7), the resulting unsupervised loss for the $i$th trajectory is

$$\mathcal{L}_i(\Theta) = \frac{1}{T_i} \sum_{t=1}^{T_i} \left\| y_t^{(i)} - \hat{y}_{t|t-1}^{(i)} \right\|^2 + \gamma \|\Theta\|^2. \quad (10)$$
Unsupervised KalmanNet is thus trained using solely observed trajectories, based on the loss measure (10), using SGD variants combined with BPTT for gradient computation.
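In the same PyTorch-style notation as above, the unsupervised objective (10) could be computed as sketched below; the `return_obs_pred` flag is an assumed interface exposing the internal observation predictions $\hat{y}_{t|t-1}$ of (3b).

```python
# A PyTorch-style sketch of the unsupervised objective (10); only Y is needed.
import torch

def unsupervised_loss(knet, Y):
    x_hat, y_pred = knet(Y, return_obs_pred=True)  # assumed interface
    innovation = Y - y_pred                        # Delta y_t of (5)
    return torch.mean(torch.sum(innovation ** 2, dim=-1))
```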
Offline versus online training: The ability to train KalmanNet without providing ground-truth state sequences gives rise to two possible training approaches: a purely unsupervised offline training scheme, and an online semi-supervised strategy. The offline approach follows conventional unsupervised learning using unlabeled data of the form $\mathcal{D} = \{ Y_i \}_{i=1}^{N}$. This data set is used to optimize $\Theta$ via mini-batch SGD-based optimization, where for every batch indexed by $k$, we choose $M$ trajectories indexed by $i_1^k, \ldots, i_M^k$, computing the mini-batch loss as $\mathcal{L}_k(\Theta) = \frac{1}{M} \sum_{m=1}^{M} \mathcal{L}_{i_m^k}(\Theta)$, as sketched below.
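A sketch of this offline unsupervised loop, assuming `trajectories` is a list of observation sequences and reusing the `unsupervised_loss` sketch above; the epoch count, batch size $M$, learning rate, and $\gamma$ are illustrative.

```python
# A sketch of offline unsupervised mini-batch training.
import random
import torch

def train_offline(knet, trajectories, epochs=50, M=8, lr=1e-3, gamma=1e-4):
    opt = torch.optim.Adam(knet.parameters(), lr=lr, weight_decay=gamma)
    for _ in range(epochs):
        batch = random.sample(trajectories, M)  # M trajectories per batch
        loss = sum(unsupervised_loss(knet, Y) for Y in batch) / M
        opt.zero_grad()
        loss.backward()  # BPTT through the filtering recursion
        opt.step()
```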
Online training builds upon the ability to learn without labels to adapt a pre-trained KalmanNet to dynamics that differ from those used during training. Pre-training can be done using labeled data obtained by mathematical modelling and/or past measurements, without altering the architecture of KalmanNet. Then, the deployed model is further adapted in an unsupervised manner, using observations acquired during operation to form training trajectories from realizations of the data. Such a training procedure provides KalmanNet with the ability to adapt to changes in the distribution of the data. Specifically, once every $T'$ time steps, we compute the loss (10) over the last $T'$ observations online, and optimize the RNN parameters accordingly, as sketched below.
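A possible realization of this online adaptation; the window length `T_online` plays the role of $T'$ above, and the hyper-parameters are illustrative.

```python
# A sketch of online unsupervised adaptation of a pre-trained model: once
# every T_online steps, update on the most recent window of observations.
import torch

def adapt_online(knet, observation_stream, T_online=100, lr=1e-4, gamma=1e-4):
    opt = torch.optim.Adam(knet.parameters(), lr=lr, weight_decay=gamma)
    window = []
    for y in observation_stream:  # observations arrive during deployment
        window.append(y)
        if len(window) == T_online:
            loss = unsupervised_loss(knet, torch.stack(window))
            opt.zero_grad()
            loss.backward()
            opt.step()
            window = []  # start collecting the next window
```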
3.2 Discussion
The ability to train KalmanNet in an unsupervised manner, without relying on ground-truth sequences, follows directly from its hybrid MB/DD architecture, in which one can identify the observation innovation process as an internal feature and use it to compute the loss. The training procedure does not affect the KalmanNet architecture, and one can use the same supervised model designed in [16]. In the current work, we focus on partially-known SS models where $F$ and $H$ are available from, e.g., a physical model. While supervised KalmanNet was shown in [16] to operate reliably when using inaccurate approximations of $F$ and $H$, we leave such a study in unsupervised setups for future work.
The proposed online semi-supervised technique allows one to adapt a pre-trained KalmanNet state estimator after deployment, coping with setups in which the original training is based on data that does not fully capture the true underlying SS model. This gain, numerically demonstrated in Section 4, bears some similarity to online training mechanisms proposed for hybrid MB/DD communication receivers in [19, 20, 21]. Despite the similarity, the proposed technique, obtained from the interpretable operation of KalmanNet, is fundamentally different from that proposed in [19, 20, 21], where structures in communication data were exploited to generate confident labels from decisions. Nonetheless, both the current work and [19, 20, 21] demonstrate the potential of MB deep learning in enabling application-oriented, efficient training algorithms.
4 Numerical Evaluations
In this section we numerically evaluate unsupervised KalmanNet on a linear SS model and on the non-linear Lorenz attractor model, and compare it to the KF and the extended KF. The source code used in our numerical study, along with the complete set of hyper-parameters used in each numerical evaluation, can be found online at https://github.com/KalmanNet/Unsupervised_ICASSP22.
In the linear setup, $F$ and $H$ take a canonical form, and $Q$ and $R$ are the diagonal matrices $q^2 I$ and $r^2 I$, respectively, while defining $\nu \triangleq q^2 / r^2$. In Fig. 2 we compare the performance of unsupervised KalmanNet to the MB KF, which achieves the minimum mean-squared error (MMSE) here, for two SS model dimensions and a fixed trajectory length. We observe in Fig. 2 that the offline trained unsupervised KalmanNet learns to achieve the MMSE lower bound. Next, the previously trained model is evaluated on a longer trajectory length. The results reported in Table 1 show that KalmanNet does not overfit to the trajectory length, and that the unsupervised training of KalmanNet is not tailored to the trajectories presented during training, tuning the filter with dependence only on the SS model.
Fig. 2: MSE of unsupervised KalmanNet compared with the MB KF for the linear SS model.
Table 1: MSE of the MB KF and of unsupervised KalmanNet, evaluated on trajectories longer than those used during training.
The results so far indicate that, for the considered SS model, unsupervised training does not degrade the performance of KalmanNet observed under supervised training in [16]. To understand the benefits of supervision, we depict in Fig. 3 the MSE convergence of unsupervised KalmanNet compared with its supervised counterpart. We observe in Fig. 3 that the lack of labeled data in unsupervised KalmanNet, and the fact that it is not explicitly encouraged to minimize the state estimation MSE, result in slower convergence to the MMSE compared to supervised KalmanNet.
Fig. 3: MSE convergence of unsupervised KalmanNet compared with its supervised counterpart.
Next, we train unsupervised KalmanNet for the non-linear SS model of the chaotic Lorenz attractor (see [16] for details). In Fig. 4 we can see that we were able to train KalmanNet for this challenging setup. Although the training objective is bounded by the observation noise, the MSE achieved by unsupervised KalmanNet is within a minor gap of the MSE achieved by the extended KF, which has full knowledge of the SS model. Furthermore, the DNN-aided KalmanNet is observed to require less inference time per trajectory than the extended KF, which involves matrix inversions. This indicates that KalmanNet may be preferable even when one can estimate the noise statistics.
Fig. 4: MSE of unsupervised KalmanNet compared with the extended KF for the Lorenz attractor SS model.
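For completeness, the following is a minimal sketch of generating noisy observations from an Euler-discretized Lorenz attractor with the classic parameters ($\sigma = 10$, $\rho = 28$, $\beta = 8/3$) and identity observations with AWGN of variance $r^2$; the exact sampling scheme used in our study follows [16].

```python
# A minimal sketch of a noisy, Euler-discretized Lorenz attractor trajectory.
import numpy as np

def lorenz_trajectory(T, dt=0.02, r2=1e-2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    X, Y = np.zeros((T, 3)), np.zeros((T, 3))
    x = np.array([1.0, 1.0, 1.0])
    for t in range(T):
        dx = np.array([10.0 * (x[1] - x[0]),
                       x[0] * (28.0 - x[2]) - x[1],
                       x[0] * x[1] - (8.0 / 3.0) * x[2]])
        x = x + dt * dx                                  # Euler step of the ODE
        X[t] = x
        Y[t] = x + np.sqrt(r2) * rng.standard_normal(3)  # noisy observation
    return X, Y
```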
Finally, we evaluate the online training mechanism when the testing distribution differs from the SS model from which the training data is generated. We again consider a linear SS model, where the true (testing) observations are generated with a value of $\nu$ that differs from the one used to generate the pre-training data. For online adaptation, we train once every $T'$ incoming samples. In Fig. 5 we can see that KalmanNet smoothly adapts to the test distribution while training on the observed trajectory over multiple time steps. This shows that the proposed training algorithm enables KalmanNet to track variations in the SS model.
Fig. 5: Online unsupervised adaptation of KalmanNet to an SS model that differs from the one used for pre-training.
5 Conclusions
In this work we proposed an unsupervised training scheme that enables KalmanNet to learn its mapping without requiring ground-truth sequences. The training scheme exploits the interpretable nature of KalmanNet to formulate an unsupervised loss based on an internal feature that predicts the next observation. Our numerical evaluations demonstrate that the proposed unsupervised training allows KalmanNet to approach the MMSE without access to the noise statistics.
References
- [1] R. E. Kalman, “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering, vol. 82, no. 1, pp. 35–45, 1960.
- [2] J. Durbin and S. J. Koopman, Time series analysis by state space methods. Oxford University Press, 2012.
- [3] P. Abbeel, A. Coates, M. Montemerlo, A. Y. Ng, and S. Thrun, “Discriminative training of Kalman filters.” in Robotics: Science and Systems, vol. 2, 2005, p. 1.
- [4] L. Xu and R. Niu, “EKFNet: Learning system noise statistics from measurement data,” in Proc. IEEE ICASSP, 2021, pp. 4560–4564.
- [5] L. Zhou, Z. Luo, T. Shen, J. Zhang, M. Zhen, Y. Yao, T. Fang, and L. Quan, “KFNet: Learning temporal camera relocalization using Kalman filtering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 4919–4928.
- [6] R. G. Krishnan, U. Shalit, and D. Sontag, “Deep Kalman filters,” arXiv preprint arXiv:1511.05121, 2015.
- [7] M. Karl, M. Soelch, J. Bayer, and P. Van der Smagt, “Deep variational Bayes filters: Unsupervised learning of state space models from raw data,” arXiv preprint arXiv:1605.06432, 2016.
- [8] M. Fraccaro, S. D. Kamronn, U. Paquet, and O. Winther, “A disentangled recognition and nonlinear dynamics model for unsupervised learning,” in Advances in Neural Information Processing Systems, 2017.
- [9] C. Naesseth, S. Linderman, R. Ranganath, and D. Blei, “Variational sequential Monte Carlo,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2018, pp. 968–977.
- [10] E. Archer, I. M. Park, L. Buesing, J. Cunningham, and L. Paninski, “Black box variational inference for state space models,” arXiv preprint arXiv:1511.07367, 2015.
- [11] R. Krishnan, U. Shalit, and D. Sontag, “Structured inference networks for nonlinear state space models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.
- [12] T. Haarnoja, A. Ajay, S. Levine, and P. Abbeel, “Backprop KF: Learning discriminative deterministic state estimators,” in Advances in Neural Information Processing Systems, 2016, pp. 4376–4384.
- [13] X. Zheng, M. Zaheer, A. Ahmed, Y. Wang, E. P. Xing, and A. J. Smola, “State space LSTM models with particle MCMC inference,” arXiv preprint arXiv:1711.11179, 2017.
- [14] H. Coskun, F. Achilles, R. DiPietro, N. Navab, and F. Tombari, “Long short-term memory Kalman filters: Recurrent neural estimators for pose regularization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5524–5532.
- [15] P. Becker, H. Pandya, G. Gebhardt, C. Zhao, C. J. Taylor, and G. Neumann, “Recurrent Kalman networks: Factorized inference in high-dimensional deep feature spaces,” in International Conference on Machine Learning. PMLR, 2019, pp. 544–552.
- [16] G. Revach, N. Shlezinger, X. Ni, A. L. Escoriza, R. J. van Sloun, and Y. C. Eldar, “KalmanNet: Neural network aided Kalman filtering for partially known dynamics,” arXiv preprint arXiv:2107.10043, 2021.
- [17] N. Shlezinger, J. Whang, Y. C. Eldar, and A. G. Dimakis, “Model-based deep learning,” arXiv preprint arXiv:2012.08405, 2020.
- [18] P. J. Werbos, “Backpropagation through time: What it does and how to do it,” Proc. IEEE, vol. 78, no. 10, pp. 1550–1560, 1990.
- [19] N. Shlezinger, N. Farsad, Y. C. Eldar, and A. J. Goldsmith, “ViterbiNet: A deep learning based Viterbi algorithm for symbol detection,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3319–3331, 2020.
- [20] N. Shlezinger, R. Fu, and Y. C. Eldar, “DeepSIC: Deep soft interference cancellation for multiuser MIMO detection,” IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 1349–1362, 2021.
- [21] C.-F. Teng and Y.-L. Chen, “Syndrome enabled unsupervised learning for neural network based polar decoder and jointly optimized blind equalizer,” IEEE J. Emerg. Sel. Topics Circuits Syst., 2020.