
The inverse Kalman filter

Xinyi Fang and Mengyang Gu*

*The authors contributed equally. Correspondence should be addressed to Mengyang Gu ([email protected]).
Abstract

We introduce the inverse Kalman filter (IKF), which enables exact matrix-vector multiplication between a covariance matrix from a dynamic linear model and any real-valued vector with linear computational cost. We integrate the IKF with the conjugate gradient algorithm, which substantially accelerates the computation of matrix inversion for a general form of covariance matrix, where other approximation approaches may not be directly applicable. We demonstrate the scalability and efficiency of the IKF approach through applications in nonparametric estimation of particle interaction functions, using both simulations and real cell trajectory data.

1 Introduction

Dynamic linear models (DLMs) or linear state space models are ubiquitously used in modeling temporally correlated data [37, 5]. Each observation $y_t \in \mathbb{R}$ in DLMs is associated with a $q$-dimensional latent state vector $\bm{\theta}_t$, defined as:

\[
y_t = \mathbf{F}_t \bm{\theta}_t + v_t, \quad v_t \sim \mathcal{N}(0, V_t), \tag{1}
\]
\[
\bm{\theta}_t = \mathbf{G}_t \bm{\theta}_{t-1} + \mathbf{w}_t, \quad \mathbf{w}_t \sim \mathcal{MN}(\mathbf{0}, \mathbf{W}_t), \tag{2}
\]

where $\mathbf{F}_t$ and $\mathbf{G}_t$ are matrices of dimensions $1 \times q$ and $q \times q$, respectively, $\mathbf{W}_t$ is a $q \times q$ covariance matrix for $t = 2, \dots, N$, and the initial state vector $\bm{\theta}_1 \sim \mathcal{MN}(\mathbf{b}_1, \mathbf{W}_1)$ with $\mathbf{b}_1 = \mathbf{0}$ assumed herein. The dependence structure of DLMs is illustrated in Fig. 1(a), where each observation $y_t$ is associated with a single latent state $\bm{\theta}_t$.

DLMs are a flexible class of models that include many widely used processes, such as autoregressive and moving average processes [26, 27]. Some Gaussian processes (GPs) with commonly used kernel functions, including the Matérn covariance with a half-integer roughness parameter [12], can also be represented as DLMs [38, 13], with closed-form expressions for $\mathbf{F}_t$, $\mathbf{G}_t$, and $\mathbf{W}_t$. This connection, summarized in the Supplementary Material, enables differentiable GPs within the DLM framework, allowing efficient estimation of smooth functions from noisy observations.

The Kalman filter (KF) provides a scalable approach for estimating latent states and computing the likelihood in DLMs, scaling linearly with the number of observations [16], as reviewed in Appendix A. In particular, the KF enables efficient computation of $\mathbf{L}^{-1}\mathbf{u}$ for any $N$-dimensional vector $\mathbf{u}$ in $\mathcal{O}(q^3 N)$ operations, where $\mathbf{L}$ is the Cholesky factor of the covariance matrix $\bm{\Sigma} = \mathrm{cov}[\mathbf{y}_{1:N}] = \mathbf{L}\mathbf{L}^T$, as summarized in Lemma 6 in the Appendix.

Figure 1: The dependence structure of the dynamic linear model (panel (a)) and of particle dynamics (panel (b)).

Computational challenges remain for high-dimensional state space models and for scenarios where the KF cannot be directly applied, such as the dependence structure shown in Fig. 1(b), where each observation $y_i$ can be associated with multiple latent states. This interaction structure, introduced in Section 3, is common in physical models, such as molecular dynamics simulation [28]. One way to overcome these challenges is to efficiently compute $\bm{\Sigma}\mathbf{u}$ for any $N$-dimensional vector $\mathbf{u}$ and utilize optimization methods such as the conjugate gradient (CG) algorithm for computing the predictive distribution [15]. This strategy has been used to approximate the maximum likelihood estimator of parameters in GPs [32, 25]. Yet each CG iteration requires a matrix-vector multiplication, which involves $\mathcal{O}(N^2)$ operations and storage, making it prohibitive for large $N$.

To address this issue, we introduce the inverse Kalman filter (IKF), which computes $\mathbf{L}\mathbf{u}$ and $\mathbf{L}^T\mathbf{u}$ in $\mathcal{O}(q^3 N)$ operations without approximation, where $\mathbf{L}$ is the Cholesky factor of any $N \times N$ DLM-induced covariance $\bm{\Sigma}$ and $\mathbf{u}$ is an $N$-dimensional vector. This cost is significantly smaller than the $\mathcal{O}(N^2)$ of direct computation. The latent state dimension $q$ is at most 3 in our applications, a setting that includes many commonly used models, such as twice-differentiable GPs with the Matérn covariance in (15), often a default choice in GP surrogate models.

The IKF algorithm can be extended to accelerate matrix multiplication and inversion, a key computational bottleneck in many problems. We integrate the IKF into a CG algorithm to scalably compute the predictive distribution of observations with a general covariance structure:

\[
\bm{\Sigma}_y = \sum_{j=1}^{J} \mathbf{A}_j \bm{\Sigma}_j \mathbf{A}_j^T + \bm{\Lambda}, \tag{3}
\]

where $\bm{\Sigma}_j$ is a DLM-induced covariance matrix, $\mathbf{A}_j$ is sparse, and $\bm{\Lambda}$ is a diagonal matrix. This structure appears in nonparametric estimation of particle interactions, which will be introduced in Section 3. Both real and simulated studies involve numerous particles over a long time period, where conventional approximation methods [34, 19, 3, 7, 10] are not applicable. This covariance structure also appears in other applications, including varying coefficient models [14] and estimating incomplete lattices of correlated data [31]. We apply our approach to the latter in Section S5 and provide numerical comparisons with various approximation methods in Section S9 of the Supplementary Material.

2 Inverse Kalman filter

In this section, we introduce an exact algorithm for computing $\bm{\Sigma}\mathbf{u}$ in $\mathcal{O}(q^3 N)$ operations, where $\mathbf{u}$ is an $N$-dimensional real-valued vector and $\bm{\Sigma} = \mathrm{cov}[\mathbf{y}_{1:N}]$ is the $N \times N$ covariance matrix of observations induced by any DLM with $q$ latent states specified in Eqs. (1)-(2). Write $\bm{\Sigma} = \mathbf{L}\mathbf{L}^T$; the Cholesky factor $\mathbf{L}$ need not be explicitly computed in our approach. We provide fast algorithms to compute $\tilde{\mathbf{x}} = \mathbf{L}^T\mathbf{u}$ and $\mathbf{x} = \mathbf{L}\tilde{\mathbf{x}}$ in Lemmas 1 and 2, respectively, each with $\mathcal{O}(q^3 N)$ operations. Detailed proofs of these lemmas are available in the Supplementary Material. In the following, $u_t$, $x_t$, and $\tilde{x}_t$ denote the $t$th elements of the vectors $\mathbf{u}$, $\mathbf{x}$, and $\tilde{\mathbf{x}}$, respectively, and $L_{t',t}$ denotes the $(t',t)$th element of $\mathbf{L}$ for $t, t' = 1, 2, \dots, N$.

Lemma 1 (Compute $\tilde{\mathbf{x}} = \mathbf{L}^T\mathbf{u}$ with linear computational cost).

For any $N$-dimensional vector $\mathbf{u}$, let $\tilde{x}_N = Q_N^{1/2} u_N$, $\tilde{x}_{N-1} = Q_{N-1}^{1/2}(\ell_{N,N-1}\, u_N + u_{N-1})$, and $\mathbf{g}_{N-1} = \mathbf{F}_N \mathbf{G}_N u_N$. For $t = N-2, \dots, 1$, iteratively compute $\tilde{x}_t$ by

\[
\tilde{x}_t = Q_t^{1/2}\left(\tilde{\ell}_{t+1,t} + \ell_{t+1,t}\, u_{t+1} + u_t\right), \tag{4}
\]
\[
\mathbf{g}_t = \mathbf{g}_{t+1}\mathbf{G}_{t+1} + \mathbf{F}_{t+1}\mathbf{G}_{t+1}\, u_{t+1}, \tag{5}
\]

where $\ell_{t+1,t} = \mathbf{F}_{t+1}\mathbf{G}_{t+1}\mathbf{K}_t$, $\tilde{\ell}_{t+1,t} = \mathbf{g}_{t+1}\mathbf{G}_{t+1}\mathbf{K}_t$, and $\mathbf{K}_t$ is the Kalman gain

\[
\mathbf{K}_t = \mathbf{B}_t \mathbf{F}_t^T Q_t^{-1}, \tag{6}
\]

with $\mathbf{B}_t = \mathrm{cov}[\bm{\theta}_t \mid \mathbf{y}_{1:t-1}]$ and $Q_t = \mathrm{var}[y_t \mid \mathbf{y}_{1:t-1}]$ given by the Kalman filter in (26) and (27) in Appendix A, respectively. Then $\tilde{\mathbf{x}} = \mathbf{L}^T\mathbf{u}$.
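To make the recursion concrete, below is a minimal Python sketch of Lemma 1. It assumes that $Q_t$ and $\mathbf{K}_t$ have already been computed by one Kalman filter pass (a sketch of that pass follows Algorithm 1 below); the function name ikf_Lt_u and the array layout are our own illustrative choices, not part of the paper.

```python
import numpy as np

def ikf_Lt_u(F, G, K, Q, u):
    """Lemma 1: compute x_tilde = L^T u in O(q^3 N).
    F: (N, q) rows F_t; G: (N, q, q) matrices G_t;
    K: (N, q) Kalman gains; Q: (N,) innovation variances."""
    N = len(u)
    x_tilde = np.empty(N)
    x_tilde[N - 1] = np.sqrt(Q[N - 1]) * u[N - 1]            # x_tilde_N
    if N == 1:
        return x_tilde
    FG = F[N - 1] @ G[N - 1]                                 # F_N G_N
    x_tilde[N - 2] = np.sqrt(Q[N - 2]) * (FG @ K[N - 2] * u[N - 1] + u[N - 2])
    g = FG * u[N - 1]                                        # g_{N-1}
    for t in range(N - 3, -1, -1):                           # t = N-2, ..., 1 (1-based)
        FG = F[t + 1] @ G[t + 1]
        ell = FG @ K[t]                                      # ell_{t+1,t}
        ell_tilde = (g @ G[t + 1]) @ K[t]                    # tilde-ell_{t+1,t}
        x_tilde[t] = np.sqrt(Q[t]) * (ell_tilde + ell * u[t + 1] + u[t])  # Eq. (4)
        g = g @ G[t + 1] + FG * u[t + 1]                     # Eq. (5)
    return x_tilde
```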

Lemma 2 (Compute $\mathbf{x} = \mathbf{L}\tilde{\mathbf{x}}$ with linear computational cost).

For any $N$-dimensional vector $\tilde{\mathbf{x}}$, let $x_1 = \mathbf{F}_1\mathbf{b}_1 + Q_1^{1/2}\tilde{x}_1$ and $\tilde{\mathbf{m}}_1 = \mathbf{b}_1 + \mathbf{K}_1(x_1 - \mathbf{F}_1\mathbf{b}_1)$. For $t = 2, \dots, N$, iteratively compute $x_t$ by

\[
\mathbf{b}_t = \mathbf{G}_t \tilde{\mathbf{m}}_{t-1}, \tag{7}
\]
\[
x_t = \mathbf{F}_t \mathbf{b}_t + Q_t^{1/2}\tilde{x}_t, \tag{8}
\]
\[
\tilde{\mathbf{m}}_t = \mathbf{b}_t + \mathbf{K}_t(x_t - \mathbf{F}_t\mathbf{b}_t), \tag{9}
\]

where $Q_t$ and $\mathbf{K}_t$ are from (27) and (6), respectively. Then $\mathbf{x} = \mathbf{L}\tilde{\mathbf{x}}$.
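A corresponding sketch of Lemma 2, under the same assumed precomputed quantities and with $\mathbf{b}_1 = \mathbf{0}$ as assumed herein:

```python
def ikf_L_xtilde(F, G, K, Q, x_tilde):
    """Lemma 2: compute x = L x_tilde in O(q^3 N), with b_1 = 0."""
    N = len(x_tilde)
    q = F.shape[1]
    b = np.zeros(q)                                   # b_1 = 0 assumed herein
    x = np.empty(N)
    x[0] = F[0] @ b + np.sqrt(Q[0]) * x_tilde[0]
    m = b + K[0] * (x[0] - F[0] @ b)                  # m_tilde_1
    for t in range(1, N):
        b = G[t] @ m                                  # Eq. (7)
        x[t] = F[t] @ b + np.sqrt(Q[t]) * x_tilde[t]  # Eq. (8)
        m = b + K[t] * (x[t] - F[t] @ b)              # Eq. (9)
    return x
```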

The algorithm in Lemma 1 is unconventional and nontrivial to derive. In contrast, Lemma 2 has a direct connection to the Kalman filter. Specifically, the one-step-ahead predictive distribution in step (ii) of the Kalman filter, given in Appendix A, enables computing $\mathbf{L}^{-1}\mathbf{x}$ for any vector $\mathbf{x}$ (Lemma 6). Lemma 2 essentially reverses this operation, computing $\mathbf{L}\tilde{\mathbf{x}}$ for any vector $\tilde{\mathbf{x}}$, which motivates the name inverse Kalman filter (IKF).

The IKF algorithm, outlined in Algorithm 1, sequentially applies Lemmas 1 and 2 to reduce the computational cost and storage for $\bm{\Sigma}\mathbf{u}$ from $\mathcal{O}(N^2)$ to $\mathcal{O}(q^3 N)$ without approximation. This approach applies to all DLMs, including GPs that admit equivalent DLM representations, such as GPs with Matérn kernels, where roughness parameters $0.5$ and $2.5$ correspond to $q = 1$ and $q = 3$, respectively. Details are provided in the Supplementary Material.

Algorithm 1 Inverse Kalman filter (IKF) for computing $\bm{\Sigma}\mathbf{u}$.
Input: $\mathbf{F}_t$, $\mathbf{G}_t$, $\mathbf{W}_t$ for $t = 1, \dots, N$, and an $N$-dimensional vector $\mathbf{u}$.
1: Use the Kalman filter from Lemma 5 to compute $\mathbf{B}_t$, $Q_t$, and $\mathbf{K}_t$ for $t = 1, \dots, N$.
2: Use Lemma 1 to compute $\tilde{\mathbf{x}} = \mathbf{L}^T\mathbf{u}$.
3: Use Lemma 2 to compute $\mathbf{x} = \mathbf{L}\tilde{\mathbf{x}}$.
Output: $\mathbf{x}$.
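A minimal end-to-end sketch of Algorithm 1 follows, combining a single Kalman filter pass (only the variance recursions of Lemma 5 are needed here, since $\mathbf{B}_t$, $Q_t$, and $\mathbf{K}_t$ do not depend on the data) with the two sketches above. The names kalman_gains and ikf_sigma_u are our own illustrative choices.

```python
def kalman_gains(F, G, W, V):
    """Variance recursions of Lemma 5: return Q_t and K_t for t = 1, ..., N,
    with b_1 = 0 and B_1 = W_1. V holds the noise variances V_t."""
    N, q = F.shape
    Q, K = np.empty(N), np.empty((N, q))
    C = np.zeros((q, q))
    for t in range(N):
        B = W[0] if t == 0 else G[t] @ C @ G[t].T + W[t]   # Eq. (26) variance
        Q[t] = F[t] @ B @ F[t] + V[t]                      # Eq. (27) variance
        K[t] = B @ F[t] / Q[t]                             # Kalman gain, Eq. (6)
        C = B - np.outer(K[t], F[t]) @ B                   # Eq. (28) variance
    return Q, K

def ikf_sigma_u(F, G, W, V, u):
    """Algorithm 1: Sigma @ u = L (L^T u) via Lemmas 1 and 2."""
    Q, K = kalman_gains(F, G, W, V)
    return ikf_L_xtilde(F, G, K, Q, ikf_Lt_u(F, G, K, Q, u))
```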
Figure 2: Panels (a) and (b) compare the IKF (blue dots) with direct computation (red triangles) for computing $\bm{\Sigma}\mathbf{u}$ on the original time scale and on a logarithmic scale (base 10), respectively. Panel (c) compares the predictive error of the non-robust IKF (light blue squares) and the robust IKF (blue dots).

While Algorithm 1 can be applied to a DLM regardless of the noise variance, we do not recommend directly computing $\bm{\Sigma}\mathbf{u}$ when the noise variance $V_t$ in Eq. (1) is zero or close to zero. When $V_t = 0$, $Q_t$ in Eq. (27), the variance of the one-step-ahead predictive distribution, can be extremely small, leading to large numerical errors in Eq. (28) due to the unstable inversion of $Q_t$. To ensure robustness, we suggest computing $\mathbf{x} = (\bm{\Sigma} + V\mathbf{I}_N)\mathbf{u}$ with a positive scalar $V$ and outputting $\mathbf{x} - V\mathbf{u}$ as the result of $\bm{\Sigma}\mathbf{u}$. Here $\bm{\Sigma} + V\mathbf{I}_N$ can be interpreted as the covariance matrix of a DLM with noise variance $V_t = V$. Essentially, we introduce artificial noise so that the $Q_t$ computed in the KF is at least $V$, ensuring numerical stability. The result remains exact and independent of the choice of $V$.
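The robust variant amounts to a short wrapper around the sketch above; in exact arithmetic the output is identical for any nugget $V > 0$:

```python
def robust_ikf_sigma_u(F, G, W, V, u, nugget=0.1):
    """Compute Sigma @ u as (Sigma + V I_N) u - V u, with artificial
    noise variance `nugget` added so Q_t >= nugget in the KF pass."""
    return ikf_sigma_u(F, G, W, V + nugget, u) - nugget * u
```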

We compare the IKF approach with direct matrix-vector multiplication of $\bm{\Sigma}\mathbf{u}$ in noise-free scenarios in Fig. 2. The experiments use $z_t = z(d_t)$ with inputs $d$ uniformly sampled from $[0,1]$ and the Matérn covariance with unit variance, roughness parameter $2.5$, and range parameter $\gamma = 0.1$ [12]. Panels (a) and (b) show that the IKF substantially reduces computational cost compared with direct computation, with computation time growing linearly in the number of observations. The IKF achieves the covariance matrix-vector multiplication for a $10^6$-dimensional output vector in about 2 seconds on a desktop, making it well suited for iterative algorithms such as the CG algorithm [15] and randomized log-determinant approximation [29]. Fig. 2(c) compares the maximum absolute error of the robust IKF with $V = 0.1$ and the non-robust IKF with $V = 0$. The non-robust IKF exhibits large numerical errors due to instability when $Q_t$ approaches zero, whereas the robust IKF remains stable, even for near-singular covariance matrices $\bm{\Sigma}$, by ensuring that $Q_t$ is no smaller than $V$.

Lemmas 1 and 2 also yield the explicit form of each element of $\mathbf{L}$ in terms of the DLM parameters and enable the linear-cost computation of $(\mathbf{L}^T)^{-1}\tilde{\mathbf{x}}$. These results are summarized in Lemmas 3 and 4, respectively, with proofs provided in the Supplementary Material.

Lemma 3 (Cholesky factor from the inverse Kalman filter).

Each entry $(t', t)$ of the lower triangular matrix $\mathbf{L}$ with $t' \geq t$ has the form

\[
L_{t',t} = Q_t^{1/2}\,\ell_{t',t}, \tag{10}
\]

where $\ell_{t',t'} = 1$, and for $t' > t$, $\ell_{t',t}$ is defined as

\[
\ell_{t',t} = \mathbf{F}_{t'}\left(\prod_{l=t+1}^{t'}\mathbf{G}_l\right)\mathbf{K}_t, \tag{11}
\]

with $\prod_{l=t+1}^{t'}\mathbf{G}_l = \mathbf{G}_{t'}\mathbf{G}_{t'-1}\cdots\mathbf{G}_{t+1}$, and $Q_t$ and $\mathbf{K}_t$ defined in (27) and (6), respectively.
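Lemma 3 is useful mainly for verification, since forming $\mathbf{L}$ explicitly would defeat the purpose of the IKF. A sketch for a single entry (0-based indices, our own convention):

```python
def chol_entry(F, G, K, Q, tp, t):
    """Lemma 3: L_{t',t} for tp >= t (0-based), via Eqs. (10)-(11)."""
    if tp == t:
        return np.sqrt(Q[t])                 # ell_{t,t} = 1
    v = K[t]
    for l in range(t + 1, tp + 1):           # apply G_{t+1}, ..., G_{t'} in order
        v = G[l] @ v
    return np.sqrt(Q[t]) * (F[tp] @ v)       # Q_t^{1/2} F_{t'} (prod G) K_t
```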

Lemma 4 (Compute $\mathbf{u} = (\mathbf{L}^T)^{-1}\tilde{\mathbf{x}}$ with linear computational cost).

For any $\tilde{\mathbf{x}}$, let $u_N = Q_N^{-1/2}\tilde{x}_N$ and $u_{N-1} = Q_{N-1}^{-1/2}\tilde{x}_{N-1} - \ell_{N,N-1}\, u_N$. For $t = N-2, \dots, 1$, iteratively compute $u_t$ by

\[
u_t = Q_t^{-1/2}\tilde{x}_t - \tilde{\ell}_{t+1,t} - \ell_{t+1,t}\, u_{t+1}, \tag{12}
\]

with $\tilde{\ell}_{t+1,t}$ and $\ell_{t+1,t}$ defined in Lemma 1. Then $\mathbf{u} = (\mathbf{L}^T)^{-1}\tilde{\mathbf{x}}$.
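Lemma 4 mirrors the recursion of Lemma 1, with $\mathbf{g}_t$ now accumulated from the already-computed entries of $\mathbf{u}$. A sketch under the same assumed inputs:

```python
def ikf_Lt_inv_xtilde(F, G, K, Q, x_tilde):
    """Lemma 4: compute u = (L^T)^{-1} x_tilde in O(q^3 N)."""
    N = len(x_tilde)
    u = np.empty(N)
    u[N - 1] = x_tilde[N - 1] / np.sqrt(Q[N - 1])
    if N == 1:
        return u
    FG = F[N - 1] @ G[N - 1]
    u[N - 2] = x_tilde[N - 2] / np.sqrt(Q[N - 2]) - FG @ K[N - 2] * u[N - 1]
    g = FG * u[N - 1]                                     # g_{N-1}
    for t in range(N - 3, -1, -1):
        FG = F[t + 1] @ G[t + 1]
        ell_tilde = (g @ G[t + 1]) @ K[t]                 # tilde-ell_{t+1,t}
        u[t] = x_tilde[t] / np.sqrt(Q[t]) - ell_tilde - FG @ K[t] * u[t + 1]  # Eq. (12)
        g = g @ G[t + 1] + FG * u[t + 1]                  # Eq. (5), with computed u
    return u
```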

3 Nonparametric estimation of particle interaction functions

The IKF algorithm is motivated by the problem of estimating interactions between particles, which is crucial for understanding complex behaviors of molecules and active matter, such as migrating cells and flocking birds. Minimal physical models, such as the Vicsek model [35] and its extensions [1, 9], provide a framework for explaining the collective motion of active matter.

Consider a physical model encompassing multiple types of latent interactions. Let $\mathbf{y}_i$ be a $D_y$-dimensional output vector for the $i$th particle, which is influenced by $J$ distinct interaction types. For the $j$th type of interaction, the $i$th particle interacts with a subset of $p_{i,j}$ particles rather than all particles, typically those within a radius $r_j$. This relationship is expressed as a latent factor model:

\[
\mathbf{y}_i = \sum_{j=1}^{J}\sum_{k=1}^{p_{i,j}} \mathbf{a}_{i,j,k}\, z_j(d_{i,j,k}) + \bm{\epsilon}_i, \tag{13}
\]

where $\bm{\epsilon}_i \sim \mathcal{MN}(\mathbf{0}, \sigma_0^2 \mathbf{I}_{D_y})$ denotes a Gaussian noise vector. The term $\mathbf{a}_{i,j,k}$ is a $D_y$-dimensional known factor loading that links the $i$th output to the $k$th neighbor in the $j$th interaction, with the unknown interaction function $z_j(\cdot)$ evaluated at a scalar-valued input $d_{i,j,k}$, such as the distance between particles $i$ and $k$, for $k = 1, \dots, p_{i,j}$, $i = 1, \dots, n$, and $j = 1, \dots, J$. For a dataset of $n_p$ particles over $n_\tau$ time points, the total number of observations is $\tilde{N} = n D_y = n_p n_\tau D_y$.

An illustrative example of this framework is the Vicsek model, a seminal approach for studying collective motion [35] (see Fig. S1(a) in the Supplementary Material). In this model, the 2-dimensional velocity of the $i'$th particle at time $\tau$, $\mathbf{v}_{i'}(\tau) = (v_{i',1}(\tau), v_{i',2}(\tau))^T$, has a constant magnitude $v = \|\mathbf{v}_{i'}(\tau)\|$, where $\|\cdot\|$ denotes the $L_2$ norm, and $v_{i',1}(\tau)$ and $v_{i',2}(\tau)$ are the velocity components along two orthogonal directions, for $i' = 1, \dots, n_p$ and $\tau = 1, \dots, n_\tau$. The velocity angle $\phi_{i'}(\tau) = \tan^{-1}(v_{i',2}(\tau)/v_{i',1}(\tau))$ is updated as

\[
\phi_{i'}(\tau) = \frac{1}{p_{i'}(\tau-1)}\sum_{k \in ne_{i'}(\tau-1)} \phi_k(\tau-1) + \epsilon_{i'}(\tau), \tag{14}
\]

where $\epsilon_{i'}(\tau)$ is zero-mean Gaussian noise with variance $\sigma_0^2$. The set of neighbors $ne_{i'}(\tau-1)$ includes the particles within a radius $r$ of particle $i'$ at time $\tau-1$, i.e., $ne_{i'}(\tau-1) = \{k : \|\mathbf{s}_{i'}(\tau-1) - \mathbf{s}_k(\tau-1)\| < r\}$, where $\mathbf{s}_{i'}(\tau-1)$ and $\mathbf{s}_k(\tau-1)$ are the 2-dimensional position vectors of particle $i'$ and its neighbor $k$, respectively, and $p_{i'}(\tau-1)$ denotes the total number of neighbors of the $i'$th particle, including itself, at time $\tau-1$.

The Vicsek model in Eq. (14) is a special case of the general framework in Eq. (13) with one-dimensional output $y_i = \phi_{i'}(\tau)$, i.e., $D_y = 1$, where the index of the $i$th observation, $i = (\tau-1)n_p + i'$, is a function of time $\tau$ and particle $i'$. The Vicsek model contains a single interaction, i.e., $J = 1$, with linear interaction function $z(d) = d$, where $d$ is the velocity angle of the neighboring particle and the factor loading is $1/p_{i'}(\tau-1)$.

Numerous other physical and mathematical models of self-propelled particle dynamics can also be formulated as in Eq. (13) with a parametric form of a particle interaction function [1]. Nonparametric estimation of particle interactions is preferred when the physical mechanism is unknown [17, 23], but the high computational cost limits its applicability to systems with large numbers of particles over long time periods.

In this work, we model each latent interaction function $z_j(\cdot)$ nonparametrically using a GP. By utilizing kernels with an equivalent DLM representation, we can leverage the proposed IKF algorithm to significantly expedite computation. Such kernels include the commonly used Matérn kernel with a half-integer roughness parameter [12], often the default setting in GP models for predicting nonlinear functions, as the smoothness of the process can be controlled by the roughness parameter [11].

For each interaction $j$, we form the input vector $\mathbf{d}_j^{(u)} = [d_{1,j}^{(u)}, \dots, d_{N_j,j}^{(u)}]^T$, where 'u' indicates that the input entries are unordered, $d_{t,j}^{(u)} = d_{i,j,k}$ with $t = \sum_{i'=1}^{i-1} p_{i',j} + k$ for any tuple $(i,j,k)$, $k = 1, \dots, p_{i,j}$, $i = 1, \dots, n$, $j = 1, \dots, J$, and $N_j = \sum_{i=1}^{n} p_{i,j}$. The marginal distribution of the $j$th factor follows $\mathbf{z}_j^{(u)} = [z_j(d_{1,j}^{(u)}), \dots, z_j(d_{N_j,j}^{(u)})]^T \sim \mathcal{MN}(\mathbf{0}, \bm{\Sigma}_j^{(u)})$, where the $(t,t')$th entry of the $N_j \times N_j$ covariance matrix $\bm{\Sigma}_j^{(u)}$ is $\mathrm{cov}[z_j(d_{t,j}^{(u)}), z_j(d_{t',j}^{(u)})] = \sigma_j^2 c_j(d_{t,j}^{(u)}, d_{t',j}^{(u)}) = \sigma_j^2 c_j(d)$, with $d = |d_{t,j}^{(u)} - d_{t',j}^{(u)}|$, and $\sigma_j^2$ and $c_j(\cdot)$ the variance parameter and the correlation function, respectively. We employ the Matérn correlation with roughness parameters $\nu_j = 0.5$ and $\nu_j = 2.5$. For $\nu_j = 0.5$, the Matérn correlation is the exponential correlation $c_j(d) = \exp(-d/\gamma_j)$, while for $\nu_j = 2.5$, the Matérn correlation follows:

\[
c_j(d) = \left(1 + \frac{5^{1/2} d}{\gamma_j} + \frac{5 d^2}{3\gamma_j^2}\right)\exp\left(-\frac{5^{1/2} d}{\gamma_j}\right), \tag{15}
\]

where $\gamma_j$ denotes the range parameter.
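As a small worked example, the Matérn correlation of Eq. (15) is a one-liner:

```python
def matern_2_5(d, gamma):
    """Matérn correlation with roughness parameter 2.5, Eq. (15)."""
    a = np.sqrt(5.0) * np.abs(d) / gamma
    return (1.0 + a + a ** 2 / 3.0) * np.exp(-a)
```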

By integrating out the latent factor processes $\mathbf{z}_j$, the marginal distribution of the observation vector $\mathbf{y} = (\mathbf{y}_1^T, \dots, \mathbf{y}_n^T)^T$, of dimension $\tilde{N} = n D_y$, is multivariate normal:

\[
\left(\mathbf{y} \mid \bm{\Sigma}_j^{(u)}, \sigma_0^2\right) \sim \mathcal{MN}\left(\mathbf{0}, \sum_{j=1}^{J}\mathbf{A}_j \bm{\Sigma}_j^{(u)}\mathbf{A}_j^T + \sigma_0^2 \mathbf{I}_{\tilde{N}}\right), \tag{16}
\]

where $\mathbf{A}_j$ is a sparse $\tilde{N} \times N_j$ block diagonal matrix whose $i$th diagonal block is the $D_y \times p_{i,j}$ matrix $\mathbf{A}_{i,j} = [\mathbf{a}_{i,j,1}, \dots, \mathbf{a}_{i,j,p_{i,j}}]$ for $i = 1, \dots, n$. The posterior predictive distribution of the latent variable $z_j(d^*)$ at any given input $d^*$ is also normal:

\[
\left(z_j(d^*) \mid \mathbf{y}, \sigma_0^2, \bm{\sigma}^2, \bm{\gamma}, \mathbf{r}\right) \sim \mathcal{N}(\hat{z}_j(d^*), c_j^*(d^*)), \tag{17}
\]

with the predictive mean and variance given by

\[
\hat{z}_j(d^*) = \bm{\Sigma}_j^{(u)}(d^*)^T \mathbf{A}_j^T \bm{\Sigma}_y^{-1}\mathbf{y}, \tag{18}
\]
\[
c_j^*(d^*) = c_j(d^*, d^*) - \bm{\Sigma}_j^{(u)}(d^*)^T \mathbf{A}_j^T \bm{\Sigma}_y^{-1}\mathbf{A}_j \bm{\Sigma}_j^{(u)}(d^*), \tag{19}
\]

where $\bm{\Sigma}_y = \sum_{j=1}^{J}\mathbf{A}_j\bm{\Sigma}_j^{(u)}\mathbf{A}_j^T + \sigma_0^2\mathbf{I}_{\tilde{N}}$, $\bm{\Sigma}_j^{(u)}(d^*) = [c_j(d_{1,j}^{(u)}, d^*), \dots, c_j(d_{N_j,j}^{(u)}, d^*)]^T$, and $\mathbf{r}$ represents additional system parameters.

The main computational challenge in calculating the predictive distribution (17) lies in inverting the covariance matrix $\bm{\Sigma}_y$, where $\tilde{N}$ and $N_j$ range between $10^5$ and $10^6$ in our applications. Though various GP approximation methods have been proposed [34, 30, 8, 2, 22, 10, 3, 18], they typically approximate the parametric covariance $\bm{\Sigma}_j^{(u)}$ rather than $\bm{\Sigma}_y$, and thus are not directly applicable for computing the distribution in Eq. (17).

To address this, we employ the CG algorithm [15] to accelerate the computation of the predictive distribution. Each CG iteration requires computing $\bm{\Sigma}_y\mathbf{u}$ for an $\tilde{N}$-dimensional vector $\mathbf{u}$. To employ the IKF, we first rearrange the input vector $\mathbf{d}_j^{(u)}$ into a non-decreasing input sequence $\mathbf{d}_j = [d_{1,j}, \dots, d_{N_j,j}]^T$. Denote the covariance $\mathrm{var}[\mathbf{z}_j] = \bm{\Sigma}_j$ with $\mathbf{z}_j = [z_j(d_{1,j}), \dots, z_j(d_{N_j,j})]^T$. The covariance of the observations can then be written as a weighted sum of the $\bm{\Sigma}_j$ by introducing a permutation matrix for each interaction: $\bm{\Sigma}_y = \sum_{j=1}^{J}\mathbf{A}_{\pi,j}\bm{\Sigma}_j\mathbf{A}_{\pi,j}^T + \sigma_0^2\mathbf{I}_{\tilde{N}}$, where $\mathbf{A}_{\pi,j} = \mathbf{A}_j\mathbf{P}_j^T$ with $\mathbf{P}_j$ a permutation matrix such that $\bm{\Sigma}_j = \mathbf{P}_j\bm{\Sigma}_j^{(u)}\mathbf{P}_j^T$. After this reordering, the computation of $\bm{\Sigma}_y\mathbf{u}$ can be broken into four steps: (1) $\mathbf{u}_j = \mathbf{A}_{\pi,j}^T\mathbf{u}$, (2) $\mathbf{x}_j = \bm{\Sigma}_j\mathbf{u}_j$, (3) $\hat{\mathbf{u}}_j = \mathbf{A}_{\pi,j}\mathbf{x}_j$, and (4) $\sum_{j=1}^{J}\hat{\mathbf{u}}_j + \sigma_0^2\mathbf{u}$. Here $\mathbf{A}_j$ is a sparse matrix with $N_j D_y$ non-zero entries. The IKF algorithm is used in step (2) to accelerate the most expensive computation, with all computations performed directly using terms from the KF algorithm, without explicitly constructing the covariance matrix.
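A matrix-free sketch of this four-step matvec, wrapped for SciPy's CG solver, is given below. Here A_list holds the (already permuted) sparse matrices $\mathbf{A}_{\pi,j}$ and sigma_matvecs[j] returns $\bm{\Sigma}_j\mathbf{u}_j$ via the IKF, e.g., the robust_ikf_sigma_u sketch of Section 2; all names are our own illustrative choices.

```python
from scipy.sparse.linalg import LinearOperator, cg

def sigma_y_matvec(A_list, sigma_matvecs, sigma0_sq, u):
    """One CG iteration's matvec Sigma_y @ u, Eq. (3)/(16),
    never forming Sigma_y. Each A in A_list is a scipy.sparse matrix."""
    out = sigma0_sq * u                    # step (4): noise contribution
    for A, sig_mv in zip(A_list, sigma_matvecs):
        u_j = A.T @ u                      # step (1): project to factor space
        x_j = sig_mv(u_j)                  # step (2): IKF matvec with Sigma_j
        out += A @ x_j                     # step (3): map back to observations
    return out

# Example use: solve Sigma_y^{-1} y for the predictive mean in Eq. (18).
# N_tilde = len(y)
# op = LinearOperator((N_tilde, N_tilde),
#                     matvec=lambda u: sigma_y_matvec(A_list, mvs, s0_sq, u))
# weights, info = cg(op, y)
```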

We refer to the resulting approach as the IKF-CG algorithm. This approach reduces the computational cost of computing the posterior distribution from $\mathcal{O}(\tilde{N}^3)$ operations to pseudolinear order in $N_j$, as shown in Table S1 in the Supplementary Material. Furthermore, the IKF-CG algorithm can accelerate parameter estimation via both cross-validation and maximum likelihood estimation, with the latter requiring an additional approximation of the log-determinant [29]. Details of the CG algorithm, the computation of the predictive distribution, parameter estimation procedures, and computational complexity are discussed in Sections S3, S4, S6, and S7 of the Supplementary Material, respectively.

4 Numerical results for particle interaction function estimation

4.1 Evaluation criteria

We conduct simulation studies in Sections 4.2-4.3 and analyze real cell trajectory data in Section 4.4 to estimate particle interaction functions in physical models. The code and data to reproduce all numerical results are available at https://github.com/UncertaintyQuantification/IKF. Simulation observations are generated at equally spaced time frames $\tau = 1, \dots, n_\tau$ with interval $h = 0.1$, though the proposed approach is applicable to both equally and unequally spaced time frames. For simplicity, the number of particles $n_p$ is held constant across all time frames in the simulations.

For each of the $J$ latent factors, predictive performance is assessed using the normalized root mean squared error ($\mathrm{NRMSE}_j$), the average length of the 95% posterior credible interval ($L_j(95\%)$), and the proportion of interaction function values covered by the 95% posterior credible interval ($P_j(95\%)$), based on $n^*$ test inputs $\mathbf{d}_j^* = (d_{1,j}^*, \dots, d_{n^*,j}^*)^T$:

\[
\mathrm{NRMSE}_j = \left(\frac{\sum_{i=1}^{n^*}(\hat{z}_j(d_{i,j}^*) - z_j(d_{i,j}^*))^2}{\sum_{i=1}^{n^*}(\bar{z}_j - z_j(d_{i,j}^*))^2}\right)^{1/2}, \tag{20}
\]
\[
L_j(95\%) = \frac{1}{n^*}\sum_{i=1}^{n^*}\mathrm{length}\{CI_{i,j}(95\%)\}, \tag{21}
\]
\[
P_j(95\%) = \frac{1}{n^*}\sum_{i=1}^{n^*} 1_{z_j(d_{i,j}^*) \in CI_{i,j}(95\%)}. \tag{22}
\]

Here, $\hat{z}_j(d_{i,j}^*)$ is the predictive mean at $d_{i,j}^*$, $\bar{z}_j$ is the average of the $j$th interaction function values, and $CI_{i,j}(95\%)$ denotes the 95% posterior credible interval of $z_j(d_{i,j}^*)$, for $j = 1, \dots, J$. A desirable method should have low $\mathrm{NRMSE}_j$, small $L_j(95\%)$, and $P_j(95\%)$ close to 95%.
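These three criteria translate directly into code; a short sketch assuming arrays of predictive means, true values, and credible interval endpoints at the $n^*$ test inputs:

```python
def evaluate(z_hat, z_true, lower, upper):
    """NRMSE, average 95% interval length, and empirical coverage,
    Eqs. (20)-(22), over the n* test inputs."""
    nrmse = np.sqrt(np.sum((z_hat - z_true) ** 2)
                    / np.sum((np.mean(z_true) - z_true) ** 2))
    avg_length = np.mean(upper - lower)
    coverage = np.mean((z_true >= lower) & (z_true <= upper))
    return nrmse, avg_length, coverage
```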

To account for variability due to initial particle positions, velocities, and noise, each simulation scenario is repeated $E = 20$ times to compute the average performance metrics. Unless otherwise specified, parameters are estimated by cross-validation, with 80% of the data used as the training set and the remaining 20% as the validation set. All computations are performed on a macOS Mojave system with an 8-core Intel i9 processor running at 3.60 GHz and 32 GB of RAM.

Figure 3: (a) Computational time of calculating predictive means using the IKF-CG algorithm (blue dots), the CG method (purple squares), and direct computation (red triangles) for the Vicsek model with different numbers of observations. (b, c) NRMSE of the predictive mean of the interaction function and the log-likelihood, comparing direct computation (red triangles) and the IKF-CG algorithm (blue dots), with differences (yellow diamonds) of order $10^{-5}$ to $10^{-4}$ and $10^{-4}$ to $10^{-3}$, respectively.

4.2 Vicsek model

We first consider the Vicsek model introduced in Section 3. For each particle $i'$ at time $\tau$, after obtaining its velocity angle $\phi_{i'}(\tau)$ from Eq. (14), its position is updated as:

\[
\mathbf{s}_{i'}(\tau) = \mathbf{s}_{i'}(\tau-1) + v\,[\cos(\phi_{i'}(\tau)), \sin(\phi_{i'}(\tau))]^T h, \quad i' = 1, \dots, n_p, \tag{23}
\]

where $v$ is the particle speed and $h$ is the time step size. Particles are initialized with velocity $[v\cos(\phi_{i'}(0)), v\sin(\phi_{i'}(0))]^T$, where $\phi_{i'}(0)$ is drawn from $\mathrm{Unif}[-\pi, \pi]$ and $v = 2^{1/2}/2 \approx 0.71$. Initial particle positions $\mathbf{s}_{i'}(0)$ are uniformly sampled from $[0, n_p^{1/2}] \times [0, n_p^{1/2}]$ to keep the particle density consistent across experiments. The goal is to estimate the interaction function nonparametrically, without assuming linearity. The interaction radius $r = 0.5$ is estimated alongside the other parameters, with results detailed in the Supplementary Material.
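A minimal simulator for this setup, following the angle update of Eq. (14) and the position update of Eq. (23), is sketched below; boundary handling is omitted, which is an assumption on our part rather than a detail stated in the paper.

```python
def simulate_vicsek(n_p, n_tau, r=0.5, sigma0=0.1, h=0.1, seed=0):
    """Simulate Vicsek dynamics: Eq. (14) angle update, Eq. (23) positions."""
    rng = np.random.default_rng(seed)
    side = np.sqrt(n_p)                                  # domain [0, sqrt(n_p)]^2
    s = rng.uniform(0.0, side, size=(n_p, 2))            # initial positions
    phi = rng.uniform(-np.pi, np.pi, size=n_p)           # initial angles
    v = np.sqrt(2.0) / 2.0                               # constant speed
    angles = [phi.copy()]
    for _ in range(n_tau):
        diff = s[:, None, :] - s[None, :, :]
        nbr = (np.linalg.norm(diff, axis=-1) < r).astype(float)  # includes self
        phi = nbr @ phi / nbr.sum(axis=1) \
              + sigma0 * rng.standard_normal(n_p)        # Eq. (14)
        s = s + v * h * np.column_stack([np.cos(phi), np.sin(phi)])  # Eq. (23)
        angles.append(phi.copy())
    return s, np.array(angles)
```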

Figure 4: Boxplots of NRMSE (Eq. (20)) for estimating the latent interaction function in the Vicsek model with $\sigma_0^2 = 0.1^2$, based on 20 experiments. Panel (a) uses the Matérn covariance with roughness parameter 2.5 and panel (b) uses the exponential covariance.

We first compare the computational cost and accuracy of IKF-CG, CG, and direct computation for calculating the predictive mean in Eq. (18) and the marginal likelihood in Eq. (16), using the covariance in Eq. (15). Simulations are conducted with $n_p = 100$ particles and noise variance $\sigma_0^2 = 0.2^2$. Panel (a) of Fig. 3 shows that the IKF-CG approach substantially outperforms both direct computation and the CG algorithm in computational time when predicting $n^* = 100$ test inputs, with time lengths $n_\tau$ ranging from 4 to 200. Fig. 3(b) compares the NRMSE of the predictive mean between direct computation and the IKF-CG method as the number of observations ranges from 400 to 2,000, corresponding to $n_\tau$ from 4 to 20. The two methods yield nearly identical predictive errors, with negligible differences in NRMSE. Fig. 3(c) compares log-likelihood values computed via direct computation and IKF-CG, where the latter employs the log-determinant approximation detailed in the Supplementary Material. Both methods produce similar log-likelihood values across different sample sizes.

Figure 5: Uncertainty assessment of the predicted interaction function in the Vicsek model with $\sigma_0^2 = 0.1^2$ using (a) the Matérn covariance function in Eq. (15) and (b) the exponential covariance function. Bars represent average lengths of 95% posterior credible intervals (Eq. (21)). Dots indicate the proportions covered by the 95% intervals (Eq. (22)), with optimal coverage (0.95) shown as the purple dashed line.
Figure 6: (a) Predicted (blue dashed lines) versus true (pink solid lines) interaction function with the 95% posterior credible interval (grey shaded area) for the Vicsek model with $n_\tau = 900$, $T = 10$, and $\sigma_0^2 = 0.1^2$, using the Matérn covariance function with $\nu = 2.5$. (b) Trajectories of 45 particles over 50 time frames using the estimated (blue dashed lines) and true (pink solid lines) interaction functions with identical initial positions and noise samples.

Next, we evaluate the performance of the IKF-CG algorithm across 12 scenarios with varying numbers of particles ($n_p = 100, 300, 900$), time frames ($n_\tau = 5, 10$), and noise variances ($\sigma_0^2 = 0.1^2, 0.2^2$). The predictive performance is evaluated using $n^* = 200$ test inputs evenly spaced across the domain of the interaction function, $[-\pi, \pi]$, averaged over $E = 20$ experiments.

Panels (a) and (b) of Figs. 4-5 show the predictive performance for particle interactions using the Matérn covariance with roughness parameter $\nu = 2.5$ and the exponential covariance (Matérn with $\nu = 0.5$), respectively, for noise variance $\sigma_0^2 = 0.1^2$. Results for $\sigma_0^2 = 0.2^2$ are provided in the Supplementary Material. Across all scenarios, NRMSEs remain low, with improved accuracy for larger datasets and smaller noise variances. The decrease in the average length of the 95% posterior credible interval with increasing sample size, along with the relatively small interval span compared to the output range $[-\pi, \pi]$, indicates improved prediction confidence with more observations. Moreover, the coverage proportion of the 95% posterior credible interval is close to the nominal 95% level in all cases, validating the reliability of the uncertainty quantification. Comparing panels (a) and (b) in Figs. 4-5, we find that the Matérn covariance in Eq. (15) yields lower NRMSE and narrower 95% posterior credible intervals than the exponential kernel. This improvement arises because the latent process induced by Eq. (15) is twice mean-squared differentiable, whereas the process with an exponential kernel is not differentiable.

Figure 6(a) shows close agreement between the predicted and true interaction functions over the domain $[-\pi, \pi]$. The 95% interval is narrow yet covers approximately 95% of the test samples of the interaction function. Figure 6(b) compares particle trajectories over 50 time steps using the predictive mean of one-step-ahead predictions and the true interaction function. The trajectories are visually similar.

4.3 A modified Vicsek model with multiple interactions

Various extensions of the Vicsek model have been studied to capture more complex particle dynamics [1, 9]. For illustration, we consider a modified Vicsek model with two interaction functions. The 2-dimensional velocity $\mathbf{v}_{i'}(\tau) = (v_{i',1}(\tau), v_{i',2}(\tau))^T$, corresponding to the output $\mathbf{y}_i$ in Eq. (13), is updated according to:

\[
\mathbf{v}_{i'}(\tau) = \frac{\sum_{k \in ne_{i'}(\tau-1)}\mathbf{v}_k(\tau-1)}{p_{i'}(\tau-1)} + \frac{\sum_{k \in ne'_{i'}(\tau-1)} f(d_{i',k}(\tau-1))\,\mathbf{e}_{i',k}(\tau-1)}{p'_{i'}(\tau-1)} + \bm{\epsilon}_{i'}(\tau), \tag{24}
\]

where $\bm{\epsilon}_{i'}(\tau) = (\epsilon_{i',1}(\tau), \epsilon_{i',2}(\tau))^T$ is a Gaussian noise vector with variance $\sigma_0^2$. The first term in (24) models velocity alignment with neighboring particles, and the second term introduces a distance-dependent interaction $f(\cdot)$. The definitions of the neighbor sets, the 2-dimensional vector $\mathbf{e}_{i',k}(\tau-1)$, and the form of $f(\cdot)$ are provided in Section S8.2 of the Supplementary Material.

We simulate 12 scenarios, each replicated $E = 20$ times, using the same numbers of particles, time frames, and noise levels as for the original Vicsek model in Section 4.2. The predictive performance of the latent factor model is evaluated using $n^* = 200$ test inputs evenly spaced across the training domain of each interaction function. We focus on the model with the Matérn kernel in Eq. (15); results for the exponential kernel are detailed in the Supplementary Material. Consistent with the results for the Vicsek model, we find that models with the Matérn kernel in Eq. (15) are more accurate, due to the smooth prior imposed on the interaction function.

                          First Interaction                 Second Interaction
                          NRMSE       L1(95%)   P1(95%)     NRMSE       L2(95%)   P2(95%)
n_p = 100, n_tau = 5      6.8×10^-3   0.098     92%         8.7×10^-2   0.29      94%
n_p = 100, n_tau = 10     6.0×10^-3   0.097     95%         3.8×10^-2   0.18      96%
n_p = 300, n_tau = 5      1.2×10^-2   0.23      87%         3.2×10^-2   0.21      96%
n_p = 300, n_tau = 10     4.2×10^-3   0.10      98%         2.0×10^-2   0.13      97%
n_p = 900, n_tau = 5      5.6×10^-3   0.10      96%         1.4×10^-2   0.12      98%
n_p = 900, n_tau = 10     4.4×10^-3   0.12      97%         1.4×10^-2   0.10      96%
Table 1: Predictive accuracy and uncertainty assessment by (20)-(22) for the modified Vicsek model with $\sigma_0^2 = 0.1^2$, using the Matérn covariance function with roughness parameter 2.5.

Table 1 summarizes the predictive performance for $\sigma_0^2 = 0.1^2$; results for $\sigma_0^2 = 0.2^2$ are included in the Supplementary Material. While both interaction functions exhibit relatively low NRMSEs, the second interaction function has a higher NRMSE than the first because the repulsion at short distances creates fewer training samples with close-proximity neighbors. This discrepancy can be mitigated by increasing the number of observations. The average length of the 95% posterior credible interval for the second interaction decreases as the sample size increases, and the coverage proportion remains close to the nominal 95% level across all scenarios.

Figure 7: Predictions (blue dashed curves), truth (pink solid curves), and the 95% posterior credible interval (shaded region) of the particle interaction functions when $n_p = 900$, $n_\tau = 10$, and $\sigma_0^2 = 0.1^2$.

In Fig. 7, we show the predictions of the interaction functions, where the shaded regions represent the 95% posterior credible intervals. The predictions closely match the true values, and the credible intervals, while narrow relative to the output range and nearly invisible in the plots, mostly cover the truth. These results suggest high confidence in the predictions.

4.4 Estimating cell-cell interactions on anisotropic substrates

We analyze video microscopy data from [24], which tracks the trajectories of over 2,000 human dermal fibroblasts moving on a nematically aligned, liquid-crystalline substrate. This experimental setup encourages cellular alignment along a horizontal axis, with the alignment order increasing over time, though the underlying mechanism remains largely unknown. Cells were imaged every 20 minutes over a 36-hour period, during which the cell count grew from 2,615 to 2,953 due to proliferation. Our objective is to estimate the latent interaction function between cells. Given the vast number of velocity observations ($N \approx 300{,}000$), directly forming and inverting the covariance matrix is impractical.

We apply the IKF-CG algorithm to estimate the latent interaction function. Due to the anisotropic substrate, cellular velocities differ between the horizontal and vertical directions, so we independently model the velocity of the $i'$th cell in each direction $l$ by

\[
v_{i',l}(\tau) = \frac{1}{p_{ne_{i',l}(\tau-1)}}\sum_{k \in ne_{i',l}(\tau-1)} z_l(d_{k,l}) + \epsilon_{i',l}(\tau), \quad i' = 1, \dots, n_p(\tau), \tag{25}
\]

where $n_p(\tau)$ is the cell count at time $\tau$, $l = 1, 2$ correspond to the horizontal and vertical directions, respectively, and $\epsilon_{i',l}(\tau) \sim \mathcal{N}(0, \sigma_{0,l}^2(\tau))$ denotes the noise. Inspired by the modified Vicsek model, we set $d_{k,l} = v_{k,l}(\tau-1)$. To account for velocity decay caused by increasing cell confluence, we model the noise variances as $\sigma_{0,l}^2(\tau) = \omega_l \sigma_{v,l}^2(\tau-1)$, where $\sigma_{v,l}^2(\tau-1)$ is the sample velocity variance at time $\tau-1$ and $\omega_l$ is a parameter estimated from the data, for $l = 1, 2$. The neighbor set $ne_{i',l}(\tau-1) = \{k : \|\mathbf{s}_{i'}(\tau-1) - \mathbf{s}_k(\tau-1)\| < r_l \text{ and } \mathbf{v}_{i'}(\tau-1)\cdot\mathbf{v}_k(\tau-1) > 0\}$, where $r_l$ denotes the interaction radius in direction $l$ for $l = 1, 2$, excludes particles moving in opposite directions, reflecting the observed gliding and intercalating behavior of cells [36].

Figure 8: The blue dashed curves show the predicted interaction function for the horizontal and vertical directions. The grey shaded area is the 95% posterior credible interval. The purple line of slope 1 is the prediction using the Vicsek model. The light pink histogram shows the velocity distribution of all training samples.

We use observations from the first half of the time frames as the training set and the latter half as the testing set. The predicted interaction functions in Fig. 8 show diminishing effects in both directions, likely due to cell-substrate interactions such as friction. The estimated uncertainty (obtained via residual bootstrap) increases when the absolute velocity of neighboring cells in the previous time frame is large, which is attributed to fewer observations at the boundaries of the input domain. Furthermore, the interaction in the vertical direction is weaker than in the horizontal direction, due to the confinement from the anisotropic substrate in the vertical direction.

Our nonparametric estimation of the interaction functions is compared with two models for one-step-ahead forecasts of directional velocities: the original Vicsek model introduced in Section 4.2 and the anisotropic Vicsek model, which predicts velocities using the average velocity of neighboring cells from the previous time frame. As shown in Table 2, our model outperforms both Vicsek models in RMSE for one-step-ahead forecasts of directional velocities, despite the large inherent stochasticity of cellular motion. Moreover, our model yields notably shorter average interval lengths that still cover approximately 95% of the held-out observations. These findings underscore the importance of the IKF-CG algorithm in enabling the use of large datasets to overcome high stochasticity and capture the underlying dynamics of alignment processes.

                           Horizontal direction             Vertical direction
                           RMSE        L(95%)   P(95%)      RMSE        L(95%)   P(95%)
Baseline Vicsek            4.5×10^-3   0.011    80%         5.3×10^-3   0.011    95%
Anisotropic Vicsek         3.9×10^-3   0.024    98%         2.6×10^-3   0.020    99%
Nonparametric estimation   3.5×10^-3   0.015    95%         2.3×10^-3   0.0092   95%
Table 2: One-step-ahead prediction performance on the testing dataset. Here $\mathrm{RMSE} = \{\sum_{\tau=1}^{n_\tau^*}\sum_{i'=1}^{n_p(\tau)}(\hat{v}_{i',l}(\tau) - v_{i',l}(\tau))^2 / \sum_{\tau=1}^{n_\tau^*} n_p(\tau)\}^{1/2}$, with $n_\tau^*$ denoting the number of testing time frames. L(95%) and P(95%) are computed as in Eqs. (21)-(22), but with the predictive interval of $z$ replaced by that of $y$, where $y$ represents the velocity in each direction.

5 Concluding remarks

Several future directions are worth exploring. First, computing large matrix-vector products is ubiquitous, and the approach can be extended to different models, including some large neural network models. Second, scalably computing the logarithm of the determinant of the covariance in Eq. (3) without approximation remains an open problem. Third, the new approach can be extended to estimate latent functions when forward equations are unavailable or computationally intensive in nonlinear or non-Gaussian dynamical systems, where the ensemble Kalman filter [6, 33] and the particle filter [20] are commonly used. Fourth, the proposed approach can be extended to estimating and predicting multivariate time series [21] and generalized factor processes for categorical observations [4].

Acknowledgement

Xinyi Fang acknowledges partial support from the BioPACIFIC Materials Innovation Platform of the National Science Foundation under Award No. DMR-1933487. We thank the editor, associate editor, and three anonymous referees for their comments that substantially improved this article.

Supplementary Material

The Supplementary Material contains (i) proofs of all lemmas; (ii) a summary of the connection between GPs with Matérn covariance (roughness parameters 0.5 and 2.5) and DLMs, with closed-form $\mathbf{F}_{t,j}$, $\mathbf{G}_{t,j}$, and $\mathbf{W}_{t,j}$; (iii) the conjugate gradient algorithm; (iv) procedures for the scalable computation of the predictive distribution in particle dynamics; (v) an application of the IKF-CG algorithm for predicting missing values in lattice data; (vi) the parameter estimation method; (vii) computational complexity analysis; (viii) additional numerical results for particle interaction; and (ix) numerical results for predicting missing values in lattice data.

Appendix A. Kalman Filter

The Kalman filter [16] for dynamic linear models in (1)-(2) is summarized in Lemma 5.

Lemma 5.

Let $\bm{\theta}_{t-1} \mid \mathbf{y}_{1:t-1} \sim \mathcal{MN}(\mathbf{m}_{t-1}, \mathbf{C}_{t-1})$. Recursively, for $t = 2, \dots, N$:

  • (i) The one-step-ahead predictive distribution of $\bm{\theta}_t$ given $\mathbf{y}_{1:t-1}$ is

\[
\bm{\theta}_t \mid \mathbf{y}_{1:t-1} \sim \mathcal{MN}(\mathbf{b}_t, \mathbf{B}_t), \tag{26}
\]

    with $\mathbf{b}_t = \mathbf{G}_t\mathbf{m}_{t-1}$ and $\mathbf{B}_t = \mathbf{G}_t\mathbf{C}_{t-1}\mathbf{G}_t^T + \mathbf{W}_t$.

  • (ii) The one-step-ahead predictive distribution of $y_t$ given $\mathbf{y}_{1:t-1}$ is

\[
y_t \mid \mathbf{y}_{1:t-1} \sim \mathcal{N}(f_t, Q_t), \tag{27}
\]

    with $f_t = \mathbf{F}_t\mathbf{b}_t$ and $Q_t = \mathbf{F}_t\mathbf{B}_t\mathbf{F}_t^T + V_t$.

  • (iii) The filtering distribution of $\bm{\theta}_t$ given $\mathbf{y}_{1:t}$ follows

\[
\bm{\theta}_t \mid \mathbf{y}_{1:t} \sim \mathcal{MN}(\mathbf{m}_t, \mathbf{C}_t), \tag{28}
\]

    with $\mathbf{m}_t = \mathbf{b}_t + \mathbf{B}_t\mathbf{F}_t^T Q_t^{-1}(y_t - f_t)$ and $\mathbf{C}_t = \mathbf{B}_t - \mathbf{B}_t\mathbf{F}_t^T Q_t^{-1}\mathbf{F}_t\mathbf{B}_t$.
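A direct transcription of these recursions, extending the data-independent kalman_gains sketch of Section 2 with the data-dependent means (a sketch; $\mathbf{b}_1 = \mathbf{0}$ and $\mathbf{B}_1 = \mathbf{W}_1$ match the paper's initialization):

```python
import numpy as np

def kalman_filter(F, G, W, V, y):
    """Lemma 5: one-step-ahead means f_t, variances Q_t, and gains K_t,
    for F: (N, q), G, W: (N, q, q), noise variances V: (N,)."""
    N, q = F.shape
    f, Q, K = np.empty(N), np.empty(N), np.empty((N, q))
    for t in range(N):
        if t == 0:
            b, B = np.zeros(q), W[0]                # theta_1 ~ MN(0, W_1)
        else:
            b = G[t] @ m                            # Eq. (26) mean
            B = G[t] @ C @ G[t].T + W[t]            # Eq. (26) variance
        f[t] = F[t] @ b                             # Eq. (27) mean
        Q[t] = F[t] @ B @ F[t] + V[t]               # Eq. (27) variance
        K[t] = B @ F[t] / Q[t]                      # Kalman gain, Eq. (6)
        m = b + K[t] * (y[t] - f[t])                # Eq. (28) mean
        C = B - np.outer(K[t], F[t]) @ B            # Eq. (28) variance
    return f, Q, K
```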

Lemma 6.

Denote $\tilde{\mathbf{y}} = (\tilde{y}_1, \dots, \tilde{y}_N)^T = \mathbf{L}^{-1}\mathbf{y}$, where $\mathbf{L}$ is the factor in the Cholesky decomposition of a positive definite covariance $\bm{\Sigma} = \mathrm{cov}[\mathbf{y}]$, with $\mathbf{y} = (y_1, \dots, y_N)^T$ defined as in (1). We have

\[
\tilde{y}_t = \frac{y_t - f_t}{Q_t^{1/2}}, \tag{29}
\]

where $f_t$ and $Q_t$ are defined in Eq. (27). Furthermore, the $t$th diagonal entry of $\mathbf{L}$ is $L_{t,t} = Q_t^{1/2}$.
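Lemma 6 then gives $\mathbf{L}^{-1}\mathbf{y}$ as the standardized innovations of the filter sketched above:

```python
def L_inv_y(F, G, W, V, y):
    """Lemma 6: y_tilde = L^{-1} y from the KF innovations, Eq. (29)."""
    f, Q, _ = kalman_filter(F, G, W, V, y)
    return (y - f) / np.sqrt(Q)
```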

References

  • [1] Hugues Chaté, Francesco Ginelli, Guillaume Grégoire, Fernando Peruani, and Franck Raynaud. Modeling collective motion: variations on the Vicsek model. The European Physical Journal B, 64(3):451–456, 2008.
  • [2] Noel Cressie and Gardar Johannesson. Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(1):209–226, 2008.
  • [3] Abhirup Datta, Sudipto Banerjee, Andrew O Finley, and Alan E Gelfand. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. Journal of the American Statistical Association, 111(514):800–812, 2016.
  • [4] Daniele Durante and David B Dunson. Nonparametric Bayes dynamic modelling of relational data. Biometrika, 101(4):883–898, 2014.
  • [5] James Durbin and Siem Jan Koopman. Time series analysis by state space methods, volume 38. OUP Oxford, 2012.
  • [6] Geir Evensen. Data assimilation: the ensemble Kalman filter. Springer Science & Business Media, 2009.
  • [7] Andrew O. Finley, Abhirup Datta, and Sudipto Banerjee. spNNGP R Package for Nearest Neighbor Gaussian Process Models. Journal of Statistical Software, 103(5):1–40, 2022.
  • [8] Reinhard Furrer, Marc G Genton, and Douglas Nychka. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3):502–523, 2006.
  • [9] Francesco Ginelli, Fernando Peruani, Markus Bär, and Hugues Chaté. Large-scale collective properties of self-propelled rods. Physical Review Letters, 104(18):184502, 2010.
  • [10] Robert B Gramacy and Daniel W Apley. Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2):561–578, 2015.
  • [11] Mengyang Gu, Xiaojing Wang, and James O Berger. Robust Gaussian stochastic process emulation. Annals of Statistics, 46(6A):3038–3066, 2018.
  • [12] Mark S Handcock and Michael L Stein. A Bayesian analysis of kriging. Technometrics, 35(4):403–410, 1993.
  • [13] Jouni Hartikainen and Simo Sarkka. Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In Machine Learning for Signal Processing (MLSP), 2010 IEEE International Workshop on, pages 379–384. IEEE, 2010.
  • [14] Trevor Hastie and Robert Tibshirani. Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55(4):757–779, 1993.
  • [15] Magnus R Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems. Journal of research of the National Bureau of Standards, 49(6):409, 1952.
  • [16] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960.
  • [17] Yael Katz, Kolbjørn Tunstrøm, Christos C Ioannou, Cristián Huepe, and Iain D Couzin. Inferring the structure and dynamics of interactions in schooling fish. Proceedings of the National Academy of Sciences, 108(46):18720–18725, 2011.
  • [18] Matthias Katzfuss and Joseph Guinness. A general framework for Vecchia approximations of Gaussian processes. Statistical Science, 36(1):124–141, 2021.
  • [19] Matthias Katzfuss, Joseph Guinness, and Earl Lawrence. Scaled Vecchia approximation for fast computer-model emulation. SIAM/ASA Journal on Uncertainty Quantification, 10(2):537–554, 2022.
  • [20] Genshiro Kitagawa. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of Computational and Graphical Statistics, 5(1):1–25, 1996.
  • [21] Clifford Lam, Qiwei Yao, and Neil Bathia. Estimation of latent factors for high-dimensional time series. Biometrika, 98(4):901–918, 2011.
  • [22] Finn Lindgren, Håvard Rue, and Johan Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(4):423–498, 2011.
  • [23] Fei Lu, Ming Zhong, Sui Tang, and Mauro Maggioni. Nonparametric inference of interaction laws in systems of agents from trajectory data. Proceedings of the National Academy of Sciences, 116(29):14424–14433, 2019.
  • [24] Yimin Luo, Mengyang Gu, Minwook Park, Xinyi Fang, Younghoon Kwon, Juan Manuel Urueña, Javier Read de Alaniz, Matthew E Helgeson, Cristina M Marchetti, and Megan T Valentine. Molecular-scale substrate anisotropy, crowding and division drive collective behaviours in cell monolayers. Journal of the Royal Society Interface, 20(204):20230160, 2023.
  • [25] Suman Majumder, Yawen Guan, Brian J Reich, and Arvind K Saibaba. Kryging: geostatistical analysis of large-scale datasets using Krylov subspace methods. Statistics and Computing, 32(5):74, 2022.
  • [26] Giovanni Petris, Sonia Petrone, and Patrizia Campagnoli. Dynamic linear models. In Dynamic linear models with R. Springer, 2009.
  • [27] Raquel Prado and Mike West. Time series: modeling, computation, and inference. Chapman and Hall/CRC, 2010.
  • [28] Dennis C Rapaport. The art of molecular dynamics simulation. Cambridge University Press, 2004.
  • [29] Arvind K Saibaba, Alen Alexanderian, and Ilse CF Ipsen. Randomized matrix-free trace and log-determinant estimators. Numerische Mathematik, 137(2):353–395, 2017.
  • [30] Edward Snelson and Zoubin Ghahramani. Sparse Gaussian processes using pseudo-inputs. Advances in neural information processing systems, 18:1257, 2006.
  • [31] Michael L Stein. Interpolation of spatial data: some theory for kriging. Springer Science & Business Media, 2012.
  • [32] Michael L Stein, Jie Chen, and Mihai Anitescu. Stochastic approximation of score functions for Gaussian processes. The Annals of Applied Statistics, 7(2):1162–1191, 2013.
  • [33] Jonathan R Stroud, Michael L Stein, Barry M Lesht, David J Schwab, and Dmitry Beletsky. An ensemble Kalman filter and smoother for satellite data assimilation. Journal of the American Statistical Association, 105(491):978–990, 2010.
  • [34] Aldo V Vecchia. Estimation and model identification for continuous spatial processes. Journal of the Royal Statistical Society: Series B (Methodological), 50(2):297–312, 1988.
  • [35] Tamás Vicsek, András Czirók, Eshel Ben-Jacob, Inon Cohen, and Ofer Shochet. Novel type of phase transition in a system of self-driven particles. Physical Review Letters, 75(6):1226, 1995.
  • [36] Elise Walck-Shannon and Jeff Hardin. Cell intercalation from top to bottom. Nature Reviews Molecular Cell Biology, 15(1):34–48, 2014.
  • [37] M. West and P. J. Harrison. Bayesian Forecasting & Dynamic Models. Springer Verlag, 2nd edition, 1997.
  • [38] Peter Whittle. On stationary processes in the plane. Biometrika, pages 434–449, 1954.