
Feedback Interconnected Mean-Field Density Estimation and Control

Tongjia Zheng1, Qing Han2 and Hai Lin1 *This work was supported by the National Science Foundation under Grant Nos. IIS-1724070, CNS-1830335, and IIS-2007949. 1Tongjia Zheng and Hai Lin are with the Department of Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA. [email protected], [email protected]. 2Qing Han is with the Department of Mathematics, University of Notre Dame, Notre Dame, IN 46556, USA. [email protected].
Abstract

Swarm robotic systems have foreseeable applications in the near future. Recently, there has been an increasing amount of literature that employs mean-field partial differential equations (PDEs) to model the time evolution of the probability density of swarm robotic systems, and uses density feedback to design stabilizing control laws that act on individuals such that their density converges to a target profile. However, several problems remain largely unexplored: how to estimate the mean-field density, how the density estimation algorithms affect the control performance, and whether the estimation performance in turn depends on the control algorithms. In this work, we study the interplay of these algorithms. Specifically, we propose new density control laws which use the mean-field density and its gradient as feedback, and prove that they are globally input-to-state stable (ISS) with respect to estimation errors. Then, we design filtering algorithms to estimate the density and its gradient separately, and prove that these estimates are convergent assuming the control laws are known. Finally, we show that the feedback interconnection of these estimation and control algorithms is still globally ISS, which is attributed to the bilinearity of the PDE system. An agent-based simulation is included to verify the stability of these algorithms and their feedback interconnection.

I Introduction

Swarm robotic systems (such as drones) have foreseeable applications in the near future. Compared with small-scale robotic systems, the dramatic increase in the number of involved robots provides numerous advantages such as robustness, efficiency, and flexibility, but also poses significant challenges to their estimation and control problems.

Many methods have been proposed for controlling large-scale systems, such as graph theoretic design [1] and game theoretic formulations, especially potential games [2] and mean-field games [3]. Our work is also inspired by modelling and control strategies based on mean-field limit approximations. However, unlike mean-field games, where the mean-field density is used to approximate the collective effect of the swarm, we aim at direct control of the mean-field density, which results in a PDE control problem. Mean-field models include Markov chains and PDEs. The first category partitions the spatial domain to obtain an abstracted Markov chain model and designs the transition probabilities to stabilize the density [4, 5], which usually suffers from the state explosion issue. In PDE-based models, individual robots are modelled by a family of stochastic differential equations and their density evolves according to a PDE. In this way, the density control problem of a robotic swarm is converted into a regulation problem for the PDE.

For density control, early efforts usually adopt an optimal control formulation [6]. While an optimal control formulation provides more flexibility in the control objective, the solution relies on numerically solving the optimality conditions, which is computationally expensive and essentially open-loop. Density control is also studied in [7, 8] by relating it to the so-called Schrödinger Bridge problem. However, numerically solving the associated Schrödinger Bridge problem is also known to suffer from the curse of dimensionality except in the linear case. In recent years, researchers have sought to design control laws that explicitly use the mean-field density as feedback to form closed-loop control [9, 10, 11]. This density feedback technique guarantees closed-loop stability and can be computed efficiently since it is given in closed form. However, several problems remain largely unexplored: how to estimate the mean-field density, how the density estimation algorithm affects the control performance, and whether the estimation performance in turn depends on the density control algorithms. These problems become more critical because most density feedback designs depend, to varying degrees, on the gradient of the mean-field density. Since the gradient is an unbounded operator, any density estimation algorithm that produces accurate density estimates may still have arbitrarily large estimation errors for the gradient. This raises significant concerns for the density estimation problem.

In this work, we study the interplay of density feedback laws and estimation algorithms. In [11], we proposed density feedback laws and obtained preliminary results on their robustness to density estimation errors. This work extends these feedback laws so that they are less restrictive and have more verifiable robustness properties in the presence of estimation errors. We also reported density filtering algorithms in [12, 13], designed for large-scale stochastic systems modelled by mean-field PDEs. This work extends these algorithms to directly estimate the gradient of the density, a quantity required by almost all existing density feedback controllers in the literature. Furthermore, we study the interconnection of these estimation and control algorithms and prove their closed-loop stability.

Our contribution is threefold. First, we propose new density feedback laws and show their robustness using the notion of ISS. Second, we design infinite-dimensional filters to estimate the gradient of the density and study their stability and optimality. Third, we prove that the feedback interconnection of these estimation and control algorithms is still globally ISS.

The rest of the paper is organized as follows. Section II introduces some preliminaries. The problem formulation is given in Section III. Section IV presents our main results, in which we propose new density estimation and control laws and study their interconnected stability. Section V presents an agent-based simulation to verify their effectiveness.

II Preliminaries

II-A Notations

For a vector $x\in\mathbb{R}^{n}$, its Euclidean norm is denoted by $\|x\|$. Let $E\subset\mathbb{R}^{n}$ be a measurable set. For $f:E\to\mathbb{R}$, its $L^{2}$-norm is denoted by $\|f\|_{L^{2}(E)}:=(\int_{E}|f(x)|^{2}dx)^{1/2}$. Given $g(x)$ with $0<\inf_{x\in E}g(x)\leq\sup_{x\in E}g(x)<\infty$, the weighted norm $\|f\|_{L^{p}(E;g)}:=(\int_{E}|f(x)|^{p}g(x)dx)^{1/p}$ is equivalent to $\|f\|_{L^{p}(E)}$. We will omit $E$ in the notation when it is clear. The gradient and Laplacian of a scalar function $f$ are denoted by $\nabla f$ and $\Delta f$, respectively. The divergence of a vector field $F$ is denoted by $\nabla\cdot F$.

Lemma 1 (Poincaré inequality for density functions [14])

Let $\Omega\subset\mathbb{R}^{n}$ be a bounded convex open set with a Lipschitz boundary. Let $g$ be a continuous density function on $\Omega$ such that $0<c_{1}\leq\inf g\leq\sup g\leq c_{2}$ for some constants $c_{1}$ and $c_{2}$. Then, there exists a constant $C>0$ such that for all $f\in H^{1}(\Omega)$,

\int_{\Omega}|\nabla f|^{2}g\,dx\geq C\int_{\Omega}\left|f-\int_{\Omega}fg\,dx\right|^{2}g\,dx. (1)

II-B Input-to-state stability

We introduce ISS for infinite-dimensional systems [15]. Define the following classes of comparison functions:

\begin{aligned}
\mathscr{P}&:=\{\gamma:\mathbb{R}_{+}\to\mathbb{R}_{+}\mid\gamma\text{ is continuous, }\gamma(0)=0,\text{ and }\gamma(r)>0\text{ for }r>0\}\\
\mathscr{K}&:=\{\gamma\in\mathscr{P}\mid\gamma\text{ is strictly increasing}\}\\
\mathscr{K}_{\infty}&:=\{\gamma\in\mathscr{K}\mid\gamma\text{ is unbounded}\}\\
\mathscr{L}&:=\{\gamma:\mathbb{R}_{+}\to\mathbb{R}_{+}\mid\gamma\text{ is continuous and strictly decreasing with }\lim_{t\to\infty}\gamma(t)=0\}\\
\mathscr{KL}&:=\{\beta:\mathbb{R}_{+}\times\mathbb{R}_{+}\to\mathbb{R}_{+}\mid\beta(\cdot,t)\in\mathscr{K},\ \forall t\geq 0,\ \beta(r,\cdot)\in\mathscr{L},\ \forall r>0\}.
\end{aligned}

Let $(X,\|\cdot\|_{X})$ and $(U,\|\cdot\|_{U})$ be the state and input spaces, endowed with norms $\|\cdot\|_{X}$ and $\|\cdot\|_{U}$, respectively. Denote $U_{c}=PC(\mathbb{R}_{+};U)$, the space of piecewise right-continuous functions from $\mathbb{R}_{+}$ to $U$, equipped with the sup-norm. Consider a control system $\Sigma=(X,U_{c},\phi)$ where $\phi:\mathbb{R}_{+}\times X\times U_{c}\to X$ is a transition map. Let $x(t)=\phi(t,x_{0},u)$.

Definition 1

$\Sigma$ is called input-to-state stable (ISS) if there exist $\beta\in\mathcal{KL}$ and $\gamma\in\mathcal{K}$ such that

\|x(t)\|_{X}\leq\beta(\|x_{0}\|_{X},t)+\gamma\Big(\sup_{0\leq s\leq t}\|u(s)\|_{U}\Big),

for all $x_{0}\in X$, $u\in U_{c}$, and $t\geq 0$.

Definition 2

A continuous function $V:\mathbb{R}_{+}\times X\to\mathbb{R}_{+}$ is called an ISS-Lyapunov function for $\Sigma$ if there exist $\psi_{1},\psi_{2}\in\mathcal{K}_{\infty}$, $\chi\in\mathcal{K}$, and $W\in\mathcal{P}$ such that:

  • (i) $\psi_{1}(\|x\|_{X})\leq V(t,x)\leq\psi_{2}(\|x\|_{X})$, $\forall x\in X$;

  • (ii) $\forall x\in X$, $\forall u\in U_{c}$ with $u(0)=\xi\in U$ it holds that

    $\|x\|_{X}\geq\chi(\|\xi\|_{U})\Rightarrow\dot{V}(t,x)\leq-W(\|x\|_{X})$.

Theorem 1

If $\Sigma$ admits an ISS-Lyapunov function, then it is ISS.

ISS is useful for studying the stability of cascade systems. Consider two systems $\Sigma_{i}=(X_{i},U_{ci},\phi_{i})$, $i=1,2$, where $U_{ci}=PC(\mathbb{R}_{+};U_{i})$ and $X_{1}\subset U_{2}$. We say they form a cascade connection if $u_{2}(t)=\phi_{1}(t,x_{01},u_{1})$.

Theorem 2

[16] The cascade connection of two ISS systems is ISS.

II-C Infinite-dimensional Kalman filters

We introduce the infinite-dimensional Kalman filters presented in [17]. Let $\mathcal{H},\mathcal{K}$ be real Hilbert spaces. Consider the following infinite-dimensional linear system:

du(t)=\mathcal{A}(t)u(t)dt+\mathcal{B}(t)dw(t),\quad u(0)=u_{0}
dz(t)=\mathcal{C}(t)u(t)dt+\mathcal{F}(t)dv(t),\quad z(0)=0,

where $\mathcal{A}(t)$ is a linear closed operator on $\mathcal{H}$, $\mathcal{B}(\cdot)\in L^{\infty}([0,T];\mathcal{L}(\mathcal{H}))$, $\mathcal{C}(\cdot)\in L^{\infty}([0,T];\mathcal{L}(\mathcal{H},\mathcal{K}))$, and $\mathcal{F}(\cdot),\mathcal{F}(\cdot)^{-1}\in L^{\infty}([0,T];\mathcal{L}(\mathcal{K}))$. $w(t)$ and $v(t)$ are independent Wiener processes on $\mathcal{H}$ and $\mathcal{K}$ with covariance operators $\mathcal{W}$ and $\mathcal{V}$, respectively. The infinite-dimensional Kalman filter is given by:

d\hat{u}(t)=\mathcal{A}(t)\hat{u}dt+\mathcal{K}(t)(dz(t)-\mathcal{C}(t)\hat{u}(t)dt),\quad\hat{u}(0)=0

where $\mathcal{K}(t)=\mathcal{P}(t)\mathcal{C}^{*}(t)\left(\mathcal{F}(t)\mathcal{V}\mathcal{F}^{*}(t)\right)^{-1}$ is the Kalman gain, and $\mathcal{P}(t)$ is the solution of the Riccati equation:

\frac{d\mathcal{P}(t)}{dt}=\mathcal{A}(t)\mathcal{P}(t)+\mathcal{P}(t)\mathcal{A}^{*}(t)+\mathcal{B}(t)\mathcal{W}\mathcal{B}^{*}(t)-\mathcal{P}(t)\mathcal{C}^{*}(t)\left(\mathcal{F}(t)\mathcal{V}\mathcal{F}^{*}(t)\right)^{-1}\mathcal{C}(t)\mathcal{P}(t).

III Problem formulation

This work studies the density control problem of robotic swarms. Consider $N$ robots in a bounded convex domain $\Omega\subset\mathbb{R}^{n}$, which are assumed to be homogeneous and satisfy:

dX_{t}^{i}=v(X_{t}^{i},t)dt+\sqrt{2\sigma(t)}dB_{t}^{i},\quad i=1,\dots,N, (2)

where $X_{t}^{i}\in\Omega$ represents the position of the $i$-th robot, $v=(v_{1},\dots,v_{n})\in\mathbb{R}^{n}$ is the velocity field to be designed, $\{B_{t}^{i}\}$ are standard Wiener processes assumed to be independent across the robots, and $\sqrt{2\sigma(t)}\in\mathbb{R}$ is the standard deviation.

In the mean-field limit as $N\to\infty$, the collective behavior of the robots can be captured by their probability density

p(x,t)\approx\frac{1}{N}\sum_{i=1}^{N}\delta_{X_{t}^{i}}

with $\delta_{x}$ being the Dirac distribution, and this density satisfies a Fokker-Planck equation given by:

\begin{split}
\partial_{t}p=-\nabla\cdot(vp)+\Delta(\sigma p)&\quad\text{in}\quad\Omega\times(0,T),\\
p=p_{0}&\quad\text{on}\quad\Omega\times\{0\},\\
\boldsymbol{n}\cdot(\nabla(\sigma p)-vp)=0&\quad\text{on}\quad\partial\Omega\times(0,T),
\end{split} (3)

where $\boldsymbol{n}$ is the unit inner normal to the boundary $\partial\Omega$, and $p_{0}(x)$ is the initial density. The last equation is the reflecting boundary condition to confine the robots within $\Omega$.
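As a concrete illustration, the following is a minimal sketch of how the individual dynamics (2) can be integrated with an Euler–Maruyama scheme on a square domain; it is not the paper's implementation, and the function name `step_robots`, the callable `velocity`, and the wall-reflection step (a simple surrogate for the no-flux boundary condition in (3)) are our own assumptions.

```python
import numpy as np

def step_robots(X, velocity, sigma, dt, lo=0.0, hi=1.0, rng=None):
    """One Euler-Maruyama step of dX = v(X,t) dt + sqrt(2*sigma) dB for all robots.

    X        : (N, n) array of robot positions.
    velocity : callable mapping an (N, n) array of positions to (N, n) velocities.
    sigma    : diffusion coefficient, so the per-step noise std is sqrt(2*sigma*dt).
    Robots are kept inside the box [lo, hi]^n by reflecting at the walls,
    which mimics the reflecting (no-flux) boundary condition in (3).
    """
    rng = np.random.default_rng() if rng is None else rng
    N, n = X.shape
    X_new = X + velocity(X) * dt + np.sqrt(2.0 * sigma * dt) * rng.standard_normal((N, n))
    # Reflect positions that left the box back inside, then clip for safety.
    X_new = np.where(X_new < lo, 2.0 * lo - X_new, X_new)
    X_new = np.where(X_new > hi, 2.0 * hi - X_new, X_new)
    return np.clip(X_new, lo, hi)
```

For example, with $\sqrt{2\sigma}=0.01$ and a time step of $0.01$ (as in Section V), one would call `step_robots(X, velocity, sigma=0.5*0.01**2, dt=0.01)`.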

Remark 1

This work focuses on the interconnected stability of density estimation and control. For clarity, the robots are assumed to be first-order integrators in (2). Density control for heterogeneous higher-order nonlinear systems is studied in a separate work [18]. The interconnected stability results to be presented later can be generalized to these more general systems by combining the stability results in [18].

The problems studied in this work are stated as follows.

Problem 1 (Density control)

Consider (3). Given a target density $p_{*}(x)$, design $v$ such that $p(x,t)\to p_{*}(x)$.

Problem 2 (Density estimation)

Consider (3). Given the robots’ states $\{X_{t}^{i}\}_{i=1}^{N}$, estimate $p$ and its gradient.

Problem 3 (Feedback interconnection)

Consider (3). Given a target density $p_{*}(x)$, let $v$ be designed as a feedback function of density estimates that are computed based on (3). Prove that $p(x,t)\to p_{*}(x)$.

IV Main results

IV-A Modified density feedback laws

Given a smooth target density $p_{*}(x)$, bounded from above and below by positive constants, we design:

v(x,t)=-\alpha(x,t)\nabla\frac{p(x,t)}{p_{*}(x)}+\frac{\sigma(t)\nabla p_{*}(x)}{p_{*}(x)} (4)

where $\alpha\geq 0$ is a design parameter for individuals to adjust their velocity magnitude. At the collective level, (4) is called density feedback because of its explicit dependence on the mean-field density $p$. At the individual level, (4) is essentially nonlinear state feedback, and the velocity input for the $i$-th robot is simply given by $v(X_{t}^{i},t)$.

Remark 2

Compared with the control laws proposed in [9, 10, 11], the key difference of (4) is that $p$ does not appear in any denominator, which provides several advantages. First, it relaxes the requirement for $p$ to be strictly positive and avoids producing large velocities when $p$ is close to 0. Second, it will enable us to obtain ISS results with respect to estimation errors in the $L^{2}$ norm. (Such results are difficult to obtain for the laws in [9, 10, 11].) The significance of this property will become apparent when we study the interconnected stability of the control and estimation algorithms.

Define $\Phi=p-p_{*}$ as the convergence error.

Theorem 3

Consider system (3). Let $v$ be given by (4). Then $\|\Phi(t)\|_{L^{2}}$ converges to 0 exponentially.

Proof:

Substituting (4) into (3), we obtain:

\begin{split}
\partial_{t}p&=\nabla\cdot\left(\alpha p\nabla\frac{p}{p_{*}}\right)-\nabla\cdot\left(\frac{\sigma p}{p_{*}}\nabla p_{*}\right)+\Delta(\sigma p)\\
&=\nabla\cdot\left(\alpha p\nabla\frac{p}{p_{*}}\right)-\nabla\cdot\left[\frac{\sigma p}{p_{*}}\nabla p_{*}-\nabla(\sigma p)\right]\\
&=\nabla\cdot\left(\alpha p\nabla\frac{p}{p_{*}}\right)+\nabla\cdot\left(\sigma p_{*}\nabla\frac{p}{p_{*}}\right).
\end{split}

Consider a Lyapunov function $V(t)=\frac{1}{2}\|\Phi\|_{L^{2}(1/p_{*})}^{2}$. By the divergence theorem and the boundary condition, we have

\begin{split}
\frac{dV}{dt}&=\int_{\Omega}\frac{p-p_{*}}{p_{*}}\,\partial_{t}p\,dx\\
&=\int_{\Omega}\frac{p-p_{*}}{p_{*}}\left[\nabla\cdot\left(\alpha p\nabla\frac{p}{p_{*}}\right)+\nabla\cdot\left(\sigma p_{*}\nabla\frac{p}{p_{*}}\right)\right]dx\\
&=\int_{\Omega}-\alpha p\left(\nabla\frac{p}{p_{*}}\right)^{2}-\sigma p_{*}\left(\nabla\frac{p}{p_{*}}\right)^{2}dx\\
&\leq-(\alpha_{1}(t)+\alpha_{2}(t))\left\|\nabla\frac{p}{p_{*}}\right\|_{L^{2}}^{2}\\
&=-(\alpha_{1}(t)+\alpha_{2}(t))\left\|\nabla\frac{p-p_{*}}{p_{*}}\right\|_{L^{2}}^{2}.
\end{split}

By the Poincaré inequality (1) (where we set $f=\frac{p-p_{*}}{p_{*}}$ and $g=p_{*}$) and the fact that $\int_{\Omega}(p-p_{*})dx=0$, we have

\begin{split}
\frac{dV}{dt}&\leq-(\alpha_{1}(t)+\alpha_{2}(t))C^{2}\left\|\frac{p-p_{*}}{p_{*}}\right\|_{L^{2}}^{2}\\
&=-(\alpha_{1}(t)+\alpha_{2}(t))C^{2}\left\|\Phi\right\|_{L^{2}(1/p_{*}^{2})}^{2}
\end{split}

where $\alpha_{1}(t),\alpha_{2}(t)\in\mathbb{R}$ satisfy $0\leq\min_{x}\alpha p\leq\alpha_{1}(t)\leq\max_{x}\alpha p$ and $0<\min_{x}\sigma p_{*}\leq\alpha_{2}(t)\leq\max_{x}\sigma p_{*}$, and $C>0$ is the Poincaré constant. By the strong maximum principle [19], there exist $t_{1},a>0$ such that $p\geq a$ for $t\geq t_{1}$. Hence, $\alpha_{1}=0$ if and only if $\alpha=0$ for $t\geq t_{1}$. ∎

Remark 3

Note that we allow $\alpha=0$ in (4). In this case, (4) becomes $v=\frac{\sigma\nabla p_{*}}{p_{*}}$, which is a well-known law in physics for driving stochastic particles towards a target distribution [20]. However, the convergence will be very slow since $\sigma$ is small in general. We add the feedback term $-\alpha\nabla\frac{p}{p_{*}}$ to provide extra, locally adjustable convergence speed. The accelerated convergence is reflected by $\alpha_{1}$ in the proof.

The mean-field density $p$ cannot be measured directly. We will study how to estimate $p$ in the next section. Here we first establish some robustness results regardless of which estimation algorithm is used. It is useful to rewrite (4) as:

v=-\alpha\frac{p_{*}\nabla p-p\nabla p_{*}}{p_{*}^{2}}+\frac{\sigma(t)\nabla p_{*}}{p_{*}}. (5)

In general, we need to estimate $p$ and $\nabla p$ separately because $\nabla$ is an unbounded operator, i.e., any algorithm that produces accurate estimates of $p$ may have arbitrarily large estimation errors for $\nabla p$. Let $\hat{p}$ and $\widehat{\nabla p}$ be the estimates of $p$ and $\nabla p$. Based on (5), the control law using these estimates is given by:

v=-\alpha\frac{p_{*}\widehat{\nabla p}-\hat{p}\nabla p_{*}}{p_{*}^{2}}+\frac{\sigma(t)\nabla p_{*}}{p_{*}}. (6)
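For a computational view, the sketch below evaluates (6) on a uniform 2-D grid from the estimates $\hat{p}$ and $\widehat{\nabla p}$; setting the estimates to the true $p$ and $\nabla p$ recovers (4). This is a hedged illustration only: the grid layout, the use of central differences (np.gradient) for $\nabla p_{*}$, and the function name are our own assumptions, not the paper's implementation.

```python
import numpy as np

def velocity_from_estimates(p_hat, grad_p_hat, p_star, alpha, sigma, dx):
    """Evaluate the density feedback law (6) on a uniform 2-D grid.

    p_hat      : (M, M) density estimate.
    grad_p_hat : tuple (dpdy_hat, dpdx_hat) of (M, M) arrays, the estimate of grad p
                 (array axes are assumed to be ordered [y, x]).
    p_star     : (M, M) target density, bounded away from zero.
    Returns (vx, vy), the two components of the velocity field.
    """
    dpdy_hat, dpdx_hat = grad_p_hat
    # Gradient of the smooth, known target density via central differences.
    dpsdy, dpsdx = np.gradient(p_star, dx)
    # v = -alpha*(p_star*grad_p_hat - p_hat*grad_p_star)/p_star^2 + sigma*grad_p_star/p_star
    vx = -alpha * (p_star * dpdx_hat - p_hat * dpsdx) / p_star**2 + sigma * dpsdx / p_star
    vy = -alpha * (p_star * dpdy_hat - p_hat * dpsdy) / p_star**2 + sigma * dpsdy / p_star
    return vx, vy
```

Each robot would then apply $v(X_{t}^{i},t)$, obtained by interpolating this grid field at its own position.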

Define $\epsilon=\hat{p}-p$ and $\epsilon_{g}=\widehat{\nabla p}-\nabla p$ as the estimation errors. Substituting (6) into (3), we obtain the closed-loop system:

\partial_{t}\Phi=\partial_{t}p=\nabla\cdot\left(\alpha p\frac{p_{*}\widehat{\nabla p}-\hat{p}\nabla p_{*}}{p_{*}^{2}}+\sigma p_{*}\nabla\frac{p}{p_{*}}\right). (7)

We have the following robustness result.

Theorem 4

Consider (3). Let $v$ be given by (6). If $\alpha>0$, then $\|\Phi(t)\|_{L^{2}}$ is ISS with respect to $\|\epsilon(t)\|_{L^{2}}$ and $\|\epsilon_{g}(t)\|_{L^{2}}$.

Proof:

Consider $V(t)=\frac{1}{2}\|\Phi\|_{L^{2}(1/p_{*})}^{2}$. By the divergence theorem and the boundary condition, we have

\begin{split}
\frac{dV}{dt}&=\int_{\Omega}\frac{p-p_{*}}{p_{*}}\left[\nabla\cdot\left(\alpha p\frac{p_{*}\widehat{\nabla p}-\hat{p}\nabla p_{*}}{p_{*}^{2}}+\sigma p_{*}\nabla\frac{p}{p_{*}}\right)\right]dx\\
&=\int_{\Omega}-\alpha p\nabla\frac{p}{p_{*}}\cdot\left(\frac{p_{*}\nabla p-p\nabla p_{*}+p_{*}\epsilon_{g}-\epsilon\nabla p_{*}}{p_{*}^{2}}\right)-\sigma p_{*}\left(\nabla\frac{p}{p_{*}}\right)^{2}dx\\
&=\int_{\Omega}-\alpha p\nabla\frac{p}{p_{*}}\cdot\left(\nabla\frac{p}{p_{*}}+\frac{1}{p_{*}}\epsilon_{g}-\frac{\nabla p_{*}}{p_{*}^{2}}\epsilon\right)-\sigma p_{*}\left(\nabla\frac{p}{p_{*}}\right)^{2}dx\\
&\leq-(\alpha_{1}(t)+\alpha_{2}(t))\left\|\nabla\frac{p}{p_{*}}\right\|_{L^{2}}^{2}+\alpha_{3}(t)\left\|\nabla\frac{p}{p_{*}}\right\|_{L^{2}}\left\|\frac{1}{p_{*}}\epsilon_{g}-\frac{\nabla p_{*}}{p_{*}^{2}}\epsilon\right\|_{L^{2}}
\end{split}

where $\alpha_{1}$ and $\alpha_{2}$ are defined in the proof of Theorem 3, and $\alpha_{3}(t)\in\mathbb{R}$ satisfies $0\leq\min_{x}\alpha p\leq\alpha_{3}(t)\leq\max_{x}\alpha p$. Using a constant $\theta\in(0,1)$ to split the first term and applying the Poincaré inequality (1), we have

\begin{split}
\frac{dV}{dt}&\leq-(\alpha_{1}(t)+\alpha_{2}(t))(1-\theta)C^{2}\left\|\frac{p-p_{*}}{p_{*}}\right\|_{L^{2}}^{2}\\
&\quad-(\alpha_{1}(t)+\alpha_{2}(t))\theta C\left\|\nabla\frac{p}{p_{*}}\right\|_{L^{2}}\left\|\frac{p-p_{*}}{p_{*}}\right\|_{L^{2}}\\
&\quad+\alpha_{3}(t)\left(\|\epsilon_{g}\|_{L^{2}(\frac{1}{p_{*}})}+\|\nabla p_{*}\|_{L^{\infty}}\|\epsilon\|_{L^{2}(\frac{1}{p_{*}^{2}})}\right)\left\|\nabla\frac{p}{p_{*}}\right\|_{L^{2}}\\
&\leq-(\alpha_{1}(t)+\alpha_{2}(t))(1-\theta)C^{2}\|\Phi\|_{L^{2}(1/p_{*}^{2})}^{2},
\end{split}

if $\|\Phi\|_{L^{2}(\frac{1}{p_{*}^{2}})}\geq\alpha_{\sup}\big(\|\epsilon_{g}\|_{L^{2}(\frac{1}{p_{*}})}+\|\nabla p_{*}\|_{L^{\infty}}\|\epsilon\|_{L^{2}(\frac{1}{p_{*}^{2}})}\big)$, where $\alpha_{\sup}:=\sup_{t}\frac{\alpha_{3}(t)}{(\alpha_{1}(t)+\alpha_{2}(t))\theta C}$. The ISS property then follows from Theorem 1. ∎

This theorem ensures that $\|\Phi(t)\|_{L^{2}}$ is always bounded by a function of $\|\epsilon(t)\|_{L^{2}}$ and $\|\epsilon_{g}(t)\|_{L^{2}}$, and will approach 0 exponentially if both $\hat{p}$ and $\widehat{\nabla p}$ are accurate.

IV-B Density and gradient estimation

Now we design algorithms to estimate $p$ and $\nabla p$ separately, referred to as density filters and gradient filters, respectively. The former has been studied in [12, 13] and is included for completeness. In the estimation problem, the system (3) is assumed known (including $v$). We use kernel density estimation (KDE) to construct noisy measurements of $p$ and $\nabla p$ and exploit two important properties: the dynamics governing $p$ and $\nabla p$ are linear, and their measurement noises are approximately Gaussian. Then we design two infinite-dimensional Kalman filters to estimate them separately.

First, the processes $\{X_{t}^{i}\}_{i=1}^{N}$ become asymptotically independent as $N\to\infty$ [3]. Hence, for a large $N$, $\{X_{t}^{i}\}_{i=1}^{N}$ can be approximately treated as independent samples of $p(x,t)$. We then use KDE to construct a priori estimates which are used as noisy measurements of $p$ and $\nabla p$.

We need to derive an evolution equation for $\nabla p$. By applying the gradient operator to both sides of (3), we have:

\begin{split}
\partial_{t}(\nabla p)=\nabla(\partial_{t}p)&=\nabla[-\nabla\cdot(vp)+\Delta(\sigma p)]\\
&=\nabla[-\nabla\cdot(v\mathscr{I}(\nabla p))+\Delta(\sigma\mathscr{I}(\nabla p))]
\end{split}

where $\mathscr{I}$ is an integration operator defined as follows. For a given vector field $F:\Omega\to\mathbb{R}^{n}$, define $\mathscr{I}(F)=f$, where $f:\Omega\to\mathbb{R}$ is uniquely determined by the following relations:

\nabla f=F\quad\text{and}\quad\boldsymbol{n}\cdot(\nabla(\sigma f)-vf)=0\text{ on }\partial\Omega. (8)

For simplicity, denote $q=\nabla p$, i.e., $q:\Omega\times(0,T)\to\mathbb{R}^{n}$. We obtain a partial integro-differential equation for $q$:

\partial_{t}q=\nabla[-\nabla\cdot(v\mathscr{I}(q))+\Delta(\sigma\mathscr{I}(q))] (9)

which is a linear equation. We will rewrite (3) and (9) as evolution equations in $\mathcal{H}_{1}=L^{2}(\Omega)$ and $\mathcal{H}_{2}=L^{2}(\Omega)^{n}:=\{(F_{1},\dots,F_{n}):\Omega\to\mathbb{R}^{n}\mid F_{i}\in L^{2}(\Omega)\}$, respectively. Specifically, define the following linear operators:

\mathcal{A}(t)p:=-\nabla\cdot[v(t)p]+\Delta[\sigma(t)p],
\mathcal{A}_{g}(t)q:=\nabla[-\nabla\cdot(v(t)\mathscr{I}(q))+\Delta(\sigma(t)\mathscr{I}(q))].

For any $t$, we use KDE and the samples $\{X_{t}^{i}\}_{i=1}^{N}$ to construct an a priori estimate $p_{\text{KDE}}(x,t)$ [21]:

p_{\text{KDE}}(x,t)=\frac{1}{Nh^{n}}\sum_{i=1}^{N}K\Big(\frac{1}{h}(x-X_{t}^{i})\Big),

where $K(x)$ is a kernel function and $h$ is the bandwidth. We treat $p_{\text{KDE}}(x,t)$ as a noisy measurement of $p(x,t)$. We also treat $\nabla p_{\text{KDE}}(x,t)$, the gradient of $p_{\text{KDE}}(x,t)$, as a noisy measurement of $q(x,t)$. Define $w(x,t)=p_{\text{KDE}}(x,t)-p(x,t)$ and $w_{g}(x,t)=\nabla p_{\text{KDE}}(x,t)-q(x,t)$. Then, $w(x,t)$ and $w_{g}(x,t)$ are approximately infinite-dimensional Gaussian noises with diagonal covariance operators $\bar{\mathcal{R}}(t)=k\operatorname{diag}(p(t))$ and $\bar{\mathcal{R}}_{g}(t)=k_{g}\operatorname{diag}(q(t))$, respectively, where $k,k_{g}>0$ are constants depending on the kernels and $N$. (See the appendices in [13] for why $w$ and $w_{g}$ are approximately Gaussian.)
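The sketch below (an illustration under our own assumptions, not the paper's code) constructs both measurements from the same samples using a Gaussian kernel, for which the gradient of the KDE is available in closed form; the function name `kde_and_gradient` is hypothetical.

```python
import numpy as np

def kde_and_gradient(X, grid, h):
    """Gaussian-kernel estimates of p and grad(p) at the given evaluation points.

    X    : (N, n) robot positions, treated as (approximately independent) samples of p.
    grid : (M, n) evaluation points.
    h    : bandwidth.
    Returns p_kde of shape (M,) and grad_kde of shape (M, n); for a Gaussian
    kernel, grad(p_KDE) is obtained in closed form, so both measurements
    y and y_g come from the same sample set.
    """
    N, n = X.shape
    diff = (grid[:, None, :] - X[None, :, :]) / h                        # (M, N, n)
    K = np.exp(-0.5 * np.sum(diff**2, axis=-1)) / (2 * np.pi)**(n / 2)   # (M, N)
    p_kde = K.sum(axis=1) / (N * h**n)
    grad_kde = -(diff * K[..., None]).sum(axis=1) / (N * h**(n + 1))
    return p_kde, grad_kde
```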

We obtain two infinite-dimensional linear systems:

\left\{\begin{array}{l}\dot{p}=\mathcal{A}(t)p\\ y=p_{\text{KDE}}(t)=p+w(t)\end{array}\right. (12)

\left\{\begin{array}{l}\dot{q}=\mathcal{A}_{g}(t)q\\ y_{g}=\nabla p_{\text{KDE}}(t)=q+w_{g}(t)\end{array}\right. (15)

We can design infinite-dimensional Kalman filters for (12) and (15). However, $\bar{\mathcal{R}}$ and $\bar{\mathcal{R}}_{g}$ are unknown because they depend on $p$ and $q$, the states to be estimated. Hence, we approximate $\bar{\mathcal{R}}$ and $\bar{\mathcal{R}}_{g}$ with $\mathcal{R}(t)=k\operatorname{diag}(p_{\text{KDE}}(t))$ and $\mathcal{R}_{g}(t)=k_{g}\operatorname{diag}(\nabla p_{\text{KDE}}(t))$. In this way, the “suboptimal” density filter is given by:

\dot{\hat{p}}=\mathcal{A}(t)\hat{p}+\mathcal{P}(t)\mathcal{R}^{-1}(t)(y(t)-\hat{p}), (16)
\dot{\mathcal{P}}=\mathcal{A}(t)\mathcal{P}+\mathcal{P}\mathcal{A}^{*}(t)-\mathcal{P}\mathcal{R}^{-1}(t)\mathcal{P}, (17)

and the “suboptimal” gradient filter is given by:

\dot{\hat{q}}=\mathcal{A}_{g}(t)\hat{q}+\mathcal{Q}(t)\mathcal{R}_{g}^{-1}(t)(y_{g}(t)-\hat{q}), (18)
\dot{\mathcal{Q}}=\mathcal{A}_{g}(t)\mathcal{Q}+\mathcal{Q}\mathcal{A}_{g}^{*}(t)-\mathcal{Q}\mathcal{R}_{g}^{-1}(t)\mathcal{Q}, (19)

where $\hat{p}$ and $\hat{q}$ are our estimates of $p$ and $q$ (i.e., $\nabla p$).
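In practice, (16)–(19) are implemented after spatial discretization, so the operators become matrices and the filters become finite-dimensional. The sketch below is one possible forward-Euler realization of the density filter (16)–(17) under that assumption; the matrix `A` and the helper name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def density_filter_step(p_hat, P, A, y, R_inv, dt):
    """One forward-Euler step of a discretized version of the density filter (16)-(17).

    p_hat : (M,) current density estimate on the grid.
    P     : (M, M) estimation error covariance.
    A     : (M, M) finite-difference discretization of the operator A(t),
            which depends on the current velocity field v.
    y     : (M,) KDE measurement of the density.
    R_inv : (M, M) inverse measurement-noise covariance, e.g.
            np.diag(1.0 / (k * p_kde)) for a constant k > 0.
    """
    gain = P @ R_inv                                   # suboptimal Kalman gain L = P R^{-1}
    p_hat = p_hat + dt * (A @ p_hat + gain @ (y - p_hat))
    P = P + dt * (A @ P + P @ A.T - P @ R_inv @ P)     # Riccati update (17)
    return p_hat, P
```

The gradient filter (18)–(19) takes exactly the same form with the discretized $\mathcal{A}_{g}$, $\mathcal{Q}$, and $\mathcal{R}_{g}^{-1}$, stacking the $n$ components of $q$ into one vector.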

Now we study the stability and optimality of these two filters. Let $\bar{\mathcal{P}}$ and $\bar{\mathcal{Q}}$ be the corresponding solutions of (17) and (19) when $\mathcal{R}$ and $\mathcal{R}_{g}$ are respectively replaced by $\bar{\mathcal{R}}$ and $\bar{\mathcal{R}}_{g}$, their true but unknown values. Then, $\bar{\mathcal{P}}$ and $\bar{\mathcal{Q}}$ represent the optimal flows of the estimation error covariance. We denote $\mathcal{L}=\mathcal{P}\mathcal{R}^{-1}$ and $\mathcal{L}_{g}=\mathcal{Q}\mathcal{R}_{g}^{-1}$ as the suboptimal Kalman gains, and $\bar{\mathcal{L}}=\bar{\mathcal{P}}\bar{\mathcal{R}}^{-1}$ and $\bar{\mathcal{L}}_{g}=\bar{\mathcal{Q}}\bar{\mathcal{R}}_{g}^{-1}$ as the optimal Kalman gains. Define $\epsilon=\hat{p}-p$ and $\epsilon_{g}=\hat{q}-q$ as the estimation errors of the filters. Then $\epsilon$ and $\epsilon_{g}$ satisfy the following equations:

\dot{\epsilon}=(\mathcal{A}(t)-\mathcal{P}(t)\mathcal{R}^{-1}(t))\epsilon+\mathcal{P}(t)\mathcal{R}^{-1}(t)w, (20)
\dot{\epsilon}_{g}=(\mathcal{A}_{g}(t)-\mathcal{Q}(t)\mathcal{R}_{g}^{-1}(t))\epsilon_{g}+\mathcal{Q}(t)\mathcal{R}_{g}^{-1}(t)w_{g}. (21)

We can show that the suboptimal filters are stable and remain close to the optimal ones. The following theorem, proved in [12, 13], is for the density filter (16)–(17).

Theorem 5

Assume that $\|\mathcal{P}(t)\|$ and $\|\bar{\mathcal{P}}(t)\|$ are uniformly bounded, and there exist $c_{1},c_{2}>0$ such that for $t\geq 0$,

c_{1}\mathcal{I}\leq\mathcal{R}^{-1}(t),\bar{\mathcal{R}}^{-1}(t),\mathcal{P}^{-1}(t),\bar{\mathcal{P}}^{-1}(t)\leq c_{2}\mathcal{I}. (22)

Then we have:

  • (i) $\|\epsilon\|_{L^{2}}$ is ISS with respect to (w.r.t.) $\|w\|_{L^{2}}$ (and is uniformly exponentially stable if $w=0$);

  • (ii) $\|\bar{\mathcal{P}}-\mathcal{P}\|$ is LISS (locally ISS) w.r.t. $\|\bar{\mathcal{R}}^{-1}-\mathcal{R}^{-1}\|$;

  • (iii) $\|\bar{\mathcal{L}}-\mathcal{L}\|$ is LISS w.r.t. $\|\bar{\mathcal{R}}^{-1}-\mathcal{R}^{-1}\|$.

Property (i) means that the suboptimal filter (16) is stable even if $\mathcal{R}$ is only an approximation of $\bar{\mathcal{R}}$. Property (ii) means that the solution of the suboptimal operator Riccati equation (17) remains close to the solution of the optimal one (when $\mathcal{R}$ is replaced by $\bar{\mathcal{R}}$). Property (iii) means that the suboptimal Kalman gain remains close to the optimal Kalman gain. We have similar results for the gradient filter (18)–(19). The proof is similar to that of Theorem 5 and is thus omitted.

Theorem 6

Assume that $\|\mathcal{Q}(t)\|$ and $\|\bar{\mathcal{Q}}(t)\|$ are uniformly bounded, and there exist $c_{3},c_{4}>0$ such that for $t\geq 0$,

c_{3}\mathcal{I}\leq\mathcal{R}_{g}^{-1}(t),\bar{\mathcal{R}}_{g}^{-1}(t),\mathcal{Q}^{-1}(t),\bar{\mathcal{Q}}^{-1}(t)\leq c_{4}\mathcal{I}. (23)

Then we have:

  • (i) $\|\epsilon_{g}\|_{L^{2}}$ is ISS w.r.t. $\|w_{g}\|_{L^{2}}$ (and is uniformly exponentially stable if $w_{g}=0$);

  • (ii) $\|\bar{\mathcal{Q}}-\mathcal{Q}\|$ is LISS w.r.t. $\|\bar{\mathcal{R}}_{g}^{-1}-\mathcal{R}_{g}^{-1}\|$;

  • (iii) $\|\bar{\mathcal{L}}_{g}-\mathcal{L}_{g}\|$ is LISS w.r.t. $\|\bar{\mathcal{R}}_{g}^{-1}-\mathcal{R}_{g}^{-1}\|$.

IV-C Stability of feedback interconnection

Now we discuss the stability of the feedback interconnection of density estimation and control. We collect equations (7), (20) and (21) in the following for clarity:

\begin{split}
&\partial_{t}\Phi=\nabla\cdot\left(\alpha p\frac{p_{*}(\epsilon_{g}+\nabla p)-(\epsilon+p)\nabla p_{*}}{p_{*}^{2}}+\sigma p_{*}\nabla\frac{p}{p_{*}}\right),\\
&\dot{\epsilon}=[\mathcal{A}(t;\Phi,\epsilon,\epsilon_{g})-\mathcal{P}(t)\mathcal{R}^{-1}(t)]\epsilon+\mathcal{P}(t)\mathcal{R}^{-1}(t)w,\\
&\dot{\epsilon}_{g}=[\mathcal{A}_{g}(t;\Phi,\epsilon,\epsilon_{g})-\mathcal{Q}(t)\mathcal{R}_{g}^{-1}(t)]\epsilon_{g}+\mathcal{Q}(t)\mathcal{R}_{g}^{-1}(t)w_{g},
\end{split} (24)

where we write $\mathcal{A}(t;\Phi,\epsilon,\epsilon_{g})$ and $\mathcal{A}_{g}(t;\Phi,\epsilon,\epsilon_{g})$ to emphasize their dependence on $\Phi$, $\epsilon$, and $\epsilon_{g}$ through $v$.

A critical observation is that the ISS results we have established for $\epsilon$ and $\epsilon_{g}$ remain valid in spite of this dependence. This is because, when designing the estimation algorithms, $\mathcal{A}$ and $\mathcal{A}_{g}$ (and $v$) are treated as known system coefficients. In this way, the bilinear control system (3) becomes a linear system. The dependence on $\Phi$, $\epsilon$, and $\epsilon_{g}$ can be seen as part of the time-varying nature of $\mathcal{A}$ and $\mathcal{A}_{g}$. Hence, the stability results for our density and gradient filters are not affected. (Nevertheless, since in the interconnection $v$ depends on $(\hat{p},\hat{q})$ and becomes stochastic, the presented filters reduce to the best “linear” filters rather than the minimum-covariance filters.) In this regard, we can treat (24) as a cascade system. By Theorem 2, we have the following stability result for (24).

Theorem 7 (Interconnected stability)

Under the assumptions of Theorems 5 and 6, $\|\Phi\|_{L^{2}}$, $\|\epsilon\|_{L^{2}}$, and $\|\epsilon_{g}\|_{L^{2}}$ are all ISS w.r.t. $\|w\|_{L^{2}}$ and $\|w_{g}\|_{L^{2}}$.

This theorem has two implications. First, the stability results for the filters are independent of the density controller. Hence, they can be used in any control design where density feedback is required. Second, the interconnected system is always ISS as long as the density feedback laws are designed such that the tracking error is ISS w.r.t. the $L^{2}$ norms of the estimation errors. Considering that norms on infinite-dimensional spaces are not equivalent in general, obtaining ISS results w.r.t. $L^{2}$ norms is critical. This in turn highlights the advantage of (4), because it is difficult to obtain such an ISS result if $p$ appears in the denominator, as in [9, 10, 11].

Remark 4

In our recent work [13], we proved that, under some regularity assumptions on $v$, the density filter (16) alone also produces convergent gradient estimates, which means the gradient filter may not be needed. However, those assumptions are not satisfied in the feedback interconnected system considered in this work. Hence, the gradient filter (18) still serves as a general solution for estimating the gradient. Determining conditions under which the density filter (16) produces convergent gradient estimates in feedback interconnected systems is the subject of continuing research.

V Simulation studies

Figure 1: Evolution of the swarm (top), the density estimates $\hat{p}(x,t)$ (middle) and the velocity fields $v(x,t)$ (bottom). Each column represents a single time step.

An agent-based simulation using 1024 robots is performed in Matlab to verify the proposed control law. We set $\Omega=(0,1)^{2}$, $\sqrt{2\sigma}=0.01$, and $\alpha=0.003$. Each robot is simulated according to (2), where $v$ is given by (6). The initial positions are drawn from a uniform distribution on $[0.15,0.85]^{2}$. The desired density $p_{*}(x)$ is illustrated in Fig. 2(a). The implementation of the filters and the feedback controller is based on the finite difference method. We discretize $\Omega$ into a $30\times 30$ grid, and the time step is $0.01$ s. We use KDE (with bandwidth $h=0.04$) to obtain $p_{\text{KDE}}$ and $\nabla p_{\text{KDE}}$. The numerical implementation of the filters is described in [13]. Simulation results are given in Fig. 1. It is seen that the swarm evolves towards the desired density. The convergence error $\|\hat{p}-p_{*}\|_{L^{2}(\Omega)}$ is given in Fig. 2(b), which shows that the error converges exponentially to a small neighbourhood of 0 and remains bounded, verifying the ISS property of the proposed algorithms.
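For reference, the convergence error reported in Fig. 2(b) can be approximated on the grid by a Riemann sum; a minimal sketch under the assumption of a uniform 2-D grid with spacing `dx`:

```python
import numpy as np

def l2_error(p_hat, p_star, dx):
    """Riemann-sum approximation of ||p_hat - p_star||_{L^2(Omega)}
    on a uniform 2-D grid with spacing dx (cell area dx**2)."""
    return np.sqrt(np.sum((p_hat - p_star)**2) * dx**2)
```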

Figure 2: (a) Target density $p_{*}(x)$. (b) Convergence error.

VI Conclusion

This paper studied the interplay of density estimation and control algorithms. We proposed new density feedback laws for robust density control of swarm robotic systems and filtering algorithms for estimating the mean-field density and its gradient. We also proved that the interconnection of these algorithms is globally ISS. Our future work is to incorporate the density control algorithm for higher-order nonlinear systems in [18] and the distributed density estimation algorithm in [13], and study their interconnected stability.

References

  • [1] R. Olfati-Saber, J. A. Fax, and R. M. Murray, “Consensus and cooperation in networked multi-agent systems,” Proceedings of the IEEE, vol. 95, no. 1, pp. 215–233, 2007.
  • [2] J. R. Marden, G. Arslan, and J. S. Shamma, “Cooperative control and potential games,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 6, pp. 1393–1407, 2009.
  • [3] M. Huang, R. P. Malhamé, P. E. Caines et al., “Large population stochastic dynamic games: closed-loop McKean-Vlasov systems and the Nash certainty equivalence principle,” Communications in Information & Systems, vol. 6, no. 3, pp. 221–252, 2006.
  • [4] B. Açikmeşe and D. S. Bayard, “A Markov chain approach to probabilistic swarm guidance,” in 2012 American Control Conference (ACC).   IEEE, 2012, pp. 6300–6307.
  • [5] S. Bandyopadhyay, S.-J. Chung, and F. Y. Hadaegh, “Probabilistic and distributed control of a large-scale swarm of autonomous agents,” IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1103–1123, 2017.
  • [6] K. Elamvazhuthi and S. Berman, “Optimal control of stochastic coverage strategies for robotic swarms,” in 2015 IEEE International Conference on Robotics and Automation (ICRA).   IEEE, 2015, pp. 1822–1829.
  • [7] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal steering of a linear stochastic system to a final probability distribution, part i,” IEEE Transactions on Automatic Control, vol. 61, no. 5, pp. 1158–1169, 2015.
  • [8] J. Ridderhof, K. Okamoto, and P. Tsiotras, “Nonlinear uncertainty control with iterative covariance steering,” in 2019 IEEE 58th Conference on Decision and Control (CDC).   IEEE, 2019, pp. 3484–3490.
  • [9] U. Eren and B. Açıkmeşe, “Velocity field generation for density control of swarms using heat equation and smoothing kernels,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 9405–9411, 2017.
  • [10] K. Elamvazhuthi, H. Kuiper, M. Kawski, and S. Berman, “Bilinear controllability of a class of advection–diffusion–reaction systems,” IEEE Transactions on Automatic Control, vol. 64, no. 6, pp. 2282–2297, 2018.
  • [11] T. Zheng, Q. Han, and H. Lin, “Transporting robotic swarms via mean-field feedback control,” IEEE Transactions on Automatic Control, pp. 1–1, 2021.
  • [12] ——, “PDE-based dynamic density estimation for large-scale agent systems,” IEEE Control Systems Letters, vol. 5, no. 2, pp. 541–546, 2020.
  • [13] ——, “Distributed mean-field density estimation for large-scale systems,” IEEE Transactions on Automatic Control, pp. 1–1, 2021.
  • [14] D. Bakry, I. Gentil, and M. Ledoux, Analysis and geometry of Markov diffusion operators.   Springer Science & Business Media, 2013, vol. 348.
  • [15] S. Dashkovskiy and A. Mironchenko, “Input-to-state stability of infinite-dimensional control systems,” Mathematics of Control, Signals, and Systems, vol. 25, no. 1, pp. 1–35, 2013.
  • [16] H. K. Khalil and J. W. Grizzle, Nonlinear systems.   Prentice hall Upper Saddle River, NJ, 2002, vol. 3.
  • [17] R. F. Curtain, “Infinite-dimensional filtering,” SIAM Journal on Control, vol. 13, no. 1, pp. 89–104, 1975.
  • [18] T. Zheng, Q. Han, and H. Lin, “Backstepping density control for large-scale heterogeneous nonlinear stochastic systems,” in 2022 American Control Conference (ACC), 2022.
  • [19] G. M. Lieberman, Second order parabolic differential equations.   World scientific, 1996.
  • [20] P. A. Markowich and C. Villani, “On the trend to equilibrium for the Fokker-Planck equation: an interplay between physics and functional analysis,” Mat. Contemp, vol. 19, pp. 1–29, 2000.
  • [21] T. Cacoullos, “Estimation of a multivariate density,” Annals of the Institute of Statistical Mathematics, vol. 18, no. 1, pp. 179–189, 1966.