This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Sinkhorn MPC: Model predictive optimal transport over
dynamical systems*

Kaito Ito1, Student Member, IEEE, and Kenji Kashima1, Senior Member, IEEE *This work was supported in part by the joint project of Kyoto University and Toyota Motor Corporation, titled “Advanced Mathematical Science for Mobility Society”, and by JSPS KAKENHI Grant Numbers JP21J14577, JP21H04875.1K. Ito and K. Kashima are with the Graduate School of Informatics, Kyoto University, Kyoto, Japan [email protected]; [email protected]©\copyright 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Abstract

We consider the optimal control problem of steering an agent population to a desired distribution over an infinite horizon. This is an optimal transport problem over a dynamical system, which is challenging due to its high computational cost. In this paper, we propose Sinkhorn MPC, which is a dynamical transport algorithm combining model predictive control and the so-called Sinkhorn algorithm. The notable feature of the proposed method is that it achieves cost-effective transport in real time by performing control and transport planning simultaneously. In particular, for linear systems with an energy cost, we reveal the fundamental properties of Sinkhorn MPC such as ultimate boundedness and asymptotic stability.

I Introduction

The problem of controlling a large number of agents has become a more and more important area in control theory with a view to applications in sensor networks, smart grids, intelligent transportation systems, and systems biology, to name a few. One of the most fundamental tasks in this problem is to stabilize a collection of agents to a desired distribution shape with minimum cost. This can be formulated as an optimal transport (OT) problem [1] between the empirical distribution based on the state of the agents and the target distribution over a dynamical system.

In [2, 3, 4, 5], infinitely many agents are represented as a probability density of the state of a single system, and the problem of steering the state from an initial density to a target one with minimum energy is considered. Although this approach can avoid the difficulty due to the large scale of the collective dynamics, in this framework, the agents must have the identical dynamics, and they are considered to be indistinguishable. In addition, even for linear systems, the density control requires us to solve a nonlinear partial differential equation such as the Monge-Ampère equation or the Hamilton-Jacobi-Bellman equation. Furthermore, it is difficult to incorporate state constraints due to practical requirements, such as safety, into the density control.

With this in mind, we rather deal with the collective dynamics directly without taking the number of agents to infinity. This straightforward approach is investigated in multi-agent assignment problems; see e.g., [6, 7] and references therein. In the literature, homogeneous agents following single integrator dynamics and easily computable assignment costs, e.g., distance-based cost, are considered in general. On the other hand, for more general dynamics, the main challenge of the assignment problem is the large computation time for obtaining the minimum cost of stabilizing each agent to each target state (i.e., point-to-point optimal control (OC)) and optimizing the destination of each agent. This is especially problematic for OT in dynamical environments where control inputs need to be determined immediately for given initial and target distributions.

For the point-to-point OC, the concept of model predictive control (MPC) solving a finite horizon OC problem instead of an infinite horizon OC problem is effective in reducing the computational cost while incorporating constraints [8]. On the other hand, in [9], several favorable computational properties of an entropy-regularized version of OT are highlighted. In particular, entropy-regularized OT can be solved efficiently via the Sinkhorn algorithm. Based on these ideas, we propose a dynamical transport algorithm combining MPC and the Sinkhorn algorithm, which we call Sinkhorn MPC. Consequently, the computational effort can be reduced substantially. Moreover, for linear systems with an energy cost, we reveal the fundamental properties of Sinkhorn MPC such as ultimate boundedness and asymptotic stability.

Organization: The remainder of this paper is organized as follows. In Section II, we introduce OT between discrete distributions. In Section III, we provide the problem formulation. In Section IV, we describe the idea of Sinkhorn MPC. In Section V, numerical examples illustrate the utility of the proposed method. In Section VI, for linear systems with an energy cost, we investigate the fundamental properties of Sinkhorn MPC. Some concluding remarks are given in Section VII.

Notation: Let {\mathbb{R}} denote the set of real numbers. The set of all positive (resp. nonnegative) vectors in n{\mathbb{R}}^{n} is denoted by >0n{\mathbb{R}}_{>0}^{n} (resp. 0n{\mathbb{R}}_{\geq 0}^{n}). We use similar notations for the set of all real matrices m×n{\mathbb{R}}^{m\times n} and integers {\mathbb{Z}}, respectively. The set of integers {1,,N}\{1,\ldots,N\} is denoted by [[N]][\![N]\!]. The Euclidean norm is denoted by \|\cdot\|. For a positive semidefinite matrix AA, denote xA:=(xAx)1/2\|x\|_{A}:=(x^{\top}Ax)^{1/2}. The identity matrix of size nn is denoted by InI_{n}. The matrix norm induced by the Euclidean norm is denoted by 2\|\cdot\|_{2}. For vectors x1,,xmnx_{1},\ldots,x_{m}\in{\mathbb{R}}^{n}, a collective vector [x1xm]nm[x_{1}^{\top}\ \cdots\ x_{m}^{\top}]^{\top}\in{\mathbb{R}}^{nm} is denoted by [x1;;xm][x_{1};\ \cdots\ ;x_{m}]. For A=[a1an]m×nA=[a_{1}\ \cdots\ a_{n}]\in{\mathbb{R}}^{m\times n}, we write vec(A):=[a1;;an]{\rm vec}(A):={\color[rgb]{0,0,0}{[a_{1};\ \cdots\ ;a_{n}]}}. For α=[α1αN]N\alpha=[\alpha_{1}\ \cdots\ \alpha_{N}]^{\top}\in{\mathbb{R}}^{N}, the diagonal matrix with diagonal entries {αi}i=1N\{\alpha_{i}\}_{i=1}^{N} is denoted by α\alpha^{\boxbslash}. The block diagonal matrix with diagonal entries {Ai}i=1N,Aim×n\{A_{i}\}_{i=1}^{N},A_{i}\in{\mathbb{R}}^{m\times n} is denoted by {Ai}i\{A_{i}\}_{i}^{\boxbslash}. Especially when Ai=A,iA_{i}=A,\forall i, {Ai}i\{A_{i}\}_{i}^{\boxbslash} is also denoted by A,NA^{\boxbslash,N}. Let (M,d)(M,d) be a metric space. The open ball of radius r>0r>0 centered at xMx\in M is denoted by Br(x):={yM:d(x,y)<r}B_{r}(x):=\{y\in M:d(x,y)<r\}. The element-wise division of a,b>0na,b\in{\mathbb{R}}_{>0}^{n} is denoted by ab:=[a1/b1an/bn]a\oslash b:=[a_{1}/b_{1}\ \cdots\ a_{n}/b_{n}]^{\top}. The NN-dimensional vector of ones is denoted by 1N1_{N}. The gradient of a function ff with respect to the variable xx is denoted by xf\nabla_{x}f. For x,x>0nx,x^{\prime}\in{\mathbb{R}}_{>0}^{n}, define an equivalence relation \sim on >0n{\mathbb{R}}_{>0}^{n} by xxx\sim x^{\prime} if and only if r>0,x=rx\exists r>0,x=rx^{\prime}.

II Background on optimal transport

Here, we briefly review OT between discrete distributions μ:=i=1Naiδxi,ν:=j=1Mbjδyj\mu:=\sum_{i=1}^{N}a_{i}\delta_{x_{i}},\nu:=\sum_{j=1}^{M}b_{j}\delta_{y_{j}} where aΣN:={p0N:i=1Npi=1},bΣMa\in\Sigma_{N}:=\{p\in{\mathbb{R}}_{\geq 0}^{N}:\sum_{i=1}^{N}p_{i}=1\},b\in\Sigma_{M}, xi,yjnx_{i},y_{j}\in{\mathbb{R}}^{n}, and δx\delta_{x} is the Dirac delta at xx. Given a cost function c:n×n((x,y))c:{\mathbb{R}}^{n}\times{\mathbb{R}}^{n}(\ni(x,y))\rightarrow{\mathbb{R}}, which represents the cost of transporting a unit of mass from xx to yy, the original formulation of OT due to Monge seeks a map T:{x1,,xN}{y1,,yM}T:\{x_{1},\ldots,x_{N}\}\rightarrow\{y_{1},\ldots,y_{M}\} that solves

minimize𝑇i[[N]]c(xi,T(xi))\displaystyle\underset{T}{\rm minimize}\ \sum_{i\in[\![N]\!]}c(x_{i},T(x_{i})) (1)
subject tobj=i:T(xi)=yjai,j[[M]].\displaystyle\text{subject to}\ b_{j}=\sum_{i:T(x_{i})=y_{j}}a_{i},\ \forall j\in[\![M]\!].

Especially when M=NM=N and a=b=1N/Na=b=1_{N}/N, the optimal map TT gives the optimal assignment for transporting agents with the initial states {xi}i\{x_{i}\}_{i} to the desired states {yj}j\{y_{j}\}_{j}, and the Hungarian algorithm [10] can be used to solve (1). However, this method can be applied only to small problems because it has O(N3)O(N^{3}) complexity.

On the other hand, the Kantorovich formulation of OT is a linear program (LP):

minimizeP𝒯(a,b)i[[N]],j[[M]]CijPij\underset{P\in{\mathcal{T}}(a,b)}{\rm minimize}\ \sum_{i\in[\![N]\!],j\in[\![M]\!]}C_{ij}P_{ij} (2)

where Cij:=c(xi,yj)C_{ij}:=c(x_{i},y_{j}) and

𝒯(a,b):={P0N×M:P1M=a,P1N=b}.{\mathcal{T}}(a,b):=\{P\in{\mathbb{R}}_{\geq 0}^{N\times M}:P1_{M}=a,\ P^{\top}1_{N}=b\}.

A matrix P𝒯(a,b)P\in{\mathcal{T}}(a,b), which is called a coupling matrix, represents a transport plan where PijP_{ij} describes the amount of mass flowing from xix_{i} towards yjy_{j}. In particular, when M=NM=N and a=b=1N/Na=b=1_{N}/N, there exists an optimal solution from which we can reconstruct an optimal map for (1) [11, Proposition 2.1]. However, similarly to (1), for a large number of agents and destinations, the problem (2) with NMNM variables is challenging to solve.

In view of this, [9] employed an entropic regularization to (2):

minimizeP𝒯(a,b)i[[N]],j[[M]]CijPijεH(P),\underset{P\in{\mathcal{T}}(a,b)}{\rm minimize}\ \sum_{i\in[\![N]\!],j\in[\![M]\!]}C_{ij}P_{ij}-\varepsilon H(P), (3)

where ε>0\varepsilon>0 and the entropy of PP is defined by H(P):=i,jPij(log(Pij)1)H(P):=-\sum_{i,j}P_{ij}(\log(P_{ij})-1). Define the Gibbs kernel KK associated to the cost matrix C=(Cij)C=(C_{ij}) as

K=(Kij)>0N×M,Kij:=exp(Cij/ε).K=(K_{ij})\in{\mathbb{R}}_{>0}^{N\times M},\ K_{ij}:=\exp\left(-C_{ij}/\varepsilon\right).

Then, a unique solution of the entropic OT (3) has the form P=(α)K(β)P^{*}=(\alpha^{*})^{\boxbslash}K(\beta^{*})^{\boxbslash} for two (unknown) scaling variables (α,β)>0N×>0M(\alpha^{*},\beta^{*})\in{\mathbb{R}}_{>0}^{N}\times{\mathbb{R}}_{>0}^{M}. The variables (α,β)(\alpha^{*},\beta^{*}) can be efficiently computed by the Sinkhorn algorithm:

α(k+1)=a[Kβ(k)],β(k)=b[Kα(k)]\alpha(k+1)=a\oslash[K\beta(k)],\ \beta(k)=b\oslash[K^{\top}\alpha(k)] (4)

where

limkα(k+1)Kβ(k)=P,α(0)=α0>0N.\lim_{k\rightarrow\infty}\alpha(k+1)^{\boxbslash}K\beta(k)^{\boxbslash}=P^{*},\ \forall\alpha(0)=\alpha_{0}\in{\mathbb{R}}_{>0}^{N}.

Now, let us introduce Hilbert’s projective metric

d(β,β):=logmaxi,j[[M]]βiβjβjβi,β,β>0M,{d_{\mathcal{H}}}(\beta,\beta^{\prime}):=\log\max_{i,j\in[\![M]\!]}\frac{\beta_{i}\beta^{\prime}_{j}}{\beta_{j}\beta^{\prime}_{i}},\ \beta,\beta^{\prime}\in{\mathbb{R}}_{>0}^{M}, (5)

which is a distance on the projective cone >0M/{\mathbb{R}}_{>0}^{M}/{\sim} (see the Notation in Section I for \sim) and is useful for the convergence analysis of the Sinkhorn algorithm; see [11, Remark 4.12 and 4.14]. Indeed, for any (β,β)(>0M)2(\beta,\beta^{\prime})\in({\mathbb{R}}_{>0}^{M})^{2} and any K¯>0N×M\bar{K}\in{\mathbb{R}}_{>0}^{N\times M}, it holds

d(K¯β,K¯β)λ(K¯)d(β,β){d_{\mathcal{H}}}(\bar{K}\beta,\bar{K}\beta^{\prime})\leq\lambda(\bar{K}){d_{\mathcal{H}}}(\beta,\beta^{\prime}) (6)

where

λ(K¯):=η(K¯)1η(K¯)+1<1,η(K¯):=maxi,j,k,lK¯ikK¯jlK¯jkK¯il.\lambda(\bar{K}):=\frac{\sqrt{\eta(\bar{K})}-1}{\sqrt{\eta(\bar{K})}+1}<1,\ \eta(\bar{K}):=\max_{i,j,k,l}\frac{\bar{K}_{ik}\bar{K}_{jl}}{\bar{K}_{jk}\bar{K}_{il}}.

Then it follows from (6) that

d\displaystyle{d_{\mathcal{H}}} (β(k+1),β)=d(b[Kα(k+1)],b[Kα])\displaystyle(\beta(k+1),\beta^{*})={d_{\mathcal{H}}}(b\oslash[K^{\top}\alpha(k+1)],b\oslash[K^{\top}\alpha^{*}])
=d(Kα(k+1),Kα)\displaystyle={d_{\mathcal{H}}}(K^{\top}\alpha(k+1),K^{\top}\alpha^{*})
λ(K)d(α(k+1),α)λ2(K)d(β(k),β)\displaystyle\leq\lambda(K){d_{\mathcal{H}}}(\alpha(k+1),\alpha^{*})\leq\lambda^{2}(K){d_{\mathcal{H}}}(\beta(k),\beta^{*})

which implies VP(β):=d(β,β)V_{P}(\beta):={d_{\mathcal{H}}}(\beta,\beta^{*}) is a Lyapunov function of (4), and limkβ(k)=β>0M/\lim_{k\rightarrow\infty}\beta(k)=\beta^{*}\in{\mathbb{R}}_{>0}^{M}/{\sim}.

III Problem formulation

In this paper, we consider the problem of stabilizing agents efficiently to a given discrete distribution over dynamical systems. This can be formulated as Monge’s OT problem.

Problem 1

Given initial and desired states {xi0}i=1N,{xj𝖽}j=1N(n)N\{x_{i}^{0}\}_{i=1}^{N},\{x_{j}^{\sf d}\}_{j=1}^{N}\in(\mathbb{R}^{n})^{N}, find control inputs {ui}i=1N\{u_{i}\}_{i=1}^{N} and a permutation σ:[[N]][[N]]\sigma:[\![N]\!]\rightarrow[\![N]\!] that solve

minimize𝜎i[[N]]ci(xi0,xσ(i)𝖽).\displaystyle\underset{\sigma}{\rm minimize}~{}~{}\sum_{i\in[\![N]\!]}c_{\infty}^{i}(x_{i}^{0},x_{\sigma(i)}^{\sf d}). (7)

Here, the cost function cic_{\infty}^{i} is defined by

ci(xi0,xj𝖽):=minuik=0i(xi(k),ui(k);xj𝖽)\displaystyle c_{\infty}^{i}(x_{i}^{0},x_{j}^{\sf d}):=\min_{u_{i}}\ \sum_{k=0}^{\infty}\ell_{i}(x_{i}(k),u_{i}(k);x_{j}^{\sf d}) (8)
subject to xi(k+1)=Aixi(k)+Biui(k),\displaystyle{\color[rgb]{0,0,0}{x_{i}(k+1)=A_{i}x_{i}(k)+B_{i}u_{i}(k)}}, (9)
xi(k)𝒳n,k0,\displaystyle x_{i}(k)\in{\mathcal{X}}\subseteq{\mathbb{R}}^{n},\ \forall k\in{\mathbb{Z}}_{\geq 0}, (10)
xi(0)=xi0,\displaystyle x_{i}(0)=x_{i}^{0}, (11)
limkxi(k)=xj𝖽,\displaystyle\lim_{k\rightarrow\infty}x_{i}(k)=x_{j}^{\sf d}, (12)

where xi(k)n,ui(k)m,Ain×n,Bin×mx_{i}(k)\in{\mathbb{R}}^{n},u_{i}(k)\in{\mathbb{R}}^{m},A_{i}\in{\mathbb{R}}^{n\times n},B_{i}\in{\mathbb{R}}^{n\times m}. \triangle

Note that the running cost i\ell_{i} depends not only on the state xix_{i} and the control input uiu_{i}, but also on the destination xj𝖽x_{j}^{\sf d}. Throughout this paper, we assume the existence of an optimal solution of OC problems. In addition, we assume that there exists a constant input u¯i\bar{u}_{i} under which xi=xj𝖽x_{i}=x_{j}^{\sf d} is an equilibrium of (9). A necessary condition for the infinite horizon cost ci(xi0,xj𝖽)c_{\infty}^{i}(x_{i}^{0},x_{j}^{\sf d}) to be finite is that at xi=xj𝖽x_{i}=x_{j}^{\sf d} with ui=u¯iu_{i}=\bar{u}_{i}, there is not a cost incurred, i.e., i(xj𝖽,u¯i;xj𝖽)=0\ell_{i}(x_{j}^{\sf d},\bar{u}_{i};x_{j}^{\sf d})=0. If BiB_{i} is square and invertible, u¯i=Bi1(xj𝖽Aixj𝖽)\bar{u}_{i}=B_{i}^{-1}(x_{j}^{\sf d}-A_{i}x_{j}^{\sf d}) makes xi=xj𝖽x_{i}=x_{j}^{\sf d} an equilibrium.

IV Combining MPC and Sinkhorn algorithm

The main difficulties of Problem 1 are as follows:

  1. 1.

    In most cases, the infinite horizon OC problem ci(xi0,xj𝖽)c_{\infty}^{{\color[rgb]{0,0,0}{i}}}(x_{i}^{0},x_{j}^{\sf d}) is computationally intractable.

  2. 2.

    In addition, given ci(xi0,xj𝖽),i,j[[N]]c_{\infty}^{{\color[rgb]{0,0,0}{i}}}(x_{i}^{0},x_{j}^{\sf d}),\forall i,j\in[\![N]\!], the assignment problem needs to be solved, which leads to the high computational burden when NN is large.

To overcome these issues, we utilize the concept of MPC, which solves tractable finite horizon OC while satisfying the state constraint (10). Specifically, at each time, we address the OT problem whose cost function with the current state xˇi\check{x}_{i} and the finite horizon Th>0T_{h}\in{\mathbb{Z}}_{>0} is given by

cThi(xˇi,xj𝖽):=\displaystyle c_{T_{h}}^{{\color[rgb]{0,0,0}{i}}}(\check{x}_{i},x_{j}^{\sf d}):= minuik=0Th1i(xi(k),ui(k);xj𝖽)\displaystyle\min_{u_{i}}\ \sum_{k=0}^{T_{h}-1}\ell_{i}(x_{i}(k),u_{i}(k);x_{j}^{\sf d})
subj. to (9), (10),xi(0)=xˇi,xi(Th)=xj𝖽.\displaystyle\text{subj. to \eqref{eq:nonlinear_dynamics}, {\color[rgb]{0,0,0}{\eqref{eq:state_constraint}}},}~{}~{}x_{i}(0)=\check{x}_{i},\ x_{i}(T_{h})=x_{j}^{\sf d}.

Denote the first control in the optimal sequence by uiMPC(xˇi,xj𝖽)u_{i}^{\rm MPC}(\check{x}_{i},x_{j}^{\sf d}). From the viewpoint of the computation time for solving the OT problem, we relax the Problem 1 by the entropic regularization. It should be noted that, in challenging situations in which the number of agents is large and the sampling time is small, only a few Sinkhorn iterations are allowed. In what follows, we deal only with the case where just one Sinkhorn iteration is performed at each time. Nevertheless, by similar arguments, all of the results of this paper are still valid when more iterations are performed.

Based on the approximate solution P(k)P(k) obtained by the Sinkhorn algorithm at time kk, we need to determine a target state for each agent. Then, we introduce a set 𝕏n{\mathbb{X}}\subset{\mathbb{R}}^{n} and a map xtmp𝖽,i:0N×N𝕏x_{\rm tmp}^{{\sf d},i}:{\mathbb{R}}_{\geq 0}^{N\times N}\rightarrow{\mathbb{X}} as a policy to determine a temporary target xtmp𝖽,i(P(k))x_{\rm tmp}^{{\sf d},i}(P(k)) at time kk for agent ii. Hereafter, assume that there exists a constant rupp>0r_{\rm upp}>0 such that

xrupp,x𝕏.\|x\|\leq r_{\rm upp},\ \forall x\in{\mathbb{X}}. (13)

For example, if 𝕏{\mathbb{X}} is the convex hull of {xj𝖽}j\{x_{j}^{\sf d}\}_{j}, we can take rupp=maxjxj𝖽r_{\rm upp}=\max_{j}\|x_{j}^{\sf d}\|. A typical policy to approximate a Monge’s OT map from a coupling matrix PP is the so-called barycentric projection xtmp𝖽,i(P)=Nj=1NPijxj𝖽x_{\rm tmp}^{{\sf d},i}(P)=N\sum_{j=1}^{N}P_{ij}x_{j}^{\sf d} [11, Remark 4.11].

For any given policy xtmp𝖽,ix_{\rm tmp}^{{\sf d},i}, we summarize the above strategy as the following dynamics where the Sinkhorn algorithm behaves as a dynamic controller.
Sinkhorn MPC:

xi(k+1)=Aixi(k)+BiuiMPC(xi(k),xtmp𝖽,i(P(k))),\displaystyle x_{i}(k+1)=A_{i}x_{i}(k)+B_{i}u_{i}^{\rm MPC}\bigl{(}x_{i}(k),x_{\rm tmp}^{{\sf d},i}\left(P(k)\right)\bigr{)},
i[[N]],\displaystyle\hskip 170.71652pt\forall i\in[\![N]\!], (14)
P(k)=α(k+1)K(x(k))β(k),\displaystyle P(k)=\alpha(k+1)^{\boxbslash}K(x(k))\beta(k)^{\boxbslash}, (15)
α(k+1)=1N/N[K(x(k))β(k)],\displaystyle\alpha(k+1)=1_{N}/N\oslash\left[K(x(k))\beta(k)\right], (16)
β(k)=1N/N[K(x(k))α(k)],\displaystyle\beta(k)=1_{N}/N\oslash\left[K(x(k))^{\top}\alpha(k)\right], (17)
xi(0)=xi0,α(0)=α0,\displaystyle x_{i}(0)=x_{i}^{0},\ \alpha(0)=\alpha_{0},

where

Kij(x):=exp(cThi(xi,xj𝖽)ε),x=[x1;;xN]nN,\displaystyle K_{ij}(x):=\exp\left(-\frac{c_{T_{h}}^{{\color[rgb]{0,0,0}{i}}}(x_{i},x_{j}^{\sf d})}{\varepsilon}\right),\ x=[x_{1};\cdots;x_{N}]\in{\mathbb{R}}^{nN},

and the initial value α0\alpha_{0} is arbitrary. \triangle
Note that, under the assumption that for all i[[N]]i\in[\![N]\!], BiB_{i} is square and invertible, Sinkhorn MPC is obviously recursively feasible, and the dynamics (14) satisfies the state constraint (10) for all i[[N]]i\in[\![N]\!].

V Numerical examples

This section gives examples for Sinkhorn MPC with an energy cost

i(xi,ui;xj𝖽)=uiBi1(xj𝖽Aixj𝖽)2,\ell_{i}(x_{i},u_{i};x_{j}^{\sf d})=\|u_{i}-B_{i}^{-1}(x_{j}^{\sf d}-A_{i}x_{j}^{\sf d})\|^{2}, (18)

and 𝒳=n{\mathcal{X}}={\mathbb{R}}^{n}. Then, the dynamics under Sinkhorn MPC can be written as follows [12, Section 2.2, pp. 37-39]:

xi(k+1)=A¯ixi(k)+(IA¯i)xtmp𝖽,i(P(k)),\displaystyle x_{i}(k+1)=\bar{A}_{i}x_{i}(k)+(I-\bar{A}_{i})x_{\rm tmp}^{{\sf d},i}(P(k)), (19)
uiMPC(xi,x^)=Bi(Ai)Th1Gi,Th1AiTh(xix^)\displaystyle u_{i}^{\rm MPC}(x_{i},{\color[rgb]{0,0,0}{\hat{x}}})=-B_{i}^{\top}(A_{i}^{\top})^{T_{h}-1}G_{i,T_{h}}^{-1}A_{i}^{T_{h}}(x_{i}-{\color[rgb]{0,0,0}{\hat{x}}})
+Bi1(x^Aix^),i[[N]],x^n\displaystyle\hskip 71.13188pt+B_{i}^{-1}({\color[rgb]{0,0,0}{\hat{x}}}-A_{i}{\color[rgb]{0,0,0}{\hat{x}}}),\ \forall i\in[\![N]\!],\ {\color[rgb]{0,0,0}{\forall\hat{x}\in{\mathbb{R}}^{n}}}

with (15)–(17) where

Kij(x)=exp(xixj𝖽𝒢i2ε),\displaystyle K_{ij}(x)=\exp\left(-\frac{\|x_{i}-x_{j}^{\sf d}\|_{{\mathcal{G}}_{i}}^{2}}{\varepsilon}\right),
𝒢i:=(AiTh)Gi,Th1AiTh,Gi,Th:=k=0Th1AikBiBi(Ai)k,\displaystyle{\mathcal{G}}_{i}:=(A_{i}^{T_{h}})^{\top}G_{i,T_{h}}^{-1}A_{i}^{T_{h}},\ G_{i,T_{h}}:=\sum_{k=0}^{T_{h}-1}A_{i}^{k}B_{i}B_{i}^{\top}(A_{i}^{\top})^{k},
A¯i:=AiBiBi(Ai)Th1Gi,Th1AiTh.\displaystyle\bar{A}_{i}:=A_{i}-B_{i}B_{i}^{\top}(A_{i}^{\top})^{T_{h}-1}G_{i,T_{h}}^{-1}A_{i}^{T_{h}}.

In the examples below, we use the barycentric target xtmp𝖽,i(P)=Nj=1NPijxj𝖽x_{\rm tmp}^{{\sf d},i}(P)=N\sum_{j=1}^{N}P_{ij}x_{j}^{\sf d}.

First, consider the case where

Ai=[1.20.130.051.1],Bi=0.1I2,i[[N]],A_{i}=\begin{bmatrix}1.2&0.13\\ -0.05&1.1\end{bmatrix},\ B_{i}=0.1I_{2},\ \forall i\in[\![N]\!], (20)

and set N=150,ε=1.0,Th=10,α0=1NN=150,\ \varepsilon=1.0,\ T_{h}=10,\ \alpha_{0}=1_{N}. For given initial and desired states, the trajectories of the agents governed by (19) with (15)–(17) are illustrated in Fig. 1. It can be seen that the agents converge sufficiently close to the target states. The computation time for one Sinkhorn iteration is about 0.00630.0063 ms, 0.0300.030 ms, and 0.080.08 ms for N=150,500,800N=150,500,800, respectively, with MacBook Pro with Intel Core i5. On the other hand, solving the linear program (2) with MATLAB linprog takes about 0.120.12 s, 6.46.4 s, and 6666 s for N=150,500,800N=150,500,800, respectively, and is thus not scalable. Hence, Sinkhorn MPC contributes to reducing the computational burden.

Refer to caption
(a) k=0k=0
Refer to caption
(b) k=50k=50
Refer to caption
(c) k=200k=200
Refer to caption
(d) k=500k=500
Refer to caption
(e) k=1000k=1000
Refer to caption
(f) k=3000k=3000
Figure 1: Trajectories xi(k)=[xi(1)(k)xi(2)(k)]x_{i}(k)=[x_{i}^{(1)}(k)\ x_{i}^{(2)}(k)]^{\top} of 150 agents for (20) (colored filled circles) and desired states (black circles).

Next, we investigate the effect of the regularization parameter ε\varepsilon on the behavior of Sinkhorn MPC. To this end, consider a simple case where N=10,Th=10N=10,\ T_{h}=10, and

Ai=1,Bi=0.1,i[[N]].A_{i}=1,\ B_{i}=0.1,\ \forall i\in[\![N]\!]. (21)

Then the trajectories of the agents with ε=0.5,1.0\varepsilon=0.5,1.0 are shown in Fig. 2. As can be seen, the overshoot/undershoot is reduced for larger ε\varepsilon while the limiting values of the states deviate from the desired states. In other words, the parameter ε\varepsilon reflects the trade-off between the stationary and transient behaviors of the dynamics under Sinkhorn MPC.

Refer to caption
Figure 2: Trajectories of 10 agents for (21) with ε=0.5\varepsilon=0.5 (solid) and ε=1.0\varepsilon=1.0 (chain), respectively, and desired states (black circles).

VI Fundamental properties of Sinkhorn MPC with an energy cost

In this section, we elucidate ultimate boundedness and asymptotic stability of Sinkhorn MPC with the energy cost (18) and 𝒳=n{\mathcal{X}}={\mathbb{R}}^{n}. Hereafter, we assume the invertibility of BiB_{i}.

VI-A Ultimate boundedness for Sinkhorn MPC

It is known that, under the assumption that BiB_{i} is invertible, A¯i\bar{A}_{i} is stable, i.e., the spectral radius ρi\rho_{i} of A¯i\bar{A}_{i} satisfies ρi<1\rho_{i}<1 [13, Corollary 1]. Using this fact, we derive the ultimate boundedness of (19) with (15)–(17).

Proposition 1

For any δ>0,{xi0}i\delta>0,\{x_{i}^{0}\}_{i}, and {νi}i\{\nu_{i}\}_{i} satisfying νi>0,ρi+νi<1,i\nu_{i}>0,\rho_{i}+\nu_{i}<1,\forall i, there exists τ(δ,{xi0},{νi})>0\tau(\delta,\{x_{i}^{0}\},\{\nu_{i}\})\in{\mathbb{Z}}_{>0} such that the solution {xi}i\{x_{i}\}_{i} of (19) with (15)–(17) satisfies

xi(k)<δ+ruppIA¯i21(ρi+νi),kτ,i[[N]],\|x_{i}(k)\|<\delta+\frac{r_{\rm upp}\|I-\bar{A}_{i}\|_{2}}{1-(\rho_{i}+\nu_{i})},\ \forall k\geq{\color[rgb]{0,0,0}{\tau}},\ \forall i\in[\![N]\!], (22)

where ruppr_{\rm upp} satisfies (13).

Proof:

Let u~i(k):=(IA¯i)xtmp𝖽,i(P(k))\tilde{u}_{i}(k):=(I-\bar{A}_{i})x_{\rm tmp}^{{\sf d},i}(P(k)). Then, it follows from (13) that

u~i(k)ruppIA¯i2,k0.\displaystyle\|\tilde{u}_{i}(k)\|\leq r_{\rm upp}\|I-\bar{A}_{i}\|_{2},\ \forall k\in{\mathbb{Z}}_{\geq 0}.

Note that for any νi>0\nu_{i}>0, there exists τi(νi)>0\tau_{i}(\nu_{i})\in{\mathbb{Z}}_{>0} such that

A¯ik<(ρi+νi)k,kτi.{\color[rgb]{0,0,0}{\|\bar{A}_{i}^{k}\|<(\rho_{i}+\nu_{i})^{k},\ \forall k\geq\tau_{i}.}}

Hence, the desired result is straightforward from

xi(k)A¯ik2xi0+s=1kA¯is12u~i(ks).\displaystyle\|x_{i}(k)\|\leq\|\bar{A}_{i}^{k}\|_{2}\|x_{i}^{0}\|+\sum_{s=1}^{k}\|\bar{A}_{i}^{s-1}\|_{2}\|\tilde{u}_{i}(k-s)\|.

We emphasize that Proposition 1 holds for any policy xtmp𝖽,ix_{\rm tmp}^{{\sf d},i} whose range 𝕏{\mathbb{X}} satisfies (13).

VI-B Existence of the equilibrium points

In the remainder of this section, we focus on the barycentric target xtmp𝖽,i(P)=Nj=1NPijxj𝖽x_{\rm tmp}^{{\sf d},i}(P)=N\sum_{j=1}^{N}P_{ij}x_{j}^{\sf d}. For (x,β)nN×>0N(x,\beta)\in{\mathbb{R}}^{nN}\times{\mathbb{R}}_{>0}^{N} and X𝖽:=[x1𝖽xN𝖽]n×NX^{\sf d}:=[x_{1}^{\sf d}\ \cdots\ x_{N}^{\sf d}]\in{\mathbb{R}}^{n\times N}, define

f1(x,β):={A¯i}ix+N{InA¯i}i(X𝖽),Nvec(P~(x,β)),\displaystyle f_{1}(x,\beta):=\{\bar{A}_{i}\}_{i}^{\boxbslash}x+N\{I_{n}-\bar{A}_{i}\}_{i}^{\boxbslash}(X^{\sf d})^{\boxbslash,N}{\rm vec}(\tilde{P}(x,\beta)), (23)
f2(x,β):=1N/N[K(f1(x,β))(1N/N[K(x)β])],\displaystyle f_{2}(x,\beta):=1_{N}/N\oslash\left[K(f_{1}(x,\beta))^{\top}(1_{N}/N\oslash[K(x)\beta])\right], (24)
P~(x,β):=(1N/N[K(x)β])K(x)β.\displaystyle\tilde{P}(x,\beta):=\left(1_{N}/N\oslash[K(x)\beta]\right)^{\boxbslash}K(x)\beta^{\boxbslash}. (25)

Then, the collective dynamics (19) with (15)–(17) is

x(k+1)=f1(x(k),β(k)),\displaystyle x(k+1)=f_{1}(x(k),\beta(k)), (26)
β(k+1)=f2(x(k),β(k)),\displaystyle\beta(k+1)=f_{2}(x(k),\beta(k)), (27)

where x(k):=[x1(k);;xN(k)]x(k):={\color[rgb]{0,0,0}{[x_{1}(k);\ \cdots;x_{N}(k)]}}.

Here, we characterize equilibria of (26), (27). Note that the existence of the equilibria is not trivial due to the nonlinearity of the dynamics. A point x𝖾=[x1𝖾;;xN𝖾]nNx^{{\sf e}}={\color[rgb]{0,0,0}{[x_{1}^{\sf e};\ \cdots\ ;x_{N}^{\sf e}]}}\in{\mathbb{R}}^{nN} is an equilibrium if and only if

(InA¯i)(xi𝖾Nj=1NPij(x𝖾)xj𝖽)=0,i[[N]],\displaystyle(I_{n}-\bar{A}_{i})\biggl{(}x_{i}^{{\sf e}}-N\sum_{j=1}^{N}P_{ij}^{*}(x^{\sf e})x_{j}^{\sf d}\biggr{)}=0,\ \forall i\in[\![N]\!],
Pij(x):=αiKij(x)βj,α,β>0N,\displaystyle P_{ij}^{*}(x):=\alpha_{i}^{*}K_{ij}(x)\beta_{j}^{*},\ \alpha^{*},\beta^{*}\in{\mathbb{R}}_{>0}^{N}, (28)
α=1N/N[K(x)β],β=1N/N[K(x)α].\displaystyle\alpha^{*}=1_{N}/N\oslash\left[K(x)\beta^{*}\right],\beta^{*}=1_{N}/N\oslash\left[K(x)^{\top}\alpha^{*}\right]. (29)

The stability of A¯i\bar{A}_{i} implies that it has no eigenvalue equal to 11, and therefore InA¯iI_{n}-\bar{A}_{i} is invertible. Thus, the necessary and sufficient condition for the equilibria is given by

xi𝖾Nj=1NPij(x𝖾)xj𝖽=0,i[[N]].x_{i}^{{\sf e}}-N\sum_{j=1}^{N}P_{ij}^{*}(x^{\sf e})x_{j}^{{\sf d}}=0,\ \forall i\in[\![N]\!]. (30)
Proposition 2

The dynamics (26), (27) has at least one equilibrium point (x𝖾,β𝖾)nN×(>0N/)(x^{\sf e},\beta^{\sf e})\in{\mathbb{R}}^{nN}\times({\mathbb{R}}_{>0}^{N}/{\sim}).

Proof:

Note that if a point x𝖾nNx^{{\sf e}}\in{\mathbb{R}}^{nN} satisfies (30), the corresponding β𝖾>0N/\beta^{\sf e}\in{\mathbb{R}}_{>0}^{N}/{\sim} is uniquely determined by β𝖾=β\beta^{\sf e}=\beta^{*} in (29) with x=x𝖾x=x^{\sf e} [11, Theorem 4.2]. Define a continuous map h:nNnNh:{\mathbb{R}}^{nN}\rightarrow{\mathbb{R}}^{nN} as

h(x):=[Nj=1NP1j(x)xj𝖽Nj=1NPNj(x)xj𝖽],xnN.h(x):=\begin{bmatrix}N\sum_{j=1}^{N}P_{1j}^{*}(x)x_{j}^{\sf d}\\ \vdots\\ N\sum_{j=1}^{N}P_{Nj}^{*}(x)x_{j}^{\sf d}\end{bmatrix},\ x\in{\mathbb{R}}^{nN}. (31)

From (30), fixed points of hh are equilibria of (26), (27). Note that for any i[[N]]i\in[\![N]\!] and any xnNx\in{\mathbb{R}}^{nN}, Nj=1NPij(x)xj𝖽N\sum_{j=1}^{N}P_{ij}^{*}(x)x_{j}^{\sf d} belongs to the convex hull 𝕏{\mathbb{X}} of {xj𝖽}j\{x_{j}^{\sf d}\}_{j}. For brevity, we abuse notation and regard 𝕏N{\mathbb{X}}^{N} as a subset of nN{\mathbb{R}}^{nN}. Let h𝕏:𝕏N𝕏Nh_{{\mathbb{X}}}:{\mathbb{X}}^{N}\rightarrow{\mathbb{X}}^{N} be the restriction of hh in (31) to 𝕏N{\mathbb{X}}^{N}. Now we can utilize Brouwer’s fixed point theorem. That is, since h𝕏h_{{\mathbb{X}}} is a continuous map from a compact convex set 𝕏N{\mathbb{X}}^{N} into itself, there exists a point x𝖾𝕏Nx^{{\sf e}}\in{\mathbb{X}}^{N} such that x𝖾=h(x𝖾)x^{{\sf e}}=h(x^{\sf e}). ∎

Sometimes, in order to emphasize the dependence of (x𝖾,β𝖾)(x^{\sf e},\beta^{\sf e}) on ε\varepsilon, we write (x𝖾(ε),β𝖾(ε))(x^{\sf e}(\varepsilon),\beta^{\sf e}(\varepsilon)).

VI-C Asymptotic stability for Sinkhorn MPC

Next, we analyze the stability of the equilibrium points. For this purpose, the following lemma is crucial when ε\varepsilon is small. Due to the limited space, we omit the proof.

Lemma 1

Assume that xi𝖽xj𝖽x_{i}^{\sf d}\neq x_{j}^{\sf d} for all (i,j),ij(i,j),\ i\neq j. For a permutation σ:[[N]][[N]]\sigma:[\![N]\!]\rightarrow[\![N]\!], define x𝖽(σ):=[xσ(1)𝖽;;xσ(N)𝖽]x^{{\sf d}}(\sigma):={\color[rgb]{0,0,0}{[x_{\sigma(1)}^{{\sf d}};\ \cdots\ ;x_{\sigma(N)}^{{\sf d}}]}} and a permutation matrix Pσ=(Pijσ)P^{\sigma}=(P_{ij}^{\sigma}) as Pijσ:=1/NP_{ij}^{\sigma}:=1/N if j=σ(i)j=\sigma(i), and 0, otherwise. Then, there exists an equilibrium (x𝖾(ε),β𝖾(ε))(x^{\sf e}(\varepsilon),\beta^{\sf e}(\varepsilon)) of (26), (27) such that x𝖾(ε)x^{\sf e}(\varepsilon) and P(x𝖾(ε))P^{*}(x^{\sf e}(\varepsilon)) converge exponentially to x𝖽(σ)x^{{\sf d}}(\sigma) and PσP^{\sigma}, respectively, as ε+0\varepsilon\rightarrow+0, i.e., there exists ζ>0\zeta>0 such that

limε+0η(ε)2exp(ζ/ε)=0\lim_{\varepsilon\rightarrow+0}\frac{\|\eta(\varepsilon)\|_{2}}{\exp(-\zeta/\varepsilon)}=0

for η(ε)=x𝖾(ε)x𝖽(σ)\eta(\varepsilon)=x^{\sf e}(\varepsilon)-x^{\sf d}(\sigma) and η(ε)=P(x𝖾(ε))Pσ\eta(\varepsilon)=P^{*}(x^{\sf e}(\varepsilon))-P^{\sigma}. \triangle

Denote by Exp(σ){\rm Exp}(\sigma) the set of all equilibria (x𝖾(),β𝖾())(x^{\sf e}(\cdot),\beta^{\sf e}(\cdot)) of (26), (27) having the property in Lemma 1 for a permutation σ\sigma.

For P¯N×N\bar{P}\in{\mathbb{R}}^{N\times N} and x=[x1;;xN]nNx={\color[rgb]{0,0,0}{[x_{1};\ \cdots\ ;x_{N}]}}\in{\mathbb{R}}^{nN}, define

Vx(x):=i=1NxiNj=1NP¯ijxj𝖽𝒢i2.\displaystyle V_{\rm x}(x):=\sum_{i=1}^{N}\Bigl{\|}x_{i}-N\sum_{j=1}^{N}\bar{P}_{ij}x_{j}^{\sf d}\Bigr{\|}_{{\mathcal{G}}_{i}}^{2}.

Then, VxV_{\rm x} is a Lyapunov function of (26) where P~(x,β)\tilde{P}(x,\beta) is fixed by P¯\bar{P} [14]. Indeed, we have

Vx(x(k+1))Vx(x(k))i=1NW1,i(xi(k),P¯)\displaystyle V_{\rm x}(x(k+1))-V_{\rm x}(x(k))\leq-\sum_{i=1}^{N}W_{1,i}(x_{i}(k),\bar{P})
W1,i(xi,P¯):=Bi(Ai)Th1Gi,Th1AiTh\displaystyle W_{1,i}(x_{i},\bar{P}):=\Bigl{\|}B_{i}^{\top}(A_{i}^{\top})^{T_{h}-1}G_{i,T_{h}}^{-1}A_{i}^{T_{h}}
×(xiNj=1NP¯ijxj𝖽)2.\displaystyle\hskip 113.81102pt\times\bigl{(}x_{i}-N\sum_{j=1}^{N}\bar{P}_{ij}x_{j}^{\sf d}\bigr{)}\Bigr{\|}^{2}.

Given an equilibrium (x𝖾,β𝖾)(x^{\sf e},\beta^{\sf e}), let us take the optimal coupling P𝖾:=P(x𝖾)P^{\sf e}:=P^{*}(x^{{\sf e}}) as P¯\bar{P}, and for γ>0\gamma>0, define

V(x,β):=Vx(x)+γd(β,β𝖾),(x,β)nN×(>0N/).V(x,\beta):=V_{\rm x}(x)+\gamma{d_{\mathcal{H}}}(\beta,\beta^{\sf e}),\ (x,\beta)\in{\mathbb{R}}^{nN}\times({\mathbb{R}}_{>0}^{N}/{\sim}). (32)

The following theorem follows from the fact that, for sufficiently small or large ε>0\varepsilon>0 and large γ>0\gamma>0, VV behaves as a Lyapunov function of (26), (27) with respect to (x𝖾,β𝖾)(x^{\sf e},\beta^{\sf e}).

Theorem 1

Assume that for all i[[N]]i\in[\![N]\!], AiA_{i} is invertible. Then the following hold:

  • (i)

    Assume that (x𝖾,β𝖾)(x^{\sf e},\beta^{\sf e}) is an isolated equilibrium111An equilibrium is said to be isolated if it has a neighborhood which does not contain any other equilibria. of (26), (27). Then, for a sufficiently large ε>0\varepsilon>0, (x𝖾,β𝖾)(x^{\sf e},\beta^{\sf e}) is locally asymptotically stable.

  • (ii)

    Assume that xi𝖽xj𝖽x_{i}^{\sf d}\neq x_{j}^{\sf d} for all (i,j),ij(i,j),\ i\neq j. Assume further that for some ε>0\varepsilon^{\prime}>0, (x𝖾(ε),β𝖾(ε))(x^{\sf e}(\varepsilon^{\prime}),\beta^{\sf e}(\varepsilon^{\prime})) is an isolated equilibrium of (26), (27) and for some permutation σ\sigma, (x𝖾(),β𝖾())Exp(σ)(x^{\sf e}(\cdot),\beta^{\sf e}(\cdot))\in{\rm Exp}(\sigma). Then, for sufficiently small ε>0\varepsilon>0, (x𝖾(ε),β𝖾(ε))(x^{\sf e}(\varepsilon),\beta^{\sf e}(\varepsilon)) is locally asymptotically stable.

Proof:

We prove only (ii) as the proof is similar for (i). In this proof, we regard (x(),β())(x(\cdot),\beta(\cdot)) as a trajectory in a metric space nN×(>0N/){\mathbb{R}}^{nN}\times({\mathbb{R}}_{>0}^{N}/{\sim}) with metric d((x,β),(x,β)):=xx+d(β,β)d((x,\beta),(x^{\prime},\beta^{\prime})):=\|x-x^{\prime}\|+{d_{\mathcal{H}}}(\beta,\beta^{\prime}). Fix any (x𝖾,β𝖾)Exp(σ)(x^{\sf e},\beta^{\sf e})\in{\rm Exp}(\sigma) satisfying the assumption in (ii). By definition, it is trivial that VV is positive definite on a neighborhood of (x𝖾,β𝖾)(x^{\sf e},\beta^{\sf e}). Moreover, for any (x,β)nN×(>0N/)(x,\beta)\in{\mathbb{R}}^{nN}\times({\mathbb{R}}_{>0}^{N}/{\sim}), we have

V(f1(x,β),f2(x,β))V(x,β)\displaystyle V(f_{1}(x,\beta),f_{2}(x,\beta))-V(x,\beta)
i=1N{A¯i(xiNjP~ij(x,β)xj𝖽)\displaystyle\leq\sum_{i=1}^{N}\biggl{\{}\Bigl{\|}\bar{A}_{i}\Bigl{(}x_{i}-N\sum_{j}\tilde{P}_{ij}(x,\beta)x_{j}^{{\sf d}}\Bigr{)}
+Nj(P~ij(x,β)Pij𝖾)xj𝖽𝒢i2xiNjPij𝖾xj𝖽𝒢i2}\displaystyle+N\sum_{j}(\tilde{P}_{ij}(x,\beta)-P_{ij}^{\sf e})x_{j}^{{\sf d}}\Bigr{\|}_{{\mathcal{G}}_{i}}^{2}-\Bigl{\|}x_{i}-N\sum_{j}P_{ij}^{\sf e}x_{j}^{\sf d}\Bigr{\|}_{{\mathcal{G}}_{i}}^{2}\biggr{\}}
+γ(W3(x,β)+W4(x,β)+W5(x,β))\displaystyle\quad+\gamma(-W_{3}(x,\beta)+W_{4}(x,\beta)+W_{5}(x,\beta))
i=1N(W1,i(xi,P~(x,β))+W2,i(x,β))\displaystyle\leq\sum_{i=1}^{N}\left(-W_{1,i}(x_{i},\tilde{P}(x,\beta))+W_{2,i}(x,\beta)\right)
+γ(W3(x,β)+W4(x,β)+W5(x,β))=:W(x,β),\displaystyle\qquad+\gamma(-W_{3}(x,\beta)+W_{4}(x,\beta)+W_{5}(x,\beta))=:W(x,\beta),

where we used the triangle inequality for d{d_{\mathcal{H}}}, and

W2,i(x,β):=2(xiNj[[N]]P~ij(x,β)xj𝖽)(A¯iIn)𝒢i\displaystyle W_{2,i}(x,\beta):=2\Bigl{(}x_{i}-N\sum_{j\in[\![N]\!]}\tilde{P}_{ij}(x,\beta)x_{j}^{\sf d}\Bigr{)}^{\top}(\bar{A}_{i}-I_{n})^{\top}{\mathcal{G}}_{i}
×Nj[[N]](P~ij(x,β)Pij𝖾)xj𝖽,\displaystyle\qquad\qquad\qquad\times N\sum_{j\in[\![N]\!]}(\tilde{P}_{ij}(x,\beta)-P_{ij}^{\sf e})x_{j}^{\sf d},
W3(x,β):=[1λ(K(x))λ(K(f1(x,β)))]d(β,β𝖾),\displaystyle W_{3}(x,\beta):=\left[1-\lambda(K(x))\lambda\left(K(f_{1}(x,\beta))\right)\right]{d_{\mathcal{H}}}(\beta,\beta^{\sf e}),
W4(x,β):=d(K(f1(x,β))α𝖾,(K𝖾)α𝖾),\displaystyle W_{4}(x,\beta):={d_{\mathcal{H}}}(K(f_{1}(x,\beta))^{\top}\alpha^{\sf e},(K^{\sf e})^{\top}\alpha^{\sf e}),
W5(x,β):=λ(K(f1(x,β)))d(K(x)β𝖾,K𝖾β𝖾),\displaystyle W_{5}(x,\beta):=\lambda\left(K(f_{1}(x,\beta))\right){d_{\mathcal{H}}}(K(x)\beta^{\sf e},K^{\sf e}\beta^{\sf e}),
K𝖾:=K(x𝖾),α𝖾:=1N/N[K𝖾β𝖾].\displaystyle K^{\sf e}:=K(x^{\sf e}),\ \alpha^{\sf e}:=1_{N}/N\oslash[K^{\sf e}\beta^{\sf e}].

In the sequel, we explain that sufficiently small ε\varepsilon and large γ\gamma enable us to take a neighborhood Br(x𝖾,β𝖾)B_{r}(x^{\sf e},\beta^{\sf e}) where

W(x,β)<0,(x,β)Br(x𝖾,β𝖾)\{(x𝖾,β𝖾)},W(x,\beta)<0,\ \forall(x,\beta)\in B_{r}(x^{\sf e},\beta^{\sf e})\backslash\{(x^{\sf e},\beta^{\sf e})\}, (33)

which means the asymptotic stability of (x𝖾,β𝖾)(x^{\sf e},\beta^{\sf e}) [15, Theorem 1.3].

First, a straightforward calculation yields, for any i,j[[N]],l[[n]]i,j\in[\![N]\!],l\in[\![n]\!] and any (x,β)nN×(>0N/)(x,\beta)\in{\mathbb{R}}^{nN}\times({\mathbb{R}}_{>0}^{N}/{\sim}),

|xi,lP~ij(x,β)|2Ng¯i,j,lεP~ij(x,β)(1NP~ij(x,β)),\displaystyle\left|\frac{\partial}{\partial x_{i,l}}\tilde{P}_{ij}(x,\beta)\right|\leq\frac{2N\bar{g}_{i,j,l}}{\varepsilon}\tilde{P}_{ij}(x,\beta)\left(\frac{1}{N}-\tilde{P}_{ij}(x,\beta)\right),
xi=[xi,1xi,n],\displaystyle x_{i}=[x_{i,1}\ \cdots\ x_{i,n}]^{\top},
g¯i,j,l:=maxkj|gi,l(xj𝖽xk𝖽)|,𝒢i=[gi,1gi,n].\displaystyle\bar{g}_{i,j,l}:=\max_{k\neq j}|g_{i,l}^{\top}(x_{j}^{\sf d}-x_{k}^{\sf d})|,\ {\mathcal{G}}_{i}=[g_{i,1}\ \cdots\ g_{i,n}]^{\top}.

From Lemma 1, under the assumption xi𝖽xj𝖽,ijx_{i}^{\sf d}\neq x_{j}^{\sf d},\ i\neq j, P~ij(x𝖾(ε),β𝖾(ε))\tilde{P}_{ij}(x^{\sf e}(\varepsilon),\beta^{\sf e}(\varepsilon)) converges exponentially to 0 or 1/N1/N as ε+0\varepsilon\rightarrow+0. Hence, the variation of W2,iW_{2,i} with respect to xx around (x𝖾(ε),β𝖾(ε))(x^{\sf e}(\varepsilon),\beta^{\sf e}(\varepsilon)) can be made arbitrarily small by using sufficiently small ε=ε1\varepsilon=\varepsilon_{1}. In addition, since γ>0\gamma>0 can be chosen independently of ε\varepsilon, sufficiently large γ=γ¯\gamma=\bar{\gamma} enables us to take a neighborhood Br1(x𝖾,β𝖾)B_{r_{1}}(x^{\sf e},\beta^{\sf e}) where

i=1N(12W1,i(xi,P~(x,β))+W2,i(x,β))\displaystyle\sum_{i=1}^{N}\left(-\frac{1}{2}W_{1,i}\left(x_{i},\tilde{P}(x,\beta)\right)+W_{2,i}(x,\beta)\right)
+γ(12W3(x,β))<0,(x,β)Br1(x𝖾,β𝖾)\{(x𝖾,β𝖾)}.\displaystyle+\gamma\left(-\frac{1}{2}W_{3}(x,\beta)\right)<0,\ \forall(x,\beta)\in B_{r_{1}}(x^{\sf e},\beta^{\sf e})\backslash\{(x^{\sf e},\beta^{\sf e})\}. (34)

Next, it follows from (x𝖾,β𝖾)Exp(σ)(x^{\sf e},\beta^{\sf e})\in{\rm Exp}(\sigma) that

xiKij|x=xe(ε)\displaystyle\nabla_{x_{i}}K_{ij}|_{x=x^{e}(\varepsilon)} =2εexp(xi𝖾(ε)xj𝖽𝒢i2ε)\displaystyle=-\frac{2}{\varepsilon}\exp\left(-\frac{\|x_{i}^{\sf e}(\varepsilon)-x_{j}^{\sf d}\|_{{\mathcal{G}}_{i}}^{2}}{\varepsilon}\right)
×𝒢i(xi𝖾(ε)xj𝖽)0,asε+0.\displaystyle\qquad\times{\mathcal{G}}_{i}(x_{i}^{\sf e}(\varepsilon)-x_{j}^{\sf d})\rightarrow 0,\ {\rm as}\ \varepsilon\rightarrow+0.

Since W4W_{4} and W5W_{5} depend on (x,β)(x,\beta) only via KK, their variation around (x𝖾(ε),β𝖾(ε))(x^{\sf e}(\varepsilon),\beta^{\sf e}(\varepsilon)) can be made arbitrarily small by taking sufficiently small ε>0\varepsilon>0. Therefore, under the assumption that (x𝖾(ε),β𝖾(ε))(x^{\sf e}(\varepsilon),\beta^{\sf e}(\varepsilon)) is isolated, for any given γ>0\gamma>0, we can take ε=ε2(γ)\varepsilon=\varepsilon_{2}(\gamma) such that there exists a neighborhood Br2(x𝖾,β𝖾)B_{r_{2}}(x^{\sf e},\beta^{\sf e}) where

i=1N(12W1,i(xi,P~(x,β)))+γ(12W3(x,β)+W4(x,β)\displaystyle\sum_{i=1}^{N}\left(-\frac{1}{2}W_{1,i}(x_{i},\tilde{P}(x,\beta))\right)+\gamma\biggl{(}-\frac{1}{2}W_{3}(x,\beta)+W_{4}(x,\beta)
+W5(x,β))<0,(x,β)Br2(x𝖾,β𝖾)\{(x𝖾,β𝖾)}.\displaystyle+W_{5}(x,\beta)\biggr{)}<0,\forall(x,\beta)\in B_{r_{2}}(x^{\sf e},\beta^{\sf e})\backslash\{(x^{\sf e},\beta^{\sf e})\}. (35)

By combining (34) and (35), we obtain (33) for r=min{r1,r2}r=\min\{r_{1},r_{2}\}, γ=γ¯\gamma=\bar{\gamma}, and ε=min{ε1,ε2(γ¯)}\varepsilon=\min\{\varepsilon_{1},\varepsilon_{2}(\bar{\gamma})\}, which completes the proof. ∎

VII Conclusion

In this paper, we presented the concept of Sinkhorn MPC, which combines MPC and the Sinkhorn algorithm to achieve scalable, cost-effective transport over dynamical systems. For linear systems with an energy cost, we analyzed the ultimate boundedness and the asymptotic stability for Sinkhorn MPC based on the stability of the constrained MPC and the conventional Sinkhorn algorithm.

On the other hand, in the numerical example, we observed that the regularization parameter plays a key role in the trade-off between the stationary and transient behaviors for Sinkhorn MPC. Hence, an important direction for future work is to investigate the design of a time-varying regularization parameter to balance the trade-off. Another direction is to extend our results to nonlinear systems with state constraints. In addition, the computational complexity still can be a problem for nonlinear systems. These problems will be addressed in future work.

References

  • [1] C. Villani, Topics in Optimal Transportation.   American Mathematical Soc., 2003, no. 58.
  • [2] Y. Chen, T. T. Georgiou, and M. Pavon, “Optimal transport over a linear dynamical system,” IEEE Transactions on Automatic Control, vol. 62, no. 5, pp. 2137–2152, 2017.
  • [3] ——, “Steering the distribution of agents in mean-field games system,” Journal of Optimization Theory and Applications, vol. 179, no. 1, pp. 332–357, 2018.
  • [4] K. Bakshi, D. D. Fan, and E. A. Theodorou, “Schrödinger approach to optimal control of large-size populations,” IEEE Transactions on Automatic Control, vol. 66, no. 5, pp. 2372–2378, 2020.
  • [5] M. H. de Badyn, E. Miehling, D. Janak, B. Açıkmeşe, M. Mesbahi, T. Başar, J. Lygeros, and R. S. Smith, “Discrete-time linear-quadratic regulation via optimal transport,” arXiv preprint arXiv:2109.02347, 2021.
  • [6] J. Yu, S.-J. Chung, and P. G. Voulgaris, “Target assignment in robotic networks: Distance optimality guarantees and hierarchical strategies,” IEEE Transactions on Automatic Control, vol. 60, no. 2, pp. 327–341, 2014.
  • [7] A. R. Mosteo, E. Montijano, and D. Tardioli, “Optimal role and position assignment in multi-robot freely reachable formations,” Automatica, vol. 81, pp. 305–313, 2017.
  • [8] D. Q. Mayne, “Model predictive control: Recent developments and future promise,” Automatica, vol. 50, no. 12, pp. 2967–2986, 2014.
  • [9] M. Cuturi, “Sinkhorn distances: Lightspeed computation of optimal transport,” Advances in Neural Information Processing Systems, vol. 26, pp. 2292–2300, 2013.
  • [10] H. W. Kuhn, “The Hungarian method for the assignment problem,” Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955.
  • [11] G. Peyré and M. Cuturi, “Computational optimal transport: With applications to data science,” Foundations and Trends® in Machine Learning, vol. 11, no. 5-6, pp. 355–607, 2019.
  • [12] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control.   John Wiley & Sons, 2012.
  • [13] W. Kwon and A. Pearson, “On the stabilization of a discrete constant linear system,” IEEE Transactions on Automatic Control, vol. 20, no. 6, pp. 800–801, 1975.
  • [14] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. Scokaert, “Constrained model predictive control: Stability and optimality,” Automatica, vol. 36, no. 6, pp. 789–814, 2000.
  • [15] W. Krabs and S. Pickl, Dynamical Systems: Stability, Controllability and Chaotic Behavior.   Springer-Verlag, 2010.