
Convergence analysis of the discrete consensus-based optimization algorithm with random batch interactions and heterogeneous noises

Dongnam Ko Department of Mathematics, The Catholic University of Korea,
Jibongro 43, Bucheon, Gyeonggido 14662, Republic of Korea
[email protected]
Seung-Yeal Ha Department of Mathematical Sciences and Research Institute of Mathematics,
Seoul National University, Seoul 08826
and School of Mathematics, Korea Institute for Advanced Study,
Hoegiro 85, Seoul 02455, Republic of Korea
[email protected]
Shi Jin School of Mathematical Sciences, Institute of Natural Sciences, and MOE-LSC,
Shanghai Jiao Tong University, Shanghai 200240, China
[email protected]
 and  Doheon Kim School of Mathematics, Korea Institute for Advanced Study,
Hoegiro 85, Seoul 02455, Republic of Korea
[email protected]
Abstract.

We present stochastic consensus and convergence of the discrete consensus-based optimization (CBO) algorithm with random batch interactions and heterogeneous external noises. Despite its wide applications and successful performance in many practical simulations, the convergence of the discrete CBO algorithm had not been rigorously investigated in such generality. In this work, we introduce a generalized discrete CBO algorithm with a weighted representative point and random batch interactions, and show that the proposed discrete CBO algorithm exhibits stochastic consensus and convergence toward a common equilibrium state exponentially fast under suitable assumptions on the system parameters. For this, we recast the given CBO algorithm with random batch interactions as a discrete consensus model with a randomly switching network topology, and then use the mixing property of interactions over a sufficiently long time interval to derive stochastic consensus and convergence estimates in the mean-square and almost-sure senses. Our analysis significantly extends earlier works on the convergence analysis of CBO models with full batch interactions and homogeneous external noises.

Key words and phrases:
Consensus, external noise, interacting particle system, random batch interactions, randomly switching network topology
2010 Mathematics Subject Classification:
37M99, 37N30, 65P99
Acknowledgment. The work of S.-Y. Ha was supported by National Research Foundation of Korea (NRF-2020R1A2C3A01003881), the work of S. Jin was supported by National Natural Science Foundation of China grant 12031013, Shanghai Municipal Science and Technology Major Project 2021SHZDZX0102, and Science and Technology Commission of Shanghai Municipality grant No. 20JC1414100, the work of D. Kim was supported by a KIAS Individual Grant (MG073901) at Korea Institute for Advanced Study, and the work of D. Ko was supported by the Catholic University of Korea, Research Fund, 2021, and by National Research Foundation of Korea (NRF-2021R1G1A1008559).

1. Introduction

Swarm intelligence [4, 22] provides remarkable meta-heuristic, gradient-free optimization methods. Motivated by collective behaviors [1] of herds, flocks, colonies, or schools, several population-based meta-heuristic optimization techniques have been proposed in the literature [29, 30]. To name a few, particle swarm optimization [23], ant colony optimization [13], the genetic algorithm [20] and the grey wolf optimizer [26] are popular meta-heuristic algorithms which have attracted researchers from various disciplines over the last two decades. Among them, the consensus-based optimization (CBO) algorithm [27, 5] is a variant of particle swarm optimization which adopts the idea of a consensus mechanism [7, 9] in a multi-agent system. The mechanism is a dynamic process in which agents (particles) approach a common consensus state via cooperation [3, 6, 8, 15, 16, 17, 18, 19, 21, 28]. The CBO algorithm was further studied as a system of stochastic differential equations [27] and has also been applied to objective functions on constrained search spaces, for example hypersurfaces [15], high-dimensional spaces [6], the Stiefel manifold [19, 24], etc.

In this paper, we are interested in stochastic consensus estimates for a discrete CBO algorithm incorporating two components: “heterogeneous external noises” and “random batch interactions”. In [6], a CBO algorithm with these two aspects was studied numerically, while a previous analytical study of CBO in [18] does not cover these aspects.

We first briefly describe the discrete CBO algorithm in [6, 18]. Suppose L:\mathbb{R}^{d}\to\mathbb{R} is a non-convex objective function to be minimized. As an initial guess, we introduce initial data of particles {\mathbb{x}}_{0}^{i}\in\mathbb{R}^{d} for i=1,\dots,N. These are commonly sampled from random variables, as in Monte Carlo methods, independently of the other system parameters. Then, the following CBO algorithm [6, 18] governs the temporal evolution of each sample point {\mathbb{x}}_{n}^{i} so as to find the optimal value among the values along the trajectories of the particles:

(1.1) \begin{cases}\displaystyle{\mathbb{x}}^{i}_{n+1}={\mathbb{x}}^{i}_{n}-\gamma({\mathbb{x}}^{i}_{n}-{\bar{\mathbb{x}}}_{n}^{*})-\sum_{l=1}^{d}(x^{i,l}_{n}-{\bar{x}}_{n}^{*,l})\eta_{n}^{l}\mathbb{e}_{l},\quad i\in{\mathcal{N}},\\ \displaystyle{\bar{\mathbb{x}}}_{n}^{*}:=\frac{\sum_{j=1}^{N}{\mathbb{x}}^{j}_{n}e^{-\beta L({\mathbb{x}}^{j}_{n})}}{\sum_{j=1}^{N}e^{-\beta L({\mathbb{x}}^{j}_{n})}},\quad n\geq 0,\quad l=1,\cdots,d,\end{cases}

where the random variables \{\eta^{l}_{n}\}_{n,l} are independent and identically distributed (i.i.d.) with

\mathbb{E}[\eta_{n}^{l}]=0,\quad\mathbb{E}[|\eta_{n}^{l}|^{2}]=\zeta^{2},\quad n\geq 0,\quad l=1,\cdots,d.

In particular, based on the current data of the distributed agents, the network interactions share the information of neighbors, and each agent asymptotically approaches the minimizer of the objective function. Hence, the main role of the CBO algorithm is to drive the sample points {\mathbb{x}}_{n}^{i} toward the global minimizer {\mathbb{x}}^{*} of L.
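To fix ideas, the update (1.1) can be sketched in a few lines of code. The following Python fragment is an illustrative sketch only: the function names (weighted_average, cbo_step), the toy objective, the Gaussian choice for the noise eta_n^l, and all parameter values are our own assumptions, not taken from [6, 18].

```python
import numpy as np

def weighted_average(X, L, beta):
    """Gibbs-weighted representative point bar{x}_n^* in (1.1)."""
    vals = np.apply_along_axis(L, 1, X)
    w = np.exp(-beta * (vals - vals.min()))    # shift for numerical stability; ratios unchanged
    return (w[:, None] * X).sum(axis=0) / w.sum()

def cbo_step(X, L, gamma, beta, zeta, rng):
    """One full-batch CBO update (1.1) with homogeneous noise eta_n^l (here Gaussian)."""
    d = X.shape[1]
    x_bar = weighted_average(X, L, beta)
    eta = rng.normal(0.0, zeta, size=d)        # one noise per coordinate l, shared by all particles
    return X - gamma * (X - x_bar) - (X - x_bar) * eta

# toy usage with assumed parameters
rng = np.random.default_rng(0)
L = lambda x: np.sum(x**2)                     # convex toy objective with minimizer 0
X = rng.normal(size=(20, 2))
for n in range(200):
    X = cbo_step(X, L, gamma=0.1, beta=30.0, zeta=0.1, rng=rng)
print(X.mean(axis=0))                          # all particles end up near the minimizer
```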

In the sequel, we describe a more general CBO algorithm which includes (1.1) as a special case. Let {\mathbb{x}}_{n}^{i}=(x_{n}^{i,1},\dots,x_{n}^{i,d})\in\mathbb{R}^{d} be the state of the i-th sample point at time n in the search space, and let [i]_{n} denote the index set of agents interacting with i at time n (the neighboring sample points in the interaction network). For notational simplicity, we also set

{\mathcal{X}}_{n}:=({\mathbb{x}}_{n}^{1},\cdots,{\mathbb{x}}_{n}^{N}),\quad{\mathcal{N}}:=\{1,\cdots,N\}.

For a nonempty set S\subset{\mathcal{N}} and an index j\in{\mathcal{N}}, we introduce a weight function \omega_{S,j}:(\mathbb{R}^{d})^{N}\to\mathbb{R}_{\geq 0} which satisfies the following relations:

\omega_{S,j}({\mathcal{X}})\geq 0,\quad\sum_{j\in S}\omega_{S,j}({\mathcal{X}})=1,\quad\omega_{S,j}({\mathcal{X}})=0~{}\mbox{ if }~{}j\notin S,

and the convex combination of \mathbb{x}_{n}^{j}:

(1.2) {\bar{\mathbb{x}}}_{n}^{S,*}:=\sum_{j=1}^{N}\omega_{S,j}({\mathcal{X}}){\mathbb{x}}_{n}^{j}\in\operatorname{conv}\{{\mathbb{x}}_{n}^{j}:j\in S\},\quad\forall~{}\varnothing\neq S\subseteq{\mathcal{N}}.
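One admissible choice of weight function is the Gibbs weight of (1.1) restricted to the batch S; the sketch below is only an illustration of the constraints above (the helper names gibbs_weight and representative, and the value of beta, are ours).

```python
import numpy as np

def gibbs_weight(S, j, X, L, beta=30.0):
    """omega_{S,j}(X): nonnegative, sums to one over j in S, vanishes for j outside S."""
    if j not in S:
        return 0.0
    num = np.exp(-beta * L(X[j]))
    den = sum(np.exp(-beta * L(X[k])) for k in S)
    return num / den

def representative(S, X, L, beta=30.0):
    """bar{x}^{S,*} in (1.2): convex combination of the particles indexed by S."""
    return sum(gibbs_weight(S, j, X, L, beta) * X[j] for j in S)
```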

Before we introduce our governing CBO algorithm, we briefly discuss two key components (heterogeneous external noises and random batch interactions) one by one.

First, the sequence of random matrices \{(\eta^{i,l}_{n})_{1\leq i\leq N,1\leq l\leq d}\}_{n\geq 0} is assumed to be i.i.d. in n\geq 0, with finite first two moments:

\mathbb{E}[\eta_{n}^{i,l}]=0,\quad\mathbb{E}[|\eta_{n}^{i,l}|^{2}]\leq\zeta^{2},\quad\zeta\geq 0,\quad n\geq 0,\quad l=1,\cdots,d,\quad i\in{\mathcal{N}}.

Second, we discuss the “random batch method (RBM)” of [6, 21], which we use to introduce random batch interactions. For the convergence analysis, we need to clarify how the RBM provides a randomly switching network. At the n-th time instant, we randomly choose a partition {\mathcal{B}}^{n}=\{{\mathcal{B}}_{1}^{n},\ldots,{\mathcal{B}}_{\lceil\frac{N}{P}\rceil}^{n}\} of {\mathcal{N}} into \lceil\frac{N}{P}\rceil batches (subsets) of size at most P>1 as follows:

\{1,\ldots,N\}={\mathcal{B}}_{1}^{n}\cup{\mathcal{B}}_{2}^{n}\cup\cdots\cup{\mathcal{B}}_{\lceil\frac{N}{P}\rceil}^{n},\quad|{\mathcal{B}}_{i}^{n}|=P,\quad i=1,\cdots,\Big{\lceil}\frac{N}{P}\Big{\rceil}-1,\quad|{\mathcal{B}}_{\lceil\frac{N}{P}\rceil}^{n}|\leq P.

Let \mathcal{A} be the set of all such partitions of {\mathcal{N}}. Then, the choices of batches are independent at each time n and follow the uniform law on the finite set of partitions \mathcal{A}. The random variables {\mathcal{B}}^{n}:(\Omega,\mathcal{F},\mathbb{P})\to\mathcal{A} yield random batches at time n for each realization \omega\in\Omega, and they are also assumed to be independent of the initial data \{{\mathbb{x}}_{0}^{i}\}_{i} and of the noises \{\eta_{n}^{i,l}\}_{n,i,l}. Thus, for each time n, the index set of neighbors [i]_{n}\in{\mathcal{B}}^{n} is the element of the partition of {\mathcal{N}} containing i. With the aforementioned key components, we are ready to consider the Cauchy problem for the generalized discrete CBO algorithm under random batch interactions and heterogeneous noises:

(1.3) \begin{cases}\displaystyle{\mathbb{x}}^{i}_{n+1}={\mathbb{x}}^{i}_{n}-\gamma({\mathbb{x}}^{i}_{n}-{\bar{\mathbb{x}}}_{n}^{[i]_{n},*})-\sum_{l=1}^{d}(x^{i,l}_{n}-{\bar{x}}_{n}^{[i]_{n},*,l})\eta_{n}^{i,l}\mathbb{e}_{l},\quad n\geq 0,\\ \displaystyle{\mathcal{X}}_{0}=({\mathbb{x}}_{0}^{1},\cdots,{\mathbb{x}}_{0}^{N})\in\mathbb{R}^{dN},\quad i\in{\mathcal{N}},\end{cases}

where the initial data {\mathcal{X}}_{0} can be deterministic or random, and the drift rate \gamma\in(0,1) is a positive constant. In this way, the dynamics (1.3) incorporates both random batch interactions and heterogeneous noises.
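The two ingredients can be made concrete in a short sketch: draw a uniform random partition into batches of size at most P, and perform one update of (1.3) with per-particle noises. The names random_batches and generalized_cbo_step, the Gaussian noise, and the Gibbs weights inside each batch are our own illustrative choices.

```python
import numpy as np

def random_batches(N, P, rng):
    """A random partition of {0,...,N-1} into batches of size at most P (random batch method)."""
    perm = rng.permutation(N)
    return [perm[k:k + P] for k in range(0, N, P)]

def generalized_cbo_step(X, L, gamma, beta, zeta, P, rng):
    """One step of (1.3): batch-wise representative points and heterogeneous noises eta_n^{i,l}."""
    N, d = X.shape
    X_new = X.copy()
    for batch in random_batches(N, P, rng):
        vals = np.array([L(X[j]) for j in batch])
        w = np.exp(-beta * (vals - vals.min()))                 # Gibbs weights within the batch
        x_bar = (w[:, None] * X[batch]).sum(axis=0) / w.sum()   # bar{x}_n^{[i]_n,*}
        eta = rng.normal(0.0, zeta, size=(len(batch), d))       # independent noise per particle and coordinate
        X_new[batch] = X[batch] - (gamma + eta) * (X[batch] - x_bar)
    return X_new
```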

In [18], the consensus analysis and error estimates for (1.1) have been done, which is actually the following special case of (1.3): for a positive constant β\beta,

(1.4) 𝕩¯n[i]n,=j=1NeβL(𝕩nj)k=1NeβL(𝕩nk)𝕩njandηni,l=ηnl,i𝒩.{\bar{\mathbb{x}}}_{n}^{[i]_{n},*}=\sum_{j=1}^{N}\frac{e^{-\beta L({\mathbb{x}}_{n}^{j})}}{\sum_{k=1}^{N}e^{-\beta L({\mathbb{x}}_{n}^{k})}}{\mathbb{x}}_{n}^{j}\quad\mbox{and}\quad\eta_{n}^{i,l}=\eta_{n}^{l},\quad\forall~{}i\in{\mathcal{N}}.

Here both right-hand sides are independent of i; namely, the full batch P=N is used, every agent interacts with all the other agents, and the noise is the same for all particles. We refer to Section 2.1 for details.

In this paper, we are interested in the following two questions.

  • (Q1: Emergence of stochastic consensus state) Is there a common consensus state 𝕩d{\mathbb{x}}_{\infty}\in\mathbb{R}^{d} for system (1.3), i.e.,

    limn𝕩ni=𝕩in suitable sense,for all i𝒩?\lim_{n\to\infty}{\mathbb{x}}_{n}^{i}={\mathbb{x}}_{\infty}\quad\mbox{in suitable sense,}\quad\mbox{for all~{}$i\in{\mathcal{N}}$}?
  • (Q2: Optimality of the stochastic consensus state) If so, then is the consensus state 𝕩d{\mathbb{x}}_{\infty}\in\mathbb{R}^{d} an optimal point?

    L(𝕩)=inf{L(𝕩):𝕩d}.L({\mathbb{x}}_{\infty})=\inf\{L({\mathbb{x}}):{\mathbb{x}}\in\mathbb{R}^{d}\}.

Among the two questions posed above, we will mostly focus on the first question (Q1). In fact, the continuous analogue of (1.3) has been dealt with in the literature. In the first paper on CBO [3], the CBO algorithm without external noise (\sigma=0) was studied, and the formation of consensus was proved under suitable assumptions on the network structure. However, optimality was only shown by numerical simulation, even in this deterministic case. In [6], the dynamics (1.3) was investigated via a mean-field limit (N\to\infty). In this case, the particle density distribution, which is the limit of the empirical measure \frac{1}{N}\sum_{i}\delta_{{\mathbb{x}}_{n}^{i}} as N\to\infty, satisfies a nonlinear Fokker-Planck equation. By using the analytical structure of the Fokker-Planck equation, a consensus estimate was established. The optimality of the limit value is also estimated by Laplace’s principle, as \beta\to\infty (see Section 2). Rigorous convergence and error estimates for the continuous and discrete algorithms for the particle system (1.3) with fixed N were addressed in the authors’ recent works [17, 18] for the special setting (1.4).

Before we present our main results, we introduce several concepts of stochastic consensus and convergence as follows.

Definition 1.1.

Let {𝒳n}n0={𝕩ni:1iN}n0\{{\mathcal{X}}_{n}\}_{n\geq 0}=\{\mathbb{x}_{n}^{i}:1\leq i\leq N\}_{n\geq 0} be a discrete stochastic process whose dynamics is governed by (1.3). Then, several concepts of stochastic consensus and convergence can be defined as follows.

  1. (1)

    The random process {𝒳n}n0\{{\mathcal{X}}_{n}\}_{n\geq 0} exhibits “consensus in expectation” or “almost sure consensus”, respectively if

    (1.5) limnmaxi,j𝔼𝕩ni𝕩nj=0orlimn𝕩ni𝕩nj=0,i,j𝒩a.s.,\lim_{n\to\infty}\max_{i,j}\mathbb{E}\|{\mathbb{x}}_{n}^{i}-{\mathbb{x}}_{n}^{j}\|=0\quad\mbox{or}\quad\lim_{n\to\infty}\|{\mathbb{x}}_{n}^{i}-{\mathbb{x}}_{n}^{j}\|=0,~{}~{}\forall~{}~{}i,j\in{\mathcal{N}}\quad\mbox{a.s.},

    where \|\cdot\| denotes the standard 2\ell^{2}-norm in d{\mathbb{R}}^{d}.

  2. (2)

    The random process {𝒳n}n0\{{\mathcal{X}}_{n}\}_{n\geq 0} exhibits “convergence in expectation” or “almost sure convergence”, respectively if there exists a random vector 𝕩\mathbb{x}_{\infty} such that

    (1.6) limnmaxi𝔼𝕩ni𝕩=0orlimn𝕩ni𝕩=0,i𝒩a.s.\lim_{n\to\infty}\max_{i}\mathbb{E}\|{\mathbb{x}}_{n}^{i}-{\mathbb{x}}_{\infty}\|=0\quad\mbox{or}\quad\lim_{n\to\infty}\|{\mathbb{x}}_{n}^{i}-{\mathbb{x}}_{\infty}\|=0,~{}~{}\forall~{}~{}i\in{\mathcal{N}}\quad\mbox{a.s.}

In addition, we introduce some terminology. For a given set of N vectors \{{\mathbb{x}}^{i}\}_{i=1}^{N}, we use X to denote the N\times d matrix whose i-th row is ({\mathbb{x}}^{i})^{\top}, and we write its l-th column vector as \mathfrak{x}^{l}:

X:=\left(\begin{array}{ccc}x^{1,1}&\cdots&x^{1,d}\\ \vdots&\ddots&\vdots\\ x^{N,1}&\cdots&x^{N,d}\end{array}\right),\quad{\mathbb{x}}^{i}:=\left(\begin{array}{c}x^{i,1}\\ \vdots\\ x^{i,d}\end{array}\right),\quad{\mathfrak{x}}^{l}:=\left(\begin{array}{c}x^{1,l}\\ \vdots\\ x^{N,l}\end{array}\right),\quad{\mathcal{D}}(\mathfrak{x}^{l}):=\max_{i}x^{i,l}-\min_{i}x^{i,l}.
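In code, these objects are just the rows and columns of a single N-by-d array; the short sketch below fixes this layout and the coordinate-wise diameter (the helper names are ours).

```python
import numpy as np

# X has shape (N, d): row i is x^i = (x^{i,1},...,x^{i,d}); column l is the vector frak{x}^l.
def column(X, l):
    """The l-th coordinate vector frak{x}^l of all particles."""
    return X[:, l]

def diameter(v):
    """D(frak{x}^l) = max_i x^{i,l} - min_i x^{i,l}."""
    return v.max() - v.min()

def max_diameter(X):
    """max_l D(frak{x}^l), which equals max_{i,j} ||x^i - x^j||_infty."""
    return (X.max(axis=0) - X.min(axis=0)).max()
```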

The main results of this paper are two-fold. First, we present stochastic consensus estimates for (1.5) in the sense of Definition 1.1 (1). For this, we begin with the dynamics of the ll-th column vector 𝔵l\mathfrak{x}^{l}:

𝔵n+1l=(An+Bn)𝔵nl,n0{\mathfrak{x}}_{n+1}^{l}=(A_{n}+B_{n}){\mathfrak{x}}_{n}^{l},\quad n\geq 0

for some random matrices A_{n} and B_{n}, where A_{n} encodes the randomness coming from the random batch interactions alone, and B_{n} encodes the interplay between the external noises \eta_{n}^{i,l} and the random batch interactions (see Section 3.1 for details). Then the diameter functional {\mathcal{D}}(\mathfrak{x}^{l}) satisfies

(1.7) 𝒟(𝔵n+1l)(1α(An+Bn))𝒟(𝔵nl),{\mathcal{D}}(\mathfrak{x}_{n+1}^{l})\leq(1-\alpha(A_{n}+B_{n})){\mathcal{D}}(\mathfrak{x}_{n}^{l}),

where α(A)\alpha(A) is the ergodicity coefficient of matrix AA (see Definition 3.1).

For the case (1.4), in which interactions are full batch (P=N) with the common weight function \omega_{\mathcal{N},j} and the noises are homogeneous, one can show that the ergodicity coefficient \alpha(A_{n}+B_{n}) is strictly positive for small \zeta. Then, the recursive relation (1.7) yields the desired exponential consensus. Moreover, the assumption (1.4) also implies that the dynamics of x^{i,l}_{n}-x^{j,l}_{n} follows a closed-form stochastic difference equation (see [18]). Then, following the idea of geometric Brownian motion from [17], the decay of x^{i,l}_{n}-x^{j,l}_{n} can be obtained. However, in our setting (1.3), the aforementioned picture breaks down. In fact, we cannot even guarantee the nonnegativity of \alpha(A_{n}+B_{n}). By the definition of the ergodicity coefficient, a positive ergodicity coefficient implies that any two particles attract each other through the network structure, which is not possible when the particles are separated into different random batches. Hence, the recursive relation (1.7) is not enough to derive an exponential decay of {\mathcal{D}}(\mathfrak{x}_{n}^{l}) as it is. On the other hand, to quantify a sufficient mixing effect of the network topology, we use the m-step transition relation with m\gg 1 so that

0<α(k=nn+m1(Ak+Bk))<1.0<\alpha\Big{(}\prod_{k=n}^{n+m-1}(A_{k}+B_{k})\Big{)}<1.

In this case, the mm-transition relation satisfies

𝔵n+ml=k=nn+m1(Ak+Bk)𝔵nl,n0.{\mathfrak{x}}_{n+m}^{l}=\prod_{k=n}^{n+m-1}(A_{k}+B_{k}){\mathfrak{x}}_{n}^{l},\quad n\geq 0.

Again, we use the elementary property of ergodicity coefficient presented in Lemma 3.3 to find

𝒟(𝔵n+ml)(1α((An+m1+Bn+m1)(An+m2+Bn+m2)(An+Bn)))𝒟(𝔵nl).{\mathcal{D}}(\mathfrak{x}_{n+m}^{l})\leq(1-\alpha((A_{n+m-1}+B_{n+m-1})(A_{n+m-2}+B_{n+m-2})\cdots(A_{n}+B_{n}))){\mathcal{D}}(\mathfrak{x}_{n}^{l}).

One of the crucial novelties of this paper lies in the estimation technique for a product of matrix sums. When there is no noise (B_{k}=0), we can use the analysis of randomly switching topologies [12] to derive the positivity of the ergodicity coefficient. In order to handle heterogeneous external noises, we allow the ergodicity coefficient of general n-by-n matrices to be negative and treat B_{k} as a perturbation (see Lemma 4.1 and Lemma 4.2 for details). This yields an exponential decay of {\mathcal{D}}(\mathfrak{x}^{l}) in suitable stochastic senses for a sufficiently small noise level \zeta (see Theorem 4.1 and Theorem 4.2): there exist positive constants \Lambda_{1}, \Lambda_{2}, C_{1} and a random variable C_{2}=C_{2}(\omega) such that

(1.8) 𝔼𝒟(𝔵nl)C1exp(Λ1n)and𝒟(𝔵nl)C2(ω)exp(Λ2n)a.s.\displaystyle\begin{aligned} &\mathbb{E}\mathcal{D}(\mathfrak{x}_{n}^{l})\leq C_{1}\exp\left(-\Lambda_{1}n\right)\quad\text{and}\quad\mathcal{D}(\mathfrak{x}_{n}^{l})\leq C_{2}(\omega)\exp\left(-\Lambda_{2}n\right)\quad\mbox{a.s.}\end{aligned}

This implies

𝔼𝕩ni𝕩njC1eΛ1nandmaxi,j𝕩ni𝕩njC2(ω)eΛ2na.s.\displaystyle\begin{aligned} &{\mathbb{E}}\|\mathbb{x}_{n}^{i}-\mathbb{x}_{n}^{j}\|\leq C_{1}e^{-\Lambda_{1}n}\quad\text{and}\quad\max_{i,j}\|\mathbb{x}_{n}^{i}-\mathbb{x}_{n}^{j}\|\leq C_{2}(\omega)e^{-\Lambda_{2}n}\quad\mbox{a.s.}\end{aligned}

Second, we deal with the convergence analysis (1.6) of stochastic flow {𝕩ni}\{\mathbb{x}_{n}^{i}\}. One first sees that the dynamics of 𝔵nl\mathfrak{x}_{n}^{l} can be described as

(1.9) 𝔵nl𝔵0l=γk=0n1(INWk)𝔵klk=0n1Hkl(INWk)𝔵kl,1ld,n0.\mathfrak{x}_{n}^{l}-\mathfrak{x}_{0}^{l}=-\gamma\sum_{k=0}^{n-1}\left(I_{N}-W_{k}\right)\mathfrak{x}_{k}^{l}-\sum_{k=0}^{n-1}H_{k}^{l}(I_{N}-W_{k})\mathfrak{x}_{k}^{l},\quad 1\leq l\leq d,~{}n\geq 0.

Then, applying the strong law of large numbers as in [18], the almost sure exponential convergence (1.8) of \mathcal{D}(\mathfrak{x}_{n}^{l}) yields the convergence of the first series on the right-hand side of (1.9). On the other hand, the convergence of the second series will be shown using Doob’s martingale convergence theorem. Thus, for sufficiently small \zeta, there exists a random variable {\mathbb{x}}_{\infty} such that

limn𝕩ni=𝕩a.s.,i𝒩.\lim_{n\to\infty}{\mathbb{x}}_{n}^{i}={\mathbb{x}}_{\infty}\quad\mbox{a.s.,}\quad i\in{\mathcal{N}}.

We refer to Lemma 5.1 for details. Moreover, the above convergence result can be shown to be exponential in expectation and almost sure senses. A natural estimate yields

|xmi,lxni,l|k=nm1(γ+|ηki,l|)𝒟(𝔵kl),|x_{m}^{i,l}-x_{n}^{i,l}|\leq\sum_{k=n}^{m-1}(\gamma+|\eta_{k}^{i,l}|)\mathcal{D}(\mathfrak{x}_{k}^{l}),

however, the external noises \eta_{n}^{i,l} are not uniformly bounded in the almost sure sense. Instead, the time-damped noises \eta_{n}^{i,l}e^{-\varepsilon n} are uniformly bounded for any \varepsilon>0, as presented in Lemma 5.2. This is a key lemma employed in the derivation of exponential convergence: there exist a positive constant C_{3}>0 and a random variable C_{4}(\omega) such that

{\mathbb{E}}\|{\mathbb{x}}_{\infty}-{\mathbb{x}}_{n}^{i}\|\leq C_{3}e^{-\Lambda_{1}n}\quad\mbox{and}\quad\|\mathbb{x}_{\infty}-\mathbb{x}_{n}^{i}\|\leq C_{4}(\omega)e^{-\Lambda_{2}n}\quad\mbox{a.s.}

(See Theorem 5.1 for details).

The rest of this paper is organized as follows. In Section 2, we briefly present the discrete CBO algorithm [6] and review consensus and convergence result [18] for the case of homogeneous external noises. In Section 3, we study stochastic consensus estimates in the absence of external noises so that the randomness in the dynamics is mainly caused by random batch interactions. In Section 4, we consider the stochastic consensus arising from the interplay of two random sources (heterogeneous external noises and random batch interactions) when the standard deviation of heterogeneous external noise is sufficiently small. In Section 5, we provide a stochastic convergence analysis for the discrete CBO flow based on the consensus results. In Section 6, we provide monotonicity of the maximum of the objective function along the discrete CBO flow, and present several numerical simulations to confirm analytical results. Finally, Section 7 is devoted to a brief summary of our main results and some remaining issues to be explored in a future work.

Gallery of notation: We use superscripts to denote the particle number and the components of a vector; for example, x^{i,k} denotes the k-th component of the i-th particle, i.e.,

𝕩i=(xi,1,,xi,d)d,i𝒩.{\mathbb{x}}^{i}=(x^{i,1},\cdots,x^{i,d})\in{\mathbb{R}}^{d},\quad i\in{\mathcal{N}}.

We also use simplified notation for summation and maximization:

i:=i=1N,i,j:=i=1Nj=1N,maxi:=max1iN,maxi,j:=max1i,jN.\sum_{i}:=\sum_{i=1}^{N},\quad\sum_{i,j}:=\sum_{i=1}^{N}\sum_{j=1}^{N},\quad\max_{i}:=\max_{1\leq i\leq N},\quad\max_{i,j}:=\max_{1\leq i,j\leq N}.

For a given state vector 𝔵=(x1,,xN)N{\mathfrak{x}}=(x^{1},\cdots,x^{N})\in\mathbb{R}^{N}, we introduce its state diameter 𝒟(𝔵){\mathcal{D}}({\mathfrak{x}}):

(1.10) 𝒟(𝔵):=maxi,j(xixj)=maxiximinixi.{\mathcal{D}}(\mathfrak{x}):=\max_{i,j}(x^{i}-x^{j})=\max_{i}x^{i}-\min_{i}x^{i}.

For 𝕩=(x1,,xd)d\mathbb{x}=(x^{1},\cdots,x^{d})\in\mathbb{R}^{d}, we denote its \ell^{\infty}-norm by 𝕩:=maxk|xk|\|\mathbb{x}\|_{\infty}:=\displaystyle\max_{k}|x^{k}|.
Lastly, for integers n1n2n_{1}\leq n_{2} and square matrices Mn1,Mn1+1,,Mn2M_{n_{1}},M_{n_{1}+1},\dots,M_{n_{2}} with the same size, we define

k=n1n2Mk:=Mn2Mn1+1Mn1.\prod_{k=n_{1}}^{n_{2}}M_{k}:=M_{n_{2}}\cdots M_{n_{1}+1}M_{n_{1}}.

Note that the ordering in the product is important, because the matrix product is not commutative in general.

2. Preliminaries

In this section, we briefly present the discrete CBO algorithm (1.3) and then recall convergence and optimization results from [18].

2.1. The CBO algorithms

Let {\mathbb{x}}_{t}^{i}=(x_{t}^{i,1},\dots,x_{t}^{i,d})\in\mathbb{R}^{d} be the state of the i-th sample point at time t in the search space, and assume that the state dynamics is governed by the CBO algorithm introduced in [6]:

(2.1) {d𝕩ti=γ(𝕩ti𝕩¯t)dtσl=1d(xti,lx¯t,l)dWti,l𝕖l,t>0,i𝒩,𝕩¯t=(x¯t,1,,x¯t,d):=j=1N𝕩tjeβL(𝕩tj)k=1NeβL(𝕩tk).\begin{cases}\displaystyle d{\mathbb{x}}^{i}_{t}=-\gamma({\mathbb{x}}^{i}_{t}-{\bar{\mathbb{x}}}_{t}^{*})dt-\sigma\sum_{l=1}^{d}(x^{i,l}_{t}-{\bar{x}}_{t}^{*,l})dW_{t}^{i,l}\mathbb{e}_{l},\quad t>0,\quad i\in{\mathcal{N}},\\ \displaystyle{\bar{\mathbb{x}}}_{t}^{*}={(\bar{x}_{t}^{*,1},\cdots,\bar{x}_{t}^{*,d})}:=\frac{\sum_{j=1}^{N}{\mathbb{x}}^{j}_{t}e^{-\beta L({\mathbb{x}}^{j}_{t})}}{\sum_{k=1}^{N}e^{-\beta L({\mathbb{x}}^{k}_{t})}}.\end{cases}

Here \gamma\in(0,1), \sigma and \beta>0 are the drift rate, the noise intensity and the reciprocal temperature, respectively, \{\mathbb{e}_{l}\}_{l=1}^{d} is the standard orthonormal basis of \mathbb{R}^{d}, and W_{t}^{i,l} (l=1,\dots,d,~i=1,\dots,N) are i.i.d. standard one-dimensional Brownian motions.

Note the following discrete analogue of Laplace’s principle [10]:

limβj𝒩𝕩jeβL(𝕩j)j𝒩eβL(𝕩j)=centroid of the set{𝕩j:L(𝕩j)=mink𝒩L(𝕩k)}.\lim_{\beta\to\infty}\frac{\sum_{j\in{\mathcal{N}}}{\mathbb{x}}^{j}e^{-\beta L({\mathbb{x}}^{j})}}{\sum_{j\in{\mathcal{N}}}e^{-\beta L({\mathbb{x}}^{j})}}=\mbox{centroid of the set}~{}\left\{{\mathbb{x}}^{j}~{}:~{}L({\mathbb{x}}^{j})=\min_{k\in{\mathcal{N}}}L({\mathbb{x}}^{k})\right\}.

This explains why CBO algorithm (2.1) is expected to pick up the minimizer of LL.
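This concentration can be checked numerically: as beta grows, the Gibbs-weighted average approaches the best sample point. The snippet below is an illustration with arbitrary sample points and an arbitrary quadratic objective of our choosing.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))                        # arbitrary sample points
L = lambda x: np.sum((x - np.array([0.3, -0.7]))**2)
vals = np.apply_along_axis(L, 1, X)

for beta in [1.0, 10.0, 100.0, 1000.0]:
    w = np.exp(-beta * (vals - vals.min()))         # shift for numerical stability
    x_bar = (w[:, None] * X).sum(axis=0) / w.sum()
    print(beta, x_bar)                              # tends to the best sample point as beta grows
print("argmin sample:", X[vals.argmin()])
```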

Next, we consider time-discretization of (2.1). For this, we set

h:=Δt,𝕩ni:=𝕩nhi,n0,i𝒩.h:=\Delta t,\quad{\mathbb{x}}^{i}_{n}:={\mathbb{x}}^{i}_{nh},\quad n\geq 0,\quad i\in{\mathcal{N}}.

Consider the following three discrete algorithms based on [6, 18]: for n0,i=1,,N,n\geq 0,~{}~{}i=1,\cdots,N,

Model A:𝕩n+1i=𝕩niλh(𝕩ni𝕩¯n)l=1d(xni,lx¯n,l)σhZnl𝕖l,Model B:{𝕩^ni=𝕩¯n+eλh(𝕩ni𝕩¯n),𝕩n+1i=𝕩^nil=1d(x^ni,lx¯n,l)σhZnl𝕖l,Model C:𝕩n+1i=𝕩¯n+l=1d(xni,lx¯n,l)[exp((λ+12σ2)h+σhZnl)]𝕖l,\displaystyle\begin{aligned} &\mbox{Model A}:\quad{\mathbb{x}}^{i}_{n+1}={\mathbb{x}}_{n}^{i}-\lambda h({\mathbb{x}}^{i}_{n}-{\bar{\mathbb{x}}}_{n}^{*})-\sum_{l=1}^{d}(x^{i,l}_{n}-{\bar{x}}_{n}^{*,l})\sigma\sqrt{h}Z_{n}^{l}\mathbb{e}_{l},\\ &\mbox{Model B}:\quad\begin{cases}\displaystyle\hat{\mathbb{x}}^{i}_{n}={\bar{\mathbb{x}}}_{n}^{*}+e^{-\lambda h}({\mathbb{x}}^{i}_{n}-{\bar{\mathbb{x}}}_{n}^{*}),\\ \displaystyle{\mathbb{x}}^{i}_{n+1}=\hat{\mathbb{x}}_{n}^{i}-\sum_{l=1}^{d}(\hat{x}^{i,l}_{n}-{\bar{x}}_{n}^{*,l})\sigma\sqrt{h}Z_{n}^{l}\mathbb{e}_{l},\end{cases}\\ &\mbox{Model C}:\quad{\mathbb{x}}^{i}_{n+1}={\bar{\mathbb{x}}}_{n}^{*}+\sum_{l=1}^{d}(x^{i,l}_{n}-{\bar{x}}_{n}^{*,l})\left[\exp\left(-\left(\lambda+\frac{1}{2}\sigma^{2}\right)h+\sigma\sqrt{h}Z_{n}^{l}\right)\right]\mathbb{e}_{l},\end{aligned}

where the random variables \{Z^{l}_{n}\}_{n,l} are i.i.d. standard normal random variables. Model A is precisely the Euler-Maruyama scheme applied to (2.1), and Models B and C are other discretizations of (2.1) introduced in [6]. In [18], it is shown that the three models are special cases (with different choices of \gamma and \eta_{n}^{l}) of the following generalized scheme:

(2.2) {𝕩n+1i=𝕩niγ(𝕩ni𝕩¯n)l=1d(xni,lx¯n,l)ηnl𝕖l,i𝒩,𝕩¯n:=j=1N𝕩njeβL(𝕩nj)j=1NeβL(𝕩nj),n0,l=1,,d,\begin{cases}\displaystyle{\mathbb{x}}^{i}_{n+1}={\mathbb{x}}^{i}_{n}-\gamma({\mathbb{x}}^{i}_{n}-{\bar{\mathbb{x}}}_{n}^{*})-\sum_{l=1}^{d}(x^{i,l}_{n}-{\bar{x}}_{n}^{*,l})\eta_{n}^{l}\mathbb{e}_{l},\quad i\in{\mathcal{N}},\\ \displaystyle{\bar{\mathbb{x}}}_{n}^{*}:=\frac{\sum_{j=1}^{N}{\mathbb{x}}^{j}_{n}e^{-\beta L({\mathbb{x}}^{j}_{n})}}{\sum_{j=1}^{N}e^{-\beta L({\mathbb{x}}^{j}_{n})}},\quad n\geq 0,\quad l=1,\cdots,d,\end{cases}

where the random variables {ηnl}n,l\{\eta^{l}_{n}\}_{n,l} are i.i.d. with

𝔼[ηnl]=0,𝔼[|ηnl|2]=ζ2,n0,l=1,,d.\mathbb{E}[\eta_{n}^{l}]=0,\quad\mathbb{E}[|\eta_{n}^{l}|^{2}]=\zeta^{2},\quad n\geq 0,\quad l=1,\cdots,d.

Moreover, discrete algorithm (2.2) corresponds to the special case of (1.3):

S:=𝒩,ω𝒩,j=eβL(𝕩j)k=1NeβL(𝕩k),F𝒩(X):=j=1N𝕩jeβL(𝕩j)k=1NeβL(𝕩k),ηnlηni,l.{S}:={\mathcal{N}},\qquad\omega_{{\mathcal{N}},j}=\frac{e^{-\beta L({\mathbb{x}}^{j})}}{\sum_{k=1}^{N}e^{-\beta L({\mathbb{x}}^{k})}},\quad F_{{\mathcal{N}}}(X):=\sum_{j=1}^{N}\frac{{\mathbb{x}}^{j}e^{-\beta L({\mathbb{x}}^{j})}}{\sum_{k=1}^{N}e^{-\beta L({\mathbb{x}}^{k})}},\quad\eta_{n}^{l}\equiv\eta_{n}^{i,l}.

Therefore, these three models can be explained by the dynamics (1.3).
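For the reader's convenience, the three discretizations can be transcribed side by side. The fragment below is a schematic rendering (our function names): X is the N-by-d state array, x_bar the Gibbs average of (2.2), and Z a standard normal array of shape (d,) playing the role of Z_n^l.

```python
import numpy as np

def model_A(X, x_bar, lam, sigma, h, Z):
    """Euler-Maruyama step for (2.1); in (2.2), gamma = lam*h and eta^l = sigma*sqrt(h)*Z^l."""
    return X - lam * h * (X - x_bar) - sigma * np.sqrt(h) * Z * (X - x_bar)

def model_B(X, x_bar, lam, sigma, h, Z):
    """Exact drift relaxation toward x_bar, followed by the multiplicative noise step."""
    X_hat = x_bar + np.exp(-lam * h) * (X - x_bar)
    return X_hat - sigma * np.sqrt(h) * Z * (X_hat - x_bar)

def model_C(X, x_bar, lam, sigma, h, Z):
    """Coordinate-wise geometric-Brownian-motion type update around x_bar."""
    factor = np.exp(-(lam + 0.5 * sigma**2) * h + sigma * np.sqrt(h) * Z)
    return x_bar + factor * (X - x_bar)
```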

2.2. Previous results

In this subsection, we briefly summarize the previous results [18] on the convergence and error estimates for the case

ηni,lηnl\eta_{n}^{i,l}\equiv\eta_{n}^{l}

(thus noises are not heterogeneous). In the next theorem, we recall the convergence estimate of the discrete scheme (2.2) as follows.

Theorem 2.1.

(Convergence estimate [18]) Suppose system parameters γ\gamma and ζ\zeta satisfy

(γ1)2+ζ2<1,(\gamma-1)^{2}+\zeta^{2}<1,

and let \{{\mathbb{x}}_{n}^{i}\} be a solution process to (2.2). Then, one has the following assertions:

  1. (1)

    Consensus in expectation and in almost sure sense emerge asymptotically:

    limn𝔼|𝕩ni𝕩nj|2=0,|xni,lxnj,l|2|x0i,lx0j,l|2enYnl,a.s.ωΩ,\lim_{n\to\infty}\mathbb{E}|{\mathbb{x}}^{i}_{n}-{\mathbb{x}}^{j}_{n}|^{2}=0,\quad|x^{i,l}_{n}-x^{j,l}_{n}|^{2}\leq|x^{i,l}_{0}-x^{j,l}_{0}|^{2}e^{-nY_{n}^{l}},\quad\mbox{a.s.}~{}\omega\in\Omega,

    for i,j𝒩,l=1,,d,i,j\in{\mathcal{N}},~{}~{}l=1,\cdots,d, and a random variable YnlY_{n}^{l} satisfying

    limnYnl(ω)=1(γ1)2ζ2>0,a.s.ωΩ.\lim_{n\to\infty}Y_{n}^{l}(\omega)=1-(\gamma-1)^{2}-\zeta^{2}>0,\quad\mbox{a.s.}~{}\omega\in\Omega.
  2. (2)

    There exists a common random vector 𝕩=(x1,,xd){\mathbb{x}}_{\infty}=(x_{\infty}^{1},\cdots,x_{\infty}^{d}) such that

    limn𝕩ni=𝕩a.s.,i𝒩.\lim\limits_{n\to\infty}{\mathbb{x}}_{n}^{i}={\mathbb{x}}_{\infty}~{}\mbox{a.s.},\quad i\in{\mathcal{N}}.
Remark 2.1.

In [18], an error estimate of (2.2) toward the global minimum is also obtained as a partial result on optimality. If the initial guesses are good enough, in the sense that they surround the global minimizer {\mathbb{x}}_{*} and the particles are close to each other, then the asymptotic consensus state {\mathbb{x}}_{\infty} is close to {\mathbb{x}}_{*}.

3. Discrete CBO algorithm with random batch interactions

In this section, we present a matrix formulation of (1.3), some elementary estimates for the ergodicity coefficient, and a stochastic consensus result for system (1.3) in the absence of heterogeneous external noises.

3.1. A matrix reformulation

In this subsection, we present a matrix formulation of the following discrete CBO model:

(3.1) 𝕩n+1i=𝕩niγ(𝕩ni𝕩¯n[i]n,)l=1d(xni,lx¯n[i]n,,l)ηni,l𝕖l,n0,i𝒩.{\mathbb{x}}^{i}_{n+1}={\mathbb{x}}^{i}_{n}-\gamma({\mathbb{x}}^{i}_{n}-{\bar{\mathbb{x}}}_{n}^{[i]_{n},*})-\sum_{l=1}^{d}(x^{i,l}_{n}-{\bar{x}}_{n}^{[i]_{n},*,l})\eta_{n}^{i,l}\mathbb{e}_{l},\quad n\geq 0,\quad i\in{\mathcal{N}}.

Then, the ll-th component of (3.1) can be rewritten as

(3.2) xn+1i,l=(1γηni,l)xni,l+(γ+ηni,l)j[i]nω[i]n,j(Xn)xnj,l.x^{i,l}_{n+1}=(1-\gamma-\eta_{n}^{i,l})x^{i,l}_{n}+(\gamma+\eta_{n}^{i,l})\sum_{j\in[i]_{n}}\omega_{[i]_{n},j}(X_{n})x_{n}^{j,l}.

For n0n\geq 0 and l=1,,dl=1,\cdots,d, set

(3.3) Wn:=(ω[i]n,j(Xn))N×N,𝔵nl:=(xn1,l,,xnN,l),Hnl:=diag(ηn1,l,,ηnN,l).W_{n}:=(\omega_{[i]_{n},j}(X_{n}))\in\mathbb{R}^{N\times N},\quad{\mathfrak{x}}_{n}^{l}:=(x_{n}^{1,l},\dots,x_{n}^{N,l})^{\top},\quad H_{n}^{l}:=\operatorname{diag}(\eta_{n}^{1,l},\dots,\eta_{n}^{N,l}).

Note that {\mathfrak{x}}_{n}^{l} denotes the l-th column of the matrix X_{n}. Then system (3.2) can be rewritten in the following matrix form:

{\mathfrak{x}}_{n+1}^{l}=(A_{n}+B_{n}){\mathfrak{x}}_{n}^{l},\quad A_{n}:=(1-\gamma)I_{N}+\gamma W_{n},\quad B_{n}:=-H_{n}^{l}(I_{N}-W_{n}),\quad n\geq 0.

The randomness of A_{n} is due to the random switching of the network topology via random batch interactions. In contrast, B_{n} accounts for the external noises H_{n}^{l}. In the absence of external noise (\zeta=0), one has H_{n}^{l}=0 for all n\geq 0 and system (3.1) reduces to the stochastic system:

(3.4) 𝔵n+1l=An𝔵nl,n0.{\mathfrak{x}}_{n+1}^{l}=A_{n}{\mathfrak{x}}_{n}^{l},\quad n\geq 0.
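Concretely, W_n is a row-stochastic matrix whose i-th row carries the weights of the batch containing i, and the matrix form can be checked against the component-wise update (3.2). The construction below is a sketch with our own helper names and Gibbs weights, using the sign convention B_n = -H_n^l(I_N - W_n).

```python
import numpy as np

def weight_matrix(batches, X, L, beta=30.0):
    """W_n = (omega_{[i]_n,j}(X_n)): every row i within a batch holds that batch's Gibbs weights."""
    N = X.shape[0]
    W = np.zeros((N, N))
    for batch in batches:
        vals = np.array([L(X[j]) for j in batch])
        w = np.exp(-beta * (vals - vals.min()))
        W[np.ix_(batch, batch)] = w / w.sum()       # identical rows inside the batch block
    return W

def transition_matrices(W, eta_l, gamma):
    """A_n = (1-gamma) I_N + gamma W_n and B_n = -H_n^l (I_N - W_n) for one fixed coordinate l."""
    N = W.shape[0]
    A = (1.0 - gamma) * np.eye(N) + gamma * W
    B = -np.diag(eta_l) @ (np.eye(N) - W)
    return A, B

# Then frak{x}^l_{n+1} = (A + B) @ frak{x}^l_n reproduces (3.2) coordinate by coordinate.
```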

In the next lemma, we study several properties of AnA_{n}.

Lemma 3.1.

Let AnA_{n} be a matrix defined in (3.1). Then, it satisfies the following two properties:

  1. (1)

    AnA_{n} is row-stochastic:

    [An]ij0,j=1N[An]ij=1,i𝒩,[A_{n}]_{ij}\geq 0,\quad\sum_{j=1}^{N}[A_{n}]_{ij}=1,\quad\forall~{}i\in{\mathcal{N}},

    where [An]ij[A_{n}]_{ij} is the (i,j)(i,j)-th element of the matrix AnN×NA_{n}\in\mathbb{R}^{N\times N}.

  2. (2)

    The product An+mAnA_{n+m}\cdots A_{n} is row-stochastic:

[A_{n+m}\cdots A_{n}]_{ij}\geq 0,\quad\sum_{j=1}^{N}[A_{n+m}\cdots A_{n}]_{ij}=1.
Proof.

(1) By (3.3) and (3.1), one has

[An]ij=[(1γ)IN+γWn]ij=(1γ)δij+γω[i]n,j(Xn)0,[A_{n}]_{ij}=[(1-\gamma)I_{N}+\gamma W_{n}]_{ij}=(1-\gamma)\delta_{ij}+\gamma\omega_{[i]_{n},j}(X_{n})\geq 0,

where we used the fact γ(0,1)\gamma\in(0,1). This also yields

j[An]ij=j(1γ)δij+γω[i]n,j(Xn)=(1γ)jδij+γjω[i]n,j(Xn)=1.\sum_{j}[A_{n}]_{ij}=\sum_{j}(1-\gamma)\delta_{ij}+\gamma\omega_{[i]_{n},j}(X_{n})=(1-\gamma)\sum_{j}\delta_{ij}+\gamma\sum_{j}\omega_{[i]_{n},j}(X_{n})=1.

(2) The second assertion holds since the product of two row-stochastic matrices is also row-stochastic.

Note that the asymptotic dynamics of 𝕩n{\mathbb{x}}_{n} is completely determined by the matrix sequence {An}n0\{A_{n}\}_{n\geq 0} in (3.4).

3.2. Ergodicity coefficient

In this subsection, we recall the concept of ergodicity coefficient and investigate its basic properties which will be crucially used in later sections.

First, we recall the ergodicity coefficient of a matrix AN×NA\in\mathbb{R}^{N\times N} as follows.

Definition 3.1.

[2] For A=(aij)N×NA=(a_{ij})\in\mathbb{R}^{N\times N}, we define the ergodicity coefficient of AA as

(3.5) α(A):=mini,jk=1Nmin{aik,ajk}.\alpha(A):=\min_{i,j}\sum_{k=1}^{N}\min\{a_{ik},a_{jk}\}.
Remark 3.1.

In [7], the ergodicity coefficient (3.5) is introduced to study the emergent dynamics of systems of the form (3.4).
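Definition 3.1 translates directly into a few lines; the sketch below (our naming) also checks Lemma 3.3 numerically for a random row-stochastic matrix.

```python
import numpy as np

def ergodicity_coefficient(A):
    """alpha(A) = min_{i,j} sum_k min(a_ik, a_jk), as in Definition 3.1."""
    N = A.shape[0]
    return min(np.minimum(A[i], A[j]).sum() for i in range(N) for j in range(N))

# quick numerical check of Lemma 3.3 with row sums a = 1
rng = np.random.default_rng(2)
A = rng.random((5, 5))
A /= A.sum(axis=1, keepdims=True)                   # row-stochastic
z = rng.normal(size=5)
Az = A @ z
assert Az.max() - Az.min() <= (1.0 - ergodicity_coefficient(A)) * (z.max() - z.min()) + 1e-12
```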

As a direct application of (3.5), one has the super-additivity and monotonicity for α\alpha.

Lemma 3.2.

Let A,BA,B be square matrices in N×N\mathbb{R}^{N\times N}. Then, the ergodicity coefficient has the following assertions:

  1. (1)

    (Super-additivity and homogeneity):

    α(A+B)α(A)+α(B)andα(tA)=tα(A),t0.\alpha(A+B)\geq\alpha(A)+\alpha(B)\quad\text{and}\quad\alpha(tA)=t\alpha(A),\quad t\geq 0.
  2. (2)

    (Monotonicity):

    ABentry-wiseα(A)α(B).A\geq B~{}\mbox{entry-wise}\quad\Longrightarrow\quad\alpha(A)\geq\alpha(B).
Proof.

All the estimates follow directly from (3.5). Hence, we omit the details. ∎

Next, we recall a key tool that will be crucially used in the consensus estimates for (3.1).

Lemma 3.3.

[2] Let AN×NA\in\mathbb{R}^{N\times N} be a square matrix with equal row sums:

\sum_{j=1}^{N}a_{ij}=a,\quad\forall~{}i\in{\mathcal{N}}.

Then, we have

𝒟(A𝕫)(aα(A))𝒟(𝕫),𝕫N.\mathcal{D}(A\mathbb{z})\leq(a-\alpha(A))\mathcal{D}(\mathbb{z}),\qquad\forall~{}\mathbb{z}\in\mathbb{R}^{N}.
Proof.

A proof using induction on N can be found in [2], but here we present a different approach. Fix i,j\in{\mathcal{N}} and set m_{k}:=\min\{a_{ik},a_{jk}\}, so that a_{ik}-m_{k}\geq 0 and a_{jk}-m_{k}\geq 0 for all k. By the definition of the state diameter (1.10), one has

(A\mathbb{z})_{i}-(A\mathbb{z})_{j}=\sum_{k=1}^{N}(a_{ik}-m_{k})z^{k}-\sum_{k=1}^{N}(a_{jk}-m_{k})z^{k}\leq\Big{(}a-\sum_{k=1}^{N}m_{k}\Big{)}\Big{(}\max_{k}z^{k}-\min_{k}z^{k}\Big{)}\leq(a-\alpha(A))\mathcal{D}(\mathbb{z}).

Taking the maximum over the pair (i,j) yields \mathcal{D}(A\mathbb{z})\leq(a-\alpha(A))\mathcal{D}(\mathbb{z}). ∎

Next, we study elementary lemmas for the emergence of consensus. From Definition 1.1, for consensus, it suffices to show

limn𝒟(𝔵nl)=0,l.\lim_{n\to\infty}{\mathcal{D}}(\mathfrak{x}_{n}^{l})=0,\quad\forall~{}l.

We apply Lemma 3.3 for (3.4) and row-stochasticity of AnA_{n} to get

𝒟(𝔵n+1l)(1α(An))𝒟(𝔵nl).\displaystyle{\mathcal{D}}(\mathfrak{x}_{n+1}^{l})\leq(1-\alpha(A_{n})){\mathcal{D}}(\mathfrak{x}_{n}^{l}).

However, note that the ergodicity coefficient α(An)\alpha(A_{n}) may not be strictly positive. For m1m\geq 1, we iterate (3.4) mm-times to get

𝔵n+ml=An+m1An𝔵nl,n0.{\mathfrak{x}}_{n+m}^{l}=A_{n+m-1}\cdots A_{n}{\mathfrak{x}}_{n}^{l},\quad n\geq 0.

Since the product An+m1AnA_{n+m-1}\cdots A_{n} is also row-stochastic (see Lemma 3.1), we can apply Lemma 3.3 to find

𝒟(𝔵n+ml)(1α(An+m1An+m2An))𝒟(𝔵nl).{\mathcal{D}}(\mathfrak{x}_{n+m}^{l})\leq(1-\alpha(A_{n+m-1}A_{n+m-2}\cdots A_{n})){\mathcal{D}}(\mathfrak{x}_{n}^{l}).

Then, the ergodicity coefficient α(An+m1An+m2An)\alpha(A_{n+m-1}A_{n+m-2}\cdots A_{n}) depends on the network structure between Wn+m1,Wn+m2,,WnW_{n+m-1},W_{n+m-2},\ldots,W_{n}, and the probability of α\alpha being strictly positive increases, as mm grows.

In [7], the consensus 𝒟(𝔵nl)0\mathcal{D}(\mathfrak{x}_{n}^{l})\to 0 (n)(n\to\infty) was guaranteed under the assumption:

(3.6) \displaystyle\sum_{k=0}^{\infty}\alpha(A_{t_{k+1}-1}A_{t_{k+1}-2}\dots A_{t_{k}})=\infty\quad\mbox{for some}\quad 0=t_{0}<t_{1}<\dots

However, examples of network topologies (which may vary in time, possibly randomly) that satisfy the aforementioned a priori assumption (3.6) have not been well studied yet. Below, we show that random batch interactions provide a network topology satisfying the a priori condition (3.6).

First, we introduce a random variable measuring the degree of network connectivity. For a given pair (i,j)𝒩×𝒩(i,j)\in{\mathcal{N}}\times{\mathcal{N}}, we consider the set of all time instant rr within the time zone [n,n+m)[n,n+m) such that ii and jj are in the same batch ([i]r=[j]r)([i]_{r}=[j]_{r}). More precisely, we introduce

(3.7) 𝒯[n,n+m)ij:={nr<n+m:[i]r=[j]r},𝒢[n,n+m):=mini,j|𝒯[n,n+m)ij|.{\mathcal{T}}^{ij}_{[n,n+m)}:=\{n\leq r<n+m~{}:~{}[i]_{r}=[j]_{r}\},\qquad{\mathcal{G}}_{[n,n+m)}:=\min_{i,j}\big{|}{\mathcal{T}}^{ij}_{[n,n+m)}\big{|}.

Note that 𝒯[n,n+m)ij{\mathcal{T}}^{ij}_{[n,n+m)} is non-empty if one batch contains both ii and jj for some instant in {n,n+1,,n+m1}\{n,n+1,\dots,n+m-1\}. In the following lemma, we will estimate ergodicity coefficients from random batch interactions.
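The quantity 𝒢 can be computed directly from the realized partitions ℬ^n, ..., ℬ^{n+m-1}; the sketch below (our naming) counts, for the worst pair (i,j), how often the two indices fall into the same batch.

```python
import numpy as np

def connectivity(partitions, N):
    """G_{[n,n+m)}: minimum over pairs (i,j) of the number of steps r with [i]_r = [j]_r."""
    count = np.zeros((N, N), dtype=int)
    for part in partitions:            # part is one partition: a list of batches (index arrays)
        for batch in part:
            for i in batch:
                for j in batch:
                    count[i, j] += 1
    return int(count.min())

# example: m = 10 random partitions of N = 6 indices into batches of size P = 2
rng = np.random.default_rng(3)
N, P, m = 6, 2, 10
parts = [np.array_split(rng.permutation(N), N // P) for _ in range(m)]
print(connectivity(parts, N))
```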

Lemma 3.4.

Let AnA_{n} be the transition matrix in (3.1). Then, for given integers m0m\geq 0, n0n\geq 0 and l=1,,dl=1,\dots,d,

(3.8) α(An+m1An+m2An)γ(1γ)m1𝒢[n,n+m).\alpha(A_{n+m-1}A_{n+m-2}\cdots A_{n})\geq\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[n,n+m)}.
Proof.

We derive the estimate (3.8) via two steps:

(3.9) α(An+m1An+m2An)γ(1γ)m1α(r=nn+m1Wr)γ(1γ)m1𝒢[n,n+m).\alpha(A_{n+m-1}A_{n+m-2}\cdots A_{n})\geq\gamma(1-\gamma)^{m-1}\alpha\Bigg{(}\sum_{r={n}}^{n+m-1}W_{r}\Bigg{)}\geq\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[n,n+m)}.

\bullet Step A (Derivation of the first inequality): Recall that

An=(1γ)IN+γWn.A_{n}=(1-\gamma)I_{N}+\gamma W_{n}.

Then, it follows from Lemma 3.2 that

(3.10) α(An+m1An+m2An)=α([(1γ)IN+γWn+m1][(1γ)IN+γWn])=α((1γ)mIN+γ(1γ)m1r=nn+m1Wr++γmWn+m1Wn)α(γ(1γ)m1r=nn+m1Wr)=γ(1γ)m1α(r=nn+m1Wr).\displaystyle\begin{aligned} &\alpha(A_{n+m-1}A_{n+m-2}\cdots A_{n})\\ &\hskip 14.22636pt=\alpha\Big{(}\left[(1-\gamma)I_{N}+\gamma W_{n+m-1}\right]\dots\left[(1-\gamma)I_{N}+\gamma W_{n}\right]\Big{)}\\ &\hskip 14.22636pt=\alpha\Bigg{(}(1-\gamma)^{m}I_{N}+\gamma(1-\gamma)^{m-1}\sum_{r={n}}^{n+m-1}W_{r}+\cdots+\gamma^{m}W_{n+m-1}\cdots W_{n}\Bigg{)}\\ &\hskip 14.22636pt\geq\alpha\Bigg{(}\gamma(1-\gamma)^{m-1}\sum_{r={n}}^{n+m-1}W_{r}\Bigg{)}=\gamma(1-\gamma)^{m-1}\alpha\Bigg{(}\sum_{r={n}}^{n+m-1}W_{r}\Bigg{)}.\end{aligned}

\bullet Step B (Derivation of the second inequality): We use Wr:=(ω[i]r,j(Xr))W_{r}:=(\omega_{[i]_{r},j}(X_{r})) and Definition 3.1 to see

(3.11) α(r=nn+m1Wr)=mini,jk=1Nmin{r=nn+m1ω[i]r,k(Xr),r=nn+m1ω[j]r,k(Xr)}mini,jk=1Nmin{nr<n+m:[i]r=[j]rω[i]r,k(Xr),nr<n+m:[i]r=[j]rω[j]r,k(Xr)}=mini,jk=1Nnr<n+m:[i]r=[j]rω[i]r,k(Xr)=mini,jnr<n+m:[i]r=[j]r1=mini,j|{nr<n+m:[i]r=[j]r}|.\displaystyle\begin{aligned} \alpha\Bigg{(}\sum_{r={n}}^{n+m-1}W_{r}\Bigg{)}&=\min_{i,j}\sum_{k=1}^{N}\min\Bigg{\{}\sum_{r=n}^{n+m-1}\omega_{[i]_{r},k}(X_{r}),\sum_{r=n}^{n+m-1}\omega_{[j]_{r},k}(X_{r})\Bigg{\}}\\ &\geq\min_{i,j}\sum_{k=1}^{N}\min\Bigg{\{}\sum_{\begin{subarray}{c}n\leq r<n+m:\\ [i]_{r}=[j]_{r}\end{subarray}}\omega_{[i]_{r},k}(X_{r}),\sum_{\begin{subarray}{c}n\leq r<n+m:\\ [i]_{r}=[j]_{r}\end{subarray}}\omega_{[j]_{r},k}(X_{r})\Bigg{\}}\\ &=\min_{i,j}\sum_{k=1}^{N}\sum_{\begin{subarray}{c}n\leq r<n+m:\\ [i]_{r}=[j]_{r}\end{subarray}}\omega_{[i]_{r},k}(X_{r})=\min_{i,j}\sum_{\begin{subarray}{c}n\leq r<n+m:\\ [i]_{r}=[j]_{r}\end{subarray}}1\\ &=\min_{i,j}\left|\{n\leq r<n+m:[i]_{r}=[j]_{r}\}\right|.\end{aligned}

Finally we combine (3.10) and (3.11) to derive (3.9). ∎

Remark 3.2.

Below, we provide two comments on the result of Lemma 3.4.

  1. (1)

Lemma 3.4 says that, if the union of the network structures from time r=n to time r=n+m-1 forms an all-to-all connected network, then the ergodicity coefficient \alpha(A_{n+m-1}A_{n+m-2}\cdots A_{n}) is positive. To make the ergodicity coefficient positive, the union network need not be all-to-all; however, the estimation of the lower bound of \alpha could then be more difficult.

  2. (2)

    The diameter D(𝔵nl)D({\mathfrak{x}}^{l}_{n}) satisfies

    𝒟(𝔵n+ml)(1γ(1γ)m1𝒢[n,n+m))𝒟(𝔵nl),n,m0{\mathcal{D}}(\mathfrak{x}_{n+m}^{l})\leq\Big{(}1-\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[n,n+m)}\Big{)}{\mathcal{D}}(\mathfrak{x}_{n}^{l}),\quad n,m\geq 0

3.3. Almost sure consensus

In this subsection, we provide a stochastic consensus result when the stochastic noises are turned off, i.e., ζ=0\zeta=0. For this, we choose a suitable mm such that 𝒢[0,m){\mathcal{G}}_{[0,m)} has a strictly positive expectation.

Lemma 3.5.

The following assertions hold.

  1. (1)

There exist m_{0}\in\mathbb{N} and partitions {\mathcal{P}}^{1},\dots,{\mathcal{P}}^{m_{0}} of {\mathcal{N}} in \mathcal{A} such that, for any i,j\in{\mathcal{N}}, one has i,j\in S for some 1\leq k\leq m_{0} and some batch S\in{\mathcal{P}}^{k}.

  2. (2)

    For n0n\geq 0,

    (3.12) 𝔼[𝒢[n,n+m0)]=𝔼[𝒢[0,m0)]=:pm0>0.{\mathbb{E}}\left[{\mathcal{G}}_{[n,n+m_{0})}\right]={\mathbb{E}}\left[{\mathcal{G}}_{[0,m_{0})}\right]=:p_{m_{0}}>0.
Proof.

(1) First, the existence of m_{0} and of the partitions is clear, since we may take all partitions {\mathcal{P}}^{1},\dots,{\mathcal{P}}^{m_{1}} of \{1,\dots,N\} in \mathcal{A}; in other words, we may choose m_{0} to be m_{1}:=|\mathcal{A}|.

(2)  Note that the random choices of batches n{\mathcal{B}}^{n} (n0)(n\geq 0) are independent and identically distributed. Hence, the expectations of 𝒢[0,m0){\mathcal{G}}_{[0,m_{0})} and 𝒢[n,n+m0){\mathcal{G}}_{[n,n+m_{0})} should be identical:

𝔼[𝒢[n,n+m0)]=𝔼[𝒢[0,m0)]=:pm0.{\mathbb{E}}\left[{\mathcal{G}}_{[n,n+m_{0})}\right]={\mathbb{E}}\left[{\mathcal{G}}_{[0,m_{0})}\right]=:p_{m_{0}}.

Next, let 𝒫1,,𝒫m0{\mathcal{P}}^{1},\dots,{\mathcal{P}}^{m_{0}} be the partitions chosen in (1). Note that the probability of having 1=𝒫1{\mathcal{B}}^{1}={\mathcal{P}}^{1} depends on m1=|𝒜|m_{1}=|\mathcal{A}|: for any i,ji,j,

{i=𝒫j}=1/m1.{\mathbb{P}}\left\{{\mathcal{B}}^{i}={\mathcal{P}}^{j}\right\}=1/m_{1}.

Therefore, we may proceed to give a rough estimate on 𝒢[0,m0){\mathcal{G}}_{[0,m_{0})}:

{\mathbb{E}}\left[{\mathcal{G}}_{[0,m_{0})}\right]\geq{\mathbb{P}}\left\{{\mathcal{G}}_{[0,m_{0})}\geq 1\right\}\geq{\mathbb{P}}\left\{\text{for each }1\leq j\leq m_{0},~{\mathcal{B}}^{i}={\mathcal{P}}^{j}\text{ for some }0\leq i\leq m_{0}-1\right\}\geq\prod_{j=1}^{m_{0}}{\mathbb{P}}\left\{{\mathcal{B}}^{j-1}={\mathcal{P}}^{j}\right\}=1/(m_{1})^{m_{0}}>0.

This gives pm0>0.p_{m_{0}}>0.

Now we are ready to provide a stochastic consensus estimate on the dynamics (3.1) in the absence of noise, ζ=0\zeta=0.

Theorem 3.1.

Let {𝕩ni}{\{\mathbb{x}}^{i}_{n}\} be a solution process to (1.3) with ηni,l=0\eta_{n}^{i,l}=0 a.s. for any n,i,ln,i,l, and let k0k\geq 0. Then, for n[km0,(k+1)m0)n\in[km_{0},(k+1)m_{0}), there exists a nonnegative random variable Λ0=Λ0(m0,k)\Lambda_{0}=\Lambda_{0}(m_{0},k) such that

(3.13) (i)𝒟(𝔵nl)𝒟(𝔵0l)exp(Λ0k),a.s.,(ii)maxi,j𝕩ni𝕩njmaxi,j𝕩0i𝕩0jexp(Λ0k),a.s.,\displaystyle\begin{aligned} &(i)~{}{\mathcal{D}}({\mathfrak{x}}^{l}_{n})\leq{\mathcal{D}}({\mathfrak{x}}^{l}_{0})\exp(-\Lambda_{0}k),\quad\mbox{a.s.},\\ &(ii)~{}\max_{i,j}\|{\mathbb{x}}^{i}_{n}-{\mathbb{x}}^{j}_{n}\|_{\infty}\leq\max_{i,j}\|{\mathbb{x}}^{i}_{0}-{\mathbb{x}}^{j}_{0}\|_{\infty}\exp(-\Lambda_{0}k),\quad\mbox{a.s.,}\end{aligned}

where the decay exponent Λ0(m0,k)\Lambda_{0}(m_{0},k) satisfies

(3.14) limkΛ0(m0,k)=pm0γ(1γ)m01a.s.\lim_{k\to\infty}\Lambda_{0}(m_{0},k)=p_{m_{0}}\gamma(1-\gamma)^{m_{0}-1}\quad\mbox{a.s.}
Proof.

(i) We claim

(3.15) 𝒟(𝔵nl)𝒟(𝔵km0l)𝒟(𝔵0l)exp[γ(1γ)m01(1ks=1k𝒢[(s1)m0,sm0))k].{\mathcal{D}}({\mathfrak{x}^{l}_{n}})\leq{\mathcal{D}}({\mathfrak{x}^{l}_{km_{0}}})\leq{\mathcal{D}}({\mathfrak{x}^{l}_{0}})\exp\Big{[}-\gamma(1-\gamma)^{m_{0}-1}\Big{(}\frac{1}{k}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}\Big{)}k\Big{]}.

\bullet (First inequality in (3.15)): Since AnA_{n} has only nonnegative elements, the values of α\alpha at products of AnA_{n}’s are nonnegative. Hence,

(3.16) 𝒟(𝔵nl)𝒟(𝔵km0l)(1α(An1An2Akm0))𝒟(𝔵km0l).{\mathcal{D}}({\mathfrak{x}^{l}_{n}})\leq{\mathcal{D}}({\mathfrak{x}^{l}_{km_{0}}})\left(1-\alpha(A_{n-1}A_{n-2}\cdots A_{km_{0}})\right)\leq{\mathcal{D}}({\mathfrak{x}^{l}_{km_{0}}}).

\bullet (Second inequality in (3.15)): By Lemma 3.4, the ergodicity coefficient can be written in terms of 𝒢[(k1)m0,km0){\mathcal{G}}_{[(k-1)m_{0},km_{0})}:

(3.17) 𝒟(𝔵km0l)𝒟(𝔵(k1)m0l)(1α(Akm01Akm02A(k1)m0))𝒟(𝔵(k1)m0l)(1γ(1γ)m01𝒢[(k1)m0,km0))𝒟(𝔵(k1)m0l)exp(γ(1γ)m01𝒢[(k1)m0,km0)),\displaystyle\begin{aligned} {\mathcal{D}}({\mathfrak{x}^{l}_{km_{0}}})&\leq{\mathcal{D}}({\mathfrak{x}^{l}_{(k-1)m_{0}}})(1-\alpha(A_{km_{0}-1}A_{km_{0}-2}\cdots A_{(k-1)m_{0}}))\\ &\leq{\mathcal{D}}({\mathfrak{x}^{l}_{(k-1)m_{0}}})(1-\gamma(1-\gamma)^{m_{0}-1}{\mathcal{G}}_{[(k-1)m_{0},km_{0})})\\ &\leq{\mathcal{D}}({\mathfrak{x}^{l}_{(k-1)m_{0}}})\exp(-\gamma(1-\gamma)^{m_{0}-1}{\mathcal{G}}_{[(k-1)m_{0},km_{0})}),\end{aligned}

where we used 1+xex1+x\leq e^{x} in the last inequality. By induction on kk in (3.17),

(3.18) 𝒟(𝔵km0l)𝒟(𝔵0l)exp[γ(1γ)m01s=1k𝒢[(s1)m0,sm0)]=𝒟(𝔵0l)exp[γ(1γ)m01(1ks=1k𝒢[(s1)m0,sm0))k].\displaystyle\begin{aligned} {\mathcal{D}}({\mathfrak{x}^{l}_{km_{0}}})&\leq{\mathcal{D}}({\mathfrak{x}^{l}_{0}})\exp\Big{[}-\gamma(1-\gamma)^{m_{0}-1}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}\Big{]}\\ &={\mathcal{D}}({\mathfrak{x}^{l}_{0}})\exp\Big{[}-\gamma(1-\gamma)^{m_{0}-1}\Big{(}\frac{1}{k}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}\Big{)}k\Big{]}.\end{aligned}

We combine (3.16) and (3.18) to get the desired estimate (3.15). Finally, set random variable Λ0\Lambda_{0} as

Λ0(m0,k):=γ(1γ)m01(1ks=1k𝒢[(s1)m0,sm0)).\Lambda_{0}(m_{0},k):=\gamma(1-\gamma)^{m_{0}-1}\Big{(}\frac{1}{k}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}\Big{)}.

Note that the random variables {\mathcal{G}}_{[(s-1)m_{0},sm_{0})} (s\geq 1) are independent and identically distributed. Hence, it follows from the strong law of large numbers that \Lambda_{0}(m_{0},k) converges to the expectation value as k\to\infty, which is estimated in Lemma 3.5:

limkγ(1γ)m01(1ks=1k𝒢[(s1)m0,sm0))=γ(1γ)m01𝔼[𝒢[0,m0)]=γ(1γ)m01pm0a.s.\displaystyle\lim_{k\to\infty}\gamma(1-\gamma)^{m_{0}-1}\Big{(}\frac{1}{k}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}\Big{)}=\gamma(1-\gamma)^{m_{0}-1}{\mathbb{E}}[{\mathcal{G}}_{[0,m_{0})}]=\gamma(1-\gamma)^{m_{0}-1}p_{m_{0}}\quad\mbox{a.s.}

This gives the asymptotic property of Λ0(m0,k)\Lambda_{0}(m_{0},k) in (3.14).

(ii) By (1.10) and the definition of 𝔵nl{\mathfrak{x}}_{n}^{l}, one can see that for l=1,,dl=1,\cdots,d,

𝒟(𝔵nl)maxi,j𝕩ni𝕩nj=max1kd𝒟(𝔵nk).{\mathcal{D}}({\mathfrak{x}}_{n}^{l})\leq\max_{i,j}\|{\mathbb{x}}^{i}_{n}-{\mathbb{x}}^{j}_{n}\|_{\infty}=\max_{1\leq k\leq d}{\mathcal{D}}({\mathfrak{x}}_{n}^{k}).

This and (3.13)1\eqref{C-11}_{1} yield (3.13)2\eqref{C-11}_{2}:

maxi,j𝕩ni𝕩nj=max1ld𝒟(𝔵nl)max1ld𝒟(𝔵0l)exp(Λ0k)=maxi,j𝕩0i𝕩0jexp(Λ0k),a.s.\displaystyle\begin{aligned} \max_{i,j}\|{\mathbb{x}}^{i}_{n}-{\mathbb{x}}^{j}_{n}\|_{\infty}&=\max_{1\leq l\leq d}{\mathcal{D}}(\mathfrak{x}_{n}^{l})\leq\max_{1\leq l\leq d}{\mathcal{D}}({\mathfrak{x}}^{l}_{0})\exp(-\Lambda_{0}k)=\max_{i,j}\|{\mathbb{x}}^{i}_{0}-{\mathbb{x}}^{j}_{0}\|_{\infty}\exp(-\Lambda_{0}k),\quad\mbox{a.s.}\end{aligned}

It is also worth mentioning again that the transition matrix A_{n} has only nonnegative elements. In [7], the nonnegative version of Lemma 3.3 was applied using the result in [25]. If one takes into account the randomness H_{n}^{l}, we need to apply Lemma 3.3 and generalize Lemma 3.4 to transition matrices with possibly negative entries. We analyze this in the next section.
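Theorem 3.1 can be observed in a small experiment: with ζ = 0 and uniformly random batches, the coordinate-wise diameter decays geometrically along the iteration. The script below is an illustrative setup with parameters and an objective of our own choosing.

```python
import numpy as np

rng = np.random.default_rng(4)
N, d, P, gamma, beta = 12, 2, 3, 0.3, 20.0
L = lambda x: np.sum(x**2)
X = rng.normal(size=(N, d))

def noiseless_step(X):
    """One step of (1.3) with eta_n^{i,l} = 0: only drift toward the batch representatives."""
    X_new = X.copy()
    perm = rng.permutation(N)
    for batch in [perm[k:k + P] for k in range(0, N, P)]:
        vals = np.array([L(X[j]) for j in batch])
        w = np.exp(-beta * (vals - vals.min()))
        x_bar = (w[:, None] * X[batch]).sum(axis=0) / w.sum()
        X_new[batch] = X[batch] - gamma * (X[batch] - x_bar)
    return X_new

for n in range(61):
    if n % 10 == 0:
        print(n, (X.max(axis=0) - X.min(axis=0)).max())   # diameter max_l D(frak{x}^l_n)
    X = noiseless_step(X)
```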

4. Discrete CBO with random batch interactions and heterogeneous external noises: consensus analysis

In this section, we study consensus estimates for the stochastic dynamics (3.1) in the presence of both random batch interactions and heterogeneous external noises. In fact, the material presented in Section 3 corresponds to a special case of this section. Hence, the presentation will be parallel to that of Section 3, and we will focus on the effect of external noises on the stochastic consensus.

4.1. Ergodicity coefficient

As discussed in Section 3, we use the ergodicity coefficient to prove the emergence of stochastic consensus. Recall the governing stochastic system:

(4.1) \begin{aligned}&\mathfrak{x}^{l}_{n+1}=(A_{n}+B_{n})\mathfrak{x}^{l}_{n},\\ &A_{n}:=(1-\gamma)I_{N}+\gamma W_{n}\quad\text{and}\quad B_{n}:=-H^{l}_{n}(I_{N}-W_{n}).\end{aligned}

As in Lemma 3.4, one needs to compute the ergodicity coefficient of

(4.2) (An+m1+Bn+m1)(An+m2+Bn+m2)(An+Bn)(A_{n+m-1}+B_{n+m-1})(A_{n+m-2}+B_{n+m-2})\cdots(A_{n}+B_{n})

for integers n0n\geq 0 and m1m\geq 1.

In what follows, we study preliminary lemmas for the estimation of the ergodicity coefficient \alpha(\cdot), treating the stochastic part B_{n} due to external noises as a perturbation. Then, system (4.1) can be viewed as a perturbation of the system (3.4) which has been treated in the previous section. For a given square matrix A=(a_{ij}), define a mixed norm:

A1,:=max1iNj=1N|aij|.\|A\|_{1,\infty}:=\max\limits_{1\leq i\leq N}\sum\limits_{j=1}^{N}|a_{ij}|.
Lemma 4.1.

For a matrix A=(aij)i,j=1NN×NA=(a_{ij})_{i,j=1}^{N}\in\mathbb{R}^{N\times N}, the ergodicity coefficient α(A)\alpha(A) has a lower bound:

α(A)2A1,.\alpha(A)\geq-2\|A\|_{1,\infty}.
Proof.

By direct calculation, one has

α(A)\displaystyle\alpha(A) =mini1,i2j=1Nmin{ai1j,ai2j}mini1,i2j=1Nmin{|ai1j|,|ai2j|}\displaystyle=\min_{i_{1},i_{2}}\sum_{j=1}^{N}\min\{a_{i_{1}j},a_{i_{2}j}\}\geq\min_{i_{1},i_{2}}\sum_{j=1}^{N}\min\{-|a_{i_{1}j}|,-|a_{i_{2}j}|\}
mini1,i2j=1N(|ai1j||ai2j|)=2A1,.\displaystyle\geq\min_{i_{1},i_{2}}\sum_{j=1}^{N}(-|a_{i_{1}j}|-|a_{i_{2}j}|)=-2\|A\|_{1,\infty}.

The following lemma helps to give a lower bound for the ergodicity coefficient of the product of matrices in (4.2).

Lemma 4.2.

For nn\in\mathbb{N}, let A1,,AnA_{1},\dots,A_{n}, B1,,BnB_{1},\dots,B_{n} be matrices in N×N\mathbb{R}^{N\times N}. Then, one has

(4.3) (A1+B1)(An+Bn)A1An1,(A11,+B11,)(An1,+Bn1,)A11,An1,.\displaystyle\begin{aligned} &\|(A_{1}+B_{1})\cdots(A_{n}+B_{n})-A_{1}\cdots A_{n}\|_{1,\infty}\\ &\hskip 56.9055pt\leq(\|A_{1}\|_{1,\infty}+\|B_{1}\|_{1,\infty})\cdots(\|A_{n}\|_{1,\infty}+\|B_{n}\|_{1,\infty})-\|A_{1}\|_{1,\infty}\cdots\|A_{n}\|_{1,\infty}.\end{aligned}
Proof.

We use induction on n. The initial step n=1 is trivial. Suppose that the inequality (4.3) holds for n-1. By the subadditivity and submultiplicativity of the matrix norm \|\cdot\|_{1,\infty}, we have

(A1+B1)(An+Bn)A1An1,\displaystyle\|(A_{1}+B_{1})\cdots(A_{n}+B_{n})-A_{1}\cdots A_{n}\|_{1,\infty}
(A1+B1)(An1+Bn1)AnA1An1,+(A1+B1)(An1+Bn1)Bn1,\displaystyle\hskip 28.45274pt\leq\|(A_{1}+B_{1})\cdots(A_{n-1}+B_{n-1})A_{n}-A_{1}\cdots A_{n}\|_{1,\infty}+\|(A_{1}+B_{1})\cdots(A_{n-1}+B_{n-1})B_{n}\|_{1,\infty}
(A1+B1)(An1+Bn1)A1An11,An1,\displaystyle\hskip 28.45274pt\leq\|(A_{1}+B_{1})\cdots(A_{n-1}+B_{n-1})-A_{1}\cdots A_{n-1}\|_{1,\infty}\|A_{n}\|_{1,\infty}
+(A11,+B11,)(An11,+Bn11,)Bn1,\displaystyle\hskip 28.45274pt\quad+(\|A_{1}\|_{1,\infty}+\|B_{1}\|_{1,\infty})\cdots(\|A_{n-1}\|_{1,\infty}+\|B_{n-1}\|_{1,\infty})\|B_{n}\|_{1,\infty}
((A11,+B11,)(An11,+Bn11,)A11,An11,)An1,\displaystyle\hskip 28.45274pt\leq\big{(}(\|A_{1}\|_{1,\infty}+\|B_{1}\|_{1,\infty})\cdots(\|A_{n-1}\|_{1,\infty}+\|B_{n-1}\|_{1,\infty})-\|A_{1}\|_{1,\infty}\cdots\|A_{n-1}\|_{1,\infty}\big{)}\|A_{n}\|_{1,\infty}
+(A11,+B11,)(An11,+Bn11,)Bn1,\displaystyle\hskip 28.45274pt\quad+(\|A_{1}\|_{1,\infty}+\|B_{1}\|_{1,\infty})\cdots(\|A_{n-1}\|_{1,\infty}+\|B_{n-1}\|_{1,\infty})\|B_{n}\|_{1,\infty}
=(A11,+B11,)(An1,+Bn1,)A11,An1,.\displaystyle\hskip 28.45274pt=(\|A_{1}\|_{1,\infty}+\|B_{1}\|_{1,\infty})\cdots(\|A_{n}\|_{1,\infty}+\|B_{n}\|_{1,\infty})-\|A_{1}\|_{1,\infty}\cdots\|A_{n}\|_{1,\infty}.

This verifies the desired estimate (4.3). ∎

From Lemma 4.1 and Lemma 4.2, we are ready to estimate the ergodicity coefficient of (4.2). For this, we also introduce a new random variable [n,n+m){\mathcal{H}}_{[n,n+m)} measuring the size of error:

(4.4) [n,n+m):=2[r=nn+m1(1+2Hrl1,)1],{\mathcal{H}}_{[n,n+m)}:=2\left[\prod_{r=n}^{n+m-1}(1+2\|H^{l}_{r}\|_{1,\infty})-1\right],

where Hnl:=diag(ηn1,l,,ηnN,l)H_{n}^{l}:=\operatorname{diag}(\eta_{n}^{1,l},\dots,\eta_{n}^{N,l}).
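The perturbation size ℋ can be evaluated directly from the realized noises. The sketch below (our naming, Gaussian noises for illustration) uses the fact that ‖H_r^l‖_{1,∞} = max_i |η_r^{i,l}| for a diagonal matrix, and shows that ℋ shrinks with the noise level ζ.

```python
import numpy as np

def H_window(etas):
    """H_{[n,n+m)} in (4.4); etas has shape (m, N): the noises eta_r^{i,l} on the window, l fixed."""
    norms = np.abs(etas).max(axis=1)             # ||H_r^l||_{1,infty} for each r in the window
    return 2.0 * (np.prod(1.0 + 2.0 * norms) - 1.0)

rng = np.random.default_rng(5)
for zeta in [0.1, 0.01, 0.001]:
    print(zeta, H_window(rng.normal(0.0, zeta, size=(5, 10))))
```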

Lemma 4.3.

For given integers m1m\geq 1, n1n\geq 1 and l=1,,dl=1,\dots,d, one has

(4.5) α(r=nn+m1(Ar+Br))γ(1γ)m1𝒢[n,n+m)[n,n+m),\alpha\left(\prod_{r=n}^{n+m-1}(A_{r}+B_{r})\right)\geq\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[n,n+m)}-{\mathcal{H}}_{[n,n+m)},

where 𝒢[n,n+m){\mathcal{G}}_{[n,n+m)} is defined in (3.7).

Proof.

First, we use super-additivity of α\alpha in Lemma 3.2:

(4.6) α(r=nn+m1(Ar+Br))α(r=nn+m1Ar)+α(r=nn+m1(Ar+Br)r=nn+m1Ar)=:11+12.\displaystyle\begin{aligned} \alpha\left(\prod_{r=n}^{n+m-1}(A_{r}+B_{r})\right)&\geq\alpha\left(\prod_{r=n}^{n+m-1}A_{r}\right)+\alpha\left(\prod_{r=n}^{n+m-1}(A_{r}+B_{r})-\prod_{r=n}^{n+m-1}A_{r}\right)\\ &=:{\mathcal{I}}_{11}+{\mathcal{I}}_{12}.\end{aligned}

\bullet (Estimate of 11{\mathcal{I}}_{11}): This case has already been treated in Lemma 3.4:

(4.7) α(r=nn+m1Ar)γ(1γ)m1𝒢[n,n+m).\alpha\left(\prod_{r=n}^{n+m-1}A_{r}\right)\geq\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[n,n+m)}.

\bullet (Estimate of 12{\mathcal{I}}_{12}): The term 12{\mathcal{I}}_{12} can be regarded as an error term due to external stochastic noise. We may use Lemma 4.1 to get a lower bound of this term:

(4.8) α(r=nn+m1(Ar+Br)r=nn+m1Ar)2r=nn+m1(Ar+Br)r=nn+m1Ar1,.\displaystyle\alpha\left(\prod_{r=n}^{n+m-1}(A_{r}+B_{r})-\prod_{r=n}^{n+m-1}A_{r}\right)\geq-2\left\|\prod_{r=n}^{n+m-1}(A_{r}+B_{r})-\prod_{r=n}^{n+m-1}A_{r}\right\|_{1,\infty}.

By Lemma 4.2, one gets

(4.9) r=nn+m1(Ar+Br)r=nn+m1Ar1,r=nn+m1(Ar1,+Br1,)r=nn+m1Ar1,.\displaystyle\left\|\prod_{r=n}^{n+m-1}(A_{r}+B_{r})-\prod_{r=n}^{n+m-1}A_{r}\right\|_{1,\infty}\leq\prod_{r=n}^{n+m-1}(\|A_{r}\|_{1,\infty}+\|B_{r}\|_{1,\infty})-\prod_{r=n}^{n+m-1}\|A_{r}\|_{1,\infty}.

By the defining relations (3.1), ArA_{r} and BrB_{r} can be estimated as follows.

(4.10) Ar1,=(1γ)IN+γWr1,1,\displaystyle\|A_{r}\|_{1,\infty}=\|(1-\gamma)I_{N}+\gamma W_{r}\|_{1,\infty}\leq 1,
Br1,=Hrl(INWr)1,Hrl1,(IN1,+Wr1,)=2Hrl1,.\displaystyle\|B_{r}\|_{1,\infty}=\|H^{l}_{r}(I_{N}-W_{r})\|_{1,\infty}\leq\|H^{l}_{r}\|_{1,\infty}(\|I_{N}\|_{1,\infty}+\|W_{r}\|_{1,\infty})=2\|H^{l}_{r}\|_{1,\infty}.

Now, we combine (4.8), (4.9) and (4.10) to estimate 12{\mathcal{I}}_{12}:

(4.11) 122[r=nn+m1(1+2Hrl1,)1]=[n,n+m).\displaystyle{\mathcal{I}}_{12}\geq-2\left[\prod_{r=n}^{n+m-1}(1+2\|H^{l}_{r}\|_{1,\infty})-1\right]=-{\mathcal{H}}_{[n,n+m)}.

Finally, combining (4.6), (4.7) and (4.11) yield the desired estimate:

α(r=nn+m1(Ar+Br))γ(1γ)m1𝒢[n,n+m)[n,n+m).\alpha\left(\prod_{r=n}^{n+m-1}(A_{r}+B_{r})\right)\geq\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[n,n+m)}-{\mathcal{H}}_{[n,n+m)}.

Remark 4.1.

For the zero noise case ζ=0\zeta=0, the random variable [n,n+m){\mathcal{H}}_{[n,n+m)} becomes zero and the estimate (4.5) reduces to (3.8).

4.2. Stochastic consensus

Recall the discrete system for 𝔵nl{\mathfrak{x}}^{l}_{n}:

𝔵n+1l=(An+Bn)𝔵nl.\mathfrak{x}^{l}_{n+1}=(A_{n}+B_{n})\mathfrak{x}^{l}_{n}.

By iterating the above relation mm times, one gets

(4.12) 𝔵n+ml=(r=nn+m1(Ar+Br))𝔵nl.\mathfrak{x}^{l}_{n+m}=\Big{(}\prod_{r=n}^{n+m-1}(A_{r}+B_{r})\Big{)}\mathfrak{x}^{l}_{n}.
Lemma 4.4.

Let {𝕩ni}{\{\mathbb{x}}^{i}_{n}\} be a solution process to (1.3) and let m1m\geq 1 and k0k\geq 0 be given integers. Then, for n[km,(k+1)m)n\in[km,(k+1)m) and l=1,,dl=1,\dots,d,

𝒟(𝔵nl)\displaystyle\mathcal{D}(\mathfrak{x}_{n}^{l}) 𝒟(𝔵0l)(1+[km,n))s=1k(1γ(1γ)m1𝒢[(s1)m,sm)+[(s1)m,sm)).\displaystyle\leq{\mathcal{D}}(\mathfrak{x}_{0}^{l})\left(1+{\mathcal{H}}_{[km,n)}\right)\prod_{s=1}^{k}\Big{(}1-\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[(s-1)m,sm)}+{\mathcal{H}}_{[(s-1)m,sm)}\Big{)}.
Proof.

We will follow the same procedure employed in Theorem 3.1, i.e., we first bound 𝒟(𝔵nl)\mathcal{D}(\mathfrak{x}_{n}^{l}) using 𝒟(𝔵kml){\mathcal{D}}(\mathfrak{x}_{km}^{l}), and then we bound 𝒟(𝔵kml){\mathcal{D}}(\mathfrak{x}_{km}^{l}) using 𝒟(𝔵0l){\mathcal{D}}(\mathfrak{x}_{0}^{l}).

\bullet Step A: For n\in[km,(k+1)m), we estimate {\mathcal{D}}(\mathfrak{x}_{n}^{l}) by using {\mathcal{D}}(\mathfrak{x}_{km}^{l}):

(4.13) 𝒟(𝔵nl)\displaystyle{\mathcal{D}}(\mathfrak{x}_{n}^{l}) (1α(r=kmn1(Ar+Br)))𝒟(𝔵kml)\displaystyle\leq\left(1-\alpha\left(\prod_{r=km}^{n-1}(A_{r}+B_{r})\right)\right){\mathcal{D}}(\mathfrak{x}_{km}^{l})
(1γ(1γ)nkm1𝒢[km,n)+[km,n))𝒟(𝔵kml)\displaystyle\leq\left(1-\gamma(1-\gamma)^{n-km-1}{\mathcal{G}}_{[km,n)}+{\mathcal{H}}_{[km,n)}\right){\mathcal{D}}(\mathfrak{x}_{km}^{l})
(1+[km,n))𝒟(𝔵kml),\displaystyle\leq\left(1+{\mathcal{H}}_{[km,n)}\right){\mathcal{D}}(\mathfrak{x}_{km}^{l}),

where we used the fact that 𝒢[km,n)0{\mathcal{G}}_{[km,n)}\geq 0 in the last inequality.

\bullet Step B: We claim

(4.14) {\mathcal{D}}(\mathfrak{x}_{km}^{l})\leq\prod_{s=1}^{k}\Big{(}1-\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[(s-1)m,sm)}+{\mathcal{H}}_{[(s-1)m,sm)}\Big{)}{\mathcal{D}}(\mathfrak{x}_{0}^{l}).

Proof of claim: By setting n=(k1)mn=(k-1)m in (4.12), one has

𝔵kml=(r=(k1)mkm1(Ar+Br))𝔵(k1)ml.\mathfrak{x}^{l}_{km}=\Big{(}\prod_{r=(k-1)m}^{km-1}(A_{r}+B_{r})\Big{)}{\mathfrak{x}}^{l}_{(k-1)m}.

Then, we apply Lemma 3.1 (with a=1) to this relation to obtain

(4.15) 𝒟(𝔵kml)(1α(r=(k1)mkm1(Ar+Br)))𝒟(𝔵(k1)ml).{\mathcal{D}}(\mathfrak{x}_{km}^{l})\leq\left(1-\alpha\left(\prod_{r=(k-1)m}^{km-1}(A_{r}+B_{r})\right)\right){\mathcal{D}}(\mathfrak{x}_{(k-1)m}^{l}).

By induction on kk, the relation (4.15) yields

(4.16) 𝒟(𝔵kml)\displaystyle{\mathcal{D}}(\mathfrak{x}_{km}^{l}) s=1k(1α(r=(s1)msm1(Ar+Br)))𝒟(𝔵0l).\displaystyle\leq\prod_{s=1}^{k}\left(1-\alpha\left(\prod_{r=(s-1)m}^{sm-1}(A_{r}+B_{r})\right)\right){\mathcal{D}}(\mathfrak{x}_{0}^{l}).

On the other hand, it follows from Lemma 4.3 that, for each s=1,\dots,k,

(4.17) \alpha\left(\prod_{r=(s-1)m}^{sm-1}(A_{r}+B_{r})\right)\geq\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[(s-1)m,sm)}-{\mathcal{H}}_{[(s-1)m,sm)}.

Finally, one combines (4.16) and (4.17) to derive (4.14).

\bullet Final step: By combining (4.13) and (4.14), one has

𝒟(𝔵nl)(1+[km,n))𝒟(𝔵kml)𝒟(𝔵0l)(1+[km,n))s=1k(1γ(1γ)m1𝒢[(s1)m,sm)+[(s1)m,sm)).\displaystyle\begin{aligned} {\mathcal{D}}(\mathfrak{x}_{n}^{l})&\leq\left(1+{\mathcal{H}}_{[km,n)}\right){\mathcal{D}}(\mathfrak{x}_{km}^{l})\\ &\leq{\mathcal{D}}(\mathfrak{x}_{0}^{l})\left(1+{\mathcal{H}}_{[km,n)}\right)\prod_{s=1}^{k}\Big{(}1-\gamma(1-\gamma)^{m-1}{\mathcal{G}}_{[(s-1)m,sm)}+{\mathcal{H}}_{[(s-1)m,sm)}\Big{)}.\end{aligned}

Now, we are ready to provide an exponential decay of 𝔼𝒟(𝔵nl){\mathbb{E}}{\mathcal{D}}(\mathfrak{x}_{n}^{l}).

Theorem 4.1.

(L^{1}(\Omega)-consensus) Let \{{\mathbb{x}}^{i}_{n}\} be a solution process to (1.3), and let m_{0} be the positive constant defined in Lemma 3.5. Then there exists \zeta_{1}>0 independent of the dimension d such that, if 0\leq\zeta<\zeta_{1}, there exist positive constants C_{1}=C_{1}(N,m_{0},\zeta) and \Lambda_{1}=\Lambda_{1}(\gamma,N,m_{0},\zeta) such that

𝔼𝒟(𝔵nl)C1𝔼𝒟(𝔵0l)exp(Λ1n),n0.\mathbb{E}\mathcal{D}(\mathfrak{x}_{n}^{l})\leq C_{1}\mathbb{E}\mathcal{D}(\mathfrak{x}_{0}^{l})\exp\left(-\Lambda_{1}n\right),\quad n\geq 0.

In particular,

\mathbb{E}\max_{i,j}\|\mathbb{x}_{n}^{i}-\mathbb{x}_{n}^{j}\|_{\infty}\leq dC_{1}\,\mathbb{E}\max_{i,j}\|\mathbb{x}_{0}^{i}-\mathbb{x}_{0}^{j}\|_{\infty}\exp\left(-\Lambda_{1}n\right),\quad n\geq 0.
Proof.

Assume n[km0,(k+1)m0)n\in[km_{0},(k+1)m_{0}) (k0)(k\geq 0). It follows from Lemma 4.4 that

(4.18) 𝒟(𝔵nl)𝒟(𝔵0l)(1+[km0,n))s=1k(1γ(1γ)m01𝒢[(s1)m0,sm0)+[(s1)m0,sm0)).\mathcal{D}(\mathfrak{x}_{n}^{l})\leq{\mathcal{D}}(\mathfrak{x}_{0}^{l})\left(1+{\mathcal{H}}_{[km_{0},n)}\right)\prod_{s=1}^{k}\Big{(}1-\gamma(1-\gamma)^{m_{0}-1}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}+{\mathcal{H}}_{[(s-1)m_{0},sm_{0})}\Big{)}.

Since the following random variables

𝒟(𝔵0l),[km0,n),𝒢[(s1)m0,sm0),[(s1)m0,sm0),s=1,,k{\mathcal{D}}(\mathfrak{x}_{0}^{l}),\quad{\mathcal{H}}_{[km_{0},n)},\quad{\mathcal{G}}_{[(s-1)m_{0},sm_{0})},\quad{\mathcal{H}}_{[(s-1)m_{0},sm_{0})},\quad s=1,\cdots,k

are independent, we can take expectations on both sides of the estimate in (4.18) and then use the inequality 1+x\leq e^{x} to see that

{\mathbb{E}}{\mathcal{D}}(\mathfrak{x}_{n}^{l})\leq{\mathbb{E}}{\mathcal{D}}(\mathfrak{x}_{0}^{l})\left(1+{\mathbb{E}}{\mathcal{H}}_{[km_{0},n)}\right)\exp\bigg{[}\sum_{s=1}^{k}\Big{(}-\gamma(1-\gamma)^{m_{0}-1}{\mathbb{E}}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}+{\mathbb{E}}{\mathcal{H}}_{[(s-1)m_{0},sm_{0})}\Big{)}\bigg{]}.

In the sequel, we estimate the terms 𝔼𝒢[(s1)m0,sm0){\mathbb{E}}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})} and 𝔼[(s1)m0,sm0){\mathbb{E}}{\mathcal{H}}_{[(s-1)m_{0},sm_{0})} one by one.

\bullet Case A (Estimate of 𝔼[(s1)m0,sm0){\mathbb{E}}{\mathcal{H}}_{[(s-1)m_{0},sm_{0})}): Recall the defining relation [n,n+m0){\mathcal{H}}_{[n,n+m_{0})} in (4.4):

{\mathcal{H}}_{[(s-1)m_{0},sm_{0})}:=2\left[\prod_{r=(s-1)m_{0}}^{sm_{0}-1}(1+2\|H^{l}_{r}\|_{1,\infty})-1\right].

By taking expectation of [(s1)m0,sm0){\mathcal{H}}_{[(s-1)m_{0},sm_{0})},

(4.19) 𝔼[(s1)m0,sm0)\displaystyle{\mathbb{E}}{\mathcal{H}}_{[(s-1)m_{0},sm_{0})} =2[r=(s1)m0sm01(1+2𝔼Hrl1,)1]=2[(1+2𝔼H0l1,)m01]\displaystyle=2\left[\prod_{r=(s-1)m_{0}}^{sm_{0}-1}\left(1+2{\mathbb{E}}\|H^{l}_{r}\|_{1,\infty}\right)-1\right]=2\left[\left(1+2{\mathbb{E}}\|H^{l}_{0}\|_{1,\infty}\right)^{m_{0}}-1\right]
2[(1+2Nζ)m01],\displaystyle\leq 2\left[\left(1+2\sqrt{N}\zeta\right)^{m_{0}}-1\right],

where we used the fact 𝔼|x|𝔼|x|2{\mathbb{E}}|x|\leq\sqrt{{\mathbb{E}}|x|^{2}} to get

𝔼Hnl1,=𝔼maxk|ηnk,l|𝔼k|ηnk,l|2𝔼k|ηnk,l|2Nζ.{\mathbb{E}}\|H_{n}^{l}\|_{1,\infty}={\mathbb{E}}\max_{k}|\eta_{n}^{k,l}|\leq{\mathbb{E}}\sqrt{\sum_{k}|\eta^{k,l}_{n}|^{2}}\leq\sqrt{{\mathbb{E}}\sum_{k}|\eta_{n}^{k,l}|^{2}}\leq\sqrt{N}\zeta.

\bullet Case B (Estimate of 𝔼𝒢[(s1)m0,sm0){\mathbb{E}}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}): By (3.12) of Lemma 3.5, one has

𝔼𝒢[(s1)m0,sm0)=pm0.{\mathbb{E}}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}=p_{m_{0}}.

Combining the above estimates, one obtains

(4.20) 𝔼𝒟(𝔵nl)\displaystyle{\mathbb{E}}{\mathcal{D}}(\mathfrak{x}_{n}^{l}) 𝔼𝒟(𝔵0l)(1+2[(1+2Nζ)nkm01])\displaystyle\leq{\mathbb{E}}\mathcal{D}(\mathfrak{x}_{0}^{l})(1+2[(1+2\sqrt{N}\zeta)^{n-km_{0}}-1])
×exp{kγ(1γ)m01pm0+2k[(1+2Nζ)m01]}.\displaystyle\quad\times\exp\left\{-k\gamma(1-\gamma)^{m_{0}-1}p_{m_{0}}+2k\left[\left(1+2\sqrt{N}\zeta\right)^{m_{0}}-1\right]\right\}.

Note that our goal is to estimate 𝔼𝒟(𝔵nl){\mathbb{E}}{\mathcal{D}}(\mathfrak{x}_{n}^{l}) in terms of nn. Then, kk is actually a function of nn. To be more specific,

k=k(n)=nm0.k=k(n)=\Big{\lfloor}\frac{n}{m_{0}}\Big{\rfloor}.

If nm0n\geq m_{0}, then

(4.21) kn1n(nm01+1m0)=1m01n(11m0)1m01m0(11m0)=1m02.\frac{k}{n}\geq\frac{1}{n}\left(\frac{n}{m_{0}}-1+\frac{1}{m_{0}}\right)=\frac{1}{m_{0}}-\frac{1}{n}\left(1-\frac{1}{m_{0}}\right)\geq\frac{1}{m_{0}}-\frac{1}{m_{0}}\left(1-\frac{1}{m_{0}}\right)=\frac{1}{m_{0}^{2}}.
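For instance (a quick sanity check of (4.21)), taking m_{0}=3 and n=7 gives k=\lfloor 7/3\rfloor=2, so that k/n=2/7\geq 1/9=1/m_{0}^{2}; similarly, n=3 gives k=1 and k/n=1/3\geq 1/9.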

If 1nm01\leq n\leq m_{0}, then

(4.22) \frac{m_{0}}{n}\geq 1.

Therefore, by combining (4.20), (4.21) and (4.22), one has

𝔼𝒟(𝔵nl)exp(Λ^nn)𝔼𝒟(𝔵0l)(1+2[(1+2Nζ)m011])em0,n1,\mathbb{E}\mathcal{D}(\mathfrak{x}_{n}^{l})\leq\exp\left(-\widehat{\Lambda}_{n}\cdot n\right)\mathbb{E}\mathcal{D}(\mathfrak{x}_{0}^{l})(1+2[(1+2\sqrt{N}\zeta)^{m_{0}-1}-1])e^{m_{0}},\quad n\geq 1,

where the sequence {Λ^n}n1\{\widehat{\Lambda}_{n}\}_{n\geq 1} satisfies

Λ^n\displaystyle\widehat{\Lambda}_{n} :=kn(γ(1γ)m01pm02[(1+2Nζ)m01])+m0n\displaystyle:=\frac{k}{n}\Big{(}\gamma(1-\gamma)^{m_{0}-1}p_{m_{0}}-2\left[\left(1+2\sqrt{N}\zeta\right)^{m_{0}}-1\right]\Big{)}+\frac{m_{0}}{n}
min{1m02(γ(1γ)m01pm02[(1+2Nζ)m01]),1}\displaystyle\geq\min\left\{\frac{1}{m_{0}^{2}}\Big{(}\gamma(1-\gamma)^{m_{0}-1}p_{m_{0}}-2\left[\left(1+2\sqrt{N}\zeta\right)^{m_{0}}-1\right]\Big{)},1\right\}
\displaystyle=\frac{1}{m_{0}^{2}}\Big{(}\gamma(1-\gamma)^{m_{0}-1}p_{m_{0}}-2\left[\left(1+2\sqrt{N}\zeta\right)^{m_{0}}-1\right]\Big{)}=:\Lambda_{1}(\gamma,N,m_{0},\zeta)>0,

provided that ζ\zeta is sufficiently small. Hence

𝔼𝒟(𝔵nl)exp(Λ1n)𝔼𝒟(𝔵0l)(1+2[(1+2Nζ)m011])em0,n0.\mathbb{E}\mathcal{D}(\mathfrak{x}_{n}^{l})\leq\exp\left(-\Lambda_{1}\cdot n\right)\mathbb{E}\mathcal{D}(\mathfrak{x}_{0}^{l})(1+2[(1+2\sqrt{N}\zeta)^{m_{0}-1}-1])e^{m_{0}},\quad n\geq 0.

Now by setting

C1:=(1+2[(1+2Nζ)m011])em0C_{1}:=(1+2[(1+2\sqrt{N}\zeta)^{m_{0}-1}-1])e^{m_{0}}

one gets the desired result. ∎
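To make the smallness condition on \zeta concrete, the following Python sketch (ours, not part of the analysis) evaluates the decay rate \Lambda_{1} and the prefactor C_{1} from the formulas above, and locates by bisection the largest \zeta for which \Lambda_{1}>0. The values of \gamma, N, m_{0} and p_{m_{0}} are purely illustrative assumptions, since p_{m_{0}} is determined by the batch statistics of Lemma 3.5.

```python
import numpy as np

# illustrative (assumed) parameter values
gamma, N, m0, p_m0 = 0.01, 100, 5, 0.2

def Lambda1(zeta):
    # decay rate Lambda_1 = [gamma (1-gamma)^{m0-1} p_{m0} - 2((1+2 sqrt(N) zeta)^{m0} - 1)] / m0^2
    return (gamma * (1 - gamma)**(m0 - 1) * p_m0
            - 2.0 * ((1 + 2 * np.sqrt(N) * zeta)**m0 - 1)) / m0**2

def C1(zeta):
    # prefactor C_1 = (1 + 2[(1 + 2 sqrt(N) zeta)^{m0-1} - 1]) e^{m0}
    return (1 + 2 * ((1 + 2 * np.sqrt(N) * zeta)**(m0 - 1) - 1)) * np.exp(m0)

# bisection for the threshold zeta_1 with Lambda1(zeta_1) = 0
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if Lambda1(mid) > 0 else (lo, mid)

print("Lambda1(0) =", Lambda1(0.0), " zeta_1 ~", lo, " C1(zeta_1) ~", C1(lo))
```

For these illustrative values one finds a strictly positive (though small) rate at \zeta=0 and a correspondingly small admissible noise level \zeta_{1}, consistent with the requirement that the variance of the external noises be sufficiently small.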

In the next theorem, we derive almost sure consensus of (1.3).

Theorem 4.2.

(Almost sure consensus) Let \{{\mathbb{x}}^{i}_{n}\} be a solution process to (1.3), and let m_{0} be the positive constant defined in Lemma 3.5. Then there exists \zeta_{2}>0 independent of the dimension d such that, if 0\leq\zeta<\zeta_{2} and l\in\{1,\cdots,d\}, there exist a positive constant \Lambda_{2}=\Lambda_{2}(\gamma,N,m_{0},\zeta) and random variables C_{2}^{l}=C_{2}^{l}(\omega) and C_{2}=\max_{1\leq l\leq d}C_{2}^{l}>0 such that

(i)𝒟(𝔵nl)C2l(ω)exp(Λ2n),n0,a.s.(ii)maxi,j𝕩ni𝕩njC2exp(Λ2n),n0,a.s.\displaystyle\begin{aligned} &(i)~{}\mathcal{D}(\mathfrak{x}_{n}^{l})\leq C^{l}_{2}(\omega)\exp\left(-\Lambda_{2}n\right),\quad n\geq 0,\quad\quad\mbox{a.s.}\\ &(ii)~{}\max_{i,j}\|\mathbb{x}_{n}^{i}-\mathbb{x}_{n}^{j}\|_{\infty}\leq C_{2}\exp\left(-\Lambda_{2}n\right),\quad n\geq 0,\quad\mbox{a.s.}\end{aligned}
Proof.

(i) We apply the inequality 1+xex1+x\leq e^{x} to Lemma 4.4 to get

(4.23) 𝒟(𝔵nl)𝒟(𝔵0l)(1+[km0,n))exp[γ(1γ)m01s=1k𝒢[(s1)m0,sm0)+s=1k[(s1)m0,sm0)]𝒟(𝔵0l)exp[γ(1γ)m01s=1k𝒢[(s1)m0,sm0)+s=1k+1[(s1)m0,sm0)]=𝒟(𝔵0l)exp[γ(1γ)m01(1ks=1k𝒢[(s1)m0,sm0))knn+(1k+1s=1k+1[(s1)m0,sm0))k+1nn].\displaystyle\begin{aligned} &\mathcal{D}(\mathfrak{x}_{n}^{l})\leq{\mathcal{D}}(\mathfrak{x}_{0}^{l})\left(1+{\mathcal{H}}_{[km_{0},n)}\right)\exp\bigg{[}-\gamma(1-\gamma)^{m_{0}-1}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}+\sum_{s=1}^{k}{\mathcal{H}}_{[(s-1)m_{0},sm_{0})}\bigg{]}\\ &\leq{\mathcal{D}}(\mathfrak{x}_{0}^{l})\exp\bigg{[}-\gamma(1-\gamma)^{m_{0}-1}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}+\sum_{s=1}^{k+1}{\mathcal{H}}_{[(s-1)m_{0},sm_{0})}\bigg{]}\\ &={\mathcal{D}}(\mathfrak{x}_{0}^{l})\exp\Bigg{[}-\gamma(1-\gamma)^{m_{0}-1}\Bigg{(}\frac{1}{k}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}\Bigg{)}\cdot\frac{k}{n}\cdot n\\ &\hskip 113.81102pt+\Bigg{(}\frac{1}{k+1}\sum_{s=1}^{k+1}{\mathcal{H}}_{[(s-1)m_{0},sm_{0})}\Bigg{)}\cdot\frac{k+1}{n}\cdot n\Bigg{]}.\end{aligned}

As in Theorem 4.1, one may consider

k=k(n)=nm0.k=k(n)=\Big{\lfloor}\frac{n}{m_{0}}\Big{\rfloor}.

Then, its limiting behavior is:

(4.24) limnk=andlimnkn=1m0.\lim_{n\to\infty}k=\infty\quad\mbox{and}\quad\lim_{n\to\infty}\frac{k}{n}=\frac{1}{m_{0}}.

Hence, one can apply the strong law of large numbers to get

(4.25) limn1ks=1k𝒢[(s1)m0,sm0)=𝔼[𝒢[0,m0)]=pm0a.s.\lim_{n\to\infty}\frac{1}{k}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}=\mathbb{E}\left[{\mathcal{G}}_{[0,m_{0})}\right]=p_{m_{0}}\quad\mbox{a.s.}

and

(4.26) limn1k+1s=1k+1[(s1)m0,sm0)\displaystyle\lim_{n\to\infty}\frac{1}{k+1}\sum_{s=1}^{k+1}{\mathcal{H}}_{[(s-1)m_{0},sm_{0})}
=limn1k+1s=1k+12[(1+2Hsm01l1,)(1+2H(s1)m0l1,)1]\displaystyle\hskip 28.45274pt=\lim_{n\to\infty}\frac{1}{k+1}\sum_{s=1}^{k+1}2\left[\left(1+2\|H_{sm_{0}-1}^{l}\|_{1,\infty}\right)\cdots\left(1+2\|H_{(s-1)m_{0}}^{l}\|_{1,\infty}\right)-1\right]
=2[(1+2𝔼H0l1,)m01]a.s.\displaystyle\hskip 28.45274pt=2[(1+2\mathbb{E}\|H_{0}^{l}\|_{1,\infty})^{m_{0}}-1]\quad\mbox{a.s.}

Finally, combining (4.23), (4.24), (4.25) and (4.26) gives

𝒟(𝔵nl)exp(Λ~nn)𝒟(𝔵0l),n1,\mathcal{D}(\mathfrak{x}_{n}^{l})\leq\exp\left(-\widetilde{\Lambda}_{n}\cdot n\right)\mathcal{D}(\mathfrak{x}_{0}^{l}),\quad n\geq 1,

where the random process (Λ~n)n1(\widetilde{\Lambda}_{n})_{n\geq 1} is defined by

\displaystyle\widetilde{\Lambda}_{n}:=\gamma(1-\gamma)^{m_{0}-1}\Bigg{(}\frac{1}{k}\sum_{s=1}^{k}{\mathcal{G}}_{[(s-1)m_{0},sm_{0})}\Bigg{)}\cdot\frac{k}{n}-\Bigg{(}\frac{1}{k+1}\sum_{s=1}^{k+1}{\mathcal{H}}_{[(s-1)m_{0},sm_{0})}\Bigg{)}\cdot\frac{k+1}{n},

and satisfies

\displaystyle\lim_{n\to\infty}\widetilde{\Lambda}_{n} =\frac{\gamma(1-\gamma)^{m_{0}-1}}{m_{0}}\mathbb{E}\left[{\mathcal{G}}_{[0,m_{0})}\right]-\frac{2[(1+2\mathbb{E}\|H_{0}^{l}\|_{1,\infty})^{m_{0}}-1]}{m_{0}}
γ(1γ)m01pm0m02[(1+2Nζ)m01]m0.\displaystyle\geq\frac{\gamma(1-\gamma)^{m_{0}-1}p_{m_{0}}}{m_{0}}-\frac{2[(1+2\sqrt{N}\zeta)^{m_{0}}-1]}{m_{0}}.

Thus, if ζ\zeta is sufficiently small,

limnΛ~n>0.\lim_{n\to\infty}\widetilde{\Lambda}_{n}>0.

This implies that, if one chooses a positive constant Λ2=Λ2(γ,N,m0,ζ)\Lambda_{2}=\Lambda_{2}(\gamma,N,m_{0},\zeta) such that

(4.27) 0<Λ2<γ(1γ)m01pm0m02[(1+2Nζ)m01]m0,0<\Lambda_{2}<\frac{\gamma(1-\gamma)^{m_{0}-1}p_{m_{0}}}{m_{0}}-\frac{2[(1+2\sqrt{N}\zeta)^{m_{0}}-1]}{m_{0}},

there exists a stopping time T>0T>0 such that

𝒟(𝔵nl)exp(Λ2n)𝒟(𝔵0l),nT.\mathcal{D}(\mathfrak{x}_{n}^{l})\leq\exp\left(-\Lambda_{2}n\right)\mathcal{D}(\mathfrak{x}_{0}^{l}),\quad n\geq T.

In other words, there exists an almost surely finite and positive random variable

C2l(ω):=max0n<T{exp(Λ2n)𝒟(𝔵nl)},C_{2}^{l}(\omega):=\max_{0\leq n<T}\{\exp\left(\Lambda_{2}n\right)\mathcal{D}(\mathfrak{x}_{n}^{l})\},

which satisfies

(4.28) 𝒟(𝔵nl)C2l(ω)exp(Λ2n),n0.\mathcal{D}(\mathfrak{x}_{n}^{l})\leq C^{l}_{2}(\omega)\exp\left(-\Lambda_{2}n\right),\quad n\geq 0.

(ii) In the course of the proof of Theorem 3.1, we had

(4.29) maxi,j𝕩ni𝕩nj=max1ld𝒟(𝔵nl)a.s.\max_{i,j}\|{\mathbb{x}}^{i}_{n}-{\mathbb{x}}^{j}_{n}\|_{\infty}=\max_{1\leq l\leq d}{\mathcal{D}}({\mathfrak{x}}_{n}^{l})\quad\mbox{a.s.}

Combining (4.28) and (4.29) gives

maxi,j𝕩ni𝕩nj=max1ld𝒟(𝔵nl)(max1ldC2l(ω))exp(Λ2n)=:C2(ω)exp(Λ2n),a.s.\max_{i,j}\|{\mathbb{x}}^{i}_{n}-{\mathbb{x}}^{j}_{n}\|_{\infty}=\max_{1\leq l\leq d}{\mathcal{D}}(\mathfrak{x}_{n}^{l})\leq\Big{(}\max_{1\leq l\leq d}C^{l}_{2}(\omega)\Big{)}\exp(-\Lambda_{2}n)=:C_{2}(\omega)\exp(-\Lambda_{2}n),\quad\mbox{a.s.}

5. Convergence of discrete CBO flow

In this section, we present the convergence analysis of the CBO algorithm (1.3) using the stochastic consensus estimates in Theorems 4.1 and 4.2. In those theorems, the consensus of (1.3), i.e., the decay of the relative states \|{\mathbb{x}}^{i}-{\mathbb{x}}^{j}\|, is shown to be exponentially fast in expectation and in the almost sure sense. However, this alone does not guarantee that a solution converges toward an equilibrium of the discrete model (1.3), since the ensemble may still drift, approach a limit cycle or exhibit a chaotic trajectory even if the relative states decay to zero. The convergence of the states {\mathbb{x}}_{n}^{i} to a common point is an important issue for the purpose of global optimization via the CBO algorithm.

Lemma 5.1.

(almost sure convergence) Let {𝕩ni}{\{\mathbb{x}}^{i}_{n}\} be a solution process to (1.3). Then, there exists ζ2>0\zeta_{2}>0 independent of the dimension dd, such that if 0ζ<ζ20\leq\zeta<\zeta_{2}, there exists a random variable 𝕩{\mathbb{x}}_{\infty} such that the solution {𝕩ni}\{{\mathbb{x}}^{i}_{n}\} of (1.3) converges to 𝕩{\mathbb{x}}_{\infty} almost surely:

limn𝕩ni=𝕩a.s.,i𝒩.\lim_{n\to\infty}{\mathbb{x}}_{n}^{i}={\mathbb{x}}_{\infty}\quad\mbox{a.s.,}\quad i\in{\mathcal{N}}.
Proof.

We will use the almost sure consensus of the states {\mathbb{x}}_{n}^{i} (i=1,\dots,N) in Theorem 4.2 and arguments similar to those in the proof of Theorem 3.1 in [18]. Similar to (3.1), we may rewrite (1.3) as

(5.1) 𝔵n+1l𝔵nl=γ(INWn)𝔵nlHnl(INWn)𝔵nl,1ld,n0,\displaystyle\begin{aligned} \mathfrak{x}_{n+1}^{l}-\mathfrak{x}_{n}^{l}&=-\gamma\left(I_{N}-W_{n}\right)\mathfrak{x}_{n}^{l}-H_{n}^{l}(I_{N}-W_{n})\mathfrak{x}_{n}^{l},\quad 1\leq l\leq d,~{}n\geq 0,\end{aligned}

where the convergence of 𝕩ni{\mathbb{x}}^{i}_{n} for all i=1,,Ni=1,\ldots,N is equivalent to that of 𝔵nl\mathfrak{x}_{n}^{l} for all l=1,,dl=1,\ldots,d. Summing up (5.1) over nn gives

(5.2) 𝔵nl𝔵0l=γk=0n1(INWk)𝔵kl=:𝒥1k=0n1Hkl(INWk)𝔵kl=:𝒥2,1ld,n0.\mathfrak{x}_{n}^{l}-\mathfrak{x}_{0}^{l}=-\gamma\underbrace{\sum_{k=0}^{n-1}\left(I_{N}-W_{k}\right)\mathfrak{x}_{k}^{l}}_{=:{\mathcal{J}}_{1}}-\underbrace{\sum_{k=0}^{n-1}H_{k}^{l}(I_{N}-W_{k})\mathfrak{x}_{k}^{l}}_{=:{\mathcal{J}}_{2}},\quad 1\leq l\leq d,~{}n\geq 0.

Note that the convergence of \mathfrak{x}_{n}^{l} follows if the two vector series on the R.H.S. of (5.2) are both convergent almost surely as n\to\infty.

\bullet Case A (Almost sure convergence of 𝒥1{\mathcal{J}}_{1}): For the ii-th standard unit vector 𝕖i\mathbb{e}_{i} in N\mathbb{R}^{N}, one has

(5.3) |𝕖i(INWk)𝔵kl|=|xki,lj=1Nω[i]k,j(Xk1,,XkN)xkj,l|j=1Nω[i]k,j(Xk1,,XkN)|xki,lxkj,l|max1jN|xki,lxkj,l|=𝒟(𝔵kl).\displaystyle\begin{aligned} |\mathbb{e}_{i}^{\top}\left(I_{N}-W_{k}\right)\mathfrak{x}_{k}^{l}|&=\left|x_{k}^{i,l}-\sum_{j=1}^{N}\omega_{[i]_{k},j}(X_{k}^{1},\dots,X_{k}^{N})x_{k}^{j,l}\right|\\ &\leq\sum_{j=1}^{N}\omega_{[i]_{k},j}(X_{k}^{1},\dots,X_{k}^{N})\left|x_{k}^{i,l}-x_{k}^{j,l}\right|\leq\max_{1\leq j\leq N}\left|x_{k}^{i,l}-x_{k}^{j,l}\right|=\mathcal{D}(\mathfrak{x}_{k}^{l}).\end{aligned}

where we used the fact that jω[i]k,j=1\sum_{j}\omega_{[i]_{k},j}=1. It follows from Theorem 4.2 that the diameter process 𝒟(𝔵kl)\mathcal{D}(\mathfrak{x}_{k}^{l}) decays to zero exponentially fast almost surely. Hence, for each i=1,,Ni=1,\dots,N, the relation (5.3) implies

k=0|𝕖i(INWk)𝔵kl|k=0𝒟(𝔵kl)<,a.s.\sum_{k=0}^{\infty}|\mathbb{e}_{i}^{\top}\left(I_{N}-W_{k}\right)\mathfrak{x}_{k}^{l}|\leq\sum_{k=0}^{\infty}\mathcal{D}(\mathfrak{x}_{k}^{l})<\infty,\quad\mbox{a.s.}

This shows that each component of the first series in (5.2) is convergent almost surely.

\bullet Case B (Almost sure convergence of {\mathcal{J}}_{2}): Note that this series is a martingale, since each summand H_{k}^{l}(I_{N}-W_{k})\mathfrak{x}_{k}^{l} has zero mean conditioned on the past: for any \sigma-field \mathcal{F} independent of H^{l}_{k}, we have

𝔼[Hkl(INWk)𝔵kl|]=𝔼[Hkl(INWk)𝔵kl]=(𝔼Hkl)(𝔼(INWk))(𝔼𝔵kl)=0,{\mathbb{E}}[~{}H_{k}^{l}(I_{N}-W_{k})\mathfrak{x}_{k}^{l}~{}|~{}\mathcal{F}~{}]={\mathbb{E}}[~{}H_{k}^{l}(I_{N}-W_{k})\mathfrak{x}_{k}^{l}]=\Big{(}{\mathbb{E}}H_{k}^{l}\Big{)}\cdot\Big{(}{\mathbb{E}}(I_{N}-W_{k})\Big{)}\cdot\Big{(}{\mathbb{E}}\mathfrak{x}_{k}^{l}\Big{)}=0,

where we used the fact that 𝔼Hkl=0{\mathbb{E}}H_{k}^{l}=0. By Doob’s martingale convergence theorem [14], the martingale converges almost surely if the series is uniformly bounded in L2(Ω)L^{2}(\Omega). Again, it follows from (5.3) and Theorem 4.2 that

supn0𝔼|k=0n𝕖iHkl(INWk)𝔵kl|2=supn0𝔼|k=0nηki,l𝕖i(INWk)𝔵kl|2supn0𝔼k=0n|ηki,l𝕖i(INWk)𝔵kl|2=supn0k=0n𝔼|ηki,l|2𝔼|𝕖i(INWk)𝔵kl|2ζ2k=0𝔼|𝕖i(INWk)𝔵kl|2ζ2k=0𝔼|𝒟(𝔵kl)|2<.\displaystyle\begin{aligned} &\sup_{n\geq 0}\mathbb{E}\left|\sum_{k=0}^{n}\mathbb{e}_{i}^{\top}H_{k}^{l}(I_{N}-W_{k})\mathfrak{x}_{k}^{l}\right|^{2}=\sup_{n\geq 0}\mathbb{E}\left|\sum_{k=0}^{n}\eta_{k}^{i,l}\mathbb{e}_{i}^{\top}(I_{N}-W_{k})\mathfrak{x}_{k}^{l}\right|^{2}\\ &\hskip 28.45274pt\leq\sup_{n\geq 0}\mathbb{E}\sum_{k=0}^{n}\left|\eta_{k}^{i,l}\mathbb{e}_{i}^{\top}(I_{N}-W_{k})\mathfrak{x}_{k}^{l}\right|^{2}=\sup_{n\geq 0}\sum_{k=0}^{n}{\mathbb{E}}|\eta_{k}^{i,l}|^{2}\mathbb{E}\left|\mathbb{e}_{i}^{\top}(I_{N}-W_{k})\mathfrak{x}_{k}^{l}\right|^{2}\\ &\hskip 28.45274pt\leq\zeta^{2}\sum_{k=0}^{\infty}\mathbb{E}\left|\mathbb{e}_{i}^{\top}(I_{N}-W_{k})\mathfrak{x}_{k}^{l}\right|^{2}\leq\zeta^{2}\sum_{k=0}^{\infty}\mathbb{E}|\mathcal{D}(\mathfrak{x}_{k}^{l})|^{2}<\infty.\end{aligned}

This yields that the second series in (5.2) converges almost surely.

In (5.2), we combine the results from Case A and Case B to conclude that \mathfrak{x}^{l}_{n} converges to a random variable almost surely. ∎

Lemma 5.2.

Let {Yk}k0\{Y_{k}\}_{k\geq 0} be an i.i.d. sequence of random variables with 𝔼|Y0|<\mathbb{E}|Y_{0}|<\infty. Then, for any given δ>0\delta>0, there exists a random variable C¯(ω)\bar{C}(\omega) such that

|Yk|<C¯(ω)eδk,k0,a.s.|Y_{k}|<\bar{C}(\omega)e^{\delta k},\quad\forall~{}k\geq 0,\quad\mbox{a.s.}
Proof.

By Markov’s inequality, for any constant n1n\geq 1 one has

(|Yk|neδk)𝔼|Yk|neδk,k0.\mathbb{P}(|Y_{k}|\geq ne^{\delta k})\leq\frac{\mathbb{E}|Y_{k}|}{ne^{\delta k}},\quad\forall~{}k\geq 0.

In particular,

\displaystyle\begin{aligned}\mathbb{P}(|Y_{k}|<ne^{\delta k},~\forall~k\geq 0)&=1-\mathbb{P}(\exists~k\geq 0~\mbox{ s.t. }~|Y_{k}|\geq ne^{\delta k})\geq 1-\sum_{k=0}^{\infty}\mathbb{P}(|Y_{k}|\geq ne^{\delta k})\geq 1-\frac{\mathbb{E}|Y_{0}|}{n(1-e^{-\delta})}.\end{aligned}

Therefore, for any δ>0\delta>0,

\mathbb{P}(\exists~n\geq 1~\mbox{ s.t. }~|Y_{k}|<ne^{\delta k}~\mbox{ for all }~k\geq 0)=\lim_{n\to\infty}\mathbb{P}(|Y_{k}|<ne^{\delta k},~\forall~k\geq 0)=1.

This implies existence of a random variable C¯(ω)\bar{C}(\omega) satisfying

|Yk|<C¯(ω)eδk,k0,a.s.|Y_{k}|<\bar{C}(\omega)e^{\delta k},\quad\forall~{}k\geq 0,\quad\mbox{a.s.}

More precisely, we may define the random variable \bar{C}(\omega) as

\bar{C}(\omega):=1+\sum_{n=1}^{\infty}\mathbf{1}_{\bar{C}_{n}},\quad\bar{C}_{n}:=\Omega\setminus\left\{\omega\in\Omega~:~|Y_{k}(\omega)|<ne^{\delta k},~\forall~k\geq 0\right\},\quad n\geq 1.

For example, if \omega satisfies |Y_{k}(\omega)|<me^{\delta k} for all k\geq 0 for some m but not for m-1, then \omega\in\bar{C}_{1},\bar{C}_{2},\dots,\bar{C}_{m-1} and \omega\notin\bar{C}_{m},\bar{C}_{m+1},\dots, so that \bar{C}(\omega)=1+(m-1)=m. ∎
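As a quick numerical illustration of Lemma 5.2 (not needed for its proof), one may sample an i.i.d. sequence and take \bar{C}(\omega) to be the smallest integer dominating \sup_{k}|Y_{k}|e^{-\delta k}. The standard normal law and the finite horizon below are assumptions made only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
delta, K = 0.05, 10_000           # rate delta > 0 and a finite horizon (assumptions)
Y = rng.standard_normal(K)        # i.i.d. Y_0, ..., Y_{K-1} with E|Y_0| < infinity

# smallest integer n with |Y_k| < n e^{delta k} for all sampled k;
# it plays the role of bar{C}(omega) in Lemma 5.2 and is a.s. finite
C_bar = int(np.floor(np.max(np.abs(Y) * np.exp(-delta * np.arange(K))))) + 1
assert np.all(np.abs(Y) < C_bar * np.exp(delta * np.arange(K)))
print("realized bar{C}(omega) =", C_bar)
```
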

Now, we are ready to present the convergence of \mathbb{x}^{i}_{n} in expectation and in the almost sure sense.

Theorem 5.1.

Let \{{\mathbb{x}}^{i}_{n}\} be a solution process to (1.3), and let m_{0} be the positive constant defined in Lemma 3.5. Then there exists \zeta_{3}>0 independent of the dimension d such that, if 0\leq\zeta<\zeta_{3}, there exists a random variable \mathbb{x}_{\infty} such that each {\mathbb{x}}^{i}_{n} converges exponentially to {\mathbb{x}}_{\infty} in the following senses:

  1. (1)

    (Convergence in expectation): for positive constants C1C_{1} and Λ1\Lambda_{1} appearing in Theorem 4.1, one has

    {\mathbb{E}}\|{\mathbb{x}}_{n}^{i}-{\mathbb{x}}_{\infty}\|_{1}\leq\frac{d(\gamma+\sqrt{N}\zeta)C_{1}\max_{1\leq l\leq d}\mathbb{E}\mathcal{D}(\mathfrak{x}_{0}^{l})}{1-e^{-\Lambda_{1}}}e^{-\Lambda_{1}n},\quad n\geq 0,~~i\in{\mathcal{N}}.
  2. (2)

    (Almost sure convergence): for a positive constant Λ2\Lambda_{2} appearing in Theorem 4.2, there exists a positive random variable C3(ω)>0C_{3}(\omega)>0 such that

    \|\mathbb{x}_{n}^{i}-\mathbb{x}_{\infty}\|_{1}\leq C_{3}(\omega)e^{-\Lambda_{2}n}\quad\mbox{a.s.}\quad n\geq 0,~~i\in{\mathcal{N}}.
Proof.

Let m_{0} be the positive integer appearing in Theorem 4.1. Then there exists \zeta_{1}>0 independent of the dimension d such that, if 0\leq\zeta<\zeta_{1}, there exist positive constants C_{1}=C_{1}(N,m_{0},\zeta) and \Lambda_{1}=\Lambda_{1}(\gamma,N,m_{0},\zeta) such that

(5.4) 𝔼𝒟(𝔵nl)C1𝔼𝒟(𝔵0l)exp(Λ1n),n0.\mathbb{E}\mathcal{D}(\mathfrak{x}_{n}^{l})\leq C_{1}\mathbb{E}\mathcal{D}(\mathfrak{x}_{0}^{l})\exp\left(-\Lambda_{1}n\right),\quad n\geq 0.

Let M\geq n. Then, summing up (5.1) from n to M-1 gives

𝔵Ml𝔵nl=γk=nM1(INWk)𝔵klk=nM1Hkl(INWk)𝔵kl,1ld,n0.\displaystyle\begin{aligned} \mathfrak{x}_{M}^{l}-\mathfrak{x}_{n}^{l}&=-\gamma\sum_{k=n}^{M-1}\left(I_{N}-W_{k}\right)\mathfrak{x}_{k}^{l}-\sum_{k=n}^{M-1}H_{k}^{l}(I_{N}-W_{k})\mathfrak{x}_{k}^{l},\quad 1\leq l\leq d,~{}n\geq 0.\end{aligned}

As in the proof of Lemma 5.1, for the ii-th standard unit vector 𝕖i\mathbb{e}_{i}, one has

(5.5) |xMi,lxni,l|\displaystyle|x_{M}^{i,l}-x_{n}^{i,l}| =|𝕖i(𝔵Ml𝔵nl)|γk=nM1|𝕖i(INWk)𝔵kl|+k=nM1|ηki,l||𝕖i(INWk)𝔵kl|\displaystyle=|\mathbb{e}_{i}^{\top}(\mathfrak{x}_{M}^{l}-\mathfrak{x}_{n}^{l})|\leq\gamma\sum_{k=n}^{M-1}|\mathbb{e}_{i}^{\top}\left(I_{N}-W_{k}\right)\mathfrak{x}_{k}^{l}|+\sum_{k=n}^{M-1}|\eta_{k}^{i,l}||\mathbb{e}_{i}^{\top}(I_{N}-W_{k})\mathfrak{x}_{k}^{l}|
k=nM1(γ+|ηki,l|)𝒟(𝔵kl),\displaystyle\leq\sum_{k=n}^{M-1}(\gamma+|\eta_{k}^{i,l}|)\mathcal{D}(\mathfrak{x}_{k}^{l}),

where the last inequality follows from (5.3).

(i) Taking expectations in (5.5) and using (5.4) yields

(5.6) {\mathbb{E}}\|{\mathbb{x}}_{M}^{i}-{\mathbb{x}}_{n}^{i}\|_{1}=\sum_{l=1}^{d}{\mathbb{E}}|x_{M}^{i,l}-x_{n}^{i,l}|\leq\frac{d(\gamma+\sqrt{N}\zeta)C_{1}}{1-e^{-\Lambda_{1}}}\max_{1\leq l\leq d}\mathbb{E}\mathcal{D}(\mathfrak{x}_{0}^{l})e^{-\Lambda_{1}n}.

In (5.6), letting MM\to\infty and using Lemma 5.1 give

{\mathbb{E}}\|{\mathbb{x}}_{\infty}-{\mathbb{x}}_{n}^{i}\|_{1}\leq\frac{d(\gamma+\sqrt{N}\zeta)C_{1}}{1-e^{-\Lambda_{1}}}\max_{1\leq l\leq d}\mathbb{E}\mathcal{D}(\mathfrak{x}_{0}^{l})e^{-\Lambda_{1}n},\quad n\geq 0.

(ii)  In the proof of Theorem 4.2, the constant Λ2\Lambda_{2} was chosen in (4.27) so that

0<Λ2<γ(1γ)m01pm0m02[(1+2Nζ)m01]m0.0<\Lambda_{2}<\frac{\gamma(1-\gamma)^{m_{0}-1}p_{m_{0}}}{m_{0}}-\frac{2[(1+2\sqrt{N}\zeta)^{m_{0}}-1]}{m_{0}}.

Here one may fix Λ2\Lambda_{2} and consider small ε>0\varepsilon>0 satisfying

0<Λ2<Λ2+ε<γ(1γ)m01pm0m02[(1+2Nζ)m01]m0.0<\Lambda_{2}<\Lambda_{2}+\varepsilon<\frac{\gamma(1-\gamma)^{m_{0}-1}p_{m_{0}}}{m_{0}}-\frac{2[(1+2\sqrt{N}\zeta)^{m_{0}}-1]}{m_{0}}.

Then, by Theorem 4.2 and Lemma 5.2, \mathcal{D}(\mathfrak{x}_{n}^{l}) and |\eta_{k}^{i,l}| can be estimated as

𝒟(𝔵nl)C2l(ω)exp((Λ2+ε)n),n0,a.s.,|ηki,l|<C¯(ω)eεk,k0,a.s.\displaystyle\begin{aligned} &\mathcal{D}(\mathfrak{x}_{n}^{l})\leq C^{l}_{2}(\omega)\exp\left(-(\Lambda_{2}+\varepsilon)n\right),\quad n\geq 0,\quad\quad\mbox{a.s.},\\ &|\eta_{k}^{i,l}|<\bar{C}(\omega)e^{\varepsilon k},\quad\forall~{}k\geq 0,\quad\mbox{a.s.}\end{aligned}

We apply these estimates to (5.5) and obtain

|xMi,lxni,l|k=nM1(γ+C¯(ω)eεk)C2l(ω)e(Λ2+ε)kC~(ω)k=nM1eΛ2kC~(ω)eΛ2n1eΛ2,a.s.\displaystyle\begin{aligned} |x_{M}^{i,l}-x_{n}^{i,l}|&\leq\sum_{k=n}^{M-1}(\gamma+\bar{C}(\omega)e^{\varepsilon k})C_{2}^{l}(\omega)e^{-(\Lambda_{2}+\varepsilon)k}\leq{\widetilde{C}}(\omega)\sum_{k=n}^{M-1}e^{-\Lambda_{2}k}\leq\frac{{\widetilde{C}}(\omega)e^{-\Lambda_{2}n}}{1-e^{-\Lambda_{2}}},\quad\mbox{a.s.}\end{aligned}

for some positive, almost surely finite random variable {\widetilde{C}}(\omega), independent of l. Now, sending M\to\infty and using Lemma 5.1 give

|xlxni,l|C~(ω)eΛ2n1eΛ2,a.s.|x_{\infty}^{l}-x_{n}^{i,l}|\leq\frac{{\widetilde{C}}(\omega)e^{-\Lambda_{2}n}}{1-e^{-\Lambda_{2}}},\quad\mbox{a.s.}

This yields the desired estimate:

𝕩𝕩niC~(ω)1eΛ2eΛ2n,a.s.n0.\|\mathbb{x}_{\infty}-\mathbb{x}_{n}^{i}\|_{\infty}\leq\frac{{\widetilde{C}}(\omega)}{1-e^{-\Lambda_{2}}}e^{-\Lambda_{2}n},\quad\mbox{a.s.}\quad n\geq 0.

6. Application to optimization problem

In this section, we discuss how the consensus estimates studied in the previous sections can be applied to the optimization problem. Let L be an objective function for which we look for a minimizer:

min𝕩dL(𝕩).\min_{\mathbb{x}\in\mathbb{R}^{d}}L(\mathbb{x}).

For the weighted representative point (1.2) with a nonempty set S𝒩S\subset{\mathcal{N}}, we set

(6.1) 𝕩¯nS,:=𝕩niwithi=min{argminjSL(𝕩nj)}.{\bar{\mathbb{x}}}_{n}^{S,*}:=\mathbb{x}_{n}^{i}\quad\mbox{with}~{}i=\min\{\operatorname{argmin}_{j\in S}L({\mathbb{x}_{n}^{j}})\}.

In this setting, we single out one agent from the set \operatorname{argmin}_{X^{j}:j\in S}L(X^{j}), as in [6].

Proposition 6.1.

Let \{{\mathbb{x}}^{i}_{n}\} be a solution process to (1.3) equipped with (6.1). Then, for all n\geq 0, the best objective value among all agents is non-increasing in n:

min1jNL(𝕩n+1j)min1jNL(𝕩nj).\min_{1\leq j\leq N}L(\mathbb{x}_{n+1}^{j})\leq\min_{1\leq j\leq N}L(\mathbb{x}_{n}^{j}).
Proof.

For a fixed nn, let ii be the index satisfying

i=min{argmin1jNL(𝕩nj)}.i=\min\{\operatorname{argmin}_{1\leq j\leq N}L(\mathbb{x}_{n}^{j})\}.

Then, it is easy to see

i=min{argminj[i]nL(𝕩nj)},i.e.,𝕩¯n[i]n,=𝕩ni.i=\min\{\operatorname{argmin}_{j\in[i]_{n}}L(\mathbb{x}_{n}^{j})\},\quad\mbox{i.e.,}\quad{\bar{\mathbb{x}}}_{n}^{[i]_{n},*}=\mathbb{x}_{n}^{i}.

For each 1ld1\leq l\leq d, we have

xn+1i,l\displaystyle x^{i,l}_{n+1} =(1γηni,l)xni,l+(γ+ηni,l)j[i]nω[i]n,j(𝕩n1,,𝕩nN)xnj,l\displaystyle=(1-\gamma-\eta_{n}^{i,l})x^{i,l}_{n}+(\gamma+\eta_{n}^{i,l})\sum_{j\in[i]_{n}}\omega_{[i]_{n},j}(\mathbb{x}_{n}^{1},\dots,\mathbb{x}_{n}^{N})x_{n}^{j,l}
=(1γηni,l)xni,l+(γ+ηni,l)xni,l=xni,l,\displaystyle=(1-\gamma-\eta_{n}^{i,l})x^{i,l}_{n}+(\gamma+\eta_{n}^{i,l})x_{n}^{i,l}=x_{n}^{i,l},

i.e.,

𝕩n+1i=𝕩ni.\mathbb{x}_{n+1}^{i}=\mathbb{x}_{n}^{i}.

Hence, one has

min1jNL(𝕩n+1j)L(𝕩n+1i)=L(𝕩ni)=min1jNL(𝕩nj).\min_{1\leq j\leq N}L(\mathbb{x}_{n+1}^{j})\leq L(\mathbb{x}_{n+1}^{i})=L(\mathbb{x}_{n}^{i})=\min_{1\leq j\leq N}L(\mathbb{x}_{n}^{j}).

Remark 6.1.

Proposition 6.1 implies the relation

\min_{1\leq j\leq N}L(\mathbb{x}_{n}^{j})=\min_{0\leq t\leq n,~{}1\leq j\leq N}L(\mathbb{x}_{t}^{j}),

so that the monotonicity actually gives

{\bar{\mathbb{x}}}_{n}^{\mathcal{N},*}\in\operatorname{argmin}_{\mathbb{x}_{n}^{j},~{}1\leq j\leq N}L(\cdot)=\operatorname{argmin}_{\mathbb{x}_{t}^{j},~{}0\leq t\leq n,~{}1\leq j\leq N}L(\cdot).

In other words, {\bar{\mathbb{x}}}_{n}^{\mathcal{N},*} is the best guess for the minimizer of L among the trajectories of all agents on the time interval [0,n]. In the particle swarm optimization (PSO, in short) literature [23], such a point is called the 'global best', and this observation tells us that the algorithm (1.3) equipped with (6.1) may be considered as a simplified model of the PSO algorithm.

Next, we provide numerical simulations for the CBO algorithm with (6.1). By Proposition 6.1, this algorithm enjoys monotonicity of the best objective value, which is useful for locating the minimum of L. The CBO algorithm (1.3)–(6.1) will be tested on the Rastrigin function as in [27], which reads

L(x)=1di=1d[(xiBi)210cos(2π(xiBi))+10]+C,L(x)=\frac{1}{d}\sum_{i=1}^{d}\left[(x_{i}-B_{i})^{2}-10\cos(2\pi(x_{i}-B_{i}))+10\right]+C,

where \mathbf{B}=(B_{1},\dots,B_{d}) is the global minimizer of L. We set C=0 and B_{i}=1 for all i. The Rastrigin function has many local minima near \mathbf{B}, whose values are very close to the global minimum. The parameters are chosen similarly to [6, 27]:

N=100,γ=0.01,ζ=0.5,P=10,50,d=2,3,4,5,6,7,8,9,10,N=100,\quad\gamma=0.01,\quad\zeta=0.5,\quad P=10,50,\quad d=2,3,4,5,6,7,8,9,10,

where the initial positions of the particles are distributed uniformly on the box [3,3]d[-3,3]^{d}.
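As a quick check, each summand (x_{i}-B_{i})^{2}-10\cos(2\pi(x_{i}-B_{i}))+10 is nonnegative and vanishes only at x_{i}=B_{i}, so that L(\mathbf{B})=C=0 is the global minimum value and \mathbf{B} is the unique global minimizer.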

Note that the objective function L only affects the weighted average \bar{\mathbb{x}}_{n}^{S,*} in the CBO dynamics (1.3). Theorems 4.1 and 4.2 show that \gamma and \zeta mainly determine (a lower bound of) the speed of convergence. One useful property of the CBO algorithm is that it always produces a candidate solution for the optimization problem (see Theorem 5.1). However, the candidate may not even be a local minimizer. Therefore, following [27], we consider a simulation successful if the candidate is close to the global minimizer \mathbf{B} in the sense that

𝕩ni𝐁<0.25,withi=min{argmin1jNL(𝕩nj)}.\|\mathbb{x}_{n}^{i}-\mathbf{B}\|_{\infty}<0.25,\quad\mbox{with}\quad i=\min\left\{\operatorname{argmin}_{1\leq j\leq N}L({\mathbb{x}_{n}^{j}})\right\}.

The stopping criterion is based on the change of positions:

i=1N|𝕩n+1i𝕩ni|2<ε,ε=103.\sum_{i=1}^{N}|\mathbb{x}_{n+1}^{i}-\mathbb{x}_{n}^{i}|^{2}<\varepsilon,\quad\varepsilon=10^{-3}.
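For completeness, we include a minimal Python sketch (ours) of a single run of the experiment described above: the discrete CBO update (1.3) with the batch representative (6.1), the stopping rule and the success criterion. The zero-mean Gaussian noise law, the shuffling-based batch partition and the cap on the number of iterations are our own assumptions; the analysis only requires zero-mean noises with second moment at most \zeta^{2} and random batches of size P.

```python
import numpy as np

rng = np.random.default_rng(0)

# parameters as in the experiments above (d = 4 chosen as in Figure 1)
N, d, P = 100, 4, 10
gamma, zeta = 0.01, 0.5
B = np.ones(d)                                   # global minimizer of L
eps_stop, n_max = 1e-3, 5000                     # stopping tolerance; iteration cap is an assumption

def rastrigin(x):
    """Shifted Rastrigin objective with C = 0 and B_i = 1."""
    y = x - B
    return float(np.mean(y**2 - 10.0 * np.cos(2.0 * np.pi * y) + 10.0))

X = rng.uniform(-3.0, 3.0, size=(N, d))          # initial positions on [-3,3]^d

for n in range(n_max):
    X_new = X.copy()
    perm = rng.permutation(N)
    for k in range(0, N, P):                     # random batches of size P
        S = perm[k:k + P]
        vals = np.array([rastrigin(X[j]) for j in S])
        x_star = X[S[int(np.argmin(vals))]]      # batch representative (6.1)
        eta = zeta * rng.standard_normal((len(S), d))   # heterogeneous noises (assumed Gaussian)
        # discrete CBO update (1.3): move toward the batch representative
        X_new[S] = (1.0 - gamma - eta) * X[S] + (gamma + eta) * x_star
    if np.sum((X_new - X)**2) < eps_stop:        # stopping criterion
        X = X_new
        break
    X = X_new

best = X[int(np.argmin([rastrigin(x) for x in X]))]
print("steps:", n + 1, " success:", bool(np.max(np.abs(best - B)) < 0.25))
```

Averaging the success indicator over many independent runs gives statistics of the kind reported in Table 1 below.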
Figure 1. Time-evolution of the best objective value minjL(𝕩nj)\min_{j}L({\mathbb{x}}_{n}^{j}) in CBO algorithms with full batch (left) and random batches of P=10P=10 (right) for d=4d=4 and N=100N=100.

Figure 1 shows the dynamics of the current guess of the minimum value in the CBO algorithms with full batch (same as P=100=N) and random batches (P=10) for d=4. Since we choose the argument minimum for the weighted average \bar{\mathbb{x}}_{n}^{S,*}, the best objective value \min_{j}L({\mathbb{x}}_{n}^{j}) is non-increasing along the optimization steps. This property need not hold when we use a weighted average as in (2.1). Of course, for large \beta, (2.1) is close to the argument minimum (6.1). We can also check that the speed of consensus is much slower for P=10, since the best objective value does not change for a long time (for example, from n=100 to n=200).
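To see quantitatively how close the two representative points are for large \beta, the following toy sketch compares a Gibbs-type weighted average, which we assume is the form of (2.1) as in [5, 27], with the argument minimum (6.1); the objective and the states below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
L = lambda x: float(np.sum(x**2))            # placeholder objective
X = rng.uniform(-3.0, 3.0, size=(8, 2))      # placeholder states of 8 agents

vals = np.array([L(x) for x in X])
x_argmin = X[int(np.argmin(vals))]           # representative point as in (6.1)

for beta in (1.0, 10.0, 100.0):
    w = np.exp(-beta * (vals - vals.min()))  # Gibbs weights, shifted for numerical stability
    x_bar = (w[:, None] * X).sum(axis=0) / w.sum()
    print(beta, np.linalg.norm(x_bar - x_argmin))   # distance shrinks as beta grows
```
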

Success rate Full batch (P=100P=100) P=50P=50 P=10P=10
d = 2 1.000 1.000 1.000
d = 3 0.988 0.983 0.998
d = 4 0.798 0.920 0.988
d = 5 0.712 0.658 0.931
d = 6 0.513 0.655 0.880
d = 7 0.388 0.464 0.854
d = 8 0.264 0.389 0.832
d = 9 0.170 0.323 0.868
d = 10 0.117 0.274 0.886
Table 1. Success rates of the CBO algorithms for Rastrigin functions of different dimensions. 1000 simulations are performed for each algorithm.

Table 1 presents the success rates of finding the minimizer \mathbf{B} for different dimensions and algorithms. It clearly shows that the rates get better for smaller P, due to the increased randomness. However, the computation with small P takes more steps to converge, as shown in Figure 2.

Figure 2. Average number of time steps needed to reach the stopping criterion for full batch and random batches. The computational cost per time step is similar for both methods.
Figure 3. Trajectories of the CBO particles (left) and time-evolution of the objective values (right) for d=4 without noise, with full batch (top) and random batches of P=10 (bottom). The trajectories are drawn in the first two components, and the objective values are evaluated from the 100 particles. We choose a successful optimization run for each method to compare convergence behaviors.

Figure 2 shows a weak point of the random batch interactions: the number of time steps needed to reach the stopping criterion. Note that the computational cost per time step is similar for different P. However, since random batching slows down the convergence speed, the number of time steps increases as N/P increases.

Figure 3 shows sample trajectories of the CBO particles with full batch and random batches (P=10) for d=4 when there is no noise (\zeta=0). If there is noise, then the trajectories nearly cover the region around the origin, so the differences cannot be distinguished easily. Note that the random batch interactions make the particles move around the space. In contrast, the particles in the full batch dynamics move toward the same point, which is the current minimum position among the particles. This may explain why random batch interactions have better searching ability, although they require a larger computational cost.

7. Conclusion

In this paper, we have provided stochastic consensus and convergence analysis for the discrete CBO algorithm with random batch interactions and heterogeneous external noises. Several versions of the discrete CBO algorithm have been proposed to deal with large-scale global non-convex minimization problems arising from, for example, machine learning problems. Recently, thanks to its simplicity and derivative-free nature, consensus-based optimization has received lots of attention from the applied mathematics community, yet the full understanding of the dynamic features of the CBO algorithm is still far from complete. Previously, the authors investigated consensus and convergence of the modified CBO algorithm proposed in [6] in the presence of homogeneous external noises and full batch interactions, and proposed a sufficient framework leading to consensus and convergence in terms of system parameters. In this work, we extend our previous work [18] with two new components (random batch interactions and heterogeneous external noises). To deal with random batch interactions, we recast the CBO algorithm with random batch interactions as a consensus model with a randomly switching network topology, so that we can apply well-developed machineries [11, 12] for the continuous and discrete Cucker-Smale flocking models using the ergodicity coefficient, which measures the connectivity of the network topology (or the mixing of graphs). Our analysis shows that stochastic consensus and convergence emerge exponentially fast as long as the variance of the external noises is sufficiently small. We also showed the monotonicity of the best objective value along the discrete CBO flow equipped with (6.1). One should note that our preliminary result does not yield an error estimate for the optimization problem. Thus, whether the asymptotic limit point of the discrete CBO flow stays close to the optimal point or not will be an interesting problem for future work.

References

  • [1] Albi, G., Bellomo, N., Fermo, L., Ha, S.-Y., Pareschi, L., Poyato, D. and Soler, J.: Vehicular traffic, crowds, and swarms. On the kinetic theory approach towards research perspectives. Math. Models Methods Appl. Sci. 29 (2019), 1901-2005.
  • [2] Alpin, Yu. A. and Gabassov, N. Z.: A remark on the problem of locating the eigenvalues of real matrices. Izv. Vyssh. Uchebn. Zaved. Mat., 1976, no. 11, 98–100; Soviet Math. (Iz. VUZ), 20:11 (1976), 86–88.
  • [3] Askari-Sichani, O. and Jalili, M.: Large-scale global optimization through consensus of opinions over complex networks. Complex Adapt Syst Model 1, 11 (2013).
  • [4] Bonabeau, E., Dorigo, M. and Theraulaz, G. (1999). Swarm intelligence: from natural to artificial systems. Oxford university press.
  • [5] Carrillo, J. A., Choi, Y.-P., Totzeck, C. and Tse, O.: An analytical framework for consensus-based global optimization method. Math. Models Methods Appl. Sci. 21 (2018), 1037–1066.
  • [6] Carrillo, J. A., Jin, S., Li, L. and Zhu, Y.: A consensus-based global optimization method for high dimensional machine learning problems. ESAIM: COCV, 27 S5 (2021).
  • [7] Chatterjee, S. and Seneta, E.: Towards consensus: some convergence theorems on repeated averaging. Journal of Applied Probability 14 No. 1 (1977), pp. 89-97.
  • [8] Chen, J., Jin, S. and Lyu, L.: A consensus-based global optimization method with adaptive momentum estimate. Preprint (2020).
  • [9] Degroot, M. H.: Reaching a consensus. Journal of the American Statistical Association 69 No. 345 (1974), pp. 118-12.
  • [10] Dembo, A. and Zeitouni, O.: Large deviations techniques and applications. Springer-Verlag, Berlin, Heidelberg, second edition, 1998.
  • [11] Dong, J.-G., Ha, S.-Y., Jung, J. and Kim, D.: Emergence of stochastic flocking for the discrete Cucker-Smale model with randomly switching topologies. Commun. Math. Sci. 19 (2021), 205-228.
  • [12] Dong, J.-G., Ha, S.-Y., Jung, J. and Kim, D.: On the stochastic flocking of the Cucker-Smale flock with randomly switching topologies. SIAM Journal on Control and Optimization. 58 (2020), 2332-2353.
  • [13] Dorigo, M., Birattari, M. and Stutzle, T.: Ant colony optimization. IEEE computational intelligence magazine, 1 (2006) 28-39.
  • [14] Durrett, R.: Probability: theory and examples. Second ed.. Duxbury Press 1996.
  • [15] Fornasier, M., Huang, H., Pareschi, L. and Sünnen, P.: Consensus-based optimization on hypersurfaces: Well-posedness and mean-field limit. Mathematical Models and Methods in Applied Sciences 30 No. 14, pp. 2725-2751 (2020).
  • [16] Fornasier, M., Huang, H., Pareschi, L. and Sünnen, P.: Consensus-based optimization on the sphere II: Convergence to global minimizers and machine learning. Preprint (2020).
  • [17] Ha, S.-Y., Jin, S. and Kim, D.: Convergence of a first-order consensus-based global optimization algorithm. Math. Models Meth. Appl. Sci. 30 (12), 2417-2444 (2020).
  • [18] Ha, S.-Y., Jin, S. and Kim, D.: Convergence and error estimates for time-discrete consensus-based optimization algorithms. Numer. Math. 147, 255–282 (2021).
  • [19] Ha, S.-Y., Kang, M., Kim, D., Kim, J. and Yang, I.: Stochastic consensus dynamics for nonconvex optimization on the Stiefel manifold: mean-field limit and convergence. Submitted (2021).
  • [20] Holland, J. H.: Genetic algorithms. Scientific American 267 (1992), 66-73.
  • [21] Jin, S., Li, L. and Liu, J. G.: Random Batch Methods (RBM) for interacting particle systems. Journal of Computational Physics, 400 (2020), 108877.
  • [22] Kennedy, J. (2006). Swarm intelligence. In Handbook of nature-inspired and innovative computing (pp. 187-219). Springer, Boston, MA.
  • [23] Kennedy, J. and Eberhart, R. (1995). Particle swarm optimization. In Proceedings of ICNN'95-international conference on neural networks (Vol. 4, pp. 1942-1948). IEEE.
  • [24] Kim, J., Kang, M., Kim, D., Ha, S.-Y. and Yang, I.: A stochastic consensus method for nonconvex optimization on the Stiefel Manifold. 2020 59th IEEE Conference on Decision and Control, 2020, pp. 1050-1057.
  • [25] Markov, A. A., (1906). Extension of the law of large numbers to dependent quantities [in Russian]. Izv. Fiz.-Matem. Obshch. Kazan Univ., (2nd ser.) 15, 135-56.
  • [26] Mirjalili, S., Mirjalili, S. M. and Lewis, A.: Grey wolf optimizer. Advances in engineering software, 69 (2014), 46-61.
  • [27] Pinnau, R., Totzeck, C., Tse, O. and Martin, S.: A consensus-based model for global optimization and its mean-field limit. Mathematical Models and Methods in Applied Sciences, 27 (2017), 183-204.
  • [28] Totzeck, C., Pinnau, R., Blauth, S. and Schotthöfer, S.: A numerical comparison of consensus-based global optimization to other particle-based global optimization schemes. Proceedings in Applied Mathematics and Mechanics, 18, 2018.
  • [29] Yang, X.-S.: Nature-inspired metaheuristic algorithms. Luniver Press, 2010.
  • [30] Yang, X.-S., Deb, S., Zhao, Y.-X., Fong, S. and He, X.: Swarm intelligence: past, present and future. Soft Comput 22 (2018), 5923-5933.