Convergence analysis of the discrete consensus-based optimization algorithm with random batch interactions and heterogeneous noises
Abstract.
We present stochastic consensus and convergence of the discrete consensus-based optimization (CBO) algorithm with random batch interactions and heterogeneous external noises. Despite its wide applications and successful performance in many practical simulations, the convergence of the discrete CBO algorithm had not been rigorously investigated in such generality. In this work, we introduce a generalized discrete CBO algorithm with a weighted representative point and random batch interactions, and show that the proposed discrete CBO algorithm exhibits stochastic consensus and convergence toward a common equilibrium state exponentially fast under suitable assumptions on the system parameters. For this, we recast the given CBO algorithm with random batch interactions as a discrete consensus model with a randomly switching network topology, and then use the mixing property of interactions over a sufficiently long time interval to derive stochastic consensus and convergence estimates in the mean-square and almost-sure senses. Our proposed analysis significantly improves earlier works on the convergence analysis of CBO models with full batch interactions and homogeneous external noises.
Key words and phrases:
Consensus, external noise, interacting particle system, random batch interactions, randomly switching network topology
2010 Mathematics Subject Classification:
37M99, 37N30, 65P99
1. Introduction
Swarm intelligence [4, 22] provides remarkable meta-heuristic, gradient-free optimization methods. Motivated by the collective behaviors [1] of herds, flocks, colonies, and schools, several population-based meta-heuristic optimization techniques have been proposed in the literature [29, 30]. To name a few, particle swarm optimization [23], ant colony optimization [13], the genetic algorithm [20] and the grey wolf optimizer [26] are popular meta-heuristic algorithms which have attracted researchers from various disciplines over the last two decades. Among them, the consensus-based optimization (CBO) algorithm [27, 5] is a variant of particle swarm optimization which adopts the idea of a consensus mechanism [7, 9] in a multi-agent system. This mechanism is a dynamic process in which agents (particles) approach a common consensus via cooperation [3, 6, 8, 15, 16, 17, 18, 19, 21, 28]. The CBO algorithm was further studied as a system of stochastic differential equations [27] and has also been applied to handle objective functions on constrained search spaces, for example hypersurfaces [15], high-dimensional spaces [6], the Stiefel manifold [19, 24], etc.
In this paper, we are interested in stochastic consensus estimates for a discrete CBO algorithm incorporating two components: “heterogeneous external noises” and “random batch interactions”. In [6], a CBO algorithm with these two aspects was studied numerically, while a previous analytical study of CBO in [18] does not cover these aspects.
We first briefly describe the discrete CBO algorithm in [6, 18]. Suppose the objective function to be minimized is non-convex. As an initial guess, we introduce the initial data of the particles. These are commonly sampled from random variables, as in Monte Carlo methods, independently of the other system parameters. Then, the following CBO algorithm [6, 18] governs the temporal evolution of each sample point so as to find the optimal value among the values of the objective function along the particle trajectories:
(1.1)
where the random variables are independent and identically distributed (i.i.d.) with
In particular, based on the current data of the distributed agents, the network interactions between agents share the information of neighbors, and each agent asymptotically approaches the minimizer of the objective function. Hence, the main role of the CBO algorithm is to make the sample points converge to the global minimizer; a minimal sketch of one such step follows.
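For concreteness, the sketch below assumes the common form of (1.1): a drift toward a Gibbs-weighted average of the particles plus a mean-zero noise scaled by the relative state. The function names and the exact noise form are illustrative assumptions, not the precise scheme of [6, 18].

```python
import numpy as np

def gibbs_average(X, L, beta):
    """Weighted representative point: a convex combination of the sample
    points with Gibbs-type weights proportional to exp(-beta * L(x_j)),
    in the spirit of [6, 18]."""
    vals = np.array([L(x) for x in X])
    w = np.exp(-beta * (vals - vals.min()))  # shift the exponent for numerical stability
    w = w / w.sum()                          # weights are nonnegative and sum to one
    return w @ X                             # shape (d,)

def cbo_step(X, L, gamma, sigma, beta, rng):
    """One discrete CBO update (illustrative form): a deterministic drift
    toward the representative point plus mean-zero noise scaled by the
    relative state, with an independent draw per particle ("heterogeneous")."""
    xbar = gibbs_average(X, L, beta)
    eta = rng.standard_normal(X.shape)
    return X - gamma * (X - xbar) + sigma * (X - xbar) * eta
```

Starting from, say, `X = rng.uniform(-3, 3, size=(N, d))` and iterating `cbo_step` drives the ensemble toward a common point, which is expected to lie near the minimizer.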
In the sequel, we describe a more general CBO algorithm which includes (1.1) as a special case. Let be the state of the -th sample point at time in the search space, and let the set denote the index set of agents interacting with it at time (its neighboring sample points in the interaction network). For notational simplicity, we also set
For a nonempty set and an index , we introduce a weight function which satisfies the following relations:
and the convex combination of :
(1.2)
Before we introduce our governing CBO algorithm, we briefly discuss two key components (heterogeneous external noises and random batch interactions) one by one.
First, a sequence of random matrices is assumed to be i.i.d. with respect to , with finite first two moments:
Second, we discuss the “random batch method” (RBM) of [6, 21] to introduce random batch interactions. For the convergence analysis, we need to clarify how the RBM provides a randomly switching network. At the -th time instant, we randomly choose a partition of into batches (subsets) with sizes at most as follows:
Let be the set of all such partitions for . Then, the choices of batches are independent at each time and follow the uniform law on the finite set of partitions . The random variables yield random batches at time for each realization , and they are also assumed to be independent of the initial data and the noise . Thus, for each time , the index set of neighbors is the element of the partition containing . With the aforementioned key components, we are ready to consider the Cauchy problem for the generalized discrete CBO algorithm under random batch interactions and heterogeneous noises:
(1.3)
where the initial data can be deterministic or random, and the drift rate is a positive constant. In this way, the dynamics (1.3) includes random batch interactions and heterogeneous noises.
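As an illustration, one way to realize such uniformly chosen partitions is to shuffle the indices and cut them into consecutive blocks; the sketch below is written under that assumption, with illustrative names.

```python
import numpy as np

def random_batches(N, B, rng):
    """One RBM step: shuffle the N indices and split them into batches of
    size at most B. A fresh partition is drawn independently at every time
    step (an illustrative construction of the random partitions)."""
    perm = rng.permutation(N)
    return [perm[k:k + B] for k in range(0, N, B)]

# At each time step, a particle interacts only with the members of its batch:
rng = np.random.default_rng(0)
for batch in random_batches(N=10, B=3, rng=rng):
    pass  # update the states of the particles indexed by `batch`
```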
In [18], the consensus analysis and error estimates for (1.1) were carried out; (1.1) is actually the following special case of (1.3): for a positive constant ,
(1.4)
Here both right-hand sides are independent of ; namely, the full batch is used, every agent interacts with all the other agents, and the noise is the same for all particles. We refer to Section 2.1 for details.
In this paper, we are interested in the following two questions.
-
•
(Q1: Emergence of stochastic consensus state) Is there a common consensus state for system (1.3), i.e.,
-
•
(Q2: Optimality of the stochastic consensus state) If so, then is the consensus state an optimal point?
Among the above two posed questions, we will mostly focus on the first question (Q1). In fact, the continuous analogue of (1.3) has been dealt with in the literature. In the first paper on CBO [3], the CBO algorithm without external noise () was studied, and the formation of consensus was proved under suitable assumptions on the network structure. However, optimality was only shown by numerical simulation, even in this deterministic case. In [6], the dynamics (1.3) was investigated via a mean-field limit (). In this case, the particle density distribution, which is the limit of the empirical measure in , satisfies a nonlinear Fokker-Planck equation. By using the analytical structure of the Fokker-Planck equation, the consensus estimate was established. The optimality of the limit value is also estimated by Laplace’s principle, as (see Section 2). Rigorous convergence and error estimates for the continuous and discrete algorithms for the particle system (1.3) with fixed were addressed in the authors’ recent works [17, 18] in the special setting (1.4).
Before we present our main results, we introduce several concepts of stochastic consensus and convergence as follows.
Definition 1.1.
Let be a discrete stochastic process whose dynamics is governed by (1.3). Then, several concepts of stochastic consensus and convergence can be defined as follows.
-
(1)
The random process exhibits “consensus in expectation” or “almost sure consensus”, respectively, if
(1.5) where denotes the standard -norm in .
-
(2)
The random process exhibits “convergence in expectation” or “almost sure convergence”, respectively, if there exists a random vector such that
(1.6)
In addition, we introduce some terminology. For a given set of vectors , we use to denote the matrix whose -th row is , and the -th column vector , as
The main results of this paper are two-fold. First, we present stochastic consensus estimates for (1.5) in the sense of Definition 1.1 (1). For this, we begin with the dynamics of the -th column vector :
for some random matrices and corresponding to two random sources: first, the random batch interactions alone, and second, the interplay between the external white noises and the random batch interactions (see Section 2.1 for details). Then the diameter functional satisfies
(1.7)
where is the ergodicity coefficient of matrix (see Definition 3.1).
For the case (1.4), in which the interactions are full batch () with a constant weight function and the noises are homogeneous, one can show that the ergodicity coefficient is strictly positive for small . Then, the recursive relation (1.7) yields the desired exponential consensus. Moreover, the assumption (1.4) also implies that the dynamics of follows a closed-form stochastic difference equation (see [18]). Then, following the idea of geometric Brownian motion from [17], the decay of can be obtained. However, in our setting (1.3), the aforementioned picture breaks down. In fact, we cannot even guarantee the nonnegativity of . From the definition of the ergodicity coefficient, a positive ergodicity coefficient implies that any two particles are attracted to each other thanks to the network structure, which is not possible when particles are separated by random batches. Hence, the recursive relation (1.7) is not enough to derive an exponential decay of as it stands. On the other hand, to quantify a sufficient mixing effect via the network topology, we use the -transition relation with so that
In this case, the -transition relation satisfies
Again, we use the elementary property of the ergodicity coefficient presented in Lemma 3.3 to find
One of the crucial novelties of this paper lies in the estimation technique for a product of matrix summations. When there is no noise , we can use the analysis of randomly switching topologies [12] to derive the positivity of the ergodicity coefficient. In order to handle heterogeneous external noises, we adopt the concept of the ergodicity coefficient for general -by- matrices with possibly negative entries and treat as a perturbation (see Lemma 4.1 and Lemma 4.2 for details). This yields an exponential decay of in suitable stochastic senses for a sufficiently small , the variance of the noise (see Theorem 4.1 and Theorem 4.2): there exist positive constants , , and a random variable such that
(1.8)
This implies
Second, we deal with the convergence analysis (1.6) of the stochastic flow . One first sees that the dynamics of can be described as
(1.9)
Then, applying the strong law of large numbers as in [18], the almost sure exponential consensus (1.8) for induces the convergence of the first series on the R.H.S. of (1.9). On the other hand, the convergence of the second series will be shown using Doob’s martingale convergence theorem. Thus, for sufficiently small , there exists a random variable such that
We refer to Lemma 5.1 for details. Moreover, the above convergence result can be shown to be exponential in the expectation and almost sure senses. A natural estimate yields
however, the external noises are not uniformly bounded in the almost sure sense. Instead, the noise weighted by a decay in time, , is uniformly bounded for any , as presented in Lemma 5.2. This is a key lemma employed in the derivation of exponential convergence: there exist a positive constant and a random variable such that
(See Theorem 5.1 for details).
The rest of this paper is organized as follows. In Section 2, we briefly present the discrete CBO algorithm [6] and review the consensus and convergence results [18] for the case of homogeneous external noises. In Section 3, we study stochastic consensus estimates in the absence of external noises, so that the randomness in the dynamics is mainly caused by random batch interactions. In Section 4, we consider the stochastic consensus arising from the interplay of two random sources (heterogeneous external noises and random batch interactions) when the standard deviation of the heterogeneous external noise is sufficiently small. In Section 5, we provide a stochastic convergence analysis for the discrete CBO flow based on the consensus results. In Section 6, we provide monotonicity of the maximum of the objective function along the discrete CBO flow, and present several numerical simulations to confirm the analytical results. Finally, Section 7 is devoted to a brief summary of our main results and some remaining issues to be explored in future work.
Gallery of notation: We use superscripts to denote the particle number and the components of a vector; for example, denotes the -th component of the -th particle, i.e.,
We also use simplified notation for summation and maximization:
For a given state vector , we introduce its state diameter :
(1.10)
For , we denote its -norm by .
Lastly, for integers and square matrices of the same size, we define
Note that the ordering in the product is important, because the matrix product is not commutative in general.
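For concreteness, the two notations above may be realized as follows; the choice of the maximum norm in the diameter and the left-multiplication convention in the product are assumptions of this sketch.

```python
import numpy as np

def state_diameter(X):
    """State diameter (1.10): the largest pairwise distance between
    particles (here in the componentwise maximum norm, an assumed choice)."""
    N = len(X)
    return max(np.abs(X[i] - X[j]).max() for i in range(N) for j in range(N))

def ordered_product(mats):
    """Ordered product of equally sized square matrices; later factors are
    multiplied on the left (an assumed convention), since matrix products
    do not commute in general."""
    out = np.eye(mats[0].shape[0])
    for M in mats:
        out = M @ out
    return out
```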
2. Preliminaries
In this section, we briefly present the discrete CBO algorithm (1.3) and then recall convergence and optimization results from [18].
2.1. The CBO algorithms
Let be the state of the -th sample point at time in the search space, and assume that the state dynamics is governed by the CBO algorithm introduced in [6]:
(2.1)
Here, , and denote the drift rate, the noise intensity and the reciprocal of temperature, respectively; is the standard orthonormal basis in , and are i.i.d. standard one-dimensional Brownian motions.
Note the following discrete analogue of Laplace’s principle [10]:
This explains why the CBO algorithm (2.1) is expected to pick up the minimizer of .
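For reference, the continuous form of Laplace's principle (cf. [10]) may be written as follows, assuming the objective function is denoted by L and the sampling measure by ρ (both symbols are assumed notation here); the discrete analogue replaces the integral by a sum over the particles.

```latex
% Laplace's principle (cf. [10]); L and \rho are assumed notation.
% The discrete analogue replaces the integral by a sum over particles.
\lim_{\beta \to \infty} \left( -\frac{1}{\beta}
    \log \int_{\mathbb{R}^d} e^{-\beta L(x)} \, \rho(\mathrm{d}x) \right)
  = \operatorname*{ess\,inf}_{x \in \operatorname{supp} \rho} L(x)
```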
Next, we consider time-discretization of (2.1). For this, we set
Consider the following three discrete algorithms based on [6, 18]: for
where the random variables are i.i.d. standard normal random variables. Model A is precisely the Euler-Maruyama scheme applied to (2.1), and Models B and C are other discretizations of (2.1) introduced in [6]. In [18], it is shown that the three models are special cases (with different choices of and ) of the following generalized scheme:
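For concreteness, a minimal sketch of Model A, the Euler-Maruyama discretization of (2.1), is given below; the componentwise noise form and all names are illustrative assumptions rather than the exact scheme of [6].

```python
import numpy as np

def euler_maruyama_cbo_step(X, L, gamma, sigma, beta, h, rng):
    """One Euler-Maruyama step of length h for the CBO SDE (2.1):
    drift toward the Gibbs-weighted average plus diffusion acting
    componentwise on the relative state (an illustrative form of Model A)."""
    vals = np.array([L(x) for x in X])
    w = np.exp(-beta * (vals - vals.min()))
    w = w / w.sum()
    xbar = w @ X                                    # Gibbs-weighted average
    dW = np.sqrt(h) * rng.standard_normal(X.shape)  # Brownian increments
    return X - gamma * h * (X - xbar) + sigma * (X - xbar) * dW
```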
2.2. Previous results
In this subsection, we briefly summarize the previous results [18] on the convergence and error estimates for the case
(thus the noises are not heterogeneous). In the next theorem, we recall the convergence estimate for the discrete scheme (2.2).
Theorem 2.1.
(Convergence estimate [18]) Suppose system parameters and satisfy
and let be a solution process to (2.2). Then, one has the following assertions:
-
(1)
Consensus in expectation and in the almost sure sense emerges asymptotically:
for and a random variable satisfying
-
(2)
There exists a common random vector such that
3. Discrete CBO algorithm with random batch interactions
In this section, we present a matrix formulation of (1.3), some elementary estimates for the ergodicity coefficient, and stochastic consensus for system (1.3) in the absence of heterogeneous external noises.
3.1. A matrix reformulation
In this subsection, we present a matrix formulation of the following discrete CBO model:
(3.1)
Then, the -th component of (3.1) can be rewritten as
(3.2)
For and , set
(3.3)
Note that denotes the -th column of the matrix . Then system (3.2) can be rewritten in the following matrix form:
The randomness of is due to the random switching of the network topology via random batch interactions. In contrast, takes care of the external noises . In the absence of external noise (), one has for all , and system (3.1) reduces to the stochastic system:
(3.4)
In the next lemma, we study several properties of .
Lemma 3.1.
Let be a matrix defined in (3.1). Then, it satisfies the following two properties:
-
(1)
is row-stochastic:
where is the -th element of the matrix .
-
(2)
The product is row-stochastic:
Proof.
(2) The second assertion holds since the product of two row-stochastic matrices is also row-stochastic.
∎
Note that the asymptotic dynamics of is completely determined by the matrix sequence in (3.4).
3.2. Ergodicity coefficient
In this subsection, we recall the concept of ergodicity coefficient and investigate its basic properties which will be crucially used in later sections.
First, we recall the ergodicity coefficient of a matrix as follows.
Definition 3.1.
[2] For , we define the ergodicity coefficient of as
(3.5)
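A minimal computation of this coefficient is sketched below, assuming the min-sum convention used for flocking models in [11, 12]; the precise form of (3.5) is an assumption here.

```python
import numpy as np

def ergodicity_coefficient(A):
    """Ergodicity coefficient of a square matrix: the minimum over all pairs
    of rows of the summed entrywise minima (assumed convention for (3.5),
    cf. [11, 12]). For a row-stochastic matrix it lies in [0, 1], and it can
    be negative for matrices with negative entries, as used in Section 4."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return min(np.minimum(A[i], A[j]).sum()
               for i in range(n) for j in range(n) if i != j)

# Sanity checks: no mixing vs. perfect mixing.
assert np.isclose(ergodicity_coefficient(np.eye(3)), 0.0)        # disjoint rows
assert np.isclose(ergodicity_coefficient(np.ones((3, 3)) / 3), 1.0)  # identical rows
```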
Remark 3.1.
As a direct application of (3.5), one has the super-additivity and monotonicity for .
Lemma 3.2.
Let be square matrices in . Then, the ergodicity coefficient satisfies the following assertions:
-
(1)
(Super-additivity and homogeneity):
-
(2)
(Monotonicity):
Proof.
All the estimates follow directly from (3.5). Hence, we omit the details. ∎
Next, we recall a key tool that will be crucially used in the consensus estimates for (3.1).
Lemma 3.3.
Proof.
Next, we study elementary lemmas for the emergence of consensus. From Definition 1.1, for consensus, it suffices to show
We apply Lemma 3.3 to (3.4) together with the row-stochasticity of to get
However, note that the ergodicity coefficient may not be strictly positive. For , we iterate (3.4) -times to get
Since the product is also row-stochastic (see Lemma 3.1), we can apply Lemma 3.3 to find
Then, the ergodicity coefficient depends on the network structure between , and the probability of it being strictly positive increases as grows.
In [7], consensus was guaranteed under the assumption:
(3.6)
However, examples of network topologies (which may vary in time, possibly randomly) satisfying the aforementioned a priori assumption (3.6) have not been well studied yet. Below, we show that random batch interactions provide a network topology satisfying the a priori condition (3.6).
First, we introduce a random variable measuring the degree of network connectivity. For a given pair , we consider the set of all time instants within the time zone such that and are in the same batch . More precisely, we introduce
(3.7)
Note that is non-empty if one batch contains both and for some instant in . In the following lemma, we will estimate ergodicity coefficients from random batch interactions.
Lemma 3.4.
Let be the transition matrix in (3.1). Then, for given integers , and ,
(3.8)
Proof.
We derive the estimate (3.8) via two steps:
(3.9)
Step A (Derivation of the first inequality): Recall that
Then, it follows from Lemma 3.2 that
(3.10)
Remark 3.2.
Below, we provide two comments on the result of Lemma 3.4.
-
(1)
Lemma 3.3 says that, if the union of the network structures from time to time forms an all-to-all connected network, then the ergodicity coefficient is positive. The union network need not be all-to-all to make the ergodicity coefficient positive; however, the estimation of the lower bound of could then be more difficult.
-
(2)
The diameter satisfies
3.3. Almost sure consensus
In this subsection, we provide a stochastic consensus result when the stochastic noises are turned off, i.e., . For this, we choose a suitable such that has a strictly positive expectation.
Lemma 3.5.
The following assertions hold.
-
(1)
There exists such that there exist partitions of in , such that for any , one has for some and .
-
(2)
For ,
(3.12)
Proof.
(1) First, the existence of and the partitions is clear, since we may consider all possible partitions of in . In other words, we may choose as .
(2) Note that the random choices of batches are independent and identically distributed. Hence, the expectations of and should be identical:
Next, let be the partitions chosen in (1). Note that the probability of having depends on : for any ,
Therefore, we may proceed to give a rough estimate on :
This gives ∎
Now we are ready to provide a stochastic consensus estimate on the dynamics (3.1) in the absence of noise, .
Theorem 3.1.
Let be a solution process to (1.3) with a.s. for any , and let . Then, for , there exists a nonnegative random variable such that
(3.13)
where the decay exponent satisfies
(3.14)
Proof.
(i) We claim
(3.15)
(First inequality in (3.15)): Since has only nonnegative elements, the values of at products of ’s are nonnegative. Hence,
(3.16)
(Second inequality in (3.15)): By Lemma 3.4, the ergodicity coefficient can be written in terms of :
(3.17)
where we used in the last inequality. By induction on in (3.17),
(3.18)
We combine (3.16) and (3.18) to get the desired estimate (3.15). Finally, set random variable as
It is also worth mentioning again that the transition matrix has only nonnegative elements. In [7], the nonnegative version of Lemma 3.3 was applied using the result in [25]. If one takes into account the randomness , we need to consider Lemma 3.3 and generalize Lemma 3.4 to transition matrices with possibly negative entries. We will analyze this in the next section.
4. Discrete CBO with random batch interactions and heterogeneous external noises: consensus analysis
In this section, we study consensus estimates for the stochastic dynamics (3.1) in the presence of both random batch interactions and heterogeneous external noises. In fact, the material presented in Section 3 corresponds to a special case of this section. Hence, the presentation will be parallel to that of Section 3, and we will focus on the effect of the external noises on stochastic consensus.
4.1. Ergodicity coefficient
As discussed in Section 3, we use the ergodicity coefficient to prove the emergence of stochastic consensus. Recall the governing stochastic system:
(4.1)
As in Lemma 3.4, one needs to compute the ergodicity coefficient of
(4.2)
for integers and .
In what follows, we study preliminary lemmas for the estimation of the ergodicity coefficient, which handle the stochastic part due to external noises as a perturbation. Then, system (4.1) can be viewed as a perturbation of the nonlinear system (3.4) which has been treated in the previous section. For a given square matrix , define a mixed norm:
Lemma 4.1.
For a matrix , the ergodicity coefficient has a lower bound:
Proof.
By direct calculation, one has
∎
The following lemma helps to give a lower bound for the ergodicity coefficient of the product of matrices in (4.2).
Lemma 4.2.
For , let , be matrices in . Then, one has
(4.3)
Proof.
We use induction on . The initial step is trivial. Suppose that the inequality (4.3) holds for . By the subadditivity and submultiplicativity of the matrix norm , we have
This verifies the desired estimate (4.3). ∎
From Lemma 4.1 and Lemma 4.2, we are ready to estimate the ergodicity coefficient of (4.2). For this, we also introduce a new random variable measuring the size of error:
(4.4)
where .
Lemma 4.3.
Proof.
First, we use super-additivity of in Lemma 3.2:
(4.6)
(Estimate of ): This case has already been treated in Lemma 3.4:
(4.7)
(Estimate of ): The term can be regarded as an error term due to the external stochastic noise. We may use Lemma 4.1 to get a lower bound of this term:
(4.8)
By Lemma 4.2, one gets
(4.9)
By the defining relations (3.1), and can be estimated as follows.
(4.10)
Now, we combine (4.8), (4.9) and (4.10) to estimate :
(4.11)
Finally, combining (4.6), (4.7) and (4.11) yields the desired estimate:
∎
4.2. Stochastic consensus
Recall the discrete system for :
By iterating the above relation times, one gets
(4.12)
Lemma 4.4.
Let be a solution process to (1.3) and let and be given integers. Then, for and ,
Proof.
We will follow the same procedure employed in Theorem 3.1, i.e., we first bound using , and then we bound using .
Step A: For , we estimate by using :
(4.13)
where we used the fact that in the last inequality.
Now, we are ready to provide an exponential decay of .
Theorem 4.1.
Proof.
Assume . It follows from Lemma 4.4 that
(4.18)
Since the following random variables
are independent, we can take expectations on both sides of the estimate in (4.18) and then use the inequality to see that
In the sequel, we estimate the terms and one by one.
Case A (Estimate of ): Recall the defining relation in (4.4):
By taking expectation of ,
(4.19)
where we used the fact to get
Combining (4.2) and (4.19) yields
(4.20)
Note that our goal is to estimate in terms of . Then, is actually a function of . To be more specific,
If , then
(4.21)
If , then
(4.22)
Therefore, by combining (4.20), (4.21) and (4.22), one has
where the sequence satisfies
provided that is sufficiently small. Hence
Now by setting
one gets the desired result. ∎
In the next theorem, we derive almost sure consensus of (1.3).
Theorem 4.2.
Proof.
(i) We apply the inequality to Lemma 4.4 to get
(4.23)
As in Theorem 4.1, one may consider
Then, its limiting behavior is :
(4.24)
Hence, one can apply the strong law of large numbers to get
(4.25)
and
(4.26)
Finally, combining (4.23), (4.24), (4.25) and (4.26) gives
where the random process is defined by
and satisfies
Thus, if is sufficiently small,
This implies that, if one chooses a positive constant such that
(4.27)
there exists a stopping time such that
In other words, there exists an almost surely positive and bounded random variable
which satisfies
(4.28)
5. Convergence of discrete CBO flow
In this section, we present the convergence analysis of the CBO algorithm (1.3) using the stochastic consensus estimates in Theorems 4.1 and 4.2. In those theorems, the consensus of (1.3), i.e., the decay of the relative state , is shown to be exponentially fast in expectation and in the almost sure sense. However, this does not guarantee that a solution converges toward an equilibrium of the discrete model (1.3), since it may approach a limit cycle or exhibit a chaotic trajectory even if the relative states decay to zero. The convergence of the states to a common point is an important issue for the purpose of global optimization via the CBO algorithm.
Lemma 5.1.
Proof.
We will use the almost sure consensus of the states () in Theorem 4.2 and arguments similar to those of Theorem 3.1 in [18]. As in (3.1), we may rewrite (1.3) as
(5.1)
where the convergence of for all is equivalent to that of for all . Summing up (5.1) over gives
(5.2)
Note that the convergence of follows if the two vector series on the R.H.S. of (5.2) are both convergent almost surely as .
Case A (Almost sure convergence of ): For the -th standard unit vector in , one has
(5.3)
where we used the fact that . It follows from Theorem 4.2 that the diameter process decays to zero exponentially fast almost surely. Hence, for each , the relation (5.3) implies
This shows that each component of the first series in (5.2) is convergent almost surely.
Case B (Almost sure convergence of ): Note that this series is a martingale since the random vectors are independent and have zero mean: for any -field independent of , we have
where we used the fact that . By Doob’s martingale convergence theorem [14], the martingale converges almost surely if the series is uniformly bounded in . Again, it follows from (5.3) and Theorem 4.2 that
This yields that the second series in (5.2) converges almost surely.
Combining the results from Case A and Case B in (5.2), we conclude that converges to a random variable almost surely. ∎
Lemma 5.2.
Let be an i.i.d. sequence of random variables with . Then, for any given , there exists a random variable such that
Proof.
By Markov’s inequality, for any constant one has
In particular,
Therefore, for any ,
This implies existence of a random variable satisfying
In detail, we may define the random variable as
For example, if satisfies for some but not for , then and so that . ∎
Now, we are ready to present the convergence of in the expectation and almost sure senses.
Theorem 5.1.
Proof.
Let be a positive integer appearing in Theorem 4.1. Then, for , there exists independent of the dimension , such that if , there exist positive constants and such that
(5.4)
Let . Then, summing up (5.1) over gives
As in the proof of Lemma 5.1, for the -th standard unit vector , one has
(5.5)
where the last inequality follows from .
(i) Taking expectations in (5.5) and using (5.4) yields
(5.6)
In (5.6), letting and using Lemma 5.1 give
(ii) In the proof of Theorem 4.2, the constant was chosen in (4.27) so that
Here one may fix and consider small satisfying
Then, from Theorem 4.2 and Lemma 5.2, the estimates of and can be described by
We apply these estimates to (5.5) and obtain
for some positive and bounded random variable , independent of . Now, sending and using Lemma 5.1 give
This yields the desired estimate:
∎
6. Application to optimization problem
In this section, we discuss how the consensus estimates studied in the previous sections can be applied to the optimization problem. Let be an objective function for which we look for a minimizer.
For the weighted representative point (1.2) with a nonempty set , we set
(6.1)
In this setting, the representative point is specified as one agent of the set [6].
Proposition 6.1.
Proof.
For a fixed , let be the index satisfying
Then, it is easy to see
For each , we have
i.e.,
Hence, one has
∎
Remark 6.1.
Proposition 6.1 suggests that from the relation
the monotonicity actually gives
In other words, is a best prediction for the minimizer of among the trajectories of all agents on the time interval . In the particle swarm optimization (PSO) literature [23], such a point is called the ‘global best’, and this observation tells us that the algorithm (1.3) equipped with (6.1) may be considered a simplified model of the PSO algorithm; a sketch of this choice follows.
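A minimal sketch of this argmin-type representative point (names illustrative):

```python
import numpy as np

def argmin_representative(X, L):
    """Representative point (6.1): instead of the Gibbs-weighted average,
    pick the agent with the smallest objective value (the 'global best')."""
    vals = np.array([L(x) for x in X])
    return X[np.argmin(vals)]
```

With this choice, Proposition 6.1 guarantees that the objective value at the representative point is non-increasing along the discrete CBO flow.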
Next, we provide numerical simulations for the CBO algorithms with (6.1). From Proposition 6.1, this algorithm enjoys a monotonicity property related to the objective function, which is useful for locating the minimum of . The CBO algorithms (1.3)–(6.1) will be tested with the Rastrigin function as in [27], which reads
where is the global minimizer of . We set and for all . The Rastrigin function has many local minima near , whose values are very close to the global minimum. The parameters are chosen similarly to [6, 27]:
where the initial positions of the particles are distributed uniformly on the box .
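A sketch of the test objective, assuming the 1/d-normalized form of the Rastrigin function used in [27]; the concrete shift b and offset c are elided in the text and appear here only as illustrative parameters.

```python
import numpy as np

def rastrigin(x, b=0.0, c=0.0):
    """Rastrigin function with global minimizer x = (b, ..., b) and minimum
    value c; the 1/d normalization follows [27] (assumed convention)."""
    y = np.asarray(x, dtype=float) - b
    return float(np.mean(y**2 - 10.0 * np.cos(2.0 * np.pi * y) + 10.0) + c)
```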
Note that the objective function only affects the weighted average in the CBO dynamics (1.3). Theorems 4.1 and 4.2 show that and mainly determine (a lower bound of) the speed of convergence. One useful property of the CBO algorithm is that it always produces a candidate solution for the optimization problem (see Theorem 5.1). However, the candidate may not even be a local minimum. Therefore, following [27], we consider a simulation successful if the candidate is close to the global minimum in the sense that
The stopping criterion is based on the change of positions,
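Putting these pieces together, a minimal driver for one simulation run is sketched below, reusing `random_batches` and `argmin_representative` from the earlier sketches; the initial box, parameter values, and tolerances are illustrative stand-ins for the elided values in the text.

```python
import numpy as np

def run_cbo(L, N=50, d=5, B=10, gamma=0.01, sigma=0.1,
            max_steps=10_000, tol=1e-8, rng=None):
    """Run the discrete CBO flow (1.3) with random batch interactions and
    the argmin representative point (6.1); stop once positions barely move."""
    rng = rng if rng is not None else np.random.default_rng()
    X = rng.uniform(-3.0, 3.0, size=(N, d))   # illustrative initial box
    for _ in range(max_steps):
        X_old = X.copy()
        for batch in random_batches(N, B, rng):
            xbar = argmin_representative(X[batch], L)
            eta = rng.standard_normal((len(batch), d))
            X[batch] = X[batch] - gamma * (X[batch] - xbar) + sigma * (X[batch] - xbar) * eta
        if np.abs(X - X_old).max() < tol:      # stopping criterion: change of positions
            break
    return argmin_representative(X, L)

# A run counts as a success if the candidate lands near the global minimizer,
# e.g. np.abs(run_cbo(rastrigin)).max() < 0.25 (threshold illustrative).
```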


Figure 1 shows the dynamics of the current guess of the minimum value in the CBO algorithms with full batch (same as ) and random batches () for . Since we choose the argument minimum for the weighted average , the best objective value is non-increasing along the optimization steps. This property cannot be observed when we use a weighted average as in (2.1). Of course, for large , (2.1) is close to the argument minimum (6.1). We can also check that the speed of consensus is much slower if , since the best objective value does not change for a long time (for example, from to ).
Success rate | Full batch () | |
---|---|---|---|
d = 2 | 1.000 | 1.000 | 1.000 |
d = 3 | 0.988 | 0.983 | 0.998 |
d = 4 | 0.798 | 0.920 | 0.988 |
d = 5 | 0.712 | 0.658 | 0.931 |
d = 6 | 0.513 | 0.655 | 0.880 |
d = 7 | 0.388 | 0.464 | 0.854 |
d = 8 | 0.264 | 0.389 | 0.832 |
d = 9 | 0.170 | 0.323 | 0.868 |
d = 10 | 0.117 | 0.274 | 0.886 |
Table 1 presents the success rates of finding the minima for different dimensions and algorithms. It clearly shows that the rates get better for smaller , due to the increased randomness. However, the computation with small takes more steps to converge, as shown in Figure 2.



Figure 2 shows a weak point of the random batch interactions by displaying the number of time steps needed to reach the stopping criterion. Note that the computational cost for each time step is similar for different . However, since it slows down the convergence speed, the number of time steps increases as increases.
Figure 3 shows sample trajectories of the CBO particles with full batch and random batches () for when there is no noise (). If there is noise, then the trajectories nearly cover the area near , so that we cannot catch the differences easily. Note that the random interactions make the particles move around the space. In contrast to the random batch interactions, the particles with full batch dynamics move toward the same point, which is the current minimum position among the particles. This may explain why random batch interactions have better searching ability, though they require more computational cost.
7. Conclusion
In this paper, we have provided stochastic consensus and convergence analysis for the discrete CBO algorithm with random batch interactions and heterogeneous external noises. Several versions of the discrete CBO algorithm have been proposed to deal with large-scale global non-convex minimization problems arising from, for example, machine learning problems. Recently, thanks to its simplicity and derivative-free nature, consensus-based optimization has received a lot of attention from the applied mathematics community, yet the full understanding of the dynamic features of the CBO algorithm is still far from complete. Previously, the authors investigated consensus and convergence of the modified CBO algorithm proposed in [6] in the presence of homogeneous external noises and full batch interactions, and proposed a sufficient framework leading to consensus and convergence in terms of the system parameters. In this work, we extend our previous work [18] with two new components (random batch interactions and heterogeneous external noises). To deal with random batch interactions, we recast the CBO algorithm with random batch interactions as a consensus model with a randomly switching network topology, so that we can apply well-developed machinery [11, 12] for the continuous and discrete Cucker-Smale flocking models using the ergodicity coefficient, which measures the connectivity of the network topology (or the mixing of graphs). Our analysis shows that stochastic consensus and convergence emerge exponentially fast as long as the variance of the external noises is sufficiently small. Our consensus estimate yields the monotonicity of an objective function along the discrete CBO flow. One should note that our preliminary result does not yield an error estimate for the optimization problem. Thus, whether the asymptotic limit point of the discrete CBO flow stays close to the optimal point or not will be an interesting problem for future work.
References
- [1] Albi, G., Bellomo, N., Fermo, L., Ha, S.-Y., Pareschi, L., Poyato, D. and Soler, J.: Vehicular traffic, crowds, and swarms. On the kinetic theory approach towards research perspectives. Math. Models Methods Appl. Sci. 29 (2019), 1901-2005.
- [2] Alpin, Yu. A. and Gabassov, N. Z.: A remark on the problem of locating the eigenvalues of real matrices. Izv. Vyssh. Uchebn. Zaved. Mat., 1976, no. 11, 98–100; Soviet Math. (Iz. VUZ), 20:11 (1976), 86–88.
- [3] Askari-Sichani, O. and Jalili, M.: Large-scale global optimization through consensus of opinions over complex networks. Complex Adapt Syst Model 1, 11 (2013).
- [4] Bonabeau, E., Dorigo, M. and Theraulaz, G.: Swarm intelligence: from natural to artificial systems. Oxford University Press, 1999.
- [5] Carrillo, J. A., Choi, Y.-P., Totzeck, C. and Tse, O.: An analytical framework for consensus-based global optimization method. Math. Models Methods Appl. Sci. 21 (2018), 1037–1066.
- [6] Carrillo, J. A., Jin, S., Li, L. and Zhu, Y.: A consensus-based global optimization method for high dimensional machine learning problems. ESAIM: COCV, 27 S5 (2021).
- [7] Chatterjee, S. and Seneta, E.: Towards consensus: some convergence theorems on repeated averaging. Journal of Applied Probability 14 No. 1 (1977), pp. 89-97.
- [8] Chen, J., Jin, S. and Lyu, L.: A consensus-based global optimization method with adaptive momentum estimate. Preprint (2020).
- [9] Degroot, M. H.: Reaching a consensus. Journal of the American Statistical Association 69 No. 345 (1974), pp. 118-121.
- [10] Dembo, A. and Zeitouni, O.: Large deviations techniques and applications. Springer-Verlag, Berlin, Heidelberg, second edition, 1998.
- [11] Dong, J.-G., Ha, S.-Y., Jung, J. and Kim, D.: Emergence of stochastic flocking for the discrete Cucker-Smale model with randomly switching topologies. Commun. Math. Sci. 19 (2021), 205-228.
- [12] Dong, J.-G., Ha, S.-Y., Jung, J. and Kim, D.: On the stochastic flocking of the Cucker-Smale flock with randomly switching topologies. SIAM Journal on Control and Optimization. 58 (2020), 2332-2353.
- [13] Dorigo, M., Birattari, M. and Stutzle, T.: Ant colony optimization. IEEE Computational Intelligence Magazine, 1 (2006), 28-39.
- [14] Durrett, R.: Probability: theory and examples. Second ed., Duxbury Press, 1996.
- [15] Fornasier, M., Huang, H., Pareschi, L. and Sünnen, P.: Consensus-based optimization on hypersurfaces: Well-posedness and mean-field limit. Mathematical Models and Methods in Applied Sciences 30 No. 14, pp. 2725-2751 (2020).
- [16] Fornasier, M., Huang, H., Pareschi, L. and Sünnen, P.: Consensus-based optimization on the sphere II: Convergence to global minimizers and machine learning. Preprint (2020).
- [17] Ha, S.-Y., Jin, S. and Kim, D.: Convergence of a first-order consensus-based global optimization algorithm. Math. Models Meth. Appl. Sci. 30 (12), 2417-2444 (2020).
- [18] Ha, S.-Y., Jin, S. and Kim, D.: Convergence and error estimates for time-discrete consensus-based optimization algorithms. Numer. Math. 147, 255–282 (2021).
- [19] Ha, S.-Y., Kang, M., Kim, D., Kim, J. and Yang, I.: Stochastic consensus dynamics for nonconvex optimization on the Stiefel manifold: mean-field limit and convergence. Submitted (2021).
- [20] Holland, J. H.: Genetic algorithms. Scientific American 267 (1992), 66-73.
- [21] Jin, S., Li, L. and Liu, J. G.: Random Batch Methods (RBM) for interacting particle systems. Journal of Computational Physics, 400 (2020), 108877.
- [22] Kennedy, J. (2006). Swarm intelligence. In Handbook of nature-inspired and innovative computing (pp. 187-219). Springer, Boston, MA.
- [23] Kennedy, J. and Eberhart, R. (1995). Particle swarm optimization. In Proceedings of ICNN'95 - International Conference on Neural Networks (Vol. 4, pp. 1942-1948). IEEE.
- [24] Kim, J., Kang, M., Kim, D., Ha, S.-Y. and Yang, I.: A stochastic consensus method for nonconvex optimization on the Stiefel Manifold. 2020 59th IEEE Conference on Decision and Control, 2020, pp. 1050-1057.
- [25] Markov, A. A. (1906). Extension of the law of large numbers to dependent quantities [in Russian]. Izv. Fiz.-Matem. Obshch. Kazan Univ., (2nd ser.) 15, 135-156.
- [26] Mirjalili, S., Mirjalili, S. M. and Lewis, A.: Grey wolf optimizer. Advances in engineering software, 69 (2014), 46-61.
- [27] Pinnau, R., Totzeck, C., Tse, O. and Martin, S.: A consensus-based model for global optimization and its mean-field limit. Mathematical Models and Methods in Applied Sciences, 27 (2017), 183-204.
- [28] Totzeck, C., Pinnau, R., Blauth, S. and Schotthöfer, S.: A numerical comparison of consensus-based global optimization to other particle-based global optimization schemes. Proceedings in Applied Mathematics and Mechanics, 18, 2018.
- [29] Yang, X.-S.: Nature-inspired metaheuristic algorithms. Luniver Press, 2010.
- [30] Yang, X.-S., Deb, S., Zhao, Y.-X., Fong, S. and He, X.: Swarm intelligence: past, present and future. Soft Comput 22 (2018), 5923-5933.