Convergence analysis of the discrete consensus-based optimization algorithm with random batch interactions and heterogeneous noises
Abstract.
We present stochastic consensus and convergence of the discrete consensus-based optimization (CBO) algorithm with random batch interactions and heterogeneous external noises. Despite its wide applications and successful performance in many practical simulations, the convergence of the discrete CBO algorithm had not been rigorously investigated in such generality. In this work, we introduce a generalized discrete CBO algorithm with a weighted representative point and random batch interactions, and show that the proposed discrete CBO algorithm exhibits stochastic consensus and convergence toward a common equilibrium state exponentially fast under suitable assumptions on the system parameters. For this, we recast the given CBO algorithm with random batch interactions as a discrete consensus model with a randomly switching network topology, and then use the mixing property of interactions over a sufficiently long time interval to derive stochastic consensus and convergence estimates in the mean-square and almost-sure senses. Our proposed analysis significantly improves earlier works on the convergence analysis of CBO models with full batch interactions and homogeneous external noises.
Key words and phrases:
Consensus, external noise, interacting particle system, random batch interactions, randomly switching network topology
2010 Mathematics Subject Classification:
37M99, 37N30, 65P99
1. Introduction
Swarm intelligence [4, 22] provides remarkable meta-heuristic, gradient-free optimization methods. Motivated by the collective behaviors [1] of herds, flocks, colonies, and schools, several population-based meta-heuristic optimization techniques have been proposed in the literature [29, 30]. To name a few, particle swarm optimization [23], ant colony optimization [13], the genetic algorithm [20] and the grey wolf optimizer [26] are popular meta-heuristic algorithms which have attracted researchers from various disciplines over the last two decades. Among them, the consensus-based optimization (CBO) algorithm [27, 5] is a variant of particle swarm optimization which adopts the idea of a consensus mechanism [7, 9] in a multi-agent system. This mechanism is a dynamic process in which agents (particles) approach a common consensus via cooperation [3, 6, 8, 15, 16, 17, 18, 19, 21, 28]. The CBO algorithm was further studied as a system of stochastic differential equations [27] and has also been applied to handle objective functions on constrained search spaces, for example hypersurfaces [15], high-dimensional spaces [6], the Stiefel manifold [19, 24], etc.
In this paper, we are interested in stochastic consensus estimates for a discrete CBO algorithm incorporating two components: “heterogeneous external noises” and “random batch interactions”. In [6], a CBO algorithm with these two aspects was studied numerically, while a previous analytical study of CBO in [18] does not cover these aspects.
We first briefly describe the discrete CBO algorithm in [6, 18]. Suppose the objective function to be minimized is non-convex. As an initial guess, we introduce the initial data of the particles. These are commonly sampled from random variables, as in Monte Carlo methods, independently of the other system parameters. Then, the following CBO algorithm [6, 18] governs the temporal evolution of each sample point so as to find the optimal value among the values of the objective function along the particle trajectories:
(1.1)
where the random variables are independent and identically distributed (i.i.d.) with
In particular, based on the current data of the distributed agents, the network interactions between agents share the information of neighbors, and each agent asymptotically approaches the minimizer of the objective function. Hence, the main role of the CBO algorithm is to make the sample points converge to the global minimizer; a minimal sketch of one such step follows.
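For concreteness, the sketch below assumes the common form of (1.1): a drift toward a Gibbs-weighted average of the particles plus a mean-zero noise scaled by the relative state. The function names and the exact noise form are illustrative assumptions, not the precise scheme of [6, 18].

```python
import numpy as np

def gibbs_average(X, L, beta):
    """Weighted representative point: a convex combination of the sample
    points with Gibbs-type weights proportional to exp(-beta * L(x_j)),
    in the spirit of [6, 18]."""
    vals = np.array([L(x) for x in X])
    w = np.exp(-beta * (vals - vals.min()))  # shift the exponent for numerical stability
    w = w / w.sum()                          # weights are nonnegative and sum to one
    return w @ X                             # shape (d,)

def cbo_step(X, L, gamma, sigma, beta, rng):
    """One discrete CBO update (illustrative form): a deterministic drift
    toward the representative point plus mean-zero noise scaled by the
    relative state, with an independent draw per particle ("heterogeneous")."""
    xbar = gibbs_average(X, L, beta)
    eta = rng.standard_normal(X.shape)
    return X - gamma * (X - xbar) + sigma * (X - xbar) * eta
```

Starting from, say, `X = rng.uniform(-3, 3, size=(N, d))` and iterating `cbo_step` drives the ensemble toward a common point, which is expected to lie near the minimizer.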
In the sequel, we describe a more general CBO algorithm which includes (1.1) as a special case. Let be the state of the -th sample point at time in the search space, and let the set denote the index set of agents interacting with it at time (its neighboring sample points in the interaction network). For notational simplicity, we also set
For a nonempty set and an index , we introduce a weight function which satisfies the following relations:
and the convex combination of :
(1.2)
Before we introduce our governing CBO algorithm, we briefly discuss two key components (heterogeneous external noises and random batch interactions) one by one.
First, a sequence of random matrices is assumed to be i.i.d. with respect to , with finite first two moments:
Second, we discuss the “random batch method” (RBM) of [6, 21] to introduce random batch interactions. For the convergence analysis, we need to clarify how the RBM provides a randomly switching network. At the -th time instant, we randomly choose a partition of into batches (subsets) with sizes at most as follows:
Let be the set of all such partitions for . Then, the choices of batches are independent at each time and follow the uniform law on the finite set of partitions . The random variables yield random batches at time for each realization , and they are also assumed to be independent of the initial data and the noise . Thus, for each time , the index set of neighbors is the element of the partition containing . With the aforementioned key components, we are ready to consider the Cauchy problem for the generalized discrete CBO algorithm under random batch interactions and heterogeneous noises:
(1.3)
where the initial data can be deterministic or random, and the drift rate is a positive constant. In this way, the dynamics (1.3) includes random batch interactions and heterogeneous noises.
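As an illustration, one way to realize such uniformly chosen partitions is to shuffle the indices and cut them into consecutive blocks; the sketch below is written under that assumption, with illustrative names.

```python
import numpy as np

def random_batches(N, B, rng):
    """One RBM step: shuffle the N indices and split them into batches of
    size at most B. A fresh partition is drawn independently at every time
    step (an illustrative construction of the random partitions)."""
    perm = rng.permutation(N)
    return [perm[k:k + B] for k in range(0, N, B)]

# At each time step, a particle interacts only with the members of its batch:
rng = np.random.default_rng(0)
for batch in random_batches(N=10, B=3, rng=rng):
    pass  # update the states of the particles indexed by `batch`
```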
In [18], the consensus analysis and error estimates for (1.1) were carried out; (1.1) is actually the following special case of (1.3): for a positive constant ,
(1.4)
Here both right-hand sides are independent of ; namely, the full batch is used, every agent interacts with all the other agents, and the noise is the same for all particles. We refer to Section 2.1 for details.
In this paper, we are interested in the following two questions.
-
•
(Q1: Emergence of stochastic consensus state) Is there a common consensus state for system (1.3), i.e.,
-
•
(Q2: Optimality of the stochastic consensus state) If so, then is the consensus state an optimal point?
Among the above two posed questions, we will mostly focus on the first question (Q1). In fact, the continuous analogue of (1.3) has been dealt with in the literature. In the first paper on CBO [3], the CBO algorithm without external noise () was studied, and the formation of consensus was proved under suitable assumptions on the network structure. However, optimality was only shown by numerical simulation, even in this deterministic case. In [6], the dynamics (1.3) was investigated via a mean-field limit (). In this case, the particle density distribution, which is the limit of the empirical measure in , satisfies a nonlinear Fokker-Planck equation. By using the analytical structure of the Fokker-Planck equation, the consensus estimate was established. The optimality of the limit value is also estimated by Laplace’s principle, as (see Section 2). Rigorous convergence and error estimates for the continuous and discrete algorithms for the particle system (1.3) with fixed were addressed in the authors’ recent works [17, 18] in the special setting (1.4).
Before we present our main results, we introduce several concepts of stochastic consensus and convergence as follows.
Definition 1.1.
Let be a discrete stochastic process whose dynamics is governed by (1.3). Then, several concepts of stochastic consensus and convergence can be defined as follows.
-
(1)
The random process exhibits “consensus in expectation” or “almost sure consensus”, respectively, if
(1.5) where denotes the standard -norm in .
-
(2)
The random process exhibits “convergence in expectation” or “almost sure convergence”, respectively, if there exists a random vector such that
(1.6)
In addition, we introduce some terminology. For a given set of vectors , we use to denote the matrix whose -th row is , and the -th column vector , as
The main results of this paper are two-fold. First, we present stochastic consensus estimates for (1.5) in the sense of Definition 1.1 (1). For this, we begin with the dynamics of the -th column vector :
for some random matrices and corresponding to two random sources: first, the random batch interactions alone, and second, the interplay between the external white noises and the random batch interactions (see Section 2.1 for details). Then the diameter functional satisfies
(1.7)
where is the ergodicity coefficient of matrix (see Definition 3.1).
For the case (1.4), in which the interactions are full batch () with a constant weight function and the noises are homogeneous, one can show that the ergodicity coefficient is strictly positive for small . Then, the recursive relation (1.7) yields the desired exponential consensus. Moreover, the assumption (1.4) also implies that the dynamics of follows a closed-form stochastic difference equation (see [18]). Then, following the idea of geometric Brownian motion from [17], the decay of can be obtained. However, in our setting (1.3), the aforementioned picture breaks down. In fact, we cannot even guarantee the nonnegativity of . From the definition of the ergodicity coefficient, a positive ergodicity coefficient implies that any two particles are attracted to each other thanks to the network structure, which is not possible when particles are separated by random batches. Hence, the recursive relation (1.7) is not enough to derive an exponential decay of as it stands. On the other hand, to quantify a sufficient mixing effect via the network topology, we use the -transition relation with so that
In this case, the -transition relation satisfies
Again, we use the elementary property of the ergodicity coefficient presented in Lemma 3.3 to find
One of the crucial novelties of this paper lies in the estimation technique for a product of matrix summations. When there is no noise , we can use the analysis of randomly switching topologies [12] to derive the positivity of the ergodicity coefficient. In order to handle heterogeneous external noises, we adopt the concept of the ergodicity coefficient for general -by- matrices with possibly negative entries and treat as a perturbation (see Lemma 4.1 and Lemma 4.2 for details). This yields an exponential decay of in suitable stochastic senses for a sufficiently small , the variance of the noise (see Theorem 4.1 and Theorem 4.2): there exist positive constants , , and a random variable such that
(1.8)
This implies
Second, we deal with the convergence analysis (1.6) of the stochastic flow . One first sees that the dynamics of can be described as
(1.9)
Then, applying the strong law of large numbers as in [18], the almost sure exponential consensus (1.8) for induces the convergence of the first series on the R.H.S. of (1.9). On the other hand, the convergence of the second series will be shown using Doob’s martingale convergence theorem. Thus, for sufficiently small , there exists a random variable such that
We refer to Lemma 5.1 for details. Moreover, the above convergence result can be shown to be exponential in the expectation and almost sure senses. A natural estimate yields
however, the external noises are not uniformly bounded in the almost sure sense. Instead, the noise weighted by a decay in time, , is uniformly bounded for any , as presented in Lemma 5.2. This is a key lemma employed in the derivation of exponential convergence: there exist a positive constant and a random variable such that
(See Theorem 5.1 for details).
The rest of this paper is organized as follows. In Section 2, we briefly present the discrete CBO algorithm [6] and review the consensus and convergence results [18] for the case of homogeneous external noises. In Section 3, we study stochastic consensus estimates in the absence of external noises, so that the randomness in the dynamics is mainly caused by random batch interactions. In Section 4, we consider the stochastic consensus arising from the interplay of two random sources (heterogeneous external noises and random batch interactions) when the standard deviation of the heterogeneous external noise is sufficiently small. In Section 5, we provide a stochastic convergence analysis for the discrete CBO flow based on the consensus results. In Section 6, we provide monotonicity of the maximum of the objective function along the discrete CBO flow, and present several numerical simulations to confirm the analytical results. Finally, Section 7 is devoted to a brief summary of our main results and some remaining issues to be explored in future work.
Gallery of notation: We use superscripts to denote the particle number and the components of a vector; for example, denotes the -th component of the -th particle, i.e.,
We also use simplified notation for summation and maximization:
For a given state vector , we introduce its state diameter :
(1.10)
For , we denote its -norm by .
Lastly, for integers and square matrices of the same size, we define
Note that the ordering in the product is important, because the matrix product is not commutative in general.
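For concreteness, the two notations above may be realized as follows; the choice of the maximum norm in the diameter and the left-multiplication convention in the product are assumptions of this sketch.

```python
import numpy as np

def state_diameter(X):
    """State diameter (1.10): the largest pairwise distance between
    particles (here in the componentwise maximum norm, an assumed choice)."""
    N = len(X)
    return max(np.abs(X[i] - X[j]).max() for i in range(N) for j in range(N))

def ordered_product(mats):
    """Ordered product of equally sized square matrices; later factors are
    multiplied on the left (an assumed convention), since matrix products
    do not commute in general."""
    out = np.eye(mats[0].shape[0])
    for M in mats:
        out = M @ out
    return out
```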
2. Preliminaries
In this section, we briefly present the discrete CBO algorithm (1.3) and then recall convergence and optimization results from [18].
2.1. The CBO algorithms
Let be the state of the -th sample point at time in the search space, and assume that the state dynamics is governed by the CBO algorithm introduced in [6]:
(2.1)
Here, , and denote the drift rate, the noise intensity and the reciprocal of temperature, respectively; is the standard orthonormal basis in , and are i.i.d. standard one-dimensional Brownian motions.
Note the following discrete analogue of Laplace’s principle [10]:
This explains why the CBO algorithm (2.1) is expected to pick up the minimizer of .
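For reference, the continuous form of Laplace's principle (cf. [10]) may be written as follows, assuming the objective function is denoted by L and the sampling measure by ρ (both symbols are assumed notation here); the discrete analogue replaces the integral by a sum over the particles.

```latex
% Laplace's principle (cf. [10]); L and \rho are assumed notation.
% The discrete analogue replaces the integral by a sum over particles.
\lim_{\beta \to \infty} \left( -\frac{1}{\beta}
    \log \int_{\mathbb{R}^d} e^{-\beta L(x)} \, \rho(\mathrm{d}x) \right)
  = \operatorname*{ess\,inf}_{x \in \operatorname{supp} \rho} L(x)
```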
Next, we consider time-discretization of (2.1). For this, we set
Consider the following three discrete algorithms based on [6, 18]: for
where the random variables are i.i.d. standard normal random variables. Model A is precisely the Euler-Maruyama scheme applied to (2.1), and Models B and C are other discretizations of (2.1) introduced in [6]. In [18], it is shown that the three models are special cases (with different choices of and ) of the following generalized scheme:
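For concreteness, a minimal sketch of Model A, the Euler-Maruyama discretization of (2.1), is given below; the componentwise noise form and all names are illustrative assumptions rather than the exact scheme of [6].

```python
import numpy as np

def euler_maruyama_cbo_step(X, L, gamma, sigma, beta, h, rng):
    """One Euler-Maruyama step of length h for the CBO SDE (2.1):
    drift toward the Gibbs-weighted average plus diffusion acting
    componentwise on the relative state (an illustrative form of Model A)."""
    vals = np.array([L(x) for x in X])
    w = np.exp(-beta * (vals - vals.min()))
    w = w / w.sum()
    xbar = w @ X                                    # Gibbs-weighted average
    dW = np.sqrt(h) * rng.standard_normal(X.shape)  # Brownian increments
    return X - gamma * h * (X - xbar) + sigma * (X - xbar) * dW
```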
2.2. Previous results
In this subsection, we briefly summarize the previous results [18] on the convergence and error estimates for the case
(thus the noises are not heterogeneous). In the next theorem, we recall the convergence estimate for the discrete scheme (2.2).
Theorem 2.1.
(Convergence estimate [18]) Suppose system parameters and satisfy
and let be a solution process to (2.2). Then, one has the following assertions:
-
(1)
Consensus in expectation and in the almost sure sense emerges asymptotically:
for and a random variable satisfying
-
(2)
There exists a common random vector such that
3. Discrete CBO algorithm with random batch interactions
In this section, we present a matrix formulation of (1.3), some elementary estimates for the ergodicity coefficient, and stochastic consensus for system (1.3) in the absence of heterogeneous external noises.
3.1. A matrix reformulation
In this subsection, we present a matrix formulation of the following discrete CBO model:
(3.1)
Then, the -th component of (3.1) can be rewritten as
(3.2)
For and , set
(3.3)
Note that denotes the -th column of the matrix . Then system (3.2) can be rewritten in the following matrix form:
The randomness of is due to the random switching of the network topology via random batch interactions. In contrast, takes care of the external noises . In the absence of external noise (), one has for all , and system (3.1) reduces to the stochastic system:
(3.4)
In the next lemma, we study several properties of .
Lemma 3.1.
Let be a matrix defined in (3.1). Then, it satisfies the following two properties:
-
(1)
is row-stochastic:
where is the -th element of the matrix .
-
(2)
The product is row-stochastic:
Proof.
(2) The second assertion holds since the product of two row-stochastic matrices is also row-stochastic.
∎
Note that the asymptotic dynamics of is completely determined by the matrix sequence in (3.4).
3.2. Ergodicity coefficient
In this subsection, we recall the concept of ergodicity coefficient and investigate its basic properties which will be crucially used in later sections.
First, we recall the ergodicity coefficient of a matrix as follows.
Definition 3.1.
[2] For , we define the ergodicity coefficient of as
(3.5)
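A minimal computation of this coefficient is sketched below, assuming the min-sum convention used for flocking models in [11, 12]; the precise form of (3.5) is an assumption here.

```python
import numpy as np

def ergodicity_coefficient(A):
    """Ergodicity coefficient of a square matrix: the minimum over all pairs
    of rows of the summed entrywise minima (assumed convention for (3.5),
    cf. [11, 12]). For a row-stochastic matrix it lies in [0, 1], and it can
    be negative for matrices with negative entries, as used in Section 4."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return min(np.minimum(A[i], A[j]).sum()
               for i in range(n) for j in range(n) if i != j)

# Sanity checks: no mixing vs. perfect mixing.
assert np.isclose(ergodicity_coefficient(np.eye(3)), 0.0)        # disjoint rows
assert np.isclose(ergodicity_coefficient(np.ones((3, 3)) / 3), 1.0)  # identical rows
```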
Remark 3.1.
As a direct application of (3.5), one has the super-additivity and monotonicity for .
Lemma 3.2.
Let be square matrices in . Then, the ergodicity coefficient satisfies the following assertions:
-
(1)
(Super-additivity and homogeneity):
-
(2)
(Monotonicity):
Proof.
All the estimates follow directly from (3.5). Hence, we omit the details. ∎
Next, we recall a key tool that will be crucially used in the consensus estimates for (3.1).
Lemma 3.3.
Proof.
Next, we study elementary lemmas for the emergence of consensus. From Definition 1.1, for consensus, it suffices to show
We apply Lemma 3.3 to (3.4) together with the row-stochasticity of to get
However, note that the ergodicity coefficient may not be strictly positive. For , we iterate (3.4) -times to get
Since the product is also row-stochastic (see Lemma 3.1), we can apply Lemma 3.3 to find
Then, the ergodicity coefficient depends on the network structure between , and the probability of it being strictly positive increases as grows.
In [7], consensus was guaranteed under the assumption:
(3.6)
However, examples of network topologies (which may vary in time, possibly randomly) satisfying the aforementioned a priori assumption (3.6) have not been well studied yet. Below, we show that random batch interactions provide a network topology satisfying the a priori condition (3.6).
First, we introduce a random variable measuring the degree of network connectivity. For a given pair , we consider the set of all time instants within the time zone such that and are in the same batch . More precisely, we introduce
(3.7)
Note that is non-empty if one batch contains both and for some instant in . In the following lemma, we will estimate ergodicity coefficients from random batch interactions.
Lemma 3.4.
Let be the transition matrix in (3.1). Then, for given integers , and ,
(3.8)
Proof.
We derive the estimate (3.8) via two steps:
(3.9)
Step A (Derivation of the first inequality): Recall that
Then, it follows from Lemma 3.2 that
(3.10)
Remark 3.2.
Below, we provide two comments on the result of Lemma 3.4.
-
(1)
Lemma 3.3 says that, if the union of the network structures from time to time forms an all-to-all connected network, then the ergodicity coefficient is positive. The union network need not be all-to-all to make the ergodicity coefficient positive; however, the estimation of the lower bound of could then be more difficult.
-
(2)
The diameter satisfies
3.3. Almost sure consensus
In this subsection, we provide a stochastic consensus result when the stochastic noises are turned off, i.e., . For this, we choose a suitable such that has a strictly positive expectation.
Lemma 3.5.
The following assertions hold.
-
(1)
There exists such that there exist partitions of in , such that for any , one has for some and .
-
(2)
For ,
(3.12)
Proof.
(1) First, the existence of and the partitions is clear, since we may consider all possible partitions of in . In other words, we may choose as .
(2) Note that the random choices of batches are independent and identically distributed. Hence, the expectations of and should be identical:
Next, let be the partitions chosen in (1). Note that the probability of having depends on : for any ,
Therefore, we may proceed to give a rough estimate on :
This gives ∎
Now we are ready to provide a stochastic consensus estimate on the dynamics (3.1) in the absence of noise, .
Theorem 3.1.
Let be a solution process to (1.3) with a.s. for any , and let . Then, for , there exists a nonnegative random variable such that
(3.13)
where the decay exponent satisfies
(3.14)
Proof.
(i) We claim
(3.15)
(First inequality in (3.15)): Since has only nonnegative elements, the values of at products of ’s are nonnegative. Hence,
(3.16)
(Second inequality in (3.15)): By Lemma 3.4, the ergodicity coefficient can be written in terms of :
(3.17)
where we used in the last inequality. By induction on in (3.17),
(3.18)
We combine (3.16) and (3.18) to get the desired estimate (3.15). Finally, set random variable as
It is also worth mentioning again that the transition matrix has only nonnegative elements. In [7], the nonnegative version of Lemma 3.3 was applied using the result in [25]. If one takes into account the randomness , we need to consider Lemma 3.3 and generalize Lemma 3.4 to transition matrices with possibly negative entries. We will analyze this in the next section.
4. Discrete CBO with random batch interactions and heterogeneous external noises: consensus analysis
In this section, we study consensus estimates for the stochastic dynamics (3.1) in the presence of both random batch interactions and heterogeneous external noises. In fact, the material presented in Section 3 corresponds to a special case of this section. Hence, the presentation will be parallel to that of Section 3, and we will focus on the effect of the external noises on stochastic consensus.
4.1. Ergodicity coefficient
As discussed in Section 3, we use the ergodicity coefficient to prove the emergence of stochastic consensus. Recall the governing stochastic system:
(4.1)
As in Lemma 3.4, one needs to compute the ergodicity coefficient of
(4.2)
for integers and .
In what follows, we study preliminary lemmas for the estimation of the ergodicity coefficient, which handle the stochastic part due to external noises as a perturbation. Then, system (4.1) can be viewed as a perturbation of the nonlinear system (3.4) which has been treated in the previous section. For a given square matrix , define a mixed norm:
Lemma 4.1.
For a matrix , the ergodicity coefficient has a lower bound:
Proof.
By direct calculation, one has
∎
The following lemma helps to give a lower bound for the ergodicity coefficient of the product of matrices in (4.2).
Lemma 4.2.
For , let , be matrices in . Then, one has
(4.3)
Proof.
We use induction on . The initial step is trivial. Suppose that the inequality (4.3) holds for . By the subadditivity and submultiplicativity of the matrix norm , we have
This verifies the desired estimate (4.3). ∎
From Lemma 4.1 and Lemma 4.2, we are ready to estimate the ergodicity coefficient of (4.2). For this, we also introduce a new random variable measuring the size of error:
(4.4)
where .
Lemma 4.3.
Proof.
First, we use super-additivity of in Lemma 3.2:
(4.6)
(Estimate of ): This case has already been treated in Lemma 3.4:
(4.7)
(Estimate of ): The term can be regarded as an error term due to the external stochastic noise. We may use Lemma 4.1 to get a lower bound of this term:
(4.8)
By Lemma 4.2, one gets
(4.9)
By the defining relations (3.1), and can be estimated as follows.
(4.10)
Now, we combine (4.8), (4.9) and (4.10) to estimate :
(4.11)
Finally, combining (4.6), (4.7) and (4.11) yields the desired estimate:
∎
4.2. Stochastic consensus
Recall the discrete system for :
By iterating the above relation times, one gets
(4.12)
Lemma 4.4.
Let be a solution process to (1.3) and let and be given integers. Then, for and ,
Proof.
We will follow the same procedure employed in Theorem 3.1, i.e., we first bound using , and then we bound using .
Step A: For , we estimate by using :
(4.13)
where we used the fact that in the last inequality.
Now, we are ready to provide an exponential decay of .
Theorem 4.1.
Proof.
Assume . It follows from Lemma 4.4 that
(4.18)
Since the following random variables
are independent, we can take expectations on both sides of the estimate in (4.18) and then use the inequality to see that
In the sequel, we estimate the terms and one by one.
Case A (Estimate of ): Recall the defining relation in (4.4):
By taking expectation of ,
(4.19)
where we used the fact to get
Combining (4.2) and (4.19) yields
(4.20)
Note that our goal is to estimate in terms of . Then, is actually a function of . To be more specific,
If , then
(4.21)
If , then
(4.22)
Therefore, by combining (4.20), (4.21) and (4.22), one has
where the sequence satisfies
provided that is sufficiently small. Hence
Now by setting
one gets the desired result. ∎
In the next theorem, we derive almost sure consensus of (1.3).
Theorem 4.2.
Proof.
(i) We apply the inequality to Lemma 4.4 to get
(4.23)
As in Theorem 4.1, one may consider
Then, its limiting behavior is :
(4.24)
Hence, one can apply the strong law of large numbers to get
(4.25)
and
(4.26)
Finally, combining (4.23), (4.24), (4.25) and (4.26) gives
where the random process is defined by
and satisfies
Thus, if is sufficiently small,
This implies that, if one chooses a positive constant such that
(4.27)
there exists a stopping time such that
In other words, there exists an almost surely positive and bounded random variable
which satisfies
(4.28)
5. Convergence of discrete CBO flow
In this section, we present the convergence analysis of the CBO algorithm (1.3) using the stochastic consensus estimates in Theorems 4.1 and 4.2. In those theorems, the consensus of (1.3), i.e., the decay of the relative state , is shown to be exponentially fast in expectation and in the almost sure sense. However, this does not guarantee that a solution converges toward an equilibrium of the discrete model (1.3), since it may approach a limit cycle or exhibit a chaotic trajectory even if the relative states decay to zero. The convergence of the states to a common point is an important issue for the purpose of global optimization via the CBO algorithm.
Lemma 5.1.
Proof.
We will use the almost sure consensus of the states () in Theorem 4.2 and arguments similar to those of Theorem 3.1 in [18]. As in (3.1), we may rewrite (1.3) as
(5.1)
where the convergence of for all is equivalent to that of for all . Summing up (5.1) over gives
(5.2)
Note that the convergence of follows if the two vector series on the R.H.S. of (5.2) are both convergent almost surely as .
Case A (Almost sure convergence of ): For the -th standard unit vector in , one has
(5.3)
where we used the fact that . It follows from Theorem 4.2 that the diameter process decays to zero exponentially fast almost surely. Hence, for each , the relation (5.3) implies
This shows that each component of the first series in (5.2) is convergent almost surely.
Case B (Almost sure convergence of ): Note that this series is a martingale since the random vectors are independent and have zero mean: for any -field independent of , we have
where we used the fact that . By Doob’s martingale convergence theorem [14], the martingale converges almost surely if the series is uniformly bounded in . Again, it follows from (5.3) and Theorem 4.2 that
This yields that the second series in (5.2) converges almost surely.
Combining the results from Case A and Case B in (5.2), we conclude that converges to a random variable almost surely. ∎
Lemma 5.2.
Let be an i.i.d. sequence of random variables with . Then, for any given , there exists a random variable such that
Proof.
By Markov’s inequality, for any constant one has
In particular,
Therefore, for any ,
This implies existence of a random variable satisfying
In detail, we may define the random variable as
For example, if satisfies for some but not for , then and so that . ∎
Now, we are ready to present the convergence of in the expectation and almost sure senses.
Theorem 5.1.
Proof.
Let be a positive integer appearing in Theorem 4.1. Then, for , there exists independent of the dimension , such that if , there exist positive constants and such that
(5.4)
Let . Then, summing up (5.1) over gives
As in the proof of Lemma 5.1, for the -th standard unit vector , one has
(5.5)
where the last inequality follows from .
(i) Taking expectations in (5.5) and using (5.4) yields
(5.6)
In (5.6), letting and using Lemma 5.1 give
(ii) In the proof of Theorem 4.2, the constant was chosen in (4.27) so that
Here one may fix and consider small satisfying
Then, from Theorem 4.2 and Lemma 5.2, the estimates of and can be described by
We apply these estimates to (5.5) and obtain
for some positive and bounded random variable , independent of . Now, sending and using Lemma 5.1 give
This yields the desired estimate:
∎
6. Application to optimization problem
In this section, we discuss how the consensus estimates studied in the previous sections can be applied to the optimization problem. Let be an objective function for which we look for a minimizer.
For the weighted representative point (1.2) with a nonempty set , we set
(6.1)
In this setting, the representative point is specified as one agent of the set [6].
Proposition 6.1.
Proof.
For a fixed , let be the index satisfying
Then, it is easy to see
For each , we have
i.e.,
Hence, one has
∎
Remark 6.1.
Proposition 6.1 suggests that from the relation
the monotonicity actually gives
In other words, is a best prediction for the minimizer of among the trajectories of all agents on the time interval . In the particle swarm optimization (PSO) literature [23], such a point is called the ‘global best’, and this observation tells us that the algorithm (1.3) equipped with (6.1) may be considered a simplified model of the PSO algorithm; a sketch of this choice follows.
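A minimal sketch of this argmin-type representative point (names illustrative):

```python
import numpy as np

def argmin_representative(X, L):
    """Representative point (6.1): instead of the Gibbs-weighted average,
    pick the agent with the smallest objective value (the 'global best')."""
    vals = np.array([L(x) for x in X])
    return X[np.argmin(vals)]
```

With this choice, Proposition 6.1 guarantees that the objective value at the representative point is non-increasing along the discrete CBO flow.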
Next, we provide numerical simulations for the CBO algorithms with (6.1). From Proposition 6.1, this algorithm enjoys a monotonicity property related to the objective function, which is useful for locating the minimum of . The CBO algorithms (1.3)–(6.1) will be tested with the Rastrigin function as in [27], which reads
where is the global minimizer of . We set and for all . The Rastrigin function has many local minima near , whose values are very close to the global minimum. The parameters are chosen similarly to [6, 27]:
where the initial positions of the particles are distributed uniformly on the box .
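A sketch of the test objective, assuming the 1/d-normalized form of the Rastrigin function used in [27]; the concrete shift b and offset c are elided in the text and appear here only as illustrative parameters.

```python
import numpy as np

def rastrigin(x, b=0.0, c=0.0):
    """Rastrigin function with global minimizer x = (b, ..., b) and minimum
    value c; the 1/d normalization follows [27] (assumed convention)."""
    y = np.asarray(x, dtype=float) - b
    return float(np.mean(y**2 - 10.0 * np.cos(2.0 * np.pi * y) + 10.0) + c)
```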
Note that the objective function only affects the weighted average in the CBO dynamics (1.3). Theorems 4.1 and 4.2 show that and mainly determine (a lower bound of) the speed of convergence. One useful property of the CBO algorithm is that it always produces a candidate solution for the optimization problem (see Theorem 5.1). However, the candidate may not even be a local minimum. Therefore, following [27], we consider a simulation successful if the candidate is close to the global minimum in the sense that
The stopping criterion is based on the change of positions,
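Putting these pieces together, a minimal driver for one simulation run is sketched below, reusing `random_batches` and `argmin_representative` from the earlier sketches; the initial box, parameter values, and tolerances are illustrative stand-ins for the elided values in the text.

```python
import numpy as np

def run_cbo(L, N=50, d=5, B=10, gamma=0.01, sigma=0.1,
            max_steps=10_000, tol=1e-8, rng=None):
    """Run the discrete CBO flow (1.3) with random batch interactions and
    the argmin representative point (6.1); stop once positions barely move."""
    rng = rng if rng is not None else np.random.default_rng()
    X = rng.uniform(-3.0, 3.0, size=(N, d))   # illustrative initial box
    for _ in range(max_steps):
        X_old = X.copy()
        for batch in random_batches(N, B, rng):
            xbar = argmin_representative(X[batch], L)
            eta = rng.standard_normal((len(batch), d))
            X[batch] = X[batch] - gamma * (X[batch] - xbar) + sigma * (X[batch] - xbar) * eta
        if np.abs(X - X_old).max() < tol:      # stopping criterion: change of positions
            break
    return argmin_representative(X, L)

# A run counts as a success if the candidate lands near the global minimizer,
# e.g. np.abs(run_cbo(rastrigin)).max() < 0.25 (threshold illustrative).
```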


Figure 1 shows the dynamics of the current guess of the minimum value in the CBO algorithms with full batch (same as ) and random batches () for . Since we choose the argument minimum for the weighted average , the best objective value is non-increasing along the optimization steps. This property cannot be observed when we use a weighted average as in (2.1). Of course, for large , (2.1) is close to the argument minimum (6.1). We can also check that the speed of consensus is much slower if , since the best objective value does not change for a long time (for example, from to ).
Success rate | Full batch () | |
---|---|---|---|
d = 2 | 1.000 | 1.000 | 1.000 |
d = 3 | 0.988 | 0.983 | 0.998 |
d = 4 | 0.798 | 0.920 | 0.988 |
d = 5 | 0.712 | 0.658 | 0.931 |
d = 6 | 0.513 | 0.655 | 0.880 |
d = 7 | 0.388 | 0.464 | 0.854 |
d = 8 | 0.264 | 0.389 | 0.832 |
d = 9 | 0.170 | 0.323 | 0.868 |
d = 10 | 0.117 | 0.274 | 0.886 |
Table 1 presents the success rates of finding the minima for different dimensions and algorithms. It clearly shows that the rates get better for smaller , due to the increased randomness. However, the computation with small takes more steps to converge, as shown in Figure 2.



Figure 2 shows a weak point of the random batch interactions by displaying the number of time steps needed to reach the stopping criterion. Note that the computational cost for each time step is similar for different . However, since it slows down the convergence speed, the number of time steps increases as increases.
Figure 3 shows sample trajectories of the CBO particles with full batch and random batches () for when there is no noise (). If there is noise, then the trajectories nearly cover the area near , so that we cannot catch the differences easily. Note that the random interactions make the particles move around the space. In contrast to the random batch interactions, the particles with full batch dynamics move toward the same point, which is the current minimum position among the particles. This may explain why random batch interactions have better searching ability, though they require more computational cost.
7. Conclusion
In this paper, we have provided stochastic consensus and convergence analysis for the discrete CBO algorithm with random batch interactions and heterogeneous external noises. Several versions of the discrete CBO algorithm have been proposed to deal with large-scale global non-convex minimization problems arising from, for example, machine learning problems. Recently, thanks to its simplicity and derivative-free nature, consensus-based optimization has received a lot of attention from the applied mathematics community, yet the full understanding of the dynamic features of the CBO algorithm is still far from complete. Previously, the authors investigated consensus and convergence of the modified CBO algorithm proposed in [6] in the presence of homogeneous external noises and full batch interactions, and proposed a sufficient framework leading to consensus and convergence in terms of the system parameters. In this work, we extend our previous work [18] with two new components (random batch interactions and heterogeneous external noises). To deal with random batch interactions, we recast the CBO algorithm with random batch interactions as a consensus model with a randomly switching network topology, so that we can apply well-developed machinery [11, 12] for the continuous and discrete Cucker-Smale flocking models using the ergodicity coefficient, which measures the connectivity of the network topology (or the mixing of graphs). Our analysis shows that stochastic consensus and convergence emerge exponentially fast as long as the variance of the external noises is sufficiently small. Our consensus estimate yields the monotonicity of an objective function along the discrete CBO flow. One should note that our preliminary result does not yield an error estimate for the optimization problem. Thus, whether the asymptotic limit point of the discrete CBO flow stays close to the optimal point or not will be an interesting problem for future work.
References
- [1] Albi, G., Bellomo, N., Fermo, L., Ha, S.-Y., Pareschi, L., Poyato, D. and Soler, J.: Vehicular traffic, crowds, and swarms. On the kinetic theory approach towards research perspectives. Math. Models Methods Appl. Sci. 29 (2019), 1901-2005.
- [2] Alpin, Yu. A. and Gabassov, N. Z.: A remark on the problem of locating the eigenvalues of real matrices. Izv. Vyssh. Uchebn. Zaved. Mat., 1976, no. 11, 98–100; Soviet Math. (Iz. VUZ), 20:11 (1976), 86–88.
- [3] Askari-Sichani, O. and Jalili, M.: Large-scale global optimization through consensus of opinions over complex networks. Complex Adapt Syst Model 1, 11 (2013).
- [4] Bonabeau, E., Dorigo, M. and Theraulaz, G.: Swarm intelligence: from natural to artificial systems. Oxford University Press, 1999.
- [5] Carrillo, J. A., Choi, Y.-P., Totzeck, C. and Tse, O.: An analytical framework for consensus-based global optimization method. Math. Models Methods Appl. Sci. 21 (2018), 1037–1066.
- [6] Carrillo, J. A., Jin, S., Li, L. and Zhu, Y.: A consensus-based global optimization method for high dimensional machine learning problems. ESAIM: COCV, 27 S5 (2021).
- [7] Chatterjee, S. and Seneta, E.: Towards consensus: some convergence theorems on repeated averaging. Journal of Applied Probability 14 No. 1 (1977), pp. 89-97.
- [8] Chen, J., Jin, S. and Lyu, L.: A consensus-based global optimization method with adaptive momentum estimate. Preprint (2020).
- [9] Degroot, M. H.: Reaching a consensus. Journal of the American Statistical Association 69 No. 345 (1974), pp. 118-121.
- [10] Dembo, A. and Zeitouni, O.: Large deviations techniques and applications. Springer-Verlag, Berlin, Heidelberg, second edition, 1998.
- [11] Dong, J.-G., Ha, S.-Y., Jung, J. and Kim, D.: Emergence of stochastic flocking for the discrete Cucker-Smale model with randomly switching topologies. Commun. Math. Sci. 19 (2021), 205-228.
- [12] Dong, J.-G., Ha, S.-Y., Jung, J. and Kim, D.: On the stochastic flocking of the Cucker-Smale flock with randomly switching topologies. SIAM Journal on Control and Optimization. 58 (2020), 2332-2353.
- [13] Dorigo, M., Birattari, M. and Stutzle, T.: Ant colony optimization. IEEE Computational Intelligence Magazine, 1 (2006), 28-39.
- [14] Durrett, R.: Probability: theory and examples. Second ed., Duxbury Press, 1996.
- [15] Fornasier, M., Huang, H., Pareschi, L. and Sünnen, P.: Consensus-based optimization on hypersurfaces: Well-posedness and mean-field limit. Mathematical Models and Methods in Applied Sciences 30 No. 14, pp. 2725-2751 (2020).
- [16] Fornasier, M., Huang, H., Pareschi, L. and Sünnen, P.: Consensus-based optimization on the sphere II: Convergence to global minimizers and machine learning. Preprint (2020).
- [17] Ha, S.-Y., Jin, S. and Kim, D.: Convergence of a first-order consensus-based global optimization algorithm. Math. Models Meth. Appl. Sci. 30 (12), 2417-2444 (2020).
- [18] Ha, S.-Y., Jin, S. and Kim, D.: Convergence and error estimates for time-discrete consensus-based optimization algorithms. Numer. Math. 147, 255–282 (2021).
- [19] Ha, S.-Y., Kang, M., Kim, D., Kim, J. and Yang, I.: Stochastic consensus dynamics for nonconvex optimization on the Stiefel manifold: mean-field limit and convergence. Submitted (2021).
- [20] Holland, J. H.: Genetic algorithms. Scientific American 267 (1992), 66-73.
- [21] Jin, S., Li, L. and Liu, J. G.: Random Batch Methods (RBM) for interacting particle systems. Journal of Computational Physics, 400 (2020), 108877.
- [22] Kennedy, J. (2006). Swarm intelligence. In Handbook of nature-inspired and innovative computing (pp. 187-219). Springer, Boston, MA.
- [23] Kennedy, J. and Eberhart, R. (1995). Particle swarm optimization. In Proceedings of ICNN'95 - International Conference on Neural Networks (Vol. 4, pp. 1942-1948). IEEE.
- [24] Kim, J., Kang, M., Kim, D., Ha, S.-Y. and Yang, I.: A stochastic consensus method for nonconvex optimization on the Stiefel Manifold. 2020 59th IEEE Conference on Decision and Control, 2020, pp. 1050-1057.
- [25] Markov, A. A. (1906). Extension of the law of large numbers to dependent quantities [in Russian]. Izv. Fiz.-Matem. Obshch. Kazan Univ., (2nd ser.) 15, 135-156.
- [26] Mirjalili, S., Mirjalili, S. M. and Lewis, A.: Grey wolf optimizer. Advances in engineering software, 69 (2014), 46-61.
- [27] Pinnau, R., Totzeck, C., Tse, O. and Martin, S.: A consensus-based model for global optimization and its mean-field limit. Mathematical Models and Methods in Applied Sciences, 27 (2017), 183-204.
- [28] Totzeck, C., Pinnau, R., Blauth, S. and Schotthöfer, S.: A numerical comparison of consensus-based global optimization to other particle-based global optimization schemes. Proceedings in Applied Mathematics and Mechanics, 18, 2018.
- [29] Yang, X.-S.: Nature-inspired metaheuristic algorithms. Luniver Press, 2010.
- [30] Yang, X.-S., Deb, S., Zhao, Y.-X., Fong, S. and He, X.: Swarm intelligence: past, present and future. Soft Comput 22 (2018), 5923-5933.