This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Analysis of Evolutionary Algorithms on Fitness Function with Time-linkage Property

Weijie Zheng, Huanhuan Chen, and Xin Yao This work was supported by Guangdong Basic and Applied Basic Research Foundation (Grant No. 2019A1515110177), Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Enterpreneurial Teams (Grant No. 2017ZT07X386), Shenzhen Science and Technology Program (Grant No. KQTD2016112514355531), the Program for University Key Laboratory of Guangdong Province (Grant No. 2017KSYS008), National Natural Science Foundation of China (Grant No. 61976111), and Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. JCYJ20180504165652917). (Corresponding author: Xin Yao.)Weijie Zheng is with Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Research Institute of Trustworthy Autonomous Systems (RITAS), Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China. He is also with School of Computer Science and Technology, University of Science and Technology of China, Hefei, China.Huanhuan Chen is with School of Computer Science and Technology, University of Science and Technology of China, Hefei, China.Xin Yao is with Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Research Institute of Trustworthy Autonomous Systems (RITAS), Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China. He is also with CERCIA, School of Computer Science, University of Birmingham, Birmingham, United Kingdom.
Abstract

In real-world applications, many optimization problems have the time-linkage property, that is, the objective function value relies on the current solution as well as the historical solutions. Although the rigorous theoretical analysis on evolutionary algorithms has rapidly developed in recent two decades, it remains an open problem to theoretically understand the behaviors of evolutionary algorithms on time-linkage problems. This paper takes the first step to rigorously analyze evolutionary algorithms for time-linkage functions. Based on the basic OneMax function, we propose a time-linkage function where the first bit value of the last time step is integrated but has a different preference from the current first bit. We prove that with probability 1o(1)1-o(1), randomized local search and (1+1)(1+1) EA cannot find the optimum, and with probability 1o(1)1-o(1), (μ+1)(\mu+1) EA is able to reach the optimum.

Index Terms:
Evolutionary algorithms, time-linkage, convergence, running time analysis.

I Introduction

Evolutionary Algorithms (EAs), one category of stochastic optimization algorithms that are inspired by Darwinian principle and natural selection, have been widely utilized in real-world applications. Although EAs are simple and efficient to use, the theoretical understandings on the working principles and complexity of EAs are much more complicated and far behind the practical usage due to the difficulty of mathematical analysis caused by their stochastic and iterative process.

In order to fundamentally understand EAs and ultimately design efficient algorithms in practice, researchers begin the rigorous analysis on functions with simple and clear structure, majorly like pseudo-Boolean function and classic combinatorial optimization problem, like in the theory books [1, 2, 3, 4, 5]. Despite the increasing attention and insightful theoretical analyses in recent decades, there remain many important open areas that haven’t been considered in the evolutionary theory community.

One important open issue is about the time-linkage problem. Time-linkage problem, firstly introduced by Bosman [6] into the evolutionary computation community, is the optimization problem where the objective function to be optimized relies not only on the solutions of the current time but also the historical ones. In other words, the current decisions also influence the future. There are plenty of applications with time-linkage property. We just list the dynamic vehicle routing with time-varying locations to visit [6] as a slightly detailed example. Suppose that the locations are clustered. Then the current vehicle serving some locations in one cluster is more efficient to serve other locations in the same cluster instead of serving the currently available locations when the locations oscillate among different clusters in future times. Besides, the efficiency of the current vehicle routing would influence the quality of the service, which further influences the future orders, that is, future locations to visit. In a word, the current routing and the impact of it in the future together determine the income of this company. The readers could also refer to the survey in [7] to see more than 3030 real-world applications, like an optimal watering scheduling to improve the quality of the crops along with the weather change [8].

The time-linkage optimization problems can be tackled offline or online according to different situations. If the problem pursues an overall solution with sufficient time budget and time-linkage dynamics can be integrated into a static objective function, then the problem can be solved offline. However, in the theoretical understanding on the static problem [1, 2, 3, 4, 5], no static benchmark function in the evolutionary theory community is time-linkage.

Another situation that the real-world applications often encounter is that the solution must be solved online as the time goes by. This time-linkage online problem belongs to dynamic optimization problem [7]. As pointed out in [7], the whole evolutionary community, not only the evolutionary theory community, is lack of research on these real-world problems. The dynamic problem analyzed so far in the theory community majorly includes Dynamic OneMax [9], Magnitude and Balance [10], Maze [11], Bi-stable problem [12], dynamic linear function [13], and dynamic BinVal function [14] for dynamic pseudo-Boolean function, and dynamic combinatorial problems including single-destination shortest path problem [15], makespan scheduling [16], vertex cover problem [17], subset selection [18], graph coloring [19], etc. However, there is no theoretical analysis on dynamic time-linkage fitness function, even no dynamic time-linkage pseudo-Boolean function is proposed for the theoretical analysis.

The main contributions of this paper can be summarized as follows. This paper conducts the first step towards the understanding of EAs on the time-linkage function. When solving a time-linkage problem by EAs in an offline mode, the first thing faced by the practitioners to utilize EAs is how they encode the solution. There are obviously two straightforward encoding ways. Take the objective function relying on solutions of two time steps as an example. One way is to merely ignore the time-linkage dependency by solving a non-time-linkage function with double problem size. The other way is to consider the time-linkage dependency, encode the solution with the original problem size, but store the solutions generated in the previous time steps for the fitness evaluation. When solving the time-linkage problem in an online mode, the engineers need to know before they conduct the experiments whether the algorithm they use can solve the problem or not. Hence, in this paper, we design a time-linkage toy function based on OneMax to shed some light on these questions. This function, called OneMax(0,1n){}_{(0,1^{n})} where nn is the dimension size, is the sum of two components, one is OneMax fitness of the current nn-dimensional solution, the other one is the value of the first dimension in the previous solution but multiplying minus dimension size. The design of this function considers the situation when the current solution prefers a different value from the previous solution, which could better show the influence of different encodings. Also, it could be the core element of some dynamic time-linkage functions and used in the situation that each time step we only optimize the current state of the online problem in a limited time, so that the analysis of this function could also show some insights to the undiscovered theory for the dynamic time-linkage function.

For our results, this paper analyzes the theoretical behaviors of randomized local search (RLS) and two most common benchmark EAs, (1+1)(1+1) EA and (μ+1)(\mu+1) EA, on OneMax(0,1n){}_{(0,1^{n})}. We will show that with probability 1o(1)1-o(1), RLS and (1+1)(1+1) EA cannot find the optimum of OneMax(0,1n){}_{(0,1^{n})} (Theorem 7) while the not small population size in (μ+1)(\mu+1) EA can help it reach the optimum with probability 1o(1)1-o(1) (Theorem 15). We also show that conditional on an event with probability 1o(1)1-o(1), the expected runtime for (μ+1)(\mu+1) EA is O(nμ)O(n\mu) (Theorem 16).

The remainder of this paper is organized as following. In Section II, we introduce the motivation and details about the designed OneMax(0,1n){}_{(0,1^{n})}. Section III shows the theoretical results of RLS and (1+1)(1+1) EA on OneMax(0,1n){}_{(0,1^{n})}, and our theoretical results of (μ+1)(\mu+1) EA are shown in Section IV. Our conclusion is summarized in Section V.

II OneMax(0,1n){}_{(0,1^{n})} Function

II-A OneMax(0,1n){}_{(0,1^{n})} Function

For the first time-linkage problem for theoretical analysis, we expect the function to be simple and with clear structure. OneMax, which counts the total ones in a bit string, is considered as one of the simplest pseudo-Boolean functions, and is a well-understood benchmark in the evolutionary theory community on static problems. Choosing it as a base function to add the time-linkage property could facilitate the theoretical understanding on the time-linkage property. Hence, the time-linkage function we will discuss in this paper is based on OneMax. In OneMax function, each dimension has the same importance and the same preference for having a dimension value 11. We would like to show the difference, or more aggressively show the difficulty that the time-linkage property will cause, which could better help us understand the behavior of EAs on time-linkage problems. Therefore, we will introduce the solutions of the previous steps but with different importance and preference. For simplicity of analysis, we only introduce one dimension, let’s say the first dimension, value of the last time step into the objective function but with the weight of n-n, where nn is the dimension size. Other weights could also be interesting, but for the first time-linkage benchmark, we just take n-n to show the possible difficulty caused by the time-linkage property. More precisely, this function f:{0,1}×{0,1}nf:\{0,1\}\times\{0,1\}^{n}\rightarrow\mathbb{Z} is defined by

f(xt1,xt)=i=1nxitnx1t1\displaystyle f(x^{{t-1}},x^{{t}})=\sum_{i=1}^{n}x_{i}^{{t}}-nx_{1}^{{t-1}} (1)

for two consecutive xt1=(x1t1,,xnt1)x^{{t-1}}=(x_{1}^{{t-1}},\dots,x_{n}^{{t-1}}) and xt=(x1t,,xnt){0,1}nx^{{t}}=(x_{1}^{{t}},\dots,x_{n}^{{t}})\in\{0,1\}^{n}. Clearly, (1) consists of two components, OneMax component relying on the current individual, and the drawing-back component determined by the first bit value of the previous individual. If our goal is to maximize (1), it is not difficult to see that the optimum is unique and the maximum value nn is reached if and only if (x1t1,xt)=(0,1n)(x_{1}^{{t-1}},x^{{t}})=(0,1^{n}). Hence, we integrate (0,1n)(0,1^{n}) and call (1) OneMax(0,1n){}_{(0,1^{n})} function.

The maximization of the proposed OneMax(0,1n){}_{(0,1^{n})} function specializes the maximization of the more general time-linkage pseudo-Boolean problems h:{0,1}n××{0,1}n{h:\{0,1\}^{n}\times\dots\times\{0,1\}^{n}\rightarrow\mathbb{R}} defined by

h(xt0,,xt0+)=t=0ht(xt0+t;xt0,,xt0+t1)\displaystyle h(x^{{t_{0}}},\dots,x^{{t_{0}+\ell}})=\sum_{t=0}^{\ell}h_{t}(x^{{t_{0}+t}};x^{{t_{0}}},\dots,x^{{t_{0}+t-1}}) (2)

for consecutive xt0,xt0+1,,xt0+x^{{t_{0}}},x^{{t_{0}+1}},\dots,x^{{t_{0}+\ell}} where \ell\in\mathbb{N} and could be infinite. OneMax(0,1n){}_{(0,1^{n})} function could be regarded as a specialization with =1\ell=1, h0(xt0)=nx1t0h_{0}(x^{{t_{0}}})=-nx_{1}^{{t_{0}}}, and h1(xt0+1;xt0)=i=1nxit0+1h_{1}(x^{{t_{0}+1}};x^{{t_{0}}})=\sum_{i=1}^{n}x_{i}^{{t_{0}+1}}. We acknowledge that more complicated models are more interesting and practical, like with >1\ell>1, with more than one bit value and with other weight values for the historical solutions, etc., but current specialization facilitates the establishment of the first theoretical analysis for the time-linkage problems, and our future work will focus on the analyses of more complicated models.

II-B Some Notes

Time-linkage optimization problem can be solved offline or online due to different situations. In the following, we follow the main terminology from [6], the first paper that used the term “time-linkage” and introduced it into the evolutionary computation community.

II-B1 Offline mode

Solving the general time-linkage problem hh defined above in an offline mode means that we could evaluate all possible (xt0,xt0+1,,xt0+)(x^{{t_{0}}},x^{{t_{0}+1}},\dots,x^{{t_{0}+\ell}}) before determine the final solution for the problem hh. In this case, the optimum is defined differently when we use different representations. Obviously, there are two straightforward kinds of representations. One is ignoring the time-linkage fact and encoding (xt0,xt0+1,,xt0+)(x^{{t_{0}}},x^{{t_{0}+1}},\dots,x^{{t_{0}+\ell}}) into an n(+1)n(\ell+1)-bit string as one solution, and the optimum is a search point with n(+1)n(\ell+1) dimensions. We denote this optimum as XX^{*}. Since the problem is transferred to a traditional non-time-linkage problem, it is not of interest of our topic.

The other kind of representation is considering the time-linkage property and encoding (xt,xt+1,,xt0+),t>t0(x^{{t^{\prime}}},x^{{t^{\prime}+1}},\dots,x^{{t_{0}+\ell}}),{t^{\prime}>t_{0}}, into an mm-bit string, m{n,2n,,n}m\in{\{n,2n,\dots,\ell n\}}, and storing other historical solutions for objective function evaluation. In this case, the optimum is a search point with mm dimensions taking the same values as the last mm bit values of XX^{*}, the optimum in the first kind of representation, condition on the stored solutions taking the same values as the corresponding bit values of XX^{*}. For the considered OneMax(0,1n){}_{(0,1^{n})} function, we encode an nn-bit string as one solution and store the previous result for objective function evaluation, and the optimum is the current search point with all 1s condition on that the stored previous first bit has value 0. This representation is more interesting to us since now hh and OneMax(0,1n){}_{(0,1^{n})} function are truly time-linkage functions and we could figure out how EAs react to the time-linkage property. Hence, the later sections only consider this representation when solving the OneMax(0,1n){}_{(0,1^{n})} function in an offline mode is analyzed.

II-B2 Online mode

As in [6, Section 2], the online mode means that we regard it as a dynamic optimization problem, and that the process continues only when the decision on the current solution is made, that is, solutions cannot be evaluated for any time t>tnowt>t^{now} and we can only evaluate the quality of the historical and the present solutions. More precisely, at present time tnowt^{now}, we can evaluate

h~(xt0,,xtnow)=t=0tnowt0ht~(xt0+t;xt0,,xt0+t1)\displaystyle\tilde{h}(x^{{t_{0}}},\dots,x^{t^{now}})=\sum_{t=0}^{t^{now}-{t_{0}}}\tilde{h_{t}}(x^{{t_{0}+t}};x^{{t_{0}}},\dots,x^{{t_{0}+t-1}})

where we note that ht~\tilde{h_{t}} could dynamically change and could be different from ht,t=0,,tnowh_{t},t=0,\dots,t^{now} in (2) when the process ends at t=t0+t={t_{0}+\ell}, since the impact of the historical solutions could change when time goes by. Usually, for the online mode, the overall optimum within the whole time period as in the offline mode cannot be reached once some non-optimal solution is made in one time step. Hence, our goal is to obtain the function value as larger as possible before the end of the time period, or more specifically, to obtain some function value above one certain threshold. We notice the similarity to the discounted total reward in the reinforcement learning [20, Chapter 3]. However, as pointed in [6, Section 1], the online dynamic time-linkage problem is fundamentally different since in each time the decisions themselves are needed to be optimized and can only be made once, while in reinforcement learning the policies are the solutions and the decisions during the process serve to help determine a good policy.

Back to the OneMax(0,1n){}_{(0,1^{n})}, the function itself is not suitable to be solved in an online mode since it only contains the previous time step and the current step. However, we could still relate OneMax(0,1n){}_{(0,1^{n})} function to online dynamic optimization problems, via regarding it as one piece of the objective function that considers the overall results during a given time period and each time step we only optimize the current piece. For example, we consider the following dynamic problem

h(x,t)=maxxτ=2tet+τ1x1τ2nx1t1+i=1nxit\displaystyle h(x,{t})=\max_{x}\sum_{{\tau=2}}^{{t}}e^{{-t+\tau-1}}x^{{\tau-2}}_{1}-nx^{{t-1}}_{1}+\sum_{i=1}^{n}x^{{t}}_{i} (3)

where x={x2,,xt}{x=\{x^{2},\dots,x^{{t}}\}}, xτ=(x1τ,,xnτ){0,1}n{x^{{\tau}}=(x_{1}^{{\tau}},\dots,x_{n}^{{\tau}})\in\{0,1\}^{n}} for τ=0,1,,t{{\tau=0,1,\dots,t}} and the initial x0x^{0} and x1x^{1} are given. For (3), our goal is to find the solution at some time step t{t} when its function value is greater than n1n-1. Since the previous elements in 1,,t21,\dots,{t-2} time steps can contribute at most τ=2tet+τ11/(e1)\sum_{{\tau=2}}^{{t}}e^{{-t+\tau-1}}\leq 1/(e-1) value, the goal can be transferred to find the time step when the component of the current and the last step, that is, OneMax(0,1n){}_{(0,1^{n})}, has the value of nn. Thus if we take the strategy for online optimization that we optimize the present each time as discussed in [6, Section 3], that is, for the current time tcur{t_{cur}}, we optimize h(x,tcur)h(x,{t_{cur}}) with knowing x0,,xtcur1x^{0},\dots,x^{{t_{cur}-1}}, then the problem can be functionally regarded as maximizing OneMax(0,1n){}_{(0,1^{n})} function with nn-bit string encoding as time goes by. Hence, we could reuse the optimum of OneMax(0,1n){}_{(0,1^{n})} function in the offline mode as our goal for solving (3) in an online mode, and call it “optimum” for this online mode with no confusion.

In summary, considering the OneMax(0,1n){}_{(0,1^{n})} problem, we note that for the representation encoding nn-bit string in an offline manner and for optimizing present in an online dynamic manner, the algorithm used for these two situations are the same but with different backgrounds and descriptions of the operators. The details will be discussed when they are mentioned in Sections III and IV.

III RLS and (1+1)(1+1) EA Cannot Find the Optimum

III-A RLS and (1+1)(1+1) EA Utilized for OneMax(0,1n){}_{(0,1^{n})}

(1+1)(1+1) EA is the simplest EA that is frequently analyzed as a benchmark algorithm in the evolutionary theory community, and randomized local search (RLS) can be regarded as the simplification of (1+1)(1+1) EA and thus a pre-step towards the theoretical understanding of (1+1)(1+1) EA. Both algorithms are only with one individual in their population. Their difference is on the mutation. In each generation, (1+1)(1+1) EA employs the bit-wise mutation on the individual, that is, each bit is independently flipped with probability 1/n1/n, where nn is the problem size, while RLS employs the one-bit mutation, that is, only one bit among nn bits is uniformly and randomly chosen to be flipped. For both algorithms, the generated offspring will replace its parent as long as it has at least the same fitness as its parent.

The general RLS and (1+1)(1+1) EA are utilized for non-time-linkage function, and they do not consider how we choose the individual representation and do not consider the requirement to make a decision in a short time. We need some small modifications on RLS and (1+1)(1+1) EA to handle the time-linkage OneMax(0,1n){}_{(0,1^{n})} function. The first issue, the representation choice, only happens when the problem is solved in an offline mode. As mentioned in Section II-B, for the two representation options, we only consider the one that encodes the current solution and stores the previous solution for fitness evaluation. In this encoding, we set that only offspring with better or equivalent fitness could affect the further fitness, hoping the optimization process to learn or approach to the situations which are suitable for the time-linkage property. Algorithm 1 shows our modified (1+1)(1+1) EA and RLS for solving OneMax(0,1n){}_{(0,1^{n})}, and we shall still use the name (1+1)(1+1) EA and RLS in this paper with no confusion. In this case, the optimum of OneMax(0,1n){}_{(0,1^{n})} is the 1n1^{n} as the current solution with the stored first bit value of the last generation being 0. Practically, some termination criterion is utilized in the algorithms when the practical requirement is met. Since we aim at theoretically analyzing the time to reach the optimum, we do not set the termination criterion here.

1:Generate the random initial two generations X0=(X10,,Xn0)X^{0}=(X_{1}^{0},\dots,X_{n}^{0}) and X1=(X11,,Xn1)X^{1}=(X_{1}^{1},\dots,X_{n}^{1})
2:for g=1,2,g=1,2,\dots do
3:%%\%\% Mutation
4:   For (1+1)(1+1) EA, generate X~g\tilde{X}^{g} via independently flipping each bit value of XgX^{g} with probability 1/n1/n;
5: For RLS, generate X~g\tilde{X}^{g} via flipping one uniformly and randomly selected bit value of XgX^{g}
6:%%\%\% Selection
7:   if f(Xg1,Xg)>f(Xg,X~g)f(X^{g-1},X^{g})>f(X^{g},\tilde{X}^{g}) then
8:      (Xg,Xg+1)=(Xg1,Xg)(X^{g},X^{g+1})=(X^{g-1},X^{g})
9:   else
10:      (Xg,Xg+1)=(Xg,X~g)(X^{g},X^{g+1})=(X^{g},\tilde{X}^{g})
11:   end if
12:end for
Algorithm 1 (1+1)(1+1) EA/RLS to maximize fitness function ff requiring two consecutive time steps

The second issue, the requirement to make a decision in a short time, happens when the problem is solved in an online mode. Detailedly, consider the problem (3) we discussed in Section II-B that in each time step we just optimize the present. If the time to make the decision is not so small that (1+1)(1+1) EA or RLS can solve the nn-dimension problem (OneMax function), then we could obtain Xt=(1,,1)X^{t}=(1,\dots,1) in each time step tt. Obviously, in this case, the sequence of {X1,,Xt}\{X^{1},\dots,X^{t}\} we obtained will lead to a fitness less than 11 for any time step tt, and thus we cannot achieve our goal and this case is not interesting. Hence, we assume the time to make the decision is small so that we cannot solve nn-dimensional OneMax function, and we can just expect to find some result with better or equivalent fitness value each time step. That is, we utilize (1+1)(1+1) EA or RLS to solve OneMax(0,1n){}_{(0,1^{n})} function, and the evolution process can go on only if some offspring with better or equivalent fitness appears. In this case, we can reuse Algorithm 1, but to note that the generation step gg need not to be the same as the time step tt of the fitness function since (1+1)(1+1) EA or RLS may need more than one generation to obtain an offspring with better or equivalent fitness for one time step.

In a word, no matter utilizing (1+1)(1+1) EA and RLS to solve OneMax(0,1n){}_{(0,1^{n})} offline or online, in the theoretical analysis, we only consider Algorithm 1 and the optimum is the current search point with all 1s condition on that the stored previous first bit has value 0, without mentioning the solving mode and regardless of the explanations of the different backgrounds.

III-B Convergence Analysis of RLS and (1+1)(1+1) EA on OneMax(0,1n){}_{(0,1^{n})}

This subsection will show that with high probability RLS and (1+1)(1+1) EA cannot find the optimum of OneMax(0,1n){}_{(0,1^{n})}. Obviously, OneMax(0,1n){}_{(0,1^{n})} has two goals to achieve, one is to find all 1s in the current string, and the other is to find the optimal pattern (0,1)(0,1) in the first bit that the current first bit value goes to 1 when the previous first bit value is 0. The two goals are somehow contradictory, so that only one individual in the population of RLS and (1+1)(1+1) EA will cause poor fault tolerance. Detailedly, as we will show in Theorem 7, with high probability, one of twos goal will be achieved before the optimum is found, but the population cannot be further improved.

Since this paper frequently utilizes different variants of Chernoff bounds, to make it self-contained, we put them from [21, 22] into Lemma 1. Besides, before establishing our main result, with the hope that it might be beneficial for further research, we also discuss in Lemma 2 the probability estimate of the event that the increase number of ones from one parent individual to its offspring is 11 under the condition that the increase number of ones is positive in one iteration of (1+1)(1+1) EA, given the parent individual has aa zeros.

Lemma 1 ([21, 22]).

Let ξ1,ξ2,,ξm\xi_{1},\xi_{2},\dots,\xi_{m} be independent random variables. Let Ξ=i=1mξi\Xi=\sum_{i=1}^{m}\xi_{i}.

  • (a)

    If ξi\xi_{i} takes values in [0,1][0,1], then for all δ[0,1]\delta\in[0,1], Pr[Ξ(1δ)E[Ξ]]exp(δ2E[Ξ]/2){\Pr[\Xi\leq(1-\delta)E[\Xi]]\leq\exp(-\delta^{2}E[\Xi]/2)}.

  • (b)

    If ξi\xi_{i} takes values in an interval of length cic_{i}, then for all λ0\lambda\geq 0, Pr[ΞE[Ξ]+λ]exp(2λ2/i=1mci).\Pr[\Xi\geq E[\Xi]+\lambda]\leq\exp(-2\lambda^{2}/\sum_{i=1}^{m}c_{i}).

  • (c)

    If ξi\xi_{i} follows the geometric distribution with success probability pp for all i[1..m]i\in[1..m], then for all δ0\delta\geq 0, Pr[Ξ(1+δ)E[Ξ]]exp(δ2(m1)2(1+δ));\Pr[\Xi\geq(1+\delta)E[\Xi]]\leq\exp\left(-\frac{\delta^{2}(m-1)}{2(1+\delta)}\right); for all δ[0,1]\delta\in[0,1], Pr[Ξ(1δ)E[Ξ]]exp(δ2m243δ).\Pr[\Xi\leq(1-\delta)E[\Xi]]\leq\exp\left(-\frac{\delta^{2}m}{2-\frac{4}{3}\delta}\right).

Lemma 2.

Suppose X{0,1}nX\in\{0,1\}^{n}. Y{0,1}nY\in\{0,1\}^{n} is generated by independently flipping each bit of XX with probability 1n\frac{1}{n}. Let a[1..n]a\in[1..n] be the number of zeros in XX, and let |X||X| denote the number of ones in XX and |Y||Y| for the number of ones in YY. Then

Pr[|Y|\displaystyle\Pr[|Y| |X|=1|Y|>|X|]\displaystyle-|X|=1\mid|Y|>|X|]
=\displaystyle={} i=1a(ai)(nai1)1n2i1(11n)n2i+1i=1aj=0i1(ai)(naj)1ni+j(11n)nij>1ean.\displaystyle\frac{\sum_{i=1}^{a}\binom{a}{i}\binom{n-a}{i-1}\frac{1}{n^{2i-1}}(1-\frac{1}{n})^{n-2i+1}}{\sum_{i=1}^{a}\sum_{j=0}^{i-1}\binom{a}{i}\binom{n-a}{j}\frac{1}{n^{i+j}}(1-\frac{1}{n})^{n-i-j}}>1-\frac{ea}{n}.
Proof.

Since

Pr\displaystyle\Pr [|Y|>|X|]=i=1aj=0i1(ai)(naj)1ni+j(11n)nij\displaystyle[|Y|>|X|]=\sum_{i=1}^{a}\sum_{j=0}^{i-1}\binom{a}{i}\binom{n-a}{j}\frac{1}{n^{i+j}}\left(1-\frac{1}{n}\right)^{n-i-j}
Pr\displaystyle\Pr [|Y||X|=1]=i=1a(ai)(nai1)1n2i1(11n)n2i+1,\displaystyle[|Y|-|X|=1]=\sum_{i=1}^{a}\binom{a}{i}\binom{n-a}{i-1}\frac{1}{n^{2i-1}}\left(1-\frac{1}{n}\right)^{n-2i+1},

and Pr[(|Y||X|=1)(|Y|>|X|)]=Pr[|Y||X|=1]{\Pr[\left(|Y|-|X|=1\right)\cap\left(|Y|>|X|\right)]=\Pr[|Y|-|X|=1]}, we have

Pr[|Y|\displaystyle\Pr[|Y|{} |X|=1|Y|>|X|]\displaystyle-|X|=1\mid|Y|>|X|]
=\displaystyle={} Pr[(|Y||X|=1)(|Y|>|X|)]Pr[|Y|>|X|]\displaystyle\frac{\Pr[(|Y|-|X|=1)\cap(|Y|>|X|)]}{\Pr[|Y|>|X|]}
=\displaystyle={} i=1a(ai)(nai1)1n2i1(11n)n2i+1i=1aj=0i1(ai)(naj)1ni+j(11n)nij.\displaystyle\frac{\sum_{i=1}^{a}\binom{a}{i}\binom{n-a}{i-1}\frac{1}{n^{2i-1}}(1-\frac{1}{n})^{n-2i+1}}{\sum_{i=1}^{a}\sum_{j=0}^{i-1}\binom{a}{i}\binom{n-a}{j}\frac{1}{n^{i+j}}(1-\frac{1}{n})^{n-i-j}}.

For a=1a=1, it is easy to see that

Pr[|Y||X|=1|Y|>|X|]=1>1en.\displaystyle\Pr[|Y|-|X|=1\mid|Y|>|X|]=1>1-\frac{e}{n}.

For a2a\geq 2, since

i=2a\displaystyle\sum_{i=2}^{a} j=0i2(ai)(naj)1ni+j(11n)nij\displaystyle\sum_{j=0}^{i-2}\binom{a}{i}\binom{n-a}{j}\frac{1}{n^{i+j}}\left(1-\frac{1}{n}\right)^{n-i-j}
=\displaystyle={} i=2a(ai)1ni(11n)nij=0i2(naj)1nj(11n)j\displaystyle\sum_{i=2}^{a}\binom{a}{i}\frac{1}{n^{i}}{\left(1-\frac{1}{n}\right)^{n-i}}\sum_{j=0}^{i-2}\binom{n-a}{j}\frac{1}{n^{j}}{\left(1-\frac{1}{n}\right)^{-j}}
\displaystyle\leq{} i=2a(ai)1ni(11n)nij=0i2(na)jj!1(n1)j\displaystyle\sum_{i=2}^{a}\binom{a}{i}\frac{1}{n^{i}}{\left(1-\frac{1}{n}\right)^{n-i}}\sum_{j=0}^{i-2}\frac{(n-a)^{j}}{j!}\frac{1}{(n-1)^{j}}
\displaystyle\leq{} i=2a(ai)1ni(11n)ni(i1)\displaystyle\sum_{i=2}^{a}\binom{a}{i}\frac{1}{n^{i}}{\left(1-\frac{1}{n}\right)^{n-i}}(i-1)
=\displaystyle={} i=2aa!i!(ai)!1ni(11n)ni(i1)\displaystyle\sum_{i=2}^{a}\frac{a!}{i!(a-i)!}\frac{1}{n^{i}}{\left(1-\frac{1}{n}\right)^{n-i}}(i-1)
<\displaystyle<{} i=2aa!(i1)!(ai)!1ni(11n)ni(i1)\displaystyle\sum_{i=2}^{a}\frac{a!}{(i-1)!(a-i)!}\frac{1}{n^{i}}{\left(1-\frac{1}{n}\right)^{n-i}}(i-1)
=\displaystyle={} i=2aa!(i2)!(ai)!1ni(11n)ni\displaystyle\sum_{i=2}^{a}\frac{a!}{(i-2)!(a-i)!}\frac{1}{n^{i}}{\left(1-\frac{1}{n}\right)^{n-i}}
=\displaystyle={} a(a1)n2i=2a(a2i2)1ni2(11n)ni\displaystyle\frac{a(a-1)}{n^{2}}\sum_{i=2}^{a}\binom{a-2}{i-2}\frac{1}{n^{i-2}}{\left(1-\frac{1}{n}\right)^{n-i}}
\displaystyle\leq{} a(a1)n2i=2a(a2i2)1ni2(11n)ai\displaystyle\frac{a(a-1)}{n^{2}}\sum_{i=2}^{a}\binom{a-2}{i-2}\frac{1}{n^{i-2}}{\left(1-\frac{1}{n}\right)^{a-i}}
=\displaystyle={} a(a1)n2(1n+11n)a2=a(a1)n2\displaystyle\frac{a(a-1)}{n^{2}}{\left(\frac{1}{n}+1-\frac{1}{n}\right)^{a-2}}=\frac{a(a-1)}{n^{2}}

and

i=1aj=0i1\displaystyle\sum_{i=1}^{a}\sum_{j=0}^{i-1}{} (ai)(naj)1ni+j(11n)nij\displaystyle\binom{a}{i}\binom{n-a}{j}\frac{1}{n^{i+j}}{\left(1-\frac{1}{n}\right)^{n-i-j}}
\displaystyle\geq{} an(11n)n1>aen,\displaystyle\frac{a}{n}{\left(1-\frac{1}{n}\right)^{n-1}}>\frac{a}{en},

we have

i=2aj=0i2(ai)(naj)1ni+j(11n)niji=1aj=0i1(ai)(naj)1ni+j(11n)nij<a(a1)n2aen<ean.\displaystyle\frac{\sum_{i=2}^{a}\sum_{j=0}^{i-2}\binom{a}{i}\binom{n-a}{j}\frac{1}{n^{i+j}}(1-\frac{1}{n})^{n-i-j}}{\sum_{i=1}^{a}\sum_{j=0}^{i-1}\binom{a}{i}\binom{n-a}{j}\frac{1}{n^{i+j}}\left(1-\frac{1}{n}\right)^{n-i-j}}<\frac{\frac{a(a-1)}{n^{2}}}{\frac{a}{en}}<\frac{ea}{n}.

With

i=1a(ai)\displaystyle\sum_{i=1}^{a}\binom{a}{i}{} (nai1)1n2i1(11n)n2i+1\displaystyle\binom{n-a}{i-1}\frac{1}{n^{2i-1}}{\left(1-\frac{1}{n}\right)^{n-2i+1}}
=\displaystyle={} i=1aj=0i1(ai)(naj)1ni+j(11n)nij\displaystyle\sum_{i=1}^{a}\sum_{j=0}^{i-1}\binom{a}{i}\binom{n-a}{j}\frac{1}{n^{i+j}}{\left(1-\frac{1}{n}\right)^{n-i-j}}
i=2aj=0i2(ai)(naj)1ni+j(11n)nij,\displaystyle-\sum_{i=2}^{a}\sum_{j=0}^{i-2}\binom{a}{i}\binom{n-a}{j}\frac{1}{n^{i+j}}{\left(1-\frac{1}{n}\right)^{n-i-j}},

the lemma is proved. ∎

Now we show the behavior for RLS and (1+1)(1+1) EA optimizing OneMax(0,1n){}_{(0,1^{n})} function. The outline to establish our main theorem (Theorem 7) is shown in Figure 1. First, Lemma 3 shows two cases once one of them happens before the optimum is reached, both RLS and (1+1)(1+1) EA cannot find the optimum in arbitrary further generations. Then for the non-trivial behaviors for three different initial states, Lemma 4, Lemma 5, and Lemma 6 respectively show that the algorithm will get stuck in one of the two cases. In the following, all proofs don’t specifically distinguish RLS and (1+1)(1+1) EA due to their similarity, and discuss each algorithm independently only when they have different behaviors. We start with one definition that will be frequently used in our proofs.

Refer to caption
Figure 1: Relationship between Theorem 7 and Lemmas 345, and 6.
Lemma 3.

Consider using (1+1)(1+1) EA (RLS) to optimize nn-dimensional OneMax(0,1n){}_{(0,1^{n})} function. Let X0,X1,X^{0},X^{1},\cdots be the solution sequence generated by the algorithm. Let

  • Event I: There is a g0g_{0}\in\mathbb{N} such that (X1g01,X1g0)=(0,1)(X_{1}^{g_{0}-1},X_{1}^{g_{0}})=(0,1) and X[2..n]g01n1X_{[2..n]}^{g_{0}}\neq 1^{n-1},

  • Event II: There is a g0g_{0}\in\mathbb{N} such that (X1g01,Xg0)=(1,1n)(X_{1}^{g_{0}-1},X^{g_{0}})=(1,1^{n}).

Then if at a certain generation among the solution sequence, Event I or Event II happens, then (1+1)(1+1) EA (RLS) can not find the optimum of OneMax(0,1n){}_{(0,1^{n})} in an arbitrary long runtime afterwards.

Proof.

Consider the case when Event I happens, then (X1g01,X1g0)=(0,1)(X_{1}^{g_{0}-1},X_{1}^{g_{0}})=(0,1). In this case, the current fitness f(Xg01,Xg0)1f(X^{g_{0}-1},X^{g_{0}})\geq 1. For every possible X~g0\tilde{X}^{g_{0}}, the mutation outcome of Xg0X^{g_{0}}, since X1g0=1X_{1}^{g_{0}}=1, we know f(Xg0,X~g0)0f(X^{g_{0}},\tilde{X}^{g_{0}})\leq 0 regardless of the values of other bits in X~g0\tilde{X}^{g_{0}}. Hence, X~g0\tilde{X}^{g_{0}} cannot enter in the (g0+1)(g_{0}+1)-th generation, that is, any progress achieved in the OneMax component of OneMax(0,1n){}_{(0,1^{n})} function (any bit value changing from 0 to 11 from the 22-nd to the nn-th bit position) cannot pass on to the next generation. Besides, (X1g0,X1g0+1)=(X1g01,X1g0)=(0,1)(X_{1}^{g_{0}},X_{1}^{g_{0}+1})=(X_{1}^{g_{0}-1},X_{1}^{g_{0}})=(0,1), then (X1g,X1g+1)=(0,1)(X_{1}^{g},X_{1}^{g+1})=(0,1) holds for all gg01g\geq g_{0}-1. That is, RLS or (1+1)(1+1) EA gets stuck in this case.

Consider the case when Event II happens, that is, (X1g01,Xg0)=(1,1n)(X_{1}^{g_{0}-1},X^{g_{0}})=(1,1^{n}). In this case, the current fitness f(Xg01,Xg0)=0f(X^{g_{0}-1},X^{g_{0}})=0. Similar to the above case, since X1g0=1X_{1}^{g_{0}}=1, all possible mutation outcome X~g0\tilde{X}^{g_{0}} along with Xg0X^{g_{0}} will have fitness less than or equal to 0. Hence, only when f(Xg0,X~g0)=0f(X^{g_{0}},\tilde{X}^{g_{0}})=0 happens, X~g0\tilde{X}^{g_{0}} can enter into the next generation, which means X~g0=1n\tilde{X}^{g_{0}}=1^{n}. Therefore, RLS or (1+1)(1+1) EA will get stuck in this case. ∎

Lemma 4.

Consider using (1+1)(1+1) EA (RLS) to optimize nn-dimensional OneMax(0,1n){}_{(0,1^{n})} function. Assume that at the first generation, (X10,X11)=(0,0)(X_{1}^{0},X_{1}^{1})=(0,0) and j=2nXj1<34n{\sum_{j=2}^{n}X_{j}^{1}<\frac{3}{4}n}. Then with probability at least 1e+1n1/31-\frac{e+1}{n^{1/3}}, Event I will happen at one certain generation after this initial state.

Proof.

Consider the subsequent process once the number of 0-bits among {2,,n}\{2,\dots,n\} bit positions of the current individual, becomes less than ncn^{c} for any given constant c<0.5c<0.5. Note that if the first bit value changes from 0 to 11 before the number of 0-bits decreases to ncn^{c}, Event I already happens. Hence, we just consider the case that the current first bit value is still 0 at the first time when the number of the remaining 0-bits decreases to ncn^{c}. Let aa denote the number of 0-bits of the current individual. We conduct the proof of this lemma based on the following two facts.

  • Among increase steps (the fitness has an absolute increase), a single increase step increases the fitness by 11 with conditional probability at least 1ea/n1-ea/n. For RLS, due to its one-bit mutation, the amount of fitness increase can only be 11 for a single increase step. For (1+1)(1+1) EA, Lemma 2 directly shows this fact.

  • Under the condition that one step increases the fitness by 11, with conditional probability at least 1/a1/a, the first bit changes its value from 0 to 11. It is obvious for RLS. For (1+1)(1+1) EA, suppose that the number of bits changing from 0 to 11 in this step is m[1..a]m\in[1..a], then the probability that the first bit contributes one 0 is (a1m1)/(am)=m/a1/a\binom{a-1}{m-1}/\binom{a}{m}=m/a\geq 1/a for m1m\geq 1.

Note that there are a1a-1 increase steps before the remaining n1n-1 positions become all 11s if each increase step increases the fitness by 11. Then with the above two facts, it is easy to see that the probability that one individual with the (0,1)(0,1) first bit pattern is generated before remaining positions all have bit value 11 is at least

(\displaystyle\Bigg{(} a=nc2(1ean))(1a=nc2(11a))\displaystyle\prod_{a=n^{c}}^{2}\left(1-\frac{ea}{n}\right)\Bigg{)}\left(1-\prod_{a=n^{c}}^{2}\left(1-\frac{1}{a}\right)\right)
=\displaystyle={} (a=nc2(1ean))(1a=nc2a1a)\displaystyle\left(\prod_{a=n^{c}}^{2}\left(1-\frac{ea}{n}\right)\right)\left(1-\prod_{a=n^{c}}^{2}\frac{a-1}{a}\right)
\displaystyle\geq{} (1en1c)nc(11nc)1en12c1nc\displaystyle{\left(1-\frac{e}{n^{1-c}}\right)^{n^{c}}\left(1-\frac{1}{n^{c}}\right)\geq 1-\frac{e}{n^{1-2c}}-\frac{1}{n^{c}}}
\displaystyle\geq{} 1e+1min{n12c,nc}.\displaystyle{1-\frac{e+1}{\min\{n^{1-2c},n^{c}\}}}.

We could just set c=13c=\frac{1}{3} and obtain the probability lower bound as 1e+1n1/31-\frac{e+1}{n^{1/3}}. ∎

Lemma 5.

Consider using (1+1)(1+1) EA (RLS) to optimize nn-dimensional OneMax(0,1n){}_{(0,1^{n})} function. Assume that at the first generation, (X10,X11)=(1,0)(X_{1}^{0},X_{1}^{1})=(1,0) and j=2nXj1<34n{\sum_{j=2}^{n}X_{j}^{1}<\frac{3}{4}n}. Then with probability at least 1e+1n1/31-\frac{e+1}{n^{1/3}}, Event I will happen at one certain generation after this initial state.

Proof.

Since (X10,X11)=(1,0)(X_{1}^{0},X_{1}^{1})=(1,0), we know that any offspring will have better fitness than the current individual, and will surely enter into the next generation. Then with probability 1/n1/n, the first bit value in the next generation becomes 11, that is, Event I happens. Otherwise, with probability 11/n1-1/n, it turns to the above discussed (X10,X11)=(0,0)(X_{1}^{0},X_{1}^{1})=(0,0) situation. Hence, in this situation, the probability that eventually Event I happens is at lea st

1n+(11n)(1e+1n1/3)1e+1n1/3.\displaystyle{\frac{1}{n}+\left(1-\frac{1}{n}\right)\left(1-\frac{e+1}{n^{1/3}}\right)\geq 1-\frac{e+1}{n^{1/3}}}.\qed
Lemma 6.

Consider using (1+1)(1+1) EA (RLS) to optimize nn-dimensional OneMax(0,1n){}_{(0,1^{n})} function. Assume that at the first generation, (X10,X11)=(1,1)(X_{1}^{0},X_{1}^{1})=(1,1) and j=2nXj1<34n{\sum_{j=2}^{n}X_{j}^{1}<\frac{3}{4}n}. Then with probability at least 1e+1n1/3(n1)exp(n1/3e)1-\frac{e+1}{n^{1/3}}-(n-1)\exp{\left(-\frac{n^{1/3}}{e}\right)}, Event I or Event II will happen at one certain generation after this initial state.

Proof.

For (X10,X11)=(1,1)(X_{1}^{0},X_{1}^{1})=(1,1), since in each iteration only one bit can be flipped for RLS, once the first bit is flipped from 11 to 0, the fitness of the offspring will be less than its parent and the offspring cannot enter into the next generation. Hence, for RLS, the individual will be eventually evolved to (X1g01,Xg0)=(1,1n)(X_{1}^{g_{0}-1},X^{g_{0}})=(1,1^{n}) for some g0g_{0}\in\mathbb{N}. That is, Event II happens.

For (1+1)(1+1) EA, similar to the (X10,X11)=(0,0)(X_{1}^{0},X_{1}^{1})=(0,0) situation, we consider the subsequent process once the number of 0-bits among {2,,n}\{2,\dots,n\} bit positions of the current individual, becomes less than ncn^{c} for some constant c<0.5c<0.5, and let aa denote the number of 0-bits of the current individual. If the first bit value changes from 11 to 0 before the number of 0-bits decreases to ncn^{c}, we turn to the (X10,X11)=(1,0)(X_{1}^{0},X_{1}^{1})=(1,0) situation. Otherwise, we will show that in the subsequent generations, with probability at least 1o(1)1-o(1), the (1,1)(1,1) pattern will be maintained after the remaining bits reach the optimal 1n11^{n-1}. Consider the condition that the first bit value stays at 11s and let T~\tilde{T} be the number of time that all n1n-1 bit positions have bit value 11 under this condition. Note that under this condition, one certain 0 bit position in [2..n][2..n] does not change to 11 in tt generations is at most (1(11n)n21n)t(11en)t(1-(1-\frac{1}{n})^{n-2}\frac{1}{n})^{t}\leq(1-\frac{1}{en})^{t}. Then a union bound shows

Pr\displaystyle\Pr [T~>n1+cthe 1-st value stays at 1]\displaystyle\left[\tilde{T}>n^{1+c}\mid\text{the $1$-st value stays at $1$}\right]
\displaystyle\leq{} (n1)(11en)n1+c(n1)ence.\displaystyle{(n-1)\left(1-\frac{1}{en}\right)^{n^{1+c}}\leq(n-1)e^{-\frac{n^{c}}{e}}}.

Noting that the probability that the offspring with the first bit value changing from 11 to 0 can enter into next generation is at most 1nan=an21n2c\frac{1}{n}\frac{a}{n}=\frac{a}{n^{2}}\leq\frac{1}{n^{2-c}}, we can obtain the probability that Event II happens within n1+cn^{1+c} generations is at least

(1\displaystyle\bigg{(}1- 1n2c)n1+c(1(n1)ence)\displaystyle\frac{1}{n^{2-c}}\bigg{)}^{n^{1+c}}{\left(1-(n-1)e^{-\frac{n^{c}}{e}}\right)}
\displaystyle\geq{} 1n1+cn2c(n1)ence=11n12c(n1)ence.\displaystyle 1-\frac{n^{1+c}}{n^{2-c}}-{(n-1)e^{-\frac{n^{c}}{e}}}=1-\frac{1}{n^{1-2c}}-{(n-1)e^{-\frac{n^{c}}{e}}}.

Further taking c=13c=\frac{1}{3}, together with the probability of Event I when the first bit pattern changes to (1,0)(1,0) before the number of zeros decreases to ncn^{c}, we have Event I or Event II will happen with probability at least 1e+1n1/3(n1)en1/3e1-\frac{e+1}{n^{1/3}}-(n-1)e^{-\frac{n^{1/3}}{e}}. ∎

Theorem 7.

Let n6n\geq 6. Then with probability at least 1(n+1)exp(n1/3e)e+1n1/31-(n+1)\exp{\left(-\frac{n^{1/3}}{e}\right)}-\frac{e+1}{n^{1/3}}, (1+1)(1+1) EA (RLS) cannot find the optimum of the nn-dimensional OneMax(0,1n){}_{(0,1^{n})} function.

Proof.

For the uniformly and randomly generated 11-st generation, we have E[j=2nXj1]=(n1)/2E[\sum_{j=2}^{n}X_{j}^{1}]=(n-1)/2. The Chernoff inequality in Lemma 1(b) gives that Pr[j=2nXj134n]exp((n1)/8){\Pr[\sum_{j=2}^{n}X_{j}^{1}\geq\tfrac{3}{4}n]\leq\exp(-(n-1)/8)}. Hence, with probability at least 1exp((n1)/8)1-\exp(-(n-1)/8), j=2nXj1<34n\sum_{j=2}^{n}X_{j}^{1}<\tfrac{3}{4}n holds, that is, [2..n][2..n] bit positions have at least 14n1\frac{1}{4}n-1 zeros. Thus, neither Event II nor Event II happens at the first generation. We consider this initial status in the following.

Recall that from Lemma 3, Event I and Event II are two stagnation situations. For the first bit values X10X_{1}^{0} and X11X_{1}^{1} of the randomly generated two individuals X0X^{0} and X1X^{1}, there are four situations, (X10,X11)=(0,1),(0,0),(1,0)(X_{1}^{0},X_{1}^{1})=(0,1),(0,0),(1,0) or (1,1)(1,1). If (X10,X11)=(0,1)(X_{1}^{0},X_{1}^{1})=(0,1), Event I already happens. Respectively from Lemma 4, Lemma 5, and Lemma 6, we could know that with probability at least 1e+1n1/3(n1)ence1-\frac{e+1}{n^{1/3}}-(n-1)e^{-\frac{n^{c}}{e}}, Event I or Event II will happen in the certain generations after the initial first bit pattern (0,0),(1,0)(0,0),(1,0) and (1,1)(1,1).

Overall, together with the probability of j=2nXj1<34n\sum_{j=2}^{n}X_{j}^{1}<\tfrac{3}{4}n, we have the probability for getting stuck is at least

(1en18)\displaystyle\left(1-e^{-\frac{n-1}{8}}\right){} (1e+1n1/3(n1)en1/3e)\displaystyle\left(1-\frac{e+1}{n^{1/3}}-(n-1)e^{-\frac{n^{1/3}}{e}}\right)
\displaystyle\geq{} 1en18e+1n1/3(n1)en1/3e\displaystyle 1-e^{-\frac{n-1}{8}}-\frac{e+1}{n^{1/3}}-(n-1)e^{-\frac{n^{1/3}}{e}}
\displaystyle\geq{} 1(n+1)en1/3ee+1n1/3\displaystyle 1-(n+1)e^{-\frac{n^{1/3}}{e}}-\frac{e+1}{n^{1/3}}

where the last inequality uses exp(n18)2exp(n8)=2exp(n1/3een2/38)2exp(n1/3e)\exp{(-\frac{n-1}{8})}\leq 2\exp(-\frac{n}{8})=2\exp(-\frac{n^{1/3}}{e}\frac{en^{2/3}}{8})\leq 2\exp(-\frac{n^{1/3}}{e}) when n6n\geq 6. ∎

One key reason causing the difficulty for (1+1) EA and RLS is that there is only one individual in the population. As we can see in the proof, once the algorithm finds the (0,1)(0,1) optimal pattern in the first bit, the progress in OneMax component cannot pass on to the next generation, and once the current OneMax component finds the optimum before the first bit (0,1)(0,1) optimal pattern, the optimal first bit pattern cannot be obtained further. In EAs, for some cases, the large population size does not help [23], but the population could also have many benefits for ensuring the performance [24, 25, 26, 27]. Similarly, we would like to know whether introducing population with not small size will improve the fault tolerance to overcome the first difficulty, and help to overcome the second difficulty since (1,1n)(1,1^{n}) individual has worse fitness so that it is easy to be eliminated in the selection. The details will be shown in Section IV.

The above analyses are conducted for the offline mode. For the online mode on problem (3), as discussed in Sections II-B and III-A, we assume the time to make the decision that could change the fitness function value is so small that we can just expect one solution with a better or equivalent fitness value at each time step. Hence, when we reuse Algorithm 1, the time tt of the fitness function increases to t+1t+1 once the algorithm witnesses the generation gg where the generated X~g\tilde{X}^{g} satisfies f(Xg,X~g)f(Xg1,Xg)f(X^{g},\tilde{X}^{g})\geq f(X^{g-1},X^{g}). It is not difficult to see that tt and gg are usually different as Algorithm 1 may need more than one generation to generate such X~g\tilde{X}^{g}. Hence, from Theorem 7, we could also obtain that with probability at least 1(n+1)exp(n1/3e)e+1n1/31-(n+1)\exp{\left(-\frac{n^{1/3}}{e}\right)}-\frac{e+1}{n^{1/3}}, (1+1)(1+1) EA or RLS in our discussed online mode cannot find the solution at some time step tt when the function value of problem (3) is greater than n1n-1.

IV (μ+1)(\mu+1) EA Can Find the Optimum

IV-A (μ+1)(\mu+1) EA Utilized for OneMax(0,1n){}_{(0,1^{n})}

(μ+1)(\mu+1) EA is a commonly used benchmark algorithm for evolutionary theory analysis, which maintains a parent population of size μ\mu comparing with (1+1)(1+1) EA that has population size 11. In the mutation operator, one parent is uniformly and randomly selected from the parent population, and the bit-wise mutation is employed on this parent individual and generates its offspring. Then the selection operator will uniformly remove one individual with the worse fitness value from the union individual set of the population and the offspring.

Similar to (1+1)(1+1) EA discussed in Section III, the general (μ+1)(\mu+1) EA is utilized for non-time-linkage function, and some small modifications are required for solving time-linkage problems. For solving OneMax(0,1n){}_{(0,1^{n})} function in an offline mode, we just consider the representation that each individual in the population just encodes the current solution and stores the previous solution for the fitness evaluation. Similar to RLS and (1+1)(1+1) EA, only offspring with better or equivalent fitness could affect the further fitness, hoping the optimization process to learn the preference of the time-linkage property. Algorithm 2 shows how (μ+1)(\mu+1) EA solves the time-linkage function that relies on two consecutive time steps. With no confusion, we shall still call this algorithm (μ+1)(\mu+1) EA. Also note that we do not set the termination criterion in the algorithm statement, as we aim at theoretically analyzing the time to reach the optimum, that is, to generate X~g=(1,,1)\tilde{X}^{g}=(1,\dots,1) condition on that its parent Xi~gX_{\tilde{i}}^{g} has the first bit value 0 in Algorithm 2.

1:Generate the random initial two generations P0={X10,,Xμ0}P^{0}=\{X_{1}^{0},\dots,X_{\mu}^{0}\} and P1={X11,,Xμ1}P^{1}=\{X_{1}^{1},\dots,X_{\mu}^{1}\}, where Xig=(Xi,1g,,Xi,ng),i={1,,μ},g={0,1}X_{i}^{g}=(X_{i,1}^{g},\dots,X_{i,n}^{g}),i=\{1,\dots,\mu\},g=\{0,1\}
2:for g=1,2,g=1,2,\dots do
3:%%\%\% Mutation
4:   Uniformly and randomly select one index i~\tilde{i} from [1..μ][1..\mu]
5:   Generate X~g\tilde{X}^{g} via independently flipping each bit value of Xi~gX_{\tilde{i}}^{g} with probability 1/n1/n
6:%%\%\% Selection
7:   if f(Xi~g,X~g)mini{1,,μ}{f(Xig1,Xig)}f(X_{\tilde{i}}^{g},\tilde{X}^{g})\geq{\min\limits_{i\in\{1,\dots,\mu\}}\{f(X_{i}^{g-1},X_{i}^{g})\}} then
8:      Let (P~g1,P~g)={(Pg1,Pg),(Xi~g,X~g)}(\tilde{P}^{g-1},\tilde{P}^{g})=\{(P^{g-1},P^{g}),(X_{\tilde{i}}^{g},\tilde{X}^{g})\}
9:      Remove the pair with the lowest fitness in (P~g1,P~g)(\tilde{P}^{g-1},\tilde{P}^{g}) uniformly at random
10:      Pg+1=P~g,Pg=P~g1P^{g+1}=\tilde{P}^{g},P^{g}=\tilde{P}^{g-1}
11:   else
12:      Pg+1=Pg,Pg=Pg1P^{g+1}=P^{g},P^{g}=P^{g-1}
13:   end if
14:end for
Algorithm 2 (μ+1)(\mu+1) EA to maximize fitness function ff requiring two consecutive time steps

We do not consider using (μ+1)(\mu+1) EA to solve OneMax(0,1n){}_{(0,1^{n})} function in an online mode. If the decision must be made in a short time period as we discussed in Section III-A, since different individuals in the parent population has their own evolving histories and different time fronts, the better offspring generated in one step cannot be regarded as the decision of the next step for the individuals in the parent population other than its own parent. If we have enough budget before time step changes, similar to the discussion in Section III-A, we will have the fitness less than 1 for any time step since Xt=(1,,1)X^{t}=(1,\dots,1) for each time step tt. Also, it is not interesting for us. Hence, the following analysis only considers (μ+1)(\mu+1) EA (Algorithm 2) solving OneMax(0,1n){}_{(0,1^{n})} function in the offline mode.

IV-B Convergence Analysis of (μ+1)(\mu+1) EA on OneMax(0,1n){}_{(0,1^{n})}

In Section III, we show the two cases happening before the optimum is reached that cause the stagnation of (1+1)(1+1) EA and RLS for the OneMax(0,1n){}_{(0,1^{n})}. One is that the (0,1)(0,1) first bit pattern is reached, and the other is that the current individual has the value one in all its bits with the previous first bit value as 11. The single individual in the population of (1+1)(1+1) EA or RLS results in the poor tolerance to the incorrect trial of the algorithm. This subsection will show that the introduction of population can increase the tolerance to the incorrect trial, and thus overcome the stagnation. That is, we will show that (μ+1)(\mu+1) EA can find the optimum of OneMax(0,1n){}_{(0,1^{n})} with high probability. In order to give an intuitive feeling about the reason why the population can help for solving OneMax(0,1n){}_{(0,1^{n})}, we briefly and not-so-rigorously discuss it before establishing a rigorous analysis.

Corresponding to two stagnation cases for (1+1)(1+1) EA or RLS, (μ+1)(\mu+1) EA can get stuck when all individuals have the first bit value as 11 no matter the previous first bit value 0 as the first case or the previous first bit value 11 as the second case. As discussed in Section III, we know that the individual with previous first bit value as 11 has no fitness advantage against the one with previous first bit value as 0. Due to the selection operator, the one with previous first bit value 11 will be early replaced by the offspring with good fitness. As the process goes by, more detailedly in linear time of the population size in expectation, all individuals with the previous first bit value 11 will die out, and the offspring with its parent first bit value 11 cannot enter into the population. That is, the second case cannot take over the whole population to cause the stagnation.

As for the first case that the (0,1)(0,1) pattern individuals takes over the population, we focus on the evolving process of the best (0,0)(0,0) pattern individual, which is fertile, similar to the runtime analysis of original (μ+1)(\mu+1) EA in [28]. The best (0,0)(0,0) pattern individuals can be incorrectly replaced only by the (0,1)(0,1) pattern individual with better or the same fitness and only when all individuals with worse fitness than the best (0,0)(0,0) pattern individual are replaced. With a sufficient large population size, like Ω(n)\Omega(n) as nn the problem size, with high probability, the better (0,1)(0,1) pattern individuals cannot take over the whole population and the (0,1)(0,1) pattern individuals with the same fitness as the best (0,0)(0,0) pattern individual cannot replace all best (0,0)(0,0) individuals when the population doesn’t have any individual with worse fitness than the best (0,0)(0,0) individual. That is, the first case with high probability will not happen for (μ+1)(\mu+1) EA. In a word, the population in (μ+1)(\mu+1) EA increases the tolerance to the incorrect trial.

Now we start our rigorous analysis. As we could infer from the above description, the difficulty of the theoretical analysis lies on the combining discussion of the inter-generation dependencies (the time-linkage among two generations) and the inner-generation dependencies (such as the selection operator). One way to handle these complicated stochastic dependencies could be the mean-field analysis, that is, mathematically analysis on a designed simplified algorithm that discards some dependencies and together with an experimental verification on the similarity between the simplified algorithm and the original one. It has been already introduced for the evolutionary computation theory [29]. However, the mean-field analysis is not totally mathematically rigorous. Hence, we don’t utilize it here and analyze directly on the original algorithm. Maybe the mean-field analysis could help in more complicated algorithm and time-linkage problem, and we also hope our analysis could provide some other inspiration for the future theory work on the time-linkage problem.

For clarity, we put some calculations as lemmas in the following.

Lemma 8.

Let a,na,n\in\mathbb{N}, and a<na<n. Define the functions h1:[0,na1](0,1)h_{1}:[0,n-a-1]\rightarrow(0,1) and h2:[1,na](0,1)h_{2}:[1,n-a]\rightarrow(0,1) by

h1(d)=(a+d1d)nd+1,h2(d)=(a+d1d1)nd,\displaystyle h_{1}(d)=\frac{\binom{a+d-1}{d}}{n^{d+1}},h_{2}(d)=\frac{\binom{a+d-1}{d-1}}{n^{d}},

then h1(d)h_{1}(d) and h2(d)h_{2}(d) are monotonically decreasing.

Proof.

Since h1>0h_{1}>0, and for any d1,d2[0,na1]d_{1},d_{2}\in[0,n-a-1] and d1d2d_{1}\leq d_{2},

h1(d1)h1(d2)=\displaystyle\frac{h_{1}(d_{1})}{h_{1}(d_{2})}={} (a+d11d1)nd1+1nd2+1(a+d21d2)\displaystyle\frac{\binom{a+d_{1}-1}{d_{1}}}{n^{d_{1}+1}}\frac{n^{d_{2}+1}}{\binom{a+d_{2}-1}{d_{2}}}
=\displaystyle={} nd2d1(a+d11)!(a1)!d1!(a1)!d2!(a+d21)!\displaystyle n^{d_{2}-d_{1}}\frac{(a+d_{1}-1)!}{(a-1)!d_{1}!}\frac{(a-1)!d_{2}!}{(a+d_{2}-1)!}
=\displaystyle={} nd2d1d2(d1+1)(a+d21)(a+d1)\displaystyle n^{d_{2}-d_{1}}\frac{d_{2}\cdots(d_{1}+1)}{(a+d_{2}-1)\cdots(a+d_{1})}
\displaystyle\geq{} nd2d1(d1+1a+d1)d2d1(nn1)d2d11,\displaystyle n^{d_{2}-d_{1}}\left(\frac{d_{1}+1}{a+d_{1}}\right)^{d_{2}-d_{1}}\geq\left(\frac{n}{n-1}\right)^{d_{2}-d_{1}}\geq 1,

we know h1(d)h_{1}(d) is monotonically decreasing.

Similarly, since h1>0h_{1}>0, and for any d1,d2[1,na]d_{1},d_{2}\in[1,n-a] and d1d2d_{1}\leq d_{2},

h2(d1)h2(d2)=\displaystyle\frac{h_{2}(d_{1})}{h_{2}(d_{2})}={} (a+d11d11)nd1nd2(a+d21d21)\displaystyle\frac{\binom{a+d_{1}-1}{d_{1}-1}}{n^{d_{1}}}\frac{n^{d_{2}}}{\binom{a+d_{2}-1}{d_{2}-1}}
=\displaystyle={} nd2d1(a+d11)!a!(d11)!a!(d21)!(a+d21)!\displaystyle n^{d_{2}-d_{1}}\frac{(a+d_{1}-1)!}{a!(d_{1}-1)!}\frac{a!(d_{2}-1)!}{(a+d_{2}-1)!}
=\displaystyle={} nd2d1(d21)d1(a+d21)(a+d1)\displaystyle n^{d_{2}-d_{1}}\frac{(d_{2}-1)\cdots d_{1}}{(a+d_{2}-1)\cdots(a+d_{1})}
\displaystyle\geq{} nd2d1(d1a+d1)d2d1nd2d1(1n)d2d1=1,\displaystyle n^{d_{2}-d_{1}}\left(\frac{d_{1}}{a+d_{1}}\right)^{d_{2}-d_{1}}\geq n^{d_{2}-d_{1}}\left(\frac{1}{n}\right)^{d_{2}-d_{1}}=1,

we know h2(d)h_{2}(d) is monotonically decreasing. ∎

Lemma 9.

Let nn\in\mathbb{N}. Define the function g:[1,n1/2](0,1)g:[1,n^{1/2}]\rightarrow(0,1) by g(a)=aana2g(a)=\frac{a^{a}}{n^{a^{2}}}, then g(a)g(a) is monotonically decreasing.

Proof.

Consider g~(a)=lng(a)=alnaa2lnn\tilde{g}(a)=\ln g(a)=a\ln a-a^{2}\ln n. For any a1,a2[1,n1/2]a_{1},a_{2}\in[1,n^{1/2}] and a1<a2a_{1}<a_{2}, we have

g~(a1)\displaystyle\tilde{g}(a_{1})-{} g~(a2)=a1lna1a12lnna2lna2+a22lnn\displaystyle\tilde{g}(a_{2})=a_{1}\ln a_{1}-a_{1}^{2}\ln n-a_{2}\ln a_{2}+a_{2}^{2}\ln n
\displaystyle\geq{} (a1+a2)lnna2lna2>0.\displaystyle(a_{1}+a_{2})\ln n-a_{2}\ln a_{2}>0.

Then, g~(a)\tilde{g}(a), and hence g(a)g(a), are monotonically decreasing. ∎

Lemma 10.

Let n>(4e)2n>(4e)^{2}. Then (34)n1/21n1/2\left(\frac{3}{4}\right)^{n^{1/2}-1}\leq n^{-1/2}.

Proof.

Consider s(n)=(n1/21)14+12lnns(n)=-\left(n^{1/2}-1\right)\frac{1}{4}+\frac{1}{2}\ln n. Then for n>(4e)2>16n>(4e)^{2}>16,

s(n)=1412n1/2+12n1=18n1(n1/2+4)<0.\displaystyle s^{\prime}(n)=-\tfrac{1}{4}\cdot\tfrac{1}{2}n^{-1/2}+\tfrac{1}{2}n^{-1}=\tfrac{1}{8}n^{-1}(-n^{1/2}+4)<0.

Hence, s(n)s(n) is monotonically decreasing for n>(4e)2n>(4e)^{2}. Since s((4e)2)=4e14+ln(4e)<0s((4e)^{2})=-\frac{4e-1}{4}+\ln(4e)<0, we have s(n)<0{s(n)<0} for n>(4e)2n>(4e)^{2}. Noting that 14>ln34-\frac{1}{4}>\ln\frac{3}{4}, we have (n1/21)ln34<12lnn\left(n^{1/2}-1\right)\ln\frac{3}{4}<-\frac{1}{2}\ln n and hence (34)n1/21n1/2\left(\frac{3}{4}\right)^{n^{1/2}-1}\leq n^{-1/2} for n>(4e)2n>(4e)^{2}. ∎

Now we show our results that (μ+1)(\mu+1) EA can find the optimum of OneMax(0,1n){}_{(0,1^{n})} function with high probability. We start with some definitions that will be frequently used in our proofs.

Definition 11.

Consider using (μ+1)(\mu+1) EA with population size μ\mu to optimize nn-dimensional OneMax(0,1n){}_{(0,1^{n})} function. PgP^{g} is the population at the gg-th generation, and Pg1P^{g-1} for the (g1)(g-1)-th generation. Let l=max{f(Xig1,Xig)(Xi,1g1,Xi,1g)=(0,0),Xig1Pg1,XigPg,i[1..μ]}l=\max\{f(X_{i}^{g-1},X_{i}^{g})\mid(X_{i,1}^{g-1},X_{i,1}^{g})=(0,0),X_{i}^{g-1}\in P^{g-1},X_{i}^{g}\in P^{g},i\in[1..\mu]\} be the best fitness among all individuals with the (0,0)(0,0) first bit pattern. For XigPg,i[1..μ]X_{i}^{g}\in P^{g},i\in[1..\mu],

  • it is called a temporarily undefeated individual if f(Xig1,Xig)>lf(X_{i}^{g-1},X_{i}^{g})>l and (Xi,1g1,Xi,1g)=(0,1)(X_{i,1}^{g-1},X_{i,1}^{g})=(0,1).

  • it is called a current front individual if f(Xig1,Xig)=lf(X_{i}^{g-1},X_{i}^{g})=l and (Xi,1g1,Xi,1g)=(0,0)(X_{i,1}^{g-1},X_{i,1}^{g})=(0,0).

  • it is called an interior individual if it is neither a temporarily undefeated individual nor a current front individual.

The most difficult case is when all (1,0)(1,0) and (1,1)(1,1) are replaced, which happens after linear time of the population size as in the proof of Theorem 15. Lemmas 12 to 14 discuss the behavior in this case. Lemma 12 will show that when the population is large enough, with high probability, there are at most 12μ1\frac{1}{2}\mu-1 accumulative temporarily undefeated individuals before the current front individuals have only 1 zero.

Lemma 12.

Given any δ>0\delta>0. For n>(4(1+δ)e)2n>(4(1+\delta)e)^{2}, consider using (μ+1)(\mu+1) EA with population size μ4(1+δ)(3e+1)(n+1)\mu\geq 4(1+\delta)(3e+1)(n+1) to optimize nn-dimensional OneMax(0,1n){}_{(0,1^{n})} function. Assume that

  • the current front individuals have less than 34n\frac{3}{4}n zeros;

  • all individuals are with the (0,0)(0,0) or (0,1)(0,1) pattern.

Then with probability at least 1exp(δ22(1+δ)(n1))1-\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(n-1)\right), there are at most 12μ1\frac{1}{2}\mu-1 accumulative temporarily undefeated individuals generated before the current front individuals only have 1 zero.

Proof.

Consider the case when the current front individuals have at least 2 zeros. For the current population, let aa denote the number of zeros in one current front individual, then a2a\geq 2. Let mdm_{d} denote the set of the (0,0)(0,0) individuals that have a+da+d number of zeros. Obviously, m0m_{0} is the set of the best (0,0)(0,0) individuals. Let AA represent the event that the best (0,0)(0,0) fitness of the population increases in one generation, and BB the event that one (0,1)(0,1) offspring with better fitness than the current best (0,0)(0,0) fitness is generated in one generation. Then we have

Pr[A]\displaystyle\Pr[A]\geq{} d0|md|μ(a+d1d+1)nd+1(11n)nd1\displaystyle\sum_{d\geq 0}\frac{|m_{d}|}{\mu}\frac{\binom{a+d-1}{d+1}}{n^{d+1}}\left(1-\frac{1}{n}\right)^{n-d-1}
\displaystyle\geq{} d0|md|eμ(a+d1d+1)nd+1\displaystyle\sum_{d\geq 0}\frac{|m_{d}|}{e\mu}\frac{\binom{a+d-1}{d+1}}{n^{d+1}}

and

Pr[B]d0|md|μ(a+d1d)nd+1.\displaystyle\Pr[B]\leq\sum_{d\geq 0}\frac{|m_{d}|}{\mu}\frac{\binom{a+d-1}{d}}{n^{d+1}}.

Firstly, we discuss what happens under the condition that the parent is not from m>am_{>a} individuals. Let BB^{\prime} represent the event that one of m>am_{>a} individuals generates a (0,1)(0,1) offspring with better fitness than the current best (0,0)(0,0) fitness in one generation. Since

(a+d1d+1)(a+d1d)=(a+d1)!(d+1)!(a2)!d!(a1)!(a+d1)!=a1d+1,\displaystyle\frac{\binom{a+d-1}{d+1}}{\binom{a+d-1}{d}}=\frac{(a+d-1)!}{(d+1)!(a-2)!}\cdot\frac{d!(a-1)!}{(a+d-1)!}=\frac{a-1}{d+1},

we know

Pr[A]Pr[BB]d=0a|md|eμ(a+d1d+1)nd+1d=0a|md|μ(a+d1d)nd+1a1e(a+1)=1e(12a+1)13e\begin{split}\frac{\Pr[A]}{\Pr[B-B^{\prime}]}\geq{}&\frac{\sum_{d=0}^{a}\frac{{|m_{d}|}}{e\mu}\frac{\binom{a+d-1}{d+1}}{n^{d+1}}}{\sum_{d=0}^{a}\frac{{|m_{d}|}}{\mu}\frac{\binom{a+d-1}{d}}{n^{d+1}}}\geq\frac{a-1}{e(a+1)}\\ ={}&\frac{1}{e}\left(1-\frac{2}{a+1}\right)\geq\frac{1}{3e}\end{split} (4)

where the last inequality uses a2a\geq 2.

Secondly, we consider the case when the parent is selected from m>am_{>a} individuals, that it, event BB^{\prime} happens. With Lemma 8, we know

Pr[B]\displaystyle\Pr[B^{\prime}]\leq{} d>a|md|μ(a+d1d)nd+1d>a|md|μ(2a1a)na+1(2a1a)na+1\displaystyle\sum_{d>a}\frac{{|m_{d}|}}{\mu}\frac{\binom{a+d-1}{d}}{n^{d+1}}\leq\sum_{d>a}\frac{{|m_{d}|}}{\mu}\frac{\binom{2a-1}{a}}{n^{a+1}}\leq\frac{\binom{2a-1}{a}}{n^{a+1}}
=\displaystyle={} (2a1)!a!(a1)!na+1(a+1)a1na+1=1n2(a+1n)a1.\displaystyle\frac{(2a-1)!}{a!(a-1)!n^{a+1}}\leq\frac{(a+1)^{a-1}}{n^{a+1}}=\frac{1}{n^{2}}\left(\frac{a+1}{n}\right)^{a-1}.

We discuss in two cases considering anca\geq n^{c} and a<nca<n^{c} for any given constant c(0,1)c\in(0,1). From the assumption in this lemma, we know a<34na<\tfrac{3}{4}n. Then for anca\geq n^{c}, we have

(a+1n)a1(34)a1(34)nc1.\displaystyle\left(\frac{a+1}{n}\right)^{a-1}\leq\left(\frac{3}{4}\right)^{a-1}\leq\left(\frac{3}{4}\right)^{n^{c}-1}.

For a[2,nc)a\in[2,n^{c}),

(a+1n)a1(ncn)a11n1c.\displaystyle\left(\frac{a+1}{n}\right)^{a-1}\leq\left(\frac{n^{c}}{n}\right)^{a-1}\leq\frac{1}{n^{1-c}}.

Hence, we have for a[2,34n)a\in[2,\tfrac{3}{4}n), Pr[B]nc3\Pr[B^{\prime}]\leq n^{c-3} since (34)nc1nc1(\tfrac{3}{4})^{n^{c}-1}\leq n^{c-1} when nn is large. Together with Pr[A]1/(eμn)\Pr[A]\geq 1/(e\mu n), we know

Pr[A]Pr[B]n2ceμ.\displaystyle\frac{\Pr[A]}{\Pr[B^{\prime}]}\geq\frac{n^{2-c}}{e\mu}. (5)

Hence, from (4) and (5), we obtain

Pr[A]Pr[B]1eμn2c+3e.\displaystyle\frac{\Pr[A]}{\Pr[B]}\geq\frac{1}{\frac{e\mu}{n^{2-c}}+3e}.

Then if we consider the subprocess that merely consists of event AA and BB, we have Pr[AAB]1/(eμn2c+3e+1)\Pr[A\mid A\cup B]\geq 1/(\frac{e\mu}{n^{2-c}}+3e+1). Let XX be the number of iterations that BB happens in the subprocess before AA occurs nn times, then E[X](eμn2c+3e+1)nE[X]\leq(\frac{e\mu}{n^{2-c}}+3e+1)n. It is not difficult to see that XX is stochastically dominated by the sum of nn geometric variables with success probability 1eμn2c+3e+1\frac{1}{\frac{e\mu}{n^{2-c}}+3e+1}. Hence, with the Chernoff bound for the sum of geometric variables in Lemma 1(c), we have that for any positive constant δ\delta,

Pr\displaystyle\Pr{} [X(1+δ)(eμn2c+3e+1)n]exp(δ2(n1)2(1+δ)).\displaystyle\left[X\geq(1+\delta)\left(\frac{e\mu}{n^{2-c}}+3e+1\right)n\right]\leq\exp\left(-\frac{\delta^{2}(n-1)}{2(1+\delta)}\right).

When n>(4(1+δ)e)11cn>(4(1+\delta)e)^{\frac{1}{1-c}} and μ4(1+δ)(3e+1)(n+1)\mu\geq 4(1+\delta)(3e+1)(n+1), we know

(1+δ)(eμn2c+3e+1)n=\displaystyle(1+\delta)\left(\frac{e\mu}{n^{2-c}}+3e+1\right)n={} (1+δ)eμn1c+(1+δ)(3e+1)n\displaystyle\frac{(1+\delta)e\mu}{n^{1-c}}+(1+\delta)(3e+1)n
<\displaystyle<{} μ4+μ41=12μ1.\displaystyle\tfrac{\mu}{4}+\tfrac{\mu}{4}-1=\tfrac{1}{2}\mu-1.

Hence, we know that with probability at least 1exp(δ22(1+δ)(n1))1-\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(n-1)\right), there are at most 12μ1\frac{1}{2}\mu-1 possible accumulative temporarily undefeated individuals before AA occurs nn times. Now take c=12c=\frac{1}{2}, then (4(1+δ)e)11c=(4(1+δ)e)2(4(1+\delta)e)^{\frac{1}{1-c}}=(4(1+\delta)e)^{2}. With n>(4(1+δ)e)2>(4e)2n>(4(1+\delta)e)^{2}>(4e)^{2}, from Lemma 10, we know (34)nc1nc1(\tfrac{3}{4})^{n^{c}-1}\leq n^{c-1} and thus (5) hold. Noting that reducing the number of zeros in one current front individuals to 1 requires at most n1n-1 occurrence times of AA, this lemma is proved. ∎

Lemma 13 will show that with high probability, the current front individuals will get accumulated to Θ(n0.5)\Theta(n^{0.5}) before all interior individuals have the same number of zeros as the current front individuals.

Lemma 13.

Let n>(4e)2n>(4e)^{2}. Consider using (μ+1)(\mu+1) EA with population size μ\mu to optimize nn-dimensional OneMax(0,1n){}_{(0,1^{n})} function.

  • Assume that

    • there are at most 12μ1\frac{1}{2}\mu-1 accumulative temporarily undefeated individuals before the current front individuals only have 1 zero;

    • all individuals are with the (0,0)(0,0) or (0,1)(0,1) pattern;

  • Consider the phase starting from the first time when the current front individuals have a[1..n]a\in[1..n] zeros, and ending with the first time when one (0,0)(0,0) pattern individual with less than aa zeros is generated for a[2..n]a\in[2..n], or ending with the first time when one (0,1)(0,1) pattern individual with all ones is generated for a=1a=1.

Then with probability at least 1exp(120n0.5)1-\exp\left(-\tfrac{1}{20}n^{0.5}\right), the current front individuals will increase by more than 15n0.5\tfrac{1}{5}n^{0.5} if the current phase doesn’t end before all individuals have at most aa zeros.

Proof.

Same as the notation in the proof of Lemma 12, md,d0m_{d},d\geq 0 denotes the set of the (0,0)(0,0) individuals that have a+da+d number of zeros. We now analyze the change of |m0||m_{0}| until all other individuals have at most aa zeros, if possible, during the phase. Let CC represent the event that one (0,0)(0,0) pattern individual with aa zeros is generated in one generation, and DD the event that one (0,1)(0,1) pattern individual with aa zeros is generated in one generation. Then we have

Pr[C]\displaystyle\Pr[C]\geq{} d0|md|μ(a+d1d)nd(11n)nd\displaystyle\sum_{d\geq 0}\frac{|m_{d}|}{\mu}\frac{\binom{a+d-1}{d}}{n^{d}}\left(1-\frac{1}{n}\right)^{n-d}
\displaystyle\geq{} d1|md|eμ(a+d1d)nd+|m0|μ(11n)n\displaystyle\sum_{d\geq 1}\frac{|m_{d}|}{e\mu}\frac{\binom{a+d-1}{d}}{n^{d}}+\frac{|m_{0}|}{\mu}\left(1-\frac{1}{n}\right)^{n}

and

Pr[D]d1|md|μ(a+d1d1)nd+|m0|μ(a11)n2\displaystyle\Pr[D]\leq\sum_{d\geq 1}\frac{|m_{d}|}{\mu}\frac{\binom{a+d-1}{d-1}}{n^{d}}+\frac{|m_{0}|}{\mu}\frac{\binom{a-1}{1}}{n^{2}}

where we define (01)=1\binom{0}{1}=1 for a=1a=1. Assume that the parent is not from m>a2m_{>a^{2}} individuals. Let DD^{\prime} represent the event that one of m>a2m_{>a^{2}} individuals generates a (0,1)(0,1) offspring with aa zeros in one generation. Due to the definition of mdm_{d}, when m>a2m_{>a^{2}} exist, we have a+a2na+a^{2}\leq n, and then a<n0.5a<n^{0.5}. Since

(a+d1d)(a+d1d1)=(a+d1)!d!(a1)!(d1)!a!(a+d1)!=ad,\displaystyle\frac{\binom{a+d-1}{d}}{\binom{a+d-1}{d-1}}=\frac{(a+d-1)!}{d!(a-1)!}\cdot\frac{(d-1)!a!}{(a+d-1)!}=\frac{a}{d},

we know

Pr[C]Pr[DD]d=1a2|md|eμ(a+d1d)nd+|m0|μ(11n)nd=1a2|md|μ(a+d1d1)nd+|m0|μ(a11)n2aea2=1ea>1en0.5.\begin{split}\frac{\Pr[C]}{\Pr[D-D^{\prime}]}\geq{}&\frac{\sum_{d=1}^{a^{2}}\frac{|m_{d}|}{e\mu}\frac{\binom{a+d-1}{d}}{n^{d}}+\frac{|m_{0}|}{\mu}\left(1-\frac{1}{n}\right)^{n}}{\sum_{d=1}^{a^{2}}\frac{|m_{d}|}{\mu}\frac{\binom{a+d-1}{d-1}}{n^{d}}+\frac{|m_{0}|}{\mu}\frac{\binom{a-1}{1}}{n^{2}}}\\ \geq{}&\frac{a}{ea^{2}}=\frac{1}{ea}>\frac{1}{en^{0.5}}.\end{split} (6)

Now we calculate Pr[D]\Pr[D^{\prime}], the probability that one of m>a2m_{>a^{2}} individual generates a (0,1)(0,1) offspring with aa zeros in one generation, as

Pr[D]d>a2|md|μ(a+d1d1)ndd>a2|md|μ(a+a21a21)na2(a+a21a21)na2=(a+a21)!(a21)!a!na2aana21n,\begin{split}\Pr[D^{\prime}]\leq{}&\sum_{d>a^{2}}\frac{{|m_{d}|}}{\mu}\frac{\binom{a+d-1}{d-1}}{n^{d}}\leq\sum_{d>a^{2}}\frac{{|m_{d}|}}{\mu}\frac{\binom{a+a^{2}-1}{a^{2}-1}}{n^{a^{2}}}\\ \leq{}&\frac{\binom{a+a^{2}-1}{a^{2}-1}}{n^{a^{2}}}=\frac{(a+a^{2}-1)!}{(a^{2}-1)!a!n^{a^{2}}}\leq\frac{a^{a}}{n^{a^{2}}}\leq{\frac{1}{n}},\end{split} (7)

where the second inequality follows from Lemma 8 and the last inequality follows from Lemma 9 and a1a\geq 1. With Pr[C](11/n)n/μ1/(2eμ)\Pr[C]\geq(1-1/n)^{n}/\mu\geq 1/(2e\mu) for n2n\geq 2, we have

Pr[C]Pr[D]n2eμ.\displaystyle\frac{\Pr[C]}{\Pr[D^{\prime}]}\geq\frac{n}{2e\mu}. (8)

Hence, from (6) and (8), we obtain

Pr[C]Pr[D]12eμn+en0.5.\displaystyle\frac{\Pr[C]}{\Pr[D]}\geq{\frac{1}{\frac{2e\mu}{n}+en^{0.5}}}.

Then if we consider the subprocess that merely consists of event CC and DD, we have Pr[CCD]1/(2eμn+en0.5+1)\Pr[C\mid C\cup D]\geq 1/(\frac{2e\mu}{n}+en^{0.5}+1). Recalling the definition of the phase we consider, it is not difficult to see that at the initial generation of this phase, there is only one (0,0)(0,0) front individual with aa number of zeros, and not difficult to see that all (0,1)(0,1) individuals with aa zeros, if exist, are temporarily undefeated individuals of the previous phase. Noting the assumption that there are at most 12μ1\frac{1}{2}\mu-1 accumulative temporarily undefeated individuals before a=1a=1, we know there are at least 12μ\frac{1}{2}\mu individuals that have more than aa zeros for the current phase. Hence, it requires at least 12μ\frac{1}{2}\mu number of steps of the subprocess to replace these individuals with more than aa zeros. Let YY be the number of times that CC happens in 12μ\frac{1}{2}\mu steps of the subprocess, then

E[Y]\displaystyle E[Y]\geq{} 12μ2eμn+en0.5+1\displaystyle\frac{\frac{1}{2}\mu}{\frac{2e\mu}{n}+en^{0.5}+1}
\displaystyle\geq{} min{124(1+δ)(3e+1)(n+1)2en0.5+1,12μ22eμn+1}\displaystyle\min\left\{\frac{\frac{1}{2}\cdot 4(1+\delta)(3e+1)(n+1)}{2en^{0.5}+1},\frac{\frac{1}{2}\mu}{2\frac{2e\mu}{n}+1}\right\}
\displaystyle\geq{} min{6en3en0.5,110en}25n0.5\displaystyle\min\{\tfrac{6en}{3en^{0.5}},\tfrac{1}{10e}n\}\geq\tfrac{2}{5}n^{0.5}

where the last inequality uses 110en110e4en0.5=25n0.5\frac{1}{10e}n\geq\frac{1}{10e}4en^{0.5}=\frac{2}{5}n^{0.5} for n(4e)2n\geq(4e)^{2}. The Chernoff inequality in Lemma 1(a) gives that Pr[Y15n0.5]exp(120n0.5)\Pr[Y\leq\frac{1}{5}n^{0.5}]\leq\exp\left(-\tfrac{1}{20}n^{0.5}\right). That is, with probability at least 1exp(120n0.5)1-\exp\left(-\tfrac{1}{20}n^{0.5}\right), |m0||m_{0}| will increase by more than 15n0.5{\tfrac{1}{5}n^{0.5}} if current phase does not end before all individuals have at most aa zeros. ∎

Lemma 14 shows that with high probability, it cannot happen that all current front individuals are replaced by the (0,1)(0,1) pattern individuals.

Lemma 14.

Let n>(4e)2n>(4e)^{2}. Consider using (μ+1)(\mu+1) EA with population size μ\mu to optimize nn-dimensional OneMax(0,1n){}_{(0,1^{n})} function. Considered the same assumptions and phase as in Lemma 13. Further assume that at one generation of the current phase, there are |m0|>15n0.5|m_{0}|>\tfrac{1}{5}n^{0.5} current front individuals with aa zeros, and all interior individuals has aa zeros. Then with probability at least 1(ee+1)n0.5/51-\big{(}\frac{e}{e+1}\big{)}^{n^{0.5}/5}, the current phase ends before |m0||m_{0}| decreases to 0.

Proof.

Let FF denote the event that the current phase ends, that is, a (0,0)(0,0) offspring with at most a1a-1 zeros when a2a\geq 2 or a (0,1)(0,1) offspring with 0 zero when a=1a=1 is generated. Then

Pr[Fa2]\displaystyle\Pr[F\mid a\geq 2] |m0|eμa1n|m0|enμ\displaystyle\geq\tfrac{|m_{0}|}{e\mu}\tfrac{a-1}{n}\geq\tfrac{|m_{0}|}{en\mu}
Pr[Fa=1]\displaystyle\Pr[F\mid a=1] |m0|eμ1n=|m0|enμ.\displaystyle\geq\tfrac{|m_{0}|}{e\mu}\tfrac{1}{n}=\tfrac{|m_{0}|}{en\mu}.

Then Pr[F]|m0|enμ\Pr[F]\geq\tfrac{|m_{0}|}{en\mu}.

Let GG denote the event that a (0,1)(0,1) offspring with aa zeros is generated and one (0,0)(0,0) individual is replaced. Suppose that the total number of the individuals with aa zeros is μ\mu^{\prime}. Then

Pr[G]|m0|μ1n|m0|μ+1|m0|2nμ(μ+1).\displaystyle\Pr[G]\leq\frac{|m_{0}|}{\mu}\frac{1}{n}\frac{|m_{0}|}{{\mu^{\prime}+1}}\leq\frac{|m_{0}|^{2}}{n\mu({\mu^{\prime}+1})}.

Hence,

Pr[F]Pr[G]|m0|enμnμ(μ+1)|m0|2=μ+1e|m0|1e\displaystyle\frac{\Pr[F]}{\Pr[G]}\geq\frac{|m_{0}|}{en\mu}\cdot\frac{n\mu({\mu^{\prime}+1})}{|m_{0}|^{2}}=\frac{{\mu^{\prime}+1}}{{e}|m_{0}|}\geq{\frac{1}{e}}

where the last inequality uses μ|m0|\mu^{\prime}\geq|m_{0}|. Then

Pr[GFG]ee+1.\Pr[G\mid F\cup G]\leq\frac{e}{e+1}.

Then the probability that GG happens |m0|>15n0.5|m_{0}|>\tfrac{1}{5}n^{0.5} times but FF does not happen is at most (ee+1)n0.5/5\big{(}\frac{e}{e+1}\big{)}^{n^{0.5}/5}, and this lemma is proved. ∎

Now the main result follows.

Theorem 15.

Given any δ>0\delta>0. Let n>(4(1+δ)e)2n>(4(1+\delta)e)^{2}. Then with probability at least 1(μ+2)exp(18n)exp(δ22(1+δ)(n1))2nexp(120n0.5)1-(\mu+2)\exp(-\tfrac{1}{8}n)-\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(n-1)\right)-2n\exp(-\tfrac{1}{20}n^{0.5}), (μ+1)(\mu+1) EA with population size μ4(1+δ)(3e+1)(n+1)\mu\geq 4(1+\delta)(3e+1)(n+1) can find the optimum of OneMax(0,1n){}_{(0,1^{n})}.

Proof.

Similar to two situations in Lemma 3 that could possibly result in the stagnation of (1+1)(1+1) EA and RLS, the only two possible cases that could result in the stagnation of (μ+1)(\mu+1) EA are listed in the following.

  • Event I’: There is a g0g_{0}\in\mathbb{N} such that for all i[1..μ]i\in[1..\mu], (Xi,1g1,Xi,1g)=(0,1)(X_{i,1}^{g-1},X_{i,1}^{g})=(0,1) and Xi,[2..n]1n1X_{i,[2..n]}\neq 1^{n-1}.

  • Event II’: There is a g0g_{0}\in\mathbb{N} such that for all i[1..μ]i\in[1..\mu], (Xi,1g1,Xig)=(1,1n)(X_{i,1}^{g-1},X_{i}^{g})=(1,1^{n}).

For the uniformly and randomly generated P0P^{0} and P1P^{1}, we know that the expected number of the (1,0)(1,0) or (1,1)(1,1) first bit patterns, that is, the expected cardinality of the set {i[1..μ](Xi,10,Xi,11)=(1,0)\{i\in[1..\mu]\mid(X_{i,1}^{0},X_{i,1}^{1})=(1,0) or (1,1)}(1,1)\}, is μ2\tfrac{\mu}{2}. Via the Chernoff inequality in Lemma 1(b), we know that with probability at most 1exp(μ8)1-\exp(-\tfrac{\mu}{8}), at most 34μ\tfrac{3}{4}\mu individuals have the pattern (1,0)(1,0) or (1,1)(1,1). Under this condition, the expected number of the (0,0)(0,0) pattern is at least 18μ\tfrac{1}{8}\mu in the whole population. The Chernoff inequality in Lemma 1(a) also gives that under the condition that at most 34μ\tfrac{3}{4}\mu individuals have the pattern (1,0)(1,0) or (1,1)(1,1), with probability at least 1exp(μ64)1-\exp(-\tfrac{\mu}{64}), there are at least 116μ\tfrac{1}{16}\mu individuals with the pattern (0,0)(0,0) for the initial population. Since E[j=1n(1Xi,j1)]=12nE[\sum_{j=1}^{n}(1-X_{i,j}^{1})]=\tfrac{1}{2}n for any i[1..μ]i\in[1..\mu], the Chernoff inequality in Lemma 1(b) on the initial population gives that Pr[j=1n(1Xi,j1)34n]exp(18n)\Pr[\sum_{j=1}^{n}(1-X_{i,j}^{1})\geq\frac{3}{4}n]\leq\exp(-\tfrac{1}{8}n). Via a union bound, we know

Pr[i0[1..μ],j=1n(1Xi0,j1)34n]μexp(18n).\displaystyle\Pr\left[\exists i_{0}\in[1..\mu],\sum\nolimits_{j=1}^{n}\left(1-X_{i_{0},j}^{1}\right)\geq\frac{3}{4}n\right]\leq\mu\exp(-\tfrac{1}{8}n).

Hence, noting μ6412e64n>18n\tfrac{\mu}{64}\geq\frac{12e}{64}n>\frac{1}{8}n from μ4(1+δ)(3e+1)(n+1)\mu\geq 4(1+\delta)(3e+1)(n+1), it is easy to see that with probability at least

1\displaystyle 1- exp(μ8)exp(μ64)μexp(18n)\displaystyle\exp(-\tfrac{\mu}{8})-\exp(-\tfrac{\mu}{64})-\mu\exp(-\tfrac{1}{8}n)
\displaystyle\geq{} 12exp(μ64)μexp(18n)1(μ+2)exp(18n),\displaystyle 1-2\exp(-\tfrac{\mu}{64})-\mu\exp(-\tfrac{1}{8}n)\geq 1-(\mu+2)\exp(-\tfrac{1}{8}n),

the initial population has at most 34μ\tfrac{3}{4}\mu individuals with the pattern (1,0)(1,0) or (1,1)(1,1), at least 116μ\tfrac{1}{16}\mu individuals with the pattern (0,0)(0,0), and a<34na<\frac{3}{4}n at the first generation. Thus, in the following, we just consider this kind of initial population.

We first show that after O(μ)O(\mu) generations, the individuals with the first bit pattern (1,0)(1,0) or (1,1)(1,1) will be replaced and will not survive in any further generation, thus Event II’ cannot happen. Since a<34na<\frac{3}{4}n, we know the current front individual has fitness more than 11. Note that all (1,0)(1,0) or (1,1)(1,1) pattern individuals have fitness at most 0{0}. Then any offspring copied from one (0,0)(0,0) pattern individual, which has the (0,0)(0,0) pattern and the same fitness as its parent, will surely enter into the generation and replace some individual with the (1,0)(1,0) or (1,1)(1,1) pattern. Since there is at least 116μ\tfrac{1}{16}\mu individuals with the pattern (0,0)(0,0), we know that for each generation, with probability at least 116\tfrac{1}{16}, one individual with the (1,0)(1,0) or (1,1)(1,1) pattern will be replaced. Hence, the expected time to replace all individuals with the pattern (1,0)(1,0) or (1,1)(1,1) is at most 1634μ=12μ16\cdot\tfrac{3}{4}\mu=12\mu since there are at most 34μ\tfrac{3}{4}\mu individuals with the pattern (1,0)(1,0) or (1,1)(1,1). Also it is not difficult to see any offspring with the (1,0)(1,0) or (1,1)(1,1) pattern cannot be selected into the next generation for a population only with the (0,0)(0,0) or (0,1)(0,1) pattern. Later, only the (0,1)(0,1) and (0,0)(0,0) first bit pattern can survive in the further evolution.

Now we consider Event I’ after the first time when all (1,0)(1,0) or (1,1)(1,1) pattern individuals are replaced. From Lemma 12, we know that before a=1a=1, with probability at least 1exp(δ22(1+δ)(n1))1-\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(n-1)\right), there are at most 12μ1\frac{1}{2}\mu-1 accumulative temporarily undefeated individuals. From Lemma 13 among all possible a[1..n]a\in[1..n], we know that if the optimum is not found before all individuals with at least 22 zeros are replaced, with probability at least 1nexp(120(n0.51))1-n\exp\left(-\tfrac{1}{20}(n^{0.5}-1)\right), there are at least 15n0.5\tfrac{1}{5}n^{0.5} number of (0,0)(0,0) individuals with a=1a=1. Then for the case a=1a=1 in Lemma 14 and considering all possible a[1..n]a\in[1..n], we know with probability at least 1n(ee+1)n0.5/51-n\big{(}\frac{e}{e+1}\big{)}^{n^{0.5}/5}, the optimum is found.

Overall, the probability that (μ+1)(\mu+1) EA can find the optimum of OneMax(0,1n){}_{(0,1^{n})} is at least

(1\displaystyle\bigg{(}1- (μ+2)e18n)(1exp(δ22(1+δ)(n1)))\displaystyle(\mu+2)e^{-\tfrac{1}{8}n}\bigg{)}\left(1-\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(n-1)\right)\right)
(1nexp(120n0.5))(1n(ee+1)15n0.5)\displaystyle\cdot\Big{(}1-n\exp\left(-\tfrac{1}{20}n^{0.5}\right)\Big{)}\left(1-n\left(\frac{e}{e+1}\right)^{\tfrac{1}{5}n^{0.5}}\right)
\displaystyle\geq{} 1(μ+2)e18nexp(δ22(1+δ)(n1))\displaystyle 1-(\mu+2)e^{-\tfrac{1}{8}n}-\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(n-1)\right)
nexp(120n0.5)n(ee+1)15n0.5\displaystyle-n\exp\left(-\tfrac{1}{20}n^{0.5}\right)-n\left(\frac{e}{e+1}\right)^{\tfrac{1}{5}n^{0.5}}
\displaystyle\geq{} 1(μ+2)e18nexp(δ22(1+δ)(n1))\displaystyle 1-(\mu+2)e^{-\tfrac{1}{8}n}-\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(n-1)\right)
2nexp(120n0.5)\displaystyle-2n\exp\left(-\tfrac{1}{20}n^{0.5}\right)

where the last inequality uses e1/4>ee+1e^{-1/4}>\frac{e}{e+1}. ∎

In Theorem 15, we require that the population size μ4(1+δ)(3e+1)(n+1)\mu\geq 4(1+\delta)(3e+1)(n+1). One may ask about the behavior when μ=o(n)\mu=o(n). We note in the proof of Lemma 12, the upper bound for the expected number of accumulative temporarily undefeated individuals is (eμn2c+3e+1)n=Ω(n)(\frac{e\mu}{n^{2-c}}+3e+1)n=\Omega(n). If μ=o(n)\mu=o(n), we are not able to ensure that the accumulative temporarily undefeated individuals do not take over the population in our current proof, hence, we require μ=Ω(n)\mu=\Omega(n).

Comparing with (1+1)(1+1) EA, since (1,1n)(1,1^{n}) individual, corresponding to Event II in (1+1)(1+1) EA, has no fitness advantage against the one with previous first bit value as 0, it is easy to be replaced by the offspring with previous first bit value 0 in a population. Thus, this stagnation case cannot take over the whole population to cause the stagnation of (μ+1)(\mu+1) EA. The possible stagnation case that the (0,1)(0,1) pattern individuals take over the population, corresponding to Event I in (1+1)(1+1) EA, will not happen with a high probability because with a sufficient large, Ω(n)\Omega(n), population size as nn the problem size, with a high probability, the (0,0)(0,0) pattern can be maintained until the optimum is reached, that is, the population in (μ+1)(\mu+1) EA increases the tolerance to the incorrect (0,1)(0,1) pattern trial.

IV-C Runtime Analysis of (μ+1)(\mu+1) EA on OneMax(0,1n){}_{(0,1^{n})}

Theorem 15 only shows the probability that (μ+1)(\mu+1) EA can reach the optimum. One further question is about its runtime. Here, we give some comments on the runtime complexity. For the runtime of (μ+1)(\mu+1) EA on the original OneMax function, Witt [28] shows the upper bound of the expected runtime is O(μn+nlogn)O(\mu n+n\log n) based on the current best individuals’ replicas and fitness increasing. Analogously, for (μ+1)(\mu+1) EA on OneMax(0,1n){}_{(0,1^{n})} function, we could consider the expected time when the number of the current front individuals with aa zeros reaches n/an/a, that is, |m0|n/a|m_{0}|\geq n/a, and the expected time when a (0,0)(0,0) pattern offspring with less aa zeros is generated for a>1a>1 or when one (0,1)(0,1) pattern individual with all ones is generated for a=1a=1 conditional on that there are n/an/a current front individuals with aa zeros. From Lemma 12, with probability at least 1exp(δ22(1+δ)(n1))1-\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(n-1)\right), there are at most 12μ1\frac{1}{2}\mu-1 possible accumulative temporarily undefeated individuals before the current front individuals have only 1 zero. Hence, in each generation before the current front individuals have only 1 zero, it always holds that at least half of individuals of the whole population are current front individuals and interior individuals. Hence, we could just discuss the population containing no temporarily undefeated individual and twice the upper bound of the expected time to reach the optimum as that for the true process. The (0,1)(0,1) pattern offspring with aa zeros will not influence the evolving process of the current front individuals we focus on until all interior individuals have aa zeros. Recalling Lemma 13, we know that with probability at least 1exp(120n0.5)1-\exp(-\tfrac{1}{20}n^{0.5}), |m0|15n0.5|m_{0}|\geq\tfrac{1}{5}n^{0.5} if the current phase does not end before all individuals have at most aa zeros. Hence, we just need to focus on the case when na15n0.5\frac{n}{a}\geq\tfrac{1}{5}n^{0.5}, that is, a5n0.5a\leq 5n^{0.5}.

We discuss the expected length of the phase defined in Lemma 13. When |m0|<na|m_{0}|<\frac{n}{a}, we consider the event that one replica of an m0m_{0} individual can enter into the next generation. When the population contains interior individual(s) with more than aa zeros, the probability is |m0|μ(11/n)n|m0|2eμ\frac{|m_{0}|}{\mu(1-1/n)^{n}}\geq\frac{|m_{0}|}{2e\mu}. When all interior individuals have aa zeros, we require |m0||m_{0}| to be less than 2n/a2n/a. Let μ\mu^{\prime} denote the total number of the individuals with aa zeros, and we know the probability of event HH that one replica of an m0m_{0} individual can enter into the next generation is |m0|μ(11n)nμ+1|m0|μ+1{\frac{|m_{0}|}{\mu}\left(1-\frac{1}{n}\right)^{n}\frac{\mu^{\prime}+1-|m_{0}|}{\mu^{\prime}+1}}. Note that the probability of event GG that an m0m_{0} individual generates a (0,1)(0,1) pattern offspring with aa zeros that successfully enters into the next generation is at most |m0|μ1n|m0|μ+1\frac{|m_{0}|}{\mu}\frac{1}{n}\frac{|m_{0}|}{\mu^{\prime}+1}. Then

Pr[H]Pr[G]\displaystyle\frac{\Pr[H]}{\Pr[G]}\geq{} |m0|μ(11n)nμ+1|m0|μ+1|m0|μ1n|m0|μ+112e(μ|m0|1)n\displaystyle\frac{\frac{|m_{0}|}{\mu}\left(1-\frac{1}{n}\right)^{n}\frac{{\mu^{\prime}+1}-|m_{0}|}{{\mu^{\prime}+1}}}{\frac{|m_{0}|}{\mu}\frac{1}{n}\frac{|m_{0}|}{{\mu^{\prime}+1}}}\geq\frac{1}{2e}\left(\frac{\mu^{\prime}}{|m_{0}|}-1\right)n
\displaystyle\geq{} 12e(2(1+δ)(3e+1)(n+1)|m0|1)n\displaystyle\frac{1}{2e}\left(\frac{{2(1+\delta)(3e+1)(n+1)}}{|m_{0}|}-1\right)n
\displaystyle\geq{} (1+δ)(3e+1)a12en32n\displaystyle\frac{{(1+\delta)(3e+1)a-1}}{2e}n\geq{\frac{3}{2}n}

where the antepenultimate inequality uses μ2(1+δ)(3e+1)(n+1)\mu^{\prime}\geq 2(1+\delta)(3e+1)(n+1) and the penultimate inequality uses |m0|2na|m_{0}|\leq\frac{2n}{a}. Hence, Pr[GHG]23n+2\Pr[G\mid H\cup G]\leq\frac{2}{3n+2}. Considering the process merely consisting of HH and GG, let ZZ be the number that HH happens before GG happens 15n0.5\tfrac{1}{5}n^{0.5} times. Then E[Z](3n+221)15n0.5=310n1.5E[Z]\geq\left(\frac{3n+2}{2}-1\right)\tfrac{1}{5}n^{0.5}=\frac{3}{10}n^{1.5}. It is not difficult to see that Z+15n0.5Z+\tfrac{1}{5}n^{0.5} stochastically dominates the sum of 15n0.5\tfrac{1}{5}n^{0.5} geometric variables with success probability 23n+2\frac{2}{3n+2}. Hence, with the Chernoff bound for the sum of geometric variables in Lemma 1(c), we have that

Pr[\displaystyle\Pr\big{[} Z+15n0.52n+25n0.5]\displaystyle Z+\tfrac{1}{5}n^{0.5}\leq 2n+\tfrac{2}{5}n^{0.5}\big{]}
\displaystyle\leq{} exp((1203n0.543n1)215n0.5243(1203n0.543n1))\displaystyle\exp\left(-\frac{\left(1-\tfrac{20}{3}n^{-0.5}-\tfrac{4}{3}n^{-1}\right)^{2}\tfrac{1}{5}n^{0.5}}{2-\tfrac{4}{3}\left(1-\tfrac{20}{3}n^{-0.5}-\tfrac{4}{3}n^{-1}\right)}\right)
\displaystyle\leq{} exp((1220e+13en0.5)15n0.5243(120e+13en0.5))\displaystyle\exp\left(-\frac{\left(1-2\cdot\tfrac{20e+1}{3e}n^{-0.5}\right)\tfrac{1}{5}n^{0.5}}{2-\tfrac{4}{3}\left(1-\tfrac{20e+1}{3e}n^{-0.5}\right)}\right)
\displaystyle\leq{} exp((31+20e+13en0.532)15n0.5)\displaystyle\exp\left(-\left(\frac{3}{1+\tfrac{20e+1}{3e}n^{-0.5}}-\frac{3}{2}\right)\frac{1}{5}n^{0.5}\right)
\displaystyle\leq{} exp((31+20e+13e4e32)15n0.5)exp(120n0.5)\displaystyle\exp\left(-\left(\frac{3}{1+\tfrac{20e+1}{3e\cdot 4e}}-\frac{3}{2}\right)\frac{1}{5}n^{0.5}\right)\leq\exp\left(-\frac{1}{20}n^{0.5}\right)

where the second inequality uses the fact 43n434en0.5=13en0.5\frac{4}{3n}\leq\frac{4}{3\cdot 4en^{0.5}}=\frac{1}{3en^{0.5}} for n>(4e)2n>(4e)^{2} and the fact (1x)212x(1-x)^{2}\geq 1-2x for xx\in\mathbb{R}, and the penultimate inequality uses n>(4e)2n>(4e)^{2}. Since 2n+15n0.52na+15n0.52n+\tfrac{1}{5}n^{0.5}\geq\frac{2n}{a}+\tfrac{1}{5}n^{0.5}, we know with probability at least 1exp(120n0.5){1-\exp(-\frac{1}{20}n^{0.5})}, |m0||m_{0}| could go above 2na\frac{2n}{a}.

When |m0||m_{0}| goes above 2n/a2n/a, we consider the event FF that the current phase ends. Recalling the proof in Lemma 14, we know that with probability at least 1(ee+1)15n0.5{1-(\frac{e}{e+1})^{\frac{1}{5}n^{0.5}}}, FF happens once before GG happens na15n0.5\frac{n}{a}\geq\frac{1}{5}n^{0.5} times, thus, before |m0||m_{0}| goes below na\frac{n}{a}.

In summary, in each phase, with probability at least (1exp(120n0.5))(1exp(120n0.5))(1(ee+1)15n0.5)13exp(120n0.5)(1-\exp(-\tfrac{1}{20}n^{0.5}))(1-\exp(-\frac{1}{20}n^{0.5}))(1-(\frac{e}{e+1})^{\frac{1}{5}n^{0.5}})\geq 1-3\exp(-\tfrac{1}{20}n^{0.5}), the current front individuals could increase its number to more than 2na\frac{2n}{a}, and will remain above n/an/a afterwards. Hence, together with the runtime analysis of the original (μ+1)(\mu+1) EA on OneMax in [28], we have the runtime result for (μ+1)(\mu+1) EA on OneMax(0,1n){}_{(0,1^{n})} in the following theorem, and know that comparing with OneMax function, the cost majorly lies on the o(1)o(1) success probability for (μ+1)(\mu+1) EA solving the time-linkage OneMax(0,1n){}_{(0,1^{n})}, and the asymptotic complexity remains the same for the case when (μ+1)(\mu+1) EA is able to find the optimum.

Theorem 16.

Given any δ>0\delta>0. Let n>(4(1+δ)e)2n>(4(1+\delta)e)^{2}. Consider using (μ+1)(\mu+1) EA with population size μ4(1+δ)(3e+1)(n+1){\mu\geq 4(1+\delta)(3e+1)(n+1)} to solve the OneMax(0,1n){}_{(0,1^{n})} function. Consider the same phase in Lemma 13. Let MM denote the event that

  • the first generation has at most 34μ\frac{3}{4}\mu individuals with the (1,0)(1,0) or (1,1)(1,1) first bit pattern, at least 14μ\frac{1}{4}\mu individuals with the (0,0)(0,0) pattern, and has all individuals with less than 34n\frac{3}{4}n zeros;

  • there are at most 12μ1\frac{1}{2}\mu-1 accumulative temporarily undefeated individuals before the current front individuals only have 1 zero;

  • the number of the current front individual with aa zeros can accumulate to 2na\frac{2n}{a} and stay above na\frac{n}{a} if the current phase does not end.

Then event MM occurring with probability at least 1(μ+2)exp(18n)exp(δ22(1+δ)(n1))3nexp(120n0.5)1-(\mu+2)\exp(-\tfrac{1}{8}n)-\exp\left(-\frac{\delta^{2}}{2(1+\delta)}(n-1)\right)-3n\exp(-\tfrac{1}{20}n^{0.5}), and conditional on MM, the expected runtime is O(μn)O(\mu n).

Recalling that the expected runtime of (μ+1)(\mu+1) EA on OneMax is O(μn+nlogn)O(\mu n+n\log n) [28], which is O(nlogn)O(n\log n) for μ=O(logn)\mu=O(\log n). Since μ=Ω(n)\mu=\Omega(n) is required for the convergence on OneMax(0,1n){}_{(0,1^{n})} in Section IV-B, Theorem 16 shows the expected runtime for OneMax(0,1n){}_{(0,1^{n})} is O(n2)O(n^{2}) if we choose μ=Θ(n)\mu=\Theta(n), which is the same complexity as for OneMax with μ=Θ(n)\mu=\Theta(n). To this degree, comparing (μ+1)(\mu+1) EA solving the time-linkage OneMax(0,1n){}_{(0,1^{n})} with the original OneMax function, we may say the cost majorly lies on o(1)o(1) convergence probability, not the asymptotic complexity.

V Conclusion and Future Work

In recent decades, rigorous theoretical analyses on EAs has progressed significantly. However, despite that many real-world applications have the time-linkage property, that is, the objective function relies on more than one time-step solutions, the theoretical analysis on the fitness function with time-linkage property remains an open problem.

This paper took the first step into this open area. We designed the time-linkage problem OneMax(0,1n){}_{(0,1^{n})}, which considers an opposite preference of the first bit value of the previous time step into the basic OneMax function. Via this problem, we showed that EAs with a population can prevent some stagnation in some deceptive situations caused by the time-linkage property. More specifically, we proved that the simple RLS and (1+1)(1+1) EA cannot reach the optimum of OneMax(0,1n){}_{(0,1^{n})} with 1o(1)1-o(1) probability but (μ+1)(\mu+1) EA can find the optimum with 1o(1)1-o(1) probability.

The time-linkage OneMax(0,1n){}_{(0,1^{n})} problem is simple. Only the immediate previous generation and the first bit value of the historical solutions matter for the fitness function. Our future work should consider more complicated algorithms, e.g., with crossover, on more general time-linkage pseudo-Boolean functions, e.g., with more than one bit value and other weight values for the historical solutions, and problems with practical backgrounds.

Acknowledgement

We thank Liyao Gao for his participation and discussion on the (1+1)(1+1) EA part during his summer intern.

References

  • [1] F. Neumann and C. Witt, Bioinspired Computation in Combinatorial Optimization – Algorithms and Their Computational Complexity.   Springer, Berlin, Heidelberg, 2010.
  • [2] A. Auger and B. Doerr, Theory of Randomized Search Heuristics: Foundations and Recent Developments.   World Scientific, 2011.
  • [3] T. Jansen, Analyzing Evolutionary Algorithms: The Computer Science Perspective.   Springer Science & Business Media, 2013.
  • [4] Z.-H. Zhou, Y. Yu, and C. Qian, Evolutionary Learning: Advances in Theories and Algorithms.   Springer, 2019.
  • [5] B. Doerr and F. Neumann, Theory of Evolutionary Computation: Recent Developments in Discrete Optimization.   Springer Nature, 2020.
  • [6] P. A. N. Bosman, “Learning, anticipation and time-deception in evolutionary online dynamic optimization,” in Genetic and Evolutionary Computation Conference, GECCO 2005, Workshop Proceedings.   ACM, 2005, pp. 39–47.
  • [7] T. T. Nguyen, “Continuous dynamic optimisation using evolutionary algorithms,” Ph.D. dissertation, University of Birmingham, 2011.
  • [8] T. Morimoto, Y. Ouchi, M. Shimizu, and M. Baloch, “Dynamic optimization of watering satsuma mandarin using neural networks and genetic algorithms,” Agricultural water management, vol. 93, no. 1-2, pp. 1–10, 2007.
  • [9] S. Droste, “Analysis of the (1+1) EA for a dynamically changing onemax-variant,” in Congress on Evolutionary Computation, CEC 2002, vol. 1.   IEEE, 2002, pp. 55–60.
  • [10] P. Rohlfshagen, P. K. Lehre, and X. Yao, “Dynamic evolutionary optimisation: an analysis of frequency and magnitude of change,” in Genetic and Evolutionary Computation Conference, GECCO 2009.   ACM, 2009, pp. 1713–1720.
  • [11] T. Kötzing and H. Molter, “ACO beats EA on a dynamic pseudo-boolean function,” in International Conference on Parallel Problem Solving from Nature, PPSN 2012.   Springer, 2012, pp. 113–122.
  • [12] T. Jansen and C. Zarges, “Analysis of randomised search heuristics for dynamic optimisation,” Evolutionary Computation, vol. 23, no. 4, pp. 513–541, 2015.
  • [13] J. Lengler and U. Schaller, “The (1+1)-EA on noisy linear functions with random positive weights,” in IEEE Symposium Series on Computational Intelligence, SSCI 2018.   IEEE, 2018, pp. 712–719.
  • [14] J. Lengler and J. Meier, “Large population sizes and crossover help in dynamic environments,” arXiv preprint arXiv:2004.09949, 2020.
  • [15] A. Lissovoi and C. Witt, “Runtime analysis of ant colony optimization on dynamic shortest path problems,” Theoretical Computer Science, vol. 561, pp. 73–85, 2015.
  • [16] F. Neumann and C. Witt, “On the runtime of randomized local search and simple evolutionary algorithms for dynamic makespan scheduling,” in International Joint Conference on Artificial Intelligence, IJCAI 2015.   AAAI Press, 2015, pp. 3742–3748.
  • [17] M. Pourhassan, W. Gao, and F. Neumann, “Maintaining 2-approximations for the dynamic vertex cover problem using evolutionary algorithms,” in Genetic and Evolutionary Computation Conference, GECCO 2015.   ACM, 2015, pp. 903–910.
  • [18] V. Roostapour, A. Neumann, F. Neumann, and T. Friedrich, “Pareto optimization for subset selection with dynamic cost constraints,” in AAAI Conference on Artificial Intelligence, AAAI 2019, 2019, pp. 2354–2361.
  • [19] J. Bossek, F. Neumann, P. Peng, and D. Sudholt, “Runtime analysis of randomized search heuristics for dynamic graph coloring,” in Genetic and Evolutionary Computation Conference, GECCO 2019.   ACM, 2019, pp. 1443–1451.
  • [20] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed.   MIT Press, 2018.
  • [21] B. Doerr, “Analyzing randomized search heuristics: Tools from probability theory,” in Theory of Randomized Search Heuristics: Foundations and Recent Developments.   World Scientific, 2011, pp. 1–20.
  • [22] ——, “Probabilistic tools for the analysis of randomized optimization heuristics,” in Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, B. Doerr and F. Neumann, Eds.   Springer, 2020, pp. 1–87.
  • [23] T. Chen, K. Tang, G. Chen, and X. Yao, “A large population size can be unhelpful in evolutionary algorithms,” Theoretical Computer Science, vol. 436, pp. 54–70, 2012.
  • [24] J. He and X. Yao, “From an individual to a population: An analysis of the first hitting time of population-based evolutionary algorithms,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 5, pp. 495–511, 2002.
  • [25] D.-C. Dang, T. Jansen, and P. K. Lehre, “Populations can be essential in tracking dynamic optima,” Algorithmica, vol. 78, no. 2, pp. 660–680, 2017.
  • [26] D. Corus and P. S. Oliveto, “On the benefits of populations for the exploitation speed of standard steady-state genetic algorithms,” in Genetic and Evolutionary Computation Conference, GECCO 2019.   ACM, 2019, pp. 1452–1460.
  • [27] D. Sudholt, “The benefits of population diversity in evolutionary algorithms: A survey of rigorous runtime analyses,” in Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, B. Doerr and F. Neumann, Eds.   Springer, 2020, pp. 359–404.
  • [28] C. Witt, “Runtime analysis of the (μ\mu+1) EA on simple pseudo-boolean functions,” Evolutionary Computation, vol. 14, no. 1, pp. 65–86, 2006.
  • [29] B. Doerr and W. Zheng, “Working principles of binary differential evolution,” Theoretical Computer Science, vol. 801, pp. 110–142, 2020.
[Uncaptioned image] Weijie Zheng received Bachelor Degree (July 2013) in Mathematics and Applied Mathematics from Harbin Institute of Technology, Harbin, Heilongjiang, China, and Doctoral Degree (October 2018) in Computer Science and Technology from Tsinghua University, Beijing, China. He is now a postdoctoral researcher in the Department of Computer Science and Engineering at Southern University of Science and Technology, Shenzhen, Guangdong, China, and in the School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China. His current research majorly focuses on the theoretical analysis and design of evolutionary algorithms, especially binary differential evolution and estimation-of-distribution algorithms.
[Uncaptioned image] Huanhuan Chen received the B.Sc. degree from the University of Science and Technology of China (USTC), Hefei, China, in 2004, and the Ph.D. degree in computer science from the University of Birmingham, Birmingham, U.K., in 2008. He is currently a Full Professor with the School of Computer Science and Technology, USTC. His current research interests include neural networks, Bayesian inference, and evolutionary computation. Dr. Chen was a recipient of the 2015 International Neural Network Society Young Investigator Award, the 2012 IEEE Computational Intelligence Society Outstanding Ph.D. Dissertation Award, the IEEE TRANSACTIONS ON NEURAL NETWORKS Outstanding Paper Award, and the 2009 British Computer Society Distinguished Dissertations Award. He is an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS and the IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE.
[Uncaptioned image] Xin Yao obtained his Ph.D. in 1990 from the University of Science and Technology of China (USTC), MSc in 1985 from North China Institute of Computing Technologies and BSc in 1982 from USTC. He is a Chair Professor of Computer Science at the Southern University of Science and Technology, Shenzhen, China, and a part-time Professor of Computer Science at the University of Birmingham, UK. He is an IEEE Fellow and was a Distinguished Lecturer of the IEEE Computational Intelligence Society (CIS). He was the President (2014-15) of IEEE CIS and the Editor-in-Chief (2003-08) of IEEE Transactions on Evolutionary Computation. His major research interests include evolutionary computation, ensemble learning, and their applications to software engineering. His work won the 2001 IEEE Donald G. Fink Prize Paper Award; 2010, 2016 and 2017 IEEE Transactions on Evolutionary Computation Outstanding Paper Awards; 2011 IEEE Transactions on Neural Networks Outstanding Paper Award; and many other best paper awards at conferences. He received a prestigious Royal Society Wolfson Research Merit Award in 2012, the IEEE CIS Evolutionary Computation Pioneer Award in 2013 and the 2020 IEEE Frank Rosenblatt Award.