
Runtime Analysis of the (1+1) EA on Weighted Sums of Transformed Linear Functions

Frank Neumann
Optimisation and Logistics
School of Computer Science
The University of Adelaide
Adelaide, Australia

Carsten Witt
Algorithms, Logic and Graphs
DTU Compute
Technical University of Denmark
2800 Kgs. Lyngby Denmark
Abstract

Linear functions play a key role in the runtime analysis of evolutionary algorithms, and studies have provided a wide range of new insights and techniques for analyzing evolutionary computation methods. Motivated by studies on separable functions and the optimization behaviour of evolutionary algorithms as well as objective functions from the area of chance constrained optimization, we study the class of objective functions that are weighted sums of two transformed linear functions. Our results show that the (1+1) EA, with a mutation rate depending on the number of overlapping bits of the functions, obtains an optimal solution for these functions in expected time $O(n\log n)$, thereby generalizing a well-known result for linear functions to a much wider range of problems.

1 Introduction

Runtime analysis is one of the major theoretical tools to provide rigorous insights into the working behavior of evolutionary algorithms and other randomized search heuristics [1, 2, 3]. The class of pseudo-Boolean linear functions plays a key role in the area of runtime analysis. Starting with the simplest linear function, called OneMax, for which the first runtime analysis was carried out, a wide range of results have been obtained for the general class of linear functions. This includes the study of Droste, Jansen and Wegener [4], who were the first to obtain an upper bound of $O(n\log n)$ for the (1+1) EA on the general class of pseudo-Boolean linear functions. This groundbreaking result was based on a very lengthy proof, and subsequently a wide range of improvements have been made, both through the development of new techniques for the analysis and in the precision of the results. The proof was simplified significantly by He and Yao [6] using the analytic framework of drift analysis [5]. Jägersküpper [7, 8] provided the first analysis of the leading coefficient in the bound $O(n\log n)$ on the optimisation time for the problem. Furthermore, advances in simplifying proofs and obtaining precise results have been made using the framework of multiplicative drift [9]. Doerr, Johannsen and Winzen improved the upper bound to $(1.39+o(1))en\ln n$ [10]. Finally, Witt [11] improved this bound to $en\ln n+O(n)$ by using adaptive drift analysis [12, 13]. We expand such investigations for the (1+1) EA to a wider class of problems that are modelled by two transformed linear functions. This includes classes of separable functions and chance constrained optimization problems.

1.1 Separable Functions

As an example, consider the separable objective function

$$f(x)=\left(\sum_{i=1}^{n/2}w_{i}x_{i}\right)^{2}+\sqrt{\sum_{i=n/2+1}^{n}w_{i}x_{i}} \qquad (1)$$

where $w_{i}\in\mathbb{Z}^{+}$, $1\leq i\leq n$, and $x=(x_{1},\dots,x_{n})\in\{0,1\}^{n}$. The function $f$ consists of two objective functions

$$f_{1}(x_{1},\ldots,x_{n/2})=\left(\sum_{i=1}^{n/2}w_{i}x_{i}\right)^{2}\quad\text{and}\quad f_{2}(x_{n/2+1},\ldots,x_{n})=\sqrt{\sum_{i=n/2+1}^{n}w_{i}x_{i}}.$$

Here $f_{1}$ is the square of a linear function in the first half of the variables and $f_{2}$ is the square root of a linear function in the remaining variables. Some investigations on how evolutionary algorithms optimize separable fitness functions have been carried out in [14]. It has been shown that if the different subfunctions only have a small range, then the (1+1) EA optimizes separable functions efficiently if the subfunctions themselves are easy to optimize. However, in our example above the two subfunctions may take on exponentially many values, but both functions on their own are optimized by the (1+1) EA in time $O(n\log n)$ by the results for the (1+1) EA on linear functions. This holds because the transformation applying the square in $f_{1}$ or the square root in $f_{2}$ does not change the behavior of the (1+1) EA. The question arises whether the $O(n\log n)$ bound also holds for the function $f$ which combines $f_{1}$ and $f_{2}$. We investigate this setting of separable functions for the more general case where the objective function is given as a weighted sum of two transformed linear functions. For technical reasons, we consider a (1+1) EA with a potentially reduced mutation probability depending on the number of overlapping bits of the two functions.
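To make the example concrete, here is a minimal Python sketch (our own illustration with arbitrary example weights, not taken from the analysis) that evaluates $f$ from Equation (1):

```python
import random

def f(x, w):
    """Separable objective from Equation (1): the square of a linear
    function on the first half of the bits plus the square root of a
    linear function on the second half. Assumes n = len(x) is even."""
    n = len(x)
    f1 = sum(w[i] * x[i] for i in range(n // 2)) ** 2
    f2 = sum(w[i] * x[i] for i in range(n // 2, n)) ** 0.5
    return f1 + f2

w = [3, 1, 4, 1, 5, 9, 2, 6]                  # arbitrary positive weights
x = [random.randint(0, 1) for _ in range(8)]  # a random search point
print(f(x, w))
```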

1.2 Chance Constrained Problems

Another motivation for our work comes from the area of chance constrained optimization [15] and considers the case where the two functions overlap or are even defined on the same set of variables. Recently, evolutionary algorithms have been used for chance constrained problems, which motivates our investigations. In a chance constrained setting, the input involves stochastic components, and the goal is to optimize a given objective function under the condition that constraints are met with high probability or that function values are guaranteed with high probability. Evolutionary algorithms have been designed for the chance constrained knapsack problem [16, 17, 18], chance constrained stockpile blending problems [19], and chance constrained submodular functions [20].

Runtime analysis results have been obtained for restricted settings of the knapsack problem [21, 22] where the weights are stochastic and the constraint bound has to be met with high probability. The analyses for the case of stochastic constraints on the class of submodular functions [23] and on the knapsack problem [16] already reveal constraint functions that are a linear combination of the expected weight and the standard deviation of a solution when Chebyshev's inequality is used for constraint evaluation. Such functions are the subject of our investigations.

To make the type of problems that we are interested in clear, we state the following problem. Given a set of $m$ items $E=\{e_{1},\ldots,e_{m}\}$ with random weights $w_{i}$, $1\leq i\leq m$, we assume that the weights are independent and that each $w_{i}$ is distributed according to a normal distribution $N(\mu_{i},\sigma_{i}^{2})$ with $\mu_{i}\geq 0$ and $\sigma_{i}\geq 0$, $1\leq i\leq m$. Our goal is to

$$\min W \quad \text{subject to} \quad \operatorname{Pr}(w(x)\leq W)\geq\alpha \qquad (2)$$

where $w(x)=\sum_{i=1}^{m}w_{i}x_{i}$, $x\in\{0,1\}^{m}$, and $\alpha\in(0,1)$. The problem given in Equation (2) is usually considered under additional constraints, e.g. spanning tree constraints in [24], which we do not consider in this paper.

According to [24], the problem given in Equation (2) is equivalent to minimizing the fitness function

$$g(x)=\sum_{i=1}^{m}\mu_{i}x_{i}+K_{\alpha}\left(\sum_{i=1}^{m}\sigma_{i}^{2}x_{i}\right)^{1/2} \qquad (3)$$

where $K_{\alpha}$ is the $\alpha$-fractile point of the standard normal distribution.

The fitness function $g$ is a linear combination of the expected value of a solution, which is a linear function, and the square root of its variance, where the variance is again a linear function. In order to understand the behaviour of evolutionary algorithms on fitness functions obtained from chance constrained optimization problems, our runtime analysis for the (1+1) EA covers such fitness functions if we assume the reduced mutation probability mentioned above.
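For concreteness, the following sketch (ours; variable names are illustrative) evaluates $g$ from Equation (3), obtaining $K_{\alpha}$ from the quantile function of the standard normal distribution:

```python
from statistics import NormalDist

def g(x, mu, sigma2, alpha):
    """Fitness function from Equation (3): expected weight of the chosen
    items plus K_alpha times the standard deviation of their total weight."""
    k_alpha = NormalDist().inv_cdf(alpha)  # alpha-fractile of N(0, 1)
    expected = sum(m * xi for m, xi in zip(mu, x))
    variance = sum(s2 * xi for s2, xi in zip(sigma2, x))
    return expected + k_alpha * variance ** 0.5

# Example: three items, guarantee level alpha = 0.95.
print(g([1, 0, 1], mu=[2.0, 1.0, 3.0], sigma2=[0.5, 0.2, 1.0], alpha=0.95))
```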

1.3 Transformed Linear Functions

In our investigations, we consider the much wider class of problems where a given fitness function is obtained as a linear combination of two transformed linear functions. The transformations applied to the linear functions only have to be monotonically increasing in the function values of the linear functions. This includes the settings of separable functions and chance constrained problems described previously. Furthermore, we do not require that the two linear functions are defined on the same number of bits.

The main result of our paper is an $O(n\log n)$ upper bound for the (1+1) EA with mutation probability $1/(n+s)$ on the class of sums of two transformed linear functions, where $s$ is the number of bits on which the two linear functions overlap. This directly transfers to the separable problem type given in Equation (1) with standard bit mutation probability $1/n$ and to the chance constraint formulation given in Equation (3) when using mutation probability $1/(2n)$.

The outline of the paper is as follows. In Section 2, we formally introduce the problem formulation for which we analyze the (1+1) EA. We discuss the exclusion of negative weights in our setup in Section 3 and present the $O(n\log n)$ bound in Section 4. Finally, we finish with some discussion and conclusions.

2 Preliminaries

The (1+1) EA shown in Algorithm 1 (generalized with a parameter $s$ discussed below; classically $s=0$ is assumed) is a simple evolutionary algorithm using independent bit flips and elitist selection. It is very well studied in the theory of evolutionary computation [25] and serves as a stepping stone towards the analysis of more complicated evolutionary algorithms. As is common in the area of runtime analysis, we measure the runtime of the (1+1) EA by the number of iterations of its repeat loop. The optimization time refers to the number of fitness evaluations until an optimal solution has been obtained for the first time, and the expected optimization time refers to the expectation of this value.

2.1 Sums of Two Transformed Linear Functions without Constraints

We will study the (1+1) EA on the scenarios given in (1) and (3), assuming no additional constraints. In fact, we will generalize the scenario to the sum of two transformed pseudo-Boolean linear functions which may be (partially) overlapping. Note that in (1) there is no overlap between the domains of the two linear functions and the transformations are the square and the square root, whereas in (3) there is complete overlap of the domains and the transformations are the identity function and the square root.

The crucial observation in our analysis is that the scenario considered here extends the linear function problem [11] that is heavily investigated in the theory of evolutionary algorithms. Despite the simple structure of the problem, there is no clear fitness-distance correlation in the linear function problem, which makes the analysis of the global search operator of the (1+1) EA difficult. If only local mutations are used, leading to the well-known randomized local search (RLS) algorithm [26], then both the linear function problem and the generalized scenario considered here are very easy to analyze using standard coupon collector arguments [27], leading to $O(n\log n)$ expected optimization time. For the globally searching (1+1) EA, we will obtain the same bound, proving that the problem is easy for it to solve; however, we need advanced drift analysis methods to prove this.

We note that the class of functions we consider falls within the more general class of so-called monotone functions. Such functions can be difficult to optimize for a (1+1) EA using mutation probabilities larger than $1/n$ [28]; however, it is also known that the standard (1+1) EA with mutation probability $1/n$ as considered here optimizes all monotone functions in expected time $O(n\log^{2}n)$ [29]. Our bound is better by an asymptotic factor of $\log n$ if $s=o(n)$. However, it should be noted that for $s=\Omega(n)$, the bound $O(n\log n)$ already follows directly from [28] since our setting then corresponds to a mutation probability of $c/n$ for a constant $c<1$. In fact, the fitness function $g(x)$ arising from the chance-constrained scenario presented in (3) above falls into the case $s=n/2$.

1. Choose $x\in\{0,1\}^{n-s}$ uniformly at random;
2. repeat
3.     Create $y$ by flipping each bit $x_{i}$ of $x$ with probability $p=\frac{1}{n}$;
4.     if $f(y)\leq f(x)$ then $x\leftarrow y$;
5. until stop;

Algorithm 1: (1+1) EA for minimization of a pseudo-Boolean function $f\colon\{0,1\}^{n-s}\to\mathbb{R}$, where $s\in\{0,\dots,n/2\}$
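A direct transcription of Algorithm 1 into Python might look as follows (a sketch for illustration only; the stopping criterion is replaced by a fixed iteration budget, which is our own simplification):

```python
import random

def one_plus_one_ea(f, dim, max_iters=100_000):
    """(1+1) EA minimizing a pseudo-Boolean function f on {0,1}^dim
    with mutation probability p = 1/dim, matching p = 1/n above."""
    p = 1.0 / dim
    x = [random.randint(0, 1) for _ in range(dim)]
    for _ in range(max_iters):
        y = [1 - b if random.random() < p else b for b in x]
        if f(y) <= f(x):  # elitist selection; ties are accepted
            x = y
    return x

# Example: minimizing the number of one-bits drives x to the all-zeros string.
print(sum(one_plus_one_ea(lambda y: sum(y), dim=20)))
```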

Set-up.

We will investigate a general optimization scenario involving two linear pseudo-Boolean functions in an unconstrained search space. The objective function is an arbitrarily weighted sum of monotone transformations of two linear functions defined on (possibly overlapping) subspaces of $\{0,1\}^{n-s}$ for some $s\geq 0$, where $s$ denotes the number of shared bits. Note that the introduction of this paper mentions a search space of dimension $n$ and a mutation probability of $p=1/(n+s)$ for the (1+1) EA. While the former perspective is more natural to present, from now on we consider the asymptotically equivalent setting of search space dimension $n-s$ and mutation probability $p=1/n$, which eases notation in the upcoming calculations.

Let $\alpha$ be a constant such that $1/2\leq\alpha\leq\ln(2-\epsilon)\approx 0.693-\epsilon/2$ for some constant $\epsilon>0$, and assume that $\alpha n$ is an integer. We allow the subfunctions to depend on a number of bits in $[(1-\alpha)n,\alpha n]$, including the balanced case that both subfunctions depend on exactly $n/2$ bits. Formally, we have

  • linear functions $\ell_{1}\colon\{0,1\}^{\alpha n}\to\mathbb{R}$ and $\ell_{2}\colon\{0,1\}^{(1-\alpha)n}\to\mathbb{R}$, where $\ell_{1}(y_{1},\dots,y_{\alpha n})=\sum_{i=1}^{\alpha n}w^{(1)}_{i}y_{i}$ and, similarly, $\ell_{2}(z_{1},\dots,z_{(1-\alpha)n})=\sum_{i=1}^{(1-\alpha)n}w^{(2)}_{i}z_{i}$, with non-negative weights $w^{(1)}_{i}$ and $w^{(2)}_{i}$;

  • $B_{1}\subseteq\{1,\dots,n-s\}$ and $B_{2}\subseteq\{1,\dots,n-s\}$, denoting the bit positions that $\ell_{1}$ resp. $\ell_{2}$ are defined on in the actual objective function $f\colon\{0,1\}^{n-s}\to\mathbb{R}$;

  • the overlap count $s\coloneqq\lvert B_{1}\cap B_{2}\rvert$, where $s\leq\min\{(1-\alpha)n,\alpha n\}=(1-\alpha)n\leq n/2$;

  • the linear functions with extended domain, $\ell^{*}_{1}(x_{1},\dots,x_{n-s})=\sum_{i\in B_{1}}w^{(1)}_{r^{(1)}(i)}x_{i}$, where $r^{(1)}(i)$ is the rank of $i$ in $B_{1}$ (with the smallest number receiving rank $1$), and analogously $\ell^{*}_{2}(x_{1},\dots,x_{n-s})=\sum_{i\in B_{2}}w^{(2)}_{r^{(2)}(i)}x_{i}$; note that $\ell_{1}^{*}$ and $\ell_{2}^{*}$ only depend essentially on $\alpha n$ and $(1-\alpha)n$ bits, respectively;

  • monotone increasing functions $h_{1}\colon\mathbb{R}\to\mathbb{R}$ and $h_{2}\colon\mathbb{R}\to\mathbb{R}$.

Then the objective function $f\colon\{0,1\}^{n-s}\to\mathbb{R}$, which w.l.o.g. is to be minimized, is given by

$$f(x_{1},\dots,x_{n-s})=h_{1}(\ell^{*}_{1}(x_{1},\dots,x_{n-s}))+h_{2}(\ell^{*}_{2}(x_{1},\dots,x_{n-s})).$$

For $s=0$, $h_{1}$ being the square function, and $h_{2}$ being the square root function, this matches the setting of separable functions given in Equation (1). This set-up also includes the case that

$$f(x_{1},\dots,x_{m})=\ell_{1}(x_{1},\dots,x_{m})+R\sqrt{\ell_{2}(x_{1},\dots,x_{m})}$$

for two $m$-dimensional, completely overlapping linear functions $\ell_{1}$ and $\ell_{2}$ and an arbitrary factor $R\geq 0$, as motivated and given in Equation (3). Note that this matches our set-up with $n=2m$ and $s=m=n/2$.
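The following sketch (our own construction, purely illustrative) shows how an objective of this form can be assembled from two weight vectors, two possibly overlapping position sets $B_{1}$ and $B_{2}$, and two monotone transformations:

```python
def make_objective(w1, B1, h1, w2, B2, h2):
    """Builds f(x) = h1(l1*(x)) + h2(l2*(x)) as in the set-up: each linear
    function reads the bits at its (0-indexed, sorted) positions B1 resp. B2,
    and the rank of a position within its set selects the weight."""
    B1, B2 = sorted(B1), sorted(B2)
    def f(x):
        l1 = sum(w1[r] * x[i] for r, i in enumerate(B1))
        l2 = sum(w2[r] * x[i] for r, i in enumerate(B2))
        return h1(l1) + h2(l2)
    return f

# Overlap s = 2 (positions 2 and 3 are shared), so f is defined on 6 bits.
f = make_objective(w1=[2, 5, 7, 7], B1=[0, 1, 2, 3], h1=lambda v: v ** 2,
                   w2=[1, 3, 3, 8], B2=[2, 3, 4, 5], h2=lambda v: v ** 0.5)
print(f([1, 0, 1, 1, 0, 1]))
```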

For our analysis, we will make use of the multiplicative drift theorem (Theorem 1), which was introduced in [30] and enhanced with tail bounds in [13]. We use a slightly generalised presentation that can be found in [31].

Theorem 1 (Multiplicative Drift, cf. [30, 13, 31]).

Let $(X_{t})_{t\geq 0}$ be a stochastic process, adapted to a filtration $\mathcal{F}_{t}$, over some state space $S\subseteq\{0\}\cup[s_{\mathrm{min}},s_{\mathrm{max}}]$, where $0\in S$ and $s_{\mathrm{min}}>0$. Suppose that there exists a $\delta>0$ such that for all $t\geq 0$

$$\mathrm{E}(X_{t}-X_{t+1}\mid\mathcal{F}_{t})\geq\delta X_{t}.$$

Then it holds for the first hitting time $T:=\min\{t\mid X_{t}=0\}$ that

$$\mathrm{E}(T\mid\mathcal{F}_{0})\leq\frac{\ln(X_{0}/s_{\mathrm{min}})+1}{\delta}.$$

Moreover, $\operatorname{Pr}(T>(\ln(X_{0}/s_{\mathrm{min}})+r)/\delta)\leq e^{-r}$ for any $r>0$.
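As a quick empirical illustration of Theorem 1 (our own sketch, not part of the formal development), consider a toy process that decreases by one with probability $X_{t}/n$, so that the drift condition holds with $\delta=1/n$ and $s_{\mathrm{min}}=1$:

```python
import math, random

def hitting_time(n):
    """Toy process on {0, ..., n}: from state x > 0, move to x - 1 with
    probability x/n, else stay, so E(X_t - X_{t+1} | X_t) = X_t / n."""
    x, t = n, 0
    while x > 0:
        if random.random() < x / n:
            x -= 1
        t += 1
    return t

n, trials = 100, 1000
observed = sum(hitting_time(n) for _ in range(trials)) / trials
print("observed mean:", observed)                 # about n * H_n, roughly 519
print("Theorem 1 bound:", (math.log(n) + 1) * n)  # (ln(X_0 / 1) + 1) / delta
```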

3 Negative Weights Allow for Multimodal Functions

We will now justify that the inclusion of negative weights in the underlying linear functions, along with overlapping domains, can lead to multimodal problems that can no longer be optimized in expected time $O(n\log n)$. In the following example, the two linear functions depend essentially on all $n$ bits.

Let

$$f(x_{1},\dots,x_{n})=\underbrace{\left(\frac{x_{1}}{2}+\sum_{i=2}^{n}x_{i}\right)}_{h_{1}(\ell_{1}(x))}+\underbrace{\left(\frac{\sum_{i=1}^{n}(1-x_{i})}{n-0.5}\right)^{n^{2}}}_{h_{2}(\ell_{2}(x))}$$

Basically, the first linear function $\ell_{1}(x)=x_{1}/2+\sum_{i=2}^{n}x_{i}$ is a OneMax function except for the first bit, which has a smaller weight than the rest. The second linear function $\ell_{2}(x)$ is linear in the number of zeros, i.e., it corresponds to the ZeroMax function, which is equivalent to OneMax for the (1+1) EA for symmetry reasons. The transformation $h_{2}$ that is applied to ZeroMax divides the number of zero-bits by $n-0.5$ and raises the result to a large power. Essentially, the value of $h_{2}(z)$ is $e^{\Theta(n)}$ if $z=n$ and $e^{-\Theta(n)}$ otherwise. This puts a constraint on the number of zero-bits. If $\lvert x\rvert_{1}\geq 1$, then $f$ is monotone increasing in $\ell_{1}$, i.e., search points decreasing the $\ell_{1}$-value also decrease the $f$-value. However, the all-zeros string has the largest $f$-value, i.e., it is the worst search point.

We can now see that all search points having exactly one one-bit at one of the positions $2,\dots,n$ are local optima. To create the global optimum from such a point, two bits have to flip simultaneously, leading to $\Omega(n^{2})$ expected time to reach the optimum from such a point. The situation is similar to the optimization of linear functions under uniform constraints [32].
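This local-optimum structure is easy to check numerically. The following sketch (ours) confirms that a search point with a single one-bit at a position in $\{2,\dots,n\}$ admits no improving single bit-flip, while an improving two-bit flip exists:

```python
def f(x):
    """The multimodal example: h1(l1) = x_1 / 2 + (sum of remaining bits),
    h2(l2) = (number of zero-bits / (n - 0.5)) ** (n * n)."""
    n = len(x)
    return x[0] / 2 + sum(x[1:]) + ((n - sum(x)) / (n - 0.5)) ** (n * n)

n = 8
x = [0] * n
x[3] = 1  # exactly one one-bit at a position in {2, ..., n} (0-indexed: 3)
improving = [i for i in range(n)
             if f([b ^ (j == i) for j, b in enumerate(x)]) < f(x)]
print(improving)  # empty list: no single bit-flip improves x

y = x[:]; y[3], y[0] = 0, 1  # the two-bit flip towards the global optimum
print(f(y) < f(x))           # True
```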

4 Upper Bound

The following theorem is the main result of this paper, showing that the (1+1) EA can optimize the generalized class of functions in asymptotically the same time as an ordinary linear function.

Theorem 2.

Let $f$ be the sum of two transformed linear functions as defined in the set-up in Section 2.1. Then the expected optimization time of the (1+1) EA on $f$ is $O(n\log n)$.

The proof of Theorem 2 uses drift analysis with a carefully defined potential function, explained in the following.

Potential function.

We build upon the approach from [11] to construct a potential function $g^{(1)}$ for $\ell_{1}$ and a potential function $g^{(2)}$ for $\ell_{2}$, resulting in the combined potential function $\phi(x)=g^{(1)}(x)+g^{(2)}(x)$. The individual potential functions are obtained in the same way as if the (1+1) EA with mutation probability $1/n$ were only optimizing $\ell_{1}$ or $\ell_{2}$, respectively, on an $\alpha n$-dimensional or $(1-\alpha)n$-dimensional search space, respectively. The key idea is that in an accepted step of the (1+1) EA on $f$, at least one of the two functions $\ell_{1}$ and $\ell_{2}$ must not grow. This event leads to a high enough drift of the respective potential function, which is still positive after pessimistically incorporating the potential loss due to flipping zero-bits that only the other linear function depends on.

We proceed with the definition of the potential functions $g^{(1)}$ and $g^{(2)}$ (similarly to Section 5 in [11]). For the two underlying linear functions we assume that their arguments are reordered according to increasing weights. Note that we cannot necessarily sort the set of all indices $1,\dots,n-s$ of the function $f$ so that both underlying linear functions have increasing coefficients; however, as we analyze the underlying functions separately, we can each time use the required sorting in these separate considerations.

Definition 1.

Given a linear function $\sum_{i=1}^{k}w_{i}x_{i}$, where $w_{1}\leq\dots\leq w_{k}$, we define the potential function $g(x_{1},\dots,x_{k})=\sum_{i=1}^{k}g_{i}x_{i}$ by

$$g_{i}=\left(1+\frac{1}{n}\right)^{\min\{j\leq i\,\mid\,w_{j}=w_{i}\}-1}.$$

In our scenario, $g^{(1)}(z)$ is the potential function obtained from applying this construction to the $\alpha n$-dimensional linear function $\ell_{1}(z)$, and we proceed accordingly with the $(1-\alpha)n$-dimensional function $g^{(2)}(y)$ and $\ell_{2}(y)$. Finally, we define $\phi(x)=g^{(1)}(z)+g^{(2)}(y)$.
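The coefficients of Definition 1 are simple to compute; the following sketch (ours) derives them from an increasingly sorted weight vector:

```python
def potential_coefficients(w, n):
    """Coefficients g_i = (1 + 1/n) ** (min{j <= i : w_j = w_i} - 1) from
    Definition 1 for an increasingly sorted weight vector w; equal weights
    share the coefficient of the first (1-based) index of their class."""
    assert all(w[i] <= w[i + 1] for i in range(len(w) - 1))
    first = {}
    for j, wj in enumerate(w, start=1):
        first.setdefault(wj, j)
    return [(1 + 1 / n) ** (first[wi] - 1) for wi in w]

# Duplicate weights receive identical coefficients.
print(potential_coefficients([1, 2, 2, 5], n=4))  # [1.0, 1.25, 1.25, 1.953125]
```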

We can now give the proof of our main theorem.

Proof of Theorem 2.

Using the potential function from Definition 1, we analyze the (1+1) EA on $f$: we assume an arbitrary, non-optimal search point $x_{t}\in\{0,1\}^{n-s}$ and consider the expected change of $\phi$ from time $t$ to time $t+1$. We consider an accepted step where the offspring differs from the parent, since this is necessary for $\phi$ to change. That is, at least one $1$-bit flips and $f$ does not grow. Let $A$ be the event that an offspring $x^{\prime}\neq x_{t}$ is accepted. For $A$ to occur, it is necessary that at least one of the two functions $\ell_{1}$ and $\ell_{2}$ does not grow. Since the two cases can be handled in an essentially symmetric manner (they become perfectly symmetric for $\alpha=1/2$), we only analyze the case that $\ell_{2}$ does not grow and that at least one bit in $B_{2}$ is flipped from $1$ to $0$. Hence, we consider exactly the situation that the (1+1) EA with the linear function $\ell_{2}$ as its $(1-\alpha)n$-bit fitness function produces an offspring that is accepted and different from the parent.

Let $Y_{t}=g^{(2)}(y_{t})$, where $y_{t}$ is the restriction of the search point $x_{t}$ at time $t$ to the $(1-\alpha)n$ bits in $B_{2}$ that $g^{(2)}$ depends on, assuming the indices of $x_{t}$ to be reordered with respect to increasing coefficients $w^{(2)}_{1},\dots,w^{(2)}_{(1-\alpha)n}$. To compute the drift of $\phi$, we distinguish between several cases and events in a way similar to the proof of Theorem 5.1 in [11]. Each of these cases first bounds the drift of $Y_{t}$ sufficiently precisely and then adds a pessimistic estimate of the drift of $Z_{t}=g^{(1)}(z_{t})$, which corresponds to the other linear function on the bits from $B_{1}$, i.e., the function whose value may grow under the event $A$. Note that $g^{(1)}$ depends on at least as many bits as $g^{(2)}$ does.

Since the estimate of the drift of $Z_{t}$ is always the same, we present it first. Let $\tilde{Z}_{t+1}$ denote the $g^{(1)}$-value of the mutated bit string $x^{\prime}$ (restricted to the bits in $B_{1}$). If $x^{\prime}$ is accepted, then $Z_{t+1}=\tilde{Z}_{t+1}$; otherwise $Z_{t+1}=Z_{t}$. If we pessimistically assume that each bit in $z_{t}$ (i.e., the restriction of $x_{t}$ to the bits in $B_{1}$) is a zero-bit that can flip to $1$, we obtain the upper bound

$$\mathrm{E}(\tilde{Z}_{t+1}-Z_{t}\mid Z_{t}) \leq \frac{1}{n}\sum_{i=1}^{\alpha n}\left(1+\frac{1}{n}\right)^{i-1} = \frac{1}{n}\cdot\frac{\left(1+\frac{1}{n}\right)^{\alpha n}-1}{1/n} \leq e^{\alpha}-1 \leq e^{\ln(2-\epsilon)}-1 = 1-\epsilon, \qquad (4)$$

where we use that $\alpha\leq\ln(2-\epsilon)$ for some constant $\epsilon>0$. Also, since $Z_{t+1}=Z_{t}$ if the mutation is rejected and we only consider flipping zero-bits, we have under $A$ (the event that $x^{\prime}$ is accepted) that

$$\mathrm{E}(Z_{t+1}-Z_{t}\mid Z_{t};A) \leq \mathrm{E}(\tilde{Z}_{t+1}-Z_{t}\mid Z_{t}) \leq 1-\epsilon. \qquad (5)$$

Note that the estimations (4) and (5) include the case that $s$ of the bits in $z_{t}$ are shared with the input string $y_{t}$ of the other linear function $g^{(2)}(y_{t})$.

We next conduct the detailed drift analysis to bound $\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t})$, considering certain events necessary for $A$. Two different cases are considered.

Case 1: at least two one-bits in $y_{t}$ flip (event $S_{1}$). Let $\tilde{Y}_{t+1}$ denote the $g^{(2)}$-value of the mutated bit string $x^{\prime}$, restricted to the bits in $B_{2}$, under event $S_{1}$ before selection. If $x^{\prime}$ is accepted, then $Y_{t+1}=\tilde{Y}_{t+1}$; otherwise $Y_{t+1}=Y_{t}$. Since $g_{i}\geq 1$ for all $i$, every zero-bit in $y_{t}$ flips to one with probability at most $1/n$, and $(1-\alpha)\leq\alpha$, we can re-use the estimations from (4). Bounding the contribution of the flipping one-bits from below by $2$, we obtain

$$\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{1}) \geq 2-\frac{1}{n}\sum_{i=1}^{(1-\alpha)n}\left(1+\frac{1}{n}\right)^{i-1} \geq 2-\frac{1}{n}\sum_{i=1}^{\alpha n}\left(1+\frac{1}{n}\right)^{i-1} \geq 2-(1-\epsilon)=1+\epsilon.$$

Along with (4), we have

$$\mathrm{E}(\phi(x_{t})-\phi(x^{\prime})\mid x_{t};S_{1}) \geq \mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{1})-\mathrm{E}(\tilde{Z}_{t+1}-Z_{t}\mid Z_{t};S_{1}) \geq 2-(1-\epsilon)-(1-\epsilon) > \epsilon.$$

Since the drift of $\phi$ is non-negative in Case 1, we estimate it from below by $0$ regardless of whether $A$ occurs or not and focus only on the event defined in the following case.

Case 2: exactly one one-bit in $y_{t}$ flips (event $S_{2}$). Let $i^{*}$ denote the random index of the flipping one-bit in $y_{t}$. Moreover, let the function $\beta(i)=\min\{j\leq i\mid w^{(2)}_{j}=w^{(2)}_{i}\}$ denote the smallest index at most $i$ with the same weight as $w^{(2)}_{i}$, i.e., $\beta(i)-1$ is the largest index of a strictly smaller weight, using our assumption that the weights are monotonically increasing with their index. If at least one zero-bit having the same or a larger weight than bit $i^{*}$ flips, neither $\ell_{2}$ nor $g^{(2)}$ change (because the offspring has the same function value or is rejected); hence, we now, without loss of generality, only consider the subevents of $S_{2}$ in which all flipping zero-bits have an index smaller than $\beta(i^{*})$. (This reasoning is similar to the analysis of Subcase 2.2.2 in the proof of Theorem 5 from [11].)

Redefining notation, let $\tilde{Y}_{t+1}$ denote the $g^{(2)}$-value of the mutated bit string $x^{\prime}$ (restricted to the bits in $B_{2}$) under event $S_{2}$ before selection. If $x^{\prime}$ is accepted, then $Y_{t+1}=\tilde{Y}_{t+1}$; otherwise $Y_{t+1}=Y_{t}$. Recalling that $A$ is the event that the mutation $x^{\prime}$ is accepted, we have by the law of total probability

$$\mathrm{E}(Y_{t}-Y_{t+1}\mid Y_{t};S_{2}) = \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};A\cap S_{2}) \geq \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{2}),$$

where the inequality holds since our estimation of $\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{2})$ below considers exactly one one-bit to flip and assumes all zero-bits to flip independently, even though steps flipping two zero-bits with index smaller than $\beta(i^{*})$ may already be rejected.

Moreover, using the law of total probability and (5),

$$\mathrm{E}(Z_{t+1}-Z_{t}\mid Z_{t};S_{2}) \leq \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}(\tilde{Z}_{t+1}-Z_{t}\mid Z_{t};S_{2})$$

and therefore

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};S_{2}) = \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}(\phi(x_{t})-\phi(x^{\prime})\mid Y_{t};A\cap S_{2}) \geq \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}((Y_{t}-\tilde{Y}_{t+1})-(\tilde{Z}_{t+1}-Z_{t})\mid x_{t};S_{2}). \qquad (6)$$

It holds that $\operatorname{Pr}(A\mid S_{2})\geq(1-1/n)^{n-1}\geq e^{-1}$ since the mutation flipping $i^{*}$ is certainly accepted if no other bits flip. To bound the drift, we use that every zero-bit $j$ with index smaller than $\beta(i^{*})$ flips with probability $1/n$ and contributes $g^{(2)}_{j}$ to the difference $Y_{t}-\tilde{Y}_{t+1}$. Moreover, the flip of $i^{*}$ contributes the term $g^{(2)}_{i^{*}}$ to the difference. Altogether,

$$\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{2}) = \left(1+\frac{1}{n}\right)^{\beta(i^{*})-1}-\frac{1}{n}\sum_{j=1}^{\beta(i^{*})-1}\left(1+\frac{1}{n}\right)^{j-1} = \left(1+\frac{1}{n}\right)^{\beta(i^{*})-1}-\frac{1}{n}\cdot\frac{\left(1+\frac{1}{n}\right)^{\beta(i^{*})-1}-1}{1/n} = 1.$$
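As a sanity check (ours, not part of the original argument), one can verify numerically that the expression above telescopes to exactly $1$ for every choice of $\beta(i^{*})$ and $n$:

```python
def case2_drift(beta, n):
    """(1 + 1/n)^(beta - 1) - (1/n) * sum_{j=1}^{beta-1} (1 + 1/n)^(j - 1);
    the geometric sum telescopes, so the result is exactly 1."""
    gain = (1 + 1 / n) ** (beta - 1)
    loss = sum((1 + 1 / n) ** (j - 1) for j in range(1, beta)) / n
    return gain - loss

print([round(case2_drift(b, 50), 10) for b in (1, 2, 10, 50)])  # all 1.0
```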

Combining this with (5), we have

$$\mathrm{E}(\phi(x_{t})-\phi(x^{\prime})\mid x_{t};S_{2}) = \mathrm{E}((Y_{t}-\tilde{Y}_{t+1})-(\tilde{Z}_{t+1}-Z_{t})\mid x_{t};S_{2}) \geq 1-(1-\epsilon)=\epsilon.$$

Altogether, using (6) and our lower bound $\operatorname{Pr}(A\mid S_{2})\geq e^{-1}$, we have the following lower bound on the drift under $S_{2}$:

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};S_{2})\geq e^{-1}\epsilon.$$

Finally, we compute the total drift considering all possible one-bits that can flip under $S_{2}$. Let $I$ be the set of one-bits in the whole bit string $x_{t}$. Since the analysis is analogous when considering an index $i\in I$, we still consider the situation that the corresponding linear function decreases or stays the same if $i\in B_{2}$, i.e., $i$ belongs to $y_{t}$, and remark that an analogous event $A^{\prime}$ with respect to the bits in $B_{1}$ and the string $z_{t}$ can be analyzed in the same way.

Now, for $i\in I$, let $F_{i}$ denote the event that bit $i$ is the only flipping one-bit in the considered part of the bit string, and let $F$ be the event that exactly one bit from $I$ flips. We have for all $i\in I$ that

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};F_{i})\geq e^{-1}\epsilon$$

and therefore also $\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};F)\geq e^{-1}\epsilon$. It is sufficient to flip one of the $\lvert I\rvert$ one-bits and no other bit to have an accepted mutation, which has probability at least $(\lvert I\rvert/n)(1-1/n)^{n-1}\geq\frac{\lvert I\rvert}{en}$. We obtain the unconditional drift

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t}) \geq \frac{\lvert I\rvert}{en}\,\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};F) \geq \frac{\lvert I\rvert e^{-2}}{n}\,\epsilon,$$

recalling that we estimated the drift from below by $0$ if at least two one-bits flip. To conclude the proof, we relate the last bound to $\phi(x_{t})$. Clearly, since $g_{i}\leq(1+1/n)^{n-1}\leq e$ for all $i\in\{1,\dots,\alpha n\}$ and since each one-bit can contribute to both $g^{(1)}(z_{t})$ and $g^{(2)}(y_{t})$, we have $\phi(x_{t})\leq 2e\lvert I\rvert$, so that

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t})\geq\left(\frac{e^{-3}\epsilon}{2n}\right)\phi(x_{t}).$$

Hence, we have established a multiplicative drift of the potential $\phi$ with a factor of $\delta=(e^{-3}\epsilon)/(2n)$, and we obtain the claimed $O(n\log n)$ bound on the expected optimization time via the multiplicative drift theorem (Theorem 1), using $X_{0}\leq n(1+1/(n-1))^{n}=O(n)$ and $s_{\mathrm{min}}=1$. ∎

We remark that the drift factor $(e^{-3}\epsilon)/(2n)$ from the previous proof can be improved by constant factors using a more detailed case analysis; however, since $\epsilon$ can be arbitrarily small and the final bound is in $O$-notation, this does not seem worth the effort.

5 Discussion and Conclusions

Motivated by studies on separable functions and objective functions for chance constrained problems based on the expected value and variance of solutions, we investigated the quite general setting of the sum of two transformed linear functions and established an $O(n\log n)$ bound for the (1+1) EA.

We would now like to point out some topics for further investigation. Our result from Theorem 2 has some limitations. First of all, the domains of the two linear functions may not differ very much in size; more precisely, they must be within a factor of $\alpha/(1-\alpha)\leq(\ln 2)/(1-\ln 2)\approx 2.26$ of each other. With the current pessimistic assumption that an improving mutation only improves one of the two linear functions and simultaneously may flip any bit in the other function to $1$ without the mutation being rejected, we cannot extend this to larger size differences between the domains. For the same reason, the result cannot easily be generalized to mutation probabilities $c/n$ for arbitrary constants $c>0$, as shown for the original case of simple linear functions in [11]. Although that paper also suggests a different, more powerful class of potential functions to handle high mutation probabilities, it seems difficult to apply these more powerful potential functions in the presence of our pessimistic assumptions. With stronger conditions on $\alpha$, it may be possible to extend the present results to mutation probabilities up to $(1+\epsilon)/(n+s)$ for a positive constant $\epsilon$ depending on $\alpha$. However, it would be more interesting to see whether the $O(n\log n)$ bound would also hold for mutation probability $1/n$ for all $s\geq 1$, which would include the function $g(x)$ from the chance-constrained scenario in (3) for the usual mutation probability.

Acknowledgments

This work has been supported by the Australian Research Council (ARC) through grant FT200100536 and by the Independent Research Fund Denmark through grant DFF-FNU 8021-00260B.

References

  • [1] F. Neumann, C. Witt, Bioinspired Computation in Combinatorial Optimization – Algorithms and Their Computational Complexity, Springer, 2010.
  • [2] T. Jansen, Analyzing Evolutionary Algorithms – The Computer Science Perspective, Springer, 2013.
  • [3] B. Doerr, F. Neumann (Eds.), Theory of Evolutionary Computation—Recent Developments in Discrete Optimization, Springer, 2020.
  • [4] S. Droste, T. Jansen, I. Wegener, On the analysis of the (1+1) evolutionary algorithm, Theoretical Computer Science 276 (2002) 51–81.
  • [5] B. Hajek, Hitting-time and occupation-time bounds implied by drift analysis with applications, Advances in Applied Probability 13 (3) (1982) 502–525.
  • [6] J. He, X. Yao, A study of drift analysis for estimating computation time of evolutionary algorithms, Natural Computing 3 (1) (2004) 21–35.
  • [7] J. Jägersküpper, A blend of Markov-chain and drift analysis, in: Proc. of PPSN '08, Vol. 5199 of LNCS, Springer, 2008, pp. 41–51.
  • [8] J. Jägersküpper, Combining Markov-chain analysis and drift analysis, Algorithmica 59 (3) (2011) 409–424.
  • [9] B. Doerr, D. Johannsen, C. Winzen, Multiplicative drift analysis, in: Proc. of GECCO ’10, ACM Press, 2010, pp. 1449–1456.
  • [10] B. Doerr, D. Johannsen, C. Winzen, Drift analysis and linear functions revisited, in: Proc. of CEC ’10, IEEE Press, 2010, pp. 1–8.
  • [11] C. Witt, Tight bounds on the optimization time of a randomized search heuristic on linear functions, Combinatorics, Probability & Computing 22 (2) (2013) 294–318.
  • [12] B. Doerr, L. A. Goldberg, Adaptive drift analysis, in: Proc. of PPSN ’10, Vol. 6238 of LNCS, Springer, 2010, pp. 32–41.
  • [13] B. Doerr, L. A. Goldberg, Adaptive drift analysis, Algorithmica 65 (1) (2013) 224–250.
  • [14] B. Doerr, D. Sudholt, C. Witt, When do evolutionary algorithms optimize separable functions in parallel?, in: FOGA, ACM, 2013, pp. 51–64.
  • [15] A. Charnes, W. W. Cooper, Chance-constrained programming, Management Science 6 (1) (1959) 73–79.
  • [16] Y. Xie, O. Harper, H. Assimi, A. Neumann, F. Neumann, Evolutionary algorithms for the chance-constrained knapsack problem, in: GECCO, ACM, 2019, pp. 338–346.
  • [17] Y. Xie, A. Neumann, F. Neumann, Specific single- and multi-objective evolutionary algorithms for the chance-constrained knapsack problem, in: GECCO, ACM, 2020, pp. 271–279.
  • [18] H. Assimi, O. Harper, Y. Xie, A. Neumann, F. Neumann, Evolutionary bi-objective optimization for the dynamic chance-constrained knapsack problem based on tail bound objectives, in: ECAI, Vol. 325, IOS Press, 2020, pp. 307–314.
  • [19] Y. Xie, A. Neumann, F. Neumann, Heuristic strategies for solving complex interacting stockpile blending problem with chance constraints, in: GECCO, ACM, 2021, pp. 1079–1087.
  • [20] A. Neumann, F. Neumann, Optimising monotone chance-constrained submodular functions using evolutionary multi-objective algorithms, in: PPSN (1), Springer, 2020, pp. 404–417.
  • [21] F. Neumann, A. M. Sutton, Runtime analysis of the (1 + 1) evolutionary algorithm for the chance-constrained knapsack problem, in: FOGA, ACM, 2019, pp. 147–153.
  • [22] Y. Xie, A. Neumann, F. Neumann, A. M. Sutton, Runtime analysis of RLS and the (1+1) EA for the chance-constrained knapsack problem with correlated uniform weights, in: GECCO, ACM, 2021, pp. 1187–1194.
  • [23] B. Doerr, C. Doerr, A. Neumann, F. Neumann, A. M. Sutton, Optimization of chance-constrained submodular functions, in: AAAI, AAAI Press, 2020, pp. 1460–1467.
  • [24] H. Ishii, S. Shiode, T. Nishida, Y. Namasuya, Stochastic spanning tree problem, Discrete Applied Mathematics 3 (4) (1981) 263–273.
  • [25] B. Doerr, Probabilistic tools for the analysis of randomized optimization heuristics, in: B. Doerr, F. Neumann (Eds.), Theory of Evolutionary Computation – Recent Developments in Discrete Optimization, Springer, 2020, pp. 1–87.
  • [26] B. Doerr, C. Doerr, The impact of random initialization on the runtime of randomized search heuristics, Algorithmica 75 (3) (2016) 529–553.
  • [27] R. Motwani, P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
  • [28] B. Doerr, T. Jansen, D. Sudholt, C. Winzen, C. Zarges, Mutation rate matters even when optimizing monotonic functions, Evolutionary Computation 21 (1) (2013) 1–27.
  • [29] J. Lengler, A. Martinsson, A. Steger, When does hillclimbing fail on monotone functions: An entropy compression argument, in: ANALCO, SIAM, 2019, pp. 94–102.
  • [30] B. Doerr, D. Johannsen, C. Winzen, Multiplicative drift analysis, Algorithmica 64 (4) (2012) 673–697.
  • [31] P. K. Lehre, C. Witt, Tail bounds on hitting times of randomized search heuristics using variable drift analysis, Combinatorics, Probability & Computing 30 (4) (2021) 550–569.
  • [32] F. Neumann, M. Pourhassan, C. Witt, Improved runtime results for simple randomised search heuristics on linear functions with a uniform constraint, Algorithmica 83 (10) (2021) 3209–3237.