
Runtime Analysis of the (1+1) EA on Weighted Sums of Transformed Linear Functions

Frank Neumann
Optimisation and Logistics
School of Computer Science
The University of Adelaide
Adelaide, Australia

Carsten Witt
Algorithms, Logic and Graphs
DTU Compute
Technical University of Denmark
2800 Kgs. Lyngby Denmark
Abstract

Linear functions play a key role in the runtime analysis of evolutionary algorithms, and studies have provided a wide range of new insights and techniques for analyzing evolutionary computation methods. Motivated by studies on separable functions and the optimization behaviour of evolutionary algorithms as well as objective functions from the area of chance constrained optimization, we study the class of objective functions that are weighted sums of two transformed linear functions. Our results show that the (1+1) EA, with a mutation rate depending on the number of overlapping bits of the functions, obtains an optimal solution for these functions in expected time $O(n\log n)$, thereby generalizing a well-known result for linear functions to a much wider range of problems.

1 Introduction

Runtime analysis is one of the major theoretical tools to provide rigorous insights into the working behavior of evolutionary algorithms and other randomized search heuristics [1, 2, 3]. The class of pseudo-Boolean linear functions plays a key role in the area of runtime analysis. Starting with the simplest linear function, called OneMax, for which the first runtime analysis was carried out, a wide range of results have been obtained for the general class of linear functions. This includes the study of Droste, Jansen and Wegener [4], who were the first to obtain an upper bound of $O(n\log n)$ for the (1+1) EA on the general class of pseudo-Boolean linear functions. This groundbreaking result was based on a very lengthy proof, and subsequently a wide range of improvements have been made, both through the development of new techniques for the analysis and in the precision of the results. The proof was simplified significantly by He and Yao [6] using the analytic framework of drift analysis [5]. Jägersküpper [7, 8] provided the first analysis of the leading coefficient in the bound $O(n\log n)$ on the optimisation time for the problem. Furthermore, advances in simplifying proofs and obtaining precise results have been made using the framework of multiplicative drift [9]. Doerr, Johannsen and Winzen improved the upper bound to $(1.39+o(1))en\ln n$ [10]. Finally, Witt [11] improved this bound to $en\ln n+O(n)$ by using adaptive drift analysis [12, 13]. We expand such investigations for the (1+1) EA to a wider class of problems that are modelled by two transformed linear functions. This includes classes of separable functions and chance constrained optimization problems.

1.1 Separable Functions

As an example, consider the separable objective function

$$f(x)=\left(\sum_{i=1}^{n/2}w_{i}x_{i}\right)^{2}+\sqrt{\sum_{i=n/2+1}^{n}w_{i}x_{i}} \qquad (1)$$

where $w_{i}\in\mathbb{Z}^{+}$, $1\leq i\leq n$, and $x=(x_{1},\dots,x_{n})\in\{0,1\}^{n}$. The function $f$ consists of two objective functions

$$f_{1}(x_{1},\ldots,x_{n/2})=\left(\sum_{i=1}^{n/2}w_{i}x_{i}\right)^{2}\quad\text{and}\quad f_{2}(x_{n/2+1},\ldots,x_{n})=\sqrt{\sum_{i=n/2+1}^{n}w_{i}x_{i}}.$$

Here $f_{1}$ is the square of a linear function in the first half of the variables and $f_{2}$ is the square root of a linear function in the remaining variables. Some investigations on how evolutionary algorithms optimize separable fitness functions have been carried out in [14]. It has been shown that if the different subfunctions only have a small range, then the (1+1) EA optimizes separable functions efficiently if the subfunctions themselves are easy to optimize. However, in our example above the two subfunctions may take on exponentially many values, but both functions on their own are optimized by the (1+1) EA in time $O(n\log n)$ by the results for the (1+1) EA on linear functions. This holds because the transformation applying the square in $f_{1}$ or the square root in $f_{2}$ does not change the behavior of the (1+1) EA. The question arises whether the $O(n\log n)$ bound also holds for the function $f$ which combines $f_{1}$ and $f_{2}$. We investigate this setting of separable functions for the more general case where the objective function is given as a weighted sum of two transformed linear functions. For technical reasons, we consider a (1+1) EA with a potentially reduced mutation probability depending on the number of overlapping bits of the two functions.
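To make the example concrete, here is a minimal Python sketch (our own illustration with arbitrary example weights, not taken from the analysis) that evaluates $f$ from Equation (1):

```python
import random

def f(x, w):
    """Separable objective from Equation (1): the square of a linear
    function on the first half of the bits plus the square root of a
    linear function on the second half. Assumes n = len(x) is even."""
    n = len(x)
    f1 = sum(w[i] * x[i] for i in range(n // 2)) ** 2
    f2 = sum(w[i] * x[i] for i in range(n // 2, n)) ** 0.5
    return f1 + f2

w = [3, 1, 4, 1, 5, 9, 2, 6]                  # arbitrary positive weights
x = [random.randint(0, 1) for _ in range(8)]  # a random search point
print(f(x, w))
```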

1.2 Chance Constrained Problems

Another motivation for our work comes from the area of chance constrained optimization [15] and considers the case where the two functions overlap or are even defined on the same set of variables. Recently, evolutionary algorithms have been used for chance constrained problems, which motivates our investigations. In a chance constrained setting, the input involves stochastic components, and the goal is to optimize a given objective function under the condition that constraints are met with high probability or that function values are guaranteed with high probability. Evolutionary algorithms have been designed for the chance constrained knapsack problem [16, 17, 18], chance constrained stockpile blending problems [19], and chance constrained submodular functions [20].

Runtime analysis results have been obtained for restricted settings of the knapsack problem [21, 22] where the weights are stochastic and the constraint bound has to be met with high probability. The analyses for the case of stochastic constraints on the class of submodular functions [23] and on the knapsack problem [16] already reveal constraint functions that are a linear combination of the expected weight and the standard deviation of a solution when Chebyshev's inequality is used for constraint evaluation. Such functions are the subject of our investigations.

To make the type of problems that we are interested in clear, we state the following problem. Given a set of $m$ items $E=\{e_{1},\ldots,e_{m}\}$ with random weights $w_{i}$, $1\leq i\leq m$, we assume that the weights are independent and that each $w_{i}$ is distributed according to a normal distribution $N(\mu_{i},\sigma_{i}^{2})$ with $\mu_{i}\geq 0$ and $\sigma_{i}\geq 0$, $1\leq i\leq m$. Our goal is to

$$\min W \quad \text{subject to} \quad \operatorname{Pr}(w(x)\leq W)\geq\alpha \qquad (2)$$

where $w(x)=\sum_{i=1}^{m}w_{i}x_{i}$, $x\in\{0,1\}^{m}$, and $\alpha\in(0,1)$. The problem given in Equation (2) is usually considered under additional constraints, e.g. spanning tree constraints in [24], which we do not consider in this paper.

According to [24], the problem given in Equation (2) is equivalent to minimizing the fitness function

$$g(x)=\sum_{i=1}^{m}\mu_{i}x_{i}+K_{\alpha}\left(\sum_{i=1}^{m}\sigma_{i}^{2}x_{i}\right)^{1/2} \qquad (3)$$

where $K_{\alpha}$ is the $\alpha$-fractile point of the standard normal distribution.

The fitness function $g$ is a linear combination of the expected value of a solution, which is a linear function, and the square root of its variance, where the variance is again a linear function. In order to understand the behaviour of evolutionary algorithms on fitness functions obtained from chance constrained optimization problems, our runtime analysis for the (1+1) EA covers such fitness functions if we assume the reduced mutation probability mentioned above.
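For concreteness, the following sketch (ours; variable names are illustrative) evaluates $g$ from Equation (3), obtaining $K_{\alpha}$ from the quantile function of the standard normal distribution:

```python
from statistics import NormalDist

def g(x, mu, sigma2, alpha):
    """Fitness function from Equation (3): expected weight of the chosen
    items plus K_alpha times the standard deviation of their total weight."""
    k_alpha = NormalDist().inv_cdf(alpha)  # alpha-fractile of N(0, 1)
    expected = sum(m * xi for m, xi in zip(mu, x))
    variance = sum(s2 * xi for s2, xi in zip(sigma2, x))
    return expected + k_alpha * variance ** 0.5

# Example: three items, guarantee level alpha = 0.95.
print(g([1, 0, 1], mu=[2.0, 1.0, 3.0], sigma2=[0.5, 0.2, 1.0], alpha=0.95))
```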

1.3 Transformed Linear Functions

In our investigations, we consider the much wider class of problems where a given fitness function is obtained as a linear combination of two transformed linear functions. The transformations applied to the linear functions only have to be monotonically increasing in the function values of the linear functions. This includes the settings of separable functions and chance constrained problems described previously. Furthermore, we do not require that the two linear functions are defined on the same number of bits.

The main result of our paper is an $O(n\log n)$ upper bound for the (1+1) EA with mutation probability $1/(n+s)$ on the class of sums of two transformed linear functions, where $s$ is the number of bits on which the two linear functions overlap. This directly transfers to the separable problem type given in Equation (1) with standard bit mutation probability $1/n$ and to the chance constraint formulation given in Equation (3) when using mutation probability $1/(2n)$.

The outline of the paper is as follows. In Section 2, we formally introduce the problem formulation for which we analyze the (1+1) EA. We discuss the exclusion of negative weights in our setup in Section 3 and present the $O(n\log n)$ bound in Section 4. Finally, we finish with some discussion and conclusions.

2 Preliminaries

The (1+1) EA shown in Algorithm 1 (generalized with a parameter $s$ discussed below; classically $s=0$ is assumed) is a simple evolutionary algorithm using independent bit flips and elitist selection. It is very well studied in the theory of evolutionary computation [25] and serves as a stepping stone towards the analysis of more complicated evolutionary algorithms. As is common in the area of runtime analysis, we measure the runtime of the (1+1) EA by the number of iterations of its repeat loop. The optimization time refers to the number of fitness evaluations until an optimal solution has been obtained for the first time, and the expected optimization time refers to the expectation of this value.

2.1 Sums of Two Transformed Linear Functions without Constraints

We will study the (1+1) EA on the scenarios given in (1) and (3), assuming no additional constraints. In fact, we will generalize the scenario to the sum of two transformed pseudo-Boolean linear functions which may be (partially) overlapping. Note that in (1) there is no overlap between the domains of the two linear functions and the transformations are the square and the square root, whereas in (3) there is complete overlap of the domains and the transformations are the identity function and the square root.

The crucial observation in our analysis is that the scenario considered here extends the linear function problem [11] that is heavily investigated in the theory of evolutionary algorithms. Despite the simple structure of the problem, there is no clear fitness-distance correlation in the linear function problem, which makes the analysis of the global search operator of the (1+1) EA difficult. If only local mutations are used, leading to the well-known randomized local search (RLS) algorithm [26], then both the linear function problem and the generalized scenario considered here are very easy to analyze using standard coupon collector arguments [27], leading to $O(n\log n)$ expected optimization time. For the globally searching (1+1) EA, we will obtain the same bound, proving that the problem is easy for it to solve; however, we need advanced drift analysis methods to prove this.

We note that the class of functions we consider falls within the more general class of so-called monotone functions. Such functions can be difficult to optimize for a (1+1) EA using mutation probabilities larger than $1/n$ [28]; however, it is also known that the standard (1+1) EA with mutation probability $1/n$ as considered here optimizes all monotone functions in expected time $O(n\log^{2}n)$ [29]. Our bound is better by an asymptotic factor of $\log n$ if $s=o(n)$. However, it should be noted that for $s=\Omega(n)$, the bound $O(n\log n)$ already follows directly from [28] since our setting then corresponds to a mutation probability of $c/n$ for a constant $c<1$. In fact, the fitness function $g(x)$ arising from the chance-constrained scenario presented in (3) above falls into the case $s=n/2$.

1. Choose $x\in\{0,1\}^{n-s}$ uniformly at random;
2. repeat
3.     Create $y$ by flipping each bit $x_{i}$ of $x$ with probability $p=\frac{1}{n}$;
4.     if $f(y)\leq f(x)$ then $x\leftarrow y$;
5. until stop;

Algorithm 1: (1+1) EA for minimization of a pseudo-Boolean function $f\colon\{0,1\}^{n-s}\to\mathbb{R}$, where $s\in\{0,\dots,n/2\}$
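A direct transcription of Algorithm 1 into Python might look as follows (a sketch for illustration only; the stopping criterion is replaced by a fixed iteration budget, which is our own simplification):

```python
import random

def one_plus_one_ea(f, dim, max_iters=100_000):
    """(1+1) EA minimizing a pseudo-Boolean function f on {0,1}^dim
    with mutation probability p = 1/dim, matching p = 1/n above."""
    p = 1.0 / dim
    x = [random.randint(0, 1) for _ in range(dim)]
    for _ in range(max_iters):
        y = [1 - b if random.random() < p else b for b in x]
        if f(y) <= f(x):  # elitist selection; ties are accepted
            x = y
    return x

# Example: minimizing the number of one-bits drives x to the all-zeros string.
print(sum(one_plus_one_ea(lambda y: sum(y), dim=20)))
```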

Set-up.

We will investigate a general optimization scenario involving two linear pseudo-Boolean functions in an unconstrained search space. The objective function is an arbitrarily weighted sum of monotone transformations of two linear functions defined on (possibly overlapping) subspaces of $\{0,1\}^{n-s}$ for some $s\geq 0$, where $s$ denotes the number of shared bits. Note that the introduction of this paper mentions a search space of dimension $n$ and a mutation probability of $p=1/(n+s)$ for the (1+1) EA. While the former perspective is more natural to present, from now on we consider the asymptotically equivalent setting of search space dimension $n-s$ and mutation probability $p=1/n$, which eases notation in the upcoming calculations.

Let $\alpha$ be a constant such that $1/2\leq\alpha\leq\ln(2-\epsilon)\approx 0.693-\epsilon/2$ for some constant $\epsilon>0$, and assume that $\alpha n$ is an integer. We allow the subfunctions to depend on a number of bits in $[(1-\alpha)n,\alpha n]$, including the balanced case that both subfunctions depend on exactly $n/2$ bits. Formally, we have

  • linear functions $\ell_{1}\colon\{0,1\}^{\alpha n}\to\mathbb{R}$ and $\ell_{2}\colon\{0,1\}^{(1-\alpha)n}\to\mathbb{R}$, where $\ell_{1}(y_{1},\dots,y_{\alpha n})=\sum_{i=1}^{\alpha n}w^{(1)}_{i}y_{i}$ and, similarly, $\ell_{2}(z_{1},\dots,z_{(1-\alpha)n})=\sum_{i=1}^{(1-\alpha)n}w^{(2)}_{i}z_{i}$, with non-negative weights $w^{(1)}_{i}$ and $w^{(2)}_{i}$;

  • $B_{1}\subseteq\{1,\dots,n-s\}$ and $B_{2}\subseteq\{1,\dots,n-s\}$, denoting the bit positions that $\ell_{1}$ resp. $\ell_{2}$ are defined on in the actual objective function $f\colon\{0,1\}^{n-s}\to\mathbb{R}$;

  • the overlap count $s\coloneqq\lvert B_{1}\cap B_{2}\rvert$, where $s\leq\min\{(1-\alpha)n,\alpha n\}=(1-\alpha)n\leq n/2$;

  • the linear functions with extended domain, $\ell^{*}_{1}(x_{1},\dots,x_{n-s})=\sum_{i\in B_{1}}w^{(1)}_{r^{(1)}(i)}x_{i}$, where $r^{(1)}(i)$ is the rank of $i$ in $B_{1}$ (with the smallest number receiving rank $1$), and analogously $\ell^{*}_{2}(x_{1},\dots,x_{n-s})=\sum_{i\in B_{2}}w^{(2)}_{r^{(2)}(i)}x_{i}$; note that $\ell_{1}^{*}$ and $\ell_{2}^{*}$ only depend essentially on $\alpha n$ and $(1-\alpha)n$ bits, respectively;

  • monotone increasing functions $h_{1}\colon\mathbb{R}\to\mathbb{R}$ and $h_{2}\colon\mathbb{R}\to\mathbb{R}$.

Then the objective function $f\colon\{0,1\}^{n-s}\to\mathbb{R}$, which w.l.o.g. is to be minimized, is given by

$$f(x_{1},\dots,x_{n-s})=h_{1}(\ell^{*}_{1}(x_{1},\dots,x_{n-s}))+h_{2}(\ell^{*}_{2}(x_{1},\dots,x_{n-s})).$$

For $s=0$, $h_{1}$ being the square function, and $h_{2}$ being the square root function, this matches the setting of separable functions given in Equation (1). This set-up also includes the case that

$$f(x_{1},\dots,x_{m})=\ell_{1}(x_{1},\dots,x_{m})+R\sqrt{\ell_{2}(x_{1},\dots,x_{m})}$$

for two $m$-dimensional, completely overlapping linear functions $\ell_{1}$ and $\ell_{2}$ and an arbitrary factor $R\geq 0$, as motivated and given in Equation (3). Note that this matches our set-up with $n=2m$ and $s=m=n/2$.
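The following sketch (our own construction, purely illustrative) shows how an objective of this form can be assembled from two weight vectors, two possibly overlapping position sets $B_{1}$ and $B_{2}$, and two monotone transformations:

```python
def make_objective(w1, B1, h1, w2, B2, h2):
    """Builds f(x) = h1(l1*(x)) + h2(l2*(x)) as in the set-up: each linear
    function reads the bits at its (0-indexed, sorted) positions B1 resp. B2,
    and the rank of a position within its set selects the weight."""
    B1, B2 = sorted(B1), sorted(B2)
    def f(x):
        l1 = sum(w1[r] * x[i] for r, i in enumerate(B1))
        l2 = sum(w2[r] * x[i] for r, i in enumerate(B2))
        return h1(l1) + h2(l2)
    return f

# Overlap s = 2 (positions 2 and 3 are shared), so f is defined on 6 bits.
f = make_objective(w1=[2, 5, 7, 7], B1=[0, 1, 2, 3], h1=lambda v: v ** 2,
                   w2=[1, 3, 3, 8], B2=[2, 3, 4, 5], h2=lambda v: v ** 0.5)
print(f([1, 0, 1, 1, 0, 1]))
```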

For our analysis, we will make use of the multiplicative drift theorem (Theorem 1), which was introduced in [30] and enhanced with tail bounds in [13]. We use a slightly generalised presentation that can be found in [31].

Theorem 1 (Multiplicative Drift, cf. [30, 13, 31]).

Let $(X_{t})_{t\geq 0}$ be a stochastic process, adapted to a filtration $\mathcal{F}_{t}$, over some state space $S\subseteq\{0\}\cup[s_{\mathrm{min}},s_{\mathrm{max}}]$, where $0\in S$ and $s_{\mathrm{min}}>0$. Suppose that there exists a $\delta>0$ such that for all $t\geq 0$

$$\mathrm{E}(X_{t}-X_{t+1}\mid\mathcal{F}_{t})\geq\delta X_{t}.$$

Then it holds for the first hitting time $T:=\min\{t\mid X_{t}=0\}$ that

$$\mathrm{E}(T\mid\mathcal{F}_{0})\leq\frac{\ln(X_{0}/s_{\mathrm{min}})+1}{\delta}.$$

Moreover, $\operatorname{Pr}(T>(\ln(X_{0}/s_{\mathrm{min}})+r)/\delta)\leq e^{-r}$ for any $r>0$.
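As a quick empirical illustration of Theorem 1 (our own sketch, not part of the formal development), consider a toy process that decreases by one with probability $X_{t}/n$, so that the drift condition holds with $\delta=1/n$ and $s_{\mathrm{min}}=1$:

```python
import math, random

def hitting_time(n):
    """Toy process on {0, ..., n}: from state x > 0, move to x - 1 with
    probability x/n, else stay, so E(X_t - X_{t+1} | X_t) = X_t / n."""
    x, t = n, 0
    while x > 0:
        if random.random() < x / n:
            x -= 1
        t += 1
    return t

n, trials = 100, 1000
observed = sum(hitting_time(n) for _ in range(trials)) / trials
print("observed mean:", observed)                 # about n * H_n, roughly 519
print("Theorem 1 bound:", (math.log(n) + 1) * n)  # (ln(X_0 / 1) + 1) / delta
```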

3 Negative Weights Allow for Multimodal Functions

We will now justify that the inclusion of negative weights in the underlying linear functions, along with overlapping domains, can lead to multimodal problems that can no longer be optimized in expected time $O(n\log n)$. In the following example, the two linear functions depend essentially on all $n$ bits.

Let

$$f(x_{1},\dots,x_{n})=\underbrace{\left(\frac{x_{1}}{2}+\sum_{i=2}^{n}x_{i}\right)}_{h_{1}(\ell_{1}(x))}+\underbrace{\left(\frac{\sum_{i=1}^{n}(1-x_{i})}{n-0.5}\right)^{n^{2}}}_{h_{2}(\ell_{2}(x))}$$

Basically, the first linear function $\ell_{1}(x)=x_{1}/2+\sum_{i=2}^{n}x_{i}$ is a OneMax function except for the first bit, which has a smaller weight than the rest. The second linear function $\ell_{2}(x)$ is linear in the number of zeros, i.e., it corresponds to the ZeroMax function, which is equivalent to OneMax for the (1+1) EA for symmetry reasons. The transformation $h_{2}$ that is applied to ZeroMax divides the number of zero-bits by $n-0.5$ and raises the result to a large power. Essentially, the value of $h_{2}(z)$ is $e^{\Theta(n)}$ if $z=n$ and $e^{-\Theta(n)}$ otherwise. This puts a constraint on the number of zero-bits. If $\lvert x\rvert_{1}\geq 1$, then $f$ is monotone increasing in $\ell_{1}$, i.e., search points decreasing the $\ell_{1}$-value also decrease the $f$-value. However, the all-zeros string has the largest $f$-value, i.e., it is the worst search point.

We can now see that all search points having exactly one one-bit at one of the positions $2,\dots,n$ are local optima. To create the global optimum from such a point, two bits have to flip simultaneously, leading to $\Omega(n^{2})$ expected time to reach the optimum from such a point. The situation is similar to the optimization of linear functions under uniform constraints [32].
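This local-optimum structure is easy to check numerically. The following sketch (ours) confirms that a search point with a single one-bit at a position in $\{2,\dots,n\}$ admits no improving single bit-flip, while an improving two-bit flip exists:

```python
def f(x):
    """The multimodal example: h1(l1) = x_1 / 2 + (sum of remaining bits),
    h2(l2) = (number of zero-bits / (n - 0.5)) ** (n * n)."""
    n = len(x)
    return x[0] / 2 + sum(x[1:]) + ((n - sum(x)) / (n - 0.5)) ** (n * n)

n = 8
x = [0] * n
x[3] = 1  # exactly one one-bit at a position in {2, ..., n} (0-indexed: 3)
improving = [i for i in range(n)
             if f([b ^ (j == i) for j, b in enumerate(x)]) < f(x)]
print(improving)  # empty list: no single bit-flip improves x

y = x[:]; y[3], y[0] = 0, 1  # the two-bit flip towards the global optimum
print(f(y) < f(x))           # True
```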

4 Upper Bound

The following theorem is the main result of this paper, showing that the (1+1) EA can optimize the generalized class of functions in asymptotically the same time as an ordinary linear function.

Theorem 2.

Let $f$ be the sum of two transformed linear functions as defined in the set-up in Section 2.1. Then the expected optimization time of the (1+1) EA on $f$ is $O(n\log n)$.

The proof of Theorem 2 uses drift analysis with a carefully defined potential function, explained in the following.

Potential function.

We build upon the approach from [11] to construct a potential function $g^{(1)}$ for $\ell_{1}$ and a potential function $g^{(2)}$ for $\ell_{2}$, resulting in the combined potential function $\phi(x)=g^{(1)}(x)+g^{(2)}(x)$. The individual potential functions are obtained in the same way as if the (1+1) EA with mutation probability $1/n$ were only optimizing $\ell_{1}$ or $\ell_{2}$, respectively, on an $\alpha n$-dimensional or $(1-\alpha)n$-dimensional search space, respectively. The key idea is that in an accepted step of the (1+1) EA on $f$, at least one of the two functions $\ell_{1}$ and $\ell_{2}$ must not grow. This event leads to a high enough drift of the respective potential function, which is still positive after pessimistically incorporating the potential loss due to flipping zero-bits that only the other linear function depends on.

We proceed with the definition of the potential functions $g^{(1)}$ and $g^{(2)}$ (similarly to Section 5 in [11]). For the two underlying linear functions we assume that their arguments are reordered according to increasing weights. Note that we cannot necessarily sort the set of all indices $1,\dots,n-s$ of the function $f$ so that both underlying linear functions have increasing coefficients; however, as we analyze the underlying functions separately, we can each time use the required sorting in these separate considerations.

Definition 1.

Given a linear function $\sum_{i=1}^{k}w_{i}x_{i}$, where $w_{1}\leq\dots\leq w_{k}$, we define the potential function $g(x_{1},\dots,x_{k})=\sum_{i=1}^{k}g_{i}x_{i}$ by

$$g_{i}=\left(1+\frac{1}{n}\right)^{\min\{j\leq i\,\mid\,w_{j}=w_{i}\}-1}.$$

In our scenario, $g^{(1)}(z)$ is the potential function obtained from applying this construction to the $\alpha n$-dimensional linear function $\ell_{1}(z)$, and we proceed accordingly with the $(1-\alpha)n$-dimensional function $g^{(2)}(y)$ and $\ell_{2}(y)$. Finally, we define $\phi(x)=g^{(1)}(z)+g^{(2)}(y)$.
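The coefficients of Definition 1 are simple to compute; the following sketch (ours) derives them from an increasingly sorted weight vector:

```python
def potential_coefficients(w, n):
    """Coefficients g_i = (1 + 1/n) ** (min{j <= i : w_j = w_i} - 1) from
    Definition 1 for an increasingly sorted weight vector w; equal weights
    share the coefficient of the first (1-based) index of their class."""
    assert all(w[i] <= w[i + 1] for i in range(len(w) - 1))
    first = {}
    for j, wj in enumerate(w, start=1):
        first.setdefault(wj, j)
    return [(1 + 1 / n) ** (first[wi] - 1) for wi in w]

# Duplicate weights receive identical coefficients.
print(potential_coefficients([1, 2, 2, 5], n=4))  # [1.0, 1.25, 1.25, 1.953125]
```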

We can now give the proof of our main theorem.

Proof of Theorem 2.

Using the potential function from Definition 1, we analyze the (1+1) EA on $f$: we assume an arbitrary, non-optimal search point $x_{t}\in\{0,1\}^{n-s}$ and consider the expected change of $\phi$ from time $t$ to time $t+1$. We consider an accepted step where the offspring differs from the parent, since this is necessary for $\phi$ to change. That is, at least one $1$-bit flips and $f$ does not grow. Let $A$ be the event that an offspring $x^{\prime}\neq x_{t}$ is accepted. For $A$ to occur, it is necessary that at least one of the two functions $\ell_{1}$ and $\ell_{2}$ does not grow. Since the two cases can be handled in an essentially symmetric manner (they become perfectly symmetric for $\alpha=1/2$), we only analyze the case that $\ell_{2}$ does not grow and that at least one bit in $B_{2}$ is flipped from $1$ to $0$. Hence, we consider exactly the situation that the (1+1) EA with the linear function $\ell_{2}$ as its $(1-\alpha)n$-bit fitness function produces an offspring that is accepted and different from the parent.

Let $Y_{t}=g^{(2)}(y_{t})$, where $y_{t}$ is the restriction of the search point $x_{t}$ at time $t$ to the $(1-\alpha)n$ bits in $B_{2}$ that $g^{(2)}$ depends on, assuming the indices of $x_{t}$ to be reordered with respect to increasing coefficients $w^{(2)}_{1},\dots,w^{(2)}_{(1-\alpha)n}$. To compute the drift of $\phi$, we distinguish between several cases and events in a way similar to the proof of Theorem 5.1 in [11]. Each of these cases first bounds the drift of $Y_{t}$ sufficiently precisely and then adds a pessimistic estimate of the drift of $Z_{t}=g^{(1)}(z_{t})$, which corresponds to the other linear function on the bits from $B_{1}$, i.e., the function whose value may grow under the event $A$. Note that $g^{(1)}$ depends on at least as many bits as $g^{(2)}$ does.

Since the estimate of the drift of $Z_{t}$ is always the same, we present it first. Let $\tilde{Z}_{t+1}$ denote the $g^{(1)}$-value of the mutated bit string $x^{\prime}$ (restricted to the bits in $B_{1}$). If $x^{\prime}$ is accepted, then $Z_{t+1}=\tilde{Z}_{t+1}$; otherwise $Z_{t+1}=Z_{t}$. If we pessimistically assume that each bit in $z_{t}$ (i.e., the restriction of $x_{t}$ to the bits in $B_{1}$) is a zero-bit that can flip to $1$, we obtain the upper bound

$$\mathrm{E}(\tilde{Z}_{t+1}-Z_{t}\mid Z_{t}) \leq \frac{1}{n}\sum_{i=1}^{\alpha n}\left(1+\frac{1}{n}\right)^{i-1} = \frac{1}{n}\cdot\frac{\left(1+\frac{1}{n}\right)^{\alpha n}-1}{1/n} \leq e^{\alpha}-1 \leq e^{\ln(2-\epsilon)}-1 = 1-\epsilon, \qquad (4)$$

where we use that $\alpha\leq\ln(2-\epsilon)$ for some constant $\epsilon>0$. Also, since $Z_{t+1}=Z_{t}$ if the mutation is rejected and we only consider flipping zero-bits, we have under $A$ (the event that $x^{\prime}$ is accepted) that

$$\mathrm{E}(Z_{t+1}-Z_{t}\mid Z_{t};A) \leq \mathrm{E}(\tilde{Z}_{t+1}-Z_{t}\mid Z_{t}) \leq 1-\epsilon. \qquad (5)$$

Note that the estimations (4) and (5) include the case that $s$ of the bits in $z_{t}$ are shared with the input string $y_{t}$ of the other linear function $g^{(2)}(y_{t})$.

We next conduct the detailed drift analysis to bound $\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t})$, considering certain events necessary for $A$. Two different cases are considered.

Case 1: at least two one-bits in $y_{t}$ flip (event $S_{1}$). Let $\tilde{Y}_{t+1}$ denote the $g^{(2)}$-value of the mutated bit string $x^{\prime}$, restricted to the bits in $B_{2}$, under event $S_{1}$ before selection. If $x^{\prime}$ is accepted, then $Y_{t+1}=\tilde{Y}_{t+1}$; otherwise $Y_{t+1}=Y_{t}$. Since $g_{i}\geq 1$ for all $i$, every zero-bit in $y_{t}$ flips to one with probability at most $1/n$, and $(1-\alpha)\leq\alpha$, we can re-use the estimations from (4). Bounding the contribution of the flipping one-bits from below by $2$, we obtain

$$\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{1}) \geq 2-\frac{1}{n}\sum_{i=1}^{(1-\alpha)n}\left(1+\frac{1}{n}\right)^{i-1} \geq 2-\frac{1}{n}\sum_{i=1}^{\alpha n}\left(1+\frac{1}{n}\right)^{i-1} \geq 2-(1-\epsilon)=1+\epsilon.$$

Along with (4), we have

$$\mathrm{E}(\phi(x_{t})-\phi(x^{\prime})\mid x_{t};S_{1}) \geq \mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{1})-\mathrm{E}(\tilde{Z}_{t+1}-Z_{t}\mid Z_{t};S_{1}) \geq 2-(1-\epsilon)-(1-\epsilon) > \epsilon.$$

Since the drift of $\phi$ is non-negative in Case 1, we estimate it from below by $0$ regardless of whether $A$ occurs or not and focus only on the event defined in the following case.

Case 2: exactly one one-bit in $y_{t}$ flips (event $S_{2}$). Let $i^{*}$ denote the random index of the flipping one-bit in $y_{t}$. Moreover, let the function $\beta(i)=\min\{j\leq i\mid w^{(2)}_{j}=w^{(2)}_{i}\}$ denote the smallest index at most $i$ with the same weight as $w^{(2)}_{i}$, i.e., $\beta(i)-1$ is the largest index of a strictly smaller weight, using our assumption that the weights are monotonically increasing with their index. If at least one zero-bit having the same or a larger weight than bit $i^{*}$ flips, neither $\ell_{2}$ nor $g^{(2)}$ change (because the offspring has the same function value or is rejected); hence, we now, without loss of generality, only consider the subevents of $S_{2}$ in which all flipping zero-bits have an index smaller than $\beta(i^{*})$. (This reasoning is similar to the analysis of Subcase 2.2.2 in the proof of Theorem 5 from [11].)

Redefining notation, let $\tilde{Y}_{t+1}$ denote the $g^{(2)}$-value of the mutated bit string $x^{\prime}$ (restricted to the bits in $B_{2}$) under event $S_{2}$ before selection. If $x^{\prime}$ is accepted, then $Y_{t+1}=\tilde{Y}_{t+1}$; otherwise $Y_{t+1}=Y_{t}$. Recalling that $A$ is the event that the mutation $x^{\prime}$ is accepted, we have by the law of total probability

$$\mathrm{E}(Y_{t}-Y_{t+1}\mid Y_{t};S_{2}) = \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};A\cap S_{2}) \geq \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{2}),$$

where the inequality holds since our estimation of $\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{2})$ below considers exactly one one-bit to flip and assumes all zero-bits to flip independently, even though steps flipping two zero-bits with index smaller than $\beta(i^{*})$ may already be rejected.

Moreover, using the law of total probability and (5),

$$\mathrm{E}(Z_{t+1}-Z_{t}\mid Z_{t};S_{2}) \leq \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}(\tilde{Z}_{t+1}-Z_{t}\mid Z_{t};S_{2})$$

and therefore

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};S_{2}) = \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}(\phi(x_{t})-\phi(x^{\prime})\mid Y_{t};A\cap S_{2}) \geq \operatorname{Pr}(A\mid S_{2})\cdot\mathrm{E}((Y_{t}-\tilde{Y}_{t+1})-(\tilde{Z}_{t+1}-Z_{t})\mid x_{t};S_{2}). \qquad (6)$$

It holds that $\operatorname{Pr}(A\mid S_{2})\geq(1-1/n)^{n-1}\geq e^{-1}$ since the mutation flipping $i^{*}$ is certainly accepted if no other bits flip. To bound the drift, we use that every zero-bit $j$ with index smaller than $\beta(i^{*})$ flips with probability $1/n$ and contributes $g^{(2)}_{j}$ to the difference $Y_{t}-\tilde{Y}_{t+1}$. Moreover, the flip of $i^{*}$ contributes the term $g^{(2)}_{i^{*}}$ to the difference. Altogether,

$$\mathrm{E}(Y_{t}-\tilde{Y}_{t+1}\mid Y_{t};S_{2}) = \left(1+\frac{1}{n}\right)^{\beta(i^{*})-1}-\frac{1}{n}\sum_{j=1}^{\beta(i^{*})-1}\left(1+\frac{1}{n}\right)^{j-1} = \left(1+\frac{1}{n}\right)^{\beta(i^{*})-1}-\frac{1}{n}\cdot\frac{\left(1+\frac{1}{n}\right)^{\beta(i^{*})-1}-1}{1/n} = 1.$$
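As a sanity check (ours, not part of the original argument), one can verify numerically that the expression above telescopes to exactly $1$ for every choice of $\beta(i^{*})$ and $n$:

```python
def case2_drift(beta, n):
    """(1 + 1/n)^(beta - 1) - (1/n) * sum_{j=1}^{beta-1} (1 + 1/n)^(j - 1);
    the geometric sum telescopes, so the result is exactly 1."""
    gain = (1 + 1 / n) ** (beta - 1)
    loss = sum((1 + 1 / n) ** (j - 1) for j in range(1, beta)) / n
    return gain - loss

print([round(case2_drift(b, 50), 10) for b in (1, 2, 10, 50)])  # all 1.0
```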

Combining this with (5), we have

$$\mathrm{E}(\phi(x_{t})-\phi(x^{\prime})\mid x_{t};S_{2}) = \mathrm{E}((Y_{t}-\tilde{Y}_{t+1})-(\tilde{Z}_{t+1}-Z_{t})\mid x_{t};S_{2}) \geq 1-(1-\epsilon)=\epsilon.$$

Altogether, using (6) and our lower bound $\operatorname{Pr}(A\mid S_{2})\geq e^{-1}$, we have the following lower bound on the drift under $S_{2}$:

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};S_{2})\geq e^{-1}\epsilon.$$

Finally, we compute the total drift considering all possible one-bits that can flip under $S_{2}$. Let $I$ be the set of one-bits in the whole bit string $x_{t}$. Since the analysis is analogous when considering an index $i\in I$, we still consider the situation that the corresponding linear function decreases or stays the same if $i\in B_{2}$, i.e., $i$ belongs to $y_{t}$, and remark that an analogous event $A^{\prime}$ with respect to the bits in $B_{1}$ and the string $z_{t}$ can be analyzed in the same way.

Now, for $i\in I$, let $F_{i}$ denote the event that bit $i$ is the only flipping one-bit in the considered part of the bit string, and let $F$ be the event that exactly one bit from $I$ flips. We have for all $i\in I$ that

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};F_{i})\geq e^{-1}\epsilon$$

and therefore also $\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};F)\geq e^{-1}\epsilon$. It is sufficient to flip one of the $\lvert I\rvert$ one-bits and no other bit to have an accepted mutation, which has probability at least $(\lvert I\rvert/n)(1-1/n)^{n-1}\geq\frac{\lvert I\rvert}{en}$. We obtain the unconditional drift

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t}) \geq \frac{\lvert I\rvert}{en}\,\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t};F) \geq \frac{\lvert I\rvert e^{-2}}{n}\,\epsilon,$$

recalling that we estimated the drift from below by $0$ if at least two one-bits flip. To conclude the proof, we relate the last bound to $\phi(x_{t})$. Clearly, since $g_{i}\leq(1+1/n)^{n-1}\leq e$ for all $i\in\{1,\dots,\alpha n\}$ and since each one-bit can contribute to both $g^{(1)}(z_{t})$ and $g^{(2)}(y_{t})$, we have $\phi(x_{t})\leq 2e\lvert I\rvert$, so that

$$\mathrm{E}(\phi(x_{t})-\phi(x_{t+1})\mid x_{t})\geq\left(\frac{e^{-3}\epsilon}{2n}\right)\phi(x_{t}).$$

Hence, we have established a multiplicative drift of the potential $\phi$ with a factor of $\delta=(e^{-3}\epsilon)/(2n)$, and we obtain the claimed $O(n\log n)$ bound on the expected optimization time via the multiplicative drift theorem (Theorem 1), using $X_{0}\leq n(1+1/(n-1))^{n}=O(n)$ and $s_{\mathrm{min}}=1$. ∎

We remark that the drift factor $(e^{-3}\epsilon)/(2n)$ from the previous proof can be improved by constant factors using a more detailed case analysis; however, since $\epsilon$ can be arbitrarily small and the final bound is in $O$-notation, this does not seem worth the effort.

5 Discussion and Conclusions

Motivated by studies on separable functions and objective functions for chance constrained problems based on the expected value and variance of solutions, we investigated the quite general setting of the sum of two transformed linear functions and established an $O(n\log n)$ bound for the (1+1) EA.

We would now like to point out some topics for further investigation. Our result from Theorem 2 has some limitations. First of all, the domains of the two linear functions may not differ very much in size; more precisely, they must be within a factor of $\alpha/(1-\alpha)\leq(\ln 2)/(1-\ln 2)\approx 2.26$ of each other. With the current pessimistic assumption that an improving mutation only improves one of the two linear functions and simultaneously may flip any bit in the other function to $1$ without the mutation being rejected, we cannot extend this to larger size differences between the domains. For the same reason, the result cannot easily be generalized to mutation probabilities $c/n$ for arbitrary constants $c>0$, as shown for the original case of simple linear functions in [11]. Although that paper also suggests a different, more powerful class of potential functions to handle high mutation probabilities, it seems difficult to apply these more powerful potential functions in the presence of our pessimistic assumptions. With stronger conditions on $\alpha$, it may be possible to extend the present results to mutation probabilities up to $(1+\epsilon)/(n+s)$ for a positive constant $\epsilon$ depending on $\alpha$. However, it would be more interesting to see whether the $O(n\log n)$ bound would also hold for mutation probability $1/n$ for all $s\geq 1$, which would include the function $g(x)$ from the chance-constrained scenario in (3) for the usual mutation probability.

Acknowledgments

This work has been supported by the Australian Research Council (ARC) through grant FT200100536 and by the Independent Research Fund Denmark through grant DFF-FNU 8021-00260B.

References

  • [1] F. Neumann, C. Witt, Bioinspired Computation in Combinatorial Optimization – Algorithms and Their Computational Complexity, Springer, 2010.
  • [2] T. Jansen, Analyzing Evolutionary Algorithms – The Computer Science Perspective, Springer, 2013.
  • [3] B. Doerr, F. Neumann (Eds.), Theory of Evolutionary Computation—Recent Developments in Discrete Optimization, Springer, 2020.
  • [4] S. Droste, T. Jansen, I. Wegener, On the analysis of the (1+1) evolutionary algorithm, Theoretical Computer Science 276 (2002) 51–81.
  • [5] B. Hajek, Hitting-time and occupation-time bounds implied by drift analysis with applications, Advances in Applied Probability 13 (3) (1982) 502–525.
  • [6] J. He, X. Yao, A study of drift analysis for estimating computation time of evolutionary algorithms, Natural Computing 3 (1) (2004) 21–35.
  • [7] J. Jägersküpper, A blend of Markov-chain and drift analysis, in: Proc. of PPSN '08, Vol. 5199 of LNCS, Springer, 2008, pp. 41–51.
  • [8] J. Jägersküpper, Combining Markov-chain analysis and drift analysis, Algorithmica 59 (3) (2011) 409–424.
  • [9] B. Doerr, D. Johannsen, C. Winzen, Multiplicative drift analysis, in: Proc. of GECCO ’10, ACM Press, 2010, pp. 1449–1456.
  • [10] B. Doerr, D. Johannsen, C. Winzen, Drift analysis and linear functions revisited, in: Proc. of CEC ’10, IEEE Press, 2010, pp. 1–8.
  • [11] C. Witt, Tight bounds on the optimization time of a randomized search heuristic on linear functions, Combinatorics, Probability & Computing 22 (2) (2013) 294–318.
  • [12] B. Doerr, L. A. Goldberg, Adaptive drift analysis, in: Proc. of PPSN ’10, Vol. 6238 of LNCS, Springer, 2010, pp. 32–41.
  • [13] B. Doerr, L. A. Goldberg, Adaptive drift analysis, Algorithmica 65 (1) (2013) 224–250.
  • [14] B. Doerr, D. Sudholt, C. Witt, When do evolutionary algorithms optimize separable functions in parallel?, in: FOGA, ACM, 2013, pp. 51–64.
  • [15] A. Charnes, W. W. Cooper, Chance-constrained programming, Management Science 6 (1) (1959) 73–79.
  • [16] Y. Xie, O. Harper, H. Assimi, A. Neumann, F. Neumann, Evolutionary algorithms for the chance-constrained knapsack problem, in: GECCO, ACM, 2019, pp. 338–346.
  • [17] Y. Xie, A. Neumann, F. Neumann, Specific single- and multi-objective evolutionary algorithms for the chance-constrained knapsack problem, in: GECCO, ACM, 2020, pp. 271–279.
  • [18] H. Assimi, O. Harper, Y. Xie, A. Neumann, F. Neumann, Evolutionary bi-objective optimization for the dynamic chance-constrained knapsack problem based on tail bound objectives, in: ECAI, Vol. 325, IOS Press, 2020, pp. 307–314.
  • [19] Y. Xie, A. Neumann, F. Neumann, Heuristic strategies for solving complex interacting stockpile blending problem with chance constraints, in: GECCO, ACM, 2021, pp. 1079–1087.
  • [20] A. Neumann, F. Neumann, Optimising monotone chance-constrained submodular functions using evolutionary multi-objective algorithms, in: PPSN (1), Springer, 2020, pp. 404–417.
  • [21] F. Neumann, A. M. Sutton, Runtime analysis of the (1 + 1) evolutionary algorithm for the chance-constrained knapsack problem, in: FOGA, ACM, 2019, pp. 147–153.
  • [22] Y. Xie, A. Neumann, F. Neumann, A. M. Sutton, Runtime analysis of RLS and the (1+1) EA for the chance-constrained knapsack problem with correlated uniform weights, in: GECCO, ACM, 2021, pp. 1187–1194.
  • [23] B. Doerr, C. Doerr, A. Neumann, F. Neumann, A. M. Sutton, Optimization of chance-constrained submodular functions, in: AAAI, AAAI Press, 2020, pp. 1460–1467.
  • [24] H. Ishii, S. Shiode, T. Nishida, Y. Namasuya, Stochastic spanning tree problem, Discrete Applied Mathematics 3 (4) (1981) 263–273.
  • [25] B. Doerr, Probabilistic tools for the analysis of randomized optimization heuristics, in: B. Doerr, F. Neumann (Eds.), Theory of Evolutionary Computation – Recent Developments in Discrete Optimization, Springer, 2020, pp. 1–87.
  • [26] B. Doerr, C. Doerr, The impact of random initialization on the runtime of randomized search heuristics, Algorithmica 75 (3) (2016) 529–553.
  • [27] R. Motwani, P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
  • [28] B. Doerr, T. Jansen, D. Sudholt, C. Winzen, C. Zarges, Mutation rate matters even when optimizing monotonic functions, Evolutionary Computation 21 (1) (2013) 1–27.
  • [29] J. Lengler, A. Martinsson, A. Steger, When does hillclimbing fail on monotone functions: An entropy compression argument, in: ANALCO, SIAM, 2019, pp. 94–102.
  • [30] B. Doerr, D. Johannsen, C. Winzen, Multiplicative drift analysis, Algorithmica 64 (4) (2012) 673–697.
  • [31] P. K. Lehre, C. Witt, Tail bounds on hitting times of randomized search heuristics using variable drift analysis, Combinatorics, Probability & Computing 30 (4) (2021) 550–569.
  • [32] F. Neumann, M. Pourhassan, C. Witt, Improved runtime results for simple randomised search heuristics on linear functions with a uniform constraint, Algorithmica 83 (10) (2021) 3209–3237.