Runtime Analysis of the (1+1) EA on Weighted Sums of Transformed Linear Functions
Abstract
Linear functions play a key role in the runtime analysis of evolutionary algorithms, and studies of this function class have provided a wide range of new insights and techniques for analyzing evolutionary computation methods. Motivated by studies on separable functions and on the optimization behaviour of evolutionary algorithms, as well as by objective functions from the area of chance constrained optimization, we study the class of objective functions that are weighted sums of two transformed linear functions. Our results show that the (1+1) EA, with a mutation rate depending on the number of overlapping bits of the functions, obtains an optimal solution for these functions in expected time $O(n \log n)$, thereby generalizing a well-known result for linear functions to a much wider range of problems.
1 Introduction
Runtime analysis is one of the major theoretical tools to provide rigorous insights into the working behavior of evolutionary algorithms and other randomized search heuristics [1, 2, 3]. The class of pseudo-Boolean linear functions plays a key role in the area of runtime analysis. Starting with the simplest linear function, called OneMax, for which the first runtime analyses were carried out, a wide range of results have been obtained for the general class of linear functions. This includes the study of Droste, Jansen and Wegener [4], who were the first to obtain an upper bound of $O(n \log n)$ for the (1+1) EA on the general class of pseudo-Boolean linear functions. This groundbreaking result was based on a very lengthy proof, and subsequently a wide range of improvements have been made, both in terms of new techniques for the analysis and in terms of the precision of the results. The proof has been simplified significantly by He and Yao [6] using the analytic framework of drift analysis [5]. Jägersküpper [7, 8] provided the first analysis of the leading coefficient in the $O(n \log n)$ bound on the optimization time for the problem. Furthermore, advances in simplifying proofs and obtaining precise results have been made using the framework of multiplicative drift [9]. Doerr, Johannsen and Winzen improved the upper bound to $(1.39 + o(1))\, e n \ln n$ [10]. Finally, Witt [11] improved this bound to $e n \ln n + O(n)$ by using adaptive drift analysis [12, 13]. We expand such investigations for the (1+1) EA to a wider class of problems that are modelled by two transformed linear functions. This includes classes of separable functions and chance constrained optimization problems.
1.1 Separable Functions
As an example, consider the separable objective function
$$f(x) \;=\; \Bigl(\sum_{i=1}^{n/2} a_i x_i\Bigr)^{2} \;+\; \sqrt{\sum_{i=n/2+1}^{n} b_i x_i} \qquad\qquad (1)$$
where $x \in \{0,1\}^{n}$, $a_i \ge 0$, and $b_i \ge 0$. The function consists of the two objective functions
$$g_1(x) = \Bigl(\sum_{i=1}^{n/2} a_i x_i\Bigr)^{2} \quad\text{and}\quad g_2(x) = \sqrt{\sum_{i=n/2+1}^{n} b_i x_i}.$$
Here $g_1$ is the square of a function that is linear in the first half of the variables, and $g_2$ is the square root of a linear function in the remaining variables. Some investigations of how evolutionary algorithms optimize separable fitness functions have been carried out in [14]. It has been shown that if the different functions only have a small range, then the (1+1) EA optimizes separable functions efficiently, provided the separable functions themselves are easy to optimize. However, in our example above the two separable functions may take on exponentially many values, but each function on its own is optimized by the (1+1) EA in time $O(n \log n)$ using the results for the (1+1) EA on linear functions. This holds as the transformations applying the square in $g_1$ or the square root in $g_2$ do not change the behavior of the (1+1) EA. The question arises whether the $O(n \log n)$ bound also holds for the function $f$ which combines $g_1$ and $g_2$. We investigate this setting of separable functions in the more general case where the objective function is given as a weighted sum of two separable transformed linear functions. For technical reasons, we consider a (1+1) EA with a potentially reduced mutation probability depending on the number of overlapping bits of the two functions.
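To illustrate the structure of Equation (1), the following minimal Python sketch builds such a separable objective; the function name and the concrete random weights are ours and only serve as an example of non-negative weights.

```python
import math
import random

def make_separable_objective(a, b):
    """f(x) = (weighted sum over the first half)^2 + sqrt(weighted sum over the second half).

    a, b: non-negative weights for the first and second half of the bit string.
    Both transformations (square, square root) are monotone, so they do not
    change the behavior of the (1+1) EA on each half alone."""
    half = len(a)
    def f(x):
        g1 = sum(ai * xi for ai, xi in zip(a, x[:half]))
        g2 = sum(bi * xi for bi, xi in zip(b, x[half:]))
        return g1 ** 2 + math.sqrt(g2)
    return f

random.seed(1)
n = 10
a = [random.randint(1, 5) for _ in range(n // 2)]
b = [random.randint(1, 5) for _ in range(n // 2)]
f = make_separable_objective(a, b)
print(f([0] * n), f([1] * n))  # under minimization, the all-zeros string is optimal
```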
1.2 Chance Constrained Problems
Another motivation for our work comes from the area of chance constrained optimization [15] and concerns the case where the two functions are overlapping or are even defined on the same set of variables. Recently, evolutionary algorithms have been used for chance constrained problems, which further motivates our investigations. In a chance constrained setting, the input involves stochastic components and the goal is to optimize a given objective function under the condition that constraints are met with high probability or that function values are guaranteed with a high probability. Evolutionary algorithms have been designed for the chance constrained knapsack problem [16, 17, 18], chance constrained stockpile blending problems [19], and chance constrained submodular functions [20].
Runtime analysis results have been obtained for restricted settings of the knapsack problem [21, 22] where the weights are stochastic and the constraint bound has to be met with high probability. The analyses for the case of stochastic constraints for the class of submodular functions [23] and for the knapsack problem [16] already reveal constraint functions that are a linear combination of the expected weight and the standard deviation of a solution when Chebyshev's inequality is used for constraint evaluation. Such functions are the subject of our investigations.
To make the type of problems that we are interested in clear, we state the following problem. We are given a set of $n$ items with random weights $w_1, \ldots, w_n$. We assume that the weights are independent and that each $w_i$ is distributed according to a normal distribution $N(\mu_i, \sigma_i^2)$, $1 \le i \le n$. We assume $\mu_i \ge 0$ and $\sigma_i^2 \ge 0$, $1 \le i \le n$. Our goal is to
$$\min W \quad \text{s.t.} \quad \Pr\Bigl(\sum_{i=1}^{n} w_i x_i \le W\Bigr) \ge \alpha \qquad\qquad (2)$$
where $x \in \{0,1\}^n$, $W \in \mathbb{R}$, and $1/2 \le \alpha < 1$. The problem given in Equation (2) is usually considered under additional constraints, e.g. spanning tree constraints in [24], which we do not consider in this paper.
According to [24], the problem given in Equation (2) is equivalent to minimizing the fitness function
$$\sum_{i=1}^{n} \mu_i x_i \;+\; K_\alpha \sqrt{\sum_{i=1}^{n} \sigma_i^2 x_i} \qquad\qquad (3)$$
where $K_\alpha$ is the $\alpha$-fractile point of the standard normal distribution.
The fitness function given in Equation (3) is a linear combination of the expected value of a solution, which is a linear function, and the square root of its variance, where the variance is again a linear function. In order to understand the behaviour of evolutionary algorithms on fitness functions obtained for chance constrained optimization problems, our runtime analysis for the (1+1) EA covers such fitness functions if we assume the reduced mutation probability mentioned above.
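As a concrete illustration of Equation (3), the following Python sketch evaluates the surrogate fitness for given expected values and variances; the helper name and the example data are our own, and we use the standard library's `NormalDist` to obtain the $\alpha$-fractile point.

```python
import math
from statistics import NormalDist

def chance_constrained_fitness(x, mu, sigma2, alpha):
    """Expected weight plus K_alpha times the standard deviation of the
    chosen items, in the spirit of Equation (3); x is a 0/1 list, mu and
    sigma2 hold the expected values and variances of the item weights."""
    k_alpha = NormalDist().inv_cdf(alpha)               # alpha-fractile of N(0, 1)
    expected = sum(m * xi for m, xi in zip(mu, x))      # linear in x
    variance = sum(s * xi for s, xi in zip(sigma2, x))  # linear in x
    return expected + k_alpha * math.sqrt(variance)

# Illustrative data (chosen by us): four items, alpha = 0.95.
mu = [3.0, 1.5, 2.0, 4.0]
sigma2 = [0.5, 0.2, 1.0, 0.3]
print(chance_constrained_fitness([1, 0, 1, 0], mu, sigma2, alpha=0.95))
```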
1.3 Transformed Linear Functions
In our investigations, we consider the much wider class of problems where a given fitness function is obtained as the weighted sum of two transformed linear functions. The transformations applied to the linear functions only have to be monotonically increasing in the function values of the linear functions. This includes the settings of separable functions and chance constrained problems described previously. Furthermore, we do not require that the two linear functions are defined on the same number of bits.
The main result of our paper is an $O(n \log n)$ upper bound for the (1+1) EA with mutation probability $1/(n+\ell)$ on the class of sums of two transformed linear functions, where $n$ is the number of bits of the given objective function and $\ell$ is the number of bits for which the two linear functions overlap. This directly transfers to the separable problem type given in Equation (1), with standard bit mutation probability $1/n$ as there is no overlap, and to the chance constraint formulation given in Equation (3), with mutation probability $1/(2n)$ as the two linear functions overlap completely.
The outline of the paper is as follows. In Section 2, we formally introduce the problem formulation for which we analyze the (1+1) EA in this paper. We discuss the exclusion of negative weights in our setup in Section 3 and present the $O(n \log n)$ upper bound in Section 4. Finally, we finish with some discussion and conclusions.
2 Preliminaries
The (1+1) EA shown in Algorithm 1 (generalized with a mutation probability parameter discussed below; classically, a mutation probability of $1/n$ on $n$ bits is assumed) is a simple evolutionary algorithm using independent bit flips and elitist selection. It is very well studied in the theory of evolutionary computation [25] and serves as a stepping stone towards the analysis of more complicated evolutionary algorithms. As is common in the area of runtime analysis, we measure the runtime of the (1+1) EA by the number of iterations of its repeat loop. The optimization time refers to the number of fitness evaluations until an optimal solution has been obtained for the first time, and the expected optimization time refers to the expectation of this value.
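For concreteness, a minimal Python sketch of the (1+1) EA for minimization follows; the function and parameter names are ours, and `p` is the bit-flip probability (classically $1/m$ for $m$ bits, and a possibly smaller, overlap-dependent value in our setting).

```python
import random

def one_plus_one_ea(f, m, p, iterations=100_000):
    """(1+1) EA minimizing a pseudo-Boolean function f on {0,1}^m with
    standard bit mutation (flip each bit independently with probability p)
    and elitist selection."""
    x = [random.randint(0, 1) for _ in range(m)]   # uniform random initialization
    fx = f(x)
    for _ in range(iterations):
        y = [1 - b if random.random() < p else b for b in x]  # mutation
        fy = f(y)
        if fy <= fx:                               # accept if not worse
            x, fx = y, fy
    return x, fx
```

In experiments one would typically stop as soon as a known optimum is reached; the iteration cap above only bounds the number of steps counted as the runtime.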
2.1 Sums of Two Transformed Linear Functions without Constraints
We will study the (1+1) EA on the scenario given in (1) and (3), assuming no additional constraints. In fact, we will generalize the scenario to the sum of two transformed pseudo-Boolean linear functions which may be (partially) overlapping. Note that in (1) there is no overlap on the domains of the two linear functions and the transformations are the square and the square root, whereas in (3) there is complete overlap on the domains and the transformations are the identity function and the square root.
The crucial observation in our analysis is that the scenario considered here extends the linear function problem [11] that is heavily investigated in the theory of evolutionary algorithms. Despite the simple structure of the problem, there is no clear fitness-distance correlation in the linear function problem, which makes the analysis of the global search operator of the (1+1) EA difficult. If only local mutations are used, leading to the well-known randomized local search (RLS) algorithm [26], then both the linear function problem and the generalized scenario considered here are very easy to analyze using standard coupon collector arguments [27], leading to an expected optimization time of $O(n \log n)$. For the globally searching (1+1) EA, we will obtain the same bound, proving that the problem is easy to solve for it; however, we need advanced drift analysis methods to prove this.
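To recall why the coupon collector argument applies, consider, as a simplifying assumption of ours, strictly positive weights and strictly increasing transformations. Then RLS, which flips exactly one uniformly chosen bit per step and accepts the offspring if it is not worse, never turns a zero-bit into a one-bit in an accepted step, and when $k$ one-bits remain, the probability of removing one of them is $k/n'$ for an $n'$-bit function. This gives
$$\mathrm{E}[T_{\mathrm{RLS}}] \;\le\; \sum_{k=1}^{n'} \frac{n'}{k} \;=\; n' H_{n'} \;=\; O(n \log n),$$
since $n' \le n$, where $H_{n'}$ denotes the $n'$-th harmonic number.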
We note that the class of functions we consider falls within the more general class of so-called monotone functions. Such functions can be difficult to optimize with a (1+1) EA using mutation probabilities larger than $c/n$ for a sufficiently large constant $c$ [28]; however, it is also known that the (1+1) EA with mutation probability $c/n$ for a constant $c \le 1$, as considered here, optimizes all monotone functions in expected time $O(n \log^2 n)$ [29]. Our bound is by an asymptotic factor of $\log n$ better if $\ell = o(n)$. However, it should be noted that for $\ell = \Omega(n)$, the $O(n \log n)$ bound already follows directly from [28] since our setting then corresponds to a mutation probability of $c/n$ for a constant $c < 1$. In fact, the fitness function arising from the chance-constrained scenario presented in (3) above would fall into the case $\ell = \Omega(n)$.
Set-up.
We will investigate a general optimization scenario involving two linear pseudo-Boolean functions in an unconstrained search space. The objective function is an arbitrarily weighted sum of monotone transformations of two linear functions defined on (possibly overlapping) subsets of the bit positions, where $\ell$ denotes the number of shared bits. Note that the introduction of this paper mentions a search space of dimension $n$ and a mutation probability of $1/(n+\ell)$ for the (1+1) EA. While the former perspective is more natural to present, from now on, we consider the asymptotically equivalent setting in which $n$ denotes the total number of bits the two linear functions depend on, counting shared bits twice, so that the search space is $\{0,1\}^{n-\ell}$ and the mutation probability is $1/n$; this eases notation in the upcoming calculations.
Let $\beta \ge 1/2$ be a constant such that $\beta \le 1-\epsilon$ for some constant $\epsilon > 0$ and assume that $\beta n$ is an integer. We allow the subfunctions to depend on a number of bits in $[(1-\beta)n, \beta n]$, including the balanced case that both subfunctions depend on exactly $n/2$ bits. Formally, we have
• linear functions $f_1(y) = w_1 y_1 + \cdots + w_{n_1} y_{n_1}$, where $n_1 \in [(1-\beta)n, \beta n]$, and similarly $f_2(z) = v_1 z_1 + \cdots + v_{n_2} z_{n_2}$ with non-negative weights $w_1, \ldots, w_{n_1}$ and $v_1, \ldots, v_{n_2}$, where $n_2 = n - n_1$;
• index sets $I_1 \subseteq \{1, \ldots, n-\ell\}$ with $|I_1| = n_1$ and $I_2 \subseteq \{1, \ldots, n-\ell\}$ with $|I_2| = n_2$, denoting the bit positions that $f_1$ resp. $f_2$ are defined on in the actual objective function $f$;
• the overlap count $\ell = |I_1 \cap I_2|$, where $I_1 \cup I_2 = \{1, \ldots, n-\ell\}$;
• the linear functions with extended domain $g_1(x) = \sum_{i \in I_1} w_{r_1(i)} x_i$ for $x \in \{0,1\}^{n-\ell}$, where $r_1(i)$ is the rank of $i$ in $I_1$ (with the smallest number receiving rank number $1$), and analogously $g_2(x) = \sum_{i \in I_2} v_{r_2(i)} x_i$; note that $g_1$ and $g_2$ only depend essentially on $n_1$ and $n_2$ bits, respectively;
• monotone increasing functions $h_1 \colon \mathbb{R} \to \mathbb{R}$ and $h_2 \colon \mathbb{R} \to \mathbb{R}$.
Then the objective function $f \colon \{0,1\}^{n-\ell} \to \mathbb{R}$, which w. l. o. g. is to be minimized, is given by
$$f(x) \;=\; h_1(g_1(x)) + h_2(g_2(x)).$$
For $\ell = 0$, $h_1$ being the square function, and $h_2$ being the square root function, this matches the setting of separable functions given in Equation (1). This set-up also includes the case that
$$f(x) \;=\; g_1(x) + K_\alpha \sqrt{g_2(x)}$$
for two $(n/2)$-dimensional, completely overlapping linear functions $g_1$ and $g_2$ and an arbitrary factor $K_\alpha \ge 0$, as motivated and given in Equation (3). Note that this matches our set-up with $\ell = n/2$, $h_1$ being the identity function, and $h_2(s) = K_\alpha \sqrt{s}$.
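The following Python sketch instantiates the set-up above for illustration: two weight vectors, two (possibly overlapping) index sets and two monotone transformations define the objective $f(x) = h_1(g_1(x)) + h_2(g_2(x))$. All names and the example data are ours.

```python
import math

def make_objective(w1, I1, h1, w2, I2, h2):
    """f(x) = h1(g1(x)) + h2(g2(x)) with g_j(x) the weighted sum of the bits
    selected by the index set I_j; h1 and h2 must be monotone increasing."""
    def f(x):
        g1 = sum(w * x[i] for w, i in zip(w1, I1))
        g2 = sum(w * x[i] for w, i in zip(w2, I2))
        return h1(g1) + h2(g2)
    return f

# Completely overlapping instance in the spirit of Equation (3):
mu = [3.0, 1.5, 2.0, 4.0]          # expected values
sigma2 = [0.5, 0.2, 1.0, 0.3]      # variances
idx = [0, 1, 2, 3]
f = make_objective(mu, idx, lambda s: s,                         # h1 = identity
                   sigma2, idx, lambda s: 1.645 * math.sqrt(s))  # h2 = K_alpha * sqrt
print(f([1, 0, 1, 0]))
```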
3 Negative Weights Allow for Multimodal Functions
We will now justify that the inclusion of negative weights in the underlying linear functions, along with overlapping domains, can lead to multimodal problems that cannot be optimized in expected time $O(n \log n)$ any longer. In the following example, the two linear functions depend essentially on all bits.
Let $f(x) = h_1(f_1(x)) + h_2(f_2(x))$ be composed of two linear functions $f_1, f_2$ on all $n$ bits and two monotone transformations defined as follows. Basically, the first linear function $f_1$ is a OneMax function except for the first bit, which has a smaller weight than the rest, and $h_1$ is the identity. The second linear function $f_2$ is linear in the number of zeros, i.e., corresponds to the ZeroMax function that is equivalent to OneMax for the (1+1) EA due to symmetry reasons. The transformation $h_2$ that is applied to ZeroMax divides the number of zero-bits by a normalizing term slightly below $n$ and raises the result to a large power. Essentially, the value of $h_2(f_2(x))$ is small if $x$ contains at least one one-bit and large otherwise. This puts a constraint on the number of zero-bits. If $x$ contains at least one one-bit, then $f$ is essentially monotone increasing in the $f_1$-value, i.e., search points decreasing the $f_1$-value also decrease the $f$-value. However, the all-zeros string has the largest $f$-value, i.e., is worst.
We can now see that all search points having exactly one one-bit at one of the positions $2, \ldots, n$ are local optima. To create the global optimum, which sets only the first bit to one, from such a point, two bits have to flip simultaneously, leading to an expected time of $\Theta(n^2)$ to reach the optimum from this point. The situation is similar to the optimization of linear functions under a uniform constraint [32].
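Since the exact constants of the example are not essential, the following Python sketch uses constants of our own choosing (first-bit weight $0.9$ and exponent $n$ in the transformation) that are consistent with the description above; it verifies numerically that a search point with a single one-bit at a position $i \ge 2$ is a strict local optimum, while the global optimum has its single one-bit at position $1$.

```python
def f1(x):                 # OneMax-like linear function; the first bit has a smaller weight
    return 0.9 * x[0] + sum(x[1:])

def h2_of_f2(x):           # transformed ZeroMax: (number of zeros / (n - 1)) ** n
    n = len(x)
    return ((n - sum(x)) / (n - 1)) ** n

def f(x):                  # combined objective, to be minimized
    return f1(x) + h2_of_f2(x)

n = 20
local_opt = [0] * n
local_opt[5] = 1           # exactly one one-bit at a position >= 2
# every single-bit flip strictly increases f, so two bits must flip to escape
assert all(f(local_opt[:i] + [1 - local_opt[i]] + local_opt[i + 1:]) > f(local_opt)
           for i in range(n))
global_opt = [1] + [0] * (n - 1)
print(f(local_opt), f(global_opt))  # the global optimum has a strictly smaller f-value
```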
4 Upper Bound
The following theorem is the main result of this paper, showing that the (1+1) EA can optimize the generalized class of functions in asymptotically the same time as an ordinary linear function.
Theorem 2.
Let $f$ be the sum of two transformed linear functions as defined in the set-up in Section 2.1. Then the expected optimization time of the (1+1) EA on $f$ is $O(n \log n)$.
The proof of Theorem 2 uses drift analysis with a carefully defined potential function, explained in the following.
Potential function.
We build upon the approach from [11] to construct a potential function $\phi_1$ for $g_1$ and a potential function $\phi_2$ for $g_2$, resulting in a combined potential function $\phi = \phi_1 + \phi_2$. The individual potential functions are obtained in the same way as if the (1+1) EA with mutation probability $1/n$ was only optimizing $f_1$ and $f_2$, respectively, on an $n_1$-dimensional and an $n_2$-dimensional search space, respectively. The key idea is that an accepted step of the (1+1) EA on $f$ must not worsen at least one of the two functions $g_1$ and $g_2$. This event leads to a high enough drift of the respective potential function, which is still positive after pessimistically incorporating the potential loss due to flipping zero-bits that only the other linear function depends on.
We proceed with the definition of the potential functions $\phi_1$ and $\phi_2$ (similarly to Section 5 in [11]). For the two underlying linear functions we assume that their arguments are reordered according to increasing weights. Note that we cannot necessarily sort the set of all indices of $f$ so that both underlying linear functions have increasing coefficients; however, as we analyze the underlying functions separately, we can each time use the required sorting in these separate considerations.
Definition 1.
Given a linear function $w_1 x_1 + \cdots + w_m x_m$, where $w_1 \le \cdots \le w_m$, we define the potential function $\phi(x) = \sum_{i=1}^{m} \phi_i x_i$ by choosing the coefficients $\phi_1, \ldots, \phi_m$ as in the construction of Section 5 of [11].
In our scenario, $\phi_1$ is the potential function obtained from applying this construction to the $n_1$-dimensional linear function $f_1$, and we proceed accordingly with the $n_2$-dimensional function $f_2$ to obtain $\phi_2$. Finally, we define $\phi = \phi_1 + \phi_2$, where $\phi_1$ and $\phi_2$ are evaluated on the bits in $I_1$ and $I_2$, respectively.
We can now give the proof of our main theorem.
Proof of Theorem 2.
Using the potential function $\phi$ from Definition 1, we analyze the (1+1) EA on $f$, assume an arbitrary, non-optimal search point $x_t$ at time $t$ and consider the expected change of $\phi$ from time $t$ to time $t+1$. We consider an accepted step where the offspring differs from the parent since this is necessary for $\phi$ to change. That is, at least one bit flips and $f$ does not grow. Let $A$ be the event that such an offspring is accepted. For $A$ to occur, it is necessary that at least one of the two functions $g_1$ and $g_2$ does not grow. Since the two cases can be handled in an essentially symmetrical manner (they become perfectly symmetrical for $n_1 = n_2$), we only analyze the case that $g_1$ does not grow and that at least one bit in $I_1$ is flipped from $1$ to $0$. Hence, we consider exactly the situation that the (1+1) EA with the linear function $f_1$ as an $n_1$-bit fitness function produces an offspring that is accepted and different from the parent.
Let $X_t = \phi_1(\tilde{x}_t)$, where $\tilde{x}_t$ is the restriction of the search point at time $t$ to the bits in $I_1$ that $g_1$ depends on, assuming the indices to be reordered with respect to increasing coefficients of $f_1$. To compute the drift of $X_t$, we distinguish between several cases and events in a way similar to the proof of Th. 5.1 in [11]. Each of these cases first bounds the drift of $X_t$ sufficiently precisely and then adds a pessimistic estimate of the drift of $Y_t$, the analogously defined $\phi_2$-value on the bits in $I_2$, i.e., of the function whose value may grow under the event $A$. Note that $g_2$ may depend on more bits than $g_1$ does.
Since the estimate of the drift of $Y_t$ is always the same, we present it first. Let $Y'$ denote the $\phi_2$-value of the mutated bit string (restricted to the bits in $I_2$). If the offspring is accepted, then $Y_{t+1} = Y'$; otherwise $Y_{t+1} = Y_t$. If we pessimistically assume that each bit in $I_2$ (i.e., each bit of the restriction of $x_t$ to the bits in $I_2$) is a zero-bit that can flip to $1$, we obtain the upper bound
(4)
where we use that $n_2 \le \beta n$ for the constant $\beta < 1$. Also, since $Y_{t+1} = Y_t$ if the mutation is rejected and we only consider flipping zero-bits, we have under $A$ (the event that the offspring is accepted) that
(5)
Note that the estimations (4) and (5) include the case that $\ell$ of the bits in $I_2$ are shared with the input string of the other linear function $g_1$.
We next conduct the detailed drift analysis to bound $\mathrm{E}[X_t - X_{t+1}]$, considering certain events that are necessary for $A$. Two different cases are considered.
Case 1: at least two one-bits in $I_1$ flip (event $A_1$). Let $X'$ denote the $\phi_1$-value of the mutated bit string, restricted to the bits in $I_1$, under event $A_1$ before selection. If the offspring is accepted, then $X_{t+1} = X'$; otherwise $X_{t+1} = X_t$. Since every zero-bit in $I_1$ flips to one with probability at most $1/n$, we can re-use the estimations from (4). Bounding the contribution of the at least two flipping one-bits from below, we obtain
Along with (4), we have
Since the drift bound obtained in Case 1 is non-negative, we estimate the drift from below by $0$ regardless of whether $A_1$ occurs or not and focus only on the event $A_2$ defined in the following case.
Case 2: exactly one one-bit in $I_1$ flips (event $A_2$). Let $I$ denote the random index of the flipping one-bit in $I_1$. Moreover, let $s(I)$ denote the smallest index at most $I$ with the same weight as $w_I$, i.e., $s(I)-1$ is the largest index of a strictly smaller weight; here we use our assumption that the weights are monotonically increasing with their index. If at least one zero-bit having the same or a larger weight than bit $I$ flips, neither the $f_1$-value nor the $\phi_1$-value changes (because the offspring has the same function value or is rejected); hence, we now, without loss of generality, only consider the subevents of $A_2$ where all flipping zero-bits have an index of at most $s(I)-1$. (This reasoning is similar to the analysis of Subcase 2.2.2 in the proof of Th. 5 from [11].)
Redefining notation, let $X'$ denote the $\phi_1$-value of the mutated bit string (restricted to the bits in $I_1$) under event $A_2$ before selection. If the offspring is accepted, then $X_{t+1} = X'$; otherwise $X_{t+1} = X_t$. Recalling that $A$ is the event that the mutation is accepted, we have by the law of total probability
where the inequality holds since our estimation of $\mathrm{E}[X_t - X' \mid A_2]$ below will consider exactly one one-bit to flip and assume all zero-bits to flip independently, even though steps flipping two or more zero-bits may already be rejected.
It holds that $\Pr(A \mid A_2) = \Omega(1)$ since the mutation flipping $I$ is certainly accepted if no other bits flip. To bound the drift, we use that each of the considered flipping zero-bits flips with probability $1/n$ and contributes $-\phi_j$ to the difference $X_t - X'$. Moreover, the flip of $I$ contributes the term $\phi_I$ to the difference. Altogether,
Combining this with (5), we have
Altogether, using (6) and our lower bound on $\Pr(A \mid A_2)$, we have the following lower bound on the drift under $A_2$:
Finally, we compute the total drift considering all possible one-bits that can flip under $A$. Let $S$ be the set of one-bits in the whole bit string $x_t$. Since the analysis is analogous when considering an index $j \in S$, we still consider the situation that the corresponding linear function $g_1$ decreases or stays the same if $j$ belongs to $I_1$, and remark that an analogous event with respect to the bits in $I_2$ and the string restricted to these bits can be analyzed in the same way.
Now, for $j \in S$, let $A_2^{(j)}$ denote the event that bit $j$ is the only flipping one-bit in the considered part of the bit string and let $A^*$ be the event that exactly one bit from $S$ flips. We have for all $j \in S$ that
and therefore also the corresponding bound for the sum over all $j \in S$. It is sufficient to flip one of the one-bits and no other bit to have an accepted mutation, which has probability at least $|S|/(en)$. We obtain the unconditional drift
recalling that we estimated the drift from below by $0$ if at least two one-bits flip. To conclude the proof, we relate the last bound to $\phi(x_t)$. Clearly, since the potential coefficients are bounded from above by a constant and each one-bit can contribute to both $X_t$ and $Y_t$, we have $X_t + Y_t = O(|S|)$ so that
Hence, we have established a multiplicative drift of the potential $\phi$ with a factor of $\Omega(1/n)$ and we obtain the claimed bound of $O(n \log n)$ on the expected optimization time via the multiplicative drift theorem (Theorem 1), using that the initial potential is $O(n)$ and that its minimum positive value is at least $1$. ∎
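For the reader's convenience, we recall the multiplicative drift theorem in its standard form [9, 30] (the precise variant invoked as Theorem 1 is not restated in this section): if a non-negative stochastic process $(X_t)_{t \ge 0}$ with minimum positive value $x_{\min}$ satisfies $\mathrm{E}[X_t - X_{t+1} \mid X_t = s] \ge \delta s$ for all $s > 0$, then the expected hitting time $T$ of $0$ satisfies
$$\mathrm{E}[T \mid X_0] \;\le\; \frac{1 + \ln(X_0 / x_{\min})}{\delta}.$$
With $\delta = \Omega(1/n)$ as derived above and, as used in the proof, an initial potential of $O(n)$ and $x_{\min} \ge 1$, this yields the claimed $O(n \log n)$ bound.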
We remark that the drift factor $\Omega(1/n)$ from the previous proof can be improved by constant factors using a more detailed case analysis; however, since the constant $\epsilon$ from the set-up can be arbitrarily small and the final bound is in $O$-notation, this does not seem worth the effort.
5 Discussion and Conclusions
Motivated by studies on separable functions and objective functions for chance constrained problems based on the expected value and variance of solutions, we investigated the quite general setting of the sum of two transformed linear functions and established an $O(n \log n)$ bound for the (1+1) EA.
We now would like to point out some topics for further investigations. Our result from Theorem 2 has some limitations. First of all, the domains of the two linear functions may not differ very much in size; more precisely, their sizes must be within a constant factor of each other. With the current pessimistic assumption that an improving mutation only improves one of the two linear functions and simultaneously may flip any bit in the other function to $1$ without the mutation being rejected, we cannot improve this to larger size differences for the domains. For the same reason, the result cannot easily be generalized to mutation probabilities $c/n$ for arbitrary constants $c > 0$, as shown for the original case of simple linear functions in [11]. Although that paper also suggests a different, more powerful class of potential functions to handle high mutation probabilities, it seems difficult to apply these more powerful potential functions in the presence of our pessimistic assumptions. With stronger conditions on the overlap $\ell$, it may be possible to extend the present results to mutation probabilities up to $c/n$ for a positive constant $c$ depending on $\ell$. However, it would be more interesting to see whether the $O(n \log n)$ bound would also hold for the standard mutation probability, i.e., the reciprocal of the number of bits of $f$, for all $\ell$, which would include the function from the chance-constrained scenario in (3) for the usual mutation probability.
Acknowledgments
This work has been supported by the Australian Research Council (ARC) through grant FT200100536 and by the Independent Research Fund Denmark through grant DFF-FNU 8021-00260B.
References
- [1] F. Neumann, C. Witt, Bioinspired Computation in Combinatorial Optimization – Algorithms and Their Computational Complexity, Springer, 2010.
- [2] T. Jansen, Analyzing Evolutionary Algorithms – The Computer Science Perspective, Springer, 2013.
- [3] B. Doerr, F. Neumann (Eds.), Theory of Evolutionary Computation—Recent Developments in Discrete Optimization, Springer, 2020.
- [4] S. Droste, T. Jansen, I. Wegener, On the analysis of the (1+1) evolutionary algorithm, Theoretical Computer Science 276 (2002) 51–81.
- [5] B. Hajek, Hitting-time and occupation-time bounds implied by drift analysis with applications, Advances in Applied Probability 13 (3) (1982) 502–525.
- [6] J. He, X. Yao, A study of drift analysis for estimating computation time of evolutionary algorithms, Natural Computing 3 (1) (2004) 21–35.
- [7] J. Jägersküpper, A blend of Markov-chain and drift analysis, in: Proc. of PPSN ’08, Vol. 5199 of LNCS, Springer, 2008, pp. 41–51.
- [8] J. Jägersküpper, Combining Markov-chain analysis and drift analysis, Algorithmica 59 (3) (2011) 409–424.
- [9] B. Doerr, D. Johannsen, C. Winzen, Multiplicative drift analysis, in: Proc. of GECCO ’10, ACM Press, 2010, pp. 1449–1456.
- [10] B. Doerr, D. Johannsen, C. Winzen, Drift analysis and linear functions revisited, in: Proc. of CEC ’10, IEEE Press, 2010, pp. 1–8.
- [11] C. Witt, Tight bounds on the optimization time of a randomized search heuristic on linear functions, Combinatorics, Probability & Computing 22 (2) (2013) 294–318.
- [12] B. Doerr, L. A. Goldberg, Adaptive drift analysis, in: Proc. of PPSN ’10, Vol. 6238 of LNCS, Springer, 2010, pp. 32–41.
- [13] B. Doerr, L. A. Goldberg, Adaptive drift analysis, Algorithmica 65 (1) (2013) 224–250.
- [14] B. Doerr, D. Sudholt, C. Witt, When do evolutionary algorithms optimize separable functions in parallel?, in: FOGA, ACM, 2013, pp. 51–64.
- [15] A. Charnes, W. W. Cooper, Chance-constrained programming, Management Science 6 (1) (1959) 73–79.
- [16] Y. Xie, O. Harper, H. Assimi, A. Neumann, F. Neumann, Evolutionary algorithms for the chance-constrained knapsack problem, in: GECCO, ACM, 2019, pp. 338–346.
- [17] Y. Xie, A. Neumann, F. Neumann, Specific single- and multi-objective evolutionary algorithms for the chance-constrained knapsack problem, in: GECCO, ACM, 2020, pp. 271–279.
- [18] H. Assimi, O. Harper, Y. Xie, A. Neumann, F. Neumann, Evolutionary bi-objective optimization for the dynamic chance-constrained knapsack problem based on tail bound objectives, in: ECAI, Vol. 325, IOS Press, 2020, pp. 307–314.
- [19] Y. Xie, A. Neumann, F. Neumann, Heuristic strategies for solving complex interacting stockpile blending problem with chance constraints, in: GECCO, ACM, 2021, pp. 1079–1087.
- [20] A. Neumann, F. Neumann, Optimising monotone chance-constrained submodular functions using evolutionary multi-objective algorithms, in: PPSN (1), Springer, 2020, pp. 404–417.
- [21] F. Neumann, A. M. Sutton, Runtime analysis of the (1 + 1) evolutionary algorithm for the chance-constrained knapsack problem, in: FOGA, ACM, 2019, pp. 147–153.
- [22] Y. Xie, A. Neumann, F. Neumann, A. M. Sutton, Runtime analysis of RLS and the (1+1) EA for the chance-constrained knapsack problem with correlated uniform weights, in: GECCO, ACM, 2021, pp. 1187–1194.
- [23] B. Doerr, C. Doerr, A. Neumann, F. Neumann, A. M. Sutton, Optimization of chance-constrained submodular functions, in: AAAI, AAAI Press, 2020, pp. 1460–1467.
- [24] H. Ishii, S. Shiode, T. Nishida, Y. Namasuya, Stochastic spanning tree problem, Discrete Applied Mathematics 3 (4) (1981) 263–273.
- [25] B. Doerr, Probabilistic tools for the analysis of randomized optimization heuristics, in: B. Doerr, F. Neumann (Eds.), Theory of Evolutionary Computation – Recent Developments in Discrete Optimization, Springer, 2020, pp. 1–87.
- [26] B. Doerr, C. Doerr, The impact of random initialization on the runtime of randomized search heuristics, Algorithmica 75 (3) (2016) 529–553.
- [27] R. Motwani, P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995.
- [28] B. Doerr, T. Jansen, D. Sudholt, C. Winzen, C. Zarges, Mutation rate matters even when optimizing monotonic functions, Evol. Comput. 21 (1) (2013) 1–27.
- [29] J. Lengler, A. Martinsson, A. Steger, When does hillclimbing fail on monotone functions: An entropy compression argument, in: ANALCO, SIAM, 2019, pp. 94–102.
- [30] B. Doerr, D. Johannsen, C. Winzen, Multiplicative drift analysis, Algorithmica 64 (4) (2012) 673–697.
- [31] P. K. Lehre, C. Witt, Tail bounds on hitting times of randomized search heuristics using variable drift analysis, Combinatorics, Probability & Computing 30 (4) (2021) 550–569.
- [32] F. Neumann, M. Pourhassan, C. Witt, Improved runtime results for simple randomised search heuristics on linear functions with a uniform constraint, Algorithmica 83 (10) (2021) 3209–3237.