
Approximation Guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II)

Weijie Zheng
School of Computer Science and Technology
International Research Institute for Artificial Intelligence
Harbin Institute of Technology
   Benjamin Doerr
Laboratoire d’Informatique (LIX)
École Polytechnique, CNRS
Institut Polytechnique de Paris
Palaiseau, France
Corresponding author.
Abstract

Recent theoretical works have shown that the NSGA-II efficiently computes the full Pareto front when the population size is large enough. In this work, we study how well it approximates the Pareto front when the population size is smaller.

For the OneMinMax benchmark, we point out situations in which the parents and offspring cover well the Pareto front, but the next population has large gaps on the Pareto front. Our mathematical proofs suggest as reason for this undesirable behavior that the NSGA-II in the selection stage computes the crowding distance once and then removes individuals with smallest crowding distance without considering that a removal increases the crowding distance of some individuals.

We then analyze two variants not prone to this problem. For the NSGA-II that updates the crowding distance after each removal (Kukkonen and Deb (2006)) and the steady-state NSGA-II (Nebro and Durillo (2009)), we prove that the gaps in the Pareto front are never more than a small constant factor larger than the theoretical minimum. This is the first mathematical work on the approximation ability of the NSGA-II and the first runtime analysis for the steady-state NSGA-II. Experiments also show the superior approximation ability of the two NSGA-II variants.

1 Introduction

While the theory of evolutionary algorithms (EAs), in particular, the mathematical runtime analysis, has made substantial progress in the last 25 years [NW10, AD11, Jan13, ZYQ19, DN20], the rigorous understanding of multi-objective EAs (MOEAs), started 20 years ago [LTZ+02, Gie03, LTZ04], is less developed and is massively lagging behind their success in practice. However, in the last years some significant progress has been made, for example [BQT18, RNNF19, QYT+19, QBF20, BFQY20, ZD23c, Cra21, DDHW23]. In particular, now the first analyses of MOEAs that are massively used in practice have appeared, namely for the MOEA/D [LZZZ16], the SMS-EMOA [BZLQ23], and most notably the NSGA-II [ZLD22] (see [ZD23a] for the journal version), the by far dominant algorithm in practice [ZQL+11].

The analysis of the NSGA-II [ZLD22] proved that several variants of this algorithm can compute the full Pareto front of the OneMinMax and LOTZ benchmarks efficiently when the population size is chosen by a constant factor larger than the size of the Pareto front (which is n+1 for these two problems in \{0,1\}^{n}). However, it was also proven (for the OneMinMax problem with problem size n) that a population size strictly larger than the Pareto front is necessary – if these two sizes are only equal, then with probability 1-\exp(-\Omega(n)) for an exponential number of iterations the population of the NSGA-II does not cover a constant fraction of the Pareto front. Experiments show that this fraction is roughly 20% for the OneMinMax benchmark and roughly 40% for the LOTZ benchmark. Several runtime analyses of the NSGA-II quickly followed this first work, see the literature review in Section 2.2. However, all these works only discuss the efficiency of covering the full Pareto front.

Since we cannot always assume that the NSGA-II is run with a population size larger than the Pareto front size by a constant factor – both because the algorithm user does not know the size of the Pareto front and because some problems have such a large Pareto front that using a comparably large population size is not possible –, a deeper understanding of the approximation performance of the NSGA-II is highly desirable. This is the target of this work.

There is some reason to be optimistic: The experiments conducted in [ZLD22] for the case that the population size equals the size of the Pareto front not only gave the negative result that 20% or 40% of the Pareto front was not covered, but they also showed that the missing points are relatively evenly distributed over the Pareto front: the largest empty interval ever seen in all experiments was of length 4. Hence the population evolved by the NSGA-II in these experiments was a very good approximation of the Pareto front.

Our results: Unfortunately, we observe that these positive findings do not extend to smaller population sizes. However, our negative results let us detect in the selection mechanism of the NSGA-II a reason for the lower-than-expected approximation capability. This suggests a natural modification of the algorithm, for which we prove that it computes populations that approximate the Pareto fronts of the OneMinMax and LOTZ benchmarks very well.

In detail, when we ran experiments for OneMinMax with problem size n=601 and population sizes N=(n+1)/2=301, \lceil(n+1)/4\rceil=151, and \lceil(n+1)/8\rceil=76, empty intervals with sizes around 8, 15, and 26 regularly occurred (see Table 1). Unfortunately, due to the complicated population dynamics of the NSGA-II, we were not able to prove a meaningful lower bound. Nevertheless, the experimental results show that even on a simple benchmark like OneMinMax, the largest empty interval the population has on the Pareto front is significantly larger than the optimal value \lceil\frac{(n+1)-1}{N-1}\rceil (3, 5, and 9 for these population sizes above), which would result from a perfect distribution of the population on the Pareto front.

To better understand how this discrepancy can arise, we regard two synthetic examples. We show that when the combined parent and offspring population is such that each point on the Pareto front is covered exactly once (this implies that the population size is essentially half the size of the Pareto front), then with high probability the next parent population does not cover an interval of length \Theta(\log n) on the Pareto front (whereas simply removing every second point would give a population such that each point on the Pareto front has a neighbor that is covered by the population). We further construct a more artificial example where the combined parent and offspring population covers the Pareto front apart from isolated points, but the next parent population does not cover an interval of length n/3 of the Pareto front.

The reason why we were able to construct such examples is the following property of the selection scheme of the NSGA-II. To select the new parent population, the NSGA-II uses as first criterion the non-dominated sorting and then the crowding distance. The crowding distance, however, is not updated during the selection process. That is, while removing individuals with smallest crowding distance, the changing crowding distance of the remaining individuals is not taken into account; instead the algorithm proceeds with the initial crowding distance. We assume that this design choice was made for reasons of efficiency – by not updating the crowding distance, it suffices to sort the combined parent and offspring population once and then remove the desired number of individuals. We note that this shortcoming of the traditional selection method of the NSGA-II was, with intuitive arguments, already detected by Kukkonen and Deb [KD06, Figure 2], a paper which unfortunately is not too well-known in the community (we overlooked it when preparing the conference version [ZD22] and none of the reviewers detected this oversight; our deepest thanks to Hisao Ishibuchi for pointing us to the work of Kukkonen and Deb).

There is a natural remedy to this shortcoming, proposed also by Kukkonen and Deb [KD06], and this is to sequentially remove individuals always based on the current crowding distance. This procedure can be implemented very efficiently: The removal of one individual changes the crowding distance of at most 4 other individuals (in a bi-objective problem), so at most 4 crowding distance values need to be updated. There is no need for a new sorting from scratch when we use as a data structure a priority queue. With this implementation, the selection based on the current crowding distance takes not more than O(N\log N) operations, which is the same asymptotic complexity as the one of sorting the individuals by their initial crowding distance in the original NSGA-II. We note that both operations are fast compared to the non-dominated sorting step with its quadratic time complexity.
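To make this selection scheme concrete, the following minimal Python sketch (our illustration, not the authors' implementation) removes points one by one based on the current crowding distance; for readability it recomputes the crowding distances of the whole (bi-objective, mutually non-dominated) front in each step, whereas the priority-queue implementation described above only updates the at most 4 affected values.

def crowding_distances(front):
    # front: list of bi-objective vectors (f1, f2), mutually non-dominated,
    # sorted by increasing f1 (hence decreasing f2)
    m = len(front)
    d = [0.0] * m
    d[0] = d[-1] = float("inf")
    f1_range = (front[-1][0] - front[0][0]) or 1.0
    f2_range = (front[0][1] - front[-1][1]) or 1.0
    for j in range(1, m - 1):
        d[j] = ((front[j + 1][0] - front[j - 1][0]) / f1_range
                + (front[j - 1][1] - front[j + 1][1]) / f2_range)
    return d

def select_with_updated_crowding_distance(front, keep):
    pts = sorted(front)  # sort by f1
    while len(pts) > keep:
        d = crowding_distances(pts)
        idx = min(range(len(pts)), key=lambda i: d[i])
        if d[idx] == float("inf"):  # only the extremal points remain
            break
        del pts[idx]  # only the two neighbors' crowding distances change
    return pts

For example, select_with_updated_crowding_distance([(i, 8 - i) for i in range(9)], keep=5) keeps the two extremal points and three evenly spaced inner points, whereas removing all individuals of smallest initial crowding distance in one go can leave large gaps.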

For this modified NSGA-II, the problems shown above for the traditional NSGA-II cannot occur. For problem size n=601, the modified algorithm for N=301 never created an empty interval on the Pareto front larger than the optimal value of 3. For N=151, in more than half the iterations all empty intervals had the optimal value of 5; in the other iterations the maximum empty interval (MEI) had a length of 6. For N=76, the median MEI value was 11 (optimal value: 9). Hence the modified algorithm distributes the population on the Pareto front in a significantly more balanced manner.

For this algorithm, we can also prove a guarantee on the approximation quality. After a time comparable to the time needed to find the two extremal solutions of the front, for all future generations with probability one the largest empty interval on the Pareto front has length at most \max\{\frac{2n}{N-3},1\}, hence at most a constant factor larger than the theoretical minimum of \lceil\frac{(n+1)-1}{N-1}\rceil. Consequently, even with a population size not large enough to cover the full Pareto front, this algorithm computes very good approximations to the Pareto front.

There is a second variant of the NSGA-II proposed in the past that, by definition, is not prone to the problem of working with initial crowding distance values, namely the steady-state NSGA-II proposed by Durillo et al. [DNLA09], which generates a single offspring per iteration. There are no theoretical results on this algorithm yet, but the empirical results of Nebro and Durillo showed its very competitive approximation strength. For this reason, we also conduct a mathematical runtime analysis of this algorithm on the OneMinMax benchmark. We prove the same good approximation guarantees as for the NSGA-II with current crowding distance. Our experiments in Section 7 verify this similarity and also indicate a more steady behavior of the steady-state NSGA-II.

This work extends our conference paper [ZD22] mainly in the following ways. This version obtains tighter estimates of the relation between the \varepsilon-dominance and the maximal empty interval length. We further discuss the relation between the hypervolume indicator and the maximum empty interval length, which was not contained in the conference version. This version improves the approximation guarantee for the NSGA-II with current crowding distance from 4n/(N-3) to 2n/(N-3). We also added a new section discussing the approximation guarantee of the steady-state NSGA-II and the corresponding experiments. Besides, this version contains all mathematical proofs that were omitted in the conference version for reasons of space.

This work is organized as follows. Section 2 briefly introduces bi-objective optimization and the NSGA-II. Section 3 discusses the approximation measures that will be used in this work. The approximation difficulties of the traditional NSGA-II are theoretically shown via two synthetic examples in Section 4. Section 5 introduces our modified variant of the NSGA-II and conducts the theoretical analysis of its approximation ability. Section 6 analyzes the approximation ability of the steady-state NSGA-II. Our experiments are discussed in Section 7. Section 8 concludes this work.

2 Preliminaries

2.1 Bi-objective Optimization and the OneMinMax Benchmark

In this paper, we regard bi-objective optimization problems f=(f_{1},f_{2}):\{0,1\}^{n}\rightarrow\mathbb{R}^{2} with each objective to be maximized. For x,y\in\{0,1\}^{n}, we say that x strictly dominates y, denoted by x\succ y, if f_{1}(x)\geq f_{1}(y), f_{2}(x)\geq f_{2}(y), and at least one of the inequalities is strict. If x is not strictly dominated by any solution in \{0,1\}^{n}, we say that x is Pareto optimal and that f(x) is a Pareto front point. The set of all Pareto front points is called the Pareto front. The typical aim for a multi-objective optimizer is to compute the Pareto front, that is, to compute a set P of solutions such that f(P) is the Pareto front, or to approximate it well.

We shall work with the popular bi-objective benchmark OneMinMax. OneMinMax was first proposed in [GL10] and is similar to the COCZ benchmark defined in [LTZ04]. The first objective of OneMinMax counts the number of zeros in the bit-string, and the second objective counts the number of ones. More specifically, for any x=(x_{1},\dots,x_{n})\in\{0,1\}^{n}, the OneMinMax function is defined by

f(x)=(f_{1}(x),f_{2}(x))=\left(n-\sum_{i=1}^{n}x_{i},\ \sum_{i=1}^{n}x_{i}\right).

It is not difficult to see that any solution x\in\{0,1\}^{n} is Pareto optimal and that the Pareto front is M:=\{(0,n),(1,n-1),\dots,(n,0)\}.

2.2 The Non-Dominated Sorting Genetic Algorithm II (NSGA-II)

We now give a brief introduction to the NSGA-II, which was first proposed in [DPAM02] and now is the by far dominant MOEA in practice [ZQL+11]. It works with a fixed population size N. Consequently, each time new individuals are generated, the NSGA-II needs to remove individuals to maintain this fixed population size. To this aim, the NSGA-II computes a complete order on the combined parent and offspring population that uses the dominance as the first criterion and a diversity measure (crowding distance) as the second criterion, and removes the worst individuals.

In more detail, after the random initialization, in each generation t, an offspring population Q_{t} with size N is generated from the parent population P_{t}. The NSGA-II now needs to remove N individuals from the combined population R_{t}=P_{t}\cup Q_{t}. To this aim, it divides R_{t} into several fronts F_{1},F_{2},\dots, where F_{1} is the set of the non-dominated solutions in R_{t}, and F_{i}, i>1, is the set of the non-dominated solutions in R_{t}\setminus\{F_{1},\dots,F_{i-1}\}. For the first index i^{*} such that the size of \bigcup_{i=1}^{i^{*}}F_{i} is at least N, the NSGA-II will calculate the crowding distance, denoted by \operatorname{cDis}, of the individuals in F_{i^{*}} as follows. (Footnote 1: If the crowding distance is also used for the parent selection (like in tournament selection), then now (Algorithm 2, step 6) also the crowding distance of the individuals in F_{1},\dots,F_{i^{*}-1} is computed.) For each objective, the individuals are sorted according to their objective values. The \operatorname{cDis} value of the first and last point in the sorted list is infinite. For the other individuals, the \operatorname{cDis} with respect to the current objective is the normalized distance of the objective values of its two neighbors in the list. The complete \operatorname{cDis} of an individual is the sum of its \operatorname{cDis} components for all objectives. See Algorithm 1 for a complete description of the computation of the crowding distance. Then the |\bigcup_{i=1}^{i^{*}}F_{i}|-N individuals in F_{i^{*}} with smallest crowding distance value and all individuals in F_{i}, i>i^{*}, are removed (ties broken randomly). The complete NSGA-II framework is shown in Algorithm 2.

We note that, for reasons of generality, in Algorithm 1 we make no assumption on how individuals with equal objective values are sorted. As pointed out in [BQ22], for bi-objective problems the sorting w.r.t. the second objective can be taken as the inverse of the first sorting. This has mild algorithmic advantages (but note that the quadratic time complexity of the non-dominated sorting procedure dominates the complexity of the selection stage) and can lower the minimum required population size to compute the whole Pareto front by a factor of two. It is clear that this idea can only make a difference when two individuals with identical value in one objective are present, as only then the sorting is not unique. In our setting with the population size smaller than the size of the Pareto front, we do not expect this to happen too often, so we do not expect significantly better results under this assumption.

Since it is not the most central aspect of our work, we have not yet discussed how the offspring population is computed. As in [ZLD22], we shall regard a mutation-only version of the NSGA-II. For a problem like OneMinMax, composed of two very simple unimodal objectives, we do not expect that the use of crossover gives significant advantages. Also, we are convinced that our proofs can easily be extended to variants that use crossover with some constant probability (less than one) – we note that in particular the central result of this work regarding the selection of the next population (Lemma 13) does not rely on any assumption about how the offspring are created. As mutation operators, we shall regard the two classic ones of one-bit mutation, which flips a random bit in the argument, and standard bit-wise mutation, which flips each bit independently with probability \frac{1}{n}. We consider three ways to select the parents for the mutation (mating selection). In fair selection, we let each of the N parents generate exactly one offspring. In random selection, we N times independently and uniformly at random (hence “with replacement”) select a parent and let it create an offspring. In binary tournament selection, we N times independently and uniformly at random pick two parents, and let the better one (according to non-dominated sorting and crowding distance, ties broken randomly) create an offspring.
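The following small Python sketch (our illustration, not code from the paper) shows the OneMinMax objectives, the two mutation operators just described, and offspring creation under fair selection.

import random

def one_min_max(x):
    # x: list of bits; returns (number of zeros, number of ones)
    ones = sum(x)
    return (len(x) - ones, ones)

def one_bit_mutation(x):
    y = list(x)
    i = random.randrange(len(y))
    y[i] = 1 - y[i]  # flip exactly one uniformly chosen bit
    return y

def standard_bit_mutation(x):
    n = len(x)
    return [1 - b if random.random() < 1.0 / n else b for b in x]

def offspring_population(parents, mutate=standard_bit_mutation):
    # fair selection: each of the N parents creates exactly one offspring
    return [mutate(p) for p in parents]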

Input: S=\{S_{1},\dots,S_{|S|}\}, a set of individuals
Output: \operatorname{cDis}(S)=(\operatorname{cDis}(S_{1}),\dots,\operatorname{cDis}(S_{|S|})), where \operatorname{cDis}(S_{i}) is the crowding distance for S_{i}
1: \operatorname{cDis}(S)=(0,\dots,0)
2: for each objective f_{i} do
3:    Sort S in order of descending f_{i} value: S_{i.1},\dots,S_{i.|S|}
4:    \operatorname{cDis}(S_{i.1})=+\infty, \operatorname{cDis}(S_{i.|S|})=+\infty
5:    for j=2,\dots,|S|-1 do
6:       \operatorname{cDis}(S_{i.j})=\operatorname{cDis}(S_{i.j})+\frac{f_{i}(S_{i.j-1})-f_{i}(S_{i.j+1})}{f_{i}(S_{i.1})-f_{i}(S_{i.|S|})}
7:    end for
8: end for
Algorithm 1 Computation of the crowding distance \operatorname{cDis}(S)
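A direct Python transcription of Algorithm 1 (our sketch, not the authors' code) for an arbitrary number of objectives could look as follows; it returns one crowding distance value per individual.

def crowding_distance(S, objectives):
    # S: list of individuals; objectives: list of functions f_i(individual) -> number
    cdis = [0.0] * len(S)
    for f in objectives:
        order = sorted(range(len(S)), key=lambda i: f(S[i]), reverse=True)  # descending f_i
        cdis[order[0]] = cdis[order[-1]] = float("inf")
        f_range = f(S[order[0]]) - f(S[order[-1]])
        if f_range == 0:
            continue  # all values equal in this objective: no finite contribution
        for j in range(1, len(S) - 1):
            cdis[order[j]] += (f(S[order[j - 1]]) - f(S[order[j + 1]])) / f_range
    return cdis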
1: Uniformly at random generate the initial population P_{0}=\{x_{1},x_{2},\dots,x_{N}\} with x_{i}\in\{0,1\}^{n}, i=1,2,\dots,N.
2: for t=0,1,2,\dots do
3:    Generate the offspring population Q_{t} with size N
4:    Use fast-non-dominated-sort() in [DPAM02] to divide R_{t}=P_{t}\cup Q_{t} into fronts F_{1},F_{2},\dots
5:    Find i^{*}\geq 1 such that |\bigcup_{i=1}^{i^{*}-1}F_{i}|<N and |\bigcup_{i=1}^{i^{*}}F_{i}|\geq N
6:    Use Algorithm 1 to separately calculate the crowding distance of each individual in F_{1},\dots,F_{i^{*}}
7:    Let \tilde{F}_{i^{*}} be the N-|\bigcup_{i=1}^{i^{*}-1}F_{i}| individuals in F_{i^{*}} with largest crowding distance, chosen at random in case of a tie
8:    P_{t+1}=\left(\bigcup_{i=1}^{i^{*}-1}F_{i}\right)\cup\tilde{F}_{i^{*}}
9: end for
Algorithm 2 NSGA-II

We briefly review the state of the art in runtime analysis for the NSGA-II. In the first mathematical runtime analysis (Footnote 2: As usual, we call a mathematical runtime analysis a theoretical work estimating the number of iterations or fitness evaluations it takes to reach a certain goal. This is different from the implementational complexity of the operations for each iteration, as discussed, e.g., already in the original NSGA-II paper [DPAM02].) of the NSGA-II [ZLD22], it was shown that the NSGA-II with several mating selection and mutation strategies and a sufficiently large population size N=\Omega(n) efficiently computes the Pareto fronts of the OneMinMax and LOTZ benchmarks, namely in expected O(Nn\log n) and O(Nn^{2}) fitness evaluations. For N=\Theta(n), these are the same asymptotic complexities as those known for the basic global simple evolutionary multi-objective optimizer (GSEMO) [Gie03], namely O(n^{2}\log n) for OneMinMax and O(n^{3}) for LOTZ. However, the work [ZLD22] also showed that with a population size of n+1, that is, equal to the size of the Pareto front, it takes at least exponential time until the population misses less than a constant fraction of the Pareto front.

This work has quickly led to several follow-up works beyond the conference version [ZD22] of this work. In [BQ22], it was observed that using the same sorting for the two objectives allows to lower the minimum required population size to twice the size of the Pareto front (this is proven for the LOTZ benchmark, but the argument can easily be extended to OneMinMax). Also, in this work for the first time an NSGA-II with crossover is regarded and runtime guarantees analogous to those in [ZLD22] are proven. The most profound result of this work is that significant runtime improvements (from O(Nn^{2}) to O(Nn) when optimizing LOTZ with N=\Theta(n)) can be obtained from a newly proposed tournament selection operator, which chooses the tournament size uniformly at random from the range [1..N]. This is particularly interesting in that here a uniform random choice was successful, whereas most previous works using random parameter choices employ heavy-tailed distributions, see, e.g., [DLMN17, ABD21, ZD23c, DELQ22]. The first mathematical runtime analysis on a multimodal problem, the OneJumpZeroJump benchmark from [ZD23c], was conducted in [DQ23a]. It shows that when the population size is at least 4 times the Pareto front size, then the NSGA-II is as effective as the GSEMO on this benchmark. The first lower bounds for the NSGA-II [DQ23b] show that the previous results for OneMinMax [ZLD22] and OneJumpZeroJump [DQ23a] are asymptotically tight, even for larger population sizes. This implies that increasing the population size above the minimum required size immediately leads to asymptotic performance losses. This is very different from single-objective optimization, where often a larger range of population sizes gives the same asymptotic performance, see, e.g., [JJW05, DK15] for such results for the OneMax benchmark. Two results showing that crossover can give asymptotic performance gains appeared in parallel [DOSS23b, DQ23c]. The first runtime analysis for a combinatorial optimization problem, the bi-objective minimum spanning tree problem previously analyzed for the GSEMO [Neu07], appeared in [CDH+23]. A runtime analysis in the presence of noise (together with the independent parallel work [DDHW23] the first mathematical runtime analysis of a MOEA in a noisy environment) was conducted in [DOSS23a]. That the NSGA-II can have difficulties with more than two objectives was shown for OneMinMax in [ZD23b], whereas the NSGA-III was proven to be efficient on the 3-objective OneMinMax problem in [WD23].

As pointed out in Section 1, all these theoretical works consider the time taken to cover the full Pareto front, so no other work exists on approximating the Pareto front.

3 Approximation Measures

For reasons of efficiency, instead of computing the whole Pareto front, one often resorts to approximating it, that is, to computing a set of solutions that is a reasonable representation of the whole front. This raises the question of how to measure the approximation quality of such a set of solutions. Several different approximation measures have been proposed in the literature, such as multiplicative \varepsilon-dominance [LTDZ02], various kinds of generational distances [VL98, BT03, CCRS04], or the hypervolume [ZT98]. For the OneMinMax problem regarded in this work, the particular structure of the problem suggests to use a more elementary measure, namely the maximum length of an empty interval on the Pareto front. We define this measure in Section 3.1. We will then in Sections 3.2 and 3.3 compare the new measure with two classic measures, namely \varepsilon-dominance, which has frequently been regarded in theoretical works, and the hypervolume, which is the most widely used in general [SIHP21]. We are optimistic that our new measure can equally easily be compared to other measures.

3.1 Maximal Empty Interval Size

For OneMinMax with problem size n, which this paper analyzes, each possible objective value lies on the Pareto front, and the first objective values of the full Pareto front are exactly 0,1,\dots,n. Any missing Pareto front point can hence be directly seen in [0..n]. We therefore simply measure the size of the maximal empty interval, denoted as MEI, inside [0..n] with respect to the f_{1} values of the solutions that an MOEA reaches. The smaller the maximal empty interval size, the better the MOEA approximates the Pareto front. The formal definition of the MEI of a set S in the objective space is as follows.

Definition 1.

Let S=\{(s_{1},n-s_{1}),\dots,(s_{m},n-s_{m})\} be a subset of the Pareto front M of OneMinMax. Let j_{1},j_{2},\dots,j_{m} be the sorted list of s_{1},\dots,s_{m} in increasing order (ties broken uniformly at random). We define the maximal empty interval size of S, denoted by \operatorname{MEI}(S), as

\operatorname{MEI}(S)=\max\{j_{i+1}-j_{i}\mid i=1,\dots,m-1\}.

For N\in\mathbb{N}_{\geq 2}, we further define

\operatorname{MEI}_{\operatorname{opt}}(N):=\min\{\operatorname{MEI}(S)\mid S\subseteq M,|S|\leq N,(0,n)\in S,(n,0)\in S\}.

Obviously, this is the smallest \operatorname{MEI} that an MOEA with a fixed population size N can obtain when the extremal points (0,n) and (n,0) are covered.
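In code, the MEI of a set of covered f_{1} values and the optimal value from Lemma 2 below can be computed as in the following small sketch (our illustration, not part of the paper).

import math

def mei(f1_values):
    # f1_values: the first objective values covered by S (at least two of them)
    js = sorted(set(f1_values))
    return max(b - a for a, b in zip(js, js[1:]))

def mei_opt(n, N):
    return math.ceil(n / (N - 1))  # Lemma 2: smallest MEI achievable with N points including both extremes

For n=601 and N=301, 151, and 76, mei_opt returns 3, 5, and 9, the optimal values quoted in Section 1.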

It is not difficult to see that \operatorname{MEI}_{\operatorname{opt}}(N) is witnessed by a set S as evenly distributed as possible. We explicitly formulate this observation in the following lemma.

Lemma 2.

For all N\in\mathbb{N}_{\geq 2}, we have \operatorname{MEI}_{\operatorname{opt}}(N)=\lceil\frac{n}{N-1}\rceil.

Proof.

Consider any S\subseteq M with |S|\leq N, (0,n)\in S, and (n,0)\in S. Let J be the set of the first objective values of S, and let J=\{j_{1},\dots,j_{|J|}\} with j_{a}<j_{b} for a<b. Then we have j_{1}=0 and j_{|J|}=n, and thus

\sum_{i=1}^{|J|-1}(j_{i+1}-j_{i})=j_{|J|}-j_{1}=n.

Hence, there exists i_{0}\in[1..|J|-1] such that j_{i_{0}+1}-j_{i_{0}}\geq n/(|J|-1). Since j_{i_{0}+1} and j_{i_{0}} are integers, we then have j_{i_{0}+1}-j_{i_{0}}\geq\lceil n/(|J|-1)\rceil. As \operatorname{MEI}(S)=\max\{j_{i+1}-j_{i}\mid i=1,\dots,|J|-1\}, we know that \operatorname{MEI}(S)\geq\lceil n/(|J|-1)\rceil\geq\lceil n/(N-1)\rceil, where the last inequality uses |J|\leq|S|\leq N. Hence, \operatorname{MEI}_{\operatorname{opt}}(N)\geq\lceil n/(N-1)\rceil.

Now we prove that \operatorname{MEI}_{\operatorname{opt}}(N)\leq\lceil n/(N-1)\rceil. Let j_{i}=\min\{(i-1)\lceil n/(N-1)\rceil,n\} for i=1,\dots,N. Then j_{1}=0 and j_{N}=n. Consider the set S^{\prime}=\{(j_{i},n-j_{i})\mid i\in[1..N]\}. It is not difficult to see that

j_{i+1}-j_{i}=\min\left\{i\left\lceil\tfrac{n}{N-1}\right\rceil,n\right\}-\min\left\{(i-1)\left\lceil\tfrac{n}{N-1}\right\rceil,n\right\}\leq\left\lceil\tfrac{n}{N-1}\right\rceil

for all i\in[1..N-1]. Hence \operatorname{MEI}(S^{\prime})\leq\lceil n/(N-1)\rceil. By definition of \operatorname{MEI}_{\operatorname{opt}}(N), we know that \operatorname{MEI}_{\operatorname{opt}}(N)\leq\operatorname{MEI}(S^{\prime})\leq\lceil n/(N-1)\rceil. ∎

3.2 \varepsilon-Dominance

This subsection discusses the optimal approximation quality w.r.t. the classic \varepsilon-dominance measure for OneMinMax and compares this measure with the MEI.

3.2.1 Background and Definition

One way to measure the quality of solution sets is via \varepsilon-dominance, which is a relaxed notion of dominance first defined in [LTDZ02]. A set S of objective vectors is then called an \varepsilon-approximation of the Pareto front if each point of the Pareto front is \varepsilon-dominated by a point from S.

In this subsection, we only discuss multiplicative \varepsilon-dominance and simply call it \varepsilon-dominance, as it is the main variant discussed in [LTDZ02] and the variant most used in follow-up works. Here is the formal definition.

Definition 3 ((Multiplicative) \varepsilon-dominance [LTDZ02]).

Let \varepsilon>0, and let m>0 be the number of objectives. For u,v\in\mathbb{R}^{m}, we say u \varepsilon-dominates v, denoted by u\succeq_{\varepsilon}v, if and only if (1+\varepsilon)u\geq v, that is, (1+\varepsilon)u_{i}\geq v_{i} for all i=1,\dots,m.

Let W\subseteq\mathbb{R}^{m} be the set of all objective vectors of a given problem. We say a subset S\subseteq W is an \varepsilon-approximation for this problem if and only if for each v\in W, there exists u\in S such that u\succeq_{\varepsilon}v.
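Definition 3 translates directly into code. The following small sketch (our illustration, not part of the paper) computes by brute force the smallest \varepsilon for which a set S of OneMinMax front points is an \varepsilon-approximation of the full front; it agrees with the bounds via the MEI derived in Section 3.2.2 below.

def eps_needed(u, v):
    # smallest eps >= 0 such that u eps-dominates v (non-negative objectives)
    eps = 0.0
    for ui, vi in zip(u, v):
        if vi <= ui:
            continue
        if ui == 0:
            return float("inf")
        eps = max(eps, vi / ui - 1)
    return eps

def smallest_eps(S, n):
    # smallest eps such that S is an eps-approximation of the OneMinMax front
    return max(min(eps_needed(u, (r, n - r)) for u in S) for r in range(n + 1))

For example, for n=10 and S=\{(0,10),(5,5),(10,0)\} (so \operatorname{MEI}(S)=5), smallest_eps returns 0.8, which lies between the lower bound (\operatorname{MEI}(S)-1)/(n-\operatorname{MEI}(S))=0.8 of Lemma 6 and the upper bound \operatorname{MEI}(S)/(n-\operatorname{MEI}(S))=1 of Lemma 4.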

Here we briefly review the theoretical works utilizing the \varepsilon-dominance relation as the measure to evaluate the approximation performance of the MOEAs or as the basis to design the MOEAs considering more diversity in the survival selection. Horoba and Neumann [HN08] defined the LargeFrontε function and proved that to obtain an \varepsilon-approximation, the GSEMO needs a 2^{\Omega(n^{1/4})} runtime with 1-2^{-\Omega(n^{1/4})} probability, while the GSEMO with a diversity strategy based on \varepsilon-dominance only requires an expected runtime of O(n^{2}\log n). A similar work with respect to the additive \varepsilon-dominance relation was conducted in [HN09]. To reach an \varepsilon-approximation for the LargeFrontε, Brockhoff, Friedrich, and Neumann [BFN08] proved that the runtime for the (\mu+1)-simple indicator-based evolutionary algorithm ((\mu+1)-SIBEA) is O(n^{2}\log n).

For the GSEMO with the \varepsilon-dominance diversity strategy, Neumann and Reichel [NR08] proved that it can achieve good approximations for the minimum cut problem in expected polynomial time without restrictions on the graph weights, while the original GSEMO requires bounded weights to ensure the efficient approximation. The efficiency of the GSEMO with the \varepsilon-dominance diversity strategy can also be witnessed in [NRS11, PSN19].

Gutjahr [Gut12] replaced the survival selection of the SEMO considering the common dominance by the additive \varepsilon-dominance, inserted the modified SEMO into the adaptive Pareto sampling (APS) framework, and proved that for one stochastic multi-objective combinatorial optimization problem, the expected runtime for such APS can be bounded from above, with the bound depending on the expected runtime of the original SEMO on the deterministic counterpart.

3.2.2 Relation of Maximal Empty Intervals and \varepsilon-Approximations

The two approximation measures MEI and \varepsilon-approximation are almost equivalent. In this subsection, we show that for any S\subseteq M with (0,n),(n,0)\in S, the smallest \varepsilon rendering S an \varepsilon-approximation satisfies (\operatorname{MEI}(S)-1)/(n-\operatorname{MEI}(S))\leq\varepsilon\leq\operatorname{MEI}(S)/(n-\operatorname{MEI}(S)).

We first prove the following relation between the MEI and \varepsilon-dominance.

Lemma 4.

Let S be a subset of the Pareto front M of OneMinMax with (0,n),(n,0)\in S. Let \varepsilon=\operatorname{MEI}(S)/(n-\operatorname{MEI}(S)). Then S is an \varepsilon-approximation for \overline{M}:=\{(x,n-x)\mid x\in[0,n]\} (and thus also for M).

Proof.

Let J be the set of the first objective values of S. More specifically, let J=\{j_{1},\dots,j_{|J|}\} with j_{a}<j_{b} for a<b. From the assumption, we know j_{1}=0 and j_{|J|}=n. Consider any a\in[1..|J|-1] and let v_{1}=(j_{a},n-j_{a}) and v_{2}=(j_{a+1},n-j_{a+1}). For any v=(r,n-r) with r\in[j_{a},j_{a+1}], we have

(1+\varepsilon)(n-(j_{a+1}-j_{a}))=\left(1+\frac{\operatorname{MEI}(S)}{n-\operatorname{MEI}(S)}\right)(n-(j_{a+1}-j_{a}))=\frac{n(n-(j_{a+1}-j_{a}))}{n-\operatorname{MEI}(S)}\geq n.

Hence (1+\varepsilon)(j_{a}+(n-j_{a+1}))\geq n=r+(n-r), so we know that (1+\varepsilon)j_{a}\geq r or (1+\varepsilon)(n-j_{a+1})\geq n-r. Thus v is \varepsilon-dominated by one of v_{1} and v_{2}. ∎

Similar to \operatorname{MEI}_{\operatorname{opt}}, we define an optimal value for \varepsilon. From Definition 3, it is not difficult to see that the smaller \varepsilon, the better S approximates the set W. Let A(M,\varepsilon) be the collection of all S that are an \varepsilon-approximation of M. Then, for N\in\mathbb{N}_{\geq 2}, let

\varepsilon_{\operatorname{opt}}(N)=\inf\{\varepsilon>0\mid\exists S\in A(M,\varepsilon),|S|=N,(0,n)\in S,(n,0)\in S\}.

Obviously, this is the smallest \varepsilon so that a population of size N covering the two extremal points can be an \varepsilon-approximation of M.

From Lemma 4, we easily obtain the following upper bound of the optimal \varepsilon.

Corollary 5.

For all N\in\mathbb{N}_{\geq 2}, we have \varepsilon_{\operatorname{opt}}(N)\leq\operatorname{MEI}_{\operatorname{opt}}(N)/(n-\operatorname{MEI}_{\operatorname{opt}}(N)).

Now we establish a lower bound for \varepsilon w.r.t. \operatorname{MEI}, and also a lower bound for the optimal \varepsilon as its corollary.

Lemma 6.

Let S with (0,n),(n,0)\in S be an \varepsilon-approximation for M. Then \varepsilon\geq(\operatorname{MEI}(S)-1)/(n-\operatorname{MEI}(S)).

Proof.

Let v_{1}=(i_{1},n-i_{1})\in S and v_{2}=(i_{2},n-i_{2})\in S with i_{1}<i_{2} be any two neighboring points in S. Let v_{3}=(i_{3},n-i_{3}) with i_{3}\in[i_{1}+1..i_{2}-1]. If there exists an i^{\prime}\geq i_{2} such that v^{\prime}=(i^{\prime},n-i^{\prime})\succeq_{\varepsilon}v_{3}, then (1+\varepsilon)(n-i^{\prime})\geq n-i_{3}. Since i^{\prime}\geq i_{2}, we have n-i_{2}\geq n-i^{\prime} and thus (1+\varepsilon)(n-i_{2})\geq n-i_{3}. Together with i_{2}>i_{3}, we know v_{2}\succeq_{\varepsilon}v_{3}. Analogously, if there exists an i^{\prime}\leq i_{1} such that v^{\prime}=(i^{\prime},n-i^{\prime})\succeq_{\varepsilon}v_{3}, then v_{1}\succeq_{\varepsilon}v_{3}. Hence one of v_{1} and v_{2} \varepsilon-dominates v_{3}.

If v_{2}\succeq_{\varepsilon}v_{3}, then (n-i_{2})(1+\varepsilon)\geq n-i_{3}, that is, \varepsilon\geq(n-i_{3})/(n-i_{2})-1. If v_{1}\succeq_{\varepsilon}v_{3}, then (1+\varepsilon)i_{1}\geq i_{3}, that is, \varepsilon\geq i_{3}/i_{1}-1. Hence,

\varepsilon\geq\min\left\{\frac{n-i_{3}}{n-i_{2}}-1,\frac{i_{3}}{i_{1}}-1\right\}=\min\left\{\frac{i_{2}-i_{3}}{n-i_{2}},\frac{i_{3}-i_{1}}{i_{1}}\right\}. (1)

Let a=i_{1}n/(n-(i_{2}-i_{1})). While not important for the proof, we note that this is the value for i_{3} that renders the two arguments of the minimum equal and thus maximizes (1). If a is an integer, then from taking i_{3}=a we obtain \varepsilon\geq(i_{2}-i_{1})/(n-(i_{2}-i_{1})). If a is not an integer, let a^{\prime}:=\lfloor a\rfloor and \delta:=a-a^{\prime}. Exploiting (1) with i_{3}=a^{\prime} and i_{3}=a^{\prime}+1, we obtain

\varepsilon\geq\max\left\{\min\left\{\frac{i_{2}-a^{\prime}}{n-i_{2}},\frac{a^{\prime}-i_{1}}{i_{1}}\right\},\min\left\{\frac{i_{2}-(a^{\prime}+1)}{n-i_{2}},\frac{a^{\prime}+1-i_{1}}{i_{1}}\right\}\right\}
=\max\left\{\frac{a^{\prime}-i_{1}}{i_{1}},\frac{i_{2}-(a^{\prime}+1)}{n-i_{2}}\right\}
=\max\left\{\frac{i_{2}-i_{1}}{n-(i_{2}-i_{1})}-\frac{\delta}{i_{1}},\frac{i_{2}-i_{1}}{n-(i_{2}-i_{1})}-\frac{1-\delta}{n-i_{2}}\right\}
=\frac{i_{2}-i_{1}}{n-(i_{2}-i_{1})}-\min\left\{\frac{\delta}{i_{1}},\frac{1-\delta}{n-i_{2}}\right\}\geq\frac{i_{2}-i_{1}-1}{n-(i_{2}-i_{1})},

where the last inequality uses that the maximal value of \min\{\delta/i_{1},(1-\delta)/(n-i_{2})\} is 1/(n-(i_{2}-i_{1})), attained when \delta=i_{1}/(n-(i_{2}-i_{1})).

Let J=\{j_{1},\dots,j_{|J|}\} be the set of first objective values in S, ordered such that j_{b}<j_{c} for all b<c. We have just shown that \varepsilon\geq\frac{j_{b+1}-j_{b}-1}{n-(j_{b+1}-j_{b})} for all b\in[1..|J|-1]. Hence

\varepsilon\geq\max_{b\in[1..|J|-1]}\left\{\frac{j_{b+1}-j_{b}-1}{n-(j_{b+1}-j_{b})}\right\}=\frac{\operatorname{MEI}(S)-1}{n-\operatorname{MEI}(S)}, (2)

where we use the fact that (j_{b+1}-j_{b}-1)/(n-(j_{b+1}-j_{b})) increases as j_{b+1}-j_{b} increases. ∎

Corollary 7.

For all N\in\mathbb{N}_{\geq 2}, we have \varepsilon_{\operatorname{opt}}(N)\geq(\operatorname{MEI}_{\operatorname{opt}}(N)-1)/(n-\operatorname{MEI}_{\operatorname{opt}}(N)).

3.3 Hypervolume

In this subsection, we discuss the classic hypervolume indicator, the most common measure for how well a set of solutions approximates the Pareto front, and its relation to the MEI.

3.3.1 Background and Definition

The hypervolume indicator, denoted by HV in the remainder, was first proposed by Zitzler and Thiele [ZT98]. It is the most intensively used measure for approximation quality in evolutionary multi-objective optimization [SIHP21]. Here “hypervolume” refers to the size of the space covered by the solution set (with respect to a reference point). Different from \operatorname{MEI}, \varepsilon-dominance, and generational distance measures [VL98, BT03, CCRS04], the HV does not require prior knowledge or an estimate of the Pareto front, which is a huge advantage in practical applications.

Definition 8 (Hypervolume (HV)).

Let m>0 be the number of objectives, and let W\subseteq\mathbb{R}^{m} be the set of all objective vectors of a given problem. Let r\in\mathbb{R}^{m} be such that u\succeq r for all u\in W. Let \mathcal{L} denote the Lebesgue measure in \mathbb{R}^{m}. Then the hypervolume of a subset J\subseteq W is defined as

\operatorname{HV}(J,r)=\mathcal{L}\left(\bigcup_{u\in J}H_{u,r}\right),

where H_{u,r}=\prod_{i=1}^{m}[r_{i},u_{i}] is the hyper-rectangle defined by the corners u and r.

In addition to its wide usage in practice, there are also some theoretical works discussing the HV. Brockhoff, Friedrich, and Neumann [BFN08] proposed the (\mu+1)-simple indicator-based evolutionary algorithm, (\mu+1)-SIBEA, and proved its efficient expected runtime of O(n^{2}\log n) for reaching an \varepsilon-approximation of the Pareto front of LargeFrontε, for any \varepsilon\in\mathbb{R}_{>0}. In contrast, the GSEMO with high probability needs at least exponential time to achieve an \varepsilon-approximation [HN08]. Besides this result, [BFN08] also proved that the (\mu+1)-SIBEA with \mu\geq n+1 computes the full Pareto front of the LOTZ function in an expected runtime of O(\mu n^{2}).

Nguyen, Sutton, and Neumann [NSN15] proved that the expected runtime of the (\mu+1)-SIBEA, \mu\geq n+1, to compute the full Pareto front (hence to reach the maximum HV value possible with this population size) for the OneMinMax problem is O(\mu n\log n). They also proved that starting from some initial population and using a population size of only \mu=\sqrt{n}+1, the (\mu+1)-SIBEA with high probability takes at least exponential time to reach the maximum HV value (attainable with this population size). For the LOTZ problem, they showed the following results. If \mu\geq n+1, then the algorithm can reach the maximum HV value (and thus compute the full Pareto front) in expected time O(\mu n^{2}). If 1<\mu<n/3, the expected time to reach the maximum hypervolume (and thus approximate the Pareto front best in this measure) is O(\mu^{3}n^{3}).

While the work just discussed has shown that the (\mu+1)-SIBEA with small population sizes has difficulties to compute the optimal approximation of the Pareto front of OneMinMax, this algorithm can easily compute good approximations. Doerr, Gao, and Neumann [DGN16] proved that the (\mu+1)-SIBEA with \mu\leq\sqrt{n} in expected time O(\mu n\log n) reaches a population with HV below the value for the full Pareto front by only an additive term of at most 2n^{2}/\mu.

We note that in addition to the above runtime analyses, other theoretical works on structural aspects of the HV exist, e.g., [ABB10, ABBZ12, BF13, FNT15]. We will not discuss them here in full detail as our main focus in this work is the analysis of runtimes of evolutionary algorithms.

However, we note that for discrete problems, the optimal HV value for a population with size less than the finite Pareto front size is not well understood. Emmerich, Deutz, and Beume [EDB07, Section 3] proved that the HV value for approximating the continuous Pareto front \{(x,1-x)\mid x\in[0,1]\} with \mu points plus the two extremal points (1,0),(0,1) is maximized if the \mu points are evenly spaced between the two extremal points. Beume et al. [BFLI+09, Lemma 1] proved this result for the line \{(x,\beta-x)\mid x\in[x_{\min},x_{\max}]\} for any \beta>0 and x_{\min},x_{\max}\in\mathbb{R} with x_{\min}\leq x_{\max}. Auger et al. [ABBZ12] further extended it to the line \{(x,\beta-\alpha x)\mid x\in[x_{\min},x_{\max}]\} with 0\leq x_{\min}<x_{\max}\leq\beta/\alpha for any \alpha,\beta>0. They also considered the case that the reference point is dominated by at least one Pareto optimal point, but not necessarily all. Nguyen, Sutton, and Neumann [NSN15] discussed two special cases, namely \mu=\sqrt{n}+1 for OneMinMax and \mu\in(1,n/3) for LOTZ. Both implicitly assume n/(\mu-1) to be an integer. Doerr, Gao, and Neumann [DGN16] compared the HV of the computed approximations with the HV of the full Pareto front and thus did not require an analysis of the best HV achievable with fixed small population sizes. In summary, for the discrete OneMinMax problem, the optimal HV for \mu points is not well analyzed. Hence, we will also bound the optimal HV value when we discuss the relation of MEI and HV in the following Section 3.3.2.

3.3.2 Relation of Maximal Empty Intervals and Hypervolumes

We now analyze the relation between the MEI and the \operatorname{HV}. As we shall see, the fact that the hypervolume depends on the sum of the squares of the lengths of the empty intervals disallows a simple relation between the two measures. However, we shall also see that the optimal approximations for the two measures are very similar; in particular, an optimal approximation with respect to the \operatorname{HV} has the minimum possible empty interval size.

We first give an exact formula for the hypervolume. Let r=(r_{1},r_{2}) with r_{1},r_{2}\leq 0 be the reference point so that it is dominated by any x\in\{0,1\}^{n} w.r.t. OneMinMax. Let S be a subset of the Pareto front M of OneMinMax with (0,n),(n,0)\in S, and let J be the set of the first objective values of S. More specifically, let J=\{j_{1},\dots,j_{|J|}\} with j_{a}<j_{b} for a<b. Then we know j_{1}=0 and j_{|J|}=n. Since each Pareto front point (a,b) satisfies a+b=n, we have

\operatorname{HV}(f(S),r)=\left(-r_{1}n-r_{2}n+r_{1}r_{2}+\tfrac{1}{2}n^{2}\right)-\sum_{a=2}^{|J|}\tfrac{1}{2}(j_{a}-j_{a-1})^{2}=:A-\sum_{a=2}^{|J|}\tfrac{1}{2}(j_{a}-j_{a-1})^{2}, (3)

where we note that A=-r_{1}n-r_{2}n+r_{1}r_{2}+\tfrac{1}{2}n^{2} is the HV of the continuous front \{(x,n-x)\mid x\in[0,n]\}. We point out that the A-part was wrongly computed as A=-r_{1}n-r_{2}n+2r_{1}r_{2}+\tfrac{1}{2}n^{2} in [DGN16, Lemma 2]. This renders the calculations of the expressions I_{H}(F) and I_{H}(P) there incorrect, but this does not influence the correctness of Lemma 2 in [DGN16] as the extra r_{1}r_{2} term cancels out in I_{H}(F)-I_{H}(P).
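The closed form (3) is easy to check numerically. The following small sketch (our illustration, not part of the paper) compares it with a direct staircase computation of the bi-objective hypervolume.

def hv_direct(points, r):
    # hypervolume of a mutually non-dominated set of maximization points w.r.t. reference point r
    r1, r2 = r
    total, prev_f2 = 0.0, r2
    for f1, f2 in sorted(points, reverse=True):  # decreasing f1, hence increasing f2
        total += (f1 - r1) * (f2 - prev_f2)
        prev_f2 = f2
    return total

def hv_formula(f1_values, n, r):
    # closed form (3): A minus half the sum of squared gaps between the covered f1-values
    r1, r2 = r
    A = -r1 * n - r2 * n + r1 * r2 + n * n / 2
    js = sorted(f1_values)
    return A - sum((b - a) ** 2 for a, b in zip(js, js[1:])) / 2

For instance, for n=10, covered f_{1} values \{0,4,7,10\}, and reference point (-1,-1), both functions return 54.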

Now we estimate \operatorname{HV}(f(S),r).

Lemma 9.

Let S be a subset of the Pareto front M of OneMinMax with (0,n),(n,0)\in S, and let r=(r_{1},r_{2}) with r_{1},r_{2}\leq 0 be the reference point. Let A=-r_{1}n-r_{2}n+r_{1}r_{2}+\frac{1}{2}n^{2}. Then

\operatorname{HV}(f(S),r)\in\left[A-\frac{(|S|-1)(\operatorname{MEI}(S))^{2}}{2},\ A-\frac{n^{2}}{2(|S|-1)}\right].
Proof.

We use the same J as defined above. Then we have |J|\leq|S|. By the definition of \operatorname{MEI}(S), we have

\sum_{a=2}^{|J|}\tfrac{1}{2}(j_{a}-j_{a-1})^{2}\leq\sum_{a=2}^{|J|}\tfrac{1}{2}(\operatorname{MEI}(S))^{2}\leq\sum_{a=2}^{|S|}\tfrac{1}{2}(\operatorname{MEI}(S))^{2}=\tfrac{1}{2}(|S|-1)(\operatorname{MEI}(S))^{2}. (4)

From the Cauchy–Bunyakovsky–Schwarz inequality, we have

\sum_{a=2}^{|J|}\tfrac{1}{2}(j_{a}-j_{a-1})^{2}=\frac{1}{2(|J|-1)}\left(\sum_{a=2}^{|J|}(j_{a}-j_{a-1})^{2}\right)\left(\sum_{a=2}^{|J|}1^{2}\right)\geq\frac{1}{2(|J|-1)}\left(\sum_{a=2}^{|J|}(j_{a}-j_{a-1})\right)^{2}=\frac{1}{2(|J|-1)}(j_{|J|}-j_{1})^{2}=\frac{n^{2}}{2(|J|-1)}\geq\frac{n^{2}}{2(|S|-1)}, (5)

where the last equality uses j_{|J|}=n and j_{1}=0.

Combining (3), (4), and (5) proves the lemma. ∎

Similar to \operatorname{MEI}_{\operatorname{opt}}, we define an optimal value for the hypervolume. From Definition 8, it is not difficult to see that the larger the \operatorname{HV}, the better S approximates the Pareto front. For N\in\mathbb{N}_{\geq 2}, we further define

\operatorname{HV}_{\operatorname{opt}}(N,r):=\sup\{\operatorname{HV}(f(S),r)\mid S\subseteq M,|S|\leq N,(0,n)\in S,(n,0)\in S\}.

Obviously, this is the largest \operatorname{HV} that a population of size N containing the two extremal points can obtain. Now we estimate \operatorname{HV}_{\operatorname{opt}}(N,r).

Lemma 10.

Let r=(r_{1},r_{2}) with r_{1},r_{2}\leq 0 be the reference point, and let A=-r_{1}n-r_{2}n+r_{1}r_{2}+\frac{1}{2}n^{2}. For all N\in\mathbb{N}_{\geq 2}, we have

\operatorname{HV}_{\operatorname{opt}}(N,r)\in\left[A-\frac{(N-1)(\operatorname{MEI}_{\operatorname{opt}}(N))^{2}}{2},\ A-\frac{n^{2}}{2(N-1)}\right].
Proof.

For any S\subseteq M with |S|\leq N, (0,n)\in S, and (n,0)\in S, Lemma 9 shows that

\operatorname{HV}(f(S),r)\leq A-\frac{n^{2}}{2(|S|-1)}\leq A-\frac{n^{2}}{2(N-1)},

where the second inequality uses |S|\leq N. Thus \operatorname{HV}_{\operatorname{opt}}(N,r)\leq A-n^{2}/(2(N-1)).

Now we show the lower bound of \operatorname{HV}_{\operatorname{opt}}(N,r). Let j_{i}=\min\{(i-1)\lceil n/(N-1)\rceil,n\} for i=1,\dots,N, and note that j_{1}=0 and j_{N}=n. Consider the set S^{\prime}=\{(j_{i},n-j_{i})\mid i\in[1..N]\}. It is not difficult to see that

\min\left\{i\left\lceil\tfrac{n}{N-1}\right\rceil,n\right\}-\min\left\{(i-1)\left\lceil\tfrac{n}{N-1}\right\rceil,n\right\}\leq\left\lceil\tfrac{n}{N-1}\right\rceil.

Hence, from (3) we have

\operatorname{HV}(f(S^{\prime}),r)=A-\sum_{a=2}^{N}\tfrac{1}{2}(j_{a}-j_{a-1})^{2}\geq A-\sum_{a=2}^{N}\tfrac{1}{2}\left(\left\lceil\tfrac{n}{N-1}\right\rceil\right)^{2}=A-\tfrac{1}{2}(N-1)\left(\left\lceil\tfrac{n}{N-1}\right\rceil\right)^{2}.

By the definition of \operatorname{HV}_{\operatorname{opt}}(N,r) and noting \operatorname{MEI}_{\operatorname{opt}}(N)=\lceil n/(N-1)\rceil from Lemma 2, the lower bound is proven. ∎

Different from Corollaries 5 and 7, we do not see a simple relation between \operatorname{MEI}_{\operatorname{opt}}(N) and \operatorname{HV}_{\operatorname{opt}}(N,r) here. However, we note that the construction of S^{\prime} here is the same as the one in the proof of Lemma 2. Hence, the lower bound of \operatorname{HV}_{\operatorname{opt}}(N,r) is reached when \operatorname{MEI}_{\operatorname{opt}}(N) is reached.

Since our MEI measure is more intuitive and crucial in our proofs, and since it can easily be transferred to the \varepsilon-approximation and HV measures that are utilized in the community, we will use the MEI as the approximation measure to judge how well an MOEA approximates the Pareto front in the remainder of this work.

4 Difficulties for the NSGA-II to Approximate the Pareto Front

In this section, we show that the traditional way in which the NSGA-II selects the next population, namely by relying on the initial crowding distance, can lead to suboptimal approximations of the Pareto front. To this aim, we analyze by mathematical means the results of the selection from two different combined parent and offspring populations. These examples demonstrate quite clearly that the traditional selection can lead to unwanted results. We note that these results do not prove completely that the NSGA-II has difficulties to find good approximations since we do not know how often the NSGA-II enters exactly these situations. Unfortunately, the population dynamics of the NSGA-II are too complicated for a full proof. Our experimental results in Section 7, however, suggest that the phenomena we observe in these synthetic situations (in particular, the first one) do show up.

We start by regarding the optimal-looking situation that the combined parent and offspring population contains exactly one individual for each point on the Pareto front. Hence by removing the individual corresponding to (essentially) every second point on the Pareto front, one could obtain a very good approximation of the front. Surprisingly, the NSGA-II does much worse. Since all solutions apart from the two extremal ones have the same crowding distance, the NSGA-II removes a random set of N out of these 2N-2 inner solutions. As we show now, with high probability this creates an empty interval on the Pareto front of length \Theta(\log n).

Lemma 11.

Let n\geq 7 be odd. Consider using the NSGA-II to optimize the OneMinMax function with problem size n. Let the population size be N=(n+1)/2. Suppose that in a certain generation t\geq 0, the combined parent and offspring population R_{t} fully covers the Pareto front, that is, f(R_{t})=\{(0,n),(1,n-1),\dots,(n,0)\}. Then with probability at least 1-\exp(-\Omega(n^{1/2})), we have \operatorname{MEI}(P_{t+1},f_{1})\geq\lfloor\frac{1}{3}\ln n\rfloor. On the positive side, E[\operatorname{MEI}(P_{t+1},f_{1})]\leq\log_{3/2}n+O(1) and \Pr[\operatorname{MEI}(P_{t+1},f_{1})\geq c\log_{3/2}n]\leq n^{1-c} for any constant c>1.

We note that in the above result, we did not try to optimize the implicit constants. The proof of the upper bound on the length of the longest empty interval is a simple union bound argument. The proof of the lower bound needs some care to deal with the stochastic dependencies among the intervals. We overcome these difficulties by only regarding a set of disjoint intervals and by using a two-stage sampling process such that the first stage treats the regarded intervals in an independent manner.
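The effect described in Lemma 11 is easy to reproduce empirically. The following small simulation (our illustration, not part of the proof) covers every Pareto front point once, keeps the two extremal points, removes N of the n-1 inner points uniformly at random, and reports the largest empty interval of the surviving f_{1} values.

import random

def largest_gap_after_random_removal(n):
    # n odd; population size as in Lemma 11
    N = (n + 1) // 2
    inner = list(range(1, n))
    removed = set(random.sample(inner, N))
    kept = [0] + [j for j in inner if j not in removed] + [n]
    return max(b - a for a, b in zip(kept, kept[1:]))

For n=601, repeating this a few hundred times typically yields largest gaps of logarithmic size, in line with the \Theta(\log n) behavior stated in Lemma 11.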

Proof.

We note that R_{t} has size at least n+1, since it covers the Pareto front, which has a size of n+1. From our assumption N=(n+1)/2, we thus conclude that every point in the Pareto front has exactly one corresponding individual in R_{t}. Hence, except for the individuals 0^{n} and 1^{n} that have infinite crowding distance, all other individuals have the same crowding distance of 4/n. Consequently, the original NSGA-II survival selection will randomly select N individuals from the 2N-2=n-1 inner individuals to be removed.

To prove the logarithmic lower bound on the expectation, we argue as follows. Let k=\lfloor\frac{1}{3}\ln n\rfloor. Let M\subseteq[1..n-k] such that |M|=\lceil n/\ln n\rceil and for any two m_{1},m_{2}\in M, we have |m_{1}-m_{2}|\geq k or m_{1}=m_{2} (note that such a set exists since k|M|\leq n-1 by definition of k and M). Consequently, for different m\in M, the intervals I_{m}=[m..m+k-1] are disjoint subsets of [1..n-1].

To cope with the stochastic dependencies, we regard a particular way to sample the N individuals to be removed. In a first phase, we select each individual x with f_{1}(x)\in[1..n-1] with probability N/(2(n-1)). This defines a random set X of individuals with expected cardinality E[|X|]=N/2. We remove these individuals from the combined parent and offspring population. In a second phase, if |X|<N, which is very likely, then we repeat removing random individuals x with f_{1}(x)\in[1..n-1] until we have removed a total of N individuals. If |X|>N, then we repeat adding random previously removed individuals until we have removed only N individuals.

We now prove that with high probability, the first phase ends with (i) an interval II of length kk on the Pareto front which is not covered by the population, and with (ii) |X|N|X|\leq N. In this case, II is clearly also not covered by the final population, since the second phase then only removes further individuals. Denote by A~m,k\tilde{A}_{m,k} the event that all individuals xx with f1(x)Imf_{1}(x)\in I_{m} are removed in the first phase. We have

Pr[A~m,k]\displaystyle\Pr[\tilde{A}_{m,k}] =(N2(n1))k4knln(4)/3.\displaystyle=\mathopen{}\mathclose{{}\left(\tfrac{N}{2(n-1)}}\right)^{k}\geq 4^{-k}\geq n^{-\ln(4)/3}.

By construction, the events A~m,k\tilde{A}_{m,k}, mMm\in M, are independent. Hence, using the estimate 1+rexp(r)1+r\leq\exp(r) valid for all rr\in\mathbb{R}, we estimate

Pr[mM:¬A~m,k]\displaystyle\Pr[\forall m\in M:\neg\tilde{A}_{m,k}] (1nln(4)/3)|M|exp(nln(4)/3nlnn)\displaystyle\leq(1-n^{-\ln(4)/3})^{|M|}\leq\exp\mathopen{}\mathclose{{}\left(-n^{-\ln(4)/3}\frac{n}{\ln n}}\right)
=exp(n1ln(4)/3lnn)=exp(Ω(n1/2)).\displaystyle=\exp\mathopen{}\mathclose{{}\left(-\frac{n^{1-\ln(4)/3}}{\ln n}}\right)=\exp(-\Omega(n^{1/2})).

Finally, we estimate the probability that |X|>N|X|>N. We note that |X||X| can be written as a sum of n1n-1 independent binary random variables. Hence by the multiplicative Chernoff bound (Theorem 1.10.1 in [Doe20]), we have

Pr[|X|>N]Pr[|X|2E[|X|]]exp(E[|X|]/3)exp(n/12).\Pr[|X|>N]\leq\Pr[|X|\geq 2E[|X|]]\leq\exp(-E[|X|]/3)\leq\exp(-n/12).

This proves that with probability at least 1exp(n1ln(4)/3/lnn)exp(n/12)=1exp(Ω(n1/2))1-\exp(-n^{1-\ln(4)/3}/\ln n)-\exp(-n/12)=1-\exp(-\Omega(n^{1/2})), after the selection phase there is an interval of length kk on the Pareto front such that none of its points is covered by the new population Pt+1P_{t+1}. This also implies E[MEI(Pt+1,f1)](1exp(Ω(n1/2)))k=Ω(logn)E[\operatorname{MEI}(P_{t+1},f_{1})]\geq(1-\exp(-\Omega(n^{1/2})))k=\Omega(\log n).

We now turn to the upper bounds. Let k[1..N]k\in[1..N] and m[1..nk]m\in[1..n-k], and let Am,kA_{m,k} be the event that all individuals with f1f_{1} value in Im=[m..m+k1]I_{m}=[m..m+k-1] are selected for removal. Then

Pr[Am,k]=(2N2kNk)(2N2N)=(2N2k)!(Nk)!(N2)!(2N2)!N!(N2)!=N(N1)(Nk+1)(2N2)(2N3)(2N1k).\displaystyle\Pr[A_{m,k}]=\frac{\binom{2N-2-k}{N-k}}{\binom{2N-2}{N}}=\frac{\frac{(2N-2-k)!}{(N-k)!(N-2)!}}{\frac{(2N-2)!}{N!(N-2)!}}=\frac{N(N-1)\cdots(N-k+1)}{(2N-2)(2N-3)\cdots(2N-1-k)}.

It is not difficult to see that if Am,kA_{m,k} happens for some mm, then MEI(Pt+1,f1)k\operatorname{MEI}(P_{t+1},f_{1})\geq k. Hence, by a union bound over all possible mm, we obtain

Pr[MEI(Pt+1,f1)k]\displaystyle\Pr[\operatorname{MEI}(P_{t+1},f_{1})\geq k] (nk)Pr[Am,k]=(nk)N(N1)(Nk+1)(2N2)(2N3)(2N1k)\displaystyle\leq{}(n-k)\Pr[A_{m,k}]=\frac{(n-k)N(N-1)\cdots(N-k+1)}{(2N-2)(2N-3)\cdots(2N-1-k)}
(nk)(N2N2)kn(122/N)kn(23)k,\displaystyle\leq{}(n-k)\mathopen{}\mathclose{{}\left(\frac{N}{2N-2}}\right)^{k}\leq n\mathopen{}\mathclose{{}\left(\frac{1}{2-2/N}}\right)^{k}\leq n\mathopen{}\mathclose{{}\left(\frac{2}{3}}\right)^{k},

where the last inequality uses n7n\geq 7 and thus N=(n+1)/24N=(n+1)/2\geq 4. Hence for any constant c>1c>1, we know that

Pr[MEI(Pt+1,f1)clog3/2n]n(23)clog3/2n=1nc1.\displaystyle\Pr[\operatorname{MEI}(P_{t+1},f_{1})\geq c\log_{3/2}n]\leq n\mathopen{}\mathclose{{}\left(\tfrac{2}{3}}\right)^{c\log_{3/2}n}=\tfrac{1}{n^{c-1}}.

For the expectation, we compute

E\displaystyle E [MEI(Pt+1,f1)]=k=1+Pr[MEI(Pt+1,f1)k]\displaystyle{}[\operatorname{MEI}(P_{t+1},f_{1})]=\sum_{k=1}^{+\infty}\Pr[\operatorname{MEI}(P_{t+1},f_{1})\geq k]
k=1log3/2nPr[MEI(Pt+1,f1)k]+k=log3/2n+1+n(23)k\displaystyle\leq{}\sum_{k=1}^{\lfloor\log_{3/2}n\rfloor}\Pr[\operatorname{MEI}(P_{t+1},f_{1})\geq k]+\sum_{k=\lfloor\log_{3/2}n\rfloor+1}^{+\infty}n\mathopen{}\mathclose{{}\left(\tfrac{2}{3}}\right)^{k}
log3/2n+n(23)log3/2n+1k=0+(23)klog3/2n+O(1).\displaystyle\leq{}\lfloor\log_{3/2}n\rfloor+n\mathopen{}\mathclose{{}\left(\tfrac{2}{3}}\right)^{\lfloor\log_{3/2}n\rfloor+1}\sum_{k=0}^{+\infty}\mathopen{}\mathclose{{}\left(\tfrac{2}{3}}\right)^{k}\leq\log_{3/2}n+O(1).\qed

The example above shows that even in a perfectly symmetric situation, the NSGA-II with high probability selects a new parent population with high irregularities and relatively large areas on the Pareto front that are not covered by the population.
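To make this effect tangible, the following small Python sketch simulates the one-shot removal step analyzed in Lemma 11 (the function name and the Monte Carlo setup are ours, purely for illustration): the combined population covers every f1f_{1} value from 0 to nn exactly once, the two extremes are kept, and NN of the n1n-1 inner values are removed uniformly at random.

import random

def simulate_lemma11_selection(n, trials=100, seed=0):
    """Monte Carlo sketch of the removal step analyzed in Lemma 11: R_t covers
    every f_1 value 0..n exactly once, N = (n+1)/2, the two extremes survive,
    and N of the n-1 inner values are removed uniformly at random.  Returns
    the average maximal empty interval (MEI) over the trials."""
    assert n % 2 == 1
    rng = random.Random(seed)
    N = (n + 1) // 2
    total = 0
    for _ in range(trials):
        removed = set(rng.sample(range(1, n), N))             # N of the n-1 inner values
        kept = [v for v in range(n + 1) if v not in removed]  # covered f_1 values
        total += max(b - a for a, b in zip(kept, kept[1:]))   # maximal empty interval
    return total / trials

# For n = 601 (the size used in Section 7) this typically returns a value of
# logarithmic order, well above the optimal MEI of roughly n/(N-1).
print(simulate_lemma11_selection(601))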

We now show that even more extreme examples can be constructed. The following example builds on the same idea as the 1111-point example in Figure 2 in [KD06], but is much more general (working for arbitrarily large population sizes) and gives a more extreme result. We do not expect such situations to come up often in a regular run of the NSGA-II, but they underline that the drawbacks of working with the initial crowding distance can be tremendous.

Lemma 12.

For all n3n\in 3\mathbb{N} and N=13n+1N=\frac{1}{3}n+1, there is a combined parent and offspring population RR of size 2N2N with 0n,1nR0^{n},1^{n}\in R and the following two properties.

  • The population PP^{\prime} selected by the traditional NSGA-II satisfies MEI(P,f1)=13n+2\operatorname{MEI}(P^{\prime},f_{1})=\frac{1}{3}n+2.

  • There is a population PRP^{\prime\prime}\subseteq R with |P|=N|P^{\prime\prime}|=N and 0n,1nP0^{n},1^{n}\in P^{\prime\prime} such that MEI(P,f1)4\operatorname{MEI}(P^{\prime\prime},f_{1})\leq 4.

Proof.

Let RR be such that f1(R)=[0..13n+1]{13n+2ii[1..13n]}f_{1}(R)=[0..\frac{1}{3}n+1]\cup\{\frac{1}{3}n+2i\mid i\in[1..\frac{1}{3}n]\} and |R|=2(13n+1)|R|=2(\frac{1}{3}n+1). Then for any two different x,xRx,x^{\prime}\in R, we have f1(x)f1(x)f_{1}(x)\neq f_{1}(x^{\prime}). By construction, 0n,1nR0^{n},1^{n}\in R. We note that exactly those xRx\in R with f1(x)[1..13n+1]f_{1}(x)\in[1..\frac{1}{3}n+1] have the smallest occurring crowding distance of 4n\frac{4}{n}. Since these are 13n+1\frac{1}{3}n+1 elements of RR and since N=12|R|=13n+1N=\frac{1}{2}|R|=\frac{1}{3}n+1 elements have to be removed, they will all be removed in the selection step, leaving all points (i,ni)(i,n-i) with i[1..13n+1]i\in[1..\frac{1}{3}n+1] uncovered by the selected population.

Now let PP^{\prime\prime} be obtained from RR by keeping 0n0^{n}, removing the individuals with f1f_{1} values 11 and 22, keeping the individual with f1f_{1} value 33, and then, in order of increasing f1f_{1} value, alternatingly deleting and keeping the remaining individuals. This removes exactly half the individuals from RR, but leaves 0n0^{n} and 1n1^{n} in PP^{\prime\prime}. Apart from the interval [0..3][0..3], the empty intervals in f1(P)f_{1}(P^{\prime\prime}) are all the union of two consecutive empty intervals in f1(R)f_{1}(R). Since MEI(R,f1)=2\operatorname{MEI}(R,f_{1})=2, this shows MEI(P,f1)4\operatorname{MEI}(P^{\prime\prime},f_{1})\leq 4. ∎
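The construction can be checked numerically. The following Python sketch (the helper names are ours) builds f1(R)f_{1}(R) as in the proof, mimics the traditional selection by removing exactly the individuals with the smallest initial crowding distance, builds the alternative population PP^{\prime\prime} as described, and reports both MEI values.

def lemma12_example(n):
    """Check the construction from the proof of Lemma 12 for n divisible by 3."""
    assert n % 3 == 0
    third = n // 3
    # f_1 values of R: [0..n/3+1] plus n/3+2i for i = 1..n/3
    f1_R = list(range(third + 2)) + [third + 2 * i for i in range(1, third + 1)]

    def mei(values):
        vals = sorted(values)
        return max(b - a for a, b in zip(vals, vals[1:]))

    # Traditional selection: exactly the n/3+1 individuals with f_1 in [1..n/3+1]
    # have the smallest initial crowding distance 4/n and are all removed.
    P_prime = [v for v in f1_R if v == 0 or v > third + 1]

    # Alternative selection P'': keep 0, drop 1 and 2, keep 3, then alternate
    # dropping and keeping in increasing f_1 order.
    rest = f1_R[4:]
    P_doubleprime = [0, 3] + [v for idx, v in enumerate(rest) if idx % 2 == 1]

    return mei(P_prime), mei(P_doubleprime)

print(lemma12_example(12))   # (6, 4), i.e., MEI = n/3 + 2 versus MEI <= 4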

5 Working With the Current Crowding Distance

As already noted in [KD06] and then further quantified in the preceding section, the traditional way the NSGA-II selects the next parent population can lead to unevenly distributed populations. The natural reason, discussed also in [KD06] and made again very visible in the proofs of Lemmas 11 and 12, is that the crowding distance is computed only once and then used for all removals of individuals. Consequently, it may happen that individuals are removed which, at the time of their removal, have a much larger crowding distance than at the start of the selection phase.

This phenomenon is heavily exploited in the construction of the very negative example in the proof of Lemma 12. In this example, the individuals xx with f1(x)[1..13n]f_{1}(x)\in[1..\frac{1}{3}n] all have the smallest crowding distance of 4/n4/n, and thus are all removed in some arbitrary order. When the last individual is removed, its neighbors on the front have objective values (0,n)(0,n) and (13n+1,n(13n+1))(\frac{1}{3}n+1,n-(\frac{1}{3}n+1)). Consequently, this individual at the moment of its removal has a crowding distance of 2/3+2/n2/3+2/n, which is (for large nn) much larger than its initial value of 4/n4/n, but also much larger than the crowding distance of 8/n8/n, which most of the remaining individuals still have. This example shows very clearly the downside of working with the initial crowding distance and at the same time suggests to work with the current crowding distance instead.

This is the road we will follow in this section. We first argue that there is an efficient implementation of the NSGA-II that repeatedly selects an individual with smallest crowding distance. We then show that this selection mechanism leads to much more balanced selections for the OneMinMax benchmark. We prove that the modified NSGA-II achieves an MEI value of at most 2nN3\frac{2n}{N-3}, which is only by roughly a factor of 22 larger than the optimal MEI of (1+o(1))nN(1+o(1))\frac{n}{N}. Reaching such a balanced distribution is very efficient – once the two extremal points are found (in time O(Nnlog(n))O(Nn\log(n))), it only takes additional time O(Nn)O(Nn) to find a population satisfying the MEI guarantee above. From this point on, the MEI never increases above 2nN3\frac{2n}{N-3}.

5.1 Implementation of the NSGA-II Using the Current Crowding Distance

It is immediately obvious that the computation of the crowding distance in the original NSGA-II can be implemented in a straightforward manner that takes O(mNlogN)O(mN\log N) time, where mm is the number of objectives and NN is the number of individuals for which the crowding distance is used as tie-breaker.

For the NSGA-II using the current crowding distance, some more thought is necessary. A naïve computation of the crowding distance for each removal could lead to a total effort of Θ(mN2logN)\Theta(mN^{2}\log N). Since the removal of one individual can change the crowding distance of the at most 2m2m individuals which are its neighbors in one of the sortings, one would hope that more efficient implementations exist, and this is indeed true.

In fact, already in [KD06] this problem was discussed and a solution via a priority queue and an additional data structure storing the neighbor relation was presented. Since the presentation in [KD06] is very compact (and did not discuss how to implement a random tie-breaking among individuals with the same crowding distance), we now describe this approach in more detail.

For the following presentation, let us assume that we have only m=2m=2 objectives. Let us assume that we have a set RR of individuals which pairwise are not comparable (none dominates the other) or have identical objective values. When optimizing OneMinMax, this set RR will be the combined parent and offspring population; in the general case it will be the front FiF_{i^{*}}. Let us call r=|R|r=|R| the size of this set. Let us assume that we want to remove some number ss of individuals from RR, sequentially by repeatedly removing an individual with the smallest current crowding distance, breaking ties randomly.

Besides keeping the crowding distance updated (which can be done in constant time per removal, as we just saw), we also need to be able to efficiently find and remove an individual with the smallest (current) crowding distance, and moreover, a random one in case of ties. The detection and removal of an element with the smallest key calls for a priority queue. Let us ignore for the moment the random tie-breaking and only discuss how to use a priority queue for the detection and removal of an individual with the smallest crowding distance. We recall that a priority queue is a data structure that stores items together with a key, a numerical value assigned to each item that describes the priority of the item. Standard priority queues support at least the following three operations: Adding new items to the queue, removing from the queue an item with the smallest key, and decreasing the key of an item in the queue. They do so with a time complexity that is only logarithmic in the current length of the queue.

For our problem, at the start of the selection phase, we add all individuals with their crowding distance as key to the priority queue. We repeatedly remove individuals according to the following scheme: (i) We find and remove from the priority queue an individual xx with smallest key. We also remove xx from RR. (ii) We find the up to four neighbors of xx in the two sortings of RR according to the two objectives, compute their crowding distance, and update their keys in the priority queue accordingly. This is not a decrease-key operation but an increase of the key; however, such an increase-key can be simulated by first decreasing the key to an artificially small value (smaller than all real values that can occur, here for example 1-1), then removing the item with smallest key (which is exactly this item), and then adding it with the new key to the queue.

These two steps can be implemented in logarithmic time, since all operations of the priority queue take at most logarithmic time, except that we still need to provide an efficient way to find the neighbors of an element in the sortings. This is necessary to determine the up to four neighbors of xx, but also to compute their crowding distance (which needs knowing their neighbors). To enable efficient computations of such neighbors, we use an additional data structure, namely for each objective a doubly-linked list that stores the current set RR sorted according to this objective. This list data structure must enable finding predecessors and successors (provided they exist) as well as the deletion of elements. Standard doubly-linked lists support these operations in constant time. We use this list in step (ii) above to find the desired neighbors. We also need to delete the removed individual xx in step (i) from this list.

To allow finding individuals in the priority queue or the doubly-linked list, we need a helper data structure with pointers to the individuals in these data structures. This can be a simple array indexed by the initial set RR. When RR is not suitable as index, we can initially build an array a[1..|R|]a[1..|R|] storing RR and then use the indices of the elements of RR in aa as identifiers. With this approach, regardless of the structure of RR, we can access individuals in any of our data structures in constant time.

So far we have described how to repeatedly remove an element with the currently smallest crowding distance each in logarithmic time. To add the random tie-breaking, it suffices to give each individual xx a random second-priority key, e.g., a random number rx[0,1]r_{x}\in[0,1]. The key of an individual xx used in the priority queue now is composed of the current crowding distance and this number rxr_{x}, where a key is smaller than another when its crowding distance is smaller or when the crowding distances are equal and the rxr_{x} number is smaller (lexicographic order of the two parts of the key). With these extended keys, the individuals with the currently smallest crowding distance have the highest priority, and in the case of several such individuals, the number rxr_{x} serves as a random tie-breaker.

With this tie-breaking mechanism, we have now implemented a way to repeatedly remove an individual with the smallest current crowding distance, breaking ties randomly, in time logarithmic in rr, the size of the set RR. Overall, our selection using the current crowding distance takes the same time of O(rlogr)O(r\log r) as the selection based on the initial crowding distance. Note that both complexities are small compared to the non-dominated sorting step, which takes time quadratic in NN.
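As an illustration (not the implementation of [KD06]), the following Python sketch realizes the removal loop just described for the OneMinMax case, where the f2f_{2} sorting is simply the reverse of the f1f_{1} sorting. Instead of the decrease-key-based simulation of increase-key, it uses the common lazy-invalidation trick for binary heaps; the function and variable names are ours.

import heapq
import random

def remove_by_current_crowding_distance(f1_values, n, num_remove, seed=0):
    """Repeatedly remove an individual with the smallest current crowding
    distance (ties broken randomly) from a set of OneMinMax solutions given
    by their f_1 values; returns the surviving f_1 values.  A heap with lazy
    invalidation replaces explicit key updates; prev/next pointers play the
    role of the doubly-linked list sorted by f_1."""
    rng = random.Random(seed)
    order = sorted(range(len(f1_values)), key=lambda i: f1_values[i])
    prev = {order[p]: (order[p - 1] if p > 0 else None) for p in range(len(order))}
    nxt = {order[p]: (order[p + 1] if p + 1 < len(order) else None) for p in range(len(order))}
    alive = set(range(len(f1_values)))
    tiebreak = {i: rng.random() for i in alive}       # fixed random second-priority key

    def cdis(i):
        if prev[i] is None or nxt[i] is None:
            return float("inf")                       # boundary individuals are protected
        return 2 * (f1_values[nxt[i]] - f1_values[prev[i]]) / n

    version = {i: 0 for i in alive}
    heap = [(cdis(i), tiebreak[i], version[i], i) for i in alive]
    heapq.heapify(heap)

    removed = 0
    while removed < num_remove:
        d, r, v, i = heapq.heappop(heap)
        if i not in alive or v != version[i]:
            continue                                  # stale entry, skip
        alive.remove(i)
        removed += 1
        p, q = prev[i], nxt[i]                        # unlink i from the f_1 order
        if p is not None:
            nxt[p] = q
        if q is not None:
            prev[q] = p
        for j in (p, q):                              # neighbors get a fresh key
            if j is not None and j in alive:
                version[j] += 1
                heapq.heappush(heap, (cdis(j), tiebreak[j], version[j], j))
    return sorted(f1_values[i] for i in alive)

In the setting of Algorithm 3 below, such a routine would be called with the f1f_{1} values of the combined parent and offspring population (which, for OneMinMax, forms a single front) and with num_remove equal to the number of individuals exceeding NN.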

Without going into details, we note that when the possible crowding distance values are known and they are not too numerous, say there is an upper bound of SS for their number, then using a bucket queue instead of a standard priority queue would give a complexity of order O(S+r)O(S+r). This is slightly superior to the above runtime, e.g., for OneMinMax when N=Ω(n/logn)N=\Omega(n/\log n).

1:Uniformly at random generate the initial population P0={x1,x2,,xN}P_{0}=\{x_{1},x_{2},\dots,x_{N}\} with xi{0,1}n,i=1,2,,N.x_{i}\in\{0,1\}^{n},i=1,2,\dots,N.
2:for t=0,1,2,t=0,1,2,\dots do
3:   Generate the offspring population QtQ_{t} with size NN
4:   Use fast-non-dominated-sort() in [DPAM02] to divide Rt=PtQtR_{t}=P_{t}\cup Q_{t} into fronts F1,F2,F_{1},F_{2},\dots
5:   Find i1i^{*}\geq 1 such that |i=1i1Fi|<N|\bigcup_{i=1}^{i^{*}-1}F_{i}|<N and |i=1iFi|N|\bigcup_{i=1}^{i^{*}}F_{i}|\geq N
6:   Use Algorithm 1 to separately calculate the crowding distance of each individual in F1,,FiF_{1},\dots,F_{i^{*}}
7:  %% Survival selection using the current crowding distance
8:   while |i=1iFi|N|\bigcup_{i=1}^{i^{*}}F_{i}|\neq N do
9:      Let xx be the individual with the smallest crowding distance in FiF_{i^{*}}, chosen at random in case of a tie
10:      Find four neighbors of xx, two in the sorted list with respect to f1f_{1} and two for f2f_{2}. Update the crowding distance of these four neighbors
11:      Fi=Fi{x}F_{i^{*}}=F_{i^{*}}\setminus\{x\}
12:   end while
13:   Pt+1=(i=1iFi)P_{t+1}=\mathopen{}\mathclose{{}\left(\bigcup_{i=1}^{i^{*}}F_{i}}\right)
14:end for
Algorithm 3 NSGA-II with the survival selection using the current crowding distance

5.2 Runtime Analysis and Approximation Quality of the NSGA-II with Current Crowding Distance

We now conduct a mathematical analysis of the NSGA-II with survival selection based on the current crowding distance. The following lemma shows that individuals with at least a certain crowding distance will certainly enter the next generation. The key argument in this proof is an averaging argument based on the observation that the sum of the crowding distances of all individuals other than the ones with infinite crowding distance is at most 44.

Lemma 13.

Let N4N\geq 4. Consider using the NSGA-II with survival selection based on the current crowding distance to optimize the OneMinMax function with problem size nn. Let t0t_{0} be the first generation such that the two extreme points 0n0^{n} and 1n1^{n} are in Pt0P_{t_{0}}. Let tt0t\geq t_{0}. Consider the selection of the next population Pt+1P_{t+1} from RtR_{t}, which consists of NN times removing an individual with smallest current crowding distance, breaking ties in an arbitrary manner. Assume that at some stage of this removal process the individual xx has a crowding distance of 4N3\frac{4}{N-3} or higher. Then xPt+1x\in P_{t+1}. Also, Pt+1P_{t+1} surely contains the two extreme points.

Proof.

We consider first a single removal of an individual at some moment of the selection phase. Let RR be the set of individuals from the combined parent and offspring population that have not yet been removed. Note that r:=|R|>Nr:=|R|>N. Note also that, by definition of the crowding distance, there can be at most 44 individuals with infinite crowding distance. Since N4N\geq 4, such individuals are never removed, and consequently, RR surely contains 0n0^{n} and 1n1^{n}. Let y1,,yry_{1},\dots,y_{r} be the sorting of RR by increasing f1f_{1} value and z1,,zrz_{1},\dots,z_{r} be the sorting of RR by increasing f2f_{2} value that are used for the computation of the crowding distance. For xRx\in R, let i,j[1..r]i,j\in[1..r] such that x=yi=zjx=y_{i}=z_{j} (we assume here that individuals have unique identifiers, so they are distinguishable also when having the same genotype; consequently, these ii and jj are uniquely defined). From the definition of cDis(x)\operatorname{cDis}(x), we know the following. If {i,j}{1,r}\{i,j\}\cap\{1,r\}\neq\emptyset, then cDis(x)=\operatorname{cDis}(x)=\infty. Otherwise, cDis(x)=(f1(yi+1)f1(yi1)+f2(zj+1)f2(zj1))/n\operatorname{cDis}(x)=(f_{1}(y_{i+1})-f_{1}(y_{i-1})+f_{2}(z_{j+1})-f_{2}(z_{j-1}))/n.

We compute

i=2r1f1(yi+1)f1(yi1)\displaystyle\sum_{i=2}^{r-1}f_{1}(y_{i+1})-f_{1}(y_{i-1}) =i=2r1f1(yi+1)f1(yi)+f1(yi)f1(yi1)\displaystyle=\sum_{i=2}^{r-1}f_{1}(y_{i+1})-f_{1}(y_{i})+f_{1}(y_{i})-f_{1}(y_{i-1})
2i=1r1f1(yi+1)f1(yi)\displaystyle\leq 2\sum_{i=1}^{r-1}f_{1}(y_{i+1})-f_{1}(y_{i})
=2(f1(yr)f1(y1))=2(n0)=2n,\displaystyle=2(f_{1}(y_{r})-f_{1}(y_{1}))=2(n-0)=2n,

the latter by our insight that RR contains the two extremal individuals. An analogous estimate holds for f2f_{2} and the zjz_{j}. This allows the estimate

xRncDis(x)i=2r1f1(yi+1)f1(yi1)+j=2r1f2(zj+1)f2(zj1)4n,\sum_{x\in R^{*}}n\operatorname{cDis}(x)\leq\sum_{i=2}^{r-1}f_{1}(y_{i+1})-f_{1}(y_{i-1})+\sum_{j=2}^{r-1}f_{2}(z_{j+1})-f_{2}(z_{j-1})\leq 4n,

where we write RR^{*} for the individuals in RR having a finite crowding distance. Since |R|r4|R^{*}|\geq r-4, a simple averaging argument shows that there is an xRx\in R^{*} with cDis(x)4r4\operatorname{cDis}(x)\leq\frac{4}{r-4}. Hence also the individual that is removed has a crowding distance of at most 4r44N3\frac{4}{r-4}\leq\frac{4}{N-3}.

Now looking at all removals in this selection step, we see that never an individual with crowding distance above 4N3\frac{4}{N-3} is removed. ∎
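The central estimate of this proof, namely that the finite crowding distances sum to at most 44 and hence some individual has crowding distance at most 4/(r4)4/(r-4), is easy to check numerically. The following small sketch (our helper, with the crowding distance specialized to OneMinMax solutions having pairwise distinct f1f_{1} values) does this for a random set containing the two extremes.

import random

def crowding_distances(f1_values, n):
    """Crowding distances of OneMinMax solutions with pairwise distinct f_1
    values (so the f_2 sorting is the reverse of the f_1 sorting)."""
    vals = sorted(f1_values)
    dist = {}
    for idx, v in enumerate(vals):
        if idx == 0 or idx == len(vals) - 1:
            dist[v] = float("inf")
        else:
            dist[v] = 2 * (vals[idx + 1] - vals[idx - 1]) / n
    return dist

rng = random.Random(0)
n, r = 1000, 50
f1 = [0, n] + rng.sample(range(1, n), r - 2)          # contains both extremes
dist = crowding_distances(f1, n)
finite = [d for d in dist.values() if d != float("inf")]
assert sum(finite) <= 4 + 1e-9                        # sum of finite crowding distances
assert min(finite) <= 4 / (r - 4) + 1e-9              # averaging argument of Lemma 13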

The result just shown easily implies that no large empty interval is created by the removal of a solution from RtR_{t} in the selection phase. Slightly more involved arguments are necessary to argue that the MEI-value decreases relatively fast. It is not too difficult to see that if the set of f1f_{1} values in PtP_{t} contains a large empty interval (the same will then be true for f2f_{2}), then this interval can be reduced in length by at least one via the event that one of the individuals corresponding to the boundaries of the empty interval creates an offspring “inside the interval”. What needs more care is that we require such arguments for all large empty intervals in parallel. For this, we regard how an empty interval shrinks over a longer time. Since this shrinking is composed of independent small steps, we can use strong concentration arguments to show that an interval shrinks to the desired value in the desired time with very high probability. This admits a union bound argument to extend this result to all empty intervals.

A slight technical challenge is to make precise what “one interval” means over the course of time (recall that in one iteration, we generate NN offspring and then remove NN individuals from RtR_{t}). We overcome this by regarding a half-integral point i+0.5i+0.5 and the corresponding interval

Ii=[max{f1(x)xPt,f1(x)i+0.5}..min{f1(x)xPt,f1(x)i+0.5}].\displaystyle I_{i}=[\max\{f_{1}(x)\mid x\in P_{t},f_{1}(x)\leq i+0.5\}..\min\{f_{1}(x)\mid x\in P_{t},f_{1}(x)\geq i+0.5\}].

This definition is unambiguous. Still, it gives room to some strange effects, which seem hard to avoid. Assume that Ii=[i..b]I_{i}=[i..b] for some bb sufficiently larger than ii. Assume that there is a single individual xx with f1(x)=if_{1}(x)=i in PtP_{t}. Assume that xx has an offspring yy with f1(y)=i+1f_{1}(y)=i+1. If both xx and yy survive into the next generation, then IiI_{i} has suddenly shrunk a lot to [i..i+1][i..i+1]. If yy survives but xx does not, then IiI_{i} has changed considerably to [a..i+1][a..i+1], where a=max{f1(x)xPt,f1(x)<i}a=\max\{f_{1}(x)\mid x\in P_{t},f_{1}(x)<i\}. Since in each iteration NN individuals are newly generated and then NN are removed, we do not see a way to define the empty intervals in a more stable manner. Nevertheless, our proof below will be such that it covers all these cases in a relatively uniform manner.

We recall from Section 2 that fair selection means that each individual of the parent population generates exactly one offspring and random selection means that NN times an individual is chosen uniformly at random from the parent population to generate one offspring.

Lemma 14.

Let N4N\geq 4. Consider using the NSGA-II with fair or random parent selection, with one-bit or bit-wise mutation, and with survival selection using the current crowding distance, to optimize the OneMinMax function with problem size nn. Let t0t_{0} be the first generation such that the two extreme points 0n0^{n} and 1n1^{n} are contained in Pt0P_{t_{0}}. Let t1t0t_{1}\geq t_{0} be the first generation such that MEI(Pt1,f1)max{2nN3,1}=:L\operatorname{MEI}(P_{t_{1}},f_{1})\leq\max\{\frac{2n}{N-3},1\}=:L. Then t1t0=O(n)t_{1}-t_{0}=O(n), both in expectation and with probability 1o(1)1-o(1). Also, for all t>t1t>t_{1}, we have MEI(Pt,f1)L\operatorname{MEI}(P_{t},f_{1})\leq L with probability one.

Proof.

Let i[0..n1]i\in[0..n-1]. For all tt0t\geq t_{0}, let XtX_{t} be the length of the empty interval in f1(Pt)f_{1}(P_{t}) containing i+0.5i+0.5, that is,

Xt=min{f1(x)xPt,f1(x)i+0.5}max{f1(x)xPt,f1(x)i+0.5},\displaystyle X_{t}=\min\{f_{1}(x)\mid x\in P_{t},f_{1}(x)\geq i+0.5\}-\max\{f_{1}(x)\mid x\in P_{t},f_{1}(x)\leq i+0.5\},

and let analogously YtY_{t} be the length of the empty interval in f1(Rt)f_{1}(R_{t}) that contains i+0.5i+0.5.

We first prove that if for some tt0t\geq t_{0} and MLM\geq L we have YtMY_{t}\leq M, then we have XtMX_{t^{\prime}}\leq M for all t>tt^{\prime}>t with probability one. It apparently suffices to regard one iteration, so assume by way of contradiction that Xt+1>MX_{t+1}>M. Note that Pt+1P_{t+1} is obtained from RtR_{t} by NN times removing an individual with smallest current crowding distance. Consider the removal step which extends the empty interval containing i+0.5i+0.5 to a length larger than MM. Let [a..b][a..b] be the empty interval containing i+0.5i+0.5 before this removal. To increase the size of this empty interval, the individual xx removed in this step must have f1(x){a,b}f_{1}(x)\in\{a,b\} and it must be the only individual with this f1f_{1} value. Assume, without loss of generality, that f1(x)=af_{1}(x)=a. Removing xx, by definition of the crowding distance, creates an empty interval of length exactly n/2n/2 times the current crowding distance of xx (here the factor of 1/21/2 stems from the fact that for an individual xx with unique f1f_{1} and f2f_{2} value, the contributions of the two objectives to the crowding distance are equal). By Lemma 13, this current crowding distance of xx is at most 4N3\frac{4}{N-3}, so this removal leaves an empty interval of length at most 2nN3LM\frac{2n}{N-3}\leq L\leq M, contradicting that this removal step extends the empty interval to a length larger than MM. This proves our first claim.

Since XtYtX_{t}\geq Y_{t} for all tt0t\geq t_{0}, this claim implies that XtXt+1X_{t}\geq X_{t+1} for XtLX_{t}\geq L, and therefore that once XtLX_{t}\leq L, we have XtLX_{t^{\prime}}\leq L for all ttt^{\prime}\geq t. Consequently, for all t>t1t>t_{1}, we have MEI(Pt,f1)L\operatorname{MEI}(P_{t},f_{1})\leq L with probability one.

It remains to show that XtX_{t} actually decreases to at most LL over time. To this aim, we now consider the situation that Xt>LX_{t}>L for some tt0t\geq t_{0}. Let aa and bb with aba\leq b be the two ends of the interval in f1(Pt)f_{1}(P_{t}) that contains i+0.5i+0.5. Let xa,xbPtx_{a},x_{b}\in P_{t} respectively be individuals with f1f_{1} values aa and bb. The probability that mutation applied to xax_{a} creates an offspring yy with f1(y)=a+1f_{1}(y)=a+1 is nan\frac{n-a}{n} for one-bit mutation and (11n)n1nannaen(1-\frac{1}{n})^{n-1}\frac{n-a}{n}\geq\frac{n-a}{en} for bit-wise mutation. Likewise, the probability that xbx_{b} mutates into a yy with f1(y)=b1f_{1}(y)=b-1 is bn\frac{b}{n} for one-bit mutation and at least ben\frac{b}{en} for bit-wise mutation. Since aba\leq b, we have (na)+bn(n-a)+b\geq n, that is, for at least one of xax_{a} and xbx_{b}, the probability of mutating it into a yy with f1(y){a+1,b1}f_{1}(y)\in\{a+1,b-1\} is at least 12e\frac{1}{2e}. The probability that this individual is chosen as parent at least once in this iteration is one for fair parent selection and (1(11N)N)11e(1-(1-\frac{1}{N})^{N})\geq 1-\frac{1}{e} for random parent selection. Consequently, the probability that at least one individual with f1f_{1} value in {a+1,b1}\{a+1,b-1\} is created in this iteration is at least 12e(11e)=:p\frac{1}{2e}(1-\frac{1}{e})=:p.

Let us assume that this positive event has happened. Then the interval in f1(Rt)f_{1}(R_{t}) containing i+0.5i+0.5 has length YtXt1Y_{t}\leq X_{t}-1 (note that, depending on the value of ii and the outcome of the offspring generation, this interval can be any of [a..a+1][a..a+1], [a..b1][a..b-1], [a+1..b1][a+1..b-1], [a+1..b][a+1..b], and [b1..b][b-1..b], but all arguments below hold for any of these cases). By our first claim, we now have Xt+1Xt1X_{t+1}\leq X_{t}-1 with probability one. This argument shows that in any iteration starting with Xt>LX_{t}>L, regardless of what happened in previous iterations, we have a probability of at least pp of reducing XtX_{t} by at least one.

Now we analyze the probability of the event AA that XtX_{t} drops to LL or less in T=2(1/p)nT=\lceil 2(1/p)n\rceil generations. We consider a random variable Z=i=1TZiZ=\sum_{i=1}^{T}Z_{i}, where Z1,,ZTZ_{1},\dots,Z_{T} are independent Bernoulli random variables with success probability pp. Then Xt0Xt0+T=t=t0t0+T1(XtXt+1)X_{t_{0}}-X_{t_{0}+T}=\sum_{t=t_{0}}^{t_{0}+T-1}(X_{t}-X_{t+1}) stochastically dominates Z:=min{Z,Xt0L}Z^{\prime}:=\min\{Z,X_{t_{0}}-L\}. Since E[Z]2nE[Z]\geq 2n, applying the multiplicative Chernoff bound (see, e.g., [Doe20, Theorem 1.10.7]), we have

Pr[ZXt0L]Pr[ZnL]Pr[Zn]exp(n/4).\displaystyle\Pr\mathopen{}\mathclose{{}\left[Z\leq X_{t_{0}}-L}\right]\leq\Pr\mathopen{}\mathclose{{}\left[Z\leq n-L}\right]\leq\Pr[Z\leq n]\leq\exp(-n/4).

Hence

Pr[A]Pr[Xt0Xt0+TXt0L]Pr[ZXt0L]1exp(n/4).\displaystyle\Pr[A]\geq\Pr\mathopen{}\mathclose{{}\left[X_{t_{0}}-X_{t_{0}+T}\geq X_{t_{0}}-L}\right]\geq\Pr\mathopen{}\mathclose{{}\left[Z^{\prime}\geq X_{t_{0}}-L}\right]\geq 1-\exp(-n/4).

Note that the XtX_{t} discussed above corresponds to one given i[0..n1]i\in[0..n-1]. A union bound over all i[0..n1]i\in[0..n-1] gives that with probability at least

1nexp(n/4)\displaystyle 1-n\exp(-n/4)

all empty intervals in the population have length at most LL from generation t0+Tt_{0}+T on, that is, we have MEI(Pt,f1)L\operatorname{MEI}(P_{t},f_{1})\leq L for all tt0+Tt\geq t_{0}+T. This proves the claimed bound with high probability.

For the claimed bound on the expectation, we note that we can repeat our argument above since we did not make any particular assumptions on the initial state. Consequently, the probability that MEI(Pt0+λT,f1)>L\operatorname{MEI}(P_{t_{0}+\lambda T},f_{1})>L is at most (nexp(n/4))λ(n\exp(-n/4))^{\lambda}. This immediately implies that the expected time to have all empty intervals of length at most LL is at most t0+O(1)T=t0+O(n)t_{0}+O(1)T=t_{0}+O(n), see, e.g., Corollary 1.6.2 in [Doe20]. ∎

It remains to analyze the time it takes to have the two extreme points 0n0^{n} and 1n1^{n} in the population. This analysis is very similar to the analysis of the original NSGA-II with population size large enough that the full Pareto front is found in [ZLD22, Theorems 2 and 6]. Since the proof below also applies to the original NSGA-II, we formulate the following result for both algorithms.

Lemma 15.

Consider using the NSGA-II in the classic version or using the current crowding distance (Algorithm 2 or 3) with one of the following six ways to generate the offspring, namely, applying fair selection, random selection, or binary tournament selection and applying one-bit mutation or standard bit-wise mutation. Then after an expected number of O(nlogn)O(n\log n) iterations, that is, an expected number of O(Nnlogn)O(Nn\log n) fitness evaluations, the two extreme points 0n0^{n} and 1n1^{n} are contained in the population.

Proof.

We first consider the time to generate 1n1^{n}, that is, the unique search point with largest f1f_{1} value. Let t0t\geq 0. Let xmaxx_{\max} be an individual in the parent population PtP_{t} with largest f1f_{1} value and with infinite crowding distance. Let k=f1(xmax)k=f_{1}(x_{\max}). Let pskp_{s}^{k} denote the probability that xmaxx_{\max} appears at least once in the NN individuals selected to generate offspring and let p+kp_{+}^{k} denote the probability of generating an individual with larger f1f_{1} value from xmaxx_{\max} via the mutation operator. Note that if there exist individuals in RtR_{t} with f1f_{1} value larger than f1(xmax)f_{1}(x_{\max}), then one such individual will survive to Pt+1P_{t+1}, and Pt+1P_{t+1} will have an individual with f1f_{1} fitness at least f1(xmax)+1f_{1}(x_{\max})+1. The expected number of iterations until this happens is at most 1/(pskp+k)1/(p_{s}^{k}p_{+}^{k}). It is not difficult to see that the largest f1f_{1} value in PtP_{t} cannot decrease. Since f1(1n)=nf_{1}(1^{n})=n, the expected number of iterations to reach 1n1^{n} is at most k=0n11pskp+k\sum_{k=0}^{n-1}\frac{1}{p_{s}^{k}p_{+}^{k}}.

It is not difficult to see that psk=1p_{s}^{k}=1 for the fair selection and at least 1(11/N)N11/e1-\mathopen{}\mathclose{{}\left(1-1/N}\right)^{N}\geq 1-1/e for random selection. For binary tournament selection, there are at most two individuals with infinite crowding distance but f1f_{1} value different from f1(xmax)f_{1}(x_{\max}). Hence psk1(11NN2N)N1exp((N2)/N)=Θ(1)p_{s}^{k}\geq 1-\mathopen{}\mathclose{{}\left(1-\frac{1}{N}\frac{N-2}{N}}\right)^{N}\geq 1-\exp(-(N-2)/N)=\Theta(1) in this case. It is also not difficult to see that p+k=(nk)/np_{+}^{k}=(n-k)/n for one-bit mutation, and p+knkn(11n)n1nkenp_{+}^{k}\geq\frac{n-k}{n}(1-\frac{1}{n})^{n-1}\geq\frac{n-k}{en} for bit-wise mutation. Hence the expected number of iterations to find 1n1^{n} is at most

k=0n11pskp+k=O(k=0n1nnk)=O(nlogn).\displaystyle\sum_{k=0}^{n-1}\frac{1}{p_{s}^{k}p_{+}^{k}}=O\mathopen{}\mathclose{{}\left(\sum_{k=0}^{n-1}\frac{n}{n-k}}\right)=O(n\log n).

Similarly, we can show that the expected number of iterations to have 0n0^{n} in the population is also O(nlogn)O(n\log n). Hence, the expected number of iterations to have the two extreme points in the population is O(nlogn)O(n\log n). Since each iteration uses NN fitness evaluations, the expected number of fitness evaluations is O(Nnlogn)O(Nn\log n). ∎

From Lemmas 14 and 15, we easily obtain the following theorem on the approximation ability of the NSGA-II using the current crowding distance.

Theorem 16.

Let N4N\geq 4. Consider using the NSGA-II with fair or random parent selection, with survival selection using the current crowding distance, and one-bit mutation to optimize the OneMinMax function with problem size nn. Then after an expected number of O(Nnlogn)O(Nn\log n) fitness evaluations, a population containing the two extreme points 0n0^{n} and 1n1^{n} and with MEI(Pt,f1)max{2nN3,1}\operatorname{MEI}(P_{t},f_{1})\leq\max\{\frac{2n}{N-3},1\} is reached and kept for all future time.

Recalling from Lemma 2 that the optimal maximal empty interval size is nN1\frac{n}{N-1}, Theorem 16 shows that the gaps in the Pareto front are at most by around a factor of 22 larger than this theoretical minimum. Also, comparing the runtimes in Lemma 14 with those in Lemma 15 or Theorem 16, we see that the cost for reaching a good approximation is asymptotically negligible compared to the one proven for reaching the two extreme points.

6 Steady-State Approaches

In order to overcome the shortcoming of the original NSGA-II, we discussed in the previous section the approach of updating the crowding distance after each removal. A different way to resolve the problems arising from removing several individuals in parallel based on the crowding distance is a steady-state approach, that is, a version of the NSGA-II which in each iteration generates only one new solution and selects the new population from the old one plus the new individual. Building on first ideas in this direction in [SR06], such a steady-state NSGA-II was proposed in [DNLA09] and a good empirical performance was shown.

In this section, we will theoretically prove that the steady-state NSGA-II can approximate well the Pareto front. This is also the first mathematical runtime analysis of the steady-state NSGA-II.

6.1 The Steady-State NSGA-II

We now make precise the steady-state NSGA-II we regard in this work. Apart from the way the offspring is generated, it is identical to the steady-state NSGA-II proposed in [DNLA09]. In this NSGA-II, with population size NN, a single offspring is generated per iteration. Note that when generating a single offspring, fair selection is not well-defined, so we only regard random parent selection. Hence the single offspring will be generated from mutating a randomly selected parent, either via one-bit mutation or via bit-wise mutation. From the combined parent and offspring population RR of size N+1N+1, the next parent population of size NN is selected in the same way as in the classic NSGA-II (apart from the different population sizes), that is, non-dominated sorting is applied to RR and then all individuals are taken into the next generation apart from one individual in the last front with smallest crowding distance (ties broken randomly).

We note that the above description of the steady-state NSGA-II requires running the non-dominated sorting and crowding-distance computation procedures once for each newly generated individual, whereas in the classic NSGA-II, these procedures are called only once per NN newly generated offspring. A more efficient implementation of the steady-state NSGA-II was proposed in [NBS16]. Readers can refer to this paper for more details.
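To make the description concrete, the following Python sketch (our own illustration, with names that are not from [DNLA09]) performs one iteration of the steady-state NSGA-II on OneMinMax with random parent selection and one-bit mutation. Since on OneMinMax all search points are pairwise incomparable, non-dominated sorting is trivial and the selection reduces to removing one of the N+1N+1 candidates with smallest crowding distance, breaking ties randomly.

import random

def steady_state_iteration(population, n, rng):
    """One iteration of the steady-state NSGA-II on OneMinMax (sketch):
    random parent selection, one-bit mutation, then removal of one individual
    with the smallest crowding distance among the N+1 candidates."""
    parent = rng.choice(population)
    child = parent[:]
    pos = rng.randrange(n)
    child[pos] = 1 - child[pos]                       # one-bit mutation
    candidates = population + [child]                 # combined population R_t, size N+1

    f1 = [sum(x) for x in candidates]                 # f_2 = n - f_1 on OneMinMax
    order = sorted(range(len(candidates)), key=lambda i: f1[i])

    def cdis(rank):                                   # crowding distance by sorted rank
        if rank == 0 or rank == len(order) - 1:
            return float("inf")
        return 2 * (f1[order[rank + 1]] - f1[order[rank - 1]]) / n

    dists = [cdis(rank) for rank in range(len(order))]
    smallest = min(dists)
    victims = [order[rank] for rank in range(len(order)) if dists[rank] == smallest]
    drop = rng.choice(victims)                        # random tie-breaking
    return [x for i, x in enumerate(candidates) if i != drop]

rng = random.Random(1)
n, N = 30, 8
population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(N)]
for _ in range(1000):
    population = steady_state_iteration(population, n, rng)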

6.2 Runtime Analysis and Approximation Quality of the Steady-State NSGA-II

We now prove runtime guarantees and approximation guarantees for the steady-state NSGA-II of the same quality as those proven for the NSGA-II using the current crowding distance in Section 5.2.

We first note that the removal of a least favorable individual from N+1N+1 individuals, which is the selection step of the steady-state NSGA-II, was already analyzed as a special case of Lemma 13, namely as the last of the NN removal steps in the NSGA-II with current crowding distance. Since no particular structure of the N+1N+1 individuals was assumed there, the proof of this lemma immediately implies the following result.

Lemma 17.

Let N4N\geq 4. Consider using the steady-state NSGA-II to optimize the OneMinMax function with problem size nn. Let t0t_{0} be the first generation such that the two extreme points 0n0^{n} and 1n1^{n} are in Pt0P_{t_{0}}. Let tt0t\geq t_{0}. Consider the selection of the next population Pt+1P_{t+1} from RtR_{t}. Let xRtx\in R_{t} with crowding distance 4N3\frac{4}{N-3} or higher. Then xPt+1x\in P_{t+1}.

Also, Pt+1P_{t+1} surely contains the two extreme points.

Fix some i[0..n1]i\in[0..n-1]. We reuse from Section 5.2 the notations of XtX_{t} being the length of the empty interval in f1(Pt)f_{1}(P_{t}) that contains i+0.5i+0.5, and YtY_{t} being the corresponding length for RtR_{t}.

From Lemma 17, in a fashion analogous to the proof of Lemma 14, we obtain first that if for some tt0t\geq t_{0} and ML:=max{2nN3,1}M\geq L:=\max\{\frac{2n}{N-3},1\} we have YtMY_{t}\leq M, then XtMX_{t^{\prime}}\leq M for all t>tt^{\prime}>t. Consequently, (i) once XtLX_{t}\leq L, we have XtLX_{t^{\prime}}\leq L for all ttt^{\prime}\geq t, and (ii) if XtLX_{t}\geq L, then Xt+1XtX_{t+1}\leq X_{t}.

Analogous to the second part of the proof of Lemma 14, we have for the steady-state NSGA-II that with probability Ω(1/N)\Omega(1/N), the empty-interval length XtX_{t} reduces in one iteration, that is, that Xt+1Xt1X_{t+1}\leq X_{t}-1, when Xt>LX_{t}>L. This probability was at least 12e(11e)\frac{1}{2e}(1-\frac{1}{e}) in Lemma 14, where we considered the event that a desired individual is contained among the NN selected parents. Now, where only one parent is selected per iteration, this probability is smaller by a factor of Θ(N)\Theta(N). These arguments, as in the proof of Lemma 14, give the following lemma.

Lemma 18.

Let N4N\geq 4. Consider using the steady-state NSGA-II with random parent selection and with one-bit or bit-wise mutation to optimize the OneMinMax function with problem size nn. Let t0t_{0} be the first generation such that the two extreme points 0n0^{n} and 1n1^{n} are contained in Pt0P_{t_{0}}. Let t1t0t_{1}\geq t_{0} be the first generation such that MEI(Pt1,f1)max{2nN3,1}=:L\operatorname{MEI}(P_{t_{1}},f_{1})\leq\max\{\frac{2n}{N-3},1\}=:L. Then t1t0=O(Nn)t_{1}-t_{0}=O(Nn), both in expectation and with probability 1o(1)1-o(1). Also, for all t>t1t>t_{1}, we have MEI(Pt,f1)L\operatorname{MEI}(P_{t},f_{1})\leq L with probability one.

It remains to analyze the time to have the two extreme points 0n0^{n} and 1n1^{n} in the population. This analysis is completely identical to the one in Lemma 15 apart from the calculation of the probability pksp^{k}_{s}, which now is the probability that the unique parent selected in one iteration is an individual with maximal f1f_{1}-value and infinite crowding distance.

For random selection, this probability satisfies pks1/Np^{k}_{s}\geq 1/N. For binary tournament selection, pks1NN2N=N2N2p^{k}_{s}\geq\frac{1}{N}\frac{N-2}{N}=\frac{N-2}{N^{2}}, again using the argument that there are at most two individuals with infinite crowding distance and f1f_{1}-value different from the maximal value. Naturally, these probabilities are by a factor of Θ(N)\Theta(N) smaller than the values computed in Lemma 15, where in each iteration NN offspring were generated. Consequently, in terms of iterations, our bound for the time to find the two extremal points is by a factor of Θ(N)\Theta(N) larger.

Lemma 19.

Consider using the steady-state NSGA-II with one of the following four ways to generate the offspring, namely, applying random selection or binary tournament selection and applying one-bit mutation or standard bit-wise mutation. Then after an expected number of O(Nnlogn)O(Nn\log n) iterations (fitness evaluations), the two extreme points 0n0^{n} and 1n1^{n} are contained in the population.

From Lemmas 18 and 19, we have the following theorem for the approximation ability of the steady-state NSGA-II.

Theorem 20.

Let N4N\geq 4. Consider using the steady-state NSGA-II with random selection and with one-bit or standard bit-wise mutation to optimize the OneMinMax function with problem size nn. Then after an expected number of O(Nnlogn)O(Nn\log n) fitness evaluations, a population containing the two extreme points 0n0^{n} and 1n1^{n} and with MEI(Pt,f1)max{2nN3,1}\operatorname{MEI}(P_{t},f_{1})\leq\max\{\frac{2n}{N-3},1\} is reached and kept for all future time.

Note that, except for Lemma 19, which also discusses binary tournament selection, we only considered random parent selection in the results of this section. It is not clear how to extend the other results to tournament selection, and we leave this as an interesting open question for future research.

7 Experiments

In Section 4 we conducted a theoretical analysis of two synthetic situations to show that the traditional NSGA-II can have difficulties in approximating the Pareto front. The complicated population dynamics prevented us from analyzing how often such situations arise. In Sections 5 and 6, we proved that the NSGA-II using the current crowding distance and the steady-state NSGA-II leave gaps on the Pareto front that are asymptotically at most a factor of 22 larger than those given by a perfect approximation of the Pareto front. To complete the picture, in this section, we present some experimental results.

7.1 Settings

We conduct experiments with the following settings.

  • Problem: OneMinMax, the benchmark in our theoretical results in Sections 4 and 5.

  • Problem size nn: 601. Given that the OneMinMax problem is an easy multi-objective problem, this is a moderate problem size. Such a choice is sensible, since with a too small size we will not gain reliable insights, whereas insights obtained on a large problem size raise the question whether they still apply to practically relevant problem sizes. We use the odd number n=601n=601 to include the setting discussed in Lemma 11.

  • Algorithms: We regard the classic NSGA-II using the initial crowding distance for the selection, the NSGA-II using the current crowding distance, and the steady-state NSGA-II. As in our theoretical analysis, we do not use crossover. That is, mutation is the only operator to generate the offspring population.

  • Mating selection and mutation strategy: We select the parents of the mutation operations via fair selection for the two generational NSGA-II variants. For the steady-state NSGA-II, we select a random individual as parent. As mutation operator, we use one-bit mutation. These settings are included in our mathematical runtime analysis. We have no reason to believe that other settings, e.g., random selection and bit-wise mutation (as also covered in our mathematical analyses), lead to substantially different results.

  • Population size NN: (n+1)/2=301,(n+1)/4=151,(n+1)/2=301,\lceil(n+1)/4\rceil=151, and (n+1)/8=76\lceil(n+1)/8\rceil=76. We choose (n+1)/2(n+1)/2 as this value is used in Lemma 11. We did not regard larger population sizes, since for N=n+1N=n+1 experiments were conducted in [ZLD22]. The two smaller population sizes are used to see how the approximation ability scales with the population size.

  • 2020 independent runs for each setting.

7.2 Results

Our focus is the maximal length of an empty interval on the Pareto front, that is, the MEI\operatorname{MEI} value defined earlier. As long as the population has not fully spread out on the Pareto front, that is, the extremal solutions 0n0^{n} and 1n1^{n} are not yet part of the population, enlarging the spread of the population is more critical than a balanced distribution in the known part of the front. For this reason, we only regard times after both extremal solutions have entered the population.
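For concreteness, the MEI value we report can be computed from the f1f_{1} values of a population as in the following small helper (ours, not the code used for the experiments); by the symmetry of OneMinMax, the same value results for f2f_{2}.

def mei(f1_values):
    """Maximal empty interval of a population on the OneMinMax Pareto front,
    measured as the largest gap between consecutive covered f_1 values
    (we regard only populations containing both extreme points)."""
    covered = sorted(set(f1_values))
    return max(b - a for a, b in zip(covered, covered[1:]))

print(mei([0, 3, 5, 9]))   # 4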

To see whether the approximation quality changes over time, we regard separately the two intervals of [1..100][1..100] and [3001..3100][3001..3100] generations after finding the two extremal solutions. We collect statistical data on the MEI\operatorname{MEI} value in these intervals in Table 1 and we display exemplary runs in Figure 1.

Table 1: The first, second, and third quartiles (displayed in the format of (,,)(\cdot,\cdot,\cdot)) for the maximal empty interval sizes MEI\operatorname{MEI} within 100100 generations and 2020 independent runs. We regard separately generations [1..100][1..100] and [3001..3100][3001..3100] after the two extremal points have entered the population. For the steady-state NSGA-II with population size NN, we regard generations [1..100N][1..100N] and [3000N+1..3100N][3000N+1..3100N] for a fair comparison.
Generations [1..100][1..100] [3001..3100][3001..3100]
N=301N=301
Initial CD (7,8,9) (7,8,9)
Current CD (3,3,3) (3,3,3)
Steady-State (3,3,3) (3,3,3)
N=151N=151
Initial CD (14,15,17) (14,15,17)
Current CD (5,5,6) (5,5,6)
Steady-State (5,5,5) (5,5,5)
N=76N=76
Initial CD (23,26,29) (24,27,30)
Current CD (11,12,12) (11,12,12)
Steady-State (11,12,12) (11,11,11)
Figure 1: The MEI\operatorname{MEI} for generations [1..100][1..100] and [3001..3100][3001..3100] after the two extreme points were found, in one exemplary run. For the steady-state NSGA-II with population size NN, at point ii on the x-axis actually the data for generations [1+N(i1)..Ni][1+N(i-1)..Ni] are displayed, namely the maximal, minimal, and median MEI value in this interval (which always happened to be identical, and also identical to the current-CD value for N=301N=301).

The optimal MEI value n/(N1)\lceil n/(N-1)\rceil for OneMinMax with n=601n=601 and population sizes N=301,151,76N=301,151,76 equals 3,5,3,5, and 99, respectively. From Table 1, we see that the modified NSGA-II and the steady-state NSGA-II often reach the optimal MEI for N=301N=301 and 151151, and reach only slightly sub-optimal values for N=76N=76. In contrast, the traditional NSGA-II shows median MEI values of 8,15,268,15,26 in the better time interval. This is more than twice the optimal MEI and the median MEI of the two NSGA-II variants that overcome the problem with the initial crowding distance.

We observe no major differences between the 100100 generations right after finding the two extremal points and the 100100 generations 30003000 generations later. For N=76N=76, the NSGA-II with initial crowding distance displays slightly larger MEI values in the later interval, whereas the steady-state NSGA-II shows slightly smaller values. We do not have an explanation for this, but the small differences render it not very interesting to investigate this observation further.

In Figure 1, we see that the MEI value oscillates considerably for the classic NSGA-II, whereas it is relatively stable for the NSGA-II using the current crowding distance and constant for the steady-state NSGA-II. This appears to be a second advantage of the two variants of the NSGA-II.

Our experimental data is not sufficient to answer the question of whether the traditional NSGA-II suffers from super-constant MEI values. Our theoretical result in Lemma 11 could be seen as an indication that logarithmic MEI values can show up (or MEI values of order Θ(nNlogN)\Theta(\frac{n}{N}\log N) for general values of NnN\leq n). To answer this question, significantly more experiments with truly large problem sizes would be necessary (due to the slow growth behavior of logarithmic functions). For our purposes, however, our results are fully sufficient. They show clearly that the two variants of the NSGA-II not building on the initial crowding distance yield much smaller and more stable MEI values. Not surprisingly, the experimentally observed MEI values for these two variants are better than the mathematical guarantees given in Theorems 16 and 20.

8 Conclusion

None of the existing runtime analyses for the NSGA-II (including those published after our conference version [ZD22]) regards the performance of the NSGA-II when the population size is smaller than the size of the Pareto front. In this work, we regard this situation and discuss how well the population evolved by the NSGA-II approximates the Pareto front. Our theoretical analysis of two artificial cases and our experiments give a mixed picture. However, they also suggest that the reason for the not fully satisfying approximation behavior is the fact that the selection of the next parent population is based on the initial crowding distance of individuals in the combined parent and offspring population, which can be very different from the crowding distance at the moment when an individual is removed, an effect previously observed with less mathematically rigorous arguments in [KD06].

For the NSGA-II building on the current crowding distance, we proved very positive approximation results for the OneMinMax benchmark. After an expected time comparable to the runtime of the classic NSGA-II on the OneMinMax benchmark, a population is reached that covers the Pareto front apart from gaps that are only a constant factor larger than in an optimal approximation. This state of the population is maintained for the remaining runtime, with probability one. We further discussed the steady-state NSGA-II and proved the same good approximation ability. Our experiments confirm the superiority of these two selection strategies.

From our proofs, we conjecture that similar results can be obtained for other classic benchmark problems such as LOTZ or the large front problem [HN08]. Overall, this work gives additional motivation to prefer two less common variants of the NSGA-II, the NSGA-II working with the current crowding distance [KD06] (around 180 citations on Google Scholar) and the steady-state NSGA-II [DNLA09] (76 citations), over the classic NSGA-II (over 50,000 citations).

We want to mention two open problems arising from this work. As already discussed, due to the complex population dynamics, we were not able to conduct a mathematical analysis of the approximation ability of the classic NSGA-II. So we could neither prove, say, a super-constant lower bound on the typical MEI value for, say, Nn/2N\approx n/2, nor prove any interesting upper bound on the MEI value, say a logarithmic factor above the ideal value. This is clearly an interesting problem area to further explore. On the more technical side, we note that in this first work on the approximation strength of the NSGA-II we only regarded fair and random parent selection. Our proofs do not apply to tournament selection, and we feel that substantially different arguments will be necessary to prove approximation guarantees in this case.

Acknowledgements

This work was supported by National Natural Science Foundation of China (Grant No. 62306086), Science, Technology and Innovation Commission of Shenzhen Municipality (Grant No. GXWD20220818191018001), Guangdong Basic and Applied Basic Research Foundation (Grant No. 2019A1515110177).

This work was also supported by a public grant as part of the Investissement d’avenir project, reference ANR-11-LABX-0056-LMH, LabEx LMH.

We thank Hisao Ishibuchi for pointing out Kukkonen and Deb’s work [KD06]. We also thank Ke Shang for his question about the behavior of the steady-state NSGA-II at GECCO 2022.

References

  • [ABB10] Anne Auger, Johannes Bader, and Dimo Brockhoff. Theoretically investigating optimal μ\mu-distributions for the hypervolume indicator: First results for three objectives. In Parallel Problem Solving from Nature, PPSN 2010, pages 586–596. Springer, 2010.
  • [ABBZ12] Anne Auger, Johannes Bader, Dimo Brockhoff, and Eckart Zitzler. Hypervolume-based multiobjective optimization: Theoretical foundations and practical implications. Theoretical Computer Science, 425:75–103, 2012.
  • [ABD21] Denis Antipov, Maxim Buzdalov, and Benjamin Doerr. Lazy parameter tuning and control: choosing all parameters randomly from a power-law distribution. In Genetic and Evolutionary Computation Conference, GECCO 2021, pages 1115–1123. ACM, 2021.
  • [AD11] Anne Auger and Benjamin Doerr, editors. Theory of Randomized Search Heuristics. World Scientific Publishing, 2011.
  • [BF13] Karl Bringmann and Tobias Friedrich. Approximation quality of the hypervolume indicator. Artificial Intelligence, 195:265–290, 2013.
  • [BFLI+09] Nicola Beume, Carlos M. Fonseca, Manuel Lopez-Ibanez, Luis Paquete, and Jan Vahrenhold. On the complexity of computing the hypervolume indicator. IEEE Transactions on Evolutionary Computation, 13:1075–1082, 2009.
  • [BFN08] Dimo Brockhoff, Tobias Friedrich, and Frank Neumann. Analyzing hypervolume indicator based algorithms. In Parallel Problem Solving from Nature, PPSN 2008, pages 651–660. Springer, 2008.
  • [BFQY20] Chao Bian, Chao Feng, Chao Qian, and Yang Yu. An efficient evolutionary algorithm for subset selection with general cost constraints. In Conference on Artificial Intelligence, AAAI 2020, pages 3267–3274. AAAI Press, 2020.
  • [BQ22] Chao Bian and Chao Qian. Better running time of the non-dominated sorting genetic algorithm II (NSGA-II) by using stochastic tournament selection. In Parallel Problem Solving From Nature, PPSN 2022, pages 428–441. Springer, 2022.
  • [BQT18] Chao Bian, Chao Qian, and Ke Tang. A general approach to running time analysis of multi-objective evolutionary algorithms. In International Joint Conference on Artificial Intelligence, IJCAI 2018, pages 1405–1411. IJCAI, 2018.
  • [BT03] Peter A.N. Bosman and Dirk Thierens. The balance between proximity and diversity in multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 7:174–188, 2003.
  • [BZLQ23] Chao Bian, Yawen Zhou, Miqing Li, and Chao Qian. Stochastic population update can provably be helpful in multi-objective evolutionary algorithms. In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5513–5521. ijcai.org, 2023.
  • [CCRS04] Carlos A. Coello Coello and Margarita Reyes Sierra. A study of the parallelization of a coevolutionary multi-objective evolutionary algorithm. In Mexican International Conference on Artificial Intelligence, MICAI 2004, pages 688–697. Springer, 2004.
  • [CDH+23] Sacha Cerf, Benjamin Doerr, Benjamin Hebras, Jakob Kahane, and Simon Wietheger. The first proven performance guarantees for the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) on a combinatorial optimization problem. In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5522–5530. ijcai.org, 2023.
  • [Cra21] Victoria G. Crawford. Faster guarantees of evolutionary algorithms for maximization of monotone submodular functions. In International Joint Conference on Artificial Intelligence, IJCAI 2021, pages 1661–1667. ijcai.org, 2021.
  • [DDHW23] Matthieu Dinot, Benjamin Doerr, Ulysse Hennebelle, and Sebastian Will. Runtime analyses of multi-objective evolutionary algorithms in the presence of noise. In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5549–5557. ijcai.org, 2023.
  • [DELQ22] Duc-Cuong Dang, Anton V. Eremeev, Per Kristian Lehre, and Xiaoyu Qin. Fast non-elitist evolutionary algorithms with power-law ranking selection. In Genetic and Evolutionary Computation Conference, GECCO 2022, pages 1372–1380. ACM, 2022.
  • [DGN16] Benjamin Doerr, Wanru Gao, and Frank Neumann. Runtime analysis of evolutionary diversity maximization for OneMinMax. In Genetic and Evolutionary Computation Conference, GECCO 2016, pages 557–564. ACM, 2016.
  • [DK15] Benjamin Doerr and Marvin Künnemann. Optimizing linear functions with the (1+λ) evolutionary algorithm—different asymptotic runtimes for different instances. Theoretical Computer Science, 561:3–23, 2015.
  • [DLMN17] Benjamin Doerr, Huu Phuoc Le, Régis Makhmara, and Ta Duy Nguyen. Fast genetic algorithms. In Genetic and Evolutionary Computation Conference, GECCO 2017, pages 777–784. ACM, 2017.
  • [DN20] Benjamin Doerr and Frank Neumann, editors. Theory of Evolutionary Computation—Recent Developments in Discrete Optimization. Springer, 2020. Also available at http://www.lix.polytechnique.fr/Labo/Benjamin.Doerr/doerr_neumann_book.html.
  • [DNLA09] Juan J. Durillo, Antonio J. Nebro, Francisco Luna, and Enrique Alba. On the effect of the steady-state selection scheme in multi-objective genetic algorithms. In International Conference on Evolutionary Multi-Criterion Optimization, EMO 2009, pages 183–197. Springer, 2009.
  • [Doe20] Benjamin Doerr. Probabilistic tools for the analysis of randomized optimization heuristics. In Benjamin Doerr and Frank Neumann, editors, Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pages 1–87. Springer, 2020. Also available at https://arxiv.org/abs/1801.06733.
  • [DOSS23a] Duc-Cuong Dang, Andre Opris, Bahare Salehi, and Dirk Sudholt. Analysing the robustness of NSGA-II under noise. In Genetic and Evolutionary Computation Conference, GECCO 2023, pages 642–651. ACM, 2023.
  • [DOSS23b] Duc-Cuong Dang, Andre Opris, Bahare Salehi, and Dirk Sudholt. A proof that using crossover can guarantee exponential speed-ups in evolutionary multi-objective optimisation. In Conference on Artificial Intelligence, AAAI 2023, pages 12390–12398. AAAI Press, 2023.
  • [DPAM02] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6:182–197, 2002.
  • [DQ23a] Benjamin Doerr and Zhongdi Qu. A first runtime analysis of the NSGA-II on a multimodal problem. IEEE Transactions on Evolutionary Computation, 2023. https://doi.org/10.1109/TEVC.2023.3250552.
  • [DQ23b] Benjamin Doerr and Zhongdi Qu. From understanding the population dynamics of the NSGA-II to the first proven lower bounds. In Conference on Artificial Intelligence, AAAI 2023, pages 12408–12416. AAAI Press, 2023.
  • [DQ23c] Benjamin Doerr and Zhongdi Qu. Runtime analysis for the NSGA-II: provable speed-ups from crossover. In Conference on Artificial Intelligence, AAAI 2023, pages 12399–12407. AAAI Press, 2023.
  • [EDB07] Michael Emmerich, André Deutz, and Nicola Beume. Gradient-based/evolutionary relay hybrid for computing Pareto front approximations maximizing the S-metric. In International Workshop on Hybrid Metaheuristics, HM 2007, pages 140–156. Springer, 2007.
  • [FNT15] Tobias Friedrich, Frank Neumann, and Christian Thyssen. Multiplicative approximations, optimal hypervolume distributions, and the choice of the reference point. Evolutionary Computation, 23:131–159, 2015.
  • [Gie03] Oliver Giel. Expected runtimes of a simple multi-objective evolutionary algorithm. In Congress on Evolutionary Computation, CEC 2003, pages 1918–1925. IEEE, 2003.
  • [GL10] Oliver Giel and Per Kristian Lehre. On the effect of populations in evolutionary multi-objective optimisation. Evolutionary Computation, 18:335–356, 2010.
  • [Gut12] Walter J. Gutjahr. Runtime analysis of an evolutionary algorithm for stochastic multi-objective combinatorial optimization. Evolutionary Computation, 20:395–421, 2012.
  • [HN08] Christian Horoba and Frank Neumann. Benefits and drawbacks for the use of epsilon-dominance in evolutionary multi-objective optimization. In Genetic and Evolutionary Computation Conference, GECCO 2008, pages 641–648. ACM, 2008.
  • [HN09] Christian Horoba and Frank Neumann. Additive approximations of Pareto-optimal sets by evolutionary multi-objective algorithms. In Foundations of Genetic Algorithms, FOGA 2009, pages 79–86. ACM, 2009.
  • [Jan13] Thomas Jansen. Analyzing Evolutionary Algorithms – The Computer Science Perspective. Springer, 2013.
  • [JJW05] Thomas Jansen, Kenneth A. De Jong, and Ingo Wegener. On the choice of the offspring population size in evolutionary algorithms. Evolutionary Computation, 13:413–440, 2005.
  • [KD06] Saku Kukkonen and Kalyanmoy Deb. Improved pruning of non-dominated solutions based on crowding distance for bi-objective optimization problems. In Congress on Evolutionary Computation, CEC 2006, pages 1179–1186. IEEE, 2006.
  • [LTDZ02] Marco Laumanns, Lothar Thiele, Kalyanmoy Deb, and Eckart Zitzler. Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation, 10:263–282, 2002.
  • [LTZ+02] Marco Laumanns, Lothar Thiele, Eckart Zitzler, Emo Welzl, and Kalyanmoy Deb. Running time analysis of multi-objective evolutionary algorithms on a simple discrete optimization problem. In Parallel Problem Solving from Nature, PPSN 2002, pages 44–53. Springer, 2002.
  • [LTZ04] Marco Laumanns, Lothar Thiele, and Eckart Zitzler. Running time analysis of multiobjective evolutionary algorithms on pseudo-Boolean functions. IEEE Transactions on Evolutionary Computation, 8:170–182, 2004.
  • [LZZZ16] Yuan-Long Li, Yu-Ren Zhou, Zhi-Hui Zhan, and Jun Zhang. A primary theoretical study on decomposition-based multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 20:563–576, 2016.
  • [NBS16] Niyaz Nigmatullin, Maxim Buzdalov, and Andrey Stankevich. Efficient removal of points with smallest crowding distance in two-dimensional incremental non-dominated sorting. In Genetic and Evolutionary Computation Conference, GECCO 2016, Companion Material, pages 1121–1128. ACM, 2016.
  • [Neu07] Frank Neumann. Expected runtimes of a simple evolutionary algorithm for the multi-objective minimum spanning tree problem. European Journal of Operational Research, 181:1620–1629, 2007.
  • [NR08] Frank Neumann and Joachim Reichel. Approximating minimum multicuts by evolutionary multi-objective algorithms. In Parallel Problem Solving from Nature, PPSN 2008, pages 72–81. Springer, 2008.
  • [NRS11] Frank Neumann, Joachim Reichel, and Martin Skutella. Computing minimum cuts by randomized search heuristics. Algorithmica, 59:323–342, 2011.
  • [NSN15] Anh Quang Nguyen, Andrew M. Sutton, and Frank Neumann. Population size matters: rigorous runtime results for maximizing the hypervolume indicator. Theoretical Computer Science, 561:24–36, 2015.
  • [NW10] Frank Neumann and Carsten Witt. Bioinspired Computation in Combinatorial Optimization – Algorithms and Their Computational Complexity. Springer, 2010.
  • [PSN19] Mojgan Pourhassan, Feng Shi, and Frank Neumann. Parameterized analysis of multiobjective evolutionary algorithms and the weighted vertex cover problem. Evolutionary Computation, 27:559–575, 2019.
  • [QBF20] Chao Qian, Chao Bian, and Chao Feng. Subset selection by Pareto optimization with recombination. In Conference on Artificial Intelligence, AAAI 2020, pages 2408–2415. AAAI Press, 2020.
  • [QYT+19] Chao Qian, Yang Yu, Ke Tang, Xin Yao, and Zhi-Hua Zhou. Maximizing submodular or monotone approximately submodular functions by multi-objective evolutionary algorithms. Artificial Intelligence, 275:279–294, 2019.
  • [RNNF19] Vahid Roostapour, Aneta Neumann, Frank Neumann, and Tobias Friedrich. Pareto optimization for subset selection with dynamic cost constraints. In Conference on Artificial Intelligence, AAAI 2019, pages 2354–2361. AAAI Press, 2019.
  • [SIHP21] Ke Shang, Hisao Ishibuchi, Linjun He, and Lie Meng Pang. A survey on the hypervolume indicator in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation, 25:1–20, 2021.
  • [SR06] Dipti Srinivasan and Lily Rachmawati. An efficient multi-objective evolutionary algorithm with steady-state replacement model. In Genetic and Evolutionary Computation Conference, GECCO 2006, pages 715–722. ACM, 2006.
  • [VL98] David A. Van Veldhuizen and Gary B. Lamont. Evolutionary computation and convergence to a Pareto front. In John R. Koza, editor, Late Breaking Papers at the Genetic Programming Conference 1998, pages 221–228. Stanford University Bookstore, 1998.
  • [WD23] Simon Wietheger and Benjamin Doerr. A mathematical runtime analysis of the Non-dominated Sorting Genetic Algorithm III (NSGA-III). In International Joint Conference on Artificial Intelligence, IJCAI 2023, pages 5657–5665. ijcai.org, 2023.
  • [ZD22] Weijie Zheng and Benjamin Doerr. Better approximation guarantees for the NSGA-II by using the current crowding distance. In Genetic and Evolutionary Computation Conference, GECCO 2022, pages 611–619. ACM, 2022.
  • [ZD23a] Weijie Zheng and Benjamin Doerr. Mathematical runtime analysis for the non-dominated sorting genetic algorithm II (NSGA-II). Artificial Intelligence, 2023. In Press, https://doi.org/10.1016/j.artint.2023.104016.
  • [ZD23b] Weijie Zheng and Benjamin Doerr. Runtime analysis for the NSGA-II: proving, quantifying, and explaining the inefficiency for three or more objectives. IEEE Transactions on Evolutionary Computation, 2023. In Press, https://doi.org/10.1109/TEVC.2023.3320278.
  • [ZD23c] Weijie Zheng and Benjamin Doerr. Theoretical analyses of multiobjective evolutionary algorithms on multimodal objectives. Evolutionary Computation, 2023. In Press, https://doi.org/10.1162/evco_a_00328.
  • [ZLD22] Weijie Zheng, Yufei Liu, and Benjamin Doerr. A first mathematical runtime analysis of the Non-Dominated Sorting Genetic Algorithm II (NSGA-II). In Conference on Artificial Intelligence, AAAI 2022, pages 10408–10416. AAAI Press, 2022.
  • [ZQL+11] Aimin Zhou, Bo-Yang Qu, Hui Li, Shi-Zheng Zhao, Ponnuthurai Nagaratnam Suganthan, and Qingfu Zhang. Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm and Evolutionary Computation, 1:32–49, 2011.
  • [ZT98] Eckart Zitzler and Lothar Thiele. Multiobjective optimization using evolutionary algorithms—a comparative case study. In Parallel Problem Solving from Nature, PPSN 1998, pages 292–301. Springer, 1998.
  • [ZYQ19] Zhi-Hua Zhou, Yang Yu, and Chao Qian. Evolutionary Learning: Advances in Theories and Algorithms. Springer, 2019.