This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

An Analysis of Quality Indicators Using Approximated Optimal Distributions in a Three-dimensional Objective Space

Ryoji Tanabe, and Hisao Ishibuchi R. Tanabe and H. Ishibuchi are with Shenzhen Key Laboratory of Computational Intelligence, University Key Laboratory of Evolving Intelligent Systems of Guangdong Province, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China. e-mail: ([email protected], [email protected]). (Corresponding author: Hisao Ishibuchi)
Abstract

Although quality indicators play a crucial role in benchmarking evolutionary multi-objective optimization algorithms, their properties are still unclear. One promising approach for understanding quality indicators is the use of the optimal distribution of objective vectors that optimizes each quality indicator. However, it is difficult to obtain the optimal distribution for each quality indicator, especially when its theoretical property is unknown. Thus, optimal distributions for most quality indicators have not been well investigated. To address these issues, first, we propose a problem formulation of finding the optimal distribution for each quality indicator on an arbitrary Pareto front. Then, we approximate the optimal distributions for nine quality indicators using the proposed problem formulation. We analyze the nine quality indicators using their approximated optimal distributions on eight types of Pareto fronts of three-objective problems. Our analysis demonstrates that uniformly-distributed objective vectors over the entire Pareto front are not optimal in many cases. Each quality indicator has its own optimal distribution for each Pareto front. We also examine the consistency among the nine quality indicators.

Index Terms:
Evolutionary multi-objective optimization, quality indicators, optimal distributions of objective vectors

I Introduction

MULTI-OBJECTIVE optimization problems (MOPs) appear in real-world applications. Since no solution can simultaneously optimize multiple objective functions in general, the goal of MOPs is usually to find a Pareto optimal solution preferred by a decision maker [1]. When no information about the decision maker’s preference is available, an “a posteriori” decision making is performed. The decision maker selects a final solution from a non-dominated solution set that approximates the Pareto front in the objective space. An evolutionary multi-objective optimization algorithm (EMOA) is frequently used for the “a posteriori” decision making [2].

In multi-objective optimization, a vector that consists of all objective values of a solution is referred to as an objective vector. In this paper, we are interested in the distribution of objective vectors in the objective space, rather than the distribution of solutions in the solution space. Two terms “objective vector” and “solution” are used synonymously in some previous studies (e.g., [3, 4, 5]). For the sake of clarity, we distinguish the two terms. That is, solutions in the objective space are always referred to as objective vectors in this paper. We also assume that the number of objective vectors is bounded by the population size μ\mu.

Most quality indicators evaluate the quality of an objective vector set in terms of at least one of the three criteria (the convergence, the uniformity, and the spread) [6, 7, 8, 5]. Although a number of quality indicators have been proposed in the literature, we focus on unary quality indicators that map μ\mu objective vectors to a real number. Representative quality indicators include hypervolume (HV) [9], the additive ϵ\epsilon-indicator (Iϵ+I_{\epsilon+}) [7], and inverted generational distance (IGD) [10]. Since the performance of EMOAs is discussed based on the quality of obtained objective vector sets, quality indicators play a central role in benchmarking EMOAs. Although this paper does not address the cardinality of an objective vector set, it does not mean that the cardinality is less important. As pointed out in [5], the cardinality is one of important aspects when the size of the Pareto front is relatively small (e.g., combinatorial MOPs [11]). Since we address only non-dominated objective vectors on the Pareto front of a continuous MOPs in this paper, we do not analyze the cardinality-based quality indicators.

One of the critical issues in quality indicators is that their properties are not always obvious. Thus, some quality indicators are likely to provide misleading information about the quality of objective vector sets. HV is maximized by non-uniform objective vectors when the Pareto front is nonlinear [12]. Although the original generalized spread (Δ\Delta) for bi-objective problems [13] can evaluate the uniformity of given objective vectors, its extended version for problems with more than two objectives [14] overestimates the uniformity of non-uniform objective vectors in some cases [15, 5].

Since understanding the properties of quality indicators is necessary for the development of EMOAs in the right direction, some analysis studies have been presented in the literature. One of the most promising approaches is the use of the optimal distribution of objective vectors that optimizes a given quality indicator. A distribution of μ\mu objective vectors is said to be the optimal μ\mu-distribution if it optimizes a quality indicator [16]. The optimal μ\mu-distribution can explain which distribution of objective vectors is highly evaluated by each quality indicator. For example, the optimal μ\mu-distribution for IGD does not contain objective vectors on the edges of the linear Pareto front in some cases [3]. Thus, IGD may overestimate an objective vector set with a poor spread. The optimal μ\mu-distribution helps an indirect analysis of each quality indicator in this manner.

Although the optimal μ\mu-distributions for HV and IGD can be obtained on the linear Pareto front of two objective problems [12, 3], it is difficult to obtain the optimal μ\mu-distributions for other quality indicators in an exact manner, especially problems with more than two objectives. Thus, approximated optimal μ\mu-distributions are used for the analysis of quality indicators. In general, an approximated optimal μ\mu-distribution is obtained in an analytical or empirical manner [12, 17, 18, 19, 3]. However, existing analytical and empirical approaches have some limitations. Analytical approaches utilize some specific properties of a quality indicator (e.g., the decomposability of the HV calculation [18]). For this reason, analytical approaches can be applied only to quality indicators whose theoretical properties are clear (e.g., HV for two- and three-objective problems [12, 18] and Iϵ+I_{\epsilon+} only for two-objective problems [19]). Empirical approaches translate the problem of finding the optimal μ\mu-distribution on the Pareto front into a black-box single-objective continuous optimization problem. Then, the optimal μ\mu-distribution is approximated using a derivative-free optimizer. For example, the optimal μ\mu-distribution for R2 was approximated by CMA-ES [20] in [17]. Although empirical approaches are more flexible than analytical approaches, empirical approaches have been applied to only a few quality indicators, including HV [12] and R2 [17]. In addition, most previous studies addressed only two-objective problems. This is because it is not obvious how to formulate the problem of finding the optimal μ\mu-distribution on a multi-objective problem with more than two objectives. For these reasons, the (approximated) optimal μ\mu-distributions for most quality indicators on problems with more than two objectives have not been well analyzed in the literature.

To address these issues, first, we propose a problem formulation of finding the optimal μ\mu-distribution for each quality indicator on a Pareto front in an arbitrary-dimensional objective space. Then, we approximate the optimal μ\mu-distribution for each of the following nine quality indicators on a Pareto front in a three-dimensional objective space: HV, IGD, IGD plus (IGD+) [21], R2, new R2 (NR2) [22], Iϵ+I_{\epsilon+}, ss-energy (SE) [23], Δ\Delta, and pure diversity (PD) [24]. We analyze the nine quality indicators using their approximated optimal μ\mu-distributions on eight types of Pareto fronts.

Our main contributions in this paper are as follows:

  1. 1.

    We propose a general problem formulation of finding the optimal μ\mu-distribution. In contrast to the existing empirical approaches, the proposed formulation can handle any number of objectives. The proposed formulation is applicable to any types of the Pareto front (under the condition that its front shape function is given). Note that the formulation proposed in [12] and our formulation can be categorized into set-based optimization [25]. Details are discussed in Section IV.

  2. 2.

    We approximate the optimal μ\mu-distributions for the nine quality indicators on the Pareto fronts of eight problems with three objectives. This is the first study to present the approximated optimal μ\mu-distributions for NR2, Δ\Delta, SE, PD, R2, and Iϵ+I_{\epsilon+} for three-objective problems.

  3. 3.

    We analyze the nine quality indicators from the viewpoint of the approximated optimal μ\mu-distributions. We provide insightful information about the nine quality indicators. For example, PD may overestimate an objective vector set with a small dissimilarity. Our observations are summarized in Section VII.

The rest of this paper is organized as follows. Section II provides some preliminaries. Section III describes related studies. Section IV proposes a problem formulation to search for the optimal distribution of objective vectors. Section V describes the setting of our computational experiments. Section VI shows analysis results. Section VII concludes this paper.

II Preliminaries

II-A Definition of multi-objective optimization problems (MOPs)

We suppose a multi-objective minimization problem in this paper. A continuous MOP is to find a solution 𝒙𝕊\mbox{\boldmath$x$}\in\mathbb{S} that minimizes a given objective function vector 𝒇:nm\mbox{\boldmath$f$}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}. Here, 𝕊n\mathbb{S}\subseteq\mathbb{R}^{n} is the nn-dimensional solution space, and m\mathbb{R}^{m} is the mm-dimensional objective space. Thus, nn is the number of decision variables, and mm is the number of objective functions.

A solution 𝒙1\mbox{\boldmath$x$}_{1} is said to dominate 𝒙2\mbox{\boldmath$x$}_{2} iff fi(𝒙1)fi(𝒙2)f_{i}(\mbox{\boldmath$x$}_{1})\leq f_{i}(\mbox{\boldmath$x$}_{2}) for all i{1,,m}i\in\{1,...,m\} and fi(𝒙1)<fi(𝒙2)f_{i}(\mbox{\boldmath$x$}_{1})<f_{i}(\mbox{\boldmath$x$}_{2}) for at least one index ii. In addition, 𝒙1\mbox{\boldmath$x$}_{1} is said to weakly dominate 𝒙2\mbox{\boldmath$x$}_{2} iff fi(𝒙1)fi(𝒙2)f_{i}(\mbox{\boldmath$x$}_{1})\leq f_{i}(\mbox{\boldmath$x$}_{2}) for all i{1,,m}i\in\{1,...,m\}. If 𝒙\mbox{\boldmath$x$}^{*} is not dominated by any other solutions in 𝕊\mathbb{S}, 𝒙\mbox{\boldmath$x$}^{*} is a Pareto optimal solution. The set of all 𝒙\mbox{\boldmath$x$}^{*} is the Pareto optimal solution set. The set of all 𝒇(𝒙)\mbox{\boldmath$f$}(\mbox{\boldmath$x$}^{*}) is the Pareto front. The goal of MOPs for the “a posteriori” decision making is to find a non-dominated solution set that approximates the Pareto front in the objective space.

II-B Quality indicators

For the sake of simplicity, we denote the objective vector 𝒇(𝒙)=(f1(𝒙),,fm(𝒙))T\mbox{\boldmath$f$}(\mbox{\boldmath$x$})=\bigl{(}f_{1}(\mbox{\boldmath$x$}),...,f_{m}(\mbox{\boldmath$x$})\bigr{)}^{\rm T} as 𝒂=(a1,,am)T\mbox{\boldmath$a$}=(a_{1},...,a_{m})^{\rm T}, where ai=fi(𝒙)a_{i}=f_{i}(\mbox{\boldmath$x$}) for i{1,,m}i\in\{1,...,m\}. Let 𝑨={𝒂1,,𝒂μ}\mbox{\boldmath$A$}=\{\mbox{\boldmath$a$}_{1},...,\mbox{\boldmath$a$}_{\mu}\} be a set of μ\mu objective vectors, where μ\mu is the population size.

Let Ω\Omega be a set of all possible objective vector sets. A unary quality indicator I:ΩI:\Omega\rightarrow\mathbb{R} evaluates the quality of 𝑨A in terms of at least one of the convergence, the uniformity, and the spread. II is said to be Pareto-compliant iff the ranking of all objective vector sets in Ω\Omega by II is consistent with the Pareto dominance relation (for details, see [26]). In this paper, we define the convergence, the uniformity, and the spread as follows. The convergence of 𝑨A means how close objective vectors in 𝑨A are to the Pareto front. The uniformity of 𝑨A means how uniform the distribution of objective vectors in 𝑨A is. The spread of 𝑨A means how well objective vectors in 𝑨A cover the entire Pareto front. As pointed out in [5], the spread is not equal to the extensity, which represents the length of the boundaries of 𝑨A. A combination of the uniformity and the spread is referred to as “diversity” in the evolutionary multi-objective optimization (EMO) community. Fig. S.1 in the supplementary file shows examples of distributions of objective vectors in the three cases on the linear Pareto front with m=2m=2.

Below, we explain the following nine quality indicators used in this paper: HV, IGD, IGD+, R2, NR2, Iϵ+I_{\epsilon+}, SE, Δ\Delta, and PD. We do not consider convergence-based quality indicators, such as generational distance (GD) [27], which evaluate the quality of an objective vector set 𝑨A in terms only of the convergence. This is because the optimal μ\mu-distribution for such a quality indicator is obvious. 𝑨A is optimal for convergence-based quality indicators if all elements in 𝑨A are on the Pareto front regardless of their distribution. For the same reason, we do not consider cardinality-based quality indicators, such as overall non-dominated vector generation (ONVG) [28], which evaluate the quality of 𝑨A based on the number of non-dominated solutions. For example, all of 𝑨1\mbox{\boldmath$A$}_{1}, 𝑨2\mbox{\boldmath$A$}_{2}, and 𝑨3\mbox{\boldmath$A$}_{3} in Fig. S.1 are optimal for both GD and ONVG.

Below, 𝑹R is a set of mm-dimensional reference vectors that approximate the Pareto front in the objective space. The reference point 𝒒=(q1,,qm)Tm\mbox{\boldmath$q$}=(q_{1},...,q_{m})^{\rm T}\in\mathbb{R}^{m} is used for the HV and NR2 calculations. It should be noted that 𝒒𝑹\mbox{\boldmath$q$}\not\in\mbox{\boldmath$R$}.

II-B1 Hypervolume (HV)

Since HV [9] is the only Pareto-compliant quality indicator known so far [29], HV is one of the most popular quality indicators. The HV value of 𝑨A is the volume of the area that is dominated by objective vectors in 𝑨A and bounded by the reference point 𝒒q as follows:

HV(𝑨)=volume(𝒂𝑨[a1,q1]××[am,qm]),\displaystyle{\rm HV}(\mbox{\boldmath$A$})={\rm volume}\Biggl{(}\bigcup_{\mbox{\boldmath$a$}\in\mbox{\boldmath$A$}}[a_{1},q_{1}]\times...\times[a_{m},q_{m}]\Biggr{)}, (1)

where the function “volume{\rm volume}” in (1) is the Lebesgue measure. A large HV value indicates that 𝑨A approximates the Pareto front well in terms of both convergence and diversity.

II-B2 Inverted generational distance (IGD)

IGD [10] evaluates the quality of 𝑨A in terms of both convergence and diversity. IGD measures the average distance from each reference vector 𝒓r in 𝑹R to its nearest objective vector 𝒂a in 𝑨A as follows:

IGD(𝑨)=1|𝑹|(𝒓𝑹min𝒂𝑨{d(𝒂,𝒓)}),\displaystyle{\rm IGD}(\mbox{\boldmath$A$})=\frac{1}{|\mbox{\boldmath$R$}|}\left(\sum_{\mbox{\boldmath$r$}\in\mbox{\boldmath$R$}}\min_{\mbox{\boldmath$a$}\in\mbox{\boldmath$A$}}\Bigl{\{}d(\mbox{\boldmath$a$},\mbox{\boldmath$r$})\Bigr{\}}\right), (2)

where d(𝒂,𝒓)d(\mbox{\boldmath$a$},\mbox{\boldmath$r$}) in (2) is the distance between 𝒂a and 𝒓r. IGD uses the following Euclidean distance: dIGD(𝒂,𝒓)=i=1m(airi)2d_{\rm IGD}(\mbox{\boldmath$a$},\mbox{\boldmath$r$})=\sqrt{\sum^{m}_{i=1}(a_{i}-r_{i})^{2}}.

II-B3 IGD plus (IGD+)

Since IGD is not Pareto compliant, IGD may incorrectly evaluate the quality of 𝑨A that contains non-converged objective vectors [21, 30]. IGD+ [21] addresses the issue in IGD. While IGD is Pareto non-compliant, IGD+ is weakly Pareto compliant. The IGD+ value of 𝑨A is the average distance from each reference vector 𝒓r to its nearest objective vector in the dominated region by 𝑨A [31]. Although the IGD+ value of 𝑨A is calculated using (2), dd is replaced by the following distance function: dIGD+(𝒂,𝒓)=i=1m(max{airi,0})2d_{\rm IGD^{+}}(\mbox{\boldmath$a$},\mbox{\boldmath$r$})=\sqrt{\sum^{m}_{i=1}\bigl{(}\max\{a_{i}-r_{i},0\}\bigr{)}^{2}}. It should be noted that small IGD and IGD+ values are preferable whereas large HV values are preferable.

In addition to IGD+, the averaged Hausdorff distance indicator (Δp\Delta_{p}) [30] addresses the issue in IGD. When all elements in 𝑨A are on the Pareto front and 𝑹R has a large number of uniformly distributed reference vectors over the entire Pareto front (as in our computational experiments in this paper), the Δp\Delta_{p} value of 𝑨A is equal to the IGD value of 𝑨A in general. Actually, the IGD and Δp\Delta_{p} values are the same for all of the obtained solution sets in this paper. For this reason, we do not report experimental results for Δp\Delta_{p}. Of course, the above-mentioned relationship between Δp\Delta_{p} and IGD does not always hold depending on 𝑨A and 𝑹R.

II-B4 The additive ϵ\epsilon-indicator (Iϵ+I_{\epsilon+})

Here, we describe the unary version of Iϵ+I_{\epsilon+} [7]. The Iϵ+I_{\epsilon+} value of two objective vectors 𝒂a and 𝒓r is given as follows:

Iϵ+vec(𝒂,𝒓)=maxi{1,,m}{airi},\displaystyle I_{\epsilon+}^{\rm vec}(\mbox{\boldmath$a$},\mbox{\boldmath$r$})=\max_{i\in\{1,...,m\}}\{a_{i}-r_{i}\}, (3)

where the Iϵ+vec(𝒂,𝒓)I_{\epsilon+}^{\rm vec}(\mbox{\boldmath$a$},\mbox{\boldmath$r$}) value is the minimal shift such that 𝒂a weakly dominates 𝒓r. It should be noted that Iϵ+vec(𝒂,𝒓)Iϵ+vec(𝒓,𝒂)I_{\epsilon+}^{\rm vec}(\mbox{\boldmath$a$},\mbox{\boldmath$r$})\neq I_{\epsilon+}^{\rm vec}(\mbox{\boldmath$r$},\mbox{\boldmath$a$}) in (3). Iϵ+I_{\epsilon+} evaluates how well 𝑨A approximates the reference vector set 𝑹R as follows:

Iϵ+(𝑨)=max𝒓𝑹{min𝒂𝑨{Iϵ+vec(𝒂,𝒓)}},\displaystyle I_{\epsilon+}(\mbox{\boldmath$A$})=\max_{\mbox{\boldmath$r$}\in\mbox{\boldmath$R$}}\biggl{\{}\min_{\mbox{\boldmath$a$}\in\mbox{\boldmath$A$}}\Bigl{\{}I_{\epsilon+}^{\rm vec}(\mbox{\boldmath$a$},\mbox{\boldmath$r$})\Bigr{\}}\biggr{\}}, (4)

where the Iϵ+I_{\epsilon+} value of 𝑨A in (4) is the minimal shift such that each reference objective vector in 𝑹R is weakly dominated by at least one objective vector in 𝑨A. A small Iϵ+I_{\epsilon+} value indicates that the corresponding 𝑨A is a good approximation of the Pareto front in terms of both convergence and diversity.

II-B5 R2

Although three R indicators are proposed in [32], the following R2 indicator is the most widely used one:

R2(𝑨)\displaystyle{\rm R2}(\mbox{\boldmath$A$}) =1|𝑾|𝒘𝑾min𝒂𝑨{gtch(𝒂,𝒘)},\displaystyle=\frac{1}{|\mbox{\boldmath$W$}|}\sum_{\mbox{\boldmath$w$}\in\mbox{\boldmath$W$}}\min_{\mbox{\boldmath$a$}\in\mbox{\boldmath$A$}}\Bigl{\{}g^{\rm tch}(\mbox{\boldmath$a$},\mbox{\boldmath$w$})\Bigr{\}}, (5)
gtch(𝒂,𝒘)\displaystyle g^{\rm tch}(\mbox{\boldmath$a$},\mbox{\boldmath$w$}) =maxi{1,,m}{wi|aizi|},\displaystyle=\max_{i\in\{1,...,m\}}\bigl{\{}w_{i}|a_{i}-z_{i}^{*}|\bigr{\}}, (6)

where 𝑾W in (5) is a set of weight vectors. For each 𝒘=(w1,,wm)T𝑾\mbox{\boldmath$w$}=(w_{1},...,w_{m})^{\rm T}\in\mbox{\boldmath$W$}, 0wi10\leq w_{i}\leq 1 for i{1,,m}i\in\{1,...,m\} and i=1mwi=1\sum^{m}_{i=1}w_{i}=1. In (6), gtch:mg^{\rm tch}:\mathbb{R}^{m}\rightarrow\mathbb{R} is the weighted Tchebycheff function, and 𝒛=(z1,,zm)T\mbox{\boldmath$z$}^{*}=(z^{*}_{1},...,z^{*}_{m})^{\rm T} is the ideal point. The ii-th element ziz^{*}_{i} of 𝒛\mbox{\boldmath$z$}^{*} is the minimum value of the ii-th objective function over the Pareto front. According to [5], R2 with a tuned 𝒛\mbox{\boldmath$z$}^{*} works well on various problems with m=2m=2. A small R2 value indicates that 𝑨A approximates the Pareto front well in terms of both convergence and diversity.

II-B6 New R2 indicator (NR2)

Although the Pareto compliance of HV is attractive, the computation time for the HV calculation increases exponentially with the number of objectives mm. To address this issue, the method of approximating the HV value using R2 is proposed in [33]. The following NR2 [22] is an improved version of R2 to obtain a better approximation of the HV value:

NR2(𝑨)\displaystyle{\rm NR2}(\mbox{\boldmath$A$}) =1|𝑾|𝒘𝑾min𝒂𝑨{gmtch(𝒂,𝒘)}m,\displaystyle=\frac{1}{|\mbox{\boldmath$W$}|}\sum_{\mbox{\boldmath$w$}\in\mbox{\boldmath$W$}}\min_{\mbox{\boldmath$a$}\in\mbox{\boldmath$A$}}\Bigl{\{}g^{\rm mtch}(\mbox{\boldmath$a$},\mbox{\boldmath$w$})\Bigr{\}}^{m}, (7)
gmtch(𝒂,𝒘)\displaystyle g^{\rm mtch}(\mbox{\boldmath$a$},\mbox{\boldmath$w$}) =maxi{1,,m}{|qiai|wi},\displaystyle=\max_{i\in\{1,...,m\}}\biggl{\{}\frac{|q_{i}-a_{i}|}{w_{i}}\biggr{\}}, (8)

where gmtchg^{\rm mtch} in (8) is a modified version of gtchg^{\rm tch}. If wi=0w_{i}=0 in (8), wiw_{i} was set to 10610^{-6} to avoid division by zero. Similar to HV, NR2 uses the reference point 𝒒q. A large NR2 value indicates that the corresponding 𝑨A has a good HV value. Since NR2 was designed to approximate the HV value, it is expected that their optimal μ\mu-distributions are similar to each other. However, such an expectation has not been verified in the literature. To address this issue, we examine the approximated optimal μ\mu-distribution for NR2 in this paper.

II-B7 ss-energy (SE)

Although SE [23] was not originally proposed in the field of EMO, SE is used as a quality indicator in some studies (e.g., [34]). SE is given as follows:

SE(𝑨)=𝒂𝑨𝒃𝑨{𝒂}𝒂𝒃2s,\displaystyle{\rm SE}(\mbox{\boldmath$A$})=\sum_{\mbox{\boldmath$a$}\in\mbox{\boldmath$A$}}\sum_{\mbox{\boldmath$b$}\in\mbox{\boldmath$A$}\setminus\{\mbox{\boldmath$a$}\}}\|\mbox{\boldmath$a$}-\mbox{\boldmath$b$}\|_{2}^{-s}, (9)

where ss is usually set to m1m-1, and 𝒂p\|\mbox{\boldmath$a$}\|_{p} is the LpL_{p} norm of 𝒂a. If 𝒂=𝒃\mbox{\boldmath$a$}=\mbox{\boldmath$b$} in (9), we set the SE value of 𝑨A to an infinitely large value to avoid division by zero. A small SE value indicates that 𝑨A has a good uniformity and a good spread.

II-B8 Generalized Spread (Δ\Delta)

The original Δ\Delta indicator [13] can handle only the two-dimensional objective space. In this paper, we use an extension of Δ\Delta with respect to mm presented in [14]. Δ\Delta evaluates the quality of 𝑨A in terms of both uniformity and spread as follows:

Δ(𝑨)\displaystyle\Delta(\mbox{\boldmath$A$}) =dext+𝒂𝑨|dist(𝒂,𝑨)davg|dext+davg(|𝑨|m),\displaystyle=\frac{d^{\rm ext}+\sum_{\mbox{\boldmath$a$}\in\mbox{\boldmath$A$}}|{\rm dist}(\mbox{\boldmath$a$},\mbox{\boldmath$A$})-d^{\rm avg}|}{d^{\rm ext}+d^{\rm avg}(|\mbox{\boldmath$A$}|-m)}, (10)
dist(𝒖,𝑽)\displaystyle{\rm dist}(\mbox{\boldmath$u$},\mbox{\boldmath$V$}) =min𝒗𝑽,𝒗𝒖𝒖𝒗2,\displaystyle=\min_{\mbox{\boldmath$v$}\in\mbox{\boldmath$V$},\mbox{\boldmath$v$}\neq\mbox{\boldmath$u$}}\|\mbox{\boldmath$u$}-\mbox{\boldmath$v$}\|_{2}, (11)
dext\displaystyle d^{\rm ext} =i=1mdist(𝒓iext,𝑨),\displaystyle=\sum^{m}_{i=1}{\rm dist}(\mbox{\boldmath$r$}^{\rm ext}_{i},\mbox{\boldmath$A$}), (12)
davg\displaystyle d^{\rm avg} =1|𝑨|𝒂𝑨dist(𝒂,𝑨),\displaystyle=\frac{1}{|\mbox{\boldmath$A$}|}\sum_{\mbox{\boldmath$a$}\in\mbox{\boldmath$A$}}{\rm dist}(\mbox{\boldmath$a$},\mbox{\boldmath$A$}), (13)

where the function dist(𝒖,𝑽){\rm dist}(\mbox{\boldmath$u$},\mbox{\boldmath$V$}) in (11) returns the Euclidean distance from 𝒖u to its nearest vector in the set 𝑽V. In (12), 𝒓iext\mbox{\boldmath$r$}^{\rm ext}_{i} is the ii-th extreme objective vector with the maximum value of the ii-th objective function in 𝑹R (i{1,,m}i\in\{1,...,m\}). In (10), dextd^{\rm ext} aims to evaluate the spread of 𝑨A, while the other parts aim to evaluate the uniformity of 𝑨A. A small Δ\Delta value indicates that 𝑨A has good spread and uniformity.

II-B9 Pure diversity (PD)

PD [24] is an extended version of the Solow-Polasky diversity indicator [35] for multi-objective optimization. The PD value of 𝑨A is based on the dissimilarity of each objective vector 𝒂a to 𝑨A as follows:

PD(𝑨)\displaystyle{\rm PD}(\mbox{\boldmath$A$}) =max𝒂𝑨{PD(𝑨{𝒂})+d(𝒂,𝑨{𝒂})},\displaystyle=\max_{\mbox{\boldmath$a$}\in\mbox{\boldmath$A$}}\Bigl{\{}{\rm PD}\bigl{(}\mbox{\boldmath$A$}\setminus\{\mbox{\boldmath$a$}\}\bigr{)}+d\bigl{(}\mbox{\boldmath$a$},\mbox{\boldmath$A$}\setminus\{\mbox{\boldmath$a$}\}\bigr{)}\Bigr{\}}, (14)
d(𝒖,𝑽)\displaystyle d(\mbox{\boldmath$u$},\mbox{\boldmath$V$}) =min𝒗𝑽{𝒖𝒗p},\displaystyle=\min_{\mbox{\boldmath$v$}\in\mbox{\boldmath$V$}}\bigl{\{}\|\mbox{\boldmath$u$}-\mbox{\boldmath$v$}\|_{p}\bigr{\}}, (15)

where the recommended value of pp in the LpL_{p} norm in (15) is 0.10.1. Most diversity-based quality indicators (including SE and Δ\Delta) evaluate the uniformity of 𝑨A. In contrast, PD evaluates the dissimilarity between objective vectors in 𝑨A.

III Related work

This section describes related work. Analysis of quality indicators can be categorized into two approaches. One is ranking information based approaches. First, multiple objective vector sets are ranked by each quality indicator. Then, the properties of each quality indicator are discussed based on which objective vector set is highly evaluated by the quality indicator. The consistency of multiple quality indicators can also be investigated in this manner. One main drawback of the ranking information based approaches is that only relative information can be obtained.

The other is optimal μ\mu-distribution based approaches. As explained in Section I, the optimal μ\mu-distribution provides insightful information about the properties of each quality indicator. One main drawback of the optimal μ\mu-distribution based approaches is that it is difficult to find the optimal μ\mu-distributions for quality indicators.

III-A Ranking information based approaches

Okabe et al. [8] analyzed 15 quality indicators using objective vector sets artificially generated on the linear Pareto front and objective vector sets found by EMOAs on the convex Pareto front. The results show that some quality indicators can generate misleading results. Jiang et al. [15] examined six quality indicators using a pair of objective vector sets on the linear, convex, and concave Pareto fronts. The results show that Iϵ+I_{\epsilon+}, IGD, and HV are consistent with each other on the convex Pareto front, but IGD is inconsistent with HV on the concave Pareto front. Ravber et al. [36] investigated 11 quality indicators using a chess rating system. They used objective vector sets found by EMOAs on some test problems. The results show the robustness of HV, Iϵ+I_{\epsilon+}, and IGD+. However, the results also show that the three quality indicators do not equally evaluate the quality of objective vector sets in terms of the convergence, the uniformity, and the spread.

Wessing and Naujoks [37] analyzed the correlation between HV, R2, and Iϵ+I_{\epsilon+}. Their analysis was based on objective vector sets randomly generated on two- and three-objective problems. The results based on Pearson’s correlation coefficient show that HV is highly consistent with R2. Liefooghe and Derbel [38] examined the correlation between six quality indicators. They used objective vector sets found by the random search and NSGA-II [13] on test problems with m{2,3}m\in\{2,3\}. The results based on the Kendall rank correlation show that HV is consistent with R2 and R3. In contrast, Iϵ+I_{\epsilon+} and its multiplicative version are inconsistent with other quality indicators. Ishibuchi et al. [39] compared IGD+ with IGD on various objective vector sets found by NSGA-II on problems with up to 10 objectives. The results show that IGD+ can evaluate objective vector sets more accurately than IGD in terms of the Pareto compliance. In addition to IGD+ and IGD, three quality indicators (GD, Iϵ+I_{\epsilon+}, and D1 [32]) were examined. Bezerra et al. [40] analyzed the relation between IGD+ and IGD on the linear Pareto front. Various objective vector sets were used in their analysis. The results show that IGD and IGD+ are consistent with each other in most cases. The results also show that IGD may favor objective vectors with a poor spread.

III-B Optimal μ\mu-distribution based approaches

It is difficult to obtain the optimal μ\mu-distributions in an exact manner. Thus, the optimal μ\mu-distributions are generally approximated by analytical or empirical methods. For the differences between the two methods, see Section I.

Auger et al. [12] formulated finding the optimal μ\mu-distribution for HV on two-objective problems as a single-objective continuous optimization problem. Below, we describe its problem formulation in a general manner. Our description significantly differs from the original one.

Let 𝑨={𝒂1,,𝒂μ}\mbox{\boldmath$A$}=\{\mbox{\boldmath$a$}_{1},...,\mbox{\boldmath$a$}_{\mu}\} be a set of μ\mu objective vectors on the Pareto front. Also, let θi\theta_{i} be a real value that determines the first element (i.e., the value of the first objective f1f_{1}) of the ii-th objective vector 𝒂i\mbox{\boldmath$a$}_{i} (i{1,,μ}i\in\{1,...,\mu\}) in 𝑨A. All objective vectors in the optimal μ\mu-distributions must always be on the Pareto front. For this restriction, the position of 𝒂i\mbox{\boldmath$a$}_{i} for f1f_{1} specifies that for the second objective f2f_{2}. In [12], a front shape function F:F:\mathbb{R}\rightarrow\mathbb{R} is used to determine the position of 𝒂i\mbox{\boldmath$a$}_{i} for f2f_{2} as follows: 𝒂i=(θi,F(θi))T\mbox{\boldmath$a$}_{i}=\bigl{(}\theta_{i},F(\theta_{i})\bigr{)}^{\rm T}. For all i{1,,μ}i\in\{1,...,\mu\}, θi\theta_{i} is in the range [θmin,θmax][\theta^{\rm min},\theta^{\rm max}], where θmin\theta^{\rm min} and θmax\theta^{\rm max} are the lower and upper bounds for θi\theta_{i}. Here, θmin\theta^{\rm min} and θmax\theta^{\rm max} correspond to the minimum and maximum values of f1f_{1} in the Pareto front, respectively. In this manner, the distribution of μ\mu objective vectors in 𝑨A is determined by a vector 𝜽=(θ1,,θμ)T\mbox{\boldmath$\theta$}=(\theta_{1},...,\theta_{\mu})^{\rm T}.

Fig. 1 shows examples of five objective vectors. In Fig. 1, the Pareto front is shown by

FDTLZ1(θ)\displaystyle F_{\rm DTLZ1}(\theta) =0.5θ\displaystyle=0.5-\theta (θ[0,0.5]),\displaystyle(\theta\in[0,0.5]), (16)
FDTLZ2(θ)\displaystyle F_{\rm DTLZ2}(\theta) =1θ2\displaystyle=\sqrt{1-\theta^{2}} (θ[0,1]),\displaystyle(\theta\in[0,1]), (17)
FZDT1(θ)\displaystyle F_{\rm ZDT1}(\theta) =1θ\displaystyle=1-\sqrt{\theta} (θ[0,1]),\displaystyle(\theta\in[0,1]), (18)

where FDTLZ1F_{\rm DTLZ1}, FDTLZ2F_{\rm DTLZ2}, and FZDT1F_{\rm ZDT1} represent the Pareto front shapes of DTLZ1 [41], DTLZ2 [41], and ZDT1 [42], respectively. In Fig. 1, μ=5\mu=5, 𝜽=(0,0.125,0.25,0.375,0.5)T\mbox{\boldmath$\theta$}=(0,0.125,0.25,0.375,0.5)^{\rm T} for FDTLZ1F_{\rm DTLZ1}, and 𝜽=(0,0.25,0.5,0.75,1)T\mbox{\boldmath$\theta$}=(0,0.25,0.5,0.75,1)^{\rm T} for FDTLZ2F_{\rm DTLZ2} and FZDT1F_{\rm ZDT1}. As shown in Fig. 1, distributions of objective vectors on various Pareto fronts can be examined by changing FF.

In summary, finding 𝑨A that minimizes a quality indicator II is the same as finding 𝜽\theta that indirectly minimizes II as follows:

minimize I(𝑨)=I({(θi,F(θi))T|i{1,,μ}}),\displaystyle\text{minimize }I(\mbox{\boldmath$A$})=I\biggl{(}\Bigl{\{}\bigl{(}\theta_{i},F(\theta_{i})\bigr{)}^{\rm T}\,|\,i\in\{1,...,\mu\}\Bigr{\}}\biggr{)}, (19)

where 𝜽\theta is a solution of the problem defined in (19). II that needs to be maximized (e.g., HV, NR2, and PD) can be translated as I-I without loss of generality. Any derivative-free optimizer can be used to minimize II in (19).

Refer to caption
(a) FDTLZ1F_{\rm DTLZ1}
Refer to caption
(b) FDTLZ2F_{\rm DTLZ2}
Refer to caption
(c) FZDT1F_{\rm ZDT1}
Figure 1: Distributions of five translated objective vectors. The xx and yy axes represent f1f_{1} (θ\theta) and f2f_{2} (F(θ)F(\theta)), respectively.

Auger et al. [12] proposed analytical and empirical methods of approximating optimal μ\mu-distributions for HV. Details of the analytical method for each FF can be found in the supplementary website of [12] (https://sop.tik.ee.ethz.ch/download/supplementary/testproblems/). The proposed problem-specific local search algorithm in [12] for (19) uses variable-wise perturbation. For each iteration, each θi\theta_{i} is perturbed by a value selected randomly from a normal distribution. Brockhoff et al. [17] applied CMA-ES [20] to (19) to find the optimal μ\mu-distributions for R2. Their results show that objective vectors in the optimal μ\mu-distributions for R2 are densely distributed in the center of the Pareto front. They also examined the contribution of each objective vector in the optimal μ\mu-distribution for R2 to HV (and that for HV to R2).

In contrast to the above-mentioned studies [12, 17], the following studies do not use (19). Bringmann et al. [19] proposed the analytical method for finding the optimal μ\mu-distributions for Iϵ+I_{\epsilon+} on the Pareto front with m=2m=2. The proposed method explicitly exploits the properties of Iϵ+I_{\epsilon+} presented in [43]. The results show that the optimal μ\mu-distributions for Iϵ+I_{\epsilon+} on most Pareto fronts with m=2m=2 are uniform. Glasmachers [18] proposed a gradient-based analytical method for approximating the optimal μ\mu-distributions for HV on m{2,3}m\in\{2,3\}. For m=3m=3, the proposed method in [18] decomposes the three-dimensional objective space into cuboids. Then, the proposed method in [18] uses the gradient of HV based on the cuboids.

Ishibuchi et al. [4] used SMS-EMOA [44] to approximate the optimal μ\mu-distributions for HV on Pareto fronts of three-objective problems. SMS-EMOA uses the HV contribution in the steady-state selection. For each iteration, the worst individual in terms of the HV contribution is removed from the last non-domination level. In [4], SMS-EMOA was applied to the DTLZ and inverted-DTLZ problems whose number of distance variables is zero. Then, the influence of the reference vector on the optimal μ\mu-distributions for HV is examined. Ishibuchi et al. [3] also investigated the optimal μ\mu-distributions for IGD approximated by a similar approach. The HV contribution-based selection in SMS-EMOA was replaced by the IGD contribution-based selection. The optimal μ\mu-distributions for HV, IGD, and IGD+ approximated by SMS-EMOA in a similar manner were analyzed in [31]. The results show that the optimal μ\mu-distributions for IGD are almost uniform in all problems. The results also show that the optimal μ\mu-distributions for HV and IGD+ are similar in all problems when the reference vector for HV is set appropriately.

IV Proposed formulation

Here, we introduce the proposed problem formulation of finding the optimal μ\mu-distribution for quality indicators on Pareto fronts for arbitrary number of objectives (m2m\geq 2). As reviewed in Subsection III-B, some methods of approximating the optimal μ\mu-distributions have been proposed in the literature. The analytical methods [12, 18, 19] can efficiently approximate the optimal μ\mu-distributions in a computationally cheap manner. Unfortunately, they can be applied to only quality indicators whose theoretical properties are clear (i.e., HV and Iϵ+I_{\epsilon+}). It is questionable whether the empirical methods using SMS-EMOA [3, 4, 31] can be applied to quality indicators except for HV, IGD, and IGD+. The empirical methods based on the problem formulation in (19) are promising. Although the previous studies [12, 17] addressed only HV and R2, the approaches using (19) can be applied to any quality indicators in principle. However, the problem formulation in (19) can handle only Pareto fronts for m=2m=2. In contrast, the proposed problem formulation can handle Pareto fronts for m2m\geq 2. We do not claim that the degenerated version of our formulation for m=2m=2 is better than the formulation in (19). Note that our formulation for m=2m=2 is almost the same as the formulation in (19).

The formulation proposed in [12] for m=2m=2 and our formulation for m2m\geq 2 are essentially one of the set-based optimization methods [25]. Test problems constructed by bottom-up approaches (e.g., DTLZ) have two types of decision variables [45]. One is position-related variables that specify the position of an objective vector on the Pareto front. The other is distance-related variables that determine the distance between the objective vector and the Pareto front. Thus, any solutions with only position-related variables are always on the Pareto front in the objective space. The optimal μ\mu-distribution for a quality indicator can be approximated by a set-based method that finds a set of μ\mu solutions (not objective vectors) with only position-related variables for minimization of the quality indicator. While we here denote this approach as the solution set-based approach, we denote the formulations proposed in [12] and this paper as the objective vector set-based approach. The main advantage of the objective vector set-based approach is its simplicity. While the solution set-based approach needs to handle both solution and objective spaces, the objective vector set-based approach just needs to handle only the objective space. We can focus only on the objective space in the objective vector set-based approach. This simplicity is beneficial for an analysis of the optimal μ\mu-distributions as demonstrated in [12].

IV-A Proposed problem formulation

Below, we explain the proposed formulation for general cases. Only a disconnected front shape function FdisconnectedF_{\rm disconnected} explained in Subsection IV-C requires an alternative formulation. For details, see Subsection S.1-D in the supplementary file. Finding the optimal μ\mu-distribution for a quality indicator II on the Pareto front with mm objectives is formulated as follows:

minimize I(𝑨),\displaystyle I(\mbox{\boldmath$A$}), (20)
subject to G(𝑨)0,\displaystyle G(\mbox{\boldmath$A$})\leq 0,

where GG is a constraint function for constrained front shape functions (e.g., FcconcaveF_{\rm c\mathchar 45\relax concave} explained in Subsection IV-C). In contrast to (19), GG is added to (20). If a given front shape function has no constraint, GG is eliminated from (20). An objective vector set 𝑨={𝒂1,,𝒂μ}\mbox{\boldmath$A$}=\{\mbox{\boldmath$a$}_{1},...,\mbox{\boldmath$a$}_{\mu}\} is a translated version of a solution 𝜽\theta of this problem in a two-phase manner as follows:

𝑨A =F(𝑩),\displaystyle=F(\mbox{\boldmath$B$}), (21)
𝑩B =translate(𝜽),\displaystyle={\rm translate}(\mbox{\boldmath$\theta$}), (22)

where 𝜽=(θ1,,θd)T\mbox{\boldmath$\theta$}=(\theta_{1},...,\theta_{d})^{\rm T} is a dd-dimensional vector, and d=μ(m1)d=\mu(m-1). Any single-objective black-box optimizer can be used to find 𝜽\theta that minimizes II. While the size of 𝜽\theta is μ\mu in (19), that is dd in (21). For all i{1,,d}i\in\{1,...,d\}, θi\theta_{i} is in [0,1][0,1].

The function “translate{\rm translate}” in (22) translates a dd-dimensional vector 𝜽\theta into a set 𝑩={𝒃1,,𝒃μ}\mbox{\boldmath$B$}=\{\mbox{\boldmath$b$}_{1},...,\mbox{\boldmath$b$}_{\mu}\} that consists of μ\mu mm-dimensional vectors. For each i{1,,μ}i\in\{1,...,\mu\}, 𝒃i=(bi,1,,bi,m)T\mbox{\boldmath$b$}_{i}=(b_{i,1},...,b_{i,m})^{\rm T}, and j=1mbi,j=1\sum^{m}_{j=1}b_{i,j}=1. Although 𝒃i\mbox{\boldmath$b$}_{i} is an mm-dimensional vector, it is not an objective vector. Subsection IV-B explains the translation operator in detail.

After translating 𝜽\theta into 𝑩B in (22), 𝑩B is further mapped to the objective vector set 𝑨A by a front shape function FF in (21). For each i{1,,μ}i\in\{1,...,\mu\}, an element 𝒃i\mbox{\boldmath$b$}_{i} in 𝑩B corresponds to an objective vector 𝒂i\mbox{\boldmath$a$}_{i} in 𝑨A. All objective values are normalized into [0,1][0,1] in our study by using the ideal point 𝒛\mbox{\boldmath$z$}^{*} and the nadir point 𝒛nadir=(z1nadir,,zmnadir)T\mbox{\boldmath$z$}^{\rm nadir}=(z^{\rm nadir}_{1},...,z^{\rm nadir}_{m})^{\rm T} of each FF as follows: ai:=(aizi)/(zinadirzi)a_{i}:=(a_{i}-z^{*}_{i})/(z^{\rm nadir}_{i}-z^{*}_{i}) for each i{1,,m}i\in\{1,...,m\}. Here, zinadirz^{\rm nadir}_{i} is the maximum value of the ii-th objective function of the Pareto front. We use the normalization in order to examine the approximated optimal μ\mu-distributions for the quality indicators without worrying about the influence of differently-scaled objective values. Subsection IV-C specifies how to map 𝑩B to 𝑨A by the front shape function FF.

IV-B Translation operator

We explain how the translation operator in (22) maps 𝜽\theta to 𝑩B. First, all dd elements in 𝜽\theta are divided into μ\mu vector groups as (θ1,,θ1+m2)T(\theta_{1},...,\theta_{1+m-2})^{\rm T}, …, (θdm+2,,θd)T(\theta_{d-m+2},...,\theta_{d})^{\rm T}. For example, when μ=2\mu=2, m=3m=3, d=μ(m1)=4d=\mu(m-1)=4, and 𝜽=(0.1,0.2,0.3,0.4)T\mbox{\boldmath$\theta$}=(0.1,0.2,0.3,0.4)^{\rm T}, 4 elements in 𝜽\theta are divided as follows: (0.1,0.2)T(0.1,0.2)^{\rm T} and (0.3,0.4)T(0.3,0.4)^{\rm T}. For the sake of simplicity, we denote (θi,,θi+m2)T(\theta_{i},...,\theta_{i+m-2})^{\rm T} as 𝒚=(y1,,ym1)T\mbox{\boldmath$y$}=(y_{1},...,y_{m-1})^{\rm T} for each i{1,1(m1),2(m1),,(dm+2)}i\in\{1,1(m-1),2(m-1),...,(d-m+2)\}.

Below, we describe how the (m1)(m-1)-dimensional vector 𝒚y is translated into an mm-dimensional vector 𝒃b in 𝑩B. To obtain all μ\mu vectors in 𝑩B, the same operation is repeatedly applied to 𝒚y for each i{1,1(m1),2(m1),,(dm+2)}i\in\{1,1(m-1),2(m-1),...,(d-m+2)\}. Each element in 𝒃b is obtained using the Jaszkiewicz’s method of generating random weight vectors [46] as follows:

b1\displaystyle b_{1} =1y1m1,\displaystyle=1-\sqrt[m-1]{y_{1}}, (23)
bj\displaystyle b_{j} =(1k=1j1bk)(1yjm1j),\displaystyle=\Biggl{(}1-\sum^{j-1}_{k=1}b_{k}\Biggr{)}\bigl{(}1-\sqrt[m-1-j]{y_{j}}\bigr{)},
bm\displaystyle b_{m} =1k=1m1bk,\displaystyle=1-\sum^{m-1}_{k=1}b_{k},

where j{2,,m1}j\in\{2,...,m-1\} in (23). It should be noted that 𝒃b obtained by (23) satisfies the condition k=1mbk=1\sum^{m}_{k=1}b_{k}=1. Although the Jaszkiewicz’s method was originally proposed for the weight vector generation in decomposition-based EMOAs, it can be used to translate 𝒚y into 𝒃b in a straightforward manner.

IV-C Front shape function FF

The proposed formulation can be applied to any Pareto front when its front shape function FF is available. In other words, the proposed formulation cannot be applied to the Pareto front whose analytical expression (i.e., its front shape function) is unavailable such as WFG3 [47]. Although various front shape functions were presented in the previous study [12], they are only for m=2m=2. We design front shape functions for m3m\geq 3 based on a method of generating uniformly distributed reference vectors in representative test problems (e.g., DTLZ and WFG) presented in [48].

We use eight front shape functions shown in Table I. The name of the original problems and the shape of the Pareto fronts are also shown in Table I. FlinearF_{\rm linear}, FconcaveF_{\rm concave}, and FconvexF_{\rm convex} are the most basic front shape functions. FilinearF_{\rm i\mathchar 45\relax linear}, FiconcaveF_{\rm i\mathchar 45\relax concave}, and FiconvexF_{\rm i\mathchar 45\relax convex} are inverted versions of FlinearF_{\rm linear}, FconvexF_{\rm convex}, and FconcaveF_{\rm concave}, respectively. It should be noted that an inverted version of the convex Pareto front is concave, and vice versa. FdisconnectedF_{\rm disconnected} has the disconnected and mixed Pareto front. Some parts of the Pareto front of FcconcaveF_{\rm c\mathchar 45\relax concave} are infeasible due to the constraint. We select FcconcaveF_{\rm c\mathchar 45\relax concave} to demonstrate that the proposed formulation can handle the constrained Pareto front. In our preliminary experiments, we used the degenerate front shape function FdegenerateF_{\rm degenerate} whose original problem is DTLZ5. FdegenerateF_{\rm degenerate} is not a surface but a curve since it is degenerate. Because the shape of FdegenerateF_{\rm degenerate} is concave, the optimal μ\mu-distribution on FdegenerateF_{\rm degenerate} is similar to that for a two-objective problem with FconcaveF_{\rm concave}. For this reason, we omit FdegenerateF_{\rm degenerate}.

TABLE I: Properties of the eight front shape functions.
FF Original problem Shape
FlinearF_{\rm linear} DTLZ1 [41] Linear
FconcaveF_{\rm concave} DTLZ2 [41] Concave
FconvexF_{\rm convex} Convex DTLZ2 [49] Convex
FilinearF_{\rm i\mathchar 45\relax linear} Inverted DTLZ1 [50] Inverted, linear
FiconcaveF_{\rm i\mathchar 45\relax concave} Inverted convex DTLZ2 [51] Inverted, concave
FiconvexF_{\rm i\mathchar 45\relax convex} Inverted DTLZ2 [51] Inverted, convex
FdisconnectedF_{\rm disconnected} DTLZ7 [41] Disconnected, mixed
FcconcaveF_{\rm c\mathchar 45\relax concave} C2-DTLZ2 [50] Disconnected, concave

Due to the paper length limitation, we explain only FconcaveF_{\rm concave} in this paper. We describe other front shape functions in Section S.1 in the supplementary file. In FconcaveF_{\rm concave}, the ii-th element bib_{i} of 𝒃b in 𝑩B is translated into the ii-th element aia_{i} of 𝒂a in 𝑨A on the concave Pareto front as follows:

ai=bit,\displaystyle a_{i}=\frac{b_{i}}{t}, (24)

where t=j=1mbj2t=\sqrt{\sum^{m}_{j=1}b_{j}^{2}}. This translation is based on the method of generating reference vectors in DTLZ2 described in [48]. All objective vectors obtained by (24) satisfy the following relation of the Pareto front of DTLZ2: i=1m(fi(𝒙))2=1\sum^{m}_{i=1}(f_{i}(\mbox{\boldmath$x$}^{*}))^{2}=1. In addition to the eight front shape functions in Table I, we believe that it is possible to design other front shape functions using [48] as reference.

V Experimental settings

This section describes experimental settings. We examine the properties of the nine quality indicators using the eight front shape functions in Table I. Although the number of objectives mm can be arbitrarily specified in the proposed formulation, mm is set to three in order to visually discuss the distributions of objective vectors. We investigate the optimal μ\mu-distributions with μ=10,15,21,28,36,45\mu=10,15,21,28,36,45 (d=20,30,42,56,72,90d=20,30,42,56,72,90, respectively). We select these μ\mu values so that we can visually discuss the distribution of objective vectors. The distribution of objective vectors with a too large μ\mu value is unclear. We can also examine the influence of μ\mu on the optimal μ\mu-distributions by using the various μ\mu values.

In the proposed formulation in (20), a black-box optimizer is necessary to find 𝜽\theta that minimizes a given quality indicator. We use L-SHADE [52], which is an improved version of differential evolution (DE) [53]. L-SHADE was the winner of the IEEE CEC2014 competition on single-objective real-parameter optimization. We used the Java source code of L-SHADE provided by the authors of [52]. The default parameter setting was used for L-SHADE. The maximum number of function evaluations was set to 104×d10^{4}\times d, and 31 independent runs were performed. For the constrained front shape function FcconcaveF_{\rm c\mathchar 45\relax concave}, we use the death-penalty constraint handling method. This is because objective vector sets are just classified into feasible and infeasible groups as described in Subsection S.1-E in the supplementary file. The fitness value of an infeasible individual is an infinitely large value so that all infeasible individuals cannot survive to the next iteration. We compared L-SHADE to the analytical approach that approximates the optimal μ\mu-distributions for HV for m=2m=2 [12] (see Subsection III-B). We confirmed that L-SHADE can find the approximated optimal μ\mu-distribution for HV with acceptable quality for m=2m=2. For details, see Section S.3 in the supplementary file.

We implemented the nine quality indicators ourselves, except for HV. The WFG algorithm [54] was used for the HV calculation. We used the source code of the WFG algorithm provided by the authors of [54]. For PD, we carefully translated the Matlab source code provided by the authors of [24] into our implementation in order to speed up the PD calculation. We confirmed that our version and the original version output exactly the same PD value on all the front shape functions used in our study. The source codes used in our experiments can be downloaded from the supplementary website (https://sites.google.com/view/optmudist).

We set the reference vector 𝒒q for HV and NR2 to (1.2,,1.2)T(1.2,...,1.2)^{\rm T} according to [4]. We set 𝒛\mbox{\boldmath$z$}^{*} for R2 to (0,,0)T(0,...,0)^{\rm T}. As mentioned in Subsection IV-A, all objective values are in the range [0,1][0,1] for all front shape functions. For R2 and NR2, we generated the weight vector set 𝑾W using simplex-lattice design [55]. For IGD, IGD+, Iϵ+I_{\epsilon+}, and Δ\Delta, we generated the reference vector set 𝑹R by applying each front shape function FF to 𝑾W as follows 𝑹=F(𝑾)\mbox{\boldmath$R$}=F(\mbox{\boldmath$W$}). That is, in order to generate 𝑹R, 𝑾W is used in (21) instead of 𝑩B (the resulting 𝑨A is used as 𝑹R). For details, see Section S.2 in the supplementary file. The size of 𝑾W and 𝑹R was set to 1 0351\,035. For FdisconnectedF_{\rm disconnected} and FcconcaveF_{\rm c\mathchar 45\relax concave}, we set the size of 𝑹R to 1 0891\,089 and 1 0871\,087, respectively. This is due to the properties of FdisconnectedF_{\rm disconnected} and FcconcaveF_{\rm c\mathchar 45\relax concave}.

Refer to caption
(a) 𝑨HV\mbox{\boldmath$A$}_{\rm HV}
Refer to caption
(b) 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}
Refer to caption
(c) 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}
Refer to caption
(d) 𝑨R2\mbox{\boldmath$A$}_{\rm R2}
Refer to caption
(e) 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}
Refer to caption
(f) 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}
Refer to caption
(g) 𝑨SE\mbox{\boldmath$A$}_{\rm SE}
Refer to caption
(h) 𝑨Δ\mbox{\boldmath$A$}_{\Delta}
Refer to caption
(i) 𝑨PD\mbox{\boldmath$A$}_{\rm PD}
Refer to caption
(j) 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD}
Figure 2: Results on FlinearF_{\rm linear}.
Refer to caption
(a) 𝑨HV\mbox{\boldmath$A$}_{\rm HV}
Refer to caption
(b) 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}
Refer to caption
(c) 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}
Refer to caption
(d) 𝑨R2\mbox{\boldmath$A$}_{\rm R2}
Refer to caption
(e) 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}
Refer to caption
(f) 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}
Refer to caption
(g) 𝑨SE\mbox{\boldmath$A$}_{\rm SE}
Refer to caption
(h) 𝑨Δ\mbox{\boldmath$A$}_{\Delta}
Refer to caption
(i) 𝑨PD\mbox{\boldmath$A$}_{\rm PD}
Refer to caption
(j) 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD}
Figure 3: Results on FconcaveF_{\rm concave}.
Refer to caption
(a) 𝑨HV\mbox{\boldmath$A$}_{\rm HV}
Refer to caption
(b) 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}
Refer to caption
(c) 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}
Refer to caption
(d) 𝑨R2\mbox{\boldmath$A$}_{\rm R2}
Refer to caption
(e) 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}
Refer to caption
(f) 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}
Refer to caption
(g) 𝑨SE\mbox{\boldmath$A$}_{\rm SE}
Refer to caption
(h) 𝑨Δ\mbox{\boldmath$A$}_{\Delta}
Refer to caption
(i) 𝑨PD\mbox{\boldmath$A$}_{\rm PD}
Refer to caption
(j) 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD}
Figure 4: Results on FconvexF_{\rm convex}.
Refer to caption
(a) 𝑨HV\mbox{\boldmath$A$}_{\rm HV}
Refer to caption
(b) 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}
Refer to caption
(c) 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}
Refer to caption
(d) 𝑨R2\mbox{\boldmath$A$}_{\rm R2}
Refer to caption
(e) 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}
Refer to caption
(f) 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}
Refer to caption
(g) 𝑨SE\mbox{\boldmath$A$}_{\rm SE}
Refer to caption
(h) 𝑨Δ\mbox{\boldmath$A$}_{\Delta}
Refer to caption
(i) 𝑨PD\mbox{\boldmath$A$}_{\rm PD}
Refer to caption
(j) 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD}
Figure 5: Results on FilinearF_{\rm i\mathchar 45\relax linear}.
Refer to caption
(a) 𝑨HV\mbox{\boldmath$A$}_{\rm HV}
Refer to caption
(b) 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}
Refer to caption
(c) 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}
Refer to caption
(d) 𝑨R2\mbox{\boldmath$A$}_{\rm R2}
Refer to caption
(e) 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}
Refer to caption
(f) 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}
Refer to caption
(g) 𝑨SE\mbox{\boldmath$A$}_{\rm SE}
Refer to caption
(h) 𝑨Δ\mbox{\boldmath$A$}_{\Delta}
Refer to caption
(i) 𝑨PD\mbox{\boldmath$A$}_{\rm PD}
Refer to caption
(j) 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD}
Figure 6: Results on FiconcaveF_{\rm i\mathchar 45\relax concave}.
Refer to caption
(a) 𝑨HV\mbox{\boldmath$A$}_{\rm HV}
Refer to caption
(b) 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}
Refer to caption
(c) 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}
Refer to caption
(d) 𝑨R2\mbox{\boldmath$A$}_{\rm R2}
Refer to caption
(e) 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}
Refer to caption
(f) 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}
Refer to caption
(g) 𝑨SE\mbox{\boldmath$A$}_{\rm SE}
Refer to caption
(h) 𝑨Δ\mbox{\boldmath$A$}_{\Delta}
Refer to caption
(i) 𝑨PD\mbox{\boldmath$A$}_{\rm PD}
Refer to caption
(j) 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD}
Figure 7: Results on FiconvexF_{\rm i\mathchar 45\relax convex}.
Refer to caption
(a) 𝑨HV\mbox{\boldmath$A$}_{\rm HV}
Refer to caption
(b) 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}
Refer to caption
(c) 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}
Refer to caption
(d) 𝑨R2\mbox{\boldmath$A$}_{\rm R2}
Refer to caption
(e) 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}
Refer to caption
(f) 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}
Refer to caption
(g) 𝑨SE\mbox{\boldmath$A$}_{\rm SE}
Refer to caption
(h) 𝑨Δ\mbox{\boldmath$A$}_{\Delta}
Refer to caption
(i) 𝑨PD\mbox{\boldmath$A$}_{\rm PD}
Figure 8: Approximated optimal μ\mu-distributions on FdisconnectedF_{\rm disconnected}.
Refer to caption
(a) 𝑨HV\mbox{\boldmath$A$}_{\rm HV}
Refer to caption
(b) 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}
Refer to caption
(c) 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}
Refer to caption
(d) 𝑨R2\mbox{\boldmath$A$}_{\rm R2}
Refer to caption
(e) 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}
Refer to caption
(f) 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}
Refer to caption
(g) 𝑨SE\mbox{\boldmath$A$}_{\rm SE}
Refer to caption
(h) 𝑨Δ\mbox{\boldmath$A$}_{\Delta}
Refer to caption
(i) 𝑨PD\mbox{\boldmath$A$}_{\rm PD}
Figure 9: Approximated optimal μ\mu-distributions on FcconcaveF_{\rm c\mathchar 45\relax concave}.

VI Experimental results

This section analyzes the nine quality indicators using their optimal μ\mu-distributions approximated by the proposed formulation. Subsection VI-A discusses the optimal μ\mu-distributions for each quality indicator. Subsection VI-B examines the nine quality indicators using the ranking information. Subsection VI-C analyzes the generality of results for m=3m=3 with respect to mm. Subsection VI-D investigates a unary version of an MM-nary quality indicator.

VI-A Approximated optimal μ\mu-distributions for the nine quality indicators

Figs. 49 show approximated optimal μ\mu-distributions with μ=21\mu=21 obtained by the proposed approach for the nine quality indicators on the eight Pareto fronts. In Figs. 49, xx, yy, and zz axes represent f1f_{1}, f2f_{2}, and f3f_{3}, respectively. Figs. 49 show the best objective vector set found by L-SHADE among 31 runs for each quality indicator on each front shape function FF. Below, we denote the best μ\mu-distribution for a quality indicator II as 𝑨I\mbox{\boldmath$A$}_{I} (e.g., 𝑨HV\mbox{\boldmath$A$}_{\rm HV}). Figs. S.3–S.50 in the supplementary file show approximated optimal μ\mu-distributions with the other μ\mu values. All reference vectors in 𝑹R are shown in each figure.

For the sake of comparison, Figs. 47 (j) show a set of μ\mu objective vectors generated by simplex-lattice design, denoted as 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD}. We generated 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} using the same method as for generating the reference vector set 𝑹R described in Section V. Since the number of objective vectors generated by simplex-lattice design on FdisconnectedF_{\rm disconnected} and FcconcaveF_{\rm c\mathchar 45\relax concave} cannot be specified arbitrarily due to their characteristics, we do not show 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} in Figs. 9 and 9. 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} can be viewed as an “ideal” objective vector set found by decomposition-based EMOAs (e.g., NSGA-III [49]). Although the distribution of objective vectors in 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} is uniform, 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} is not optimal for all the nine quality indicators (see Subsection VI-B).

Since the distribution of objective vectors in some objective vector sets is not uniform, one may think that the approximated objective vector sets are far from optimal. However, as demonstrated in Subsection VI-B using Table II, some quality indicators do not prefer uniformly distributed objective vectors. For example, as shown in Figs. 4 (j) and (h), 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} clearly has a better uniformity than 𝑨Δ\mbox{\boldmath$A$}_{\Delta} on FconcaveF_{\rm concave}. Nevertheless, Table II shows that Δ\Delta evaluates 𝑨Δ\mbox{\boldmath$A$}_{\Delta} as better than 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} (details are discussed later). Such an unintuitive result may be due to a hidden property of each quality indicator. We reemphasize that a theoretical analysis of the optimal μ\mu-distribution for m3m\geq 3 is difficult. For this reason, the (true and approximated) optimal μ\mu-distributions for almost all quality indicators have not been investigated on the Pareto front with m3m\geq 3.

Below, we individually discuss the optimal μ\mu-distribution for each of the nine quality indicators (𝑨HV\mbox{\boldmath$A$}_{\rm HV}, 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}, 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}, 𝑨R2\mbox{\boldmath$A$}_{\rm R2}, 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}, 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}, 𝑨SE\mbox{\boldmath$A$}_{\rm SE}, 𝑨Δ\mbox{\boldmath$A$}_{\Delta}, and 𝑨PD\mbox{\boldmath$A$}_{\rm PD}). Although results of HV, IGD, and IGD+ are consistent with the previous studies, some results of other quality indicators provide insightful information. Our findings are summarized in Section VII.

VI-A1 𝑨HV\mbox{\boldmath$A$}_{\rm HV}

It should be noted that the optimal μ\mu-distribution for HV significantly depends on the position of the reference vector 𝒒q [4]. Objective vectors in 𝑨HV\mbox{\boldmath$A$}_{\rm HV} are on the edge and the center of the Pareto front for all the front shape functions, except for FconvexF_{\rm convex} and FiconvexF_{\rm i\mathchar 45\relax convex}. These observations on the convex Pareto fronts are consistent with the previous studies [12, 15, 4]. When μ\mu is small, 𝑨HV\mbox{\boldmath$A$}_{\rm HV} does not contain the extreme objective vectors even for the linear front shape function FlinearF_{\rm linear} (e.g., 𝑨HV\mbox{\boldmath$A$}_{\rm HV} with μ=10\mu=10 shown in Fig. S.3 in the supplementary file). As μ\mu increases, 𝑨HV\mbox{\boldmath$A$}_{\rm HV} covers the edge of the Pareto front on most front shape functions.

VI-A2 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}

The distribution of objective vectors in 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} is uniform on most front shape functions, even including FcconcaveF_{\rm c\mathchar 45\relax concave}. However, objective vectors in 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} on all front shape functions are not on the edge of the Pareto front. Thus, IGD may incorrectly evaluate the spread quality of objective vector sets. The poor spread quality of 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} is consistent with the previous studies [3, 31].

VI-A3 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}

Our results are consistent with the results presented in [31]. The distribution of objective vectors in 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}} is almost the same as that in 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} on FlinearF_{\rm linear} and similar to that in 𝑨HV\mbox{\boldmath$A$}_{\rm HV} on the other front shape functions. As the μ\mu value increases, the distributions of objective vectors in 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}} and 𝑨HV\mbox{\boldmath$A$}_{\rm HV} become more similar. For details, see Figs. S.3–S.50 in the supplementary file.

VI-A4 𝑨R2\mbox{\boldmath$A$}_{\rm R2}

Objective vectors in 𝑨R2\mbox{\boldmath$A$}_{\rm R2} are densely distributed in the center of the Pareto front for FlinearF_{\rm linear}, FconcaveF_{\rm concave}, and FconvexF_{\rm convex}. These results are consistent with the previous study [17] for m=2m=2. However, our results on FilinearF_{\rm i\mathchar 45\relax linear} and FiconcaveF_{\rm i\mathchar 45\relax concave} are inconsistent with [17]. Only one objective vector in 𝑨R2\mbox{\boldmath$A$}_{\rm R2} is in the center of the Pareto front for FilinearF_{\rm i\mathchar 45\relax linear}. Almost all objective vectors in 𝑨R2\mbox{\boldmath$A$}_{\rm R2} are on the edge of the Pareto front. Similarly, all objective vectors in 𝑨R2\mbox{\boldmath$A$}_{\rm R2} are on the edge of the Pareto front for FiconcaveF_{\rm i\mathchar 45\relax concave}. When μ\mu is set to a larger value, 𝑨R2\mbox{\boldmath$A$}_{\rm R2} contains a few objective vectors on the center of the Pareto front for FiconcaveF_{\rm i\mathchar 45\relax concave} as shown in Figs. S.3–S.50.

Our inconsistent results on FilinearF_{\rm i\mathchar 45\relax linear} and FiconcaveF_{\rm i\mathchar 45\relax concave} are mainly due to the inverted triangular Pareto front. The analysis presented in [51] shows that some decomposition-based EMOAs (e.g., MOEA/D and NSGA-III) perform poorly on problems with inverted triangular Pareto fronts. This is because the shape of the distribution of weight vectors used in the decomposition-based EMOAs is different from the shape of the Pareto front. Since R2 uses the weight vector set 𝑾W that is similar to decomposition-based EMOAs, R2 is likely to have a similar issue in decomposition-based EMOAs for m=3m=3.

VI-A5 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}

Since NR2 was designed to approximate the HV value, 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2} is very similar to 𝑨HV\mbox{\boldmath$A$}_{\rm HV} for all front shape functions. Although NR2 is the modified version of R2, 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2} is dissimilar to 𝑨R2\mbox{\boldmath$A$}_{\rm R2}. For example, 𝑨R2\mbox{\boldmath$A$}_{\rm R2} does not contain any objective vector in the center of the Pareto front for FiconcaveF_{\rm i\mathchar 45\relax concave}. In contrast, 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2} contains some objective vectors in the center of the Pareto front for FiconcaveF_{\rm i\mathchar 45\relax concave} that is similar to 𝑨HV\mbox{\boldmath$A$}_{\rm HV}.

VI-A6 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}}

As mentioned in Subsection III-B, the optimal μ\mu-distributions for Iϵ+I_{\epsilon+} on m=2m=2 were investigated in [19]. However, our results on m=3m=3 are inconsistent with [19]. The distribution of objective vectors in 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}} is not uniform for all front shape functions (except for FiconcaveF_{\rm i\mathchar 45\relax concave}). Thus, our results show that optimal μ\mu-distributions for Iϵ+I_{\epsilon+} depend on the choice of mm. As pointed out in [5], the Iϵ+I_{\epsilon+} value of a given objective vector set 𝑨A depends only on one objective value of one objective vector in 𝑨A. This property of Iϵ+I_{\epsilon+} may influence its optimal μ\mu-distributions on m=3m=3 in a complicated manner. An in-depth analysis of the optimal μ\mu-distributions for Iϵ+I_{\epsilon+} on m=3m=3 is another future work.

VI-A7 𝑨SE\mbox{\boldmath$A$}_{\rm SE}

Similar to 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}, the distribution of objective vectors in 𝑨SE\mbox{\boldmath$A$}_{\rm SE} is uniform for all front shape functions. In contrast, objective vectors in 𝑨SE\mbox{\boldmath$A$}_{\rm SE} are more widely distributed than those in 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD}. For example, while objective vectors in 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} are densely distributed in the center of the Pareto front for FconvexF_{\rm convex}, those in 𝑨SE\mbox{\boldmath$A$}_{\rm SE} cover the whole Pareto front. We can conclude that SE evaluates the quality of objective vector sets in terms of both uniformity and spread.

However, most objective vectors in 𝑨SE\mbox{\boldmath$A$}_{\rm SE} are on the edge of the Pareto front (especially on FdisconnectedF_{\rm disconnected}). Even as the μ\mu value increases, this characteristic can be still found (see Figs. S.3–S.50). Thus, SE may overestimate a set of objective vectors on the edge of the Pareto front.

TABLE II: Rankings of the nine approximated optimal μ\mu-distributions (𝑨HV\mbox{\boldmath$A$}_{\rm HV}, …, 𝑨PD\mbox{\boldmath$A$}_{\rm PD}) and 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} by each quality indicator on FconcaveF_{\rm concave}.
𝑨HV\mbox{\boldmath$A$}_{\rm HV} 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}} 𝑨R2\mbox{\boldmath$A$}_{\rm R2} 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2} 𝑨Iϵ+\mbox{\boldmath$A$}_{I_{\epsilon+}} 𝑨SE\mbox{\boldmath$A$}_{\rm SE} 𝑨Δ\mbox{\boldmath$A$}_{\Delta} 𝑨PD\mbox{\boldmath$A$}_{\rm PD} 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD}
HV 1 10 3 5 2 7 4 8 9 6
IGD 9 1 10 7 8 2 4 6 5 3
IGD+ 2 9 1 4 3 7 5 8 10 6
R2 2 10 4 1 3 8 5 7 9 6
NR2 2 10 3 5 1 7 4 8 9 6
Iϵ+I_{\epsilon+} 3 5 6 4 2 1 8 9 10 7
SE 7 6 4 3 5 8 1 9 10 2
Δ\Delta 9 3 6 4 7 5 2 1 10 8
PD 7 5 8 3 9 2 6 4 1 10
Avg. 4.2 5.9 4.5 3.6 4.0 4.7 3.9 6.0 7.3 5.4

VI-A8 𝑨Δ\mbox{\boldmath$A$}_{\Delta}

Since 𝑨Δ\mbox{\boldmath$A$}_{\Delta} contains the extreme objective vectors on all FF, Δ\Delta can evaluate the spread quality of objective vector sets. In contrast, the distribution of objective vectors in 𝑨Δ\mbox{\boldmath$A$}_{\Delta} is far from uniform. Thus, Δ\Delta may incorrectly evaluate the uniformity quality of objective vector sets.

Δ\Delta uses a pairwise distance between two closest objective vectors. As pointed out in [15], this calculation scheme in Δ\Delta can provide misleading results in some cases. This is because Δ\Delta does not consider the distance between non-closest objective vectors. In fact, objective vectors in 𝑨Δ\mbox{\boldmath$A$}_{\Delta} for FlinearF_{\rm linear} seem to be connected in a chained manner (e.g., see Fig. 4 (h)). In contrast, SE handles the distance between all pairs of two objective vectors. Thus, SE does not have this issue in Δ\Delta. It should be noted that the misleading results by Δ\Delta have already been reported in [15, 5]. Thus, this observation is not new. Our contribution here is that we find and analyze 𝑨Δ\mbox{\boldmath$A$}_{\Delta} on the eight front shape functions. The above-mentioned unintuitive distribution of objective vectors in 𝑨Δ\mbox{\boldmath$A$}_{\Delta} has not been investigated in the literature.

VI-A9 𝑨PD\mbox{\boldmath$A$}_{\rm PD}

Unlike the other quality indicators, PD evaluates the dissimilarity between objective vectors. For this reason, the distribution of objective vectors in 𝑨PD\mbox{\boldmath$A$}_{\rm PD} is not uniform on all front shape functions. This observation is consistent with the property of PD reported in [24, 5]. Also, 𝑨PD\mbox{\boldmath$A$}_{\rm PD} does not contain all the three extreme objective vectors in most cases.

Interestingly, at least two objective vectors in 𝑨PD\mbox{\boldmath$A$}_{\rm PD} overlap. Since some objective vectors in 𝑨PD\mbox{\boldmath$A$}_{\rm PD} completely overlap on FiconcaveF_{\rm i\mathchar 45\relax concave} and FdisconnectedF_{\rm disconnected}, it looks like that μ\mu is less than 2121. Thus, PD may overestimate an objective vector set that contains similar objective vectors. Although we calculated the PD value using the translated version of the original source code as explained in Section V, we can obtain exactly the same results even using the original one.

To understand the influence of overlapping objective vectors on PD, we slightly modified one of overlapping objective vectors in 𝑨PD\mbox{\boldmath$A$}_{\rm PD} on FlinearF_{\rm linear} (Fig. 7 (i)). Two resulting objective vector sets are denoted as 𝑨PDmod1\mbox{\boldmath$A$}_{\rm PD}^{\rm mod1} and 𝑨PDmod2\mbox{\boldmath$A$}_{\rm PD}^{\rm mod2}. Fig. S.2 in the supplementary file shows the distribution of objective vectors in 𝑨PDmod1\mbox{\boldmath$A$}_{\rm PD}^{\rm mod1} and 𝑨PDmod2\mbox{\boldmath$A$}_{\rm PD}^{\rm mod2} on FlinearF_{\rm linear}. The distribution of objective vectors in 𝑨PDmod1\mbox{\boldmath$A$}_{\rm PD}^{\rm mod1} and 𝑨PDmod2\mbox{\boldmath$A$}_{\rm PD}^{\rm mod2} is more diverse than that in 𝑨PD\mbox{\boldmath$A$}_{\rm PD}. Nevertheless, 𝑨PDmod1\mbox{\boldmath$A$}_{\rm PD}^{\rm mod1} and 𝑨PDmod2\mbox{\boldmath$A$}_{\rm PD}^{\rm mod2} are evaluated as worse than 𝑨PD\mbox{\boldmath$A$}_{\rm PD} as follows: PD(𝑨PD)=1.74×105{\rm PD}(\mbox{\boldmath$A$}_{\rm PD})=1.74\times 10^{5}, PD(𝑨PDmod1)=1.32×105{\rm PD}(\mbox{\boldmath$A$}_{\rm PD}^{\rm mod1})=1.32\times 10^{5}, and PD(𝑨PDmod2)=1.22×105{\rm PD}(\mbox{\boldmath$A$}_{\rm PD}^{\rm mod2})=1.22\times 10^{5}. Recall that PD is to be maximized.

Our observation may be due to the sensitivity of PD to the order of objective vectors. PD constructs a tree based on μ\mu objective vectors 𝒂1,,𝒂μ\mbox{\boldmath$a$}_{1},...,\mbox{\boldmath$a$}_{\mu} in an objective vector set 𝑨A. In this procedure, each objective vector is greedily linked to its nearest unreplicated objective vector in lexical order (i.e., from 𝒂1\mbox{\boldmath$a$}_{1} to 𝒂μ\mbox{\boldmath$a$}_{\mu}). The resulting tree depends on the order of the objective vectors. We arranged the objective vectors in 𝑨PD\mbox{\boldmath$A$}_{\rm PD} on FlinearF_{\rm linear} in reverse order (i.e., from 𝒂μ\mbox{\boldmath$a$}_{\mu} to 𝒂1\mbox{\boldmath$a$}_{1}). The resulting objective vector set is denoted as 𝑨PDmod3\mbox{\boldmath$A$}_{\rm PD}^{\rm mod3}. Although 𝑨PDmod3\mbox{\boldmath$A$}^{\rm mod3}_{\rm PD} and 𝑨PD\mbox{\boldmath$A$}_{\rm PD} contain exactly the same objective vectors, 𝑨PDmod3\mbox{\boldmath$A$}^{\rm mod3}_{\rm PD} is evaluated as worse than 𝑨PD\mbox{\boldmath$A$}_{\rm PD} in terms of PD as follows: PD(𝑨PDmod3)=1.29×105{\rm PD}(\mbox{\boldmath$A$}_{\rm PD}^{\rm mod3})=1.29\times 10^{5}. Overlapping objective vectors in 𝑨PD\mbox{\boldmath$A$}_{\rm PD} may help to construct a larger tree by implicitly exploiting the lexical characteristic of PD.

VI-B Ranking information based analysis

In Subsection VI-A, we analyzed the nine quality indicators using the optimal μ\mu-distributions. Here, we examine the nine quality indicators using the ranking information.

Table II shows the rankings of the nine approximated optimal μ\mu-distributions (𝑨HV\mbox{\boldmath$A$}_{\rm HV}, …, 𝑨PD\mbox{\boldmath$A$}_{\rm PD}) and 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} by each quality indicator on FconcaveF_{\rm concave}. The average ranking values are reported at the bottom of Table II. Tables S.2–S.6 in the supplementary file show the rankings on FlinearF_{\rm linear}, FconvexF_{\rm convex}, FilinearF_{\rm i\mathchar 45\relax linear}, FiconcaveF_{\rm i\mathchar 45\relax concave}, and FiconvexF_{\rm i\mathchar 45\relax convex}, respectively. Since 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} cannot be generated on FdisconnectedF_{\rm disconnected} and FcconcaveF_{\rm c\mathchar 45\relax concave}, the rankings for FdisconnectedF_{\rm disconnected} and FcconcaveF_{\rm c\mathchar 45\relax concave} are not shown. Note that the quality indicator called “SLD” does not exist. Recall that 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} is generated by simplex-lattice design. Due to the paper length limitation, we mainly explain the results on FconcaveF_{\rm concave}.

On the one hand, from each row in Table II, we can read the ranking of the ten objective vector sets by the corresponding quality indicator on FconcaveF_{\rm concave}. For example, HV evaluates 𝑨HV\mbox{\boldmath$A$}_{\rm HV} and 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} as the best and the worst, respectively. On the other hand, from each column in Table II, we can read the ranking of the corresponding objective vector set by the nine indicators. For example, 𝑨SE\mbox{\boldmath$A$}_{\rm SE} is ranked at fourth by HV, IGD, and NR2.

The average ranking value of each objective vector set in the bottom row of Table II (i.e., average value of each column) means how highly the corresponding objective vector set is evaluated by all the nine quality indicators. 𝑨R2\mbox{\boldmath$A$}_{\rm R2} performs the best among the 10 objective vector sets on FconcaveF_{\rm concave} in terms of the average ranking values. Thus, a good objective vector set with respect to R2 may be highly evaluated by the other quality indicators on FconcaveF_{\rm concave}. In contrast, 𝑨PD\mbox{\boldmath$A$}_{\rm PD} performs the worst in terms of the average ranking values. This result indicates that a good objective vector set with respect to PD may be evaluated as inferior to other objective vector sets by the other quality indicators on FconcaveF_{\rm concave}. These results are useful for selecting a quality indicator used in indicator-based EMOAs. Note that the best objective vector set in terms of the average ranking values depends on the choice of FF. For example, Table S.4 shows that 𝑨R2\mbox{\boldmath$A$}_{\rm R2} performs the second worst on FilinearF_{\rm i\mathchar 45\relax linear}.

Each quality indicator II evaluates its optimal μ\mu-distribution 𝑨I\mbox{\boldmath$A$}_{I} as the best. For this reason, all diagonal elements in Table II are “1”. The same results can be found in the rankings on the other front shape functions, except for the ranking of 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}} by IGD+ on FlinearF_{\rm linear} in Table S.2 and 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2} by NR2 on FiconcaveF_{\rm i\mathchar 45\relax concave} in Table S.5. IGD+ and NR2 evaluate 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} and 𝑨HV\mbox{\boldmath$A$}_{\rm HV} as the best in Tables S.2 and S.5, respectively. These exceptional results are due to the similarity between IGD and IGD+, and the similarity between HV and NR2.

As in [38], Table III shows the rank-based nonlinear Kendall rank correlation τ\tau values of the nine quality indicator on FconcaveF_{\rm concave}. Tables S.7–S.11 in the supplementary file show the τ\tau values of the nine quality indicator on FlinearF_{\rm linear}, FconvexF_{\rm convex}, FilinearF_{\rm i\mathchar 45\relax linear}, FiconcaveF_{\rm i\mathchar 45\relax concave}, and FiconvexF_{\rm i\mathchar 45\relax convex}, respectively. The τ\tau value in Table III represents the similarity of the rankings by two quality indicators I1I_{1} and I2I_{2} on the 10 objective vector sets (𝑨HV\mbox{\boldmath$A$}_{\rm HV}, …, 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD}). The τ\tau values in Table III are symmetric. The range of the τ\tau value is [1,1][-1,1]. While the non-negative τ\tau value means that I1I_{1} and I2I_{2} are consistent with each other, the negative τ\tau value means that I1I_{1} and I2I_{2} are inconsistent with each other. For example, Table III shows that HV is inconsistent with IGD (τ=0.60\tau=-0.60), but HV is consistent with IGD+ (τ=0.82\tau=0.82).

TABLE III: Kendall τ\tau values of the nine indicator on FconcaveF_{\rm concave}.
HV IGD IGD+ R2 NR2 Iϵ+I_{\epsilon+} SE Δ\Delta PD
HV 1.00 -0.60 0.82 0.78 0.96 0.33 0.24 -0.24 -0.42
IGD -0.60 1.00 -0.69 -0.56 -0.56 -0.02 -0.02 0.20 0.20
IGD+ 0.82 -0.69 1.00 0.69 0.78 0.33 0.33 -0.16 -0.42
R2 0.78 -0.56 0.69 1.00 0.73 0.29 0.29 -0.11 -0.29
NR2 0.96 -0.56 0.78 0.73 1.00 0.38 0.29 -0.20 -0.47
Iϵ+I_{\epsilon+} 0.33 -0.02 0.33 0.29 0.38 1.00 -0.07 -0.11 -0.11
SE 0.24 -0.02 0.33 0.29 0.29 -0.07 1.00 0.16 -0.47
Δ\Delta -0.24 0.20 -0.16 -0.11 -0.20 -0.11 0.16 1.00 0.11
PD -0.42 0.20 -0.42 -0.29 -0.47 -0.11 -0.47 0.11 1.00

As seen from Table II, no optimal μ\mu-distribution is ranked as the best by multiple quality indicators due to the conflicting properties of the nine quality indicators. For example, 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} is ranked as the best by IGD but the worst by HV, R2, and NR2. In contrast, the rankings by HV and NR2 are almost the same on all front shape functions (except for the rankings of 𝑨HV\mbox{\boldmath$A$}_{\rm HV} and 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}). In fact, the τ\tau value of HV and NR2 in Table III is 0.960.96, which indicates HV and NR2 are highly correlated with each other. Although the rankings by IGD and IGD+ are almost the same on the linear Pareto fronts (FlinearF_{\rm linear} and FilinearF_{\rm i\mathchar 45\relax linear}), they are dissimilar on the non-linear Pareto fronts (FconcaveF_{\rm concave}, FconvexF_{\rm convex}, FiconcaveF_{\rm i\mathchar 45\relax concave}, and FiconvexF_{\rm i\mathchar 45\relax convex}). The τ\tau value of IGD and IGD+ in Table III is also 0.60-0.60 on FconcaveF_{\rm concave}. These results of IGD and IGD+ are consistent with [40, 31].

The simplex lattice-based 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} can be viewed as the optimal objective vector set obtained by decomposition-based EMOAs. Since 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} has a good uniformity, 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} is evaluated as the second best by SE on FconcaveF_{\rm concave}. In contrast, 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} is evaluated as “poor” by the other quality indicators in most cases (except for the results on FilinearF_{\rm i\mathchar 45\relax linear} in Table S.4). In fact, the distribution of objective vectors in 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} is not identical to optimal μ\mu-distributions for the nine quality indicators (see Figs. 47). The poor results of 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} suggest that decomposition-based EMOAs need an adaptive weight vector method considering the optimal μ\mu-distributions for a given quality indicator.

VI-C Analysis for m4m\geq 4

Since it is almost impossible to clearly show the distribution of objective vectors of a problem with m4m\geq 4 in an understandable manner [56, 57], this paper focused on m=3m=3. We reemphasize that our analysis for m=3m=3 itself is already an important contribution to the EMO community. Nevertheless, it is interesting to investigate whether the results for m=3m=3 can be generalized to the case of m4m\geq 4.

Since the distribution of objective vectors for m4m\geq 4 is unclear, our analysis here is based on the ranking information as in Subsection VI-B. Tables S.12–S.23 in the supplementary file show the rankings of the nine approximated optimal μ\mu-distributions (𝑨HV\mbox{\boldmath$A$}_{\rm HV}, …, 𝑨PD\mbox{\boldmath$A$}_{\rm PD}) with μ=21\mu=21 by each quality indicator on six front shape functions (FlinearF_{\rm linear}, FconcaveF_{\rm concave}, FconvexF_{\rm convex}, FilinearF_{\rm i\mathchar 45\relax linear}, FiconcaveF_{\rm i\mathchar 45\relax concave}, and FiconvexF_{\rm i\mathchar 45\relax convex}) with m=5m=5 and 88, respectively. Since 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} with μ=21\mu=21 cannot be generated for m=5m=5 and 88, 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} is removed from this experiment. The size of the reference point set 𝑹R was set to 3 0603\,060 for m=5m=5 and 5 1485\,148 for m=8m=8. We generated 𝑹R for m=8m=8 using the two-layered version of simplex-lattice design [49]. Tables S.24–S.35 in the supplementary file also show the Kendall rank correlation τ\tau values of the nine quality indicators on the six front shape functions with m=5m=5 and 88, respectively.

Although most results for m4m\geq 4 are consistent with the results for m=3m=3, some exceptional results can be found. 𝑨SE\mbox{\boldmath$A$}_{\rm SE} is ranked as the best by Δ\Delta on FlinearF_{\rm linear} and FilinearF_{\rm i\mathchar 45\relax linear} with m=8m=8, and FconvexF_{\rm convex} and FiconcaveF_{\rm i\mathchar 45\relax concave} with m{5,8}m\in\{5,8\}. This means that a better approximated optimal μ\mu-distribution for Δ\Delta can be found by optimizing SE. Similarly, 𝑨R2\mbox{\boldmath$A$}_{\rm R2} is ranked as the best by Iϵ+I_{\epsilon+} on FiconcaveF_{\rm i\mathchar 45\relax concave} with m{5,8}m\in\{5,8\}. This may be because of the problem difficulty in finding the optimal μ\mu-distribution for Δ\Delta and Iϵ+I_{\epsilon+}. For example, as mentioned in Subsection VI, the Iϵ+I_{\epsilon+} value of 𝑨A depends only on one objective value of one objective vector in 𝑨A. As in the Schwefel 2.21 function f(𝒙)=maxi{1,,n}{|xi|}f(\mbox{\boldmath$x$})=\max_{i\in\{1,...,n\}}\{|x_{i}|\} [58], this property of Iϵ+I_{\epsilon+} can incorporate a strong nonseparability between variables θ1,,θd\theta_{1},...,\theta_{d} in the proposed formulation. In addition, this property can make some areas of the fitness landscape of the proposed formulation plateaus. These problem difficulties become pronounced with the increase of mm. A landscape analysis of the proposed formulation for each quality indicator is needed for deeper understanding.

Table IV shows the best objective vector sets on each FF with m{3,5,8}m\in\{3,5,8\} in terms of the average ranking. We can see that the best objective vector set in terms of the average ranking depends on mm in addition to the choice of FF. For example, 𝑨R2\mbox{\boldmath$A$}_{\rm R2}, 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}, and 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}} perform the best on FconcaveF_{\rm concave} with m=3m=3, 55, and 88, respectively. These results indicate the necessity of an adaptive indicator selection mechanism in indicator-based EMOAs (e.g., [34]).

TABLE IV: Best objective vector sets on each front shape function with m{3,5,8}m\in\{3,5,8\} in terms of the average ranking.
FlinearF_{\rm linear} FconcaveF_{\rm concave} FconvexF_{\rm convex} FilinearF_{\rm i\mathchar 45\relax linear} FiconcaveF_{\rm i\mathchar 45\relax concave} FiconvexF_{\rm i\mathchar 45\relax convex}
33 𝑨HV\mbox{\boldmath$A$}_{\rm HV}, 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2} 𝑨R2\mbox{\boldmath$A$}_{\rm R2} 𝑨HV\mbox{\boldmath$A$}_{\rm HV} 𝑨HV\mbox{\boldmath$A$}_{\rm HV} 𝑨HV\mbox{\boldmath$A$}_{\rm HV} 𝑨HV\mbox{\boldmath$A$}_{\rm HV}
55 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2}, 𝑨Δ\mbox{\boldmath$A$}_{\Delta} 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2} 𝑨HV\mbox{\boldmath$A$}_{\rm HV} 𝑨NR2\mbox{\boldmath$A$}_{\rm NR2} 𝑨HV\mbox{\boldmath$A$}_{\rm HV} 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}
88 𝑨Δ\mbox{\boldmath$A$}_{\Delta} 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}} 𝑨IGD\mbox{\boldmath$A$}_{\rm IGD} 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}} 𝑨HV\mbox{\boldmath$A$}_{\rm HV} 𝑨IGD+\mbox{\boldmath$A$}_{\rm IGD^{+}}

In addition, the correlation between HV and NR2 becomes weak with the increase of mm. For example, the τ\tau values of HV and NR2 on FlinearF_{\rm linear} are 0.960.96 for m=3m=3, 0.890.89 for m=5m=5, and 0.220.22 for m=8m=8 (see Tables III, S.24, and S.30, respectively). This observation indicates that the approximation performance of NR2 is likely to deteriorate with the increase of mm.

In summary, the results here show that some results for m=3m=3 cannot be always generalized to the case of m4m\geq 4. Thus, our observations are only for m=3m=3 as emphasized in the title of this paper. Further analysis for many-objective problems is need in future research.

VI-D Analysis of a unary version of an MM-nary indicator

As in Iϵ+I_{\epsilon+}, some binary or MM-nary quality indicators can be converted into their unary versions by using the reference vector set 𝑹R as the compared vector set. However, it is questionable that such a converted unary quality indicator works well. Here, we demonstrate that the proposed formulation can be useful to analyze the unary version of an MM-nary indicator.

Since a number of MM-nary quality indicators have been proposed, it is very difficult to investigate all of them. Instead, we selected the diversity comparison indicator (DCI) [59] since it is recommended in [5]. DCI relatively compares MM objective vector sets 𝑨1\mbox{\boldmath$A$}_{1}, …, 𝑨M\mbox{\boldmath$A$}_{M}. A large DCI value indicates that the corresponding 𝑨A has a good diversity among 𝑨1\mbox{\boldmath$A$}_{1}, …, 𝑨M\mbox{\boldmath$A$}_{M}. We explain DCI in Section S.4 in the supplementary file. For the DCI calculation, we used the source code provided by the authors of [59].

We converted DCI into the unary indicator using the reference vector set 𝑹R. To generate 𝑹R, we used the same method described in Section V. Thus, the DCI value of 𝑨A is obtained by comparing 𝑨A to 𝑹R by DCI. As recommend in [59], we set divdiv in DCI to 1919 for m=3m=3 in this study. Since we want to examine the general property of DCI, we did not finely tune divdiv for each FF. We set μ\mu to 21. We investigate the influence of the size of 𝑹R on DCI. Below, we denote DCI with |𝑹|=21|\mbox{\boldmath$R$}|=21 and |𝑹|=28|\mbox{\boldmath$R$}|=28 as DCI21 and DCI28, respectively.

Fig. 10 shows the distribution of objective vectors in 𝑨DCI21\mbox{\boldmath$A$}_{\rm DCI21} and 𝑨DCI28\mbox{\boldmath$A$}_{\rm DCI28} on FlinearF_{\rm linear}. Figs. S.51–S.55 in the supplementary file show 𝑨DCI21\mbox{\boldmath$A$}_{\rm DCI21} (and 𝑨DCI28\mbox{\boldmath$A$}_{\rm DCI28}) on FconcaveF_{\rm concave}, FconvexF_{\rm convex}, FilinearF_{\rm i\mathchar 45\relax linear}, FiconcaveF_{\rm i\mathchar 45\relax concave}, and FiconvexF_{\rm i\mathchar 45\relax convex}, respectively. On the one hand, Fig. 10 (a) shows that the objective vectors in 𝑨DCI21\mbox{\boldmath$A$}_{\rm DCI21} are uniformly distributed. Thus, DCI21 is likely to prefer uniformly distributed objective vectors. In fact, DCI21(𝑨DCI21)=1{\rm DCI21}(\mbox{\boldmath$A$}_{\rm DCI21})=1 and DCI21(𝑨SLD)=1{\rm DCI21}(\mbox{\boldmath$A$}_{\rm SLD})=1, where 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} is shown in Fig. 4 (j). On the other hand, Fig. 10 (b) shows that the distribution of the objective vectors in 𝑨DCI28\mbox{\boldmath$A$}_{\rm DCI28} is not uniform. This result means that DCI28 does not prefer uniformly distributed objective vectors. Actually, DCI28 evaluates 𝑨DCI28\mbox{\boldmath$A$}_{\rm DCI28} as better than 𝑨SLD\mbox{\boldmath$A$}_{\rm SLD} as follows: DCI28(𝑨DCI28)=0.75{\rm DCI28}(\mbox{\boldmath$A$}_{\rm DCI28})=0.75 and DCI28(𝑨SLD)=0.67{\rm DCI28}(\mbox{\boldmath$A$}_{\rm SLD})=0.67.

In summary, we can conclude that it is possible to design a unary version of DCI, but its result is sensitive to the size of 𝑹R. In the unary version of DCI, the size of 𝑹R and μ\mu should be the same. Otherwise, non-uniformly distributed objective vectors are highly evaluated by DCI (as shown in Fig. 10 (b)).

Refer to caption
(a) 𝑨DCI21\mbox{\boldmath$A$}_{\rm DCI21}
Refer to caption
(b) 𝑨DCI28\mbox{\boldmath$A$}_{\rm DCI28}
Figure 10: 𝑨DCI21\mbox{\boldmath$A$}_{\rm DCI21} and 𝑨DCI28\mbox{\boldmath$A$}_{\rm DCI28} on FlinearF_{\rm linear}.

VII Conclusion

We have analyzed the nine quality indicators using their approximated optimal μ\mu-distributions for m=3m=3. First, we proposed the problem formulation of finding the optimal μ\mu-distribution for each quality indicator on Pareto fronts for arbitrary number of objectives (m2m\geq 2). Then, we approximated the optimal μ\mu-distributions for the nine quality indicators on the Pareto fronts of eight problems with m=3m=3. We analyzed the nine quality indicators based on the optimal μ\mu-distribution based approach and the ranking information based approach. We also examined the generality of results for m=3m=3 with respect to mm and a unary version of an MM-nary indicator.

Our observations for m=3m=3 can be summarized as follows:

  1. i)

    Objective vectors in the approximated optimal μ\mu-distributions for R2 are biased on the edge of the Pareto front for FilinearF_{\rm i\mathchar 45\relax linear} and FiconcaveF_{\rm i\mathchar 45\relax concave}. These results for m=3m=3 are inconsistent with the results for m=2m=2.

  2. ii)

    The approximated optimal μ\mu-distributions for NR2 and HV are similar. NR2 and HV are correlated with each other in terms of the Kendall rank correlation (τ=0.96\tau=0.96 on FconcaveF_{\rm concave}). Thus, NR2 can substitute for HV.

  3. iii)

    While the optimal μ\mu-distributions for Iϵ+I_{\epsilon+} for m=2m=2 are almost uniform, those for m=3m=3 are not uniform even on the linear front.

  4. iv)

    Although SE can evaluate both uniformity and spread quality of objective vector sets, SE prefers objective vectors that are on the edge of the Pareto front.

  5. v)

    Since the optimal μ\mu-distributions for Δ\Delta are far from uniform, Δ\Delta may incorrectly evaluate the uniformity quality of objective vector sets.

  6. vi)

    PD may overestimate an objective vector set with a small dissimilarity. PD is also sensitive to the order of objective vectors.

  7. vii)

    Most quality indicators evaluate objective vectors generated by simplex-lattice design as poor. Thus, decomposition-based EMOAs need an adaptive weight vector method when they try to search for a good objective vector set with respect to each quality indicator.

  8. viii)

    A unary version of DCI is sensitive to the size of the reference vector set.

We emphasize that the above-mentioned observations on the Pareto front for m=3m=3 could not be obtained without searching for the optimal distributions of objective vectors by the proposed problem formulation. As shown in Subsection VI-C, our results for m=3m=3 cannot be always generalized to the case of m4m\geq 4. Further analysis is needed for many-objective problems in future research.

A further analysis of the optimal μ\mu-distributions for R2 and Iϵ+I_{\epsilon+} on the Pareto front for m=3m=3 is needed. The undesirable property of PD could be addressed by using an algorithm for finding a minimum spanning tree (e.g., [60]). The problem of finding the optimal μ\mu-distribution in the proposed problem formulation itself can be viewed as a “real-world” black-box single-objective optimization problem. Although we used L-SHADE in this paper, we believe that benchmarking various single-objective optimizers on the proposed problem formulation is beneficial for both the EMO community and the evolutionary single-objective optimization community.

Acknowledgment

This work was supported by National Natural Science Foundation of China (Grant No. 61876075), the Program for Guangdong Introducing Innovative and Enterpreneurial Teams (Grant No. 2017ZT07X386), Shenzhen Peacock Plan (Grant No. KQTD2016112514355531), the Science and Technology Innovation Committee Foundation of Shenzhen (Grant No. ZDSYS201703031748284), the Program for University Key Laboratory of Guangdong Province (Grant No. 2017KSYS008).

References

  • [1] K. Miettinen, Nonlinear Multiobjective Optimization.   Springer, 1998.
  • [2] K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms.   John Wiley & Sons, 2001.
  • [3] H. Ishibuchi, R. Imada, Y. Setoguchi, and Y. Nojima, “Reference Point Specification in Inverted Generational Distance for Triangular Linear Pareto Front,” IEEE TEVC, vol. 22, no. 6, pp. 961–975, 2018.
  • [4] ——, “How to Specify a Reference Point in Hypervolume Calculation for Fair Performance Comparison,” Evol. Comput., vol. 26, no. 3, 2018.
  • [5] M. Li and X. Yao, “Quality Evaluation of Solution Sets in Multiobjective Optimisation: A Survey,” ACM CSUR, vol. 52, no. 2, pp. 26:1–26:38, 2019.
  • [6] J. Knowles and D. Corne, “On Metrics for Comparing Nondominated Sets,” in IEEE CEC, 2002, pp. 711–716.
  • [7] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. da Fonseca, “Performance assessment of multiobjective optimizers: an analysis and review,” IEEE TEVC, vol. 7, no. 2, pp. 117–132, 2003.
  • [8] T. Okabe, Y. Jin, and B. Sendhoff, “A Critical Survey of Performance Indices for Multi-Objective Optimisation,” in IEEE CEC, 2002, pp. 878–885.
  • [9] E. Zitzler and L. Thiele, “Multiobjective Optimization Using Evolutionary Algorithms - A Comparative Case Study,” in PPSN, 1998, pp. 292–304.
  • [10] C. A. C. Coello and M. R. Sierra, “A Study of the Parallelization of a Coevolutionary Multi-objective Evolutionary Algorithm,” in MICAI, 2004, pp. 688–697.
  • [11] H. Ishibuchi and T. Murata, “A multi-objective genetic local search algorithm and its application to flowshop scheduling,” IEEE Trans. SMC, Part C, vol. 28, no. 3, pp. 392–403, 1998.
  • [12] A. Auger, J. Bader, D. Brockhoff, and E. Zitzler, “Theory of the hypervolume indicator: optimal μ\mu-distributions and the choice of the reference point,” in FOGA, 2009, pp. 87–102.
  • [13] K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE TEVC, vol. 6, no. 2, pp. 182–197, 2002.
  • [14] Y. Wang, L. Wu, and X. Yuan, “Multi-objective self-adaptive differential evolution with elitist archive and crowding entropy-based diversity measure,” Soft Comput., vol. 14, no. 3, pp. 193–209, 2010.
  • [15] S. Jiang, Y. Ong, J. Zhang, and L. Feng, “Consistencies and Contradictions of Performance Metrics in Multiobjective Optimization,” IEEE Trans. Cyber., vol. 44, no. 12, pp. 2391–2404, 2014.
  • [16] A. Auger, J. Bader, D. Brockhoff, and E. Zitzler, “Hypervolume-based multiobjective optimization: Theoretical foundations and practical implications,” Theor. Comput. Sci., vol. 425, pp. 75–103, 2012.
  • [17] D. Brockhoff, T. Wagner, and H. Trautmann, “On the properties of the R2 indicator,” in GECCO, 2012, pp. 465–472.
  • [18] T. Glasmachers, “Optimized Approximation Sets for Low-Dimensional Benchmark Pareto Fronts,” in PPSN, 2014, pp. 569–578.
  • [19] K. Bringmann, T. Friedrich, and P. Klitzke, “Efficient computation of two-dimensional solution sets maximizing the epsilon-indicator,” in IEEE CEC, 2015, pp. 970–977.
  • [20] N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evol. Comput., vol. 9, no. 2, pp. 159–195, 2001.
  • [21] H. Ishibuchi, H. Masuda, Y. Tanigaki, and Y. Nojima, “Modified Distance Calculation in Generational Distance and Inverted Generational Distance,” in EMO, 2015, pp. 110–125.
  • [22] K. Shang, H. Ishibuchi, M. Zhang, and Y. Liu, “A new R2 indicator for better hypervolume approximation,” in GECCO, 2018, pp. 745–752.
  • [23] D. P. Hardin and E. B. Saff, “Discretizing Manifolds via Minimum Energy Points,” Notices of the AMS, vol. 51, no. 10, pp. 1186–1194, 2004.
  • [24] H. Wang, Y. Jin, and X. Yao, “Diversity Assessment in Many-Objective Optimization,” IEEE Trans. Cyber., vol. 47, no. 6, pp. 1510–1522, 2017.
  • [25] E. Zitzler, L. Thiele, and J. Bader, “On Set-Based Multiobjective Optimization,” IEEE TEVC, vol. 14, no. 1, pp. 58–79, 2010.
  • [26] J. Knowles, L. Thiele, and E. Zitzler, “A tutorial on the performance assessment of stochastic multiobjective optimizers,” ETH Zurich, Tech. Rep. TIK-214, 2006.
  • [27] D. A. V. Veldhuizen and G. B. Lamont, “Multiobjective Evolutionary Algorithm Research: A History and Analysis,” Air Force Institute of Technology, Tech. Rep. TR-98-03, 1998.
  • [28] D. A. V. Veldhuizen, “Multiobjective evolutionary algorithms: Classifications, analyses, and new innovations,” Air University, Tech. Rep. AFIT/DS/ENG/99-01, 1999.
  • [29] E. Zitzler, D. Brockhoff, and L. Thiele, “The Hypervolume Indicator Revisited: On the Design of Pareto-compliant Indicators Via Weighted Integration,” in EMO, 2006, pp. 862–876.
  • [30] O. Schütze, X. Esquivel, A. Lara, and C. A. C. Coello, “Using the Averaged Hausdorff Distance as a Performance Measure in Evolutionary Multiobjective Optimization,” IEEE TEVC, vol. 16, no. 4, pp. 504–522, 2012.
  • [31] H. Ishibuchi, R. Imada, N. Masuyama, and Y. Nojima, “Comparison of Hypervolume, IGD and IGD+ from the Viewpoint of Optimal Distributions of Solutions,” in EMO, 2019, pp. 332–345.
  • [32] M. P. Hansen and A. Jaszkiewicz, “Evaluating the quality of approximations to the non-dominated set,” Poznan University of Technology, Tech. Rep. IMM-REP-1998-7, 1998.
  • [33] H. Ishibuchi, N. Tsukamoto, Y. Sakane, and Y. Nojima, “Indicator-based evolutionary algorithm with hypervolume approximation by achievement scalarizing functions,” in GECCO, 2010, pp. 527–534.
  • [34] J. G. Falcón-Cardona and C. A. C. Coello, “A multi-objective evolutionary hyper-heuristic based on multiple indicator-based density estimators,” in GECCO, 2018, pp. 633–640.
  • [35] A. R. Solow and S. Polasky, “Measuring biological diversity,” Environ. Ecol. Stat., vol. 1, no. 2, pp. 95–103, 1994.
  • [36] M. Ravber, M. Mernik, and M. Crepinsek, “The impact of Quality Indicators on the rating of Multi-objective Evolutionary Algorithms,” Appl. Soft Comput., vol. 55, pp. 265–275, 2017.
  • [37] S. Wessing and B. Naujoks, “Sequential parameter optimization for multi-objective problems,” in IEEE CEC, 2010, pp. 1–8.
  • [38] A. Liefooghe and B. Derbel, “A Correlation Analysis of Set Quality Indicator Values in Multiobjective Optimization,” in GECCO, 2016, pp. 581–588.
  • [39] H. Ishibuchi, H. Masuda, and Y. Nojima, “A Study on Performance Evaluation Ability of a Modified Inverted Generational Distance Indicator,” in GECCO, 2015, pp. 695–702.
  • [40] L. C. T. Bezerra, M. López-Ibáñez, and T. Stützle, “An Empirical Assessment of the Properties of Inverted Generational Distance on Multi- and Many-Objective Optimization,” in EMO, 2017, pp. 31–45.
  • [41] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, “Scalable Test Problems for Evolutionary Multi-Objective Optimization,” in Evolutionary Multiobjective Optimization. Theoretical Advances and Applications.   Springer, 2005, pp. 105–145.
  • [42] E. Zitzler, K. Deb, and L. Thiele, “Comparison of Multiobjective Evolutionary Algorithms: Empirical Results,” Evol. Comput., vol. 8, no. 2, pp. 173–195, 2000.
  • [43] K. Bringmann, T. Friedrich, and P. Klitzke, “Two-dimensional subset selection for hypervolume and epsilon-indicator,” in GECCO, 2014, pp. 589–596.
  • [44] N. Beume, B. Naujoks, and M. T. M. Emmerich, “SMS-EMOA: multiobjective selection based on dominated hypervolume,” EJOR, vol. 181, no. 3, pp. 1653–1669, 2007.
  • [45] S. Z. Martínez, C. A. C. Coello, H. E. Aguirre, and K. Tanaka, “A Review of Features and Limitations of Existing Scalable Multiobjective Test Suites,” IEEE TEVC, vol. 23, no. 1, pp. 130–142, 2019.
  • [46] A. Jaszkiewicz, “On the performance of multiple-objective genetic local search on the 0/10/1 knapsack problem - a comparative experiment,” IEEE TEVC, vol. 6, no. 4, pp. 402–412, 2002.
  • [47] H. Ishibuchi, H. Masuda, and Y. Nojima, “Pareto Fronts of Many-Objective Degenerate Test Problems,” IEEE TEVC, vol. 20, no. 5, pp. 807–813, 2016.
  • [48] Y. Tian, X. Xiang, X. Zhang, R. Cheng, and Y. Jin, “Sampling Reference Points on the Pareto Fronts of Benchmark Multi-Objective Optimization Problems,” in IEEE CEC, 2018, pp. 1–6.
  • [49] K. Deb and H. Jain, “An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints,” IEEE TEVC, vol. 18, no. 4, pp. 577–601, 2014.
  • [50] H. Jain and K. Deb, “An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, part II: handling constraints and extending to an adaptive approach,” IEEE TEVC, vol. 18, no. 4, pp. 602–622, 2014.
  • [51] H. Ishibuchi, Y. Setoguchi, H. Masuda, and Y. Nojima, “Performance of Decomposition-Based Many-Objective Algorithms Strongly Depends on Pareto Front Shapes,” IEEE TEVC, vol. 21, no. 2, pp. 169–190, 2017.
  • [52] R. Tanabe and A. S. Fukunaga, “Improving the search performance of SHADE using linear population size reduction,” in IEEE CEC, 2014, pp. 1658–1665.
  • [53] R. Storn and K. Price, “Differential Evolution - A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces,” J. Global Optimiz., vol. 11, no. 4, pp. 341–359, 1997.
  • [54] R. L. While, L. Bradstreet, and L. Barone, “A Fast Way of Calculating Exact Hypervolumes,” IEEE TEVC, vol. 16, no. 1, pp. 86–95, 2012.
  • [55] I. Das and J. E. Dennis, “Normal-Boundary Intersection: A New Method for Generating the Pareto Surface in Nonlinear Multicriteria Optimization Problems,” SIAM J. Optimiz, vol. 8, no. 3, pp. 631–657, 1998.
  • [56] T. Tusar and B. Filipic, “Visualization of Pareto Front Approximations in Evolutionary Multiobjective Optimization: A Critical Review and the Prosection Method,” IEEE TEVC, vol. 19, no. 2, pp. 225–245, 2015.
  • [57] M. Li, L. Zhen, and X. Yao, “How to Read Many-Objective Solution Sets in Parallel Coordinates [Educational Forum],” IEEE CIM, vol. 12, no. 4, pp. 88–100, 2017.
  • [58] X. Yao, Y. Liu, and G. Lin, “Evolutionary Programming Made Faster,” IEEE TEVC, vol. 3, no. 2, pp. 82–102, 1999.
  • [59] M. Li, S. Yang, and X. Liu, “Diversity Comparison of Pareto Front Approximations in Many-Objective Optimization,” IEEE Trans. Cyber., vol. 44, no. 12, pp. 2568–2584, 2014.
  • [60] B. Chazelle, “A minimum spanning tree algorithm with Inverse-Ackermann type complexity,” J. ACM, vol. 47, no. 6, pp. 1028–1047, 2000.
[Uncaptioned image] Ryoji Tanabe is a Research Assistant Professor with Department of Computer Science and Engineering, Southern University of Science and Technology, China, from 2017 to 2019. He was a Post-Doctoral Researcher with ISAS/JAXA, Japan, from 2016 to 2017. He received his Ph.D. in Science from The University of Tokyo, Japan, in 2016. His research interests include stochastic single- and multi-objective optimization algorithms, parameter control in evolutionary algorithms, and automatic algorithm configuration.
[Uncaptioned image] Hisao Ishibuchi (M’93-SM’10-F’14) received the B.S. and M.S. degrees in precision mechanics from Kyoto University, Kyoto, Japan, in 1985 and 1987, respectively, and the Ph.D. degree in computer science from Osaka Prefecture University, Sakai, Osaka, Japan, in 1992. He was with Osaka Prefecture University in 1987-2017. Since 2017, he is a Chair Professor at Southern University of Science and Technology, China. His research interests include fuzzy rule-based classifiers, evolutionary multi-objective and many-objective optimization, memetic algorithms, and evolutionary games. Dr. Ishibuchi was the IEEE Computational Intelligence Society (CIS) Vice-President for Technical Activities (2010-2013), an AdCom member of the IEEE CIS (2014-2019), and the Editor-in-Chief of the IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE (2014-2019).