
A novel multiobjective evolutionary algorithm based on decomposition and multi-reference points strategy

Wang Chen, Jian Chen, Weitian Wu, Xinmin Yang, Hui Li
College of Mathematics, Sichuan University, Chengdu 610065, China; College of Sciences, Shanghai University, Shanghai 200444, China; School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China; School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, China
Abstract

Many real-world optimization problems, such as those arising in engineering design, can be modeled as multiobjective optimization problems (MOPs), whose solution requires approximating the Pareto optimal front. The multiobjective evolutionary algorithm based on decomposition (MOEA/D) is regarded as a highly promising approach for solving MOPs. Recent studies have shown that MOEA/D with uniform weight vectors is well-suited to MOPs with regular Pareto optimal fronts, but its diversity performance usually deteriorates on MOPs with irregular Pareto optimal fronts. In such cases, the obtained solution set cannot provide reasonable choices for decision makers. To overcome this drawback efficiently, we propose an improved MOEA/D algorithm based on the well-known Pascoletti-Serafini scalarization method and a new multi-reference points strategy. Specifically, this strategy consists of the setting and adaptation of reference points generated by the techniques of equidistant partition and projection. For performance assessment, the proposed algorithm is compared with four existing state-of-the-art multiobjective evolutionary algorithms on benchmark test problems with various types of Pareto optimal fronts. The experimental results show that the proposed algorithm exhibits better diversity than the compared algorithms. Finally, our algorithm is successfully applied to two real-world MOPs in engineering optimization.

keywords:
Evolutionary computations , Multiobjective optimization , Pascoletti-Serafini scalarization , Multi-reference points , Decomposition

1 Introduction

The problems of simultaneously optimizing multiple conflicting objectives often arise in engineering, finance, transportation and many other fields; see [3, 46, 65, 7, 49]. Such problems are called multiobjective optimization problems (MOPs). For a given MOP, it is mathematically impossible to define a single optimal solution; instead, there is a set of trade-off solutions, the so-called Pareto optimal solutions in the decision space, which constitute the Pareto optimal set. The image of the Pareto optimal set in the objective space is known as the Pareto optimal front (POF). Finding the entire POF is very time-consuming, since the POF of most MOPs consists of exponentially many or even infinitely many solutions. Moreover, decision makers may not be interested in an unduly large number of solutions. Thus, a commonly used technique in practice is to find a representative approximation of the true POF.

Over the past two decades, we have witnessed a large variety of methods for solving MOPs; see [41, 20, 59, 56, 58, 19, 18], the survey papers [32, 48, 21, 51, 38, 47] and the books [40, 9, 16, 43]. Among the various methods mentioned above, multiobjective evolutionary algorithms (MOEAs) have attracted tremendous attention from researchers. A reasonable explanation is that the population-based heuristic search mechanism enables a MOEA to find a suitable approximation of the entire POF in a single run. The three goals of a MOEA summarized in [51] are: 1) to find a set of solutions as close as possible to the POF (convergence); 2) to find a well-distributed set of solutions (diversity); 3) to cover the entire POF (coverage). To achieve these goals, existing MOEAs can be broadly classified into three groups: Pareto dominance-based approaches (e.g., the nondominated sorting genetic algorithm II (NSGA-II) [12]), indicator-based approaches (e.g., the hypervolume estimation algorithm (HypE) [5]) and decomposition-based approaches (e.g., the cellular multiobjective genetic algorithm (cMOGA) [42] and MOEA/D [60]).

Among these three groups, decomposition-based approaches have become increasingly popular in recent years. MOEA/D, proposed by [60], is the most well-known and effective decomposition-based MOEA. The philosophy behind MOEA/D is to decompose a target MOP into a series of scalar optimization subproblems by means of a set of uniform weight vectors generated by the lattice method of [10] and a scalarization (or decomposition) method such as Tchebycheff, and then to solve these subproblems simultaneously by evolving a population of solutions with an evolutionary algorithm. Although MOEA/D with even weight vectors is well-suited to MOPs with regular POFs (i.e., simplex-like, e.g., a triangle or a sphere), many recent studies [45, 34, 38] have suggested that its performance is often bottlenecked by MOPs with irregular POFs (e.g., disconnected, degenerate, inverted, highly nonlinear or badly scaled). The variants of MOEA/D developed over the past dozen years (see the survey papers [51, 38, 55]) indicate that the predefined uniform weight vectors and the scalarization approach in MOEA/D limit the diversity of the population to a great extent. Therefore, the adjustment of weight vectors and the improvement of the scalarization approach have become two crucial ingredients in the variants of MOEA/D.

  • 1.

    Weight Vectors: The weight vectors determine the search directions and, to a certain extent, the distribution of the final solution set [38]. In MOEA/D [60], the weight vectors are predefined and cannot be changed during the search process. Various interesting attempts [45, 23, 29, 8, 35, 15] have been made to adjust the weight vectors adaptively during the evolution process. For instance, $pa\lambda$-MOEA/D [29] uses a method called Pareto-adaptive weight vectors to automatically adjust the weight vectors via the geometrical features of the estimated POF. $pa\lambda$-MOEA/D is suitable for MOPs whose POFs have a symmetric shape, but deteriorates on MOPs with more complex POFs. DMOEA/D [23] adopts the technique of equidistant interpolation to adjust the weight vectors every few generations according to the projection of the current nondominated solutions. MOEA/D-AWA [45] dynamically adjusts the weight vectors at the later stage of evolution; to be specific, it periodically deletes weight vectors in crowded areas and adds new ones in sparse areas. RVEA* [8] employs preset weight vectors in the initialization and then uses random weight vectors to replace invalid weight vectors associated with no solution during the evolutionary process. For MOPs with differently scaled objectives, [35] gave a new strategy to adjust the weight vectors: it multiplies each component of a weight vector by a factor corresponding to the range of the associated objective values in the current population.

  • 2.

    Scalarization Approaches: The scalarization method defines an improvement region or a contour line for every subproblem, which can significantly affect the search ability of the evolving population. In MOEA/D, the authors presented three kinds of scalarization methods, namely, the weighted sum (WS), Tchebycheff (TCH) and penalty-based boundary intersection (PBI) methods. However, the solutions obtained by these scalarization approaches with uniform weight vectors are not always uniformly distributed along the POF, and the performance of PBI suffers from its penalty parameter. Moreover, the choice of scalarization method plays a critical role in the performance of MOEA/D on a particular problem, and it is not an easy task to choose an appropriate scalarization approach for different MOPs [51]. To alleviate these drawbacks, many different scalarizing functions have been proposed in the literature; see the survey papers [51, 55]. For example, [26] used the augmented weighted Tchebycheff method within the framework of MOEA/D to cope with the problem of selecting a suitable scalarization method for a particular problem. [30] proposed the reverse Tchebycheff approach, which can deal with problems with highly nonlinear and convex POFs. [57] investigated the search ability of the $L_p$ scalarization method with different contour lines determined by the $p$ value and then introduced a Pareto adaptive scalarizing approximation to approach the optimal $p$ value at different search stages. [39] proposed a Tchebycheff scalarization approach with an $l_p$-norm constraint on direction vectors, whose experimental results show that it is capable of obtaining high-quality solutions.

The methods in the aforementioned literature improve, to a certain extent, the diversity of the final solution set obtained by the corresponding algorithms. Unfortunately, the ability of the scalarization function and the adjustment of weight vectors are still limited. In particular, for MOPs whose POFs are highly nonlinear and convex, e.g., the hatch cover design problem [49] and GLT3 [23], these methods do not seem to be very effective. More importantly, we observe that the various scalarization approaches introduced in MOEA/D and its variants all exploit information on the weight vectors and the ideal point (see the scalarization approaches summarized in Table 1 of Subsection 2.2). From the geometric point of view, we attribute these methods to a category of “single-point and multi-directions”, as shown in Fig. 1 in Subsection 2.2. Herein, single-point stands for the ideal point or the nadir point, and multi-directions denotes a set of preset uniformly distributed weight vectors. The left part of Fig. 3 in Subsection 3.1 reveals that the diversity of the final solution set is vulnerable to this geometric phenomenon.

To change this geometric phenomenon essentially, an intuitive idea called “multi-points and single-direction” is proposed (see the right part of Fig. 3 in Subsection 3.1). As a result, the aforesaid scalarization methods may no longer be applicable. The question is then whether there is a scalarization method that does not rely on the weight vectors and the ideal point, and thus matches this idea. Here, it is shown that the answer is positive. The Pascoletti-Serafini (PS) scalarization method with additional constraints, proposed by [44], relies on neither a weight vector nor the ideal point. It has two parameters, a reference point and a direction; by varying them in $R^m$ (the $m$-dimensional Euclidean space), all Pareto optimal solutions of a given MOP can be obtained. An advantage of the PS scalarization method is its generality: many other well-known and commonly used scalarization methods, such as the WS method, the $\epsilon$-constraint method, the generalized WS method and the TCH method, can be seen as special cases of it (see Section 2.5 in [16]). For research on theories and applications of the PS method and its variants, we refer the interested reader to [16, 17, 6, 33, 1, 14, 50]. Note that [16] indicated that the optimal solution obtained by the PS scalarization method is the intersection point between the POF and the negative ordering cone moved along the line generated by the reference point and the direction. Therefore, in order to obtain a set of uniformly distributed solutions that approximates the true POF well, the setting of the reference point and the direction in the PS scalarization approach is crucial.

In this paper, we propose a new multiobjective evolutionary algorithm based on decomposition and an adaptive multi-reference points strategy, termed MOEA/D-AMR, which performs well in terms of diversity. The main contributions of this paper can be summarized as follows:

  • 1.

    Using a standard trick from mathematical programming, we equivalently transform the PS scalarization problem into a minimax optimization problem when each component of the direction is restricted to be positive. Based on the proposed idea (i.e., “multi-points and single-direction”), a given MOP is decomposed into a series of such transformed minimax optimization subproblems.

  • 2.

    A strategy of setting multi-reference points is introduced. More specifically, the selection range of reference points is limited to a convex hull formed by the projection points of the vertices of the hypercube $[0,1]^m$ onto a hyperplane, and the reference points are then generated by using two techniques: equidistant partition and projection.

  • 3.

    A multi-reference points adjustment strategy based on the solutions obtained in the later stage of evolution is proposed. This strategy can identify promising reference points, delete unpromising ones and generate new ones.

  • 4.

    We verify the diversity performance of the proposed algorithm by comparing it with four representative MOEAs on a series of benchmark multiobjective test problems with regular and irregular POFs. The proposed algorithm is also used to solve two real-world MOPs in engineering optimization: the hatch cover design and the rocket injector design problems. The experimental results illustrate the effectiveness of our algorithm.

The rest of this paper is organized as follows. Section 2 gives some fundamental definitions related to multiobjective optimization and recalls several scalarization approaches. Section 3 discusses the motivation of the proposed algorithm and illustrates the details of its implementation. Algorithmic comparisons and analyses on test problems are presented in Section 4, followed by applications to real-world MOPs in Section 5. Lastly, Section 6 concludes this paper and identifies some future plans.

2 Related works

We start with the descriptions of some basic concepts in multiobjective optimization. Then, we recall some scalarization methods used in MOEA/D framework. Finally, a brief review of the Pascoletti-Serafini scalarization method is presented.

2.1 Basic concepts

Throughout this paper, for $n,m\in\mathbb{N}$, where $\mathbb{N}$ denotes the set of natural numbers, we use the symbols

$$\langle m\rangle=\{1,\ldots,m\},\quad n_{m}=\underbrace{(n,\ldots,n)}_{m}.$$

Let $R_{+}^{m}$ be the nonnegative orthant of the $m$-dimensional Euclidean space $R^{m}$, i.e.,

$$R_{+}^{m}=\{y=(y_{1},\ldots,y_{m})\in R^{m}:y_{i}\geq 0,\ i\in\langle m\rangle\}.$$

We consider the following multiobjective optimization problem:

$$\begin{aligned}\min\quad &F(x)=(f_{1}(x),\ldots,f_{m}(x))^{T}\\ \text{s.t.}\quad &x=(x_{1},\ldots,x_{n})^{T}\in\Omega,\end{aligned}\tag{1}$$

where $x$ is a decision variable vector, $\Omega=\prod_{i=1}^{n}[l_{i},u_{i}]\subseteq R^{n}$ is the decision (search) space, and $l_{i}$ and $u_{i}$ are the lower and upper bounds of the $i$-th decision variable $x_{i}$, respectively. $F:\Omega\rightarrow R^{m}$ consists of $m$ real-valued objective functions $f_{i}$, $i\in\langle m\rangle$, and $R^{m}$ is called the objective space. Since the objectives conflict with one another, it is in general not possible to find a feasible point that minimizes all objective functions at the same time. Therefore, it is necessary to use the concept of optimality described in [40].

Definition 2.1.

A point $\hat{x}\in\Omega$ is called a

  1. (i)

    Pareto optimal solution of (1) if there is no $x\in\Omega$ such that $f_{i}(x)\leq f_{i}(\hat{x})$ for all $i\in\langle m\rangle$ and $f_{j}(x)<f_{j}(\hat{x})$ for at least one index $j\in\langle m\rangle$;

  2. (ii)

    weakly Pareto optimal solution of (1) if there is no $x\in\Omega$ such that $f_{i}(x)<f_{i}(\hat{x})$ for all $i\in\langle m\rangle$.

Remark 2.1.

It is obvious that if $\hat{x}\in\Omega$ is a Pareto optimal solution of (1), then $\hat{x}$ is a weakly Pareto optimal solution.

Definition 2.2.

If $\hat{x}$ is a (weakly) Pareto optimal solution of (1), then $F(\hat{x})$ is called a (weakly) Pareto optimal vector.

Definition 2.3.

The set of all Pareto optimal solutions is called the Pareto optimal set (POS). The set of all Pareto optimal vectors, ${\rm POF}=\{F(x)\in R^{m}:x\in{\rm POS}\}$, is called the Pareto optimal front (POF).

Definition 2.4.

A point $z^{*}=(z_{1}^{*},\ldots,z_{m}^{*})$ is called an ideal point if $z_{i}^{*}=\min_{x\in\Omega}f_{i}(x)$ for each $i\in\langle m\rangle$.

Remark 2.2.

Note that Definition 2.4 implies $f_{i}(x)-z_{i}^{*}\geq 0$ for all $x\in\Omega$ and $i\in\langle m\rangle$.

Definition 2.5.

A point $z^{{\rm nad}}=(z_{1}^{{\rm nad}},\ldots,z_{m}^{{\rm nad}})$ is called a nadir point if $z_{i}^{{\rm nad}}=\max_{x\in{\rm POS}}f_{i}(x)$ for each $i\in\langle m\rangle$.
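As an illustration of Definitions 2.1-2.5, the following sketch (our own, not part of the proposed algorithm) checks Pareto dominance between objective vectors, filters a finite set down to its mutually nondominated members (a discrete analogue of the POF), and estimates the ideal and nadir points by componentwise minima and maxima. The sample vectors are hypothetical.

```python
import numpy as np

def dominates(u, v):
    """True if u Pareto-dominates v: u <= v componentwise and
    u < v in at least one component (cf. Definition 2.1)."""
    u, v = np.asarray(u), np.asarray(v)
    return bool(np.all(u <= v) and np.any(u < v))

def nondominated(points):
    """Keep the mutually nondominated vectors of a finite set,
    i.e., a discrete analogue of the Pareto optimal front."""
    pts = np.asarray(points, dtype=float)
    keep = [i for i in range(len(pts))
            if not any(dominates(pts[j], pts[i])
                       for j in range(len(pts)) if j != i)]
    return pts[keep]

# Hypothetical objective vectors; [3, 3] is dominated by [2, 2].
front = nondominated([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0]])
z_star = front.min(axis=0)  # ideal point of the finite set (Definition 2.4)
z_nad = front.max(axis=0)   # nadir point of the finite set (Definition 2.5)
```

For a finite set the componentwise minima and maxima over the nondominated subset play the roles of $z^{*}$ and $z^{{\rm nad}}$.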

2.2 Scalarization methods in MOEA/D framework

Let $w=(w_{1},\ldots,w_{m})$ be a weight vector, i.e., $\sum_{i=1}^{m}w_{i}=1$ and $w_{i}>0$ for all $i\in\langle m\rangle$. The formulas of three traditional scalarization approaches (i.e., WS [40], TCH [40] and PBI [60]) and six other scalarization methods (i.e., the augmented Tchebycheff (a-TCH) [26], the modified Tchebycheff (m-TCH) [37], the reverse Tchebycheff (r-TCH) [30], $L_{p}$ scalarization [57], the multiplicative scalarizing function (MSF) [31] and the Tchebycheff with $l_{p}$-norm constraint ($p$-TCH) [39]) used in the MOEA/D framework are summarized in Table 1. More scalarization approaches within the framework of MOEA/D can be found in the survey papers [51, 55].

Table 1: The scalarization methods used in the MOEA/D framework.
Scalarization  Formula
WS  $\min_{x\in\Omega}\ g^{{\rm ws}}(x|w)=\sum_{i=1}^{m}w_{i}f_{i}(x)$
TCH  $\min_{x\in\Omega}\ g^{{\rm tch}}(x|w,z^{*})=\max_{1\leq i\leq m}\{w_{i}(f_{i}(x)-z_{i}^{*})\}$
PBI  $\min_{x\in\Omega}\ g^{{\rm pbi}}(x|w,z^{*})=d_{1}+\theta d_{2}$, $d_{1}=\frac{\|(F(x)-z^{*})^{T}w\|}{\|w\|}$, $d_{2}=\|F(x)-(z^{*}+d_{1}w)\|$, $\theta>0$
a-TCH  $\min_{x\in\Omega}\ g^{{\rm atch}}(x|w,z^{*})=\max_{1\leq i\leq m}\{w_{i}|z_{i}^{*}-f_{i}(x)|\}+\rho\sum_{j=1}^{m}|f_{j}(x)-z_{j}^{*}|$, $\rho>0$
m-TCH  $\min_{x\in\Omega}\ g^{{\rm mtch}}(x|w,z^{*})=\max_{1\leq i\leq m}\left\{\frac{f_{i}(x)-z_{i}^{*}}{w_{i}}\right\}$
r-TCH  $\max_{x\in\Omega}\ g^{{\rm rtch}}(x|w,z^{{\rm nad}})=\min_{1\leq i\leq m}\{w_{i}(z_{i}^{{\rm nad}}-f_{i}(x))\}$
$L_{p}$  $\min_{x\in\Omega}\ g^{{\rm wd}}(x|w,z^{*})=(\sum_{i=1}^{m}(\frac{1}{w_{i}})^{p}(f_{i}(x)-z_{i}^{*})^{p})^{\frac{1}{p}}$, $p\geq 1$
MSF  $\min_{x\in\Omega}\ g^{{\rm msf}}(x|w,z^{*})=\frac{\left[\max_{1\leq i\leq m}\left\{\frac{1}{w_{i}}(f_{i}(x)-z_{i}^{*})\right\}\right]^{1+\beta}}{\left[\min_{1\leq i\leq m}\left\{\frac{1}{w_{i}}(f_{i}(x)-z_{i}^{*})\right\}\right]^{\beta}}$
$p$-TCH  $\min_{x\in\Omega}\ g^{{\rm ptch}}(x|\lambda,z^{*})=\max_{1\leq i\leq m}\left\{\frac{f_{i}(x)-z_{i}^{*}}{\lambda_{i}}\right\}$, $\lambda=(\lambda_{1},\ldots,\lambda_{m})$ with $\|\lambda\|_{p}=1$ and $\lambda_{i}>0$, $i\in\langle m\rangle$
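As a concrete reading of Table 1, the following sketch (illustrative only) implements three of the scalarizing functions: WS, TCH and PBI. In the PBI case we normalize $w$ to unit length before computing $d_{1}$ and $d_{2}$, so that they are the distances along and perpendicular to the direction $w$; the value $\theta=5$ is just an example setting.

```python
import numpy as np

def g_ws(f, w):
    """Weighted sum scalarization from Table 1."""
    return float(np.dot(w, f))

def g_tch(f, w, z_star):
    """Tchebycheff: max_i w_i * (f_i - z_i*)."""
    return float(np.max(np.asarray(w) * (np.asarray(f) - np.asarray(z_star))))

def g_pbi(f, w, z_star, theta=5.0):
    """PBI: d1 + theta * d2, with w normalized to unit length so that
    d1 and d2 are the parallel and perpendicular distances."""
    f, w, z = map(np.asarray, (f, w, z_star))
    w_hat = w / np.linalg.norm(w)
    d1 = float(np.dot(f - z, w_hat))                # convergence distance
    d2 = float(np.linalg.norm(f - z - d1 * w_hat))  # diversity distance
    return d1 + theta * d2
```

Each function maps a candidate's objective vector to a scalar, so a subproblem is solved by minimizing that scalar over the population.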

We select six representative scalarization methods from Table 1 to give some illustrations.

  • 1.

    WS method: The WS method is also called linear scalarization. It is probably the most commonly used scalarization technique for MOPs in traditional mathematical programming. It associates every objective function with a weighting coefficient and minimizes the weighted sum of the objectives. A major difficulty of this approach is that if the POF is not convex, then no weight vector yields a solution lying in the nonconvex part.

  • 2.

    TCH method: This approach uses preference information received from a decision maker to find a Pareto optimal solution. The preference information consists of a weight vector and an ideal point. It is used as a scalarization method in MOEA/D for continuous MOPs because it can deal with both convex and concave POFs. It is noteworthy that the final solution set obtained by MOEA/D with the TCH approach is not well-distributed even along regular POFs.

  • 3.

    PBI method: This is a direction-based decomposition approach, which uses two distances, $d_{1}$ and $d_{2}$, to assess the convergence and diversity of the population in MOEA/D, respectively (see Fig. 1(c)). The penalty parameter $\theta$ in PBI is used to control the balance between convergence and diversity. A drawback of this method is that the penalty parameter has to be properly tuned.

  • 4.

    r-TCH method: This is an inverted version of the TCH method, determined by a weight vector and a nadir point (see Fig. 1(d)). Compared with the TCH approach, the r-TCH approach is superior in diversity when solving a MOP whose POF is highly nonlinear and convex. However, its performance deteriorates on MOPs with highly nonlinear and concave POFs.

  • 5.

    $L_{p}$ method: Contour curves of this method with different $p$ values are shown in Fig. 1(e). In this approach, there is a trade-off, dependent on the $p$ value, between the search ability of the method and its robustness to POF geometries [57]. As the value of $p$ increases, the search ability of the associated $L_{p}$ scalarization approach decreases. Therefore, a strategy called Pareto adaptive scalarizing approximation is introduced to determine the optimal $p$ value. The WS and TCH methods can be derived by setting $p=1$ and $p=\infty$, respectively [57].

  • 6.

    MSF method: A main feature of the MSF approach is the shape and positioning of its contour lines, which play a key role in search ability. Compared with the $L_{p}$ method, the opening angle of the contour lines of the MSF approach is less than $\pi/2$ and is controlled by the parameter $\beta$. The geometry of this approach is depicted in Fig. 1(f). As $\beta$ increases, the contour lines move closer to the used weight vector. Clearly, when $\beta=+\infty$, the contour line overlaps with the weight vector, and when $\beta=0$, MSF degenerates to m-TCH.

When $m=2$, the graphical interpretations of the six scalarization approaches and the positions of the optimal solutions of the subproblems are plotted in Fig. 1.

Figure 1: Graphical interpretations of six representative scalarization approaches used in the MOEA/D framework: (a) WS, (b) TCH, (c) PBI, (d) r-TCH, (e) $L_{p}$, (f) MSF. Herein, we only consider two weight vectors $w^{1}$ and $w^{2}$. The black solid points denote the optimal solutions of the corresponding scalarizing subproblems.

2.3 The Pascoletti-Serafini scalarization method

This subsection focuses on the scalarization method introduced by [44]. The scalar problem of the Pascoletti-Serafini (PS) scalarization method with respect to the ordering cone $R_{+}^{m}$ is defined as follows:

$$\begin{aligned}\min\quad &t\\ \text{s.t.}\quad &a+tr-F(x)\in R^{m}_{+},\\ &x\in\Omega,\ t\in R,\end{aligned}\tag{2}$$

where $a=(a_{1},\ldots,a_{m})$ and $r=(r_{1},\ldots,r_{m})$ are parameters selected from $R^{m}$.

Remark 2.3.
  1. (i)

    The parameters $a$ and $r$ in (2) are described in [16] as a reference point and a direction, respectively.

  2. (ii)

    Compared with the scalarization methods presented in Table 1, it is obvious that the PS method does not take into account the information on the weight vector and the ideal point or the nadir point.

The geometric interpretation of the PS method presented in [16] or [33] is that the ordering cone $-R^{m}_{+}$ is moved in the direction $r$ (or $-r$) along the line $a+tr$ starting at the point $a$ until the set $(a+tr-R^{m}_{+})\cap F(\Omega)$ is reduced to the empty set, where $a+tr-R_{+}^{m}=\{a+tr-y:y\in R^{m}_{+}\}$. The smallest value $\hat{t}$ for which $(a+\hat{t}r-R^{m}_{+})\cap F(\Omega)\neq\emptyset$ is the optimal value of (2). If the pair $(\hat{t},\hat{x})$ is an optimal solution of (2), then $F(\hat{x})$ with $F(\hat{x})\in(a+\hat{t}r-R_{+}^{m})\cap F(\Omega)$ is an at least weakly Pareto optimal vector, and by varying the parameters $(a,r)\in R^{m}\times R^{m}$, all Pareto optimal vectors can be obtained. To give an intuitive view of (2), Fig. 2 provides a graphical illustration for the case $m=2$.

Figure 2: Graphical illustration of (2). The red curves stand for the POFs. The left part has a continuous POF while the right part has a discontinuous POF.

Some interesting properties of this scalarization approach can be found in [16]. Here we concentrate on the following major result (see Theorem 2.1(b) and (c) in [16]), which gives the relation between the solutions of (1) and (2).

Theorem 2.1.
  1. (i)

    If $\hat{x}$ is a Pareto optimal solution of (1), then $(\hat{x},0)$ is an optimal solution of (2) for the parameter $a=F(\hat{x})$ and for arbitrary $r\in R_{+}^{m}\backslash\{0_{m}\}$.

  2. (ii)

    If $(\hat{x},\hat{t})$ is an optimal solution of (2), then $\hat{x}$ is a weakly Pareto optimal solution of (1).

Remark 2.4.

As reported in [16], a direct consequence of Theorem 2.1 is that we can find all Pareto optimal solutions of (1) for a constant parameter $r\in R_{+}^{m}\backslash\{0_{m}\}$ by varying the parameter $a\in R^{m}$ only.
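To see problem (2) in action on a finite sample, the following sketch (illustrative only; the sampled front, the shifted dominated points and all parameter values are our own choices) solves (2) by enumeration. For $r>0$ and a fixed $x$, the smallest feasible $t$ is $\max_i (f_i(x)-a_i)/r_i$, so the optimal pair simply minimizes this quantity over the sampled image set. Following Remark 2.4, the direction $r$ is held fixed while only the reference point $a$ is varied.

```python
import numpy as np

# Discrete stand-in for F(Omega): a sampled concave front
# f2 = sqrt(1 - f1^2), plus shifted (strictly dominated) copies.
f1 = np.linspace(0.0, 1.0, 201)
front = np.column_stack([f1, np.sqrt(1.0 - f1**2)])
F = np.vstack([front, front + 0.2])

def solve_ps(F, a, r):
    """Enumerative solution of (2) over a finite image set: for r > 0 and
    an image point y, the smallest feasible t is max_i (y_i - a_i)/r_i;
    the optimum minimizes this value over all sampled points."""
    t_vals = np.max((F - a) / r, axis=1)
    k = int(np.argmin(t_vals))
    return t_vals[k], F[k]

# Remark 2.4: fix the direction r, vary only the reference point a.
r = np.array([1.0, 1.0])
refs = [np.array([s, -s]) for s in np.linspace(-0.5, 0.5, 9)]
hits = [solve_ps(F, a, r)[1] for a in refs]
```

Each subproblem returns a point of the sampled front, never one of the dominated copies, illustrating that the attained image vectors are (weakly) Pareto optimal within the sample.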

3 The proposed algorithm

In this section, we first discuss the motivation for using the PS method and an adaptive multi-reference points adjustment strategy for controlling diversity. Then we give specific answers to the questions raised in the motivation. Finally, the detailed implementation of the proposed algorithm is presented.

3.1 Motivation

It follows from the scalarization approaches given in Table 1 and the geometric interpretations shown in Fig. 1 that there is an interesting phenomenon, which we call “single-point and multi-directions”. It is attributed to the ideal point (or the nadir point) and the uniform weight vectors. Thanks to this phenomenon, the optimal solutions of all subproblems defined by uniform weight vectors in MOEA/D can form a good approximation of a regular POF. However, it usually leads to unsatisfactory performance on MOPs with irregular POFs. Let us take an example to illustrate this. In the left part of Fig. 3, there are nine uniform weight vectors $w^{1},\ldots,w^{9}$ and an ideal point $z^{*}$ depicted by a red solid point. The green, blue and red curves denote POF1, POF2 and POF3, respectively, and the black solid points stand for the optimal solutions of the subproblems. It is clear that the solutions are roughly uniformly distributed along POF1 and POF2. However, when the POF shape is highly nonlinear and convex, like POF3 in the left part of Fig. 3, the uniform weighting strategy cannot produce a set of evenly distributed solutions.

Figure 3: Intersection points between a series of lines and some curves in two dimensional space.

On the other hand, the intersection points between a set of equidistant parallel lines and any curve are almost evenly distributed along the curve. Based on this fact, a fundamental idea comes up naturally: “multi-points and single-direction”. As shown in the right part of Fig. 3, there are nine equidistant parallel lines generated by nine red solid reference points and a direction. The black solid points are the intersection points between the nine equidistant parallel lines and three curves. Obviously, these intersection points are uniformly distributed along curves 1–3.

Traces of this idea go back to the work of [10], who proposed a direction-based decomposition approach called normal boundary intersection (NBI). This approach obtains the intersection points between the lower boundary of the image of the decision space and a series of straight lines defined by a normal vector and a group of evenly distributed points in the convex hull of individual minima (CHIM). Although the intersection points generated by NBI are uniform, the method has limitations, as recognized by [10]. In particular, the NBI method may fail to cover the entire POF for problems with more than three objectives. In addition, this method cannot guarantee the Pareto optimality of the intersection points. An advantage of the NBI method is that it is relatively insensitive to the scales of the objective functions. [61] pointed out that NBI cannot easily be used within the MOEA/D framework because it has additional constraints. A meaningful attempt in their work is that they absorbed the strengths of NBI and TCH and introduced the NBI-style Tchebycheff method for biobjective optimization problems. We next briefly describe this method. Consider $m=2$ in (1) and let $F^{1}=(F_{1}^{1},F_{2}^{1})$ and $F^{2}=(F_{1}^{2},F_{2}^{2})$ be two extreme points of the POF of (1) in the objective space. The reference points $b^{i}$, $i\in\langle N\rangle$, are evenly distributed along the line segment linking $F^{1}$ and $F^{2}$, i.e., $b^{i}=\xi_{i}F^{1}+(1-\xi_{i})F^{2}$, where $\xi_{i}=\frac{N-i}{N-1}$ for $i\in\langle N\rangle$. The $i$-th NBI-style Tchebycheff scalarizing subproblem is

$$\min\limits_{x\in\Omega}\quad g^{{\rm nbi\text{-}tch}}(x|b^{i},\gamma)=\max\{\gamma_{1}(f_{1}(x)-b_{1}^{i}),\gamma_{2}(f_{2}(x)-b_{2}^{i})\},\tag{3}$$

where $\gamma_{1}=|F^{2}_{2}-F^{1}_{2}|$ and $\gamma_{2}=|F^{2}_{1}-F^{1}_{1}|$. The graphical illustration of the NBI-style Tchebycheff method is shown in Fig. 4. As we have seen, the optimal solutions of the above subproblems can be uniformly distributed along the POF. However, as pointed out by [34], the main weakness of the NBI-style Tchebycheff method is that it cannot be extended to handle MOPs with more than two objectives.

Figure 4: Illustration of NBI-style Tchebycheff method.
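The construction of the reference points $b^{i}$ and the weights $\gamma$ in (3) can be sketched as follows; the two extreme points and the value $N=5$ are illustrative choices of ours, not values from the paper.

```python
import numpy as np

def nbi_tch_setup(F1, F2, N):
    """Reference points b^i evenly spread on the segment joining the
    extreme points F1 and F2 (xi_i = (N - i)/(N - 1), i = 1..N), and
    the fixed weights gamma_1 = |F2_2 - F1_2|, gamma_2 = |F2_1 - F1_1|."""
    F1, F2 = np.asarray(F1, float), np.asarray(F2, float)
    xi = np.array([(N - i) / (N - 1) for i in range(1, N + 1)])
    b = xi[:, None] * F1 + (1.0 - xi[:, None]) * F2
    gamma = np.array([abs(F2[1] - F1[1]), abs(F2[0] - F1[0])])
    return b, gamma

def g_nbi_tch(f, b_i, gamma):
    """Objective value of subproblem (3) at the image point f = F(x)."""
    return float(np.max(gamma * (np.asarray(f) - b_i)))

# Illustrative extreme points of a biobjective POF and N = 5 subproblems.
b, gamma = nbi_tch_setup([0.0, 1.0], [1.0, 0.0], N=5)
```

With these extreme points, $b^{1}=F^{1}$ and $b^{N}=F^{2}$, and every subproblem shares the same weights $\gamma$, so only the reference point changes from one subproblem to the next.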

According to the descriptions in Subsection 2.3, the settings of the reference point and the direction in the PS scalarization method are quite flexible. With the help of this flexibility, [19] designed a new approximation algorithm to compute a box-coverage of the POF. We observe from their work that the reference points (the lower bounds of some boxes) and the directions (the differences between the lower and upper bounds of these boxes) are dynamically updated. Based on the flexibility of the parameters in the PS scalarization method, the idea of “multi-points and single-direction” and the decomposition strategy of [60], the primary motivation of this paper is to decompose a target MOP into a number of PS scalar optimization subproblems that have different reference points and the same direction, and to optimize them simultaneously in a collaborative manner. For example, in the right part of Fig. 3, there are nine decomposed PS scalar optimization subproblems defined by the reference points depicted by the red solid points and the same direction depicted by the dotted line. From the graphical illustration in Subsection 2.3, the black solid points are the optimal solutions of all subproblems, and they are uniformly distributed. Although this approach can approximate a continuous POF very well, it may face the same dilemma as MOEA/D [60]: the performance degrades on MOPs with discontinuous or even degenerate POFs. Therefore, in our proposed algorithm, the reference points related to the subproblems can be properly adjusted to obtain better diversity. Given the above descriptions, three inevitable questions arise:

  1. (1)

    How can we embed the PS scalarization method into the MOEA/D framework?

  2. (2)

    How to set the multi-reference points and the direction?

  3. (3)

    How to adaptively adjust the multi-reference points?

The following three subsections present the solutions for these questions.

3.2 The transformation of PS scalarization method

It is obvious that (2) cannot be easily used within the framework of decomposition-based multiobjective evolutionary algorithms since it introduces extra constraints. However, when ri>0r_{i}>0, imi\in\langle m\rangle, (2) is equivalent to the following optimization problem:

min\displaystyle\min t\displaystyle\quad t (4)
s.t. tfi(x)airi,im,\displaystyle\quad t\geq\frac{f_{i}(x)-a_{i}}{r_{i}},i\in\langle m\rangle,
xΩ,tR.\displaystyle\quad x\in\Omega,t\in R.

Note that, using a standard trick from mathematical programming, (4) can be rewritten as the following minimax optimization problem:

minxΩgps(x|a,r)=max1im{1ri(fi(x)ai)}.\min\limits_{x\in\Omega}\quad g^{{\rm ps}}(x|a,r)=\max\limits_{1\leq i\leq m}\left\{\frac{1}{r_{i}}(f_{i}(x)-a_{i})\right\}. (5)

An advantage of (5) is that it can be used as a scalarizing function in the MOEA/D framework. Obviously, Theorem 2.1 also holds for (5).
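To make the scalarizing function concrete, the following sketch evaluates (5) for a given objective vector (a minimal illustration under our own naming, not the paper's implementation):

```python
import numpy as np

def g_ps(fx, a, r):
    """Scalarizing value g^ps(x | a, r) of (5) for an objective vector fx:
    the maximum of (f_i(x) - a_i) / r_i over all objectives."""
    fx, a, r = np.asarray(fx, float), np.asarray(a, float), np.asarray(r, float)
    return float(np.max((fx - a) / r))

# With a = z* and r_i = 1/w_i, g_ps reduces to the classical Tchebycheff
# function max_i w_i (f_i(x) - z_i*), matching the first row of Table 2.
```

For instance, with fx = (2, 3), a = (0, 0) and r = (1, 1) the value is 3; with r = (2, 4), i.e., weights w = (1/2, 1/4), the value equals the TCH value max(1, 0.75) = 1.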

Remark 3.1.

Notice that, when m=2m=2 in (5), if a=bia=b^{i} and r=γr=\gamma, then (5) reduces to the ii-th NBI-style Tchebycheff subproblem (3). In addition, some scalarization approaches used in the MOEA/D framework are special cases of (5) obtained by selecting suitable parameter values (see Table 2).

Table 2: The special cases of (5).
Parameters in (5) Special cases
ai=zia_{i}=z_{i}^{*}, ri=1wir_{i}=\frac{1}{w_{i}}, imi\in\langle m\rangle TCH
ai=zia_{i}=z_{i}^{*}, ri=wir_{i}=w_{i}, imi\in\langle m\rangle m-TCH
ai=zia_{i}=z_{i}^{*}, ri=λir_{i}=\lambda_{i}, λp=1\|\lambda\|_{p}=1 and λi>0,im\lambda_{i}>0,i\in\langle m\rangle pp-TCH
ai=zia_{i}=z_{i}^{*}, ri=(i=1m1wi)/1wir_{i}=(\sum_{i=1}^{m}\frac{1}{w_{i}})/\frac{1}{w_{i}}, imi\in\langle m\rangle [45]

3.3 The setting of multi-reference points and direction

Our aim is to obtain a good approximation of the true POF of a given MOP by solving (5) for suitable parameters. By Remark 2.4, for a fixed rR+m\{0m}r\in R_{+}^{m}\backslash\{0_{m}\}, all Pareto optimal solutions of (1) can be obtained by varying aRma\in R^{m}. However, it is unnecessary to vary the parameter aa over the whole space RmR^{m}: Theorem 2.11 in [16] shows that aa only needs to be varied in a hyperplane. We restate the result as follows:

Theorem 3.1.

Let x^Ω\hat{x}\in\Omega be a Pareto optimal solution of (1) and define a hyperplane

={yRm:ςy=κ}\mathcal{H}=\{y\in R^{m}:\varsigma^{\top}y=\kappa\}

with ςRm\{0m}\varsigma\in R^{m}\backslash\{0_{m}\} and κR\kappa\in R. Let rR+mr\in R_{+}^{m} with ςr0\varsigma^{\top}r\neq 0 be arbitrarily given. Then there exists a parameter aa\in\mathcal{H} and some t^R\hat{t}\in R such that (x^,t^)(\hat{x},\hat{t}) is an optimal solution of (2).

Remark 3.2.
  1. (i)

    Evidently, Theorem 3.1 also holds for (5).

  2. (ii)

    Theorem 3.1 shows that it suffices to select the parameter rR+m\{0m}r\in R_{+}^{m}\backslash\{0_{m}\} as constant and to vary the parameter aa only in the hyperplane \mathcal{H}.

In order to achieve better performance in diversity, we choose rr as the normal vector of the hyperplane \mathcal{H} (i.e., ς=r\varsigma=r) and set κ=0\kappa=0. More specifically, let ς=r=1m\varsigma=r=1_{m} and κ=0\kappa=0 (note that this is not the only option). In this case, \mathcal{H} becomes the hyperplane

0={y=(y1,,ym)Rm:i=1myi=0}\mathcal{H}_{0}=\left\{y=(y_{1},\ldots,y_{m})\in R^{m}:\sum_{i=1}^{m}y_{i}=0\right\}

which passes through the origin. A set of reference points should then be uniformly sampled on 0\mathcal{H}_{0}, so that different Pareto optimal solutions can be obtained by solving (5). However, sampling reference points on the entire 0\mathcal{H}_{0} is unwise, because it may lead to many redundant reference points; moreover, the true POFs of most test instances in the field of evolutionary multiobjective optimization lie in [0,1]m[0,1]^{m}. Now, let PP be the set of projection points of all vertices of the hypercube [0,1]m[0,1]^{m} on 0\mathcal{H}_{0} and let ~0\mathcal{\tilde{H}}_{0} be the convex hull of PP. An example is shown in the left part of Fig. 5, where P={a1,a2,a3}P=\{a^{1},a^{2},a^{3}\} and

~0={yR2:y=λ1a1+λ2a3,λ1,λ2[0,1]}.\mathcal{\tilde{H}}_{0}=\left\{y\in R^{2}:y=\lambda_{1}a^{1}+\lambda_{2}a^{3},\lambda_{1},\lambda_{2}\in[0,1]\right\}.

The right part of Fig. 5 is the case of m=3m=3, i.e., P={a1,a2,,a7}P=\{a^{1},a^{2},\ldots,a^{7}\} and

~0={yR3:y=i=16λiai,λi[0,1],i6}.\mathcal{\tilde{H}}_{0}=\left\{y\in R^{3}:y=\sum_{i=1}^{6}\lambda_{i}a^{i},\lambda_{i}\in[0,1],i\in\langle 6\rangle\right\}.
Refer to caption
Figure 5: The illustration of the set ~0\mathcal{\tilde{H}}_{0} for m=2m=2 and m=3m=3.

Therefore, we only need to sample a set of uniformly distributed reference points on ~0\mathcal{\tilde{H}}_{0}. It should be noted that in this paper the uniformly distributed reference points on ~0\mathcal{\tilde{H}}_{0} are obtained by the following two steps:

  1. (1)

    Equidistant partition. When m=2m=2, we divide the interval [0,1][0,1] on each axis into ll equal parts, and then 2l+12l+1 base points are obtained. The left part of Fig. 6 shows an example consisting of 9 base points marked by the blue solid points for l=4l=4. When m=3m=3, we also divide [0,1][0,1] on each axis into ll equal parts. Then the coordinates of these base points are exchanged and 3l2+3l+13l^{2}+3l+1 new base points are obtained (see the blue solid points in the left and middle parts of Fig. 7).

  2. (2)

    Projection. The base points are projected onto the hyperplane 0\mathcal{H}_{0}, and these projection points are taken as the reference points. Obviously, the convex hull formed by these reference points is ~0\mathcal{\tilde{H}}_{0}. The right part of Fig. 6 shows the 9 uniformly distributed reference points for l=4l=4 and the right part of Fig. 7 shows an example consisting of 19 uniformly distributed reference points for l=2l=2.

Refer to caption
Figure 6: The generation process of reference points for m=2m=2.
Refer to caption
Figure 7: The generation process of reference points for m=3m=3.
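For m=2m=2, the two steps above (equidistant partition of the axes, then orthogonal projection onto the hyperplane) can be sketched as follows; the projection onto the hyperplane with normal vector 1m1_{m} simply subtracts the mean coordinate of each base point (a sketch under our own naming, not the authors' code):

```python
import numpy as np

def reference_points_2d(l):
    """Generate 2l+1 uniformly distributed reference points for m = 2.

    Base points lie on the two axes of [0,1]^2 (equidistant partition);
    each is then projected onto H0 = {y : y_1 + y_2 = 0} by subtracting
    its mean coordinate (orthogonal projection along the direction 1_m).
    """
    ticks = np.linspace(0.0, 1.0, l + 1)
    base = [(t, 0.0) for t in ticks] + [(0.0, t) for t in ticks[1:]]  # 2l+1 points
    base = np.array(base)
    # Orthogonal projection onto H0: subtract the row-wise mean coordinate.
    return base - base.mean(axis=1, keepdims=True)

refs = reference_points_2d(4)  # 9 reference points, as in Fig. 6
```

Each returned point satisfies y1+y2=0 and, for m=2, the points are equally spaced along the line, consistent with the right part of Fig. 6.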

We would like to emphasize that, if we perform the coordinate-exchange process for the case m=2m=2, the same reference points are obtained; hence, we omit it here. The aforementioned method can also be used for m4m\geq 4. In general, the range of the true POFs of some test problems and real-world MOPs does not necessarily belong to [0,1]m[0,1]^{m}, so the lines associated with the reference points generated by the above method cannot cover the whole POF. Herein, the normalization technique mentioned in [60] is adopted to tackle this issue. After normalization, all solutions in the current generation are mapped to points in [0,1]m[0,1]^{m} in the normalized objective space. Therefore, the update of solutions can be performed by the following scalar optimization subproblem:

minxΩg~ps(x|a,r,z,znad)=max1im{1ri(fi(x)zizinadziai)}.\min\limits_{x\in\Omega}\quad\tilde{g}^{{\rm ps}}(x|a,r,z^{*},z^{{\rm nad}})=\max\limits_{1\leq i\leq m}\left\{\frac{1}{r_{i}}\left(\frac{f_{i}(x)-z_{i}^{*}}{z_{i}^{{\rm nad}}-z_{i}^{*}}-a_{i}\right)\right\}. (6)
Remark 3.3.

It is noteworthy that the objective normalization, as a transformation mapping the objective values from the scaled objective space onto the normalized objective space, changes the actual objective values but does not affect the evaluation of xx by the scalarizing function. Consequently, for r=1mr=1_{m} and a~0a\in\mathcal{\tilde{H}}_{0}, the optimal solutions of (5) and (6) coincide.

Remark 3.4.

If the true ideal point zz^{*} and the true nadir point znadz^{{\rm nad}} in (6) are not available, then we assign to zz^{*} the best value among all solutions examined so far and to znadz^{{\rm nad}} the worst value among the current population.
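A sketch of the normalized scalarizing function (6), with the ideal and nadir points supplied as in Remark 3.4 (function and variable names are ours):

```python
import numpy as np

def g_ps_norm(fx, a, r, z_star, z_nad):
    """Normalized PS scalarizing value g~ps(x | a, r, z*, z^nad) from (6).

    Objectives are first mapped into [0,1]^m using the ideal point z* and
    the nadir point z^nad, then shifted by the reference point a and scaled
    componentwise by the direction r."""
    fx, a, r = np.asarray(fx, float), np.asarray(a, float), np.asarray(r, float)
    z_star, z_nad = np.asarray(z_star, float), np.asarray(z_nad, float)
    normalized = (fx - z_star) / (z_nad - z_star)
    return float(np.max((normalized - a) / r))
```

For example, with fx = (2, 3), z* = (0, 0), z^nad = (4, 4), a = (0, 0) and r = (1, 1), the normalized objectives are (0.5, 0.75) and the value is 0.75.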

We conclude this subsection with a brief comparison between the weight generation method of MOEA/D and the reference point generation technique of our method. In MOEA/D, a set of weights evenly distributed on the mm-dimensional unit simplex is used, usually generated by the lattice method introduced in [10]. In contrast, in our proposed method, the reference points are uniformly distributed on the set ~0\mathcal{\tilde{H}}_{0} and are generated by the techniques of equidistant partition and projection. The set ~0\mathcal{\tilde{H}}_{0} has a larger range than the simplex and can better cover the space [0,1]m[0,1]^{m} from the diagonal perspective.

3.4 The adaptation of multi-reference points

It is noteworthy that the set of equidistant parallel lines generated by the aforesaid method can cover the whole space [0,1]m[0,1]^{m}. However, if some straight lines, formed by the direction rr and some reference points, do not intersect the POF, then it follows from the geometric interpretation of the PS method in Subsection 2.3 that the same solution may be obtained for several subproblems, or many solutions may concentrate on the boundary or at the discontinuities of the true POF. Taking Fig. 8(a) as an example, the subproblems related to the reference points a1a^{1}, a2a^{2} and a3a^{3} have the same Pareto optimal vector F(x^)F(\hat{x}). This issue greatly affects the performance of the algorithm.

Refer to caption
(a) Discontinuous
Refer to caption
(b) Simplex
Refer to caption
(c) Degenerate
Figure 8: Illustrations of the possible distributions of solutions for discontinuous, simplex and degenerate POFs. The black, red and blue solid points denote the solutions of the subproblems, the promising reference points and the unpromising ones, respectively.

To improve algorithmic performance, it is imperative to consider when, where and how to adjust the reference points. Early adjustment of reference points could be unnecessary and ineffective because the population does not provide a good approximation of the POF at early generations. A better approach is to trigger the adjustment only when the population has roughly reached the POF in later generations. Herein, we use the evolutionary rate ε\varepsilon to assist the reference point adaptation: when the ratio of the iteration number to the maximal number of generations reaches ε\varepsilon, the adaptation of reference points is started. In this context, a reference point is regarded as promising if it is associated with one or more solutions. On the other hand, a reference point is marked as unpromising if it is not associated with any solution, and the marked unpromising reference points are deleted. For example, for the discontinuous POF shown in Fig. 8(a), the six red solid reference points are promising and the three blue solid ones are unpromising. Similarly, for the simplex POF shown in Fig. 8(b) and the degenerate POF shown in Fig. 8(c), there also exist some promising and unpromising reference points. Additional reference points should be generated from the promising ones so as to keep a prespecified number of reference points. Overall, the key ingredients of adaptive multi-reference point adjustment are how to identify the promising reference points and how to add new ones. The specific implementation schemes are discussed in detail in Subsection 3.5.4.

3.5 The description of MOEA/D-AMR

Based on the previous discussions, we are now in a position to propose a new algorithm called MOEA/D-AMR, which integrates the PS scalarization method and the adaptation of multi-reference points into the framework of MOEA/D-DE [36]. Its pseudo-code is presented in Algorithm 1.

Input: MOP (1), ll: the number of divisions on each axis, TT: neighborhood size, nrepn^{{\rm rep}}: replacement size, GmaxG_{\max}: the maximal number of generations.
Output: Approximate POS: X={x1,,xN}X=\{x^{1},\ldots,x^{N}\}, approximate POF: F(X)={F(x1),,F(xN)}F(X)=\{F(x^{1}),\ldots,F(x^{N})\}.
1 [L,B,X,F(X),z][L,B,X,F(X),z^{*}]\leftarrow Initialization();
2 gen \leftarrow 1;
3 while gen \leq GmaxG_{\max} do
4       for i1i\leftarrow 1 to NN do
5             if rand[0,1]δ{\rm rand}[0,1]\leq\delta then
6                   VBiV\leftarrow B^{i};
7                  
8            else
9                   VNV\leftarrow\langle N\rangle;
10                  
11             end if
12            Randomly select two indexes v1v_{1} and v2v_{2} from VV;
13             yy\leftarrow Reproduction-repair(xi,xv1,xv2x^{i},x^{v_{1}},x^{v_{2}});
14             Evaluate the function value of new solution yy, update zz^{*} and znadz^{{\rm nad}};
15             c1c\leftarrow 1;
16             while cnrepc\leq n^{{\rm rep}}and VV\neq\emptyset do
17                   Randomly select an index jj from VV, and VV\{j}V\leftarrow V\backslash\{j\};
18                   if g~ps(y|aj,r,z,znad)g~ps(xj|aj,r,z,znad)\tilde{g}^{{\rm ps}}(y|a^{j},r,z^{*},z^{{\rm nad}})\leq\tilde{g}^{{\rm ps}}(x^{j}|a^{j},r,z^{*},z^{{\rm nad}}) then
19                         xjyx^{j}\leftarrow y, cc+1c\leftarrow c+1;
20                        
21                   end if
22                  
23             end while
24            
25       end for
26      if gen=εGmax{\rm gen}=\varepsilon G_{\max} then
27             [Lpro,Ipro,Xpro][L^{{\rm pro}},I^{{\rm pro}},X^{{\rm pro}}]\leftarrow Identifying-promising-reference-points(L,X)(L,X);
28             if |Lpro|=N|L^{{\rm pro}}|=N then
29                   LLproL\leftarrow L^{{\rm pro}}, XXproX\leftarrow X^{{\rm pro}};
30                   Update BB using LL;
31                  
32            else
33                  [L,X,B][L,X,B]\leftarrow Adding-new-reference-points(Lpro,Ipro,Xpro)(L^{{\rm pro}},I^{{\rm pro}},X^{{\rm pro}});
34             end if
35            
36       end if
37      gengen+1{\rm gen}\leftarrow{\rm gen}+1;
38      
39 end while
Algorithm 1 MOEA/D-AMR Framework

Some important components of MOEA/D-AMR such as initialization (line 1 of Algorithm 1), reproduction and repair (lines 5–11 of Algorithm 1), update of solutions (lines 12–19 of Algorithm 1) and adjustment of multi-reference points (lines 21–29 of Algorithm 1) will be illustrated in detail in the following subsections.

3.5.1 Initialization

The first step is to create a set of NN reference points via the method introduced in Subsection 3.3. The set of all reference points is denoted by L={a1,,aN}L=\{a^{1},\ldots,a^{N}\}. Secondly, we compute the Euclidean distance between any two reference points and then work out the TT closest reference points to each reference point. For each iNi\in\langle N\rangle, set the neighborhood index list Bi={i1,,iT}B^{i}=\{i_{1},\ldots,i_{T}\}, where ai1,,aiTa^{i_{1}},\ldots,a^{i_{T}} are the TT closest reference points to aia^{i}. All the neighborhood index lists are defined as B={B1,,BN}B=\{B^{1},\ldots,B^{N}\}. Thirdly, an initial population X={x1,,xN}X=\{x^{1},\ldots,x^{N}\}, where NN is the population size, is generated by uniformly sampling from the decision space Ω\Omega. The function values F(xi)F(x^{i}) are calculated for every xix^{i}, iNi\in\langle N\rangle and let F(X)={F(x1),,F(xN)}F(X)=\{F(x^{1}),\ldots,F(x^{N})\}. Finally, the ideal point z=(z1,,zm)z^{*}=(z^{*}_{1},\ldots,z^{*}_{m}) is initialized by setting zj=min1iNfj(xi)z_{j}^{*}=\min_{1\leq i\leq N}f_{j}(x^{i}), jmj\in\langle m\rangle.
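The neighborhood construction in the second step can be sketched as follows (a minimal vectorized illustration; `neighborhoods` is our name):

```python
import numpy as np

def neighborhoods(refs, T):
    """For each reference point, return the indices of its T closest
    reference points (Euclidean distance), as in the initialization step.

    refs : (N, m) array of reference points; returns an (N, T) index array.
    """
    # Pairwise distance matrix via broadcasting; a point is never its own neighbor.
    d = np.linalg.norm(refs[:, None, :] - refs[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :T]
```

For instance, for five collinear points spaced one apart, the neighborhood of the first point consists of the second and third points.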

3.5.2 Reproduction and repair

A probability parameter δ\delta is used to choose a mating pool VV from either the neighborhood of a solution or the whole population (lines 5–9 in Algorithm 1). The differential evolution (DE) mutation operator and the polynomial mutation (PM) operator, as in MOEA/D-DE [36], are used in this paper to produce an offspring solution from xix^{i}, iNi\in\langle N\rangle. The DE operator generates a candidate solution y¯=(y¯1,,y¯n)\bar{y}=(\bar{y}_{1},\ldots,\bar{y}_{n}) by

y¯k={xki+SF×(xkv1xkv2),if rand[0,1]<CR,xki,otherwise,\bar{y}_{k}=\left\{\begin{array}[]{ll}x_{k}^{i}+SF\times(x_{k}^{v_{1}}-x_{k}^{v_{2}}),&\hbox{if ${\rm rand}[0,1]<CR$},\\ x_{k}^{i},&\hbox{otherwise,}\end{array}\right.

where v1,v2v_{1},v_{2} are randomly selected from the mating pool VV, y¯k\bar{y}_{k} is the kk-th component of y¯\bar{y}, knk\in\langle n\rangle, SFSF is the scale factor, CRCR is the crossover rate and rand[0,1]{\rm rand}[0,1] is a uniform random number drawn from [0,1][0,1]. The PM operator is applied to generate a solution y=(y1,,yn)y=(y_{1},\ldots,y_{n}) from y¯\bar{y} in the following way:

yk={y¯k+σk×(uklk),if rand[0,1]<pm,y¯k,otherwisey_{k}=\left\{\begin{array}[]{ll}\bar{y}_{k}+\sigma_{k}\times(u_{k}-l_{k}),&\hbox{if ${\rm rand}[0,1]<p_{m}$},\\ \bar{y}_{k},&\hbox{otherwise}\end{array}\right.

with

σk={(2×rand[0,1])11+η1,if rand[0,1]<0.5,1(22×rand[0,1])11+η,otherwise,\sigma_{k}=\left\{\begin{array}[]{ll}(2\times{\rm rand}[0,1])^{\frac{1}{1+\eta}}-1,&\hbox{if ${\rm rand}[0,1]<0.5$},\\ 1-(2-2\times{\rm rand}[0,1])^{\frac{1}{1+\eta}},&\hbox{otherwise,}\end{array}\right.

where the distribution index η>0\eta>0 and the mutation rate pm[0,1]p_{m}\in[0,1] are two control parameters. It is not always guaranteed that the new solution yy generated by reproduction belongs to the decision space Ω\Omega. When a component of yy is out of the boundary of Ω\Omega, a repair strategy is applied to yy such that yΩy\in\Omega, i.e.,

yk={lk,if yk<lk,uk,if yk>uk,yk,otherwise.y_{k}=\left\{\begin{array}[]{ll}l_{k},&\hbox{if $y_{k}<l_{k}$},\\ u_{k},&\hbox{if $y_{k}>u_{k}$,}\\ y_{k},&\hbox{otherwise.}\\ \end{array}\right.
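Putting the DE mutation, polynomial mutation and boundary repair together, one possible implementation reads as follows (a sketch with our own function name; the actual MOEA/D-DE code may differ in details such as random-number handling):

```python
import numpy as np

rng = np.random.default_rng(0)

def de_pm_offspring(x, xv1, xv2, lower, upper, SF=0.5, CR=1.0, pm=None, eta=20.0):
    """DE mutation, then polynomial mutation, then boundary repair,
    following the three formulas of Subsection 3.5.2; `lower`/`upper`
    play the roles of the bounds l and u of the decision space."""
    x, xv1, xv2 = map(np.asarray, (x, xv1, xv2))
    n = x.size
    pm = 1.0 / n if pm is None else pm
    # DE operator: perturb components with probability CR.
    mask = rng.random(n) < CR
    y = np.where(mask, x + SF * (xv1 - xv2), x)
    # Polynomial mutation with distribution index eta.
    u = rng.random(n)
    sigma = np.where(u < 0.5,
                     (2 * u) ** (1 / (1 + eta)) - 1,
                     1 - (2 - 2 * u) ** (1 / (1 + eta)))
    mutate = rng.random(n) < pm
    y = np.where(mutate, y + sigma * (upper - lower), y)
    # Repair: clip out-of-bound components back into the box.
    return np.clip(y, lower, upper)
```

The final `np.clip` implements the componentwise repair rule above.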

3.5.3 Update of solutions

After yy is generated, the procedure of updating solutions is performed, as shown in lines 12–19 of Algorithm 1. First, the ideal point zz^{*} is updated by yy, i.e., for any jmj\in\langle m\rangle, if zj>fj(y)z_{j}^{*}>f_{j}(y), then set zj=fj(y)z_{j}^{*}=f_{j}(y) (line 12 of Algorithm 1). Then the nadir point znadz^{{\rm nad}} is calculated by zjnad=max1iNfj(xi)z_{j}^{{\rm nad}}=\max_{1\leq i\leq N}f_{j}(x^{i}) for each jmj\in\langle m\rangle. As mentioned in Remark 3.4, zz^{*} is the best value among all solutions examined so far and znadz^{{\rm nad}} is the worst value among the current population. Next, an index jj is randomly selected from VV and subsequently deleted from VV (line 15 of Algorithm 1). The individual xjx^{j} is then compared with the offspring yy based on the scalarizing function g~ps\tilde{g}^{{\rm ps}} defined in (6); if yy is better than xjx^{j} according to their scalarizing function values, then xjx^{j} is replaced with yy (line 17 of Algorithm 1). The counter cc records the number of solutions replaced by yy; once cc reaches the replacement size nrepn^{\rm rep}, the updating procedure terminates (line 14 of Algorithm 1).
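The replacement loop of lines 12–19 can be sketched as follows, where `g(x, a)` stands for evaluating the scalarizing function (6) for a solution at reference point `a` (function and variable names are ours):

```python
import random

def update_solutions(V, y, X, refs, g, n_rep):
    """Steady-state replacement, sketched from lines 12-19 of Algorithm 1.

    Indices are drawn from the mating pool V without replacement; x^j is
    replaced by the offspring y whenever y scores no worse on subproblem j,
    until n_rep replacements have been made or V is exhausted.
    """
    V = list(V)
    random.shuffle(V)          # random selection without replacement
    c = 0
    for j in V:
        if c >= n_rep:         # at most n_rep replacements per offspring
            break
        if g(y, refs[j]) <= g(X[j], refs[j]):
            X[j] = y
            c += 1
    return X
```

The cap `n_rep` limits how many neighbors a single offspring may overwrite, which preserves population diversity in steady-state decomposition-based MOEAs.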

3.5.4 Adjustment of multi-reference points

As analyzed in Subsection 3.4, the reference points need to be adjusted adaptively in the later stage of the algorithm. Therefore, the evolutionary rate ε\varepsilon is used to determine the generation at which the adjustment is triggered (line 21 of Algorithm 1). The adjustment strategy of multi-reference points consists of two parts:

  1. (i)

    The identification of promising reference points (Algorithm 2);

  2. (ii)

    The addition of new reference points (Algorithm 3).

For (i), we first need to establish a distance criterion from the original reference points obtained in the initialization process, i.e., the minimum distance dminLLd^{L\to L}_{\min} between any two reference points in LL (lines 1–7 of Algorithm 2). Secondly, the current population is normalized by the ideal point and the nadir point obtained in line 12 of Algorithm 1, and the normalized points are projected onto the set ~0\mathcal{\tilde{H}}_{0} where the original reference points are located (line 9 of Algorithm 2). We denote the set of all obtained projection points by Q={q1,,qN}Q=\{q^{1},\ldots,q^{N}\}. Thirdly, we need to obtain a distance matrix D=(dijLQ)N×ND=(d^{L\to Q}_{ij})_{N\times N}, where dijLQd^{L\to Q}_{ij} stands for the distance between the ii-th reference point in LL and the jj-th projection point in QQ (lines 10–14 of Algorithm 2). The next step is to find the minimum value of each row of the matrix DD, denoted dminLQ(i)d_{\min}^{L\to Q}(i), iNi\in\langle N\rangle. In other words, for each reference point aia^{i} in LL, we find the projection point in QQ closest to aia^{i} and denote the distance between them by dminLQ(i)d_{\min}^{L\to Q}(i). Finally, if dminLQ(i)d_{\min}^{L\to Q}(i) is not larger than dminLLd^{L\to L}_{\min}, then the ii-th reference point is recognized as a promising reference point and stored in a new set LproL^{{\rm pro}}, and its associated index and solution are preserved in IproI^{{\rm pro}} and XproX^{{\rm pro}}, respectively (lines 16–20 of Algorithm 2). To elaborate on the process, we explain it with an example.

Input: L={a1,,aN}L=\{a^{1},\ldots,a^{N}\}: the original reference points, XX: the current population.
Output: The promising reference points set LproL^{{\rm pro}}, and its associated index set IproI^{{\rm pro}} and the population XproX^{{\rm pro}}.
1 for i1i\leftarrow 1 to NN do
2       diiLL=+d^{L\to L}_{ii}=+\infty;
3       for j1j\leftarrow 1 to NN do
4             dijLL=aiajd^{L\to L}_{ij}=\|a^{i}-a^{j}\|;
5            
6       end for
7      
8 end for
9dminLL=min1i,jNdijLLd^{L\to L}_{\min}=\min_{1\leq i,j\leq N}d^{L\to L}_{ij};
10 Normalize the function values of the current population;
11 Project these normalized points into 0\mathcal{H}_{0}, and then obtain the set Q={q1,,qN}Q=\{q^{1},\ldots,q^{N}\} of all the projection points;
12 for i1i\leftarrow 1 to NN do
13       for j1j\leftarrow 1 to NN do
14             dijLQ=aiqjd^{L\to Q}_{ij}=\|a^{i}-q^{j}\|;
15            
16       end for
17      
18 end for
19Lpro=L^{{\rm pro}}=\emptyset, Ipro=I^{{\rm pro}}=\emptyset, Xpro=X^{{\rm pro}}=\emptyset;
20 for i1i\leftarrow 1 to NN do
21       dminLQ(i)=min1jNdijLQd^{L\to Q}_{\min}(i)=\min_{1\leq j\leq N}d^{L\to Q}_{ij};
22       if dminLQ(i)dminLLd^{L\to Q}_{\min}(i)\leq d^{L\to L}_{\min} then
23             IproIpro{i}I^{{\rm pro}}\leftarrow I^{{\rm pro}}\cup\{i\}, LproLpro{ai}L^{{\rm pro}}\leftarrow L^{{\rm pro}}\cup\{a^{i}\}, XproXpro{xi}X^{{\rm pro}}\leftarrow X^{{\rm pro}}\cup\{x^{i}\};
24            
25       end if
26      
27 end for
Algorithm 2 Identifying-promising-reference-points

In the left part of Fig. 9, the red solid points a1,,a7a^{1},\ldots,a^{7} stand for the original reference points, the black solid points x1,,x7x^{1},\ldots,x^{7} represent the normalized points and the blue solid points q1,,q7q^{1},\ldots,q^{7} denote the projection points. The points closest to a1,a2,a3,a4,a5,a6,a7a^{1},a^{2},a^{3},a^{4},a^{5},a^{6},a^{7} are q1,q1,q2,q3,q5,q7,q7q^{1},q^{1},q^{2},q^{3},q^{5},q^{7},q^{7}, respectively. It is clear that dminLQ(1)=a1q1>dminLLd_{\min}^{L\to Q}(1)=\|a^{1}-q^{1}\|>d_{\min}^{L\to L} and dminLQ(7)=a7q7>dminLLd_{\min}^{L\to Q}(7)=\|a^{7}-q^{7}\|>d_{\min}^{L\to L}. Therefore, the reference points a1a^{1} and a7a^{7} are regarded as unpromising and are subsequently dropped; the other reference points are marked as promising.

Refer to caption
Figure 9: The identification of promising reference points for m=2m=2.
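Algorithm 2 amounts to a nearest-neighbor test between the reference points and the projected population; a vectorized sketch (our naming, using NumPy broadcasting in place of the double loops of the pseudo-code):

```python
import numpy as np

def identify_promising(refs, proj):
    """Boolean mask of promising reference points, sketched from Algorithm 2.

    refs : (N, m) original reference points; proj : (K, m) projections of
    the normalized population onto the same set. A reference point is
    promising if its nearest projection lies within the minimum pairwise
    distance of the original reference points.
    """
    d_ll = np.linalg.norm(refs[:, None, :] - refs[None, :, :], axis=2)
    np.fill_diagonal(d_ll, np.inf)   # ignore self-distances, as in line 2
    d_min_ll = d_ll.min()            # the criterion d^{L->L}_min
    d_lq = np.linalg.norm(refs[:, None, :] - proj[None, :, :], axis=2)
    return d_lq.min(axis=1) <= d_min_ll
```

For five equally spaced reference points on a line with the population projected only onto the middle one, the two endpoints are flagged as unpromising, mirroring the role of a1a^{1} and a7a^{7} in Fig. 9.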

If the cardinality of LproL^{{\rm pro}} obtained by Algorithm 2 equals NN, then all reference points are deemed promising. Otherwise, some new reference points need to be generated from the elements of LproL^{{\rm pro}}; that is, Algorithm 3 needs to be executed.

Input: LproL^{{\rm pro}}: the promising reference points, IproI^{{\rm pro}}: the associated index set, XproX^{{\rm pro}}: the associated population.
Output: The adjusted population XX, the updated reference points LL, the updated neighborhood index set BB.
1 foreach iNIproi\in\langle N\rangle\setminus I^{{\rm pro}} do
2       XXpro{xi}X\leftarrow X^{{\rm pro}}\cup\{x^{i}\};
3      
4 end for
5Find any two adjacent reference points in LproL^{{\rm pro}} to form a point pair, and then construct the point pairs set Λ\Lambda whose elements do not repeat each other;
6 while |Lpro|N|L^{{\rm pro}}|\leq N do
7       if N|Lpro||Λ|N-|L^{{\rm pro}}|\leq|\Lambda| then
8             Randomly select N|Lpro|N-|L^{{\rm pro}}| point pairs to form set Λ¯\bar{\Lambda} from Λ\Lambda;
9             for k1k\leftarrow 1 to |Λ¯||\bar{\Lambda}| do
10                   anew=a1Λ(k)+a2Λ(k)2a^{{\rm new}}=\frac{a_{1}^{\Lambda(k)}+a_{2}^{\Lambda(k)}}{2};
11                   LproLpro{anew}L^{{\rm pro}}\leftarrow L^{{\rm pro}}\cup\{a^{{\rm new}}\};
12                  
13             end for
14            
15      else
16             for k1k\leftarrow 1 to |Λ||\Lambda| do
17                   anew=a1Λ(k)+a2Λ(k)2a^{{\rm new}}=\frac{a_{1}^{\Lambda(k)}+a_{2}^{\Lambda(k)}}{2};
18                   LproLpro{anew}L^{{\rm pro}}\leftarrow L^{{\rm pro}}\cup\{a^{{\rm new}}\};
19                  
20             end for
21            Update the point pairs set Λ\Lambda using LproL^{{\rm pro}} and line 4;
22            
23       end if
24      
25 end while
26LLproL\leftarrow L^{{\rm pro}};
27 Update BB using the updated reference points LL;
Algorithm 3 Adding-new-reference-points

Therefore, for (ii), we first need to find any two adjacent reference points in LproL^{{\rm pro}} to form a point pair; the set of all point pairs, whose elements do not repeat, is denoted by Λ\Lambda (line 4 of Algorithm 3). Next, we consider the following two cases:

Case 1. If N|Lpro|N-|L^{{\rm pro}}| is not larger than |Λ||\Lambda|, then we randomly choose N|Lpro|N-|L^{{\rm pro}}| point pairs from Λ\Lambda and use the midpoints of the selected pairs as new reference points (lines 7–11 of Algorithm 3). For example, in the left part of Fig. 9, we obtain Lpro={a2,a3,a4,a5,a6}L^{{\rm pro}}=\{a^{2},a^{3},a^{4},a^{5},a^{6}\} and the point pair set Λ={(a2,a3),(a3,a4),(a4,a5),(a5,a6)}\Lambda=\{(a^{2},a^{3}),(a^{3},a^{4}),(a^{4},a^{5}),(a^{5},a^{6})\}. Obviously, N|Lpro|<|Λ|N-|L^{{\rm pro}}|<|\Lambda|, so two point pairs are randomly selected from Λ\Lambda (suppose they are (a2,a3)(a^{2},a^{3}) and (a5,a6)(a^{5},a^{6})). Then Λ¯={(a2,a3),(a5,a6)}\bar{\Lambda}=\{(a^{2},a^{3}),(a^{5},a^{6})\}. According to lines 9–10 of Algorithm 3, the new reference points anew1=a2+a32a^{{\rm new1}}=\frac{a^{2}+a^{3}}{2} and anew2=a5+a62a^{{\rm new2}}=\frac{a^{5}+a^{6}}{2} are added into LproL^{{\rm pro}} (see the right part of Fig. 9, where anew1a^{{\rm new1}} and anew2a^{{\rm new2}} are marked by the green solid points).

Case 2. If N|Lpro|N-|L^{{\rm pro}}| is strictly greater than |Λ||\Lambda|, then all elements in Λ\Lambda are selected to generate new reference points, and these newly generated points are added into LproL^{{\rm pro}}. Next, according to the updated set LproL^{{\rm pro}} and line 4 of Algorithm 3, we reconstruct the point pair set Λ\Lambda and repeat lines 13–17 until |Lpro|=N|L^{{\rm pro}}|=N. For example, similar to the above analysis, we obtain Lpro={a1,a2,a6,a7}L^{{\rm pro}}=\{a^{1},a^{2},a^{6},a^{7}\} and the point pair set Λ={(a1,a2),(a6,a7)}\Lambda=\{(a^{1},a^{2}),(a^{6},a^{7})\} in the left part of Fig. 10.

Refer to caption
Figure 10: The identification of promising reference points and the addition of new reference point for m=2m=2.

Clearly, N|Lpro|>|Λ|N-|L^{{\rm pro}}|>|\Lambda|. In the first cycle, all elements in Λ\Lambda are selected to generate new reference points, i.e., anew1=a1+a22a^{{\rm new1}}=\frac{a^{1}+a^{2}}{2} and anew2=a6+a72a^{{\rm new2}}=\frac{a^{6}+a^{7}}{2}. After the first cycle, the reference point set becomes Lpro={a1,a2,a6,a7,anew1,anew2}L^{{\rm pro}}=\{a^{1},a^{2},a^{6},a^{7},a^{{\rm new1}},a^{{\rm new2}}\} (see the middle part of Fig. 10) and the point pair set becomes Λ={(a1,anew1),(anew1,a2),(a6,anew2),(anew2,a7)}\Lambda=\{(a^{1},a^{{\rm new1}}),(a^{{\rm new1}},a^{2}),(a^{6},a^{{\rm new2}}),(a^{{\rm new2}},a^{7})\}. Obviously, |Lpro|<N|L^{{\rm pro}}|<N and N|Lpro|<|Λ|N-|L^{{\rm pro}}|<|\Lambda|. In the second cycle, only one point pair is randomly selected from Λ\Lambda; assume it is (anew2,a7)(a^{{\rm new2}},a^{7}). Therefore, a new reference point anew3=anew2+a72a^{{\rm new3}}=\frac{a^{{\rm new2}}+a^{7}}{2} is added into LproL^{{\rm pro}} (see the right part of Fig. 10).
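Both cases of Algorithm 3 reduce to repeated midpoint insertion between adjacent promising reference points. The following sketch restores the set to N points; for determinism it takes the leading pairs where the paper selects pairs at random, and the names are ours:

```python
import numpy as np

def add_reference_points(promising, N):
    """Grow an ordered list of promising reference points back to N points.

    `promising` must be ordered so that consecutive entries form the
    adjacent point pairs of Algorithm 3. Each sweep inserts at most one
    midpoint per pair (Case 2 repeats sweeps until N points exist)."""
    pts = [np.asarray(p, dtype=float) for p in promising]
    while len(pts) < N:
        need = N - len(pts)
        take = min(need, len(pts) - 1)      # midpoints to insert this sweep
        out, inserted = [], 0
        for i in range(len(pts)):
            out.append(pts[i])
            if inserted < take and i + 1 < len(pts):
                out.append((pts[i] + pts[i + 1]) / 2)  # midpoint of the pair
                inserted += 1
        pts = out
    return pts
```

For three collinear promising points and N = 5, the two inserted points are exactly the midpoints of the two adjacent pairs, so the five points remain equally spaced.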

4 Experimental studies

This section is devoted to experimental studies verifying the performance of the proposed algorithm. We compare it with four existing state-of-the-art algorithms: MOEA/D-DE [36], NSGA-III [11], RVEA* [8] and MOEA/D-PaS [57]. MOEA/D-DE employs the TCH scalarization method and is a representative steady-state decomposition-based MOEA. NSGA-III is an extension of NSGA-II that maintains population diversity via decomposition. The main characteristic of RVEA* is its adaptive weight vector strategy. MOEA/D-PaS is equipped with the LpL_{p} scalarization method and uses a Pareto adaptive scalarizing approximation to obtain the optimal pp value. Before presenting the results, we first give the experimental settings in the next subsection.

4.1 Experimental settings

  1. (1)

    Benchmark problems. In this paper, a set of benchmark test problems with a variety of representative POFs is used. We also construct two modified test problems based on ZDT1 [62] and IDTLZ2 [27]. The first test instance, denoted F1, has the following form:

    min(g(x)(111+e10x1)x1)\displaystyle\text{min}\quad\left(\begin{array}[]{cc}g(x)\left(1-\frac{1}{1+e^{-10x_{1}}}\right)\\ x_{1}\end{array}\right)
    s.t.x[1,1]×[0,1]n1,\displaystyle\text{s.t.}\quad\;\;x\in[-1,1]\times[0,1]^{n-1},

    where g(x)=1+9i=2nxin1g(x)=1+\frac{9\sum_{i=2}^{n}x_{i}}{n-1}. The mathematical description of the second test problem is as follows:

    min(((1+g(x))(1+g(x))cos(0.5πx1)cos(0.5πx2))1.8((1+g(x))(1+g(x))cos(0.5πx1)sin(0.5πx2))1.8((1+g(x))(1+g(x))sin(0.5πx2))1.8)\displaystyle\text{min}\quad\left(\begin{array}[]{cc}((1+g(x))-(1+g(x))\cos(0.5\pi x_{1})\cos(0.5\pi x_{2}))^{1.8}\\ ((1+g(x))-(1+g(x))\cos(0.5\pi x_{1})\sin(0.5\pi x_{2}))^{1.8}\\ ((1+g(x))-(1+g(x))\sin(0.5\pi x_{2}))^{1.8}\\ \end{array}\right)
    s.t.x[0,1]n,\displaystyle\text{s.t.}\quad\;\;x\in[0,1]^{n},

    where g(x)=i=3n(xi0.5)2g(x)=\sum_{i=3}^{n}(x_{i}-0.5)^{2}. Note that the parameter in the variant of ZDT1 (termed mZDT1) introduced in [45] is set to M=0.5M=0.5 and the variable space of DTLZ1 is set to [0.0001,0.9999]n[0.0001,0.9999]^{n}. Other configurations of these problems are as described in the corresponding literature. Table 3 provides a brief summary of these problems.

    Table 3: Test problems. mm and nn denote the numbers of objectives and decision variables, respectively.

    mm     Problem        nn    The POF shape
    m=2    ZDT1 [62]      30    Simplex-like, Convex
           mZDT1 [45]     30    Highly nonlinear, Concave
           GLT3 [23]      10    Highly nonlinear (piecewise linear), Convex
           SCH1 [53]      1     Highly nonlinear, Convex
           F1             30    Highly nonlinear, Nonconvex-nonconcave
           ZDT3 [62]      30    Disconnected
           GLT1 [23]      10    Disconnected
    m=3    DTLZ1 [13]     7     Simplex-like, Linear
           DTLZ2 [13]     12    Simplex-like, Concave
           DTLZ5 [13]     12    Degenerate, Concave
           VNT2 [54]      2     Degenerate, Convex
           DTLZ7 [13]     15    Disconnected
           IDTLZ1 [27]    7     Inverted, Linear
           IDTLZ2 [27]    12    Inverted, Concave
           F2             12    Inverted, Highly concave
  2. (2)

    Parameter settings. In MOEA/D-AMR, the number of divisions on each axis is l=50l=50 for m=2m=2 and l=10l=10 for m=3m=3. Therefore, we set the population size to N=101N=101 and 331 for biobjective and triobjective MOPs, respectively. To make all algorithms comparable, the population size of the other four algorithms is the same as that of MOEA/D-AMR and the initial weights are kept the same for the four compared algorithms. The maximal number of generations of all algorithms is set to Gmax=500G_{\max}=500 on all test problems. The other parameter settings of MOEA/D-AMR are listed as follows:

    • (a) the neighborhood size: T = 20;

    • (b) the probability of selecting parents from the neighborhood: δ = 0.9;

    • (c) the replacement size: n^rep = 2;

    • (d) the control parameters in the DE operator: SF = 0.5 and CR = 1;

    • (e) the parameters in the PM operator: p_m = 1/n and η = 20;

    • (f) the evolutionary rate: ε = 0.8.

    The parameters T, δ, n^rep, SF, CR, p_m and η in MOEA/D-DE and MOEA/D-PaS share the same settings as in MOEA/D-AMR. The rate of change of the penalty and the frequency of weight vector adaptation in RVEA* are set to α = 2 and f_r = 0.1, respectively.

    (3) Performance metrics. Various performance metrics for measuring the quality of POF approximations are summarized in [4]. In this paper, two performance metrics widely used in MOEAs, the inverted generational distance (IGD) [64] and the hypervolume (HV) [63], are utilized to measure the obtained solution sets in terms of diversity. In the calculation of IGD, we select roughly 1000 points evenly scattered on the true POF for all biobjective test problems and 5000 for the test problems with three objectives. All objective function values are normalized by the ideal and nadir points of the POF before the HV metric is calculated; the normalized HV value of the solution set is then computed with respect to the reference point (1.1, 1.1, …, 1.1). Every algorithm is run 30 times independently for each test problem. The mean and standard deviation of the IGD and HV values are calculated and listed in tables, where the best mean for each problem is highlighted. Furthermore, the Wilcoxon rank-sum test with a significance level of 0.05 is adopted to perform statistical analysis on the experimental results, where the symbols '+', '≈' and '−' denote that the result of the other algorithm is significantly better than, statistically similar to, or significantly worse than that of MOEA/D-AMR, respectively.
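    As a concrete reference, the IGD value used above is the mean distance from each sampled point of the true POF to its nearest obtained solution. The following sketch implements this definition (the function name `igd` is our own, not from the paper):

```python
import numpy as np

def igd(reference_front, solutions):
    """Inverted generational distance: average, over the sampled true POF,
    of the Euclidean distance to the closest obtained solution."""
    ref = np.asarray(reference_front, dtype=float)   # shape (R, m)
    sol = np.asarray(solutions, dtype=float)         # shape (N, m)
    # pairwise distances between reference points and obtained solutions
    d = np.linalg.norm(ref[:, None, :] - sol[None, :, :], axis=2)
    return d.min(axis=1).mean()
```

    Lower IGD values indicate better convergence and diversity; for HV, the objective vectors would first be normalized by the ideal and nadir points and then evaluated against the reference point (1.1, …, 1.1).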

4.2 Experimental results and analysis

The mean and standard deviation values of the performance indicators obtained by the five algorithms on these test instances are summarized in Tables 4 and 5.

Table 4: Statistical results of IGD values (mean and standard deviation) found by different algorithms on test problems. The best mean for each problem is marked with *.

Property      Problem  MOEA/D-DE            NSGA-III              RVEA*                MOEA/D-PaS           MOEA/D-AMR
Simplex-like  ZDT1     5.787e-3(3.62e-4) −  4.845e-3(9.86e-6) −   5.750e-3(4.23e-4) −  5.257e-3(2.52e-4) −  *4.424e-3(3.40e-4)
              DTLZ1    1.469e-2(2.86e-5) −  *1.025e-2(4.99e-6) +  1.076e-2(7.35e-5) ≈  1.286e-2(1.33e-4) ≈  1.163e-2(1.18e-4)
              DTLZ2    3.665e-2(1.29e-4) −  *2.722e-2(6.02e-6) +  2.792e-2(1.03e-4) ≈  3.227e-2(4.21e-4) −  3.070e-2(1.85e-4)
[The rows for the highly nonlinear problems were lost in conversion; only the best (MOEA/D-AMR) means were recovered: *4.050e-3(2.06e-6), *4.776e-3(1.90e-5), *1.698e-2(1.35e-3), *1.446e-2(8.60e-3).]
Degenerate    DTLZ5    4.841e-3(2.06e-5) −  3.547e-3(4.74e-4) −   2.157e-3(7.27e-5) ≈  6.955e-3(3.17e-4) −  *1.297e-3(1.82e-5)
              VNT2     2.262e-2(1.82e-4) −  2.150e-2(8.83e-3) −   1.168e-2(1.21e-3) −  3.379e-2(4.52e-4) −  *8.988e-3(1.44e-3)
[The rows for the inverted and disconnected problems were lost in conversion; only the best (MOEA/D-AMR) means were recovered: *1.169e-2(1.15e-4), *2.902e-2(1.10e-4), *3.422e-2(1.39e-4), *5.389e-3(2.79e-4), *2.134e-3(9.85e-5), *3.505e-2(1.03e-3).]
+/≈/−                  0/1/14               2/2/11                0/5/10               0/1/14
Table 5: Statistical results of HV values (mean and standard deviation) found by different algorithms on test problems. The best mean for each problem is marked with *; entries lost in conversion are shown as –.

Property      Problem  MOEA/D-DE             MOEA/D-DE (cont.) NSGA-III              RVEA*                MOEA/D-PaS            MOEA/D-AMR
Simplex-like  ZDT1     *9.303e-1(4.71e-6) ≈  9.297e-1(2.79e-4) ≈   9.294e-1(2.79e-4) ≈  –                     9.282e-1(8.06e-4)
              DTLZ1    *1.190e-0(1.28e-4) +  1.188e-0(3.33e-4) ≈   1.181e-0(1.19e-3) −  –                     1.185e-0(7.60e-4)
              DTLZ2    *9.135e-1(1.67e-5) +  9.128e-1(3.87e-4) +   9.057e-1(5.13e-4) −  –                     9.097e-1(3.03e-4)
[The rows for the highly nonlinear problems were lost in conversion; only the best (MOEA/D-AMR) means were recovered: *5.185e-1(2.67e-5), *1.167e-0(8.23e-5), *1.069e-0(5.52e-5), *7.944e-1(9.06e-5).]
Degenerate    DTLZ5    4.005e-1(1.51e-5) −   4.007e-1(3.82e-4) −   4.023e-1(4.90e-5) ≈  3.981e-1(2.05e-4) −   *4.027e-1(3.59e-5)
              VNT2     5.301e-1(2.65e-5) −   5.328e-1(2.22e-3) −   5.313e-1(3.16e-4) −  5.257e-1(1.57e-4) −   *5.329e-1(6.09e-4)
[The rows for the inverted problems were lost in conversion; only the best (MOEA/D-AMR) means were recovered: *4.403e-1(8.18e-4), *8.354e-1(2.89e-4), *1.189e-0(1.64e-4).]
Disconnected  ZDT3     8.101e-1(1.02e-3) −   *8.119e-1(1.60e-2) +  8.076e-1(1.44e-3) −  8.093e-1(5.24e-4) −   8.095e-1(3.13e-4)
              GLT1     6.909e-1(1.04e-2) +   4.492e-1(4.81e-2) −   4.392e-1(5.01e-2) −  *6.910e-1(1.52e-4) +  6.892e-1(1.04e-3)
              DTLZ7    4.692e-1(2.33e-2) −   5.010e-1(4.26e-4) ≈   4.945e-1(2.23e-2) ≈  3.933e-1(1.59e-1) −   *5.016e-1(3.17e-4)
+/≈/−                  1/1/13                3/3/9                 1/7/7                1/2/12

(Columns: MOEA/D-DE, NSGA-III, RVEA*, MOEA/D-PaS, MOEA/D-AMR.)

In order to intuitively show the effectiveness of the proposed algorithm in diversity, the final solution sets with the median IGD values obtained by MOEA/D-DE, NSGA-III, RVEA*, MOEA/D-PaS and MOEA/D-AMR in 30 runs on these test problems are plotted. Since the POFs of these test problems have different properties, as stated in Table 3, we discuss these results in the following five different groups:

1) Simplex-like POFs: As can be observed from Fig. 11, MOEA/D-AMR obtains more evenly distributed solutions than the other four compared algorithms on ZDT1. It is apparent that the solutions obtained by the other compared algorithms are sparse in the upper-left part and crowded in the middle part. However, the HV results in Table 5 show that MOEA/D-AMR is slightly inferior to NSGA-III, RVEA* and MOEA/D-PaS. As mentioned in [25], the optimal distribution of solutions for hypervolume maximization may not be even. Note that Algorithm 3 is not triggered on ZDT1 in our experiments since all the reference points are promising, i.e., |L^pro| = N in line 23 of Algorithm 1.

Refer to caption
Figure 11: The final solution set with the median IGD value among 30 runs obtained by five algorithms on ZDT1.

For DTLZ1 and DTLZ2, Figs. 12 and 13 suggest that NSGA-III performs better than the other algorithms, which is confirmed by the statistical results in Tables 4 and 5. As described in [11], NSGA-III works very well on these two test problems. It is worth mentioning that MOEA/D-AMR is slightly worse than NSGA-III and RVEA* on DTLZ1 and DTLZ2. A reasonable explanation is that the additional reference points are created at the midpoints of pairs of adjacent promising reference points, so many of the new reference points are embedded in the middle part of the original POF.
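    The midpoint-based insertion described above can be pictured with a simplified one-dimensional sketch (illustrative only; the actual adaptation in Algorithm 3 operates on the full set of promising reference points with additional bookkeeping, and `insert_midpoints` is our own name):

```python
import numpy as np

def insert_midpoints(promising, n_needed):
    """Create up to n_needed new reference points, each the midpoint of a
    pair of adjacent promising reference points (illustrative sketch)."""
    pts = np.asarray(promising, dtype=float)
    mids = 0.5 * (pts[:-1] + pts[1:])  # midpoints of adjacent pairs
    return mids[:n_needed]
```

    Since every midpoint lies strictly between two promising points, the newly created reference points cluster in the interior, which is consistent with the behavior observed on DTLZ1 and DTLZ2.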

Refer to caption
Figure 12: The final solution set with the median IGD value among 30 runs obtained by five algorithms on DTLZ1.
Refer to caption
Figure 13: The final solution set with the median IGD value among 30 runs obtained by five algorithms on DTLZ2.

If the reference point update strategy is not used in the proposed algorithm, as shown in Fig. 14, the final solution set is distributed very uniformly in the inner part of the POFs of DTLZ1 and DTLZ2, but several solutions converge to other subproblems or concentrate on the boundary of the POFs.

Refer to caption
(a) DTLZ1
Refer to caption
(b) DTLZ2
Figure 14: The final solution set obtained by MOEA/D-AMR on DTLZ1 and DTLZ2 without using the update strategy of reference points.

2) Highly nonlinear POFs: The test problems with highly nonlinear POFs in Table 3 can be classified into three groups: the problem with a concave POF (i.e., mZDT1), the problems with a convex POF (i.e., SCH1 and GLT3) and the problem with a nonconvex-nonconcave POF (i.e., F1). For the first group, Fig. 15 shows that all the algorithms work well, with MOEA/D-AMR performing slightly better than the other four compared algorithms. As for the second group, it is clear from Figs. 16 and 17 that only the proposed algorithm achieves excellent diversity, while the other four algorithms struggle to cover the sharp peak and long tail of the POF. For the third group, it follows from Fig. 18 that only MOEA/D-AMR achieves good diversity. Overall, highly nonlinear convex POFs still pose a challenge to existing methods, even those equipped with adaptive strategies for weight vectors and scalarizing functions.

Refer to caption
Figure 15: The final solution set with the median IGD value among 30 runs obtained by five algorithms on mZDT1.
Refer to caption
Figure 16: The final solution set with the median IGD value among 30 runs obtained by five algorithms on SCH1.
Refer to caption
Figure 17: The final solution set with the median IGD value among 30 runs obtained by five algorithms on GLT3.
Refer to caption
Figure 18: The final solution set with the median IGD value among 30 runs obtained by five algorithms on F1.

3) Degenerate POFs: It follows from Tables 4 and 5 and Figs. 19 and 20 that the proposed algorithm shows a significant advantage over its competitors on this group of problems. RVEA* performs better than the remaining three algorithms since it has a weight vector regeneration strategy. For this kind of problem, MOEA/D-PaS, which is equipped with an adaptation strategy for the scalarizing function, seems less effective.

Refer to caption
Figure 19: The final solution set with the median IGD value among 30 runs obtained by five algorithms on DTLZ5.
Refer to caption
Figure 20: The final solution set with the median IGD value among 30 runs obtained by five algorithms on VNT2.

4) Inverted POFs: For MOEA/D-AMR, Figs. 21–23 illustrate that this group of problems has little effect on the algorithmic performance: the obtained solution set has good coverage and diversity over the entire POF. For IDTLZ1 and IDTLZ2, a small number of solutions in the inner part of the POFs obtained by MOEA/D-DE and MOEA/D-PaS are fairly uniform, while most of the solutions are located on the boundary of the POFs. By contrast, the solutions obtained by NSGA-III and RVEA* achieve good coverage but poor diversity. On the constructed problem F2, MOEA/D-DE, NSGA-III, RVEA* and MOEA/D-PaS are not satisfactory in terms of coverage and diversity preservation.

Refer to caption
Figure 21: The final solution set with the median IGD value among 30 runs obtained by five algorithms on IDTLZ1.
Refer to caption
Figure 22: The final solution set with the median IGD value among 30 runs obtained by five algorithms on IDTLZ2.
Refer to caption
Figure 23: The final solution set with the median IGD value among 30 runs obtained by five algorithms on F2.

5) Disconnected POFs: Figs. 24–26 show the final solution sets with the median IGD obtained by the five algorithms on ZDT3, GLT1 and DTLZ7, respectively. Together with Tables 4 and 5, we can see that only MOEA/D-AMR maintains a good distribution of the final solution set. For ZDT3 and GLT1, some solutions obtained by MOEA/D-DE and MOEA/D-PaS concentrate near the discontinuities of the POFs. An interesting observation is that, when looking at the HV results in Table 5, NSGA-III and MOEA/D-PaS perform the best on ZDT3 and GLT1, respectively. The most plausible explanation is again that the optimal distribution of solutions for hypervolume maximization may not be even [25].

Refer to caption
Figure 24: The final solution set with the median IGD value among 30 runs obtained by five algorithms on ZDT3.
Refer to caption
Figure 25: The final solution set with the median IGD value among 30 runs obtained by five algorithms on GLT1.
Refer to caption
Figure 26: The final solution set with the median IGD value among 30 runs obtained by five algorithms on DTLZ7.

5 Applications on real-world MOPs

As described in [49], many real-world MOPs have irregular POFs. To further assess the performance of the proposed algorithm, we consider two multiobjective engineering design problems from [49] and compare MOEA/D-AMR with the competitive algorithms presented in Section 4. Herein, we use the same parameter settings and performance metrics given in Subsection 4.1. Since the true POFs of the two practical problems are unknown, we make use of the reference POFs provided on the supplementary website (https://github.com/ryojitanabe/reproblems) mentioned in [49] to assess the performance of these algorithms.

5.1 The hatch cover (HC) design problem

The HC design problem was studied in detail by [2]. The cover is designed by minimizing its weight subject to four constraints. [49] indicated that the constraint functions of the HC design problem can be simultaneously minimized; thus, the original HC design problem is remodeled as a biobjective optimization problem with two variables in [49]. The mathematical formulation is presented in (7), where the first objective is to minimize the weight of the hatch cover and the second objective is the sum of the four constraint violations. The decision variables x_1 and x_2 denote the flange thickness (cm) and the beam height of the hatch cover (cm), respectively.

\min\quad\left(x_{1}+120x_{2},\;\sum_{i=1}^{4}\max\{-g_{i}(x),0\}\right)^{T}\qquad(7)
\text{s.t.}\quad x_{1}\in[0.5,4],\;x_{2}\in[0.5,50],

where g_{1}(x)=1-\frac{\varrho_{b}}{\varrho_{b,\max}}, g_{2}(x)=1-\frac{\tau}{\tau_{\max}}, g_{3}(x)=1-\frac{\vartheta}{\vartheta_{\max}} and g_{4}(x)=1-\frac{\varrho_{b}}{\varrho_{k}}. The parameters and their descriptions are shown in Table 6.

Table 6: The parameter settings in (7).
Parameter   Description                             Value                  Unit
ϱ_b         Calculated bending stress               4500/(x_1 x_2)         kg/cm^2
ϱ_{b,max}   Maximum allowable bending stress        700                    kg/cm^2
τ           Calculated shearing stress              1800/x_2               kg/cm^2
τ_max       Maximum allowable shearing stress       450                    kg/cm^2
ϑ           Deflection at the middle of the cover   562000/(E x_1 x_2^2)   cm
ϑ_max       Maximum allowable deflection            1.5                    cm
ϱ_k         Buckling stress of the flanges          E x_1^2/100            kg/cm^2
E           Young's modulus                         700000                 kg/cm^2
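The formulation (7) and the parameter values in Table 6 can be combined into a short evaluation routine. The sketch below is our own reading of the formulas (the name `hc_objectives` is not from the paper):

```python
def hc_objectives(x1, x2, E=700000.0):
    """Evaluate the two objectives of the biobjective hatch-cover problem (7).

    x1: flange thickness (cm), x2: beam height (cm); parameter values
    follow Table 6 (E is Young's modulus in kg/cm^2).
    """
    rho_b = 4500.0 / (x1 * x2)            # calculated bending stress
    tau = 1800.0 / x2                     # calculated shearing stress
    theta = 562000.0 / (E * x1 * x2**2)   # deflection at mid-cover
    rho_k = E * x1**2 / 100.0             # buckling stress of the flanges
    g = [1 - rho_b / 700.0, 1 - tau / 450.0, 1 - theta / 1.5, 1 - rho_b / rho_k]
    weight = x1 + 120.0 * x2              # first objective
    violation = sum(max(-gi, 0.0) for gi in g)  # second objective
    return weight, violation
```

For instance, at the corner point x = (4, 50) all four constraints are satisfied, so the second objective vanishes and the first equals x_1 + 120·x_2 = 6004.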

Fig. 27 shows the final solution sets obtained by MOEA/D-DE, NSGA-III, RVEA*, MOEA/D-PaS and MOEA/D-AMR on problem (7).

Refer to caption
Figure 27: The final solution set with the median IGD value among 30 runs obtained by five algorithms on HC.

Obviously, the solution set obtained by MOEA/D-AMR has a much better distribution than those of the others, which provides more reasonable choices for decision makers. A large portion of the solutions obtained by MOEA/D-DE, NSGA-III, RVEA* and MOEA/D-PaS concentrate on the middle part of the POF. As a result, a much smaller IGD value and a much larger HV value are obtained by the proposed algorithm (see Table 7). Hence, our algorithm significantly outperforms the other compared algorithms in terms of diversity and coverage.

Table 7: Statistical results of IGD and HV values (mean and standard deviation) found by different algorithms on the HC design problem. The best mean is marked with *.
Problem  Metric  MOEA/D-DE           NSGA-III            RVEA*               MOEA/D-PaS          MOEA/D-AMR
HC       IGD     8.8194e-0(2.98e-2)  8.7709e-0(1.84e-2)  9.9514e-0(1.83e-0)  7.5090e-0(4.92e-1)  *1.1498e-0(1.59e-2)
         HV      1.0536e-0(1.59e-5)  1.0537e-0(5.55e-6)  1.0536e-0(4.86e-4)  1.0541e-0(2.06e-4)  *1.0554e-0(5.49e-6)

5.2 The rocket injector (RI) design problem

Improving performance and life are the two primary objectives of injector design. The performance of the injector is expressed by the axial length of the thrust chamber, and the life of the injector is related to the thermal field in the thrust chamber. A visual representation of the objectives is shown in [52, 22]. High temperatures produce high thermal stress on the injector and thrust chamber, which reduces the service life of the components but improves the performance of the injector. Consequently, the dual goal of maximizing performance and life was cast as a four-objective design problem [52], i.e.,

f1f_{1} is to minimize the maximum temperature of the injector face;

f2f_{2} is to minimize the distance from the inlet;

f3f_{3} is to minimize the maximum temperature on the post tip of the injector;

f4f_{4} is to minimize the wall temperature at a distance three inches from the injector face.

Note that the objectives f_3 and f_4 were reported in [22] to be strongly correlated in the design space. Hence, f_4 was dropped from the objective list and the optimization problem was formulated with the remaining three objectives. The reformulated problem is as follows:

\min\quad(f_{1}(x),f_{2}(x),f_{3}(x))^{T}\qquad(8)
\text{s.t.}\quad x_{i}\in[0,1],\;i\in\langle 4\rangle,

where

f_{1}(x) = 0.692+0.477x_{1}-0.687x_{2}-0.08x_{3}-0.065x_{4}-0.167x_{1}^{2}-0.0129x_{1}x_{2}
           +0.0796x_{2}^{2}-0.0634x_{1}x_{3}-0.0257x_{2}x_{3}+0.0877x_{3}^{2}-0.0521x_{1}x_{4}
           +0.00156x_{2}x_{4}+0.00198x_{3}x_{4}+0.0184x_{4}^{2},
f_{2}(x) = 0.153-0.322x_{1}+0.396x_{2}+0.424x_{3}+0.0226x_{4}+0.175x_{1}^{2}+0.0185x_{1}x_{2}
           -0.0701x_{2}^{2}-0.251x_{1}x_{3}+0.179x_{2}x_{3}+0.015x_{3}^{2}+0.0134x_{1}x_{4}
           +0.0296x_{2}x_{4}+0.0752x_{3}x_{4}+0.0192x_{4}^{2},
f_{3}(x) = 0.37-0.205x_{1}+0.0307x_{2}+0.108x_{3}+1.019x_{4}-0.135x_{1}^{2}+0.0141x_{1}x_{2}
           +0.0998x_{2}^{2}+0.208x_{1}x_{3}-0.0301x_{2}x_{3}-0.226x_{3}^{2}+0.353x_{1}x_{4}
           -0.0497x_{3}x_{4}-0.423x_{4}^{2}+0.202x_{1}^{2}x_{2}-0.281x_{1}^{2}x_{3}-0.342x_{1}x_{2}^{2}
           -0.245x_{2}^{2}x_{3}+0.281x_{2}x_{3}^{2}-0.184x_{1}x_{4}^{2}-0.281x_{1}x_{2}x_{3}.

This problem has four design variables: x_1, x_2, x_3 and x_4 describe the hydrogen flow angle, the hydrogen area, the oxygen area and the oxidizer post tip thickness, respectively.
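The response-surface objectives above can be transcribed directly into an evaluation routine. The sketch below is our own transcription (the name `ri_objectives` is not from the paper):

```python
def ri_objectives(x1, x2, x3, x4):
    """Evaluate the three response-surface objectives of problem (8),
    with each x_i in [0, 1]."""
    f1 = (0.692 + 0.477*x1 - 0.687*x2 - 0.08*x3 - 0.065*x4
          - 0.167*x1**2 - 0.0129*x1*x2 + 0.0796*x2**2 - 0.0634*x1*x3
          - 0.0257*x2*x3 + 0.0877*x3**2 - 0.0521*x1*x4 + 0.00156*x2*x4
          + 0.00198*x3*x4 + 0.0184*x4**2)
    f2 = (0.153 - 0.322*x1 + 0.396*x2 + 0.424*x3 + 0.0226*x4
          + 0.175*x1**2 + 0.0185*x1*x2 - 0.0701*x2**2 - 0.251*x1*x3
          + 0.179*x2*x3 + 0.015*x3**2 + 0.0134*x1*x4 + 0.0296*x2*x4
          + 0.0752*x3*x4 + 0.0192*x4**2)
    f3 = (0.37 - 0.205*x1 + 0.0307*x2 + 0.108*x3 + 1.019*x4
          - 0.135*x1**2 + 0.0141*x1*x2 + 0.0998*x2**2 + 0.208*x1*x3
          - 0.0301*x2*x3 - 0.226*x3**2 + 0.353*x1*x4 - 0.0497*x3*x4
          - 0.423*x4**2 + 0.202*x1**2*x2 - 0.281*x1**2*x3 - 0.342*x1*x2**2
          - 0.245*x2**2*x3 + 0.281*x2*x3**2 - 0.184*x1*x4**2 - 0.281*x1*x2*x3)
    return f1, f2, f3
```

At the origin of the design space all non-constant terms vanish, so the routine returns the constant terms (0.692, 0.153, 0.37).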

The statistical results (mean and standard deviation) of the IGD and HV metrics obtained by the five algorithms are shown in Table 8. It can be concluded that the proposed algorithm is clearly better than the other compared algorithms. Furthermore, we plot the final solution set with the median IGD value among 30 runs for each algorithm (see Fig. 28). As can be seen, the solutions obtained by MOEA/D-AMR are distributed more uniformly than those of the other competitors.

Table 8: Statistical results of IGD and HV values (mean and standard deviation) found by different algorithms on the RI design problem. The best mean is marked with *.
Problem  Metric  MOEA/D-DE           NSGA-III            RVEA*               MOEA/D-PaS          MOEA/D-AMR
RI       IGD     4.3249e-2(3.17e-4)  3.6629e-2(1.08e-3)  3.6760e-2(4.94e-3)  4.7835e-2(9.66e-4)  *3.2578e-2(1.04e-3)
         HV      9.5235e-1(6.66e-5)  9.4930e-1(1.23e-3)  9.5079e-1(1.61e-3)  9.4939e-1(6.61e-4)  *9.5588e-1(2.33e-3)
Refer to caption
Figure 28: The final solution set with the median IGD value among 30 runs obtained by five algorithms on RI.

6 Conclusion and Future Work

In this paper, we have proposed a new algorithm named MOEA/D-AMR for enhancing the diversity of the population when tackling MOPs with regular and irregular POFs. Specifically, the well-known Pascoletti-Serafini scalarization approach has been introduced into the framework of MOEA/D-DE. More importantly, the Pascoletti-Serafini scalarization method has been equivalently transformed into a minimax formulation when the direction is restricted to positive components. Several scalarization methods used in the framework of MOEA/D, such as TCH, m-TCH and p-TCH, are special cases of the converted form under different selections of the reference point and direction. Additionally, with the help of the equidistant partition and projection techniques, a new way of generating multiple reference points with a relatively uniform distribution has been proposed. To avoid the occurrence of many unpromising reference points, i.e., reference points whose associated subproblem solutions converge to other subproblems, a strategy for adjusting the reference points in the later evolutionary stage has been suggested. To assess the performance of the proposed algorithm, experimental comparisons have been conducted between MOEA/D-AMR and four state-of-the-art MOEAs, i.e., MOEA/D-DE, NSGA-III, RVEA* and MOEA/D-PaS, on several widely used test problems with different POFs and on two real-world MOPs in engineering optimization. The experimental results have demonstrated that MOEA/D-AMR is capable of obtaining superior or comparable performance relative to the other four state-of-the-art algorithms.

In the future, we would like to investigate the following three issues. First, the proposed algorithm will be applied to many-objective optimization problems, i.e., problems with four or more objectives. Second, note that the MOPs considered in this paper are all box-constrained; MOEA/D-AMR can easily incorporate constraint-handling techniques, such as the penalty function method reported in [28], to deal with constrained MOPs by slightly modifying the repair operation of the proposed algorithm, and we leave this as our second research topic. Third, the proposed algorithm normalizes the objective space by means of the estimated ideal and nadir points, which are updated adaptively during the evolutionary process. However, [24] pointed out that if these two points are inaccurately estimated, objective space normalization may deteriorate the performance of an MOEA. Therefore, an effective and robust normalization method should be embedded into the proposed algorithm, which is our third future direction.

References

  • Akbari et al. [2018] Akbari, F., Ghaznavi, M., & Khorram, E. (2018). A revised Pascoletti-Serafini scalarization method for multiobjective optimization problems. Journal of Optimization Theory and Applications, 178, 560–590. https://doi.org/10.1007/s10957-018-1289-2.
  • Amir & Hasegawa [1989] Amir, H. M., & Hasegawa, T. (1989). Nonlinear mixed-discrete structural optimization. Journal of Structural Engineering, 115, 626–646. https://doi.org/10.1061/(ASCE)0733-9445.
  • Ashby [2000] Ashby, M. (2000). Multi-objective optimization in material design and selection. Acta Materialia, 48, 359–369. https://doi.org/10.1016/S1359-6454(99)00304-3.
  • Audet et al. [2020] Audet, C., Bigeon, J., Cartier, D., Le Digabel, S., & Salomon, L. (2020). Performance indicators in multiobjective optimization. European Journal of Operational Research, 292, 397–422. https://doi.org/10.1016/j.ejor.2020.11.016.
  • Bader & Zitzler [2011] Bader, J., & Zitzler, E. (2011). HypE: An algorithm for fast hypervolume-based many-objective optimization. Evolutionary Computation, 19, 45–76. https://doi.org/10.1162/EVCO_a_00009.
  • Bortz et al. [2014] Bortz, M., Burger, J., Asprion, N., Blagov, S., Böttcher, R., Nowak, U., Scheithauer, A., Welke, R., Küfer, K.-H., & Hasse, H. (2014). Multi-criteria optimization in chemical process design and decision support by navigation on pareto sets. Computers Chemical Engineering, 60, 354–363. https://doi.org/10.1016/j.compchemeng.2013.09.015.
  • Cheaitou & Cariou [2019] Cheaitou, A., & Cariou, P. (2019). Greening of maritime transportation: a multi-objective optimization approach. Annals of Operations Research, 273, 501–525. https://doi.org/10.1007/s10479-018-2786-2.
  • Cheng et al. [2016] Cheng, R., Jin, Y., Olhofer, M., & Sendhoff, B. (2016). A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Transactions on Evolutionary Computation, 20, 773–791. https://doi.org/10.1109/TEVC.2016.2519378.
  • Coello et al. [2007] Coello, C. A. C., Lamont, G. B., & Van Veldhuizen, D. A. (2007). Evolutionary algorithms for solving multi-objective problems. Springer.
  • Das & Dennis [1998] Das, I., & Dennis, J. E. (1998). Normal-boundary intersection: A new method for generating the Pareto surface in nonlinear multicriteria optimization problems. SIAM Journal on Optimization, 8, 631–657. https://doi.org/10.1137/S1052623496307510.
  • Deb & Jain [2013] Deb, K., & Jain, H. (2013). An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints. IEEE Transactions on Evolutionary Computation, 18, 577–601. https://doi.org/10.1109/TEVC.2013.2281535.
  • Deb et al. [2002] Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6, 182–197. https://doi.org/10.1109/4235.996017.
  • Deb et al. [2005] Deb, K., Thiele, L., Laumanns, M., & Zitzler, E. (2005). Scalable test problems for evolutionary multiobjective optimization. In Abraham A., Jain L., Goldberg R. (eds) Evolutionary Multiobjective Optimization. Advanced Information and Knowledge Processing (pp. 105–145). Springer, London. https://doi.org/10.1007/1-84628-137-7_6.
  • Dolatnezhadsomarin & Khorram [2019] Dolatnezhadsomarin, A., & Khorram, E. (2019). Two efficient algorithms for constructing almost even approximations of the Pareto front in multi-objective optimization problems. Engineering Optimization, 51, 567–589. https://doi.org/10.1080/0305215X.2018.1479405.
  • Dong et al. [2020] Dong, Z., Wang, X., & Tang, L. (2020). MOEA/D with a self-adaptive weight vector adjustment strategy based on chain segmentation. Information Sciences, 521, 209–230. https://doi.org/10.1016/j.ins.2020.02.056.
  • Eichfelder [2008] Eichfelder, G. (2008). Adaptive scalarization methods in multiobjective optimization. Springer, Berlin.
  • Eichfelder [2009] Eichfelder, G. (2009). An adaptive scalarization method in multiobjective optimization. SIAM Journal on Optimization, 19, 1694–1718. https://doi.org/10.1137/060672029.
  • Eichfelder et al. [2021] Eichfelder, G., Kirst, P., Meng, L., & Stein, O. (2021). A general branch-and-bound framework for continuous global multiobjective optimization. Journal of Global Optimization, 80, 195–227. https://doi.org/10.1007/s10898-020-00984-y.
  • Eichfelder & Warnow [2020] Eichfelder, G., & Warnow, L. (2020). An approximation algorithm for multi-objective optimization problems using a box-coverage. https://www.optimization-online.org/DB_HTML/2020/10/8079.html.
  • Fliege et al. [2009] Fliege, J., Graña Drummond, L. M., & Svaiter, B. F. (2009). Newton’s method for multiobjective optimization. SIAM Journal on Optimization, 20, 602–626. https://doi.org/10.1137/08071692X.
  • Fukuda & Graña Drummond [2014] Fukuda, E. H., & Graña Drummond, L. M. (2014). A survey on multiobjective descent methods. Pesquisa Operacional, 34, 585–620. https://doi.org/10.1590/0101-7438.2014.034.03.0585.
  • Goel et al. [2007] Goel, T., Vaidyanathan, R., Haftka, R. T., Shyy, W., Queipo, N. V., & Tucker, K. (2007). Response surface approximation of Pareto optimal front in multi-objective optimization. Computer Methods in Applied Mechanics and Engineering, 196, 879–893. https://doi.org/10.1016/j.cma.2006.07.010.
  • Gu et al. [2012] Gu, F., Liu, H.-L., & Tan, K. C. (2012). A multiobjective evolutionary algorithm using dynamic weight design method. International Journal of Innovative Computing, Information and Control, 8, 3677–3688.
  • He et al. [2021] He, L., Ishibuchi, H., Trivedi, A., Wang, H., Nan, Y., & Srinivasan, D. (2021). A survey of normalization methods in multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation. https://doi.org/10.1109/TEVC.2021.3076514.
  • Ishibuchi et al. [2018] Ishibuchi, H., Imada, R., Setoguchi, Y., & Nojima, Y. (2018). How to specify a reference point in hypervolume calculation for fair performance comparison. Evolutionary Computation, 26, 411–440. https://doi.org/10.1162/evco_a_00226.
  • Ishibuchi et al. [2010] Ishibuchi, H., Sakane, Y., Tsukamoto, N., & Nojima, Y. (2010). Simultaneous use of different scalarizing functions in MOEA/D. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (pp. 519–526). https://doi.org/10.1145/1830483.1830577.
  • Jain & Deb [2013] Jain, H., & Deb, K. (2013). An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach, part II: Handling constraints and extending to an adaptive approach. IEEE Transactions on Evolutionary Computation, 18, 602–622. https://doi.org/10.1109/TEVC.2013.2281534.
  • Jan & Zhang [2010] Jan, M. A., & Zhang, Q. (2010). MOEA/D for constrained multiobjective optimization: some preliminary experimental results. In 2010 UK Workshop on Computational Intelligence (UKCI) (pp. 1–6). https://doi.org/10.1109/UKCI.2010.5625585.
  • Jiang et al. [2011] Jiang, S., Cai, Z., Zhang, J., & Ong, Y.-S. (2011). Multiobjective optimization by decomposition with Pareto-adaptive weight vectors. In 2011 Seventh International Conference on Natural Computation (pp. 1260–1264). volume 3. https://doi.org/10.1109/ICNC.2011.6022367.
  • Jiang & Yang [2015] Jiang, S., & Yang, S. (2015). An improved multiobjective optimization evolutionary algorithm based on decomposition for complex Pareto fronts. IEEE Transactions on Cybernetics, 46, 421–437. https://doi.org/10.1109/TCYB.2015.2403131.
  • Jiang et al. [2017] Jiang, S., Yang, S., Wang, Y., & Liu, X. (2017). Scalarizing functions in decomposition-based multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 22, 296–313. https://doi.org/10.1109/TEVC.2017.2707980.
  • Jones et al. [2002] Jones, D. F., Mirrazavi, S. K., & Tamiz, M. (2002). Multi-objective meta-heuristics: An overview of the current state-of-the-art. European Journal of Operational Research, 137, 1–9. https://doi.org/10.1016/S0377-2217(01)00123-0.
  • Khorram et al. [2014] Khorram, E., Khaledian, K., & Khaledyan, M. (2014). A numerical method for constructing the pareto front of multi-objective optimization problems. Journal of Computational and Applied Mathematics, 261, 158–171. https://doi.org/10.1016/j.cam.2013.11.007.
  • Li et al. [2019a] Li, H., Deng, J., Zhang, Q., & Sun, J. (2019a). Adaptive epsilon dominance in decomposition-based multiobjective evolutionary algorithm. Swarm and Evolutionary Computation, 45, 52–67. https://doi.org/10.1016/j.swevo.2018.12.007.
  • Li et al. [2019b] Li, H., Sun, J., Zhang, Q., & Shui, Y. (2019b). Adjustment of weight vectors of penalty-based boundary intersection method in MOEA/D. In Deb K. et al. (eds) Evolutionary Multi-Criterion Optimization. EMO 2019. Lecture Notes in Computer Science (pp. 91–100). https://doi.org/10.1007/978-3-030-12598-1_8.
  • Li & Zhang [2009] Li, H., & Zhang, Q. (2009). Multiobjective optimization problems with complicated pareto sets, MOEA/D and NSGA-II. IEEE Transactions on Evolutionary Computation, 13, 284–302. https://doi.org/10.1109/TEVC.2008.925798.
  • Li et al. [2013] Li, K., Zhang, Q., Kwong, S., Li, M., & Wang, R. (2013). Stable matching-based selection in evolutionary multiobjective optimization. IEEE Transactions on Evolutionary Computation, 18, 909–923. https://doi.org/10.1109/TEVC.2013.2293776.
  • Ma et al. [2020] Ma, X., Yu, Y., Li, X., Qi, Y., & Zhu, Z. (2020). A survey of weight vector adjustment methods for decomposition-based multiobjective evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 24, 634–649. https://doi.org/10.1109/TEVC.2020.2978158.
  • Ma et al. [2017] Ma, X., Zhang, Q., Tian, G., Yang, J., & Zhu, Z. (2017). On Tchebycheff decomposition approaches for multiobjective evolutionary optimization. IEEE Transactions on Evolutionary Computation, 22, 226–244. https://doi.org/10.1109/TEVC.2017.2704118.
  • Miettinen [2000] Miettinen, K. (2000). Nonlinear multiobjective optimization. Boston, MA: Kluwer.
  • Miglierina et al. [2008] Miglierina, E., Molho, E., & Recchioni, M. C. (2008). Box-constrained multi-objective optimization: a gradient-like method without “a priori” scalarization. European Journal of Operational Research, 188, 662–682. https://doi.org/10.1016/j.ejor.2007.05.015.
  • Murata et al. [2001] Murata, T., Ishibuchi, H., & Gen, M. (2001). Specification of genetic search directions in cellular multi-objective genetic algorithms. In Zitzler E., Thiele L., Deb K., Coello Coello C.A., Corne D. (eds) Evolutionary Multi-Criterion Optimization. EMO 2001. Lecture Notes in Computer Science (pp. 82–95). volume 1993. https://doi.org/10.1007/3-540-44719-9_6.
  • Pardalos et al. [2017] Pardalos, P. M., Zilinskas, A., & Zilinskas, J. (2017). Non-convex multi-objective optimization. Springer.
  • Pascoletti & Serafini [1984] Pascoletti, A., & Serafini, P. (1984). Scalarizing vector optimization problems. Journal of Optimization Theory and Applications, 42, 499–524. https://doi.org/10.1007/BF00934564.
  • Qi et al. [2014] Qi, Y., Ma, X., Liu, F., Jiao, L., Sun, J., & Wu, J. (2014). MOEA/D with adaptive weight adjustment. Evolutionary Computation, 22, 231–264. https://doi.org/10.1162/EVCO_a_00109.
  • Rangaiah & Bonilla-Petriciolet [2013] Rangaiah, G. P., & Bonilla-Petriciolet, A. (2013). Multi-objective optimization in chemical engineering: developments and applications. John Wiley & Sons.
  • Rojas-Gonzalez & Van Nieuwenhuyse [2020] Rojas-Gonzalez, S., & Van Nieuwenhuyse, I. (2020). A survey on kriging-based infill algorithms for multiobjective simulation optimization. Computers & Operations Research, 116, 104869. https://doi.org/10.1016/j.cor.2019.104869.
  • Ruzika & Wiecek [2005] Ruzika, S., & Wiecek, M. M. (2005). Approximation methods in multiobjective programming. Journal of Optimization Theory and Applications, 126, 473–501. https://doi.org/10.1007/s10957-005-5494-4.
  • Tanabe & Ishibuchi [2020] Tanabe, R., & Ishibuchi, H. (2020). An easy-to-use real-world multi-objective optimization problem suite. Applied Soft Computing, 89, 106078. https://doi.org/10.1016/j.asoc.2020.106078.
  • Tang & Yang [2021] Tang, L., & Yang, X. (2021). A modified direction approach for proper efficiency of multiobjective optimization. Optimization Methods and Software, 36, 653–668. https://doi.org/10.1080/10556788.2021.1891538.
  • Trivedi et al. [2016] Trivedi, A., Srinivasan, D., Sanyal, K., & Ghosh, A. (2016). A survey of multiobjective evolutionary algorithms based on decomposition. IEEE Transactions on Evolutionary Computation, 21, 440–462. https://doi.org/10.1109/TEVC.2016.2608507.
  • Vaidyanathan et al. [2003] Vaidyanathan, R., Tucker, K., Papila, N., & Shyy, W. (2003). CFD-based design optimization for single element rocket injector. In 41st Aerospace Sciences Meeting and Exhibit (pp. 1–21). https://doi.org/10.2514/6.2003-296.
  • Valenzuela-Rendón et al. [1997] Valenzuela-Rendón, M., Uresti-Charre, E., & Monterrey, I. (1997). A non-generational genetic algorithm for multiobjective optimization. In Proceedings of the Seventh International Conference on Genetic Algorithms (pp. 658–665). Morgan Kaufmann.
  • Viennet et al. [1996] Viennet, R., Fonteix, C., & Marc, I. (1996). Multicriteria optimization using a genetic algorithm for determining a Pareto set. International Journal of Systems Science, 27, 255–260. https://doi.org/10.1080/00207729608929211.
  • Wang et al. [2020] Wang, J., Su, Y., Lin, Q., Ma, L., Gong, D., Li, J., & Ming, Z. (2020). A survey of decomposition approaches in multiobjective evolutionary algorithms. Neurocomputing, 408, 308–330. https://doi.org/10.1016/j.neucom.2020.01.114.
  • Wang et al. [2015] Wang, R., Purshouse, R. C., Giagkiozis, I., & Fleming, P. J. (2015). The iPICEA-g: a new hybrid evolutionary multi-criteria decision making approach using the brushing technique. European Journal of Operational Research, 243, 442–453. https://doi.org/10.1016/j.ejor.2014.10.056.
  • Wang et al. [2016] Wang, R., Zhang, Q., & Zhang, T. (2016). Decomposition-based algorithms using Pareto adaptive scalarizing methods. IEEE Transactions on Evolutionary Computation, 20, 821–837. https://doi.org/10.1109/TEVC.2016.2521175.
  • Xu et al. [2018] Xu, Y., Ding, O., Qu, R., & Li, K. (2018). Hybrid multi-objective evolutionary algorithms based on decomposition for wireless sensor network coverage optimization. Applied Soft Computing, 68, 268–282. https://doi.org/10.1016/j.asoc.2018.03.053.
  • Xu et al. [2013] Xu, Y., Qu, R., & Li, R. (2013). A simulated annealing based genetic local search algorithm for multi-objective multicast routing problems. Annals of Operations Research, 206, 527–555. https://doi.org/10.1007/s10479-013-1322-7.
  • Zhang & Li [2007] Zhang, Q., & Li, H. (2007). MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Transactions on Evolutionary Computation, 11, 712–731. https://doi.org/10.1109/TEVC.2007.892759.
  • Zhang et al. [2010] Zhang, Q., Li, H., Maringer, D., & Tsang, E. (2010). MOEA/D with NBI-style Tchebycheff approach for portfolio management. In IEEE Congress on Evolutionary Computation (pp. 1–8). https://doi.org/10.1109/CEC.2010.5586185.
  • Zitzler et al. [2000] Zitzler, E., Deb, K., & Thiele, L. (2000). Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation, 8, 173–195. https://doi.org/10.1162/106365600568202.
  • Zitzler & Thiele [1999] Zitzler, E., & Thiele, L. (1999). Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3, 257–271. https://doi.org/10.1109/4235.797969.
  • Zitzler et al. [2003] Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C. M., & Da Fonseca, V. G. (2003). Performance assessment of multiobjective optimizers: An analysis and review. IEEE Transactions on Evolutionary Computation, 7, 117–132. https://doi.org/10.1109/TEVC.2003.810758.
  • Zopounidis et al. [2015] Zopounidis, C., Galariotis, E., Doumpos, M., Sarri, S., & Andriosopoulos, K. (2015). Multiple criteria decision aiding for finance: An updated bibliographic survey. European Journal of Operational Research, 247, 339–348. https://doi.org/10.1016/j.ejor.2015.05.032.