
Experience-Based Evolutionary Algorithms for Expensive Optimization

Xunzhao Yu, Yan Wang, Ling Zhu, Dimitar Filev, and Xin Yao. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. This work was supported by Ford USA through a research grant to the University of Birmingham, UK. Xin Yao was also supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement number 766186 (ECOLE), the Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), and the Shenzhen Science and Technology Program (Grant No. KQTD2016112514355531). X. Yu and X. Yao (the corresponding author) are with the CERCIA, School of Computer Science, University of Birmingham, Birmingham, B15 2TT, U.K. (e-mail: [email protected]; [email protected]). X. Yao is also with the Research Institute of Trustworthy Autonomous Systems, and Guangdong Provincial Key Laboratory of Brain-inspired Computation, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China. Y. Wang, L. Zhu, and D. Filev are with the Ford Motor Company, 2101 Village Rd, Dearborn, MI 48124, USA. (e-mail: [email protected], [email protected], [email protected])
Abstract

Optimization algorithms are very different from human optimizers. A human being gains experience through problem-solving, and this experience helps him/her to solve new, unseen problems. An optimization algorithm, in contrast, never gains any experience by solving more problems. In recent years, efforts have been made to endow optimization algorithms with some ability of experience learning, which is referred to as experience-based optimization. In this paper, we argue that hard optimization problems can be tackled efficiently by making better use of experiences gained on related problems. We demonstrate our ideas in the context of expensive optimization, where we aim to find a near-optimal solution to an expensive optimization problem with as few fitness evaluations as possible. To achieve this, we propose an experience-based surrogate-assisted evolutionary algorithm (SAEA) framework to enhance the optimization efficiency on expensive problems, where experiences are gained across related expensive tasks via a novel meta-learning method. These experiences serve as the task-independent parameters of a deep kernel learning surrogate, and the solutions sampled from the target task are then used to adapt the surrogate's task-specific parameters. With the help of experience learning, competitive regression-based surrogates can be initialized using only 1d solutions from the target task (d is the dimension of the decision space). Our experimental results on expensive multi-objective and constrained optimization problems demonstrate that experiences gained from related tasks are beneficial for saving evaluation budgets on the target problem.

Index Terms:
Evolutionary algorithms, experience-based optimization, surrogate-assisted, expensive optimization, meta-learning.

I Introduction

When solving a new, unseen problem, a human being always benefits from the experiences gained from related problems he/she has seen in the past. Such an ability of experience learning enhances the efficiency of solving new problems and is thus desirable for intelligent optimization algorithms. To endow optimization algorithms with some ability of experience learning, many efforts have been made to combine diverse experience learning techniques with optimization algorithms, resulting in experience-based optimization algorithms [1, 2, 3]. In the past decade, experience-based optimization approaches such as evolutionary transfer optimization [3, 4, 5] and evolutionary multi-tasking optimization (EMTO) [6, 7, 8] have been proposed to solve diverse optimization problems, including automatic parameter tuning problems [2] and dynamic optimization problems [9, 10]. These studies have demonstrated that the idea of experience-based optimization is effective for hard optimization problems: experiences gained from past problems can be helpful for solving new, unseen optimization problems.

The motivation of this paper is to demonstrate that the idea of experience-based optimization also works for diverse expensive optimization problems. In real-world applications, many expensive optimization problems are related since they address similar issues in the same domain. For example, in the domain of gasoline engine calibration, many calibration problems can be treated as related tasks. Although the categories of gasoline engines to be calibrated differ, the physical properties of gasoline do not change. Moreover, the mechanical structures of some gasoline engines are similar [11]. These domain-specific features make many calibration problems related to each other. It is desirable to extract the domain-specific features from related tasks and then use them as experiences in SAEAs. The learned experiences can help SAEAs build reliable surrogates at a lower cost of fitness evaluations than before (e.g., 1d evaluations). Consequently, these experience-based SAEAs can save more fitness evaluations (in surrogate initialization) than non-experience-based SAEAs and thus are more suitable for very expensive optimization problems, where only 150 or fewer fitness evaluations are allowed during the optimization.

The related tasks considered in this paper are expensive and each of them can provide only a small dataset of evaluated samples for experience learning. Therefore, our experience-based SAEA is developed in the context of few-shot problems [12, 13], where plenty of small related tasks are available for experience learning. A challenge is that most existing experience-based optimization approaches cannot learn experiences from small related tasks. Recently, however, meta-learning [14] has proven to be powerful in solving few-shot problems. In meta-learning, the underlying common features of related tasks are extracted as past experiences, which can be integrated with the solutions sampled from new tasks and thus enhance the learning efficiency on new tasks. Quite a few meta-learning methods [13] have been proposed for few-shot problems. Benefiting from the ability of experience learning, these methods are capable of fitting accurate classification or regression models with limited samples from the target task, which motivates us to develop meta-learning methods to learn experiences for SAEAs.

In this paper, we propose an experience-based SAEA framework to solve expensive optimization problems, including multi-objective optimization and constrained optimization. Major contributions are summarized as follows.

  • A novel meta-learning method is developed to gain experiences from related expensive tasks. Based on the learned experiences, a regression-based surrogate is generated and then adapted to approximate the fitness landscape of the target task. The surrogate is derived from the deep kernel learning framework, in which a Gaussian process employs a deep kernel as its covariance function. Different from existing deep kernel learning models that are trained through meta-learning, our method learns only one common neural network for all related tasks and adapts task-specific parameters for the base kernel of a Gaussian process. Such a framework simplifies the architecture of experience learning since the model complexity does not grow with the number of past tasks, yet it can still adapt itself to a new task explicitly.

  • We propose an experience-based SAEA framework to combine regression-based SAEAs with our experience learning method. The framework employs our meta-learning model as the surrogate for experience learning, and an update strategy is designed to adapt surrogates in each generation. Note that our SAEA framework is a general framework as it is not designed for any specific scenario of optimization problems. However, due to the page limit, this paper focuses only on the scenarios of multi-objective optimization and constrained optimization, which have not been investigated before.

  • We provide empirical evidence to show the effectiveness of our experience-based SAEA framework on expensive multi-objective optimization and expensive constrained single-objective optimization. Our ablation studies also reveal the influence of several factors on the performance of experience-based optimization.

The rest of this paper is organized as follows: Section II introduces the related work and some preliminaries. The details of the proposed method are described in Section III. Section IV presents experimental studies and analysis. The conclusions and some further work are given in Section V.

II Background

II-A Related Work

In the past decade, experience-based evolutionary optimization has attracted much attention as it uses experiences gained from other optimization problems to improve the optimization efficiency on target problems, which mimics human capabilities of cognition and knowledge generalization [15]. The optimization problems that provide experiences or knowledge are regarded as source tasks, while the target optimization problems are regarded as target tasks. To obtain useful experiences, tasks that are related to the target tasks are chosen as source tasks since they usually share domain-specific features with the target tasks. Diverse experience-based evolutionary optimization methods have been proposed to use experiences gained from related tasks to tackle target tasks. They can be divided into two categories based on the direction of experience transfer.

In the first category, experiences are transferred mutually. Every considered optimization problem is a target task and is also one of the source tasks of the other optimization problems. In other words, the roles of source task and target task are interchangeable. One representative branch is EMTO, which aims to solve multiple optimization tasks concurrently [16, 6, 17, 7, 8]. In EMTO, experiences are learned, updated, and spontaneously shared among target tasks through multi-task learning techniques. A variant of EMTO is multiform optimization [15, 18, 19], in which multi-task learning methods are employed to learn experiences from distinct formulations of a single target task.

In the second category, experiences are transferred unidirectionally. The roles of source task and target task are not interchangeable: an optimization problem cannot serve as a source task and a target task simultaneously. One popular branch is transfer optimization, which employs transfer learning techniques to transfer experiences from source tasks to target tasks [3, 4, 5, 20]. In transfer learning, experiences can be transferred from a single source task, multiple source tasks, or even source tasks from a different domain [21]. However, these transfer learning techniques pay more attention to experience transfer than to experience learning. Although diverse and complex experience transfer situations have been studied [9, 10], the difficulty of learning experiences from small (expensive) source tasks has not been well studied. In fact, a common scenario in transfer learning is that the source task(s) is/are large enough such that useful experiences can be obtained easily by solving the source task(s) [21]. In contrast to transfer optimization, some recent experience-based optimization algorithms have attempted to use meta-learning methods to learn experiences from small source tasks, which is known as few-shot optimization (FSO) [22]. Since meta-learning only works for related tasks in the same domain, the experience transfer situations are less complex than those in transfer learning. As a result, meta-learning pays more attention to experience learning than to experience transfer. Domain-specific features are extracted as experiences and no related task needs to be solved.

Our work belongs to FSO in the second category discussed above, since experiences are transferred unidirectionally. More importantly, our experiences are learned across many related expensive tasks, rather than gained by (partially or fully) solving source tasks. Existing studies on FSO only work for global optimization [22], leaving other optimization scenarios such as multi-objective optimization and constrained optimization still awaiting investigation. In addition, in-depth ablation studies are lacking in the literature, making it unclear which factors affect the performance of FSO. Moreover, some studies use existing meta-learning models [23] as their surrogates. No further adaptations are made to these surrogates during optimization since they were not originally designed for optimization. Therefore, a novel meta-learning model for FSO is desirable. Our work fills the aforesaid gaps by proposing a novel meta-learning modeling method and a general experience-based SAEA framework. As a result, accurate surrogates and competitive optimization results can be achieved while the cost of surrogate initialization is only 1d evaluations on the expensive target problem.

II-B Preliminaries

A meta deep kernel learning modeling method is developed to learn experiences in our SAEA framework. This subsection gives preliminaries on meta-learning and deep kernel learning. The former is the method of experience learning; the latter is the underlying structure of experience representation.

II-B1 Meta-Learning in Few-Shot Problems

In the context of few-shot problems, we have plenty of related tasks; each task T contributes a pair of small datasets D=\{(S,Q)\}, namely a support dataset S and a query dataset Q. After learning from the datasets of randomly chosen related tasks, a support set S_{*} from a new unseen task T_{*} is given and one is asked to estimate the labels or values of a query set Q_{*}. The problem is called 1-shot or 5-shot when only 1 or 5 data points are provided in S_{*}. A comprehensive definition of few-shot problems is available in [12, 13].

Meta-learning methods have been widely used to solve few-shot problems [13]. They learn the domain-specific features that are shared among related tasks as experiences; these experiences are then used to understand and interpret the data collected from new tasks encountered in the future.

II-B2 Deep Kernel Learning

Deep kernel learning (DKL) [24] aims at constructing kernels that encapsulate the expressive power of deep architectures for Gaussian processes (GPs [25], also known as the Kriging model [26] or the design and analysis of computer experiments (DACE) stochastic process model [27]). To create expressive and scalable closed-form covariance kernels, DKL combines the non-parametric flexibility of kernel methods with the structural properties of deep neural networks. In practice, a deep kernel k(\textbf{x}^{i},\textbf{x}^{j}|\bm{\gamma}) transforms the inputs \textbf{x} of a base kernel k(\textbf{x}^{i},\textbf{x}^{j}|\bm{\theta}) through a non-linear mapping given by a deep architecture \phi(\textbf{x}|\textbf{w},\textbf{b}):

k(\textbf{x}^{i},\textbf{x}^{j}|\bm{\gamma})=k(\phi(\textbf{x}^{i}|\textbf{w},\textbf{b}),\phi(\textbf{x}^{j}|\textbf{w},\textbf{b})|\bm{\theta}), (1)

where \bm{\theta} and (\textbf{w},\textbf{b}) are the parameter vectors of the base kernel and the deep architecture, respectively, and \bm{\gamma}=\{\bm{\theta},\textbf{w},\textbf{b}\} is the set of all parameters in this deep kernel. Note that in DKL, all parameters \bm{\gamma} of a deep kernel k(\textbf{x}^{i},\textbf{x}^{j}|\bm{\gamma}) are learned jointly by using the log marginal likelihood function of GPs as the loss function. Such a joint learning strategy makes a DKL algorithm outperform a combination of a deep neural network and a GP model in which a trained GP model is applied to the output layer of a trained deep neural network [24].
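To make the composition in Eq. (1) concrete, the NumPy sketch below builds a deep-kernel covariance matrix, assuming a small ReLU network as \phi and a squared-exponential base kernel; all names, shapes, and parameter values are illustrative assumptions rather than the original DKL implementation.

import numpy as np

def phi(X, W1, b1, W2, b2):
    # Illustrative deep mapping phi(x | w, b): one ReLU hidden layer, linear output.
    H = np.maximum(0.0, X @ W1 + b1)
    return H @ W2 + b2

def sq_exp_kernel(Z1, Z2, lengthscale=1.0):
    # Squared-exponential base kernel evaluated on the mapped inputs.
    d2 = np.sum(Z1**2, 1)[:, None] + np.sum(Z2**2, 1)[None, :] - 2.0 * Z1 @ Z2.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def deep_kernel_matrix(X1, X2, params):
    # Deep kernel of Eq. (1): map the inputs through phi, then apply the base kernel.
    return sq_exp_kernel(phi(X1, *params), phi(X2, *params))

# Example: covariance matrix of a GP whose kernel is the deep kernel.
rng = np.random.default_rng(0)
d, h, m = 6, 40, 10                       # input dim, hidden units, mapped dim
params = (rng.normal(size=(d, h)), np.zeros(h),
          rng.normal(size=(h, m)), np.zeros(m))
X = rng.uniform(size=(20, d))             # 20 candidate solutions
K = deep_kernel_matrix(X, X, params) + 1e-6 * np.eye(20)  # jitter for stability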

II-B3 Meta-Learning on DKL

An important distinction between DKL algorithms and applications of meta-learning to DKL is that DKL algorithms learn their deep kernels from a single task instead of a collection of related tasks. Learning across related tasks alleviates two drawbacks of single-task DKL [28]:

  • First, the scalability of deep kernels is no longer an issue as each dataset in meta-learning is small.

  • Second, the risk of overfitting is decreased since diverse data points are sampled across tasks.

III Experience-Based SAEA Framework

In this paper, expensive optimization is carried out in the context of few-shot problems [2], where the expensive optimization problem to be solved is denoted as the target task T_{*}, and plenty of small datasets D_{i} sampled from related tasks T_{i} are available for experience learning. We propose a SAEA framework that learns experiences from T_{i} and uses these experiences on T_{*}, which saves fitness evaluations on expensive optimization problems. A meta deep kernel learning (MDKL) modeling method is developed to learn experiences from T_{i} and then initialize surrogates at a cost of 1d evaluations on T_{*}. Our SAEA framework combines MDKL with existing regression-based SAEAs. A new update strategy is designed to maintain and adapt surrogates for SAEAs. As a result, although much fewer evaluations on T_{*} are used, our SAEA framework can still achieve competitive or even better optimization results than SAEAs that are unable to learn experiences from T_{i}.

III-A Overall Working Mechanism

A diagram of our experience-based SAEA framework is illustrated in Fig. 1.

Figure 1: Diagram of our experience-based SAEA framework. The grey block includes all modules that are related to the evolutionary optimization of the target task T_{*}. The MDKL surrogate modeling method used in the framework consists of a meta-learning procedure and an adaptation procedure. The meta-learning procedure learns experiences (task-independent parameters \bm{\gamma}^{e}) from related tasks T_{i}. Based on the learned experiences, the adaptation procedure adapts MDKL task-specific parameters to approximate the target task T_{*}. Note that existing SAEAs train and update their surrogates on S_{*} only; thus their workflows do not contain a meta-learning procedure and they cannot gain experiences from related tasks.

All modules related to the evolutionary optimization of the target task T_{*} are included in the grey block. The modules outside the grey block are associated with the related tasks T_{i} and experience learning, which are the main distinctions between our SAEA framework and existing SAEAs. The MDKL surrogate modeling method used in our SAEA framework consists of two procedures: a meta-learning procedure and an adaptation procedure. The former learns experiences from T_{i}; the latter uses these experiences to adapt surrogates to fit T_{*}.

The framework of experience-based SAEA is depicted in Algorithm 1 and consists of the following major steps.

Algorithm 1 Experience-Based SAEA Framework.

Input:
    D_{i}: Datasets collected from related tasks T_{i}, i = \{1,\dots,N\};
    N_{m}: Number of datasets D_{m} for meta-learning;
    |D_{m}|: Size of D_{m}. Note that |D_{m}|\leq|D_{i}|;
    B: Number of related tasks in a batch;
    \alpha,\beta: Surrogate learning rates;
    T_{*}: Target task;
    Opt: A SAEA optimizer;
    FE_{max}: Fitness evaluation budget.
Procedure:

1:  Experiences \bm{\gamma}^{e} \leftarrow Meta-learning(D_{i},N_{m},|D_{m}|,B,\alpha).
2:  S_{*} \leftarrow Sampling 1d solutions from T_{*}.
3:  h(\bm{\gamma}^{*}) \leftarrow Adaptation(\bm{\gamma}^{e},S_{*},\beta). /* Initialize surrogate. */
4:  Set evaluation counter FE=|S_{*}|.
5:  while FE<FE_{max} do
6:     Candidate solution(s) \textbf{x}^{*} \leftarrow Surrogate-assisted optimization(Opt,h(\bm{\gamma}^{*})).
7:     f(\textbf{x}^{*}) \leftarrow Evaluate \textbf{x}^{*} on T_{*}.
8:     S_{*} \leftarrow S_{*}\cup\{(\textbf{x}^{*},f(\textbf{x}^{*}))\}.
9:     h(\bm{\gamma}^{*}) \leftarrow Update(\bm{\gamma}^{*},S_{*},\beta).
10:     Update FE.
11:  end while

Output: Optimal solutions in S_{*}.

  1. Experience learning: Before the evolutionary optimization starts, a meta-learning procedure is conducted to train task-independent parameters \bm{\gamma}^{e} for MDKL surrogates (line 1). \bm{\gamma}^{e} are trained on N_{m} datasets \{D_{m1},\dots,D_{mN_{m}}\} which are collected from N related tasks \{T_{1},\dots,T_{N}\}. \bm{\gamma}^{e} are the experiences that represent the domain-specific features of the related tasks.

  2. Initialize surrogates with experiences: The evolutionary optimization starts when a target optimization task T_{*} is given. An initial dataset S_{*} is sampled (line 2) to adapt task-specific parameters \bm{\gamma}^{*} on the basis of the experiences \bm{\gamma}^{e}. After that, the MDKL surrogates are updated (line 3).

  3. Reproduction: The MDKL surrogates h(\bm{\gamma}^{*}) are combined with a SAEA optimizer Opt to search for optimal solution(s) \textbf{x}^{*} on h(\bm{\gamma}^{*}) (line 6). This is implemented by replacing the original (regression-based) surrogates in a SAEA with h(\bm{\gamma}^{*}).

  4. Update archive and surrogates: The new optimal solution(s) \textbf{x}^{*} is evaluated on the target task T_{*} (line 7). The evaluated solutions are added to the dataset S_{*} (line 8), which serves as an archive. A further surrogate adaptation is then triggered and the surrogates h(\bm{\gamma}^{*}) are updated (line 9).

  5. Stop criterion: Once the evaluation budget has run out, the evolutionary optimization is terminated and the optimal solutions in dataset S_{*} are output. Otherwise, the algorithm returns to step 3 (see the code sketch below).
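As a reading aid, the steps above can be summarized in the following Python-style sketch of Algorithm 1. The callables meta_learning, adaptation, update, surrogate_assisted_search, evaluate_on_target, and lhs_sample are placeholders for the procedures described in this section (one candidate is evaluated per iteration for simplicity); this is an illustrative sketch, not the released implementation.

def experience_based_saea(related_datasets, target_task, fe_max, d,
                          meta_learning, adaptation, update,
                          surrogate_assisted_search, evaluate_on_target, lhs_sample):
    # Step 1: learn task-independent parameters (experiences) from related tasks.
    gamma_e = meta_learning(related_datasets)

    # Step 2: sample only 1*d solutions from the target task to initialize the surrogate.
    archive = [(x, evaluate_on_target(x)) for x in lhs_sample(n=d)]

    # Adapt task-specific increments on top of the learned experiences.
    surrogate = adaptation(gamma_e, archive)

    fe = len(archive)
    while fe < fe_max:
        # Steps 3-4: search on the surrogate, evaluate, archive, and update.
        x_star = surrogate_assisted_search(surrogate)
        archive.append((x_star, evaluate_on_target(x_star)))
        surrogate = update(surrogate, archive)
        fe += 1

    return archive  # optimal solutions are extracted from the archive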

In the following subsections, we first present the details of our MDKL surrogate modeling method, including how to learn experiences through meta-learning and how to adapt the surrogate to a specific target task. Then we explain the surrogate update strategy in our experience-based SAEA framework. Finally, we discuss the usage of MDKL surrogates and the compatibility of our SAEA framework with existing SAEAs.

III-B Learning and Using Experiences by MDKL

In MDKL, the domain-specific features of related tasks are used as experiences, which are represented by the task-independent parameters \bm{\gamma}^{e} learned across the related tasks. To make MDKL more capable of expressing complex domain-specific features, the base kernel k(\textbf{x}^{i},\textbf{x}^{j}|\bm{\theta}) in the GP is combined with a neural network \phi(\textbf{w},\textbf{b}) to construct a deep kernel (see Eq. (1)).

The main novelties of our MDKL surrogate modeling method can be summarized as follows.

  • A simple and efficient meta-learning structure for experience learning. Our method learns only one common neural network for all related tasks, such a neural network works as a component of the shared deep kernel. In comparison, [29, 28] generate separate deep kernels for each related task and train multiple neural networks to encode and decode task features, which makes their model complexity grow with the number of related tasks.

  • The explicit task-specific adaptation of the deep kernel, which is implemented by accumulating task-specific increments on the basis of the learned experiences.

The construction of an MDKL model consists of two procedures: a meta-learning procedure and an adaptation procedure. For a clear illustration, we first introduce the frameworks of the two procedures and then explain them in detail.

Meta-learning procedure: Learning experiences
Our MDKL model uses the kernel in [30] as its base kernel:

k(\textbf{x}^{i},\textbf{x}^{j}|\bm{\theta},\textbf{p})=\exp\big(-\sum_{k=1}^{d}\theta_{k}|x_{k}^{i}-x_{k}^{j}|^{p_{k}}\big). (2)

Therefore, the deep kernel will be:

k(\textbf{x}^{i},\textbf{x}^{j}|\bm{\gamma})=\exp\big(-\sum_{k=1}^{d}\theta_{k}|\phi(x_{k}^{i})-\phi(x_{k}^{j})|^{p_{k}}\big), (3)

where \bm{\gamma}=\{\textbf{w},\textbf{b},\bm{\theta},\textbf{p}\} is the set of deep kernel parameters. Details about alternative base kernels are available in [25].
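A small NumPy sketch of the base kernel in Eq. (2) and its deep variant in Eq. (3) is given below; phi stands for the neural network \phi and is assumed to be supplied by the caller (for example, the feed-forward mapping from the earlier deep-kernel sketch), and the numerical values are illustrative only.

import numpy as np

def base_kernel(xi, xj, theta, p):
    # Eq. (2): k(x_i, x_j | theta, p) = exp(-sum_k theta_k * |x_ik - x_jk|^p_k).
    return np.exp(-np.sum(theta * np.abs(xi - xj) ** p))

def deep_kernel(xi, xj, phi, theta, p):
    # Eq. (3): the same base kernel applied to the outputs of the network phi.
    return base_kernel(phi(xi), phi(xj), theta, p)

# Example with a trivial (identity) network, illustrative parameter values only.
theta = np.array([1.0, 0.5]); p = np.array([2.0, 1.5])
print(deep_kernel(np.array([0.1, 0.2]), np.array([0.3, 0.1]), lambda x: x, theta, p))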

The aim of the meta-learning procedure is to learn experiences \bm{\gamma}^{e} from the related tasks \{T_{1},\dots,T_{N}\}, including the neural network parameters \textbf{w},\textbf{b} and the task-independent base kernel parameters \bm{\theta}^{e},\textbf{p}^{e}. The pseudo-code of the meta-learning procedure is given in Algorithm 2.

Algorithm 2 Meta-learning(D_{i},N_{m},|D_{m}|,B,\alpha)

Input:
    D_{i}: Datasets collected from related tasks T_{i}, i = \{1,\dots,N\};
    N_{m}: Number of datasets D_{m} for meta-learning;
    |D_{m}|: Size of D_{m}. Note that |D_{m}|\leq|D_{i}|;
    B: Number of related tasks in a batch;
    \alpha: Learning rate for priors.
Procedure:

1:  Randomly initialize \textbf{w},\textbf{b},\bm{\theta}^{e},\textbf{p}^{e}.
2:  Set the number of update iterations U = N_{m}/B.
3:  for j=1 to U do
4:     \{D^{\prime}_{1},\dots,D^{\prime}_{B}\} \leftarrow Randomly select a batch of datasets from \{D_{1},\dots,D_{N}\}.
5:     for all D^{\prime}_{i} in the batch do
6:        D_{mi} \leftarrow Randomly sampling(D^{\prime}_{i}, |D_{m}|).
7:        Initialize task-specific increments \Delta\bm{\theta}^{i},\Delta\textbf{p}^{i}.
8:        Compute task-specific parameters: \bm{\theta}^{i}=\bm{\theta}^{e}+\Delta\bm{\theta}^{i}, \textbf{p}^{i}=\textbf{p}^{e}+\Delta\textbf{p}^{i}.
9:        Obtain the deep kernel k(\textbf{x}^{i},\textbf{x}^{j}|\bm{\gamma}) based GP: h(\bm{\gamma}), where \bm{\gamma}=\{\textbf{w},\textbf{b},\bm{\theta}^{i},\textbf{p}^{i}\} (Eq. (3)).
10:        Compute the loss function L(D_{mi},h(\bm{\gamma})) (Eq. (4)).
11:        Update \textbf{w},\textbf{b},\bm{\theta}^{e},\textbf{p}^{e} using gradient descent: \alpha\bigtriangledown L(D_{mi},h(\bm{\gamma})) (Eq. (5)).
12:     end for
13:  end for

Output: Task-independent parameters: \bm{\gamma}^{e} = \{\textbf{w},\textbf{b},\bm{\theta}^{e},\textbf{p}^{e}\}.

Ideally, the experiences \bm{\gamma}^{e} are learned from plenty of (N_{m}) small datasets D_{m} collected from different related tasks. However, in practice, the number of available related tasks N may be much smaller than N_{m}. Hence, the meta-learning is conducted gradually over U update iterations (line 2). During each update iteration, a small batch of related tasks contributes B small datasets \{D_{m1},\dots,D_{mB}\} for the meta-learning purpose (lines 4 and 6). Note that if N<N_{m}, a related task T_{i} can be used multiple times in the meta-learning procedure.

For a given dataset D_{mi}, we denote \bm{\theta}^{i}=\bm{\theta}^{e}+\Delta\bm{\theta}^{i} and \textbf{p}^{i}=\textbf{p}^{e}+\Delta\textbf{p}^{i} as the task-specific kernel parameters, where \Delta\bm{\theta}^{i},\Delta\textbf{p}^{i} are the distances we need to move from the task-independent parameters to the task-specific parameters (line 8). The loss function L of MDKL is the likelihood function defined as follows [30]:

\frac{1}{(2\pi)^{n/2}(\sigma^{2})^{n/2}|\textbf{R}|^{1/2}}\exp\Big[-\frac{(\textbf{y}-\textbf{1}\mu)^{T}\textbf{R}^{-1}(\textbf{y}-\textbf{1}\mu)}{2\sigma^{2}}\Big], (4)

where |\textbf{R}| is the determinant of the correlation matrix \textbf{R}, each element of which is computed through Eq. (3); \textbf{y} is the fitness vector of D_{mi}; and \mu and \sigma^{2} are the mean and the variance of the prior distribution, respectively. The experiences \bm{\gamma}^{e}=\{\textbf{w},\textbf{b},\bm{\theta}^{e},\textbf{p}^{e}\} are updated by gradient descent (line 11); taking \bm{\theta}^{e} as an example:

\bm{\theta}^{e}\leftarrow\bm{\theta}^{e}-\alpha\bigtriangledown_{\bm{\theta}^{e}}L(D_{mi},h(\bm{\gamma})). (5)

After U iterations, \bm{\gamma}^{e} has been trained sufficiently on the N_{m} small datasets D_{m} and will later be used on the target task T_{*}.
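For illustration, the core of one meta-update can be written down concretely. The sketch below restricts the trainable experiences to the base kernel parameters \bm{\theta}^{e} (omitting the network \textbf{w},\textbf{b} and \textbf{p}^{e} for brevity), uses the negative log of the likelihood in Eq. (4) as the loss with \mu and \sigma^{2} set to their usual closed-form estimates, and replaces automatic differentiation with a finite-difference gradient; these simplifications are our own assumptions rather than the paper's implementation.

import numpy as np

def corr_matrix(X, theta, p):
    # Correlation matrix R whose entries follow the base kernel of Eq. (2).
    n = X.shape[0]
    R = np.ones((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            R[i, j] = R[j, i] = np.exp(-np.sum(theta * np.abs(X[i] - X[j]) ** p))
    return R + 1e-8 * np.eye(n)          # small jitter for numerical stability

def neg_log_likelihood(X, y, theta, p):
    # Negative log of the likelihood in Eq. (4), constant terms dropped.
    n = len(y)
    R = corr_matrix(X, theta, p)
    Rinv = np.linalg.inv(R)
    ones = np.ones(n)
    mu = (ones @ Rinv @ y) / (ones @ Rinv @ ones)          # closed-form mean estimate
    sigma2 = ((y - mu) @ Rinv @ (y - mu)) / n               # closed-form variance estimate
    _, logdet = np.linalg.slogdet(R)
    return 0.5 * (n * np.log(sigma2) + logdet)

def meta_step(X, y, theta_e, p, delta_theta, alpha=1e-3, eps=1e-5):
    # One update of the task-independent theta_e (Eq. (5)) on one task dataset D_mi,
    # using a finite-difference gradient as a stand-in for automatic differentiation.
    grad = np.zeros_like(theta_e)
    base = neg_log_likelihood(X, y, theta_e + delta_theta, p)
    for k in range(len(theta_e)):
        shifted = theta_e.copy()
        shifted[k] += eps
        grad[k] = (neg_log_likelihood(X, y, shifted + delta_theta, p) - base) / eps
    return theta_e - alpha * grad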

Adaptation procedure: Using experiences
The meta-learning of experiences \bm{\gamma}^{e} enables MDKL to handle a family of related tasks in general. To approximate a specific task T_{*} well, the surrogate h(\bm{\gamma}^{e}) needs to adapt task-specific increments \Delta\bm{\theta}^{*} and \Delta\textbf{p}^{*} in the way described in Algorithm 3.

Algorithm 3 Adaptation(\bm{\gamma}^{*},S_{*},\beta)

Input:
    \bm{\gamma}^{*}: Current surrogate parameters;
    S_{*}: A dataset sampled from the target task T_{*} (archive);
    \beta: Learning rate for adaptation.
Procedure:

1:  if \bm{\gamma}^{*}==\bm{\gamma}^{e} then
2:     Initialize task-specific increments \Delta\bm{\theta}^{*},\Delta\textbf{p}^{*}.
3:     Compute task-specific parameters: \bm{\theta}^{*}=\bm{\theta}^{e}+\Delta\bm{\theta}^{*}, \textbf{p}^{*}=\textbf{p}^{e}+\Delta\textbf{p}^{*}.
4:     Obtain the deep kernel k(\textbf{x}^{i},\textbf{x}^{j}|\bm{\gamma}^{*}) based GP: h(\bm{\gamma}^{*}), where \bm{\gamma}^{*}=\{\textbf{w},\textbf{b},\bm{\theta}^{*},\textbf{p}^{*}\} (Eq. (3)).
5:  end if
6:  Compute the loss function L(S_{*},h(\bm{\gamma}^{*})) (Eq. (4)).
7:  Update \Delta\bm{\theta}^{*},\Delta\textbf{p}^{*} using gradient descent: \beta\bigtriangledown L(S_{*},h(\bm{\gamma}^{*})) (Eq. (6)).

Output: Adapted MDKL surrogate h(\bm{\gamma}^{*}).

There are three major differences between the meta-learning procedure and the adaptation procedure:

  • First, the dataset S_{*} used for adaptation is sampled from the target task T_{*}. In comparison, in the meta-learning procedure, the training datasets \{D_{m1},\dots,D_{mN_{m}}\} are sampled from N different related tasks.

  • Second, the parameters to be updated are the task-specific increments \Delta\bm{\theta}^{*} and \Delta\textbf{p}^{*}, not the task-independent parameters \bm{\theta}^{e} and \textbf{p}^{e}. The gradient descent in Eq. (5) is replaced by the following (line 7):

    \Delta\bm{\theta}^{*}\leftarrow\Delta\bm{\theta}^{*}-\beta\bigtriangledown_{\Delta\bm{\theta}^{*}}L(S_{*},h(\bm{\gamma}^{*})). (6)
  • Third, the task-specific increments \Delta\bm{\theta}^{*} and \Delta\textbf{p}^{*} are initialized only the first time the task-specific parameters are adapted (lines 1-5). From the perspective of parameter tuning, the task-independent base kernel parameters \bm{\theta}^{e} and \textbf{p}^{e} are the initialization points for further adaptations; in other words, the learning of a new task starts from past experiences (see the sketch below).
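Assuming the neg_log_likelihood helper from the meta-learning sketch above, the adaptation step of Eq. (6) can be written analogously, with the gradient taken only with respect to the task-specific increment while the learned experiences stay fixed; as before, the finite-difference gradient and the restriction to \bm{\theta} are illustrative simplifications.

import numpy as np

def adapt_step(X_star, y_star, theta_e, p, delta_theta, beta=1e-3, eps=1e-5):
    # One update of the task-specific increment Delta theta (Eq. (6)).
    # The task-independent theta_e stays fixed; only the increment moves.
    grad = np.zeros_like(delta_theta)
    base = neg_log_likelihood(X_star, y_star, theta_e + delta_theta, p)
    for k in range(len(delta_theta)):
        shifted = delta_theta.copy()
        shifted[k] += eps
        grad[k] = (neg_log_likelihood(X_star, y_star, theta_e + shifted, p) - base) / eps
    return delta_theta - beta * grad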

A diagram of the deep kernel implemented in our MDKL model is illustrated in Fig. 2.

Figure 2: Diagram of our deep kernel implementation. The continuous lines depict the training process; the dotted lines depict the inference process. Q_{*} denotes query samples to be evaluated on our surrogates. The neural network ensures the expressive power of our deep kernel. Common features of related tasks are represented by the neural network parameters \textbf{w},\textbf{b} and the base kernel parameters \bm{\theta}^{e},\textbf{p}^{e}, which are the learned experiences. The task-specific increments \Delta\bm{\theta}^{*} and \Delta\textbf{p}^{*} distinguish a given task T_{*} from other tasks.

III-C Surrogate Update Strategy

The adaptation procedure has explained how experiences are used to adapt an MDKL surrogate with a dataset S_{*} from the target task T_{*}. In this subsection, we describe the update strategy in our experience-based SAEA framework. To properly integrate experiences and data from T_{*}, our update strategy determines whether an MDKL surrogate should be adapted in the current iteration, which keeps the update frequency of the surrogates appropriate.

As illustrated in Algorithm 4, the surrogate update starts when new optimal solution(s) have been evaluated on the expensive functions and an updated archive S_{*} is available.

Algorithm 4 Update(\bm{\gamma}^{*},S_{*},\beta)

Input:
    \bm{\gamma}^{*}: Current surrogate parameters;
    S_{*}: Updated archive;
    \beta: Learning rate for further adaptations.
Procedure:

1:  e_{0} \leftarrow MSE(h(\bm{\gamma}^{*}),S_{*}).
2:  h(\bm{\gamma}^{\prime}) \leftarrow Adaptation(\bm{\gamma}^{*},S_{*},\beta). /* Temporary surrogate. */
3:  e_{1} \leftarrow MSE(h(\bm{\gamma}^{\prime}),S_{*}).
4:  if e_{0}>e_{1} then
5:     Update \bm{\gamma}^{*}=\bm{\gamma}^{\prime} and obtain the new h(\bm{\gamma}^{*}).
6:  end if

Output: Updated surrogate h(\bm{\gamma}^{*}).

For a given surrogate h(\bm{\gamma}^{*}), its mean squared error (MSE) on S_{*} is selected as the update criterion: if the MSE after an adaptation, e_{1} (line 3), is larger than the MSE without an adaptation, e_{0} (line 1), then the surrogate rolls back to its status before the adaptation. This indicates that the surrogate update has been rejected and h(\bm{\gamma}^{*}) will not be adapted in the current iteration. Otherwise, the adapted surrogate is chosen (line 5). Note that no matter whether a surrogate adaptation is accepted or rejected, the resulting surrogate is treated as the updated surrogate, which is employed to assist the SAEA optimizer in the next iteration.
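The accept/reject logic of Algorithm 4 amounts to a simple MSE comparison. The sketch below assumes a surrogate object exposing predict and adapt methods; both names are illustrative placeholders rather than an actual API.

import numpy as np

def update_surrogate(surrogate, archive_X, archive_y, beta):
    # Sketch of Algorithm 4: keep the adaptation only if it reduces the MSE on S_*.
    e0 = np.mean((surrogate.predict(archive_X) - archive_y) ** 2)   # MSE before adaptation
    candidate = surrogate.adapt(archive_X, archive_y, beta)          # temporary surrogate
    e1 = np.mean((candidate.predict(archive_X) - archive_y) ** 2)    # MSE after adaptation
    return candidate if e1 < e0 else surrogate                       # roll back if worse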

III-D Framework Compatibility and Surrogate Usage

Our experience-based SAEA framework is compatible with regression-based SAEAs since the original surrogates in these SAEAs can be replaced directly by our MDKL surrogates. Due to the nature of GPs, when predicting the fitness of a query solution \textbf{x}^{*}, an MDKL surrogate produces a predictive Gaussian distribution \mathcal{N}(\hat{y}(\textbf{x}^{*}),\hat{s}^{2}(\textbf{x}^{*})), whose predicted mean \hat{y}(\textbf{x}^{*}) and variance \hat{s}^{2}(\textbf{x}^{*}) are specified as [30]:

\hat{y}(\textbf{x}^{*})=\mu+\textbf{r}^{\prime}\textbf{R}^{-1}(\textbf{y}-\textbf{1}\mu), (7)
\hat{s}^{2}(\textbf{x}^{*})=\sigma^{2}(1-\textbf{r}^{\prime}\textbf{R}^{-1}\textbf{r}), (8)

where \textbf{r} is a correlation vector consisting of the covariances between \textbf{x}^{*} and the solutions in S_{*}; the other variables are explained in Eq. (4).
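Given the correlation matrix \textbf{R} over the archive and the correlation vector \textbf{r} between a query point and the archive, the predictive distribution in Eqs. (7)-(8) can be computed as below. The sketch reuses the corr_matrix helper and the Eq. (2) kernel from the meta-learning sketch (applying the kernel directly to the inputs, i.e., omitting the network mapping), and estimates \mu and \sigma^{2} with the usual closed-form expressions; these choices are illustrative assumptions.

import numpy as np

def gp_predict(X, y, x_query, theta, p):
    # Predictive mean (Eq. (7)) and variance (Eq. (8)) of the (deep-kernel) GP.
    n = len(y)
    R = corr_matrix(X, theta, p)                          # from the meta-learning sketch
    Rinv = np.linalg.inv(R)
    ones = np.ones(n)
    mu = (ones @ Rinv @ y) / (ones @ Rinv @ ones)         # closed-form mean estimate
    sigma2 = ((y - mu) @ Rinv @ (y - mu)) / n             # closed-form variance estimate
    r = np.array([np.exp(-np.sum(theta * np.abs(x_query - X[i]) ** p)) for i in range(n)])
    y_hat = mu + r @ Rinv @ (y - mu)                      # Eq. (7)
    s2_hat = sigma2 * (1.0 - r @ Rinv @ r)                # Eq. (8)
    return y_hat, s2_hat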

Classification-based SAEAs are not compatible with our SAEA framework. The classification surrogates in these SAEAs are employed to learn the relation between pairs of solutions, or the relation between solutions and a set of reference solutions. The class labels used for surrogate training can fluctuate during the optimization and are thus hard to learn across related tasks. Similarly, in ordinal-regression-based SAEAs, the ordinal relation values to be learned are not as stable as the fitness of the expensive functions, so ordinal-regression-based SAEAs are also not compatible with our SAEA framework. In this paper, we focus on experience-based optimization for regression-based SAEAs, while other SAEA categories are left for future work.

IV Computational Studies

Our computational studies can be divided into three parts:

  • Section IV-A evaluates the effectiveness of learning experiences through a real-world engine modeling problem.

  • Sections IV-B to IV-D use the scenario of multi-objective optimization as an example to investigate the performance of our SAEA framework in depth. Empirical evidence is provided to guide the use of our SAEA framework.

  • Section IV-E investigates the performance of our SAEA framework in breadth. The engine calibration problem we solve in the experiment covers many representative features, such as constrained optimization, single-objective optimization, and real-world applications.

The source code is available online (https://github.com/XunzhaoYu/Experience-Based-SAEA). For all meta-learning methods used in our experiments, the basic setups are listed in Table I.

TABLE I: Parameter setups for meta-learning methods.
Module Parameter Value
Meta-learning Number of meta-learning datasets N_{m} 20000
Number of update iterations U 2000
Batch size B 10
Neural network Number of hidden layers 2
Number of units in each hidden layer 40
Learning rates \alpha,\beta 0.001, 0.001
Activation function ReLU

The neural network structure is suggested by [31, 23], and the learning rates are the default values that have been widely used in many meta-learning methods [32, 23].

IV-A Effectiveness of Learning Experiences

The evaluation of the effectiveness of learning experiences aims at demonstrating that our MDKL model is able to learn experiences from related tasks and it outperforms other meta-learning models. For this reason, the experiment is designed to answer the following questions:

  • Given a small dataset S_{*} from the target task T_{*}, can MDKL learn experiences from related tasks and generate a model with the smallest MSE?

  • If yes, which components of MDKL contribute to the effectiveness of learning experiences? Meta-learning and/or deep kernel learning? If not, why not?

To answer the two questions above, we consider two experiments to evaluate the effectiveness of learning experiences: amplitude prediction for unknown periodic sinusoid functions, and fuel consumption prediction for a gasoline motor engine. The former is a few-shot regression problem that motivates many meta-learning studies [31, 32, 28, 23], while the latter is a real-world regression problem [33]. The results of the sinusoid function regression experiment are reported in the supplementary material. In this subsection, we focus on a Brake Specific Fuel Consumption (BSFC) regression task for a gasoline motor engine [33], where BSFC is evaluated on the gasoline engine simulation (denoted by T_{*}) provided by Ford Motor Company.

Experimental setups:
The related tasks T_{i} used in our experiment are N=100 gasoline engine models. These engine models behave differently from T_{*}, but they share the basic features of gasoline engines. All related tasks and the target task have the same six decision variables. Each related task T_{i} provides only 60 solutions, forming a dataset D_{i}. The size of the datasets used for meta-learning, |D_{m}|, is set to 40. Six tests are conducted in which |S_{*}|=\{2, 3, 5, 10, 20, 40\} data points are sampled from the real engine simulation T_{*}. The MSE is chosen as the indicator of modeling accuracy, which is measured on a dataset consisting of 12500 data points sampled uniformly from the engine decision space.

Comparison methods:
In this experiment, three families of modeling methods are compared with our MDKL model:

  • Meta-learning methods that are proposed for regression tasks: MAML [31], ALPaCA [32], and DKT [23]. The configurations of MAML, ALPaCA, DKT are the same as suggested in the original literature.

  • Non-meta-learning method that is widely used for regression tasks: the GP model. We choose GP as a baseline since it is effective and more relevant to MDKL than other non-meta-learning modeling methods. We set the ranges of the base kernel parameters in the GP model as \theta\in[10^{-5},10] and p\in[1,2].

  • MDKL related methods that are designed to investigate which components of MDKL contribute to the modeling performance: GP_Adam, DKL, and MDKL_NN. GP_Adam is a GP model fitted by Adam optimizer. The combination of GP_Adam and a neural network results in a kind of DKL algorithm. MDKL_NN is a meta-learning version of DKL, but it learns only neural network parameters through meta-learning and has no task-independent base kernel parameters.

Results and analysis:
The statistical test results of the MSE values achieved by the compared algorithms in the BSFC regression experiments are summarized in Table II. Each row lists the results obtained when the same number of fitness evaluations |S_{*}| is used to train the models. The results of the Wilcoxon rank sum test between MDKL and the other compared algorithms are listed in the last row.

TABLE II: Average MSE and standard deviation (in brackets) of 30 runs on the regression of engine fuel consumption. Support data points are used to train non-meta surrogates or adapt meta-learning surrogates. All results are normalized since the actual engine data is unable to be disclosed. The symbols ’++’, ’\approx’, ’-’ denote the win/tie/loss result of Wilcoxon rank sum test (significance level is 0.05) between MDKL and comparison modeling methods, respectively. The last row counts the total win/tie/loss results.
Support data GP [26] GP_Adam DKL MDKL_NN MDKL DKT [23] MAML [31] ALPaCA [32]
points |S||S_{*}|
2 2.23e+1(3.20e+0)++ 2.37e+1(6.30e+0)++ 2.30e+1(5.87e+0)++ 1.73e+1(6.33e+0)\approx 1.72e+1(6.34e+0) 1.81e+1(5.68e+0)\approx 1.87e+1(6.37e+0)\approx 1.91e+1(1.02e+1)\approx
3 2.14e+1(3.74e+0)++ 2.41e+1(1.38e+1)++ 2.20e+1(3.74e+0)++ 1.45e+1(7.13e+0)\approx 1.45e+0(7.01e+0) 1.55e+1(6.66e+0)\approx 1.80e+1(4.69e+0)\approx 2.13e+1(1.97e+1)\approx
5 2.13e+1(3.27e+0)++ 2.46e+1(1.00e+1)++ 2.07e+1(3.95e+0)++ 1.12e+1(6.65e+0)\approx 1.10e+1(6.58e+0) 1.21e+1(6.49e+0)\approx 1.84e+1(6.05e+0)++ 1.99e+1(2.29e+1)++
10 1.84e+1(1.89e+0)++ 2.06e+1(1.19e+1)++ 2.10e+1(5.79e+0)++ 7.19e+0(4.82e+0)\approx 7.08e+0(4.77e+0) 7.99e+0(4.87e+0)\approx 1.70e+1(5.54e+0)++ 1.38e+1(8.12e+0)++
20 1.56e+1(2.00e+0)++ 2.38e+1(1.05e+1)++ 1.76e+1(2.42e+0)++ 5.03e+0(1.82e+0)\approx 4.86e+0(1.71e+0) 5.74e+0(1.91e+0)++ 1.50e+1(2.59e+0)++ 1.01e+1(5.52e+0)++
40 1.28e+1(2.03e+0)++ 1.48e+1(7.35e+0)++ 1.67e+1(3.73e+0)++ 4.13e+0(7.90e-1)\approx 4.00e+0(8.59e-1) 4.92e+0(1.09e+0)++ 1.45e+1(1.85e+0)++ 8.01e+0(3.35e+0)++
win/tie/loss 6/0/0 6/0/0 6/0/0 0/6/0 -/-/- 2/4/0 4/2/0 4/2/0

It can be observed that MDKL and MDKL_NN outperform the other compared modeling methods since they have achieved the smallest MSE in all tests. Consistent observations can also be made from the results of the sinusoid function regression experiments (presented in the supplementary material), where MDKL also achieved the best performance.

Additional Wilcoxon rank sum tests have been conducted between the MDKL-related algorithms to answer our second question (results are not reported in Table II). The statistical test results between DKL and GP_Adam are 1/5/0, indicating that the neural network in DKL makes some contribution to the performance of MDKL. The statistical test results between MDKL_NN and DKL are 6/0/0, demonstrating that the meta-learning of neural network parameters constructs a useful deep kernel and contributes to the improvement of modeling accuracy. However, there is no significant difference between the performance of MDKL and that of MDKL_NN, so the meta-learning of base kernel parameters does not play a critical role in this engine problem. In comparison, the meta-learning of base kernel parameters is effective in the sinusoid function regression experiments (see the supplementary material). Besides, the statistical test results between MDKL_NN and MAML are 4/2/0. Considering that MAML is a neural network regressor learned through meta-learning, we can conclude that the GP is an essential component of our MDKL. In summary, all components in MDKL are necessary, and they all contribute to the effectiveness of learning experiences.

The comparison experiments on the gasoline motor engine and the sinusoid functions have demonstrated the effectiveness of our MDKL modeling method in the learning of experiences. Given a small dataset of the target task, the model learned through the MDKL method has the smallest MSE among all compared models. Besides, the investigation of MDKL and its variants shows that all components in MDKL contribute to the effectiveness of learning experiences. However, similar to other meta-learning studies [31, 32], we have not defined the similarity between tasks. In other words, the boundary between related tasks and unrelated tasks has not been defined. This should be a topic of further study on meta-learning. Besides, the relationship between task similarity and modeling performance has not been investigated. Instead, we study the relationship between task similarity and SAEA optimization performance in Section IV-C, since our main focus is surrogate-assisted evolutionary optimization.

IV-B Performance on Expensive Multi-Objective Optimization

So far we have shown the effectiveness of the experience learning method. In the following subsections, we aim at demonstrating the effectiveness of our experience-based SAEA framework. The experiment in this subsection is designed to answer the question below:

  • With the experiences learned from related tasks, can our SAEA framework help a SAEA to save 9d solutions without a loss of optimization performance?

The computational study is conducted on the DTLZ test functions [34]. All DTLZ functions have d=10 decision variables and 3 objectives, following the setups that have been widely used in [35, 36]. The details of generating DTLZ variants (related tasks) are provided in the supplementary material.

Comparison algorithms:
As explained in Section III-D, our experience-based SAEA framework is compatible with regression-based SAEAs. Hence, we select MOEA/D-EGO [37] as an example and replace its GP surrogates with our MDKL surrogates. The resulting algorithm is denoted as MOEA/D-EGO(EB). Note that it is not necessary to specially select a newly proposed regression-based SAEA as our example; our main objective is to save evaluations with experiences and to observe whether the saving of evaluations damages the optimization performance. Therefore, it does not make any difference which regression-based SAEA we choose as our example. In addition, to demonstrate that the improvement in optimization performance brought by using experiences on the DTLZ functions is significant, several state-of-the-art SAEAs are also compared as baselines, including ParEGO [38], K-RVEA [39], CSEA [35], OREA [40], and KTA2 [36]. Among these SAEAs, ParEGO, K-RVEA, and KTA2 use regression-based surrogates, CSEA uses a classification-based surrogate, and OREA employs an ordinal-regression-based surrogate.

We implemented the experience-based SAEA framework, MOEA/D-EGO, ParEGO, and OREA, while the code of K-RVEA, CSEA, and KTA2 is available on PlatEMO [41], an open-source Matlab platform for evolutionary multi-objective optimization. To make a fair comparison, all compared algorithms share the same initial dataset S_{*} in an independent run. We also set \bm{\theta}\in[10^{-5},100]^{d} and \textbf{p}=\textbf{2} for all GP surrogates as suggested in [36]; these GP surrogates are implemented through DACE [27]. Other configurations are the same as suggested in their original literature.

Experimental setups:
The parameter setups for this multi-objective optimization experiment are listed in Table III.

TABLE III: Parameter setups for DTLZ optimization.
Parameter MOEA/D-EGO(EB) Comparisons
Number of related tasks N 20000 (N_{m} in Table I) -
Size of datasets from related tasks |D_{i}| 20 (2d) -
Size of datasets for meta-learning |D_{m}| |D_{i}| -
Evaluations for initialization 10 (1d) 100 (10d)
Evaluations for further optimization 50 50
Total evaluations 60 150

In the meta-learning procedure, we assume that plenty of DTLZ variants are available, thus N=N_{m}=20000, and each DTLZ variant T_{i} provides |D_{i}|=|D_{m}|=20 samples for learning experiences. During the optimization process, an initial dataset S_{*} is sampled using the Latin hypercube sampling (LHS) method [42], then extra evaluations are conducted until the evaluation budget has run out. Please note that our purpose is to use related tasks to save 9d evaluations without a loss of SAEA optimization performance. Hence, the total evaluation budgets for MOEA/D-EGO(EB) and the compared algorithms are different.

Since the testing functions have 3 objectives, we employ inverted generational distance plus (IGD+) [43] as our performance indicator, where smaller IGD+ values indicate better optimization results. 5000 reference points are generated for computing IGD+ values, as suggested in [35].

Results and analysis:
The statistical test results are reported in Table IV.

TABLE IV: Average IGD+ values and standard deviation (in brackets) obtained from 30 runs on DTLZ functions. MOEA/D-EGO(EB) and comparison algorithms initialize their surrogates with 10, 100 samples, respectively. Extra 50 evaluations are allowed during the optimization process. The symbols ’++’, ’\approx’, ’-’ denote the win/tie/loss result of Wilcoxon rank sum test (significance level is 0.05) between MOEA/D-EGO(EB) and comparison algorithms, respectively. The last row counts the total win/tie/loss results.
Problem MOEA/D-EGO MOEA/D-EGO(EB) ParEGO K-RVEA KTA2 CSEA OREA
DTLZ1 1.07e+2(2.05e+1)++ 9.70e+1(1.87e+1) 7.82e+1(1.54e+1)- 1.18e+2(2.45e+1)++ 1.01e+2(2.38e+1)\approx 1.10e+2(2.50e+1)++ 1.02e+2(1.97e+1)\approx
DTLZ2 2.99e-1(7.01e-2)++ 1.43e-1(2.29e-2) 3.17e-1(4.12e-2)++ 2.69e-1(5.97e-2)++ 2.14e-1(3.84e-2)++ 2.98e-1(5.25e-2)++ 1.76e-1(4.69e-2)++
DTLZ3 3.15e+2 (6.04e+1)++ 1.97e+2 (1.64e+1) 2.30e+2 (5.99e+1)\approx 3.24e+2 (5.90e+1)++ 2.67e+2 (6.70e+1)++ 2.82e+2(6.97e+1)++ 2.72e+2(6.88e+1)++
DTLZ4 5.04e-1(8.25e-2)\approx 4.44e-1(1.35e-1) 5.44e-1(7.58e-2)++ 4.57e-1(1.14e-1)\approx 4.51e-1(9.54e-2)\approx 4.75e-1(1.09e-1)\approx 3.18e-1(1.54e-1)-
DTLZ5 2.39e-1(7.17e-2)++ 1.13e-1(2.24e-2) 2.58e-1(3.68e-2)++ 1.92e-1 (5.97e-2)++ 1.44e-1(4.60e-2)++ 2.14e-1(4.05e-2)++ 7.84e-2(2.42e-2)-
DTLZ6 1.29e+0(4.74e-1)\approx 1.11e+0(5.71e-1) 1.67e+0(6.77e-1)++ 4.62e+0(6.42e-1)++ 3.37e+0(6.71e-1)++ 6.26e+0(3.40e-1)++ 4.60e+0(1.19e+0)++
DTLZ7 3.31e-1(3.11e-1)- 2.47e+0(1.89e+0) 3.66e-1(1.31e-1)- 1.74e-1(3.57e-2)- 4.34e-1(2.20e-1)- 4.17e+0(1.13e+0)++ 2.14e+0(1.15e+0)\approx
win/tie/loss 4/2/1 -/-/- 4/1/2 5/1/1 4/2/1 6/1/0 3/2/2

It can be seen from Table IV that, although 90 fewer evaluations are used in surrogate initialization, MOEA/D-EGO(EB) can still achieve competitive or even smaller IGD+ values than MOEA/D-EGO on all DTLZ functions except DTLZ7. Fig. LABEL:supp-fig:_DTLZ150 (in the supplementary material) also shows that the IGD+ values obtained by MOEA/D-EGO(EB) drop rapidly, especially during the first few evaluations, implying that the experiences learned from the DTLZ variants are effective. Therefore, in most situations, our experience-based SAEA framework is able to assist MOEA/D-EGO in reaching competitive or even better optimization results, with the number of evaluations used for surrogate initialization reduced from 10d to only 1d.

MOEA/D-EGO(EB) is less effective on DTLZ7 than on the other DTLZ functions, which might be attributed to the discontinuity of the Pareto front of DTLZ7. Note that MOEA/D-EGO(EB) learns experiences from small datasets such as D_{m} and S_{*}. The solutions in these small datasets are sampled randomly; hence, the probability of optimal solutions being sampled is small. However, it is difficult to learn the discontinuity of the Pareto front from sampled non-optimal solutions. As a result, the knowledge that 'there are four discrete optimal regions' cannot be learned from such small datasets (|D_{m}|=20) collected from related tasks.

The experiences learned from related tasks make MOEA/D-EGO more competitive when compared to other SAEAs. The use of MDKL surrogates results in significantly smaller IGD+ values on DTLZ1, DTLZ2, DTLZ3, and DTLZ5 than before. As a result, MOEA/D-EGO(EB) achieves the smallest IGD+ values on DTLZ2 and DTLZ3, and its optimization results on DTLZ1 and DTLZ5 are much closer to the best optimization results (e.g., the results obtained by ParEGO and OREA) than those of MOEA/D-EGO. Although MOEA/D-EGO(EB) does not achieve the smallest IGD+ values on all DTLZ functions, it is still the best algorithm among the compared SAEAs in terms of overall performance. Table IV shows that no compared SAEA outperforms MOEA/D-EGO(EB) on three DTLZ functions, whereas MOEA/D-EGO(EB) outperforms every compared SAEA on at least three DTLZ functions. Furthermore, the IGD+ values of MOEA/D-EGO(EB) are achieved with an evaluation budget of 60, while the IGD+ values of the other SAEAs are reached at a cost of 150 evaluations (see Table III).

More performance comparison experiments:
We also compared our MOEA/D-EGO(EB) with the best two compared algorithms in Table IV, ParEGO and OREA, using the same evaluation budget. The results reported in Table SLABEL:supp-tab:_DTLZadditional (see the supplementary material) show that our experience-based SAEA framework is more effective than the compared algorithms when the same evaluation budget is used. Besides, the performance of our SAEA framework in the context of extremely expensive optimization is investigated in the supplementary material (see Table SLABEL:supp-tab:_DTLZ90).

The question raised at the beginning of this subsection can be answered by the results discussed so far. Due to the integration of experiences learned from related tasks (DTLZ variants), although the evaluation cost of surrogate initialization has been reduced from 10d to 1d, our experience-based SAEA framework is still capable of assisting regression-based SAEAs to achieve competitive or even better optimization results in most situations.

IV-C Influence of Task Similarity

In real-world applications, it might be unrealistic to assume that some related tasks are very similar to the target task. A more common situation is that all related tasks have limited similarity to the target task. To investigate the relationship between task similarity and SAEA optimization performance, we also tested the performance in an ‘out-of-range’ situation, where the original DTLZ is excluded from the range of DTLZ variants during the MDKL meta-learning procedure. As a result, only the DTLZ variants that are quite different from the original DTLZ function can be used to learn experiences. The ‘out-of-range’ situation eliminates the probability that MDKL surrogates benefit greatly from the DTLZ variants that are very similar to the original DTLZ function. Detailed definitions of the related tasks used in the ‘out-of-range’ situation are given in the supplementary material. Apart from the related tasks used, the remaining experimental setups are the same as the setups described in Section IV-B. For the sake of convenience, we denote the situation we tested in Section IV-B as ‘in-range’ below.

The statistical test results reported in Table V

TABLE V: Average IGD+ values and standard deviation (in brackets) obtained from 30 runs on DTLZ functions. Both MOEA/D-EGO(EB)s initialize their surrogates with 10 samples, extra 50 evaluations are allowed in the further optimization. The symbols ’++’, ’\approx’, ’-’ denote the win/tie/loss result of Wilcoxon rank sum test (significance level is 0.05) between the ‘out-of-range’ and the ‘in-range’ situations, respectively. The last two rows count the statistical test results between MOEA/D-EGO(EB)s and other compared algorithms.
MOEA/D-EGO(EB)s In-range Out-of-range
DTLZ1 9.70e+1(1.87e+1)\approx 9.11e+1(1.53e+1)
DTLZ2 1.43e-1(2.29e-2)\approx 1.41e-1(1.75e-2)
DTLZ3 1.97e+2 (1.64e+1)\approx 1.98e+1(1.51e+1)
DTLZ4 4.44e-1(1.35e-1)\approx 4.96e-1(8.63e-2)
DTLZ5 1.13e-1(2.24e-2)\approx 1.03e-1(2.39e-2)
DTLZ6 1.11e+0(5.71e-1)\approx 1.17e+0(6.88e-1)
DTLZ7 2.47e+0(1.89e+0)\approx 2.86e+0(1.87e+0)
win/tie/loss 0/7/0 -/-/-
vs MOEA/D-EGO 4/2/1 4/2/1
vs 6 Comparisons 26/9/7 27/7/8

show that the ‘out-of-range’ situation achieves IGD+ values competitive with the ‘in-range’ situation on all 7 test instances. This suggests that related tasks that are very similar to the target task have a limited impact on the optimization performance of our experience-based SAEA framework. Useful experiences can be learned from related tasks that are not very similar to the target task. Crucially, when comparing the performance of the ‘out-of-range’ situation with that of MOEA/D-EGO, we can still observe competitive or improved optimization results on 6 DTLZ functions (see Table V, the row titled 'vs MOEA/D-EGO', or Fig. LABEL:supp-fig:_DTLZ150 in the supplementary material). Moreover, it can be seen from the last row of Table V that the ‘out-of-range’ situation achieves better/competitive/worse IGD+ values than all compared SAEAs on 27/7/8 test instances. In comparison, the corresponding statistical test results for the ‘in-range’ situation are 26/9/7. The difference between these statistical test results is not significant.

A study on the ‘out-of-range’ situation in the context of extremely expensive multi-objective optimization is presented in Section LABEL:supp-sec:_exp-out90 of the supplementary material. Consistent results can be observed from Table SLABEL:supp-tab:_DTLZ40-out and Fig. LABEL:supp-fig:_DTLZ90.

Consequently, related tasks that are very similar to the target task are not essential to the optimization performance of our experience-based SAEA framework. In the ‘out-of-range’ situation, our MOEA/D-EGO(EB) can still achieve competitive or better optimization results than MOEA/D-EGO while using only 1d samples for surrogate initialization.

IV-D Influence of the Size of Datasets Used in Meta-Learning

We also investigated the performance of our experience-based SAEA framework when datasets of different sizes |D_m| are used in the meta-learning procedure. The experimental setups are the same as those of MOEA/D-EGO(EB) in Section IV-B, except for |D_m|.
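As a reminder of how |D_m| enters the meta-learning procedure, the sketch below shows one simple way to collect |D_m| samples from a related task and split them evenly into a support set and a query set, as done in meta-learning episodes. The even split and the helper names are illustrative assumptions for this sketch rather than the exact MDKL implementation.

```python
import numpy as np

def build_meta_task(related_task, d, size_dm, rng):
    """Collect |D_m| samples from one related task and split them into a
    support set (used to adapt the surrogate) and a query set (used to
    evaluate the adapted surrogate during meta-training)."""
    X = rng.uniform(0.0, 1.0, size=(size_dm, d))    # decision vectors in [0, 1]^d
    y = np.array([related_task(x) for x in X])      # cheap evaluations on the related task
    half = size_dm // 2
    return (X[:half], y[:half]), (X[half:], y[half:])   # (support set), (query set)

# Example: |D_m| = 60 samples from a 10-dimensional toy related task
rng = np.random.default_rng(0)
toy_task = lambda x: float(np.sum(x ** 2))
support, query = build_meta_task(toy_task, d=10, size_dm=60, rng=rng)
```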

TABLE VI: Average IGD+ values and standard deviations (in brackets) obtained from 30 runs on DTLZ functions. 10 samples are used for initialization and an extra 50 evaluations are allowed in the further optimization. |D_m| is the size of the dataset collected from each related task. The symbols ‘+’, ‘≈’, ‘-’ denote the win/tie/loss results of the Wilcoxon rank sum test (significance level 0.05) between |D_m|=60 and |D_m|=20, respectively. The last row counts the total win/tie/loss results.
                     In-range                                      Out-of-range
Problem        |D_m|=20                |D_m|=60             |D_m|=20                |D_m|=60
DTLZ1          9.70e+1 (1.87e+1) ≈     9.77e+1 (1.73e+1)    9.11e+1 (1.53e+1) ≈     9.93e+1 (1.87e+1)
DTLZ2          1.43e-1 (2.29e-2) +     1.24e-1 (2.11e-2)    1.41e-1 (1.75e-2) +     1.29e-1 (2.36e-2)
DTLZ3          1.97e+2 (1.64e+1) ≈     1.98e+2 (2.21e+1)    1.98e+1 (1.51e+1) ≈     1.93e+2 (1.19e+1)
DTLZ4          4.44e-1 (1.35e-1) ≈     5.17e-1 (5.68e-2)    4.96e-1 (8.63e-2) ≈     5.17e-1 (5.38e-2)
DTLZ5          1.13e-1 (2.24e-2) +     9.96e-2 (2.18e-2)    1.03e-1 (2.39e-2) ≈     1.05e-1 (2.73e-2)
DTLZ6          1.11e+0 (5.71e-1) ≈     1.04e+0 (6.06e-1)    1.17e+0 (6.88e-1) ≈     1.22e+0 (6.41e-1)
DTLZ7          2.47e+0 (1.89e+0) +     7.49e-1 (2.61e-1)    2.86e+0 (1.87e+0) +     6.96e-1 (2.41e-1)
win/tie/loss   3/4/0                   -/-/-                2/5/0                   -/-/-

It is evident from Table VI that when each DTLZ variant provides |D_m|=60 samples for the meta-learning of MDKL surrogates, the performance of both MOEA/D-EGO(EB)s is improved on 2 or 3 DTLZ functions. In particular, a significant improvement can be observed in the optimization results on DTLZ7. As discussed in Section IV-B, the poor performance of our experience-based optimization on DTLZ7 is caused by the small |D_m|: optimal solutions have little chance of being included in a small D_m, so D_m fails to provide experiences about the discontinuity of the optimal regions. In comparison, the experience of ‘optimal regions’ can be learned from large datasets D_m, and thus the optimization results are improved significantly.

In conclusion, for our experience-based SAEA framework, a larger |D_m| in the meta-learning procedure means that more useful experiences can be learned from related tasks, which further improves the performance of experience-based optimization. Therefore, when applying our experience-based SAEA framework to real-world optimization problems, it is preferable to collect more data from related tasks for experience learning.

IV-E Performance on Expensive Constrained Optimization (Engine Calibration Problem)

The experiments on multi-objective benchmark functions have examined the performance of our SAEA framework in depth. In this subsection, we use the proposed framework to solve a real-world gasoline engine calibration problem, which is an expensive constrained single-objective optimization problem. This experiment covers the optimization scenarios of single-objective optimization, constrained optimization, and real-world applications, and thus serves as an example demonstrating the generality and broad applicability of our proposed framework.

The calibration problem has 6 adjustable engine parameters, namely the throttle angle, waste gate orifice, ignition timing, valve timings, start of injection, and air-fuel ratio. The calibration aims to minimize the brake-specific fuel consumption (BSFC) while simultaneously satisfying 4 constraints on temperature, pressure, CA50, and load [33].
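For readers who prefer code, the sketch below frames the calibration task as a generic constrained black-box problem. The function names and the generic g_i(x) <= 0 constraint form are illustrative placeholders for the proprietary engine evaluation and are not part of the original problem formulation in [33].

```python
import numpy as np

D = 6  # throttle angle, waste gate orifice, ignition timing,
       # valve timings, start of injection, air-fuel ratio

def evaluate_engine(x: np.ndarray):
    """Placeholder for one expensive engine evaluation.

    Assumed to return the BSFC objective value and four constraint
    violations (temperature, pressure, CA50, load), written here in the
    generic form g_i(x) <= 0; the true responses come from the engine
    test facility or a high-fidelity simulator and take hours per call.
    """
    raise NotImplementedError

def is_feasible(constraint_values, tol=0.0):
    """A solution is feasible if every constraint violation is non-positive."""
    return all(g <= tol for g in constraint_values)
```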

Comparison algorithms:
Since the algorithms compared in the DTLZ optimization experiments are not designed to handle constrained single-objective optimization, our comparison is conducted with two state-of-the-art constrained optimization algorithms used in industry [33]: a variant of EGO designed to handle constrained optimization problems (denoted cons_EGO), and a GA customized for this calibration problem (denoted adaptiveGA). The settings of the compared algorithms are the same as suggested in [33]. In this experiment, we apply our SAEA framework to cons_EGO and investigate its optimization performance: the GP surrogates in cons_EGO are replaced by our MDKL surrogates, and the resulting algorithm is denoted cons_EGO(EB), as sketched below.
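Conceptually, constructing cons_EGO(EB) amounts to handing the same optimization loop a different surrogate class. The sketch below shows a generic surrogate interface (fit plus predictive mean and variance) that such a swap would rely on; the class and factory names are hypothetical, and this is a sketch of the idea rather than the actual cons_EGO implementation.

```python
from typing import Protocol
import numpy as np

class Surrogate(Protocol):
    def fit(self, X: np.ndarray, y: np.ndarray) -> None: ...
    def predict(self, X: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Return predictive mean and variance, as required by EGO-style
        acquisition functions such as expected improvement."""
        ...

def make_cons_ego(surrogate_factory, n_init):
    """Assemble a constrained EGO optimizer from a surrogate factory.

    Hypothetical usage:
        cons_EGO      -> make_cons_ego(GPSurrogate,   n_init=40)
        cons_EGO(EB)  -> make_cons_ego(MDKLSurrogate, n_init=6)  # meta-learned prior
    """
    ...
```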

Experimental setups:
The setup of the related tasks (N, D_i) is the same as described in Section IV-A. In the meta-learning procedure, both the support set and the query set contain 6 data points; thus |D_m| = 12. The total evaluation budget for all algorithms is set to 60. For adaptiveGA, all evaluations are used in the optimization process as it is not an SAEA. For cons_EGO, 40 samples are used to initialize surrogates and 20 extra evaluations are used in the optimization process. For cons_EGO(EB), only 6 samples are used to initialize the MDKL surrogates, and the remaining evaluations are used for further optimization; the budget split is summarized below.
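The evaluation budget split described above can be restated as a small configuration sketch; the numbers simply repeat those given in the text.

```python
TOTAL_BUDGET = 60  # expensive engine evaluations per run

BUDGET_SPLIT = {
    # algorithm:     (initialization samples, evaluations during optimization)
    "adaptiveGA":    (0, 60),   # no surrogate, all evaluations spent in the GA
    "cons_EGO":      (40, 20),  # 40 samples to fit GP surrogates
    "cons_EGO(EB)":  (6, 54),   # 1d samples to adapt meta-learned MDKL surrogates
}

assert all(init + opt == TOTAL_BUDGET for init, opt in BUDGET_SPLIT.values())
```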

Optimization results and analysis:
The average normalized BSFC results and the average number of feasible solutions found are plotted against the number of evaluations used in Figs. 3a and 3c, respectively. The statistical results of the three compared algorithms are illustrated in Figs. 3b and 3d.

Figure 3: Results of 30 runs on the engine calibration problem; all BSFC values are normalized. The evaluation budget is set to 60, including 40 and 6 samples used to initialize surrogates for cons_EGO and its experience-based variant, respectively. Figs. (a) and (c) show how BSFC and the number of feasible solutions vary with the number of evaluations, respectively. The star markers highlight the results achieved when 20 evaluations have been used in the optimization process. Figs. (b) and (d) illustrate the statistical results of BSFC and the number of feasible solutions when the evaluation budget has run out. Mean values are shown at the top; the results in brackets are achieved at the star markers of Figs. (a) and (c).

From Fig. 3a, it can be observed that the minimal BSFC obtained by cons_EGO(EB) decreases drastically in the first few evaluations, implying that the experiences learned from related tasks are effective. In comparison, the minimal BSFC obtained by adaptiveGA and cons_EGO drops at a relatively slower rate, even though cons_EGO has used 34 more samples to initialize its surrogates. The star marker denotes the point at which cons_EGO(EB) has evaluated 20 samples after surrogate initialization. It is worth noting that when 20 samples have been evaluated in the optimization, cons_EGO(EB) already achieves a smaller BSFC value than cons_EGO. After the star marker, the decrease in BSFC becomes slower as cons_EGO(EB) has reached the optimal region; therefore, the further improvement in the normalized BSFC value is small and thus hard to observe. The advantages of our experience-based SAEA framework can also be observed in constraint handling: in Figs. 3c and 3d, cons_EGO(EB) finds more feasible solutions than the two compared algorithms. These results indicate that our SAEA framework improves the performance of cons_EGO on both the objective function and the constraint functions, while only 1d evaluations (6 samples) are used to initialize the surrogates.

Discussion on runtime:
It should be noted that real engine performance evaluations on engine facilities are very costly in terms of both time and financial budget [11]. Since a single real engine performance evaluation can take several hours [44, 11], the time cost of the meta-learning procedure, which takes only a few minutes, is negligible. The savings from reduced engine evaluations on test facilities and the shortened development cycle enabled by our SAEA framework could amount to millions of dollars [11]. Our SAEA framework is therefore an effective and efficient method for solving this real-world calibration problem.

V Conclusion and further work

Experienced human engineers are better at tackling new and unseen optimization problems than novices, and there has been an ongoing effort in evolutionary optimization to capture and then use such experiences [1, 2, 3]. In this paper, we present an experience-based SAEA framework to solve expensive optimization problems. To learn experiences from related tasks, a novel meta-learning modeling method, namely MDKL, has been developed. Our MDKL model learns the domain-specific features of a set of related tasks from a number of small datasets. The learned experiences are then combined with the very limited samples collected from the target optimization problem, which improves modeling efficiency and approximation accuracy. The effectiveness of experience learning has been demonstrated by comparing our MDKL modeling method with other meta-learning and non-meta-learning modeling methods. Our experience-based SAEA framework uses MDKL models as surrogates, and an MSE-based update criterion is proposed to manage the MDKL surrogates during evolutionary optimization. The framework is applicable to any regression-based SAEA by replacing its original surrogates with our MDKL surrogates. Our computational studies have demonstrated the effectiveness of the framework on multi-objective optimization and constrained single-objective optimization (a real-world engine application). On most test functions, better or comparable optimization results are achieved when only 1d samples are used to initialize our surrogates. A detailed summary of our experiments is presented in the supplementary material.

There are still many open research questions. For example, the precise definition of experience remains unclear: different works so far seem to have captured different aspects of experience, and a more precise and comprehensive definition of optimization experience is needed. It is also a challenging task to represent such experience formally. Regarding the SAEA work in this paper, there are two specific future research directions. First, we do not have a mathematical definition of related tasks. Although our computational studies have demonstrated that experiences learned from a set of related tasks that differ from the target task are beneficial to the optimization, we cannot guarantee the optimization performance if we further decrease the similarity between the related tasks and the target task. It would be very interesting to study similarity measures between tasks in the context of our experience-based SAEA framework; in fact, this is a general research topic relevant to other experience-based optimization methods as well, such as transfer optimization [9, 10]. Second, the proposed framework currently applies to regression-based SAEAs only. As the DTLZ optimization experiments show, classification-based SAEAs and ordinal-regression-based SAEAs can sometimes be more effective. These SAEAs learn surrogates from user-assigned class labels or ordinal relationships instead of original fitness values, and meta-learning from such user-assigned values is an interesting direction for future work.

References

  • [1] S. Liu, K. Tang, and X. Yao, “Experience-based optimization: A coevolutionary approach,” arXiv preprint arXiv:1703.09865, 2017.
  • [2] K. Tang, S. Liu, P. Yang, and X. Yao, “Few-shots parallel algorithm portfolio construction via co-evolution,” IEEE Transactions on Evolutionary Computation, vol. 25, no. 3, pp. 595–607, 2021.
  • [3] K. C. Tan, L. Feng, and M. Jiang, “Evolutionary transfer optimization-a new frontier in evolutionary computation research,” IEEE Computational Intelligence Magazine, vol. 16, no. 1, pp. 22–33, 2021.
  • [4] M. Jiang, Z. Wang, L. Qiu, S. Guo, X. Gao, and K. C. Tan, “A fast dynamic evolutionary multiobjective algorithm via manifold transfer learning,” IEEE Transactions on Cybernetics, vol. 51, no. 7, pp. 3417–3428, 2020.
  • [5] M. Jiang, Z. Wang, S. Guo, X. Gao, and K. C. Tan, “Individual-based transfer learning for dynamic multiobjective optimization,” IEEE Transactions on Cybernetics, vol. 51, no. 10, pp. 4968–4981, 2020.
  • [6] T. Wei, S. Wang, J. Zhong, D. Liu, and J. Zhang, “A review on evolutionary multi-task optimization: Trends and challenges,” IEEE Transactions on Evolutionary Computation, vol. 26, no. 5, pp. 941–960, 2021.
  • [7] K. K. Bali, Y.-S. Ong, A. Gupta, and P. S. Tan, “Multifactorial evolutionary algorithm with online transfer parameter estimation: MFEA-II,” IEEE Transactions on Evolutionary Computation, vol. 24, no. 1, pp. 69–83, 2019.
  • [8] X. Xue, K. Zhang, K. C. Tan, L. Feng, J. Wang, G. Chen, X. Zhao, L. Zhang, and J. Yao, “Affine transformation-enhanced multifactorial optimization for heterogeneous problems,” IEEE Transactions on Cybernetics, pp. 1–15, 2020.
  • [9] G. Ruan, L. L. Minku, S. Menzel, B. Sendhoff, and X. Yao, “When and how to transfer knowledge in dynamic multi-objective optimization,” in Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI’19), 2019, pp. 2034–2041.
  • [10] ——, “Computational study on effectiveness of knowledge transfer in dynamic multi-objective optimization,” in Proceedings of the 22nd IEEE Congress on Evolutionary Computation (CEC’20), 2020, pp. 1–8.
  • [11] X. Yu, L. Zhu, Y. Wang, D. Filev, and X. Yao, “Internal combustion engine calibration using optimization algorithms,” Applied Energy, vol. 305, p. 117894, 2022.
  • [12] W.-Y. Chen, Y.-C. Liu, Z. Kira, Y.-C. F. Wang, and J.-B. Huang, “A closer look at few-shot classification,” in Proceedings of the 7th International Conference on Learning Representations (ICLR’19), 2019.
  • [13] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a few examples: A survey on few-shot learning,” ACM Computing Surveys, vol. 53, no. 3, pp. 1–34, 2020.
  • [14] T. M. Hospedales, A. Antoniou, P. Micaelli, and A. J. Storkey, “Meta-learning in neural networks: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
  • [15] A. Gupta, Y.-S. Ong, and L. Feng, “Insights on transfer optimization: Because experience is the best teacher,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 51–64, 2017.
  • [16] J. Ding, C. Yang, Y. Jin, and T. Chai, “Generalized multitasking for evolutionary optimization of expensive problems,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 1, pp. 44–58, 2017.
  • [17] R.-T. Liaw and C.-K. Ting, “Evolutionary manytasking optimization based on symbiosis in biocoenosis,” in Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI’19), 2019, pp. 4295–4303.
  • [18] L. Zhang, Y. Xie, J. Chen, L. Feng, C. Chen, and K. Liu, “A study on multiform multi-objective evolutionary optimization,” Memetic Computing, vol. 13, no. 3, pp. 307–318, 2021.
  • [19] Z. Guo, H. Liu, Y.-S. Ong, X. Qu, Y. Zhang, and J. Zheng, “Generative multiform bayesian optimization,” IEEE Transactions on Cybernetics (Early access), 2022.
  • [20] M. Volpp, L. P. Fröhlich, K. Fischer, A. Doerr, S. Falkner, F. Hutter, and C. Daniel, “Meta-learning acquisition functions for transfer learning in bayesian optimization,” in Proceedings of the 8th International Conference on Learning Representations (ICLR’20), 2020.
  • [21] F. Zhuang, Z. Qi, K. Duan, D. Xi, Y. Zhu, H. Zhu, H. Xiong, and Q. He, “A comprehensive survey on transfer learning,” Proceedings of the IEEE, vol. 109, no. 1, pp. 43–76, 2020.
  • [22] M. Wistuba and J. Grabocka, “Few-shot bayesian optimization with deep kernel surrogates,” in Proceedings of the 9th International Conference on Learning Representations (ICLR’21), 2021.
  • [23] M. Patacchiola, J. Turner, E. J. Crowley, M. O’Boyle, and A. Storkey, “Bayesian meta-learning for the few-shot setting via deep kernels,” in Advances in Neural Information Processing Systems 33 (NeurIPS’20), 2020.
  • [24] A. G. Wilson, Z. Hu, R. Salakhutdinov, and E. P. Xing, “Deep kernel learning,” in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS’16), 2016, pp. 370–378.
  • [25] C. K. Williams and C. E. Rasmussen, Gaussian Processes for Machine Learning.   Cambridge, MA: MIT press, 2006.
  • [26] M. L. Stein, Interpolation of Spatial Data: Some Theory for Kriging.   New York, NY: Springer Science & Business Media, 1999.
  • [27] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn, “Design and analysis of computer experiments,” Statistical Science, vol. 4, no. 4, pp. 409–423, 1989.
  • [28] P. Tossou, B. Dura, F. Laviolette, M. Marchand, and A. Lacoste, “Adaptive deep kernel learning,” arXiv preprint arXiv:1905.12131, 2019.
  • [29] M. Garnelo, J. Schwarz, D. Rosenbaum, F. Viola, D. J. Rezende, S. Eslami, and Y. W. Teh, “Neural processes,” arXiv preprint arXiv:1807.01622, 2018.
  • [30] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive black-box functions,” Journal of Global Optimization, vol. 13, no. 4, pp. 455–492, 1998.
  • [31] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proceedings of the 34th International Conference on Machine Learning (ICML’17), 2017, pp. 1126–1135.
  • [32] J. Harrison, A. Sharma, and M. Pavone, “Meta-learning priors for efficient online bayesian regression,” in Proceedings of the 13th Workshop on the Algorithmic Foundations of Robotics (WAFR’18), 2018, pp. 318–337.
  • [33] L. Zhu, Y. Wang, A. Pal, and G. Zhu, “Engine calibration using global optimization methods with customization,” SAE Technical Paper, Tech. Rep. 2020-01-0270, 2020.
  • [34] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler, “Scalable test problems for evolutionary multiobjective optimization,” in Evolutionary Multiobjective Optimization.   London, U.K.: Springer, 2005, pp. 105–145.
  • [35] L. Pan, C. He, Y. Tian, H. Wang, X. Zhang, and Y. Jin, “A classification-based surrogate-assisted evolutionary algorithm for expensive many-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 1, pp. 74–88, 2018.
  • [36] Z. Song, H. Wang, C. He, and Y. Jin, “A kriging-assisted two-archive evolutionary algorithm for expensive many-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. 25, no. 6, pp. 1013–1027, 2021.
  • [37] Q. Zhang, W. Liu, E. Tsang, and B. Virginas, “Expensive multiobjective optimization by MOEA/D with gaussian process model,” IEEE Transactions on Evolutionary Computation, vol. 14, no. 3, pp. 456–474, 2010.
  • [38] J. Knowles, “ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 1, pp. 50–66, 2006.
  • [39] T. Chugh, Y. Jin, K. Miettinen, J. Hakanen, and K. Sindhya, “A surrogate-assisted reference vector guided evolutionary algorithm for computationally expensive many-objective optimization,” IEEE Transactions on Evolutionary Computation, vol. 22, no. 1, pp. 129–142, 2016.
  • [40] X. Yu, X. Yao, Y. Wang, L. Zhu, and D. Filev, “Domination-based ordinal regression for expensive multi-objective optimization,” in Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI’19), 2019, pp. 2058–2065.
  • [41] Y. Tian, R. Cheng, X. Zhang, and Y. Jin, “PlatEMO: A MATLAB platform for evolutionary multi-objective optimization [educational forum],” IEEE Computational Intelligence Magazine, vol. 12, no. 4, pp. 73–87, 2017.
  • [42] M. D. McKay, R. J. Beckman, and W. J. Conover, “A comparison of three methods for selecting values of input variables in the analysis of output from a computer code,” Technometrics, vol. 42, no. 1, pp. 55–61, 2000.
  • [43] H. Ishibuchi, H. Masuda, Y. Tanigaki, and Y. Nojima, “Modified distance calculation in generational distance and inverted generational distance,” in Proceedings of the 8th International Conference on Evolutionary Multi-criterion Optimization (EMO’15), 2015, pp. 110–125.
  • [44] H. Ma, “Control oriented engine modeling and engine multi-objective optimal feedback control,” Ph.D. dissertation, University of Birmingham, 2013.