
Bayesian Optimization For Multi-Objective Mixed-Variable Problems

Haris Moazam Sheikh ([email protected]) and Philip S. Marcus ([email protected])
CFD Lab, Department of Mechanical Engineering, University of California, Berkeley, CA, USA

Abstract

Optimizing multiple, non-preferential objectives for mixed-variable, expensive black-box problems is important in many areas of engineering and science. The expensive, noisy, black-box nature of these problems makes them ideal candidates for Bayesian optimization (BO). Mixed-variable and multi-objective problems, however, are a challenge due to BO’s underlying smooth Gaussian process surrogate model. Current multi-objective BO algorithms cannot deal with mixed-variable problems. We present MixMOBO, the first mixed-variable, multi-objective Bayesian optimization framework for such problems. Using MixMOBO, optimal Pareto-fronts for multi-objective, mixed-variable design spaces can be found efficiently while ensuring diverse solutions. The method is sufficiently flexible to incorporate different kernels and acquisition functions, including those that were developed for mixed-variable or multi-objective problems by other authors. We also present HedgeMO, a modified Hedge strategy that uses a portfolio of acquisition functions for multi-objective problems. We present a new acquisition function, SMC. Our results show that MixMOBO performs well against other mixed-variable algorithms on synthetic problems. We apply MixMOBO to the real-world design of an architected material and show that our optimal design, which was experimentally fabricated and validated, has a normalized strain energy density $10^4$ times greater than existing structures.

keywords:
Bayesian Optimization, Mixed Variables, Multi Objective, MixMOBO, HedgeMO, Architected Meta-Materials

1 Introduction

Optimization is an inherent part of design for complex physical systems. Optimization problems are often posed as noisy black-box problems subject to constraints, where each function call requires an extremely expensive computation or a physical experiment. Many of these problems require optimizing a mixed-variable design space (combinatorial, discrete, and continuous) for multiple non-preferential objectives. Architected material design Frazier and Wang (2015); Chen et al. (2018a, 2019); Shaw et al. (2019); Song et al. (2019a); Vangelatos et al. (2021), hyper-parameter tuning for machine learning algorithms Snoek et al. (2012); Chen et al. (2018b); Oh et al. (2018), drug design Pyzer-Knapp (2018); Korovina et al. (2020), fluid machinery Sheikh et al. (2022a, b); Sheikh et al. (2021); doi:10.1177/0309524X17709732, and sensor placement Krause et al. (2008) pose such problems. Because each evaluation is so costly, Bayesian optimization is a natural candidate for their optimization.

Much research has gone into Bayesian optimization for continuous design spaces using Gaussian processes (GP) as a surrogate model and efficiently optimizing this design space with a minimum number of expensive function calls Mockus (1994); Rasmussen and Williams (2006); Brochu et al. (2010). Despite the success of continuous-variable Bayesian optimization strategies, multi-objective and mixed-variable problems remain an area of open research. The inherently continuous nature of the GP makes dealing with mixed-variable problems challenging. Finding a Pareto-front for multi-objective problems, and parallelizing function calls into batch ('Q-batch') updates, also remain challenges within the sequential setting of the BO algorithm. Hedge strategies, which use a portfolio of acquisition functions to reduce the effect of choosing a particular acquisition function, have not been formulated for multi-objective problems.

1.1 Mixed-Variable BO Algorithms:

We provide a brief description of the approaches used in recent studies for dealing with mixed variables.

One Hot Encoding Approach: Most BO schemes use Gaussian processes as surrogate models. When dealing with categorical variables, a common method is ‘one-hot encoding’ Golovin et al. (2017). Popular BO packages, such as GPyOpt and Spearmint Snoek et al. (2012), use this strategy. However, it can be inefficient when searching the parameter space because the surrogate model is continuous. For categorical variables, this approach also leads to a rapid explosion of the dimensionality of the search space Ru et al. (2020).

Multi-Armed Bandit (MAB) Approach: Some studies use the MAB approach when dealing with categorical variables where a surrogate surface for continuous variables is defined for each bandit arm. These strategies can be expensive in terms of the number of samples required Gopakumar et al. (2018); Nguyen et al. (2019), and they do not share information across categories. An interesting approach, where coupling is introduced between continuous and categorical variables, is presented in the CoCaBO algorithm Ru et al. (2020), and it is one of the baselines that we test MixMOBO against.

Latent Space Approach: A latent variable approach has also been proposed to model categorical variables Qian et al. (2008); Zhou et al. (2011); Zhang et al. (2020); Deshwal and Doppa (2021). This approach embeds each categorical variable in a latent variable space $\mathcal{Z}$. However, the embedding depends on the chosen kernel and can be inefficient in small-data settings.

Modified Kernel Approach: There is a rich collection of studies in which the underlying kernel is modified to work with ordinal or categorical variables. For example, Ru et al. (2020) considers the sum + product kernel; Deshwal et al. (2021) proposes hybrid diffusion kernels, HyBO; and Oh et al. (2021) proposes frequency modulated kernels. The BOCS algorithm Baptista and Poloczek (2018) for categorical variables uses a scalable modified acquisition function. Pelamatti et al. (2018); Oh et al. (2019); Nguyen et al. (2019); Garrido-Merchán and Hernández-Lobato (2020) all use modified kernels to adapt the underlying surrogate surface. Our approach is unique in that any modified kernel can be incorporated into our framework. Currently we use the modified radial basis function (RBF) kernel for modelling the surrogate surface, with our future research focused on using different kernels in our framework.

Other Surrogate Models: Other surrogate models can be used in place of the GP to model mixed-variable problems, such as random forests, used by SMAC3 Lindauer et al. (2021), or tree-based estimators, used in the Tree-Parzen Estimator (TPE) Bergstra et al. (2013). Daxberger et al. (2020) considers a linear model with cross-product features. BORE Tiao et al. (2021) leverages the connection to density ratio estimation.

1.2 Multi-Objective BO Algorithms:

Multi-objective Bayesian optimization (MOBO) has been the subject of several recent studies. BoTorch Balandat et al. (2020), a popular BO framework, uses the EHVI and ParEGO acquisition functions based on the works of Fonseca et al. (2006) and Daulton et al. (2020, 2021). Hyper-volume improvement is the main mechanism used to ensure diversity in generations. The ‘Q-batch’ parallel settings of the above two acquisition functions use hyper-volume improvement and the previously selected points in the same batch to choose the next set of points. For most single-objective BO algorithms with parallel batch selection, the next batch of test points is selected by adding the ‘fantasy’ cost-function evaluation, usually the predicted mean, to the previously selected test point within that batch. However, this commonly used method often leads to overly confident test-point selection, and the surrogate surface then needs to be optimized, and sometimes refitted, Q times. Using a genetic algorithm (GA), we can select a ‘Q-batch’ of points with a single optimization of the surrogate surface from the GA generation.

Suzuki et al. (2020) provide an interesting Pareto-frontier entropy method as an acquisition function, and Shu et al. (2020) use Pareto-frontier heuristics to formulate new acquisition functions. Their approaches were not extended to mixed-variable problems.

1.3 Hedge Strategies

Hedge algorithms have proven to be efficient in dealing with a diverse set of problems by using a portfolio of acquisition functions. ‘GP-Hedge’, introduced by Brochu et al. (2011), is a well-known and efficient algorithm. However, current Hedge algorithms have not been extended to multi-objective problems and, to the authors’ knowledge, there is no existing Hedge strategy implementation that solves such problems.

2 MixMOBO

In this paper, we present a Mixed-variable, Multi-Objective Bayesian Optimization (MixMOBO) algorithm, the first generalized framework that can deal with mixed-variable, multi-objective problems in a small-data setting and can optimize a noisy black-box function with a small number of function calls.

Genetic algorithms, such as NSGA-II Deb et al. (2002), are well known for dealing with mixed-variable spaces and finding an optimal Pareto-frontier. However, these algorithms require a large number of black-box function calls and are not well suited to expensive small-data problems. Our approach is to use a GA to optimize the surrogate model itself and find a Pareto-frontier. Diversification is ensured by the distance metrics used while optimizing the surrogate model. This method allows cheap Q-batch samples from within the GA generation, and it also allows the use of common acquisition functions such as Expected Improvement (EI), Probability of Improvement (PI), and Upper Confidence Bound (UCB) Brochu et al. (2011), which work well for single-objective problems. We note here that other metrics can easily be incorporated instead of a distance metric within the GA setting, and doing so is one of the areas of our future work. Using a GA on a mixed-variable surrogate model in a multi-objective setting allows us to work with the modified kernels that were developed for mixed-variable problems in the literature. We also present a new acquisition function, ‘Stochastic Monte-Carlo’ (SMC), which performs well for multi-objective categorical-variable problems Vangelatos et al. (2021).

Hedge strategies for Bayesian optimization are efficient for single objective algorithms. We present here our Hedge Multi-Objective (HedgeMO) algorithm, which uses a portfolio of acquisition functions for multi-objective problems and can work with Q-batch updates. It is an extension of GP-Hedge Brochu et al. (2011), which has regret bounds, and the same bounds hold for HedgeMO.

We note here that MixMOBO is designed for mixed-variable, multi-objective problems. Although there are algorithms in the literature that can solve problems with a subset of these attributes (e.g. mixed-variable single-objective or multi-objective continuous variable problems), no algorithm, to our knowledge, can deal with all of these attributes. In addition, MixMOBO outputs a batch of query points and uses HedgeMO, the first multi-objective hedging strategy. To the authors’ knowledge, no existing approaches can achieve all this within a single framework.

In summary, the main contributions of our work are as follows:

  • We present Mixed-variable, Multi-Objective Bayesian Optimization (MixMOBO), the first algorithm that can deal with mixed-variable, multi-objective problems. The framework uses a GA to optimize the acquisition function on a surrogate surface, so it can use the modified kernels or surrogate surfaces developed in previous studies for mixed-variable problems. Because our framework is agnostic to the underlying GP kernel over mixed variables, those previous approaches, when adopted within our framework, are extended to handle mixed-variable, multi-objective problems as well.

  • A GA is used to optimize the surrogate models, which allows the optimization of multi-objective problems. Q-batch samples can be extracted in parallel from within a GA generation without sacrificing diversification.

  • We present a Hedge Multi-Objective (HedgeMO) strategy for multiple objectives for which regret bounds hold. We also present an acquisition function, Stochastic Monte-Carlo (SMC), which performs well for combinatorial multi-objective problems, and use it as part of our HedgeMO portfolio.

  • We benchmark our algorithm against other mixed-variable algorithms and show that MixMOBO performs well on test functions. We applied MixMOBO to a practical engineering problem: the design of a new architected meta-material that was optimized to have the maximum possible strain-energy density within the constraints of a design space. The fabrication and testing of this new material showed that it has a normalized strain energy density that is $10^4$ times greater than that of existing unblemished microlattice structures in the literature.

The rest of the paper is organized in the following manner: Section 3 defines the optimization problem to be solved with MixMOBO. The detailed workings of MixMOBO and HedgeMO are presented in Section 4. Section 5 details the validation tests performed on our framework to test its efficiency and compare it to existing algorithms. Our application of MixMOBO to the design of architected materials and its results are presented in Section 6.

3 MixMOBO Problem Statement

We pose the multi-objective and mixed-variable problem as:

$$\vec{w}_{opt}=\operatorname{argmax}_{\vec{w}\in\mathcal{W}}\,\vec{f}(\vec{w}) \qquad (1)$$

for maximizing the objectives. Here $\vec{f}(\vec{w})=[f_{1}(\vec{w}),f_{2}(\vec{w}),\ldots,f_{K}(\vec{w})]$ are the K non-preferential objectives to be maximized, and $\vec{w}$ is a mixed-variable vector, defined as $\{\vec{w}\in\mathcal{W}\}=\{\vec{x}\in\mathcal{X},\vec{y}\in\mathcal{Y},\vec{z}\in\mathcal{Z}\}$. $\vec{x}$ is an $m$-dimensional vector defined over a bounded set $\mathcal{X}\subset\mathbb{R}^{m}$ representing $m$ continuous variables. Ordinal and categorical variables are defined as $\vec{y}=[y_{1},\ldots,y_{n}]$ and $\vec{z}=[z_{1},\ldots,z_{o}]$, respectively. Each ordinal variable $y_{j}\in\{O_{1},\ldots,O_{j}\}$ takes one of $O_{j}$ ordinal ‘levels’ (discrete numbers on the real-number line), and each categorical variable $z_{j}\in\{C_{1},\ldots,C_{j}\}$ takes a value from $C_{j}$ unordered categories (that cannot, by definition, be ordered on the real-number line). $\mathcal{Y}$ and $\mathcal{Z}$ are the ordinal and combinatorial spaces, respectively.

Generally, $\{\vec{w}_{opt}\}$ is a set of Pareto-optimal solution vectors, i.e., vectors that are not Pareto-dominated by any other vector. A vector $\vec{w}$ is Pareto-dominated by $\vec{w}^{\prime}$ iff $f_{k}(\vec{w})\leq f_{k}(\vec{w}^{\prime})$ for all $k=1,\ldots,K$, with strict inequality for at least one $k$. This set $\{\vec{w}_{opt}\}$ is the optimal set found by MixMOBO, the details of which are presented in the following section.
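
As a concrete illustration of the dominance relation just defined, the short NumPy sketch below extracts the indices of the non-dominated (Pareto-optimal) objective vectors from a finite candidate set under maximization; the function names and example values are ours and purely illustrative, not part of the MixMOBO code.

```python
import numpy as np

def pareto_dominated(f, f_prime):
    """True if objective vector f is Pareto-dominated by f_prime (maximization)."""
    return np.all(f <= f_prime) and np.any(f < f_prime)

def pareto_set(F):
    """Return indices of the non-dominated rows of F, an (n, K) array of objectives."""
    n = F.shape[0]
    keep = []
    for i in range(n):
        if not any(pareto_dominated(F[i], F[j]) for j in range(n) if j != i):
            keep.append(i)
    return np.array(keep)

# Example: two objectives, five candidate designs
F = np.array([[1.0, 0.2], [0.8, 0.9], [0.5, 0.5], [1.0, 0.1], [0.2, 1.0]])
print(pareto_set(F))   # indices of the Pareto-optimal designs, here [0 1 4]
```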

4 Methodology

Preliminaries

Single-objective Bayesian optimization is a sequential optimization technique aimed at finding the global optimum of a single-objective noisy black-box function $f$ with a minimum number of evaluations of $f$. At every $i^{th}$ iteration, a surrogate model $g$ is fit over the existing data set $\mathcal{D}=\{(\vec{w}_{1},f(\vec{w}_{1})),\ldots,(\vec{w}_{i},f(\vec{w}_{i}))\}$. An acquisition function then determines the next point $\vec{w}_{i+1}$ for evaluation with $f$, balancing exploration and exploitation. The data set is augmented for the next iteration, $\mathcal{D}=\mathcal{D}\cup(\vec{w}_{i+1},f(\vec{w}_{i+1}))$, and the process is repeated until the evaluation budget for $f$ is exhausted or the global optimum is reached.

Gaussian processes are often used as surrogate models for BO Rasmussen and Williams (2006); Murphy (2012). A GP is a stochastic process such that any linear combination of a finite set of its random variables has a multivariate Gaussian distribution. A GP is uniquely specified by its mean $\mu(\vec{w})$ and covariance function $ker(\vec{w},\vec{w}^{\prime})$. The GP is a distribution over functions, and $g(\vec{w})$ is a function sampled from this GP:

$$g(\vec{w})\sim GP\bigl(\mu(\vec{w}),\,ker(\vec{w},\vec{w}^{\prime})\bigr). \qquad (2)$$

Here, $ker(\vec{w},\vec{w}^{\prime})$ is the covariance between input variables $\vec{w}$ and $\vec{w}^{\prime}$. Once a GP has been defined, at any $\vec{w}$ the GP returns the mean $\mu(\vec{w})$ and variance $\sigma(\vec{w})$. The acquisition function $\mathcal{A}(g)$ balances exploration and exploitation and is optimized to find the next optimal point $\vec{w}_{i+1}$. The success of BO comes from the fact that evaluating $g$ is much cheaper than evaluating $f$.
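
For readers unfamiliar with the loop just described, the following self-contained NumPy sketch runs a minimal single-objective, continuous-variable BO loop with a squared-exponential GP and a UCB acquisition optimized by random search over candidates. It is only a toy illustration of Eq. 2 and the surrounding sequential loop; the toy objective, hyper-parameter values, and function names are our own assumptions and not the MixMOBO implementation.

```python
import numpy as np

def sq_exp_kernel(A, B, h=0.3, eps_f=1.0):
    """Squared-exponential kernel ker(w, w') on continuous inputs."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return eps_f**2 * np.exp(-0.5 * d2 / h**2)

def gp_posterior(W, f, W_star, noise=1e-3):
    """Return the GP mean and standard deviation at the query points W_star."""
    K = sq_exp_kernel(W, W) + noise * np.eye(len(W))
    K_s = sq_exp_kernel(W, W_star)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))
    v = np.linalg.solve(L, K_s)
    mu = K_s.T @ alpha
    var = np.diag(sq_exp_kernel(W_star, W_star)) - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def black_box(w):                      # toy 1-D objective to be maximized
    return -np.sum((w - 0.3) ** 2, axis=-1)

rng = np.random.default_rng(0)
W = rng.uniform(0, 1, size=(5, 1))     # initial data set D
f = black_box(W)
for i in range(20):                    # sequential BO iterations
    cand = rng.uniform(0, 1, size=(500, 1))
    mu, sigma = gp_posterior(W, f, cand)
    ucb = mu + 2.0 * sigma             # acquisition A(g): exploration vs. exploitation
    w_next = cand[np.argmax(ucb)]
    W = np.vstack([W, w_next])
    f = np.append(f, black_box(w_next))
print("best w found:", W[np.argmax(f)])
```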

4.1 MixMOBO Approach

Our Mixed-variable Multi-Objective Bayesian Optimization (MixMOBO) algorithm extends the single-objective, continuous-variable BO approach presented in the preceding section to more general optimization problems; it is detailed in Algorithm 1.

A single noisy GP surrogate surface is fit for multiple objectives, $\vec{g}(\vec{w})\sim GP\bigl(\vec{\mu}(\vec{w}),\,ker(\vec{w},\vec{w}^{\prime})\bigr)$. Note that this differs from Eq. 2, since the GP predicts means for multiple objectives. For details on fitting a single GP to multi-objective data, we refer the reader to (Rasmussen and Williams, 2006, Eq. 2.25-2.26). For multiple objectives, the response matrix, with $n$ data points, is of size $K\times n$. The predicted variance remains the same, but the predicted mean is a $K\times 1$ vector. This is equivalent to fitting K GP surfaces with the same kernel for all of the surfaces, where K is the total number of objectives. All K objectives are assumed to have equal noise levels. Only one set of hyper-parameters needs to be fit over this single surface, rather than fitting K sets of hyper-parameters for K different surfaces; thus, when K is large, the overall computational cost of the algorithm is reduced. Note that we could instead fit K different GP surfaces with different hyper-parameters, particularly if different noise levels for the different objectives are to be considered, to add further flexibility to the fitted surfaces; this idea will be investigated in our future work. We use leave-one-out cross-validation (LOOCV) Murphy (2012) for estimating the hyper-parameters, since we are dealing with small-data problems.
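
To make the shared-kernel construction above concrete, the following minimal NumPy sketch fits a single GP with one kernel and one noise level to an (n, K) response matrix and returns a K-dimensional predicted mean together with a single predicted standard deviation. It is an illustrative sketch under the assumption of a generic kernel function passed in by the user; the names (gp_fit_predict_multi, ker) are ours and not part of the MixMOBO code.

```python
import numpy as np

def gp_fit_predict_multi(W, F, W_star, ker, noise=1e-3):
    """Shared-kernel GP for K objectives: F has shape (n, K). One kernel and one
    noise level are used for all objectives, which is equivalent to fitting K
    GPs with identical hyper-parameters. Returns predicted means of shape (m, K)
    and a single predicted standard deviation of shape (m,)."""
    K_mat = ker(W, W) + noise * np.eye(len(W))
    L = np.linalg.cholesky(K_mat)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, F))    # (n, K)
    K_s = ker(W, W_star)
    mu = K_s.T @ alpha                                      # (m, K): one mean per objective
    v = np.linalg.solve(L, K_s)
    var = np.diag(ker(W_star, W_star)) - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

# Example with a toy squared-exponential kernel matrix and K = 2 objectives
ker = lambda A, B: np.exp(-0.5 * np.sum((A[:, None] - B[None, :]) ** 2, -1))
W = np.random.rand(20, 3)                      # 20 designs, 3 continuous inputs
F = np.column_stack([W.sum(1), -W.prod(1)])    # 2 objectives
mu, sigma = gp_fit_predict_multi(W, F, np.random.rand(5, 3), ker)
print(mu.shape, sigma.shape)                   # (5, 2) (5,)
```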

Algorithm 1 Mixed-variable Multi-Objective Bayesian Optimization (MixMOBO) Algorithm
1: Input: Black-box function $\vec{f}(\vec{w}):\vec{w}\in\mathcal{W}$, initial data set size $N_i$, batch points per epoch Q, total epochs N, mutation rate $\beta\in[0,1]$
2: Initialize: Sample the black-box function $\vec{f}$ for $\mathcal{D}=\{(\vec{w}_{j},\vec{f}(\vec{w}_{j}))\}_{j=1:N_i}$
3: for $n=1$ to $N$ do
4:     Fit a noisy Gaussian process surrogate function $\vec{g}(\vec{w})\sim GP\bigl(\vec{\mu}(\vec{w}),\,ker(\vec{w},\vec{w}^{\prime})\bigr)$
5:     For L total acquisition functions, from each acquisition function $\mathcal{A}^{l}$, propose a Q-batch of test points $\{(\vec{u})_{n}^{l}\}_{1:Q}=\{\operatorname{argmax}_{\vec{u}\in\mathcal{W}}\,\mathcal{A}^{l}(\vec{g})\}_{1:Q}$ within the constrained search space $\mathcal{W}$ using a multi-objective GA
6:     Mutate point $\{(\vec{u})_{n}^{l}\}_{q}$ within the search space $\mathcal{W}$ with probability rate $\beta$ if the $L_{2}$-norm of its difference with any other member of the set $\{(\vec{u})_{n}^{l}\}_{1:Q}$ is below tolerance
7:     Select a batch of Q points using HedgeMO, $\{\vec{w}_{n}\}_{1:Q}=\mathrm{HedgeMO}\bigl(\vec{g},\{(\vec{u})_{1:n}^{1:L}\}_{1:Q},\mathcal{D}\bigr)$
8:     Evaluate the selected points with the black-box function, $\{\vec{f}(\vec{w}_{n})\}_{1:Q}$
9:     Update $\mathcal{D}=\mathcal{D}\cup\{(\vec{w}_{n},\vec{f}(\vec{w}_{n}))\}_{1:Q}$
10: end for
11: return the Pareto-optimal solution set $\{(\vec{w}_{opt},\vec{f}(\vec{w}_{opt}))\}$

Gaussian processes are defined for continuous variables. For mixed variables, we need to adapt the kernel so that a GP can be fit over these variables. Cited works in Section 1 dealt with modified kernels that were designed to model mixed variables. Those kernels can be used in the MixMOBO algorithm. For the current study, we use a simple modified squared exponential kernel:

$$ker(\vec{w},\vec{w}^{\prime})\equiv\epsilon_{f}^{2}\,\exp\left[-\frac{1}{2}\,\lvert\vec{w},\vec{w}^{\prime}\rvert_{C}^{T}\;\underline{\underline{M}}\;\lvert\vec{w},\vec{w}^{\prime}\rvert_{C}\right], \qquad (3)$$

where $\vec{\theta}=(\underline{\underline{M}},\epsilon_{f})$ is a vector containing all the hyper-parameters, $\underline{\underline{M}}=diag(\vec{h})^{-2}$ is the covariance hyper-parameter matrix, and $\vec{h}$ is the vector of covariance lengths. The distance metric $\lvert\vec{w},\vec{w}^{\prime}\rvert_{C}$ is a concatenated vector, with the distance between categorical variables defined to be the Hamming distance, and the distances between continuous variables and between ordinal variables defined to be their Euclidean distances. Noise is added to the diagonal of the covariance matrix. We emphasize that any modified kernel discussed in the citations of Section 1 can be used within our framework; doing so is a focus of our future work.
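
The following sketch shows one possible NumPy implementation of the kernel in Eq. 3, using the Hamming distance on categorical coordinates and per-coordinate Euclidean distances on continuous and ordinal coordinates, with $M=diag(\vec{h})^{-2}$. The integer encoding of the categories and all names are our own illustrative choices, not the MixMOBO implementation.

```python
import numpy as np

def mixed_distance(w, w_prime, cat_idx):
    """Concatenated distance |w, w'|_C: Hamming on categorical coordinates,
    absolute (per-coordinate Euclidean) distance on continuous/ordinal ones."""
    w, w_prime = np.asarray(w, float), np.asarray(w_prime, float)
    d = np.abs(w - w_prime)
    d[cat_idx] = (w[cat_idx] != w_prime[cat_idx]).astype(float)  # Hamming distance
    return d

def mixed_kernel(w, w_prime, cat_idx, h, eps_f=1.0):
    """Modified squared-exponential kernel of Eq. 3 with M = diag(h)^-2."""
    d = mixed_distance(w, w_prime, cat_idx)
    M = np.diag(1.0 / np.asarray(h) ** 2)
    return eps_f**2 * np.exp(-0.5 * d @ M @ d)

# Example: 2 continuous, 1 ordinal, and 2 categorical coordinates (encoded as integers)
w1 = np.array([0.2, 1.3, 3, 0, 2])
w2 = np.array([0.1, 1.0, 5, 0, 1])
print(mixed_kernel(w1, w2, cat_idx=[3, 4], h=np.ones(5)))
```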

Once the GP is fit over the multi-objective data, the acquisition functions $\mathcal{A}^{l}$ explore the surrogate model to maximize reward by balancing exploration and exploitation. Using a standard optimization scheme is problematic when dealing with mixed-variable and multi-objective problems due to the non-smooth surrogate surface and conflicting objectives. We propose using a constrained, multi-objective GA to optimize the acquisition functions; although a GA is expensive to use on an actual black-box function, it is an ideal candidate for optimizing an acquisition function acting on the surrogate surface. For multi-objective problems, multi-objective GAs, such as Deb et al. (2002), are ideal candidates for obtaining a Pareto-front of optimal solutions.

Within a GA generation, for multiple objectives, diversification is ensured by a ‘distance crowding function’, which ranks the members of a non-dominated Pareto-front. The ‘distance crowding function’ can be computed in decision-variable space, in function space, or in a hybrid of the two, and it ensures that the generations are distinct and diverse. This inherent feature of the GA is exploited to ensure diversity in the ‘Q-batch’ of points. The ranking matters when choosing the test points from an acquisition function for a multi-objective problem because the choice must take into account the diversity of the solutions and propagate the Pareto-front. Because the members of the population are ranked by the GA, we can easily extract a ‘Q-batch’ of points from each of the acquisition functions without needing to add any ‘fantasy’ cost-function evaluations or optimize the acquisition functions again. This is a major advantage of using a GA as our optimizer, since we can output a ‘Q-batch’ of diverse query points using the GA’s inherent features.
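
A minimal sketch of how a diverse Q-batch can be read off a ranked GA front using an NSGA-II-style crowding distance in function space is given below; the helper names are ours, and the actual MixMOBO implementation may rank fronts differently (e.g., in decision-variable space or a hybrid of the two).

```python
import numpy as np

def crowding_distance(F):
    """NSGA-II-style crowding distance in function space for one non-dominated
    front F of shape (n, K); larger values indicate more isolated members."""
    n, K = F.shape
    dist = np.zeros(n)
    for k in range(K):
        order = np.argsort(F[:, k])
        f_sorted = F[order, k]
        span = f_sorted[-1] - f_sorted[0] or 1.0
        dist[order[0]] = dist[order[-1]] = np.inf      # boundary members are always kept
        dist[order[1:-1]] += (f_sorted[2:] - f_sorted[:-2]) / span
    return dist

def q_batch(front_W, front_F, Q):
    """Select a diverse Q-batch from a GA front without re-optimizing the surrogate."""
    ranks = np.argsort(-crowding_distance(front_F))
    return front_W[ranks[:Q]]

# Toy example: pick a Q = 3 batch from a 6-member front with 2 objectives
rng = np.random.default_rng(0)
front_W, front_F = rng.random((6, 5)), rng.random((6, 2))
print(q_batch(front_W, front_F, Q=3).shape)            # (3, 5)
```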

Genetic algorithms are again ideal candidates for dealing with mixed-variable problems: they can easily be constrained to work in mixed-variable spaces. These variables are handled using probabilistic mutation rates. The genes are allowed to mutate only within their prescribed categories, thereby constraining the proposed test points to the space $\mathcal{W}$.

Common acquisition functions, such as EI, PI, and UCB, can be used within this framework to nominate a ‘Q-batch’ of points. If a candidate in a Q-batch is within the tolerance limit of another candidate in the same batch, or of a previous data point (for convex functions), we mutate the proposed point within $\mathcal{W}$ to avoid sampling the same data point again.
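
This duplicate-avoidance mutation can be sketched as follows for purely categorical/ordinal genes encoded as integer levels; this is an illustrative sketch under our own encoding assumptions, not the exact MixMOBO mutation operator.

```python
import numpy as np

def mutate_if_close(batch, n_levels, beta=0.2, tol=1e-6, seed=None):
    """Re-sample the genes of near-duplicate batch members (L2-distance below tol
    to another member) within their prescribed number of levels, each gene with
    probability beta, so the Q-batch stays inside W without repeated points."""
    rng = np.random.default_rng(seed)
    batch = np.array(batch, dtype=float)
    for q in range(len(batch)):
        others = np.delete(batch, q, axis=0)
        if len(others) and np.min(np.linalg.norm(others - batch[q], axis=1)) < tol:
            for j, levels in enumerate(n_levels):
                if rng.random() < beta:
                    batch[q, j] = rng.integers(levels)   # mutate within its categories
    return batch

# Toy example: a batch of 3 points over 4 categorical/ordinal genes with 5 levels each
print(mutate_if_close([[1, 2, 3, 0], [1, 2, 3, 0], [4, 0, 1, 2]], n_levels=[5] * 4, seed=0))
```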

Test points are selected from $\mathcal{W}$ for evaluation of $\vec{f}$ using the HedgeMO algorithm, which is detailed in the next section. HedgeMO selects a ‘Q-batch’ of test points from the candidates proposed by each of the acquisition functions. These points, along with their function evaluations $\vec{f}$, are then appended to the data set.

4.2 HedgeMO Algorithm

Hedge strategies use a portfolio of acquisition functions rather than a single acquisition function. HedgeMO is an extension to multi-objective problems of the GP-Hedge algorithm proposed by Brochu et al. (2011). As part of our MixMOBO algorithm, HedgeMO not only extends the Hedge strategy to multi-objective problems, but also allows ‘Q-batches’. Our algorithm is shown in Algorithm 2.

Algorithm 2 HedgeMO Algorithm
1: Input: Surrogate function $\vec{g}(\vec{w}):\vec{w}\in\mathcal{W}$, test points proposed by the AFs $\{(\vec{u})_{1:n}^{1:L}\}_{1:Q}$, batch points per epoch Q, current epoch n, total objectives K, parameter $\eta\in\mathbb{R}^{+}$
2: for $l=1$ to $L$ do
3:     For the $l^{th}$ acquisition function, find the rewards for the Q-batch points nominated by that AF during epochs $1:n-1$ by sampling from $\vec{g}$: $\{\vec{\theta}^{l}_{1:n-1}\}_{1:Q}=\vec{\mu}(\{(\vec{u})_{1:n-1}^{l}\}_{1:Q})$, where $\vec{\theta}=\{\theta\}^{k}$ for each objective k
4: end for
5: Normalize the rewards for each $l^{th}$ AF and $k^{th}$ objective, $\phi_{l}^{k}=\sum_{j=1}^{n-1}\sum_{q=1}^{Q}\frac{\{\theta^{l}_{j}\}^{k}_{q}-\min(\Theta)}{\max(\Theta)-\min(\Theta)}$, where $\Theta=\{\theta^{1:L}_{1:n-1}\}^{k}_{1:Q}$
6: Calculate the probability of selecting nominees from the $l^{th}$ acquisition function, $p^{l}=\frac{\exp(\eta\sum^{K}_{k=1}\phi_{l}^{k})}{\sum_{i=1}^{L}\exp(\eta\sum^{K}_{k=1}\phi_{i}^{k})}$
7: for $q=1$ to $Q$ do
8:     Select the $q^{th}$ nominee as test point $\vec{w}_{n}^{q}$ from the $l^{th}$ AF with probability $p^{l}$
9: end for
10: return Batch of test points $\{\vec{w}_{n}\}_{1:Q}$

Extending the methodology presented by Brochu et al. (2011), HedgeMO chooses the next ‘Q-batch’ of test points based on the history of the candidates nominated by all of the acquisition functions. Rewards are calculated for each acquisition function from the surrogate surface over the entire history of the points nominated by the L acquisition functions. The rewards are then normalized to scale them to the same range for each objective. This step is fundamentally important because it prevents biasing the probability toward any one objective; this type of bias, of course, cannot occur in single-objective problems. The rewards for the different objectives k are then summed, and the probability $p^{l}$ of choosing a nominee from a specific acquisition function is calculated using step 6 of Algorithm 2. For a ‘Q-batch’ of test points, the test points are chosen Q times.
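
Steps 5 and 6 of Algorithm 2, the per-objective reward normalization and the softmax-style selection probabilities with parameter $\eta$, can be sketched as follows; the array layout (L acquisition functions × past epochs × Q batch points × K objectives) and the names are our own assumptions for illustration.

```python
import numpy as np

def hedgemo_probabilities(theta, eta=1.0):
    """theta: rewards of shape (L, H, Q, K) -- L acquisition functions, H past
    epochs, Q batch points, K objectives (predicted means from the surrogate).
    Returns the probability p_l of drawing a nominee from each acquisition
    function, following steps 5-6 of Algorithm 2."""
    L, H, Q, K = theta.shape
    # Normalize rewards per objective over all AFs, epochs, and batch points
    t_min = theta.min(axis=(0, 1, 2), keepdims=True)
    t_max = theta.max(axis=(0, 1, 2), keepdims=True)
    phi = ((theta - t_min) / (t_max - t_min + 1e-12)).sum(axis=(1, 2))  # (L, K)
    logits = eta * phi.sum(axis=1)          # sum over objectives
    logits -= logits.max()                  # numerical stability for the softmax
    p = np.exp(logits)
    return p / p.sum()

# Toy example: L = 3 AFs, H = 4 past epochs, Q = 2 batch points, K = 2 objectives
theta = np.random.default_rng(0).normal(size=(3, 4, 2, 2))
p = hedgemo_probabilities(theta, eta=1.0)
print(p, p.sum())                           # probabilities over the 3 AFs, summing to 1
# Steps 7-9: each of the Q nominees is drawn from AF l with probability p[l]
```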

Regret Bounds: The regret bounds derived by Brochu et al. (2011) hold for HedgeMO if and only if the Upper Confidence Bound (UCB) acquisition function is part of the portfolio of acquisition functions. The regret bounds follow from the work of Srinivas et al. (2012), who derived cumulative regret bounds for UCB. In essence, the cumulative regret in our case is bounded by two sublinear terms, as for UCB, plus an additional term that depends on the proximity of the chosen point to the test point proposed by UCB. The interested reader is directed to Srinivas et al. (2012) and Brochu et al. (2011) for a description of the exact regret bounds and their derivation.

4.3 SMC Acquisition Function

We introduce a new acquisition function, Stochastic Monte-Carlo (SMC), which, for the maximization of an objective, is defined as:

$$SMC\equiv\operatorname{argmax}_{\vec{w}\in\mathcal{W}}\left[\vec{\mu}(\vec{w})+r(\vec{w})\right], \qquad (4)$$

where $r(\vec{w})$ is sampled from $U(0,2\sigma(\vec{w}))$, and $\vec{\mu}(\vec{w})$ and $\sigma(\vec{w})$ are the mean and standard deviation returned by the GP at $\vec{w}$, respectively. This is equivalent to taking Monte-Carlo samples from a truncated distribution. For categorical and ordinal variable problems, this acquisition function performs well across a range of benchmark tests Vangelatos et al. (2021). We use this acquisition function as part of the HedgeMO portfolio in the MixMOBO algorithm.
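
A minimal sketch of Eq. 4 applied to a finite candidate set (for example, a GA population) is shown below; in practice the argmax in Eq. 4 is taken by the GA over $\mathcal{W}$, and the function names and example values here are ours.

```python
import numpy as np

def smc_acquisition(mu, sigma, rng=None):
    """Stochastic Monte-Carlo (SMC) scores of Eq. 4 for a candidate set:
    mu has shape (m, K) and sigma shape (m,) from the GP; r ~ U(0, 2*sigma)."""
    rng = rng or np.random.default_rng()
    r = rng.uniform(0.0, 2.0 * sigma)       # one random draw per candidate
    return mu + r[:, None]                  # scores to be maximized by the GA

# Toy example: 4 candidates, 2 objectives
mu = np.array([[0.1, 0.4], [0.3, 0.2], [0.2, 0.9], [0.5, 0.5]])
sigma = np.array([0.05, 0.20, 0.10, 0.01])
print(smc_acquisition(mu, sigma, rng=np.random.default_rng(0)))
```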

Figure 1: Performance comparison of MixMOBO against other mixed-variable algorithms

5 Validation Tests

MixMOBO is designed to deal with mixed-variable, multi-objective problems. However, to the authors’ knowledge, no other small-data algorithm can deal with all the attributes of such problems, so a direct comparison is not possible. In the absence of such competition, we use the specific case of mixed-variable, single-objective problems to compare against the state-of-the-art algorithms for such problems and show that, even in this subset case, MixMOBO performs better than existing algorithms in the literature. We then perform further experiments in both single- and multi-objective settings to show the efficacy of the HedgeMO algorithm compared to stand-alone acquisition functions and the performance of SMC in the multi-objective setting.

We benchmarked MixMOBO against a range of existing state-of-the-art optimization strategies that are commonly used for optimizing expensive black-box functions with mixed-variable design spaces. We chose the following single-objective optimization algorithms for comparison: CoCaBO Ru et al. (2020), which combines the multi-armed bandit (MAB) and Bayesian optimization approaches by using a mixing kernel; CoCaBO has been shown to be more efficient than GPyOpt (one-hot encoding approach, GPyOptAuthors (2016)) and EXP3BO (MAB approach, Gopakumar et al. (2018)). We used CoCaBO with a mixing parameter of 0.5. We also tested MixMOBO against GBRT, a sequential optimization technique using gradient-boosted regression trees Scikit-learn (2021). TPE_Hyperopt (Tree-structured Parzen Estimator) is a sequential method for optimizing expensive black-box functions, introduced by Bergstra et al. (2011). SMAC3 is a popular Bayesian optimization algorithm combined with an aggressive racing mechanism Hutter et al. (2011). Both of these algorithms, in addition to Random Sampling, were used as baselines. Publicly available libraries for these algorithms were used.

Six different test functions for mixed variables were chosen as benchmarks. A brief description of these test functions and their properties is given below with further details in Appendix A:

Contamination Problem: This problem, introduced by Hu et al. (2011), considers a food supply chain with various stages in the chain where food may be contaminated with pathogens. The objective is to maximize the reward of prevention efforts while making sure the chain does not get contaminated. It is widely used as a benchmark for binary categorical variables. We use the problem as a benchmark with 21 binary categorical variables.

Encrypted Amalgamated: An anisotropic, mixed-variable function created using a combination of other commonly used test functions Tušar et al. (2019). We modify the combined function so that it can be used with mixed variables. In particular, it is adapted for categorical variables by encrypting the input space with a random vector, which produces a random landscape mimicking categorical variables Vangelatos et al. (2021). Our Encrypted Amalgamated function has 13 inputs: 8 categorical and 3 ordinal variables (with 5 categories or states each) and 2 continuous variables.

NK Landscapes: This is a popular benchmark for simulating categorical-variable problems using randomly rugged, interconnected landscapes Kauffman and Levin (1987); Li et al. (2006). The fitness landscape can be produced with random connectivity and a chosen number of optima. The problem is widely used in evolutionary biology and control optimization and is NP-complete. The probability of connectivity in the NK landscape is controlled by a ‘ruggedness’ parameter, which we set at 20%. We test the Li et al. (2006) variant with 8 categorical variables with 4 categories each.

Rastrigin: This is an isotropic test function, commonly used for continuous design spaces Tušar et al. (2019). We use a 9-D Rastrigin function for testing a design space of 3 continuous and 6 ordinal variables with 5 discrete states each.

Encrypted Styblinski-Tang: This function is isotropic Tušar et al. (2019). We modified it, as we did the Encrypted Amalgamated test function, so that it works with categorical variables, and we use it as a representative benchmark for N-categorical-variable problems. The 10-D variant tested here consists only of categorical variables with 5 categories each.

Figure 2: Performance comparison of HedgeMO against other acquisition functions

Encrypted ZDT6: This is a multi-objective test function introduced by Zitzler et al. (2000) that we modified with encryption so that it can deal with mixed variables. The test function is non-convex and non-uniform in the parameter space. We test ZDT6 with 10 categorical variables with 5 states each. ZDT6 was only used for testing HedgeMO.

To the extent of our knowledge, no other optimization algorithm is capable of handling mixed-variable, multi-objective problems in small-data settings; thus, we have no direct comparisons between MixMOBO and other published algorithms. We therefore tested MixMOBO against a variant of NSGA-II Deb et al. (2002) on the ZDT4 and ZDT6 test functions with mixed variables. However, we found that the GA required more than $10^2$ times as many function calls to find the Pareto front to a similar tolerance. For visualization purposes, we do not plot the GA results.

All of the optimization algorithms were run as maximizers, with a 0.005 noise variance built into all the benchmarks. The budget for each benchmark test was fixed at 250 function calls, including the evaluation of 50 initial randomly sampled data points, for all algorithms except SMAC3, which determines its own initial sample size. The algorithms were run in a single-output setting (the batch modes of GBRT, CoCaBO, and MixMOBO were not used, for a fair comparison). Each algorithm was run 10 times for every benchmark. Our metric for optimization is the ‘Normalized Reward’, defined as (current optimum - random sampling optimum)/(global optimum - random sampling optimum). Figure 1 shows the Normalized Reward versus the number of black-box function evaluations for MixMOBO and five other algorithms. The mean of the Normalized Reward over the 10 runs of each algorithm, along with its standard deviation (S.D.), is plotted. The width of each of the translucent colored bands is equal to 1/5 of the S.D.

MixMOBO outperforms all of the other baselines and is significantly better at dealing with mixed-variable problems. GBRT is the next-best algorithm and performs better than MixMOBO on the Rastrigin function; however, note that the Rastrigin function does not include any categorical variables. For problems involving categorical variables, MixMOBO clearly outperforms the others. TPE and CoCaBO have similar performance, and SMAC3 has the poorest performance. All three are outdistanced by MixMOBO.

We then performed experiments to test the performance of our HedgeMO algorithm by comparing it to the four acquisition functions that make up its portfolio. These acquisition functions, namely EI, PI, UCB, and SMC, along with HedgeMO, were tested on three different test functions: the Encrypted Amalgamated, the Encrypted Styblinski-Tang, and the Encrypted ZDT6; the latter is the multi-objective test function. The Normalized Reward for the multi-objective Encrypted ZDT6 is defined as (current P-optimum - random sampling P-optimum)/(global P-optimum - random sampling P-optimum). Here, P-optimum $=\frac{1}{N}\sum_{i=1}^{N}\exp($-minimum Hamming distance in parameter space between the $i^{th}$ global Pareto-optimal point and any point in the current Pareto-optimal set$)$, where N is the number of global Pareto-optimal points.
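
For clarity, the P-optimum metric and the resulting multi-objective Normalized Reward can be sketched as follows (the global P-optimum evaluates to 1 by construction); the function names are ours, and the sketch assumes purely categorical parameter vectors.

```python
import numpy as np

def p_optimum(global_pareto, current_pareto):
    """Mean over the global Pareto-optimal points of exp(-minimum Hamming
    distance in parameter space to any point of the current Pareto set)."""
    vals = []
    for g in global_pareto:
        d_min = min(np.sum(np.asarray(g) != np.asarray(c)) for c in current_pareto)
        vals.append(np.exp(-d_min))
    return np.mean(vals)

def normalized_reward(current, random_baseline, global_pareto):
    """(current P-optimum - random sampling P-optimum) /
       (global P-optimum - random sampling P-optimum); the global value is 1."""
    p_cur = p_optimum(global_pareto, current)
    p_rand = p_optimum(global_pareto, random_baseline)
    return (p_cur - p_rand) / (1.0 - p_rand)

# Toy example with hypothetical categorical parameter vectors
gp = [(0, 1, 2), (3, 3, 3)]
print(normalized_reward(current=[(0, 1, 0)], random_baseline=[(4, 4, 4)], global_pareto=gp))
```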

The results of our acquisition function comparisons are shown in Figure 2, which shows that HedgeMO performs well across all three test functions. For the single-objective test functions, PI outperforms HedgeMO on the Encrypted Amalgamated test function. However, for the multi-objective Encrypted ZDT6 test function, PI performs significantly worse and is outperformed by both SMC and UCB. SMC performs well on multi-objective combinatorial problems and hence should be part of the portfolio of a hedging algorithm.

These results show that, across a range of different problems, the choice of acquisition function can play a large role in the performance of the algorithm. For a black-box function, this information cannot be known a priori, making hedging necessary. HedgeMO consistently performs well in all scenarios and ensures efficiency across a range of different problems. Thus, for unknown black-box functions, HedgeMO should be the hedge strategy of choice for multi-objective problems.

Table 1: Experimental values of the critical buckling load $P_c$ (the minimization objective for MixMOBO), the strain energy densities at buckling and at fracture, $u_b$ and $u_f$ respectively, the elastic stiffness $S$, and the ratio of the normalized strain energy density to that of the Unblemished structure.
Structure | $P_c$ [$\mu$N] | $u_b$ [MJ m$^{-3}$] | $u_f$ [MJ m$^{-3}$] | $S$ [MPa] | $(u_{f_i}/u_{b_i})/(u_{f_1}/u_{b_1})$
Unblemished | 3814.5 | 1.08 | 0.071 | 388.21 | 1
Random Sampling Optimal | 996.2 | 0.08 | 2.85 | 347.19 | 526
MixMOBO Optimal | 545.1 | 0.02 | 14.71 | 460.35 | 12030

6 Application to Architected Materials

We applied our MixMOBO framework to the optimization of the design of architected microlattice structures. Advances in the modeling, fabrication, and testing of architected materials have expanded their utility in engineering applications, such as ultralight Zheng et al. (2014); Pham et al. (2019); Zhang et al. (2019), reconfigurable Xia et al. (2019), and high-energy-absorption materials Song et al. (2019b), and in bio-implants Song et al. (2020). The optimization of architected materials Bauer et al. (2016); Pham et al. (2019); Xia et al. (2019); Zhang et al. (2019) often requires searching huge combinatorial design spaces, where the evaluation of each design is expensive Meza et al. (2014); Berger et al. (2017); Tancogne-Dejean et al. (2018).

Figure 3: Top Left: The 4 unit cells, labelled A–D. Top Right: The 2 orientations in which they can be joined. Bottom Left: Optimization results using MixMOBO. Bottom Right: SEM images of the Unblemished and Optimum structures.

The design space for the architected material that we optimize here has $\sim$8.5 billion possible combinations of its 17 categorical inputs (one with 2 possible states, and the other 16 with 4 possible states each). Our goal is to maximize the strain energy density of a microlattice structure. We maximize the strain energy density (which is extremely expensive to compute, even for one design) by minimizing the buckling load $P_c$ while maintaining the lattice’s structural integrity and stiffness before fracturing. Minimizing $P_c$ (a proxy for maximizing the strain energy density by instigating buckling, which leads to the densification of the deformed lattice members) is a more computationally tractable cost function to evaluate, but it is still expensive and involves solving a numerical finite-element code for each evaluation of the cost function. The manufacturing and testing details of our methodology are included in Vangelatos et al. (2021), which focuses on the material aspects of the problem.

The design space consists of: the choice of one of four possible unit cells (shown in the upper left of Fig. 3), each with one or more defects (shown in color), at each of the 16 independent lattice sites, creating 16 of the categorical inputs with 4 possible values each; and the choice of whether the cells are connected along their faces or along their edges on $45^{\circ}$ diagonals (shown in the upper right panel of Fig. 3), creating the $17^{th}$ categorical input with 2 possible values.

The minimization of $P_c$ using MixMOBO was initialized with 50 random structures, and the evaluation budget, including the initial samples, was set at 250. The algorithm achieved a 42% improvement in the $P_c$ of the lattice structure over the best structure obtained with the first 50 random samples (Figure 3). The optimal microlattice obtained using $P_c$ as a proxy with MixMOBO has an experimentally measured normalized strain energy density that is 12,030 times greater than that of the unblemished microlattice structure with no defects, which is cited in the literature as having the best strain energy density Vangelatos et al. (2020), a 4-orders-of-magnitude improvement. Table 1 shows the properties of the fabricated and experimentally measured design created with MixMOBO. The choices of the unit cells in the optimally designed lattice determined by MixMOBO are not intuitive and have no obvious pattern or structure. Helium Ion Microscopy (HIM) images of our structures, shown in Figure 4, compare the Unblemished structure from the literature with our MixMOBO Optimal structure before and after loading. It is evident that the MixMOBO Optimal structure, owing to its densification mechanism, can handle much higher loads without breaking Vangelatos et al. (2021).

Figure 4: HIM images of the loaded and unloaded Unblemished and Optimum structures. (a) Unloaded Unblemished structure. (b) Unblemished structure after loading, showing severe fracture and the collapse of many beam members. (c) Focused image revealing several fractured beams and the internal collapse of the upper layer that subsequently instigated the accumulation of damage in the underlying layers. (d) Unloaded MixMOBO Optimum structure. (e) MixMOBO Optimum structure after it was subjected to the same maximum compressive load as the structure shown in (b). Unloading of the optimum structure showed only excessive plastic deformation, without catastrophic collapse, and the manifestation of the buckling mode. (f) Focused image revealing the effect of buckling that led to deformation but no fracture due to the occurrence of densification. (g) Side view of the unloaded optimum structure. (h) Side view of the unloaded optimum structure revealing that fracture was inhibited throughout the structure due to the densification precipitated by the low critical buckling load. Scale: each scale bar equals 10 µm.

7 Conclusions

The existing optimization literature does not offer an algorithm for optimizing multi-objective, mixed-variable problems with expensive black-box functions. We have introduced Mixed-variable Multi-Objective Bayesian Optimization (MixMOBO), the first BO-based algorithm for optimizing such problems. MixMOBO is agnostic to the underlying kernel; it is compatible with the modified kernels and other surrogate methods developed in previous studies for mixed-variable problems. MixMOBO allows parallel batch updates without repeated evaluations of the surrogate surface, while maintaining diversification within the solution set. We presented the Hedge Multi-Objective (HedgeMO) algorithm, a novel Hedge strategy for which regret bounds hold for multi-objective problems. A new acquisition function, Stochastic Monte-Carlo (SMC), was also proposed as part of the HedgeMO portfolio. MixMOBO and HedgeMO were benchmarked and shown to perform significantly better than existing mixed-variable optimization algorithms on a variety of test problems. MixMOBO was then applied to the real-world optimization of an architected micro-lattice, and we increased the structure’s strain-energy density by a factor of $10^4$ compared to the existing Unblemished structures in the literature reported to have the highest strain energy density. Our future work entails further testing of the multi-objective and ‘Q-batch’ settings. We have also applied MixMOBO to the optimization of draft tubes for hydrokinetic turbines Sheikh et al. (2022b) and of Cauchy-symmetric meta-material structures, and we are currently applying it to the optimization of vertical-axis wind turbines.

Acknowledgments

The authors would like to thank Chiyu ‘Max’ Jiang, research scientist at Waymo Research, and Professor Uros Seljak, Department of Physics, University of California at Berkeley (UCB) for insightful discussions regarding Bayesian optimization. We would also like to thank Zacharias Vangelatos and Professor Costas P. Grigoropoulos, Department of Mechanical Engineering, University of California at Berkeley (UCB) for the collaboration to design and manufacture architected materials, conduct nanoindentation, SEM, and HIM experiments. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562 through allocation TG-CTS190047.

Declarations

Funding and Competing Interests

The authors have no conflicts of interest or competing interests to disclose.

Availability of Data and Materials

The complete data sets generated and analysed during the current study, which are sufficient to reproduce all of the experiments, and the MixMOBO code are available from the corresponding author on reasonable request.

Authors’ Contributions

H.M.S. conceptualized the algorithm, designed the methodology, and performed the experiments under the supervision of P.S.M. H.M.S. and P.S.M. then wrote and edited the manuscript.

Replication of Results

All of the results in this manuscript can be replicated. The complete data sets, the MixMOBO algorithm, and any other supplementary material and information required for replication are available from the corresponding author on reasonable request.

References

  • Frazier and Wang [2015] Peter I. Frazier and Jialei Wang. Bayesian optimization for materials design. Springer Series in Materials Science, page 45–75, Dec 2015. ISSN 2196-2812. 10.1007/978-3-319-23871-5_3.
  • Chen et al. [2018a] Desai Chen, Mélina Skouras, Bo Zhu, and Wojciech Matusik. Computational discovery of extremal microstructure families. Science Advances, 4(1):eaao7005, 2018a.
  • Chen et al. [2019] Wen Chen, Seth Watts, Julie A Jackson, William L Smith, Daniel A Tortorelli, and Christopher M Spadaccini. Stiff isotropic lattices beyond the maxwell criterion. Science Advances, 5(9):eaaw1937, 2019.
  • Shaw et al. [2019] Lucas A Shaw, Frederick Sun, Carlos M Portela, Rodolfo I Barranco, Julia R Greer, and Jonathan B Hopkins. Computationally efficient design of directionally compliant metamaterials. Nature Communications, 10(1):1–13, 2019.
  • Song et al. [2019a] Jian Song, Yuejiao Wang, Wenzhao Zhou, Rong Fan, Bin Yu, Yang Lu, and Lixiao Li. Topology optimization-guided lattice composites and their mechanical characterizations. Composites Part B: Engineering, 160:402–411, 2019a.
  • Vangelatos et al. [2021] Zacharias Vangelatos, Haris Moazam Sheikh, Philip S. Marcus, Costas P. Grigoropoulos, Victor Z. Lopez, George Flamourakis, and Maria Farsari. Strength through defects: A novel bayesian approach for the optimization of architected materials. Science Advances, 7(41), 2021. 10.1126/sciadv.abk2218.
  • Snoek et al. [2012] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical bayesian optimization of machine learning algorithms, 2012.
  • Chen et al. [2018b] Yutian Chen, Aja Huang, Ziyu Wang, Ioannis Antonoglou, Julian Schrittwieser, David Silver, and Nando de Freitas. Bayesian optimization in alphago. CoRR, abs/1812.06855, 2018b.
  • Oh et al. [2018] ChangYong Oh, Efstratios Gavves, and Max Welling. BOCK : Bayesian optimization with cylindrical kernels. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3868–3877. PMLR, 10–15 Jul 2018.
  • Pyzer-Knapp [2018] Edward Pyzer-Knapp. Bayesian optimization for accelerated drug discovery. IBM Journal of Research and Development, PP:1–1, 11 2018. 10.1147/JRD.2018.2881731.
  • Korovina et al. [2020] Ksenia Korovina, Sailun Xu, Kirthevasan Kandasamy, Willie Neiswanger, Barnabas Poczos, Jeff Schneider, and Eric Xing. Chembo: Bayesian optimization of small organic molecules with synthesizable recommendations. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, volume 108 of Proceedings of Machine Learning Research, pages 3393–3403. PMLR, 26–28 Aug 2020.
  • Sheikh et al. [2022a] Haris Moazam Sheikh, Sangjoon Lee, Jinge Wang, and Philip S. Marcus. Airfoil optimization using design-by-morphing, 2022a. URL https://arxiv.org/abs/2207.11448.
  • Sheikh et al. [2022b] Haris Moazam Sheikh, Tess A. Callan, Kealan J. Hennessy, and Philip S. Marcus. Optimization of the shape of a hydrokinetic turbine’s draft tube and hub assembly using design-by-morphing with bayesian optimization, 2022b. URL https://arxiv.org/abs/2207.11451.
  • Sheikh et al. [2021] Haris Moazam Sheikh, Tess Callan, Kealan Hennessy, and Philip Marcus. Shape Optimization Methodology for Fluid Flows Using Mixed Variable Bayesian Optimization and Design-by-Morphing. In APS Division of Fluid Dynamics Meeting Abstracts, APS Meeting Abstracts, page A15.004, January 2021.
  • Krause et al. [2008] Andreas Krause, Ajit Singh, and Carlos Guestrin. Near-optimal sensor placements in gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9(8):235–284, 2008.
  • Mockus [1994] J. Mockus. Application of bayesian approach to numerical methods of global and stochastic optimization. Journal of Global Optimization, 4:347–365, 1994.
  • Rasmussen and Williams [2006] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, 2006. ISBN 026218253X.
  • Brochu et al. [2010] Eric Brochu, Vlad M. Cora, and Nando de Freitas. A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning, 2010.
  • Golovin et al. [2017] Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and D. Sculley. Google vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, page 1487–1495, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450348874. 10.1145/3097983.3098043.
  • Ru et al. [2020] Binxin Ru, Ahsan S. Alvi, Vu Nguyen, Michael A. Osborne, and Stephen J Roberts. Bayesian optimisation over multiple continuous and categorical inputs, 2020.
  • Gopakumar et al. [2018] Shivapratap Gopakumar, Sunil Gupta, Santu Rana, Vu Nguyen, and Svetha Venkatesh. Algorithmic assurance: An active approach to algorithmic testing using bayesian optimisation. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
  • Nguyen et al. [2019] Dang Nguyen, Sunil Gupta, Santu Rana, Alistair Shilton, and Svetha Venkatesh. Bayesian optimization for categorical and category-specific continuous inputs, 2019.
  • Qian et al. [2008] Peter Z G Qian, Huaiqing Wu, and CF Jeff Wu. Gaussian process models for computer experiments with qualitative and quantitative factors. Technometrics, 50(3):383–396, 2008.
  • Zhou et al. [2011] Qiang Zhou, Peter ZG Qian, and Shiyu Zhou. A simple approach to emulation for computer models with qualitative and quantitative factors. Technometrics, 53(3):266–273, 2011.
  • Zhang et al. [2020] Yichi Zhang, Daniel W Apley, and Wei Chen. Bayesian optimization for materials design with mixed quantitative and qualitative variables. Scientific Reports, 10(1):1–13, 2020.
  • Deshwal and Doppa [2021] Aryan Deshwal and Janardhan Rao Doppa. Combining latent space and structured kernels for bayesian optimization over combinatorial spaces. CoRR, abs/2111.01186, 2021.
  • Deshwal et al. [2021] Aryan Deshwal, Syrine Belakaria, and Janardhan Rao Doppa. Bayesian optimization over hybrid spaces, 2021.
  • Oh et al. [2021] Changyong Oh, Efstratios Gavves, and Max Welling. Mixed variable bayesian optimization with frequency modulated kernels, 2021.
  • Baptista and Poloczek [2018] Ricardo Baptista and Matthias Poloczek. Bayesian optimization of combinatorial structures, 2018.
  • Pelamatti et al. [2018] Julien Pelamatti, Loïc Brevault, Mathieu Balesdent, El-Ghazali Talbi, and Yannick Guerin. Efficient global optimization of constrained mixed variable problems, 2018.
  • Oh et al. [2019] Changyong Oh, Jakub M. Tomczak, Efstratios Gavves, and Max Welling. Combinatorial bayesian optimization using the graph cartesian product, 2019.
  • Garrido-Merchán and Hernández-Lobato [2020] Eduardo C. Garrido-Merchán and Daniel Hernández-Lobato. Dealing with categorical and integer-valued variables in bayesian optimization with gaussian processes. Neurocomputing, 380:20–35, Mar 2020. ISSN 0925-2312. 10.1016/j.neucom.2019.11.004.
  • Lindauer et al. [2021] Marius Lindauer, Katharina Eggensperger, Matthias Feurer, André Biedenkapp, Difan Deng, Carolin Benjamins, René Sass, and Frank Hutter. Smac3: A versatile bayesian optimization package for hyperparameter optimization, 2021.
  • Bergstra et al. [2013] James Bergstra, Daniel Yamins, and David Cox. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 115–123, Atlanta, Georgia, USA, 17–19 Jun 2013. PMLR.
  • Daxberger et al. [2020] Erik Daxberger, Anastasia Makarova, Matteo Turchetta, and Andreas Krause. Mixed-variable bayesian optimization. Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Jul 2020. 10.24963/ijcai.2020/365.
  • Tiao et al. [2021] Louis C. Tiao, Aaron Klein, Matthias Seeger, Edwin V. Bonilla, Cedric Archambeau, and Fabio Ramos. Bore: Bayesian optimization by density-ratio estimation, 2021.
  • Balandat et al. [2020] Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, and Eytan Bakshy. Botorch: A framework for efficient monte-carlo bayesian optimization, 2020.
  • Fonseca et al. [2006] C.M. Fonseca, L. Paquete, and M. Lopez-Ibanez. An improved dimension-sweep algorithm for the hypervolume indicator. In 2006 IEEE International Conference on Evolutionary Computation, pages 1157–1163, 2006. 10.1109/CEC.2006.1688440.
  • Daulton et al. [2020] Samuel Daulton, Maximilian Balandat, and Eytan Bakshy. Differentiable expected hypervolume improvement for parallel multi-objective bayesian optimization, 2020.
  • Daulton et al. [2021] Samuel Daulton, David Eriksson, Maximilian Balandat, and Eytan Bakshy. Multi-objective bayesian optimization over high-dimensional search spaces, 2021.
  • Suzuki et al. [2020] Shinya Suzuki, Shion Takeno, Tomoyuki Tamura, Kazuki Shitara, and Masayuki Karasuyama. Multi-objective bayesian optimization using pareto-frontier entropy, 2020.
  • Shu et al. [2020] Leshi Shu, Ping Jiang, Xinyu Shao, and Yan Wang. A New Multi-Objective Bayesian Optimization Formulation With the Acquisition Function for Convergence and Diversity. Journal of Mechanical Design, 142(9), 03 2020. ISSN 1050-0472. 10.1115/1.4046508. 091703.
  • Brochu et al. [2011] Eric Brochu, Matthew W. Hoffman, and Nando de Freitas. Portfolio allocation for bayesian optimization, 2011.
  • Deb et al. [2002] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: Nsga-ii. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002. 10.1109/4235.996017.
  • Murphy [2012] K. P. Murphy. Machine learning: a probabilistic perspective. MIT Press, 2012.
  • Srinivas et al. [2012] Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias W. Seeger. Information-theoretic regret bounds for gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, May 2012. ISSN 1557-9654. 10.1109/tit.2011.2182033.
  • GPyOptAuthors [2016] GPyOptAuthors. GPyOpt: A bayesian optimization framework in python. http://github.com/SheffieldML/GPyOpt, 2016.
  • Scikit-learn [2021] Scikit-learn. scikit-optimize. https://scikit-optimize.github.io/stable/, 2021.
  • Bergstra et al. [2011] James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In Proceedings of the 24th International Conference on Neural Information Processing Systems, NIPS’11, page 2546–2554, Red Hook, NY, USA, 2011. Curran Associates Inc. ISBN 9781618395993.
  • Hutter et al. [2011] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Carlos A. Coello Coello, editor, Learning and Intelligent Optimization, pages 507–523, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. ISBN 978-3-642-25566-3.
  • Hu et al. [2011] Yingjie Hu, Jianqiang Hu, Yifan Xu, Fengchun Wang, and Rong Cao. Contamination control in food supply chain. In Proceedings of the 2010 Winter Simulation Conference, pages 2678–2681, 2011. 10.1109/WSC.2010.5678963.
  • Tušar et al. [2019] Tea Tušar, Dimo Brockhoff, and Nikolaus Hansen. Mixed-integer benchmark problems for single- and bi-objective optimization. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO ’19, page 718–726, New York, NY, USA, 2019. Association for Computing Machinery. ISBN 9781450361118. 10.1145/3321707.3321868.
  • Kauffman and Levin [1987] Stuart Kauffman and Simon Levin. Towards a general theory of adaptive walks on rugged landscapes. Journal of Theoretical Biology, 128(1):11–45, 1987. ISSN 0022-5193. https://doi.org/10.1016/S0022-5193(87)80029-2.
  • Li et al. [2006] Rui Li, Michael Emmerich, Jeroen Eggermont, Ernst Bovenkamp, Thomas Bäck, Jouke Dijkstra, and Johan Reiber. Mixed-integer nk landscapes. In Parallel Problem Solving from Nature – PPSN IX, volume 4193 of Lecture Notes in Computer Science, pages 42–51. Springer, 2006. ISBN 978-3-540-38990-3. 10.1007/11844297_5.
  • Zitzler et al. [2000] Eckart Zitzler, Kalyanmoy Deb, and Lothar Thiele. Comparison of multiobjective evolutionary algorithms: Empirical results. Evol. Comput., 8(2):173–195, jun 2000. ISSN 1063-6560. 10.1162/106365600568202.
  • Zheng et al. [2014] Xiaoyu Zheng, Howon Lee, Todd H Weisgraber, Maxim Shusteff, Joshua DeOtte, Eric B Duoss, Joshua D Kuntz, Monika M Biener, Qi Ge, Julie A Jackson, et al. Ultralight, ultrastiff mechanical metamaterials. Science, 344(6190):1373–1377, 2014.
  • Pham et al. [2019] Minh-Son Pham, Chen Liu, Iain Todd, and Jedsada Lertthanasarn. Damage-tolerant architected materials inspired by crystal microstructure. Nature, 565(7739):305–311, 2019.
  • Zhang et al. [2019] Xuan Zhang, Andrey Vyatskikh, Huajian Gao, Julia R Greer, and Xiaoyan Li. Lightweight, flaw-tolerant, and ultrastrong nanoarchitected carbon. Proceedings of the National Academy of Sciences, 116(14):6665–6672, 2019.
  • Xia et al. [2019] Xiaoxing Xia, Arman Afshar, Heng Yang, Carlos M Portela, Dennis M Kochmann, Claudio V Di Leo, and Julia R Greer. Electrochemically reconfigurable architected materials. Nature, 573(7773):205–213, 2019.
  • Song et al. [2019b] Jian Song, Wenzhao Zhou, Yuejiao Wang, Rong Fan, Yinchu Wang, Junying Chen, Yang Lu, and Lixiao Li. Octet-truss cellular materials for improved mechanical properties and specific energy absorption. Materials & Design, 173:107773, 2019b.
  • Song et al. [2020] Jiaxi Song, Christos Michas, Christopher S Chen, Alice E White, and Mark W Grinstaff. From simple to architecturally complex hydrogel scaffolds for cell and tissue engineering applications: Opportunities presented by two-photon polymerization. Advanced Healthcare Materials, 9(1):1901217, 2020.
  • Bauer et al. [2016] Jens Bauer, Almut Schroer, Ruth Schwaiger, and Oliver Kraft. Approaching theoretical strength in glassy carbon nanolattices. Nature Materials, 15(4):438–443, 2016.
  • Meza et al. [2014] Lucas R Meza, Satyajit Das, and Julia R Greer. Strong, lightweight, and recoverable three-dimensional ceramic nanolattices. Science, 345(6202):1322–1326, 2014.
  • Berger et al. [2017] JB Berger, HNG Wadley, and RM McMeeking. Mechanical metamaterials at the theoretical limit of isotropic elastic stiffness. Nature, 543(7646):533–537, 2017.
  • Tancogne-Dejean et al. [2018] Thomas Tancogne-Dejean, Marianna Diamantopoulou, Maysam B Gorji, Colin Bonatti, and Dirk Mohr. 3d plate-lattices: An emerging class of low-density metamaterial exhibiting optimal isotropic stiffness. Advanced Materials, 30(45):1803334, 2018.
  • Vangelatos et al. [2020] Z Vangelatos, K Komvopoulos, and C Grigoropoulos. Regulating the mechanical behavior of metamaterial microlattices by tactical structure modification. Journal of the Mechanics and Physics of Solids, page 104112, 2020.

Appendix A Benchmark Test Functions

In this section, we define the benchmark test functions, all of which are set to be maximized during our optimizations.

Contamination Problem

The contamination problem was introduced by Hu et al. [2011] and is used to test categorical variables with binary categories. The problem aims to maximize the reward function for applying preventative measures to stop contamination in a food supply chain with $D$ stages. At each stage $i$, where $i\in[1,D]$, decontamination effort can be applied; this effort comes at a cost $c$ and decreases the contamination by a random rate $\Gamma_{i}$. If no prevention effort is taken, the contamination spreads at a rate $\Omega_{i}$. At each stage $i$, the fraction of contaminated food is given by the recursive relation:

Z_{i} = \Omega_{i}\,(1-w_{i})\,(1-Z_{i-1}) + (1-\Gamma_{i}\,w_{i})\,Z_{i-1}    (5)

Here $w_{i}\in\{0,1\}$ is the decision variable that determines whether preventative measures are taken at stage $i$. The goal is to decide at which stages $i$ action should be taken so that $Z_{i}$ does not exceed an upper limit $U_{i}$. $\Omega_{i}$ and $\Gamma_{i}$ are drawn from uniform distributions. We consider the problem setup with the Lagrangian relaxation of Baptista and Poloczek [2018]: each violation of $Z_{i}<U_{i}$ is penalized with weight $\rho=1$, and the penalties are summed over the contaminated stages. The total number of stages, or dimensions, is $D=21$. The cost is set to $c=0.2$ and $Z_{1}=0.01$. As in the setup of Baptista and Poloczek [2018], we use $T=100$ Monte Carlo samples, $U_{i}=0.1$, $\lambda=0.01$, and $\epsilon=0.05$.
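To make the setup concrete, the following is a minimal sketch of how Eq. (5) and the penalized reward could be evaluated. The function and parameter names are illustrative, the uniform rate distributions follow the description above, and the exact penalty bookkeeping of Baptista and Poloczek [2018] may differ in detail.

```python
import numpy as np

def contamination_reward(w, D=21, T=100, U=0.1, cost=0.2, rho=1.0,
                         lam=0.01, z1=0.01, seed=0):
    """Hedged sketch of the penalized contamination reward (to be maximized).

    w : length-D binary vector; w[i] = 1 applies prevention at stage i.
    The random spread (Omega_i) and decontamination (Gamma_i) rates are drawn
    uniformly, and violations of Z_i < U are averaged over T Monte Carlo draws.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    violation_freq = np.zeros(D)
    for _ in range(T):
        z = z1
        for i in range(D):
            omega = rng.uniform()   # spread rate if no prevention is applied
            gamma = rng.uniform()   # decontamination rate if prevention applied
            z = omega * (1.0 - w[i]) * (1.0 - z) + (1.0 - gamma * w[i]) * z
            violation_freq[i] += float(z > U)
    violation_freq /= T
    # Prevention cost plus rho-weighted violation frequencies and an L1
    # regularization term; negate so that larger values are better.
    return -(cost * w.sum() + rho * violation_freq.sum() + lam * w.sum())
```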

Encrypted Amalgamated

Analytic test functions generally cannot mimic mixed variables. To map the continuous output of a function into $N$ discrete ordinal or categorical variables, the continuous range of the test function's output is first discretized into $N$ subranges by selecting $(N-1)$ break points, often equally spaced, within the bounds of the range. The continuous output variable is then assigned the integer round-off value of the subrange defined by its surrounding pair of break points. If necessary, the domain of the test function's output is first mapped into a larger domain so that each subrange has a unique integer value. For ordinal variables this completes the mapping; for categorical variables, a random vector is generated for each categorical variable that scrambles, or 'encrypts', the indices of these values, thus creating random landscapes as is the case for categorical variables with a latent space. The optimization algorithm only sees the encrypted space, and the random vector is used only when evaluating the black-box function.
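As a minimal sketch of this encryption step (the helper names are illustrative and the exact mapping used in our experiments may differ), a continuous range can be discretized into equally spaced levels whose indices are scrambled by a secret random permutation:

```python
import numpy as np

def make_encrypted_variable(lower, upper, n_levels, seed=0):
    """Discretize [lower, upper] into n_levels equally spaced values and
    'encrypt' the level indices with a random permutation (categorical case)."""
    rng = np.random.default_rng(seed)
    levels = np.linspace(lower, upper, n_levels)   # ordinal levels (unscrambled)
    permutation = rng.permutation(n_levels)        # secret 'encryption' vector

    def decode(index):
        # The optimizer only proposes an index in {0, ..., n_levels-1};
        # the black box decodes it to the hidden underlying value.
        return levels[permutation[index]]

    return decode

# Example: the optimizer proposes category 3; only the black box decodes it.
decode = make_encrypted_variable(-5.0, 2.5, n_levels=5, seed=42)
hidden_value = decode(3)
```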

We also define a new test function that we call the Amalgamated function, a piece-wise function formed from commonly used analytical test functions with different features (for more details on these functions we refer to Tušar et al. [2019]). The Amalgamated function is non-convex and anisotropic, unlike conventional test functions where isotropy can be exploited.

For $i=1,\dots,D$ and $k=\mathrm{mod}(i-1,\,7)$:

f(\vec{w}) = \sum_{i=1}^{D} g(w_{i})    (7)

where

g(w_{i}) = \begin{cases}
\sin(w_{i}) & \text{if } k=0,\ w_{i}\in(0,\pi)\\
-\dfrac{w_{i}^{4}-16w_{i}^{2}+5w_{i}}{2} & \text{if } k=1,\ w_{i}\in(-5,5)\\
-w_{i}^{2} & \text{if } k=2,\ w_{i}\in(-10,10)\\
-\left[10+w_{i}^{2}-10\cos(2\pi w_{i})\right] & \text{if } k=3,\ w_{i}\in(-5,5)\\
-\left[100(w_{i}-w_{i-1}^{2})^{2}+(1-w_{i})^{2}\right] & \text{if } k=4,\ w_{i}\in(-2,2)\\
\left|\cos(w_{i})\right| & \text{if } k=5,\ w_{i}\in(-\pi/2,\pi/2)\\
-w_{i} & \text{if } k=6,\ w_{i}\in(-30,30)
\end{cases}    (8)

To create the Encrypted Amalgamated function, equally spaced points are taken within the bounds defined above for the categorical and ordinal variables. For the current work, we use $D=13$, with 8 categorical and 3 ordinal variables with 5 states each, and 2 continuous variables.
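A minimal sketch of Eqs. (7)–(8) in code (the function name is illustrative, and inputs are assumed to already lie within the per-dimension bounds) is:

```python
import numpy as np

def amalgamated(w):
    """Sketch of the Amalgamated objective of Eqs. (7)-(8); w is a 1-D array."""
    w = np.asarray(w, dtype=float)
    total = 0.0
    for i, wi in enumerate(w):
        k = i % 7                 # 0-based version of k = mod(i-1, 7)
        if k == 0:
            total += np.sin(wi)
        elif k == 1:
            total += -(wi**4 - 16.0 * wi**2 + 5.0 * wi) / 2.0
        elif k == 2:
            total += -(wi**2)
        elif k == 3:
            total += -(10.0 + wi**2 - 10.0 * np.cos(2.0 * np.pi * wi))
        elif k == 4:
            total += -(100.0 * (wi - w[i - 1]**2)**2 + (1.0 - wi)**2)
        elif k == 5:
            total += abs(np.cos(wi))
        else:                     # k == 6
            total += -wi
    return total
```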

NK Landscapes

NK landscapes were introduced by Kauffman and Levin [1987] as a way of creating optimization problems with categorical variables. $N$ is the number of genes, i.e., the number of dimensions $D$, and $K$ describes the number of epistatic links of each gene to other genes, which sets the 'ruggedness' of the landscape. A large number of random landscapes can be created for given $N$ and $K$ values, and the global optimum of a generated landscape can only be found through complete enumeration. The landscape cost of any vector is the average of its component costs. Each component cost is determined by random values generated for the categories, based not only on the gene's own allele but also on the alleles of the other genes connected to it through the random epistasis matrix, with connection probability (ruggedness) $K$. A ruggedness of $K=1$ corresponds to a fully connected genome.

The $NK$ landscapes of Kauffman and Levin [1987] were formulated only for binary variables; they were extended by Li et al. [2006] to multi-categorical problems, which is the formulation we use. Details of the $NK$ landscape test functions can be found in Li et al. [2006]. For the current study, we use $N=8$ with 4 categories each and ruggedness $K=0.2$.
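One plausible sketch of such a landscape generator is shown below; the construction of Li et al. [2006] differs in its details, and the helper names and the use of a per-pair link probability $K$ are illustrative assumptions.

```python
import numpy as np

def make_nk_landscape(N=8, n_categories=4, K=0.2, seed=0):
    """Hedged sketch of a random NK-style landscape over categorical genes."""
    rng = np.random.default_rng(seed)
    links, tables = [], []
    for i in range(N):
        mask = rng.random(N) < K     # epistatic links drawn with probability K
        mask[i] = False              # gene i always contributes its own allele
        links.append(np.flatnonzero(mask))
        # Random component-cost table indexed by the joint allele configuration
        # of gene i and its linked genes.
        tables.append(rng.random((n_categories,) * int(1 + links[i].size)))

    def evaluate(genome):
        genome = np.asarray(genome, dtype=int)
        costs = [tables[i][(genome[i],) + tuple(genome[links[i]])]
                 for i in range(N)]
        return float(np.mean(costs))   # landscape value = mean component cost

    return evaluate

# Example usage: evaluate a random genome with 4 categories per gene.
f = make_nk_landscape()
value = f(np.random.default_rng(1).integers(0, 4, size=8))
```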

Rastrigin

The Rastrigin function is a commonly used non-convex test function Tušar et al. [2019] with a large number of local optima. It is defined as:

f(\vec{w}) = -\sum_{i=1}^{D}\left[10 + w_{i}^{2} - 10\cos(2\pi w_{i})\right], \quad w_{i}\in(-5,5)    (9)

We use $D=9$ for testing, with 6 ordinal variables with 5 discrete states each and 3 continuous variables. The ordinal variables are equally spaced within the bounds.
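A minimal sketch of this mixed-variable Rastrigin evaluation (function and argument names are illustrative) is:

```python
import numpy as np

def mixed_rastrigin(ordinal_idx, continuous, n_levels=5, lower=-5.0, upper=5.0):
    """Sketch of the mixed-variable Rastrigin benchmark of Eq. (9), maximized.

    ordinal_idx : six integer indices in {0, ..., n_levels-1}, each mapped to
                  one of n_levels equally spaced values within the bounds.
    continuous  : three continuous values in (lower, upper).
    """
    levels = np.linspace(lower, upper, n_levels)
    w = np.concatenate([levels[np.asarray(ordinal_idx, dtype=int)],
                        np.asarray(continuous, dtype=float)])
    return float(np.sum(-(10.0 + w**2 - 10.0 * np.cos(2.0 * np.pi * w))))

# Example: 6 ordinal indices and 3 continuous values (D = 9).
value = mixed_rastrigin([0, 2, 4, 1, 3, 2], [0.5, -1.2, 3.3])
```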

Encrypted Styblinski-Tang

We use the Styblinski-Tang function Tušar et al. [2019], an isotropic, non-convex function that is considered difficult to optimize because many search algorithms get 'stuck' at a local optimum. For use with categorical variables, we encrypt it as described previously. The Styblinski-Tang function, in terms of the input vector $\vec{w}$, is defined as:

f(\vec{w}) = -\frac{1}{2}\sum_{i=1}^{D}\left(w_{i}^{4} - 16w_{i}^{2} + 5w_{i}\right), \quad w_{i}\in(-5,2.5)    (10)

For the current study, this function was tested with $D=10$ categorical variables and 5 categories for each variable.

Encrypted ZDT6

The $ZDT$ benchmarks are a suite of multi-objective problems, suggested by Zitzler et al. [2000], and are among the most commonly used tests for such problems. We use $ZDT6$, which is non-convex and non-uniform in its parameter space, and again encrypt the function so that it works with categorical variables. $ZDT6$ is defined as:

f_{1}(\vec{w}) = \exp(-4w_{1})\,\sin^{6}(6\pi w_{1}) - 1
f_{2}(\vec{w}) = -g(\vec{w})\left[1 - \left(f_{1}(\vec{w})/g(\vec{w})\right)^{2}\right]
g(\vec{w}) = 1 + 9\left[\left(\sum_{i=2}^{D} w_{i}\right)/(D-1)\right]^{1/4}    (11)

Here $w_{i}\in[0,1]$ for all $i$; the Pareto-optimal solutions have $w_{i}=0$ for $i=2,\dots,D$. The function was tested for $D=10$ with 5 categories each. We note that, to evaluate the performance of MixMOBO, we compared it against the NSGA-II variant Deb et al. [2002] that can deal with mixed variables (by running $ZDT4$ in a mixed-variable setting and $ZDT6$ with categorical variables). No encryption is necessary for GAs. The GAs required, on average, $10^{2}$ times more function calls than MixMOBO.
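For reference, a minimal sketch of the (maximized) $ZDT6$ objectives of Eq. (11) is given below; for the encrypted variant, categorical indices would first be decoded to values in $[0,1]$ before this evaluation.

```python
import numpy as np

def zdt6(w):
    """Sketch of the maximized ZDT6 objectives of Eq. (11).

    w : array of length D with entries in [0, 1].
    Returns (f1, f2), both to be maximized.
    """
    w = np.asarray(w, dtype=float)
    D = w.size
    f1 = np.exp(-4.0 * w[0]) * np.sin(6.0 * np.pi * w[0]) ** 6 - 1.0
    g = 1.0 + 9.0 * (np.sum(w[1:]) / (D - 1)) ** 0.25
    f2 = -g * (1.0 - (f1 / g) ** 2)
    return f1, f2

# Example: a 10-dimensional point; the Pareto-optimal set has w[1:] = 0.
f1, f2 = zdt6(np.linspace(0.0, 1.0, 10))
```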