
Continuous black-box optimization with quantum annealing and random subspace coding

Syun Izawa Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan    Koki Kitai Department of Mechanical Engineering, The University of Tokyo, Tokyo 113-8654, Japan    Shu Tanaka Department of Applied Physics and Physico-Informatics, Keio University, Yokohama 223-8522, Japan Green Computing System Research Organization, Waseda University, Tokyo, 162-0042, Japan    Ryo Tamura [email protected] International Center for Materials Nanoarchitectonics, National Institute for Materials Science, Tsukuba 305-0047, Japan Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan    Koji Tsuda [email protected] Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-8561, Japan RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan Research and Services Division of Materials Data and Integrated System, National Institute for Materials Science, Tsukuba 305-0047, Japan
Abstract

A black-box optimization algorithm such as Bayesian optimization finds the extremum of an unknown function by alternating inference of the underlying function and optimization of an acquisition function. In a high-dimensional space, such algorithms perform poorly due to the difficulty of acquisition function optimization. Herein, we apply quantum annealing (QA) to overcome this difficulty in continuous black-box optimization. Since QA specializes in the optimization of binary problems, a continuous vector has to be encoded to binary, and the solution of QA has to be translated back. Our method has the following three parts: 1) random subspace coding based on axis-parallel hyperrectangles, which maps a continuous vector to a binary vector; 2) a quadratic unconstrained binary optimization (QUBO) defined by an acquisition function based on a nonnegative-weighted linear regression model, which is solved by QA; 3) a penalization scheme to ensure that the QA solution can be translated back. Benchmark tests show that its performance using a D-Wave Advantage™ quantum annealer is competitive with a state-of-the-art method based on the Gaussian process in high-dimensional problems. Our method may open up new possibilities for quantum annealing and other QUBO solvers, including the quantum approximate optimization algorithm (QAOA) on gate-based quantum computers, and expand their range of application to continuous-valued problems.

preprint: APS/123-QED

I Introduction

Given a black-box function $f({\bm{x}})$, ${\bm{x}}\in\Re^{d}$, whose closed-form expression is unknown, black-box optimization (BO) identifies the extremum via an iterative procedure of input design and evaluation [1, 2]. Figure 1 shows the procedure of software-based black-box optimization. First, the black-box function is estimated using a machine-learning prediction model $f_{\rm ML}({\bm{x}})$ trained on a known dataset $D=\{({\bm{x}}_{1},y_{1}),\ldots,({\bm{x}}_{n},y_{n})\}$. Using the trained model, an acquisition function $g({\bm{x}})$ is defined, and the next input ${\bm{x}}^{*}$ is designed by optimizing it. Next, the value of the black-box function at the recommended point, $y^{*}=f({\bm{x}}^{*})$, is evaluated by a black-box system, e.g., simulation or experiment. The new data point $({\bm{x}}^{*},y^{*})$ is added to $D$. These steps are iterated as long as the budget allows. Black-box optimization has been applied to many physics applications including autonomous X-ray scattering experiments [3], inverse scattering [4], crystal structure prediction [5], design of organic synthesis experiments [6], and effective model estimation [7].

BO is expected to accelerate extremum identification in comparison to random sampling: at the same number of evaluations, the best input vector found by BO should, on average, be closer to the extremum than that found by random sampling. It is known, however, that the advantage of BO quickly diminishes as the dimensionality increases [8]. There are two possible reasons, statistical and computational. In statistical terms, it is difficult to estimate $f({\bm{x}})$ with a limited number of training data in a high-dimensional space. In computational terms, the optimization of a non-convex acquisition function becomes increasingly difficult as the dimensionality increases.

Figure 1: Software-based and hardware-assisted black-box optimization with continuous variables. To use special hardware, an appropriate encoder and decoder between continuous and binary states must be inserted.

In this paper, to solve the computational problem in continuous black-box optimization, we propose a hardware-assisted black-box optimization algorithm called CONBQA (CONtinuous Black-box optimization with Quantum Annealing, pronounced like kombucha) [9] that employs quantum annealing [10, 11] as an acquisition function optimizer. Here, we use quantum annealers developed by D-Wave Systems, Inc. [12], which are special hardware for searching the ground state of the Ising model, or equivalently, for solving quadratic unconstrained binary optimization (QUBO). Instead of a quantum annealer, our method can also use other Ising machines built on characteristic physical systems such as digital circuits [13], CMOS [14], laser oscillators [15, 16], and GPUs [17]. Recently, Kitai et al. [18] proposed the FMQA algorithm, which solves discrete black-box optimization problems with the help of quantum annealing. This method uses the factorization machine as a machine-learning prediction model and solves the computational problem in discrete black-box optimization. However, continuous black-box optimization cannot be performed by FMQA; a technique to encode continuous data to binary data is required. Note that other methods for discrete black-box optimization by Ising machines have recently been developed [19, 20].

Figure 1 illustrates the procedure of CONBQA. 1) Continuous data ${\bm{x}}$ are encoded to binary data ${\bm{z}}$. 2) Machine learning is performed and an acquisition function $g({\bm{z}})$ is prepared. 3) The QUBO defined by $g({\bm{z}})$ and additional constraints to ensure decodability is solved by a quantum annealer to obtain the maximizer ${\bm{z}}^{*}$. 4) The solution is decoded back to a continuous vector ${\bm{x}}^{*}$. 5) The value of the black-box function $y^{*}=f({\bm{x}}^{*})$ is evaluated. 6) The new data point is added to the dataset. A minimal sketch of this loop is given below.
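The following Python sketch of the CONBQA loop assumes hypothetical helpers `encode`, `fit_acquisition`, `solve_qubo`, and `recommend` for steps 1)-4), each sketched in later sections; it is an illustration under these assumptions, not the reference implementation, which is available in the CONBQA repository [9].

```python
import numpy as np

def conbqa_loop(f, X_init, y_init, rectangles, n_iter):
    """Hardware-assisted black-box maximization (steps 1-6 of Fig. 1).

    f          : black-box function to be maximized
    X_init     : list of initial continuous inputs in [0, 1]^d
    y_init     : list of their evaluated values
    rectangles : list of m random-subspace rectangles (see Sec. II.1)
    """
    X, y = list(X_init), list(y_init)
    for _ in range(n_iter):
        ymin, ymax = min(y), max(y)
        y_norm = [(v - ymin) / (ymax - ymin + 1e-12) for v in y]  # min-max normalization
        Z = np.array([encode(x, rectangles) for x in X])    # 1) continuous -> binary
        w = fit_acquisition(Z, np.array(y_norm))            # 2) nonnegative linear model
        z_star = solve_qubo(w, rectangles)                  # 3) QUBO solved by annealer
        x_star = recommend(z_star, rectangles, len(X[0]))   # 4) binary -> continuous
        y_star = f(x_star)                                  # 5) black-box evaluation
        X.append(x_star); y.append(y_star)                  # 6) augment dataset
    best = int(np.argmax(y))
    return X[best], y[best]
```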

Continuous-to-binary encoding is a common subject of research in scientific fields including neuroscience [21, 22] and machine learning [23, 24]. Among existing methods, we employ random subspace coding [22] due to its simplicity and high resolution. Several coordinates are chosen at random, and an axis-parallel hyperrectangle is sampled randomly in the resulting subspace. A continuous vector is mapped to a binary variable by projecting it onto the subspace: if the projected point is included in the hyperrectangle, the vector is mapped to one, otherwise to zero. By using $m$ hyperrectangles, a continuous vector is mapped to an $m$-dimensional binary vector. Figure 2 (a) shows an example of random subspace coding in a two-dimensional continuous space. Here, both coordinates are selected to generate the hyperrectangles, and the case $m=2$ is considered. Two rectangles divide the whole space into four regions, which corresponds to an encoding into a binary vector with two bits: the first bit is 1 if the continuous vector is included in the left blue rectangle, and the second bit is 1 if it is included in the right red rectangle. This example is a surjective encoding. It is worth noting, however, that random subspace coding is not necessarily surjective. Figure 2 (b) shows an example of a non-surjective encoding; in this case, there may exist binary vectors that cannot be decoded to continuous vectors, i.e., '11' in this figure. Figure 2 (c) shows an example of random subspace coding for the three-dimensional case. Here, the continuous vector located at the red point is mapped to the binary vector in which the bits corresponding to the bold hyperrectangles are 1 and the others are 0.

Figure 2: (a) Surjective encoding for the two-dimensional case with two-dimensional hyperrectangles. The first bit is 1 if the continuous vector is included in the left blue rectangle, and the second bit is 1 if it is included in the right red rectangle. (b) Non-surjective encoding for the two-dimensional case. The vector '11' does not have any corresponding point. (c) Random subspace coding in a three-dimensional continuous space. Although each rectangle is drawn on a plane, it extends along the perpendicular axis and forms a rectangular box. For the red point, the bits corresponding to the bold rectangles are 1 and the others are 0.

In CONBQA, a nonnegative-weighted Bayesian linear model is employed as the machine-learning model in the binary space, and the acquisition function is defined from the trained model. We prove, via Helly's theorem, that the global optimal solution of the acquisition function under constraints to ensure decodability is always decodable. Interestingly, the nonnegativity constraints in the prediction model are essential in proving this property. To use a quantum annealer, a QUBO is formulated from the acquisition function and additional constraints. Note that it is not strictly necessary for quantum annealing to obtain the global optimal solution of the QUBO, because decodable solutions exist even among non-optimal solutions. Thus, the task of quantum annealing is to find decodable solutions with as large an acquisition function value as possible. In numerical experiments using benchmark functions, our method is shown to be competitive with Add-GP [25], one of the state-of-the-art methods for high-dimensional Bayesian optimization [8].

II Method

II.1 Random Subspace Coding

In random subspace coding in a $d$-dimensional space, the dimensionality of the subspace, $d_{s}$, is first determined. Let ${\cal I}=\{i_{1},\ldots,i_{d_{s}}\}$ denote the indices of $d_{s}$ coordinates chosen randomly from $[1,d]$, and $\{[l_{i},u_{i}]\}_{i\in{\cal I}}$ be closed intervals along the respective coordinates, where $l_{i}<u_{i}$, $l_{i}\in\Re$, and $u_{i}\in\Re$. Each interval $[l_{i},u_{i}]$ is sampled randomly using the Dirichlet process method [26]. Given a positive integer $N$, this method can generate an interval such that any point in $[0,1]$ along each selected coordinate is included in the interval with probability $1/N$. Throughout this paper, $N$ is fixed at 3. Then, an axis-parallel hyperrectangle $R$ is determined as

R=\{{\bm{x}}\in\Re^{d}\;|\;l_{i}\leq x_{i}\leq u_{i},\;i\in{\cal I}\}. \quad (1)

Hereafter, we refer to it simply as a "rectangle". Using $m$ rectangles $R_{1},\ldots,R_{m}$, random subspace coding is defined as $z_{k}=I({\bm{x}}\in R_{k})$, where $I(\cdot)$ is one if the condition is satisfied and zero otherwise. A sketch of this encoding is shown below.
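As a concrete illustration, a minimal encoder can be written as follows. The interval sampler here is a simplified stand-in for the Dirichlet process method of Ref. [26], not the method itself: each interval is drawn with expected length $1/N$, so a uniform point on $[0,1]$ is covered with probability roughly $1/N$.

```python
import numpy as np

def sample_rectangle(d, d_s, N=3, rng=None):
    """Sample an axis-parallel rectangle in a random d_s-dimensional subspace.

    Returns a dict mapping each chosen coordinate index i to its interval (l_i, u_i).
    The interval sampler is a simplified stand-in for Ref. [26]: expected
    interval length is 1/N, giving coverage probability of roughly 1/N.
    """
    rng = rng or np.random.default_rng()
    coords = rng.choice(d, size=d_s, replace=False)
    rect = {}
    for i in coords:
        length = min(1.0, rng.exponential(1.0 / N))  # interval length, mean 1/N
        l = rng.uniform(0.0, 1.0 - length)
        rect[int(i)] = (l, l + length)
    return rect

def encode(x, rectangles):
    """Random subspace coding: z_k = I(x in R_k) for each rectangle R_k."""
    return np.array([int(all(l <= x[i] <= u for i, (l, u) in R.items()))
                     for R in rectangles])
```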

Given a binary vector ${\bm{z}}=\{z_{k}\}_{k=1,\ldots,m}$, a point ${\bm{x}}\in\Re^{d}$ corresponds to ${\bm{z}}$ if ${\bm{x}}$ is included in all rectangles with $z_{k}=1$ and in none of those with $z_{k}=0$. Let $P$ denote the intersection of all rectangles with $z_{k}=1$,

P({\bm{z}})=\bigcap_{\{k\;|\;z_{k}=1\}}R_{k}. \quad (2)

Let $N$ denote the union of all rectangles with $z_{k}=0$,

N({\bm{z}})=\bigcup_{\{k\;|\;z_{k}=0\}}R_{k}. \quad (3)

Since the preimage of ${\bm{z}}$ is described as $P({\bm{z}})-N({\bm{z}})$, ${\bm{z}}$ is decodable if and only if $P({\bm{z}})-N({\bm{z}})\neq\emptyset$. This test can be implemented as sketched below.
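A sketch of this test, assuming the dict-based rectangle representation above: $P({\bm{z}})$ is itself an axis-parallel box obtained by intersecting intervals coordinate-wise, and a point of $P({\bm{z}})-N({\bm{z}})$ can be sought by rejection sampling (a heuristic, since the set difference generally has no closed form).

```python
import numpy as np

def intersection_box(z, rectangles, d):
    """Per-coordinate bounds of P(z), the intersection of rectangles with z_k = 1.
    Returns (lo, hi) arrays, or None if the intersection is empty."""
    lo, hi = np.zeros(d), np.ones(d)              # whole search space [0, 1]^d
    for z_k, R in zip(z, rectangles):
        if z_k:
            for i, (l, u) in R.items():
                lo[i], hi[i] = max(lo[i], l), min(hi[i], u)
    return (lo, hi) if np.all(lo <= hi) else None

def sample_preimage(z, rectangles, d, n_trials=1000, rng=None):
    """Try to draw a point from P(z) - N(z); returns None if none is found."""
    rng = rng or np.random.default_rng()
    box = intersection_box(z, rectangles, d)
    if box is None:
        return None                               # P(z) is empty
    lo, hi = box
    for _ in range(n_trials):
        x = rng.uniform(lo, hi)                   # uniform sample inside P(z)
        in_negative = any(all(l <= x[i] <= u for i, (l, u) in R.items())
                          for z_k, R in zip(z, rectangles) if z_k == 0)
        if not in_negative:                       # x avoids every negative rectangle
            return x
    return None
```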

II.2 Black-Box Optimization

In CONBQA, the training dataset $D$ is encoded to a binary dataset $D^{\prime}=\{({\bm{z}}_{1},y_{1}),\ldots,({\bm{z}}_{n},y_{n})\}$ through random subspace coding. Here, $\{y_{i}\}_{i=1,\ldots,n}$ are normalized by min-max normalization, so these values are nonnegative. Then, a QUBO is prepared from $D^{\prime}$ and its optimal solution ${\bm{z}}^{*}$ is obtained by quantum annealing. Next, ${\bm{z}}^{*}$ is decoded to a continuous vector ${\bm{x}}^{*}$ and fed to the black-box system.

To model the underlying function from $D^{\prime}$, a nonnegative linear model

y=k=1mwkzk,wk0,y=\sum_{k=1}^{m}w_{k}z_{k},\;w_{k}\geq 0, (4)

is employed and trained with least-squares fitting to $D^{\prime}$. Let ${\bm{w}}^{*}=\{w_{k}^{*}\}_{k=1,\ldots,m}$ denote the trained parameter vector. The acquisition function is then defined as

g({\bm{z}})=\sum_{k=1}^{m}w_{k}^{*}z_{k}. \quad (5)
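The least-squares fit under nonnegativity constraints can be done, for example, with scipy's NNLS solver. This is a point-estimate sketch of the nonnegative-weighted linear model; the Bayesian treatment mentioned in Sec. I would additionally place a prior over ${\bm{w}}$.

```python
import numpy as np
from scipy.optimize import nnls

def fit_acquisition(Z, y):
    """Fit y ~ sum_k w_k z_k with w_k >= 0 by nonnegative least squares.

    Z : (n, m) binary design matrix from random subspace coding
    y : (n,) min-max-normalized (hence nonnegative) target values
    Returns the trained weights w* defining g(z) in Eq. (5).
    """
    w, _residual_norm = nnls(Z.astype(float), np.asarray(y, dtype=float))
    return w
```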

The unconstrained maximizer of $g({\bm{z}})$ is the vector with $z_{k}=1$ for all $k$. However, this solution is almost certainly non-decodable, because a region where all rectangles intersect is very unlikely to exist. That is, unconstrained maximization of the acquisition function may lead to a non-decodable solution. To keep the optimal solution decodable, we add the constraint ${\bm{z}}\in C$, where

C=\{{\bm{z}}\in\{0,1\}^{m}\;|\;z_{i}z_{j}=0\;{\rm if}\;R_{i}\cap R_{j}=\emptyset\}. \quad (6)

This constraint means that a data point cannot belong to both $R_{i}$ and $R_{j}$ when the two rectangles do not intersect. The constrained problem is then described as

\max_{{\bm{z}}\in C}\sum_{k=1}^{m}w^{*}_{k}z_{k}. \quad (7)

In Sec. III, we will present a proof that the optimal solution of Eq. (7) is decodable.

The problem of Eq. (7) is treated as a QUBO and solved by quantum annealing. As in many application studies of quantum annealing, the constraints are replaced by a penalty term as follows,

\min_{{\bm{z}}\in\{0,1\}^{m}}-A\sum_{i=1}^{m}w_{i}^{*}z_{i}+B\sum_{\{(i,j)\;|\;R_{i}\cap R_{j}=\emptyset,\;i<j\}}z_{i}z_{j}, \quad (8)

where $A$ and $B$ are positive hyperparameters. To make the contributions from the two terms comparable, we use $A=1/\max_{k}\{w_{k}^{*}\}$ and $B=1$. Once the solution ${\bm{z}}^{*}$ of Eq. (8) is obtained by quantum annealing, a point ${\bm{x}}^{*}$ is randomly sampled from the preimage $P({\bm{z}}^{*})-N({\bm{z}}^{*})$ and evaluated by the black-box system to yield a new outcome $y^{*}$. The new data point $({\bm{x}}^{*},y^{*})$ is then added to the dataset $D$, and the next iteration proceeds. A sketch of the QUBO construction is given below.
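The following sketch builds the QUBO of Eq. (8) and, for concreteness, solves it with the dwave-neal simulated-annealing sampler [30]; a D-Wave sampler exposing the same `sample_qubo` interface can be swapped in. `rectangles_disjoint` assumes the dict-based rectangle representation from Sec. II.1.

```python
import numpy as np
import neal  # dwave-neal simulated-annealing sampler

def rectangles_disjoint(Ri, Rj):
    """True if R_i and R_j cannot intersect: some coordinate constrained by
    both has non-overlapping intervals (unconstrained coordinates span [0, 1])."""
    return any(i in Rj and (Ri[i][1] < Rj[i][0] or Rj[i][1] < Ri[i][0])
               for i in Ri)

def solve_qubo(w, rectangles, B=1.0):
    """Build the QUBO of Eq. (8) and return the best sampled solution z*."""
    m = len(w)
    A = 1.0 / max(w.max(), 1e-12)                 # A = 1 / max_k w*_k, guarded
    Q = {(k, k): -A * w[k] for k in range(m)}     # linear (acquisition) term
    for i in range(m):
        for j in range(i + 1, m):
            if rectangles_disjoint(rectangles[i], rectangles[j]):
                Q[(i, j)] = B                     # penalty enforcing Eq. (6)
    sample = neal.SimulatedAnnealingSampler().sample_qubo(Q).first.sample
    return np.array([sample[k] for k in range(m)])
```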

III Decodability

To understand Eq. (7) better, we formulate it as a graph-theoretic problem. Define $m$ nodes corresponding to the rectangles, and place an edge between two nodes if the corresponding rectangles overlap. Assign the nonnegative weight $w^{*}_{k}$ to each node. Then, the acquisition function of Eq. (5) is the sum of the node weights of the subgraph induced by the nodes with $z_{k}=1$. We call the rectangles with $z_{k}=1$ and $z_{k}=0$ positive and negative rectangles, respectively, and the corresponding nodes positive and negative nodes.

Consider the following two conditions: 1) there is no edge between nodes $i$ and $j$; 2) both nodes are positive. If both conditions are satisfied for ${\bm{z}}$, it is not decodable, because the intersection of two non-overlapping rectangles is empty, hence $P({\bm{z}})=\emptyset$. The constraint of Eq. (6) ensures that this situation does not happen, so the positive node set always forms a clique: if the positive set did not form a clique, there would be at least one pair of positive nodes without an edge, violating the constraint. In summary, the optimization problem corresponds to a maximum-weight clique problem [27], where the weight of a clique is defined as the sum of its node weights.

Let us prove that the global optimal solution ${\bm{z}}^{*}$ of Eq. (7) is decodable. As shown above, the decodability condition is $P({\bm{z}}^{*})-N({\bm{z}}^{*})\neq\emptyset$; it suffices to show $P({\bm{z}}^{*})\neq\emptyset$ and $P({\bm{z}}^{*})\cap N({\bm{z}}^{*})=\emptyset$. Since the optimal solution corresponds to a clique in the graph-theoretic formulation, all pairs of positive rectangles overlap. Because the rectangles are convex, Helly's theorem then guarantees that the intersection $P({\bm{z}}^{*})$ of all positive rectangles is nonempty; indeed, for axis-parallel rectangles the problem reduces coordinate-wise to one dimension, where pairwise intersecting intervals always share a common point. Now assume $P({\bm{z}}^{*})\cap N({\bm{z}}^{*})\neq\emptyset$. Then $P({\bm{z}}^{*})$ overlaps with at least one negative rectangle. Such a negative rectangle overlaps with all the positive rectangles, so one could build a larger clique by including it. Since every node weight is positive, the larger clique would have larger weight, contradicting the assumption that the current clique has maximum weight. Therefore $P({\bm{z}}^{*})\cap N({\bm{z}}^{*})=\emptyset$.

IV Experiments

IV.1 Results with D-Wave Quantum Annealer

Using the D-Wave Advantage, we implement a hardware-assisted system for continuous black-box optimization. In the previous section, we proved that, if the hardware optimizer finds the optimal solution of Eq. (7), that solution is always decodable. In reality, however, a quantum annealer and other solvers treat the QUBO problem of Eq. (8) and do not always return the ground state. Thus, there are three possible cases for the solution ${\bm{z}}^{*}$ obtained by quantum annealing:

1. Empty: $P({\bm{z}}^{*})=\emptyset$.

2. Admissible: $P({\bm{z}}^{*})\neq\emptyset$ and $P({\bm{z}}^{*})\cap N({\bm{z}}^{*})\neq\emptyset$.

3. Decodable: $P({\bm{z}}^{*})\neq\emptyset$ and $P({\bm{z}}^{*})\cap N({\bm{z}}^{*})=\emptyset$.

In the decoder, if a decodable or admissible ${\bm{z}}^{*}$ is recommended, the next data point ${\bm{x}}^{*}$ is randomly sampled from the nonempty region $P({\bm{z}}^{*})$. On the other hand, when the solution is empty, ${\bm{x}}^{*}$ is randomly selected from the whole search space. This fallback can be sketched as follows.
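In code, this fallback is a small wrapper around the `intersection_box` helper sketched in Sec. II.1 (our reading of the decoder: both the admissible and decodable cases sample from $P({\bm{z}}^{*})$).

```python
import numpy as np

def recommend(z_star, rectangles, d, rng=None):
    """Decode z* to x*: sample from P(z*) if nonempty, else from [0, 1]^d."""
    rng = rng or np.random.default_rng()
    box = intersection_box(z_star, rectangles, d)
    if box is None:                       # empty case: fall back to whole space
        return rng.uniform(0.0, 1.0, size=d)
    lo, hi = box                          # admissible or decodable case
    return rng.uniform(lo, hi)            # uniform sample inside P(z*)
```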

To test the performance, we employ the 6-dimensional Hartmann-6 function [28], which is often used for testing global optimizers. For this test function, the search space is $[0,1]$ in all coordinates. For all experiments, the parameters of random subspace coding are set to $m=60$ and $d_{s}=2$. The performance of CONBQA is measured by the regret $S_{T}$, the difference between the optimal value of Hartmann-6 and the best value observed so far during black-box optimization. The goal of the present continuous black-box optimization is therefore to obtain small $S_{T}$ with as few iterations as possible. Figure 3 shows the regret on a logarithmic scale against the number of iterations for CONBQA, Add-GP with the UCB score, and random search. In the 15 initial steps, ${\bm{x}}$ is randomly sampled from the whole search space; afterwards, iterations controlled by CONBQA or Add-GP are performed. The average over 10 independent runs with different initial random samples is shown, with the error indicating the standard deviation.

Figure 3: Regret $S_{T}$ as a function of the number of iterations for random search, Add-GP, and CONBQA. CONBQA with the D-Wave Advantage hybrid solver service performs better than CONBQA with simulated annealing or greedy search, and also outperforms Add-GP with the UCB score and random search. Results are averaged over 10 independent runs; the shaded region indicates the error.

As QUBO solvers in CONBQA, we employ the D-Wave Advantage with the hybrid solver service [29], simulated annealing using dwave-neal with default settings [30], and greedy search. Remarkably, when the number of iterations is small, CONBQA outperforms Add-GP, which performed best in a previous benchmark study [8], regardless of the QUBO solver. CONBQA also clearly outperforms random search. This result indicates that our framework, which solves continuous black-box optimization using random subspace coding with QUBO solvers as acquisition function optimizers, works well. In addition, among the CONBQA variants, the quantum annealer performed best. In particular, the optimization result by the quantum annealer is considerably better than that by greedy search, indicating that the performance of continuous black-box optimization can potentially be improved further by better QUBO solvers, i.e., specialized hardware.

Next, we address decodability during the optimization process. Table 1 shows the fractions of empty, admissible, and decodable solutions over the CONBQA iterations for each QUBO solver. Notably, we never observed empty solutions with any of the three QUBO solvers. Simulated annealing and greedy search produced a higher fraction of decodable solutions than the quantum annealer, despite performing worse in optimization (see Fig. 3). This indicates that the solutions from simulated annealing and greedy search satisfy the constraint of Eq. (6) well but tend to be stuck at local minima with larger values of the first term in Eq. (8). The quantum annealer, on the other hand, recommends many admissible solutions that are not fine-tuned for the constraint, yet achieves better optimization. In other words, the quantum annealer's solutions attain smaller values of the first term in Eq. (8) at the expense of fulfilling the constraint exactly, allowing CONBQA to get closer to the global optimal solution.

Table 1: Fraction of empty, admissible, and decodable solutions obtained from three different QUBO solvers.
QUBO solver  Empty  Admissible  Decodable
D-Wave Advantage 0 % 53.5 % 46.5 %
Simulated annealing 0 % 12.1 % 87.9 %
Greedy search 0 % 3.6 % 96.4 %

IV.2 Benchmarking with Simulated Annealing

The performance of CONBQA is tested further with three additional benchmark functions from the 2013 IEEE CEC Competition on Niching Methods for Multimodal Optimization [31]. Among the 20 functions provided, three high-dimensional functions are chosen: $F_{12}$ (5D), $F_{11}$ (10D), and $F_{12}$ (10D), of dimensionality 5, 10, and 10, respectively. To try experimental settings with different numbers of encoding bits $m$, we run CONBQA with simulated annealing. For these test functions, the search space is $[0,1]$ in all coordinates. Figure 4 shows the regret on a logarithmic scale against the number of iterations for CONBQA with $m=20$, $60$, and $100$, Add-GP, and random search. For random subspace coding, $d_{s}=2$ is used. The average over 10 independent runs with different sets of 15 initial randomly sampled data points is shown, with the error indicating the standard deviation.

For the relatively low-dimensional $F_{12}$ (5D), CONBQA finds better solutions than Add-GP regardless of $m$; among the tested values, $m=60$ shows the best optimization performance. If $m$ is increased, the resolution of random subspace coding becomes higher, as discussed in the next subsection, and in principle we can get closer to the optimal solution. On the other hand, the regression model becomes more complicated, and in the early stage of the iterations an accurate prediction model is difficult to construct for statistical reasons. Thus, an appropriate value of $m$ is determined by this trade-off and depends strongly on the target problem.

For the 10-dimensional functions ($F_{11}$ (10D) and $F_{12}$ (10D)), CONBQA outperforms Add-GP in the early stage but is slightly worse than Add-GP at the final stage of optimization. This is because CONBQA cannot reach the optimal point exactly due to the finite resolution of random subspace coding. Although this resolution problem can be mitigated by using large $m$, the trade-off described above then appears, so it is not always better to make $m$ larger. In applications such as tuning experimental and simulation conditions in physics, chemistry, and materials science, the number of iterations is severely limited, so the early stage of optimization is the more important one. In such cases, CONBQA would be preferred over Add-GP.

Figure 4: Performance of CONBQA with 20, 60, and 100 bits in comparison to random search and Add-GP on three benchmark functions ($F_{12}$ (5D), $F_{11}$ (10D), and $F_{12}$ (10D)) [31]. Simulated annealing was used to solve the QUBO. Results are averaged over 10 independent runs; the shaded region indicates the error.

IV.3 Higher Dimensionality

In the benchmarks, CONBQA was applied to problems of up to 10 dimensions. Since the capacity of Ising machines is growing rapidly, CONBQA will likely be applied to higher-dimensional problems in the future. To estimate how many bits are necessary for CONBQA, we investigated the resolution of random subspace coding for $d=10$, $100$, and $1000$. The resolution is characterized by $R_{I}$, the average size of the intersection of all rectangles enclosing a randomly chosen point, where the size of a rectangle is measured by the average of its edge lengths. If $R_{I}$ is small, the possible region of ${\bm{x}}^{*}$ in the decoder is small for a given ${\bm{z}}^{*}$; thus, high resolution of random subspace coding is realized when $R_{I}$ is small enough.

Figure 5 shows $R_{I}$ as a function of the number of bits for $d=10$, $100$, and $1000$ with $d_{s}=2$. The average over 50 randomly chosen points is shown, with the error indicating the standard deviation. In our benchmarking in Sec. IV.B, CONBQA with 100 bits outperformed Add-GP in the early stage of optimization for 10-dimensional problems. From this, $R_{I}\simeq 0.4$ appears to be sufficient for CONBQA to work effectively at $d=10$. Thus, with 10,000 bits it should be possible to reach sufficiently high resolution for 1,000-dimensional problems, and 1,000 bits should suffice for 100-dimensional problems. On the other hand, to get closer to the optimal solution at the final stage of optimization, the resolution must be improved further, which would require at least ten times more bits per problem dimension. Note that statistical aspects are totally disregarded in this analysis: solving a high-dimensional problem may require a prohibitive number of black-box evaluations, and the required number of bits depends strongly on the target problem. Nevertheless, CONBQA has the potential to solve high-dimensional continuous black-box optimization problems that are difficult for conventional methodologies once more bits become available in quantum annealers. $R_{I}$ can be estimated by a direct Monte Carlo procedure, as sketched below.
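Under our reading of the definition above, $R_{I}$ can be estimated by direct Monte Carlo, reusing the `sample_rectangle` helper sketched in Sec. II.1.

```python
import numpy as np

def estimate_RI(d, m, d_s=2, n_points=50, rng=None):
    """Monte Carlo estimate of R_I: for each random point, intersect all
    rectangles enclosing it and record the mean edge length of that box."""
    rng = rng or np.random.default_rng()
    rects = [sample_rectangle(d, d_s, rng=rng) for _ in range(m)]
    sizes = []
    for _ in range(n_points):
        x = rng.uniform(0.0, 1.0, size=d)
        lo, hi = np.zeros(d), np.ones(d)          # start from the whole space
        for R in rects:
            if all(l <= x[i] <= u for i, (l, u) in R.items()):  # R encloses x
                for i, (l, u) in R.items():
                    lo[i], hi[i] = max(lo[i], l), min(hi[i], u)
        sizes.append(float(np.mean(hi - lo)))     # average edge length
    return np.mean(sizes), np.std(sizes)
```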

Figure 5: Average size $R_{I}$ of the intersection of all rectangles enclosing a randomly chosen point as a function of the number of bits, for several problem dimensionalities $d$. The number of randomly chosen points is 50, and the shaded region indicates the error.

V Conclusion

To summarize, we have shown that quantum annealing can be used to solve continuous black-box optimization problems if a coding-decoding scheme and a machine-learning algorithm are properly designed and combined. In this study, random subspace coding was applied as the coding-decoding scheme. A mathematical foundation for decodability was laid out, with an intriguing connection to Helly's theorem. It was shown that CONBQA is competitive with a state-of-the-art black-box optimization algorithm, encouraging applications to various domains. In addition, random subspace coding should be an effective method for handling continuous problems with Ising machines in general.

CONBQA is the first of its kind, and many possibilities for improvement remain. First, uncertainty quantification [32] can be combined with CONBQA. In black-box optimization, it is customary to compute the uncertainty of the prediction and incorporate it into the acquisition function, whereas CONBQA does not yet have such a mechanism. A special aspect of CONBQA is that optimization of the acquisition function is done by a quantum annealer, hence the solution is affected by noise; it is interesting to investigate how uncertainty quantification and quantum noise affect the overall performance. Second, CONBQA currently makes one suggestion at a time to the black-box system, but in many cases it is more desirable to make multiple suggestions simultaneously. A quantum annealer is fundamentally a sampler and generates many solutions at a time; currently only the best solution is used, but it may be possible to use the other solutions as multiple suggestions.

In our algorithm, not only quantum annealers but any QUBO solver, including the quantum approximate optimization algorithm (QAOA) [33, 34, 35] on gate-based quantum computers, can be adopted. Thus, CONBQA may open up new possibilities for these machines in continuous-valued problems.

Acknowledgements.
This work is supported by AMED under Grant Number JP20nk0101111, the New Energy and Industrial Technology Development Organization (NEDO) (Grant No. P15009), Council for Science, Technology and Innovation (CSTI), Cross-ministerial Strategic Innovation Promotion Program (SIP) (Technologies for Smart Bio-industry and Agriculture, “Materials Integration” for Revolutionary Design System of Structural Materials, and Photonics and Quantum Technology for Society 5.0), and JST ERATO (Grant No. MJER1903).

References

  • Shahriari et al. [2015] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas, Taking the human out of the loop: A review of bayesian optimization, Proceedings of the IEEE 104, 148 (2015).
  • Terayama et al. [2021] K. Terayama, M. Sumita, R. Tamura, and K. Tsuda, Black-box optimization for automated discovery, Acc. Chem. Res. 54, 1334 (2021).
  • Noack et al. [2019] M. M. Noack, K. G. Yager, M. Fukuto, G. S. Doerk, R. Li, and J. A. Sethian, A kriging-based approach to autonomous experimentation with applications to x-ray scattering, Scientific reports 9, 1 (2019).
  • Vargas-Hernández et al. [2019] R. Vargas-Hernández, Y. Guan, D. Zhang, and R. Krems, Bayesian optimization for the inverse scattering problem in quantum reaction dynamics, New Journal of Physics 21, 022001 (2019).
  • Yamashita et al. [2018] T. Yamashita, N. Sato, H. Kino, T. Miyake, K. Tsuda, and T. Oguchi, Crystal structure prediction accelerated by bayesian optimization, Physical Review Materials 2, 013803 (2018).
  • Häse et al. [2018] F. Häse, L. M. Roch, C. Kreisbeck, and A. Aspuru-Guzik, Phoenics: A bayesian optimizer for chemistry, ACS central science 4, 1134 (2018).
  • Tamura and Hukushima [2018] R. Tamura and K. Hukushima, Bayesian optimization for computationally extensive probability distributions, PLoS ONE 13, e0193785 (2018).
  • Choffin and Ueda [2018] B. Choffin and N. Ueda, Scaling bayesian optimization up to higher dimensions: a review and comparison of recent algorithms, in 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP) (IEEE, 2018) pp. 1–6.
  • [9] Our implementation (CONBQA) is available on GitHub at https://github.com/tsudalab/conbqa/.
  • Kadowaki and Nishimori [1998] T. Kadowaki and H. Nishimori, Quantum annealing in the transverse Ising model, Phys. Rev. E 58, 5355 (1998).
  • Tanaka et al. [2017] S. Tanaka, R. Tamura, and B. K. Chakrabarti, Quantum spin glasses, annealing and computation (Cambridge University Press, 2017).
  • Johnson et al. [2011] M. W. Johnson, M. H. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, and P. Bunyk, Quantum annealing with manufactured spins, Nature 473, 194 (2011).
  • Matsubara et al. [2017] S. Matsubara, H. Tamura, M. Takatsu, D. Yoo, B. Vatankhahghadim, H. Yamasaki, T. Miyazawa, S. Tsukamoto, Y. Watanabe, and K. Takemoto, Ising-model optimizer with parallel-trial bit-sieve engine, in Conference on Complex, Intelligent, and Software Intensive Systems (Springer, 2017) pp. 432–438.
  • Yamaoka et al. [2015] M. Yamaoka, C. Yoshimura, M. Hayashi, T. Okuyama, H. Aoki, and H. Mizuno, A 20k-spin Ising chip to solve combinatorial optimization problems with CMOS annealing, IEEE J. Solid-State Circuits 51, 303 (2015).
  • Inagaki et al. [2016] T. Inagaki, Y. Haribara, K. Igarashi, T. Sonobe, S. Tamate, T. Honjo, A. Marandi, P. L. McMahon, T. Umeki, and K. Enbutsu, A coherent Ising machine for 2000-node optimization problems, Science 354, 603 (2016).
  • McMahon et al. [2016] P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, S. Tamate, T. Inagaki, H. Takesue, S. Utsunomiya, and K. Aihara, A fully programmable 100-spin coherent Ising machine with all-to-all connections, Science 354, 614 (2016).
  • Goto et al. [2019] H. Goto, K. Tatsumura, and A. R. Dixon, Combinatorial optimization by simulating adiabatic bifurcations in nonlinear hamiltonian systems, Sci. Adv. 5, eaav2372 (2019).
  • Kitai et al. [2020] K. Kitai, J. Guo, S. Ju, S. Tanaka, K. Tsuda, J. Shiomi, and R. Tamura, Designing metamaterials with quantum annealing and factorization machines, Physical Review Research 2, 013319 (2020).
  • Hatakeyama-Sato et al. [2021] K. Hatakeyama-Sato, T. Kashikawa, K. Kimura, and K. Oyaizu, Tackling the challenge of a huge materials science search space with quantum‐inspired annealing, Adv. Intell. Syst. , 2000209 (2021).
  • Koshikawa et al. [2021] A. S. Koshikawa, M. Ohzeki, T. Kadowaki, and K. Tanaka, Benchmark test of black-box optimization using d-wave quantum annealer, arXiv , 2103.12320 (2021).
  • Kanerva [1988] P. Kanerva, Sparse distributed memory (MIT press, 1988).
  • Rachkovskii et al. [2005] D. Rachkovskii, S. Slipchenko, E. Kussul, and T. Baidyk, Properties of numeric codes for the scheme of random subspaces rsc, Cybernetics and Systems Analysis 41, 509 (2005).
  • Eckstein et al. [2019] J. Eckstein, A. Kagawa, and N. Goldberg, Repr: Rule-enhanced penalized regression, Informs Journal on Optimization 1, 143 (2019).
  • Atzmueller [2015] M. Atzmueller, Subgroup discovery, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 5, 35 (2015).
  • Kandasamy et al. [2015] K. Kandasamy, J. Schneider, and B. Póczos, High dimensional bayesian optimisation and bandits via additive models, in International conference on machine learning (2015) pp. 295–304.
  • Devroye et al. [1993] L. Devroye, P. Epstein, and J.-R. Sack, On generating random intervals and hyperrectangles, Journal of Computational and Graphical Statistics 2, 291 (1993).
  • Chapuis et al. [2019] G. Chapuis, H. Djidjev, G. Hahn, and G. Rizk, Finding maximum cliques on the d-wave quantum annealer, Journal of Signal Processing Systems 91, 363 (2019).
  • Picheny et al. [2013] V. Picheny, T. Wagner, and D. Ginsbourger, A benchmark of kriging-based infill criteria for noisy optimization, Structural and Multidisciplinary Optimization 48, 607 (2013).
  • [29] https://www.dwavesys.com/sites/default/files/14-1048A-A_D-Wave_Hybrid_Solver_Service_plus_Advantage_Technology_Update_0.pdf.
  • [30] https://docs.ocean.dwavesys.com/projects/neal/en/latest/.
  • Xiaodong et al. [2013] L. Xiaodong, A. Engelbrecht, and M. G. Epitropakis, Benchmark functions for cec’2013 special session and competition on niching methods for multimodal function optimization,   (2013).
  • Smith [2013] R. C. Smith, Uncertainty quantification: theory, implementation, and applications, Vol. 12 (Siam, 2013).
  • Farhi et al. [2014] E. Farhi, J. Goldstone, and S. Gutmann, A quantum approximate optimization algorithm, arXiv , 1411.4028 (2014).
  • Hadfield et al. [2017] S. Hadfield, Z. Wang, B. O’Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas, From the quantum approximate optimization algorithm to a quantum alternating operator ansatz, arXiv , 1709.03489 (2017).
  • Otterbach et al. [2017] J. S. Otterbach, R. Manenti, N. Alidoust, A. Bestwick, M. Block, B. Bloom, S. Caldwell, N. Didier, E. S. Fried, S. Hong, P. Karalekas, C. B. Osborn, A. Papageorge, E. C. Peterson, G. Prawiroatmodjo, N. Rubin, C. A. Ryan, D. Scarabelli, M. Scheer, E. A. Sete, P. Sivarajah, R. S. Smith, A. Staley, N. Tezak, W. J. Zeng, A. Hudson, B. R. Johnson, M. Reagor, M. P. da Silva, and C. Rigetti, Unsupervised machine learning on a hybrid quantum computer, arXiv , 1712.05771 (2017).