
Note: The majority of this work was completed at USRA.

A Parameter Setting Heuristic for the Quantum Alternating Operator Ansatz

James Sud jsud@uchicago.edu Quantum Artificial Intelligence Laboratory (QuAIL), NASA Ames Research Center, Moffett Field, CA, 94035, USA USRA Research Institute for Advanced Computer Science (RIACS), Mountain View, CA, 94043, USA Department of Computer Science, University of Chicago, 5730 S Ellis Ave, Chicago, IL 60637, USA    Stuart Hadfield Quantum Artificial Intelligence Laboratory (QuAIL), NASA Ames Research Center, Moffett Field, CA, 94035, USA USRA Research Institute for Advanced Computer Science (RIACS), Mountain View, CA, 94043, USA    Eleanor Rieffel Quantum Artificial Intelligence Laboratory (QuAIL), NASA Ames Research Center, Moffett Field, CA, 94035, USA    Norm Tubman Quantum Artificial Intelligence Laboratory (QuAIL), NASA Ames Research Center, Moffett Field, CA, 94035, USA    Tad Hogg Quantum Artificial Intelligence Laboratory (QuAIL), NASA Ames Research Center, Moffett Field, CA, 94035, USA
(April 9, 2025)
Abstract

Parameterized quantum circuits are widely studied approaches to tackling challenging optimization problems. A prominent example is the Quantum Alternating Operator Ansatz (QAOA), a generalized approach that builds on the alternating structure of the Quantum Approximate Optimization Algorithm. Efficiently finding high-quality parameters for QAOA remains a major challenge in practice. In this work, we introduce a classical strategy for parameter setting, suitable for cases in which the number of distinct cost values grows only polynomially with the problem size, as is common for constraint satisfaction problems. The crux of our strategy is that we replace the cost function expectation value step of QAOA with a classical model whose parameters play an analogous role to the QAOA parameters but can be evaluated efficiently classically. This model is based on empirical observations of QAOA in which variable configurations with the same cost have the same amplitudes from step to step, a property we define as Perfect Homogeneity. Perfect Homogeneity is known to hold exactly for problems with particular symmetries. More generally, high overlaps between QAOA states and states with Perfect Homogeneity have been empirically observed in a number of settings. Building on this idea, we define a Classical Homogeneous Proxy for QAOA in which this property holds exactly and which yields information describing both states and expectation values. We classically determine high-quality parameters for this proxy, and then use these parameters in QAOA, an approach we label the Homogeneous Heuristic for Parameter Setting. We numerically examine this heuristic for MaxCut on unweighted Erdős-Rényi random graphs. For up to 3 QAOA levels we demonstrate that the heuristic easily finds parameters that match the approximation ratios of previously-found globally optimized approaches.
For levels up to 20 we obtain parameters using our heuristic with approximation ratios monotonically increasing with depth, while a strategy that uses parameter transfer instead fails to converge with comparable classical resources. These results suggest that our heuristic may find good parameters in regimes that are intractable with noisy intermediate-scale quantum devices. Finally, we outline how our heuristic may be applied to wider classes of problems.

I Introduction

The Quantum Alternating Operator Ansatz (QAOA) [1] is a widely-studied parameterized quantum algorithm for tackling combinatorial optimization problems. It uses the alternating structure of Farhi et al.'s Quantum Approximate Optimization Algorithm [2], a structure that was also studied in special cases in much earlier work [3, 4]. Recent work has explored regimes for which suitable parameters for QAOA are difficult to pre-compute. This includes extensive research viewing QAOA as a variational quantum-classical algorithm (VQA), in which results from runs on quantum hardware are fed into a classical outer loop algorithm for updating the parameters. Optimizing parameters for VQAs can be quite challenging, as one typically needs to search over a large parameter space with a complex cost landscape. For a successful algorithm one does not necessarily need optimal parameters, only parameters good enough for a given target outcome; even so, finding good parameters can be difficult due to the number of local optima [5, 6] and, in some cases, barren plateaus [7, 8]. Moreover, the high levels of noise present on current quantum devices can exacerbate these challenges [9].

We propose a novel approach to QAOA parameter optimization that maps the QAOA circuit to a simpler classical model. The crux of our approach is that in the parameter objective function (as introduced below), we replace the cost function expectation value, which is typically evaluated either on quantum hardware or via expensive classical evaluation, with a simpler model whose parameters play an analogous role to the QAOA parameters but can be evaluated efficiently classically. This approach is based on the observation, originally due to Hogg [3, 4], that one may leverage a classical “mean-field” or “homogeneous” model in which variable configurations with the same cost have the same amplitudes from step to step, a property we define as Perfect Homogeneity. Extending this idea, we define a Classical Homogeneous Proxy for QAOA (“proxy” for short) in which this property holds exactly and which yields information describing both states and expectation values. We then classically determine high-quality parameters for this proxy and use these parameters in QAOA, an approach we label the Homogeneous Heuristic for Parameter Setting (“heuristic” for short). This heuristic is visualized in Fig. 1.

The heuristic is appropriate for classes of constraint satisfaction problems (CSPs) in which instances consist of a number of clauses polynomial in the number of variables, with each clause acting on a subset of the variables. For such problems, the number of distinct cost function values, and thus the computation time of the proxy, is polynomially bounded in the number of variables, as desired. We describe the proxy for four such classes of CSPs: random k-SAT, MaxCut on unweighted Erdős-Rényi model graphs, random MaxEkLin2, and random Max-k-XOR. For these examples, the proxy leverages information only about the class (as well as the number of variables and clauses), rather than the problem instance, so that the proxy describes states and expectation values for the entire class. We then explore the heuristic for parameter setting on classes of MaxCut problems. This heuristic returns a set of parameters for the entire class, which can then be tested on instances from that class. Our results indicate that for MaxCut on 20-node graphs, the heuristic is able to identify, for p = 1, 2, and 3, parameters yielding approximation ratios comparable to those returned by the commonly-used strategy of transferring globally-optimized parameters [10, 11]. We then demonstrate out to depth p = 20 that the heuristic identifies parameters yielding monotonically increasing approximation ratios as the depth of the algorithm increases. A parameter setting strategy that uses an identical outer loop parameter update scheme but does not leverage the proxy fails to converge given identical classical resources.

The practical implication of this work is that for classes of optimization problems such as CSPs with a fixed number of randomly selected clauses on a fixed number of variables, the Classical Homogeneous Proxy for QAOA can be computed with solely classical resources in time scaling polynomially in both p and n. Thus, the proxy avoids the exponentially growing resources (in p or n) required for exact classical evaluation of expectation values, and avoids the noise present on NISQ devices. The Homogeneous Heuristic for Parameter Setting leverages this proxy, so the subroutine evaluating cost expectation values may be much quicker. However, the parameter landscape may still be challenging to optimize over, especially in cases in which the number of free parameters grows with the problem size. Nevertheless, we demonstrate for MaxCut that our heuristic is able to outperform previous results of parameter transfer [10], indicating the potential of our heuristic approach (which may also be combined with other advancements in parameter setting, as discussed briefly later in the paper) to achieve further improvements in practice.

Figure 1: Flowchart of parameter setting procedure using the Classical Homogeneous Proxy for QAOA. The boxes to the left of the dashed line denote inputs that can be viewed as classical pre-computations, as described in Sec. II. Given these inputs, along with an initial guess of parameters, one can perform parameter setting, in which a classical optimizer and the proxy are used in a loop to search for parameters for the algorithm. Here, the proxy is used to estimate values of the cost function in order to reduce the time and space complexity for cost function evaluation, while the choice of classical optimizer and initial parameters is left open. In the end, the procedure outputs good parameters for the proxy that are then used as parameters for QAOA.

Quantum Alternating Operator Ansatz: We here briefly describe the structure of QAOA [2, 3, 4, 1]. Given a cost function c(x) to optimize, a QAOA circuit consists of p repeated applications of a mixing Hamiltonian B and a cost Hamiltonian C in alternation, parameterized by 2p parameters (\vec{\gamma},\vec{\beta}):

\ket{\Psi(\vec{\gamma},\vec{\beta})}=e^{-i\beta_{p}B}e^{-i\gamma_{p}C}\cdots e^{-i\beta_{1}B}e^{-i\gamma_{1}C}\ket{\psi_{0}}, (1)

where \ket{\psi_{0}} is an easily-prepared initial state. There is freedom in choosing the Hamiltonians B and C, although typically (as in this work) C is chosen such that C\ket{x}=c(x)\ket{x} for all x, and B is a mixer that is simple to implement on hardware. More general operators and initial states for QAOA are proposed in [1]. In this work we choose the X mixer B=\sum_{i=1}^{n}X_{i}, where n is the total number of qubits in the circuit, and the starting state \ket{\psi_{0}} is the uniform superposition over all 2^{n} bitstrings, as originally considered in [2]. The goal of QAOA is then to produce a state \ket{\Psi(\vec{\gamma},\vec{\beta})} such that repeated sampling in the computational basis yields either an optimal or a high-quality approximate solution to the problem. Finding such good parameters is the parameter setting problem and may be approached in a number of ways with different tradeoffs, ranging from black-box optimization techniques to application-specific approaches. We refer to fixed QAOA parameters p,\vec{\gamma},\vec{\beta} as a parameter schedule.
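As a concrete reference point, the circuit of Eq. (1) with the X mixer and uniform initial state can be simulated directly for small n. The sketch below is our illustration (not code from the paper); it evolves the full statevector, exploiting the fact that e^{-i\beta B} factorizes into one-qubit rotations:

```python
import numpy as np

def qaoa_state(costs, gammas, betas):
    """Build the QAOA state of Eq. (1) by direct statevector evolution.

    `costs` is a length-2^n array with costs[x] = c(x) for bitstring x read
    as an integer; the mixer is B = sum_i X_i and the initial state is the
    uniform superposition, as in the text.
    """
    n = int(np.log2(len(costs)))
    psi = np.full(2**n, 1 / np.sqrt(2**n), dtype=complex)
    for gamma, beta in zip(gammas, betas):
        psi = np.exp(-1j * gamma * np.asarray(costs)) * psi   # e^{-i gamma C}
        # e^{-i beta B} factorizes over qubits: one 2x2 rotation per qubit
        rot = np.array([[np.cos(beta), -1j * np.sin(beta)],
                        [-1j * np.sin(beta), np.cos(beta)]])
        psi = psi.reshape([2] * n)
        for q in range(n):
            psi = np.moveaxis(np.tensordot(rot, psi, axes=([1], [q])), 0, q)
        psi = psi.reshape(-1)
    return psi
```

Since the evolution is unitary, the norm is preserved at every depth, and setting all \beta_j = 0 leaves the sampling distribution uniform, which makes convenient sanity checks.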

We now describe QAOA as a VQA: given some classical cost function c(x) on n-variable bitstrings \{0,1\}^{n} and a quantum circuit ansatz with parameters \vec{\Theta}, one defines a parameter objective function f(\vec{\Theta}) and optimizes over \vec{\Theta} with respect to f. In order to optimize over \vec{\Theta}, a two-part cycle is typically employed. First, a classical outer loop chooses one or more parameters \vec{\Theta} for which evaluations of f(\vec{\Theta}) or its partial derivatives are requested, based on initial or prior-cycle information. Then, a subroutine evaluates f and its derivatives for all \vec{\Theta} requested by the outer loop. Given these evaluations, the parameter update scheme can then update the current best \vec{\Theta} and choose a new set of parameters to test. Throughout this work we refer to the outer loop as the parameter update scheme and the inner subroutine as parameter objective function evaluation. Typically in QAOA, f(\vec{\Theta})=f(\vec{\gamma},\vec{\beta}) is taken to be \bra{\vec{\gamma},\vec{\beta}}C\ket{\vec{\gamma},\vec{\beta}}. This choice of f measures the expected cost function value obtained from sampling the QAOA state, which we refer to as the typical parameter objective function. In this work, we replace it with what we define as the homogeneous parameter objective function, which utilizes the Classical Homogeneous Proxy for QAOA.
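The two-part cycle above can be made concrete with a minimal sketch. Here `optimize_parameters` and its random hill-climbing update rule are hypothetical stand-ins of ours, not the paper's scheme; the point is only that the callable `f` plays the role of parameter objective function evaluation, so swapping the typical objective for a homogeneous one changes nothing but `f`:

```python
import numpy as np

def optimize_parameters(f, p, n_iters=2000, step=0.2, seed=0):
    """Toy parameter update scheme: random-perturbation hill climbing over
    the 2p parameters Theta = (gamma, beta), minimizing f(Theta).
    For maximization problems one would pass the negated expectation."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, np.pi, size=2 * p)   # initial guess
    best = f(theta)                               # inner-subroutine evaluation
    for _ in range(n_iters):
        cand = theta + rng.normal(scale=step, size=2 * p)
        val = f(cand)
        if val < best:                            # accept only improvements
            theta, best = cand, val
    return theta, best
```

Any off-the-shelf derivative-free optimizer could replace this loop; the decomposition into update scheme and objective evaluation is what matters.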

Related Work: There have been numerous parameter setting strategies proposed for QAOA. Most of these strategies focus on improving the parameter update scheme while keeping the typical parameter objective function. These strategies range from directly applying classical optimizers to incorporating more sophisticated machine learning techniques [12, 13, 14, 15, 16, 17, 18, 19, 20]. In practice, however, efficiently finding high-quality parameters remains a challenging task. Global optimization strategies often get stuck navigating the parameter optimization landscape due to barren plateaus [7, 9, 8] or multitudes of local optima [5, 6], and the problem becomes especially challenging as the number of parameters grows, due to the curse of dimensionality. In some circumstances parameter optimization can even become NP-hard [21]. Multiple methods have been proposed to alleviate these issues. The first is to initialize parameters close to parameters informed by previous information or intuition [22, 23, 24]. One such example is to use parameters that represent a discretized linear quantum annealing schedule [25, 26]. Another example is based on the principle that globally optimal parameters for random QAOA instances drawn from the same distribution tend to concentrate [27, 2, 13, 28]. Thus, if globally optimal parameters are known for one or many instances of a specific class of problems, these exact parameters (or median parameters over several instances) empirically perform well on a new, unseen instance from the same class [29, 11]. Another approach for improving parameter setting is re-parameterizing the circuit, which places constraints on the allowed parameter schedules, thereby reducing the number of parameters so that they are optimized more easily.
In some cases, this re-parameterization can be informed by optimized parameters for typical instances, or by connections to quantum annealing [13, 30, 24, 23, 26].

An alternative paradigm for parameter setting is to modify the parameter objective function itself. Indeed, one can find closed-form expressions for the expected cost as a function of parameters in some cases, for example MaxCut at p=1 [31], or p=2 for high-girth regular graphs [32]. Moreover, when applicable one can take advantage of problem locality (such as graphs of bounded vertex degree for MaxCut) to compute the necessary expectation values without requiring the full quantum state vector [2, 31]. In these cases one could optimize parameters with respect to these expressions rather than by evaluating the entire dynamics. Other examples of parameter objective function modification include using conditional value at risk [33] and Gibbs-like functions [34], which are closely related to the usual parameter objective function. Similar in spirit to our approach, recent work [35, 36, 37] has proposed the use of surrogate models, which use quantum circuit measurement outcomes to construct an approximate parameter objective function. In contrast, our approach is entirely classical, and the parameters it outputs may be used directly, or possibly improved further given access to a quantum device. Additionally, a related perspective was recently proposed in [38], wherein the authors study the connection between single-layer QAOA states and pseudo-Boltzmann states whose computational basis amplitudes are functions of the corresponding costs, i.e., perfectly homogeneous in our terminology. While [38] provides additional motivation for our approach, the authors there do not consider cases beyond p=1, so their results do not immediately apply to parameter setting in the same way.

Outline of paper: In Sec. II we define the Classical Homogeneous Proxy for QAOA. In Sec. III we demonstrate how to implement the proxy for classes of CSPs with a fixed, polynomial number of randomly drawn clauses. In Sec. IV we introduce the Homogeneous Heuristic for Parameter Setting. In Sec. V we provide numerical evidence for the efficacy of the proxy and the heuristic applied to MaxCut on unweighted Erdős-Rényi model graphs. Finally, in Sec. VI we present the results of the heuristic for MaxCut on the same class of graphs. We conclude in Sec. VII with a discussion of our results and several future research directions.

II Classical Homogeneous Proxy for QAOA

This section summarizes and generalizes the approach of [3], updating the notation to match current QAOA conventions and allowing for easier extension to other CSPs. In this section, we describe the Classical Homogeneous Proxy for QAOA from a sum-of-paths perspective [3, 39] using a set of classical equations that ensure what we presently define as Perfect Homogeneity.

Definition 1.

Perfect Homogeneity: For a given state \ket{\Psi}=\sum_{x\in\{0,1\}^{n}}q(x)\ket{x}, where q(x) is the amplitude of bitstring x when written in the computational basis, \ket{\Psi} has Perfect Homogeneity if and only if the amplitude q(x) of every x can be written solely as a function of c(x), i.e., as Q(c(x)). Then \ket{\Psi}=\sum_{x\in\{0,1\}^{n}}Q(c(x))\ket{x}.

Perfect Homogeneity trivially holds for QAOA states with non-degenerate cost functions, where each bitstring x has a unique cost c(x), as well as in the maximally degenerate case where the cost function is constant. A less trivial example is the class of cost functions that depend only on the Hamming weight of each bitstring, c(x)=c(|x|), such as the Hamming ramp c(x)\sim|x|. For this class, symmetry arguments [23] imply that the QAOA state retains Perfect Homogeneity for any choice of QAOA circuit depth and parameters. Indeed, useful intuition for homogeneity can be gained from the symmetry perspective; in [23, Lem. 1] it is shown that for a classical cost function with symmetries compatible with the QAOA mixer and initial state, bitstrings that are connected by these symmetries have identical QAOA amplitudes. In this case the corresponding QAOA states belong to a Hilbert space of reduced dimension. Perfect Homogeneity is an even stronger condition: all bitstrings with the same cost have identical amplitudes, not just those connected by mixer-compatible symmetries. Hence, QAOA states can be expected to be Perfectly Homogeneous only in limited cases, though they may be near to Perfectly Homogeneous states in some settings [3]. We consider QAOA applied to cost functions with polynomially many distinct cost values, for which states may not have Perfect Homogeneity. Classically, however, we may assert Perfect Homogeneity to construct our proxy.
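Def. 1 can be checked numerically for small instances: group a state's amplitudes by cost and measure the within-group spread. The diagnostic below is our own illustration, not a routine from the paper:

```python
import numpy as np
from collections import defaultdict

def homogeneity_residual(psi, costs):
    """Largest deviation of any amplitude q(x) from the mean amplitude of
    its cost class; zero exactly when the state has Perfect Homogeneity
    (Def. 1), and small when the state is close to a Perfectly Homogeneous
    one."""
    groups = defaultdict(list)
    for x, amp in enumerate(psi):
        groups[costs[x]].append(amp)
    return max(max(abs(a - np.mean(g)) for a in g) for g in groups.values())
```

For example, the uniform superposition is Perfectly Homogeneous for any cost function, and applying a cost phase e^{-i\gamma C} preserves the property, since the phase depends on x only through c(x).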

Additional intuition comes from the case of strictly k-local problem Hamiltonians (such as, for instance, MaxCut), where the QAOA state is easily shown to be Perfectly Homogeneous to leading order in |\gamma|, independent of \beta, with the amplitude of bitstring x given by \frac{1}{\sqrt{2^{n}}}(1-i\gamma c(x)e^{i2k}) in the p=1 case. Similar analysis for the higher-p case also yields Perfect Homogeneity to first order in the \gamma_{j} parameters. This connection is considered further in Sec. V.2.

Given this intuition, we begin from the sum-of-paths perspective [39] for QAOA to provide preliminary analysis for our approach.

II.1 Classical Homogeneous Proxy for QAOA from the Sum-of-Paths Perspective for QAOA

The amplitude q_{\ell}(x) of a bitstring x at step \ell, induced by applying a layer of QAOA with parameters (\gamma,\beta) to a QAOA state with \ell-1 layers, can be expressed succinctly as [39]

q_{\ell}(x)=\braket{x}{\vec{\gamma},\vec{\beta}}=\sum_{y}q_{\ell-1}(y)\cos^{n-d_{xy}}\beta\,(-i\sin\beta)^{d_{xy}}e^{-i\gamma c_{y}}, (2)

where d_{xy} is the Hamming distance between bitstrings x and y, \cos^{n-d_{xy}}\beta\,(-i\sin\beta)^{d_{xy}}=\bra{x}e^{-i\beta B}\ket{y} are the mixing operator matrix elements, c_{y} is the cost of bitstring y, and the sum runs over all 2^{n} bitstrings y in the computational basis.
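The quoted matrix elements follow because e^{-i\beta B} factorizes into identical one-qubit rotations, and this can be verified directly for small n. The check below is our own illustration:

```python
import numpy as np
from functools import reduce

def mixer_element(x, y, n, beta):
    """<x|e^{-i beta B}|y> computed two ways: the closed form
    cos^{n-d} beta * (-i sin beta)^d with d the Hamming distance d_xy, and
    explicitly, using that e^{-i beta B} is the n-fold Kronecker product of
    the one-qubit rotation e^{-i beta X}."""
    d = bin(x ^ y).count("1")                      # Hamming distance d_xy
    closed_form = np.cos(beta) ** (n - d) * (-1j * np.sin(beta)) ** d
    r = np.array([[np.cos(beta), -1j * np.sin(beta)],
                  [-1j * np.sin(beta), np.cos(beta)]])   # e^{-i beta X}
    U = reduce(np.kron, [r] * n)                          # e^{-i beta B}
    return closed_form, U[x, y]
```

Each one-qubit factor contributes \cos\beta when the corresponding bits of x and y agree and -i\sin\beta when they differ, which is exactly how the exponents n-d_{xy} and d_{xy} arise.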

Grouping the terms in Eq. (2), we can rewrite the sum as

q_{\ell}(x)=\sum_{d,c}\cos^{n-d}\beta(-i\sin\beta)^{d}e^{-i\gamma c}\sum_{y\,|\,d_{xy}=d,\,c_{y}=c}q_{\ell-1}(y) (3)

where the sum over d runs over [0,n] and the sum over c runs over the set of unique costs allowed by the cost function, which we describe in Sec. II.2. If the amplitudes q_{\ell-1}(y) of all bitstrings with the same cost are identical, then we can substitute q(x) with Q(c(x)) for all x. This is exactly Perfect Homogeneity as described in Def. 1. For the rest of this text we use the upper case Q(c(x)) to distinguish a function of the cost c=c(x) rather than of the bitstring x itself. This substitution yields

q_{\ell}(x)=\sum_{d,c}\cos^{n-d}\beta(-i\sin\beta)^{d}e^{-i\gamma c}Q_{\ell-1}(c)\,n(x;d,c), (4)

where n(x;d,c) is the number of bitstrings with cost c that are at Hamming distance d from x. We note that for this equation, and all following equations, the evolution is no longer restricted to be unitary. As such, the values q and Q no longer represent strictly quantum amplitudes, but rather track an object that is an analogue of the quantum amplitude. We now introduce the major modification to the sum-of-paths equations: we replace each distribution n(x;d,c) with some distribution N(c^{\prime};d,c), where c_{x}=c^{\prime}, again adopting the upper case N to distinguish a new distribution that depends solely on the cost of the bitstring. This distribution will generally differ from the original, but in this work we exhibit cases where N(c^{\prime};d,c) is efficiently computable and can adequately replace n(x;d,c) for the purpose of parameter setting. These cases are discussed in Sec. III, and the effectiveness of the replacement is numerically explored in Sec. V.

Using the distributions N(c^{\prime};d,c) in place of n(x;d,c), we can further rewrite the sum as

q_{\ell}(x)=Q_{\ell}(c^{\prime})=\sum_{d,c}\cos^{n-d}\beta(-i\sin\beta)^{d}e^{-i\gamma c}Q_{\ell-1}(c)N(c^{\prime};d,c), (5)

where Q_{\ell}(c^{\prime}) makes explicit that the substitutions yield an expression for q_{\ell}(x) that depends solely on the cost c^{\prime} of bitstring x. This leads to a crucial point for our analysis: if we start in a state with Perfect Homogeneity, and if the distributions n(x;d,c) can be replaced by distributions N(c^{\prime};d,c) that depend solely on the cost c^{\prime} of x, then subsequent layers retain the Perfect Homogeneity of the state, yielding amplitude analogues Q(c^{\prime}). Thus, Eq. (5) is a recursive equation yielding Perfectly Homogeneous analogues of quantum states, which we label the Classical Homogeneous Proxy for QAOA. We note that, importantly, the proxy does not return the amplitude analogue of a particular bitstring x, but rather the amplitude analogue of bitstrings with cost c^{\prime}, as knowledge of which bitstrings x have cost c^{\prime} would amount to solving the underlying objective function. In Sec. V we argue that there are cases where these amplitude analogues Q(c^{\prime}) are close to all q(x) with c_{x}=c^{\prime}.

II.2 Cost Distributions

In order to evaluate Eq. (5), the set of costs to which the indices c and c^{\prime} belong must be defined. Ideally, this set is exactly the set of unique objective function values attained by the underlying cost function. For optimization problems, however, this set is unknown; determining it exactly is closely related to solving the problem itself. Instead, we can form a reasonable estimate by determining upper and lower bounds on the cost function value, as well as by excluding values that can be efficiently ruled out by the cost function. In this work we denote this estimated set of unique costs as \mathcal{C}. As an example, for CSPs with binary-valued clauses, the cost function counts the number of satisfied clauses, trivially yielding \mathcal{C}=\{0,1,\ldots,m\}, where m is the number of clauses. The set \mathcal{C} also allows us to define the initial state Q_{0}(c^{\prime}) for the algorithm. QAOA typically begins with a uniform superposition over all bitstrings x, such that q_{0}(x)=\frac{1}{\sqrt{2^{n}}} for all x. Thus we can set Q_{0}(c^{\prime})=\frac{1}{\sqrt{2^{n}}} for all c^{\prime} in \mathcal{C}.

While Eq. (5) yields amplitude analogues Q_{\ell}(c^{\prime}), one may also wish to use the Classical Homogeneous Proxy for QAOA to return an estimate of the expected value of the cost function. This requires computing a distribution P(c^{\prime}) for all c^{\prime} in \mathcal{C}, representing the probability that a randomly chosen bitstring has cost c^{\prime}, averaged over the entire class. As with computing \mathcal{C}, this computation is approximate, as a perfect computation of this distribution would amount to solving the underlying objective function. Examples of estimates of P(c^{\prime}) are shown in Sec. III. To compute our estimate of the expected cost, we simply sum over costs c^{\prime}, weighting by the expected number of bitstrings with cost c^{\prime} (namely 2^{n}P(c^{\prime})) and the squared amplitude analogue |Q_{\ell}(c^{\prime})|^{2},

\widetilde{\braket{C}}=\sum_{c^{\prime}}2^{n}P(c^{\prime})|Q_{\ell}(c^{\prime})|^{2}c^{\prime}, (6)

with the tilde indicating the use of the proxy and that this is no longer a strictly normalized quantum expectation value. This is exactly the equation we use for the homogeneous parameter objective function described in the introduction. Note that 2^{n}P(c^{\prime})|Q_{\ell}(c^{\prime})|^{2} does not give a bona fide probability distribution unless it is re-normalized by dividing by \sum_{c^{\prime}}2^{n}P(c^{\prime})|Q_{\ell}(c^{\prime})|^{2} at each step; nevertheless, we have found these factors remain close to unity for the parameter setting experiments considered below, so we neglect them going forward, which yields further computational savings. It is straightforward to introduce these factors into Eq. (6) if desired in other applications.

The set \mathcal{C} and the cost estimate \widetilde{\braket{C}} are critically determined by the class of problems one is working with. Examples of these quantities for a sample of classes are given in Sec. III.

II.3 Algorithm for Computing the Classical Homogeneous Proxy for QAOA

Formalizing the results in this section, we present Algorithm 1, which describes how, given QAOA parameters p,\vec{\gamma},\vec{\beta}, a set of possible costs \mathcal{C}, and assumed cost distributions N(c^{\prime};d,c), we can compute the Classical Homogeneous Proxy for QAOA using Eqs. (5) and (6).

Algorithm 1: Classical Homogeneous Proxy for QAOA
Input: p, \vec{\gamma}, \vec{\beta}, n, m, \mathcal{C}, N(c^{\prime};d,c) \forall c^{\prime}\in\mathcal{C}, P(c^{\prime}) \forall c^{\prime}\in\mathcal{C}
Output: Q_{p}(c^{\prime})\;\forall c^{\prime}\in\mathcal{C}; optionally \widetilde{\braket{C}}
Q_{0}(c^{\prime})\leftarrow\frac{1}{\sqrt{2^{n}}}\;\forall c^{\prime}\in\mathcal{C}
for \ell in [1,p] do
   Q_{\ell}(c^{\prime})\leftarrow\sum_{d,c}\cos^{n-d}\beta_{\ell}(-i\sin\beta_{\ell})^{d}e^{-i\gamma_{\ell}c}Q_{\ell-1}(c)N(c^{\prime};d,c)\quad\forall c^{\prime}\in\mathcal{C}
end for
If desired, \widetilde{\braket{C}}\leftarrow\sum_{c^{\prime}}2^{n}P(c^{\prime})|Q_{p}(c^{\prime})|^{2}c^{\prime}
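Algorithm 1 translates directly into a short array program. The sketch below reflects our own layout choices (a dense array `N[c'_idx, d, c_idx]` and a cost grid `costs_set`), not an implementation from the paper:

```python
import numpy as np

def homogeneous_proxy(gammas, betas, n, costs_set, N, P=None):
    """Classical Homogeneous Proxy for QAOA (Algorithm 1).

    costs_set : array of allowed costs, the set C
    N         : N[c'_idx, d, c_idx] = N(c'; d, c) for d = 0..n
    P         : optional class-averaged P(c'); if given, also return the
                homogeneous parameter objective of Eq. (6)
    """
    K = len(costs_set)
    Q = np.full(K, 1.0 / np.sqrt(2.0**n), dtype=complex)      # Q_0(c')
    d = np.arange(n + 1)
    for gamma, beta in zip(gammas, betas):
        w = np.cos(beta)**(n - d) * (-1j * np.sin(beta))**d   # mixer weights
        phase = np.exp(-1j * gamma * np.asarray(costs_set)) * Q
        Q = np.einsum('d,c,kdc->k', w, phase, N)              # Eq. (5)
    if P is None:
        return Q
    return Q, float(np.sum(2.0**n * P * np.abs(Q)**2 * costs_set))  # Eq. (6)
```

For a cost function depending only on Hamming weight, n(x;d,c) itself depends only on c(x), so the proxy with the exact N reproduces the exact QAOA expectation value; this makes a convenient correctness check.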

Time Complexity: Given N(c^{\prime};d,c) as well as Q_{\ell-1}(c^{\prime}) for all c^{\prime} in \mathcal{C}, the calculation of each amplitude analogue Q_{\ell}(c^{\prime}) using Eq. (5) takes \mathcal{O}(n|\mathcal{C}|) time. Thus, the calculation of all |\mathcal{C}| values Q_{\ell}(c^{\prime}) takes \mathcal{O}(n|\mathcal{C}|^{2}), and computing all Q_{p}(c^{\prime}) takes \mathcal{O}(np|\mathcal{C}|^{2}). If desired, the evaluation of the cost estimate in Eq. (6) involves a sum over |\mathcal{C}| elements and is thus \mathcal{O}(|\mathcal{C}|).

Thus, if |\mathcal{C}| is O(\mathrm{poly}(n)), then Algorithm 1 is efficient. Indeed, we show such a case in the following section, demonstrating how to calculate the necessary pre-computations of \mathcal{C}, as well as N(c^{\prime};d,c) and P(c^{\prime}) for all c^{\prime} in \mathcal{C}.

III Cost distributions for Randomly Drawn CSPs

In this section we demonstrate how to compute \mathcal{C} and \widetilde{\braket{C}} for a number of random CSPs. For the Classical Homogeneous Proxy of Sec. II.1, we assumed that for all c^{\prime} in \mathcal{C} we are given distributions N(c^{\prime};d,c) that suitably estimate n(x;d,c) for all x with cost c^{\prime}. Obtaining these distributions can be viewed as a pre-computation step, and when derived from the properties of a particular problem class they may then be applied to any instance within it. Here, we identify common random classes of optimization problems for which N(c^{\prime};d,c) can be efficiently computed for the entire class. In particular, we focus on CSPs with a fixed number m of Boolean clauses, each acting on k variables selected at random from the set of n variables. The types of allowed clauses are determined by the problem; for example, SAT problems consider disjunctive clauses. We note that the analysis here generalizes a similar method in [3] applied to 3-SAT, allowing for easy extension to any CSP matching the above criteria. For these problems the procedure is as follows, noting that all calculations here are averaged over the entire class: we first write the expected number of bitstrings with cost c at distance d from a random bitstring of cost c^{\prime} as

N(c^{\prime};d,c)=\binom{n}{d}P(c|d,c^{\prime}), (7)

where P(c|d,c^{\prime}) is the probability that a bitstring at distance d from a bitstring with cost c^{\prime} has cost c. This probability can be rewritten as

P(c|d,c^{\prime})=\frac{P(c^{\prime},c|d)}{P(c^{\prime})}, (8)

where P(c^{\prime},c|d) is the probability that two bitstrings separated by Hamming distance d have costs c^{\prime} and c, and P(c^{\prime}) is the probability that a randomly chosen bitstring has cost c^{\prime}. The numerator can be calculated as follows:

P(c^{\prime},c|d)=\sum_{b=\max(0,c^{\prime}+c-m)}^{\min(c^{\prime},c)}P(b,c^{\prime}-b,c-b|d), (9)

where P(b,c^{\prime}-b,c-b|d) is the probability that two randomly chosen bitstrings with Hamming distance d have b satisfied clauses in common, along with costs c^{\prime} and c, respectively. This expression can be evaluated via the multinomial distribution

P(b,cb,cb|d)=m!b!(cb)!(cb)!(m+b(c+c))!PbothbPone(c+c)2bPneitherm+b(c+c),P(b,c^{\prime}-b,c-b|d)=\frac{m!}{b!(c^{\prime}-b)!(c-b)!(m+b-(c^{\prime}+c))!}P_{\textnormal{both}}^{b}P_{\textnormal{one}}^{(c^{\prime}+c)-2b}P_{\textnormal{neither}}^{m+b-(c^{\prime}+c)}, (10)

where PbothP_{\textnormal{both}}, PoneP_{\textnormal{one}}, and PneitherP_{\textnormal{neither}} represent the probability that a randomly selected clause is satisfied by both, one, or neither of the bitstrings separated by Hamming distance dd, respectively. Since Pboth+2Pone+Pneither=1P_{\textnormal{both}}+2P_{\textnormal{one}}+P_{\textnormal{neither}}=1, one only needs to calculate two of these three variables. The previous equations then allow computing N(c;d,c)N(c^{\prime};d,c) for any value cc^{\prime} as

N(c;d,c)=(nd)1P(c)b=max(0,c+cm)min(c,c)m!b!(cb)!(cb)!(m+b(c+c))!PbothbPone(c+c)2bPneitherm+b(c+c).N(c^{\prime};d,c)=\binom{n}{d}\frac{1}{P(c^{\prime})}\sum_{b=\max(0,c^{\prime}+c-m)}^{\min(c^{\prime},c)}\frac{m!}{b!(c^{\prime}-b)!(c-b)!(m+b-(c^{\prime}+c))!}P_{\textnormal{both}}^{b}P_{\textnormal{one}}^{(c^{\prime}+c)-2b}P_{\textnormal{neither}}^{m+b-(c^{\prime}+c)}. (11)

We summarize this approach in Algorithm 2.

Algorithm 2: Classical Homogeneous Proxy for QAOA Pre-computation for Randomly Drawn Cost Function
Input: nn, mm, Description of problem class
Output: 𝒞\mathcal{C}, N(c;d,c)N(c^{\prime};d,c), P(c)P(c^{\prime}) c𝒞\forall c^{\prime}\in\mathcal{C}
Determine a suitable 𝒞\mathcal{C} via Sec. II.2.
Compute PbothP_{\textnormal{both}}, PoneP_{\textnormal{one}}, PneitherP_{\textnormal{neither}} and P(c)c𝒞P(c^{\prime})\;\forall c^{\prime}\in\mathcal{C} given the problem class
Use these values to compute N(c;d,c)c𝒞N(c^{\prime};d,c)\;\forall c^{\prime}\in\mathcal{C} via Eq. (11)

Time Complexity: There are |𝒞||\mathcal{C}| distributions N(c;d,c)N(c^{\prime};d,c) with (n+1)|𝒞|(n+1)|\mathcal{C}| elements each. For fixed cc^{\prime}, dd, and cc, we must sum over |𝒞||\mathcal{C}| terms, each of which can be evaluated in 𝒪(|𝒞|)\mathcal{O}(|\mathcal{C}|) steps, the cost of evaluating factorials of arguments of size 𝒪(|𝒞|)\mathcal{O}(|\mathcal{C}|). Thus the evaluation of all distributions is 𝒪(n|𝒞|4)\mathcal{O}(n|\mathcal{C}|^{4}). We once again note that if |𝒞||\mathcal{C}| is O(poly(n))O(\mathrm{poly}(n)), Algorithm 2 runs in polynomial time.

We now demonstrate Algorithm 2 for several common problem classes.

III.1 MaxCut

We first analyze MaxCut on Erdős-Rényi random graphs 𝒢(n,pe)\mathcal{G}(n,p_{e}). In this model, a graph GG is generated by scanning over all possible (n2)\binom{n}{2} edges in an nn node graph, and including each edge with independent probability pep_{e}. The MaxCut problem on GG is to partition the nn nodes into two sets such that the number of cut edges crossing the partition is maximized. With respect to the class 𝒢(n,pe)\mathcal{G}(n,p_{e}) of graphs the cost function to be maximized over configurations xx is

c(x)=i,j(n2)eijxixj,c(x)=\sum_{\langle i,j\rangle\in\binom{n}{2}}e_{ij}x_{i}\oplus x_{j}, (12)

where eije_{ij} are independent binary random variables that take on value 0 with probability 1pe1-p_{e} and 11 with probability pep_{e}. We use this cost function to evaluate the relevant distributions in Eq. (11). We start by noting that for a fixed graph GG with mm edges we have 0c(x)m0\leq c(x)\leq m, and the expected number of edges 𝔼[m]\mathbb{E}[m] in graphs drawn from 𝒢(n,pe)\mathcal{G}(n,p_{e}) is pe(n2)p_{e}\binom{n}{2}, with a standard deviation proportional to the square root of this value, as determined by the binomial distribution. Hence, as nn becomes large the expected number of edges concentrates to pe(n2)p_{e}\binom{n}{2}, and so for simplicity in the remainder of this work we set 𝒞={0,1,,𝔼[m]}\mathcal{C}=\{0,1,\dots,\lceil\mathbb{E}[m]\rceil\}. Note that in practice, to accommodate the possibility of a given instance with maximum cut greater than this quantity, one may increase 𝒞\mathcal{C} by several standard deviations as desired.

We can then easily calculate P(c)P(c^{\prime}), the probability a random bitstring has cost cc^{\prime} for Eq. (12). For a bitstring drawn uniformly at random, the probability of satisfying any given clause xixjx_{i}\oplus x_{j} is 1/21/2, as there are two satisfying assignments (0101 and 1010), and two non-satisfying assignments (0000 and 1111). Thus, the probability P(c)P(c^{\prime}) also follows a binomial distribution and is simply

P(c)=(12)m(mc).P(c^{\prime})=\left(\frac{1}{2}\right)^{m}\binom{m}{c^{\prime}}. (13)

We can then calculate PbothP_{\textnormal{both}}, PoneP_{\textnormal{one}} and PneitherP_{\textnormal{neither}}; for didactic purposes we derive all three, although the normalization Pboth+2Pone+Pneither=1P_{\textnormal{both}}+2P_{\textnormal{one}}+P_{\textnormal{neither}}=1 means only two are independent. Consider two randomly chosen bitstrings yy and zz, separated by Hamming distance dd. We note that there are (nd)(n-d) bits in common between yy and zz and dd bits different. Thus yiyj=zizjy_{i}\oplus y_{j}=z_{i}\oplus z_{j} requires that ii and jj are either both from the set of (nd)(n-d) shared bits or both from the set of dd differing bits, which has probability

Psame=(nd2)(n2)+(d2)(n2).P_{\textnormal{same}}=\frac{\binom{n-d}{2}}{\binom{n}{2}}+\frac{\binom{d}{2}}{\binom{n}{2}}. (14)

From this expression we can see that Pboth=Pneither=12PsameP_{\textnormal{both}}=P_{\textnormal{neither}}=\frac{1}{2}P_{\textnormal{same}}: PsameP_{\textnormal{same}} is the probability that yiyj=zizjy_{i}\oplus y_{j}=z_{i}\oplus z_{j}, and for a random clause and a random bitstring the shared value yiyjy_{i}\oplus y_{j} is 11 or 0 with probability 1/21/2 each by symmetry. Thus we have

Pboth=Pneither=12((nd2)(n2)+(d2)(n2)).P_{\textnormal{both}}=P_{\textnormal{neither}}=\frac{1}{2}\left(\frac{\binom{n-d}{2}}{\binom{n}{2}}+\frac{\binom{d}{2}}{\binom{n}{2}}\right). (15)

For completeness, we can also calculate PoneP_{\textnormal{one}} directly; from the normalization above it equals (1PbothPneither)/2=(1Psame)/2(1-P_{\textnormal{both}}-P_{\textnormal{neither}})/2=(1-P_{\textnormal{same}})/2. For this term we need yiyjzizjy_{i}\oplus y_{j}\neq z_{i}\oplus z_{j}, which occurs exactly when one of ii or jj is chosen from the (nd)(n-d) bits in common and the other from the dd differing bits. This probability is

Pdif=(nd1)(d1)(n2).P_{\textnormal{dif}}=\frac{\binom{n-d}{1}\binom{d}{1}}{\binom{n}{2}}. (16)

Thus PoneP_{\textnormal{one}}, the probability that specifically yy satisfies the clause and zz does not (the reverse case occurs with equal probability, accounting for the factor of 22 in the normalization), is half of PdifP_{\textnormal{dif}}, so

Pone=12((nd1)(d1)(n2)).P_{\textnormal{one}}=\frac{1}{2}\left(\frac{\binom{n-d}{1}\binom{d}{1}}{\binom{n}{2}}\right). (17)

Using these quantities, N(c;d,c)N(c^{\prime};d,c) is then obtained from Eq. (11).
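To make the computation concrete, the following is a minimal sketch (not the authors' code) that assembles the MaxCut quantities above into the distributions of Eq. (11); the function name maxcut_distributions is our own, and exact rational arithmetic is used purely for clarity.

```python
# Sketch: N(c'; d, c) of Eq. (11) for MaxCut on G(n, p_e) graphs with m edges,
# using the MaxCut expressions for P(c'), P_both, P_one, P_neither above.
from math import comb
from fractions import Fraction

def maxcut_distributions(n, m):
    """Return P(c') (Eq. 13) and N[(c', d, c)] (Eq. 11) as exact rationals."""
    P = [Fraction(comb(m, c), 2**m) for c in range(m + 1)]  # Eq. (13)

    def p_same(d):
        # Eq. (14): both clause endpoints among the n-d shared or d flipped bits
        return Fraction(comb(n - d, 2) + comb(d, 2), comb(n, 2))

    N = {}
    for d in range(n + 1):
        pb = p_same(d) / 2           # Eq. (15): P_both = P_neither
        po = (1 - 2 * pb) / 2        # from P_both + 2 P_one + P_neither = 1
        pn = pb
        for cp in range(m + 1):      # c'
            for c in range(m + 1):
                total = Fraction(0)
                for b in range(max(0, cp + c - m), min(cp, c) + 1):
                    # multinomial coefficient of Eq. (10):
                    # m! / (b! (c'-b)! (c-b)! (m+b-c'-c)!)
                    coef = (comb(m, b) * comb(m - b, cp - b)
                            * comb(m - cp, c - b))
                    total += (coef * pb**b * po**(cp + c - 2 * b)
                              * pn**(m + b - cp - c))
                # Eq. (11): N(c'; d, c) = C(n, d) * P(c', c | d) / P(c')
                N[(cp, d, c)] = comb(n, d) * total / P[cp]
    return P, N
```

As a sanity check, for each fixed cc^{\prime} and dd the distribution sums to (nd)\binom{n}{d}, the total number of bitstrings at distance dd, and at d=0d=0 it places unit weight on c=cc=c^{\prime}.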

III.2 MaxE3Lin2/Max-3-XOR

We next consider the MaxE3Lin2 problem which generalizes MaxCut. QAOA for MaxE3Lin2 was analyzed by Farhi et al. [2], who showed an advantage over classical approximation algorithms, only to inspire better classical approaches [40, 41]. The analogous random class of MaxE3Lin2 problems has cost function

c(x)=a<b<cdabc(xaxbxczabc),c(x)=\sum_{a<b<c}d_{abc}(x_{a}\oplus x_{b}\oplus x_{c}\oplus z_{abc}), (18)

where the zabcz_{abc} and dabcd_{abc} are independent random variables with equal probability of being 0 or 11.

Using Eq. (18) we can again calculate the relevant probability distributions. First note that a random bitstring xx will satisfy an individual clause (i.e., a term in the sum) with probability 1/21/2, as (xa+xb+xc)mod2(x_{a}+x_{b}+x_{c})\mod 2 has an equal probability of being 0 or 11. Thus,

P(c)=(12)m(mc),P(c^{\prime})=\left(\frac{1}{2}\right)^{m}\binom{m}{c^{\prime}}, (19)

using the binomial distribution. Then we note that, as in Sec. III.1, the probability of (ya+yb+yc)mod2=(za+zb+zc)mod2(y_{a}+y_{b}+y_{c})\mod 2=(z_{a}+z_{b}+z_{c})\mod 2 for two random bitstrings yy and zz with d(y,z)=dd(y,z)=d is given by

Psame=(nd3)+(d2)(nd1)(n3),P_{\textnormal{same}}=\frac{\binom{n-d}{3}+\binom{d}{2}\binom{n-d}{1}}{\binom{n}{3}}, (20)

since the value of (xa+xb+xc)mod2(x_{a}+x_{b}+x_{c})\mod 2 is preserved if none or two of xa,xb,xcx_{a},x_{b},x_{c} are flipped, which is satisfied if a,b,ca,b,c are all from the ndn-d identical bits, or two of a,b,ca,b,c are chosen from the dd differing bits. Thus, we can easily compute Pboth=Pneither=Psame/2P_{\textnormal{both}}=P_{\textnormal{neither}}=P_{\textnormal{same}}/2, since there is an equal chance (ya+yb+yc)mod2=(za+zb+zc)mod2=0/1(y_{a}+y_{b}+y_{c})\mod 2=(z_{a}+z_{b}+z_{c})\mod 2=0/1. PoneP_{\textnormal{one}} can then be calculated by a similar argument or by taking Pone=(1PbothPneither)/2P_{\textnormal{one}}=(1-P_{\textnormal{both}}-P_{\textnormal{neither}})/2. We note that the probability distributions calculated here are exactly equivalent to those for the Max-3-XOR problem, where all zabcz_{abc} in Eq. (18) are taken to be 11.

III.3 MaxEkLin2/Max-k-XOR

The MaxEkLin2 problem is a further generalization of MaxE3Lin2 for each fixed kk, where we replace a,b,ca,b,c with a1,,aka_{1},...,a_{k} in the cost function and the sum is taken over hyperedges of size kk. This class of problems was previously studied for QAOA in [42, 43]. For each kk, P(c)P(c^{\prime}) is the same as for MaxE3Lin2 above. However, PsameP_{\textnormal{same}} is given by

Psame=l=0k/2(d2l)(ndk2l)(nk),P_{\textnormal{same}}=\frac{\sum_{l=0}^{\lfloor k/2\rfloor}\binom{d}{2l}\binom{n-d}{k-2l}}{\binom{n}{k}}, (21)

where the terms in the sum count all possible ways to choose an even number 2l2l of the clause's kk bits from the dd flipped bits. Then we again have Pboth=Pneither=Psame/2P_{\textnormal{both}}=P_{\textnormal{neither}}=P_{\textnormal{same}}/2 and Pone=(1PbothPneither)/2P_{\textnormal{one}}=(1-P_{\textnormal{both}}-P_{\textnormal{neither}})/2. A similar calculation for the Max-k-XOR problem again yields identical probability distributions.
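As a sanity check on Eq. (21), the general expression can be evaluated directly; the helper below (our own naming, a sketch rather than the authors' code) reproduces the MaxCut expression of Eq. (14) at k=2k=2 and the MaxE3Lin2 expression of Eq. (20) at k=3k=3.

```python
# Sketch: P_same for random Max-k-XOR / MaxEkLin2 clauses, Eq. (21).
# A clause's parity is preserved iff an even number (2l) of its k
# variables fall among the d flipped bits.
from math import comb
from fractions import Fraction

def p_same(n, d, k):
    num = sum(comb(d, 2 * l) * comb(n - d, k - 2 * l)
              for l in range(k // 2 + 1))
    return Fraction(num, comb(n, k))
```

For example, p_same(n, d, 2) recovers the MaxCut quantity of Eq. (14), and at d=0d=0 the parity is always preserved, so Psame=1P_{\textnormal{same}}=1 for any kk.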

III.4 Rand-k-SAT

The case of random k-SAT is analyzed by Hogg in [3]. This cost function is defined as the sum over mm clauses of a logical OR of kk variables randomly drawn from a set of nn variables, each of which may be negated; here the cost counts conflicting (unsatisfied) clauses. In the notation used in this paper, the distributions of interest are

P(c)=2km(2k1)mc(mc),Pboth=2k(ndk)(nk),Pone=2kPboth,Pneither=12PonePboth.P(c^{\prime})=2^{-km}(2^{k}-1)^{m-c^{\prime}}\binom{m}{c^{\prime}},\quad P_{\textnormal{both}}=\frac{2^{-k}\binom{n-d}{k}}{\binom{n}{k}},\quad P_{\textnormal{one}}=2^{-k}-P_{\textnormal{both}},\quad P_{\textnormal{neither}}=1-2P_{\textnormal{one}}-P_{\textnormal{both}}. (22)
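A minimal sketch of Eq. (22) follows (our own function name, not the authors' code); a random assignment violates a given OR clause with probability 2k2^{-k}, and both bitstrings can violate a clause only when all kk of its variables lie among the ndn-d shared bits.

```python
# Sketch: Rand-k-SAT distributions of Eq. (22), with cost counting
# conflicting (unsatisfied) clauses.
from math import comb
from fractions import Fraction

def randksat_probs(n, m, k, d):
    """Return P(c') for c' = 0..m, plus P_both, P_one, P_neither at distance d."""
    P = [Fraction(comb(m, c) * (2**k - 1)**(m - c), 2**(k * m))
         for c in range(m + 1)]
    p_both = Fraction(comb(n - d, k), 2**k * comb(n, k))  # all k vars shared
    p_one = Fraction(1, 2**k) - p_both
    p_neither = 1 - 2 * p_one - p_both
    return P, p_both, p_one, p_neither
```

At d=0d=0 the two bitstrings coincide, so Pone=0P_{\textnormal{one}}=0 and Pboth=2kP_{\textnormal{both}}=2^{-k}, and P(c)P(c^{\prime}) sums to one by the binomial theorem.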

IV Homogeneous Heuristic for Parameter Setting

Leveraging the Classical Homogeneous Proxy for QAOA, here we propose a strategy for finding good algorithm parameters, which we call the Homogeneous Heuristic for Parameter Setting, as pictured in Fig. 1 and formalized in Algorithm 3.

Algorithm 3: Homogeneous Heuristic for Parameter Setting
Input: pp, γin\vec{\gamma_{\mathrm{in}}}, βin\vec{\beta_{\mathrm{in}}}, nn, mm, 𝒞\mathcal{C}, N(c;d,c)N(c^{\prime};d,c) c𝒞\forall c^{\prime}\in\mathcal{C}, P(c)P(c^{\prime}) c𝒞\forall c^{\prime}\in\mathcal{C}, constraints on (γ,β)(\vec{\gamma},\vec{\beta}).
Output: γout\vec{\gamma_{\mathrm{out}}}, βout\vec{\beta_{\mathrm{out}}}
γ,βγin,βin\vec{\gamma},\vec{\beta}\leftarrow\vec{\gamma_{\mathrm{in}}},\vec{\beta_{\mathrm{in}}}
while Desired do
   1) Using any suitable parameter update scheme, perform one update of (γ,β)(\vec{\gamma},\vec{\beta})
   2) Evaluate the homogeneous parameter objective function (Eq. (6)) using the Classical Homogeneous Proxy for QAOA for all (γ,β)(\vec{\gamma},\vec{\beta}) required to perform next parameter update in 1)
end while
γout,βoutγ,β\vec{\gamma_{\mathrm{out}}},\vec{\beta_{\mathrm{out}}}\leftarrow\vec{\gamma},\vec{\beta}

Here, a “suitable parameter update scheme” is intended to encapsulate the wide variety of general or specific approaches proposed for this task in the literature. “Desired” denotes that the while loop can be iterated until the update scheme terminates or some desired convergence criterion is met. Finally, “constraints on (γ,β)(\vec{\gamma},\vec{\beta})” denotes any restrictions on the domain of values allowed for γ,β\vec{\gamma},\vec{\beta}, including restrictions to schedules of a prescribed form such as the linear ramps introduced in Sec. V.2.

With the heuristic, we replace the typical cost expectation value with the homogeneous parameter objective function, where each function evaluation takes time as determined by Algorithms 1 and 2. This heuristic is purposefully defined in broad terms, in order to maintain complete freedom in the choice of parameter update schemes. Thus, one can still apply a myriad of approaches explored in parameter setting literature, such as parameter initialization, re-parameterization, and the use of different global or local optimization algorithms.

On the other hand, we emphasize that while our approach can significantly speed up the parameter setting task, it is by no means a panacea. Indeed, in cases where the number of parameters to optimize grows with the problem size (e.g., when p=np=n), this problem suffers generically from the curse of dimensionality, as well as other potential difficulties such as barren plateaus or plentiful local optima. Hence the incorporation of a variety of parameter setting strategies or approximations that seek to ameliorate these difficulties within our approach is well-motivated.
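As an illustration, the following is a minimal sketch of Algorithm 3, not the authors' implementation: proxy_objective stands in for the homogeneous parameter objective of Eq. (6), evaluated via Algorithms 1 and 2, and is treated as a given callable; simple finite-difference gradient ascent serves as one example of a "suitable parameter update scheme".

```python
# Illustrative sketch of the Homogeneous Heuristic (Algorithm 3).
# proxy_objective(gamma, beta) -> float is assumed to be the efficiently
# evaluable Classical Homogeneous Proxy objective; any update scheme works.
def homogeneous_heuristic(proxy_objective, gamma, beta,
                          steps=200, lr=0.1, eps=1e-6):
    params = list(gamma) + list(beta)
    p = len(gamma)

    def f(v):
        return proxy_objective(v[:p], v[p:])

    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            up, dn = params[:], params[:]
            up[i] += eps
            dn[i] -= eps
            grad.append((f(up) - f(dn)) / (2 * eps))  # central difference
        params = [v + lr * g for v, g in zip(params, grad)]  # ascent step
    return params[:p], params[p:]
```

The returned (γout,βout)(\vec{\gamma_{\mathrm{out}}},\vec{\beta_{\mathrm{out}}}) are then used in actual QAOA; in practice one would substitute a more sophisticated optimizer (e.g., the BFGS routine used in Sec. VI) for the ascent loop.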

V Numerically Investigating the Classical Homogeneous Proxy for QAOA for MaxCut

In this section, we explore the application of the Classical Homogeneous Proxy for QAOA to MaxCut on Erdős-Rényi 𝒢(n,pe)\mathcal{G}(n,p_{e}) model graphs as considered in Sec. III.1. We first numerically study the accuracy of replacing n(x;d,c)n(x;d,c) distributions with N(c;d,c)N(c^{\prime};d,c) distributions as calculated via the methods presented in Sec. III. We then numerically show that the proxy maintains large overlaps with full classical statevector simulation of QAOA for certain parameter schedules. Finally, we provide a toy example for a small graph at p=3p=3, empirically showing that the homogeneous and typical parameter objective function correlate significantly for important parameter regimes.

V.1 Viability of Replacement Distance and Cost Distributions

For these experiments, we first choose ten 𝒢(10,1/3)\mathcal{G}(10,1/3) Erdős-Rényi model graphs, and calculate the n(x;d,c)n(x;d,c) for all bitstrings xx in {0,1}n\{0,1\}^{n}, all dd in [0,n][0,n] and all cc in [0,m][0,m], with mm being an upper bound on the maximum cost (here we use pe(n2)p_{e}\binom{n}{2} as described in Sec. III.1, giving m=15m=15). For each cost cc^{\prime}, we consider n(x;d,c)n(x;d,c) for all states xx with cost cc^{\prime} across all 1010 graphs. In order to evaluate the viability of replacing n(x;d,c)n(x;d,c) with N(c;d,c)N(c^{\prime};d,c), we present the following intuition: the better that N(c;d,c)N(c^{\prime};d,c) estimates the average of n(x;d,c)n(x;d,c) over all xx with c(x)=cc(x)=c^{\prime}, and the less that n(x;d,c)n(x;d,c) deviates over xx with c(x)=cc(x)=c^{\prime}, the better N(c;d,c)N(c^{\prime};d,c) should estimate n(x;d,c)n(x;d,c) for all xx with c(x)=cc(x)=c^{\prime}. We thus aim to numerically demonstrate both the extent to which the analytically derived N(c;d,c)N(c^{\prime};d,c) estimate the average n(x;d,c)n(x;d,c), and the extent to which n(x;d,c)n(x;d,c) deviates from its average. We first demonstrate the latter. To do this, we take the element-wise averages of these distributions. This average is one way of computing the distributions N(c;d,c)N(c^{\prime};d,c), as described in Sec. II.1. We also take the element-wise standard deviations of these distributions. In Fig. 2 we display the element-wise ratio of standard deviation to mean of these distributions for c=7c^{\prime}=7, chosen because P(c)P(c^{\prime}) is maximal near 77, such that this term has large weight in Eq. (5).
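The instance-level quantities compared here can be tabulated directly; the following brute-force helper (our own naming, exponential in nn and so only suitable for the small instances studied in this section) computes n(x;d,c)n(x;d,c) for a fixed MaxCut instance.

```python
# Sketch: tabulate n(x; d, c) for one MaxCut instance -- the number of
# bitstrings at Hamming distance d from x with cost (cut value) c.
from itertools import product

def n_xdc(x, n, edges):
    counts = {}
    for y in product((0, 1), repeat=n):
        d = sum(xi != yi for xi, yi in zip(x, y))
        c = sum(1 for i, j in edges if y[i] != y[j])
        counts[(d, c)] = counts.get((d, c), 0) + 1
    return counts
```

Averaging these counts over all xx with a fixed cost cc^{\prime}, and over instances, yields the empirical N(c;d,c)N(c^{\prime};d,c) used in Figs. 2 and 3.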

Figure 2: Heat map of the standard deviation/mean of N(c;d,c)N(c^{\prime};d,c) distributions for 10 instances of 𝒢(10,1/3)\mathcal{G}(10,1/3) graphs, with cc^{\prime} fixed at 77. Hatched out squares correspond to those elements of the distribution for which N(c;d,c)N(c^{\prime};d,c) is always 0, meaning that no bitstring at distance dd from a bitstring with cost 77 has cost cc. c=7c^{\prime}=7 was chosen because for 𝒢(10,1/3)\mathcal{G}(10,1/3) graphs, P(c)P(c^{\prime}) is peaked near 77, so this represents a typical instance of an N(c;d,c)N(c^{\prime};d,c) distribution.

From the figure, we see that for costs near m/2m/2 (7.57.5) and distances near n/2n/2 (55), there is minimal deviation in the N(c;d,c)N(c^{\prime};d,c) distributions among multiple instances of Erdős-Rényi model graphs and multiple bitstrings xx with cost cc^{\prime}. We note that the relative deviation is highest at the edges of the plot, where d0d\rightarrow 0, dnd\rightarrow n, ccminc\rightarrow c_{min}, and ccmaxc\rightarrow c_{max}. We note, however, that at these points, the expected value of n(x;d,c)n(x;d,c) is smaller, such that the contribution of these deviations to the sum in Eq. (4) is less than those with distance and cost near the center of the distribution. As an example, there are (105)=252\binom{10}{5}=252 bitstrings at distance 55 from a given bitstring xx, as opposed to (101)=10\binom{10}{1}=10 bitstrings at distance 11. A similar argument can be made using the values of P(c)P(c^{\prime}) derived in Sec. III. This numerical evidence thus suggests that replacing n(x;d,c)n(x;d,c) with N(c;d,c)N(c^{\prime};d,c) determined via an averaging over all xx with cost cc^{\prime} over multiple instances may introduce deviations from Eq. (5) precisely in the cases that contribute least to Eq. (2), allowing for near-homogeneous evolution.

This result provides evidence that the deviation in n(x;d,c)n(x;d,c) is small for the “bulk” of contributions to the sum in Eq. (2), such that we can replace n(x;d,c)n(x;d,c) with N(c;d,c)N(c^{\prime};d,c), the average over xx with c(x)=cc(x)=c^{\prime} for the entire class of problems. We then would like to understand how the analytically derived expressions for N(c;d,c)N(c^{\prime};d,c) in Sec. III estimate these averaged distributions. We perform the comparison of N(c;d,c)N(c^{\prime};d,c) computed by averaging and the methods in Sec. III for MaxCut on Erdős-Rényi model graphs in Fig. 3, showing the Pearson correlation coefficient between the two distributions for each cc^{\prime} in [0,m][0,m]. We likewise display P(c)P(c^{\prime}), in order to elucidate the dominant distributions in the sum of Eq. (5).

Figure 3: (middle) Pearson correlation coefficients between N(c;d,c)N(c^{\prime};d,c) calculated through averaging over 10 𝒢(10,1/3)\mathcal{G}(10,1/3) graph instances and through the analytical method presented in Sec. III. P(c)P(c^{\prime}), the probability of a random bitstring having cost cc^{\prime}, is also displayed to elucidate dominant terms in the sum of Eq. (5). Insets display the distribution for fixed cc^{\prime}: the (left) inset shows c=7c^{\prime}=7 using (top) the analytical approach and (bottom) the averaging approach; the (right) inset displays the same for c=13c^{\prime}=13.

From the figure we see that for dominant terms (with high P(c)P(c^{\prime})), the two distributions align well visually, corresponding to a high correlation coefficient. For less important terms, the analytical distributions do not match the average over many instances, but the effect of this mismatch may be reduced due to the lesser weight on these terms as determined by P(c)P(c^{\prime}) in the sums of Eq. (5).

Combined, Figs. 2 and 3 show that for dominant terms, there is little deviation in n(x;d,c)n(x;d,c) distributions from their average N(c;d,c)N(c^{\prime};d,c), and the analytically computed distributions match these average distributions well. Thus, they together indicate that the analytical methods for calculating N(c;d,c)N(c^{\prime};d,c) should approximate n(x;d,c)n(x;d,c) with c(x)=cc(x)=c^{\prime} for terms that dominate the sum in Eq. (5).

V.2 Numerical Overlaps

To further investigate the viability of the Classical Homogeneous Proxy for QAOA, we perform numerical simulations of the proxy on MaxCut on Erdős-Rényi graphs 𝒢(10,.5)\mathcal{G}(10,.5), and display the squared overlap between classical full statevector simulation and the proxy as a function of pp for various parameter schedules motivated by the QAOA literature. For this analysis, we choose linear ramp parameter schedules, inspired by quantum annealing. In particular, we fix a starting and ending point for γ\vec{\gamma} and β\vec{\beta}, which are kept constant regardless of pp, and the schedule is defined as

γ=γ1+(γfγ1)p,β=β1+(βfβ1)p\gamma_{\ell}=\gamma_{1}+(\gamma_{f}-\gamma_{1})\frac{\ell}{p},\quad\beta_{\ell}=\beta_{1}+(\beta_{f}-\beta_{1})\frac{\ell}{p} (23)

for each layer {\ell}. Given these linear ramp schedules, the squared overlaps between the replaced and original quantities are calculated as follows. First, statevector simulation is performed using HybridQ, an open-source package for large-scale quantum circuit simulation [44]. Next, the N(c;d,c)N(c^{\prime};d,c) distributions are computed according to Sec. III. Then, for the purposes of comparison, we compute the proxy slightly differently from above, by starting with the uniform superposition over all 2n2^{n} bitstrings and simply plugging in N(c(x);d,c)N(c(x);d,c) for all n(x;d,c)n(x;d,c) in Eq. (4), keeping all 2n2^{n} amplitudes q(x)q^{\ell}(x) at each step \ell. This allows us to compute the overlap between the proxy and true state using standard quantum state overlap, although we lose the gain in complexity from performing the proxy on only the set of unique costs, as we are working with 2n2^{n} bitstrings rather than |𝒞||\mathcal{C}| costs. Finally, using this method, we plot the squared overlaps between the replaced and original quantities as a function of pp with varying values of γ1\gamma_{1} and γf\gamma_{f} in Fig. 4.
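The linear ramp schedules of Eq. (23) are straightforward to generate; a minimal sketch (our own function name) follows, with γ\gamma increasing and β\beta decreasing across the pp layers as used in Fig. 4.

```python
# Sketch of the linear ramp schedule of Eq. (23):
# gamma_l = gamma_1 + (gamma_f - gamma_1) * l / p, and likewise for beta.
def linear_ramp(g1, gf, b1, bf, p):
    gammas = [g1 + (gf - g1) * l / p for l in range(1, p + 1)]
    betas = [b1 + (bf - b1) * l / p for l in range(1, p + 1)]
    return gammas, betas
```

Note that, as written in Eq. (23), the schedule reaches exactly γf\gamma_{f} and βf\beta_{f} at the final layer =p\ell=p.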

Figure 4: Squared overlap between p=20p=20 MaxCut QAOA run on a 𝒢(8,1/2)\mathcal{G}(8,1/2) Erdős-Rényi model graph instance using full state simulation vs. Eq. (5), with distributions calculated as in Sec. III, as a function of the current QAOA layer. A fixed linear ramp parameter schedule is chosen, with γ\gamma increasing and β\beta decreasing at each step. Each curve corresponds to differing values of γ1\gamma_{1} and γf\gamma_{f}.

From the figure, we can see that the overlap gradually decreases as the number of QAOA layers increases. However, the decline is less dramatic when γ1\gamma_{1} and γf\gamma_{f} are lower in magnitude. Thus, we see that as we move towards the γ1\gamma\ll 1 regime for these problems (or, more precisely, γC\gamma\ll\|C\| [39]), the true QAOA state remains closer to the proxy even as the algorithm progresses, as remarked in Sec. II. We stress that this behavior is empirical and the numerics are limited to the MaxCut examples analyzed here. This behavior does, however, align with the analytical fact mentioned in Sec. II that to first order in γ\gamma, QAOA states are Perfectly Homogeneous for strictly k-local Hamiltonians.

V.3 Parameter Objective Function Landscapes at Low Depth

In order to provide an explicit illustrative example for the efficacy of the Homogeneous Heuristic for Parameter Setting in Alg. 3, we depict both the typical and homogeneous parameter objective functions as a function of γ3\gamma_{3} and β3\beta_{3} for a randomly drawn 𝒢(8,1/2)\mathcal{G}(8,1/2) graph and QAOA with p=3p=3. In this example, our aim is to visualize similarities between the two parameter objective functions rather than to exhaustively find optimal parameters for the graph. As such, we borrow γ1\gamma_{1}, γ2\gamma_{2}, β1\beta_{1} and β2\beta_{2}, optimized from a 2020 node instance in Sec. VI.1. These landscapes are shown in Fig. 5.

Figure 5: Parameter objective function landscapes displayed as heat maps as a function of γ3\gamma_{3} and β3\beta_{3} for p=3p=3 MaxCut QAOA on 𝒢(8,1/2)\mathcal{G}(8,1/2) Erdős-Rényi model graph with γ1\gamma_{1}, γ2\gamma_{2}, β1\beta_{1} and β2\beta_{2} fixed. (left) The typical parameter objective function, computed via classical full statevector simulation. (right) The homogeneous parameter objective function computed via the Classical Homogeneous Proxy for QAOA.

It is visually clear that the two landscapes have significant differences. For the typical parameter objective function, there exists a clearly defined, Gaussian-like peak (yellow) and valley (blue). For the homogeneous parameter objective function, there exists a similarly located peak, albeit vertically compressed, and the corresponding valley comprises almost the entire rest of the landscape. However, we can see that in the small γ\gamma regime in particular, the landscapes qualitatively look very similar. This behavior is suggested by Fig. 4, where we see quantitatively that the extent to which the Classical Homogeneous Proxy for QAOA overlaps with the true QAOA evolution grows as γ\gamma decreases. Additionally, as seen in previous studies of QAOA parameters [13, 14, 11, 10], optimal values of γ\gamma tend to remain relatively small (exact values are not described as they depend on nn, pp, and the cost function being used), especially at the beginning of the algorithm. This suggests that, while the homogeneous and typical parameter objective functions may deviate significantly in general, they maintain significant correlations in relevant parameter regimes, specifically those near the maximum in the landscape. Indeed, for the task of parameter setting, we expect qualitative matching of landscape features to be much more important than a precise matching of objective function values.

It is also worthwhile to consider the difference in computational resources needed to produce the two plots. For the typical parameter objective function, classically evaluating the evolution of the algorithm in each cell of the presented 30 by 30 landscapes requires full statevector simulation. Farhi et al. show in [2] that, in order to compute Pauli observable expectation values for the typical parameter objective function, one only needs to include qubits within the reverse causal cone generated by the qubits involved in the observable and the quantum circuit implementing QAOA; however, for the example analyzed here, at p=3p=3, this reverse causal cone includes all 88 qubits for each observable, so no savings are available and we perform a full-state simulation. In our experiments, deriving the evolution of the proxy took roughly one-fiftieth of the time required for simulating full QAOA. We note that it is possible to efficiently evaluate each cell on an actual quantum computer, and that if one only wants expectation values given parameters, rather than the full state evolution, there are more efficient classical methods (e.g. [2, 31]). Current difficulties for the quantum approach, however, include noise resulting both from finite sampling error and from the effects of imperfect quantum hardware.

VI Results

In this section we present numerical evidence supporting the Homogeneous Heuristic for Parameter Setting, again using MaxCut on Erdős-Rényi model graphs as a target application. Due to the array of possible techniques implementing the parameter update scheme as mentioned in Sec. IV, we do not wish to provide an exhaustive comparison of the heuristic to previous literature, but rather demonstrate regimes where the heuristic provides parameters that are either comparable with previous results, or that yield increasing performance out to larger values of pp where we are not aware of prior methods in the literature successfully returning well-performing parameter schedules.

VI.1 Global Optimization at Low-Depth

Here we present results of the Homogeneous Heuristic for Parameter Setting, as well as comparisons to the transfer of parameters method outlined in Lotshaw et al. [10], implemented using the QAOAKit software package [11]. Lotshaw et al. show that using one set of median (over the entire dataset at a given nn and pp) parameters performs similarly to optimized parameters for each instance. Thus, we directly pull the obtained parameters from QAOAKit, which first calculates optimal parameters for all connected non-isomorphic graphs up to size 99 at p=1p=1,22, and 33. For each pp, the median over all parameters is calculated, and these median parameters are directly applied to ten Erdős-Rényi graphs from 𝒢(20,1/2)\mathcal{G}(20,1/2), yielding average and standard deviation of expectation values for these median parameters over the ten graphs. To compare with these transferred parameters, we display the approximation ratio achieved by parameters that are optimized with the heuristic, as described in Sec. IV, over the same ten graphs. Here the approximation ratio is defined as follows,

ApxRatio=C/copt,\mathrm{Apx\;Ratio}=\braket{C}/c_{opt}, (24)

i.e., the expected cost value returned by true QAOA, divided by the true optimal value coptc_{opt}, as determined via brute force search over all 2n2^{n} bitstrings. For this experiment, as well as all experiments below, the state throughout the algorithm and C\braket{C} are computed via full statevector simulation, even for parameters returned via the heuristic. The comparison between the heuristic and parameter transfer is shown in Fig. 6.
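The denominator of Eq. (24) can be obtained exactly for the sizes considered here; a brute-force sketch (our own naming, exponential in nn but feasible for n20n\leq 20) follows.

```python
# Sketch: c_opt for MaxCut by brute force over all 2^n bitstrings,
# as used in the denominator of the approximation ratio, Eq. (24).
from itertools import product

def maxcut_optimum(n, edges):
    best = 0
    for bits in product((0, 1), repeat=n):
        cut = sum(1 for i, j in edges if bits[i] != bits[j])
        best = max(best, cut)
    return best
```

The approximation ratio is then simply the QAOA cost expectation divided by this value, e.g. expected_cost / maxcut_optimum(n, edges).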

Figure 6: Box plots of approximation ratios obtained by the transfer of median of optimal parameters from 1010 𝒢(9,1/2)\mathcal{G}(9,1/2) graphs to 1010 𝒢(20,1/2)\mathcal{G}(20,1/2) graphs vs parameters found via homogeneous parameter setting for the same 1010 𝒢(20,1/2)\mathcal{G}(20,1/2) graphs, for p=p= 11,22, and 33.

As we can see in Fig. 6, for low depth, the heuristic performs comparably well to parameter transfer. On an instance-by-instance basis, the approximation ratio achieved by homogeneous parameter setting minus that achieved by parameter transfer was .0037±.0062-.0037\pm.0062, .0164±.0148.0164\pm.0148, and .0097±.0183.0097\pm.0183 for p=1p=1, 22, and 33, respectively. We do not see statistically significant differences between the two methods at any of the three depths analyzed, although the average performance of the heuristic is slightly higher in the latter two cases. This numerical evidence indicates that the method is competitive. Furthermore, the optimal parameters in the transfer case require the optimization of smaller QAOA instances, which may incur a tradeoff between the size of the problem one wishes to train parameters on and the accuracy of the parameter transfer onto larger instances. The parameters for comparison were pulled directly from QAOAKit data-tables, so our purpose here is not to provide a full timing comparison between the two methods. However, this demonstrates that our polynomially-scaling heuristic performs comparably with other techniques used in the literature.

VI.2 Parameter Optimization at Higher Depth

To elucidate how the Homogeneous Heuristic for Parameter Setting scales with QAOA depth pp, we further depict box plots of the approximation ratio for a new set of ten Erdős-Rényi graphs 𝒢(20,.5)\mathcal{G}(20,.5) at pp up to 2020. For these experiments, we further restrict to linear ramp parameter schedules as described in Eq. (23), to reduce the number of parameters from 2p2p to 44. We introduce this re-parameterization because having 2p2p free parameters, even for this moderately sized pp, results in optimization routines that do not converge in a reasonable time on the hardware specified below. The results for these runs are shown in Fig. 7.

Figure 7: Box plots of approximation ratios for parameters found via homogeneous parameter setting for the same 1010 𝒢(20,1/2)\mathcal{G}(20,1/2) graphs, for p=4p=4, 88, 1212, 1616, and 2020, restricted to linear ramp schedules as described in Sec. V.2.

From this figure, we see that the heuristic, when implemented with linear ramp schedules, results in monotonic improvement of approximation ratios as p increases. Notably, for this regime of N = 20, p = 20, we were not able to find previous works that efficiently returned optimized parameter schedules, even when restricted to linear ramps. Thus, these results demonstrate a regime in which the heuristic is able to return parameters that appear intractable for current devices and strategies, whether quantum, classical, or hybrid.

Numerical details: For our simulations in this section, all calculations (excluding those pulled from the QAOAKit database) were performed on a laptop with Intel i7-10510U CPUs @ 1.80GHz and 16 GB of RAM, with no parallelization. For the 20-node graphs, all experiments clocked in under 6 hours, with the longest times for fully parameterized p = 3 circuits (6 parameters). Parameters were seeded using linear ramp schedules from [13], and parameter optimization was performed using the standard Broyden–Fletcher–Goldfarb–Shanno algorithm [45] from the SciPy package [46].
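The optimization setup described above can be sketched as follows. The seed shape and the objective here are illustrative stand-ins: a real run would seed with the specific ramp of [13] and evaluate the Classical Homogeneous Proxy at each 2p-parameter point.

```python
from scipy.optimize import minimize


def linear_ramp_seed(p, dt=0.75):
    """Hypothetical linear-ramp seed in the spirit of [13]: gamma ramps up
    and beta ramps down linearly over the p layers (illustrative values)."""
    gammas = [dt * (j + 0.5) / p for j in range(p)]
    betas = [dt * (1 - (j + 0.5) / p) for j in range(p)]
    return gammas + betas


def objective(params):
    # Stand-in for the proxy's cost expectation; a toy quadratic bowl
    # so the sketch is self-contained and cheap to minimize.
    return sum(x * x for x in params)


p = 3  # 2p = 6 free parameters, matching the fully parameterized case above
result = minimize(objective, linear_ramp_seed(p), method="BFGS")
```

BFGS is gradient-based (SciPy estimates gradients numerically here), which is why the fully parameterized 2p-dimensional searches dominate the runtime relative to the 4-parameter ramp searches.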

VII Discussion

In this work we formalized the concepts of Perfect Homogeneity and the Classical Homogeneous Proxy for QAOA. We demonstrated how to derive the necessary quantities and efficiently evaluate the proxy for constraint satisfaction problems with a fixed, polynomial number of randomly chosen clauses. We then provided numerical evidence to support the use of the proxy for estimating the evolution and cost expectation value of QAOA. Finally, we applied these results to construct the Homogeneous Heuristic for QAOA, and implemented this strategy for a class of MaxCut instances on graphs up to n = 20 and p = 20. Our results show that on this class the heuristic easily yields parameters at p = 1, 2, and 3 that are comparable to those returned by parameter transfer. We further demonstrated that we are able to optimize parameters out to p = 20 by restricting to a linear ramp schedule, obtaining the desirable property of monotonically increasing approximation ratios as the number of QAOA layers increases. Notably, we found that the proxy appears to estimate well both the state and cost expectation of QAOA in the particular cases when γ remains relatively small throughout the algorithm, as well as for quantum-annealing-inspired linear ramp schedules. These ramp schedules have frequently been proposed as empirically well-performing schedules [13, 10, 26], which suggests that the proxy may more accurately estimate QAOA expectation values for important parameter regimes and schedules of interest, even if these estimates may diverge somewhat for arbitrarily chosen parameters.

Several interesting research questions and future directions follow directly from our results. An immediate question is to better understand the relationship between the problem class specified, the resulting distributions N(c′; d, c) and P(c′) used for the proxy, and the effect on the parameters returned by the Homogeneous Heuristic for QAOA, especially with respect to a given problem instance to be solved. For example, a fixed instance can be drawn from a number of different possible classes, so changing the class considered can have a significant effect on the parameters returned and the resulting performance. One approach to address this issue would be to extend the derivations of N(c′; d, c) and P(c′) to incorporate instance-specific information beyond just the problem class. A naive example in this vein would be to estimate the distributions via Monte Carlo sampling of bitstrings and their costs for the given instance. Furthermore, including instance-specific information appears a promising route to explicitly extending the heuristic beyond random problem classes, which can be used to study parameter schedules and performance in the worst-case setting. Finally, it is worthwhile to explore adaptations of our approach to cases where the number of unique possible costs may become large. In this case, one could imagine binning together costs close in value such that an effective cost function with far fewer possible costs is produced, to which the proxy may then be applied.
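The cost-binning idea above can be sketched as a simple uniform-width binning; this is a hypothetical illustration of the concept, not a construction from the paper:

```python
def bin_costs(costs, num_bins):
    """Map each cost to the midpoint of its uniform-width bin, producing an
    effective cost function with at most num_bins distinct values.

    A hypothetical sketch of the binning idea; more refined schemes (e.g.
    adaptive bin widths) could equally be used.
    """
    lo, hi = min(costs), max(costs)
    width = (hi - lo) / num_bins or 1.0  # guard against all-equal costs
    binned = []
    for c in costs:
        # Clamp the top edge so the maximum cost falls in the last bin.
        idx = min(int((c - lo) / width), num_bins - 1)
        binned.append(lo + (idx + 0.5) * width)
    return binned

# 100 distinct costs collapse to a 4-valued effective cost function.
effective = bin_costs(list(range(100)), num_bins=4)
```

After binning, the proxy's distributions would be computed over the (polynomially many) bin values rather than the original cost spectrum.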

In terms of generalizing both the methods and scope of our approach, we first re-emphasize that parameter optimization for parameterized quantum circuits consists of two primary components: an outer-loop parameter update scheme, and an inner subroutine that evaluates the parameter objective function. The inner subroutine is typically evaluated using the quantum computer. The key idea of our approach is to replace the inner subroutine with an efficiently computable classical strategy based on the assumption of Perfect Homogeneity. Hence a natural extension is to consider other efficiently computable proxies for the inner loop. For example, in cases where the problem instance exhibits a high degree of classical symmetry, the dimension of the effective Hilbert space can be drastically reduced, and so the evaluation and optimization of the typical parameter objective can be sped up significantly [23]. Similarly, different proxies may follow from related ideas and results in the literature, such as the small-parameter analysis of [39], the pseudo-Boltzmann approximation of [38], and classical or quantum surrogate models [35, 36]. We remark that a promising direction that appears relatively straightforward in light of our results is to extend the analysis of [38] to QAOA levels beyond p = 1. Finally, an important direction is to explicitly generalize our approach to algorithms beyond QAOA and, more generally, to problems beyond combinatorial optimization, such as the parameter setting problem for Variational Quantum Eigensolvers. More broadly, it is important to better understand and characterize the regimes where such classical proxies are most advantageous, such as when the noisy computation and measurements of real-world quantum devices are taken into account, as well as to what degree undesirable effects such as barren plateaus may apply when such proxies are utilized for parameter setting.
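The two-component structure above can be sketched generically: any outer-loop optimizer can be wired to an inner objective that calls a cheap classical proxy instead of a quantum device. Both the optimizer (a toy random search) and the objective here are illustrative stand-ins, not the paper's implementation (which uses BFGS and the Classical Homogeneous Proxy):

```python
import random


def optimize_with_proxy(proxy_cost, initial_params, step=0.05, iters=200, seed=0):
    """Generic outer loop: a simple random-search parameter update scheme
    whose inner objective evaluation calls a classical proxy.

    Illustrative only; any update scheme (gradient-based, Bayesian, etc.)
    could play the outer-loop role.
    """
    rng = random.Random(seed)
    params = list(initial_params)
    best = proxy_cost(params)  # inner subroutine: classical, not quantum
    for _ in range(iters):
        candidate = [x + rng.uniform(-step, step) for x in params]
        value = proxy_cost(candidate)
        if value < best:  # keep the candidate only if the proxy improves
            params, best = candidate, value
    return params, best


# Toy proxy: a quadratic bowl standing in for the proxy's cost expectation.
params, value = optimize_with_proxy(lambda v: sum(x * x for x in v), [1.0, -1.0])
```

The parameters returned by such a loop would then be used directly in QAOA on the quantum device, as in the Homogeneous Heuristic for Parameter Setting.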

Acknowledgements

We are grateful for support from NASA Ames Research Center. We acknowledge funding from the NASA ARMD Transformational Tools and Technology (TTT) Project. Part of the project was funded by the Defense Advanced Research Projects Agency (DARPA) via IAA8839 annex 114. JS, SH, TH are thankful for support from NASA Academic Mission Services, Contract No. NNA16BD14C. We also acknowledge XSEDE computational Project No. TG-MCA93S030 on Bridges-2 at the Pittsburgh supercomputer center.

References

  • [1] Stuart Hadfield, Zhihui Wang, Bryan O’Gorman, Eleanor G. Rieffel, Davide Venturelli, and Rupak Biswas. From the Quantum Approximate Optimization Algorithm to a Quantum Alternating Operator Ansatz. Algorithms, 12(2):34, February 2019.
  • [2] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A Quantum Approximate Optimization Algorithm. arXiv:1411.4028 [quant-ph], November 2014.
  • [3] Tad Hogg. Quantum search heuristics. Physical Review A, 61(5):052311, April 2000.
  • [4] Tad Hogg and Dmitriy Portnov. Quantum optimization. Information Sciences, 128(3):181–197, October 2000.
  • [5] M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C. Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R. McClean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, and Patrick J. Coles. Variational quantum algorithms. Nature Reviews Physics, 3(9):625–644, September 2021.
  • [6] David Wierichs, Christian Gogolin, and Michael Kastoryano. Avoiding local minima in variational quantum eigensolvers with the natural gradient optimizer. Physical Review Research, 2(4):043246, November 2020.
  • [7] Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy, Ryan Babbush, and Hartmut Neven. Barren plateaus in quantum neural network training landscapes. Nature Communications, 9(1):4812, November 2018.
  • [8] Martin Larocca, Piotr Czarnik, Kunal Sharma, Gopikrishnan Muraleedharan, Patrick J. Coles, and M. Cerezo. Diagnosing barren plateaus with tools from quantum optimal control. arXiv:2105.14377 [quant-ph], March 2022.
  • [9] Samson Wang, Enrico Fontana, M. Cerezo, Kunal Sharma, Akira Sone, Lukasz Cincio, and Patrick J. Coles. Noise-induced barren plateaus in variational quantum algorithms. Nature Communications, 12(1):6961, November 2021.
  • [10] Phillip C. Lotshaw, Travis S. Humble, Rebekah Herrman, James Ostrowski, and George Siopsis. Empirical performance bounds for quantum approximate optimization. arXiv:2102.06813 [quant-ph], February 2021.
  • [11] Ruslan Shaydulin, Kunal Marwaha, Jonathan Wurtz, and Phillip C. Lotshaw. QAOAKit: A Toolkit for Reproducible Study, Application, and Verification of the QAOA. arXiv:2110.05555 [quant-ph], November 2021.
  • [12] Sami Khairy, Ruslan Shaydulin, Lukasz Cincio, Yuri Alexeev, and Prasanna Balaprakash. Learning to Optimize Variational Quantum Circuits to Solve Combinatorial Problems. Proceedings of the AAAI Conference on Artificial Intelligence, 34(03):2367–2375, April 2020.
  • [13] Leo Zhou, Sheng-Tao Wang, Soonwon Choi, Hannes Pichler, and Mikhail D. Lukin. Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation on Near-Term Devices. Physical Review X, 10:021067, June 2020.
  • [14] Gavin E. Crooks. Performance of the Quantum Approximate Optimization Algorithm on the Maximum Cut Problem. arXiv:1811.08419 [quant-ph], November 2018.
  • [15] Ruslan Shaydulin, Ilya Safro, and Jeffrey Larson. Multistart Methods for Quantum Approximate Optimization. In 2019 IEEE High Performance Extreme Computing Conference (HPEC), pages 1–8, September 2019.
  • [16] Mahabubul Alam, Abdullah Ash-Saki, and Swaroop Ghosh. Accelerating quantum approximate optimization algorithm using machine learning. In 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 686–689. IEEE, 2020.
  • [17] Guillaume Verdon, Michael Broughton, Jarrod R. McClean, Kevin J. Sung, Ryan Babbush, Zhang Jiang, Hartmut Neven, and Masoud Mohseni. Learning to learn with quantum neural networks via classical neural networks. arXiv:1907.05415 [quant-ph], July 2019.
  • [18] Max Wilson, Rachel Stromswold, Filip Wudarski, Stuart Hadfield, Norm M. Tubman, and Eleanor G. Rieffel. Optimizing quantum heuristics with meta-learning. Quantum Machine Intelligence, 3(1):13, April 2021.
  • [19] Andrea Skolik, Jarrod R. McClean, Masoud Mohseni, Patrick van der Smagt, and Martin Leib. Layerwise learning for quantum neural networks. Quantum Machine Intelligence, 3(1):1–11, 2021.
  • [20] Javier Rivera-Dean, Patrick Huembeli, Antonio Acín, and Joseph Bowles. Avoiding local minima in variational quantum algorithms with neural networks. arXiv:2104.02955 [quant-ph], 2021.
  • [21] Lennart Bittel and Martin Kliesch. Training variational quantum algorithms is NP-hard. Physical Review Letters, 127(12):120502, 2021.
  • [22] Michael Streif and Martin Leib. Training the quantum approximate optimization algorithm without access to a quantum processing unit. Quantum Science and Technology, 5(3):034008, May 2020.
  • [23] Ruslan Shaydulin, Stuart Hadfield, Tad Hogg, and Ilya Safro. Classical symmetries and the Quantum Approximate Optimization Algorithm. Quantum Information Processing, 20(11):359, November 2021.
  • [24] Lucas T. Brady, Christopher L. Baldwin, Aniruddha Bapat, Yaroslav Kharkov, and Alexey V. Gorshkov. Optimal Protocols in Quantum Annealing and Quantum Approximate Optimization Algorithm Problems. Physical Review Letters, 126(7):070505, February 2021.
  • [25] Stefan H. Sack and Maksym Serbyn. Quantum annealing initialization of the quantum approximate optimization algorithm. Quantum, 5:491, July 2021.
  • [26] Jonathan Wurtz and Peter J. Love. Counterdiabaticity and the quantum approximate optimization algorithm. arXiv:2106.15645 [quant-ph], July 2021.
  • [27] Fernando G. S. L. Brandao, Michael Broughton, Edward Farhi, Sam Gutmann, and Hartmut Neven. For Fixed Control Parameters the Quantum Approximate Optimization Algorithm’s Objective Function Value Concentrates for Typical Instances. arXiv:1812.04170 [quant-ph], December 2018.
  • [28] Xinwei Lee, Yoshiyuki Saito, Dongsheng Cai, and Nobuyoshi Asai. Parameters Fixing Strategy for Quantum Approximate Optimization Algorithm. arXiv:2108.05288 [quant-ph], August 2021.
  • [29] Alexey Galda, Xiaoyuan Liu, Danylo Lykov, Yuri Alexeev, and Ilya Safro. Transferability of optimal QAOA parameters between random graphs. arXiv:2106.07531 [quant-ph], June 2021.
  • [30] Lucas T. Brady, Lucas Kocia, Przemyslaw Bienias, Aniruddha Bapat, Yaroslav Kharkov, and Alexey V. Gorshkov. Behavior of analog quantum algorithms. arXiv:2107.01218 [quant-ph], 2021.
  • [31] Zhihui Wang, Stuart Hadfield, Zhang Jiang, and Eleanor G. Rieffel. Quantum approximate optimization algorithm for MaxCut: A fermionic view. Physical Review A, 97(2):022304, February 2018.
  • [32] Kunal Marwaha. Local classical MAX-CUT algorithm outperforms p=2 QAOA on high-girth regular graphs. Quantum, 5:437, April 2021.
  • [33] Panagiotis Kl. Barkoutsos, Giacomo Nannicini, Anton Robert, Ivano Tavernelli, and Stefan Woerner. Improving Variational Quantum Optimization using CVaR. Quantum, 4:256, April 2020.
  • [34] Li Li, Minjie Fan, Marc Coram, Patrick Riley, and Stefan Leichenauer. Quantum optimization with a novel Gibbs objective function and ansatz architecture search. Physical Review Research, 2(2):023074, April 2020.
  • [35] Kevin J. Sung, Jiahao Yao, Matthew P. Harrigan, Nicholas C. Rubin, Zhang Jiang, Lin Lin, Ryan Babbush, and Jarrod R. McClean. Using models to improve optimizers for variational quantum algorithms. Quantum Science and Technology, 5(4):044008, 2020.
  • [36] Ryan Shaffer, Lucas Kocia, and Mohan Sarovar. Surrogate-based optimization for variational quantum algorithms. arXiv:2204.05451 [quant-ph], April 2022.
  • [37] Juliane Mueller, Wim Lavrijsen, Costin Iancu, et al. Accelerating noisy VQE optimization with gaussian processes. arXiv:2204.07331, 2022.
  • [38] Pablo Díez-Valle, Diego Porras, and Juan José García-Ripoll. QAOA pseudo-Boltzmann states. arXiv:2201.03358 [quant-ph], March 2022.
  • [39] Stuart Hadfield, Tad Hogg, and Eleanor G. Rieffel. Analytical framework for quantum alternating operator ansätze. arXiv:2105.06996, 2021.
  • [40] Boaz Barak, Ankur Moitra, Ryan O’Donnell, Prasad Raghavendra, Oded Regev, David Steurer, Luca Trevisan, Aravindan Vijayaraghavan, David Witmer, and John Wright. Beating the random assignment on constraint satisfaction problems of bounded degree. arXiv:1505.03424, 2015.
  • [41] Matthew Hastings. Classical and quantum bounded depth approximation algorithms. Quantum Information and Computation, 19:1116–1140, November 2019.
  • [42] Kunal Marwaha and Stuart Hadfield. Bounds on approximating Max-k-XOR with quantum and classical local algorithms. Quantum, 6:757, 2022.
  • [43] Chi-Ning Chou, Peter J. Love, Juspreet Singh Sandhu, and Jonathan Shi. Limitations of local quantum algorithms on random Max-k-XOR and beyond. In 49th International Colloquium on Automata, Languages, and Programming (ICALP 2022), 2022.
  • [44] Salvatore Mandrà, Jeffrey Marshall, Eleanor G. Rieffel, and Rupak Biswas. HybridQ: A Hybrid Simulator for Quantum Circuits. arXiv:2111.06868 [quant-ph], November 2021.
  • [45] Roger Fletcher. Practical Methods of Optimization. Wiley, Chichester; New York, 1987.
  • [46] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, C. J. Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E. A. Quintero, Charles R. Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, and Paul van Mulbregt. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17(3):261–272, March 2020.