The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: A Typical Case

Edward Farhi, Google Inc., Venice CA 90291 and Center for Theoretical Physics, MIT, Cambridge MA 02139

David Gamarnik, Operations Research Center and Sloan School of Management, MIT, Cambridge MA 02140

Sam Gutmann

Abstract

The Quantum Approximate Optimization Algorithm can naturally be applied to combinatorial search problems on graphs. The quantum circuit has p applications of a unitary operator that respects the locality of the graph. On a graph with bounded degree, with p small enough, measurements of distant qubits in the state output by the QAOA give uncorrelated results. We focus on finding big independent sets in random graphs with dn/2 edges keeping d fixed and n large. Using the Overlap Gap Property of almost optimal independent sets in random graphs, and the locality of the QAOA, we are able to show that if p is less than a d-dependent constant times log n, the QAOA cannot do better than finding an independent set of size .854 times the optimal for d large. Because the logarithm is slowly growing, even at one million qubits we can only show that the algorithm is blocked if p is in single digits. At higher p the algorithm “sees” the whole graph and we have no indication that performance is limited.

1 Introduction

The Quantum Approximate Optimization Algorithm [1, 2] is designed to find approximate solutions to combinatorial search problems, and here we consider its application to finding large independent sets in random graphs. The graphs have $n$ vertices and $dn\over 2$ edges chosen uniformly at random, with $d$, the average degree, fixed. The quantum algorithm consists of an alternation of $2p$ unitaries: half of them are products of single-qubit unitaries, and the other half only couple qubits that are connected by an edge in the graph. On a bounded-degree graph, with $p$ fixed or growing slowly with $n$, the QAOA does not “see” the whole graph. This means that bits output by the QAOA are uncorrelated at graph distances larger than $2p$. Looking at random graphs of average degree $d$, we will show that when $2p$ is less than a multiple of $\log n$ the power of the algorithm is limited. More precisely, if $2p\leq w\log n/\log(d/\ln 2)$ for any $w<1$ and $d$ big enough, the QAOA fails to produce an independent set larger than .854 times the optimal. (The ratio of logs is independent of the base of the log.) If $p$ is large enough that the algorithm “covers” the whole graph, we have no indication that the algorithm has limited power.

In the first QAOA paper [1] it was shown that there exists a set of large Max-Cut instances on which the $p=2$ algorithm fails to achieve an approximation ratio better than 0.756. These are bipartite 3-regular graphs with $o(n)$ squares. Although these instances are completely satisfiable, at this shallow depth the QAOA cannot detect whether there are large odd-length loops, which would make them not completely satisfiable, and hence its approximation ratio is provably less than $1$. Recently [3], also looking at Max-Cut, constructed a sequence of $d$-regular bipartite graphs for which the QAOA at depth $p<(1/3~{}\mathrm{log}_{2}n-4)d^{-1}$ fails to achieve an approximation ratio better than some $d$-dependent constant less than $1$. This result is similar to the one in this paper in that it considers $p$ growing logarithmically with $n$. However, theirs is a worst-case result and ours is for typical instances. Also, crucially, [3] requires that the cost function have a $Z_{2}$ symmetry and that the initial state be an eigenstate of the $Z_{2}$ operator. In our setup these conditions are not needed, and for the problem we study they are not met.

Our proof method uses the Overlap Gap Property (OGP) exhibited by the large independent sets of random graphs with bounded average degree, established in [4]. Roughly speaking, the OGP says that for a given random graph, the intersection of any two large (i.e. nearly optimal) independent sets is either big or small; there is no middle ground. The OGP has been shown to be an obstruction to a variety of classical algorithms, including local algorithms [4],[5],[6], Markov Chain Monte Carlo (and related) methods [7],[8],[9], and Approximate Message Passing type algorithms [10]. The application of the OGP as a barrier to quantum algorithms is novel. It depends on the locality of the QAOA: the unitary operators in the algorithm only couple vertices which are connected in the input graph. As a result, because of the bounded average degree, when $p$ is a small multiple of $\log n$, changing a single edge of the graph affects only $o(n)$ qubits of the final state. We will use this to show that outputting a large independent set contradicts the OGP.

It is worth remarking that the OGP is conjectured not to hold for the low-energy configurations of the Sherrington-Kirkpatrick model. Assuming this conjecture is true, Montanari recently [11] constructed a polynomial time Approximate Message Passing algorithm for finding a near ground state configuration.

The QAOA has been applied to the Sherrington-Kirkpatrick model at fixed $p$ in the infinite $n$ limit [12]. The associated graph is fully connected (each vertex has degree $n-1$), so the QAOA sees the whole graph at the lowest values of $p$. The techniques of this paper for bounded-degree graphs cannot be applied to show any obstacle to the performance of the QAOA on this model.

2 Maximum Independent Set

The computational problem we focus on is Maximum Independent Set or MIS. Given a graph defined as a collection of vertices and edges, an independent set is a subset of the vertices with no graph edge between any two vertices in the set. It is easy to find a small independent set. Finding a big independent set is the challenge. In fact finding the largest independent set in an arbitrary graph is an NP-hard problem. But here we focus on random graphs and are interested in finding big independent sets but not necessarily the biggest. We define $\alpha(G)$ as the independence ratio which is the size of the biggest independent set in $G$ divided by the number of vertices. Given a graph $G$ , an algorithm will output an independent set and the quality of the algorithm can be measured by the size of the output divided by $n$ as compared to $\alpha(G)$ .

We focus on sparse Erdős-Rényi graphs with $n$ nodes and $m$ edges, where $m={dn\over 2}$ with $d$ fixed, that is, independent of $n$. In other words, $\mathbb{G}(n,{dn\over 2})$ is a graph with ${dn\over 2}$ edges chosen uniformly at random from the $n(n-1)/2$ possible edges. The average degree is $d$. The independence ratio $\alpha(\mathbb{G}(n,{dn\over 2}))$ of this graph, denoted by $\alpha_{n,d}$, is a random variable with the following known properties. First [13], there exists an $\alpha_{d}$ such that

\displaystyle\alpha_{n,d}\to\alpha_{d}\text{ with probability $1$ as }n\to\infty.

(1)

Second, while the value of $\alpha_{d}$ for finite $d$ is unknown, the asymptotic value of $\alpha_{d}$ as $d$ increases is known [14]:

\displaystyle\lim_{d\to\infty}{\alpha_{d}\over 2\ln d/d}=1.

(2)

We are interested in algorithms that take a graph as input and output a large independent set. The natural question is whether a polynomial time algorithm can produce independent sets close to $\alpha_{d}$. A simple greedy algorithm achieves asymptotically half of the optimal value, that is, it constructs an independent set of size $n(\ln d/d)(1+o_{d}(1))$, where $o_{d}(1)$ denotes a function of $d$ converging to $0$ as $d\to\infty$. Finding a polynomial time algorithm that provably goes beyond this would be a major achievement; it mirrors a similar problem for dense Erdős-Rényi graphs, which has been open for more than four decades [15]. Our interest is a quantum algorithm, the QAOA. We will show that if $p$, the depth of the QAOA, is less than $\epsilon\log n$ with $\epsilon$ a small constant, then the QAOA fails to go as far as $.854\alpha_{d}$ for $d$ large. For larger $p$ our arguments do not apply, and we cannot say whether the QAOA gets close to $\alpha_{d}$ with $p$ growing, say, as a large constant times $\log n$. (Actually, if we let $p$ grow fast enough with $n$, the QAOA will find the optimum [1].)
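To make the greedy baseline concrete, here is a minimal Python sketch. It assumes the random-order greedy rule (visit the vertices in a uniformly random order and keep a vertex if none of its neighbours has already been kept); the text does not say which greedy variant it means, so this rule, the helper names `random_graph` and `greedy_independent_set`, and the parameter values are illustrative assumptions. The sketch samples one graph from $\mathbb{G}(n,{dn\over 2})$ and compares the greedy independence ratio with $\ln d/d$ and with $2\ln d/d$. Note that $2\ln d/d$ is only the large-$d$ asymptotic form of $\alpha_d$ from (2), not $\alpha_d$ itself at finite $d$.

```python
import math
import random

def random_graph(n, m, rng):
    """Sample m distinct edges uniformly from the n(n-1)/2 possible ones, i.e. G(n, m)."""
    edges = set()
    while len(edges) < m:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

def greedy_independent_set(n, edges, rng):
    """Random-order greedy (an assumed variant): keep a vertex if no neighbour was kept."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    order = list(range(n))
    rng.shuffle(order)
    chosen = [False] * n
    for v in order:
        if not any(chosen[u] for u in adj[v]):
            chosen[v] = True
    return [v for v in range(n) if chosen[v]]

rng = random.Random(0)
n, d = 20000, 20
edges = random_graph(n, d * n // 2, rng)
mis = greedy_independent_set(n, edges, rng)
ratio = len(mis) / n
print(f"greedy: {ratio:.4f}  ln(d)/d: {math.log(d) / d:.4f}  2ln(d)/d: {2 * math.log(d) / d:.4f}")
```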

3 The Quantum Approximate Optimization Algorithm

We start by reviewing the QAOA with the graph problem Maximum Independent Set in mind. It is convenient here to work with bits that are $0,1$ valued. Given a classical cost function $C({\boldsymbol{b}})$ defined on $n$ -bit strings ${\boldsymbol{b}}=(b_{1},b_{2},\ldots,b_{n})\in\{0,1\}^{n}$ , the QAOA is a quantum algorithm that aims to find a string ${\boldsymbol{b}}$ such that $C({\boldsymbol{b}})$ is close to its absolute maximum. The graph-dependent cost function $C$ can be written as an operator that is diagonal in the computational basis, defined as

C\left|{\boldsymbol{b}}\right\rangle=C({\boldsymbol{b}})\left|{\boldsymbol{b}}\right\rangle.

(3)

We only consider “local” cost functions, that is, those with interactions only between qubits that are connected on the instance graph. The problem-dependent unitary operator depends on $C$ and a single parameter $\gamma$:

\displaystyle U(C,\gamma)=e^{-i\gamma C}.

(4)

Note that $U(C,\gamma)$ conjugating a single qubit operator produces an operator that only involves that qubit and those connected to it on the graph.

The operator that induces transitions between strings uses

\displaystyle B=\sum_{j=1}^{n}X_{j},

(5)

where $X_{j}$ is the Pauli $X$ operator acting on qubit $j$ , and the associated unitary operator depends on a parameter $\beta$

\displaystyle U(B,\beta)=e^{-i\beta B}=\prod_{j=1}^{n}e^{-i\beta X_{j}}.

(6)

Note that $U(B,\beta)$ conjugating a single-qubit operator produces an operator on that same qubit and has no effect on other qubits.

We initialize the system of qubits in a product state such as

\displaystyle\left|s\right\rangle=\left|0\right\rangle^{\otimes n}

(7)

or

\left|s\right\rangle=\left|+\right\rangle^{\otimes n}=\frac{1}{\sqrt{2^{n}}}\sum_{{\boldsymbol{b}}}\left|{\boldsymbol{b}}\right\rangle.

(8)

Using a product state for the initial state is the usual choice for the QAOA and is required for the arguments below.

We alternately apply $p$ layers of $U(C,\gamma)$ and $U(B,\beta)$ . Let $\boldsymbol{\gamma}=\gamma_{1},\gamma_{2},\ldots,\gamma_{p}$ and $\boldsymbol{\beta}=\beta_{1},\beta_{2},\ldots,\beta_{p}$ . The QAOA circuit prepares the unitary operator

\displaystyle U=U(B,\beta_{p})U(C,\gamma_{p})\cdots U(B,\beta_{1})U(C,\gamma_{1})

(9)

which acting on the initial state gives

\displaystyle\left|\boldsymbol{\gamma},\boldsymbol{\beta}\right\rangle=U\left|s\right\rangle.

(10)

The associated QAOA objective function is

\displaystyle\left\langle\boldsymbol{\gamma},\boldsymbol{\beta}\right|C\left|\boldsymbol{\gamma},\boldsymbol{\beta}\right\rangle.

(11)

By repeatedly measuring the quantum state $\left|\boldsymbol{\gamma},\boldsymbol{\beta}\right\rangle$ in the computational basis, one will find a bit string ${\boldsymbol{b}}$ such that $C({\boldsymbol{b}})$ is near (11) or better. Different strategies have been developed to find optimal $(\boldsymbol{\gamma},\boldsymbol{\beta})$ for any given instance [16, 17]. But here we will show that under certain circumstances no set of parameters $(\boldsymbol{\gamma},\boldsymbol{\beta})$ can achieve a certain level of success so we need not concern ourselves with optimal parameters. So from now on we denote the state produced by the QAOA as $\left|\psi\right\rangle$ .
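For small $n$ the definitions (3)-(11) can be checked directly with a state-vector simulation. The sketch below is a minimal illustration in Python with NumPy, not an implementation taken from [1, 2]; the helper names and the bit-ordering convention ($b_1$ as the most significant bit of the basis index) are our own assumptions. It stores $C$ as its diagonal, applies $U(C,\gamma)$ as an elementwise phase, and applies $U(B,\beta)$ as the $2\times 2$ gate $e^{-i\beta X}=\cos\beta\,I-i\sin\beta\,X$ on each qubit in turn.

```python
import numpy as np

def bits_of(index, n):
    """b = (b_1, ..., b_n) for a basis index; b_1 is the most significant bit."""
    return [(index >> (n - 1 - k)) & 1 for k in range(n)]

def diagonal(n, cost_fn):
    """Diagonal of a classical cost operator, eq. (3): C|b> = C(b)|b>."""
    return np.array([cost_fn(bits_of(x, n)) for x in range(2 ** n)], dtype=float)

def apply_mixer(state, beta, n):
    """U(B, beta) = prod_j exp(-i beta X_j), eq. (6), applied one qubit at a time."""
    c, s = np.cos(beta), np.sin(beta)
    gate = np.array([[c, -1j * s], [-1j * s, c]])  # exp(-i beta X) = cos(beta) I - i sin(beta) X
    psi = state.reshape([2] * n)
    for j in range(n):
        # Contract the gate with axis j, then move the new axis back to position j.
        psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [j])), 0, j)
    return psi.reshape(-1)

def qaoa_state(c_diag, gammas, betas, n, init):
    """U(B,beta_p) U(C,gamma_p) ... U(B,beta_1) U(C,gamma_1) |s>, eqs. (9)-(10)."""
    state = np.asarray(init, dtype=complex)
    for gamma, beta in zip(gammas, betas):
        state = np.exp(-1j * gamma * c_diag) * state  # U(C, gamma), eq. (4), is diagonal
        state = apply_mixer(state, beta, n)
    return state

def expectation(state, diag):
    """<psi|D|psi> for D diagonal in the computational basis, e.g. eq. (11) with D = C."""
    return float(np.real(np.vdot(state, diag * state)))

# Example: C = C_IS of eq. (23) on the path 0-1-2, started from |+>^n as in eq. (8).
n = 3
edges = [(0, 1), (1, 2)]
c_is = diagonal(n, lambda b: sum(b[i] * b[j] for i, j in edges))
plus = np.full(2 ** n, 1 / np.sqrt(2 ** n))
psi = qaoa_state(c_is, [0.4, 0.7], [0.3, 0.2], n, plus)
print(expectation(psi, c_is))
```

With all angles zero the state stays $\left|+\right\rangle^{\otimes n}$, and the expectation of $C_{\text{IS}}$ on this path is $2\times 1/4=1/2$; starting instead from $\left|0\right\rangle^{\otimes n}$ with $\gamma=0$, each bit is $1$ with probability $\sin^2\beta$.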

4 Locality Properties of the QAOA

In this paper we focus on combinatorial search problems associated with random graphs of bounded average degree $d$, so it is very unlikely that any vertex has degree much larger than $d$. This means that conjugating, say, a single-qubit operator by the cost function unitary (4) typically produces an operator acting on no more than roughly $d$ qubits. The “driver” unitary (6) introduces no spreading at all. We also use as the initial state a product state, which has no entanglement. With this form of the QAOA we can establish some general locality properties of the quantum state produced by the quantum circuit. What follows is not restricted to a particular computational problem or to random graphs.

4.1 Distant qubits

Consider an instance of some graph problem with its associated local cost function $C$. The first property concerns qubits that are far from each other on the graph. Define B($i,r$) as the set of vertices within distance $r$ of vertex $i$. Let Dist($i,2p$) be the complement of B($i,2p$), that is, the set of vertices more than $2p$ away from $i$. We assume that $2p$ is small enough that Dist($i,2p$) is not empty. Let $O_{i}$ be an operator acting on qubit $i$ tensored with the identity acting on all other qubits. Let $O_{\text{dist}}$ be an operator acting only on the qubits in Dist($i,2p$). We now show that

\left\langle s\right|U^{\dagger}O_{i}O_{\text{dist}}U\left|s\right\rangle=\left\langle s\right|U^{\dagger}O_{i}U\left|s\right\rangle\left\langle s\right|U^{\dagger}O_{\text{dist}}U\left|s\right\rangle

(12)

as long as $\left|s\right\rangle$ is a product state and $U$ is of the form (9) with a local cost function.

Proof of (12). Because the QAOA is local, $U^{\dagger}O_{i}U$ only involves qubits in B($i,p$). Because $O_{\text{dist}}$ only involves qubits more than $2p$ away from qubit $i$, $U^{\dagger}O_{\text{dist}}U$ can only involve qubits more than $p$ away from $i$, that is, qubits in the complement of B($i,p$). Now

\left\langle s\right|U^{\dagger}O_{i}O_{\text{dist}}U\left|s\right\rangle=\left\langle s\right|U^{\dagger}O_{i}UU^{\dagger}O_{\text{dist}}U\left|s\right\rangle

(13)

and we will insert between $U$ and $U^{\dagger}$ a complete set of states on the qubits in B($i,p$) and on its complement. Call the set of qubits in B($i,p$) “near” and those in its complement “far”. The initial state is a product state, which we can write as $\left|s\right\rangle=\left|s_{\mathrm{near}}\right\rangle\left|s_{\mathrm{far}}\right\rangle$. Inserting a complete set in the middle of the right-hand side of (13) gives

\displaystyle=\sum_{\bf{v}_{\mathrm{near}}}\sum_{\bf{v}_{\mathrm{far}}}\left\langle s_{\mathrm{near}}\right|\left\langle s_{\mathrm{far}}\right|U^{\dagger}O_{i}U\left|\bf{v}_{\mathrm{near}}\right\rangle\left|\bf{v}_{\mathrm{far}}\right\rangle\left\langle\bf{v}_{\mathrm{near}}\right|\left\langle\bf{v}_{\mathrm{far}}\right|U^{\dagger}O_{\mathrm{dist}}U\left|s_{\mathrm{near}}\right\rangle\left|s_{\mathrm{far}}\right\rangle

(14)

where the basis set $\left|\bf{v}_{\mathrm{near}}\right\rangle$ contains $\left|s_{\mathrm{near}}\right\rangle$ and the basis set $\left|\bf{v}_{\mathrm{far}}\right\rangle$ contains $\left|s_{\mathrm{far}}\right\rangle$. Since $U^{\dagger}O_{i}U$ acts as the identity on the far qubits, its matrix element vanishes unless $\bf{v}_{\mathrm{far}}$ is $s_{\mathrm{far}}$, which collapses the sum on $\bf{v}_{\mathrm{far}}$; likewise $U^{\dagger}O_{\text{dist}}U$ acts as the identity on the near qubits and collapses the sum on $\bf{v}_{\mathrm{near}}$. We get

\left\langle s_{\mathrm{near}}\right|\left\langle s_{\mathrm{far}}\right|U^{\dagger}O_{i}U\left|s_{\mathrm{near}}\right\rangle\left|s_{\mathrm{far}}\right\rangle\left\langle s_{\mathrm{near}}\right|\left\langle s_{\mathrm{far}}\right|U^{\dagger}O_{\mathrm{dist}}U\left|s_{\mathrm{near}}\right\rangle\left|s_{\mathrm{far}}\right\rangle

(15)

which results in (12).

In terms of the state $\left|\psi\right\rangle$ produced by the QAOA (12) says that

\displaystyle\left\langle\psi\right|O_{i}O_{\mathrm{dist}}\left|\psi\right\rangle=\left\langle\psi\right|O_{i}\left|\psi\right\rangle\left\langle\psi\right|O_{\mathrm{dist}}\left|\psi\right\rangle.

(16)

Again $O_{i}$ is any operator acting on qubit $i$ and $O_{\mathrm{dist}}$ is any operator acting on qubits more than $2p$ away from $i$. Since the two operators act on disjoint sets of qubits and (16) holds for all such operators, their measurement outcomes are independent. In particular, taking projectors onto bit values, the bit measured at $i$ is independent of the bits measured in Dist($i,2p$).
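Property (16) can be checked numerically on a small instance. The following sketch assumes $C=C_{\text{IS}}$ from (23), the initial state $\left|+\right\rangle^{\otimes n}$, and randomly chosen angles; these choices, the vertex labels $0,\ldots,4$, and the helper names `qaoa_probs` and `connected` are illustrative assumptions. It simulates the QAOA on a path of five vertices at $p=1$, where Dist($0,2$) $=\{3,4\}$. Taking $O_i=b_0$ and $O_{\text{dist}}=b_3b_4$, both diagonal so their expectations can be read off the measurement distribution, the connected correlation should vanish up to floating-point rounding.

```python
import numpy as np

def qaoa_probs(n, edges, gammas, betas):
    """Measurement distribution of the QAOA state for C = C_IS (eq. 23), started from |+>^n.
    Returns (probs, bits) where bits[x, k] is b_k of basis string x."""
    x = np.arange(2 ** n)
    bits = (x[:, None] >> (n - 1 - np.arange(n))) & 1
    c = np.zeros(2 ** n)
    for i, j in edges:
        c += bits[:, i] * bits[:, j]
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    for gamma, beta in zip(gammas, betas):
        state = np.exp(-1j * gamma * c) * state  # U(C, gamma), eq. (4)
        gate = np.array([[np.cos(beta), -1j * np.sin(beta)],
                         [-1j * np.sin(beta), np.cos(beta)]])
        psi = state.reshape([2] * n)
        for k in range(n):  # U(B, beta), eq. (6), qubit by qubit
            psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [k])), 0, k)
        state = psi.reshape(-1)
    return np.abs(state) ** 2, bits

def connected(probs, f, g):
    """<f g> - <f><g> for diagonal observables f, g given as arrays over basis strings."""
    return float(probs @ (f * g) - (probs @ f) * (probs @ g))

rng = np.random.default_rng(1)
n, p = 5, 1
edges = [(k, k + 1) for k in range(n - 1)]  # path 0-1-2-3-4
probs, bits = qaoa_probs(n, edges, rng.uniform(0, np.pi, p), rng.uniform(0, np.pi, p))
# O_i = b_0 and O_dist = b_3 b_4, supported on Dist(0, 2p) = {3, 4}
print(connected(probs, bits[:, 0], bits[:, 3] * bits[:, 4]))
```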

4.2 Far from an edge

For the next property imagine changing the cost function on a single edge. This locality property concerns the influence of this change on qubits that are far away from that edge. Consider two instances of some computational problem which differ only by the presence or absence of a single edge, or perhaps because the single edge is weighted differently in the two instances. Call the associated cost functions $C$ and $C^{\prime}$, which give rise to $U$ and $U^{\prime}$ through (4) and (9). Let the edge in question be between vertices $i$ and $j$. Let Far $(ij,p)$ be the complement of $\mathrm{B}(i,p)\cup\mathrm{B}(j,p)$. We assume that $p$ is small enough that Far $(ij,p)$ is not empty. Consider an operator $O_{\text{far}}$ that involves only qubits in Far $(ij,p)$. For the depth $p$ algorithm, $U^{\dagger}O_{\text{far}}U$ does not involve the edge $ij$, so it is unchanged when $U$ is replaced by $U^{\prime}$, that is

\ U^{\prime{\dagger}}O_{\text{far}}U^{\prime}=\ U^{\dagger}O_{\text{far}}U.

(17)

What this means is that the influence of the edge in question is limited to qubits within a distance $p$ of the edge. It also means that the probability of measuring a bit string in Far $(ij,p)$ is unaffected by the change in the edge $ij$. Let $\left|\psi\right\rangle=U\left|s\right\rangle$ and $\left|\psi^{\prime}\right\rangle=U^{\prime}\left|s\right\rangle$ be the states produced with the unmodified and modified edge sets. Let $\textbf{b}_{\text{far}}$ denote the bit values on the qubits in Far $(ij,p)$, and let $O_{\text{far}}=\left|\textbf{b}_{\text{far}}\right\rangle\left\langle\textbf{b}_{\text{far}}\right|$ tensored with the identity on the qubits in $\mathrm{B}(i,p)\cup\mathrm{B}(j,p)$. Now write $\left|{\boldsymbol{b}}\right\rangle=\left|{\boldsymbol{b}}_{\text{near}}\right\rangle\left|{\boldsymbol{b}}_{\text{far}}\right\rangle$, take the expectation of (17) in the initial state $\left|s\right\rangle$, and expand the identity on the near qubits as $\sum_{{\boldsymbol{b}}_{\text{near}}}\left|{\boldsymbol{b}}_{\text{near}}\right\rangle\left\langle{\boldsymbol{b}}_{\text{near}}\right|$. Keeping $\textbf{b}_{\text{far}}$ fixed, this gives

\sum_{{\boldsymbol{b}}_{\mathrm{near}}}\big{|}\left\langle{\boldsymbol{b}}_{\text{near}}\right|\left\langle\textbf{b}_{\text{far}}\right|\psi\rangle\big{|}^{2}=\sum_{{\boldsymbol{b}}_{\mathrm{near}}}\big{|}\left\langle{\boldsymbol{b}}_{\text{near}}\right|\left\langle\textbf{b}_{\text{far}}\right|\psi^{\prime}\rangle\big{|}^{2}.

(18)

This means that the probability of measuring the bit string $\textbf{b}_{\text{far}}$ in Far $(ij,p)$ is unaffected by the edge change.
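A similar numerical check illustrates (18). In this sketch (again assuming $C=C_{\text{IS}}$, the initial state $\left|+\right\rangle^{\otimes n}$, and random angles, all our own choices), $G$ is the path 1-2-3-4-5 plus an isolated vertex 0, and $G'$ adds the edge $(0,1)$. The balls are computed in $G'$, the graph that contains the edge, and the hypothetical helper `max_far_change` compares the marginal distributions of the bits on Far($01,p$) produced from $G$ and from $G'$.

```python
import numpy as np

def qaoa_probs(n, edges, gammas, betas):
    """Measurement distribution of the QAOA state for C = C_IS, started from |+>^n."""
    x = np.arange(2 ** n)
    bits = (x[:, None] >> (n - 1 - np.arange(n))) & 1
    c = np.zeros(2 ** n)
    for i, j in edges:
        c += bits[:, i] * bits[:, j]
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    for gamma, beta in zip(gammas, betas):
        state = np.exp(-1j * gamma * c) * state  # U(C, gamma)
        gate = np.array([[np.cos(beta), -1j * np.sin(beta)],
                         [-1j * np.sin(beta), np.cos(beta)]])
        psi = state.reshape([2] * n)
        for k in range(n):  # U(B, beta)
            psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [k])), 0, k)
        state = psi.reshape(-1)
    return np.abs(state) ** 2

def ball(adj, i, r):
    """B(i, r): vertices within graph distance r of i (breadth-first search)."""
    seen, frontier = {i}, {i}
    for _ in range(r):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return seen

def marginal(probs, n, keep):
    """Distribution of the bits on the qubits in `keep`, summing out all other qubits."""
    drop = tuple(k for k in range(n) if k not in keep)
    return probs.reshape([2] * n).sum(axis=drop)

def max_far_change(p, seed=0):
    """Far(01, p) and the largest change in its bit distribution between G and G'."""
    n = 6
    g = [(1, 2), (2, 3), (3, 4), (4, 5)]  # G: path 1-2-3-4-5, vertex 0 isolated
    g_prime = g + [(0, 1)]                # G': add the edge (0, 1)
    adj = [[] for _ in range(n)]
    for a, b in g_prime:
        adj[a].append(b)
        adj[b].append(a)
    far = set(range(n)) - ball(adj, 0, p) - ball(adj, 1, p)
    rng = np.random.default_rng(seed)
    gammas, betas = rng.uniform(0, np.pi, p), rng.uniform(0, np.pi, p)
    m = marginal(qaoa_probs(n, g, gammas, betas), n, far)
    m_prime = marginal(qaoa_probs(n, g_prime, gammas, betas), n, far)
    return far, float(np.max(np.abs(m - m_prime)))

for p in (1, 2):
    print(p, max_far_change(p))
```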

4.3 Concentration of Hamming weight

Again we consider the QAOA with a local cost function associated with a graph $G$, a one-local driver operator such as (6), and a product state for $\left|s\right\rangle$. For fixed $p$ each vertex $i$ has a neighborhood B($i,2p$). We take $p$ small enough that the maximum size of these neighborhoods is less than $n^{A}$ for some $A<1$. Run the QAOA to get the state $\left|\psi\right\rangle$ and measure in the computational basis to get a bit string. We now show that the Hamming weight of these measured bit strings concentrates: for sufficiently large $n$, its variance is at most $n^{1+A}=o(n^{2})$, so its standard deviation is $o(n)$.

Let the Hamming weight operator be

W=\sum_{i}b_{i}.

(19)

The measurement variance of $W$ is

\left\langle\psi\right|W^{2}\left|\psi\right\rangle-\left\langle\psi\right|W\left|\psi\right\rangle^{2}

(20)

which breaks into $n^{2}$ terms

\sum_{i}\sum_{j}\Big{[}\left\langle\psi\right|b_{i}b_{j}\left|\psi\right\rangle-\left\langle\psi\right|b_{i}\left|\psi\right\rangle\left\langle\psi\right|b_{j}\left|\psi\right\rangle\Big{]}.

(21)

Now for a fixed $i$ consider the sum on $j$. If $j$ is more than $2p$ away from $i$, we can use (12) to see that this term is $0$. So the only contributions come from the at most $n^{A}$ qubits in B($i,2p$); each of these terms is at most $1$ in absolute value, so the variance is bounded by $n^{1+A}$. The expected value of the Hamming weight scales with $n$, so the distribution concentrates.
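The variance bound can also be checked on a small graph. The sketch below reuses an illustrative simulator (assuming $C=C_{\text{IS}}$, $\left|+\right\rangle^{\otimes n}$, and random angles; the test graph, a path of eight vertices with one extra chord, is our own choice) and computes the covariance matrix of the measured bits. Because the covariance of two $0,1$-valued bits is at most $1/4$ in absolute value, the argument above gives the slightly sharper bound $\mathrm{Var}(W)\leq \tfrac14\,n\max_i|\mathrm{B}(i,2p)|$, which the code compares against; the covariances with $j\notin\mathrm{B}(i,2p)$ should vanish.

```python
import numpy as np

def qaoa_probs(n, edges, gammas, betas):
    """Measurement distribution and bit table of the QAOA state for C = C_IS, from |+>^n."""
    x = np.arange(2 ** n)
    bits = (x[:, None] >> (n - 1 - np.arange(n))) & 1
    c = np.zeros(2 ** n)
    for i, j in edges:
        c += bits[:, i] * bits[:, j]
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    for gamma, beta in zip(gammas, betas):
        state = np.exp(-1j * gamma * c) * state  # U(C, gamma)
        gate = np.array([[np.cos(beta), -1j * np.sin(beta)],
                         [-1j * np.sin(beta), np.cos(beta)]])
        psi = state.reshape([2] * n)
        for k in range(n):  # U(B, beta)
            psi = np.moveaxis(np.tensordot(gate, psi, axes=([1], [k])), 0, k)
        state = psi.reshape(-1)
    return np.abs(state) ** 2, bits

def ball(adj, i, r):
    """B(i, r) by breadth-first search."""
    seen, frontier = {i}, {i}
    for _ in range(r):
        frontier = {v for u in frontier for v in adj[u]} - seen
        seen |= frontier
    return seen

n, p = 8, 1
edges = [(k, k + 1) for k in range(n - 1)] + [(0, 4)]  # illustrative test graph
adj = [[] for _ in range(n)]
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)
rng = np.random.default_rng(3)
probs, bits = qaoa_probs(n, edges, rng.uniform(0, np.pi, p), rng.uniform(0, np.pi, p))
mean_b = probs @ bits                                           # <b_i>
cov = (bits * probs[:, None]).T @ bits - np.outer(mean_b, mean_b)  # the terms of eq. (21)
w = bits.sum(axis=1)                                            # Hamming weight W of each string
variance = float(probs @ w ** 2 - (probs @ w) ** 2)             # eq. (20)
bound = n * max(len(ball(adj, i, 2 * p)) for i in range(n)) / 4
print(variance, cov.sum(), bound)
```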

However we need a stronger result for our arguments. Let $G$ be a graph with $n$ vertices and as before take $p$ small enough that the maximum size of B( $i,2p$ ) is less than $n^{A}$ for some $A<1$ with high probability. We are thinking of $n$ as large. For a given QAOA circuit we make the state $\left|\psi\right\rangle$ and measure the Hamming weight. Call the observed value $W_{\text{obs}}$ . Then there exists $\gamma>0$ such that for every $\delta>0$ and for $n$ large enough

\displaystyle\mathbb{P}_{G}\Big{[}~{}\big{|}W_{\text{obs}}-\left\langle\psi\right|W\left|\psi\right\rangle\big{|}\geq\delta n\Big{]}\leq e^{-\delta n^{\gamma}},

(22)

where the graph is fixed and the probability is over measurements of $W$ in the state $\left|\psi\right\rangle$ . We are going to prove this in section 8 and also apply it to random graphs.

5 QAOA applied to Maximum Independent Set

We have discussed the QAOA in general and here we specify it for MIS. For any graph we are looking for a big independent set, that is, a string with a large Hamming weight which corresponds to an independent set. If we choose for the cost function the Hamming weight given by (19) we will easily discover strings with a big Hamming weight but they typically will not be independent sets of the input graph. So we also consider the Independent Set cost function:

C_{\text{IS}}=\sum_{\langle ij\rangle}b_{i}b_{j}

(23)

where the sum is only over edges in the graph so $C_{\text{IS}}$ is local. We want a big Hamming weight $W$ and $C_{\text{IS}}$ to be as small as possible. So for the objective cost function consider:

C_{\text{obj}}=W-C_{\text{IS}}.

(24)

When we run the QAOA our goal is to make $C_{\text{obj}}$ big. But the cost function that appears in (4) need not be $C_{\text{obj}}$. For starters, we can take the cost function in (4) to be $C_{\text{IS}}$ while aiming to make the quantum expectation of $C_{\text{obj}}$ big. Regardless of which cost function drives this local QAOA, the output strings will in general not be independent sets of the associated graph. However, these strings can be pruned to produce independent sets.

Suppose the quantum algorithm outputs a string with a positive value of $C_{\text{obj}}$. By pruning we can produce an independent set of size at least this value. To see this, consider the set of graph edges whose two endpoints both correspond to $1$’s in the output string, and call the number of these edges $N_{E}$. For each of these edges pick one of its two vertices at random and remove it from the string. This reduces the Hamming weight by at most $N_{E}$ and reduces $C_{\text{IS}}$ by exactly $N_{E}$, to zero, so $C_{\text{obj}}$ cannot go down and the result is an independent set. This also shows that the maximum of $C_{\text{obj}}$ is the size of the largest independent set. We call the QAOA augmented by this pruning the QAOA+. Note that the pruning process is random and respects the locality of the underlying graph. When we run the QAOA+ at depth $p$ we mean that the QAOA is run at depth $p-1$ and the pruning is the last layer.
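The pruning step can be written in a few lines. This is a sketch of the procedure described above; the function names are hypothetical, and the random input string is only a stand-in for a QAOA measurement outcome. Each conflicting edge marks one endpoint for removal on its own, so the decision for an edge depends only on the bits at its two ends, which is what keeps the pruning local.

```python
import itertools
import random

def c_obj(bits, edges):
    """C_obj = W - C_IS, eqs. (19), (23), (24)."""
    return sum(bits) - sum(bits[i] * bits[j] for i, j in edges)

def prune(bits, edges, rng):
    """Every edge whose endpoints are both 1 in `bits` (the N_E edges) marks one endpoint,
    chosen uniformly at random, for removal; all marked vertices are then set to 0."""
    conflicts = [(i, j) for i, j in edges if bits[i] and bits[j]]
    removed = {rng.choice((i, j)) for i, j in conflicts}
    return [0 if k in removed else b for k, b in enumerate(bits)]

rng = random.Random(3)
n, d = 200, 6
edges = rng.sample(list(itertools.combinations(range(n), 2)), d * n // 2)  # a G(n, dn/2) sample
bits = [int(rng.random() < 0.15) for _ in range(n)]  # stand-in for a QAOA output string
pruned = prune(bits, edges, rng)
print(c_obj(bits, edges), sum(pruned))
```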

We now show that the shallowest depth version of the QAOA will produce strings whose objective function value is, in expectation, near $1.02n/d$ for large $n$ and large $d$. Here we pick for the starting state a rotation away from the all zeros state, so that the initial Hamming weight is not zero. Introduce a parameter $\theta$ and for $\left|s\right\rangle$ take

\left|s\right\rangle=U(B,\theta)\left|0\right\rangle

(25)

so the QAOA state is given by the three parameter unitary:

\left|\psi\right\rangle=U(B,\beta)~{}U(C_{\text{IS}},\gamma)~{}U(B,\theta)\left|0\right\rangle

(26)

which we call the QAOA with $p=1.5$. Consider the simple case $\gamma=0$, where the two rotations combine into one rotation by $\theta+\beta$. This brings the state $\left|0\right\rangle$ to one in which each bit $b_{i}$ has expected value $\sin^{2}(\theta+\beta)$. For any graph with $dn/2$ edges, the expected value of $C_{\text{obj}}$ is then $n[\sin^{2}(\theta+\beta)-(d/2)\sin^{4}(\theta+\beta)]$, whose maximum, $n/(2d)$, is attained at $\sin(\theta+\beta)=1/\sqrt{d}$. Letting $\gamma$ vary can only improve this.
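The $\gamma=0$ calculation is a one-variable maximization: with $s=\sin^2(\theta+\beta)$, the per-vertex objective $s-(d/2)s^2$ has derivative $1-ds$, so it is maximized at $s=1/d$ with value $1/(2d)$. The sketch below checks this on a grid and then samples independent bits with $s=1/d$ on one $\mathbb{G}(n,{dn\over 2})$ graph; the values $d=10$ and $n=20000$ and the helper name are arbitrary illustrative choices.

```python
import random

def expected_c_obj_per_vertex(s, d):
    """E[C_obj]/n = s - (d/2) s^2 for independent bits with P(b_i = 1) = s = sin^2(theta + beta)."""
    return s - d * s * s / 2

d = 10
grid = [k / 10000 for k in range(10001)]
s_best = max(grid, key=lambda s: expected_c_obj_per_vertex(s, d))
print(s_best, expected_c_obj_per_vertex(s_best, d))  # 1/d and 1/(2d)

# Monte Carlo on one G(n, dn/2) sample with s = 1/d.
rng = random.Random(5)
n = 20000
edges = set()
while len(edges) < d * n // 2:
    i, j = rng.randrange(n), rng.randrange(n)
    if i != j:
        edges.add((min(i, j), max(i, j)))
bits = [int(rng.random() < 1 / d) for _ in range(n)]
c_obj = sum(bits) - sum(bits[i] * bits[j] for i, j in edges)
print(c_obj / n)  # close to 1/(2d)
```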

For arbitrary $\theta$ , $\gamma$ and $\beta$ we can evaluate the expectation of $C_{\text{obj}}$ in the state $\left|\psi\right\rangle$ by averaging over instances. It is best to write (23) as

C_{\text{IS}}=\sum_{i<j}J_{ij}b_{i}b_{j}

(27)

where each $J_{ij}$ is $1$ with probability $d/n$ and $0$ with probability $1-d/n$, independently. Note that this is not exactly the same as the distribution we analyze in the rest of the paper, which has a fixed number $nd/2$ of edges. But for the purposes of this calculation, including each edge with probability $d/n$ is more straightforward. In fact we can do the average over $J$ and get the expected value over graphs of the quantum expectation of the objective function

\displaystyle\mathrm{Ex}\big{[}\left\langle\psi\right|C_{\text{obj}}\left|\psi\right\rangle\big{]}

(28)

as $n$ times an explicit function of $d$ and the parameters $\theta$, $\gamma$ and $\beta$. For each $d$ we can optimize numerically. For $d=3$ we get $.969n/3$. For large $d$, the optimum of (28) approaches $1.02n/d$ and the optimal parameters $\theta$ and $\beta$ decrease as $1/\sqrt{d}$. Pulling out a factor $1/d$ and rescaling $\theta$ and $\beta$ by $\sqrt{d}$, we obtain a function that has a limit as $d$ goes to infinity. The optimum of this limiting function is $1.02$.

What we have shown is that there is a form of the QAOA that we call the QAOA+ which at its lowest depth finds an independent set whose size is at least a constant times $n/d$ . There are simple classical algorithms that can beat this. Our goal here was to show that the lowest depth QAOA+ can be analyzed on random instances and that it is a good starting point for understanding the QAOA at higher depth.

6 The Overlap Gap Property

We now describe the Overlap Gap Property, which is the key to showing the limitation of the QAOA on random graphs. For random graphs, this property is satisfied by independent sets whose size is larger than a certain multiplicative factor $\eta^{*}\in(0,1)$ times the optimum. Specifically, we take

\displaystyle\eta^{*}={1\over 2}+{1\over 2\sqrt{2}}=.853....

(29)

For each $\eta\in(0,1)$ and a graph $G$ on $n$ nodes let

\displaystyle\mathcal{I}(\eta,G)=\{\sigma\in\mathcal{I}(G):|\sigma|\geq n\eta\alpha_{d}\}

(30)

where $\mathcal{I}(G)$ is the set of all independent sets of $G$. That is, $\mathcal{I}(\eta,G)$ is the set of independent sets in $G$ whose size is at least $\eta$ times the asymptotic optimum $n\alpha_{d}$. We call these large independent sets $\eta$-optimal. For any two $n$-node graphs $G_{1}$ and $G_{2}$, and for every $0<\tau\leq\eta$, let $\mathcal{O}^{Ab}(\eta,G_{1},G_{2},\tau)$ denote the set of pairs $\sigma_{1},\sigma_{2}$ such that $\sigma_{j}\in\mathcal{I}(\eta,G_{j}),j=1,2$ and

\displaystyle|\sigma_{1}\cap\sigma_{2}|\geq n\tau\alpha_{d},

(31)

that is, it is the set of pairs of $\eta$ -optimal independent sets whose intersection size normalized by the asymptotic size of the largest independent set is above $\tau$ . Similarly, let $\mathcal{O}^{Be}(\eta,G_{1},G_{2},\tau)$ denote the set of pairs $\sigma_{1},\sigma_{2}$ such that $\sigma_{j}\in\mathcal{I}(\eta,G_{j}),j=1,2$ and

\displaystyle|\sigma_{1}\cap\sigma_{2}|\leq n\tau\alpha_{d},

(32)

that is, it is the set of pairs of $\eta$ -optimal independent sets whose intersection size normalized by the asymptotic size of the largest independent set is below $\tau$ .
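Definitions (30)-(32) translate directly into membership tests. Since $\alpha_d$ is not known for finite $d$, the sketch takes it as a parameter. The toy graphs, the sets, and the value `alpha_d = 0.5` in the example are made up purely to exercise the definitions, and the function names are hypothetical.

```python
def is_independent(sigma, edges):
    """No edge of the graph has both endpoints in sigma."""
    return not any(i in sigma and j in sigma for i, j in edges)

def is_eta_optimal(sigma, edges, n, eta, alpha_d):
    """sigma in I(eta, G), eq. (30)."""
    return is_independent(sigma, edges) and len(sigma) >= n * eta * alpha_d

def in_overlap_above(s1, g1, s2, g2, n, eta, tau, alpha_d):
    """(s1, s2) in O^Ab(eta, G1, G2, tau), eq. (31)."""
    both = is_eta_optimal(s1, g1, n, eta, alpha_d) and is_eta_optimal(s2, g2, n, eta, alpha_d)
    return both and len(s1 & s2) >= n * tau * alpha_d

def in_overlap_below(s1, g1, s2, g2, n, eta, tau, alpha_d):
    """(s1, s2) in O^Be(eta, G1, G2, tau), eq. (32)."""
    both = is_eta_optimal(s1, g1, n, eta, alpha_d) and is_eta_optimal(s2, g2, n, eta, alpha_d)
    return both and len(s1 & s2) <= n * tau * alpha_d

# Toy example with a made-up alpha_d: n * eta * alpha_d = 4 and n * tau * alpha_d = 1.
n, eta, tau, alpha_d = 10, 0.8, 0.2, 0.5
g1, g2 = [(0, 1), (2, 3)], [(4, 5)]
s1, s2, s3 = {0, 2, 4, 6, 8}, {1, 3, 5, 7, 9}, {0, 2, 5, 7, 9}
print(in_overlap_below(s1, g1, s2, g2, n, eta, tau, alpha_d))  # s1, s2 are disjoint
print(in_overlap_above(s1, g1, s3, g2, n, eta, tau, alpha_d))  # s1, s3 share {0, 2}
```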

Let $G_{0}$ and $G_{m}$ be chosen independently with distribution $\mathbb{G}(n,{dn\over 2})$ . We are going to introduce an interpolation between these two graphs. Let $(i^{a}_{1},j^{a}_{1}),\ldots,(i^{a}_{m},j^{a}_{m})$ with $a=0,m$ be the corresponding sets of $m$ edges of the two graphs. For every $t=0,1,2,\ldots,m$ consider an interpolating random graph $G_{t}$ with edges

\displaystyle(i^{0}_{1},j^{0}_{1}),\ldots,(i^{0}_{m-t},j^{0}_{m-t}),(i^{m}_{m-t+1},j^{m}_{m-t+1}),\ldots,(i^{m}_{m},j^{m}_{m}).

(33)

$G_{t}$ uses the first $m-t$ edges from $G_{0}$ and the remaining $t$ edges from $G_{m}$ . For each fixed $t$ , the graph $G_{t}$ is distributed as $\mathbb{G}(n,m)$ , modulo the potential repetitions of edges. With high probability, the total number of edge repetitions is $O(1)$ with respect to $n$ so they can be ignored.
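The interpolation (33) is just list splicing. The sketch below assumes each graph's edges are stored in the random order in which they were sampled; the helper names and parameters are illustrative. It also counts repeated edges, which the text notes are $O(1)$ with high probability, and it can be used to confirm that consecutive graphs $G_t$ and $G_{t+1}$ differ by removing one edge and adding one, the kind of single-edge change analyzed in section 4.2.

```python
import itertools
import random
from collections import Counter

def interpolate(edges_0, edges_m, t):
    """Edge list of G_t, eq. (33): the first m - t edges of G_0, then the last t edges of G_m."""
    m = len(edges_0)
    return edges_0[: m - t] + edges_m[m - t:]

def repeated_edges(edges):
    """Number of surplus copies of edges that appear more than once in the list."""
    counts = Counter(frozenset(e) for e in edges)
    return sum(c - 1 for c in counts.values())

rng = random.Random(11)
n, d = 1000, 4
m = d * n // 2
pairs = list(itertools.combinations(range(n), 2))
g0, gm = rng.sample(pairs, m), rng.sample(pairs, m)  # two independent G(n, m) samples, random order
print(max(repeated_edges(interpolate(g0, gm, t)) for t in range(0, m + 1, 100)))
```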

Theorem: Overlap Gap Property. For every $\eta>\eta^{*}$ there exist $0<\tau_{1}<\tau_{2}<\eta$, $d_{0}$, and $c>0$ such that for all $d>d_{0}$

\displaystyle\mathrm{Prob}\Big{[}\exists\quad 0\leq t_{1},t_{2}\leq m\quad s.t.\quad\mathcal{O}^{Ab}(\eta,G_{t_{1}},G_{t_{2}},\tau_{1})\cap\mathcal{O}^{Be}(\eta,G_{t_{1}},G_{t_{2}},\tau_{2})\neq\emptyset\Big{]}\leq\exp(-cn),

(34)

and

\displaystyle\mathrm{Prob}\Big{[}\mathcal{O}^{Ab}(\eta,G_{0},G_{m},\tau_{1})\neq\emptyset\Big{]}\leq\exp(-cn).

(35)

for all large enough $n$ .

The theorem makes two claims. The first says that across all pairs $G_{t_{1}},G_{t_{2}}$ of graphs in the interpolating sequence, every $\eta$-optimal independent set $\sigma_{1}$ in $G_{t_{1}}$ and every $\eta$-optimal independent set $\sigma_{2}$ in $G_{t_{2}}$ have normalized intersection either at most $\tau_{1}$ or at least $\tau_{2}$, except with exponentially small probability. There is essentially no middle ground of pairs whose normalized intersection size is between $\tau_{1}$ and $\tau_{2}$. This is the Overlap Gap Property. The second says that for two independent random graphs sampled from $\mathbb{G}(n,{dn\over 2})$, all pairs of large independent sets, one from each graph, have normalized intersection at most $\tau_{1}$. For a proof and further discussion see [4].

7 Overlap Gap Property is an obstruction to the QAOA+

7.1 Main Result

Our main result is that the QAOA+, which is the QAOA augmented by pruning to produce independent sets, applied to random graphs of average degree $d$ will fail to find an independent set close to optimal for $p$ less than a constant times $\log n$ . More precisely:

Obstruction Theorem

For every $w<1$ and $\eta>\eta^{*}$ , there is a $\gamma>0$ and a $d_{0}$ such that for $d>d_{0}$ we have: If the QAOA+ is run on a $\mathbb{G}(n,{dn\over 2})$ graph with

\displaystyle 2p\leq{w\log n\over\log(d/\ln{2})},

(36)

then the probability that the algorithm outputs an independent set of size at least $\eta\alpha_{d}n$ is at most $e^{-n^{\gamma}}$ for all $n$ sufficiently large.
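To get a feel for how restrictive (36) is, the depth bound can be evaluated directly. In this sketch `depth_bound` and `max_depth` are hypothetical helpers, the latter returning the largest integer $p$ allowed by (36). The theorem itself requires $d>d_0$, so small values of $d$ here only illustrate the arithmetic, not cases covered by the theorem. Because both logarithms use the same base, the bound does not depend on that base.

```python
import math

def depth_bound(n, d, w, log=math.log):
    """Right-hand side of (36): w log n / log(d / ln 2), using the given log function."""
    return w * log(n) / log(d / math.log(2))

def max_depth(n, d, w):
    """Largest integer p with 2p <= depth_bound(n, d, w)."""
    return math.floor(depth_bound(n, d, w) / 2)

for d in (3, 10, 100):
    print(d, max_depth(10 ** 6, d, 0.99))
```

At one million qubits this allows only single-digit depths, consistent with the remark in the abstract.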

We will prove this after stating some preliminary results. The first concerns the size of the neighborhoods of vertices in random graphs with $m={nd\over 2}$ edges.

7.2 Preliminary Results

Neighborhood Size Theorem

Fix $d>1$ , and $w<1$ . If

\displaystyle 2p\leq{w\log n\over\log(d/\ln{2})},

(37)

then there exist $a>0$ and $A<1$ such that

\displaystyle\mathrm{Prob}\Big{[}\max_{i}|\mathrm{B}(i,2p)|\geq n^{A}\Big{]}\leq e^{-n^{a}}

(38)

and

\displaystyle\mathrm{Prob}\Big{[}\max_{i}|\mathrm{B}(i,p)|\geq n^{A/2}\Big{]}\leq e^{-n^{a/2}}.

(39)

Here Prob is with respect to the graph distribution. What this says is that if $p$ is smaller than a certain constant times $\log n$ , then in a random graph the neighborhood ball of each vertex contains a fraction of the vertices that vanishes as $n$ goes to infinity. We will prove this theorem in the next section.
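Ball sizes are easy to measure on sampled graphs, which gives an empirical sense of the Neighborhood Size Theorem. The sketch below uses breadth-first search; the helper names and the parameters $n=5000$, $d=3$, $r=4$ are illustrative assumptions, and a single sample at moderate $n$ of course proves nothing about the asymptotic statement.

```python
import random
from collections import deque

def ball_size(adj, i, r):
    """|B(i, r)|: number of vertices within graph distance r of i, by breadth-first search."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue  # do not expand beyond radius r
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist)

def random_graph_adjacency(n, m, rng):
    """Adjacency lists of a G(n, m) sample: m distinct edges drawn uniformly at random."""
    edges = set()
    while len(edges) < m:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            edges.add((min(i, j), max(i, j)))
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj

rng = random.Random(2)
n, d, r = 5000, 3, 4
adj = random_graph_adjacency(n, d * n // 2, rng)
largest = max(ball_size(adj, i, r) for i in range(n))
print(f"max |B(i, {r})| = {largest} out of n = {n}")
```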

The next result extends the purely quantum result of section 4.2 to the QAOA+, which is the depth $p-1$ QAOA followed by pruning, so that all outputs are independent sets. The result (18) says that if a single edge in a graph is modified, then the measurement probabilities of far-away qubits are unaffected by the edge change. This is a statement about the QAOA, but the QAOA outputs strings that need to be pruned to make independent sets. The pruning process is random and respects the locality of the underlying graph. Let $\mathbb{P}_{G}(\sigma)$ be the probability that the independent set $\sigma$ is output by the QAOA+ running on graph $G$. Here the symbol $\mathbb{P}_{G}$ means that the graph $G$ is fixed and the randomness comes from the quantum measurement and the randomized pruning. So we now state

Far from an edge Lemma

Consider an arbitrary graph $G$ with $G^{\prime}$ obtained by adding a single edge $(i,j)$ to $G$ . Let $\mathrm{Far}$ be the complement of $\mathrm{B}(i,p)\cup\mathrm{B}(j,p)$ where $p$ is the depth of the QAOA+. Then for every $\sigma\in\{0,1\}^{n}$

\displaystyle\sum_{\hat{\sigma}:\hat{\sigma}_{k}=\sigma_{k},k\in\mathrm{Far}}\mathbb{P}_{G}(\hat{\sigma})=\sum_{\hat{\sigma}:\hat{\sigma}_{k}=\sigma_{k},k\in\mathrm{Far}}\mathbb{P}_{G^{\prime}}(\hat{\sigma}).

(40)

Here we are assuming that $p$ is small enough that Far is not empty. The lemma says that the total probability of independent sets which “agree” with $\sigma$ on the node set Far remains the same after the edge is added. Far consists of the nodes far from the newly added edge. The lemma follows directly from the locality property (18), together with the fact that the pruning is local.
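The mechanism behind (40) can be illustrated with a classical toy, which is emphatically not the QAOA+. Consider a hypothetical radius-$p$ local rule: vertex $v$ outputs 1 exactly when its random seed is the smallest among the seeds in $\mathrm{B}(v,p)$. With distinct seeds and $p\geq 1$, this rule outputs an independent set. If we run it on $G$ and on $G^{\prime}=G+(i,j)$ with the same seeds, every vertex in Far sees an identical ball and therefore gives an identical output. This pathwise agreement implies the equality of marginals on Far that (40) asserts. The cycle, the chord $(0,6)$ and the rule itself are our illustrative choices.

```python
import random
from collections import deque

def ball(adj, i, r):
    """Vertex set of B(i, r), found by breadth-first search."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        if dist[v] < r:
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
    return set(dist)

def local_rule(adj, seeds, r):
    """Toy classical radius-r rule (hypothetical): v outputs 1 iff its seed is the
    minimum over B(v, r). Two adjacent vertices cannot both be minima when r >= 1."""
    return [int(seeds[v] == min(seeds[u] for u in ball(adj, v, r)))
            for v in range(len(adj))]

def cycle(n):
    return [{(v - 1) % n, (v + 1) % n} for v in range(n)]

n, p = 12, 2
G = cycle(n)
G_prime = [set(s) for s in G]
i, j = 0, 6                      # the added edge (i, j)
G_prime[i].add(j)
G_prime[j].add(i)

far = set(range(n)) - ball(G, i, p) - ball(G, j, p)
rng = random.Random(1)
seeds = [rng.random() for _ in range(n)]   # shared randomness couples the two runs
out_G, out_Gp = local_rule(G, seeds, p), local_rule(G_prime, seeds, p)
print(sorted(far), all(out_G[k] == out_Gp[k] for k in far))
```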

Our next result concerns the concentration of the Hamming weight discussed in section 4.3, now extended to the QAOA+.

Concentration Theorem

1. Let $G$ be a graph with $n$ vertices and ${dn\over 2}$ edges. Suppose $p$ is chosen such that $\max_{i}|\mathrm{B}_{G}(i,2p)|\leq n^{A}$ for some $0<A<1$. Let $\sigma$ be the output of the QAOA+. Then there exists $\gamma_{1}>0$ such that for all $\delta>0$

\displaystyle\mathbb{P}_{G}\Big{[}~{}\big{|}|\sigma|-\mathbb{E}_{G}|\sigma|\big{|}\geq\delta n~{}\Big{]}\leq e^{-\delta n^{\gamma_{1}}}

(41)

for $n$ large enough. Here the graph is fixed and the randomness comes from the QAOA+.

2. Now let $G$ be a random $\mathbb{G}(n,{dn\over 2})$ graph. Suppose for some $a>0$ ,

\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,2p)\geq n^{A}\Big{]}\leq e^{-n^{a}}

(42)

\displaystyle\text{and}\quad\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,p)\geq n^{A/2}\Big{]}\leq e^{-n^{a/2}}.

(43)

Then there exists $\gamma_{2}>0$ such that for any $0<\delta<1$ ,

\displaystyle\mathrm{Prob}\Big{[}~{}\big{|}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq\delta n~{}\Big{]}\leq e^{-{\delta^{2}}n^{\gamma_{2}}}

(44)

for $n$ large enough. Here $\mathrm{Prob}$ and $\mathrm{Ex}$ refer to Probability and Expectation coming from the distribution of random graphs combined with the randomness coming from the QAOA+.

This key concentration result will be proven in the next section. We now prove our main result.

7.3 Proof of Obstruction Theorem

Choose $w<1$ . Find $A<1$ and $a$ as per the Neighborhood Size Theorem. Consider the interpolation $G_{t},0\leq t\leq m$ described in the previous section. Denoting by $\mathrm{B}_{G_{t}}(i,p)$ the neighborhood of node $i$ in graph $G_{t}$ , we have by the union bound

\displaystyle\mathrm{Prob}\left[\max_{t}\max_{i}|\mathrm{B}_{G_{t}}(i,p)|\geq n^{A/2}\right]\leq\Big{(}{dn\over 2}+1\Big{)}e^{-n^{a/2}}\leq e^{-n^{a^{\prime}}},

(45)

where $a^{\prime}$ is any constant smaller than $a\over 2$ and $n$ is large enough. Let $D=\max_{t}\max_{i}|\mathrm{B}_{G_{t}}(i,p)|$. On the sequence of graphs $G_{t}$ we construct a coupled sequence of independent sets $\sigma_{t},0\leq t\leq m$. For each fixed $t$, $\sigma_{t}$ is a single independent set drawn from the output distribution of the QAOA+ on the random graph $G_{t}$. We will see that the symmetric difference $\sigma_{t}\Delta\sigma_{t+1}$ satisfies

\displaystyle|\sigma_{t}\Delta\sigma_{t+1}|\leq 4D.

(46)

By “construct” we do not mean an algorithmically efficient construction; rather, we show that such a coupled sequence exists. First produce a single sample $\sigma_{0}=(\sigma_{0,1},\ldots,\sigma_{0,n})$ by running the QAOA+ on $G_{0}$. Next, recall that $G_{1}$ is obtained from $G_{0}$ by deleting $G_{0}$’s last edge $(i^{0}_{m},j^{0}_{m})$ and adding $G_{m}$’s last edge $(i^{m}_{m},j^{m}_{m})$. Consider the set of nodes

\displaystyle S=\mathrm{B}_{G_{0}}(i^{0}_{m},p)\cup\mathrm{B}_{G_{0}}(j^{0}_{m},p)\cup\mathrm{B}_{G_{1}}(i^{m}_{m},p)\cup\mathrm{B}_{G_{1}}(j^{m}_{m},p).

(47)

Note that $S$ has size at most $4D$ . Find a sample $\sigma_{1}$ according to the conditional distribution

\displaystyle\mathbb{P}_{G_{1}}(\sigma_{1}|\sigma_{1,i}=\sigma_{0,i}~{}i\notin S).

(48)

As a result the cardinality of $\sigma_{0}\Delta\sigma_{1}$ is at most $4D$ . In the same fashion, we produce samples $\sigma_{2},\ldots,\sigma_{m}$ , where each $\sigma_{t}$ is found by conditioning on $\sigma_{t-1}$ similarly. Again, $|\sigma_{t}\Delta\sigma_{t+1}|\leq 4D$ for all $t=0,\ldots,m-1$ .
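The interpolation and the set $S$ of (47) can be made concrete. The sketch below builds $G_{t}$ from the first $m-t$ edges of $G_{0}$ and the last $t$ edges of $G_{m}$, following the rule recalled above for $G_{1}$. For each step it computes the union of the four $p$-balls and checks the bound $|S|\leq 4D$. The edges are drawn with independent uniform endpoints, and repeated edges simply collapse, in the spirit of ignoring collisions. All helper names are hypothetical.

```python
import random
from collections import deque

def interpolate(E0, Em, t):
    """Edge list of G_t: the first m - t edges of G_0 followed by the last t edges of G_m."""
    m = len(E0)
    return E0[:m - t] + Em[m - t:]

def adjacency(n, edges):
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def ball(adj, i, r):
    """Vertex set of B(i, r), found by breadth-first search."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        if dist[v] < r:
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
    return set(dist)

def step_set(n, E0, Em, t, p):
    """S of (47) for the step G_t -> G_{t+1}: p-balls around the removed edge's
    endpoints (in G_t) and around the added edge's endpoints (in G_{t+1})."""
    m = len(E0)
    (i0, j0), (im, jm) = E0[m - t - 1], Em[m - t - 1]
    adj_t = adjacency(n, interpolate(E0, Em, t))
    adj_t1 = adjacency(n, interpolate(E0, Em, t + 1))
    return ball(adj_t, i0, p) | ball(adj_t, j0, p) | ball(adj_t1, im, p) | ball(adj_t1, jm, p)

def random_edges(n, m, rng):
    edges = []
    while len(edges) < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            edges.append((a, b))
    return edges

rng = random.Random(0)
n, d, p = 100, 3, 2
m = d * n // 2
E0, Em = random_edges(n, m, rng), random_edges(n, m, rng)
graphs = [adjacency(n, interpolate(E0, Em, t)) for t in range(m + 1)]
D = max(len(ball(adj, i, p)) for adj in graphs for i in range(n))
sizes = [len(step_set(n, E0, Em, t, p)) for t in range(m)]
print(D, max(sizes), max(sizes) <= 4 * D)
```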

Next, we claim that $\sigma_{1}$ is distributed as $\mathbb{P}_{G_{1}}$ . Indeed for every $\sigma_{1}\in\{0,1\}^{n}$ , its probability mass according to this sampling procedure is

\displaystyle\sum_{\sigma_{0}:\sigma_{0i}=\sigma_{1i},i\notin S}\mathbb{P}_{G_{0}}(\sigma_{0})\Bigg{[}\frac{\mathbb{P}_{G_{1}}(\sigma_{1})}{\sum_{\sigma:\sigma_{i}=\sigma_{1i},i\notin S}\mathbb{P}_{G_{1}}(\sigma)}\Bigg{]}=\mathbb{P}_{G_{1}}(\sigma_{1})

(49)

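by property (40). The property is applied twice, first to the deletion of $(i^{0}_{m},j^{0}_{m})$ from $G_{0}$ and then to the addition of $(i^{m}_{m},j^{m}_{m})$. It shows that the sum of $\mathbb{P}_{G_{0}}(\sigma_{0})$ in (49) equals the denominator. Similarly, $\sigma_{t}$ is distributed as $\mathbb{P}_{G_{t}}$ for every $t$. We have shown that our desired coupled sequence of $m+1$ independent sets exists.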
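Identity (49) only uses the fact that $\mathbb{P}_{G_{0}}$ and $\mathbb{P}_{G_{1}}$ have the same marginal on the coordinates outside $S$, which is what (40) supplies. The sketch below checks the identity exactly, using rational arithmetic, on a toy example. The strings have three bits, and $S=\{0\}$. The two laws `P0` and `P1` are hypothetical and are built to share their marginal on bits 1 and 2. It also shows that the identity fails when the marginals differ.

```python
from fractions import Fraction

def off_S(sigma, S):
    """Restriction of the bit string sigma to the coordinates outside S."""
    return tuple(b for k, b in enumerate(sigma) if k not in S)

def coupled_law(P0, P1, S):
    """Exact law of sigma_1 when sigma_0 ~ P0 and then sigma_1 ~ P1 conditioned to
    agree with sigma_0 outside S, as in (48); this is the left-hand side of (49)."""
    mass1 = {}                                   # P1-mass of each pattern outside S
    for s, q in P1.items():
        key = off_S(s, S)
        mass1[key] = mass1.get(key, Fraction(0)) + q
    law = {s: Fraction(0) for s in P1}
    for s0, q0 in P0.items():
        for s1, q1 in P1.items():
            if q1 > 0 and off_S(s1, S) == off_S(s0, S):
                law[s1] += q0 * q1 / mass1[off_S(s1, S)]
    return law

def make_law(marginal, q):
    """Law on (b0, b1, b2): (b1, b2) ~ marginal, then b0 = 1 with probability q."""
    return {(b0,) + rest: w * (q if b0 else 1 - q)
            for rest, w in marginal.items() for b0 in (0, 1)}

M = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4),
     (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 8)}
P0 = make_law(M, Fraction(1, 3))   # stands in for P_{G_0}
P1 = make_law(M, Fraction(3, 4))   # stands in for P_{G_1}; same marginal off S
S = {0}
print(coupled_law(P0, P1, S) == P1)
```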

Note that $\mathrm{Ex}[|\sigma_{t}|]$ is independent of $t$, since each $G_{t}$ has the $\mathbb{G}(n,m)$ distribution. Here, as before, the expectation is with respect to both the randomness of $G_{t}$ and the QAOA+. We claim that for every $\mu>0$

\displaystyle\mathrm{Ex}[|\sigma_{t}|]\leq(1+\mu)n\eta^{*}\alpha_{d},

(50)

for all large enough $n$. By the Concentration Theorem, it suffices to show (50) in order to prove our main result: for every $\eta>\eta^{*}$, the QAOA+ fails, with high probability, to produce an independent set of size at least $\eta\alpha_{d}n$. To see this, choose $\mu$ with $(1+\mu)\eta^{*}<\eta$.

Assume, to the contrary, that (50) is violated for infinitely many $n$. For any such $n$, given $G_{t}$, $\sigma_{t}$ has the distribution $\mathbb{P}_{G_{t}}$, and $G_{t}$ has the $\mathbb{G}(n,m)$ distribution. Taking $\delta={\mu\over 2}\eta^{*}\alpha_{d}$ in (44) and then using the union bound over the $m+1$ values of $t$, we have

\displaystyle\mathrm{Prob}\left[\min_{t}|\sigma_{t}|\geq(1+\mu/2)n\eta^{*}\alpha_{d}\right]\geq 1-e^{-n^{\gamma_{3}}},

(51)

for some $\gamma_{3}>0$ . Let $\hat{\eta}=(1+\mu/2)\eta^{*}$ . Find $0<\tau_{1}<\tau_{2}<\hat{\eta}$ from the Overlap Gap Theorem with respect to $\hat{\eta}$ . Then because $\tau_{2}<\hat{\eta}$ we have

\displaystyle\mathrm{Prob}\left[\min_{t}|\sigma_{t}|\geq n\tau_{2}\alpha_{d}\right]\geq 1-e^{-n^{\gamma_{3}}}.

(52)

Then, by (51) and the second part of the Overlap Gap Theorem,

\displaystyle\mathrm{Prob}\big{[}~{}|\sigma_{0}\cap\sigma_{m}|\leq n\tau_{1}\alpha_{d}~{}\big{]}\geq 1-e^{-n^{\gamma_{3}}}-e^{-cn}.

(53)

Now let us track the intersection $|\sigma_{0}\cap\sigma_{t}|$ as $t$ goes from $0$ to $m$. At $t=0$ it equals $|\sigma_{0}|$, which is at least $n\tau_{2}\alpha_{d}$ by (52). At $t=m$ it is at most $n\tau_{1}\alpha_{d}$ by (53). However, the Overlap Gap Theorem tells us that, with high probability, there is no middle ground. So there is some $T$ where $|\sigma_{0}\cap\sigma_{T}|\geq n\tau_{2}\alpha_{d}$ but $|\sigma_{0}\cap\sigma_{T+1}|\leq n\tau_{1}\alpha_{d}$. As a general property of sets we have

\big{|}|\sigma_{0}\cap\sigma_{T}|-|\sigma_{0}\cap\sigma_{T+1}|\big{|}\leq\big{|}\sigma_{T}\Delta\sigma_{T+1}\big{|}.

(54)

Using (46) we get $n\alpha_{d}(\tau_{2}-\tau_{1})\leq 4D$. This is a contradiction for large enough $n$, because with high probability $D\leq n^{A/2}$ with $A<1$; see (45). The contradiction shows that the Obstruction Theorem is true.
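Inequality (54) has a one-line proof. We have $|\sigma_{0}\cap\sigma_{T}|-|\sigma_{0}\cap\sigma_{T+1}|\leq|\sigma_{0}\cap(\sigma_{T}\setminus\sigma_{T+1})|\leq|\sigma_{T}\setminus\sigma_{T+1}|$. Symmetrically, the reverse difference is at most $|\sigma_{T+1}\setminus\sigma_{T}|$, and each of these is at most $|\sigma_{T}\Delta\sigma_{T+1}|$. A quick randomized check on finite sets is sketched below. The function name is ours.

```python
import random

def set_inequality_holds(trials=10_000, universe=30, seed=0):
    """Randomized check of (54): | |A & X| - |A & Y| | <= |X ^ Y| for finite sets,
    where & is intersection and ^ is symmetric difference."""
    rng = random.Random(seed)
    for _ in range(trials):
        A, X, Y = ({v for v in range(universe) if rng.random() < 0.5} for _ in range(3))
        if abs(len(A & X) - len(A & Y)) > len(X ^ Y):
            return False
    return True

print(set_inequality_holds())
```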

8 Proofs of Neighborhood Size and Concentration Results

8.1 Neighborhood Size

We begin by restating and then proving the Neighborhood Size Theorem. This result concerns random graphs and makes no reference to the quantum algorithm.

Neighborhood Size Theorem

Fix $d>1$ , and $w<1$ . If

\displaystyle 2p\leq{w\log n\over\log(d/\ln{2})}

(55)

then there exist $a>0$ and $A<1$ such that

\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,2p)\geq n^{A}\Big{]}\leq e^{-n^{a}}

(56)

\displaystyle\text{and}\quad\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,p)\geq n^{A/2}\Big{]}\leq e^{-n^{a/2}}.

(57)

To prove this, we consider a branching process in which each parent has Poisson($d$) children. Let $Z_{k}$ be the size of the $k$th generation, with $Z_{0}=1$. Let

\displaystyle\phi_{k}(t)=\mathrm{E}[e^{tZ_{k}}]

(58)

where the expectation E is with respect to the Poisson branching process. We have

\displaystyle\phi_{0}(t)=e^{t},\qquad\phi_{1}(t)=e^{d(e^{t}-1)},

(59)

where $\phi_{1}$ is the Poisson moment generating function. In general,

\displaystyle\phi_{k+1}(t)=e^{d(\phi_{k}(t)-1)}.

(60)

We first show

\displaystyle\phi_{k}(({\ln{2}/d})^{k})\leq e\ \text{ for any}\ k\geq 0.

(61)

To prove this, we show by induction on $j$ that for $0\leq j\leq k$

\displaystyle\phi_{j}(({\ln{2}/d})^{k})\leq e^{(\ln{2}/d)^{k-j}}

(62)

which gives (61) when $j=k$. The case $j=0$ holds with equality since $\phi_{0}(t)=e^{t}$, and

\displaystyle\phi_{j+1}((\ln{2}/d)^{k})=e^{d(\phi_{j}((\ln{2}/d)^{k})-1)}\leq e^{d(e^{(\ln{2}/d)^{k-j}}-1)}

(63)

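by the induction hypothesis. We now use the convexity bound $e^{x\ln{2}}-1\leq x$ for $0\leq x\leq 1$, with $x=(\ln{2})^{k-j-1}/d^{k-j}$. This $x$ lies in $[0,1]$ because $d>1$ and $j<k$. The right-hand side of (63) is then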

\displaystyle\leq e^{d(\ln{2})^{k-j-1}\over d^{k-j}}=e^{(\ln{2}/d)^{k-(j+1)}}.

(64)
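Both the recursion (60) and the bound (61) are easy to evaluate numerically. The sketch below computes $\phi_{k}(t)$ by iterating (60) and reports the largest value of $\phi_{k}((\ln 2/d)^{k})$ over a range of $k$. The chosen values of $d$ and the cutoff on $k$ are arbitrary. The $k=0$ term equals $e$ exactly, so (61) is tight there.

```python
import math

def phi(k, t, d):
    """phi_k(t) = E[exp(t Z_k)] for Poisson(d) branching with Z_0 = 1, via the
    recursion phi_0(t) = e^t and phi_{k+1}(t) = exp(d (phi_k(t) - 1)) of (59)-(60)."""
    value = math.exp(t)
    for _ in range(k):
        value = math.exp(d * (value - 1.0))
    return value

for d in (1.5, 3.0, 10.0):
    # Largest phi_k((ln 2/d)^k) over k = 0..11; (61) says it is at most e.
    worst = max(phi(k, (math.log(2) / d) ** k, d) for k in range(12))
    print(d, worst)
```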

Now Markov’s inequality applied to $e^{tZ_{k}}$ says that for any $t>0$ and $u>0$

\displaystyle\mathrm{P}\left[Z_{k}\geq u\left({d/\ln{2}}\right)^{k}\right]\leq e^{-u\left(d/\ln{2}\right)^{k}t}\phi_{k}(t)\leq e^{-u}e

(65)

which we get by choosing $t=(\ln{2}/d)^{k}$ and using (61). Note that P is the probability associated with the Poisson branching process. We will pick $u$ to make this small. First, however, we need to bound $B_{k}=Z_{1}+Z_{2}+\ldots+Z_{k}$, since this is what we compare with the graph neighborhood. Note that

\displaystyle\mathrm{P}\Big{[}B_{k}\geq\lambda\Big{]}\leq\mathrm{P}\left[Z_{1}\geq{\lambda\over k}\right]+\mathrm{P}\left[Z_{2}\geq{\lambda\over k}\right]+\ldots+\mathrm{P}\left[Z_{k}\geq{\lambda\over k}\right].

(66)

Choose $\lambda=d^{sk}\left(d/\ln{2}\right)^{k}$ and $u={d^{sk}\over k}$ with $s$ to be determined later. Then (for $k$ large enough, using $d>1$ )

\displaystyle\mathrm{P}\left[B_{k}\geq d^{sk}\left(d/\ln{2}\right)^{k}\right]\leq k\,e^{-{d^{sk}\over k}}\,e

(67)

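To justify (67), note that by (62), $\phi_{j}((\ln{2}/d)^{k})\leq e^{(\ln{2}/d)^{k-j}}\leq e$ for every $j\leq k$. Markov's inequality then gives each of the $k$ terms in (66) the same bound as (65). We can write (67) as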

\displaystyle\mathrm{P}\left[B_{k}\geq d^{sk}\left(d/\ln{2}\right)^{k}\right]\leq e^{-{d^{sk/2}}}

(68)

for $k$ large enough.
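The tail bound (65) can be compared with a simulation of the Poisson branching process. In the sketch below, Poisson draws use Knuth's multiplication method, which is our choice and is only suitable for small means. The parameters $k=4$, $d=3$ and $u=2$ are illustrative. Because the threshold $u(d/\ln 2)^{k}$ lies far above the mean $d^{k}$, the empirical tail is essentially zero, well under $e^{1-u}$. The bound is crude but sufficient for the argument.

```python
import math
import random

def poisson(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (adequate for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def generation_size(k, d, rng):
    """Z_k of a branching process with Poisson(d) offspring and Z_0 = 1."""
    z = 1
    for _ in range(k):
        z = sum(poisson(d, rng) for _ in range(z))
    return z

def tail_vs_bound(k, d, u, trials=2000, seed=0):
    """Empirical P[Z_k >= u (d/ln 2)^k] together with the bound e^(1-u) of (65)."""
    rng = random.Random(seed)
    level = u * (d / math.log(2)) ** k
    hits = sum(generation_size(k, d, rng) >= level for _ in range(trials))
    return hits / trials, math.exp(1 - u)

print(tail_vs_bound(4, 3.0, 2.0))
```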

This bound applies to Poisson branching. To make contact with our graph neighborhoods, we first compare the branching process to Erdős-Rényi graphs and then to our random graphs with a fixed number of edges. In the Erdős-Rényi graph, each vertex has Binomial$(n-1,{d\over n-1})$ neighbors, and this distribution converges to Poisson($d$) as $n$ goes to infinity. For finite $n$, the moment generating function of this Binomial is less than that of the Poisson, since $(1+q(e^{t}-1))^{n-1}\leq e^{(n-1)q(e^{t}-1)}$ with $q={d\over n-1}$. So the $k$-neighborhood of a vertex in the random Erdős-Rényi graph satisfies the same bound as the branching process. Let $f_{y}$ be the probability that a given vertex $i$ in the Erdős-Rényi graph has its $k$-neighborhood at least as big as $d^{sk}({d/\ln{2}})^{k}$, conditioned on the graph having exactly $y$ edges. (The number of edges is a Binomial$(\binom{n}{2},{d\over n-1})$ random variable.) Then

\displaystyle\sum\limits^{\binom{n}{2}}_{y=0}\ \mathrm{P}(y\ \text{edges})\,f_{y}\leq\,e^{-d^{sk/2}}.

(69)

Starting the sum at $m={nd\over 2}$ we have

\displaystyle\sum\limits^{\binom{n}{2}}_{y=m}\mathrm{P}(y\ \text{edges})\,f_{y}\leq e^{-d^{sk/2}}

(70)

and since $f_{y}$ is an increasing function of $y$ we have

\displaystyle\sum\limits^{\binom{n}{2}}_{y=m}\mathrm{P}(y\ \text{edges})\,f_{m}\leq e^{-d^{sk/2}}.

(71)

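Now $m={nd\over 2}$ is the expected number of edges, so by the central limit theorem $\sum_{y\geq m}\mathrm{P}(y\ \text{edges})={1\over 2}+o(1)$. Hence the left side of (71) is $({1\over 2}+o(1))f_{m}$, and we conclude that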

\displaystyle f_{m}\leq(2+o(1))e^{-d^{sk/2}}\leq e^{-d^{sk/3}}

(72)

for $k$ large enough. So using the indirect connection between the branching process and our graphs, we have

\displaystyle\mathrm{Prob}\left[\mathrm{B}(i,k)\geq d^{sk}\left(d/\ln{2}\right)^{k}\right]\leq e^{-d^{sk/3}}

(73)

where Prob is with respect to the graph distribution with a fixed number of edges $m$. Now take $k={w\log n\over\log(d/\ln{2})}$. By assumption $2p\leq k$, and since $\mathrm{B}(i,r)$ can only grow with $r$, we have

\displaystyle\mathrm{Prob}\left[\mathrm{B}(i,2p)\geq\left(d^{s}d/\ln{2}\right)^{w\log n\over\log(d/\ln{2})}\right]\leq\mathrm{Prob}\left[\mathrm{B}(i,k)\geq\left(d^{s}d/\ln{2}\right)^{w\log n\over\log(d/\ln{2})}\right]\leq e^{-d^{sw\log n\over 3\log(d/\ln{2})}}.

(74)

We can take $\log=\log_{d}$ in the above, and writing $\log_{d}(1/\ln{2})=L$ we get

\displaystyle\mathrm{Prob}\left[\mathrm{B}(i,2p)\geq n^{(1+s+L)w\over 1+L}\right]\leq e^{-n^{sw\over 3(1+L)}}.

(75)

We can pick $s>0$ to make $A={(1+s+L)w\over 1+L}<1$ because $w<1$. By the union bound, $\mathrm{Prob}[\max_{i}\mathrm{B}(i,2p)\geq n^{A}]$ is at most $n$ times as large. So choosing $0<a<{sw\over 3(1+L)}$ yields the first half of the theorem for $n$ large. The second half, for $\mathrm{B}(i,p)$, follows in the same way with $w$ replaced by $w/2$, which halves both exponents. End of Proof.
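The exponent bookkeeping in (74)–(75) can be checked numerically. The sketch below computes $A$ and the tail exponent $sw/(3(1+L))$ with $L=\log_{d}(1/\ln 2)$. It finds the supremum $(1+L)(1/w-1)$ of admissible $s$ and confirms that the threshold in (74) equals $n^{A}$. The values $d=10$ and $w=0.9$ are illustrative.

```python
import math

def exponents(d, w, s):
    """Exponents in (75): A = (1+s+L)w/(1+L) and s*w/(3(1+L)), with L = log_d(1/ln 2)."""
    L = math.log(1 / math.log(2), d)
    return (1 + s + L) * w / (1 + L), s * w / (3 * (1 + L))

def largest_s(d, w):
    """Supremum of the s > 0 for which A < 1, namely (1 + L)(1/w - 1)."""
    L = math.log(1 / math.log(2), d)
    return (1 + L) * (1 / w - 1)

d, w = 10.0, 0.9
s = 0.5 * largest_s(d, w)          # any s strictly inside (0, largest_s) works
A, tail_exp = exponents(d, w, s)
print(s, A, tail_exp)
```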

8.2 Concentration

We now restate and prove the Concentration Theorem. This result concerns the concentration of the Hamming weight of the output strings of the shallow-depth QAOA+. The first part applies to the QAOA+ acting on a fixed graph. The second part is a statement about concentration on random graphs.

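Concentration Theorem

1. Let $G$ be a graph with $n$ vertices and ${dn\over 2}$ edges. Suppose $p$ is chosen such that $\max_{i}|\mathrm{B}_{G}(i,2p)|\leq n^{A}$ for some $0<A<1$. Let $\sigma$ be the output of the QAOA+. Then there exists $\gamma_{1}>0$ such that for all $\delta>0$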

\displaystyle\mathbb{P}_{G}\Big{[}~{}\big{|}|\sigma|-\mathbb{E}_{G}|\sigma|\big{|}\geq\delta n~{}\Big{]}\leq e^{-\delta n^{\gamma_{1}}}

(76)

for $n$ large enough. Here the graph is fixed and the randomness comes from the QAOA+.

2. Now let $G$ be a random $\mathbb{G}(n,{dn\over 2})$ graph. Suppose, for some $a>0$,

\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,2p)\geq n^{A}\Big{]}\leq e^{-n^{a}}

(77)

\displaystyle\text{and}\quad\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,p)\geq n^{A/2}\Big{]}\leq e^{-n^{a/2}}.

(78)

Then there exists $\gamma_{2}>0$ such that for any $0<\delta<1$ ,

\displaystyle\mathrm{Prob}\Big{[}~{}\big{|}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq\delta n~{}\Big{]}\leq e^{-{\delta^{2}}n^{\gamma_{2}}},

(79)

for $n$ large enough. Here $\mathrm{Prob}$ and $\mathrm{Ex}$ refer to Probability and Expectation coming from the distribution of random graphs combined with the randomness coming from the QAOA+.

Consider the QAOA+ running on a graph $G$ and outputting an independent set $\sigma$ . The Hamming weight of $\sigma$ is a sum of $n$ random variables $b_{1}+b_{2}+\cdots+b_{n}$ , where each $b_{i}\in\{0,1\}$ is independent of the collection $\{b_{j}\}$ for $j$ not within a distance $2p$ of $i$ in $G$ . With $p$ small enough so that the $2p$ -neighborhoods are small compared to $n$ , this is enough to get a sub-exponential bound on the concentration of $|\sigma|$ for a fixed graph $G$ . We use the standard technique of moment generating functions (as in the Chernoff bound) to prove (76).

Suppose $\max\limits_{i}\mathrm{B}_{G}(i,2p)\leq n^{A}$ , with $A<1$ . Let

\displaystyle Y_{i}=b_{i}-\mathbb{E}_{G}\,(b_{i}).

(80)

We will first bound $\mathbb{E}_{G}\,[\sum Y_{i}]^{t}$ and use that to bound $\mathbb{E}_{G}[e^{\theta\sum Y_{i}}]$ , from which (76) will follow. We have

\displaystyle\mathbb{E}_{G}\,\big{[}\sum Y_{i}\big{]}^{t}=\sum_{1\leq i_{1},\cdots,{i_{t}}\leq n}\,\mathbb{E}_{G}\big{[}Y_{i_{1}}\cdots Y_{i_{t}}\big{]}.

(81)

Because each $\mathbb{E}_{G}[Y_{i}]=0$, independence makes any term in the sum 0 unless, for each $i_{k}$, there is an $i_{\ell}~{}(\ell\neq k)$ in $\mathrm{B}_{G}(i_{k},2p)$. The number of nonzero terms is at most $t!(nn^{A})^{t/2}$ (see Section 8.3). Since $|Y_{i}|\leq 1$, we have

\displaystyle\mathbb{E}_{G}\,\big{[}\sum\,Y_{i}\big{]}^{t}\ \leq\ t!(nn^{A})^{t/2}\ .

(82)

Multiply both sides by $\theta^{t}$, divide by $t!$ and sum on $t$. For $0\leq\theta<n^{-\frac{1+A}{2}}$ this gives

\displaystyle\mathbb{E}_{G}\big{[}\,e^{\theta\sum Y_{i}}\big{]}\leq\sum^{\infty}_{t=0}\,\theta^{t}(nn^{A})^{t/2}=\frac{1}{1-\theta n^{\frac{1+A}{2}}}.

(83)

Now we can prove (76). By the Markov inequality, for any $\theta$ ,

\displaystyle\mathbb{P}_{G}\Big{[}\sum Y_{i}\geq\delta n\Big{]}\leq e^{-\theta\delta n}\mathbb{E}_{G}\Big{[}e^{\theta\sum Y_{i}}\bigg{]}

(84)

which combines with the previous result to give

\displaystyle\mathbb{P}_{G}\Big{[}\sum Y_{i}\geq\delta n\Big{]}\leq\frac{e^{-\theta\delta n}}{1-\theta n^{\frac{1+A}{2}}}.

Choose $0<\gamma^{\prime}<\frac{1-A}{2}$ and $\theta=n^{\gamma^{\prime}-1}$, so

\displaystyle\mathbb{P}_{G}\Big{[}\sum\,Y_{i}\geq\delta n\Big{]}\leq\frac{e^{-\delta n^{\gamma^{\prime}}}}{1-n^{\left(\gamma^{\prime}-\frac{1}{2}+\frac{A}{2}\right)}}.

(85)

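The bound on $\mathbb{P}_{G}\big{[}\sum\,Y_{i}\leq-\delta n\big{]}$ is the same: apply the argument to $-Y_{i}$, that is, use $\theta=-n^{\gamma^{\prime}-1}$. Add the two tail bounds and replace $\gamma^{\prime}$ by any $\gamma_{1}<\gamma^{\prime}$, which absorbs the factor of 2 and the denominator. This gives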

\displaystyle\mathbb{P}_{G}\left[\left|\sum Y_{i}\right|\geq\delta n\right]\leq e^{-\delta n^{\gamma_{1}}}

(86)

for large $n$ . This is (76).
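The phrase "for $n$ large enough" carries real weight here. The sketch below evaluates the right-hand side of (85) for the illustrative, hypothetical parameters $A=0.5$, $\gamma^{\prime}=0.2$ and $\delta=0.1$. At $n=10^{6}$ the bound is only about $0.41$. It becomes tiny only at much larger $n$.

```python
import math

def bound_85(n, A, gamma_prime, delta):
    """Right-hand side of (85), with theta = n^(gamma' - 1) and 0 < gamma' < (1 - A)/2."""
    if not 0 < gamma_prime < (1 - A) / 2:
        raise ValueError("need 0 < gamma' < (1 - A)/2")
    denom = 1 - n ** (gamma_prime - 0.5 + A / 2)
    return math.exp(-delta * n ** gamma_prime) / denom

for k in (6, 9, 12):
    print(f"n = 1e{k}:", bound_85(10**k, A=0.5, gamma_prime=0.2, delta=0.1))
```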

The second concentration bound (79) follows from the fact that $\mathbb{E}_{G}|\sigma|$ is very unlikely to vary much as $G$ varies, as we now show. The random graph $G$ consists of $m$ edges $e_{1},e_{2},\cdots e_{m}$ chosen independently and uniformly from the $n\choose 2$ possibilities. (We ignore the $O(1)$ collisions.) Let

\displaystyle f(\mathbf{e})=f(e_{1},e_{2},\cdots e_{m})=\mathbb{E}_{G}|\sigma|.

(87)

Here the graph is fixed and the expectation is with respect to the QAOA+. By the locality of the QAOA+, as long as the $p$-neighborhoods of the vertices in $G$ are small, changing one edge of $G$ makes only a small change in $f(e_{1},e_{2},\cdots e_{m})$. It is natural to use the Azuma inequality, which bounds deviation probabilities in terms of the effect of changing a single variable. But the neighborhoods are only small with high probability, so we cannot apply the Azuma inequality directly to the function $f$. Instead we adjust $f$ using a version of Kirszbraun’s theorem on extending Lipschitz functions.

To set the stage we state the Azuma inequality. Consider a real-valued function $\phi(\bf{e})$, where $\bf{e}$ is a set of $m$ variables $e_{1},e_{2}\cdots e_{m}$, with the property that $\phi$ does not change much when one of the variables changes. Precisely, let $\tilde{\textbf{e}}$ agree with $\bf{e}$ in all but one of the $m$ variables. The Lipschitz condition on $\phi$ is then

\displaystyle|\phi(\textbf{e})-\phi(\tilde{\textbf{e}})|\leq R

(88)

for some fixed $R$ . Now we take the $\{e_{i}\}$ to be independent random variables. The Azuma inequality states that

\displaystyle\mathrm{Prob}\Big{[}\big{|}~{}\phi-\mathrm{Ex}[\phi]~{}\big{|}\geq t\Big{]}\leq 2\mathrm{exp}\Big{(}\frac{-t^{2}}{2mR^{2}}\Big{)}.

(89)

Return to our function $f(\bf{e})$, where the $e_{i}$ are edges in a graph. Let $\mathrm{K}_{n}$ be the set of graphs with $n$ vertices and ${dn\over 2}$ edges that have small $p$-neighborhoods,

\displaystyle\mathrm{K}_{n}=\big{\{}G:\max_{i}|B_{G}(i,p)|\leq n^{A/2}\big{\}}.

(90)

By hypothesis (78),

\displaystyle\mathrm{Prob}\big{[}K_{n}\big{]}\geq 1-e^{-n^{a/2}}

(91)

for $n$ large enough. If $\bf{e}$ and $\tilde{\bf{e}}$ differ by one edge and both are in $\mathrm{K}_{n}$, then we have

\displaystyle\big{|}f(\textbf{e})-f(\tilde{\textbf{e}})\big{|}\leq 4n^{A/2}

(92)

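This holds because the QAOA+ output distributions can differ only on bits in the four $p$-neighborhoods of the endpoints of the two swapped edges, each containing at most $n^{A/2}$ vertices. For graphs not necessarily in $\mathrm{K}_{n}$ we need to modify $f$. Let $\rho(\bf{e},\tilde{\bf{e}})$ be the number of edge changes needed to turn $\bf{e}$ into $\tilde{\bf{e}}$, that is, the number of coordinates in which they differ. Suppose e and $\tilde{\textbf{e}}$ are both in $\mathrm{K}_{n}$. Apply the locality argument behind (92) to all $\rho(\bf{e},\tilde{\bf{e}})$ changed edges at once, since the intermediate graphs need not lie in $\mathrm{K}_{n}$. At most $4\rho(\bf{e},\tilde{\bf{e}})$ neighborhoods are affected, each of size at most $n^{A/2}$, so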

\displaystyle\big{|}f(\textbf{e})-f(\tilde{\textbf{e}})\big{|}\leq 4n^{A/2}\rho(\bf{e},{\tilde{\bf{e}}}).

(93)

Now define for any e,

\displaystyle g(\textbf{e})=\min_{\textbf{e}^{\prime}\in\mathrm{K}_{n}}\big{[}f(\textbf{e}^{\prime})+4n^{A/2}\rho(\textbf{e},\textbf{e}^{\prime})\big{]}.

(94)

For $\textbf{e}\in\mathrm{K}_{n}$ we see that $g(\textbf{e})=f(\textbf{e})$ by (93). For any e and $\tilde{\textbf{e}}$ , we have

\displaystyle\big{|}g(\textbf{e})-g(\tilde{\textbf{e}})\big{|}\leq 4n^{A/2}\rho(\bf{e},{\tilde{\bf{e}}}).

(95)

To see this, note that there is an $\textbf{e}^{\prime}\in\mathrm{K}_{n}$ with

\displaystyle\begin{split}g(\textbf{e})=f(\textbf{e}^{\prime})+4n^{A/2}\rho(\textbf{e},\textbf{e}^{\prime})\\ g(\tilde{\textbf{e}})\leq f(\textbf{e}^{\prime})+4n^{A/2}\rho(\tilde{\textbf{e}},\textbf{e}^{\prime}).\end{split}

(96)

Subtracting these two and using the triangle inequality yields

\displaystyle g(\tilde{\textbf{e}})-g(\textbf{e})\leq 4n^{A/2}\rho(\textbf{e},\tilde{\textbf{e}}).

(97)

Repeat with e and $\tilde{\textbf{e}}$ interchanged and (95) follows. We are going to apply the Azuma inequality to $g$. Note that if e differs from $\tilde{\textbf{e}}$ in just one edge, then $\rho(\textbf{e},\tilde{\textbf{e}})=1$, and (95) is the Lipschitz condition (88) with $R=4n^{A/2}$. Then for any $\delta$ the inequality (89) says

\displaystyle\mathrm{Prob}\Big{[}\big{|}g-\mathrm{Ex}[g]\big{|}\geq\delta n/2\Big{]}\leq 2\mathrm{exp}\Big{(}\frac{-(\delta n/2)^{2}}{2(dn/2)(4n^{A/2})^{2}}\Big{)}=2\mathrm{exp}\Big{(}-\frac{\delta^{2}n^{1-A}}{64d}\Big{)}.

(98)
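The construction (94) and the Lipschitz property (95) behind (98) can be checked on a toy example. Instead of edge tuples, the sketch uses the four-dimensional Boolean cube with Hamming distance as $\rho$. The set $K$ is two hypothetical points, with values of $f$ chosen to be 1-Lipschitz on $K$. The extension $g$ agrees with $f$ on $K$ and is 1-Lipschitz everywhere, which is all that (95) asserts.

```python
from itertools import product

def hamming(x, y):
    """rho(x, y): number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def extend(f_on_K, L, points):
    """g(x) = min over y in K of f(y) + L * rho(x, y), the construction (94)."""
    return {x: min(fy + L * hamming(x, y) for y, fy in f_on_K.items()) for x in points}

points = list(product((0, 1), repeat=4))
f_on_K = {(0, 0, 0, 0): 0, (1, 1, 1, 1): 3}    # hypothetical values, 1-Lipschitz on K
g = extend(f_on_K, 1, points)

agrees_on_K = all(g[x] == fx for x, fx in f_on_K.items())
lipschitz = all(abs(g[x] - g[y]) <= hamming(x, y) for x in points for y in points)
print(agrees_on_K, lipschitz, g[(1, 1, 0, 0)])
```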

We are going to use this to bound $\mathrm{Prob}\big{[}|f-\mathrm{Ex}[f]|\geq\delta n\big{]}$, which controls the graph-to-graph part of the fluctuation in (79). Recall that $f=\mathbb{E}_{G}[|\sigma|]$ and that $f$ agrees with $g$ except outside of $\mathrm{K}_{n}$. Outside of $\mathrm{K}_{n}$ we have the crude bound $|f-g|\leq(n+4n^{A/2}dn/2)$, since $0\leq f\leq n$ and $0\leq g\leq n+4n^{A/2}m$. So by (91)

\displaystyle\big{|}\mathrm{Ex}[f]-\mathrm{Ex}[g]\big{|}\leq\big{(}n+4n^{A/2}dn/2\big{)}e^{-n^{a/2}}\leq\delta n/2,

(99)

for any $\delta$ if $n$ is big enough. Now

\displaystyle\mathrm{Prob}\Big{[}\big{|}f-\mathrm{Ex}[f]\big{|}\geq\delta n\Big{]}\leq\mathrm{Prob}\Big{[}\big{|}f-\mathrm{Ex}[g]\big{|}\geq\delta n/2\Big{]}.

(100)

Inside $\mathrm{K}_{n}$ we have $f=g$ while the probability of being outside of $\mathrm{K}_{n}$ is bounded by (91) so

\displaystyle\mathrm{Prob}\Big{[}\big{|}f-\mathrm{Ex}[f]\big{|}\geq\delta n\Big{]}\leq\mathrm{Prob}\Big{[}\big{|}g-\mathrm{Ex}[g]\big{|}\geq\delta n/2\Big{]}+e^{-n^{a/2}},

(101)

which by (98) is

\displaystyle\mathrm{Prob}\Big{[}\big{|}f-\mathrm{Ex}[f]\big{|}\geq\delta n\Big{]}\leq 2\mathrm{exp}\Big{(}-\frac{\delta^{2}n^{1-A}}{64d}\Big{)}+e^{-n^{a/2}}.

(102)

We can rewrite this (take $\delta<1$ ) as

\displaystyle\mathrm{Prob}\Big{[}\big{|}\mathbb{E}_{G}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq\delta n\Big{]}\leq e^{-{\delta^{2}}n^{\tilde{a}}}

(103)

with $\tilde{a}$ less than both $1-A$ and $a/2$ , for $n$ large. The first half of the concentration theorem implies

\displaystyle\mathrm{Prob}\Big{[}\big{|}|\sigma|-\mathbb{E}_{G}|\sigma|\big{|}\geq\delta n\Big{]}\leq e^{-\delta n^{\gamma_{1}}}+e^{-n^{a}},

(104)

where the $e^{-n^{a}}$ accounts for graphs with large $2p$ -neighborhoods.

Now the triangle inequality says

\displaystyle\mathrm{Prob}\Big{[}\big{|}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq 2\delta n\Big{]}\leq e^{-{\delta^{2}}n^{\tilde{a}}}+e^{-\delta n^{\gamma_{1}}}+e^{-n^{a}}.

(105)

By taking $\gamma_{2}$ smaller than $\tilde{a},\gamma_{1}$ and $a$ we obtain

\displaystyle\mathrm{Prob}\Big{[}\big{|}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq 2\delta n\Big{]}\leq e^{-{\delta^{2}}n^{\gamma_{2}}},

(106)

for all $n$ large. Go back to the beginning and replace $\delta$ by $\delta/2$, shrinking $\gamma_{2}$ slightly to absorb the resulting factor of $1/4$ in the exponent. This gives (79), as desired.

8.3 Counting the number of non-zero terms

To bound the number of non-zero terms in

\displaystyle\sum\limits_{1\leq i_{1},\ldots,i_{t}\leq n}\ \mathbb{E}_{G}\left(Y_{i_{1}}\cdots Y_{i_{t}}\right)

(107)

first note that, since each $\mathbb{E}_{G}[Y_{i}]=0$, independence makes any term in the sum 0 unless, for each $i_{k}$, there is an $i_{\ell}~{}(\ell\neq k)$ in $\mathrm{B}_{G}(i_{k},2p)$. We represent the non-zero contributions graphically by considering graphs with vertices $1,2,\ldots,t$, possibly disconnected. A graph will be called valid if no vertex is isolated. We say that a sequence $i_{1},i_{2},\ldots,i_{t}$ satisfies a graph $\Gamma$ on $\{1,2,\ldots,t\}$ if $i_{\ell}$ is in the $2p$-neighborhood of $i_{k}$ (and, equivalently, vice versa) whenever $(\ell,k)$ is an edge of $\Gamma$. Every sequence $i_{1},i_{2},\ldots,i_{t}$ with $\mathbb{E}_{G}\left(Y_{i_{1}}\cdots Y_{i_{t}}\right)\neq 0$ satisfies at least one valid graph. In particular, it satisfies some minimal valid graph, where minimal means that no subgraph with $t$ vertices but fewer edges is valid.

So we need to bound the number of minimal valid graphs as a function of $t$. Call this number $V_{t}$, and set $V_{0}=1$ and $V_{1}=0$. In a minimal valid graph, each component is a star, that is, a tree with a central vertex connected to all the other vertices in the component. The reason is that deleting any edge of a minimal valid graph must isolate one of its endpoints. So every edge has an endpoint of degree one, and a connected graph with this property is a star. Suppose the largest component has size $r$. If $r=2$, then the minimal valid graphs are matchings. There are $(t-1)(t-3)\cdots 1$ of these when $t\geq 2$ is even, and none when $t$ is odd. For $3\leq r\leq t$, there are $t\binom{t-1}{t-r}$ stars of size $r$. This gives at most $t\binom{t-1}{t-r}\ V_{t-r}$ minimal valid graphs, which may be an overcount because we have singled out one component of size $r$. The term $r=t$, a single star on all $t$ vertices, must be included; the term $r=t-1$ vanishes because $V_{1}=0$. So for $t\geq 2$

\displaystyle V_{t}\leq(t-1)(t-3)\cdots 1+\sum\limits^{t}_{r=3}\ t\binom{t-1}{t-r}\ V_{t-r}={t!\over t(t-2)\cdots 2}+\sum\limits^{t}_{r=3}\ {t(t-1)\cdots(t-r+1)\over(r-1)!}V_{t-r}.

(108)

Now suppose inductively that $V_{t-r}\leq(t-r)!$ for $r\geq 3$. This holds in the base cases $V_{0}=1$, $V_{1}=0$, $V_{2}=1$ and $V_{3}=3$. Then for $t\geq 4$,

\displaystyle V_{t}\leq{t!\over t(t-2)}+\sum\limits^{t}_{r=3}\ {t(t-1)\cdots(t-r+1)\over(r-1)!}(t-r)!\leq{t!\over 8}+\sum\limits^{\infty}_{r=3}\ {t!\over(r-1)!}

(109)

\displaystyle=t!\Big{(}{1\over 8}+e-2\Big{)}<t!.

(110)

So there are at most $t!$ minimal valid graphs with $t$ vertices. How many sequences $i_{1},i_{2},\ldots,i_{t}$ , where $1\leq i_{\ell}\leq n$ , can satisfy a given minimal valid graph? If it has $u$ components, then it must have $t-u$ edges. The central vertex $\ell$ in any component corresponds to $i_{\ell}$ which has $n$ possible values. If $(\ell,k)$ is an edge in the minimal valid graph, $i_{k}$ can have only $n^{A}$ values, since the sequence $i_{1}\ldots i_{t}$ is assumed to satisfy this minimal valid graph. So there are at most $n^{u}(n^{A})^{t-u}$ possible sequences and since $u\leq t/2$ , this is $\leq n^{t/2}(n^{A})^{t/2}$ . Multiplying by the bound on the number of minimal valid graphs, there are at most

\displaystyle t!n^{t/2}(n^{A})^{t/2}

(111)

nonzero terms in (107).
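For small $t$ the count $V_{t}$ can be computed exactly, as a check on the bound $V_{t}\leq t!$. The sketch below first counts minimal valid graphs by brute force over all graphs on $t$ labeled vertices. A graph is minimal valid when it has no isolated vertex and every edge has an endpoint of degree one. The sketch then computes the same numbers by a recursion on the star containing a fixed vertex. A star has $k$ possible centres when $k\geq 3$ and is a single edge when $k=2$. This recursion is our own exact count, not the upper bound (108), and the helper names are hypothetical.

```python
from itertools import combinations
from math import comb, factorial

def minimal_valid_brute(t):
    """Graphs on t labeled vertices with no isolated vertex in which deleting any
    single edge isolates a vertex (minimal valid graphs), counted by brute force."""
    pairs = list(combinations(range(t), 2))
    count = 0
    for mask in range(1 << len(pairs)):
        edges = [pairs[k] for k in range(len(pairs)) if mask >> k & 1]
        deg = [0] * t
        for a, b in edges:
            deg[a] += 1
            deg[b] += 1
        valid = all(deg)
        minimal = all(deg[a] == 1 or deg[b] == 1 for a, b in edges)
        count += valid and minimal
    return count

def minimal_valid_count(t_max):
    """V_0..V_{t_max}: choose the k - 1 other members of vertex 0's star component,
    times the number of stars on k vertices, times V_{t-k} for the rest."""
    V = [1] + [0] * t_max
    for t in range(2, t_max + 1):
        V[t] = sum(comb(t - 1, k - 1) * (1 if k == 2 else k) * V[t - k]
                   for k in range(2, t + 1))
    return V

V = minimal_valid_count(10)
print(V)
print(all(V[t] <= factorial(t) for t in range(11)))
```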

9 Discussion

No one knows if a quantum computer running a quantum algorithm will be able to outperform a classical computer on a combinatorial search problem. One approach is to build a quantum computer, run a quantum algorithm and see what happens at the available number of qubits. Another approach is to look for a provable quantum speedup over the best known classical algorithms. To this end it is useful to know if there are provable limitations to the power of any proposed quantum algorithm.

In this paper we look at the Quantum Approximate Optimization Algorithm applied to finding a large independent set in a random graph of fixed average degree $d$ . The performance of the QAOA can only improve with depth $p$ but we show that for Maximum Independent Set on random graphs the algorithm will fail to pass a certain performance barrier if $2p$ is less than $w\log n/\log{(d/\ln{2})}$ for any $w<1$ with $d$ big enough. (This ratio is independent of the base of the $\log$ .) The quantum algorithm consists of $p$ unitaries that each respect the locality of the underlying graph. With a fixed average degree of $d$ this means that each qubit typically has an influence sphere of roughly $d^{p}$ other qubits. For qubits further than $2p$ apart on the graph these influence spheres do not intersect and we can show that measurements of these qubits are uncorrelated. This is key to our showing that the algorithm has limited power. However if $p$ is large enough that $d^{p}$ exceeds $n$ our arguments do not apply and we have no indication that the QAOA will fail.

Our results are for random graphs, and we do not have results for, say, a graph which is a $2$-dimensional square lattice. In this case perhaps the border between failure and possible success is at $\sqrt{n}$. Back to random graphs. Although our proof technique requires $d$ big, the intuition is that if $d^{2p}<n$ then most pairs of qubits will have independent measurement outcomes. Consider the case when the degree is small, so the influence spheres are small; this is the least favorable situation for the QAOA. For example, with $d=3$ at one million qubits our result suggests failure for $p$ less than $7$. The bipartite construction of [3] only shows failure at 2 million qubits with $d=3$ for $p$ less than $1$. Just beyond this the QAOA “sees” the whole graph. We cannot say with certainty what happens at a few million qubits in the shallow circuit depth regime with $p$, say, in double digits.
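The figure quoted for $d=3$ at one million qubits is consistent with the heuristic condition $d^{2p}<n$. The sketch below finds the largest such $p$ with exact integer arithmetic. The function name is ours. It gives $p=6$ for $n=10^{6}$ and $d=3$, because $3^{12}=531441<10^{6}<3^{14}$.

```python
def heuristic_max_depth(n: int, d: int) -> int:
    """Largest integer p with d**(2p) < n, computed with exact integer arithmetic."""
    if d < 2:
        raise ValueError("need integer d >= 2")
    p = 0
    while d ** (2 * (p + 1)) < n:
        p += 1
    return p

print(heuristic_max_depth(10**6, 3))
```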

Acknowledgement

We thank Aram Harrow for helpful discussion.

References

[1] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm. arXiv:1411.4028, 2014.
[2] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm applied to a bounded occurrence constraint problem. arXiv:1412.6062, 2014.
[3] Sergey Bravyi, Alexander Kliesch, Robert Koenig, and Eugene Tang. Obstacles to state preparation and variational optimization from symmetry protection. arXiv:1910.08980, 2019.
[4] David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. Annals of Probability, 45:2353–2376, 2017.
[5] David Gamarnik and Madhu Sudan. Performance of sequential local algorithms for the random NAE-K-SAT problem. SIAM Journal on Computing, 46(2):590–619, 2017.
[6] Wei-Kuo Chen, David Gamarnik, Dmitry Panchenko, Mustazee Rahman, et al. Suboptimality of local algorithms for a class of max-cut problems. The Annals of Probability, 47(3):1587–1618, 2019.
[7] Amin Coja-Oghlan, Amir Haqshenas, and Samuel Hetterich. Walksat stalls well below satisfiability. SIAM Journal on Discrete Mathematics, 31(2):1160–1173, 2017.
[8] Gérard Ben Arous and Aukosh Jagannath. Spectral gap estimates in mean field spin glasses. Comm. Math. Phys., 361(1):1–52, 2018.
[9] David Gamarnik, Aukosh Jagannath, and Subhabrata Sen. The overlap gap property in principal submatrix recovery. arXiv:1908.09959, 2019.
[10] David Gamarnik and Aukosh Jagannath. The overlap gap property and approximate message passing algorithms for $p$ -spin models. arXiv:1911.06943, 2019.
[11] Andrea Montanari. Optimization of the Sherrington-Kirkpatrick Hamiltonian. arXiv:1812.10897, 2018.
[12] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Leo Zhou. The quantum approximate optimization algorithm and the Sherrington-Kirkpatrick model at infinite size. arXiv:1910.08187, 2019.
[13] M. Bayati, D. Gamarnik, and P. Tetali. Combinatorial approach to the interpolation method and scaling limits in sparse random graphs. Annals of Probability, 41:4080–4115, 2013. Conference version in Proc. 42nd Ann. Symposium on the Theory of Computing (STOC), 2010.
[14] A. Frieze. On the independence number of random graphs. Discrete Mathematics, 81:171–175, 1990.
[15] Richard M Karp. The probabilistic analysis of some combinatorial search algorithms. Algorithms and complexity: New directions and recent results, 1:1–19, 1976.
[16] Fernando G. S. L. Brandao, Michael Broughton, Edward Farhi, Sam Gutmann, and Hartmut Neven. For fixed control parameters the quantum approximate optimization algorithm’s objective function value concentrates for typical instances. arXiv:1812.04170, 2018.
[17] Leo Zhou, Sheng-Tao Wang, Soonwon Choi, Hannes Pichler, and Mikhail D. Lukin. Quantum approximate optimization algorithm: Performance, mechanism, and implementation on near-term devices. arXiv:1812.01041, 2018.