This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

The Quantum Approximate Optimization Algorithm Needs to See the Whole Graph: A Typical Case

Edward Farhi Google Inc., Venice CA 90291 and Center for Theoretical Physics, MIT, Cambridge MA, 02139 David Gamarnik Operations Research Center and Sloan School of Management MIT, Cambridge MA, 02140 Sam Gutmann
Abstract

The Quantum Approximate Optimization Algorithm can naturally be applied to combinatorial search problems on graphs. The quantum circuit has p applications of a unitary operator that respects the locality of the graph. On a graph with bounded degree, with p small enough, measurements of distant qubits in the state output by the QAOA give uncorrelated results. We focus on finding big independent sets in random graphs with dn/2 edges keeping d fixed and n large. Using the Overlap Gap Property of almost optimal independent sets in random graphs, and the locality of the QAOA, we are able to show that if p is less than a d-dependent constant times log n, the QAOA cannot do better than finding an independent set of size .854 times the optimal for d large. Because the logarithm is slowly growing, even at one million qubits we can only show that the algorithm is blocked if p is in single digits. At higher p the algorithm “sees” the whole graph and we have no indication that performance is limited.

1 Introduction

The Quantum Approximation Optimization Algorithm [1, 2] is designed to find approximate solutions to combinatorial search problems, and here we consider its application to finding large independent sets in random graphs. The graphs have nn vertices and dn2dn\over 2 edges chosen uniformly at random, with dd, the average degree of each graph, fixed. The quantum algorithm consists of an alternation of 2p2p unitaries, half of which are single-qubit unitaries and the other half only interact qubits that are connected by an edge in the graph. On a bounded-degree graph, with pp fixed or growing slowly with nn, the QAOA does not “see” the whole graph. This means that bits output by the QAOA are uncorrelated at graph distances larger than 2p2p. Looking at random graphs of average degree dd, when 2p2p is less than a multiple of logn\log n we will show that the power of the algorithm is limited. More precisely if 2pwlogn/log(d/ln2)2p\leq w\log n/\log(d/\ln 2) for any w<1w<1 and dd big enough, the QAOA fails to produce an independent set larger than .854 times the optimal. (The ratio of logs is independent of the base of the log.) If pp is large enough that the algorithm “covers” the whole graph we have no indication that the algorithm has limited power.

In the first QAOA paper [1] it was shown that there exists a set of large Max-Cut instances on which the p=2p=2 algorithm fails to achieve an approximation ratio of better than 0.756. These are bipartite 3-regular graphs with o(n)o(n) squares. Although completely satisfiable, at this shallow depth, the QAOA can not detect if there are large odd length loops which would make it not completely satisfiable and hence the approximation ratio is provably less than 11. Recently [3] looking at Max-Cut constructed a sequence of dd-regular bipartite graphs for which the QAOA at depth p<(1/3log2n4)d1p<(1/3~{}\mathrm{log}_{2}n-4)d^{-1} fails to find an approximation ratio better than some dd dependent constant less than 11. This result is similar to the one in this paper as it considers pp growing logarithmically with nn. However theirs is a worst case result and ours is for typical instances. Also crucially [3] require that the cost function have a Z2Z_{2} symmetry and that the initial state be an eigenstate of the Z2Z_{2} operator. In our setup these conditions are not needed and for the problem we study they are not met.

Our proof method uses the Overlap Gap Property (OGP) exhibited by the large independent sets of random graphs with bounded average degree established in [4]. Roughly speaking the OGP says that for a given random graph, the intersection of any two large (i.e. nearly optimal) independent sets is either big or small, that is, there is no middle ground. The OGP was established to be an obstruction to a variety of classical algorithms, including local algorithms [4],[5],[6], Markov Chain Monte Carlo (and related) methods, [7],[8],[9], and Approximate Message Passing type algorithms [10]. The application of the OGP as a barrier to quantum algorithms is novel. It depends on the locality of the QAOA: the unitary operators in the algorithm only connect vertices which are connected in the input graph. As a result, because of the bounded average degree, when pp is a small multiple of logn\log n, changing a single edge of the graph affects only o(n)o(n) qubits of the final state. We will use this to show that outputting a large independent set contradicts the OGP.

It is worth remarking that the OGP is conjectured to not hold for the low energy configurations of the Sherrington-Kirkpatrick model. Assuming that the not hold conjecture is true, Montanari recently [11] constructed a polynomial time Approximate Message Passing algorithm for finding a near ground state configuration.

The QAOA has been applied to the Sherrington Kirkpatrick model at fixed pp in the infinite nn limit [12]. The associated graph is fully connected (each vertex has degree n1n-1) so the QAOA sees the whole graph at the lowest values of pp. The techniques of this paper for bounded degree graphs cannot be applied to show any obstacle to the performance of the QAOA on this model.

2 Maximum Independent Set

The computational problem we focus on is Maximum Independent Set or MIS. Given a graph defined as a collection of vertices and edges, an independent set is a subset of the vertices with no graph edge between any two vertices in the set. It is easy to find a small independent set. Finding a big independent set is the challenge. In fact finding the largest independent set in an arbitrary graph is an NP-hard problem. But here we focus on random graphs and are interested in finding big independent sets but not necessarily the biggest. We define α(G)\alpha(G) as the independence ratio which is the size of the biggest independent set in GG divided by the number of vertices. Given a graph GG, an algorithm will output an independent set and the quality of the algorithm can be measured by the size of the output divided by nn as compared to α(G)\alpha(G).

We focus on sparse Erdös-Rényi graphs of nn nodes and mm edges where m=dn2m={dn\over 2} with dd fixed, that is independent of nn. In other words 𝔾(n,dn2)\mathbb{G}(n,{dn\over 2}) is a graph with dn2{dn\over 2} edges chosen by picking this number of edges uniformly at random from the n(n1)/2n(n-1)/2 possible edges. The average degree of each graph is dd. The independence ratio α(𝔾(n,dn2))\alpha(\mathbb{G}(n,{dn\over 2})) of this graph denoted by αn,d\alpha_{n,d} is a random variable with the following known properties. First [13] there exists an αd\alpha_{d} such that

αn,dαd with probability 1 as n.\displaystyle\alpha_{n,d}\to\alpha_{d}\text{ with probability $1$ as }n\to\infty. (1)

Second, while the value of αd\alpha_{d} for finite dd is unknown, the asymptotic value of αd\alpha_{d} as dd increases is known [14]:

limdαd2lnd/d=1.\displaystyle\lim_{d\to\infty}{\alpha_{d}\over 2\ln d/d}=1. (2)

We are interested in algorithms that take a graph as input and output a large independent set. The natural question that arises is if a polynomial time algorithm can produce independent sets close to αd\alpha_{d}. A simple greedy algorithm achieves asymptotically half of the optimal value, that is, it constructs an independent set of size n(lnd/d)(1+od(1))n(\ln d/d)(1+o_{d}(1)) where od(1)o_{d}(1) denotes a function of dd converging to 0 as dd\to\infty. Finding a polynomial time algorithm that provably goes beyond this would be a major achievement and it mirrors a similar problem in the context of dense Erdös-Rényi graphs, which has been open for more than four decades [15]. Our interest is a quantum algorithm, the QAOA. We will show that if pp, the depth of the QAOA is less than ϵ\epsilon log nn with ϵ\epsilon a small constant, then the QAOA fails to go as far as .854αd.854\alpha_{d} for dd large. For larger pp our arguments do not apply and we cannot say if the QAOA gets close to αd\alpha_{d} with pp say growing as a large constant times log nn. (Actually if we let pp grow fast enough with nn the QAOA will find the optimum [1].)

3 The Quantum Approximate Optimization Algorithm

We start by reviewing the QAOA with the graph problem Maximum Independent Set in mind. It is convenient here to work with bits that are 0,10,1 valued. Given a classical cost function C(𝒃)C({\boldsymbol{b}}) defined on nn-bit strings 𝒃=(b1,b2,,bn){0,1}n{\boldsymbol{b}}=(b_{1},b_{2},\ldots,b_{n})\in\{0,1\}^{n}, the QAOA is a quantum algorithm that aims to find a string 𝒃{\boldsymbol{b}} such that C(𝒃)C({\boldsymbol{b}}) is close to its absolute maximum. The graph-dependent cost function CC can be written as an operator that is diagonal in the computational basis, defined as

C|𝒃=C(𝒃)|𝒃.C\left|{\boldsymbol{b}}\right\rangle=C({\boldsymbol{b}})\left|{\boldsymbol{b}}\right\rangle. (3)

We only consider “local” cost functions, that is, those that only have interactions between qubits that are connected on the instance graph. The problem dependent unitary operator depends on CC and a single parameter γ\gamma

U(C,γ)=eiγC.\displaystyle U(C,\gamma)=e^{-i\gamma C}. (4)

Note that U(C,γ)U(C,\gamma) conjugating a single qubit operator produces an operator that only involves that qubit and those connected to it on the graph.

The operator that induces transitions between strings uses

B=j=1nXj,\displaystyle B=\sum_{j=1}^{n}X_{j}, (5)

where XjX_{j} is the Pauli XX operator acting on qubit jj, and the associated unitary operator depends on a parameter β\beta

U(B,β)=eiβB=j=1neiβXj.\displaystyle U(B,\beta)=e^{-i\beta B}=\prod_{j=1}^{n}e^{-i\beta X_{j}}. (6)

Note that U(B,β)U(B,\beta) conjugating a single qubit rotates that qubit and has no effect on other qubits.

We initialize the system of qubits in a product state such as

|s=|0n\displaystyle\left|s\right\rangle=\left|0\right\rangle^{\otimes n} (7)

or

|s=|+n=12n𝒃|𝒃.\left|s\right\rangle=\left|+\right\rangle^{\otimes n}=\frac{1}{\sqrt{2^{n}}}\sum_{{\boldsymbol{b}}}\left|{\boldsymbol{b}}\right\rangle. (8)

Using a product state for the initial state is the usual choice for the QAOA and is required for the arguments below.

We alternately apply pp layers of U(C,γ)U(C,\gamma) and U(B,β)U(B,\beta). Let 𝜸=γ1,γ2,,γp\boldsymbol{\gamma}=\gamma_{1},\gamma_{2},\ldots,\gamma_{p} and 𝜷=β1,β2,,βp\boldsymbol{\beta}=\beta_{1},\beta_{2},\ldots,\beta_{p}. The QAOA circuit prepares the unitary operator

U=U(B,βp)U(C,γp)U(B,β1)U(C,γ1)\displaystyle U=U(B,\beta_{p})U(C,\gamma_{p})\cdots U(B,\beta_{1})U(C,\gamma_{1}) (9)

which acting on the initial state gives

|𝜸,𝜷=U|s.\displaystyle\left|\boldsymbol{\gamma},\boldsymbol{\beta}\right\rangle=U\left|s\right\rangle. (10)

The associated QAOA objective function is

𝜸,𝜷|C|𝜸,𝜷.\displaystyle\left\langle\boldsymbol{\gamma},\boldsymbol{\beta}\right|C\left|\boldsymbol{\gamma},\boldsymbol{\beta}\right\rangle. (11)

By repeatedly measuring the quantum state |𝜸,𝜷\left|\boldsymbol{\gamma},\boldsymbol{\beta}\right\rangle in the computational basis, one will find a bit string 𝒃{\boldsymbol{b}} such that C(𝒃)C({\boldsymbol{b}}) is near (11) or better. Different strategies have been developed to find optimal (𝜸,𝜷)(\boldsymbol{\gamma},\boldsymbol{\beta}) for any given instance [16, 17]. But here we will show that under certain circumstances no set of parameters (𝜸,𝜷)(\boldsymbol{\gamma},\boldsymbol{\beta}) can achieve a certain level of success so we need not concern ourselves with optimal parameters. So from now on we denote the state produced by the QAOA as |ψ\left|\psi\right\rangle.

4 Locality Properties of the QAOA

In this paper we are focusing on combinatorial search problem associated with random graphs of bounded average degree dd, so it it very unlikely that any vertex has degree much larger than dd. This means that the cost function unitary (4) conjugating say a single qubit operator typically produces an operator acting on no more than roughly dd qubits. The “driver” unitary in the form of (6) introduces no spreading at all. We also use for the initial state a product state which has no entanglement. With this form of the QAOA we can establish some general locality properties of the quantum state produced by the quantum circuit. What follows is not restricted to a particular computational problem or to random graphs.

4.1 Distant qubits

Consider an instance of some graph problem with its associated local cost function CC. The first property has to do with bits that are far away from each other on the graph. Define B(i,ri,r) as the set of vertices that are within a distance rr of the vertex ii. Let Dist(i,2pi,2p) be the complement of B(i,2p)i,2p), that is, it is the set of vertices at least 2p2p away from ii. We are assuming than 2p2p is small enough so that Dist(i,2pi,2p) is not empty. Let OiO_{i} be an operator acting on qubit ii tensored with the identity acting on all other qubits. Let OdistO_{\text{dist}} be an operator acting only on the qubits in Dist(i,2pi,2p). We now show that

s|UOiOdistU|s=s|UOiU|ss|UOdistU|s\left\langle s\right|U^{\dagger}O_{i}O_{\text{dist}}U\left|s\right\rangle=\left\langle s\right|U^{\dagger}O_{i}U\left|s\right\rangle\left\langle s\right|U^{\dagger}O_{\text{dist}}U\left|s\right\rangle (12)

as long as |s\left|s\right\rangle is a product state and UU is of the form (7) with a local cost function.

Proof of (12) . Because the QAOA is local we see that UOiUU^{\dagger}O_{i}U only involves qubits in B(i,pi,p). Because OdistO_{\text{dist}} only involves qubits that are 2p2p away from qubit ii we see that UOdistUU^{\dagger}O_{\text{dist}}U can only involve qubits in the complement of B(i,pi,p). Now

s|UOiOdistU|s=s|UOiUUOdistU|s\left\langle s\right|U^{\dagger}O_{i}O_{\text{dist}}U\left|s\right\rangle=\left\langle s\right|U^{\dagger}O_{i}UU^{\dagger}O_{\text{dist}}U\left|s\right\rangle (13)

and we will insert between UU and UU^{\dagger} a complete set with qubits in B(i,pi,p) and its complement. Call ``near"``\mathrm{near}" the set of bits in B(i,pi,p) and those in its complement “far\mathrm{far}”. Now the initial state is a product state which we can write as |s=|snear|sfar\left|s\right\rangle=\left|s_{\mathrm{near}}\right\rangle\left|s_{\mathrm{far}}\right\rangle. Insert a complete set in the middle of the right hand side of (13) and we get

=𝐯near𝐯farsnear|sfar|UOiU|𝐯near|𝐯far𝐯near|𝐯far|UOdistU|snear|sfar\displaystyle=\sum_{\bf{v}_{\mathrm{near}}}\sum_{\bf{v}_{\mathrm{far}}}\left\langle s_{\mathrm{near}}\right|\left\langle s_{\mathrm{far}}\right|U^{\dagger}O_{i}U\left|\bf{v}_{\mathrm{near}}\right\rangle\left|\bf{v}_{\mathrm{far}}\right\rangle\left\langle\bf{v}_{\mathrm{near}}\right|\left\langle\bf{v}_{\mathrm{far}}\right|U^{\dagger}O_{\mathrm{dist}}U\left|s_{\mathrm{near}}\right\rangle\left|s_{\mathrm{far}}\right\rangle (14)

where the basis set |𝐯near\left|\bf{v}_{\mathrm{near}}\right\rangle contains |snear\left|s_{\mathrm{near}}\right\rangle and the basis set |𝐯far\left|\bf{v}_{\mathrm{far}}\right\rangle contains |sfar\left|s_{\mathrm{far}}\right\rangle. Now the UOiUU^{\dagger}O_{i}U term collapses the sum on 𝐯far\bf{v}_{\mathrm{far}} and the UOdistUU^{\dagger}O_{\text{dist}}U term collapses the sum on 𝐯near\bf{v}_{\mathrm{near}} and we get

snear|sfar|UOiU|snear|sfarsnear|sfar|UOdistU|snear|sfar\left\langle s_{\mathrm{near}}\right|\left\langle s_{\mathrm{far}}\right|U^{\dagger}O_{i}U\left|s_{\mathrm{near}}\right\rangle\left|s_{\mathrm{far}}\right\rangle\left\langle s_{\mathrm{near}}\right|\left\langle s_{\mathrm{far}}\right|U^{\dagger}O_{\mathrm{dist}}U\left|s_{\mathrm{near}}\right\rangle\left|s_{\mathrm{far}}\right\rangle (15)

which results in (12).

In terms of the state |ψ\left|\psi\right\rangle produced by the QAOA (12) says that

ψ|OiOdist|ψ=ψ|Oi|ψψ|Odist|ψ.\displaystyle\left\langle\psi\right|O_{i}O_{\mathrm{dist}}\left|\psi\right\rangle=\left\langle\psi\right|O_{i}\left|\psi\right\rangle\left\langle\psi\right|O_{\mathrm{dist}}\left|\psi\right\rangle. (16)

Again OiO_{i} is any operator acting on qubit ii and OdistO_{\mathrm{dist}} is any operator acting on qubits at least 2p2p away from ii and we see that the measurement outcomes of the two operators are independent. In particular measurement in the state |ψ\left|\psi\right\rangle of bit values at ii and in Dist are independent.

4.2 Far from an edge

For the next property imagine changing the cost function on a single edge. This locality property concerns the influence of this change on qubits that are far away from that edge. Consider two instances of some computational problem which differ only by the presence or absence of a single edge. Or perhaps they differ because the single edge is weighted differently in the two instances. Call the associated cost functions CC and CC^{\prime} which give rise to UU and UU^{\prime} through (4). Let the edge in question be between vertices ii and jj. Let Far(ij,p)(ij,p) be the complement of B(i,p)B(j,p)\mathrm{B}(i,p)\cup\mathrm{B}(j,p). We assume that pp is small enough that Far(ij,p)(ij,p) is not empty. Consider an operator OfarO_{\text{far}} that involves only qubits in Far(ij,p)(ij,p). Now for the depth pp algorithm, UOfarU\ U^{\dagger}O_{\text{far}}U does not involve the edge ijij so it is same with UU replaced by UU^{\prime} that is

UOfarU=UOfarU.\ U^{\prime{\dagger}}O_{\text{far}}U^{\prime}=\ U^{\dagger}O_{\text{far}}U. (17)

What this means is that the influence of the edge in question is limited to qubits within a distance pp of the edge. It also means that the probability of measuring a bit string in Far(ij,p)(ij,p) is unaffected by the change in the edge ijij. Let |ψ=U|s\left|\psi\right\rangle=U\left|s\right\rangle and |ψ=U|s\left|\psi^{\prime}\right\rangle=U^{\prime}\left|s\right\rangle so these are the states produced with the unmodified and modified edge sets. Consider bfar\textbf{b}_{\text{far}} which is the bit values of the set of bits in Far(ij,p)(ij,p). Let Ofar=|bfarbfar|O_{\text{far}}=\left|\textbf{b}_{\text{far}}\right\rangle\left\langle\textbf{b}_{\text{far}}\right| tensored with the identity on qubits in B(i,p)B(j,p)\mathrm{B}(i,p)\cup\mathrm{B}(j,p). Now write |𝒃=|𝒃near|𝒃far\left|{\boldsymbol{b}}\right\rangle=\left|{\boldsymbol{b}}_{\text{near}}\right\rangle\left|{\boldsymbol{b}}_{\text{far}}\right\rangle and take the expectation of (17) in the state |𝒃\left|{\boldsymbol{b}}\right\rangle. Keep bfar\textbf{b}_{\text{far}} fixed and sum on bnear\textbf{b}_{\text{near}} to get

𝒃near|𝒃near|bfar|ψ|2=𝒃near|𝒃near|bfar|ψ|2.\sum_{{\boldsymbol{b}}_{\mathrm{near}}}\big{|}\langle{{\boldsymbol{b}}_{\text{near}}}\big{|}\left\langle\textbf{b}_{\text{far}}\right|\psi\rangle\big{|}^{2}=\sum_{{\boldsymbol{b}}_{\mathrm{near}}}\big{|}\left\langle{\boldsymbol{b}}_{\text{near}}\right|\left\langle\textbf{b}_{\text{far}}\right|\psi^{\prime}\rangle\big{|}^{2}. (18)

This means that the probability of measuring the bit string bfar\textbf{b}_{\text{far}} in Far(ij,p)(ij,p) is unaffected by the edge change.

4.3 Concentration of Hamming weight

Again we are considering the QAOA with a local cost function associated with a graph GG, a one local driver operator such as in (6), and a product state for |s\left|s\right\rangle. For fixed pp each vertex ii has a neighborhood B(i,2p)i,2p). We take pp small enough that the maximum size of these neighborhoods is less than nAn^{A} for some A<1A<1. Run the QAOA to get the state |ψ\left|\psi\right\rangle and measure in the computational basis to get a bit string. We now show that the Hamming weight of these measured bit strings concentrates in that, for sufficiently large nn, each measurement produces the same Hamming weight with a variance that is o(n)o(n).

Let the Hamming weight operator be

W=ibi.W=\sum_{i}b_{i}. (19)

The measurement variance of WW is

ψ|W2|ψψ|W|ψ2\left\langle\psi\right|W^{2}\left|\psi\right\rangle-\left\langle\psi\right|W\left|\psi\right\rangle^{2} (20)

which breaks into n2n^{2} terms

ij[ψ|bibj|ψψ|bi|ψψ|bj|ψ]].\sum_{i}\sum_{j}\Big{[}\left\langle\psi\right|b_{i}b_{j}\left|\psi\right\rangle-\left\langle\psi\right|b_{i}\left|\psi\right\rangle\left\langle\psi\right|b_{j}\left|\psi\right\rangle]~{}\Big{]}. (21)

Now for a fixed ii consider the sum on jj. If jj is more than 2p2p away from ii we can use (12) to see that this term is 0. So the only contributions can come from the nAn^{A} nearby qubits and we bound the variance by n(1+A)n^{(1+A)}. The expected value of the Hamming weight scales with nn so the distribution concentrates.

However we need a stronger result for our arguments. Let GG be a graph with nn vertices and as before take pp small enough that the maximum size of B(i,2pi,2p) is less than nAn^{A} for some A<1A<1 with high probability. We are thinking of nn as large. For a given QAOA circuit we make the state |ψ\left|\psi\right\rangle and measure the Hamming weight. Call the observed value WobsW_{\text{obs}}. Then there exists γ>0\gamma>0 such that for every δ>0\delta>0 and for nn large enough

G[|Wobsψ|W|ψ|δn]eδnγ,\displaystyle\mathbb{P}_{G}\Big{[}~{}\big{|}W_{\text{obs}}-\left\langle\psi\right|W\left|\psi\right\rangle\big{|}\geq\delta n\Big{]}\leq e^{-\delta n^{\gamma}}, (22)

where the graph is fixed and the probability is over measurements of WW in the state |ψ\left|\psi\right\rangle. We are going to prove this in section 8 and also apply it to random graphs.

5 QAOA applied to Maximum Independent Set

We have discussed the QAOA in general and here we specify it for MIS. For any graph we are looking for a big independent set, that is, a string with a large Hamming weight which corresponds to an independent set. If we choose for the cost function the Hamming weight given by (19) we will easily discover strings with a big Hamming weight but they typically will not be independent sets of the input graph. So we also consider the Independent Set cost function:

CIS=ijbibjC_{\text{IS}}=\sum_{\langle ij\rangle}b_{i}b_{j} (23)

where the sum is only over edges in the graph so CISC_{\text{IS}} is local. We want a big Hamming weight WW and CISC_{\text{IS}} to be as small as possible. So for the objective cost function consider:

Cobj=WCIS.C_{\text{obj}}=W-C_{\text{IS}}. (24)

When we run the QAOA our goal is to make CobjC_{\text{obj}} big. But the cost function that appears in (4) need not be CobjC_{\text{obj}}. We can for starters take the cost function that appears in (4) to be CISC_{\text{IS}} with the goal of making the quantum expectation of CobjC_{\text{obj}} big. Regardless of what we take to drive this local QAOA, the strings that are output will not be independent sets of the associated graph. However these strings can be pruned to produce independent sets.

Suppose the quantum algorithm outputs a string with a positive value of CobjC_{\text{obj}}. By pruning we can produce an independent set of size at least this value. To see this consider the set of graph edges that exist between any of the vertices associated with 11’s in the output string. Call the number of these edges NEN_{E}. Now for each of these edges pick one of the two associated vertices at random and remove it from the string. This reduces the Hamming weight by at most NEN_{E} and reduces CISC_{\text{IS}} by NEN_{E} so CobjC_{\text{obj}} can not go down. This also shows that the maximum of CobjC_{\text{obj}} is the size of the largest independent set. We call the QAOA augmented by this pruning the QAOA+. Note that the pruning process is done randomly and respects the locality of the underlying graph. When we run the QAOA+ at depth pp we mean that the QAOA is run at depth p1p-1 and the pruning is the last layer.

We now show that the shallowest depth version of the QAOA will produce a string with the objective function value being near 1.02n/d1.02n/d for large nn. Here we pick for the starting state a rotation away from the all zeros state so that the initial Hamming weight is not zero. Introduce a parameter θ\theta and for |s\left|s\right\rangle take

|s=U(B,θ)|0\left|s\right\rangle=U(B,\theta)\left|0\right\rangle (25)

so the QAOA state is given by the three parameter unitary:

|ψ=U(B,β)U(CIS,γ)U(B,θ)|0\left|\psi\right\rangle=U(B,\beta)~{}U(C_{\text{IS}},\gamma)~{}U(B,\theta)\left|0\right\rangle (26)

which we call the QAOA with p=1.5p=1.5. Consider the simple case of γ=0\gamma=0 so the two rotations combine to be one rotation by θ+β\theta+\beta. This brings the state |0\left|0\right\rangle to one where each bit bib_{i} has expected value sin2(θ+β)\sin^{2}(\theta+\beta). Now the expected value of CobjC_{\text{obj}} is n[sin2(θ+β)(d/2)sin4(θ+β)]n[\sin^{2}(\theta+\beta)-(d/2)\sin^{4}(\theta+\beta)] whose maximum is 1/2d1/2d at sin(θ+β)=1/d\sin(\theta+\beta)=1/\sqrt{d}. Letting γ\gamma vary can only improve this.

For arbitrary θ\theta, γ\gamma and β\beta we can evaluate the expectation of CobjC_{\text{obj}} in the state |ψ\left|\psi\right\rangle by averaging over instances. It is best to write (23) as

CIS=ijJijbibjC_{\text{IS}}=\sum_{ij}J_{ij}b_{i}b_{j} (27)

where each JijJ_{ij} is 11 with probability d/nd/n and 0 with probability 1d/n1-d/n. Note that this is not exactly the same as the distribution we analyze in the rest of the paper which is a fixed number of edges nd/2nd/2. But for the purposes of this calculation including edges with probability d/nd/n is more straightforward. In fact we can do the average over JJ and get the expected value over graphs of the quantum expectation of the objective function

Ex[ψ|Cobj|ψ]\displaystyle\mathrm{Ex}\big{[}\left\langle\psi\right|C_{\text{obj}}\left|\psi\right\rangle\big{]} (28)

as nn times an explicit function of dd and the parameters θ\theta, γ\gamma and β\beta. For each dd we can optimize numerically. For d=3d=3 we get .969n/3.969n/3. We see for large dd that (27) approaches 1.02n/d1.02n/d and the parameters θ\theta and β\beta go down as d\sqrt{d}. Pulling out a 1/d1/d and rescaling θ\theta and β\beta we have a function that has a limit as dd goes to infinity. The optimum of this function is 1.021.02.

What we have shown is that there is a form of the QAOA that we call the QAOA+ which at its lowest depth finds an independent set whose size is at least a constant times n/dn/d. There are simple classical algorithms that can beat this. Our goal here was to show that the lowest depth QAOA+ can be analyzed on random instances and that it is a good starting point for understanding the QAOA at higher depth.

6 The Overlap Gap Property

We now describe the Overlap Gap Property which is the key to showing the limitation of the QAOA on random graphs. For random graphs, this property is satisfied by independent sets with size larger than a certain multiplicative factor η(0,1)\eta^{*}\in(0,1) away from optimality. Specifically, this will be done for

η=12+122=.853.\displaystyle\eta^{*}={1\over 2}+{1\over 2\sqrt{2}}=.853.... (29)

For each η(0,1)\eta\in(0,1) and a graph GG on nn nodes let

(η,G)={σ(G):|σ|nηαd}\displaystyle\mathcal{I}(\eta,G)=\{\sigma\in\mathcal{I}(G):|\sigma|\geq n\eta\alpha_{d}\} (30)

where (G)\mathcal{I}(G) is the set of all independent sets. That is, (η,G)\mathcal{I}(\eta,G) is the set of independent sets in GG which are of size at least η\eta times the asymptotic optimal. We call these large independent sets η\eta-optimal. For any two nn node graphs G1G_{1} and G2G_{2}, and for every 0<τη0<\tau\leq\eta, let 𝒪Ab(η,G1,G2,τ)\mathcal{O}^{Ab}(\eta,G_{1},G_{2},\tau) denote the set of pairs σ1,σ2\sigma_{1},\sigma_{2} such that σj(η,Gj),j=1,2\sigma_{j}\in\mathcal{I}(\eta,G_{j}),j=1,2 and

|σ1σ2|nταd,\displaystyle|\sigma_{1}\cap\sigma_{2}|\geq n\tau\alpha_{d}, (31)

that is, it is the set of pairs of η\eta-optimal independent sets whose intersection size normalized by the asymptotic size of the largest independent set is above τ\tau. Similarly, let 𝒪Be(η,G1,G2,τ)\mathcal{O}^{Be}(\eta,G_{1},G_{2},\tau) denote the set of pairs σ1,σ2\sigma_{1},\sigma_{2} such that σj(η,Gj),j=1,2\sigma_{j}\in\mathcal{I}(\eta,G_{j}),j=1,2 and

|σ1σ2|nταd,\displaystyle|\sigma_{1}\cap\sigma_{2}|\leq n\tau\alpha_{d}, (32)

that is, it is the set of pairs of η\eta-optimal independent sets whose intersection size normalized by the asymptotic size of the largest independent set is below τ\tau.

Let G0G_{0} and GmG_{m} be chosen independently with distribution 𝔾(n,dn2)\mathbb{G}(n,{dn\over 2}). We are going to introduce an interpolation between these two graphs. Let (i1a,j1a),,(ima,jma)(i^{a}_{1},j^{a}_{1}),\ldots,(i^{a}_{m},j^{a}_{m}) with a=0,ma=0,m be the corresponding sets of mm edges of the two graphs. For every t=0,1,2,,mt=0,1,2,\ldots,m consider an interpolating random graph GtG_{t} with edges

(i10,j10),,(imt0,jmt0),(imt+1m,jmt+1m),,(imm,jmm).\displaystyle(i^{0}_{1},j^{0}_{1}),\ldots,(i^{0}_{m-t},j^{0}_{m-t}),(i^{m}_{m-t+1},j^{m}_{m-t+1}),\ldots,(i^{m}_{m},j^{m}_{m}). (33)

GtG_{t} uses the first mtm-t edges from G0G_{0} and the remaining tt edges from GmG_{m}. For each fixed tt, the graph GtG_{t} is distributed as 𝔾(n,m)\mathbb{G}(n,m), modulo the potential repetitions of edges. With high probability, the total number of edge repetitions is O(1)O(1) with respect to nn so they can be ignored.

Theorem: Overlap Gap Property For every η>η\eta>\eta^{*} there exists 0<τ1<τ2<η,d00<\tau_{1}<\tau_{2}<\eta,d_{0} and c>0c>0 such that for all d>d0d>d_{0}

Prob[0t1,t2ms.t.𝒪Ab(η,Gt1,Gt2,τ1)𝒪Be(η,Gt1,Gt2,τ2)]exp(cn),\displaystyle\mathrm{Prob}\Big{[}\exists\quad 0\leq t_{1},t_{2}\leq m\quad s.t.\quad\mathcal{O}^{Ab}(\eta,G_{t_{1}},G_{t_{2}},\tau_{1})\cap\mathcal{O}^{Be}(\eta,G_{t_{1}},G_{t_{2}},\tau_{2})\neq\emptyset\Big{]}\leq\exp(-cn), (34)

and

Prob[𝒪Ab(η,G0,Gm,τ1)]exp(cn).\displaystyle\mathrm{Prob}\Big{[}\mathcal{O}^{Ab}(\eta,G_{0},G_{m},\tau_{1})\neq\emptyset\Big{]}\leq\exp(-cn). (35)

for all large enough nn.

The theorem makes two claims. First it says that across all pairs Gt1,Gt2G_{t_{1}},G_{t_{2}} of graphs in the interpolating sequence, every η\eta-optimal independent set σ1\sigma_{1} in Gt1G_{t_{1}} and every η\eta-optimal independent set σ2\sigma_{2} in Gt2G_{t_{2}} have normalized intersection either at most τ1\tau_{1} or at least τ2\tau_{2}, except for an exponentially small probability. There is essentially no middle ground of pairs whose normalized intersection size is between τ1\tau_{1} and τ2\tau_{2}. This is the Overlap Gap Property. The second says, for two independent random graphs sampled from 𝔾(n,dn2)\mathbb{G}(n,{dn\over 2}) all corresponding pairs of large independent sets have normalized intersection at most τ1\tau_{1}. For a proof and further discussion see [4].

7 Overlap Gap Property is an obstruction to the QAOA+

7.1 Main Result

Our main result is that the QAOA+, which is the QAOA augmented by pruning to produce independent sets, applied to random graphs of average degree dd will fail to find an independent set close to optimal for pp less than a constant times logn\log n. More precisely:

Obstruction Theorem

For every w<1w<1 and η>η\eta>\eta^{*}, there is a γ>0\gamma>0 and a d0d_{0} such that for d>d0d>d_{0} we have: If the QAOA+ is run on a 𝔾(n,dn2)\mathbb{G}(n,{dn\over 2}) graph with

2pwlognlog(d/ln2),\displaystyle 2p\leq{w\log n\over\log(d/\ln{2})}, (36)

then the probability that the algorithm outputs an independent set of size at least ηαdn\eta\alpha_{d}n is at most enγe^{-n^{\gamma}} for all nn sufficiently large.

We will prove this after stating some preliminary results. The first concerns the size of the neighborhoods of vertices in random graphs with m=nd2m={nd\over 2} edges.

7.2 Preliminary Results

Neighborhood Size Theorem

Fix d>1d>1, and w<1w<1. If

2pwlognlog(d/ln2),\displaystyle 2p\leq{w\log n\over\log(d/\ln{2})}, (37)

then there exist a>0a>0 and A<1A<1 such that

Prob[maxiB(i,2p)nA]ena\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,2p)\geq n^{A}\Big{]}\leq e^{-n^{a}} (38)
and\displaystyle and~{}~{}~{} Prob[maxiB(i,p)nA/2]ena/2.\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,p)\geq n^{A/2}\Big{]}\leq e^{-n^{a/2}}. (39)

Here Prob is with respect to the graph distribution. What this says is that if pp is smaller than a certain constant times logn\log n, then in a random graph the neighborhood ball of each vertex contains a fraction of the vertices that vanishes as nn goes to infinity. We will prove this theorem in the next section.

The next result is an expansion of the purely quantum result in section 4.2 to the QAOA+ which is the depth p1p-1 QAOA run with pruning so all outputs are independent sets. The result (18) says that if a single edge in a graph is modified then the measurement probabilities of qubits far away are unaffected by the edge change. This is a statement about the QAOA but the QAOA outputs strings that need to be pruned back to make independent sets. The pruning process is random and respects the locality of the underlying graph. Let G(σ)\mathbb{P}_{G}(\sigma) be the probability that the independent set σ\sigma is output by the QAOA+ running on graph GG. Here the symbol G\mathbb{P}_{G} means the graph GG is fixed and the randomness comes from the quantum measurement and the randomized pruning. So we now state

Far from an edge Lemma

Consider an arbitrary graph GG with GG^{\prime} obtained by adding a single edge (i,j)(i,j) to GG. Let Far\mathrm{Far} be the complement of B(i,p)B(j,p)\mathrm{B}(i,p)\cup\mathrm{B}(j,p) where pp is the depth of the QAOA+. Then for every σ{0,1}n\sigma\in\{0,1\}^{n}

σ^:σ^k=σk,kFarG(σ^)=σ^:σ^k=σk,kFarG(σ^).\displaystyle\sum_{\hat{\sigma}:\hat{\sigma}_{k}=\sigma_{k},k\in\mathrm{Far}}\mathbb{P}_{G}(\hat{\sigma})=\sum_{\hat{\sigma}:\hat{\sigma}_{k}=\sigma_{k},k\in\mathrm{Far}}\mathbb{P}_{G^{\prime}}(\hat{\sigma}). (40)

Here we are assuming that pp is small enough that Far is not empty. The proposition says that the total probability of independent sets which “agree” with σ\sigma on the node set Far, that is those nodes far from the newly added edge, remains the same after the addition of the edge. The proposition is the direct implication of the local property (18) augmented by the fact that pruning is local.

Our next result is concerns the concentration of Hamming weight discussed in section 4.3 now extended to the QAOA+.

Concentration Theorem

1. Let GG be a graph with nn vertices and dn2{dn\over 2} edges. Suppose pp is chosen such that maxi|BGn(i,2p)|nA\max_{i}|\mathrm{B}_{G_{n}}(i,2p)|\leq n^{A} for some 0<A<10<A<1. Let σ\sigma be the output of the QAOA+. Then there exists γ1>0\gamma_{1}>0 such that for all δ>0\delta>0

G[σ|𝔼G|σδn]eδnγ1\displaystyle\mathbb{P}_{G}\Big{[}~{}\big{|}|\sigma|-\mathbb{E}_{G}|\sigma|\big{|}\geq\delta n~{}\Big{]}\leq e^{-\delta n^{\gamma_{1}}} (41)

for nn large enough. Here the graph is fixed and the randomness comes from the QAOA+.

2. Now let GG be a random 𝔾(n,dn2)\mathbb{G}(n,{dn\over 2}) graph. Suppose for some a>0a>0,

Prob[maxiB(i,2p)nA]ena\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,2p)\geq n^{A}\Big{]}\leq e^{-n^{a}}\ (42)
and\displaystyle and~{}~{}~{} Prob[maxiB(i,p)nA/2]ena/2.\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,p)\geq n^{A/2}\Big{]}\leq e^{-n^{a/2}}. (43)

Then there exists γ2>0\gamma_{2}>0 such that for any 0<δ<10<\delta<1,

Prob[σ|Ex|σδn]eδ2nγ2\displaystyle\mathrm{Prob}\Big{[}~{}\big{|}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq\delta n~{}\Big{]}\leq e^{-{\delta^{2}}n^{\gamma_{2}}} (44)

for nn large enough. Here Prob\mathrm{Prob} and Ex\mathrm{Ex} refer to Probability and Expectation coming from the distribution of random graphs combined with the randomness coming from the QAOA+.

This key concentration result will be proven in the next section. We now prove our main result.

7.3 Proof of Obstruction Theorem

Choose w<1w<1. Find A<1A<1 and aa as per the Neighborhood Size Theorem. Consider the interpolation Gt,0tmG_{t},0\leq t\leq m described in the previous section. Denoting by BGt(i,p)\mathrm{B}_{G_{t}}(i,p) the neighborhood of node ii in graph GtG_{t}, we have by the union bound

Prob[maxtmaxi|BGt(i,p)|nA/2]dn2ena/2ena,\displaystyle\mathrm{Prob}\left[\max_{t}\max_{i}|\mathrm{B}_{G_{t}}(i,p)|\geq n^{A/2}\right]\leq{dn\over 2}e^{-n^{a/2}}\leq e^{-n^{a^{\prime}}}, (45)

where aa^{\prime} is any constant smaller than a2a\over 2 and nn is large enough. Let D=maxtmaxi|BGt(i,p)|D=\max_{t}\max_{i}|\mathrm{B}_{G_{t}}(i,p)|. On the sequence of graphs GtG_{t} we construct a coupled sequence of independent sets σt,0tm\sigma_{t},0\leq t\leq m, where for each fixed tt, σt\sigma_{t} is a single independent set coming from the distribution of the output of the QAOA+ on the random graph GtG_{t}. The set-theoretic difference Δ\Delta of σt\sigma_{t} and σt+1\sigma_{t+1} will be seen to satisfy

|σtΔσt+1|4D.\displaystyle|\sigma_{t}\Delta\sigma_{t+1}|\leq 4D. (46)

By “construct” we do not mean an algorithmically efficient construction, rather we show that such a coupled sequence exists. First produce a single sample σ0=(σ0,1,,σ0,n)\sigma_{0}=(\sigma_{0,1},\ldots,\sigma_{0,n}) by running the QAOA+ on G0G_{0}. Next, recall that G1G_{1} is obtained from G0G_{0} by deleting G0G_{0}’s last edge (im0,jm0)(i^{0}_{m},j^{0}_{m}) and adding GmG_{m}’s last edge (imm,jmm)(i^{m}_{m},j^{m}_{m}). Consider the set of nodes

S=BG0(im0,p)BG0(jm0,p)BG1(imm,p)BG1(jmm,p).\displaystyle S=\mathrm{B}_{G_{0}}(i^{0}_{m},p)\cup\mathrm{B}_{G_{0}}(j^{0}_{m},p)\cup\mathrm{B}_{G_{1}}(i^{m}_{m},p)\cup\mathrm{B}_{G_{1}}(j^{m}_{m},p). (47)

Note that SS has size at most 4D4D. Find a sample σ1\sigma_{1} according to the conditional distribution

G1(σ1|σ1,i=σ0,iiS).\displaystyle\mathbb{P}_{G_{1}}(\sigma_{1}|\sigma_{1,i}=\sigma_{0,i}~{}i\notin S). (48)

As a result the cardinality of σ0Δσ1\sigma_{0}\Delta\sigma_{1} is at most 4D4D. In the same fashion, we produce samples σ2,,σm\sigma_{2},\ldots,\sigma_{m}, where each σt\sigma_{t} is found by conditioning on σt1\sigma_{t-1} similarly. Again, |σtΔσt+1|4D|\sigma_{t}\Delta\sigma_{t+1}|\leq 4D for all t=0,,m1t=0,\ldots,m-1.

Next, we claim that σ1\sigma_{1} is distributed as G1\mathbb{P}_{G_{1}}. Indeed for every σ1{0,1}n\sigma_{1}\in\{0,1\}^{n}, its probability mass according to this sampling procedure is

σ0:σ0i=σ1i,iSG0(σ0)[G1(σ1)σ:σi=σ1i,iSG1(σ)]=G1(σ1)\displaystyle\sum_{\sigma_{0}:\sigma_{0i}=\sigma_{1i},i\notin S}\mathbb{P}_{G_{0}}(\sigma_{0})\Bigg{[}\frac{\mathbb{P}_{G_{1}}(\sigma_{1})}{\sum_{\sigma:\sigma_{i}=\sigma_{1i},i\notin S}\mathbb{P}_{G_{1}}(\sigma)}\Bigg{]}=\mathbb{P}_{G_{1}}(\sigma_{1}) (49)

by property (40). Similarly, we have that σt\sigma_{t} is distributed as Gt(σt)\mathbb{P}_{G_{t}}(\sigma_{t}). We have shown that our desired coupled sequence of m+1m+1 independent sets exists.

Note that Ex[|σt|]\mathrm{Ex}[|\sigma_{t}|] is independent of tt, where as before the expectation is both with respect to the randomness of GtG_{t} and the QAOA+. We claim that for every μ>0\mu>0

Ex[|σt|](1+μ)nηαd,\displaystyle\mathrm{Ex}[|\sigma_{t}|]\leq(1+\mu)n\eta^{*}\alpha_{d}, (50)

for all large enough nn. Observe that by the Concentration Theorem it suffices to show (50) in order to prove our main result which is that the QAOA+ fails to produce independent sets that are bigger than η\eta^{*}-optimal.

Assume to the contrary, that (50) is violated for infinitely many nn. For any such nn, given GtG_{t}, σt\sigma_{t} has the distribution Gt\mathbb{P}_{G_{t}}, and GtG_{t} has the 𝔾(n,m)\mathbb{G}(n,m) distribution. Taking δ=μ2ηαd\delta={\mu\over 2}\eta^{*}\alpha_{d} in (44) and then using the union bound, we have

Prob[mint|σt|(1+μ/2)nηαd]1enγ3,\displaystyle\mathrm{Prob}\left[\min_{t}|\sigma_{t}|\geq(1+\mu/2)n\eta^{*}\alpha_{d}\right]\geq 1-e^{-n^{\gamma_{3}}}, (51)

for some γ3>0\gamma_{3}>0. Let η^=(1+μ/2)η\hat{\eta}=(1+\mu/2)\eta^{*}. Find 0<τ1<τ2<η^0<\tau_{1}<\tau_{2}<\hat{\eta} from the Overlap Gap Theorem with respect to η^\hat{\eta}. Then because τ2<η^\tau_{2}<\hat{\eta} we have

Prob[mint|σt|nτ2αd]1enγ3.\displaystyle\mathrm{Prob}\left[\min_{t}|\sigma_{t}|\geq n\tau_{2}\alpha_{d}\right]\geq 1-e^{-n^{\gamma_{3}}}. (52)

By (51) and the second part of the Overlap Gap Theorem then

Prob[|σ0σm|nτ1αd]1enγ3ecn.\displaystyle\mathrm{Prob}\big{[}~{}|\sigma_{0}\cap\sigma_{m}|\leq n\tau_{1}\alpha_{d}~{}\big{]}\geq 1-e^{-n^{\gamma_{3}}}-e^{-cn}. (53)

Now let us track the intersection |σ0σt||\sigma_{0}\cap\sigma_{t}| as tt goes from 0 to mm. For tt at 0 this is bigger than nτ2αdn\tau_{2}\alpha_{d} but at t=mt=m it is less than nτ1αdn\tau_{1}\alpha_{d}. However by the Overlap Gap Theorem we know there is (with high probability) no middle ground so there is some TT where |σ0σT||\sigma_{0}\cap\sigma_{T}| is big but |σ0σT+1||\sigma_{0}\cap\sigma_{T+1}| is small. As a general property of sets we have

||σ0σT||σ0σT+1|||σTΔσT+1|.\big{|}|\sigma_{0}\cap\sigma_{T}|-|\sigma_{0}\cap\sigma_{T+1}|\big{|}\leq\big{|}\sigma_{T}\Delta\sigma_{T+1}\big{|}. (54)

Using (46) we get nαd(τ2τ1)4Dn\alpha_{d}(\tau_{2}-\tau_{1})\leq 4D which is a contradiction for large enough nn because DnA/2D\leq n^{A/2}, A/2<1/2<1A/2<1/2<1, with high probability, see (45). This contradiction shows that the Obstruction Theorem is true.

8 Proofs of Neighborhood Size and Concentration Results

8.1 Neighborhood Size

We begin by restating and then proving the Neighborhood Size Theorem. This result concerns random graphs and makes no reference to the quantum algorithm.

Neighborhood Size Theorem

Fix d>1d>1, and w<1w<1. If

2pwlognlog(d/ln2)\displaystyle 2p\leq{w\log n\over\log(d/\ln{2})} (55)

then there exist a>0a>0 and A<1A<1 such that

Prob[maxiB(i,2p)nA]ena\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,2p)\geq n^{A}\Big{]}\leq e^{-n^{a}}\ (56)
and\displaystyle and~{}~{}~{} Prob[maxiB(i,p)nA/2]ena/2.\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,p)\geq n^{A/2}\Big{]}\leq e^{-n^{a/2}}. (57)

To prove this we consider a branching process where each parent has Poisson(dd) children. Let Zk=Z_{k}= the size of the kkth generation with Z0=1Z_{0}=1. Let

ϕk(t)=E[etZk]\displaystyle\phi_{k}(t)=\mathrm{E}[e^{tZ_{k}}] (58)

where the expectation E is with respect to the Poisson process. We have

ϕ0(t)\displaystyle\phi_{0}(t) =et\displaystyle=e^{t}
ϕ1(t)\displaystyle\phi_{1}(t) =ed(et1),\displaystyle=e^{d(e^{t}-1)}, (59)

the Poisson moment generation function, and generally

ϕk+1(t)=ed(ϕk(t)1).\displaystyle\phi_{k+1}(t)=e^{d(\phi_{k}(t)-1)}. (60)

We first show

ϕk((ln2/d)k)e for anyk0.\displaystyle\phi_{k}(({\ln{2}/d})^{k})\leq e\ \text{ for any}\ k\geq 0. (61)

Assume by induction on jj, that

ϕj((ln2/d)k)e(ln2/d)kj\displaystyle\phi_{j}(({\ln{2}/d})^{k})\leq e^{(\ln{2}/d)^{k-j}} (62)

which we show as follows. This holds for j=0j=0, and

ϕj+1((ln2/d)k)\displaystyle\phi_{j+1}(({\ln{2}/d})^{k}) =ed(ϕj((ln2/d)k)1)\displaystyle=e^{d(\phi_{j}(({\ln{2}/d})^{k})-1)}
ed(e(ln2/d)kj1)\displaystyle\leq e^{d(e^{({\ln{2}/d})^{k-j}}-1)} (63)

by hypothesis. Using (exln21)x(e^{x\ln{2}}-1)\leq x for 0x10\leq x\leq 1, this is

ed(ln2)kj1dkj=e(ln2/d)k(j+1).\displaystyle\leq e^{d(\ln{2})^{k-j-1}\over d^{k-j}}=e^{(\ln{2}/d)^{k-(j+1)}}. (64)

Now Markov’s inequality says that for any tt and uu

P[Zku(d/ln2)k]\displaystyle\mathrm{P}\left[Z_{k}\geq u\left({d/\ln{2}}\right)^{k}\right] eu(d/ln2)ktϕk(t)\displaystyle\leq e^{-u\left(d/\ln{2}\right)^{k}t}\phi_{k}(t)
eue\displaystyle\leq e^{-u}e (65)

which we get by choosing t=(ln2/d)kt=(\ln{2}/d)^{k}. Note that P is the probability associated with the Poisson branching process. We will pick uu to make this small, but first we need to bound Z1+Z2++Zk=BkZ_{1}+Z_{2}+\ldots+Z_{k}=B_{k} since this is what we compare with the graph neighborhood. Note that

P[Bkλ]P[Z1λk]+P[Z2λk]++P[Zkλk].\displaystyle\mathrm{P}\Big{[}B_{k}\geq\lambda\Big{]}\leq\mathrm{P}\left[Z_{1}\geq{\lambda\over k}\right]+\mathrm{P}\left[Z_{2}\geq{\lambda\over k}\right]+\ldots+\mathrm{P}\left[Z_{k}\geq{\lambda\over k}\right]. (66)

Choose λ=dsk(d/ln2)k\lambda=d^{sk}\left(d/\ln{2}\right)^{k} and u=dskku={d^{sk}\over k} with ss to be determined later. Then (for kk large enough, using d>1d>1)

P[Bkdsk(d/ln2)k]kedskke\displaystyle\mathrm{P}\left[B_{k}\geq d^{sk}\left(d/\ln{2}\right)^{k}\right]\leq k\,e^{-{d^{sk}\over k}}\,e (67)

(since the moment generating function for ZkZ_{k} is the biggest) which we can write as

P[Bkdsk(d/ln2)k]edsk/2\displaystyle\mathrm{P}\left[B_{k}\geq d^{sk}\left(d/\ln{2}\right)^{k}\right]\leq e^{-{d^{sk/2}}} (68)

for kk large enough.

This bound applies to Poisson branching. To make contact with our graph neighborhoods we first compare the branching process to Erdos-Renyi graphs and then to our random graphs with a fixed number of edges. In the Erdos-Renyi graph, each vertex has Binomial(n1,dn1)(n-1,{d\over n-1}) neighbors which as nn goes to infinity is Poisson(dd). For finite nn the moment generating function of this Binomial is less than the moment generating function of the Poisson, so the kk-neighborhood of a vertex in the random Erdos-Renyi graph satisfies the same bound as the branching process. Let fyf_{y} be the probability that (say) vertex ii in the Erdos-Renyi graph has its kk-neighborhood at least as big as dsk(d/ln2)kd^{sk}({d/\ln{2}})^{k}, conditioned on the graph having exactly yy edges. (The number of edges is a Binomial ((n2),dn1)(\binom{n}{2},{d\over n-1}) random variable.) Then

y=0(n2)P(yedges)fyedsk/2.\displaystyle\sum\limits^{\binom{n}{2}}_{y=0}\ \mathrm{P}(y\ \text{edges})\,f_{y}\leq\,e^{-d^{sk/2}}. (69)

Starting the sum at m=nd2m={nd\over 2} we have

y=m(n2)P(yedges)fyedsk/2\displaystyle\sum\limits^{\binom{n}{2}}_{y=m}\mathrm{P}(y\ \text{edges})\,f_{y}\leq e^{-d^{sk/2}} (70)

and since fyf_{y} is an increasing function of yy we have

y=m(n2)P(yedges)fmedsk/2.\displaystyle\sum\limits^{\binom{n}{2}}_{y=m}\mathrm{P}(y\ \text{edges})\,f_{m}\leq e^{-d^{sk/2}}. (71)

Now m=nd2m={nd\over 2} is the expected number of edges so this is (12+o(1))fm({1\over 2}+o(1))f_{m} and we conclude that

fm(2+o(1))edsk/2edsk/3\displaystyle f_{m}\leq(2+o(1))e^{-d^{sk/2}}\leq e^{-d^{sk/3}} (72)

for kk large enough. So using the indirect connection between the branching process and our graphs, we have

Prob[B(i,k)dsk(d/ln2)k]edsk/3\displaystyle\mathrm{Prob}\left[\mathrm{B}(i,k)\geq d^{sk}\left(d/\ln{2}\right)^{k}\right]\leq e^{-d^{sk/3}} (73)

where Prob is with respect to the graph distribution with a fixed number of edges mm. Now take k=wlognlog(d/ln2)k={w\log n\over\log(d/\ln{2})} and recall that by assumption 2pk2p\leq k and we have

Prob[B(i,2p)(dsd/ln2)wlognlog(d/ln2)]Prob[B(i,k)(dsd/ln2)wlognlog(d/ln2)]edswlogn3log(d/ln2).\displaystyle\mathrm{Prob}\left[\mathrm{B}(i,2p)\geq\left(d^{s}d/\ln{2}\right)^{w\log n\over\log(d/\ln{2})}\right]\leq\mathrm{Prob}\left[\mathrm{B}(i,k)\geq\left(d^{s}d/\ln{2}\right)^{w\log n\over\log(d/\ln{2})}\right]\leq e^{-d^{sw\log n\over 3\log(d/\ln{2})}}. (74)

We can take log=logd\log=\log_{d} in the above, and writing logd(1/ln2)=L\log_{d}(1/\ln{2})=L we get

Prob[B(i,2p)n(1+s+L)w1+L]ensw3(1+L).\displaystyle\mathrm{Prob}\left[\mathrm{B}(i,2p)\geq n^{(1+s+L)w\over 1+L}\right]\leq e^{-n^{sw\over 3(1+L)}}. (75)

We can pick s>0s>0 to make A=(1+s+L)w1+L<1A={(1+s+L)w\over 1+L}<1 because w<1w<1. By the union bound, Prob[maxiB(i,2p)nA]\mathrm{Prob}[\max_{i}\mathrm{B}(i,2p)\geq n^{A}] can be at most nn times as large, so choosing a<sw3(1+L)a<{sw\over 3(1+L)} yields the first half of the theorem. The other half, for B(i,p)B(i,p), has w2w\over 2 in the exponent. End of Proof.

8.2 Concentration

We now restate and prove the Concentration Theorem. This result concerns the concentration of the Hamming weight in the output strings of the shallow depth QAOA+. The first part applies to the QAOA+ acting on a fixed random graph. The second part is a statement about concentration on random graphs.

Concentration Theorem

1. Let GG be a graph with nn vertices and dn2{dn\over 2} edges. Suppose pp is chosen such that maxi|BGn(i,2p)|nA\max_{i}|\mathrm{B}_{G_{n}}(i,2p)|\leq n^{A} for some 0<A<10<A<1. Let σ\sigma be the output of the QAOA+. Then there exists γ1>0\gamma_{1}>0 such that for all δ>0\delta>0

G[σ|𝔼G|σδn]eδnγ1\displaystyle\mathbb{P}_{G}\Big{[}~{}\big{|}|\sigma|-\mathbb{E}_{G}|\sigma|\big{|}\geq\delta n~{}\Big{]}\leq e^{-\delta n^{\gamma_{1}}} (76)

for nn large enough. Here the graph is fixed and the randomness comes from the QAOA+.

2. Now let GG be a random 𝔾(n,dn2)\mathbb{G}(n,{dn\over 2}) graph. Suppose

Prob[maxiB(i,2p)nA]ena\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,2p)\geq n^{A}\Big{]}\leq e^{-n^{a}}\ (77)
and\displaystyle and~{}~{}~{} Prob[maxiB(i,p)nA/2]ena/2.\displaystyle\mathrm{Prob}\Big{[}\max_{i}\mathrm{B}(i,p)\geq n^{A/2}\Big{]}\leq e^{-n^{a/2}}. (78)

Then there exists γ2>0\gamma_{2}>0 such that for any 0<δ<10<\delta<1,

Prob[σ|Ex|σδn]eδ2nγ2,\displaystyle\mathrm{Prob}\Big{[}~{}\big{|}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq\delta n~{}\Big{]}\leq e^{-{\delta^{2}}n^{\gamma_{2}}}, (79)

for nn large enough. Here Prob\mathrm{Prob} and Ex\mathrm{Ex} refer to Probability and Expectation coming from the distribution of random graphs combined with the randomness coming from the QAOA+.

Consider the QAOA+ running on a graph GG and outputting an independent set σ\sigma. The Hamming weight of σ\sigma is a sum of nn random variables b1+b2++bnb_{1}+b_{2}+\cdots+b_{n}, where each bi{0,1}b_{i}\in\{0,1\} is independent of the collection {bj}\{b_{j}\} for jj not within a distance 2p2p of ii in GG. With pp small enough so that the 2p2p-neighborhoods are small compared to nn, this is enough to get a sub-exponential bound on the concentration of |σ||\sigma| for a fixed graph GG. We use the standard technique of moment generating functions (as in the Chernoff bound) to prove (76).

Suppose maxiBG(i,2p)nA\max\limits_{i}\mathrm{B}_{G}(i,2p)\leq n^{A}, with A<1A<1. Let

Yi=bi𝔼G(bi).\displaystyle Y_{i}=b_{i}-\mathbb{E}_{G}\,(b_{i}). (80)

We will first bound 𝔼G[Yi]t\mathbb{E}_{G}\,[\sum Y_{i}]^{t} and use that to bound 𝔼G[eθYi]\mathbb{E}_{G}[e^{\theta\sum Y_{i}}], from which (76) will follow. We have

𝔼G[Yi]t=1i1,,itn𝔼G[Yi1Yit].\displaystyle\mathbb{E}_{G}\,\big{[}\sum Y_{i}\big{]}^{t}=\sum_{1\leq i_{1},\cdots,{i_{t}}\leq n}\,\mathbb{E}_{G}\big{[}Y_{i_{1}}\cdots Y_{i_{t}}\big{]}. (81)

Because each 𝔼G[Yi]=0\mathbb{E}_{G}[Y_{i}]=0, any term in the sum is 0 by independence unless for each iki_{k}, there is an i(k)i_{\ell}~{}(\ell\neq k) in BG(ik,2p)\mathrm{B}_{G}(i_{k},2p). The number of nonzero terms is at most t!(nnA)t/2t!(nn^{A})^{t/2} - see note at end of this section. Since |Yi|1|Y_{i}|\leq 1, we have

𝔼G[Yi]tt!(nnA)t/2.\displaystyle\mathbb{E}_{G}\,\big{[}\sum\,Y_{i}\big{]}^{t}\ \leq\ t!(nn^{A})^{t/2}\ . (82)

Multiply both sides by θt\theta^{t}, divide by t!t! and sum on tt to get

𝔼G[eθYi]t=0θt(nnA)t/2=11θn1+A2.\displaystyle\mathbb{E}_{G}\big{[}\,e^{\theta\sum Y_{i}}\big{]}\leq\sum^{\infty}_{t=0}\,\theta^{t}(nn^{A})^{t/2}=\frac{1}{1-\theta n^{\frac{1+A}{2}}}. (83)

Now we can prove (76). By the Markov inequality, for any θ\theta,

G[Yiδn]eθδn𝔼G[eθYi]\displaystyle\mathbb{P}_{G}\Big{[}\sum Y_{i}\geq\delta n\Big{]}\leq e^{-\theta\delta n}\mathbb{E}_{G}\Big{[}e^{\theta\sum Y_{i}}\bigg{]} (84)

which combines with the previous result to give

G[Yiδn]eθδn1θn1+A2.\displaystyle\mathbb{P}_{G}\Big{[}\sum Y_{i}\geq\delta n\Big{]}\leq\frac{e^{-\theta\delta n}}{1-\theta n^{\frac{1+A}{2}}}.\hskip 31.50005pt

Choose 0<γ<1A20<\gamma^{\prime}<\frac{1-A}{2}\ and θ=nγ1\ \theta=n^{\gamma^{\prime}-1}, so

G[Yiδn]eδnγ1n(γ12+A2).\displaystyle\mathbb{P}_{G}\Big{[}\sum\,Y_{i}\geq\delta n\Big{]}\leq\frac{e^{-\delta n^{\gamma^{\prime}}}}{1-n^{\left(\gamma^{\prime}-\frac{1}{2}+\frac{A}{2}\right)}}. (85)

The bound on G[Yiδn]\mathbb{P}_{G}\big{[}\sum\,Y_{i}\leq-\delta n\big{]} is the same (use θ=nγ1\ \theta=-n^{\gamma^{\prime}-1}). Replace γ\gamma^{\prime} by any γ1<γ\gamma_{1}<\gamma^{\prime} and we have

G[|Yi|δn]eδnγ1\displaystyle\mathbb{P}_{G}\left[\left|\sum Y_{i}\right|\geq\delta n\right]\leq e^{-\delta n^{\gamma_{1}}} (86)

for large nn. This is (76).

The second concentration bound (77) follows from the fact that 𝔼G|σ|\mathbb{E}_{G}|\sigma| is very unlikely to vary much as GG varies as we now show. The random graph GG consists of mm edges e1,e2,eme_{1},e_{2},\cdots e_{m} chosen independently uniformly over the (n2)n\choose 2 possibilities. (We ignore the O(1)O(1) collisions.) Let

f(𝐞)=f(e1,e2,em)=𝔼G|σ|.\displaystyle f(\mathbf{e})=f(e_{1},e_{2},\cdots e_{m})=\mathbb{E}_{G}|\sigma|. (87)

Here the graph is fixed and the expectation is with respect to the QAOA+. As long as the pp-neighborhoods of the vertices in GG are small, a change in one edge of GG makes only a small change in f(e1,e2,em)f(e_{1},e_{2},\cdots e_{m}) by the locality of the QAOA+. It is natural to use the Azuma inequality which bounds concentration probability in terms of local changes. But the neighborhoods are only small with high probability so we cannot use the Azuma inequality directly on the function ff. Instead we adjust ff using a version of Kirzbraun’s Theorem on extending Lipschitz functions.

To set the stage we state the Azuma inequality. Consider a real valued function ϕ(𝐞)\phi(\bf{e}) where 𝐞\bf{e} is a set of mm variables e1,e2eme_{1},e_{2}\cdots e_{m} with the property that ϕ\phi does not change much when one of the variables changes. Let e~\tilde{\textbf{e}} be equal to 𝐞\bf{e} except in one of the mm variables with the Lipschitz condition on ϕ\phi being

|ϕ(e)ϕ(e~)|R\displaystyle|\phi(\textbf{e})-\phi(\tilde{\textbf{e}})|\leq R (88)

for some fixed RR. Now we take the {ei}\{e_{i}\} to be independent random variables. The Azuma inequality states that

Prob[|ϕEx[ϕ]|t]2exp(t22mR2).\displaystyle\mathrm{Prob}\Big{[}\big{|}~{}\phi-\mathrm{Ex}[\phi]~{}\big{|}\geq t\Big{]}\leq 2\mathrm{exp}\Big{(}\frac{-t^{2}}{2mR^{2}}\Big{)}. (89)

Return to our function f(𝐞)f(\bf{e}) where the eie_{i} are edges in a graph. We will use (78). Let Kn\mathrm{K}_{n} be the set of graphs with nn vertices and dn2{dn\over 2} edges that have small neighborhoods

Kn={G:maxi|BG(i,p)|nA/2}\displaystyle\mathrm{K}_{n}=\big{\{}G:\max_{i}|B_{G}(i,p)|\leq n^{A/2}\big{\}} (90)

so

Prob[Kn]1ena/2\displaystyle\mathrm{Prob}\big{[}K_{n}\big{]}\geq 1-e^{-n^{a/2}} (91)

for nn large enough. If 𝐞\bf{e} and 𝐞~\tilde{\bf{e}} differ by one edge and both are in Kn\mathrm{K}_{n} the we have

|f(e)f(e~)|4nA/2\displaystyle\big{|}f(\textbf{e})-f(\tilde{\textbf{e}})\big{|}\leq 4n^{A/2} (92)

since the QAOA+ outputs can only differ on bits in the 4 neighborhoods of the vertices at the ends of the 2 swapped edges. For graphs not necessarily in Kn\mathrm{K}_{n} we need to modify ff. Let ρ(𝐞,𝐞~)\rho(\bf{e},\tilde{\bf{e}}) be the number of edge changes needed to turn 𝐞\bf{e} into 𝐞~\tilde{\bf{e}}. By (92) if e and e~\tilde{\textbf{e}} are in Kn\mathrm{K}_{n} then

|f(e)f(e~)|4nA/2ρ(𝐞,𝐞~).\displaystyle\big{|}f(\textbf{e})-f(\tilde{\textbf{e}})\big{|}\leq 4n^{A/2}\rho(\bf{e},{\tilde{\bf{e}}}). (93)

Now define for any e,

g(e)=mineKn[f(e)+4nA/2ρ(e,e)].\displaystyle g(\textbf{e})=\min_{\textbf{e}^{\prime}\in\mathrm{K}_{n}}\big{[}f(\textbf{e}^{\prime})+4n^{A/2}\rho(\textbf{e},\textbf{e}^{\prime})\big{]}. (94)

For eKn\textbf{e}\in\mathrm{K}_{n} we see that g(e)=f(e)g(\textbf{e})=f(\textbf{e}) by (93). For any e and e~\tilde{\textbf{e}}, we have

|g(e)g(e~)|4nA/2ρ(𝐞,𝐞~).\displaystyle\big{|}g(\textbf{e})-g(\tilde{\textbf{e}})\big{|}\leq 4n^{A/2}\rho(\bf{e},{\tilde{\bf{e}}}). (95)

To see this, note that there is an eKn\textbf{e}^{\prime}\in\mathrm{K}_{n} with

g(e)=f(e)+4nA/2ρ(e,e)g(e~)f(e)+4nA/2ρ(e~,e).\displaystyle\begin{split}g(\textbf{e})=f(\textbf{e}^{\prime})+4n^{A/2}\rho(\textbf{e},\textbf{e}^{\prime})\\ g(\tilde{\textbf{e}})\leq f(\textbf{e}^{\prime})+4n^{A/2}\rho(\tilde{\textbf{e}},\textbf{e}^{\prime}).\end{split} (96)

Subtracting these two and using the triangle inequality yields

g(e~)g(e)4nA/2ρ(e,e~).\displaystyle g(\tilde{\textbf{e}})-g(\textbf{e})\leq 4n^{A/2}\rho(\textbf{e},\tilde{\textbf{e}}). (97)

Repeat with e and e~\tilde{\textbf{e}} interchanged and (95) follows. We are going to apply the Azuma theorem to gg so note that if e differs from e~\tilde{\textbf{e}} in just one edge then ρ(e,e~)\rho(\textbf{e},\tilde{\textbf{e}}) is 11 in (95). Then for any δ\delta the inequality (89) says

Prob[|gEx[g]|δn/2]2exp((δn/2)22(dn/2)(4nA/2)2)=2exp(δ2n1A64d).\displaystyle\mathrm{Prob}\Big{[}\big{|}g-\mathrm{Ex}[g]\big{|}\geq\delta n/2\Big{]}\leq 2\mathrm{exp}\Big{(}\frac{-(\delta n/2)^{2}}{2(dn/2)(4n^{A/2})^{2}}\Big{)}=2\mathrm{exp}\Big{(}-\frac{\delta^{2}n^{1-A}}{64d}\Big{)}. (98)

We are going to use this to bound Prob[|fEx[f]|δn]\mathrm{Prob}\big{[}|f-\mathrm{Ex}[f]|\geq\delta n\big{]} which appears in (79) since again f=𝔼G[|σ|]f=\mathbb{E}_{G}[|\sigma|] and ff agrees with gg except outside of Kn\mathrm{K}_{n}. Outside of Kn\mathrm{K}_{n} we have the crude bound that |fg|(n+4nA/2dn/2)|f-g|\leq(n+4n^{A/2}dn/2) so by (91)

|Ex[f]Ex[g]|(n+4nA/2dn/2)ena/2δn/2,\displaystyle\big{|}\mathrm{Ex}[f]-\mathrm{Ex}[g]\big{|}\leq\big{(}n+4n^{A/2}dn/2\big{)}e^{-n^{a/2}}\leq\delta n/2, (99)

for any δ\delta if nn is big enough. Now

Prob[|fEx[f]|δn]Prob[|fEx[g]|δn/2].\displaystyle\mathrm{Prob}\Big{[}\big{|}f-\mathrm{Ex}[f]\big{|}\geq\delta n\Big{]}\leq\mathrm{Prob}\Big{[}\big{|}f-\mathrm{Ex}[g]\big{|}\geq\delta n/2\Big{]}. (100)

Inside Kn\mathrm{K}_{n} we have f=gf=g while the probability of being outside of Kn\mathrm{K}_{n} is bounded by (91) so

Prob[|fEx[f]|δn]Prob[|gEx[g]|δn/2]+ena/2,\displaystyle\mathrm{Prob}\Big{[}\big{|}f-\mathrm{Ex}[f]\big{|}\geq\delta n\Big{]}\leq\mathrm{Prob}\Big{[}\big{|}g-\mathrm{Ex}[g]\big{|}\geq\delta n/2\Big{]}+e^{-n^{a/2}}, (101)

which by (98) is

Prob[|fEx[f]|δn]2exp(δ2n1A64d)+ena/2.\displaystyle\mathrm{Prob}\Big{[}\big{|}f-\mathrm{Ex}[f]\big{|}\geq\delta n\Big{]}\leq 2\mathrm{exp}\Big{(}-\frac{\delta^{2}n^{1-A}}{64d}\Big{)}+e^{-n^{a/2}}. (102)

We can rewrite this (take δ<1\delta<1) as

Prob[|𝔼G|σ|Ex|σ||δn]eδ2na~\displaystyle\mathrm{Prob}\Big{[}\big{|}\mathbb{E}_{G}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq\delta n\Big{]}\leq e^{-{\delta^{2}}n^{\tilde{a}}} (103)

with a~\tilde{a} less than both 1A1-A and a/2a/2, for nn large. The first half of the concentration theorem implies

Prob[σ|𝔼G|σδn]eδnγ1+ena,\displaystyle\mathrm{Prob}\Big{[}\big{|}|\sigma|-\mathbb{E}_{G}|\sigma|\big{|}\geq\delta n\Big{]}\leq e^{-\delta n^{\gamma_{1}}}+e^{-n^{a}}, (104)

where the enae^{-n^{a}} accounts for graphs with large 2p2p-neighborhoods.

Now the triangle inequality says

Prob[σ|Ex|σ2δn]eδ2na~+eδnγ1+ena.\displaystyle\mathrm{Prob}\Big{[}\big{|}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq 2\delta n\Big{]}\leq e^{-{\delta^{2}}n^{\tilde{a}}}+e^{-\delta n^{\gamma_{1}}}+e^{-n^{a}}. (105)

By taking γ2\gamma_{2} smaller than a~,γ1\tilde{a},\gamma_{1} and aa we obtain

Prob[σ|Ex|σ2δn]eδ2nγ2,\displaystyle\mathrm{Prob}\Big{[}\big{|}|\sigma|-\mathrm{Ex}|\sigma|\big{|}\geq 2\delta n\Big{]}\leq e^{-{\delta^{2}}n^{\gamma_{2}}}, (106)

for all nn large. This is (79), as desired, if we go back to the beginning and let δ\delta go to δ/2\delta/2.

8.3 Counting the number of non-zero terms

To bound the number of non-zero terms in

0i1,,itn𝔼G(Yi1Yit)\displaystyle\sum\limits_{0\leq i_{1},\ldots,i_{t}\leq n}\ \mathbb{E}_{G}\left(Y_{i_{1}}\cdots Y_{i_{t}}\right) (107)

first note that since each 𝔼G[Yi]=0\mathbb{E}_{G}[Y_{i}]=0, any term in the sum is 0 by independence unless for each iki_{k}, there is an i(k)i_{\ell}~{}(\ell\neq k) in BG(ik,2p)\mathrm{B}_{G}(i_{k},2p). We are going to graphically represent the non-zero contributions by considering graphs with vertices 1,2,t1,2,\ldots t, possibly disconnected. A graph will be called valid if no vertex is isolated. We say that a sequence i1,i2,iti_{1},i_{2},\ldots i_{t} satisfies a graph Γ\Gamma on {1,2,t}\{1,2,\ldots t\} if ii_{\ell} is in the 2p2p-neighborhood of iki_{k} (and, equivalently, vice-versa) whenever (,k)(\ell,k) is an edge of Γ\Gamma. We know that every sequence i1,i2,iti_{1},i_{2},\ldots i_{t} with 𝔼G(Yi1Yit)0\mathbb{E}_{G}\left(Y_{i_{1}}\cdots Y_{i_{t}}\right)\neq 0 satisfies at least one valid graph, and in particular it satisfies some minimal valid graph – minimal means no subgraph with tt vertices but fewer edges is valid. So we need to bound the number of minimal valid graphs as a function of tt. Call this number VtV_{t}. A minimal valid graph has this form: each of its components is a star, that is, a tree with a central vertex connected to all the other vertices in the component. Suppose the largest component has size rr. If r=2r=2, then the minimal valid graphs are matchings (tt must be even) of which there are (t1)(t3)1(t-1)(t-3)\ldots 1 for t2t\geq 2. For r>2r>2, there t(t1tr)t\binom{t-1}{t-r} stars of size rr, and at most t(t1tr)Vtrt\binom{t-1}{t-r}\ V_{t-r} minimal valid graphs which may be overcounting because we’ve singled out one component of size rr. So for t2t\geq 2

Vt\displaystyle V_{t}\leq (t1)(t3)1+r=3t2t(t1tr)Vtr\displaystyle\ (t-1)(t-3)\cdots 1+\sum\limits^{t-2}_{r=3}\ t\binom{t-1}{t-r}\ V_{t-r}
=\displaystyle= t!t(t2)2+r=3t2t(t1)(tr+1)(r1)!Vtr.\displaystyle\ {t!\over t(t-2)\cdots 2}+\sum\limits^{t-2}_{r=3}\ {t(t-1)\cdots(t-r+1)\over(r-1)!}V_{t-r}. (108)

Now suppose inductively that Vtr(tr)!V_{t-r}\leq(t-r)! for r3r\geq 3. (Note that V2=1V_{2}=1 and V3=3V_{3}=3.) Then for t4t\geq 4,

Vt\displaystyle V_{t}\leq t!t(t2)+r=3t2t(t1)(tr+1)(r1)!(tr)!\displaystyle\ {t!\over t(t-2)}+\sum\limits^{t-2}_{r=3}\ {t(t-1)\cdots(t-r+1)\over(r-1)!}(t-r)!
\displaystyle\leq t!8+r=3t!(r1)!\displaystyle\ {t!\over 8}+\sum\limits^{\infty}_{r=3}\ {t!\over(r-1)!} (109)
=\displaystyle= t!(18+e2)<t!.\displaystyle\ t!({1\over 8}+e-2)<~{}t!~{}. (110)

So there are at most t!t! minimal valid graphs with tt vertices. How many sequences i1,i2,,iti_{1},i_{2},\ldots,i_{t}, where 1in1\leq i_{\ell}\leq n, can satisfy a given minimal valid graph? If it has uu components, then it must have tut-u edges. The central vertex \ell in any component corresponds to ii_{\ell} which has nn possible values. If (,k)(\ell,k) is an edge in the minimal valid graph, iki_{k} can have only nAn^{A} values, since the sequence i1iti_{1}\ldots i_{t} is assumed to satisfy this minimal valid graph. So there are at most nu(nA)tun^{u}(n^{A})^{t-u} possible sequences and since ut/2u\leq t/2, this is nt/2(nA)t/2\leq n^{t/2}(n^{A})^{t/2}. Multiplying by the bound on the number of minimal valid graphs, there are at most

t!nt/2(nA)t/2\displaystyle t!n^{t/2}(n^{A})^{t/2} (111)

nonzero terms in (107).

9 Discussion

No one knows if a quantum computer running a quantum algorithm will be able to outperform a classical computer on a combinatorial search problem. One approach is to build a quantum computer, run a quantum algorithm and see what happens at the available number of qubits. Another approach is to look for a provable quantum speedup over the best known classical algorithms. To this end it is useful know if there are provable limitations to the power of any proposed quantum algorithm.

In this paper we look at the Quantum Approximate Optimization Algorithm applied to finding a large independent set in a random graph of fixed average degree dd. The performance of the QAOA can only improve with depth pp but we show that for Maximum Independent Set on random graphs the algorithm will fail to pass a certain performance barrier if 2p2p is less than wlogn/log(d/ln2)w\log n/\log{(d/\ln{2})} for any w<1w<1 with dd big enough. (This ratio is independent of the base of the log\log.) The quantum algorithm consists of pp unitaries that each respect the locality of the underlying graph. With a fixed average degree of dd this means that each qubit typically has an influence sphere of roughly dpd^{p} other qubits. For qubits further than 2p2p apart on the graph these influence spheres do not intersect and we can show that measurements of these qubits are uncorrelated. This is key to our showing that the algorithm has limited power. However if pp is large enough that dpd^{p} exceeds nn our arguments do not apply and we have no indication that the QAOA will fail.

Our results are for random graphs and we do not have results for say a graph which is a 22 dimensional square lattice. In this case perhaps the border between failure and possible success is at n\sqrt{n}. Back to random graphs. Although our proof technique requires dd big, the intuition is that if d2p<nd^{2p}<n then most pairs of qubits will have independent measurement outcomes. Consider the case when the degree is small so the influence spheres are small, the least favorable situation for the QAOA. For example with d=3d=3 at one million qubits our result sugggest failure for pp less than 77. The bipartite construction of [3] only shows failure at 2 million qubits with d=3d=3 for pp less than 11. Just beyond this the QAOA “sees” the whole graph and we can not say with certainly what happens at a few million qubits in the shallow circuit depth regime with pp say in double digits.

Acknowledgement

We thank Aram Harrow for helpful discussion.

References

  • [1] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm. arXiv:1411.4028, 2014.
  • [2] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm applied to a bounded occurrence constraint problem. arXiv:1412.6062, 2014.
  • [3] Sergey Bravyi, Alexander Kliesch, Robert Koenig, and Eugene Tang. Obstacles to state preparation and variational optimization from symmetry protection. arXiv:1910.08980, 2019.
  • [4] David Gamarnik and Madhu Sudan. Limits of local algorithms over sparse random graphs. Annals of Probability, 45:2353–2376, 2017.
  • [5] David Gamarnik and Madhu Sudan. Performance of sequential local algorithms for the random nae-k-sat problem. SIAM Journal on Computing, 46(2):590–619, 2017.
  • [6] Wei-Kuo Chen, David Gamarnik, Dmitry Panchenko, Mustazee Rahman, et al. Suboptimality of local algorithms for a class of max-cut problems. The Annals of Probability, 47(3):1587–1618, 2019.
  • [7] Amin Coja-Oghlan, Amir Haqshenas, and Samuel Hetterich. Walksat stalls well below satisfiability. SIAM Journal on Discrete Mathematics, 31(2):1160–1173, 2017.
  • [8] Gérard Ben Arous and Aukosh Jagannath. Spectral gap estimates in mean field spin glasses. Comm. Math. Phys., 361(1):1–52, 2018.
  • [9] David Gamarnik, Aukosh Jagannath, and Subhabrata Sen. The overlap gap property in principal submatrix recovery. arXiv:1908.09959, 2019.
  • [10] David Gamarnik and Aukosh Jagannath. The overlap gap property and approximate message passing algorithms for pp-spin models. arXiv:1911.06943, 2019.
  • [11] Andrea Montanari. Optimization of the Sherrington-Kirkpatrick hamiltonian. arXiv:1812.10897, 2018.
  • [12] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Leo Zhou. The quantum approximate optimization algorithm and the Sherrington-Kirkpatrick model at infinite size. arXiv:1910.08187, 2019.
  • [13] M. Bayati, D. Gamarnik, and P. Tetali. Combinatorial approach to the interpolation method and scaling limits in sparse random graphs. Annals of Probability. (Conference version in Proc. 42nd Ann. Symposium on the Theory of Computing (STOC) 2010), 41:4080–4115, 2013.
  • [14] A. Frieze. On the independence number of random graphs. Discrete Mathematics, 81:171–175, 1990.
  • [15] Richard M Karp. The probabilistic analysis of some combinatorial search algorithms. Algorithms and complexity: New directions and recent results, 1:1–19, 1976.
  • [16] Fernando G. S. L. Brandao, Michael Broughton, Edward Farhi, Sam Gutmann, and Hartmut Neven. For fixed control parameters the quantum approximate optimization algorithm’s objective function value concentrates for typical instances. arXiv:1812.04170, 2018.
  • [17] Leo Zhou, Sheng-Tao Wang, Soonwon Choi, Hannes Pichler, and Mikhail D. Lukin. Quantum approximate optimization algorithm: Performance, mechanism, and implementation on near-term devices. arXiv:1812.01041, 2018.