This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Universal Sampling Lower Bounds for Quantum Error Mitigation

Ryuji Takagi [email protected] Department of Basic Science, The University of Tokyo, Tokyo 153-8902, Japan Nanyang Quantum Hub, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore    Hiroyasu Tajima [email protected] Department of Communication Engineering and Informatics, University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo, 182-8585, Japan JST, PRESTO, 4-1-8 Honcho, Kawaguchi, Saitama, 332-0012, Japan    Mile Gu [email protected] Nanyang Quantum Hub, School of Physical and Mathematical Sciences, Nanyang Technological University, 637371, Singapore Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, 117543, Singapore MajuLab, CNRS-UNS-NUS-NTU International Joint Research Unit UMI 3654, Singapore
Abstract

Numerous quantum error-mitigation protocols have been proposed, motivated by the critical need to suppress noise effects on intermediate-scale quantum devices. Yet, their general potential and limitations remain elusive. In particular, to understand the ultimate feasibility of quantum error mitigation, it is crucial to characterize the fundamental sampling cost — how many times an arbitrary mitigation protocol must run a noisy quantum device. Here, we establish universal lower bounds on the sampling cost for quantum error mitigation to achieve the desired accuracy with high probability. Our bounds apply to general mitigation protocols, including the ones involving nonlinear postprocessing and those yet-to-be-discovered. The results imply that the sampling cost required for a wide class of protocols to mitigate errors must grow exponentially with the circuit depth for various noise models, revealing the fundamental obstacles in the scalability of useful noisy near-term quantum devices.

Introduction — As recent technological developments have started to realize controllable small-scale quantum devices, a central problem in quantum information science has been to pin down what can and cannot be accomplished with noisy intermediate-scale quantum (NISQ) devices [1]. One of the most relevant issues in understanding the ultimate capability of quantum hardware is to characterize how well noise effects could be circumvented. This is especially so for NISQ devices, as today’s quantum devices generally cannot accommodate full quantum error correction that requires scalable quantum architecture. As an alternative to quantum error correction, quantum error mitigation has recently attracted much attention as a potential tool to help NISQ devices realize useful applications [2, 3]. It is thus of primary interest from practical and foundational viewpoints to understand the ultimate feasibility of quantum error mitigation.

Quantum error mitigation protocols generally involve running available noisy quantum devices many times. The collected data is then post-processed to infer classical information of interest. While this avoids the engineering challenge in error correction, it comes at the price of sampling cost — computational overhead in having to sample a noisy device many times. This sampling cost represents the crucial quantity determining the feasibility of quantum error mitigation. If the required sampling cost becomes too large, then such quantum error mitigation protocol becomes infeasible under a realistic time constraint. Various prominent quantum error mitigation methods face this problem, where sampling cost grows exponentially with circuit size [4, 5, 6, 7, 8]. The crucial question then is whether there is hope to come up with a new error mitigation strategy that avoids this hurdle or if this is a universal feature shared by all quantum error mitigation protocols. To answer this question, we need a characterization of the sampling cost that is universally required for the general error-mitigation protocols, which has hitherto been unknown.

Here, we provide a solution to this problem. We derive lower bounds for the number of samples fundamentally required for general quantum error mitigations to realize the target performance. We then show that the required samples for a wide class of mitigation protocols to error-mitigate layered circuits under various noise models — including the depolarizing and stochastic Pauli noise — must grow exponentially with the circuit depth to achieve the target performance. This turns the conjecture that quantum error mitigation would generally suffer from the exponential sampling overhead into formal relations, extending the previous results on the exponential resource overhead required for noisy circuits without postprocessing [9, 10, 11]. We accomplish these by employing an information-theoretic approach, which establishes the novel connection between the state distinguishability and operationally motivated error-mitigation performance measures. Our results place the fundamental limitations imposed on the capability of general error-mitigation strategies that include existing protocols [5, 6, 12, 13, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 8, 25, 26, 27, 28, 29, 30, 31] and the ones yet to be discovered, being analogous to the performance converse bounds established in several other disciplines — such as thermodynamics [32, 33, 34], quantum communication [35, 36], and quantum resource theories [37, 38] — that contributed to characterizing the ultimate operational capability allowed in each physical setting.

Our work complements and extends several recent advancements in the field. Ref. [39] introduced a general framework of quantum error mitigation and established lower bounds for the maximum estimator spread, i.e., the range of the outcomes of the estimator, imposed on all error mitigation in the class, which provides a sufficient number of samples to ensure the target accuracy. Those bounds were then employed to show that the maximum spread grows exponentially with the circuit depth to mitigate local depolarizing noise. Ref. [40] showed a related result where for the class of error-mitigation strategies that only involve linear postprocessing, in which the target expectation value can be represented by a linear combination of the actually observed quantities, either the maximum estimator spread or the sample number needs to grow exponentially with the circuit depth to mitigate local depolarizing noise. The severe obstacle induced by noise in showing a quantum advantage for variational quantum algorithms has also recently been studied [41, 42, 43]. Our results lift the observations made in these works to rigorous bounds for the necessary sampling cost required for general error mitigation, including the ones involving nonlinear postprocessing that constitute a large class of protocols [12, 13, 7, 14, 15, 16, 17, 25, 27, 29, 30, 21, 22, 24, 44, 19, 31].

Framework — Suppose we wish to obtain the expectation value of an observable A𝕆A\in\mathbb{O} for an ideal state ρ𝕊\rho\in\mathbb{S} where 𝕆\mathbb{O} and 𝕊\mathbb{S} are some sets of observables and states. We assume that the ideal quantum state ρ\rho is produced by a unitary quantum circuit 𝒰\mathcal{U} applied to the initial state ρini𝕊in\rho_{\rm ini}\in\mathbb{S}_{\rm in} as ρ=𝒰(ρini)\rho=\mathcal{U}(\rho_{\rm ini}) where 𝕊in\mathbb{S}_{\rm in} is the set of possible input states. The noise in the circuit, however, prevents us from preparing the state ρ\rho exactly. We consider quantum error mitigation protocols that aim to estimate the true expectation value under the presence of noise in the following manner [39] (see also Fig. 1).

In the mitigation procedure, one can first modify the circuit, e.g., use a different choice of unitary gates with potential circuit simplification, apply nonadaptive operations (enabling, e.g., dynamical decoupling [45, 46] and Pauli twirling [6]), and supply ancillary qubits — the allowed modifications are determined by the capability of the available device. Together with the noise present in the modified circuit, this turns the original unitary 𝒰\mathcal{U} into some quantum channel \mathcal{F}, which produces a distorted state ρ\rho^{\prime}. The distorted state can be represented in terms of the ideal state ρ\rho by ρ=(ρ)\rho^{\prime}=\mathcal{E}(\rho) where we call 𝒰\mathcal{E}\coloneqq\mathcal{F}\circ\mathcal{U}^{\dagger} an effective noise channel.

The second step consists of collecting NN samples {n(ρ)}n=1N\{\mathcal{E}_{n}(\rho)\}_{n=1}^{N} of distorted states represented by a set of effective noise channels 𝔼{n}n=1N\mathbb{E}\coloneqq\{\mathcal{E}_{n}\}_{n=1}^{N} and applying a trailing quantum process 𝒫A\mathcal{P}_{A} over them. The effective noise channels in 𝔼\mathbb{E} can be different from each other in general, as noisy hardware could have different noise profiles each time, or could purposely change the noise strength [47, 5]. The trailing process 𝒫A\mathcal{P}_{A} then outputs an estimate represented by a random variable E^A(ρ)\hat{E}_{A}(\rho) for the true expectation value Tr(Aρ)\operatorname{Tr}(A\rho). The main focus of our study is the sampling number NN, the total number NN of distorted states used in the error mitigation process.

We quantify the performance of an error-mitigation protocol by how well the protocol can estimate the expectation values for a given set 𝕆\mathbb{O} of observables and a set 𝕊\mathbb{S} of ideal states, which we call target observables and target states respectively. We keep the choices of these sets general, and they can be flexibly chosen depending on one’s interest. For instance, if one is interested in error mitigation protocols designed to estimate the Pauli observables (e.g., virtual distillation [15, 16, 21]), 𝕆\mathbb{O} can be chosen as the set of Pauli operators. As the trailing process includes a measurement depending on the observable, an error-mitigation strategy with target observables 𝕆\mathbb{O} is equipped with a family of trailing processes {𝒫A}A𝕆\{\mathcal{P}_{A}\}_{A\in\mathbb{O}}. Similarly, our results hold for an arbitrary choice of 𝕊\mathbb{S}, where one can, for instance, choose this as the set of all quantum states, which better describes the protocols such as probabilistic error cancellation [47, 5, 48, 49, 50, 51, 52], or as the set of states in a certain subspace, which captures the essence of subspace expansion [12, 14, 17].

This framework includes many error-mitigation protocols proposed so far [5, 6, 12, 13, 7, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 8, 25, 26, 27, 28, 29, 30, 31]. It is worth noting that our framework includes protocols that involve nonlinear postprocessing of the measurement outcomes. Error-mitigation protocols typically work by (1) making some set of (usually Pauli) measurements for observables {Oi}i\{O_{i}\}_{i}, (2) estimating their expectation values {Oi}i\{\left\langle O_{i}\right\rangle\}_{i} for distorted states, and (3) applying a classical postprocessing function ff over them. The protocols with linear postprocessing functions, i.e., the ones with the form f(Oii)=iciOif(\left\langle O_{i}\right\rangle_{i})=\sum_{i}c_{i}\left\langle O_{i}\right\rangle, are known to admit simpler analysis [40, 39], but numerous protocols — including virtual distillation [15, 16, 21], symmetry verification [13], and subspace expansion [12, 14, 17] — come with nonlinear postprocessing functions. In our framework, the sampling number NN is the total number of samples used, where we consider the output represented by E^A(ρ)\hat{E}_{A}(\rho) as our final guess and thus do not generally assume repeating some procedure many times and take a statistical average. This enables us to have any postprocessing absorbed in the trailing process 𝒫A\mathcal{P}_{A}, making our results valid for the protocols with nonlinear postprocessing functions.

We also remark that our framework includes protocols with much more operational power than existing protocols, as we allow the trailing process to apply any coherent interaction over all distorted states. Our results thus provide fundamental limits on the sampling overhead applicable to an arbitrary protocol in this extended class of error-mitigation protocols.

Refer to caption
Figure 1: Framework of quantum error mitigation. For an ideal state ρ𝕊\rho\in\mathbb{S} and an observable A𝕆A\in\mathbb{O} of interest, we first prepare NN copies of distorted states {n(ρ)}n=1N\{\mathcal{E}_{n}(\rho)\}_{n=1}^{N}, where 𝔼={n}n=1N\mathbb{E}=\{\mathcal{E}_{n}\}_{n=1}^{N} is the set of effective noise channels. A trailing quantum process 𝒫A\mathcal{P}_{A} is then applied to NN distorted states, producing the final estimation of Tr(Aρ)\operatorname{Tr}(A\rho) represented by a random variable E^A(ρ)\hat{E}_{A}(\rho). We quantify the error-mitigation performance in two ways by studying the property of the distribution of E^A(ρ)\hat{E}_{A}(\rho); the first is the combination of the accuracy δ\delta and the success probability 1ε1-\varepsilon, and the second is the combination of the bias bA(ρ)E^A(ρ)Tr(Aρ)b_{A}(\rho)\coloneqq\left\langle\hat{E}_{A}(\rho)\right\rangle-\operatorname{Tr}(A\rho) and the standard deviation σAQEM\sigma_{A}^{\operatorname{QEM}} of E^A(ρ)\hat{E}_{A}(\rho).

Sampling lower bounds — We now consider the required samples to ensure the target performance. The performance of quantum error mitigation can be defined in multiple ways. Here, we consider two possible performance quantifiers that are operationally relevant.

Our first performance measure is the combination of the accuracy of the estimate and the success probability. This closely aligns with the operational motivation, where one would like an error mitigation strategy to be able to provide a good estimate for each observable in 𝕆\mathbb{O} and an ideal state in 𝕊\mathbb{S} at a high probability. This can be formalized as a condition

Prob(|Tr(Aρ)E^A(ρ)|δ)1ε,ρ𝕊,A𝕆\displaystyle{\rm Prob}(|\operatorname{Tr}(A\rho)-\hat{E}_{A}(\rho)|\leq\delta)\geq 1-\varepsilon,\ \forall\rho\in\mathbb{S},\,\forall A\in\mathbb{O} (1)

where δ\delta is the target accuracy and 1ε1-\varepsilon is the success probability (see also Fig. 1).

The problem then is to identify lower bounds on the number NN of distorted states needed to achieve this condition as a function of δ\delta and ε\varepsilon. We address this by observing that the trailing process of quantum error mitigation is represented as an application of a quantum channel and thus can never increase the state distinguishability. To formulate our result, let us define the observable-dependent distinguishability with respect to a set 𝕆\mathbb{O} of observables as

D𝕆(ρ,σ)maxA𝕆|Tr[A(ρσ)]|.\displaystyle D_{\mathbb{O}}(\rho,\sigma)\coloneqq\max_{A\in\mathbb{O}}|\operatorname{Tr}[A(\rho-\sigma)]|. (2)

This quantity can be understood as the resolution in distinguishing two quantum states by using the measurements of the observables in 𝕆\mathbb{O}. We note that when 𝕆={A|0A𝕀}\mathbb{O}=\{A\,|0\leq A\leq\mathbb{I}\} 111Here, we consider the standard matrix inequality with respect to the positive semidefinite cone, i.e., M1M2M_{1}\leq M_{2} for two matrices M1M_{1} and M2M_{2} means that the matrix M2M1M_{2}-M_{1} is positive semidefinite., the quantity in (2) becomes the trace distance Dtr(ρ,σ)=12ρσ1D_{\operatorname{tr}}(\rho,\sigma)=\frac{1}{2}\|\rho-\sigma\|_{1} [54].

We then obtain the following sampling lower bounds applicable to an arbitrary given set 𝔼\mathbb{E} of effective noise channels. (Proof in Appendix A 222See the Supplemental Material for detailed proofs and discussions of our main results, which includes Refs. [56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80].)

Theorem 1.

Suppose that an error-mitigation strategy achieves (1) with some δ0\delta\geq 0 and 0ε1/20\leq\varepsilon\leq 1/2 with NN distorted states characterized by the effective noise channels 𝔼={n}n=1N\mathbb{E}=\{\mathcal{E}_{n}\}_{n=1}^{N}. Then, the sample number NN is lower bounded as

N\displaystyle N maxρ,σ𝕊D𝕆(ρ,σ)2δmin𝔼log[14ε(1ε)]log[1/F((ρ),(σ))],\displaystyle\geq\max_{\begin{subarray}{c}\rho,\sigma\in\mathbb{S}\\ D_{\mathbb{O}}(\rho,\sigma)\geq 2\delta\end{subarray}}\min_{\mathcal{E}\in\mathbb{E}}\frac{\log\left[\frac{1}{4\varepsilon(1-\varepsilon)}\right]}{\log\left[1/F(\mathcal{E}(\rho),\mathcal{E}(\sigma))\right]}, (3)
N\displaystyle N maxρ,σ𝕊D𝕆(ρ,σ)2δmin𝔼2(12ε)2ln2S((ρ)(σ)),\displaystyle\geq\max_{\begin{subarray}{c}\rho,\sigma\in\mathbb{S}\\ D_{\mathbb{O}}(\rho,\sigma)\geq 2\delta\end{subarray}}\min_{\mathcal{E}\in\mathbb{E}}\frac{2(1-2\varepsilon)^{2}}{\ln 2\cdot S(\mathcal{E}(\rho)\|\mathcal{E}(\sigma))},

where F(ρ,σ)ρσ12F(\rho,\sigma)\coloneqq\|\sqrt{\rho}\sqrt{\sigma}\|_{1}^{2} is the (square) fidelity and S(ρσ)Tr(ρlogρ)Tr(ρlogσ)S(\rho\|\sigma)\coloneqq\operatorname{Tr}(\rho\log\rho)-\operatorname{Tr}(\rho\log\sigma) is the relative entropy.

This result tells that if the noise effect brings states close to each other, it incurs an unavoidable sampling cost to error mitigation. The minimization over 𝔼\mathbb{E} chooses the effective noise channel that least reduces the infidelity and the relative entropy respectively. On the other hand, the maximum over the ideal states represents the fact that to mitigate two states ρ\rho and σ\sigma that are separated further than 2δ2\delta in terms of observables in 𝕆\mathbb{O}, the sample number NN that achieves the accuracy δ\delta and the success probability 1ε1-\varepsilon must satisfy the lower bounds with respect to ρ\rho and σ\sigma. The maximization over such ρ\rho and σ\sigma then provides the tightest lower bound. This also reflects the observation that error mitigation accommodating a larger set 𝕆\mathbb{O} of target observables would require a larger number of samples.

We remark that although the set 𝔼\mathbb{E} — which depends on how one modifies the noisy circuit — ultimately depends on a specific error-mitigation strategy in mind, fixing 𝔼\mathbb{E} to a certain form already provides useful insights as we see later in the context of noisy layered circuits. We also stress that the above bounds hold for an arbitrary choice of 𝔼\mathbb{E}, providing the general relation between the error mitigation performance and the information-theoretic quantity.

The bounds in Theorem 1 depend on the accuracy δ\delta implicitly through the constraints on ρ\rho and σ\sigma in the maximization. For instance, if one sets δ=0\delta=0, one can find that both bounds diverge, as the choice of σ=ρ\sigma=\rho would be allowed in the maximization. In Appendix B, we report an alternative bound that has an explicit dependence on the accuracy δ\delta.

Let us now consider our second performance measure based on the standard deviation and the bias of the estimate. Let σAQEM(ρ)\sigma_{A}^{\operatorname{QEM}}(\rho) be the standard deviation of E^A(ρ)\hat{E}_{A}(\rho) for an observable A𝕆A\in\mathbb{O}, which represents the uncertainty of the final estimate of an error mitigation protocol. Since a good error mitigation protocol should come with a small fluctuation in its outcome, the standard deviation of the underlying distribution for the estimate can serve as a performance quantifier. However, the standard deviation itself is not sufficient to characterize the error mitigation performance, as one can easily come up with a useless strategy that always outputs a fixed outcome, which has zero standard deviation. This issue can be addressed by considering the deviation of the expected value of the estimate from the true expectation value called bias, defined as bA(ρ)E^A(ρ)Tr(Aρ)b_{A}(\rho)\coloneqq\left\langle\hat{E}_{A}(\rho)\right\rangle-\operatorname{Tr}(A\rho) for a state ρ𝕊\rho\in\mathbb{S} and an observable A𝕆A\in\mathbb{O} (see also Fig. 1).

To assess the performance of error-mitigation protocols, we consider the worst-case error among possible ideal states and measurements. This motivates us to consider the maximum standard deviation σmaxQEMmaxA𝕆maxρ𝕊σAQEM(ρ)\sigma_{\max}^{\operatorname{QEM}}\coloneqq\max_{A\in\mathbb{O}}\max_{\rho\in\mathbb{S}}\sigma_{A}^{\operatorname{QEM}}(\rho) and the maximum bias bmaxmaxA𝕆maxρ𝕊bA(ρ)b_{\max}\coloneqq\max_{A\in\mathbb{O}}\max_{\rho\in\mathbb{S}}b_{A}(\rho). Then, we obtain the following sampling lower bound in terms of these performance quantifiers. (Proof in Appendix C.)

Theorem 2.

The sampling cost for an error-mitigation strategy with the maximum standard deviation σmaxQEM\sigma_{\max}^{\operatorname{QEM}} and the maximum bias bmaxb_{\max} is lower bounded as

Nmaxρ,σ𝕊D𝕆(ρ,σ)2bmax0min𝔼log[11(1+2σmaxQEMD𝕆(ρ,σ)2bmax)2]1log[1/F((ρ),(σ))].\displaystyle N\geq\max_{\begin{subarray}{c}\rho,\sigma\in\mathbb{S}\\ D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}\geq 0\end{subarray}}\min_{\mathcal{E}\in\mathbb{E}}\frac{\log\left[1-\frac{1}{\left(1+\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}\right)^{2}}\right]^{-1}}{\log\left[1/F(\mathcal{E}(\rho),\mathcal{E}(\sigma))\right]}. (4)

This result represents the trade-off between the standard deviation, bias, and the required sampling cost. To realize the small standard deviation and bias, error mitigation needs to use many samples; in fact, the lower bound diverges at the limit of σmaxQEM0\sigma_{\max}^{\operatorname{QEM}}\to 0 whenever there exist states ρ,σ𝕊\rho,\sigma\in\mathbb{S} such that D𝕆(ρ,σ)2bmaxD_{\mathbb{O}}(\rho,\sigma)\geq 2b_{\max}. On the other hand, a larger bias results in a smaller sampling lower bound, indicating a potential to reduce the sampling cost by giving up some bias.

The bounds in Theorems 1, 2 are universally applicable to arbitrary error mitigation protocols in our framework. Therefore, our bounds are not expected to give good estimates for a given specific error-mitigation protocol in general, just as there is a huge gap between the Carnot efficiency and the efficiency of most of the practical heat engines. Nevertheless, it is still insightful to investigate how our bounds are compared to existing mitigation protocols. In Appendix D, we compare the bound in Theorem 1 to the sampling cost for several error-mitigation methods, showing that our bound can provide nontrivial lower bounds with the gap being the factor of 3 to 6. Although this does not guarantee that our bound behaves similarly for other scenarios in general, this ensures that there is a setting in which the bound in Theorem 1 can provide a nearly tight estimate. We further show in Appendix E that the scaling of the lower bound in Theorem 2 with noise strength can be achieved by the probabilistic error cancellation method in a certain scenario. This shows that probabilistic error cancellation serves as an optimal protocol in this specific sense, complementing the recent observation on the optimality of probabilistic error cancellation established for the maximum estimator spread measure [39].

Noisy layered circuits — The above results clarify the close relation between the sampling cost and state distinguishability. As an application of our general bounds, we study the inevitable sample overhead to mitigate noise in the circuits consisting of multiple layers of unitaries. Although we here focus on the local depolarizing noise, our results can be extended to a number of other noise models as we discuss later.

Suppose that an MM-qubit quantum circuit consists of layers of unitaries, each of which is followed by a local depolarizing noise, i.e., a depolarizing noise of the form 𝒟p=(1p)id+p𝕀/2\mathcal{D}_{p}=(1-p)\operatorname{id}+p\mathbb{I}/2 where pp is a noise strength, applies to each qubit. We aim to estimate ideal expectation values for the target states 𝕊\mathbb{S} and observables 𝕆\mathbb{O} by using NN such noisy layered circuits. Although the noise strength can vary for different locations, we suppose that LL layers are followed by the local depolarizing noise with noise strength of at least γ\gamma. We call these layers U1,U2,,ULU_{1},U_{2},\dots,U_{L} and let γn,l,m\gamma_{n,l,m} denote the noise strength of the local depolarizing noise on the mthm^{\rm th} qubit after the lthl^{\rm th} unitary layer UlU_{l} in the nthn^{\rm th} noisy circuit, where mM,lL,nNm\leq M,\,l\leq L,\,n\leq N. This gives the expression of the local depolarizing noise after lthl^{\rm th} layer in the nthn^{\rm th} noisy circuit as m=1M𝒟γn,l,m\otimes_{m=1}^{M}\mathcal{D}_{\gamma_{n,l,m}}, where γn,l,mγn,l,m\gamma_{n,l,m}\geq\gamma\,\forall n,l,m.

Here, we focus on the error-mitigation protocols that apply an arbitrary trailing process over NN distorted states and any unital operations (i.e., operations that preserve the maximally mixed state) before and after UlU_{l} (Fig. 2). This structure ensures that error correction does not come into play here, as the size of input and output spaces of the intermediate unital channels is restricted to MM qubits, as well as that unital channels do not serve as good decoders for error correction.

We show that the necessary number of samples required to achieve the target performance grows exponentially with the number of layers in both performance quantifiers introduced above.

Theorem 3.

Suppose that an error-mitigation strategy described above is applied to an MM-qubit circuit to mitigate local depolarizing channels with strength at least γ\gamma that follow LL layers of unitaries, and achieves (1) with some δ0\delta\geq 0 and 0ε1/20\leq\varepsilon\leq 1/2. Then, if there exist at least two states ρ,σ𝕊\rho,\sigma\in\mathbb{S} such that D𝕆(ρ,σ)2δD_{\mathbb{O}}(\rho,\sigma)\geq 2\delta, the required sample number NN is lower bounded as

N(12ε)22ln(2)M(1γ)2L.\displaystyle N\geq\frac{(1-2\varepsilon)^{2}}{2\ln(2)\,M(1-\gamma)^{2L}}. (5)

The proof can be found in Appendix F. This result particularly shows that the required number of samples must grow exponentially with the circuit depth LL. We remark that the bound always holds under the mild condition, i.e., D𝕆(ρ,σ)2δD_{\mathbb{O}}(\rho,\sigma)\geq 2\delta for some ρ,σ𝕊\rho,\sigma\in\mathbb{S}. This reflects that, to achieve the desired accuracy δ\delta satisfying this condition, error mitigation really needs to extract the expectation values about the observables in 𝕆\mathbb{O} and the states in 𝕊\mathbb{S}, prohibiting it from merely making a random guess.

In Appendix G, we obtain a similar exponential growth of the required sample overhead for a fixed target bias and standard deviation. We also obtain in Appendix H alternative bounds that are tighter in the range of small ε\varepsilon.

With a suitable modification of allowed unitaries and intermediate operations, we extend these results to a wide class of noise models, including stochastic Pauli, global depolarizing, and thermal noise. The case of thermal noise particularly provides an intriguing physical interpretation: the sampling cost NN required to mitigate thermal noise after time tt is characterized by the loss of free energy N=Ω(1/[F(ρt)Feq])N=\Omega(1/[F(\rho_{t})-F_{\rm eq}]) where ρt\rho_{t} is the state at time tt and FeqF_{\rm eq} is the equilibrium free energy. This in turn shows that the necessary sampling cost grows as N=Ω(eαentt)N=\Omega(e^{\alpha_{\rm ent}t}) where αent\alpha_{\rm ent} is a constant characterized by the minimum entropy production rate. We provide details on these extensions in Appendix I.

We remark that Theorem 3 (and related results discussed in the Appendices) extends the previous results showing the exponential resource overhead required for noisy circuits without postprocessing [9, 10, 11]. In Appendix J, we provide further clarifications about the differences between the settings considered in the previous works and ours.

Refer to caption
Figure 2: Each distorted state (nthn^{\rm th} copy depicted in the figure) is produced by a circuit with LL layers U1,,ULU_{1},\dots,U_{L} followed by a local depolarizing noise with noise strength at least γ\gamma, i.e., γn,l,mγ,n,l,m\gamma_{n,l,m}\geq\gamma,\forall n,l,m. Each layer UlU_{l} can be sandwiched by additional unital operations Λn,l\Lambda_{n,l} and Ξn,l\Xi_{n,l}. Other layers and depolarizing channels with noise strength smaller than γ\gamma are absorbed in these operations as they are also unital.

Conclusions — We established sampling lower bounds imposed on the general quantum error-mitigation protocols. Our results formalize the idea that the reduction in the state distinguishability caused by noise and error-mitigation processes leads to the unavoidable computational overhead in quantum error mitigation. We then showed that error-mitigation protocols with certain intermediate operations and an arbitrary trailing process require the number of samples that grows exponentially with the circuit depth to mitigate various types of noise. We presented these bounds with respect to multiple performance quantifiers — accuracy and success probability, as well as the standard deviation and bias — each of which has its own operational relevance.

Our bounds provide fundamental limitations that universally apply to general mitigation protocols, clarifying the underlying principle that regulates error-mitigation performance. As a trade-off, they may not give tight estimates for a given specific error-mitigation strategy, analogously to many other converse bounds established in other fields that typically give loose bounds for most specific protocols. A thorough study to identify in what setting our bounds can give good estimates will make an interesting future research direction.

Acknowledgements.
Acknowledgments — We thank Suguru Endo, Shintaro Minagawa, Masato Koashi, and Daniel Stilck França for helpful discussions, and Kento Tsubouchi, Takahiro Sagawa, and Nobuyuki Yoshioka for sharing a preliminary version of their manuscript. We acknowledge the support of the Singapore Ministry of Education Tier 1 Grants RG146/20 and RG77/22 (S), the NRF2021-QEP2- 02-P06 from the Singapore Research Foundation and the Singapore Ministry of Education Tier 2 Grant T2EP50221-0014 and the FQXi R-710-000-146-720 Grant “Are quantum agents more energetically efficient at making predictions?” from the Foundational Questions Institute and Fetzer Franklin Fund (a donor-advised fund of Silicon Valley Community Foundation). R.T. was also supported by the Lee Kuan Yew Postdoctoral Fellowship at Nanyang Technological University Singapore. H.T. is supported by JSPS Grants-in-Aid for Scientific Research No. JP19K14610 and No. JP22H05250, JST PRESTO No. JPMJPR2014, and JST MOONSHOT No. JPMJMS2061.

Note added. — During the completion of our manuscript, we became aware of an independent work by Tsubouchi et al. [81] that obtained a result related to our Theorem S.2 in Appendix G, in which they showed an alternative exponential sample lower bound applicable to error-mitigation protocols that achieve zero bias using quantum estimation theory.

References

  • Preskill [2018] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018).
  • McArdle et al. [2020] S. McArdle, S. Endo, A. Aspuru-Guzik, S. C. Benjamin,  and X. Yuan, Quantum computational chemistry, Rev. Mod. Phys. 92, 015003 (2020).
  • Endo et al. [2021] S. Endo, Z. Cai, S. C. Benjamin,  and X. Yuan, Hybrid quantum-classical algorithms and quantum error mitigation, J. Phys. Soc. Japan 90, 032001 (2021).
  • Yuan et al. [2016] X. Yuan, Z. Zhang, N. Lütkenhaus,  and X. Ma, Simulating single photons with realistic photon sources, Phys. Rev. A 94, 062305 (2016).
  • Temme et al. [2017] K. Temme, S. Bravyi,  and J. M. Gambetta, Error mitigation for short-depth quantum circuits, Phys. Rev. Lett. 119, 180509 (2017).
  • Li and Benjamin [2017] Y. Li and S. C. Benjamin, Efficient variational quantum simulator incorporating active error minimization, Phys. Rev. X 7, 021050 (2017).
  • Endo et al. [2018] S. Endo, S. C. Benjamin,  and Y. Li, Practical quantum error mitigation for near-future applications, Phys. Rev. X 8, 031027 (2018).
  • Bravyi et al. [2021] S. Bravyi, S. Sheldon, A. Kandala, D. C. Mckay,  and J. M. Gambetta, Mitigating measurement errors in multiqubit experiments, Phys. Rev. A 103, 042605 (2021).
  • Unruh [1995] W. G. Unruh, Maintaining coherence in quantum computers, Phys. Rev. A 51, 992 (1995).
  • Aharonov et al. [1996] D. Aharonov, M. Ben-Or, R. Impagliazzo,  and N. Nisan, Limitations of Noisy Reversible Computation, arXiv:quant-ph/9611028 (1996).
  • Palma et al. [1996] G. M. Palma, K.-a. Suominen,  and A. Ekert, Quantum computers and dissipation, Proc. R. Soc. A: Math. Phys. Eng. Sci. 452, 567 (1996).
  • McClean et al. [2017] J. R. McClean, M. E. Kimchi-Schwartz, J. Carter,  and W. A. de Jong, Hybrid quantum-classical hierarchy for mitigation of decoherence and determination of excited states, Phys. Rev. A 95, 042308 (2017).
  • Bonet-Monroig et al. [2018] X. Bonet-Monroig, R. Sagastizabal, M. Singh,  and T. E. O’Brien, Low-cost error mitigation by symmetry verification, Phys. Rev. A 98, 062339 (2018).
  • McClean et al. [2020] J. R. McClean, Z. Jiang, N. C. Rubin, R. Babbush,  and H. Neven, Decoding quantum errors with subspace expansions, Nat. Commun. 11, 636 (2020).
  • Koczor [2021] B. Koczor, Exponential error suppression for near-term quantum devices, Phys. Rev. X 11, 031057 (2021).
  • Huggins et al. [2021] W. J. Huggins, S. McArdle, T. E. O’Brien, J. Lee, N. C. Rubin, S. Boixo, K. B. Whaley, R. Babbush,  and J. R. McClean, Virtual distillation for quantum error mitigation, Phys. Rev. X 11, 041036 (2021).
  • Yoshioka et al. [2022] N. Yoshioka, H. Hakoshima, Y. Matsuzaki, Y. Tokunaga, Y. Suzuki,  and S. Endo, Generalized quantum subspace expansion, Phys. Rev. Lett. 129, 020502 (2022).
  • Czarnik et al. [2021] P. Czarnik, A. Arrasmith, P. J. Coles,  and L. Cincio, Error mitigation with Clifford quantum-circuit data, Quantum 5, 592 (2021).
  • Strikis et al. [2021] A. Strikis, D. Qin, Y. Chen, S. C. Benjamin,  and Y. Li, Learning-based quantum error mitigation, PRX Quantum 2, 040330 (2021).
  • Lowe et al. [2021] A. Lowe, M. H. Gordon, P. Czarnik, A. Arrasmith, P. J. Coles,  and L. Cincio, Unified approach to data-driven quantum error mitigation, Phys. Rev. Research 3, 033098 (2021).
  • Czarnik et al. [2021] P. Czarnik, A. Arrasmith, L. Cincio,  and P. J. Coles, Qubit-efficient exponential suppression of errors, arXiv:2102.06056 (2021).
  • O’Brien et al. [2021] T. E. O’Brien, S. Polla, N. C. Rubin, W. J. Huggins, S. McArdle, S. Boixo, J. R. McClean,  and R. Babbush, Error mitigation via verified phase estimation, PRX Quantum 2, 020317 (2021).
  • Huo and Li [2022] M. Huo and Y. Li, Dual-state purification for practical quantum error mitigation, Phys. Rev. A 105, 022427 (2022).
  • Bultrini et al. [2023] D. Bultrini, M. H. Gordon, P. Czarnik, A. Arrasmith, M. Cerezo, P. J. Coles,  and L. Cincio, Unifying and benchmarking state-of-the-art quantum error mitigation techniques, Quantum 7, 1034 (2023).
  • Cai [2021a] Z. Cai, Quantum Error Mitigation using Symmetry Expansion, Quantum 5, 548 (2021a).
  • Wang et al. [2023] K. Wang, Y.-A. Chen,  and X. Wang, Mitigating quantum errors via truncated Neumann series, Sci. China Inf. Sci. 66, 180508 (2023).
  • Cai [2021b] Z. Cai, Multi-exponential error extrapolation and combining error mitigation techniques for NISQ applications, npj Quantum Inf. 7, 80 (2021b).
  • Mari et al. [2021] A. Mari, N. Shammah,  and W. J. Zeng, Extending quantum probabilistic error cancellation by noise scaling, Phys. Rev. A 104, 052607 (2021).
  • Mezher et al. [2022] R. Mezher, J. Mills,  and E. Kashefi, Mitigating errors by quantum verification and postselection, Phys. Rev. A 105, 052608 (2022).
  • Fontana et al. [2022] E. Fontana, I. Rungger, R. Duncan,  and C. Cîrstoiu, Spectral analysis for noise diagnostics and filter-based digital error mitigation, arXiv:2206.08811 (2022).
  • Xiong et al. [2022] Y. Xiong, S. X. Ng,  and L. Hanzo, Quantum error mitigation relying on permutation filtering, IEEE Trans. Commun. 70, 1927 (2022).
  • Carnot [1824] S. Carnot, Reflections on the motive power of fire, and on machines fitted to develop that power, Paris: Bachelier 108, 1824 (1824).
  • Landauer [1961] R. Landauer, Irreversibility and heat generation in the computing process, IBM J. Res. Dev. 5, 183 (1961).
  • Brandão et al. [2015] F. Brandão, M. Horodecki, N. Ng, J. Oppenheim,  and S. Wehner, The second laws of quantum thermodynamics, Proc. Natl. Acad. Sci. U.S.A. 112, 3275 (2015).
  • Bennett et al. [1999] C. H. Bennett, P. W. Shor, J. A. Smolin,  and A. V. Thapliyal, Entanglement-assisted classical capacity of noisy quantum channels, Phys. Rev. Lett. 83, 3081 (1999).
  • Pirandola et al. [2017] S. Pirandola, R. Laurenza, C. Ottaviani,  and L. Banchi, Fundamental limits of repeaterless quantum communications, Nat. Commun. 8, 15043 (2017).
  • Regula and Takagi [2021] B. Regula and R. Takagi, Fundamental limitations on distillation of quantum channel resources, Nat. Commun. 12, 4411 (2021).
  • Fang and Liu [2022] K. Fang and Z.-W. Liu, No-go theorems for quantum resource purification: New approach and channel theory, PRX Quantum 3, 010337 (2022).
  • Takagi et al. [2022] R. Takagi, S. Endo, S. Minagawa,  and M. Gu, Fundamental limits of quantum error mitigation, npj Quantum Inf. 8, 114 (2022).
  • Wang et al. [2021a] S. Wang, P. Czarnik, A. Arrasmith, M. Cerezo, L. Cincio,  and P. J. Coles, Can error mitigation improve trainability of noisy variational quantum algorithms? arXiv:2109.01051 (2021a).
  • Wang et al. [2021b] S. Wang, E. Fontana, M. Cerezo, K. Sharma, A. Sone, L. Cincio,  and P. J. Coles, Noise-induced barren plateaus in variational quantum algorithms, Nat. Commun. 12, 6961 (2021b).
  • Stilck França and García-Patrón [2021] D. Stilck França and R. García-Patrón, Limitations of optimization algorithms on noisy quantum devices, Nat. Phys. 17, 1221 (2021).
  • De Palma et al. [2023] G. De Palma, M. Marvian, C. Rouzé,  and D. S. França, Limitations of variational quantum algorithms: A quantum optimal transport approach, PRX Quantum 4, 010309 (2023).
  • Nation et al. [2021] P. D. Nation, H. Kang, N. Sundaresan,  and J. M. Gambetta, Scalable mitigation of measurement errors on quantum computers, PRX Quantum 2, 040326 (2021).
  • Viola and Lloyd [1998] L. Viola and S. Lloyd, Dynamical suppression of decoherence in two-state quantum systems, Phys. Rev. A 58, 2733 (1998).
  • Viola et al. [1999] L. Viola, E. Knill,  and S. Lloyd, Dynamical decoupling of open quantum systems, Phys. Rev. Lett. 82, 2417 (1999).
  • Buscemi et al. [2013] F. Buscemi, M. Dall’Arno, M. Ozawa,  and V. Vedral, Direct observation of any two-point quantum correlation function, arXiv:1312.4240 (2013).
  • Takagi [2021] R. Takagi, Optimal resource cost for error mitigation, Phys. Rev. Research 3, 033178 (2021).
  • Regula et al. [2021] B. Regula, R. Takagi,  and M. Gu, Operational applications of the diamond norm and related measures in quantifying the non-physicality of quantum maps, Quantum 5, 522 (2021).
  • Jiang et al. [2021] J. Jiang, K. Wang,  and X. Wang, Physical Implementability of Linear Maps and Its Application in Error Mitigation, Quantum 5, 600 (2021).
  • Sun et al. [2021] J. Sun, X. Yuan, T. Tsunoda, V. Vedral, S. C. Benjamin,  and S. Endo, Mitigating realistic noise in practical noisy intermediate-scale quantum devices, Phys. Rev. Applied 15, 034026 (2021).
  • Piveteau et al. [2022] C. Piveteau, D. Sutter,  and S. Woerner, Quasiprobability decompositions with reduced sampling overhead, npj Quantum Inf. 8, 12 (2022).
  • Note [1] Here, we consider the standard matrix inequality with respect to the positive semidefinite cone, i.e., M1M2M_{1}\leq M_{2} for two matrices M1M_{1} and M2M_{2} means that the matrix M2M1M_{2}-M_{1} is positive semidefinite.
  • Nielsen and Chuang [2011] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, 10th ed. (Cambridge University Press, New York, NY, USA, 2011).
  • Note [2] See the Supplemental Material for detailed proofs and discussions of our main results, which includes Refs. [56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80].
  • Fuchs and van de Graaf [1999] C. Fuchs and J. van de Graaf, Cryptographic distinguishability measures for quantum-mechanical states, IEEE Trans. Inf. Theory 45, 1216 (1999).
  • Hiai et al. [1981] F. Hiai, M. Ohya,  and M. Tsukada, Sufficiency, KMS condition and relative entropy in von Neumann algebras. Pac. J. Math. 96, 99 (1981).
  • Reeb et al. [2011] D. Reeb, M. J. Kastoryano,  and M. M. Wolf, Hilbert’s projective metric in quantum information theory, J. Math. Phys. 52, 082201 (2011).
  • Kastoryano and Temme [2013] M. J. Kastoryano and K. Temme, Quantum logarithmic sobolev inequalities and rapid mixing, J. Math. Phys. 54, 052202 (2013).
  • De Palma et al. [2021] G. De Palma, M. Marvian, D. Trevisan,  and S. Lloyd, The quantum wasserstein distance of order 1, IEEE Trans. Inf. Theory 67, 6627 (2021).
  • Gilchrist et al. [2005] A. Gilchrist, N. K. Langford,  and M. A. Nielsen, Distance measures to compare real and ideal quantum processes, Phys. Rev. A 71, 062310 (2005).
  • Tajima et al. [2018] H. Tajima, N. Shiraishi,  and K. Saito, Uncertainty relations in implementation of unitary operations, Phys. Rev. Lett. 121, 110403 (2018).
  • Katsube et al. [2020] R. Katsube, M. Hotta,  and K. Yamaguchi, A fundamental upper bound for signal to noise ratio of quantum detectors, J. Phys. Soc. Japan 89, 054005 (2020).
  • Hayashi [2016] M. Hayashi, Quantum Information Theory: Mathematical Foundation (Springer, 2016).
  • Audenaert and Eisert [2005] K. M. R. Audenaert and J. Eisert, Continuity bounds on the quantum relative entropy, J. Math. Phys. 46, 102104 (2005).
  • Carbone and Martinelli [2015] R. Carbone and A. Martinelli, Logarithmic sobolev inequalities in non-commutative algebras, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 18, 1550011 (2015).
  • Kastoryano and Brandão [2016] M. J. Kastoryano and F. G. S. L. Brandão, Quantum Gibbs Samplers: The Commuting Case, Commun. Math. Phys. 344, 915 (2016).
  • Müller-Hermes et al. [2016a] A. Müller-Hermes, D. Stilck França,  and M. M. Wolf, Relative entropy convergence for depolarizing channels, J. Math. Phys. 57, 022202 (2016a).
  • Müller-Hermes et al. [2016b] A. Müller-Hermes, D. Stilck França,  and M. M. Wolf, Entropy production of doubly stochastic quantum channels, J. Math. Phys. 57, 022203 (2016b).
  • Bardet et al. [2021] I. Bardet, Á. Capel, A. Lucia, D. Pérez-García,  and C. Rouzé, On the modified logarithmic sobolev inequality for the heat-bath dynamics for 1d systems, J. Math. Phys. 62, 061901 (2021).
  • Beigi et al. [2020] S. Beigi, N. Datta,  and C. Rouzé, Quantum Reverse Hypercontractivity: Its Tensorization and Application to Strong Converses, Commun. Math. Phys. 376, 753 (2020).
  • Capel et al. [2020] Á. Capel, C. Rouzé,  and D. Stilck França, The modified logarithmic Sobolev inequality for quantum spin systems: classical and commuting nearest neighbour interactions, arXiv:2009.11817 (2020).
  • Hirche et al. [2022] C. Hirche, C. Rouzé,  and D. Stilck França, On contraction coefficients, partial orders and approximation of capacities for quantum channels, Quantum 6, 862 (2022).
  • Müller-Lennert et al. [2013] M. Müller-Lennert, F. Dupuis, O. Szehr, S. Fehr,  and M. Tomamichel, On quantum rényi entropies: A new generalization and some properties, J. Math. Phys. 54, 122203 (2013).
  • Gorini et al. [1976] V. Gorini, A. Kossakowski,  and E. C. G. Sudarshan, Completely positive dynamical semigroups of n‐level systems, J. Math. Phys. 17, 821 (1976).
  • Temme et al. [2014] K. Temme, F. Pastawski,  and M. J. Kastoryano, Hypercontractivity of quasi-free quantum semigroups, J. Phys. A: Math. Theor. 47, 405303 (2014).
  • Temme and Kastoryano [2015] K. Temme and M. J. Kastoryano, How fast do stabilizer Hamiltonians thermalize? arXiv:1505.07811 (2015).
  • Capel et al. [2018] Á. Capel, A. Lucia,  and D. Pérez-García, Quantum conditional relative entropy and quasi-factorization of the relative entropy, J. Phys. A: Math. Theor. 51, 484001 (2018).
  • Temme et al. [2010] K. Temme, M. J. Kastoryano, M. B. Ruskai, M. M. Wolf,  and F. Verstraete, The chi2-divergence and mixing times of quantum Markov processes, J. Math. Phys. 51, 122201 (2010).
  • [80] Qiskit: Device backend noise model simulations, .
  • Tsubouchi et al. [2023] K. Tsubouchi, T. Sagawa,  and N. Yoshioka, Universal cost bound of quantum error mitigation based on quantum estimation theory, Phys. Rev. Lett. 131, 210601 (2023), published concurrently in the same issue.

Universal Sampling Lower Bounds for Quantum Error Mitigation
Supplemental Material

Appendix A Proof of Theorem 1

Proof.

For an ideal state ρ𝕊\rho\in\mathbb{S} and an observable A𝕆A\in\mathbb{O}, let pA,ρp_{A,\rho} be the probability distribution for E^A(ρ)\hat{E}_{A}(\rho). This means that we have

𝒫A(n=1Nn(ρ))=dapA,ρ(a)|aa|\displaystyle\mathcal{P}_{A}\left(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho)\right)=\int dap_{A,\rho}(a)|{a}\rangle\!\langle{a}| (6)

where {|a}\{\ket{a}\} is the classical states representing possible estimates represented by the random variable E^A(ρ)\hat{E}_{A}(\rho).

Take arbitrary two states ρ,σ𝕊\rho,\sigma\in\mathbb{S} satisfying |Tr(Aρ)Tr(Aσ)|2δ|\operatorname{Tr}(A\rho)-\operatorname{Tr}(A\sigma)|\geq 2\delta. We assume Tr(Aρ)Tr(Aσ)\operatorname{Tr}(A\rho)\geq\operatorname{Tr}(A\sigma) without loss of generality. Let us divide the regions of estimates into three sections as (see Fig. S.1)

L{a|aTr(Aσ)+δ},M{a|Tr(Aσ)+δ<a<Tr(Aρ)δ},R{a|aTr(Aρ)δ}.\displaystyle L\coloneqq\left\{\left.a\;\rule{0.0pt}{9.5pt}\right|\;a\leq\operatorname{Tr}(A\sigma)+\delta\right\},\ M\coloneqq\left\{\left.a\;\rule{0.0pt}{9.5pt}\right|\;\operatorname{Tr}(A\sigma)+\delta<a<\operatorname{Tr}(A\rho)-\delta\right\},\ R\coloneqq\left\{\left.a\;\rule{0.0pt}{9.5pt}\right|\;a\geq\operatorname{Tr}(A\rho)-\delta\right\}. (7)
Refer to caption
Figure S.1: Probability distributions of E^A(ρ)\hat{E}_{A}(\rho) and E^A(σ)\hat{E}_{A}(\sigma) for some states ρ,σ𝕊\rho,\sigma\in\mathbb{S}. The condition (1) requires that 1ε1-\varepsilon of the probability must be present around the true expectation values with δ\delta deviation for both ρ\rho and σ\sigma.

The condition (1) ensures that

L𝑑apA,ρ(a)ε,R𝑑apA,σ(a)ε,\displaystyle\int_{L}da\,p_{A,\rho}(a)\leq\varepsilon,\quad\int_{R}da\,p_{A,\sigma}(a)\leq\varepsilon, (8)
R𝑑apA,ρ(a)1ε,L𝑑apA,σ(a)1ε.\displaystyle\int_{R}da\,p_{A,\rho}(a)\geq 1-\varepsilon,\quad\int_{L}da\,p_{A,\sigma}(a)\geq 1-\varepsilon.

This allows us to bound the trace distance between two classical probability distributions pA,ρp_{A,\rho} and pA,σp_{A,\sigma} as

Dtr(pA,ρ,pA,σ)\displaystyle D_{\rm tr}(p_{A,\rho},p_{A,\sigma}) =12L+M+R𝑑a|pA,ρ(a)pA,σ(a)|\displaystyle=\frac{1}{2}\int_{L+M+R}da\,|p_{A,\rho}(a)-p_{A,\sigma}(a)| (9)
12L𝑑a|pA,ρ(a)pA,σ(a)|+12R𝑑a|pA,ρ(a)pA,σ(a)|\displaystyle\geq\frac{1}{2}\int_{L}da\,|p_{A,\rho}(a)-p_{A,\sigma}(a)|+\frac{1}{2}\int_{R}da\,|p_{A,\rho}(a)-p_{A,\sigma}(a)|
12L𝑑a[pA,σ(a)pA,ρ(a)]+12R𝑑a[pA,ρ(a)pA,σ(a)]\displaystyle\geq\frac{1}{2}\int_{L}da\,\left[p_{A,\sigma}(a)-p_{A,\rho}(a)\right]+\frac{1}{2}\int_{R}da\,\left[p_{A,\rho}(a)-p_{A,\sigma}(a)\right]
12ε\displaystyle\geq 1-2\varepsilon

where we used (8) in the last inequality. Noting that 𝒫A\mathcal{P}_{A} in (6) is a quantum channel, we apply the data-processing inequality to the trace distance to get

Dtr(n=1Nn(ρ),n=1Nn(σ))12ε.\displaystyle D_{\rm tr}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma))\geq 1-2\varepsilon. (10)

To extract NN out, recall that Dtr(ρ,σ)1F(ρ,σ)D_{\rm tr}(\rho,\sigma)\leq\sqrt{1-F(\rho,\sigma)} [56] and that F(iρi,iσi)=iF(ρi,σi)F(\otimes_{i}\rho_{i},\otimes_{i}\sigma_{i})=\prod_{i}F(\rho_{i},\sigma_{i}). Under the condition ε1/2\varepsilon\leq 1/2, we can further bound (10) as

1n=1NF(n(ρ),n(σ))(12ε)2,\displaystyle 1-\prod_{n=1}^{N}F(\mathcal{E}_{n}(\rho),\mathcal{E}_{n}(\sigma))\geq(1-2\varepsilon)^{2}, (11)

leading to

n=1NlogF(n(ρ),n(σ))1log[14ε(1ε)].\displaystyle\sum_{n=1}^{N}\log F(\mathcal{E}_{n}(\rho),\mathcal{E}_{n}(\sigma))^{-1}\geq\log\left[\frac{1}{4\varepsilon(1-\varepsilon)}\right]. (12)

Taking nargminnF(n(ρ),n(σ))n^{\star}\coloneqq{\rm argmin}_{n}F(\mathcal{E}_{n}(\rho),\mathcal{E}_{n}(\sigma)), i.e., F(n(ρ),n(σ))1F(i(ρ),i(σ))1,iF(\mathcal{E}_{n^{\star}}(\rho),\mathcal{E}_{n^{\star}}(\sigma))^{-1}\geq F(\mathcal{E}_{i}(\rho),\mathcal{E}_{i}(\sigma))^{-1},\forall i, we get

Nlog[14ε(1ε)]logF(n(ρ),n(σ))1.\displaystyle N\geq\frac{\log\left[\frac{1}{4\varepsilon(1-\varepsilon)}\right]}{\log F(\mathcal{E}_{n^{\star}}(\rho),\mathcal{E}_{n^{\star}}(\sigma))^{-1}}. (13)

Since this holds for every state ρ\rho and σ\sigma such that |Tr(Aρ)Tr(Aσ)|2δ|\operatorname{Tr}(A\rho)-\operatorname{Tr}(A\sigma)|\geq 2\delta, we maximize the right-hand side over such states to reach the advertised statement.

The relation with respect to the relative entropy can be obtained by upper bounding (10) with the relative entropy. Using the quantum Pinsker’s inequality [57]

Dtr(ρ,σ)ln22S(ρσ)\displaystyle D_{\operatorname{tr}}(\rho,\sigma)\leq\sqrt{\frac{\ln 2}{2}}\sqrt{S(\rho\|\sigma)} (14)

and the additivity S(iρiiσi)=iS(ρiσi)S(\otimes_{i}\rho_{i}\|\otimes_{i}\sigma_{i})=\sum_{i}S(\rho_{i}\|\sigma_{i}), we can bound (10) as

ln22n=1NS(n(ρ)n(σ))(12ε)2\displaystyle\frac{\ln 2}{2}\sum_{n=1}^{N}S\left(\mathcal{E}_{n}(\rho)\|\mathcal{E}_{n}(\sigma)\right)\geq(1-2\varepsilon)^{2} (15)

when ε1/2\varepsilon\leq 1/2. Taking nargmaxnS(n(ρ)n(σ))n^{\star}\coloneqq{\rm argmax}_{n}S\left(\mathcal{E}_{n}(\rho)\|\mathcal{E}_{n}(\sigma)\right), i.e., S(n(ρ)n(σ))S(i(ρ)i(σ)),iS(\mathcal{E}_{n^{\star}}(\rho)\|\mathcal{E}_{n^{\star}}(\sigma))\geq S(\mathcal{E}_{i}(\rho)\|\mathcal{E}_{i}(\sigma)),\forall i, we get

N2(12ε)2ln2S(n(ρ)n(σ)).\displaystyle N\geq\frac{2(1-2\varepsilon)^{2}}{\ln 2\cdot S(\mathcal{E}_{n^{\star}}(\rho)\|\mathcal{E}_{n^{\star}}(\sigma))}. (16)

Since this holds for every state ρ\rho and σ\sigma such that |Tr(Aρ)Tr(Aσ)|2δ|\operatorname{Tr}(A\rho)-\operatorname{Tr}(A\sigma)|\geq 2\delta, we maximize the right-hand side over such states to reach the advertised statement.

Appendix B Alternative bound with an explicit accuracy dependence

Here, we obtain an alternative bound to the one in Theorem 1 that has an explicit dependence on the accuracy δ\delta when 𝕊\mathbb{S} contains all pure states. To this end, let us define a generalized contraction coefficient with respect to 𝕆\mathbb{O} by

η𝔼,𝕆max𝔼maxρ,σ𝔻Dtr((ρ),(σ))D𝕆(ρ,σ),\displaystyle\eta^{\mathbb{E},\mathbb{O}}\coloneqq\max_{\mathcal{E}\in\mathbb{E}}\max_{\rho,\sigma\in\mathbb{D}}\frac{D_{\operatorname{tr}}(\mathcal{E}(\rho),\mathcal{E}(\sigma))}{D_{\mathbb{O}}(\rho,\sigma)}, (17)

where 𝔻\mathbb{D} is an arbitrary set that contains all pure quantum states. Note that η𝔼,𝕆\eta^{\mathbb{E},\mathbb{O}} can take any nonnegative value in general. However, this particularly becomes the measure of contraction of trace distance when 𝕆={A| 0A𝕀}\mathbb{O}=\{A\,|\,0\leq A\leq\mathbb{I}\}, in which case 0η𝔼,𝕆10\leq\eta^{\mathbb{E},\mathbb{O}}\leq 1 always holds [58, 59, 60]. The following result represents the lower bound in terms of accuracy, success probability, and the generalized contraction coefficient.

Proposition S.1.

Suppose 𝕊\mathbb{S} contains the set of all pure states. Then, the sampling cost NN that satisfies (1) is lower bounded as

Nlog[14ε(1ε)]log1(12η𝔼,𝕆δ)2.\displaystyle N\geq\frac{\log\left[\frac{1}{4\varepsilon(1-\varepsilon)}\right]}{\log\frac{1}{\left(1-2\eta^{\mathbb{E},\mathbb{O}}\,\delta\right)^{2}}}. (18)

In the case of ε1\varepsilon\ll 1 and δ1\delta\ll 1, the lower bound approximately becomes log(14ε)/(4η𝔼,𝕆δ)\log\left(\frac{1}{4\varepsilon}\right)/(4\eta^{\mathbb{E},\mathbb{O}}\,\delta). This bound has the advantage of having explicit dependence on the accuracy and separating the contraction coefficient of the effective noise channel, making explicit the role of the reduction in the state distinguishability.

Proof.

Let η𝔼,𝕆,𝕊\eta^{\mathbb{E},\mathbb{O},\mathbb{S}} be the generalized contraction coefficient (c.f., Refs. [58, 59, 60]) for \mathcal{E} defined by

η𝔼,𝕆,𝕊max𝔼maxρ,σ𝕊Dtr((ρ),(σ))D𝕆(ρ,σ).\displaystyle\eta^{\mathbb{E},\mathbb{O},\mathbb{S}}\coloneqq\max_{\mathcal{E}\in\mathbb{E}}\max_{\rho,\sigma\in\mathbb{S}}\frac{D_{\operatorname{tr}}(\mathcal{E}(\rho),\mathcal{E}(\sigma))}{D_{\mathbb{O}}(\rho,\sigma)}. (19)

Noting that F(ρ,σ)[1Dtr(ρ,σ)]2F(\rho,\sigma)\geq\left[1-D_{\operatorname{tr}}(\rho,\sigma)\right]^{2}, we have for arbitrary ρ,σ𝕊\rho,\sigma\in\mathbb{S} that

max𝔼logF((ρ),(σ))1\displaystyle\max_{\mathcal{E}\in\mathbb{E}}\log F(\mathcal{E}(\rho),\mathcal{E}(\sigma))^{-1} log1[1max𝔼Dtr((ρ),(σ))]2\displaystyle\leq\log\frac{1}{\left[1-\max_{\mathcal{E}\in\mathbb{E}}D_{\operatorname{tr}}(\mathcal{E}(\rho),\mathcal{E}(\sigma))\right]^{2}} (20)
log1[1η𝔼,𝕆,𝕊maxA𝕆|Tr[A(ρσ)]|]2\displaystyle\leq\log\frac{1}{\left[1-\eta^{\mathbb{E},\mathbb{O},\mathbb{S}}\max_{A\in\mathbb{O}}|\operatorname{Tr}[A(\rho-\sigma)]|\right]^{2}}

Let δ~minρ,σ𝕊{δ|δ=maxA𝕆|Tr[A(ρσ)]|2δ}\tilde{\delta}\coloneqq\min_{\rho,\sigma\in\mathbb{S}}\left\{\left.\delta^{\prime}\;\rule{0.0pt}{9.5pt}\right|\;\delta^{\prime}=\max_{A\in\mathbb{O}}|\operatorname{Tr}[A(\rho-\sigma)]|\geq 2\delta\right\}. Maximizing the lower bound in (3) over ρ,σ𝕊\rho,\sigma\in\mathbb{S} such that maxA𝕆|Tr(Aρ)Tr(Aσ)|2δ\max_{A\in\mathbb{O}}|\operatorname{Tr}(A\rho)-\operatorname{Tr}(A\sigma)|\geq 2\delta, we get

Nlog[14ε(1ε)]log1(1η𝔼,𝕆,𝕊δ~)2.\displaystyle N\geq\frac{\log\left[\frac{1}{4\varepsilon(1-\varepsilon)}\right]}{\log\frac{1}{\left(1-\eta^{\mathbb{E},\mathbb{O},\mathbb{S}}\,\tilde{\delta}\right)^{2}}}. (21)

Now, let 𝔻\mathbb{D} be a set that contains all pure states and consider arbitrary error mitigation with 𝕊=𝔻\mathbb{S}=\mathbb{D}. In such a scenario, we have δ~=2δ\tilde{\delta}=2\delta and η𝔼,𝕆,𝕊=η𝔼,𝕆\eta^{\mathbb{E},\mathbb{O},\mathbb{S}}=\eta^{\mathbb{E},\mathbb{O}} where

η𝔼,𝕆max𝔼maxρ,σ𝔻Dtr((ρ),(σ))D𝕆(ρ,σ).\displaystyle\eta^{\mathbb{E},\mathbb{O}}\coloneqq\max_{\mathcal{E}\in\mathbb{E}}\max_{\rho,\sigma\in\mathbb{D}}\frac{D_{\operatorname{tr}}(\mathcal{E}(\rho),\mathcal{E}(\sigma))}{D_{\mathbb{O}}(\rho,\sigma)}. (22)

This gives

Nlog[14ε(1ε)]log1(12η𝔼δ)2,\displaystyle N\geq\frac{\log\left[\frac{1}{4\varepsilon(1-\varepsilon)}\right]}{\log\frac{1}{\left(1-2\eta^{\mathbb{E}}\,\delta\right)^{2}}}, (23)

concluding the proof.

Appendix C Proof of Theorem 2

Proof.

Let DF(ρ,σ)1F(ρ,σ)D_{F}(\rho,\sigma)\coloneqq\sqrt{1-F(\rho,\sigma)} be the purified distance [61]. For an observable AA, let σA(ρ)Tr[(ATr[Aρ])2ρ]\sigma_{A}(\rho)\coloneqq\sqrt{\operatorname{Tr}[(A-\operatorname{Tr}[A\rho])^{2}\rho]} be the standard deviation of the probability distribution for measuring the observable AA for state ρ\rho. Then, as an improvement of a relation reported in Ref. [62], it was shown in Ref. [63] that arbitrary states η,τ\eta,\tau and an observable OO satisfy

|Tr[O(ητ)]|DF(η,τ)(σO(η)+σO(τ)+|Tr[O(ητ)]|).\displaystyle|\operatorname{Tr}[O(\eta-\tau)]|\leq D_{F}(\eta,\tau)\left(\sigma_{O}(\eta)+\sigma_{O}(\tau)+|\operatorname{Tr}[O(\eta-\tau)]|\right). (24)

Using the data-processing inequality and applying (24) to the error-mitigated classical states, we get for arbitrary ρ,σ𝕊\rho,\sigma\in\mathbb{S} that

DF(n=1Nn(ρ),n=1Nn(σ))\displaystyle D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma)) DF(𝒫An=1Nn(ρ),𝒫An=1Nn(σ))\displaystyle\geq D_{F}(\mathcal{P}_{A}\circ\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\mathcal{P}_{A}\circ\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma)) (25)
ΔσAQEM(ρ)+σAQEM(σ)+Δ.\displaystyle\geq\frac{\Delta}{\sigma_{A}^{\operatorname{QEM}}(\rho)+\sigma_{A}^{\operatorname{QEM}}(\sigma)+\Delta}.

Here, Δ|Tr[A𝒫An=1Nn(ρ)]Tr[A𝒫An=1Nn(σ)]|=|E^A(ρ)E^A(σ)|\Delta\coloneqq|\operatorname{Tr}[A^{\prime}\,\mathcal{P}_{A}\circ\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho)]-\operatorname{Tr}[A^{\prime}\,\mathcal{P}_{A}\circ\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma)]|=|\left\langle\hat{E}_{A}(\rho)\right\rangle-\left\langle\hat{E}_{A}(\sigma)\right\rangle| where A=a|aa|A^{\prime}=\sum_{a}|{a}\rangle\!\langle{a}| is the observable corresponding to the estimate (c.f., (6)), and σAQEM(ρ)σA(𝒫An=1Nn(ρ))\sigma_{A}^{\operatorname{QEM}}(\rho)\coloneqq\sigma_{A^{\prime}}(\mathcal{P}_{A}\circ\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho)) is the standard deviation of E^A(ρ)\hat{E}_{A}(\rho). Reorganizing the terms gives

σAQEM(ρ)+σAQEM(σ)(1DF(n=1Nn(ρ),n=1Nn(σ))1)Δ\displaystyle\sigma_{A}^{\operatorname{QEM}}(\rho)+\sigma_{A}^{\operatorname{QEM}}(\sigma)\geq\left(\frac{1}{D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma))}-1\right)\Delta (26)

Without loss of generality, let us take E^A(ρ)E^A(σ)0\left\langle\hat{E}_{A}(\rho)\right\rangle-\left\langle\hat{E}_{A}(\sigma)\right\rangle\geq 0. Then, recalling the definition of bias bA(ρ)E^A(ρ)Tr[Aρ]b_{A}(\rho)\coloneqq\left\langle\hat{E}_{A}(\rho)\right\rangle-\operatorname{Tr}[A\rho], we have

Δ=bA(ρ)bA(σ)+Tr[A(ρσ)].\displaystyle\Delta=b_{A}(\rho)-b_{A}(\sigma)+\operatorname{Tr}[A(\rho-\sigma)]. (27)

Combining these, we get

σAQEM(ρ)+σAQEM(σ)(1DF(n=1Nn(ρ),n=1Nn(σ))1)(Tr[A(ρσ)]+bA(ρ)bA(σ)).\displaystyle\sigma_{A}^{\operatorname{QEM}}(\rho)+\sigma_{A}^{\operatorname{QEM}}(\sigma)\geq\left(\frac{1}{D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma))}-1\right)(\operatorname{Tr}[A(\rho-\sigma)]+b_{A}(\rho)-b_{A}(\sigma)). (28)

Defining he maximum standard deviation as σmaxQEMmaxA𝕆maxρ𝕊σA,maxQEM\sigma_{\max}^{\operatorname{QEM}}\coloneqq\max_{A\in\mathbb{O}}\max_{\rho\in\mathbb{S}}\sigma_{A,\max}^{\operatorname{QEM}} and the maximum bias bmaxmaxA𝕆maxρ𝕊|bA(ρ)|b_{\max}\coloneqq\max_{A\in\mathbb{O}}\max_{\rho\in\mathbb{S}}|b_{A}(\rho)|, we get

σmaxQEMmaxρ,σ𝕊12(1DF(n=1Nn(ρ),n=1Nn(σ))1)(D𝕆(ρ,σ)2bmax).\displaystyle\sigma_{\max}^{\operatorname{QEM}}\geq\max_{\rho,\sigma\in\mathbb{S}}\frac{1}{2}\left(\frac{1}{D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma))}-1\right)(D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}). (29)

where D𝕆D_{\mathbb{O}} is the observable-dependent distinguishability defined in (2).

The bound (29) can be turned into a lower bound on NN. To see this, note that (29) ensures that for arbitrary ρ,σ𝕊\rho,\sigma\in\mathbb{S}, we have

σmaxQEM1DF(n=1Nn(ρ),n=1Nn(σ))2DF(n=1Nn(ρ),n=1Nn(σ))(D𝕆(ρ,σ)2bmax),\displaystyle\sigma_{\max}^{\operatorname{QEM}}\geq\frac{1-D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma))}{2D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma))}(D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}), (30)

which leads to

DF(n=1Nn(ρ),n=1Nn(σ))\displaystyle D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma)) D𝕆(ρ,σ)2bmax2σmaxQEM+D𝕆(ρ,σ)2bmax\displaystyle\geq\frac{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}{2\sigma_{\max}^{\operatorname{QEM}}+D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}} (31)
=11+2σmaxQEMD𝕆(ρ,σ)2bmax.\displaystyle=\frac{1}{1+\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}}.

Noting that

DF(n=1Nn(ρ),n=1Nn(σ))\displaystyle D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma)) =1F(n=1Nn(ρ),n=1Nn(σ))\displaystyle=\sqrt{1-F(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma))} (32)
=1n=1NF(n(ρ),n(σ))\displaystyle=\sqrt{1-\prod_{n=1}^{N}F(\mathcal{E}_{n}(\rho),\mathcal{E}_{n}(\sigma))}

where we used the multiplicativity of the fidelity for tensor-product states. This gives

n=1NF(n(ρ),n(σ))11(1+2σmaxQEMD𝕆(ρ,σ)2bmax)2\displaystyle\prod_{n=1}^{N}F(\mathcal{E}_{n}(\rho),\mathcal{E}_{n}(\sigma))\leq 1-\frac{1}{\left(1+\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}\right)^{2}} (33)

for ρ,σ𝕊\rho,\sigma\in\mathbb{S} such that D𝕆(ρ,σ)2bmax0D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}\geq 0. Taking the inverse and logarithm on both sides, we get

n=1NlogF1(n(ρ),n(σ))log[11(1+2σmaxQEMD𝕆(ρ,σ)2bmax)2]1.\displaystyle\sum_{n=1}^{N}\log F^{-1}(\mathcal{E}_{n}(\rho),\mathcal{E}_{n}(\sigma))\geq\log\left[1-\frac{1}{\left(1+\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}\right)^{2}}\right]^{-1}. (34)

Noting

Nmax𝔼logF1((ρ),(σ))n=1NlogF1(n(ρ),n(σ)),\displaystyle N\max_{\mathcal{E}\in\mathbb{E}}\log F^{-1}(\mathcal{E}(\rho),\mathcal{E}(\sigma))\geq\sum_{n=1}^{N}\log F^{-1}(\mathcal{E}_{n}(\rho),\mathcal{E}_{n}(\sigma)), (35)

we reach

Nmin𝔼log[11(1+2σmaxQEMD𝕆(ρ,σ)2bmax)2]1logF1((ρ),(σ))\displaystyle N\geq\min_{\mathcal{E}\in\mathbb{E}}\frac{\log\left[1-\frac{1}{\left(1+\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}\right)^{2}}\right]^{-1}}{\log F^{-1}(\mathcal{E}(\rho),\mathcal{E}(\sigma))} (36)

for arbitrary states ρ,σ𝕊\rho,\sigma\in\mathbb{S} such that D𝕆(ρ,σ)2bmax0D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}\geq 0.

Appendix D Comparison of the bound in Theorem 1 with existing protocols

To make an explicit comparison between the bound in Theorem 1 and the sample cost for specific protocols, we consider mitigating the local depolarizing noise 𝒟pM\mathcal{D}_{p}^{\otimes M}, where 𝒟p(ρ)(1p)ρ+p𝕀/2\mathcal{D}_{p}(\rho)\coloneqq(1-p)\rho+p\mathbb{I}/2, with accuracy δ\delta and probability 1ε1-\varepsilon. For the sake of the analysis, we consider conservative strategies with the target observable 𝕆={i=1MXi}\mathbb{O}=\{\prod_{i=1}^{M}X_{i}\} and target ideal states 𝕊={GHZ(±θ1),GHZ(±θ2)}\mathbb{S}=\{\operatorname{GHZ}(\pm\theta_{1}),\operatorname{GHZ}(\pm\theta_{2})\} where

|GHZ(θ)12(|0M+ei(θ+π/2)|1M)\displaystyle\ket{\operatorname{GHZ}(\theta)}\coloneqq\frac{1}{\sqrt{2}}\left(\ket{0}^{\otimes M}+e^{i(\theta+\pi/2)}\ket{1}^{\otimes M}\right) (37)

and we set θ1arcsin(δ)\theta_{1}\coloneqq\arcsin(\delta), θ2arcsin(2δ)\theta_{2}\coloneqq\arcsin(2\delta). This refers to the setting in which one would like to obtain good estimates at least for these states and observable. We also set the effective noise channels being the local depolarizing noise, i.e., i=𝒟pM,i\mathcal{E}_{i}=\mathcal{D}_{p}^{\otimes M},\ \forall i, which is satisfied for all the specific protocols we study below.

Since Tr[i=1MXiGHZ(θ)]=sinθ\operatorname{Tr}[\prod_{i=1}^{M}X_{i}\operatorname{GHZ}(\theta)]=-\sin\theta, we particularly have D𝕆(GHZ(θ1),GHZ(θ1))=2δD_{\mathbb{O}}(\operatorname{GHZ}(\theta_{1}),\operatorname{GHZ}(-\theta_{1}))=2\delta. This allows us to bound the lower bound in Theorem 1 to get

Nlog14ε(1ε)log[1/F(𝒟pM(GHZ(θ1)),DpM(GHZ(θ1)))].\displaystyle N\geq\frac{\log\frac{1}{4\varepsilon(1-\varepsilon)}}{\log[1/F(\mathcal{D}_{p}^{\otimes M}(\operatorname{GHZ}(\theta_{1})),D_{p}^{\otimes M}(\operatorname{GHZ}(-\theta_{1})))]}. (38)

To study the sample numbers for specific error-mitigation protocols, we simulate each mitigation strategy and evaluate the actual sample numbers used to achieve the desired performance. Namely, we evaluate the probability of getting estimates within the accuracy δ\delta for all ideal states in 𝕊\mathbb{S} using a certain sample budget. If this success probability is above the target probability 1ε1-\varepsilon, we need less number of samples. On the other hand, if the success probability is below 1ε1-\varepsilon, we need more samples. By binary search, we evaluate the sample number that realizes the success probability 1ε1-\varepsilon with a small error tolerance, which we set 0.01. We also note that since the shift of the expectation value due to depolarizing noise is larger for GHZ(±θ2)\operatorname{GHZ}(\pm\theta_{2}) than GHZ(±θ1)\operatorname{GHZ}(\pm\theta_{1}) and the noise and error mitigation apply to GHZ(θ2)\operatorname{GHZ}(\theta_{2}) and GHZ(θ2)\operatorname{GHZ}(-\theta_{2}) in a symmetric manner, it suffices to ensure that the expectation value for GHZ(θ2)\operatorname{GHZ}(\theta_{2}) can be estimated with the accuracy δ\delta with probability 1ε1-\varepsilon.

The error-mitigation protocols we study are virtual distillation, symmetry verification, and probabilistic error cancellation. For virtual distillation, we consider 2-copy virtual distillation and estimate Tr(i=1MXi[𝒟pM(GHZ(θ2))]2)/Tr([𝒟pM(GHZ(θ2))]2)\operatorname{Tr}(\prod_{i=1}^{M}X_{i}[\mathcal{D}_{p}^{\otimes M}(\operatorname{GHZ}(\theta_{2}))]^{2})/\operatorname{Tr}([\mathcal{D}_{p}^{\otimes M}(\operatorname{GHZ}(\theta_{2}))]^{2}) as in Refs. [15, 16]. For symmetry verification [13], we observe that all the ideal states in 𝕊\mathbb{S} are +1 eigenstates of the stabilizers generated by {ZiZi+1}i=1M1\{Z_{i}Z_{i+1}\}_{i=1}^{M-1}. We thus estimate Tr(i=1MXiρ~)/Tr(ρ~)\operatorname{Tr}\left(\prod_{i=1}^{M}X_{i}\tilde{\rho}\right)/\operatorname{Tr}(\tilde{\rho}) where ρ~(i=1M1𝕀+ZiZi+12)𝒟pM(GHZ(θ2))(i=1M1𝕀+ZiZi+12)\tilde{\rho}\coloneqq\left(\prod_{i=1}^{M-1}\frac{\mathbb{I}+Z_{i}Z_{i+1}}{2}\right)\mathcal{D}_{p}^{\otimes M}(\operatorname{GHZ}(\theta_{2}))\left(\prod_{i=1}^{M-1}\frac{\mathbb{I}+Z_{i}Z_{i+1}}{2}\right) is the noisy state projected onto the stabilizer subspace. To run probabilistic cancellation [5], we utilize the decomposition of the inverse map for depolarizing noise 𝒟p1=[1+3p4(1p)]idp4(1p)(𝒳+𝒴+𝒵)\mathcal{D}_{p}^{-1}=\left[1+\frac{3p}{4(1-p)}\right]\operatorname{id}-\frac{p}{4(1-p)}(\mathcal{X}+\mathcal{Y}+\mathcal{Z}) where 𝒳,𝒴,𝒵\mathcal{X},\mathcal{Y},\mathcal{Z} are unitary Pauli channels. After every depolarizing channel, we simulate the action of this inverse channel by following the procedure described in Appendix E.

Following the analysis described above, we plot in Fig. S.2 our lower bound and the actual sample costs to achieve accuracy δ=0.1\delta=0.1 and failure probability ε=0.2\varepsilon=0.2 for 7-qubit local depolarizing noise with respect to the noise strength pp. We observe that Theorem 1 provides nontrivial lower bounds that are comparable to the actual sample cost, the difference being the factor of 3 to 6 in the studied range.

Note that Fig. S.2 also includes the sample number required for the case when we do not apply any error mitigation but just measure i=1MXi\prod_{i=1}^{M}X_{i} on 𝒟pM(GHZ(θ2))\mathcal{D}_{p}^{\otimes M}(\operatorname{GHZ}(\theta_{2})). When the noise strength is small, the target accuracy and success probability can be achieved without error mitigation, making the no-mitigation strategy sample-efficient. However, as the noise gets stronger, it gets harder to achieve the target performance and eventually becomes impossible to realize the target accuracy no matter what sample number is allowed, as can be seen in the divergent behavior in Fig. S.2.

Refer to caption
Figure S.2: The lower bound in Theorem 1 and the actual samples NN used for several specific error-mitigation protocols to mitigate 7-qubit local depolarizing noise with noise strength pp. We fixed accuracy δ=0.1\delta=0.1, failure probability ε=0.2\varepsilon=0.2 and considered 𝕆={i=1mXi}\mathbb{O}=\{\otimes_{i=1}^{m}X_{i}\} and 𝕊={GHZ(±θ1),GHZ(±θ2)}\mathbb{S}=\{\operatorname{GHZ}(\pm\theta_{1}),\operatorname{GHZ}(\pm\theta_{2})\}.

Appendix E Scaling of the bound in Theorem 2 with noise strength

Let us consider MM-qubit state suffering from the MM-qubit local dephasing noise 𝒵pM\mathcal{Z}_{p}^{\otimes M} where 𝒵(ρ)(1p)ρ+pZρZ\mathcal{Z}(\rho)\coloneqq(1-p)\rho+pZ\rho Z. Consider the class of error mitigation with the same effective noise channels i=𝒵pM,i\mathcal{E}_{i}=\mathcal{Z}_{p}^{\otimes M},\ \forall i. We also have a mild condition that the set 𝕆\mathbb{O} of target observables consists of normalized observables, i.e., those with the unit operator norm, and contains the Pauli operator i=1MXi\prod_{i=1}^{M}X_{i}, and the set 𝕊\mathbb{S} of target states contains MM-qubit GHZ states GHZ±|GHZ±GHZ±|\operatorname{GHZ}_{\pm}\coloneqq|{\operatorname{GHZ}_{\pm}}\rangle\!\langle{\operatorname{GHZ}_{\pm}}| with |GHZ±12(|0M±|1M)\ket{\operatorname{GHZ}_{\pm}}\coloneqq\frac{1}{\sqrt{2}}(\ket{0}^{\otimes M}\pm\ket{1}^{\otimes M}). We also assume that bmax1b_{\max}\leq 1, which excludes the trivial strategy that always outputs a fixed estimate.

Noting that D𝕆(GHZ+,GHZ)=Tr[i=1Xi(GHZ+GHZ)]=2D_{\mathbb{O}}(\operatorname{GHZ}_{+},\operatorname{GHZ}_{-})=\operatorname{Tr}[\prod_{i=1}X_{i}(\operatorname{GHZ}_{+}-\operatorname{GHZ}_{-})]=2, the choice of ρ=GHZ+\rho=\operatorname{GHZ}_{+} and σ=GHZ\sigma=\operatorname{GHZ}_{-} satisfies D𝕆(ρ,σ)2bmax0D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}\geq 0. This puts a further bound for the lower bound in (4) and results in

Nlog[111+2σmaxQEMD𝕆(GHZ+,GHZ)2bmax]log[1/F(𝒵pM(GHZ+),𝒵pM(GHZ))].\displaystyle N\geq\frac{\log\left[1-\frac{1}{1+\frac{2\sigma_{\max}^{\rm QEM}}{D_{\mathbb{O}}(\operatorname{GHZ}_{+},\operatorname{GHZ}_{-})-2b_{\max}}}\right]}{\log[1/F(\mathcal{Z}_{p}^{\otimes M}(\operatorname{GHZ}_{+}),\mathcal{Z}_{p}^{\otimes M}(\operatorname{GHZ}_{-}))]}. (39)

Note that

𝒵pM(GHZ±)\displaystyle\mathcal{Z}_{p}^{\otimes M}(\operatorname{GHZ}_{\pm}) =k=0M/2(M2k)(1p)M2kp2kGHZ±\displaystyle=\sum_{k=0}^{\lfloor M/2\rfloor}\binom{M}{2k}(1-p)^{M-2k}p^{2k}\operatorname{GHZ}_{\pm} (40)
+k=1(M+1)/2(M2k1)(1p)M2k+1p2k1GHZ\displaystyle\quad+\sum_{k=1}^{\lfloor(M+1)/2\rfloor}\binom{M}{2k-1}(1-p)^{M-2k+1}p^{2k-1}\operatorname{GHZ}_{\mp}
=1+(12p)M2GHZ±+1(12p)M2GHZ,\displaystyle=\frac{1+(1-2p)^{M}}{2}\operatorname{GHZ}_{\pm}+\frac{1-(1-2p)^{M}}{2}\operatorname{GHZ}_{\mp},

where we used

k=0M/2(M2k)(1p)M2kp2k\displaystyle\sum_{k=0}^{\lfloor M/2\rfloor}\binom{M}{2k}(1-p)^{M-2k}p^{2k} =k=0M(Mk)(1p)Mkpk+(p)k2\displaystyle=\sum_{k=0}^{M}\binom{M}{k}(1-p)^{M-k}\frac{p^{k}+(-p)^{k}}{2} (41)
=1+(12p)M2.\displaystyle=\frac{1+(1-2p)^{M}}{2}.
k=1(M+1)/2(M2k1)(1p)M2k+1p2k1\displaystyle\sum_{k=1}^{\lfloor(M+1)/2\rfloor}\binom{M}{2k-1}(1-p)^{M-2k+1}p^{2k-1} =k=0M(Mk)(1p)Mkpk(p)k2\displaystyle=\sum_{k=0}^{M}\binom{M}{k}(1-p)^{M-k}\frac{p^{k}-(-p)^{k}}{2} (42)
=1(12p)M2.\displaystyle=\frac{1-(1-2p)^{M}}{2}.

This gives

F(𝒵pM(GHZ+),𝒵pM(GHZ))=1(12p)2M.\displaystyle F(\mathcal{Z}_{p}^{\otimes M}(\operatorname{GHZ}_{+}),\mathcal{Z}_{p}^{\otimes M}(\operatorname{GHZ}_{-}))=1-(1-2p)^{2M}. (43)

Focusing on the case when p=12δp=\frac{1}{2}-\delta with δ1\delta\ll 1, the scaling of the lower bound in (39) with respect to the noise strength is given by

[log11(12p)2M]11/(2δ)2M.\displaystyle\left[\log\frac{1}{1-(1-2p)^{2M}}\right]^{-1}\sim 1/(2\delta)^{2M}. (44)

We now show that this scaling can be achieved by probabilistic error cancellation. The probabilistic error cancellation method is the strategy to mitigate the effect of noise channel \mathcal{E} by simulating the action of the inverse noise channel 1\mathcal{E}^{-1}. The simulation can be accomplished by considering a linear decomposition of the inverse channel 1=icii\mathcal{E}^{-1}=\sum_{i}c_{i}\mathcal{B}_{i} where cic_{i} is a (possibly negative) real number and i\mathcal{B}_{i} is a physical operation that can be implemented on the given device. One then applies the operation i\mathcal{B}_{i} with probability |ci|/γ|c_{i}|/\gamma with γi|ci|\gamma\coloneqq\sum_{i}|c_{i}|, makes a measurement, and multiplies γsgn(ci)\gamma{\rm sgn}(c_{i}) to the outcome, where sgn(ci){\rm sgn}(c_{i}) is the sign of cic_{i} that takes +1 if ci0c_{i}\geq 0 and 1-1 if ci<0c_{i}<0. This constructs a distribution whose expected value coincides with the ideal expectation value with the standard deviation scaled by the factor of γ\gamma. Therefore, by taking the average over NN samples from this distribution, one can construct an estimate of the ideal expectation value, which obeys the distribution with the standard deviation γ/N\sim\gamma/\sqrt{N}.

In general, one can also consider optimizing γ\gamma over the choices of implementable operations {i}i\{\mathcal{B}_{i}\}_{i}. Such optimal cost for mitigating the local dephasing noise is characterized as [39]

γopt=1(12p)M.\displaystyle\gamma_{\rm opt}=\frac{1}{(1-2p)^{M}}. (45)

This implies that the sample number NN that achieves some fixed standard deviation becomes N1(12p)2MN\propto\frac{1}{(1-2p)^{2M}}. When p=12δp=\frac{1}{2}-\delta with δ1\delta\ll 1, the sample scales as N1/(2δ)2MN\sim 1/(2\delta)^{2M}, realizing the same scaling in (44).

Appendix F Proof of Theorem 3

Proof.

Let 𝒟p(ρ)(1p)ρ+p𝕀/2\mathcal{D}_{p}(\rho)\coloneqq(1-p)\rho+p\mathbb{I}/2 be the single-qubit depolarizing noise. Let also 𝒰1,,𝒰L\mathcal{U}_{1},\dots,\mathcal{U}_{L} be the unitary channels followed by a local depolarizing channel with the noise strength at least γ\gamma and suppose the lthl^{\mathrm{th}} layer 𝒰l\mathcal{U}_{l} in the nthn^{\mathrm{th}} noisy circuit is followed by the local depolarizing channel m=1M𝒟γn,l,m\otimes_{m=1}^{M}\mathcal{D}_{\gamma_{n,l,m}}. By assumption, we have γn,l,mγn,l,m\gamma_{n,l,m}\geq\gamma\ \forall n,l,m. Also, let Λn,l\Lambda_{n,l} and Ξn,l\Xi_{n,l} be the unital channels applied before and after the lthl^{\rm th} layer UlU_{l} in the nthn^{\rm th} noisy layered circuit.

We first note that every 𝒟γn,l,m\mathcal{D}_{\gamma_{n,l,m}} can be written as 𝒟γn,l,mγ𝒟γ\mathcal{D}_{\gamma_{n,l,m}-\gamma}\circ\mathcal{D}_{\gamma}. Since the second local depolarizing channel m=1M𝒟γn,l,mγ\otimes_{m=1}^{M}\mathcal{D}_{\gamma_{n,l,m}-\gamma} is unital, it can be absorbed in the following unital channel Ξn,l\Xi_{n,l}, allowing us to focus on 𝒟γM\mathcal{D}_{\gamma}^{\otimes M} as the noise after each layer.

Then, the effective noise channel for the nthn^{\rm th} layered circuit can be written as

n\displaystyle\mathcal{E}_{n} =𝒟γMΞn,L𝒰LΛn,L𝒟γMΞn,1𝒰1Λn,1𝒰1𝒰L\displaystyle=\mathcal{D}_{\gamma}^{\otimes M}\circ\Xi_{n,L}\circ\mathcal{U}_{L}\circ\Lambda_{n,L}\circ\cdots\circ\mathcal{D}_{\gamma}^{\otimes M}\circ\Xi_{n,1}\circ\mathcal{U}_{1}\circ\Lambda_{n,1}\circ\mathcal{U}_{1}^{\dagger}\circ\dots\circ\mathcal{U}_{L}^{\dagger} (46)

With a little abuse of notation, we write this as n=l=1L[𝒟γMΞn,l𝒰lΛn,l]l=1L𝒰Ll+1\mathcal{E}_{n}=\prod_{l=1}^{L}\left[\mathcal{D}_{\gamma}^{\otimes M}\circ\Xi_{n,l}\circ\mathcal{U}_{l}\circ\Lambda_{n,l}\right]\circ\prod_{l=1}^{L}\mathcal{U}_{L-l+1}^{\dagger} where we let the product sign refer to the concatenation of quantum channels.

Let ρin\rho_{\rm in} and σin\sigma_{\rm in} be some input states and ρ=𝒰L𝒰1(ρin)\rho=\mathcal{U}_{L}\circ\dots\circ\mathcal{U}_{1}(\rho_{\rm in}) and σ=𝒰L𝒰1(σin)\sigma=\mathcal{U}_{L}\circ\dots\circ\mathcal{U}_{1}(\sigma_{\rm in}). Then, the unitary invariance and the triangle inequality of the trace distance imply that

Dtr(n=1Nn(ρ),n=1Nn(σ))\displaystyle D_{\operatorname{tr}}\left(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma)\right) =Dtr(n=1Nn(ρin),n=1Nn(σin))\displaystyle=D_{\operatorname{tr}}\left(\otimes_{n=1}^{N}\mathcal{F}_{n}(\rho_{\rm in}),\otimes_{n=1}^{N}\mathcal{F}_{n}(\sigma_{\rm in})\right) (47)
Dtr(n=1Nn(ρin),𝕀2NM)+Dtr(n=1Nn(σin),𝕀2NM)\displaystyle\leq D_{\rm tr}\left(\otimes_{n=1}^{N}\mathcal{F}_{n}(\rho_{\rm in}),\frac{\mathbb{I}}{2^{NM}}\right)+D_{\rm tr}\left(\otimes_{n=1}^{N}\mathcal{F}_{n}(\sigma_{\rm in}),\frac{\mathbb{I}}{2^{NM}}\right)

where we defined nl=1L[𝒟γMΞn,l𝒰lΛn,l]\mathcal{F}_{n}\coloneqq\prod_{l=1}^{L}\left[\mathcal{D}_{\gamma}^{\otimes M}\circ\Xi_{n,l}\circ\mathcal{U}_{l}\circ\Lambda_{n,l}\right]. Using the quantum Pinsker’s inequality

Dtr(ρ,σ)ln22S(ρσ)\displaystyle D_{\operatorname{tr}}(\rho,\sigma)\leq\sqrt{\frac{\ln 2}{2}}\,\sqrt{S(\rho\|\sigma)} (48)

that holds all states ρ\rho, σ\sigma, where S(ρσ)Tr(ρlogρ)Tr(ρlogσ)S(\rho\|\sigma)\coloneqq\operatorname{Tr}(\rho\log\rho)-\operatorname{Tr}(\rho\log\sigma) is the relative entropy, we get

Dtr(n=1Nn(ρin),𝕀2NM)\displaystyle D_{\rm tr}\left(\otimes_{n=1}^{N}\mathcal{F}_{n}(\rho_{\rm in}),\frac{\mathbb{I}}{2^{NM}}\right) ln22S(n=1Nn(ρin)𝕀2NM)\displaystyle\leq\sqrt{\frac{\ln 2}{2}}\sqrt{S\left(\otimes_{n=1}^{N}\mathcal{F}_{n}(\rho_{\rm in})\,\Big{\|}\,\frac{\mathbb{I}}{2^{NM}}\right)} (49)
=ln22n=1NS(n(ρin)𝕀2M)\displaystyle=\sqrt{\frac{\ln 2}{2}}\sqrt{\sum_{n=1}^{N}S\left(\mathcal{F}_{n}(\rho_{\rm in})\,\Big{\|}\,\frac{\mathbb{I}}{2^{M}}\right)}

where in the second line we used the additivity of the relative entropy S(ρ1ρ2σ1σ2)=S(ρ1σ1)+S(ρ2σ2)S(\rho_{1}\otimes\rho_{2}\|\sigma_{1}\otimes\sigma_{2})=S(\rho_{1}\|\sigma_{1})+S(\rho_{2}\|\sigma_{2}).

We now recall the result in Ref. [59], showing that

S(𝒟γM(τ)𝕀/2M)(1γ)2S(τ𝕀/2M)\displaystyle S\left(\mathcal{D}_{\gamma}^{\otimes M}(\tau)\,\Big{\|}\,\mathbb{I}/2^{M}\right)\leq(1-\gamma)^{2}S(\tau\,\|\,\mathbb{I}/2^{M}) (50)

for arbitrary MM-qubit state τ\tau. This implies that for arbitrary state MM-qubit state τ\tau, noise strength γ\gamma, unitary 𝒰\mathcal{U}, and unital channels Ξ\Xi, Λ\Lambda,

S(𝒟γMΞ𝒰Λ(τ)𝕀/2M)\displaystyle S(\mathcal{D}_{\gamma}^{\otimes M}\circ\Xi\circ\mathcal{U}\circ\Lambda(\tau)\,\|\,\mathbb{I}/2^{M}) (1γ)2S(Ξ𝒰Λ(τ)𝕀/2M)\displaystyle\leq(1-\gamma)^{2}S(\Xi\circ\mathcal{U}\circ\Lambda(\tau)\,\|\,\mathbb{I}/2^{M}) (51)
=(1γ)2S(Ξ𝒰Λ(τ)Ξ𝒰Λ(𝕀/2M))\displaystyle=(1-\gamma)^{2}S(\Xi\circ\mathcal{U}\circ\Lambda(\tau)\,\|\,\Xi\circ\mathcal{U}\circ\Lambda(\mathbb{I}/2^{M}))
(1γ)2S(τ𝕀/2M)\displaystyle\leq(1-\gamma)^{2}S(\tau\,\|\,\mathbb{I}/2^{M})

where we used (50) in the first line, the fact that Ξ𝒰Λ(𝕀/2M)=𝕀/2M\Xi\circ\mathcal{U}\circ\Lambda(\mathbb{I}/2^{M})=\mathbb{I}/2^{M} because Ξ\Xi and Λ\Lambda are unital in the second line, and the data-processing inequality for the relative entropy in the third line. The sequential application of (51) results in

S(n(ρin)𝕀2M)\displaystyle S\left(\mathcal{F}_{n}(\rho_{\rm in})\,\Big{\|}\,\frac{\mathbb{I}}{2^{M}}\right) (1γ)2LS(ρin𝕀2M)\displaystyle\leq(1-\gamma)^{2L}\,S\left(\rho_{\rm in}\,\Big{\|}\,\frac{\mathbb{I}}{2^{M}}\right) (52)
(1γ)2LM,\displaystyle\leq(1-\gamma)^{2L}M,

where the second line follows from the upper bound of the relative entropy. Combining (47), (52) and (49), we get

Dtr(n=1Nn(ρin),n=1Nn(σin))2ln22(1γ)2LMN.\displaystyle D_{\rm tr}\left(\otimes_{n=1}^{N}\mathcal{F}_{n}(\rho_{\rm in}),\otimes_{n=1}^{N}\mathcal{F}_{n}(\sigma_{\rm in})\right)\leq 2\sqrt{\frac{\ln 2}{2}}\sqrt{(1-\gamma)^{2L}MN}. (53)

Combining this with (10), we get

N(12ε)22ln(2)M(1γ)2L\displaystyle N\geq\frac{(1-2\varepsilon)^{2}}{2\ln(2)M(1-\gamma)^{2L}} (54)

for any ε1/2\varepsilon\leq 1/2. ∎

Appendix G Bound for layered circuits with a fixed target bias and standard deviation

Theorem 3 in the main text shows the exponential sampling overhead for noisy layered circuits with a fixed target accuracy and success probability. Here, we present a similar exponential growth of the required sample overhead for a fixed target bias and standard deviation.

Theorem S.2.

Suppose that an error-mitigation strategy described above is applied to an MM-qubit circuit to mitigate local depolarizing channels with strength at least γ\gamma that follow LL layers of unitaries. Then, if the estimator of the error mitigation has the maximum standard deviation σmaxQEM\sigma_{\max}^{\operatorname{QEM}} and maximum bias bmaxb_{\max}, the required number NN of samples is lower bounded as

N14M(1γ)2L(12σmaxQEMD𝕆(ρ,σ)2bmax+1)2\displaystyle N\geq\frac{1}{4M(1-\gamma)^{2L}}\left(\frac{1}{\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}+1}\right)^{2} (55)

for arbitrary states ρ,σ𝕊\rho,\sigma\in\mathbb{S} such that D𝕆(ρ,σ)2bmax0D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}\geq 0.

Proof.

We first note that DF(ρ,σ)S(ρσ)D_{F}(\rho,\sigma)\leq\sqrt{S(\rho\|\sigma)} for arbitrary states ρ,σ\rho,\sigma [64]. Also, the purified distance DFD_{F} satisfies the triangle inequality [61]. Therefore, we can repeat the argument in (47) and (49) with the purified distance to get

DF(n=1Nn(ρ),n=1Nn(σ))(n=1NS(n(ρin)𝕀2M)+n=1NS(n(σin)𝕀2M)).\displaystyle D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma))\leq\left(\sqrt{\sum_{n=1}^{N}S\left(\mathcal{F}_{n}(\rho_{\rm in})\,\Big{\|}\,\frac{\mathbb{I}}{2^{M}}\right)}+\sqrt{\sum_{n=1}^{N}S\left(\mathcal{F}_{n}(\sigma_{\rm in})\,\Big{\|}\,\frac{\mathbb{I}}{2^{M}}\right)}\right). (56)

Using (52), we further have

DF(n=1Nn(ρ),n=1Nn(σ))2MN(1γ)2L\displaystyle D_{F}(\otimes_{n=1}^{N}\mathcal{E}_{n}(\rho),\otimes_{n=1}^{N}\mathcal{E}_{n}(\sigma))\leq 2\sqrt{MN(1-\gamma)^{2L}} (57)

This allows us to put another bound to (29) to get

σmaxQEMmaxρ,σ𝕊D𝕆(ρ,σ)2bmax012(12MN(1γ)2L1)(D𝕆(ρ,σ)2bmax).\displaystyle\sigma_{\max}^{\operatorname{QEM}}\geq\max_{\begin{subarray}{c}\rho,\sigma\in\mathbb{S}\\ D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}\geq 0\end{subarray}}\frac{1}{2}\left(\frac{1}{2\sqrt{MN(1-\gamma)^{2L}}}-1\right)(D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}). (58)

Reorganizing this, we obtain

N14M(1γ)2L(12σmaxQEMD𝕆(ρ,σ)2bmax+1)2.\displaystyle N\geq\frac{1}{4M(1-\gamma)^{2L}}\left(\frac{1}{\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}+1}\right)^{2}. (59)

for arbitrary state ρ,σ\rho,\sigma such that D𝕆(ρ,σ)2bmax0D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}\geq 0. ∎

Appendix H Alternative bounds for noisy layered circuits

Let us investigate alternative bounds that also diverge with the vanishing failure probability ε0\varepsilon\to 0. We first note the inequality [64]

S(τη)logF(τ,η)1\displaystyle S(\tau\|\eta)\geq\log F(\tau,\eta)^{-1} (60)

that hold for arbitrary states τ\tau and η\eta. Together with Theorem 1, our goal is to upper bound S(n(ρ)n(σ))S(\mathcal{E}_{n}(\rho)\|\mathcal{E}_{n}(\sigma)) for the effective noise channel defined in (46).

To do this, we employ the continuity bounds of the relative entropy [65]. Namely, for arbitrary states τ\tau and a full-rank state η\eta,

S(τη)\displaystyle S(\tau\|\eta) 4Dtr(τ,η)2λmin(η)\displaystyle\leq\frac{4D_{\operatorname{tr}}(\tau,\eta)^{2}}{\lambda_{\min}(\eta)} (61)

where λmin(η)\lambda_{\min}(\eta) is the minimum eigenvalue of a state η\eta, and for Dtr(τ,η)1/2D_{\operatorname{tr}}(\tau,\eta)\leq 1/2,

S(τη)\displaystyle S(\tau\|\eta) 2ln2Dtr(τ,η)[logd+log(2Dtr(τ,η))+12log(1/λmin(η))].\displaystyle\leq\frac{2}{\ln 2}D_{\operatorname{tr}}(\tau,\eta)\left[\log d+\log\left(\frac{2}{D_{\rm tr}(\tau,\eta)}\right)+\frac{1}{2}\log(1/\lambda_{\min}(\eta))\right]. (62)

where dd is the dimension of the Hilbert space that τ\tau and η\eta act on. Note that (61) has the square dependence on the trace distance, while (62) has the log dependence on the minimum eigenvalue.

Note that 𝒟γM=(1(4γ)M)Ψ+(4γ)M𝒟M\mathcal{D}_{\gamma}^{\otimes M}=(1-(4\gamma)^{M})\Psi+(4\gamma)^{M}\mathcal{D}^{\otimes M} where Ψ\Psi is some quantum channel and 𝒟\mathcal{D} is the completely depolarizing channel. This ensures that

λmin(𝒟γM(σ))(γ2)M\displaystyle\lambda_{\min}(\mathcal{D}_{\gamma}^{\otimes M}(\sigma))\geq\left(\frac{\gamma}{2}\right)^{M} (63)

for arbitrary state σ\sigma. Noting the structure in (46), where all components preserve the maximally mixed state, this results in

λmin(n(σ))(γ2)M.\displaystyle\lambda_{\min}(\mathcal{E}_{n}(\sigma))\geq\left(\frac{\gamma}{2}\right)^{M}. (64)

We combine (61) with (53) and (64) to get

S(n(ρ)n(σ)))8ln(2)M(1γ)2L(2γ)M.\displaystyle S(\mathcal{E}_{n}(\rho)\|\mathcal{E}_{n}(\sigma)))\leq 8\ln(2)\,M(1-\gamma)^{2L}\left(\frac{2}{\gamma}\right)^{M}. (65)

Combining this with (60) and Theorem 1, we obtain

N\displaystyle N log[14ε(1ε)](γ2)M8ln(2)M(1γ)2L.\displaystyle\geq\log\left[\frac{1}{4\varepsilon(1-\varepsilon)}\right]\frac{\left(\frac{\gamma}{2}\right)^{M}}{8\ln(2)\,M(1-\gamma)^{2L}}. (66)

This bound diverges with ε0\varepsilon\to 0 and grows exponentially with the circuit depth with exponent 2L2L. However, this is also exponentially loose with the qubit number MM.

The second bound (62) can remedy this drawback at the cost of a looser circuit depth dependence. Let us first note that the function xlog(1/x)x\log(1/x) is monotonically increasing for 0<x1/20<x\leq 1/2. Therefore, for MM, γ\gamma, and LL such that Dtr(n(ρ),n(σ))2ln2M(1γ)L1/2D_{\operatorname{tr}}(\mathcal{E}_{n}(\rho),\mathcal{E}_{n}(\sigma))\leq\sqrt{2\ln 2}\sqrt{M}(1-\gamma)^{L}\leq 1/2, we can combine (62), (53), and (64) to get

S(n(ρ)n(σ))22ln2M(1γ)L[M+log(22ln2M(1γ)L)+M2log(2/γ)].\displaystyle S(\mathcal{E}_{n}(\rho)\|\mathcal{E}_{n}(\sigma))\leq 2\sqrt{\frac{2}{\ln 2}}\sqrt{M}(1-\gamma)^{L}\left[M+\log\left(\frac{2}{\sqrt{2\ln 2}\sqrt{M}(1-\gamma)^{L}}\right)+\frac{M}{2}\log(2/\gamma)\right]. (67)

This, together with Theorem 1, we get

Nlog[14ε(1ε)]ln28M(1γ)L[Llog(21γ)+M12log(Mln22)+M2log(2/γ)]1.\displaystyle N\geq\log\left[\frac{1}{4\varepsilon(1-\varepsilon)}\right]\frac{\sqrt{\ln 2}}{\sqrt{8M}(1-\gamma)^{L}}\left[L\log\left(\frac{2}{1-\gamma}\right)+M-\frac{1}{2}\log\left(\frac{M\ln 2}{2}\right)+\frac{M}{2}\log(2/\gamma)\right]^{-1}. (68)

This has a polynomial dependence with MM and still grows exponentially with LL (up to a polynomial correction), while the LL dependence is not as good as (66).

We can also apply a similar argument to show bounds for fixed standard deviation and bias, which diverge with σmaxQEM0\sigma_{\max}^{\operatorname{QEM}}\to 0. Namely, we combine (65) with (60) and Theorem 2 to obtain

Nlog[11(1+2σmaxQEMD𝕆(ρ,σ)2bmax)2]1(γ2)M8ln(2)M(1γ)2L.\displaystyle N\geq\log\left[1-\frac{1}{\left(1+\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}\right)^{2}}\right]^{-1}\frac{\left(\frac{\gamma}{2}\right)^{M}}{8\ln(2)\,M(1-\gamma)^{2L}}. (69)

This bound diverges with σmaxQEM0\sigma_{\max}^{\operatorname{QEM}}\to 0 and grows exponentially with the circuit depth with exponent 2L2L. Similarly, using (67) and Theorem 2, we get

Nlog[11(1+2σmaxQEMD𝕆(ρ,σ)2bmax)2]1ln28M(1γ)L[Llog(21γ)+M12log(Mln22)+M2log(2γ)]1.\displaystyle N\geq\log\left[1-\frac{1}{\left(1+\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}\right)^{2}}\right]^{-1}\frac{\sqrt{\ln 2}}{\sqrt{8M}(1-\gamma)^{L}}\left[L\log\left(\frac{2}{1-\gamma}\right)+M-\frac{1}{2}\log\left(\frac{M\ln 2}{2}\right)+\frac{M}{2}\log\left(\frac{2}{\gamma}\right)\right]^{-1}. (70)

Appendix I Noisy layered circuits under other noise models

Let us consider a layered circuit affected by a noise channel 𝒩n,l\mathcal{N}_{n,l} after the lthl^{\mathrm{th}} layer UlU_{l} in the nthn^{\mathrm{th}} copy of the noisy circuits. Suppose that for every nn, all {𝒩n,l}l=1L\{\mathcal{N}_{n,l}\}_{l=1}^{L} and {Ul}l=1L\{U_{l}\}_{l=1}^{L} share the same fixed point σn\sigma_{n}. We aim to mitigate errors by applying additional channels Λn,l\Lambda_{n,l} and Ξn,l\Xi_{n,l} that also preserve σn\sigma_{n} before and after UlU_{l} in the nthn^{\mathrm{th}} noisy circuit and applying a global trailing quantum process over nn copies of distorted state.

For an arbitrary quantum channel 𝒩\mathcal{N} with a fixed point σ\sigma, let ξ𝒩\xi_{\mathcal{N}} be a contraction constant for 𝒩\mathcal{N} that satisfies

S(𝒩(ρ)σ)ξ𝒩S(ρσ),ρ.\displaystyle S(\mathcal{N}(\rho)\|\sigma)\leq\xi_{\mathcal{N}}S(\rho\|\sigma),\ \forall\rho. (71)

The contraction constant ξ𝒩\xi_{\mathcal{N}} was studied for various situations, particularly in relation to logarithmic Sobolev inequalities for dynamical semigroups that represent continuous Markovian evolution [59, 66, 67, 68, 69, 70, 71, 72, 73].

Using the contraction constant, we can state the sample lower bound in the general form, which reproduces Theorems 3 and S.2 as special cases.

Theorem S.3.

Suppose that an error-mitigation strategy achieves (1) with some δ0\delta\geq 0 and 0ε1/20\leq\varepsilon\leq 1/2 to mitigate noise {𝒩n,l}n,l\{\mathcal{N}_{n,l}\}_{n,l} that occurs after each UlU_{l} in the nthn^{\mathrm{th}} layered circuit in an error mitigation strategy described above. Suppose also that every noise channel has the contraction constant smaller than ξ\xi, i.e., ξ𝒩n,lξ,n,l\xi_{\mathcal{N}_{n,l}}\leq\xi,\forall n,l.

Then, if there exist at least two states ρ,σ𝕊\rho,\sigma\in\mathbb{S} such that D𝕆(ρ,σ)2δD_{\mathbb{O}}(\rho,\sigma)\geq 2\delta, the required sample number NN is lower bounded as

N(12ε)22ln(2)MξL.\displaystyle N\geq\frac{(1-2\varepsilon)^{2}}{2\ln(2)\,M\xi^{L}}. (72)

Similarly, to achieve the maximum standard deviation σmaxQEM\sigma_{\max}^{\operatorname{QEM}} and maximum bias bmaxb_{\max}, the required number NN of samples is lower bounded as

N14MξL(12σmaxQEMD𝕆(ρ,σ)2bmax+1)2\displaystyle N\geq\frac{1}{4M\xi^{L}}\left(\frac{1}{\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}+1}\right)^{2} (73)

for arbitrary states ρ,σ𝕊\rho,\sigma\in\mathbb{S} such that D𝕆(ρ,σ)2bmax0D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}\geq 0.

Proof.

Both statements can be obtained essentially by following the same arguments in the proofs of Theorems 3 and S.2. Namely, defining nl=1L[𝒩n,lΞn,l𝒰lΛn,l]\mathcal{F}_{n}\coloneqq\prod_{l=1}^{L}\left[\mathcal{N}_{n,l}\circ\Xi_{n,l}\circ\mathcal{U}_{l}\circ\Lambda_{n,l}\right], the argument leading to (53) gives us

Dtr(n=1Nn(ρin),n=1Nn(σin))2ln2ξLMN.\displaystyle D_{\rm tr}\left(\otimes_{n=1}^{N}\mathcal{F}_{n}(\rho_{\rm in}),\otimes_{n=1}^{N}\mathcal{F}_{n}(\sigma_{\rm in})\right)\leq\sqrt{2\ln 2}\sqrt{\xi^{L}MN}. (74)

This, together with (10), gives

N(12ε)22ln(2)MξL\displaystyle N\geq\frac{(1-2\varepsilon)^{2}}{2\ln(2)M\xi^{L}} (75)

whenever there exist two states ρ,σ𝕊\rho,\sigma\in\mathbb{S} such that D𝕆(ρ,σ)2δD_{\mathbb{O}}(\rho,\sigma)\geq 2\delta.

Similarly, the argument to get (57) results in

DF(n=1Nn(ρin)n=1Nn(σin))2MNξL.\displaystyle D_{F}\left(\otimes_{n=1}^{N}\mathcal{F}_{n}(\rho_{\rm in})\|\otimes_{n=1}^{N}\mathcal{F}_{n}(\sigma_{\rm in})\right)\leq 2\sqrt{MN\xi^{L}}. (76)

Combining this to (29) leads to

N14MξL(12σmaxQEMD𝕆(ρ,σ)2bmax+1)2.\displaystyle N\geq\frac{1}{4M\xi^{L}}\left(\frac{1}{\frac{2\sigma_{\max}^{\operatorname{QEM}}}{D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}}+1}\right)^{2}. (77)

for arbitrary state ρ,σ\rho,\sigma such that D𝕆(ρ,σ)2bmax0D_{\mathbb{O}}(\rho,\sigma)-2b_{\max}\geq 0.

Theorems 3 and S.2 are recovered by noting that the contraction constant for the local depolarizing noise 𝒟γM\mathcal{D}_{\gamma}^{\otimes M} is known as ξ𝒟γM=(1γ)2\xi_{\mathcal{D}_{\gamma}^{\otimes M}}=(1-\gamma)^{2} [59].

We remark that analogous results to Theorem S.3 can also be obtained in terms of the contractivity constant for other distance measures, such as trace distance, purified distance, and Rényi-α\alpha divergences with α>1\alpha>1. Indeed, we can get corresponding bounds for stochastic Pauli noise by employing the contractivity for Rényi-22 sandwiched relative entropy[73]

S2(Λn(ρ)𝕀/dn)exp[2(11/n)lnrlnn]S2(ρ𝕀/dn)\displaystyle S_{2}(\Lambda^{\otimes n}(\rho)\|\mathbb{I}/d^{n})\leq\exp\left[2(1-1/n)\frac{\ln r}{\ln n}\right]S_{2}(\rho\|\mathbb{I}/d^{n}) (78)

for every nn and dd-dimensional unital channel Λ\Lambda that satisfies r1Λ(1r)𝒟221r^{-1}\|\Lambda-(1-r)\mathcal{D}\|_{2\to 2}\leq 1. Here, S2(τη)logTr(η1/2τη1/2τ)S_{2}(\tau\|\eta)\coloneqq\log\operatorname{Tr}\left(\eta^{-1/2}\,\tau\,\eta^{-1/2}\,\tau\right) is the Rényi-2 sandwiched relative entropy [74], 𝒟\mathcal{D} is the completely depolarizing channel, and Φ22supX0Φ(X)2X2\|\Phi\|_{2\to 2}\coloneqq\sup_{X\neq 0}\frac{\|\Phi(X)\|_{2}}{\|X\|_{2}} is the 222\to 2 norm for an arbitrary linear map Φ\Phi. For an error probability 𝐪=(qx,qy,qz){\bf q}=(q_{x},q_{y},q_{z}) for each Pauli error, define a qubit stochastic Pauli noise as 𝒯𝐪(τ)(1qxqyqz)τ+qxXτX+qyYτY+qzZτZ\mathcal{T}_{\bf q}(\tau)\coloneqq(1-q_{x}-q_{y}-q_{z})\tau+q_{x}X\tau X+q_{y}Y\tau Y+q_{z}Z\tau Z. Then, by taking r=q|12min{qx+qy,qy+qz,qx+qz}|r=q\coloneqq|1-2\min\{q_{x}+q_{y},q_{y}+q_{z},q_{x}+q_{z}\}| in Eq. (78), we obtain

S2(𝒯𝐪n(ρ)𝕀/2n)q1/ln2S2(ρ𝕀/2n).\displaystyle S_{2}(\mathcal{T}_{\bf q}^{\otimes n}(\rho)\|\mathbb{I}/2^{n})\leq q^{1/\ln 2}S_{2}(\rho\|\mathbb{I}/2^{n}). (79)

We remark that this type of bound was also employed in Ref. [41]. Since S(τη)S2(τη),τ,ηS(\tau\|\eta)\leq S_{2}(\tau\|\eta),\ \forall\tau,\eta [74], we can upper bound the form in (49) by (78). From there, we can follow the same argument as the ones in the proofs of Theorems 3 and S.2, noting the data-processing inequality and the additivity for tensor-product states known for Rény-2 sandwiched relative entropy [74]. This results in the sampling lower bounds Ω(qL/ln2)\Omega\left(q^{L/\ln 2}\right) for stochastic Pauli noise.

More generally, the exponential sample growth can be established for a large class of tensor-product channels. Let {Φt}t0\{\Phi_{t}\}_{t\geq 0} be a set of unital qubit channels constructing a dynamical semigroup, i.e., Φ0=id\Phi_{0}=\operatorname{id} and Φs+r=ΦsΦr,s,t0\Phi_{s+r}=\Phi_{s}\circ\Phi_{r},\forall s,t\geq 0, generated by a Liouvillian \mathcal{L}, i.e., Φt=et\Phi_{t}=e^{t\mathcal{L}} [75]. Suppose further that 𝕀/2\mathbb{I}/2 is the unique solution of (τ)=0\mathcal{L}(\tau)=0, i.e., the unique fixed point for {Φt}t\{\Phi_{t}\}_{t}. Then, for a noisy channel 𝒩=ΦtM\mathcal{N}=\Phi_{t}^{\otimes M} defined for some t>0t>0 comes with ξ𝒩<1\xi_{\mathcal{N}}<1 [69], leading to the exponential sample cost together with Theorem S.3. An example includes a channel generated by the Liouvillian constructed by a stochastic Pauli channel (τ)=i=03qiPiτPiτ\mathcal{L}(\tau)=\sum_{i=0}^{3}q_{i}P_{i}\tau P_{i}-\tau where qiq_{i} is the probability that ithi^{\mathrm{th}} Pauli PiP_{i} is applied.

We can also apply this to noise channels globally applied to MM qubits. Consider the MM-qubit (not necessarily unital) depolarizing channel with a full-rank fixed point σ\sigma defined as 𝒟γ,σ(τ)(1γ)τ+γσ\mathcal{D}_{\gamma,\sigma}(\tau)\coloneqq(1-\gamma)\tau+\gamma\sigma. For instance, the fixed state σ\sigma can be taken as the thermal Gibbs state σβ,H=eβH/Tr(eβH)\sigma_{\beta,H}=e^{-\beta H}/\operatorname{Tr}(e^{-\beta H}) for some inverse temperature β\beta and Hamiltonian HH. Then, it was shown in Ref. [68] that

S(𝒟γ,σ(ρ)σ)(1γ)2α1S(ρσ)\displaystyle S(\mathcal{D}_{\gamma,\sigma}(\rho)\|\sigma)\leq(1-\gamma)^{2\alpha_{1}}S(\rho\|\sigma) (80)

where α1\alpha_{1} is computed by

α1=minx[0,1]12(1+qλmin(σ)(x)).\displaystyle\alpha_{1}=\min_{x\in[0,1]}\frac{1}{2}(1+q_{\lambda_{\min}(\sigma)}(x)). (81)

Here, λmin(σ)\lambda_{\min}(\sigma) is the minimum eigenvalue of σ\sigma, and qx(y)q_{x}(y) is defined for x,y(0,1)x,y\in(0,1) as

qy(x){D2(yx)D2(xy)xy1x=y\displaystyle q_{y}(x)\coloneqq\begin{cases}\frac{D_{2}(y\|x)}{D_{2}(x\|y)}&x\neq y\\ 1&x=y\end{cases} (82)

where D2(xy)xlog(x/y)+(1x)log[(1x)/(1y)]D_{2}(x\|y)\coloneqq x\log(x/y)+(1-x)\log[(1-x)/(1-y)] is the binary relative entropy. As shown in Ref. [68], α1\alpha_{1} monotonically increases with λmin(σ)\lambda_{\min}(\sigma) and behaves as α11/2\alpha_{1}\to 1/2 in the limit λmin(σ)0\lambda_{\min}(\sigma)\to 0 and α11\alpha_{1}\to 1 in the limit λmin(σ)1/2\lambda_{\min}(\sigma)\to 1/2. Taking ξ𝒟γ,σ=(1γ)2α1\xi_{\mathcal{D}_{\gamma,\sigma}}=(1-\gamma)^{2\alpha_{1}} in Theorem S.3 then allows us to obtain sample lower bounds for the generalized global depolarizing noise with an arbitrary full-rank fixed state.

Similarly to the tensor-product channels, the exponential sample cost can be extended to a wide class of quantum channels. Let 𝒩\mathcal{N} be a unital channel and 𝒩\mathcal{N}^{\dagger} be its dual, i.e., Tr(X𝒩(Y))=Tr(𝒩(Y)X),X,Y\operatorname{Tr}(X\mathcal{N}(Y))=\operatorname{Tr}(\mathcal{N}^{\dagger}(Y)X),\forall X,Y. Then, if 𝒩𝒩\mathcal{N}^{\dagger}\circ\mathcal{N} is primitive, i.e., limn(𝒩𝒩)n(τ)=𝕀/2M,τ\lim_{n\to\infty}(\mathcal{N}^{\dagger}\circ\mathcal{N})^{n}(\tau)=\mathbb{I}/2^{M},\ \forall\tau, we have λ𝒩<1\lambda_{\mathcal{N}}<1 [69].

Our results also admit a thermodynamic interpretation. Consider the situation where the ideal circuit is the identity map and our system suffers from the continuous thermal map {Φt}t\{\Phi_{t}\}_{t} generated by a Liouvillian \mathcal{L} with a thermal Gibbs state σβ,H=eβH/Tr(eβH)\sigma_{\beta,H}=e^{-\beta H}/\operatorname{Tr}(e^{-\beta H}) as a fixed state, i.e., (σβ,H)=0\mathcal{L}(\sigma_{\beta,H})=0 [59]. This represents the state preservation scenario, in which we would like to extract the classical information about the initial state after some period of time.

Let ρ\rho be the initial state and ρt=Φt(ρ)\rho_{t}=\Phi_{t}(\rho) be a state at time t0t\geq 0. Note that for an arbitrary state τ\tau, we have S(τσβ,H)=β(F(τ)Feq)S(\tau\|\sigma_{\beta,H})=\beta(F(\tau)-F_{\rm eq}) where F(τ)Tr(τH)S(τ)/βF(\tau)\coloneqq\operatorname{Tr}(\tau H)-S(\tau)/\beta is the nonequilibrium free energy of τ\tau and FeqF(σβ,H)F_{\rm eq}\coloneqq F(\sigma_{\beta,H}) is the equilibrium free energy. Then, Theorem 1 implies that, to extract the expectation value of the initial state ρ\rho, we need to use

N=Ω(1F(ρt)Feq)\displaystyle N=\Omega\left(\frac{1}{F(\rho_{t})-F_{\rm eq}}\right) (83)

samples. In other words, the necessary sampling cost grows as the free energy gets lost due to thermalization. The property of the dynamical semigroup further gives

F(ρt)Feqeαentt[F(ρ)Feq]\displaystyle F(\rho_{t})-F_{\rm eq}\leq e^{-\alpha_{\rm ent}t}\left[F(\rho)-F_{\rm eq}\right] (84)

with

αent=minτΣ˙(τ)β[F(τ)Feq]\displaystyle\alpha_{\rm ent}=\min_{\tau}\frac{\dot{\Sigma}(\tau)}{\beta\left[F(\tau)-F_{\rm eq}\right]} (85)

where Σ˙(τ)[ddtS(τt)βddtTr(τtH)]|t=0=Tr[(τ)ln(τ)]Tr[(τ)H]\dot{\Sigma}(\tau)\coloneqq\left[\frac{d}{dt}S(\tau_{t})-\beta\frac{d}{dt}\operatorname{Tr}(\tau_{t}H)\right]|_{t=0}=-\operatorname{Tr}[\mathcal{L}(\tau)\ln(\tau)]-\operatorname{Tr}[\mathcal{L}(\tau)H] is the entropy production rate. In particular, when αent>0\alpha_{\rm ent}>0 holds, i.e., Σ˙(τ)>0,τσβ,H\dot{\Sigma}(\tau)>0,\ \forall\tau\neq\sigma_{\beta,H}, this implies

N=Ω(eαentt),\displaystyle N=\Omega\left(e^{\alpha_{\rm ent}t}\right), (86)

the exponential sample cost with time tt. This cannot be avoided even if we apply intermediate operations that preserve the Gibbs state, as such operations cannot increase the free energy. We also remark that there has been an intense study on the evaluation of the exponent α1\alpha_{1} for various types of thermal maps, lattices, and Hamiltonian (e.g., Refs. [59, 76, 77, 78, 72, 71]). These results can directly be applied to give lower bounds for the sampling overhead required for quantum error mitigation.

Thermal channels construct an important class of noise channels of physical significance, and the exponential contraction of the relative entropy, i.e., (71) with ξ𝒩<1\xi_{\mathcal{N}}<1, was established for the thermal map that constructs a Markov semi-group [59]. It was conjectured in Ref. [59] that an arbitrary Markov semi-group with a unique fixed point would also show the exponential decay of the relative entropy. If this is true, then Theorem S.3 applies to an even wider class of noise channels, particularly when Ul=𝕀lU_{l}=\mathbb{I}\,\forall l corresponding to the state preservation scenario.

Although this conjecture appears to be plausible, it is still generally elusive to rigorously show the exponential contraction of the relative entropy for non-unital channels. Nevertheless, our framework admits an alternative bound that also grows exponentially with circuit depth, as we show in the following.

The idea is to employ the exponential contraction of the trace distance instead of the relative entropy. For a channel 𝒩\mathcal{N} that has the unique fixed point τ\tau, it holds that [79]

Dtr(𝒩k(ρ),τ)2[s2(𝒩)]k[λmin(τ)]1/2ρ\displaystyle D_{\rm tr}\left(\mathcal{N}^{k}(\rho),\tau\right)\leq 2\left[s_{2}(\mathcal{N})\right]^{k}\left[\lambda_{\min}(\tau)\right]^{-1/2}\quad\forall\rho (87)

for an arbitrary positive integer kk, where 𝒩k\mathcal{N}^{k} refers to the kk sequential applications of 𝒩\mathcal{N}, λmin(τ)\lambda_{\min}(\tau) is the smallest eigenvalue of τ\tau, and s2(𝒩)s_{2}(\mathcal{N}) is the second largest singular value of the map Γτ1/2𝒩Γτ1/2\Gamma_{\tau}^{-1/2}\circ\mathcal{N}\circ\Gamma_{\tau}^{1/2} with Γτ\Gamma_{\tau} defined by Γτ(ρ)τ1/2ρτ1/2\Gamma_{\tau}(\rho)\coloneqq\tau^{1/2}\rho\tau^{1/2}.

Consider the setting of the layered circuit with Ul=𝕀lU_{l}=\mathbb{I}\,\forall l, where we are to use NN MM-qubit circuits as an input to error mitigation. Let 𝒩\mathcal{N} be the MM-qubit noise channel that applies to each layer of a noisy circuit. This is the case where the effective noise channel of the depth-LL noisy circuit is represented by 𝒩L\mathcal{N}^{L}. Then, the bound in Theorem 1 in terms of the relative entropy can be bounded as

Nmaxρ,σ𝕊D𝕆(ρ,σ)2δ2(12ε)2ln2S(𝒩L(ρ)𝒩L(σ))\displaystyle N\geq\max_{\begin{subarray}{c}\rho,\sigma\in\mathbb{S}\\ D_{\mathbb{O}}(\rho,\sigma)\geq 2\delta\end{subarray}}\frac{2(1-2\varepsilon)^{2}}{\ln 2\cdot S(\mathcal{N}^{L}(\rho)\|\mathcal{N}^{L}(\sigma))} maxρ,σ𝕊D𝕆(ρ,σ)2δ(12ε)2λmin(𝒩L(σ))2ln2Dtr(𝒩L(ρ),𝒩L(σ))2\displaystyle\geq\max_{\begin{subarray}{c}\rho,\sigma\in\mathbb{S}\\ D_{\mathbb{O}}(\rho,\sigma)\geq 2\delta\end{subarray}}\frac{(1-2\varepsilon)^{2}\lambda_{\min}(\mathcal{N}^{L}(\sigma))}{2\ln 2\,D_{\rm tr}(\mathcal{N}^{L}(\rho),\mathcal{N}^{L}(\sigma))^{2}} (88)
(12ε)2λmin(𝒩)λmin(τ)8ln2[s2(𝒩)]2L\displaystyle\geq\frac{(1-2\varepsilon)^{2}\lambda_{\min}(\mathcal{N})\lambda_{\min}(\tau)}{8\ln 2\left[s_{2}(\mathcal{N})\right]^{2L}}

where the second inequality is due to the continuity bound of the relative entropy (61), and in the last inequality we used (87) and defined λmin(𝒩)minρλmin(𝒩(ρ))\lambda_{\min}(\mathcal{N})\coloneqq\min_{\rho}\lambda_{\min}(\mathcal{N}(\rho)), which refers to the minimum eigenvalue of the output state of the channel 𝒩\mathcal{N}. This bound shows the exponential sampling cost for an arbitrary channel 𝒩\mathcal{N} that has a unique fixed state and s2(𝒩)<1s_{2}(\mathcal{N})<1.

We remark that λmin(𝒩)\lambda_{\min}(\mathcal{N}) and λmin(τ)\lambda_{\min}(\tau) can be exponentially small with the qubit number MM, which prevents one from using this bound to argue that the circuit with L=Ω(M)L=\Omega(M) must use an exponential sample cost, unlike the one in Theorem S.3. Nevertheless, it still ensures exponential growth with respect to the layer number LL if we see it as a separate parameter from the qubit number MM. We also stress that the tighter bound in Theorem S.3 would become valid once the exponential decay of the relative entropy for non-unital channels is established, going along with future developments in the understanding of non-unital Markovian semi-groups.

As an example of an important non-unital channel, let us discuss the generalized amplitude damping noise [54], which describes the thermal relaxation at finite temperature in, e.g., superconducting systems [80]. The generalized amplitude damping 𝒜γ,κ\mathcal{A}_{\gamma,\kappa} characterized by two parameters γ,κ\gamma,\kappa with 0<γ,κ<10<\gamma,\kappa<1 has the Kraus operators

K1\displaystyle K_{1} =1κ(|00|+1γ|11|)K2=γ(1κ)|01|\displaystyle=\sqrt{1-\kappa}\left(|{0}\rangle\!\langle{0}|+\sqrt{1-\gamma}|{1}\rangle\!\langle{1}|\right)\quad K_{2}=\sqrt{\gamma(1-\kappa)}|{0}\rangle\!\langle{1}| (89)
K3\displaystyle K_{3} =κ(1γ|00|+|11|)K4=γκ|10|.\displaystyle=\sqrt{\kappa}\left(\sqrt{1-\gamma}|{0}\rangle\!\langle{0}|+|{1}\rangle\!\langle{1}|\right)\quad K_{4}=\sqrt{\gamma\kappa}|{1}\rangle\!\langle{0}|.

The generalized amplitude damping channel maps the Pauli operators as

𝒜γ,κ(X)\displaystyle\mathcal{A}_{\gamma,\kappa}(X) =1γX𝒜γ,κ(Y)=1γY\displaystyle=\sqrt{1-\gamma}X\quad\mathcal{A}_{\gamma,\kappa}(Y)=\sqrt{1-\gamma}Y (90)
𝒜γ,κ(Z)\displaystyle\mathcal{A}_{\gamma,\kappa}(Z) =(1γ)Z𝒜γ,κ(𝕀)=𝕀+γ(12κ)Z.\displaystyle=(1-\gamma)Z\quad\mathcal{A}_{\gamma,\kappa}(\mathbb{I})=\mathbb{I}+\gamma(1-2\kappa)Z.

This ensures that at the limit of the infinite number of applications of 𝒜γ,κ\mathcal{A}_{\gamma,\kappa}, the three non-trivial Pauli operators XX, YY, and ZZ vanish, while the identity operator approaches 𝕀+γ(12κ)(k=0(1γ)k)Z=𝕀+(12κ)Z\mathbb{I}+\gamma(1-2\kappa)\left(\sum_{k=0}^{\infty}(1-\gamma)^{k}\right)Z=\mathbb{I}+(1-2\kappa)Z. This shows that the fixed state of 𝒜γ,κ\mathcal{A}_{\gamma,\kappa} is 𝕀+(12κ)Z2\frac{\mathbb{I}+(1-2\kappa)Z}{2}, and more generally, the fixed point of the tensor-product channel 𝒜γ,κk\mathcal{A}_{\gamma,\kappa}^{\otimes k} for an arbitrary positive integer kk is (𝕀+(12κ)Z2)k\left(\frac{\mathbb{I}+(1-2\kappa)Z}{2}\right)^{\otimes k}, because whatever the kk-qubit initial state we start from, only the 𝕀/2k\mathbb{I}/2^{k} term in the Pauli decomposition of the initial state survives after the infinite number of applications of 𝒜γ,κk\mathcal{A}_{\gamma,\kappa}^{\otimes k}, which converges to (𝕀+(12κ)Z2)k\left(\frac{\mathbb{I}+(1-2\kappa)Z}{2}\right)^{\otimes k}. This means that 𝒜γ,κk\mathcal{A}_{\gamma,\kappa}^{\otimes k} has a unique fixed state, to which the bound (88) applies.

In particular, the sampling lower bound for mitigating the local generalized amplitude damping noise applied to MM-qubit circuit is obtained by setting 𝒩=𝒜γ,κM\mathcal{N}=\mathcal{A}_{\gamma,\kappa}^{\otimes M} in (88) with τ=(𝕀+(12κ)Z2)M\tau=\left(\frac{\mathbb{I}+(1-2\kappa)Z}{2}\right)^{\otimes M}. This immediately gives λmin(τ)=κ~M\lambda_{\min}(\tau)=\tilde{\kappa}^{M} where κ~min{κ,1κ}\tilde{\kappa}\coloneqq\min\{\kappa,1-\kappa\}.

To get λmin(𝒜γ,κ)\lambda_{\min}(\mathcal{A}_{\gamma,\kappa}), we note that for a general channel 𝒩\mathcal{N}, if 𝒩\mathcal{N} satisfies 𝒩(ρ)p𝕀\mathcal{N}(\rho)\geq p\mathbb{I} for every state ρ\rho, we have λmin(𝒩)p\lambda_{\min}(\mathcal{N})\geq p. This happens when its Choi state J𝒩=id𝒩(Φ~)J_{\mathcal{N}}=\operatorname{id}\otimes\mathcal{N}(\tilde{\Phi}) for 𝒩\mathcal{N}, where Φ~=|iijj|\tilde{\Phi}=|{ii}\rangle\!\langle{jj}| is the unnormalized maximally entangled state, has the minimum eigenvalue λmin(J𝒩)p\lambda_{\min}(J_{\mathcal{N}})\geq p. This allows us to lower bound λmin(𝒩)\lambda_{\min}(\mathcal{N}) by looking at the eigenvalue of the Choi state. A straightforward calculation reveals that the Choi state of 𝒜γ,κ\mathcal{A}_{\gamma,\kappa} is given by

J𝒜γ,κ=(1γκ001γ0γκ0000γ(1κ)01γ001γ(1κ)),\displaystyle J_{\mathcal{A}_{\gamma,\kappa}}=\begin{pmatrix}1-\gamma\kappa&0&0&\sqrt{1-\gamma}\\ 0&\gamma\kappa&0&0\\ 0&0&\gamma(1-\kappa)&0\\ \sqrt{1-\gamma}&0&0&1-\gamma(1-\kappa)\end{pmatrix}, (91)

which gives λmin(J𝒜γ,κM)=[λmin(J𝒜γ,κ)]M=(1γ/21γ+γ2(κ1/2)2)M\lambda_{\min}(J_{\mathcal{A}_{\gamma,\kappa}^{\otimes M}})=\left[\lambda_{\min}(J_{\mathcal{A}_{\gamma,\kappa}})\right]^{M}=\left(1-\gamma/2-\sqrt{1-\gamma+\gamma^{2}(\kappa-1/2)^{2}}\right)^{M}.

Finally, s2(𝒜γ,κM)s_{2}(\mathcal{A}_{\gamma,\kappa}^{\otimes M}) can be obtained by computing the singular values of Γτ1/2𝒜γ,κMΓτ1/2\Gamma_{\tau}^{-1/2}\circ\mathcal{A}_{\gamma,\kappa}^{\otimes M}\circ\Gamma_{\tau}^{1/2}. Letting τ1𝕀+(12κ)Z2\tau_{1}\coloneqq\frac{\mathbb{I}+(1-2\kappa)Z}{2} so that τ=τ1M\tau=\tau_{1}^{\otimes M}, we get Γτ1/2𝒜γ,κMΓτ1/2=(Γτ11/2𝒜γ,κΓτ11/2)M\Gamma_{\tau}^{-1/2}\circ\mathcal{A}_{\gamma,\kappa}^{\otimes M}\circ\Gamma_{\tau}^{1/2}=\left(\Gamma_{\tau_{1}}^{-1/2}\circ\mathcal{A}_{\gamma,\kappa}\circ\Gamma_{\tau_{1}}^{1/2}\right)^{\otimes M}. This implies that the singular values of the map Γτ1/2𝒜γ,κMΓτ1/2\Gamma_{\tau}^{-1/2}\circ\mathcal{A}_{\gamma,\kappa}^{\otimes M}\circ\Gamma_{\tau}^{1/2} on MM-qubit systems are obtained by multiplying the singular values of the single-qubit map Γτ11/2𝒜γ,κΓτ11/2\Gamma_{\tau_{1}}^{-1/2}\circ\mathcal{A}_{\gamma,\kappa}\circ\Gamma_{\tau_{1}}^{1/2}. In addition, since the largest singular value of Γτ11/2𝒜γ,κΓτ11/2\Gamma_{\tau_{1}}^{-1/2}\circ\mathcal{A}_{\gamma,\kappa}\circ\Gamma_{\tau_{1}}^{1/2} is 1, we get s2(𝒜γ,κM)=s2(𝒜γ,κ)s_{2}(\mathcal{A}_{\gamma,\kappa}^{\otimes M})=s_{2}(\mathcal{A}_{\gamma,\kappa}) allowing us to only compute the second largest singular value of the single-qubit map Γτ11/2𝒜γ,κΓτ11/2\Gamma_{\tau_{1}}^{-1/2}\circ\mathcal{A}_{\gamma,\kappa}\circ\Gamma_{\tau_{1}}^{1/2}. This can be obtained as s2(𝒜γ,κM)=s2(𝒜γ,κ)=1γs_{2}(\mathcal{A}_{\gamma,\kappa}^{\otimes M})=s_{2}(\mathcal{A}_{\gamma,\kappa})=\sqrt{1-\gamma} by a simple computation.

Plugging these quantities into (88) results in

N(12ε)2[κ~(1γ/21γ+γ2(κ1/2)2)]M8ln2(1γ)L,κ~min{κ,1κ},\displaystyle N\geq\frac{(1-2\varepsilon)^{2}\left[\tilde{\kappa}\left(1-\gamma/2-\sqrt{1-\gamma+\gamma^{2}(\kappa-1/2)^{2}}\right)\right]^{M}}{8\ln 2\left(1-\gamma\right)^{L}},\quad\tilde{\kappa}\coloneqq\min\{\kappa,1-\kappa\}, (92)

which particularly grows exponentially with the layer number LL, noting that 0<γ<10<\gamma<1.

Appendix J Relation to previous works on noisy layered circuits

Theorems 3 and S.2 imply that if an nn-qubit quantum circuit with super-logarithmic depth suffers from local depolarizing noise (and other types of noise as discussed in Appendix I), super-polynomial number of samples are required, making the computation infeasible even with the help of quantum error mitigation. The observation that computation quickly becomes useless under the noise effects dates back to the seminal work by Aharonov et al. [10] (see, e.g., Refs. [9, 11] for the works that put forward and investigated related ideas). Here, we clarify differences between our results and the previous works, in particular Ref. [10].

The setting discussed in Ref. [10] involves a quantum circuit with dd layers of unitary gates, where each layer is followed by the action of local depolarizing noise. After the dd th layer, each qubit is measured with the computational basis. The authors showed that when the depth is super-logarithmic with the qubit number, the probability of obtaining 0 (and 1) cannot be away from 1/2 by a constant amount. This implies that to realize the “useful” computation — which should not be simulable by a random guess — the noisy quantum circuit should be restricted to the one with logarithmic depth. The authors showed this result by employing the fact — which was also shown in the same manuscript — that the relative entropy S(ρ𝕀/2n)=nS(ρ)S(\rho\|\mathbb{I}/2^{n})=n-S(\rho) decreases exponentially with circuit depth. They combined this with the fact that the computational-basis measurement (followed by partial trace) does not increase the relative entropy, showing that the entropy of the probability distribution of the measurement outcome of the first qubit can only be deviated from the maximal value by the amount scaling as n2dn2^{-d}. Therefore, to ensure the probability of obtaining 0 or 1 to be away from 1/2 by a constant amount, one needs to have d=𝒪(logn)d=\mathcal{O}(\log n) to allow the entropy to be away from the maximal value with a large nn limit.

Although the above consideration contains important insights, and our results — as well as many other follow-up works after this such as Refs. [59, 43, 42, 41, 40] — benefit from their inspirations, the argument there is not directly applicable to the setting involving quantum error mitigation. As explained in the main text, the idea of quantum error mitigation is to use quantum and classical resources in hybrid, in which postprocessing computation after the noisy circuit is essential. In our framework, this part is included in the trailing process (𝒫A\mathcal{P}_{A} in Fig. 1 in the main text) — indeed, our model allows an arbitrary quantum operation as the trailing process, which also includes classical postprocessing computation. Importantly, the trailing process is not unital in general, and therefore, entropy can decrease in the postprocessing step, preventing the results in Ref. [10] from being directly carried over.

We solve this problem by first reducing the performance of error mitigation to the distinguishability of two noisy quantum states, which eventually leads to Theorems 1 and 2 providing the necessary sampling cost to achieve the target error mitigation performance with respect to entropic measures. It is a priori not obvious how entropic quantity is quantitatively related to the operational performance quantifiers such as accuracy–probability and bias–standard deviation of quantum error mitigation. Theorems 1 and 2 establish the connections between these quantities, which then allow us to focus on analyzing the entropic measures, where we can apply the previous findings, such as the exponential decay of relative entropy from the maximally mixed state under local depolarizing noise, to show the exponential sampling overhead for the general error mitigation.

In addition, our quantitative bounds — which give nearly tight estimates for certain cases — provide concrete sampling lower bounds beyond just the exponential scaling behavior for layered circuits, which could also be used as a benchmark for error mitigation strategies.

Let us also remark on how our results are related to the prior works that investigated the capability of general error mitigation, which involves postprocessing operations. Refs. [39, 40] studied the maximum estimator spread, i.e., the range of outcomes of the estimator, imposed on general error mitigation protocols. These works showed that, for error mitigation with linear postprocessing, the maximum estimator spread must grow exponentially with the circuit depth for noisy layered circuits under local depolarizing noise. Although these results hinted that the exponential samples would be required in general error mitigation, they were not conclusive because the maximum estimator spread provides only a sufficient number of samples to ensure a certain accuracy. Our results close this gap by showing that exponential samples are necessary. Ref. [43] studied the general framework introduced in Ref. [39] and obtained a concentration bound, which places an upper bound for the probability of getting an estimate away from the value for the maximally mixed state by a certain amount. Their bound implies that, when the circuit depth is Ω(logN)\Omega(\log N), the probability of achieving a certain constant accuracy will become exponentially small with respect to the accuracy multiplied by the Lipschitz constant of the estimator. However, as the authors pointed out, the Lipschitz constant of the estimator becomes exponentially large for many error mitigation protocols, which severely restricts the applicability of their bound. Also, their bound does not directly provide a sampling lower bound to achieve a certain accuracy. Our results, on the other hand, encompass the standard estimators with exponentially large Lipschitz constants and provide the first explicit sampling lower bounds that apply to general error mitigation protocols.