This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Purity-Assisted Zero-Noise Extrapolation for Quantum Error Mitigation

Abstract

Quantum error mitigation aims to reduce errors in quantum systems and improve accuracy. Zero-noise extrapolation (ZNE) is a commonly used method, where noise is amplified, and the target expectation is extrapolated to a noise-free point. However, ZNE relies on assumptions about error rates based on the error model. In this study, a purity-assisted zero-noise extrapolation (pZNE) method is utilized to address limitations in error rate assumptions and enhance the extrapolation process. The pZNE is based on the Pauli diagonal error model implemented using the Pauli twirling technique. Although this method does not significantly reduce the bias of routine ZNE, it extends its effectiveness to a wider range of error rates where routine ZNE may face limitations. In addition, the practicality of the pZNE method is verified through numerical simulations and experiments on the online quantum computation platform, Quafu. Comparisons with routine ZNE and virtual distillation methods show that biases in extrapolation methods increase with error rates and may become divergent at high error rates. The bias of pZNE is slightly lower than routine ZNE, while its error rate threshold surpasses that of routine ZNE. Furthermore, for full density matrix information, the pZNE method is more efficient than the routine ZNE.

keywords:
quantum error mitigation, zero-noise extrapolation, purity, Pauli twirling, quantum computation platform, superconducting qubits

Tian-Ren Jin Yun-Hao Shi Zheng-An Wang Tian-Ming Li Kai Xu* Heng Fan*

{affiliations}

T.-R. Jin, Dr. Y.-H. Shi, T.-M. Li, Prof. K. Xu, Prof. H. Fan
Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
Email Address: [email protected], [email protected]

Dr. Z.-A. Wang, Prof. K. Xu, Prof. H. Fan
Beijing Academy of Quantum Information Sciences, Beijing 100193, China
Hefei National Laboratory, Hefei 230088, China

Prof. K. Xu, Prof. H. Fan
Songshan Lake Materials Laboratory, Dongguan 523808, China
CAS Center for Excellence in Topological Quantum Computation, UCAS, Beijing 100190, China

1 Introduction

Quantum computation is expected to surpass classical computation for specific problems; nevertheless, the vulnerability of qubits to environmental noise poses a significant challenge to realizing a practical quantum computer. Consequently, the development of fault-tolerant quantum computation methodologies is crucial for enabling the widespread application of quantum computation. Quantum error correction code (QECC) provides a systematic solution to fault-tolerant quantum computation [1], while its application requires many physical qubits and the error rate should be lower than a threshold [2, 3, 4]. With remarkable progress [5, 6], however, it is challenging for the state-of-the-art technique to meet the requirements for the full implementation of QECCs. Therefore, fault-tolerant quantum computation based on high-precision logic qubits is still a far-reaching task.

In the noisy intermediate-scale quantum era [7], quantum error mitigation (QEM) provides an alternative approach to dealing with noise. QECCs detect and correct errors that occur in the quantum process to ensure that there are no errors in the coded logic qubits. On the contrary, QEMs allow errors to occur, but use some post-processing techniques to reduce the bias between the output information of noisy quantum circuit and the ideal quantum circuit [8, 9].

In recent years, several QEM schemes have been proposed, including zero-noise extrapolation (ZNE) [10, 11, 12, 13, 14], probabilistic error cancellation (PEC) [13, 14, 15, 16], symmetry verification methods [17, 18, 19], purification methods [20, 21, 22, 23], subspace expansion [24, 25, 26, 27], and learning-based error mitigation [28, 29]. These methods have been applied on different quantum computation platforms and interesting physics [15, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 31, 44]. In addition, there have been some efforts to integrate various QEM schemes in a generalized framework [45, 19, 46, 47, 48, 49]. Among these various error mitigation schemes, the ZNE and PEC are used to deal with quantum circuits, which based on specific error models of noise. The PEC method mitigates the error by post-cancelling the noise, which is previously estimated from experiments in the selected error basis. Although the Pauli basis is sufficient with Pauli twirling technique [50, 51, 52, 34], in practice, the selected basis should be truncated for large systems, such as the 22-weighted Pauli basis [34]. Thus, the effectiveness of PEC depends on the error model described by the error basis. Recently, an attempt was made to integrate ZNE into the PEC framework [47].

However, the ZNE method is a subtle approach based on the analyticity of the observable expectation related to the error rate, which is a fundamental physical condition. The relation to the error model emerges when the error rate is considered. To perform the ZNE, the error must be amplified, and the ideal value is obtained by extrapolating the expectation backward to the point of zero noise. In the early application of ZNE, the error is amplified by stretching the duration of pulse for each gate [15]. However, this amplification is not suitable for digital quantum gates. Another implementation is to deliberately add redundant gates in the circuit [30], which is improved with a treatment called unitary folding [53]. In the unitary folding method, the redundant gates are implemented by the target circuit of interest. Specifically, the circuit evolves forward and backward successively, and the error rate is characterized by the number of folds. Although the amplifications of the error are developed more feasible, the estimation of the error rate is a priori, which has limitations on the noise type. Moreover, for the complex circuit in the experiment, the effect of errors cannot be sufficiently described by a single parameter of the error rate alone.

In the ZNE method, dependence of the error model in noisy amplification is contrary to the universal idea of the extrapolation, which limits the effectiveness of this method. In this paper, we try to release the limitation of noise in the ZNE method with the assistance of the purity of the noisy output state. Here, we introduce the purity-assisted zero-noise extrapolation (pZNE) method. Specifically, we propose a modified purification method based on the Pauli diagonal error model. By estimating the bias of the modified purification methods, it is shown that the bias is controlled by error rate, which can be represented by purity of noisy state. The pZNE method mitigates the noisy expectation by extrapolating the mitigated expectations of the modified purification method with different error rates to a noise-free pure state. Equivalently, it can be applied by extrapolating the noisy expectations with purity.

This paper is organized as follows. In Sec. 2, we review the routine ZNE method and analyze how this method is based on the error model of noise. Then, we show that the purity of the output state can assist the extrapolation in zero-noise error mitigation in Sec. 3. We describe the detailed procedures of pZNE in Sec. 4, including the fitting model, purity estimation methods, and the measurement overhead. We also verify this method by numerical simulations and experiments on Quafu cloud-based quantum computation platform in Sec. 5. The conclusion and discussion are given in Sec. 6.

2 Zero-noise Extrapolation with Unitary Folding

The ZNE method consists of two steps:

  1. 1.

    Noise amplification: Collecting the raw data of different error rates via measuring on modulated circuits.

  2. 2.

    Extrapolation: Post-processing the experiment data to obtain the mitigated expectation value based on some specific fitting models.

In this section, we will review the ZNE method, the fitting models, and the noise amplification method of unitary folding. In addition, we discuss how the assumption that the error rate is proportional to the number of folds is based on the error model of noise.

Consider the ideal unitary circuit 𝒰\mathcal{U} of interest. However, in practice, we can only perform a noisy circuit 𝒰λ\mathcal{U}_{\lambda} instead. Then the experimental expectation value O^λ=Tr[O^ρλ]\braket{\hat{O}}_{\lambda}=\mathrm{Tr}[\hat{O}\rho_{\lambda}] of observable O^\hat{O} of interest is different from the ideal one O^0=Tr[O^ρ0]\braket{\hat{O}}_{0}=\mathrm{Tr}[\hat{O}\rho_{0}], where ρλ=𝒰λ(ρ)\rho_{\lambda}=\mathcal{U}_{\lambda}(\rho) is the noisy output state, and ρ0=𝒰(ρ)\rho_{0}=\mathcal{U}(\rho) is the ideal output state. Now, we are going to infer the ideal value O^0\braket{\hat{O}}_{0} from some noisy expectation values O^λ\braket{\hat{O}}_{\lambda}.

This method is inspired by a simple idea that the expectation value O^λ\braket{\hat{O}}_{\lambda} is an analytic function of a parameter λ0\lambda\geq 0, which characterizes the noise level of noisy circuit 𝒰λ\mathcal{U}_{\lambda}, and assume O^λ=O^0\braket{\hat{O}}_{\lambda}=\braket{\hat{O}}_{0} when λ=0\lambda=0. According to analyticity, the expectation value can be expanded in a power series of λ\lambda as

O^λ=O(λ)=n=0Onλn,\braket{\hat{O}}_{\lambda}=O(\lambda)=\sum_{n=0}^{\infty}O_{n}\lambda^{n}, (1)

and by definition of λ\lambda, we have O0=O(0)=O^0O_{0}=O(0)=\braket{\hat{O}}_{0}. Thus, by interpolation the function O(λ)O(\lambda) with expectations O^λi\braket{\hat{O}}_{\lambda_{i}} of different error rate λi\lambda_{i}, one can infer the noise-free expectation value O^0\braket{\hat{O}}_{0}.

For the number of experimental data is finite, the expansion of O(λ)O(\lambda) should be truncated at some order MM of λ\lambda, which requires that the error rate λ\lambda should small enough, and this is the polynomial model of function O(λ)O(\lambda) [10, 13]. In addition, there are also (multi-)exponential model [11]

O^λ=O(λ)=a=0MBaeγaλ,\braket{\hat{O}}_{\lambda}=O(\lambda)=\sum_{a=0}^{M}B_{a}e^{-\gamma_{a}\lambda}, (2)

and poly-exponential model [53]

O^λ=O(λ)=ef(λ)+O,\braket{\hat{O}}_{\lambda}=O(\lambda)=e^{f(\lambda)}+O_{\infty}, (3)

where f(λ)f(\lambda) is a polynomial of λ\lambda. These fitting models are put forward under the consideration that the physical observable should be bounded.

Then, the noisy data of different error rates λi\lambda_{i} is obtained by error amplification. The unitary folding performs the circuit sequence in the experiment

𝒰n=𝒰λ(𝒰λ𝒰λ)n1=(𝒰λ𝒰λ)n1𝒰λ,\mathcal{U}_{n}=\mathcal{U}_{\lambda}\circ(\mathcal{U}_{\lambda}^{\dagger}\circ\mathcal{U}_{\lambda})^{n-1}=(\mathcal{U}_{\lambda}\circ\mathcal{U}_{\lambda}^{\dagger})^{n-1}\circ\mathcal{U}_{\lambda}, (4)

where \circ denotes the composition of quantum operations, the corresponding error rate is expected as λn=(2n+1)λ0\lambda_{n}=(2n+1)\lambda_{0}, and the resolution is 2λ02\lambda_{0}. The circuit is shown in Figure 1. To have a more fine-grained resolution, it can also perform the layer (gate) folding [53], which assumes the gate can be decomposed into layers 𝒰λ=i=1di\mathcal{U}_{\lambda}=\prod_{i=1}^{d}\circ\mathcal{L}_{i}, and performs the folding on these layers

i(ii)ni\mathcal{L}_{i}\mapsto(\mathcal{L}_{i}\circ\mathcal{L}_{i}^{\dagger})^{n}\circ\mathcal{L}_{i} (5)

in the circuit. By doing so, the resolution can be promoted to 2λ/d2\lambda/d.

Refer to caption
Figure 1: The circuit of unitary folding.

To consider the procedure of unitary folding in more detail, we assume that the circuit is a gate only, and set the noise in forward and backward evolutions as

𝒰λ\displaystyle\mathcal{U}_{\lambda} =f𝒰,\displaystyle=\mathcal{E}_{f}\circ\mathcal{U}, (6)
𝒰λ\displaystyle\mathcal{U}_{\lambda}^{\dagger} =𝒰b,\displaystyle=\mathcal{U}^{\dagger}\circ\mathcal{E}_{b}, (7)

which can be interpreted as the definition of forward error f\mathcal{E}_{f} and backward error b\mathcal{E}_{b}. The error rate for both errors is denoted as λf\lambda_{f} and λb\lambda_{b} respectively. Then, we formally have

𝒰λ𝒰λ=fb,\mathcal{U}_{\lambda}\circ\mathcal{U}_{\lambda}^{\dagger}=\mathcal{E}_{f}\circ\mathcal{E}_{b}\equiv\mathcal{E}, (8)

thus the noisy gate is

𝒰n=n𝒰,\mathcal{U}_{n}=\mathcal{E}_{n}\circ\mathcal{U}, (9)

where n=n1f\mathcal{E}_{n}=\mathcal{E}^{n-1}\circ\mathcal{E}_{f} is the error channel with the error rate λn=(n1)λ~+λf\lambda_{n}=(n-1)\tilde{\lambda}+\lambda_{f}, where λ~\tilde{\lambda} is the error rate of error channel \mathcal{E}.

To obtain λn=(2n1)λf\lambda_{n}=(2n-1)\lambda_{f}, we can assume λ~=2λf\tilde{\lambda}=2\lambda_{f}, which is true when the error channel satisfies f=b\mathcal{E}_{f}=\mathcal{E}_{b}. However, this assumption is not always realized [54] (or see Appendix A). If we assume λ~2λf\tilde{\lambda}\approx 2\lambda_{f}, it will introduce an additional error in the inference of the ideal value O(0)O(0) before fitting the experimental data. As a result, the ZNE method is constrained by the error model of noise. The direct way to solve this problem is to determine the error rate λ\lambda from experiments.

3 Extrapolation with Purity

Fidelity is always used to witness the difference between the state with an ideal state, however, the ideal state ρ0\rho_{0} cannot be prepared precisely due to the inevitable noise in the experiment. Therefore, measuring the fidelity of the noisy state and the ideal state requires the classical simulation of the quantum circuit, which is generally impractical for large systems. In this section, we will describe the idea of using the purity of the output state of the noisy circuit

p(ρλ)=Tr|ρλ|2=Tr(ρλρλ)p(\rho_{\lambda})=\mathrm{Tr}|\rho_{\lambda}|^{2}=\mathrm{Tr}(\rho_{\lambda}^{\dagger}\rho_{\lambda}) (10)

to reflect the influence of noise in the circuit. The ideal unitary circuit will not influence the purity of the state. We assume that the error is not unitary, otherwise, it can be canceled by the folding of the well-calibrated circuit. Moreover, we assume the error is in the form of Pauli diagonal

r=iqri𝒫i,\mathcal{E}_{r}=\sum_{i}q_{ri}\mathcal{P}_{i}, (11)

where 𝒫i()=P^iP^i\mathcal{P}_{i}(\cdot)=\hat{P}_{i}\cdot\hat{P}_{i} is Pauli operations, and r=f,br=f,b distinguish the forward and backward errors. This assumption can be realized by the Pauli twirling technique.

Let the ideal state be decomposed into Pauli basis ρ0=𝒰(ρ)=1DiρiP^i\rho_{0}=\mathcal{U}(\rho)=\frac{1}{D}\sum_{i}\rho_{i}\hat{P}_{i}. The Pauli diagonal forward error channel acts as

r(ρ0)=1Di,jqrjρi𝒫j(P^i)=1DiχriρiP^i,\mathcal{E}_{r}(\rho_{0})=\frac{1}{D}\sum_{i,j}q_{rj}\rho_{i}\mathcal{P}_{j}(\hat{P}_{i})=\frac{1}{D}\sum_{i}\chi_{ri}\rho_{i}\hat{P}_{i}, (12)

where χri=jϵijqrj\chi_{ri}=\sum_{j}\epsilon_{ij}q_{rj} is the eigenvalue of the error, with 𝒫j(P^i)=ϵijP^i\mathcal{P}_{j}(\hat{P}_{i})=\epsilon_{ij}\hat{P}_{i}. We assume that the operators whose expectation values of interest are Pauli operators. Then, the expectation of P^i\hat{P}_{i} of the folded noisy state ρn=𝒰n(ρ)=n(ρ0)\rho_{n}=\mathcal{U}_{n}(\rho)=\mathcal{E}_{n}(\rho_{0}) are

P^in=Tr[P^in(ρ0)]=χfinχbin1ρi\displaystyle\braket{\hat{P}_{i}}_{n}=\mathrm{Tr}[\hat{P}_{i}\mathcal{E}_{n}(\rho_{0})]=\chi_{fi}^{n}\chi_{bi}^{n-1}\rho_{i} (13)

By fitting these expectations with the exponential model, only the parameters χfiχbi\chi_{fi}\chi_{bi} and χfiρi\chi_{fi}\rho_{i} are obtained. To find the ideal expectation P^i0=ρ0\braket{\hat{P}_{i}}_{0}=\rho_{0}, we should estimate χfi\chi_{fi}. In general, χfiχbi\chi_{fi}\neq\chi_{bi}, so the ideal expectation cannot be extracted.

We use purity to do this estimation. The purity of the noisy state ρn\rho_{n} is

pn=1Diρi2χni2=(p0p)χn2¯+p,p_{n}=\frac{1}{D}\sum_{i}\rho_{i}^{2}\chi_{ni}^{2}=\left(p_{0}-p_{\infty}\right)\overline{\chi_{n}^{2}}+p_{\infty}, (14)

where χn2¯=χni<1ρi2χni2χni<1ρi2\overline{\chi_{n}^{2}}=\frac{\sum_{\chi_{ni}<1}\rho_{i}^{2}\chi_{ni}^{2}}{\sum_{\chi_{ni}<1}\rho_{i}^{2}} is the average of nontrivial eigenvalues χni=χfinχbin1\chi_{ni}=\chi_{fi}^{n}\chi_{bi}^{n-1} of folding circuits, p0=1Diρi2p_{0}=\frac{1}{D}\sum_{i}\rho_{i}^{2} is the purity of ideal state, and pp_{\infty} is the purity of the stable state of the forward error channel. For convenience, we can assume the ideal state is pure, p0=1p_{0}=1, and the stable state is a maximally mixed state, p=1/Dp_{\infty}=1/D. If the distribution of χni\chi_{ni} concentrates, so that the variance Δχ¯n2=χn2¯χ¯n2\Delta\overline{\chi}_{n}^{2}=\overline{\chi_{n}^{2}}-\overline{\chi}_{n}^{2} is small, then we can approximate

χniχn2¯1/2=pnpp0p,\chi_{ni}\approx\overline{\chi_{n}^{2}}^{1/2}=\sqrt{\frac{p_{n}-p_{\infty}}{p_{0}-p_{\infty}}}, (15)

and the ideal expectation is estimated as

P^i0P^inest=P^inp0ppnp.\braket{\hat{P}_{i}}_{0}\approx\braket{\hat{P}_{i}}_{n\mathrm{est}}=\braket{\hat{P}_{i}}_{n}\sqrt{\frac{p_{0}-p_{\infty}}{p_{n}-p_{\infty}}}. (16)

The effectiveness of this approximation that χniχn2¯1/2\chi_{ni}\approx\overline{\chi_{n}^{2}}^{1/2} requires Δχ¯n2\Delta\overline{\chi}_{n}^{2} to be small, which is based mainly on the distribution of eigenvalue of error channel and slightly on the state ρ0\rho_{0} in large systems. The relative bias of this method is

|P^inestP^i01|=|χniχn2¯1/21|.\left|\frac{\braket{\hat{P}_{i}}_{n\mathrm{est}}}{\braket{\hat{P}_{i}}_{0}}-1\right|=\left|\frac{\chi_{ni}}{\overline{\chi_{n}^{2}}^{1/2}}-1\right|. (17)

Given a tolerant error ϵ\epsilon, the probability of failing is

P(χn2¯1/2|χniχn2¯1/2|ϵ)2eϵ22σ2coshϵ2,P\left(\overline{\chi_{n}^{2}}^{-1/2}\left|\chi_{ni}-\overline{\chi_{n}^{2}}^{1/2}\right|\geq\epsilon\right)\leq 2e^{-\frac{\epsilon^{2}}{2\sigma^{2}}}\cosh\frac{\epsilon}{2}, (18)

where σ2=Δχ¯n2χ¯n2\sigma^{2}=\frac{\Delta\bar{\chi}_{n}^{2}}{\bar{\chi}_{n}^{2}}, by Chebyshev’s inequality (for detail, see Appendix D).

Denote the failing probability as δ\delta, and select the tolerant error ϵ\epsilon so small, we expand the left-hand of inequality to the leading term of ϵ\epsilon

ϵ2σ2δ4σ22σ,\epsilon\leq 2\sigma\sqrt{\frac{2-\delta}{4-\sigma^{2}}}\sim\sqrt{2}\sigma, (19)

which means that the pZNE method works if the eigenvalues of forward error satisfy σ2\sigma\leq 2, and the tolerant error ϵ\epsilon is controlled by σ\sigma. The parameter σ\sigma relies on the form of the forward error, for example, for the global depolarizing error, all the nontrivial eigenvalues χni,(i0)\chi_{ni},(i\neq 0) are the same, so σ=0\sigma=0 no matter how large the error. Moreover, without specifying the form of errors, this parameter is controlled by the error probability qλ=i0qniq_{\lambda}=\sum_{i\neq 0}q_{ni}, where qniq_{ni} is the coefficients of Pauli diagonal error n\mathcal{E}_{n}. Obviously 12qλχ¯n11-2q_{\lambda}\leq\bar{\chi}_{n}\leq 1, we have

σ1χ¯nχ¯n2qλ12qλ,\sigma\leq\sqrt{\frac{1-\bar{\chi}_{n}}{\bar{\chi}_{n}}}\leq\sqrt{\frac{2q_{\lambda}}{1-2q_{\lambda}}}, (20)

thus when the error is so small that qλ<0.4q_{\lambda}<0.4, this method always has the possibility of success, and the relative bias is roughly controlled as

ϵ2(2δ)qλ25qλ.\epsilon\leq 2\sqrt{\frac{(2-\delta)q_{\lambda}}{2-5q_{\lambda}}}. (21)

With more detail estimation (see Appendix D), ϵ2λ+O(λ2)\epsilon\sim\sqrt{2}\lambda+O(\lambda^{2}).

In all, the estimator in Equation (16) has bias controlled by the error probability qλq_{\lambda} in general, so the number of folding. Therefore, we can extrapolate the estimators of different folding circuits to reduce bias. In this step there are two strategies. One is to extrapolate the estimators P^inest\braket{\hat{P}_{i}}_{n\mathrm{est}} to n=12n=\frac{1}{2}, which is similar to the extrapolation of routine ZNE. Another is to extrapolate the estimators P^inest\braket{\hat{P}_{i}}_{n\mathrm{est}} along the effective error rate sn=lnpnpp0ps_{n}=-\ln\frac{p_{n}-p_{\infty}}{p_{0}-p_{\infty}} to sn0=0s_{n_{0}}=0, and in this effective error rate,

P^in=P^ineste12sn=f(sn),\braket{\hat{P}_{i}}_{n}=\braket{\hat{P}_{i}}_{n\mathrm{est}}e^{-\frac{1}{2}s_{n}}=f(s_{n}), (22)

which means that we can also extrapolate the noisy expectation P^in\braket{\hat{P}_{i}}_{n} with the effective error rate, or purity.

4 Procedure of Methods

In Sec. 3, we discuss the procedure of the pZNE method. Assume our quantum computation task consists of (ρ,𝒰,O^)(\rho,\mathcal{U},\hat{O}), where ρ\rho is the initial state, 𝒰\mathcal{U} is the ideal unitary evolution, and O^\hat{O} is the operator of interest. The outcome we want from this computation task is the expectation O^0=Tr[O^𝒰(ρ)]\braket{\hat{O}}_{0}=\mathrm{Tr}[\hat{O}\mathcal{U}(\rho)]. The procedure of the pZNE method to mitigate the error in this task is following.

  1. 1.

    Task decomposing: Decompose the operator O^\hat{O} of interest into Pauli basis O^=iIOiP^i\hat{O}=\sum_{i\in I}O_{i}\hat{P}_{i}, where II is the set of indices with nonzero coefficients OiO_{i}.

  2. 2.

    Pauli twirling: Construct the Pauli-twirling the ideal circuit 𝒰twirl\mathcal{U}_{\mathrm{twirl}}, which corresponds to the noisy circuit 𝒰λ\mathcal{U}_{\lambda} in experiment.

  3. 3.

    Task performing: Perform the circuit 𝒰n\mathcal{U}_{n} to estimate the expectation P^in=Tr[P^i𝒰n(ρ)]\braket{\hat{P}_{i}}_{n}=\mathrm{Tr}[\hat{P}_{i}\mathcal{U}_{n}(\rho)] and the purity pn=Tr[𝒰n(ρ)2]p_{n}=\mathrm{Tr}[\mathcal{U}_{n}(\rho)^{2}].

  4. 4.

    Data processing: Estimate P^inest\braket{\hat{P}_{i}}_{n\mathrm{est}} with P^in\braket{\hat{P}_{i}}_{n} and pnp_{n}. Extrapolate P^inest\braket{\hat{P}_{i}}_{n\mathrm{est}} to noise-free point, n=12n=\frac{1}{2} or sn0=0s_{n_{0}}=0. Calculate the estimation of the ideal expectation O^0\braket{\hat{O}}_{0} with coefficients OiO_{i}.

In this section, we will discuss the steps of task performing and data processing (for the technique of Pauli twirling, see Appendix C).

4.1 Task Performing

The unitary folding technique to construct the circuit sequence 𝒰n{\mathcal{U}}_{n} is discussed in the previous section. Unlike the routine ZNE method, we use purity to assist the estimation of ideal expectations. However, it is not an easy task to measure purity, since purity is not an expectation of linear operator on Hilbert space \mathcal{H}, but 2\mathcal{H}^{\otimes 2}. In this subsection, we will introduce the purity estimation methods.

Refer to caption
Figure 2: The circuit of random measurement. The gate 𝒰i\mathcal{U}_{i} is uniformly sampled from an ensemble of unitary gates, like the n-qubit or tensor products of single-qubit Clifford unitaries. The quantum state ρ\rho can be recovered from the n-bit measurement outcome and the choice of the unitary gate 𝒰i\mathcal{U}_{i}

The most trivial way to measure the purity is to perform the quantum state tomography (QST), and then calculate the purity by definition. However, QST is so expensive that it is hard to perform on a large system. Therefore, we need more efficient methods to measure purity. The purity can be measured by random measurement [55], where we randomly measure the operators sampled from an ensemble, typically the Pauli string of the qubits. The circuit is shown in Figure 2. This method is called classical shadow, which can substitute QST. By random measurement of operators with the Monte Carlo sampling, this method is more feasible than QST for large-size systems. The number of measurements is shown in the order of

log(M)supiOiϵ2,\sim\frac{\log(M)\sup_{i}\|O_{i}\|}{\epsilon^{2}}, (23)

which is determined by the observable operators O^i,(i=1,,M)\hat{O}_{i},(i=1,\dots,M) of interest, the ensemble of unitary gates, and the precision ϵ\epsilon. However, since it is equivalent to QST in probability, it cannot solve the problem of exponential increasing dimension of the quantum system.

Refer to caption
Figure 3: The circuit of (a) S^(M)\hat{S}^{(M)} and (b) the Hadamard test of VD/ESD.

In the purification method, the measurement of quantities Tr(ρM)\mathrm{Tr}(\rho^{M}) is the key problem. The measurement method with a replica is proposed, which is founded on the following equation

Tr(O^ρM)=Tr(O^(i)S^(M)ρM),\mathrm{Tr}\left(\hat{O}\rho^{M}\right)=\mathrm{Tr}\left(\hat{O}_{(i)}\hat{S}^{(M)}\rho^{\otimes M}\right), (24)

where ρM\rho^{\otimes M} is MM copies of ρ\rho, O^(i)\hat{O}_{(i)} is the operator O^\hat{O} acted on the ii-th copy, typically i=1i=1, and S^(M)\hat{S}^{(M)} is cyclic permutation on MM copies

S^(M)|ψ1|ψ2|ψM=|ψ2|ψM|ψ1.\hat{S}^{(M)}\ket{\psi_{1}}\otimes\ket{\psi_{2}}\otimes\dots\otimes\ket{\psi_{M}}=\ket{\psi_{2}}\otimes\dots\otimes\ket{\psi_{M}}\otimes\ket{\psi_{1}}. (25)

The purity can be mapped to the expectation of S^(2)\hat{S}^{(2)} on two-replica state ρ2\rho^{\otimes 2}. The circuit of S^(M)\hat{S}^{(M)} is shown in Figure 3(a), and the circuit to measure the expectation in Equation (24) is shown in Figure 3(b). This method allows us to measure the purity directly from the quantum circuit, which is generally used in the so-called virtual distillation or error suppression by derangement (VD/ESD) in the purification method [21]. Although it can reduce the exponentially increasing quantities measured in QST, it enlarges the space complexity of qubits for the preparation of 22 copies, and the SWAP gates needed in applying S^(2)\hat{S}^{(2)} will enlarge the time complexity of the quantum circuit.

4.2 Data Processing

With the data P^in\braket{\hat{P}_{i}}_{n}, pnp_{n} from experiments, we calculate the effective error rate sns_{n} and the estimators P^inest\braket{\hat{P}_{i}}_{n\mathrm{est}} in Equation (16). Then, we extrapolate the estimators to noise-free estimator P^iest\braket{\hat{P}_{i}}_{\mathrm{est}}, where n=12n=\frac{1}{2} or sn0=0s_{n_{0}}=0. Finally, calculate the estimator of ideal expectation O^0=iOiP^iest\braket{\hat{O}}_{0}=\sum_{i}O_{i}\braket{\hat{P}_{i}}_{\mathrm{est}}.

Since the errors are Pauli diagonal with the Pauli twirling technique, as shown in Equation (13) and (14), P^in\braket{\hat{P}_{i}}_{n} is an exponential function of nn, while pnp_{n} is a multi-exponential function of nn, so the fitting model is

P^in\displaystyle\braket{\hat{P}_{i}}_{n} =g(n)=Bekn,\displaystyle=g(n)=Be^{-kn}, (26)
pn\displaystyle p_{n} =f(n)=iAiekin+C.\displaystyle=f(n)=\sum_{i}A_{i}e^{-k_{i}n}+C. (27)

Moreover, it can be simplified as

pn=F(P^in)=iAiP^inki+C,p_{n}=F(\braket{\hat{P}_{i}}_{n})=\sum_{i}A_{i}\braket{\hat{P}_{i}}_{n}^{k_{i}}+C, (28)

where (Ai,ki,C)(A_{i},k_{i},C) is the parameters need to determine.

In particular, the expectation O^0\braket{\hat{O}}_{0} can be calculated with the estimator P^i1est\braket{\hat{P}_{i}}_{1\mathrm{est}} without unitary folding and extrapolation. This should be interpreted as a modified VD/ESD of the purification method (see Appendix B). In the following, we will discuss the difference between the two criteria of noisy-free point, and the difference between the 11-fold pZNE (without extrapolation) and VD/ESD.

For extrapolation to n=12n=\frac{1}{2}, χni=χfiχbi=eλ22ωi\chi_{ni}=\sqrt{\frac{\chi_{f_{i}}}{\chi_{bi}}}=e^{-\frac{\lambda^{2}}{2}\omega_{i}} by Equation (51). The tolerant error is

ϵ2σ2λ22Δω¯+O(λ4),\epsilon\sim\sqrt{2}\sigma\sim\frac{\sqrt{2}\lambda^{2}}{2}\Delta\bar{\omega}+O(\lambda^{4}), (29)

where Δω¯=ω2¯ω¯2\Delta\bar{\omega}=\sqrt{\overline{\omega^{2}}-\bar{\omega}^{2}}, has lower order than the bias of estimator P^i1est\braket{\hat{P}_{i}}_{1\mathrm{est}} without extrapolation. For extrapolation to pure state, sn0=0s_{n_{0}}=0, the tolerant error is (see Appendix. D)

ϵ2σ2(1n0)λ2Δω¯+O(λ4),\epsilon\sim\sqrt{2}\sigma\sim\sqrt{2}(1-n_{0})\lambda^{2}\Delta\bar{\omega}+O(\lambda^{4}), (30)

which is in the same order as extrapolate to n=1/2n={1}/{2}, and in practice, we can select the one in n0n_{0} and 1/21/2 closer to 11.

For routine ZNE method, the relative bias is |eλ2ωi/21|\left|e^{-\lambda^{2}\omega_{i}/2}-1\right|. The tolerant error is (see Appendix. D)

ϵ22λ2Δω¯+O(λ4),\epsilon\sim\frac{\sqrt{2}}{2}\lambda^{2}\Delta\bar{\omega}+O(\lambda^{4}), (31)

when λ2Δω¯12(1n0)2\lambda^{2}\Delta\bar{\omega}\leq\frac{1}{\sqrt{2}(1-n_{0})}\sim\sqrt{2}, otherwise it may fail. The relative bias up to the lowest order is the same as the pZNE extrapolating to n=12n=\frac{1}{2}, however, considering the constraint of pZNE σ=λ22Δω¯2\sigma=\frac{\lambda^{2}}{2}\Delta\bar{\omega}\leq 2 or λ2Δω¯4\lambda^{2}\Delta\bar{\omega}\leq 4, we find that the pZNE can be used in the region where the error rate is large so that the routine ZNE method may fail. The suppression of bias to second order λ2\lambda^{2} is from the assumption that the difference of forward and backward error is in the second order, which comes from the pulse-inverse technique [54]. Without the pulse-inverse technique, the error rate λ\lambda in the above calculation is substituted by λ1/2\lambda^{1/2}.

Then, we consider the VD/ESD method with 22-replica (see Appendix B). The estimator of VD/ESD is

P^ip=P^inpn,\braket{\hat{P}_{i}}_{p}=\frac{\braket{\hat{P}_{i}}_{n}}{p_{n}}, (32)

where n=2n=2. Under the Pauli diagonal error, with Equation (13) and Equation (14), the bias of the estimator is

|Dχni(D1)χn2¯+11||D(D1)χ¯n+1/χ¯n1|\left|\frac{D\chi_{ni}}{(D-1)\overline{\chi_{n}^{2}}+1}-1\right|\sim\left|\frac{D}{(D-1)\bar{\chi}_{n}+1/\bar{\chi}_{n}}-1\right| (33)

which approaches to zero when χ¯n=1\bar{\chi}_{n}=1, the noise-free point, or χ¯n=1D1\bar{\chi}_{n}=\frac{1}{D-1} if Δχ¯n1\Delta\bar{\chi}_{n}\ll 1. Therefore, it is only reliable when error is small enough. As the estimation in Appendix D, the tolerant error

ϵ124λ>2λ\epsilon\sim\sqrt[4]{12}\lambda>\sqrt{2}\lambda (34)

is larger than the one for the 11-fold pZNE method. Notice that the requirement Δχ¯n1\Delta\bar{\chi}_{n}\ll 1 of the 11-fold pZNE method is weaker than that λ1\lambda\ll 1 of VD/ESD. Therefore, the 11-folded pZNE has lower tolerant error and looser conditions than the VD/ESD method with 22-replica.

4.3 Efficiency

To describe the efficiency of this method, we consider its overhead [9, 45]. The overhead depends on the fitting model, and here, we abstractly denote the fitting model as

Oem=O(0)=F({pi,Oi}),O_{em}=O(0)=F(\{p_{i},O_{i}\}), (35)

where F({pi,Oi})F(\{p_{i},O_{i}\}) is a function of the data {pi,Oi}\{p_{i},O_{i}\} of purity and observable expectation from the experiment. Then, we calculate the overhead CemC_{em} of pZNE, and the details of calculation is in Appendix E.

According to the law of large numbers (LLN), the overhead is given by

Cem\displaystyle C_{em} Nϵ(Oem)Nϵ(O)=Var[Oem]Var[O]\displaystyle\equiv\frac{N^{\epsilon}(O_{em})}{N^{\epsilon}(O)}=\frac{\mathrm{Var}[O_{em}]}{\mathrm{Var}[O]}
(CemZNE+Var[p]Var[O]i|Fpi|)2,\displaystyle\lesssim\left(\sqrt{C_{em}^{\textrm{ZNE}}}+\frac{\mathrm{Var}[p]}{\mathrm{Var}[O]}\sum_{i}\left|\frac{\partial F}{\partial p_{i}}\right|\right)^{2}, (36)

where Nϵ(Oem)N^{\epsilon}(O_{em}) and Nϵ(O)N^{\epsilon}(O) is the number of shots to measure the mitigated estimator OemO_{em} and the raw estimator OO up to ϵ\epsilon precision correspondingly, and the overhead of routine ZNE method CemZNEi(F/Oi)2C_{em}^{\textrm{ZNE}}\sim\sum_{i}\left({\partial F}/{\partial O_{i}}\right)^{2}. The variance Var[p]\mathrm{Var}[p] depends on the method to measure the purity, and as we previously discussed, there are two kinds of methods to measure the purity. One is the replica measurement method, which is based on the Equation (24). Another one is the QST-like method, QST and classical shadow, which measure the expectation of all Pauli operators σ^α\braket{\hat{\sigma}_{\alpha}}, where α{1,2,,4L1}\alpha\in\{1,2,\dots,4^{L}-1\}, and LL is the system size. Then, we calculate the overhead of these two different measurement methods respectively. (For details, see Appendix E.)

For the replica measurement method, if we assume the observable of interest is a Pauli operator, and when the error rate x1x\rightarrow 1, O↛1O\not\rightarrow 1, then the overhead CemC_{em} is

CemCemZNE.C_{em}\sim C_{em}^{\textrm{ZNE}}. (37)

In the contrary, if O1O\rightarrow 1, the overhead increases by a constant

Cem(CemZNE+2κi|Fpi|)2.C_{em}\lesssim\left(\sqrt{C_{em}^{\textrm{ZNE}}}+2\kappa\sum_{i}\left|\frac{\partial F}{\partial p_{i}}\right|\right)^{2}. (38)

For the QST-like measurement method, we assume the observable O^\hat{O} is a Pauli operator located on l0l_{0} qubits, namely O^\hat{O} is not identity I^\hat{I} exactly on l0l_{0} qubits, and we call it l0l_{0}-weight Pauli operator in below. We measure the 3L3^{L} different LL-weight Pauli operators with the same number NLN_{L} of shots to perform the QST-like method. The overhead is

Cem(C~emZNE+10L14L13l0i|Fpi|)2.C_{em}\lesssim\left(\sqrt{\tilde{C}_{em}^{\textrm{ZNE}}}+\frac{10^{L}-1}{4^{L-1}3^{l_{0}}}\sum_{i}\left|\frac{\partial F}{\partial p_{i}}\right|\right)^{2}. (39)

If l0>Llog352l_{0}>L\log_{3}\frac{5}{2}, when LL\rightarrow\infty, the second term in the rightest inequality goes to zero, which means that the overhead

limLCemC~emZNE.\lim_{L\rightarrow\infty}C_{em}\sim\tilde{C}_{em}^{\textrm{ZNE}}. (40)

Thus, the variance of purity Var[p]\mathrm{Var}[p] is well controlled by the variance of Pauli operator Var[O]\mathrm{Var}[O] with large weight.

However, we note that the overhead of ZNE C~emZNE\tilde{C}_{em}^{\textrm{ZNE}} here is calculated under the assumption that the expectation of O^\braket{\hat{O}} is measured from QST-like method. In the ZNE method, we need not perform QST to measure O^\braket{\hat{O}}, thus the actual ZNE overhead is

CemZNE3LZC~emZNE,C_{em}^{\textrm{ZNE}}\sim 3^{-L}Z\tilde{C}_{em}^{\textrm{ZNE}}, (41)

where ZZ is the number of different LL-weight operators covering the operator O^i\hat{O}_{i} of interest. Therefore, the overhead asymptotically is

limLCem3LZCemZNE.\lim_{L\rightarrow\infty}C_{em}\lesssim\frac{3^{L}}{Z}C_{em}^{\textrm{ZNE}}. (42)

Moreover, if the operators of interest is so dense that every LL-weight operator covers at least one operator O^i\hat{O}_{i} of interest, Z=3LZ=3^{L}, the efficiency of pZNE is of the same order as the routine ZNE.

Refer to caption
Figure 4: The numerical simulation of pZNE, ZNE, and VD/ESD with the depolarizing error 𝒟q\mathcal{D}_{q}, where error rate q=0.05q=0.05 per gate. The circuit is shown in (a), which includes several layers of folded noisy CNOT gate 𝒰n\mathcal{U}_{n}. Each circuit is simulated with 1010 repetitions, and the QST results are obtained by performing 10001000 single-shot measurements on each 22-weight Pauli basis. The mitigated expectations of pZNE, ZNE, and VD/ESD are shown in (b). The raw data of expectation Z^0\braket{\hat{Z}_{0}} and purity are shown in (c) and (d).
Refer to caption
Figure 5: The numerical simulation of pZNE, ZNE, and VD/ESD with the randomly sampled Pauli diagonal error 𝒟q\mathcal{D}_{q} with fixed error probability q=0.05q=0.05 per gate. The circuit is the same as Figure 4 under the Pauli diagonal error 𝒟q\mathcal{D}_{q} selected. Each circuit is simulated with 1010 repetitions, and the QST results are obtained by performing 20002000 single-shot measurements on each 22-weight Pauli basis. The mitigated expectations of pZNE, ZNE, and VD/ESD are shown in (a), and the raw data of expectation Z^0\braket{\hat{Z}_{0}} and purity in (b) and (c). The results of average over 5050 random sampled Pauli diagonal errors are shown in (d). The data points show the value of ΔZ01\Delta\braket{Z_{0}}_{1} and error bars show the value of ΔZ02\Delta\braket{Z_{0}}_{2}.
Refer to caption
Figure 6: The experimental simulation of pZNE, ZNE, and VD/ESD. The Pauli twirled circuit is shown in (a), which consists of layers of folded noisy CNOT gate 𝒰n\mathcal{U}_{n}, where the noise \mathcal{E} is determined by the experiment device. The expectation Z^0\braket{\hat{Z}_{0}} with the initial state |00\ket{00} under increasing CNOT gate folding layers is shown. The fitting model is an exponential model. Each Pauli basis is repeatedly measured with 16001600 shots, whereas each Pauli twirling instance is performed with 100100 shots. The mitigated expectations of pZNE, ZNE, and VD/ESD are shown in (b). The raw data of expectation Z^0\braket{\hat{Z}_{0}} and purity are shown in (c) and (d).

5 Verification

In this section, we simulate the pZNE method numerically and experimentally.

5.1 Numerical Simulation: Depolarizing Error

We verify the effectiveness of pZNE by numerical simulation. First, we simulate the CNOT gates under the depolarizing error of different error rates. The initial state is selected as |00\ket{00}, and the observable expectation Z^0\braket{\hat{Z}_{0}} of Pauli ZZ operator on the qubit q0q_{0} is measured for showing the effect of the error. For ideal CNOT gates, the expectation Z^0=1\braket{\hat{Z}_{0}}=1 is invariant for any number of folds. However, the depolarizing error will lead to the decay of the expectation. We compare the mitigated expectation of the pZNE to the routine ZNE method, the VD/ESD method, and the modified purification method. Each layer of CNOT gate is a folded gate, with 1,31,3 and 55 CNOT gates. We simulate the circuit with different layers, which enlarge the error rate of 11-fold gate. The results are shown in Figure 4. The estimators of pZNE, ZNE, VD/ESD, and modified purification method are shown in Figure 4(b), the raw value of expectations Z^0\braket{\hat{Z}_{0}} are shown in the Figure 4(c), and the purities of the noisy state are shown in the Figure 4(d). (For details of data processing, see Appendix F.)

Among the several methods, the VD/ESD method presents a large bias of expectation, which is expected in Equation (33). The modified purification has the lowest bias and standard deviation since the depolarizing error Δχ=0\Delta\chi=0. However, in cases where the error is not a depolarizing error but a Pauli diagonal error, the modified purification method may exhibit a significant bias that increases with the error rate. We will discuss this later in Sec 5.2.

The reason pZNE (with extrapolation) does not perform better than the modified purification, 11-fold pZNE without extrapolation, is that for the depolarizing errors, the error is from the random errors in measurement. The extrapolation thus cannot cancel the bias, but enlarge the random errors in measurement. We expect that there exist errors with small but not vanish Δχ\Delta\chi, where the modified purification method perform better than the routine ZNE method. With sufficiently large number NN of shots, the random errors in measurement are sufficiently smaller than the error from error models. In this case, the pZNE (with extrapolation) will perform better than both the modified purification and the routine ZNE method.

Comparing the pZNE with the ZNE, the bias between the average mitigated data and ideal expectation of these two methods are comparable before the 1414-th layer, where the purity of 11-folded noisy state p0.5p\geq 0.5, as shown in Figure 4(d). It can be seen that the standard deviations of the pZNE method are lower than those of the ZNE method. Despite this, purity estimation necessitates complete information about the quantum state, resulting in an additional measurement overhead. However, in scenarios where dense information about the state is required, the purity can be estimated simultaneously from the measurement outcomes. In such cases, pZNE does not involve higher overhead compared to ZNE, and it showcases the advantage of offering higher precision than ZNE.

5.2 Numerical Simulation: Pauli Diagonal Error

To analyze the pZNE method with the Pauli diagonal error, we first simulate the CNOT gate under this kind of error models. The settings of simulations are the same as those in the depolarizing error model case. The results are shown in Figure 5. The estimators of pZNE, ZNE, VD/ESD, and modified purification method of 11-folded gate are shown in Figure 5(a), the raw value of expectations Z^0\braket{\hat{Z}_{0}} are shown in Figure 5(b), and the purities of the noisy state are shown in Figure 5(c). (For details, see Appendix F.)

The VD/ESD estimator still has a large bias from the ideal expectation. Different from the depolarizing error model case, the modified purification method also has bias with randomly selected Pauli diagonal error. The bias increases with the increasing of the layer in Figure 5(a), which represents the increasing error rate. This result is because the eigenvalues of the selected error channel do not concentrate enough. For the pZNE and routine ZNE method, same as the depolarizing error case, the bias of pZNE is comparable with the routine ZNE. In addition, the standard deviation of pZNE is lower than routine ZNE.

However, the observation from the results of the depolarizing error and the selected Pauli diagonal error are not sufficient to represent the performance of different methods. To confirm this, we perform the numerical simulations of the CNOT gate under a wider range of randomly sampled Pauli diagonal errors with the same error rates. We take 50 instances of Pauli diagonal errors, randomly sampled, and computed the expectations and variances of various mitigation methods for each instance. Subsequently, we calculated the square root of the second moment of biases for these sampled instances

ΔZ01=[𝔼i(Z01)2]1/2,\Delta\braket{Z_{0}}_{1}=\left[\mathbb{E}_{\mathcal{E}_{i}}(\braket{Z_{0}}-1)^{2}\right]^{1/2}, (43)

and the average standard deviation

ΔZ02=[𝔼i(VarZ0)]1/2,\Delta\braket{Z_{0}}_{2}=\left[\mathbb{E}_{\mathcal{E}_{i}}(\mathrm{Var}\braket{Z_{0}})\right]^{1/2}, (44)

to represent the average performance of the different methods.

The quantity ΔZ01\Delta\braket{Z_{0}}_{1} represents the accuracy of the methods, and the quantity ΔZ02\Delta\braket{Z_{0}}_{2} represents the precision. The result is shown in Figure 5(d).

The VD/ESD method exhibits a noticeable average bias, whereas the bias of the modified purification method increases linearly with the number of layers. Interestingly, their standard deviations remain comparable and do not show an increase in the number of layers. In comparison, the biases of the pZNE and conventional ZNE methods are nearly equivalent and tend to increase with the number of layers. However, before the 1414-th layer, the bias of pZNE is marginally smaller than that of conventional ZNE. After the 1515-th layer, conventional ZNE encounters failures for some sampled instances due to their diverging biases, while pZNE manages to remain effective until the 1818-th layer. Furthermore, the standard deviations of pZNE show a corresponding increase with the number of layers, as both methods involve extrapolation. Notably, these standard deviations are consistently lower than those of the routine ZNE method.

These results show that the pZNE method offers enhanced accuracy and precision, particularly in scenarios with substantial error rates, while also maintaining a lower standard deviation compared to routine ZNE on average across various Pauli diagonal errors. Nonetheless, to harness these benefits effectively, it is recommended to leverage the pZNE method primarily in tasks that necessitate detailed and comprehensive information about the quantum state.

5.3 Experiment

We perform the pZNE and ZNE experiments on the cloud-based superconducting quantum computation platform, Quafu [56]. For the information of the Quafu platform and device, see Appendix G. The experimental setup was identical to that of the numerical simulations but was performed only once. We applied the Pauli twirling technique to the folded CNOT gate. The results are illustrated in Figure 6, where the mitigated outcomes of pZNE, ZNE, VD/ESD, and the modified purification method of the 11-folded gate are presented in Figure 6(b). Additionally, the raw expectation values Z^0\braket{\hat{Z}_{0}} are shown in Figure 6(c), and the purities of the noisy state are shown in Figure 6(d). The data processing procedures were the same as those used in the numerical simulation, with the exception of calculating standard deviations.

The results of the VD/ESD and modified purification methods exhibit significant biases away from the ideal values, indicating a lack of concentration in the eigenvalues of the twirled error channel. The mitigated expectations of the pZNE show biases comparable to those of the standard ZNE method. Interestingly, for error rates in the moderate range, specifically after the 55-th layer and before the 1111-th layer, the bias of the pZNE method is lower than that of the standard ZNE approach. Our previous assumption considered each error channel of the layers in the circuit to be Pauli diagonal. However, implementing individual Pauli twirling for each layer in the circuit would lead to an exponential increase in compiled circuit instances with the number of layers. To simplify this process, we twirled the folding circuit as a whole. Consequently, the coherent part of the error channels, which would have been eliminated in individual Pauli twirling, contributed to the overall error channel of longer layers. In such cases, the purity of the state can assist in extrapolating expectations, thereby reducing the bias in pZNE expectations compared to routine ZNE for stages with moderate errors.

6 Conclusion

In this paper, we introduce a method that utilizes purity to assess noise error rates, operating under the assumption that the error channel is Pauli diagonal. By using the Pauli twirling technique, we ensure that this assumption holds universally. Our study involves estimating the bias of the pZNE method based on a specified failing probability and conducting a comparative analysis with other relevant methods. We demonstrate that the pZNE method may not lead to a substantial reduction in bias compared to the routine ZNE method when error rates are low. However, it showcases its utility within a certain threshold where the routine ZNE method might encounter limitations or failures.

Moreover, we analyze the overhead of the pZNE method using different purity estimation techniques, such as the replica measurement method and the QST-like method. Our results reveal distinct behaviors in the upper bounds of pZNE for these estimation methods. Specifically, the overhead of the replica measurement method is constrained by the overhead of ZNE with a finite constant added, while the overhead of a QST-like method escalates exponentially with the growth of system size. In cases with exponentially increasing overhead, the pZNE method may not be efficient for large systems when using a QST-like method. However, if the task requires dense information about the density matrix, the purity estimation will not significantly inflate the overhead.

In addition, we verify the pZNE method with the numerical simulations under the depolarizing error and the Pauli diagonal errors, as well as the experiments on the quantum computation platform, Quafu. We compare it with the routine ZNE and the VD/ESD method. The results show that the bias of both routine ZNE and pZNE increases with the error rate, and divergence when the error rate is large enough. The bias of pZNE is slightly lower than routine ZNE when the error rate is moderate, and the critical point of error rate for pZNE is larger than routine ZNE method. The standard deviation of pZNE is lower than that of ZNE, indicating that the pZNE method is more efficient when dense state information is of interest.

Our results show that the purity of noisy quantum states can be used to assist the ZNE method, thus extending its effectiveness. It can be expected that pZNE will reinforce the wide usage of the ZNE method in the NISQ era.

Appendix A The Forward and Backward Errors

In this appendix, we consider the forward and backward errors. The equality of forward and backward error is the basic assumption of the ZNE method. However, it is not such easy to satisfy, even for the unitary evolution whose inverse is itself. For example, assume the ideal unitary 𝒰\mathcal{U} is CNOT gate, the noisy circuit is 𝒰λ=f𝒰\mathcal{U}_{\lambda}=\mathcal{E}_{f}\circ\mathcal{U}. Since the CNOT gate is self-inverse, in the unitary folding technique, the backward evolution is 𝒰λ=𝒰λ\mathcal{U}_{\lambda}^{\dagger}=\mathcal{U}_{\lambda}, and the backward error is

b=𝒰f𝒰.\mathcal{E}_{b}=\mathcal{U}\circ\mathcal{E}_{f}\circ\mathcal{U}^{\dagger}. (45)

We assume the circuit is implemented with the Pauli twirling technique, thus the forward and backward errors are Pauli diagonal. Let the Pauli operators on the first control qubit of CNOT gate be X1,Y1,Z1X_{1},Y_{1},Z_{1}, and on the target qubit be X2,Y2,Z2X_{2},Y_{2},Z_{2}. The action of CNOT gate is

X1X1X2,Y1Y1X2,Z1Z1,\displaystyle X_{1}\mapsto X_{1}X_{2},\quad Y_{1}\mapsto Y_{1}X_{2},\quad Z_{1}\mapsto Z_{1},
X2X2,Y2Z1Y2,Z2Z1Z2,\displaystyle X_{2}\mapsto X_{2},\quad Y_{2}\mapsto Z_{1}Y_{2},\quad Z_{2}\mapsto Z_{1}Z_{2}, (46)

so if there is no symmetry that all coefficients qiq_{i} of Pauli diagonal forward error f\mathcal{E}_{f} are different (the converse case is zero-measure in space of error), then the forward and backward error is different.

In Ref. [54], they deal with this problem by introducing the pulse-inverse circuit 𝒰λI\mathcal{U}_{\lambda I}^{\dagger} (𝒦I\mathcal{K}_{I} in original context). The pulse-inverse circuit 𝒰λI\mathcal{U}_{\lambda I}^{\dagger} is a special circuit of inverse of 𝒰λ\mathcal{U}_{\lambda}. Assume the ideal evolution 𝒰\mathcal{U} is implemented by a time-dependent Hamiltonian H(t)H(t), and the error is described by a Lindblad operation (t)\mathcal{L}(t). The ideal pulse-inverse circuit 𝒰\mathcal{U}^{\dagger} is implemented by time-dependent Hamiltonian HI(t)=H(Tt)H_{I}(t)=-H(T-t), and a reasonable assumption is that the error in this inverse pulse is described by the Lindblad operation I(t)=(Tt)\mathcal{L}_{I}(t)=\mathcal{L}(T-t). The forward and backward error thus is related to the evolution in the interaction picture

f\displaystyle\mathcal{E}_{f} =𝒰Texp0TdtL~(t)𝒰,\displaystyle=\mathcal{U}\circ\mathrm{T}\exp\int_{0}^{T}\mathrm{d}t\tilde{L}(t)\circ\mathcal{U}^{\dagger}, (47)
b\displaystyle\mathcal{E}_{b} =Texp0TdtL~I(t),\displaystyle=\mathrm{T}\exp\int_{0}^{T}\mathrm{d}t\tilde{L}_{I}(t), (48)

where L~(t)=𝒰(t)L(t)𝒰(t),L~I(t)=𝒰I(t)LI(t)𝒰I(t)\tilde{L}(t)=\mathcal{U}^{\dagger}(t)L(t)\mathcal{U}(t),\tilde{L}_{I}(t)=\mathcal{U}_{I}^{\dagger}(t)L_{I}(t)\mathcal{U}_{I}(t) are the Lindblad operations in the interaction picture. Ref. [54] shows that in the first order of Magnus expansion fb\mathcal{E}_{f}\approx\mathcal{E}_{b}, which agrees with the first order expansion of time-independent case. However, to higher order, these two errors are still different, if the Hamiltonian evolution and Lindblad evolution are not commute. Therefore, the pulse-inverse circuit extends the requirement of time-independence on unitary folding to the Floquet system, which is more suitable for digital systems, but cannot fix the problem in the assumption that the forward and backward errors are equal.

Moreover, intersecting a perturbative parameter λ\lambda before Lindblad operations to reflect the intensity of the noise, the expansion shows that

b=feλ2Ω2,\mathcal{E}_{b}=\mathcal{E}_{f}e^{\lambda^{2}\Omega_{2}}, (50)

the difference between forward and backward errors is up to the second order. With the Pauli twirling technique, both forward and backward errors are Pauli diagonal, thus commute, so their eigenvalues have a relation

χbi=χfieλ2ωi.\chi_{bi}=\chi_{fi}e^{\lambda^{2}\omega_{i}}. (51)

Appendix B Brief Introduction to Purification Method

The purification method is based on the belief that in practice the output state is always mixed, while the ideal state is pure. If the error is not so large, the ideal pure state should not be far away from the experimental mixed state. Thus, the pure state closest to the mixed state may be a good approximation of the ideal state, and this state is the aim of the purification method.

Let the mixed state ρ\rho decomposed as

ρ=npn|nn|,\rho=\sum_{n}p_{n}\ket{n}\bra{n}, (52)

where pnp_{n} are the spectrum of the density ordered descendingly, and |n\ket{n} are the corresponding eigenstates. For some other pure state |ψ=nψn|n\ket{\psi}=\sum_{n}\psi_{n}\ket{n}, the distance

D2(ρ,|ψ)=Trρ2+12ψ|ρ|ψ=Trρ2+12npn|ψn|2\begin{split}D^{2}(\rho,\ket{\psi})&=\mathrm{Tr}\rho^{2}+1-2\bra{\psi}\rho\ket{\psi}\\ &=\mathrm{Tr}\rho^{2}+1-2\sum_{n}p_{n}|\psi_{n}|^{2}\end{split} (53)

minimizes when |ψn|=δn,0|\psi_{n}|=\delta_{n,0}, namely the state with the largest probability in the mixed state is the closest pure state. It is clear that if we know the mixed state completely, by diagonalization we can get the closest pure state.

For large systems, the dimension of the matrix is too overwhelming to perform the diagonalization. The better way is to realize the purification by iteration. One of the iteration procedures is the McWeeny purification [57, 40]

ρn+1=3ρn22ρn3.\rho_{n+1}=3\rho_{n}^{2}-2\rho_{n}^{3}. (54)

In a basis that can diagonalize the mixed state, this iteration only maps the spectrum. However, the attractive fixed point of this map is p=0,1p=0,1 and the critical point is 1/21/2, thus it has the probability of convergence to 0 for all eigenvalues, which is not what we desire. Only if there is an eigenvalue pnp_{n} greater than 1/21/2, namely the error is small enough, the purification is working. Another one is the map

ρρMTrρM.\rho\mapsto\frac{\rho^{M}}{\mathrm{Tr}\rho^{M}}. (55)

In the diagonal representation, the spectra are mapped as

pnpnMmpmM=11+mn(pm/pn)M.p_{n}\mapsto\frac{p_{n}^{M}}{\sum_{m}p_{m}^{M}}=\frac{1}{1+\sum_{m\neq n}(p_{m}/p_{n})^{M}}. (56)

In the limit MM\rightarrow\infty, the spectra pnδn0p_{n}\rightarrow\delta_{n0}, since pnp_{n} is in descending order, which is what pure state we want.

To do the purification, one can perform the tomography of target density, and then iterate the procedure given above on a classical computer. However, tomography is expensive for large systems, other method is also needed. One can prepare the purified state by post-selection, Other strategy is to realize the purification virtually. The purified state can be approximated as ρMTrρM\frac{\rho^{M}}{\mathrm{Tr}\rho^{M}} for some given integer MM, so the expectation of quantity O^\hat{O} under the purified state approximate as

O^pTr(O^ρλM)TrρλM.\braket{\hat{O}}_{\mathrm{p}}\approx\frac{\mathrm{Tr}\left(\hat{O}\rho_{\lambda}^{M}\right)}{\mathrm{Tr}\rho_{\lambda}^{M}}. (57)

If the value Tr(O^ρλM),TrρλM\mathrm{Tr}\left(\hat{O}\rho_{\lambda}^{M}\right),\mathrm{Tr}\rho_{\lambda}^{M} can be evaluated from the experiment, the purified expectation value can be obtained.

A method called virtual distillation or error suppression by derangement is proposed [21], which is based on the Equation (24). where ρM\rho^{\otimes M} is MM copies of ρ\rho, O^(i)\hat{O}_{(i)} is the operator O^\hat{O} acted on the ii-th copy, typically i=1i=1, and S^(M)\hat{S}^{(M)} is cyclic Thus, Tr(O^ρλM)\mathrm{Tr}\left(\hat{O}\rho_{\lambda}^{M}\right) and TrρλM\mathrm{Tr}\rho_{\lambda}^{M} can be measured with MM copies of ρ\rho. For S^(M)\hat{S}^{(M)} is cyclic permutation on MM copies, the expectation of operator O^(M)=1MiO^(i)\hat{O}^{(M)}=\frac{1}{M}\sum_{i}\hat{O}_{(i)} is same as the original one

Tr(O^(i)S^(M)ρM)=Tr(O^(M)S^(M)ρM).\mathrm{Tr}\left(\hat{O}_{(i)}\hat{S}^{(M)}\rho^{\otimes M}\right)=\mathrm{Tr}\left(\hat{O}^{(M)}\hat{S}^{(M)}\rho^{\otimes M}\right). (58)

Since O^(M)\hat{O}^{(M)} are commute with S^(M)\hat{S}^{(M)}, we can find the common eigenstates as the basis, which allows measuring Tr(O^ρM)\mathrm{Tr}\left(\hat{O}\rho^{M}\right) and TrρM\mathrm{Tr}\rho^{M} simultaneously. One can measure the expectation Tr(O^(M)S^(M)ρM)\mathrm{Tr}\left(\hat{O}^{(M)}\hat{S}^{(M)}\rho^{\otimes M}\right) and Tr(S^(M)ρM)\mathrm{Tr}\left(\hat{S}^{(M)}\rho^{\otimes M}\right) by Hadamard test (see Figure 3) or by diagonalization [21]. For the measurement by diagonalization, in a qubit system, let O^\hat{O} be the Z^\hat{Z} operator, the eigenstates are just spin waves. Moreover, considering the case with two copies, i.e. M=2M=2, the eigenstates states are just the spin singlet and triplet, and the transformation is

B^(2)=(10000121200121200001).\hat{B}^{(2)}=\left(\begin{array}[]{cccc}1&0&0&0\\ 0&\frac{1}{\sqrt{2}}&-\frac{1}{\sqrt{2}}&0\\ 0&\frac{1}{\sqrt{2}}&\frac{1}{\sqrt{2}}&0\\ 0&0&0&1\end{array}\right). (59)

Under this transformation,

S^(2)12(1+Z^(1)Z^(2)+Z^(1)Z^(2))Z^(2)Z^(2)=12(Z^(1)+Z^(2)),\begin{split}\hat{S}^{(2)}&\rightarrow\frac{1}{2}(1+\hat{Z}_{(1)}-\hat{Z}_{(2)}+\hat{Z}_{(1)}\hat{Z}_{(2)})\\ \hat{Z}^{(2)}&\rightarrow\hat{Z}^{(2)}=\frac{1}{2}(\hat{Z}_{(1)}+\hat{Z}_{(2)}),\end{split} (60)

and we can verify that Z^(2)S^(2)=Z^(2)\hat{Z}^{(2)}\hat{S}^{(2)}=\hat{Z}^{(2)}. Thus, the estimator of Z^\hat{Z} is

Z^pZ^λTrρλ2.\braket{\hat{Z}}_{p}\approx\frac{\braket{\hat{Z}}_{\lambda}}{\mathrm{Tr}\rho_{\lambda}^{2}}. (61)

In addition to the above-mentioned methods, there is a variation of purity estimation called state verification [23]. We consider the purity of ρn=𝒰n(ρ)\rho_{n}=\mathcal{U}_{n}(\rho), namely

pn\displaystyle p_{n} =Trρn2=Tr[𝒰n(ρ)2]\displaystyle=\mathrm{Tr}\rho_{n}^{2}=\mathrm{Tr}\left[\mathcal{U}_{n}(\rho)^{2}\right]
=Tr[ρ(𝒰n)𝒰n(ρ)]=Tr[ρρ~n].\displaystyle=\mathrm{Tr}\left[\rho\left(\mathcal{U}_{n}\right)^{\dagger}\circ\mathcal{U}_{n}(\rho)\right]=\mathrm{Tr}\left[\rho\tilde{\rho}_{n}\right]. (62)

This is the overlap of initial state ρ\rho with quasi-state ρ~n=(𝒰n)𝒰n(ρ)\tilde{\rho}_{n}=\left(\mathcal{U}_{n}\right)^{\dagger}\circ\mathcal{U}_{n}(\rho), which is not physical in general. Here, the \dagger on the channel 𝒰n\mathcal{U}_{n} is acted on its Kraus operators, but not on the output state 𝒰n(ρ)\mathcal{U}_{n}(\rho). For 𝒰n=(𝒰λ𝒰λ)n1𝒰λ\mathcal{U}_{n}=(\mathcal{U}_{\lambda}\circ\mathcal{U}_{\lambda}^{\dagger})^{n-1}\circ\mathcal{U}_{\lambda}, the quasi-channel is

(𝒰n)𝒰n=(𝒰λ)(𝒰λ𝒰λ)n1(𝒰λ𝒰λ)n1𝒰λ.\left(\mathcal{U}_{n}\right)^{\dagger}\circ\mathcal{U}_{n}=\left(\mathcal{U}_{\lambda}\right)^{\dagger}\circ(\mathcal{U}_{\lambda}\circ\mathcal{U}_{\lambda}^{\dagger})^{\dagger n-1}\circ(\mathcal{U}_{\lambda}\circ\mathcal{U}_{\lambda}^{\dagger})^{n-1}\circ\mathcal{U}_{\lambda}. (63)

In ideal case, λ=0\lambda=0, (𝒰λ)=𝒰λ=𝒰\left(\mathcal{U}_{\lambda}\right)^{\dagger}=\mathcal{U}_{\lambda}^{\dagger}=\mathcal{U}^{\dagger}, we have

(𝒰n)𝒰n|λ=0=𝒰~2n1|λ=0,\left.\left(\mathcal{U}_{n}\right)^{\dagger}\circ\mathcal{U}_{n}\right|_{\lambda=0}=\left.\tilde{\mathcal{U}}_{2n-1}\right|_{\lambda=0}, (64)

where 𝒰~n=(𝒰λ𝒰λ)n\tilde{\mathcal{U}}_{n}=(\mathcal{U}_{\lambda}^{\dagger}\circ\mathcal{U}_{\lambda})^{n}. Thus, in the purification method, the recurrence probability can be used instead of purity

p~n=Tr[ρρn],\tilde{p}_{n}=\mathrm{Tr}\left[\rho\rho_{n}\right], (65)

where ρn=𝒰~2n1(ρ)\rho_{n}=\tilde{\mathcal{U}}_{2n-1}(\rho) is a physical state, to substitute the purity.

The state verification echo p~n\tilde{p}_{n} is easier to measure than purity. However, it is coincident with the purity if the error channels satisfy

(f)=b.\left(\mathcal{E}_{f}\right)^{\dagger}=\mathcal{E}_{b}. (66)

In the Pauli twirling case, it is equivalent to f=b\mathcal{E}_{f}=\mathcal{E}_{b}, where the routine ZNE method is satisfied. Therefore, it cannot be used in pZNE to measure purity.

Appendix C Pauli Twirling

Pauli twirling technique can compile non-diagonal error channel into Pauli diagonal. Given a group GG of gates the twirling channel of \mathcal{E} with these gates is

twirl=1|G|𝒢G𝒢𝒢.\mathcal{E}_{\mathrm{twirl}}=\frac{1}{|G|}\sum_{\mathcal{G}\in G}\mathcal{G}\circ\mathcal{E}\circ\mathcal{G}^{\dagger}. (67)

If the gate GG is Pauli group, writing the error channel \mathcal{E} as the Kraus decomposition in Pauli basis

=ieijP^iP^j,\mathcal{E}=\sum_{i}e_{ij}\hat{P}_{i}\cdot\hat{P}_{j}, (68)

the twirling channel is

twirl\displaystyle\mathcal{E}_{\mathrm{twirl}} =1D2i,j,keij𝒫k(P^i)𝒫k(P^j)\displaystyle=\frac{1}{D^{2}}\sum_{i,j,k}e_{ij}\mathcal{P}_{k}(\hat{P}_{i})\cdot\mathcal{P}_{k}(\hat{P}_{j})
=1D2i,j,keijϵikϵjkP^iP^j.\displaystyle=\frac{1}{D^{2}}\sum_{i,j,k}e_{ij}\epsilon_{ik}\epsilon_{jk}\hat{P}_{i}\cdot\hat{P}_{j}. (69)

To go ahead, let P^l=P^iP^j\hat{P}_{l}=\hat{P}_{i}\hat{P}_{j}, then we have

𝒫k(P^l)=𝒫k(P^i)𝒫k(P^j),\mathcal{P}_{k}(\hat{P}_{l})=\mathcal{P}_{k}(\hat{P}_{i})\mathcal{P}_{k}(\hat{P}_{j}), (70)

which means ϵikϵjk=ϵlk\epsilon_{ik}\epsilon_{jk}=\epsilon_{lk}. Note that for l=0l=0, all ϵlk=1\epsilon_{lk}=1, otherwise, ϵlk=±\epsilon_{lk}=\pm with equal probabilities, thus

kϵlk=D2δl,0=D2δi,j.\sum_{k}\epsilon_{lk}=D^{2}\delta_{l,0}=D^{2}\delta_{i,j}. (71)

With this result, the twirling channel

twirl=ieii𝒫i\mathcal{E}_{\mathrm{twirl}}=\sum_{i}e_{ii}\mathcal{P}_{i} (72)

is Pauli diagonal.

However, in practice, the error channel \mathcal{E} company with the ideal gate 𝒰\mathcal{U}, so our task is to twirl the error channel \mathcal{E} in the noisy gate 𝒰λ=𝒰\mathcal{U}_{\lambda}=\mathcal{E}\circ\mathcal{U} without changing the ideal gate 𝒰\mathcal{U}. To do so, notice that the Clifford gate combined with T-gate forms the universal quantum gate set. For the Clifford gate 𝒞\mathcal{C}, the twirling channel of its noisy realization is

𝒞λtwirl=1|P|𝒫P𝒫𝒞λ𝒫,\mathcal{C}_{\lambda\mathrm{twirl}}=\frac{1}{|P|}\sum_{\mathcal{P}\in P}\mathcal{P}\circ\mathcal{C}_{\lambda}\circ\mathcal{P}^{\prime}, (73)

where 𝒫=𝒞𝒫𝒞\mathcal{P}^{\prime}=\mathcal{C}^{\dagger}\circ\mathcal{P}\circ\mathcal{C} are also Pauli gates. In this selection of 𝒫\mathcal{P}^{\prime}, the twirling channel of noisy gate is

𝒞λtwirl\displaystyle\mathcal{C}_{\lambda\mathrm{twirl}} =1|P|𝒫P𝒫𝒞𝒫\displaystyle=\frac{1}{|P|}\sum_{\mathcal{P}\in P}\mathcal{P}\circ\mathcal{E}\circ\mathcal{C}\circ\mathcal{P}^{\prime}
=1|P|𝒫P𝒫𝒫𝒞\displaystyle=\frac{1}{|P|}\sum_{\mathcal{P}\in P}\mathcal{P}\circ\mathcal{E}\circ\mathcal{P}\circ\mathcal{C}
=twirl𝒞.\displaystyle=\mathcal{E}_{\mathrm{twirl}}\circ\mathcal{C}. (74)

For the T-gate 𝒯\mathcal{T}, the construction is similar

𝒯λtwirl=1|P|𝒫P𝒫𝒯λ𝒫,\mathcal{T}_{\lambda\mathrm{twirl}}=\frac{1}{|P|}\sum_{\mathcal{P}\in P}\mathcal{P}\circ\mathcal{T}_{\lambda}\circ\mathcal{P}^{\prime}, (75)

while 𝒫=𝒯𝒫𝒯\mathcal{P}^{\prime}=\mathcal{T}^{\dagger}\circ\mathcal{P}\circ\mathcal{T} are Clifford gates instead. Note that if the error channel is so defined that on the ideal gate 𝒰λ=𝒰\mathcal{U}_{\lambda}=\mathcal{U}\circ\mathcal{E}, the twirling channel can be constructed similarly, with a small modification

𝒰λtwirl=1|P|𝒫P𝒫𝒰λ𝒫,\mathcal{U}_{\lambda\mathrm{twirl}}=\frac{1}{|P|}\sum_{\mathcal{P}\in P}\mathcal{P}^{\prime}\circ\mathcal{U}_{\lambda}\circ\mathcal{P}, (76)

where 𝒫=𝒰𝒫𝒰\mathcal{P}^{\prime}=\mathcal{U}\circ\mathcal{P}\circ\mathcal{U}^{\dagger}.

Appendix D Effectiveness

The failing probability is

P\displaystyle P (χn2¯1/2|χniχn2¯1/2|ϵ)\displaystyle\left(\overline{\chi_{n}^{2}}^{-1/2}\left|\chi_{ni}-\overline{\chi_{n}^{2}}^{1/2}\right|\geq\epsilon\right)
=P(χniχn2¯1/21ϵ)\displaystyle=P\left(\chi_{ni}\overline{\chi_{n}^{2}}^{-1/2}-1\geq\epsilon\right)
+P(1χniχn2¯1/2ϵ).\displaystyle\ \quad+P\left(1-\chi_{ni}\overline{\chi_{n}^{2}}^{-1/2}\geq\epsilon\right). (77)

By exponential from of Chebyshev’s inequality, and χn2¯=χ¯n2+Δχ¯n2\overline{\chi_{n}^{2}}=\bar{\chi}_{n}^{2}+\Delta\bar{\chi}_{n}^{2}, we have

P(χniχn2¯1/21ϵ)infλ>0𝔼eλ(χniχn2¯1/21ϵ)\displaystyle P\left(\chi_{ni}\overline{\chi_{n}^{2}}^{-1/2}-1\geq\epsilon\right)\leq\inf_{\lambda>0}\mathbb{E}e^{\lambda\left(\chi_{ni}\overline{\chi_{n}^{2}}^{-1/2}-1-\epsilon\right)}
infλ>0exp[Δχ¯n22χn2¯λ2+λ(χ¯nχn2¯1/21ϵ)]\displaystyle\approx\inf_{\lambda>0}\exp\left[\frac{\Delta\bar{\chi}_{n}^{2}}{2\overline{\chi_{n}^{2}}}\lambda^{2}+\lambda\left(\frac{\bar{\chi}_{n}}{\overline{\chi_{n}^{2}}^{1/2}}-1-\epsilon\right)\right]
infλ>0exp[Δχ¯n22χ¯n2λ2λ(ϵ+Δχ¯n22χ¯n2)]\displaystyle\approx\inf_{\lambda>0}\exp\left[\frac{\Delta\bar{\chi}_{n}^{2}}{2\bar{\chi}_{n}^{2}}\lambda^{2}-\lambda\left(\epsilon+\frac{\Delta\bar{\chi}_{n}^{2}}{2\bar{\chi}_{n}^{2}}\right)\right]
=exp(χ¯n22Δχ¯n2ϵ212ϵ),\displaystyle=\exp\left(-\frac{\bar{\chi}_{n}^{2}}{2\Delta\bar{\chi}_{n}^{2}}\epsilon^{2}-\frac{1}{2}\epsilon\right), (78)

where in the second line, we use the cumulant expansion to the second order, and in the third line, we keep to the first order of Δχ¯n2\Delta\bar{\chi}_{n}^{2}. Similarly, we have

P(1χniχn2¯1/2ϵ)infλ>0𝔼eλ(1χniχn2¯1/2ϵ)\displaystyle P\left(1-\chi_{ni}\overline{\chi_{n}^{2}}^{-1/2}\geq\epsilon\right)\leq\inf_{\lambda>0}\mathbb{E}e^{\lambda\left(1-\chi_{ni}\overline{\chi_{n}^{2}}^{-1/2}-\epsilon\right)}
infλ>0exp[Δχ¯n22χn2¯λ2+λ(1χ¯nχn2¯1/2ϵ)]\displaystyle\approx\inf_{\lambda>0}\exp\left[\frac{\Delta\bar{\chi}_{n}^{2}}{2\overline{\chi_{n}^{2}}}\lambda^{2}+\lambda\left(1-\frac{\bar{\chi}_{n}}{\overline{\chi_{n}^{2}}^{1/2}}-\epsilon\right)\right]
infλ>0exp[Δχ¯n22χ¯n2λ2λ(ϵΔχ¯n22χ¯n2)]\displaystyle\approx\inf_{\lambda>0}\exp\left[\frac{\Delta\bar{\chi}_{n}^{2}}{2\bar{\chi}_{n}^{2}}\lambda^{2}-\lambda\left(\epsilon-\frac{\Delta\bar{\chi}_{n}^{2}}{2\bar{\chi}_{n}^{2}}\right)\right]
=exp(χ¯n22Δχ¯n2ϵ2+12ϵ).\displaystyle=\exp\left(-\frac{\bar{\chi}_{n}^{2}}{2\Delta\bar{\chi}_{n}^{2}}\epsilon^{2}+\frac{1}{2}\epsilon\right). (79)

These give the result in Equation (18).

Then, we estimate the tolerant error ϵ\epsilon in detail. By χni=jϵijqnj\chi_{n_{i}}=\sum_{j}\epsilon_{ij}q_{nj}, where ϵij\epsilon_{ij} is the parity between Pauli operators P^i,P^j\hat{P}_{i},\hat{P}_{j}, the average is

χ¯n=𝔼i(χni)=jqnj𝔼i(ϵij).\bar{\chi}_{n}=\mathbb{E}_{i}(\chi_{n_{i}})=\sum_{j}q_{nj}\mathbb{E}_{i}(\epsilon_{ij}). (80)

For j=0j=0, ϵij=1\epsilon_{ij}=1, while j0j\neq 0, ϵij=±1\epsilon_{ij}=\pm 1 distributes with equal probability 12\frac{1}{2}, with the assumption that ρi2\rho_{i}^{2} are, so the average

𝔼i(ϵij)=1Ziρi2ϵij=1Z𝔼ϵ(ϵρϵ2)\mathbb{E}_{i}(\epsilon_{ij})=\frac{1}{Z}\sum_{i}\rho_{i}^{2}\epsilon_{ij}=\frac{1}{Z}\mathbb{E}_{\epsilon}(\epsilon\rho_{\epsilon}^{2}) (81)

where we treat ρi2\rho_{i}^{2} as a random variable ρϵ2=ϵi=ϵρi2\rho_{\epsilon}^{2}=\sum_{\epsilon_{i}=\epsilon}\rho_{i}^{2} distributing over the parity ϵ\epsilon, and Z=iρi2Z=\sum_{i}\rho_{i}^{2}. Without accurate knowledge of the state, we assume the distribution of ρϵ2\rho_{\epsilon}^{2} is nearly symmetric, thus 𝔼i(ϵij)0\mathbb{E}_{i}(\epsilon_{ij})\sim 0. In this consideration, we have

χ¯nqf0=1qλ.\bar{\chi}_{n}\sim q_{f0}=1-q_{\lambda}. (82)

For the second-order moment,

χn2¯\displaystyle\overline{\chi_{n}^{2}} =𝔼i(χni2)=jqfj2+jkqnjqnk𝔼i(ϵijϵik)\displaystyle=\mathbb{E}_{i}(\chi_{n_{i}}^{2})=\sum_{j}q_{fj}^{2}+\sum_{j\neq k}q_{nj}q_{nk}\mathbb{E}_{i}(\epsilon_{ij}\epsilon_{ik})
jqnj2=(1qλ)2+j0qnj2\displaystyle\sim\sum_{j}q_{nj}^{2}=(1-q_{\lambda})^{2}+\sum_{j\neq 0}q_{nj}^{2}
(1qλ)2+qλ2,\displaystyle\leq(1-q_{\lambda})^{2}+q_{\lambda}^{2}, (83)

where the second line follows that when jkj\neq k, ϵij\epsilon_{ij} and ϵik\epsilon_{ik} are independent, and the third line is because qnjqλq_{nj}\leq q_{\lambda}. Therefore, the parameter σ\sigma is of the order

σqλ1qλ,\sigma\sim\frac{q_{\lambda}}{1-q_{\lambda}}, (84)

and the tolerant error

ϵ2qλ2δ48qλ+3qλ22λ+O(λ2).\epsilon\sim 2q_{\lambda}\sqrt{\frac{2-\delta}{4-8q_{\lambda}+3q_{\lambda}^{2}}}\sim\sqrt{2}\lambda+O(\lambda^{2}). (85)

We calculate the tolerant error of relative bias for extrapolation to pure state, sn0=0s_{n_{0}}=0. With χn02¯=𝔼(χn0i2)=1\overline{\chi_{n_{0}}^{2}}=\mathbb{E}(\chi_{n_{0}i}^{2})=1, and by Equation (51), we have

χn02¯\displaystyle\overline{\chi_{n_{0}}^{2}} =𝔼[χfi2(2n01)e2(n01)λ2ωi]\displaystyle=\mathbb{E}\left[\chi_{fi}^{2(2n_{0}-1)}e^{2(n_{0}-1)\lambda^{2}\omega_{i}}\right]
exp[2(n01)λ2(ω¯+(n01)λ2Δω¯2)],\displaystyle\approx\exp\left[2(n_{0}-1)\lambda^{2}(\bar{\omega}+(n_{0}-1)\lambda^{2}\Delta\bar{\omega}^{2})\right], (86)

where we expect n012n_{0}\sim\frac{1}{2} and use cumulant expansion to second order. Thus, the effective noise-free point is

n01ω¯λ2Δω¯2.n_{0}\approx 1-\frac{\bar{\omega}}{\lambda^{2}\Delta\bar{\omega}^{2}}. (87)

It follows that χ¯n0=eλ2ω¯2(1n0)\bar{\chi}_{n_{0}}=e^{-\frac{\lambda^{2}\bar{\omega}}{2}(1-n_{0})}, and the relative bias is

ϵ2σ2(1n0)λ2Δω¯+O(λ4),\epsilon\sim\sqrt{2}\sigma\sim\sqrt{2}(1-n_{0})\lambda^{2}\Delta\bar{\omega}+O(\lambda^{4}), (88)

For routine ZNE method, the relative bias is |eλ2ωi/21|\left|e^{-\lambda^{2}\omega_{i}/2}-1\right|, and the failing probability is calculated as

P(||ϵ)2exp(2ϵ2λ4Δω¯2ω¯22Δω¯2)cosh2ω¯ϵλ2Δω¯2.P\left(\left|\cdot\right|\geq\epsilon\right)\leq 2\exp\left(-\frac{2\epsilon^{2}}{\lambda^{4}\Delta\bar{\omega}^{2}}-\frac{\bar{\omega}^{2}}{2\Delta\bar{\omega}^{2}}\right)\cosh\frac{2\bar{\omega}\epsilon}{\lambda^{2}\Delta\bar{\omega}^{2}}. (89)

The tolerant error is bounded as

ϵ2λ2Δω¯21ω¯2/Δω¯2δ/212(1n0)2λ4Δω¯222λ2Δω¯,\epsilon\leq\frac{\sqrt{2}\lambda^{2}\Delta\bar{\omega}}{2}\sqrt{\frac{1-{\bar{\omega}^{2}}/{\Delta\bar{\omega}^{2}}-\delta/2}{1-2(1-n_{0})^{2}\lambda^{4}\Delta\bar{\omega}^{2}}}\sim\frac{\sqrt{2}}{2}\lambda^{2}\Delta\bar{\omega}, (90)

For the VD/ESD method with 22-replica, with χ¯n1qλ1λ\bar{\chi}_{n}\sim 1-q_{\lambda}\sim 1-\lambda and Δχ¯nqλλ\Delta\bar{\chi}_{n}\sim q_{\lambda}\sim\lambda, the failing probability is calculated in the similar way,

P(||ϵ)\displaystyle P\left(\left|\cdot\right|\geq\epsilon\right) 2exp[ϵ22λ2D22D]\displaystyle\lesssim 2\exp\left[-\frac{\epsilon^{2}}{2\lambda^{2}}-\frac{D-2}{2D}\right]
×cosh[D1D(1λ2D+1D)ϵ]\displaystyle\ \quad\times\cosh\left[\frac{D-1}{D}\left(\frac{1}{\lambda}-\frac{2D+1}{D}\right)\epsilon\right] (91)
2eϵ22λ212coshϵλ\displaystyle\sim 2e^{-\frac{\epsilon^{2}}{2\lambda^{2}}-\frac{1}{2}}\cosh\frac{\epsilon}{\lambda}
2e12(1ϵ412λ4)+O(ϵ6)\displaystyle\sim 2e^{-\frac{1}{2}}\left(1-\frac{\epsilon^{4}}{12\lambda^{4}}\right)+O(\epsilon^{6}) (92)

where ||\left|\cdot\right| stands for |Dχni(D1)χn2¯+11|\left|\frac{D\chi_{ni}}{(D-1)\overline{\chi_{n}^{2}}+1}-1\right|, and we assume λ1\lambda\ll 1 and D1,ϵλD\gg 1,\epsilon\sim\lambda in the third line. This shows that the tolerant error

ϵ124λ.\epsilon\sim\sqrt[4]{12}\lambda. (93)

Appendix E Efficiency

To calculate the overhead CemC_{em}, we consider the variance of the estimator. Since the estimator OemO_{em} is not linear, the variance is not defined rigidly. But up to linear order, we can derive it approximately.

Var[Oem]Var[ΔF]\displaystyle\mathrm{Var}[O_{em}]\sim\ \mathrm{Var}[\Delta F] (94)
=Var[i(FOiΔOi+FpiΔpi)]\displaystyle=\mathrm{Var}\left[\sum_{i}\left(\frac{\partial F}{\partial O_{i}}\Delta O_{i}+\frac{\partial F}{\partial p_{i}}\Delta p_{i}\right)\right]
(i|FOi|Var1/2[Oi]+|Fpi|Var1/2[pi])2\displaystyle\leq\left(\sum_{i}\left|\frac{\partial F}{\partial O_{i}}\right|\mathrm{Var}^{1/2}[O_{i}]+\left|\frac{\partial F}{\partial p_{i}}\right|\mathrm{Var}^{1/2}[p_{i}]\right)^{2}
(Var[O]i|FOi|+2Var[p]i|Fpi|)2.\displaystyle\lesssim\left(\mathrm{Var}[O]\sum_{i}\left|\frac{\partial F}{\partial O_{i}}\right|+2\mathrm{Var}[p]\sum_{i}\left|\frac{\partial F}{\partial p_{i}}\right|\right)^{2}. (95)

According to the law of large numbers (LLN), the overhead is given by

Cem\displaystyle C_{em} Nϵ(Oem)Nϵ(O)=Var[Oem]Var[O]\displaystyle\equiv\frac{N^{\epsilon}(O_{em})}{N^{\epsilon}(O)}=\frac{\mathrm{Var}[O_{em}]}{\mathrm{Var}[O]}
(i|FOi|+Var[p]Var[O]i|Fpi|)2\displaystyle\lesssim\left(\sum_{i}\left|\frac{\partial F}{\partial O_{i}}\right|+\frac{\mathrm{Var}[p]}{\mathrm{Var}[O]}\sum_{i}\left|\frac{\partial F}{\partial p_{i}}\right|\right)^{2}
(CemZNE+Var[p]Var[O]i|Fpi|)2,\displaystyle\sim\left(\sqrt{C_{em}^{\textrm{ZNE}}}+\frac{\mathrm{Var}[p]}{\mathrm{Var}[O]}\sum_{i}\left|\frac{\partial F}{\partial p_{i}}\right|\right)^{2}, (96)

where Nϵ(Oem)N^{\epsilon}(O_{em}) and Nϵ(O)N^{\epsilon}(O) is the number of shots to measure the mitigated estimator OemO_{em} and the raw estimator OO up to ϵ\epsilon precision correspondingly. Here, we assume that the overhead of routine ZNE method CemZNEi(F/Oi)2C_{em}^{\textrm{ZNE}}\sim\sum_{i}\left({\partial F}/{\partial O_{i}}\right)^{2} approximates to the observable expectation part of the overhead of pZNE. The reason is that the purity influences the mitigated estimator OemO_{em} via the error-rate-like index ss, thus the fitting model O(s)O(s) is similar to O(x)O(x) of ZNE.

In the second term of the overhead CemC_{em}, the part depending on FF is correlated to the specific fitting model. We only consider the variance Var[p]\mathrm{Var}[p] of purity in the latter. The variance Var[p]\mathrm{Var}[p] depends on the method to measure the purity, and as we previously discussed, there are two kinds of methods to measure the purity. One is the replica measurement method, which is based on the Equation (24). Another one is the QST-like method, such as QST and classical shadow, which measure the expectation of all Pauli operators σ^α\braket{\hat{\sigma}_{\alpha}}, where α{1,2,,4L1}\alpha\in\{1,2,\dots,4^{L}-1\}, and LL is the system size. The definition of the purity is

p=12L(1+ασ^α2).p=\frac{1}{2^{L}}\left(1+\sum_{\alpha}\braket{\hat{\sigma}_{\alpha}}^{2}\right). (97)

Then, we calculate the overhead of these two different measurement methods respectively.

For the replica measurement method, the variance of purity is

Var[p]=1NVar[S^(2)]=1p2N.\mathrm{Var}[p]=\frac{1}{N}\mathrm{Var}\left[\hat{S}^{(2)}\right]=\frac{1-p^{2}}{N}. (98)

Next, if we assume the observable of interest is a Pauli operator, we have Var[O]=1O2N\mathrm{Var}[O]=\frac{1-O^{2}}{N}, thus

Var[p]Var[O]=1p21O2.\frac{\mathrm{Var}[p]}{\mathrm{Var}[O]}=\frac{1-p^{2}}{1-O^{2}}. (99)

When the error rate x1x\rightarrow 1, we have p1p\rightarrow 1, if O↛1O\not\rightarrow 1, then the second term of the overhead CemC_{em} has less influence, and we have

CemCemZNE.C_{em}\sim C_{em}^{\textrm{ZNE}}. (100)

In the contrary, if O1O\rightarrow 1 also, we have

Var[p]Var[O]1p1O2KAKB2κ<,\frac{\mathrm{Var}[p]}{\mathrm{Var}[O]}\rightarrow\frac{1-p}{1-O}\rightarrow\frac{2K_{A}}{K_{B}}\equiv 2\kappa<\infty, (101)

where KAaAa(0)(ka)K_{A}\equiv\sum_{a}A_{a}(0)\Re(k_{a}), and KBaBa(0)ka=aB~a(0)(ka)K_{B}\equiv\sum_{a}B_{a}(0)k_{a}=\sum_{a}\tilde{B}_{a}(0)\Re(k_{a}). The second equality of KBK_{B} is because that O=O^O=\braket{\hat{O}} is real. Then the overhead is

Cem(CemZNE+2κi|Fpi|)2.C_{em}\lesssim\left(\sqrt{C_{em}^{\textrm{ZNE}}}+2\kappa\sum_{i}\left|\frac{\partial F}{\partial p_{i}}\right|\right)^{2}. (102)

For the QST-like measurement method, we assume that the observable O^\hat{O} is a Pauli operator located on l0l_{0} qubits, namely O^\hat{O} is not identity I^\hat{I} exactly on l0l_{0} qubits, and we call it the l0l_{0}-weight Pauli operator in the following. Then, the variance of purity of system with LL qubits is

Var[p]\displaystyle\mathrm{Var}[p] =Var[αpσαΔσα]=α(pσα)2Var[σα]\displaystyle=\mathrm{Var}\left[\sum_{\alpha}\frac{\partial p}{\partial\sigma_{\alpha}}\Delta\sigma_{\alpha}\right]=\sum_{\alpha}\left(\frac{\partial p}{\partial\sigma_{\alpha}}\right)^{2}\mathrm{Var}[\sigma_{\alpha}]
=14L1ασ^α2Var[σα].\displaystyle=\frac{1}{4^{L-1}}\sum_{\alpha}\braket{\hat{\sigma}_{\alpha}}^{2}\mathrm{Var}[\sigma_{\alpha}]. (103)

We measure the 3L3^{L} different LL-weight Pauli operators with the same number NLN_{L} of shots to perform the QST-like method. The number of shots used to estimate the ll-weight Pauli operator shorter than LL is Nl=3LlNLN_{l}=3^{L-l}N_{L}. Thus, by LLN, the variance of the ll-weight Pauli operator is

Var[σ|α|=l]Nl0NlVar[σ|α|=l0]=3ll0Var[O].\mathrm{Var}[\sigma_{|\alpha|=l}]\sim\frac{N_{l_{0}}}{N_{l}}\mathrm{Var}[\sigma_{|\alpha|=l_{0}}]=3^{l-l_{0}}\mathrm{Var}[O]. (104)

Dividing the summation of α\alpha into the summation of weight |α|=l|\alpha|=l, we have

Var[p]\displaystyle\mathrm{Var}[p] =14L1l>03lCLlσ^α=l||2¯Var[σ|α|=l]\displaystyle=\frac{1}{4^{L-1}}\sum_{l>0}3^{l}C_{L}^{l}\overline{\braket{\hat{\sigma}{\alpha}{=l}}{}{2}}\mathrm{Var}[\sigma_{|\alpha|=l}]
Var[O]4L1l>032ll0CLlσ^α=l||2¯\displaystyle\sim\frac{\mathrm{Var}[O]}{4^{L-1}}\sum_{l>0}3^{2l-l_{0}}C_{L}^{l}\overline{\braket{\hat{\sigma}{\alpha}{=l}}{}{2}}
Var[O]4L13l0l>032lCLl=10L14L13l0Var[O],\displaystyle\lesssim\frac{\mathrm{Var}[O]}{4^{L-1}3^{l_{0}}}\sum_{l>0}3^{2l}C_{L}^{l}=\frac{10^{L}-1}{4^{L-1}3^{l_{0}}}\mathrm{Var}[O], (105)

where σ^α=l||2¯13lCLl|α|=lσ^α21\overline{\braket{\hat{\sigma}{\alpha}{=l}}{}{2}}\equiv\frac{1}{3^{l}C_{L}^{l}}\sum_{|\alpha|=l}\braket{\hat{\sigma}_{\alpha}}^{2}\leq 1. Therefore, the overhead is bounded as

Cem(C~emZNE+10L14L13l0i|Fpi|)2.C_{em}\lesssim\left(\sqrt{\tilde{C}_{em}^{\textrm{ZNE}}}+\frac{10^{L}-1}{4^{L-1}3^{l_{0}}}\sum_{i}\left|\frac{\partial F}{\partial p_{i}}\right|\right)^{2}. (106)

Appendix F The Detail of Numerical Simulation

F.1 Data Process

For the extrapolation, we substitute the gates with the folded gates with odd numbers 1,31,3 and 55. In addition, we also simulate the case without any gate to separate the error of the circuit and the error of preparation and measurement. The circuit with layers from 11 to 2020 is simulated 1010 times. After each circuit with given layers, we perform the projective measurement on each qubit based on all three Pauli operators. Each Pauli operator is measured with 2,0002,000 shots.

With the measurement outcome, we estimate the expectations of all Pauli operators of the two-qubit system, which is used to calculate the purity of states. For the operator Z^0\hat{Z}_{0}, it is estimated from 6,0006,000 shots, while the purity of the system is estimated from 18,00018,000 shots. For a given number of layers, there are results of four different circuits, whose layer is 1,31,3 and 55 folded gate. The estimator of ZNE is extrapolated to the noise-free gate from the expectation Z^0\braket{\hat{Z}_{0}} of 1,31,3 and 55 folded gate, with the exponential fitting model

Z^0n=Aekn+B,\braket{\hat{Z}_{0}}_{n}=Ae^{-kn}+B, (107)

where (A,k,B)(A,k,B) are the parameters to be determined, and xx is the number of folding, namely 1,31,3 and 55. The estimator of pZNE is extrapolated from the expectation Z^0(x)\braket{\hat{Z}_{0}}(x) of 1,31,3 and 55 folded gates, and the purity p(x)p(x) of 1,31,3 and 55 folded gates. The fitting model is

p=AZ^0k+C,p=A\braket{\hat{Z}_{0}}^{k}+C, (108)

and the estimator is

Z^0pZNE=(1CA)1/k.\braket{\hat{Z}_{0}}_{\mathrm{pZNE}}=\left(\frac{1-C}{A}\right)^{1/k}. (109)

We fit the parameters AA, kk, and BB by the least square method.

For the VD/ESD method, we use the state estimator with M=2M=2 replica

ρp=ρn2Trρn2,\rho_{\mathrm{p}}=\frac{\rho_{n}^{2}}{\mathrm{Tr}\rho_{n}^{2}}, (110)

where ρn\rho_{n} is the noisy state of the qubit q0q_{0}. The estimator of expectation is

Z^0p=Tr(Z^0ρp)=Tr[Z^0ρn2]Trρn2.\braket{\hat{Z}_{0}}_{\mathrm{p}}=\mathrm{Tr}(\hat{Z}_{0}\rho_{\mathrm{p}})=\frac{\mathrm{Tr}[\hat{Z}_{0}\rho_{n}^{2}]}{\mathrm{Tr}\rho_{n}^{2}}. (111)

It can be verified that Tr[Z^0ρn2]=Tr[Z^0ρn]=Z^0n\mathrm{Tr}[\hat{Z}_{0}\rho_{n}^{2}]=\mathrm{Tr}[\hat{Z}_{0}\rho_{n}]=\braket{\hat{Z}_{0}}_{n}, thus

Z^0p=Z^0npn.\braket{\hat{Z}_{0}}_{\mathrm{p}}=\frac{\braket{\hat{Z}_{0}}_{n}}{p_{n}}. (112)

The estimator of the modified purification method is

Z0m=Z^0nD1Dpn1.\braket{Z_{0}}_{\mathrm{m}}=\braket{\hat{Z}_{0}}_{n}\sqrt{\frac{D-1}{Dp_{n}-1}}. (113)

F.2 Pauli Errors Sampling

We randomly sample the Pauli diagonal errors =iqi𝒫i\mathcal{E}=\sum_{i}q_{i}\mathcal{P}_{i}, where iqi=1\sum_{i}q_{i}=1, with a fixed error probability pλ=1p0p_{\lambda}=1-p_{0}. Therefore, all these errors form located on a 4n24^{n}-2 dimension projective space P4n2P\mathbb{R}^{4^{n}-2}. By stereographic projection, the projective space is diffeomorphic to Riemann sphere 𝕊4n2\mathbb{S}^{4^{n}-2}. The errors are sampled uniformly on this sphere.

First, we uniformly sample 4n14^{n}-1 number rir_{i} from [0,1][0,1]. The points formed from them are discarded if it is located out of the inscribed sphere, otherwise, the point is normalized to the surface of the sphere. Then, the point on spherical surface 𝕊4n2\mathbb{S}^{4^{n}-2} is map to the projective space P4n2P\mathbb{R}^{4^{n}-2} by stereographic projection. The point on projective space which has negative components is also discarded. The left point is normalized to the given error probability pλ=1p0p_{\lambda}=1-p_{0}.

The parameters of error model used to obtain the results of Figure 5(a), (b) and (c) is shown in Table 1.

pip_{i} I1I_{1} X1X_{1} Y1Y_{1} Z1Z_{1}
I0I_{0} 9.50×1019.50\times 10^{-1} 6.24×1036.24\times 10^{-3} 5.87×1035.87\times 10^{-3} 3.61×1033.61\times 10^{-3}
X0X_{0} 3.22×1033.22\times 10^{-3} 1.64×1031.64\times 10^{-3} 5.19×1035.19\times 10^{-3} 3.80×1043.80\times 10^{-4}
Y0Y_{0} 4.15×1034.15\times 10^{-3} 6.89×1036.89\times 10^{-3} 4.01×1034.01\times 10^{-3} 4.00×1044.00\times 10^{-4}
Z0Z_{0} 7.50×1047.50\times 10^{-4} 2.04×1032.04\times 10^{-3} 3.47×1033.47\times 10^{-3} 2.14×1032.14\times 10^{-3}
Table 1: The probability qiq_{i} of Pauli basis 𝒫i\mathcal{P}_{i} of Pauli diagonal error model in simulations of Figure 5(a), (b), and (c). The leftmost column denotes the Pauli operations on qubit q0q_{0}, and the top row denotes the Pauli operations on q1q_{1}.

F.3 Pauli Twirling of CNOT Gate

For our circuits in the experiment, Pauli twirling is performed for the whole circuit. The Pauli gates 𝒫1,𝒫2\mathcal{P}_{1},\mathcal{P}_{2} in Figure 6(a) run over the 1616 elements of Pauli group, and the Pauli gates 𝒫3,𝒫4\mathcal{P}_{3},\mathcal{P}_{4} are selected by the Equation 73 accordingly.

If LL is even, then the ideal circuit is identity, and 𝒫3=𝒫1,𝒫4=𝒫2\mathcal{P}_{3}=\mathcal{P}_{1},\mathcal{P}_{4}=\mathcal{P}_{2}. Otherwise, when LL is odd, 𝒫3,𝒫4\mathcal{P}_{3},\mathcal{P}_{4} can be calculated by Equation A. All the 1616 instances are shown in Table 2.

𝒫3𝒫4\mathcal{P}_{3}\mathcal{P}_{4} I2I_{2} X2X_{2} Y2Y_{2} Z2Z_{2}
I1I_{1} I1I2I_{1}I_{2} I1X2I_{1}X_{2} Z1Y2Z_{1}Y_{2} Z1Z2Z_{1}Z_{2}
X1X_{1} X1X2X_{1}X_{2} X1I2X_{1}I_{2} Y1Z2Y_{1}Z_{2} Y1Y2Y_{1}Y_{2}
Y1Y_{1} Y1X2Y_{1}X_{2} Y1I2Y_{1}I_{2} X1Z2X_{1}Z_{2} X1Y2X_{1}Y_{2}
Z1Z_{1} Z1I2Z_{1}I_{2} Z1X2Z_{1}X_{2} I1Y2I_{1}Y_{2} I1Z2I_{1}Z_{2}
Table 2: Pauli twirling of CNOT gate. The leftmost column denotes 𝒫1\mathcal{P}_{1}, and the top row denotes 𝒫2\mathcal{P}_{2}.

Appendix G Device Information on Quafu Cloud Platform

In Sec. 5, we use the ScQ-P18 device on Quafu cloud platform for experimental verification. Here we utilize the open-source Python SDK, namely PyQuafu, to write the experimental code for running the circuits. The layout of the ScQ-P18 device and the error rates of CZ gates are shown in Figure 7. We use qubits 4 and 5 on ScQ-P18 for the two-qubit demonstration. The XEB fidelity of single-qubit gates is about 0.9947 for qubit 4, 0.9972 for qubit 5, and their CZ gate is about 0.966. More information about the two used qubits can be found in Table 3.

Refer to caption
Figure 7: The layout of the chip and the error rates of CZ gates.
Qubit index 4 5
Qubit frequency, f10f_{10} (GHz\mathrm{GHz}) 4.930 4.465
Readout frequency, frf_{r} (GHz\mathrm{GHz}) 6.713 6.693
Anharmonicity, η\eta (MHz\mathrm{MHz}) -204.8 -191.7
Relaxation time, T1T_{1} (μs\mu\mathrm{s}) 23.37 31.03
Coherence time, T2T_{2} (μs\mu\mathrm{s}) 2.47 1.51
Readout fidelity of state |0\ket{0}, F0F_{0} 0.9524 0.9109
Readout fidelity of state |1\ket{1}, F1F_{1} 0.9025 0.8647
Table 3: The parameters of the two used qubits.

Here we mitigate the readout error of the experimental results on the cloud platform. In specific, we first assume the ideal and noisy measurements are in bases |x\ket{x} and |Ex\ket{E_{x}}, respectively. According to the Bayesian readout correction, the relation between the noisy measurement operator Mx:=|ExEx|M_{x}:=\ket{E_{x}}\bra{E_{x}} and the ideal operator Px:=|xx|P_{x}:=\ket{x}\bra{x} can be expressed as

Mx=yRxyPy,M_{x}=\sum_{y}R_{xy}P_{y}, (114)

where the matrix RR can be evaluated from the probability of experimental outcome when performing the noisy measurement on the prepared states |x\ket{x}, namely

Rxy=Tr(MxPy).R_{xy}=\mathrm{Tr}(M_{x}P_{y}). (115)

In the single-qubit case, the matrix RR is constructed by the readout fidelities:

R=(F01F11F1F1)R=\left(\begin{array}[]{cc}F_{0}&1-F_{1}\\ 1-F_{1}&F_{1}\end{array}\right) (116)

where F0F_{0} and F1F_{1} are the readout fidelities of states |0\ket{0} and |1\ket{1}. Ignoring the readout crosstalk, the multi-qubit RR matrix can be reduced to the direct product of each single-qubit RR matrix.

With the knowledge of matrix RR, we can construct the ideal measurement outcome by using its inverse

Px=y(R1)xyMy,P_{x}=\sum_{y}(R^{-1})_{xy}M_{y}, (117)

and the probability px=Tr(Pxρ)p_{x}=\mathrm{Tr}(P_{x}\rho) of some state ρ\rho in state |x\ket{x} is thus

px=y(R1)xyqy,p_{x}=\sum_{y}(R^{-1})_{xy}q_{y}, (118)

where qx=Tr(Mxρ)q_{x}=\mathrm{Tr}(M_{x}\rho) is the probability of this state in |Ex\ket{E_{x}}. In addition, we also use the least square method to limit the mitigated probabilities, so that all of them meet the non-negative and normalization conditions.

Acknowledgements

This work was supported by National Natural Science Foundation of China (Grants Nos.92265207, T2121001, 11934018, 12122504), the Innovation Program for Quantum Science and Technology (Grant No. 2021ZD0301800), and Beijing Natural Science Foundation (Grant No. Z200009). We also acknowledge the support from the Synergetic Extreme Condition User Facility (SECUF) in Beijing, China.

Conflict of Interest

The authors declare no conflict of interest.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  • [1] B. M. Terhal, Rev. Mod. Phys. 2015, 87 307.
  • [2] E. Knill, R. Laflamme, W. Zurek, arXiv:9610011 1996.
  • [3] D. Aharonov, M. Ben-Or, In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC ’97. Association for Computing Machinery, New York, NY, USA, ISBN 0897918886, 1997 176–188, URL https://doi.org/10.1145/258533.258579.
  • [4] A. Y. Kitaev, Russian Mathematical Surveys 1997, 52, 6 1191.
  • [5] Y. Zhao, Y. Ye, H.-L. Huang, Y. Zhang, D. Wu, H. Guan, Q. Zhu, Z. Wei, T. He, S. Cao, et al., Phys. Rev. Lett. 2022, 129 030501.
  • [6] G. Q. AI, Nature 2023, 614, 7949 676.
  • [7] J. Preskill, Quantum 2018, 2 79.
  • [8] S. Endo, Z. Cai, S. C. Benjamin, X. Yuan, Journal of the Physical Society of Japan 2021, 90, 3 032001.
  • [9] Z. Cai, R. Babbush, S. C. Benjamin, S. Endo, W. J. Huggins, Y. Li, J. R. McClean, T. E. O’Brien, Rev. Mod. Phys. 2023, 95 045005.
  • [10] Y. Li, S. C. Benjamin, Phys. Rev. X 2017, 7 021050.
  • [11] Z. Cai, npj Quantum Information 2021, 7, 1 80.
  • [12] M. Urbanek, B. Nachman, V. R. Pascuzzi, A. He, C. W. Bauer, W. A. de Jong, Phys. Rev. Lett. 2021, 127 270502.
  • [13] K. Temme, S. Bravyi, J. M. Gambetta, Phys. Rev. Lett. 2017, 119 180509.
  • [14] S. Endo, S. C. Benjamin, Y. Li, Phys. Rev. X 2018, 8 031027.
  • [15] A. Kandala, K. Temme, A. D. Córcoles, A. Mezzacapo, J. M. Chow, J. M. Gambetta, Nature 2019, 567, 7749 491.
  • [16] A. Mari, N. Shammah, W. J. Zeng, Phys. Rev. A 2021, 104 052607.
  • [17] X. Bonet-Monroig, R. Sagastizabal, M. Singh, T. E. O’Brien, Phys. Rev. A 2018, 98 062339.
  • [18] S. McArdle, X. Yuan, S. Benjamin, Phys. Rev. Lett. 2019, 122 180501.
  • [19] Z. Cai, Quantum 2021, 5 548.
  • [20] B. Koczor, Phys. Rev. X 2021, 11 031057.
  • [21] W. J. Huggins, S. McArdle, T. E. O’Brien, J. Lee, N. C. Rubin, S. Boixo, K. B. Whaley, R. Babbush, J. R. McClean, Phys. Rev. X 2021, 11 041036.
  • [22] M. Huo, Y. Li, Phys. Rev. A 2022, 105 022427.
  • [23] Z. Cai, arXiv:2107.07279 2021.
  • [24] J. R. McClean, M. E. Kimchi-Schwartz, J. Carter, W. A. de Jong, Phys. Rev. A 2017, 95 042308.
  • [25] J. R. McClean, Z. Jiang, N. C. Rubin, R. Babbush, H. Neven, Nature communications 2020, 11, 1 636.
  • [26] P. D. Nation, H. Kang, N. Sundaresan, J. M. Gambetta, PRX Quantum 2021, 2 040326.
  • [27] N. Yoshioka, H. Hakoshima, Y. Matsuzaki, Y. Tokunaga, Y. Suzuki, S. Endo, Phys. Rev. Lett. 2022, 129 020502.
  • [28] A. Strikis, D. Qin, Y. Chen, S. C. Benjamin, Y. Li, PRX Quantum 2021, 2 040330.
  • [29] P. Czarnik, M. McKerns, A. T. Sornborger, L. Cincio, arXiv preprint arXiv:2204.07109 2022.
  • [30] E. F. Dumitrescu, A. J. McCaskey, G. Hagen, G. R. Jansen, T. D. Morris, T. Papenbrock, R. C. Pooser, D. J. Dean, P. Lougovski, Phys. Rev. Lett. 2018, 120 210501.
  • [31] Y. Kim, C. J. Wood, T. J. Yoder, S. T. Merkel, J. M. Gambetta, K. Temme, A. Kandala, Nature Physics 2023, 19 752–759.
  • [32] M. Foss-Feig, S. Ragole, A. Potter, J. Dreiling, C. Figgatt, J. Gaebler, A. Hall, S. Moses, J. Pino, B. Spaun, B. Neyenhuis, D. Hayes, Phys. Rev. Lett. 2022, 128 150504.
  • [33] C. Song, J. Cui, H. Wang, J. Hao, H. Feng, Y. Li, Science Advances 2019, 5, 9 eaaw5686.
  • [34] E. Van Den Berg, Z. K. Minev, A. Kandala, K. Temme, Nature Physics 2023, 19 1116–1121.
  • [35] S. Zhang, Y. Lu, K. Zhang, W. Chen, Y. Li, J.-N. Zhang, K. Kim, Nature communications 2020, 11, 1 587.
  • [36] R. Sagastizabal, X. Bonet-Monroig, M. Singh, M. A. Rol, C. Bultink, X. Fu, C. Price, V. Ostroukh, N. Muthusubramanian, A. Bruno, et al., Phys. Rev. A 2019, 100 010302(R).
  • [37] L.-H. Ren, Y.-H. Shi, J.-J. Chen, H. Fan, Phys. Rev. A 2023, 107 052617.
  • [38] D. Zhu, S. Johri, N. M. Linke, K. A. Landsman, C. H. Alderete, N. H. Nguyen, A. Y. Matsuura, T. H. Hsieh, C. Monroe, Proceedings of the National Academy of Sciences 2020, 117, 41 25402.
  • [39] J. I. Colless, V. V. Ramasesh, D. Dahlen, M. S. Blok, M. E. Kimchi-Schwartz, J. R. McClean, J. Carter, W. A. de Jong, I. Siddiqi, Phys. Rev. X 2018, 8 011021.
  • [40] Google AI Quantum and Collaborators, Science 2020, 369, 6507 1084.
  • [41] F. Arute, et al., arXiv:2010.07965 2020.
  • [42] S. Stanisic, J. L. Bosse, F. M. Gambetta, R. A. Santos, W. Mruczkiewicz, T. E. O’Brien, E. Ostby, A. Montanaro, Nature Communications 2022, 13, 1 5743.
  • [43] J. Sun, X. Yuan, T. Tsunoda, V. Vedral, S. C. Benjamin, S. Endo, Phys. Rev. Appl. 2021, 15 034026.
  • [44] R. Majumdar, P. Rivero, F. Metz, A. Hasan, D. S. Wang, In 2023 IEEE International Conference on Quantum Computing and Engineering (QCE), volume 01. 2023 881–887, URL https://doi.org/10.1109/QCE57702.2023.00102.
  • [45] Z. Cai, arXiv:2110.05389 2021.
  • [46] R. Takagi, S. Endo, S. Minagawa, M. Gu, npj Quantum Information 2022, 8, 1 114.
  • [47] Y. Kim, A. Eddins, S. Anand, K. X. Wei, E. Van Den Berg, S. Rosenblatt, H. Nayfeh, Y. Wu, M. Zaletel, K. Temme, et al., Nature 2023, 618, 7965 500.
  • [48] Y. Quek, D. S. França, S. Khatri, J. J. Meyer, J. Eisert, arXiv:2210.11505 2023.
  • [49] K. Tsubouchi, T. Sagawa, N. Yoshioka, Phys. Rev. Lett. 2023, 131 210601.
  • [50] J. J. Wallman, J. Emerson, Phys. Rev. A 2016, 94 052325.
  • [51] Z. Cai, S. C. Benjamin, Scientific reports 2019, 9, 1 11281.
  • [52] A. Hashim, R. K. Naik, A. Morvan, J.-L. Ville, B. Mitchell, J. M. Kreikebaum, M. Davis, E. Smith, C. Iancu, K. P. O’Brien, I. Hincks, J. J. Wallman, J. Emerson, I. Siddiqi, Phys. Rev. X 2021, 11 041039.
  • [53] T. Giurgica-Tiron, Y. Hindy, R. LaRose, A. Mari, W. J. Zeng, In 2020 IEEE International Conference on Quantum Computing and Engineering (QCE). 2020 306–316.
  • [54] I. Henao, J. P. Santos, R. Uzdin, npj Quantum Information 2023, 9, 1 120.
  • [55] H.-Y. Huang, R. Kueng, J. Preskill, Nature Physics 2020, 16, 10 1050.
  • [56] C.-T. Chen, Y.-H. Shi, Z. Xiang, Z.-A. Wang, T.-M. Li, H.-Y. Sun, T.-S. He, X. Song, S. Zhao, D. Zheng, et al., Sci. China-Phys. Mech. Astron. 2022, 65, 11 110362.
  • [57] R. McWeeny, Rev. Mod. Phys. 1960, 32 335.

Table of Contents
Refer to caption

Zero noise extrapolation (ZNE) is an error mitigation method that amplifies and extrapolates noise to a noise-free point, yet it relies on assumptions about the error model. This paper introduces a modified method, termed purity-assisted ZNE (pZNE), which addresses these limitations by utilizing the Pauli twirling technique and leveraging the purity of the noisy output state.