Unified multivariate trace estimation and quantum error mitigation

Jin-Min Liang, Qiao-Qiao Lv, Zhi-Xi Wang, Shao-Ming Fei
School of Mathematical Sciences, Capital Normal University, Beijing 100048, China

Abstract

Calculating the trace of the product of $m$ $n$ -qubit density matrices (multivariate trace) is a crucial subroutine in quantum error mitigation and in the estimation of information measures. We propose a unified multivariate trace estimation (UMT) which conceptually unifies the previous qubit-optimal and depth-optimal approaches with tunable quantum circuit depth and number of qubits. The constructed circuits have depth $\lceil(m-1)/s\rceil$ or $n\lceil(m-1)/s\rceil$ , corresponding to $(s+m)n$ or $s+mn$ qubits for $s\in\{1,\cdots,\lfloor m/2\rfloor\}$ , respectively. Such flexible circuit structures allow one to choose suitable circuits according to different hardware devices. We apply UMT to virtual distillation for achieving exponential error suppression and design a family of concrete circuits to calculate the trace of the product of $8$ and $9$ $n$ -qubit density matrices. A numerical example shows that, even with the additional circuits, the noisy expectation value is still mitigated under the global depolarizing channel.

I Introduction

Fault-tolerant quantum (FTQ) computers may provide computational advantages over classical computers for many tasks Shor1994 ; Harrow2009Quantum ; Biamonte2017Quantum ; Liang2019Quantum . As perfect FTQ computers are not yet available, a preferable substitute is the noisy intermediate-scale quantum (NISQ) device with limited quantum resources Cerezo2021Variational ; Bharti2022Noisy . Nevertheless, the implementation of both FTQ and NISQ devices hinges on the effective control of noise. Thus, error correction and mitigation are of fundamental importance in quantum computation. Aiming at efficiently approximating the desired output states, quantum error correction (QEC) provides a theoretical blueprint that enables quantum computation at an arbitrarily small error level Devitt2013Quantum ; Lidar2013Quantum . However, due to the larger qubit count, extra circuit complexity and other additional operations Preskill2018Quantum , the overhead of QEC is too large for practical applications on current devices. Therefore, a variety of quantum error mitigation (QEM) approaches Endo2021Hybrid ; Zhang2021Variational ; Bultrini2021Unifying ; Yang2021Accelerated ; Cai2021A ; Takagi2022Universal ; Huo2022Dual ; Czarnik2022Improving for NISQ algorithms have been presented instead of QEC. Different from QEC, QEM focuses on recovering the ideal measurement statistics (usually the expectation values) Cao2022NISQ and can be directly employed in ground state preparation Liang2022Improved ; McArdle2019Variational . For instance, the error extrapolation technique extrapolates results obtained at different error rates to the zero-noise limit Temme2017Error ; Li2017Efficient ; Endo2018Practical . In particular, probabilistically implementing the inverse process can mitigate the effect of noise on the computation for some noise channels Temme2017Error . However, such error mitigation techniques rely on prior knowledge of the noise model, whose characterization is expensive.

The generalized quantum subspace expansion (GSE) Yoshioka2022Generalized and the virtual distillation (VD) methods Koczor2021Exponential ; Huggins2021Virtual do not require any information about the noise. The GSE method optimizes states from a subspace spanned by a small subset of Pauli operators. It is effective against coherent errors Endo2021Hybrid . The VD method prepares $m$ copies of an $n$ -qubit noisy state with spectral decomposition $\rho=E_{0}|u_{0}\rangle\langle u_{0}|+\sum_{k=1}^{2^{n}-1}E_{k}|u_{k}\rangle\langle u_{k}|$ , $E_{0}>E_{1}\geq\cdots\geq E_{2^{n}-1}$ , and calculates the expectation value of an observable $O$ with respect to the state $\rho^{m}/\textrm{Tr}(\rho^{m})$ ,

\displaystyle\langle O\rangle_{\textrm{vd}}^{(m)}=\frac{\mbox{Tr}(O\rho^{m})}{\mbox{Tr}(\rho^{m})}=\langle O\rangle_{\textrm{exact}}[1+\mathcal{O}(E_{1}^{m}/E_{0}^{m})],

(1)

which approaches the ideal value $\langle O\rangle_{\mbox{exact}}=\langle u_{0}|O|u_{0}\rangle$ exponentially fast in $m$ . The core idea behind this claim is that the state $\rho^{m}/\mbox{Tr}(\rho^{m})$ approaches the desired pure state $|u_{0}\rangle\langle u_{0}|$ exponentially in $m$ . In order to obtain an exponential error suppression, an efficient quantum algorithm for measuring Eq. (1) is required.
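The scaling in Eq. (1) can be read off from the spectral decomposition. The following short worked step is a sketch under the stated assumption that the dominant eigenvalue $E_{0}$ is non-degenerate; it uses only the symbols already defined above.

```latex
\rho^{m}=\sum_{k=0}^{2^{n}-1}E_{k}^{m}|u_{k}\rangle\langle u_{k}|
\quad\Longrightarrow\quad
\frac{\mathrm{Tr}(O\rho^{m})}{\mathrm{Tr}(\rho^{m})}
=\frac{\langle u_{0}|O|u_{0}\rangle+\sum_{k\geq 1}(E_{k}/E_{0})^{m}\langle u_{k}|O|u_{k}\rangle}
{1+\sum_{k\geq 1}(E_{k}/E_{0})^{m}} .
```

Every correction term carries a factor $(E_{k}/E_{0})^{m}\leq(E_{1}/E_{0})^{m}$ , so both sums vanish exponentially in $m$ and the ratio tends to $\langle u_{0}|O|u_{0}\rangle$ .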

More generally, instead of $m$ copies of the $n$ -qubit noisy state $\rho$ , one may consider $m$ different $n$ -qubit states $\rho_{1},\cdots,\rho_{m}$ . The $\textrm{Tr}(\rho^{m})$ in Eq. (1) is then replaced by the more general quantity $\mbox{Tr}(\rho_{1}\cdots\rho_{m})$ , called the multivariate trace (MT) and first introduced in Quek2022Multivariate . Measuring the MT is of fundamental and practical interest in quantum information processing, for example in the calculation of the Rényi entropy, the entanglement entropy Yao2010Entanglement and entanglement spectroscopy Johri2017Entanglement . Broadly speaking, the estimation of the MT on a quantum computer is currently tackled by using either the qubit-optimal approach Ekert2002Direct or the depth-optimal method Quek2022Multivariate . Here, qubit-optimal means that the number of ancilla qubits is minimal, and depth-optimal means that the circuit depth is minimal. The qubit-optimal approach requires only a single ancilla qubit and a $\Theta(m)$ -depth circuit, whereas the depth-optimal method requires $\lfloor m/2\rfloor$ ancilla qubits and a constant-depth circuit, where $\lfloor\cdot\rfloor$ denotes the floor function. The former method is impractical on noisy hardware because its depth grows linearly in $m$ . The latter method has an attractive depth for NISQ devices, but the number of ancilla qubits grows linearly in $m$ , which restricts its application to small $m$ . Although a recent work Czarnik2021Qubit drastically reduces the qubit resources by utilizing the qubit reset technique, the circuit depth is still $\Theta(m)$ . Thus, limited qubit number and circuit depth may hinder the advantage of the VD method in the current NISQ era.

Given that errors accumulate as the number of qubits and the depth of a quantum circuit increase, in this work we provide a quantum algorithm to calculate the corrected expectation value in Eq. (1) by constructing a family of circuits with different circuit depths and numbers of qubits. The authors of Quek2022Multivariate pointed out that their circuit is flexible and can be adjusted to different depths and numbers of qubits. Building on this observation, we first mathematically establish a specific trade-off relation between the number of qubits and the circuit depth. Both quantities can be expressed as functions of a free parameter $s$ . Varying $s$ yields $\lfloor m/2\rfloor$ different circuit structures with the same number of quantum gates. Based on the constructed circuits, we propose a unified multivariate trace estimation (UMT), which is capable of calculating $\mbox{Tr}(\rho_{1}\cdots\rho_{m})$ with a tunable circuit depth and number of qubits. The existing qubit-optimal and depth-optimal algorithms are the two extremal cases $s=1$ and $s=\lfloor m/2\rfloor$ of our algorithm. Furthermore, we apply UMT to achieve exponential error suppression and give a family of concrete circuits for $8$ and $9$ $n$ -qubit density matrices. Finally, we simulate the effects of the global depolarizing channel in the process of estimating $\langle O\rangle_{\textrm{vd}}^{(5)}$ for a two-qubit state $\rho$ .

II Unified multivariate trace estimation

The MT is defined as

\displaystyle\mbox{Tr}(\rho_{1}\cdots\rho_{m}):=\mbox{Tr}[S^{(m)}(\rho_{1}\otimes\cdots\otimes\rho_{m})],

(2)

where $S^{(m)}$ is a unitary representation of the cyclic shift permutation $\pi=(1,2,\cdots,m)$ : $S^{(m)}|\psi_{1}\rangle\otimes|\psi_{2}\rangle\otimes\cdots\otimes|\psi_{m}\rangle=|\psi_{m}\rangle\otimes|\psi_{1}\rangle\otimes\cdots\otimes|\psi_{m-1}\rangle$ for pure states $|\psi_{1}\rangle,\cdots,|\psi_{m}\rangle$ . Note that for $m=2$ , $\mbox{Tr}[\rho_{1}\rho_{2}]=\mbox{Tr}[S^{(2)}(\rho_{1}\otimes\rho_{2})]$ with $S^{(2)}$ the SWAP operator. Eq. (2) means that the MT can be estimated by calculating the real and imaginary parts of $\mbox{Tr}[S^{(m)}(\rho_{1}\otimes\cdots\otimes\rho_{m})]$ . Following the framework of Ref. Ekert2002Direct , a crucial step is to perform the controlled unitary $\mathcal{C}(S^{(m)})$ with respect to $S^{(m)}$ , $\mathcal{C}(S^{(m)})=|0\rangle\langle 0|\otimes I+|1\rangle\langle 1|\otimes S^{(m)}$ , where $I$ denotes the identity. In this section, we construct a sequence of alternative circuits to achieve the operation $\mathcal{C}(S^{(m)})$ .
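As a numerical sanity check of Eq. (2), the following minimal sketch builds the cyclic shift as an explicit permutation matrix and compares $\mbox{Tr}[S^{(m)}(\rho_{1}\otimes\rho_{2}\otimes\rho_{3})]$ with the directly computed trace for three random single-qubit states. The helper names `cyclic_shift` and `random_state` are hypothetical. The sign of the imaginary part depends on the direction convention chosen for the shift, so the sketch only asserts agreement of the real parts and agreement of the full value up to complex conjugation.

```python
import itertools
from functools import reduce

import numpy as np


def cyclic_shift(m: int, d: int) -> np.ndarray:
    """Permutation matrix S with S|i1,...,im> = |im,i1,...,i(m-1)>."""
    dim = d ** m
    S = np.zeros((dim, dim))
    for idx in itertools.product(range(d), repeat=m):
        src = np.ravel_multi_index(idx, (d,) * m)
        dst = np.ravel_multi_index((idx[-1],) + idx[:-1], (d,) * m)
        S[dst, src] = 1.0
    return S


def random_state(d: int, rng: np.random.Generator) -> np.ndarray:
    """Random d x d density matrix (hypothetical helper for this check)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(0)
m, d = 3, 2  # three single-qubit states
rhos = [random_state(d, rng) for _ in range(m)]

S = cyclic_shift(m, d)
direct = np.trace(reduce(np.matmul, rhos))        # Tr(rho_1 rho_2 rho_3)
via_shift = np.trace(S @ reduce(np.kron, rhos))   # Tr[S (rho_1 x rho_2 x rho_3)]

print(direct, via_shift)
```

For identical copies $\rho_{1}=\cdots=\rho_{m}=\rho$ , which is the virtual distillation setting, the trace is real and the direction convention plays no role.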

II.1 The qubit-depth trade-off

Proposition 1 (Qubit-depth trade-off).

For a given set of $n$ -qubit states $\{\rho_{1},\cdots,\rho_{m}\}$ , there exists a family of quantum circuits with depth $nh(m,s)$ and $(s+mn)$ qubits to achieve the operation $\mathcal{C}(S^{(m)})$ , $s=1,\cdots,\lfloor m/2\rfloor$ . The depth function $h(m,s)=\lceil(m-1)/s\rceil$ , which lies in $[2,m-1]$ for $m\geq 3$ , is a monotonically non-increasing function of the variable $s$ .

Proposition 1 is based on the decomposition of the permutation cycle $\pi$ Quek2022Multivariate ,

\pi=\left\{\begin{aligned} &\prod_{j=2}^{m/2}(j,m+2-j)\prod_{i=1}^{m/2}(i,m+1-i),&m&~{}even\\ &\prod_{j=2}^{\lceil m/2\rceil}(j,m+2-j)\prod_{i=1}^{\lfloor m/2\rfloor}(i,m+1-i),&m&~{}odd\end{aligned}\right.,

where all arithmetic is modulo $m$ . $\pi$ is thus decomposed into a product of $(m-1)$ transpositions. The circuit of Proposition 1 contains an ancillary register (AR) and a work register (WR). The WR stores the $m$ density matrices $\rho_{1},\cdots,\rho_{m}$ . The AR stores an $s$ -qubit GHZ state, $|\mbox{GHZ}_{s}\rangle=\frac{1}{\sqrt{2}}(|0\rangle^{\otimes s}+|1\rangle^{\otimes s})$ , which controls the SWAP operations between pairs of density matrices $\rho_{l},\rho_{k}$ for $l,k=1,\cdots,m$ . Each qubit of $|\mbox{GHZ}_{s}\rangle$ controls one transposition, so the $s$ -qubit GHZ state controls $s$ transpositions at a time and the $m-1$ transpositions can be carried out in at most $\lceil(m-1)/s\rceil$ rounds. Each SWAP operation between two $n$ -qubit registers can be decomposed into $n$ controlled SWAP gates. Thus, the total depth is $n\lceil(m-1)/s\rceil$ . Here we remark that by using the qubit reset technique and mid-circuit measurement, the $s$ -qubit GHZ state $|\mbox{GHZ}_{s}\rangle$ can be prepared by a constant-depth quantum circuit (independent of $s$ ). The number of qubits needed is $s$ or $2(s-1)$ when $s$ is even or odd, respectively Quek2022Multivariate . In general, a coherent quantum circuit for preparing $|\mbox{GHZ}_{s}\rangle$ has depth $\mathcal{O}(s)$ . In this work, we do not count the circuit depth of preparing the state $|\mbox{GHZ}_{s}\rangle$ and the density matrices $\rho_{1},\cdots,\rho_{m}$ .
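The decomposition and the depth count can be checked with a few lines of code. The sketch below generates the two layers of transpositions from the formula for $\pi$ , composes them, and evaluates $h(m,s)$ . The function names are hypothetical, and the convention that the right-most product is applied first is an assumption; with it the composition is the shift $i\mapsto i+1$ (mod $m$ ), and the opposite order would give the inverse cycle.

```python
def transposition_layers(m: int):
    """Two layers of disjoint transpositions (1-based labels) whose product is pi."""
    # Right-most product in the formula, assumed here to be applied first.
    first_applied = [(i, m + 1 - i) for i in range(1, m // 2 + 1)]
    # Left-most product; the upper limit is m/2 for even m and ceil(m/2) for odd m.
    second_applied = [(j, m + 2 - j) for j in range(2, (m + 1) // 2 + 1)]
    return first_applied, second_applied


def apply_layers(m: int) -> dict:
    """Image of each label 1..m after applying both layers in order."""
    image = {k: k for k in range(1, m + 1)}
    for layer in transposition_layers(m):
        swap = {}
        for a, b in layer:
            swap[a], swap[b] = b, a
        image = {k: swap.get(v, v) for k, v in image.items()}
    return image


def depth_function(m: int, s: int) -> int:
    """h(m, s) = ceil((m - 1) / s), computed with integer arithmetic."""
    return -(-(m - 1) // s)


for m in (8, 9):
    layers = transposition_layers(m)
    depths = [depth_function(m, s) for s in range(1, m // 2 + 1)]
    print(m, layers, depths)
```

Because each layer consists of mutually disjoint transpositions, $\lfloor m/2\rfloor$ ancilla qubits are enough to execute a whole layer at once, which is the origin of the constant depth in the extremal case $s=\lfloor m/2\rfloor$ .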

Notice that $s=1$ and $s=\lfloor m/2\rfloor$ recover the qubit-optimal method Ekert2002Direct and the depth-optimal approach Quek2022Multivariate , respectively. Including these two extremal cases, there are altogether $\lfloor m/2\rfloor$ optional circuits. In Ekert2002Direct the authors introduced a single ancilla qubit and implemented a controlled unitary $S^{(m)}$ with depth $(m-1)n$ . The circuit presented in Quek2022Multivariate has depth $2n$ and $\lfloor m/2\rfloor$ ancilla qubits. The quantum circuits in Ekert2002Direct and Quek2022Multivariate can also be arranged as $n$ parallelized sub-circuits, each with a single or $\lfloor m/2\rfloor$ ancilla qubits, respectively Beckey2021Computable ; Cai2021Resource . Fig. 1 illustrates three mathematically equivalent quantum circuits for estimating $\mbox{Tr}(\rho_{1}\rho_{2}\rho_{3})$ in the case $s=1$ .

Figure 1: Quantum circuits for estimating the real part of $\mbox{Tr}(\rho_{1}\rho_{2}\rho_{3})$ , where states $\rho_{1},\rho_{2},\rho_{3}$ are two-qubit ones. (a) The circuit in the work Ekert2002Direct . The number of ancilla qubits is $1$ and the circuit depth is $4$ . (b) and (c) are the parallelized versions of the circuit (a). The corresponding circuit depth is $2$ and the number of ancilla qubits is $2$ . The $8$ -qubit quantum circuit (b) can be seen as $2$ parallelized circuits with each $1$ ancilla qubit, as shown in (c).

Generalizing proposition 1 in the parallelized scenario, we have

Proposition 2 (Qubit-depth trade-off in parallelized scenario).

Given a set of $n$ -qubit states $\{\rho_{1},\cdots,\rho_{m}\}$ , there is a family of quantum circuits achieving the operation $\mathcal{C}(S^{(m)})$ . These circuits have depth $h(m,s)=\lceil(m-1)/s\rceil$ and $(s+m)n$ qubits, $s=1,\cdots,\lfloor m/2\rfloor$ .

The circuits in Proposition 2 are the parallelized versions of those in Proposition 1, in which the AR consists of $n$ $s$ -qubit GHZ states. Each $s$ -qubit GHZ state implements the controlled SWAP operations on one single-qubit subsystem of the states $\rho_{i}$ , $i=1,\cdots,m$ . Moreover, in the case that $m=2$ and $\rho_{1}=\rho_{2}=\rho$ , we recover the circuit for estimating the purity $\mbox{Tr}(\rho^{2})$ presented in Beckey2021Computable as the special case $s=\lfloor m/2\rfloor=1$ and $h(2,1)=1$ of our approach. In particular, when $s=2$ , the circuit depth is $h(m,2)=\lceil(m-1)/2\rceil$ , which equals $m/2$ for even $m$ and $\lfloor m/2\rfloor$ for odd $m$ . Hence only $2n$ (Proposition 2) or $2$ (Proposition 1) ancilla qubits suffice to reduce the depth by roughly half compared with $s=1$ . Propositions 1 and 2 show that the number of ancilla qubits and the circuit depth are tunable according to different hardware devices.
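The resource counts of Propositions 1 and 2 can be tabulated directly. The following sketch (the function name `resources` and the labels `serial` and `parallel` are hypothetical) returns qubit number and depth for both layouts, excluding the preparation of the GHZ states and of the input states, as in the text.

```python
def resources(m: int, n: int, s: int) -> dict:
    """Qubit count and depth for the circuits of Proposition 1 and Proposition 2."""
    if not 1 <= s <= max(1, m // 2):
        raise ValueError("s must lie in {1, ..., floor(m/2)}")
    h = -(-(m - 1) // s)  # h(m, s) = ceil((m - 1) / s)
    return {
        "serial": {"qubits": s + m * n, "depth": n * h},      # Proposition 1
        "parallel": {"qubits": (s + m) * n, "depth": h},      # Proposition 2
    }


# Example: m = 8 two-qubit states, all admissible values of s.
for s in range(1, 5):
    print(s, resources(8, 2, s))
```

For $m=3$ , $n=2$ and $s=1$ this gives depth $4$ with one ancilla qubit in the serial layout and an $8$ -qubit, depth- $2$ circuit in the parallelized layout, in line with the caption of Fig. 1.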

II.2 UMT estimation

Given the equivalent circuit constructions of the controlled $S^{(m)}$ operation among density matrices, we perform a measurement in the Pauli $\sigma_{x}$ basis on each ancilla qubit. The expectation value gives an estimate of $\mbox{Tr}(\rho_{1}\cdots\rho_{m})$ . This leads to the following theorem, which is proved in Appendix A.

Theorem 1 (UMT estimation).

Given a set of $n$ -qubit states $\{\rho_{1},\cdots,\rho_{m}\}$ , a fixed error $\varepsilon>0$ and $\delta\in(0,1)$ , UMT estimation calculates the quantity in Eq. (2) by the sample mean $\langle\hat{V}\rangle$ of a random variable $\hat{V}$ that can be produced using $\mathcal{O}(\varepsilon^{-2}\log(\delta^{-1}))$ repetitions of a quantum circuit, constructed via Propositions 1 and 2 and consisting of $\mathcal{O}(mn)$ controlled SWAP gates, such that

\displaystyle\textrm{Pr}(|\langle\hat{V}\rangle-\mbox{Tr}(\rho_{1}\cdots\rho_{m})|\leq\varepsilon)\geq 1-\delta

(3)

for $s=1,\cdots,\lfloor m/2\rfloor$ .

Theorem 1 gives the sample complexity guaranteed by Hoeffding's inequality Hoeffding1963Probability . The number of quantum gates is $\mathcal{O}(mn)$ for all of the circuit structures.
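To make the $\mathcal{O}(\varepsilon^{-2}\log(\delta^{-1}))$ scaling concrete, the sketch below evaluates the two-sided Hoeffding bound for outcomes in $[-1,1]$ and simulates the sampling classically. It assumes, as a simplification, that each circuit run yields a single $\pm 1$ outcome whose mean is the quantity of interest; the function names and the target value $0.3$ are hypothetical.

```python
import math

import numpy as np


def hoeffding_shots(eps: float, delta: float) -> int:
    """Smallest N with 2*exp(-N*eps**2/2) <= delta for outcomes in [-1, 1]."""
    return math.ceil(2.0 * math.log(2.0 / delta) / eps ** 2)


def sample_mean(true_value: float, shots: int, rng: np.random.Generator) -> float:
    """Mean of `shots` simulated +/-1 outcomes whose expectation is true_value."""
    plus = rng.binomial(shots, (1.0 + true_value) / 2.0)
    return (2 * plus - shots) / shots


eps, delta = 0.05, 0.01
shots = hoeffding_shots(eps, delta)
rng = np.random.default_rng(1)
estimate = sample_mean(0.3, shots, rng)  # 0.3 stands in for the real part of the trace
print(shots, estimate)
```

The bound is conservative: it holds for any distribution on $[-1,1]$ , so the observed deviation is typically well below $\varepsilon$ .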

III UMT for virtual distillation

A direct application of UMT is quantum error mitigation Huggins2021Virtual ; Koczor2021Exponential . Suppose that a near-term quantum device aims to prepare an $n$ -qubit pure state $|\phi\rangle$ . However, owing to environmental noise, one prepares instead a mixed state $\rho=\mathcal{C}(|\phi\rangle\langle\phi|)$ , where the operation $\mathcal{C}$ is a map containing a unitary evolution and a noise channel such as the depolarizing channel. The error-free expected value of a Hermitian operator $O$ is $\langle O\rangle=\mbox{Tr}(O|\phi\rangle\langle\phi|)$ . However, the noisy expected value is $\langle O\rangle_{\textrm{noise}}=\mbox{Tr}(O\rho)\neq\langle O\rangle$ . Virtual distillation provides a method to approximate $\langle O\rangle$ by the corrected expectation value

\displaystyle\langle O\rangle_{\textrm{vd}}^{(m)}=\frac{\mbox{Tr}(O\rho^{m})}{\mbox{Tr}(\rho^{m})},

(4)

by $m$ copies of the mixed state $\rho$ .

III.1 Estimating the corrected expectation value with UMT

The denominator in Eq. (4) can be evaluated by employing Theorem 1 with $\rho_{1}=\cdots=\rho_{m}=\rho$ . The numerator of Eq. (4) is

\displaystyle\textrm{Tr}(O\rho^{m})=\textrm{Tr}(\tilde{O}^{(i)}S^{(m)}\rho\otimes\cdots\otimes\rho),

(5)

where the observable $\tilde{O}^{(i)}=I\otimes\cdots\otimes O^{(i)}\otimes\cdots\otimes I$ and $O^{(i)}$ denotes the operator $O$ acting on the $i$ th register, which stores the $i$ th copy of $\rho$ . Suppose $O$ admits an efficient decomposition

\displaystyle O=\sum_{k=1}^{N_{o}}a_{k}P_{k},~{}~{}a_{k}\in\mathbb{R},

(6)

where $P_{k}=\sigma_{k_{1}}\otimes\cdots\otimes\sigma_{k_{n}}$ are tensor products of Pauli operators $\sigma_{k_{1}},\cdots,\sigma_{k_{n}}\in\{\sigma_{x},\sigma_{y},\sigma_{z},I\}$ and the quantity $\sum_{k=1}^{N_{o}}|a_{k}|$ is bounded by a constant $c$ . By linearity of the trace,

\displaystyle\textrm{Tr}(O\rho^{m})=\sum_{k=1}^{N_{o}}a_{k}\textrm{Tr}[\tilde{P}_{k}^{(i)}S^{(m)}(\rho\otimes\cdots\otimes\rho)],

(7)

where $\tilde{P}_{k}^{(i)}=I\otimes\cdots\otimes P_{k}^{(i)}\otimes\cdots\otimes I$ . By preparing $m$ copies of the state $\rho$ , Theorem 2 (see proof in Appendix B) provides an estimator of the numerator.

Theorem 2 (Estimation of $\mbox{Tr}(O\rho^{m})$ ).

Let $\rho$ be an $n$ -qubit noisy state and let the observable $O$ have the Pauli decomposition Eq. (6). For fixed precision $\varepsilon>0$ , $\delta\in(0,1)$ , and a constant $c$ , there exists a quantum algorithm that estimates $\textrm{Tr}(O\rho^{m})$ within additive error $\varepsilon$ with success probability $1-\delta$ and requires $\mathcal{O}\Big{(}\frac{mN_{o}c^{2}}{\varepsilon^{2}}\log\frac{1}{\delta}\Big{)}$ copies of $\rho$ and $\mathcal{O}\Big{(}\frac{N_{o}c^{2}}{\varepsilon^{2}}\log\frac{1}{\delta}\Big{)}$ repetitions of a quantum circuit (constructed via Propositions 1 and 2) consisting of $\mathcal{O}(mn)$ controlled SWAP gates for $s=1,\cdots,\lfloor m/2\rfloor$ .

After implementing the sequence of controlled SWAP gates, we perform a controlled $P_{k}$ on an arbitrary register storing the state $\rho$ . Then we measure the ancilla qubits in the bases of the Pauli operators $\sigma_{x}$ and $\sigma_{y}$ . The resulting sample means estimate the real and imaginary parts of $\mbox{Tr}(O\rho^{m})$ . We remark that the quantity $\sum_{k=1}^{N_{o}}|a_{k}|$ plays an important role in efficiently estimating the numerator $\mbox{Tr}(O\rho^{m})$ . Because the sample complexity is linear in $\left(\sum_{k=1}^{N_{o}}|a_{k}|\right)^{2}$ , we require that $\sum_{k=1}^{N_{o}}|a_{k}|$ is bounded by a constant $c$ . This requirement is natural. In the variational quantum eigensolver Peruzzo2014Quantum and variational quantum simulation Cerezo2021Variational ; Bharti2022Noisy , one typical task is the estimation of the expectation value of a Hamiltonian $H$ . The number of repetitions needed to obtain a precision $\varepsilon$ with operator averaging is similar to our result Roggero2020Short .
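Equations (5) and (7) can be verified numerically on a small instance. The sketch below uses $m=3$ copies of a random two-qubit state and the observable $O=(\sigma_{z}\otimes I+I\otimes\sigma_{z})/2$ , and checks that inserting $O$ on any one register reproduces $\textrm{Tr}(O\rho^{3})$ , as does the weighted sum over Pauli terms. The helper `cyclic_shift`, the random state and the choice of $O$ are illustrative assumptions.

```python
import itertools
from functools import reduce

import numpy as np


def cyclic_shift(m: int, d: int) -> np.ndarray:
    """Permutation matrix S with S|i1,...,im> = |im,i1,...,i(m-1)>."""
    S = np.zeros((d ** m, d ** m))
    for idx in itertools.product(range(d), repeat=m):
        src = np.ravel_multi_index(idx, (d,) * m)
        dst = np.ravel_multi_index((idx[-1],) + idx[:-1], (d,) * m)
        S[dst, src] = 1.0
    return S


I2, Z = np.eye(2), np.diag([1.0, -1.0])
pauli_terms = [(0.5, np.kron(Z, I2)), (0.5, np.kron(I2, Z))]  # (a_k, P_k)
O = sum(a * P for a, P in pauli_terms)
c = sum(abs(a) for a, _ in pauli_terms)  # sum_k |a_k|

rng = np.random.default_rng(2)
a_mat = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = a_mat @ a_mat.conj().T
rho = rho / np.trace(rho).real

m, d = 3, 4
rho_m = np.linalg.matrix_power(rho, m)
target = np.trace(O @ rho_m)                          # Tr(O rho^m)
by_terms = sum(a * np.trace(P @ rho_m) for a, P in pauli_terms)

S = cyclic_shift(m, d)
copies = reduce(np.kron, [rho] * m)
per_register = []
for i in range(m):
    O_tilde = reduce(np.kron, [O if r == i else np.eye(d) for r in range(m)])
    per_register.append(np.trace(O_tilde @ S @ copies))

print(target, by_terms, per_register)
```

The value is independent of the register $i$ by cyclicity of the trace, which is why the controlled $P_{k}$ may act on an arbitrary register.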

III.2 Approximations for mean and variance of a Ratio

The numerator and denominator are estimated from two independent random variables $\hat{X}$ and $\hat{Y}$ . Let $(X_{1},\cdots,X_{\mathcal{N}_{O}})$ and $(Y_{1},\cdots,Y_{\mathcal{N}_{I}})$ denote the corresponding independent samples obtained by running the UMT circuits and measuring the ancilla qubits. The sample means are given by

	$\displaystyle\langle\hat{X}\rangle=\frac{\sum_{j=1}^{\mathcal{N}_{O}}X_{j}}{\mathcal{N}_{O}}\approx\mathbb{E}[\hat{X}]=\textrm{Tr}(O\rho^{m}),$		(8)
	$\displaystyle\langle\hat{Y}\rangle=\frac{\sum_{j=1}^{\mathcal{N}_{I}}Y_{j}}{\mathcal{N}_{I}}\approx\mathbb{E}[\hat{Y}]=\textrm{Tr}(\rho^{m}),$		(9)

with error $\mathcal{E}_{O}=\mathcal{E}_{I}=\mathcal{O}(\mathcal{N}^{-1/2})$ such that

\displaystyle|\langle\hat{X}\rangle-\mathbb{E}[\hat{X}]|\leq\mathcal{E}_{O},~{}~{}|\langle\hat{Y}\rangle-\mathbb{E}[\hat{Y}]|\leq\mathcal{E}_{I},

(10)

where we have set $\mathcal{N}_{O}=\mathcal{N}_{I}=\mathcal{N}$ to represent the number of samples. Then, the expectation value of the ratio $\langle\hat{X}\rangle/\langle\hat{Y}\rangle$ has an approximation, Small2010Expansions

$\displaystyle\mathbb{E}\Bigg{[}\frac{\langle\hat{X}\rangle}{\langle\hat{Y}\rangle}\Bigg{]}$	$\displaystyle\approx\frac{\mathbb{E}[\hat{X}]}{\mathbb{E}[\hat{Y}]}+\frac{\mathbb{E}[\hat{X}]\textrm{Var}[\hat{Y}]}{\mathcal{N}\mathbb{E}[\hat{Y}]^{3}}$
	$\displaystyle=\frac{\textrm{Tr}(O\rho^{m})}{\textrm{Tr}(\rho^{m})}+\frac{\textrm{Tr}(O\rho^{m})\textrm{Var}[\hat{Y}]}{\mathcal{N}\textrm{Tr}(\rho^{m})^{3}}$
	$\displaystyle=\frac{\textrm{Tr}(O\rho^{m})}{\textrm{Tr}(\rho^{m})}+\frac{\textrm{Tr}(O\rho^{m})[1-\textrm{Tr}(\rho^{m})^{2}]}{\mathcal{N}\textrm{Tr}(\rho^{m})^{3}},$	(11)

with error $\mathcal{O}(\mathcal{N}^{-1})$ . The approximate variance of the ratio $\langle\hat{X}\rangle/\langle\hat{Y}\rangle$ Small2010Expansions is

$\displaystyle\textrm{Var}\Bigg{[}\frac{\langle\hat{X}\rangle}{\langle\hat{Y}\rangle}\Bigg{]}$	$\displaystyle\approx\frac{\mathbb{E}[\hat{X}]^{2}\textrm{Var}[\hat{Y}]}{\mathcal{N}\mathbb{E}[\hat{Y}]^{4}}+\frac{\textrm{Var}[\hat{X}]}{\mathcal{N}\mathbb{E}[\hat{Y}]^{2}}$
	$\displaystyle=\frac{\textrm{Tr}(O\rho^{m})^{2}\textrm{Var}[\hat{Y}]}{\mathcal{N}\textrm{Tr}(\rho^{m})^{4}}+\frac{\textrm{Var}[\hat{X}]}{\mathcal{N}\textrm{Tr}(\rho^{m})^{2}}$
	$\displaystyle=\frac{\textrm{Tr}(O\rho^{m})^{2}[1-\textrm{Tr}(\rho^{m})^{2}]}{\mathcal{N}\textrm{Tr}(\rho^{m})^{4}}$
	$\displaystyle+\frac{\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\textrm{Tr}(P_{k}\rho^{m})^{2}]}{\mathcal{N}\textrm{Tr}(\rho^{m})^{2}},$	(12)

with error $\mathcal{O}(\mathcal{N}^{-1})$ , where we have used the single-shot variances

	$\displaystyle\textrm{Var}[\hat{X}]=\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\textrm{Tr}(P_{k}\rho^{m})^{2}],$		(13)
	$\displaystyle\textrm{Var}[\hat{Y}]=1-\textrm{Tr}(\rho^{m})^{2}.$		(14)

In particular, when $\rho$ is a pure state, $\textrm{Tr}(\rho^{m})=1$ and the first term vanishes, so the variance reduces to

\displaystyle\textrm{Var}\Bigg{[}\frac{\langle\hat{X}\rangle}{\langle\hat{Y}\rangle}\Bigg{]}\approx\frac{\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\textrm{Tr}(P_{k}\rho^{m})^{2}]}{\mathcal{N}}.

(15)

The variance estimate provides a way to evaluate the required number of samples. For a target variance $\Delta^{2}$ , Eq. (12) implies that the number of samples is

	$\displaystyle\mathcal{N}$	$\displaystyle\approx\frac{\textrm{Tr}(O\rho^{m})^{2}[1-\textrm{Tr}(\rho^{m})^{2}]}{\Delta^{2}\textrm{Tr}(\rho^{m})^{4}}$
		$\displaystyle+\frac{\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\textrm{Tr}(P_{k}\rho^{m})^{2}]}{\Delta^{2}\textrm{Tr}(\rho^{m})^{2}}.$		(16)

In the work Huggins2021Virtual the authors analyzed the variance of the estimator for $m=2$ . Here, we present an approximation for the mean and variance of the estimator for arbitrary $m$ .
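A quick Monte Carlo experiment illustrates the variance of a ratio of two independent sample means. The sketch below compares the empirical variance with the standard first-order (delta-method) expression $[\mu_{X}^{2}\sigma_{Y}^{2}/\mu_{Y}^{4}+\sigma_{X}^{2}/\mu_{Y}^{2}]/\mathcal{N}$ , where $\sigma^{2}$ denotes the single-shot variance. The $\pm 1$ outcome model, the means $0.3$ and $0.5$ and the function names are illustrative assumptions, not values from the text.

```python
import numpy as np


def ratio_variance_first_order(mu_x, var_x, mu_y, var_y, shots):
    """First-order variance of <X>/<Y> for independent sample means of `shots` samples."""
    return (mu_x ** 2 * var_y / mu_y ** 4 + var_x / mu_y ** 2) / shots


rng = np.random.default_rng(3)
mu_x, mu_y = 0.3, 0.5          # stand-ins for Tr(O rho^m) and Tr(rho^m)
shots, trials = 2000, 20000


def sample_means(mu: float) -> np.ndarray:
    """`trials` independent sample means of `shots` +/-1 outcomes with expectation mu."""
    plus = rng.binomial(shots, (1.0 + mu) / 2.0, size=trials)
    return (2.0 * plus - shots) / shots


ratios = sample_means(mu_x) / sample_means(mu_y)
empirical = ratios.var()
# For a +/-1 variable with mean mu the single-shot variance is 1 - mu**2.
predicted = ratio_variance_first_order(mu_x, 1 - mu_x ** 2, mu_y, 1 - mu_y ** 2, shots)
print(empirical, predicted)
```

The agreement degrades when $\textrm{Tr}(\rho^{m})$ is close to zero, since the first-order expansion assumes that fluctuations of the denominator are small compared with its mean.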

III.3 Concrete construction of a family of circuits and noisy implementation

In this section, we first show the circuit constructions for $8$ and $9$ $n$ -qubit density matrices. The permutations $\pi_{8}=(1,2,\cdots,8)$ and $\pi_{9}=(1,2,\cdots,9)$ have the decompositions

	$\displaystyle\pi_{8}$	$\displaystyle=(4,6)(3,7)(2,8)(4,5)(3,6)(2,7)(1,8),$		(17)
	$\displaystyle\pi_{9}$	$\displaystyle=(5,6)(4,7)(3,8)(2,9)(4,6)(3,7)(2,8)(1,9),$		(18)

where each transposition denotes $n$ SWAP gates. Based on Proposition 1, Fig. 2 (a-d) show $4$ circuits for computing $\textrm{Tr}(O\rho^{8})$ and $\textrm{Tr}(\rho^{8})$ for $s=1,2,3,4$ . The total number of qubits is $s+8n$ including $s$ ancilla qubits. Since $h(8,s)=\lceil 7/s\rceil$ , the depth is $7n$ , $4n$ , $3n$ and $2n$ , respectively. Fig. 2 (e-h) show $4$ circuits for computing $\textrm{Tr}(O\rho^{9})$ and $\textrm{Tr}(\rho^{9})$ for $s=1,2,3,4$ . The total number of qubits is $s+9n$ including $s$ ancilla qubits. Since $h(9,s)=\lceil 8/s\rceil$ , the depth is $8n$ , $4n$ , $3n$ and $2n$ , respectively. Fig. 3 is a parallelized version of Fig. 2 as described in Proposition 2. The total number of qubits is $(s+8)n$ including $sn$ ancilla qubits for $s=1,2,3,4$ . The depth is $7$ , $4$ , $3$ and $2$ , respectively.

Here, we consider the effect of noise in the quantum circuits for different width and depth. Suppose we prepare an exact state $\rho=U(\boldsymbol{\alpha})(|0\rangle\langle 0|\otimes|0\rangle\langle 0|)U^{{\dagger}}(\boldsymbol{\alpha})$ by a parameterized quantum circuit

\displaystyle U(\boldsymbol{\alpha})=\prod_{i=1}^{2}\mbox{CNOT}\times(R_{y}(\alpha_{2i-1})\otimes R_{y}(\alpha_{2i})),

(19)

where $R_{y}(\alpha_{i})=e^{-\iota\alpha_{i}\sigma_{y}/2}$ is the rotation operator, CNOT denotes the controlled-NOT gate and the parameters are $(\alpha_{1},\alpha_{2},\alpha_{3},\alpha_{4})=(0.8147,0.1270,0.2785,0.5469)$ . The state $\rho$ after the depolarizing noise channel is given by

\displaystyle\rho_{\textrm{noise}}=\mathcal{C}_{\textrm{depo}}(\gamma_{0},\rho)=(1-\gamma_{0})\rho+\gamma_{0}\frac{I}{4},~{}~{}~{}\gamma_{0}\in(0,1).

(20)

We measure the expectation value of the observable $O=(\sigma_{z}^{(1)}+\sigma_{z}^{(2)})/2$ with respect to the states $\rho$ and $\rho_{\textrm{noise}}$ . Fig. 4 shows two circuits for estimating $\mbox{Tr}(O\rho_{\textrm{noise}}^{5})$ and $\mbox{Tr}(\rho_{\textrm{noise}}^{5})$ . After each layer, we insert a depolarizing channel with parameter $\gamma$ to simulate the noise effects. Two and four depolarizing channels are required for Fig. 4.(a) and (b), respectively. The ideal expectation value is $\langle O\rangle=\mbox{Tr}(O\rho)=0.7547$ and the noisy result is $\langle O\rangle_{\textrm{noise}}=\mbox{Tr}(O\rho_{\textrm{noise}})=0.4528$ for $\gamma_{0}=0.4$ . The corrected expectation value obtained with VD and our circuits is given by

\displaystyle\langle O\rangle_{\textrm{vd}}^{(5)}=\frac{\mbox{Tr}(O\rho_{\textrm{noise}}^{5})}{\mbox{Tr}(\rho_{\textrm{noise}}^{5})}.

(21)

As shown in Table 1, the numerical calculation indicates that our circuits can still mitigate the error even in the presence of the depolarizing noise channel.

$\gamma$	0.2	0.4	0.6	0.8
Fig. 4.(a) $\langle O\rangle_{\textrm{vd}}^{(5)}$	0.7546	0.7546	0.7546	0.7546
Fig. 4.(b) $\langle O\rangle_{\textrm{vd}}^{(5)}$	0.7546	0.7546	0.7546	0.7546

Table 1: The corrected expectation value $\langle O\rangle_{\textrm{vd}}^{(5)}$ for different parameters $\gamma$ .
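The noiseless reference values quoted above can be reproduced with a short state-vector calculation. The sketch below assumes that the four parameters enter pairwise, $(\alpha_{1},\alpha_{2})$ in the first layer and $(\alpha_{3},\alpha_{4})$ in the second, that the first layer acts first, and that each CNOT is controlled on the first qubit. These conventions are assumptions, adopted because they reproduce the quoted numbers. The depolarizing channels inserted after each circuit layer (parameter $\gamma$ ) are not simulated here.

```python
import numpy as np


def ry(theta: float) -> np.ndarray:
    """Single-qubit rotation exp(-i * theta * sigma_y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


# CNOT with the first qubit as control (assumed orientation).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
I2, Z = np.eye(2), np.diag([1.0, -1.0])

alpha = (0.8147, 0.1270, 0.2785, 0.5469)
layer1 = CNOT @ np.kron(ry(alpha[0]), ry(alpha[1]))
layer2 = CNOT @ np.kron(ry(alpha[2]), ry(alpha[3]))
psi = layer2 @ layer1 @ np.array([1.0, 0.0, 0.0, 0.0])
rho = np.outer(psi, psi)

O = 0.5 * (np.kron(Z, I2) + np.kron(I2, Z))
gamma0 = 0.4
rho_noise = (1 - gamma0) * rho + gamma0 * np.eye(4) / 4   # Eq. (20)

ideal = np.trace(O @ rho)
noisy = np.trace(O @ rho_noise)
rho5 = np.linalg.matrix_power(rho_noise, 5)
mitigated = np.trace(O @ rho5) / np.trace(rho5)           # Eq. (21)

print(f"ideal={ideal:.4f} noisy={noisy:.4f} vd(m=5)={mitigated:.4f}")
```

For the global depolarizing channel the dominant eigenvector of $\rho_{\textrm{noise}}$ is the ideal state itself, which is why five copies already bring the corrected value to within about $2\times 10^{-4}$ of the ideal one.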

IV Conclusion and discussion

We have proposed a unified quantum algorithm for estimating the multivariate trace. Our results rest on a qubit-depth trade-off relation which allows us to construct a family of circuits. The designed circuits have flexible depth and number of qubits, while the total number of quantum gates is always $\mathcal{O}(mn)$ . These proposals can be used as a subroutine for entanglement spectroscopy Yirka2021Qubit , quantum metrology Yamamoto2022Error and the calculation of nonlinear functions of a density matrix Ekert2002Direct . Moreover, we have applied the UMT to achieve exponential error suppression in quantum error mitigation and found numerically that our circuits still work under the global depolarizing channel. Notice that the recent work Zhou2022A estimated the MT with randomized measurements and further reduced the qubit overhead Elben2022The . However, the circuit depth remains the same as in the qubit-optimal method Ekert2002Direct . Our algorithm also provides an alternative to the quantum part of Zhou2022A .

There are two proposals involving dual-state purification Huo2022Dual ; Cai2021Resource , in which only a single ancillary qubit and the implementation of the dual channel are required. The related framework utilizes the qubit reset technique to reduce the number of qubits Cai2021Resource compared with our circuit for $s=1$ . However, the circuit depth is still $\mathcal{O}(m)$ . Our family of circuits provides alternatives that reduce the circuit depth at the cost of additional qubits. In general, depth is a more important index than qubit overhead for a quantum circuit. It is also worth remarking that our approaches utilize an ancillary system to estimate $\textrm{Tr}(O\rho^{m})$ . For $m=2$ , the well-known destructive SWAP test Escartin2013SWAP ; Cincio2018Learning achieves the same goal without ancillary qubits and, at the same time, reduces the circuit depth to a constant. However, it requires measuring multiple qubits, which may increase the measurement overhead.

Notably, we have dealt with the problem for arbitrary positive integer $m$ . Estimating a nonlinear function of quantum states is also of fundamental and practical interest. For instance, the fidelity involves a square root of a quantum state Nielsen2000Quantum , while the Tsallis entropies are defined by $S^{T}_{\alpha}(\rho)=\frac{1}{1-\alpha}[\textrm{Tr}(\rho^{\alpha})-1]$ for $\alpha\in(0,1)\cup(1,\infty)$ Tsallis1988Possible . Although direct estimation is hard on a quantum computer, a hybrid quantum-classical framework Tan2021Variational makes the computation plausible by combining quantum state learning Lee2018Learning and the approximation of fractional powers. The core idea is to minimize the quantum purity, which involves an estimation of the MT. Thus, our proposals can be generalized to calculate a nonlinear function of quantum states in an indirect way.

Several interesting issues should be investigated further in the near future. The first one is to simulate the effects of different types of noise, such as the amplitude damping and phase damping channels, in the circuit implementation of estimating $\textrm{Tr}(O\rho^{m})$ . It would also be interesting to explore circuit structures that reduce the width and depth simultaneously. However, for MT estimation, the qubit-depth trade-off shows that the circuit depth and width are complementary computational resources. In particular, a reduction in circuit depth is often accompanied by an increase in width, and vice versa. Currently available quantum computing devices are often limited in both the number of qubits and the circuit depth. Thus, from the viewpoint of computational resources, the circuit depth and width should be reduced as much as possible in quantum algorithm design. Owing to the convenient decomposition of the cyclic permutation, the circuit depth in our algorithms is reduced accordingly, while the number of ancilla qubits increases only within a controlled range. Our results may motivate further investigations on depth reduction for general quantum circuits.

Acknowledgements: This work is supported by the National Natural Science Foundation of China (NSFC) under Grants 12075159 and 12171044; Beijing Natural Science Foundation (Grant No. Z190005); the Academician Innovation Platform of Hainan Province.

Appendix A The proof of Theorem 1

Theorem 1 can be proved in a way similar to the one given in Quek2022Multivariate , by starting with the $s$ -qubit GHZ states. Suppose we have efficiently prepared an $s$ -qubit GHZ state $|\mbox{GHZ}_{s}\rangle=\frac{1}{\sqrt{2}}(|0\rangle^{\otimes s}+|1\rangle^{\otimes s})$ by a constant-depth quantum circuit Quek2022Multivariate . Eq. (2) provides a direct way to estimate the MT by calculating the real part $\textrm{Re}[\mbox{Tr}(S^{(m)}\rho^{(n,m)})]$ and the imaginary part $\mbox{Im}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})]$ , where $\rho^{(n,m)}=\rho_{a_{1}}\otimes\rho_{a_{2}}\otimes\cdots\otimes\rho_{a_{m}}$ is an arrangement of the $m$ states $\{\rho_{1},\rho_{2},\cdots,\rho_{m}\}$ . Due to the equivalence of Propositions 1 and 2, here we only give the proof for the circuits from Proposition 1.

UMT estimation implements $(m-1)n=\mathcal{O}(mn)$ controlled SWAP gates on an initial state $\Phi_{0}=|\mbox{GHZ}_{s}\rangle\langle\mbox{GHZ}_{s}|\otimes\rho^{(n,m)}$ , giving rise to the state

\displaystyle\Phi_{1}=\frac{1}{2}(\Phi_{1}^{(1)}+\Phi_{1}^{(2)}+\Phi_{1}^{(3)}+\Phi_{1}^{(4)}),

(22)

where

	$\displaystyle\Phi_{1}^{(1)}=|0\rangle\langle 0|^{\otimes s}\otimes\rho^{(n,m)},$		(23)
	$\displaystyle\Phi_{1}^{(2)}=|0\rangle\langle 1|^{\otimes s}\otimes\rho^{(n,m)}(S^{(m)})^{{\dagger}},$		(24)
	$\displaystyle\Phi_{1}^{(3)}=|1\rangle\langle 0|^{\otimes s}\otimes S^{(m)}\rho^{(n,m)},$		(25)
	$\displaystyle\Phi_{1}^{(4)}=|1\rangle\langle 1|^{\otimes s}\otimes S^{(m)}\rho^{(n,m)}(S^{(m)})^{{\dagger}}.$		(26)

Next, we measure the $s$ ancillary qubits in the eigenbasis of the Pauli operator $\sigma_{x}$ and record $\mathcal{Q}_{i}=0$ or $\mathcal{Q}_{i}=1$ , $i=1,\cdots,s$ , for the measurement outcomes $|+\rangle=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)$ or $|-\rangle=\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle)$ , respectively. After a single measurement, obtaining a bit string $(\mathcal{Q}_{1}\cdots\mathcal{Q}_{i}\cdots\mathcal{Q}_{s})$ , $\mathcal{Q}_{i}\in\{0,1\}$ , means that the state of the ancillary register has collapsed to $|\mathcal{Q}_{1}\rangle\otimes\cdots\otimes|\mathcal{Q}_{i}\rangle\otimes\cdots\otimes|\mathcal{Q}_{s}\rangle$ , where $|\mathcal{Q}_{i}\rangle=\frac{1}{\sqrt{2}}(|0\rangle+(-1)^{\mathcal{Q}_{i}}|1\rangle)$ .

The probability of obtaining a bit string $(\mathcal{Q}_{1}\cdots\mathcal{Q}_{i}\cdots\mathcal{Q}_{s})$ is given by

	$\displaystyle\textrm{Pr}(\mathcal{Q}_{1}\cdots\mathcal{Q}_{s})$	$\displaystyle=\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}]$
		$\displaystyle=\frac{1}{2}(\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(1)}]+\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(2)}]$
		$\displaystyle+\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(3)}]+\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(4)}]),$

where the measurement operator

\displaystyle M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}=\Big{[}\bigotimes_{i=1}^{s}|\mathcal{Q}_{i}\rangle\langle\mathcal{Q}_{i}|\Big{]}\otimes I.

(27)

Hence, we have

	$\displaystyle\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(1)}]=\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}|0\rangle\langle 0|^{\otimes s}\otimes\rho^{(n,m)}]=\frac{1}{2^{s}},$
	$\displaystyle\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(4)}]=\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}|1\rangle\langle 1|^{\otimes s}\otimes S^{(m)}\rho^{(n,m)}(S^{(m)})^{{\dagger}}]=\frac{1}{2^{s}},$
	$\displaystyle\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(2)}]=\frac{(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}}{2^{s}}\textrm{Tr}(\rho^{(n,m)}(S^{(m)})^{{\dagger}}),$
	$\displaystyle\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(3)}]=\frac{(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}}{2^{s}}\textrm{Tr}(S^{(m)}\rho^{(n,m)}).$

Now the probability takes the form,

	$\displaystyle\textrm{Pr}(\mathcal{Q}_{1}\cdots\mathcal{Q}_{s})$	$\displaystyle=\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}]$
		$\displaystyle=\frac{1+(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}\textrm{Re}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})]}{2^{s}}.$		(28)
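Eq. (28) can be checked numerically on a small instance. The sketch below is a hedged illustration, not the circuit construction itself: it assumes that the net effect of the controlled SWAP layers is the identity on the ancilla branch $|0\rangle^{\otimes s}$ and $S^{(m)}$ on the branch $|1\rangle^{\otimes s}$ , builds $\Phi_{1}$ from Eqs. (22)-(26) directly, and compares the measurement probabilities with the closed form. The helper names (`random_density_matrix`, `cyclic_shift`, `outcome_probabilities`) and the index convention chosen for $S^{(m)}$ are assumptions made for this example.

```python
import itertools
import numpy as np

def random_density_matrix(d, rng):
    """Random d x d density matrix (positive semidefinite, unit trace)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

def cyclic_shift(d, m):
    """S|i_1,...,i_m> = |i_2,...,i_m,i_1>, so Tr(S rho_1 x...x rho_m) = Tr(rho_1...rho_m)."""
    dims = (d,) * m
    S = np.zeros((d**m, d**m))
    for idx in np.ndindex(*dims):
        src = np.ravel_multi_index(idx, dims)
        dst = np.ravel_multi_index(idx[1:] + idx[:1], dims)
        S[dst, src] = 1.0
    return S

def outcome_probabilities(rhos, s):
    """Return ({bit string: probability}, Tr(S rho)) for sigma_x measurements on s ancillas."""
    d, m = rhos[0].shape[0], len(rhos)
    S = cyclic_shift(d, m)
    rho = rhos[0]
    for r in rhos[1:]:
        rho = np.kron(rho, r)
    zero = np.zeros(2**s)
    zero[0] = 1.0           # |0...0>
    one = np.zeros(2**s)
    one[-1] = 1.0           # |1...1>
    # Phi_1 assembled from Eqs. (22)-(26).
    phi1 = 0.5 * (np.kron(np.outer(zero, zero), rho)
                  + np.kron(np.outer(zero, one), rho @ S.conj().T)
                  + np.kron(np.outer(one, zero), S @ rho)
                  + np.kron(np.outer(one, one), S @ rho @ S.conj().T))
    plus = np.array([1.0, 1.0]) / np.sqrt(2)
    minus = np.array([1.0, -1.0]) / np.sqrt(2)
    probs = {}
    for bits in itertools.product((0, 1), repeat=s):
        v = np.array([1.0])
        for b in bits:
            v = np.kron(v, minus if b else plus)
        M = np.kron(np.outer(v, v), np.eye(d**m))  # measurement operator of Eq. (27)
        probs[bits] = np.trace(M @ phi1).real
    return probs, np.trace(S @ rho)

rng = np.random.default_rng(0)
rhos = [random_density_matrix(2, rng) for _ in range(3)]  # m = 3, n = 1
probs, mt = outcome_probabilities(rhos, s=2)
for bits, p in probs.items():
    closed_form = (1 + (-1) ** sum(bits) * mt.real) / 2**2
    print(bits, p, closed_form)
```

The two printed columns should agree for every bit string, and only the parity of the bit string enters the probability.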

Thus, the mean of the random variable $\hat{\mathcal{Q}}$ , which takes the value $(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}$ on the outcome $(\mathcal{Q}_{1},\cdots,\mathcal{Q}_{s})$ , is

$\displaystyle\mathbb{E}[\hat{\mathcal{Q}}]$	$\displaystyle=\sum_{\mathcal{Q}_{i}\in\{0,1\}}\textrm{Pr}(\mathcal{Q}_{1}\cdots\mathcal{Q}_{s})(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}$
	$\displaystyle=\sum_{\mathcal{Q}_{i}\in\{0,1\}}\frac{(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}+\textrm{Re}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})]}{2^{s}}$
	$\displaystyle=\textrm{Re}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})],$	(29)

where in the last equality we have utilized the property $\sum_{\mathcal{Q}_{i}\in\{0,1\}}(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}=0$ . The variance of $\hat{\mathcal{Q}}$ is given by

	$\displaystyle\textrm{Var}[\hat{\mathcal{Q}}]$	$\displaystyle=\mathbb{E}[\hat{\mathcal{Q}}^{2}]-(\mathbb{E}[\hat{\mathcal{Q}}])^{2}$
		$\displaystyle=1-\Big{(}\textrm{Re}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})]\Big{)}^{2}.$		(30)
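The mean and variance in Eqs. (29) and (30) follow from the distribution in Eq. (28) alone, which can be confirmed by enumerating all $2^{s}$ bit strings. In the short sketch below, the function name `parity_statistics` and the input value `re_trace` (standing for $\textrm{Re}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})]$ ) are hypothetical and chosen only for illustration.

```python
import itertools

def parity_statistics(re_trace, s):
    """Normalization, mean and variance of (-1)^{sum Q_i} under Eq. (28)."""
    total = mean = second_moment = 0.0
    for bits in itertools.product((0, 1), repeat=s):
        sign = (-1) ** sum(bits)
        p = (1 + sign * re_trace) / 2**s
        total += p
        mean += p * sign
        second_moment += p * sign**2
    return total, mean, second_moment - mean**2

total, mean, var = parity_statistics(re_trace=0.3, s=3)
print(total, mean, var)  # compare with 1, re_trace and 1 - re_trace**2
```

Because the outcome is $\pm 1$ valued, its second moment is always $1$ , which is why the variance depends only on the real part of the trace.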

Consider a sample of size $\mathcal{N}$ , that is, $\mathcal{N}$ independent random variables $\hat{Q}^{(1)},\cdots,\hat{Q}^{(j)},\cdots,\hat{Q}^{(\mathcal{N})}$ , where each $\hat{Q}^{(j)}$ is determined by the bit string $(\hat{Q}_{1}^{(j)},\cdots,\hat{Q}_{i}^{(j)},\cdots,\hat{Q}_{s}^{(j)})$ , $\hat{Q}_{i}^{(j)}\in\{0,1\}$ for $j=1,\cdots,\mathcal{N}$ and $i=1,\cdots,s$ , obtained from a single run of the above circuit. The mean of $\hat{\mathcal{Q}}$ is then estimated by the sample mean,

\displaystyle\langle\hat{Q}\rangle=\sum_{j=1}^{\mathcal{N}}\frac{\hat{Q}^{(j)}}{\mathcal{N}}.

(31)

Let $\varepsilon\in(0,1)$ be a precision and $\delta\in(0,1)$ a constant. From Hoeffding’s inequality Hoeffding1963Probability we have

\displaystyle\textrm{Pr}(|\langle\hat{Q}\rangle-\mathbb{E}[\hat{Q}]|\leq\varepsilon/2)\geq 1-\delta,

(32)

and the sample complexity $\mathcal{N}=\mathcal{O}(\varepsilon^{-2}\ln(\delta^{-1}))$ .
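To make the scaling concrete, one can insert explicit constants from the standard two-sided Hoeffding bound for variables bounded in $[-1,1]$ , namely $\textrm{Pr}(|\langle\hat{Q}\rangle-\mathbb{E}[\hat{Q}]|\geq t)\leq 2\exp(-\mathcal{N}t^{2}/2)$ with $t=\varepsilon/2$ . The constants $8$ and $2$ below come from this textbook form and are not stated in the text above, which gives only the $\mathcal{O}$ scaling; the function names are likewise assumptions. The sampler draws the $\pm 1$ parity outcome with probability $(1\pm r)/2$ , obtained by summing Eq. (28) over bit strings of fixed parity.

```python
import math
import numpy as np

def hoeffding_samples(eps, delta):
    """Shots sufficient for |sample mean - mean| <= eps/2 with probability >= 1 - delta."""
    return math.ceil(8.0 * math.log(2.0 / delta) / eps**2)

def estimate_mean(re_trace, n_shots, rng):
    """Sample mean of the +/-1 parity outcome whose expectation is re_trace."""
    p_plus = (1.0 + re_trace) / 2.0
    outcomes = rng.choice([1.0, -1.0], size=n_shots, p=[p_plus, 1.0 - p_plus])
    return outcomes.mean()

eps, delta = 0.1, 0.05
n_shots = hoeffding_samples(eps, delta)
estimate = estimate_mean(re_trace=0.4, n_shots=n_shots, rng=np.random.default_rng(1))
print(n_shots, estimate)
```

Halving $\varepsilon$ quadruples the number of shots, whereas tightening $\delta$ costs only logarithmically.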

Similarly, to estimate the imaginary part, one applies a phase gate (mapping $|0\rangle\rightarrow|0\rangle$ and $|1\rangle\rightarrow\iota|1\rangle$ , $\iota=\sqrt{-1}$ ) to each ancillary qubit before the measurement. For the corresponding random variable $\hat{R}$ we obtain a similar result,

\displaystyle\textrm{Pr}(|\langle\hat{R}\rangle-\mathbb{E}[\hat{R}]|\leq\varepsilon/2)\geq 1-\delta,

(33)

where the expectation value $\mathbb{E}[\hat{R}]=\mbox{Im}[\mbox{Tr}(S^{(m)}\rho^{(n,m)})]$ and the variance $\mbox{Var}[\hat{R}]=1-(\mbox{Im}[\mbox{Tr}(S^{(m)}\rho^{(n,m)})])^{2}$ . Define $\hat{V}=\{\hat{Q},\hat{R}\}$ . The mean of $\hat{V}$ is an estimation of the MT and satisfies $|\langle\hat{V}\rangle-\mathbb{E}[\hat{V}]|\leq\varepsilon$ , where the mean $\mathbb{E}[\hat{V}]=\mbox{Tr}(\rho_{1}\cdots\rho_{m})$ . The variance of $\hat{V}$ is given by $\mbox{Var}[\hat{V}]=1-(\mbox{Tr}(\rho_{1}\cdots\rho_{m}))^{2}$ .

Appendix B The proof of Theorem 2

We observe that the numerator of Eq. (4) is

\displaystyle\mbox{Tr}(O\rho^{m})=\mbox{Tr}(\tilde{O}^{(i)}S^{(m)}\rho\otimes\cdots\otimes\rho),

(34)

where the observable $\tilde{O}^{(i)}=I\otimes\cdots\otimes O^{(i)}\otimes\cdots\otimes I$ and $O^{(i)}$ denotes the operator $O$ acting on the $i$ -th register. Let $O=\sum_{k=1}^{N_{o}}a_{k}P_{k}$ , $a_{k}\in\mathbb{R}$ , be an efficient decomposition of $O$ , where the $P_{k}$ are tensor products of Pauli operators. It is straightforward to show that the trace $\mbox{Tr}(O\rho^{m})$ is a linear combination of $N_{o}$ MT estimations,

	$\displaystyle\mbox{Tr}(O\rho^{m})$	$\displaystyle=\sum_{k=1}^{N_{o}}a_{k}\mbox{Tr}(P_{k}\rho^{m})$		(35)
		$\displaystyle=\sum_{k=1}^{N_{o}}a_{k}\mbox{Tr}[P_{k}^{(i)}S^{(m)}(\rho\otimes\cdots\otimes\rho)].$		(36)
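Eqs. (34)-(36) can be verified numerically for a single-qubit example. The sketch below is a hedged illustration under assumed choices: a random single-qubit $\rho$ , $m=3$ , and the hypothetical observable $O=0.7X-0.2Z$ . The helper names and the index convention used for $S^{(m)}$ are assumptions of this example; by cyclicity of the trace the result should not depend on which register carries $P_{k}$ .

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def cyclic_shift(d, m):
    """S|i_1,...,i_m> = |i_2,...,i_m,i_1> on m registers of dimension d."""
    dims = (d,) * m
    S = np.zeros((d**m, d**m))
    for idx in np.ndindex(*dims):
        src = np.ravel_multi_index(idx, dims)
        dst = np.ravel_multi_index(idx[1:] + idx[:1], dims)
        S[dst, src] = 1.0
    return S

def lifted_trace(P, rho, m, i):
    """Tr[P^{(i)} S^{(m)} (rho x ... x rho)] with P on register i (0-based)."""
    d = rho.shape[0]
    big_rho = np.array([[1.0 + 0j]])
    big_P = np.array([[1.0 + 0j]])
    for r in range(m):
        big_rho = np.kron(big_rho, rho)
        big_P = np.kron(big_P, P if r == i else np.eye(d))
    return np.trace(big_P @ cyclic_shift(d, m) @ big_rho)

rng = np.random.default_rng(0)
g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = g @ g.conj().T
rho = rho / np.trace(rho).real

terms = [(0.7, X), (-0.2, Z)]          # O = sum_k a_k P_k (illustrative choice)
O = sum(a_k * P for a_k, P in terms)
direct = np.trace(O @ np.linalg.matrix_power(rho, 3))
via_shift = sum(a_k * lifted_trace(P, rho, 3, i=1) for a_k, P in terms)
print(direct, via_shift)
```

The two printed values should coincide up to floating-point error, so each Pauli term can be handled by one MT-type estimation.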

The real and imaginary parts of $\mbox{Tr}(O\rho^{m})$ can be estimated separately using similar circuit procedures. Thus, we consider here only the estimation of the real part

\displaystyle\mbox{Re}[\mbox{Tr}(O\rho^{m})]=\sum_{k=1}^{N_{o}}a_{k}\mbox{Re}(\mbox{Tr}[P_{k}^{(i)}S^{(m)}(\rho\otimes\cdots\otimes\rho)]).

(37)

After implementing the sequence of controlled SWAP gates, we perform a controlled $P_{k}$ on an arbitrary register storing the state $\rho$ . Theorem 1 estimates $\mbox{Re}(\mbox{Tr}[P_{k}^{(i)}S^{(m)}(\rho\otimes\cdots\otimes\rho)])$ by producing a random variable $\hat{W}_{k}$ whose sample mean is obtained from $\mathcal{O}(\varepsilon_{k}^{-2}\log(\delta^{-1}))$ repetitions of a quantum circuit (designed via Propositions 1 and 2) consisting of $\mathcal{O}(mn)$ controlled SWAP gates, such that

\displaystyle\mbox{Pr}(|\langle\hat{W}_{k}\rangle-\mbox{Re}(\mbox{Tr}(P_{k}\rho^{m}))|\leq\varepsilon_{k})\geq 1-\delta,

(38)

where $\varepsilon_{k}\in(0,1)$ , $\delta\in(0,1)$ and $\langle\hat{W}_{k}\rangle$ is the sample mean of variable $\hat{W}_{k}$ . The variance is $\mbox{Var}[\hat{W}_{k}]=1-|\textrm{Re}(\textrm{Tr}(P_{k}\rho^{m}))|^{2}$ . Let $\hat{\mathcal{W}}=\sum_{k=1}^{N_{o}}a_{k}\hat{W}_{k}$ be a new random variable. The mean of variable $\hat{\mathcal{W}}$ has the form,

\displaystyle\mathbb{E}[\hat{\mathcal{W}}]=\sum_{k=1}^{N_{o}}a_{k}\mathbb{E}[\hat{W}_{k}]=\sum_{k=1}^{N_{o}}a_{k}\mbox{Re}(\mbox{Tr}(P_{k}\rho^{m})).

(39)

Its variance is given by

$\displaystyle\mbox{Var}[\hat{\mathcal{W}}]$	$\displaystyle=\sum_{k=1}^{N_{o}}|a_{k}|^{2}\mbox{Var}[\hat{W}_{k}]$
	$\displaystyle=\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\left(\mbox{Re}(\mbox{Tr}(P_{k}\rho^{m}))\right)^{2}]$
	$\displaystyle\leq\sum_{k=1}^{N_{o}}|a_{k}|^{2},$	(40)

where the first equality holds because $\hat{W}_{1},\cdots,\hat{W}_{N_{o}}$ are independent, and the last inequality follows from $\mbox{Var}[\hat{W}_{k}]\leq 1$ for each $k$ . We remark that the quantity $\left(\sum_{k=1}^{N_{o}}|a_{k}|\right)^{2}$ indicates the spread of the data about the mean $\mathbb{E}[\hat{\mathcal{W}}]$ . Again, the mean $\mathbb{E}[\hat{\mathcal{W}}]$ can be estimated by repeating the procedure $\mathcal{N}_{f}$ times, such that $\mathbb{E}[\hat{\mathcal{W}}]\approx\langle\hat{\mathcal{W}}\rangle=\frac{1}{\mathcal{N}_{f}}\sum_{l=1}^{\mathcal{N}_{f}}\hat{\mathcal{W}}^{(l)}$ , where $\hat{\mathcal{W}}^{(l)}$ is the measurement result of the $l$ -th repetition. Moreover, the error of the estimator is

$\displaystyle|\langle\hat{\mathcal{W}}\rangle-\textrm{Re}(\mbox{Tr}(O\rho^{m}))|$	$\displaystyle=\left|\langle\hat{\mathcal{W}}\rangle-\mathbb{E}[\hat{\mathcal{W}}]\right|$
	$\displaystyle=\left|\sum_{k=1}^{N_{o}}a_{k}\left(\langle\hat{W}_{k}\rangle-\textrm{Re}[\mbox{Tr}(P_{k}\rho^{m})]\right)\right|$
	$\displaystyle\leq\sum_{k=1}^{N_{o}}\left|a_{k}\right|\varepsilon_{k}=\varepsilon,$	(41)

where the inequality uses Eq. (38) and, in the last step, we have set $\varepsilon_{1}=\cdots=\varepsilon_{N_{o}}=\varepsilon/\sum_{k=1}^{N_{o}}|a_{k}|$ . The imaginary part can be estimated in the same way.

As $k$ runs from $1$ to $N_{o}$ , the sample complexity is

	$\displaystyle\mathcal{N}_{f}$	$\displaystyle=\sum_{k=1}^{N_{o}}\frac{1}{\varepsilon_{k}^{2}}\ln\frac{1}{\delta}=N_{o}\frac{(\sum_{k=1}^{N_{o}}|a_{k}|)^{2}}{\varepsilon^{2}}\ln\frac{1}{\delta}$
		$\displaystyle=\mathcal{O}\Big{(}\frac{N_{o}c^{2}}{\varepsilon^{2}}\ln\frac{1}{\delta}\Big{)}.$		(42)

Since each run of the circuit consumes $m$ copies of $\rho$ , the total number of copies of $\rho$ is $\mathcal{O}\Big{(}\frac{mN_{o}c^{2}}{\varepsilon^{2}}\ln\frac{1}{\delta}\Big{)}$ . Here we have assumed that the quantity $\sum_{k=1}^{N_{o}}|a_{k}|=\mathcal{O}(c)$ is bounded by a constant $c$ . Returning to Eq. (40), the variance is also bounded, since $\sum_{k=1}^{N_{o}}|a_{k}|^{2}\leq\left(\sum_{k=1}^{N_{o}}|a_{k}|\right)^{2}$ .
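The bookkeeping of Appendix B, namely the uniform per-term tolerance, the error budget of Eq. (41), the sample count of Eq. (42), and the resulting number of copies, can be sketched in a few lines of Python. The coefficient list and the function names below are hypothetical, and constant factors are dropped exactly as in Eq. (42), so the output should be read as a scaling estimate rather than a guaranteed shot count.

```python
import math

def term_tolerances(coeffs, eps):
    """Uniform choice eps_k = eps / sum_k |a_k| used below Eq. (41)."""
    norm1 = sum(abs(a) for a in coeffs)
    return [eps / norm1 for _ in coeffs]

def total_samples(coeffs, eps, delta):
    """N_f = sum_k eps_k^{-2} ln(1/delta), as in Eq. (42) (constants omitted)."""
    return sum(math.log(1.0 / delta) / e**2 for e in term_tolerances(coeffs, eps))

coeffs = [0.5, -0.25, 0.25]   # illustrative Pauli coefficients a_k
eps, delta, m = 0.1, 0.05, 3

eps_k = term_tolerances(coeffs, eps)
error_budget = sum(abs(a) * e for a, e in zip(coeffs, eps_k))  # equals eps
n_f = total_samples(coeffs, eps, delta)
copies = m * n_f                                               # m copies of rho per run
variance_bound = sum(a * a for a in coeffs)                    # right-hand side of Eq. (40)
print(error_budget, n_f, copies, variance_bound)
```

With this choice the per-term errors add up to exactly $\varepsilon$ , and the variance bound $\sum_{k}|a_{k}|^{2}$ never exceeds $(\sum_{k}|a_{k}|)^{2}$ .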

References

(1) P. Shor, in Symposium on Foundations of Computer Science (IEEE, Piscataway, NJ, 1994), pp. 124-134.
(2) A. W. Harrow, A. Hassidim, and S. Lloyd, Phys. Rev. Lett. 103, 150502 (2009).
(3) J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549, 195 (2017).
(4) J.-M. Liang, S.-Q. Shen, M. Li, and L. Li, Phys. Rev. A 99, 052310 (2019).
(5) M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, Nat. Rev. Phys. 3, 625 (2021).
(6) K. Bharti, A. Cervera-Lierta, T. H. Kyaw, T. Haug, S. Alperin-Lea, A. Anand, M. Degroote, H. Heimonen, J. S. Kottmann, T. Menke, W.-K. Mok, S. Sim, L.-C. Kwek, and A. Aspuru-Guzik, Rev. Mod. Phys. 94, 015004 (2022).
(7) S. J. Devitt, W. J. Munro, and K. Nemoto, Rep. Prog. Phys. 76, 076001 (2013).
(8) D. Lidar and T. Brun, Quantum Error Correction (Cambridge University Press, Cambridge, England, 2013).
(9) J. Preskill, Quantum 2, 79 (2018).
(10) S. Endo, Z. Cai, S. C. Benjamin, and X. Yuan, J. Phys. Soc. Jpn. 90, 032001 (2021).
(11) S.-X. Zhang, Z.-Q. Wan, C.-Y. Hsieh, H. Yao, and S. Zhang, arXiv:2112.10380.
(12) D. Bultrini, M. H. Gordon, P. Czarnik, A. Arrasmith, P. J. Coles, and L. Cincio, arXiv:2107.13470.
(13) Y. Yang, B.-N. Lu, and Y. Li, PRX Quantum 2, 040361 (2021).
(14) Z. Cai, arXiv:2110.05389.
(15) R. Takagi, H. Tajima, and M. Gu, arXiv:2208.09178.
(16) M. Huo and Y. Li, Phys. Rev. A 105, 022427 (2022).
(17) P. Czarnik, M. McKerns, A. T. Sornborger, and L. Cincio, arXiv:2204.07109.
(18) N. Cao, J. Lin, D. Kribs, Y.-T. Poon, B. Zeng, and R. Laflamme, arXiv:2111.02345.
(19) J.-M. Liang, Q.-Q. Lv, S.-Q. Shen, M. Li, Z.-X. Wang, and S.-M. Fei, Adv. Quantum Technol. 2200090 (2022).
(20) S. McArdle, T. Jones, S. Endo, Y. Li, S. C. Benjamin, and X. Yuan, npj Quantum Inf. 5, 75 (2019).
(21) K. Temme, S. Bravyi, and J. M. Gambetta, Phys. Rev. Lett. 119, 180509 (2017).
(22) Y. Li and S. C. Benjamin, Phys. Rev. X 7, 021050 (2017).
(23) S. Endo, S. C. Benjamin, and Y. Li, Phys. Rev. X 8, 031027 (2018).
(24) N. Yoshioka, H. Hakoshima, Y. Matsuzaki, Y. Tokunaga, Y. Suzuki, and S. Endo, Phys. Rev. Lett. 129, 020502 (2022).
(25) B. Koczor, Phys. Rev. X 11, 031057 (2021).
(26) W. J. Huggins, S. McArdle, T. E. O’Brien, J. Lee, N. C. Rubin, S. Boixo, K. B. Whaley, R. Babbush, and J. R. McClean, Phys. Rev. X 11, 041036 (2021).
(27) Y. Quek, M. M. Wilde, and E. Kaur, arXiv:2206.15405.
(28) H. Yao and X.-L. Qi, Phys. Rev. Lett. 105, 080501 (2010).
(29) S. Johri, D. S. Steiger, and M. Troyer, Phys. Rev. B 96, 195136 (2017).
(30) A. K. Ekert, C. M. Alves, D. K. L. Oi, M. Horodecki, P. Horodecki, and L. C. Kwek, Phys. Rev. Lett. 88, 217901 (2002).
(31) P. Czarnik, A. Arrasmith, L. Cincio, and P. J. Coles, arXiv:2102.06056.
(32) J. L. Beckey, N. Gigena, P. J. Coles, and M. Cerezo, Phys. Rev. Lett. 127, 140501 (2021).
(33) Z. Cai, arXiv:2107.07279.
(34) W. Hoeffding, J. Am. Stat. Assoc. 58, 13 (1963).
(35) A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, Nat. Commun. 5, 4213 (2014).
(36) A. Roggero and A. Baroni, Phys. Rev. A 101, 022328 (2020).
(37) C. G. Small, Expansions and Asymptotics for Statistics, 1st ed. (Chapman and Hall/CRC, 2010).
(38) J. Yirka and Y. Subasi, Quantum 5, 535 (2021).
(39) K. Yamamoto, S. Endo, H. Hakoshima, Y. Matsuzaki, and Y. Tokunaga, arXiv:2112.01850.
(40) Y. Zhou and Z. Liu, arXiv:2208.08416.
(41) A. Elben, S. T. Flammia, H.-Y. Huang, R. Kueng, J. Preskill, B. Vermersch, and P. Zoller, Nat. Rev. Phys. (2022).
(42) J. C. Garcia-Escartin and P. Chamorro-Posada, Phys. Rev. A 87, 052330 (2013).
(43) L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles, New J. Phys. 20, 113022 (2018).
(44) M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge Univ. Press, 2000).
(45) C. Tsallis, J. Stat. Phys. 52, 479 (1988).
(46) K. C. Tan and T. Volkoff, Phys. Rev. Research 3, 033251 (2021).
(47) S. M. Lee, J. Lee, and J. Bang, Phys. Rev. A 98, 052302 (2018).
