This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Unified multivariate trace estimation and quantum error mitigation

Jin-Min Liang [email protected] School of Mathematical Sciences, Capital Normal University, Beijing 100048, China    Qiao-Qiao Lv [email protected] School of Mathematical Sciences, Capital Normal University, Beijing 100048, China    Zhi-Xi Wang [email protected] School of Mathematical Sciences, Capital Normal University, Beijing 100048, China    Shao-Ming Fei [email protected] School of Mathematical Sciences, Capital Normal University, Beijing 100048, China
Abstract

Calculating the trace of the product of mm nn-qubit density matrices (multivariate trace) is a crucial subroutine in quantum error mitigation and information measures estimation. We propose an unified multivariate trace estimation (UMT) which conceptually unifies the previous qubit-optimal and depth-optimal approaches with tunable quantum circuit depth and the number of qubits. The constructed circuits have (m1)/s\lceil(m-1)/s\rceil or n(m1)/sn\lceil(m-1)/s\rceil depth corresponding to (s+m)n(s+m)n or s+mns+mn qubits for s{1,,m/2}s\in\{1,\cdots,\lfloor m/2\rfloor\}, respectively. Such flexible circuit structures enable people to choose suitable circuits according different hardware devices. We apply UMT to virtual distillation for achieving exponential error suppression and design a family of concrete circuits to calculate the trace of the product of 88 and 99 nn-qubit density matrices. Numerical example shows that the additional circuits still mitigate the noise expectation value under the global depolarizing channel.

I Introduction

Fault-tolerant quantum (FTQ) computers may provide novel computational advantages over classical computers for many tasks Shor1994 ; Harrow2009Quantum ; Biamonte2017Quantum ; Liang2019Quantum . As the perfect FTQ computers are not available yet, a preferable substitute is the the noisy intermediate-scale quantum (NISQ) devices with limited quantum resources Cerezo2021Variational ; Bharti2022Noisy . Nevertheless, the implementation of both FTQ and NISQ devices hinges on the effective control of noise. Thus, error correction and mitigation are of fundamental importance in quantum computation. Aiming at efficiently approximating the desired output states, the quantum error correction (QEC) provides a theoretical blueprint enabling quantum computation in an arbitrary small error level Devitt2013Quantum ; Lidar2013Quantum . However, duo to the larger qubit count, extra circuit complexity and other additional operations Preskill2018Quantum , the overhead of QEC is too large to be available for practical applications. Therefore, a variety of quantum error mitigation (QEM) approaches Endo2021Hybrid ; Zhang2021Variational ; Bultrini2021Unifying ; Yang2021Accelerated ; Cai2021A ; Takagi2022Universal ; Huo2022Dual ; Czarnik2022Improving for NISQ algorithms have been presented instead for QEC. Different from QEC, QEM focuses on recovering the ideal measurement statistics (usually the expectation values) Cao2022NISQ and can be directly employed in the ground state preparation Liang2022Improved ; McArdle2019Variational . For instance, the error extrapolation technique utilizes different error rates to the zero noise limit Temme2017Error ; Li2017Efficient ; Endo2018Practical . In particular, probabilistically implementing the inverse process can mitigate the noise effect on computation for some noise channels Temme2017Error . However, such error mitigation techniques rely on the prior knowledge on the noise model whose characterization is expensive.

The generalized quantum subspace expansion (GSE) Yoshioka2022Generalized and the virtual distillation (VD) methods Koczor2021Exponential ; Huggins2021Virtual do not require any information of the noise. The GSE method optimizes states from a subspace expanded by a small subset of Pauli operators. It is effective against coherent errors Endo2021Hybrid . The VD method prepares mm copies nn-qubit noisy state ρ=E0|u0u0|+k=12n1Ek|ukuk|\rho=E_{0}|u_{0}\rangle\langle u_{0}|+\sum_{k=1}^{2^{n}-1}E_{k}|u_{k}\rangle\langle u_{k}| in spectral decomposition, E0>E1E2n1E_{0}>E_{1}\geq\cdots\geq E_{2^{n}-1}, and calculates the expectation value of an observable OO with respect to the state ρm/Tr(ρm)\rho^{m}/\textrm{Tr}(\rho^{m}),

Ovd(m)=Tr(Oρm)Tr(ρm)=Oexact[1+𝒪(E1m/E0m)],\displaystyle\langle O\rangle_{\textrm{vd}}^{(m)}=\frac{\mbox{Tr}(O\rho^{m})}{\mbox{Tr}(\rho^{m})}=\langle O\rangle_{\textrm{exact}}[1+\mathcal{O}(E_{1}^{m}/E_{0}^{m})], (1)

which exponentially gets closer to the ideal value Oexact=u0|O|u0\langle O\rangle_{\mbox{exact}}=\langle u_{0}|O|u_{0}\rangle. The core idea behind this claim is that the state ρm/Tr(ρm)\rho^{m}/\mbox{Tr}(\rho^{m}) approaches to the desired pure state |u0|u_{0}\rangle exponentially in mm. In order to obtain an exponential error suppression, an efficient quantum algorithm for measuring Eq. (1) is required.

Generally, instead of the mm copies of the nn-qubit noisy state ρ\rho, one may consider mm different nn-qubit states ρ1,,ρm\rho_{1},\cdots,\rho_{m}. The Tr(ρm)\textrm{Tr}(\rho^{m}) in Eq. (1) is then replaced by a more general quantity Tr(ρ1ρm)\mbox{Tr}(\rho_{1}\cdots\rho_{m}) which is called multivariate trace (MT) first introduced in the work Quek2022Multivariate . Measuring MT is of fundamental and practical interest in quantum information processing such as the calculation of the Rényi entropy, entanglement entropy Yao2010Entanglement and the entanglement spectroscopy Johri2017Entanglement . Broadly speaking, the estimation of MT on a quantum computer is currently tackled by using either the qubit-optimal approach Ekert2002Direct or depth-optimal method Quek2022Multivariate . Here, the qubit-optimal means that the number of ancilla qubits is optimal, and the depth-optimal stands for that the circuit depth is minimal. The qubit-optimal approach requires only single ancilla qubit and a Θ(m)\Theta(m)-depth circuit, whereas the depth-optimal method requires m/2\lfloor m/2\rfloor ancilla qubits and a constant depth circuit, where \lfloor\cdot\rfloor denotes the floor function. The former method is prohibited due to linear depth in mm. Meanwhile, the latter method has an attractive depth for NISQ devices, but the linear number of needed qubits in mm would restrict its application to small mm. Although a recent work Czarnik2021Qubit drastically reduces the qubit resource by utilizing qubit resets technique, the circuit depth is still Θ(m)\Theta(m). Thus, limited qubit number and circuit depth may hinder the advantage of the VD method in current NISQ era.

Accounting to the fact that the error accumulates with the increasing of the number of qubits and the depth of quantum circuit, in this work we provide a quantum algorithm to calculate the corrected expectation value Eq. (1) by constructing a family of circuits which have different circuit depth and number of qubits. As the authors Quek2022Multivariate pointed out that their circuit is flexible and can be adjusted as different depth and number of qubits. Thus, in this work we first mathematically establish a specific trade-off relation between the number of qubits and the circuit depth. The circuit depth and the number of qubits can be denoted as a function of a free parameter ss. From the variation ss, there are m/2\lfloor m/2\rfloor different circuit structures with the same number of quantum gates. Based on the constructed circuits, we propose an unified multivariate estimation (UMT), which is capable of calculating Tr(ρ1ρm)\mbox{Tr}(\rho_{1}\cdots\rho_{m}) with a tunable circuit depth and number of qubits. The existing qubit-optimal and depth-optimal algorithms are two extremal cases of our algorithm for s=1s=1 and m/2\lfloor m/2\rfloor. Furthermore, we apply UMT to achieve the exponential error suppression and give a family of concrete circuits for 88 and 99 nn-qubit density matrices. Finally, we simulate the effects of the global depolarizing channel in the process of estimating Ovd(5)\langle O\rangle_{\textrm{vd}}^{(5)} for a two qubits state ρ\rho.

II Unified multivariate trace estimation

The MT is defined as

Tr(ρ1ρm):=Tr[S(m)(ρ1ρm)],\displaystyle\mbox{Tr}(\rho_{1}\cdots\rho_{m}):=\mbox{Tr}[S^{(m)}(\rho_{1}\otimes\cdots\otimes\rho_{m})], (2)

where S(m)S^{(m)} is a unitary representation of the cyclic shift permutation π=(1,2,,m)\pi=(1,2,\cdots,m): S(m)|ψ1|ψ2|ψm=|ψm|ψ1|ψm1S^{(m)}|\psi_{1}\rangle\otimes|\psi_{2}\rangle\otimes\cdots\otimes|\psi_{m}\rangle=|\psi_{m}\rangle\otimes|\psi_{1}\rangle\otimes\cdots\otimes|\psi_{m-1}\rangle for pure states |ψ1,,|ψm|\psi_{1}\rangle,\cdots,|\psi_{m}\rangle. Note that for m=2m=2, Tr[ρ1ρ2]=Tr[S(2)(ρ1ρ2)]\mbox{Tr}[\rho_{1}\rho_{2}]=\mbox{Tr}[S^{(2)}(\rho_{1}\otimes\rho_{2})] with S(2)S^{(2)} the SWAP operator. Eq. (2) means that the MT can be estimated by calculating the real and imaginary parts of Tr[S(m)(ρ1ρm)]\mbox{Tr}[S^{(m)}(\rho_{1}\otimes\cdots\otimes\rho_{m})]. Following the framework of Ref. Ekert2002Direct , a crucial step is to perform the controlled unitary 𝒞(S(m))\mathcal{C}(S^{(m)}) with respect to S(m)S^{(m)}, 𝒞(S(m))=|00|I+|11|S(m)\mathcal{C}(S^{(m)})=|0\rangle\langle 0|\otimes I+|1\rangle\langle 1|\otimes S^{(m)}, where II denotes the identity. In this section, we construct a sequence of alternative circuits to achieve the operation 𝒞(S(m))\mathcal{C}(S^{(m)}).

II.1 The qubit-depth trade-off

Proposition 1 (Qubit-depth trade-off).

For a given set of nn-qubit states {ρ1,,ρm}\{\rho_{1},\cdots,\rho_{m}\}, there exists a family of quantum circuits with depth nh(m,s)nh(m,s) and (s+mn)(s+mn) qubits to achieve the operation 𝒞(S(m))\mathcal{C}(S^{(m)}), s=1,,m/2s=1,\cdots,\lfloor m/2\rfloor. The depth function h(m,s)=(m1)/s[2,m1]h(m,s)=\lceil(m-1)/s\rceil\in[2,m-1] is a monotonically decreasing function of the variable ss.

The proposition 1 is based on the decomposition of the permutation cycle π\pi Quek2022Multivariate ,

π={j=2m/2(j,m+2j)i=1m/2(i,m+1i),mevenj=2m/2(j,m+2j)i=1m/2(i,m+1i),modd,\pi=\left\{\begin{aligned} &\prod_{j=2}^{m/2}(j,m+2-j)\prod_{i=1}^{m/2}(i,m+1-i),&m&~{}even\\ &\prod_{j=2}^{\lceil m/2\rceil}(j,m+2-j)\prod_{i=1}^{\lfloor m/2\rfloor}(i,m+1-i),&m&~{}odd\end{aligned}\right.,

where all arithmetic is modulo mm. π\pi can be decomposed into a product of (m1)(m-1) transpositions. The circuit of proposition 1 contains an ancillary register (AR) and a work register (WR). The WR stores mm density matrices ρ1,,ρm\rho_{1},\cdots,\rho_{m}. The AR stores an ss-qubit GHZ state, |GHZs=12(|0s+|1s)|\mbox{GHZ}_{s}\rangle=\frac{1}{\sqrt{2}}(|0\rangle^{\otimes s}+|1\rangle^{\otimes s}) which controls the SWAP operation between two different density matrices ρl,ρk\rho_{l},\rho_{k} for l,k=1,,ml,k=1,\cdots,m. Each qubit of |GHZs|\mbox{GHZ}_{s}\rangle controls one transposition. Thus, the ss-qubit GHZ state controls ss transpositions at one time. m1m-1 transpositions can be controlled at most (m1)/s\lceil(m-1)/s\rceil times. Each SWAP operations can be decomposed into nn controlled SWAP gates. Thus, the total depth is n(m1)/sn\lceil(m-1)/s\rceil. Here we remark that by using the qubit reset technique and the middle measurement, the ss-qubit GHZ state |GHZs|\mbox{GHZ}_{s}\rangle can be prepared by a constant-depth quantum circuit (independent of ss). The number of qubits needed is ss or 2(s1)2(s-1) when ss is even or odd, respectively Quek2022Multivariate . In general, for preparing |GHZs|\mbox{GHZ}_{s}\rangle, a coherent quantum circuit has depth 𝒪(s)\mathcal{O}(s). In this work, we do not consider the circuit depth of preparing state |GHZs|\mbox{GHZ}_{s}\rangle and density matrices ρ1,,ρm\rho_{1},\cdots,\rho_{m}.

Notice that s=1s=1 and s=m/2s=\lfloor m/2\rfloor cover the results of the qubit-optimal method Ekert2002Direct and the depth-optimal approach Quek2022Multivariate . With these two extremal cases, altogether there are m/2\lfloor m/2\rfloor optional circuits. In the work Ekert2002Direct the authors introduced a single ancilla qubit and implemented a controlled unitary S(m)S^{(m)} with depth (m1)n(m-1)n. The circuit presented in Quek2022Multivariate has depth 2n2n and m/2\lfloor m/2\rfloor ancilla qubits. The quantum circuits in Ekert2002Direct and Quek2022Multivariate can be seen as nn parallelized sub-circuits with only single or m/2\lfloor m/2\rfloor ancilla qubits, respectively Beckey2021Computable ; Cai2021Resource . Fig. 1 illustrates three mathematically equivalent quantum circuits for estimating Tr(ρ1ρ2ρ3)\mbox{Tr}(\rho_{1}\rho_{2}\rho_{3}) with single ancillary qubit.

Refer to caption
Figure 1: Quantum circuits for estimating the real part of Tr(ρ1ρ2ρ3)\mbox{Tr}(\rho_{1}\rho_{2}\rho_{3}), where states ρ1,ρ2,ρ3\rho_{1},\rho_{2},\rho_{3} are two-qubit ones. (a) The circuit in the work Ekert2002Direct . The number of ancilla qubits is 11 and the circuit depth is 44. (b) and (c) are the parallelized versions of the circuit (a). The corresponding circuit depth is 22 and the number of ancilla qubits is 22. The 88-qubit quantum circuit (b) can be seen as 22 parallelized circuits with each 11 ancilla qubit, as shown in (c).

Generalizing proposition 1 in the parallelized scenario, we have

Proposition 2 (Qubit-depth trade-off in parallelized scenario).

Given a set of nn-qubit states {ρ1,,ρm}\{\rho_{1},\cdots,\rho_{m}\}. There is a family of quantum circuits for achieving the operation 𝒞(S(m))\mathcal{C}(S^{(m)}). These circuits have depth h(m,s)=(m1)/sh(m,s)=\lceil(m-1)/s\rceil and (s+m)n(s+m)n qubits, s=1,,m/2s=1,\cdots,\lfloor m/2\rfloor.

The circuits in proposition 2 are the parallelized versions of the Proposition 1, in which the AR consists of nn ss-qubit GHZ states. Each ss-qubit GHZ state achieves the controlled SWAP operation on the single qubit subspace of the state ρi\rho_{i}, i=1,,mi=1,\cdots,m. Moreover, in the case that m=2m=2 and ρ1=ρ2=ρ\rho_{1}=\rho_{2}=\rho, we recover the circuit for estimating the purity Tr(ρ2)\mbox{Tr}(\rho^{2}) presented in Beckey2021Computable as a special case of our approach that s=m/2=1s=\lfloor m/2\rfloor=1 and h(2,1)=1h(2,1)=1. In particular, when s=2s=2, the circuit depth is h(m,2)=m/2h(m,2)=m/2 and h(m,2)[m/2,m/2]h(m,2)\in[\lfloor m/2\rfloor,\lceil m/2\rceil] for even and odd, respectively. This property guarantees that only 2n2n or 22 additional ancilla qubits can reduce the depth by half. Proposition 1 and 2 show that the number of ancilla qubits and the circuit depth are tunable according to different hardware devices.

II.2 UMT estimation

Given the equivalent circuit construction of the controlled S(m)S^{(m)} operation among density matrices, we perform a measurement in the Pauli σx\sigma_{x} basis on each ancilla qubit. The expectation value gives an estimation of Tr(ρ1ρm)\mbox{Tr}(\rho_{1}\cdots\rho_{m}). We have the following Theorem, see proof in Appendix A.

Theorem 1 (UMT estimation).

Given a set of nn-qubit states {ρ1,,ρm}\{\rho_{1},\cdots,\rho_{m}\} and fixed error ε>0\varepsilon>0 and δ(0,1)\delta\in(0,1). UMT estimation calculates the quantity Eq. (2) by the sample mean V^\langle\hat{V}\rangle of a random variable V^\hat{V} that can be produced using 𝒪(ε2log(δ1))\mathcal{O}(\varepsilon^{-2}\log(\delta^{-1})) repetitions of a quantum circuit, constructed via the Propositions 1 and 2, consisting of 𝒪(mn)\mathcal{O}(mn) controlled SWAP gates such that

Pr(|V^Tr(ρ1ρm)|ε)1δ\displaystyle\emph{Pr}(|\langle\hat{V}\rangle-\mbox{Tr}(\rho_{1}\cdots\rho_{m})|\leq\varepsilon)\geq 1-\delta (3)

for s=1,,m/2s=1,\cdots,\lfloor m/2\rfloor.

Theorem 1 gives an analysis on the sample complexity guaranteed by the Hoeffding’s inequality Hoeffding1963Probability . The number of quantum gates is 𝒪(mn)\mathcal{O}(mn) for different circuit structures.

III UMT for virtual distillation

A direct application of UMT is to the quantum error mitigation Huggins2021Virtual ; Koczor2021Exponential . Suppose that the near-term quantum devices aim to prepare an nn-qubit pure state |ϕ|\phi\rangle. However, owing to the effect of environment noise, one prepares instead a mixed state ρ=𝒞(|ϕϕ|)\rho=\mathcal{C}(|\phi\rangle\langle\phi|), where the operation 𝒞\mathcal{C} is a map containing a unitary evolution and a noise channel such as depolarizing channel. The error-free expected value of an Hermitian operator OO is O=Tr(O|ϕϕ|)\langle O\rangle=\mbox{Tr}(O|\phi\rangle\langle\phi|). However, the noisy expected value is Onoise=Tr(Oρ)O\langle O\rangle_{\textrm{noise}}=\mbox{Tr}(O\rho)\neq\langle O\rangle. Virtual distillation provides a method to approximate O\langle O\rangle as a corrected expectation value

Ovd(m)=Tr(Oρm)Tr(ρm),\displaystyle\langle O\rangle_{\textrm{vd}}^{(m)}=\frac{\mbox{Tr}(O\rho^{m})}{\mbox{Tr}(\rho^{m})}, (4)

by mm copies of the mixed state ρ\rho.

III.1 Estimating the corrected expectation value with UMT

It is clear to see that the denominator in Eq. (4) can be evaluated by employing Theorem 1 with setting ρ1==ρm=ρ\rho_{1}=\cdots=\rho_{m}=\rho. The numerator of Eq. (4) is

Tr(Oρm)=Tr(O~(i)S(m)ρρ),\displaystyle\textrm{Tr}(O\rho^{m})=\textrm{Tr}(\tilde{O}^{(i)}S^{(m)}\rho\otimes\cdots\otimes\rho), (5)

where the observable O~(i)=IO(i)I\tilde{O}^{(i)}=I\otimes\cdots O^{(i)}\otimes\cdots\otimes I and O(i)O^{(i)} denotes the operator OO acting on the iith register which stores the iith copies of ρ\rho. Suppose an efficient decomposition

O=k=1NoakPk,ak,\displaystyle O=\sum_{k=1}^{N_{o}}a_{k}P_{k},~{}~{}a_{k}\in\mathbb{R}, (6)

where Pk=σk1σknP_{k}=\sigma_{k_{1}}\otimes\cdots\otimes\sigma_{k_{n}} are tensor products of Pauli operators σk1,,σkn{σx,σy,σz,I}\sigma_{k_{1}},\cdots,\sigma_{k_{n}}\in\{\sigma_{x},\sigma_{y},\sigma_{z},I\} and the quantity k=1No|ak|=𝒪(c)\sum_{k=1}^{N_{o}}|a_{k}|=\mathcal{O}(c) is bounded by a constant cc. It is straightforward to show that

Tr(Oρm)=k=1NoakTr[P~k(i)S(m)(ρρ)],\displaystyle\textrm{Tr}(O\rho^{m})=\sum_{k=1}^{N_{o}}a_{k}\textrm{Tr}[\tilde{P}_{k}^{(i)}S^{(m)}(\rho\otimes\cdots\otimes\rho)], (7)

where P~k=IPk(i)I\tilde{P}_{k}=I\otimes\cdots P_{k}^{(i)}\otimes\cdots\otimes I. By preparing mm copies of the state ρ\rho, Theorem 2 (see proof in Appendix B) provides an estimator of the denominator.

Theorem 2 (Estimation of Tr(Oρm)\mbox{Tr}(O\rho^{m})).

Let ρ\rho be an nn-qubit noisy state. Given a Pauli decomposition of observable OO Eq. (6). For fixed precision ε>0\varepsilon>0, δ(0,1)\delta\in(0,1), and a constant cc there exists a quantum algorithm that estimates Tr(Oρm)\emph{Tr}(O\rho^{m}) within ε\varepsilon additive error with success probability 1δ1-\delta and requires 𝒪(mNoc2ε2log1δ)\mathcal{O}\Big{(}\frac{mN_{o}c^{2}}{\varepsilon^{2}}\log\frac{1}{\delta}\Big{)} copies of ρ\rho and 𝒪(Noc2ε2log1δ)\mathcal{O}\Big{(}\frac{N_{o}c^{2}}{\varepsilon^{2}}\log\frac{1}{\delta}\Big{)} repetitions of a quantum circuit (constructed via the propositions 1 and 2) consisting of 𝒪(mn)\mathcal{O}(mn) controlled SWAP gates for s=1,,m/2s=1,\cdots,\lfloor m/2\rfloor.

After implementing the sequences of controlled SWAP gate, we perform a controlled PkP_{k} on an arbitrary register storing the state ρ\rho. Then we measure the ancilla qubits in the basis of Pauli operators σx\sigma_{x} and σy\sigma_{y}. The measurement sample means are the real and imaginary parts of Tr(Oρm)\mbox{Tr}(O\rho^{m}). We remark that the quantity k=1No|ak|=c\sum_{k=1}^{N_{o}}|a_{k}|=c plays a great role in efficiently estimating the numerator Tr(Oρm)\mbox{Tr}(O\rho^{m}). Due to the fact that the sample complexity is linear in (k=1No|ak|)2\left(\sum_{k=1}^{N_{o}}|a_{k}|\right)^{2}, we thus expect that k=1No|ak|\sum_{k=1}^{N_{o}}|a_{k}| is bounded by a constant cc. This observation is intuitive. In variational quantum eigensolver Peruzzo2014Quantum and variational quantum simulation Cerezo2021Variational ; Bharti2022Noisy , one typical question is the estimation of the expectation values of Hamiltonian HH. The number of repetitions needed to obtain a precision ϵ\epsilon with operator averaging is similar to our result Roggero2020Short .

III.2 Approximations for mean and variance of a Ratio

The numerator and denominator are calculated via producing two independent variables X^\hat{X} and Y^\hat{Y}. Let X^=(X1,,X𝒩O)\hat{X}=(X_{1},\cdots,X_{\mathcal{N}_{O}}) and Y^=(Y1,,Y𝒩I)\hat{Y}=(Y_{1},\cdots,Y_{\mathcal{N}_{I}}) be two independent variables, denoting the sampling results after running the UMT and measuring the ancilla qubits. The sample means are given by

X^=j=1𝒩OXj𝒩O𝔼[X^]=Tr(Oρm),\displaystyle\langle\hat{X}\rangle=\frac{\sum_{j=1}^{\mathcal{N}_{O}}X_{j}}{\mathcal{N}_{O}}\approx\mathbb{E}[\hat{X}]=\textrm{Tr}(O\rho^{m}), (8)
Y^=j=1𝒩IYj𝒩I𝔼[Y^]=Tr(ρm),\displaystyle\langle\hat{Y}\rangle=\frac{\sum_{j=1}^{\mathcal{N}_{I}}Y_{j}}{\mathcal{N}_{I}}\approx\mathbb{E}[\hat{Y}]=\textrm{Tr}(\rho^{m}), (9)

with error O=I=𝒪(𝒩1/2)\mathcal{E}_{O}=\mathcal{E}_{I}=\mathcal{O}(\mathcal{N}^{-1/2}) such that

|X^𝔼[X^]|O,|Y^𝔼[Y^]|I,\displaystyle|\langle\hat{X}\rangle-\mathbb{E}[\hat{X}]|\leq\mathcal{E}_{O},~{}~{}|\langle\hat{Y}\rangle-\mathbb{E}[\hat{Y}]|\leq\mathcal{E}_{I}, (10)

where we have set 𝒩O=𝒩I=𝒩\mathcal{N}_{O}=\mathcal{N}_{I}=\mathcal{N} to represent the number of samples. Then, the expectation value of the ratio X^/Y^\langle\hat{X}\rangle/\langle\hat{Y}\rangle has an approximation, Small2010Expansions

𝔼[X^Y^]\displaystyle\mathbb{E}\Bigg{[}\frac{\langle\hat{X}\rangle}{\langle\hat{Y}\rangle}\Bigg{]} 𝔼[X^]𝔼[Y^]+𝔼[X^]Var[Y^]2𝒩𝔼[Y^]3\displaystyle\approx\frac{\mathbb{E}[\hat{X}]}{\mathbb{E}[\hat{Y}]}+\frac{\mathbb{E}[\hat{X}]\textrm{Var}[\hat{Y}]^{2}}{\mathcal{N}\mathbb{E}[\hat{Y}]^{3}}
=Tr(Oρm)Tr(ρm)+Tr(Oρm)Var[Y^]2𝒩Tr(ρm)3\displaystyle=\frac{\textrm{Tr}(O\rho^{m})}{\textrm{Tr}(\rho^{m})}+\frac{\textrm{Tr}(O\rho^{m})\textrm{Var}[\hat{Y}]^{2}}{\mathcal{N}\textrm{Tr}(\rho^{m})^{3}}
=Tr(Oρm)Tr(ρm)+Tr(Oρm)[1Tr(ρm)2]2𝒩Tr(ρm)3,\displaystyle=\frac{\textrm{Tr}(O\rho^{m})}{\textrm{Tr}(\rho^{m})}+\frac{\textrm{Tr}(O\rho^{m})[1-\textrm{Tr}(\rho^{m})^{2}]^{2}}{\mathcal{N}\textrm{Tr}(\rho^{m})^{3}}, (11)

with error 𝒪(𝒩1)\mathcal{O}(\mathcal{N}^{-1}). The approximation variance of the ratio X^/Y^\langle\hat{X}\rangle/\langle\hat{Y}\rangle Small2010Expansions is

Var[X^Y^]\displaystyle\textrm{Var}\Bigg{[}\frac{\langle\hat{X}\rangle}{\langle\hat{Y}\rangle}\Bigg{]} 𝔼[X^]2Var[Y^]2𝒩𝔼[Y^]4+Var[X^]2𝒩𝔼[Y^]2\displaystyle\approx\frac{\mathbb{E}[\hat{X}]^{2}\textrm{Var}[\hat{Y}]^{2}}{\mathcal{N}\mathbb{E}[\hat{Y}]^{4}}+\frac{\textrm{Var}[\hat{X}]^{2}}{\mathcal{N}\mathbb{E}[\hat{Y}]^{2}}
=Tr(Oρm)2Var[Y^]2𝒩Tr(ρm)4+Var[X^]2𝒩Tr(ρm)2\displaystyle=\frac{\textrm{Tr}(O\rho^{m})^{2}\textrm{Var}[\hat{Y}]^{2}}{\mathcal{N}\textrm{Tr}(\rho^{m})^{4}}+\frac{\textrm{Var}[\hat{X}]^{2}}{\mathcal{N}\textrm{Tr}(\rho^{m})^{2}}
=Tr(Oρm)2[1Tr(ρm)2]2𝒩Tr(ρm)4\displaystyle=\frac{\textrm{Tr}(O\rho^{m})^{2}[1-\textrm{Tr}(\rho^{m})^{2}]^{2}}{\mathcal{N}\textrm{Tr}(\rho^{m})^{4}}
+k=1No|ak|2[1Tr(Pkρm)2]𝒩Tr(ρm)2,\displaystyle+\frac{\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\textrm{Tr}(P_{k}\rho^{m})^{2}]}{\mathcal{N}\textrm{Tr}(\rho^{m})^{2}}, (12)

with error 𝒪(𝒩1)\mathcal{O}(\mathcal{N}^{-1}), where we have used the following results,

Var[X^]=k=1No|ak|2[1Tr(Pkρm)2],\displaystyle\textrm{Var}[\hat{X}]=\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\textrm{Tr}(P_{k}\rho^{m})^{2}], (13)
Var[Y^]=1Tr(ρm)2.\displaystyle\textrm{Var}[\hat{Y}]=1-\textrm{Tr}(\rho^{m})^{2}. (14)

In particular, when ρ\rho is a pure state, the variance reduces to

Var[X^Y^]k=1No|ak|2[1Tr(Pkρm)2]𝒩.\displaystyle\textrm{Var}\Bigg{[}\frac{\langle\hat{X}\rangle}{\langle\hat{Y}\rangle}\Bigg{]}\approx\frac{\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\textrm{Tr}(P_{k}\rho^{m})^{2}]}{\mathcal{N}}. (15)

The variance estimation provides an approach to evaluate the required number of samples. Assuming a desired variance is Δ2\Delta^{2}, Eq. (III.2) implies that the number of samples

𝒩\displaystyle\mathcal{N} Tr(Oρm)2[1Tr(ρm)2]2Δ2Tr(ρm)4\displaystyle\approx\frac{\textrm{Tr}(O\rho^{m})^{2}[1-\textrm{Tr}(\rho^{m})^{2}]^{2}}{\Delta^{2}\textrm{Tr}(\rho^{m})^{4}}
+k=1No|ak|2[1Tr(Pkρm)2]Δ2Tr(ρm)2.\displaystyle+\frac{\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\textrm{Tr}(P_{k}\rho^{m})^{2}]}{\Delta^{2}\textrm{Tr}(\rho^{m})^{2}}. (16)

In the work Huggins2021Virtual the authors analyzed the variance of the estimator for m=2m=2. Here, we present an approximation for the mean and variance of the estimator for arbitrary mm.

Refer to caption
Figure 2: Figures (a-d): The quantum circuits for estimating the real part of Tr(Oρ8)\textrm{Tr}(O\rho^{8}) for nn-qubit state ρ\rho. Figures (e-h): The quantum circuits for estimating the real part of Tr(Oρ9)\textrm{Tr}(O\rho^{9}) for nn-qubit state ρ\rho. Both circuits are from Proposition 1. When setting σk=I\sigma_{k}=I, the circuits cover the real parts of Tr(ρ8)\textrm{Tr}(\rho^{8}) and Tr(ρ9)\textrm{Tr}(\rho^{9}). The imaginary parts of these quantities can be estimated by measuring the ancilla qubits in the basis of Pauli operator σy\sigma_{y} and here ρ1==ρ8=ρ9=ρ\rho_{1}=\cdots=\rho_{8}=\rho_{9}=\rho.
Refer to caption
Figure 3: Figures (a-d): The quantum circuits for estimating the real part of Tr(Oρ8)\textrm{Tr}(O\rho^{8}) for nn-qubit state ρ\rho. (a) s=4s=4, h(8,4)=2h(8,4)=2; (b) s=3s=3, h(8,3)=3h(8,3)=3; (c) s=2s=2, h(8,2)=4h(8,2)=4 and (d) s=1s=1, h(8,1)=7h(8,1)=7. Figures (e-h): The quantum circuits for estimating the real part of Tr(Oρ9)\textrm{Tr}(O\rho^{9}) for nn-qubit state ρ\rho. (e) s=4s=4, h(9,4)=2h(9,4)=2; (f) s=3s=3, h(9,3)=4h(9,3)=4; (g) s=2s=2, h(9,2)=4h(9,2)=4 and (h) s=1s=1, h(9,1)=8h(9,1)=8. Both circuits are given by Proposition 2. When taking σk=I\sigma_{k}=I, the circuits cover the real parts of Tr(ρ8)\textrm{Tr}(\rho^{8}) and Tr(ρ9)\textrm{Tr}(\rho^{9}). The imaginary parts of these quantities can be estimated via the measurement on the ancilla qubits in the basis of Pauli operator σy\sigma_{y}.

III.3 Concrete construction of a family of circuits and noisy implementation

In this section, we first show the circuit construction of 88 and 99 nn-qubit density matrices. The permutations π8=(1,2,,8)\pi_{8}=(1,2,\cdots,8) and π9=(1,2,,9)\pi_{9}=(1,2,\cdots,9) have decompositions

π8\displaystyle\pi_{8} =(4,6)(3,7)(2,8)(4,5)(3,6)(2,7)(1,8),\displaystyle=(4,6)(3,7)(2,8)(4,5)(3,6)(2,7)(1,8), (17)
π9\displaystyle\pi_{9} =(5,6)(4,7)(3,8)(2,9)(4,6)(3,7)(2,8)(1,9),\displaystyle=(5,6)(4,7)(3,8)(2,9)(4,6)(3,7)(2,8)(1,9), (18)

where each of transpositions denotes the nn SWAP gates. Based on Proposition 1, Fig. 2 (a-d) show 44 circuits for computing Tr(Oρ8)\textrm{Tr}(O\rho^{8}) and Tr(ρ8)\textrm{Tr}(\rho^{8}) for s=1,2,3,4s=1,2,3,4. The total number of qubits is s+8ns+8n including ss ancilla qubits. The depth is 2n2n, 3n3n, 4n4n and 7n7n. Fig. 2 (e-h) show 44 circuits for computing Tr(Oρ9)\textrm{Tr}(O\rho^{9}) and Tr(ρ9)\textrm{Tr}(\rho^{9}) for s=1,2,3,4s=1,2,3,4. The total number of qubits is s+9ns+9n including ss ancilla qubits. The depth is 2n2n, 3n3n, 4n4n and 8n8n. Fig. 3 is a parallelized version of Fig. 2 as shown in Proposition 2. The total number of qubits is (s+8)n(s+8)n including snsn ancilla qubits for s=1,2,3,4s=1,2,3,4. The depth is 22, 33, 44 and 77, respectively.

Here, we consider the effect of noise in the quantum circuits for different width and depth. Suppose we prepare an exact state ρ=U(𝜶)(|00||00|)U(𝜶)\rho=U(\boldsymbol{\alpha})(|0\rangle\langle 0|\otimes|0\rangle\langle 0|)U^{{\dagger}}(\boldsymbol{\alpha}) by a parameterized quantum circuit

U(𝜶)=i=12CNOT×(Ry(αi)Ry(αi+1)),\displaystyle U(\boldsymbol{\alpha})=\prod_{i=1}^{2}\mbox{CNOT}\times(R_{y}(\alpha_{i})\otimes R_{y}(\alpha_{i+1})), (19)

where Ry(αi)=eιαi/2σyR_{y}(\alpha_{i})=e^{-\iota\alpha_{i}/2\sigma_{y}} is the rotation operator, CNOT denotes the controlled-NOT gate and the parameters (α1,α2,α3,α4)=(0.8147,0.1270,0.2785,0.5469)(\alpha_{1},\alpha_{2},\alpha_{3},\alpha_{4})=(0.8147,0.1270,0.2785,0.5469). The state ρ\rho after the depolarizing noise channel is given by

ρnoise=𝒞depo(γ0,ρ)=(1γ0)ρ+γ0I4,γ0(0,1).\displaystyle\rho_{\textrm{noise}}=\mathcal{C}_{\textrm{depo}}(\gamma_{0},\rho)=(1-\gamma_{0})\rho+\gamma_{0}\frac{I}{4},~{}~{}~{}\gamma_{0}\in(0,1). (20)

We measure the expectation value of observable O=(σz(1)+σz(2))/2O=(\sigma_{z}^{(1)}+\sigma_{z}^{(2)})/2 with respect to states ρ\rho and ρnoise\rho_{\textrm{noise}}. Fig. 4 shows two circuits for estimating Tr(Oρnoise5)\mbox{Tr}(O\rho_{\textrm{noise}}^{5}) and Tr(ρnoise5)\mbox{Tr}(\rho_{\textrm{noise}}^{5}). After each layer, we insert a depolarizing channel with parameter γ\gamma for simulating the noise effects. Two and four depolarizing channels are required for Fig. 4.(a) and (b), respectively. The ideal expectation value is O=Tr(Oρ)=0.7547\langle O\rangle=\mbox{Tr}(O\rho)=0.7547 and noise result Onoise=Tr(Oρnoise)=0.4528\langle O\rangle_{\textrm{noise}}=\mbox{Tr}(O\rho_{\textrm{noise}})=0.4528 for γ0=0.4\gamma_{0}=0.4. The corrected expectation value after VD and our circuits is given by

Ovd(5)=Tr(Oρnoise5)Tr(ρnoise5).\displaystyle\langle O\rangle_{\textrm{vd}}^{(5)}=\frac{\mbox{Tr}(O\rho_{\textrm{noise}}^{5})}{\mbox{Tr}(\rho_{\textrm{noise}}^{5})}. (21)

As shown in Table 1, by numerical calculation it is shown that our circuits can still mitigate the error even in the presence of the depolarizing noise channel.

Refer to caption
Figure 4: The quantum circuits for estimating the real part of Tr(Oρnoise5)\textrm{Tr}(O\rho_{\textrm{noise}}^{5}) for 22-qubit state ρnoise\rho_{\textrm{noise}} with σk=σz(1)\sigma_{k}=\sigma_{z}^{(1)} or σz(2)\sigma_{z}^{(2)}, where ρ1==ρ5=ρnoise\rho_{1}=\cdots=\rho_{5}=\rho_{\textrm{noise}}. (a) s=2s=2, h(5,2)=2h(5,2)=2; (b) s=1s=1, h(5,1)=4h(5,1)=4. When σk=I\sigma_{k}=I, the circuits cover the real part of Tr(ρnoise5)\textrm{Tr}(\rho_{\textrm{noise}}^{5}).
γ\gamma 0.2 0.4 0.6 0.8
Fig. 4.(a) Ovd(5)\langle O\rangle_{\textrm{vd}}^{(5)} 0.7546 0.7546 0.7546 0.7546
Fig. 4.(b) Ovd(5)\langle O\rangle_{\textrm{vd}}^{(5)} 0.7546 0.7546 0.7546 0.7546
Table 1: The corrected expectation value Ovd(5)\langle O\rangle_{\textrm{vd}}^{(5)} for different parameters γ\gamma.

IV Conclusion and discussion

We have proposed an unified quantum algorithm for estimating the multivariate trace. Our results depend on a qubit-depth trade-off relation which helps us to construct a family of circuits. The designed circuits have flexible depth and number of qubits, but the total number of quantum gates is always 𝒪(mn)\mathcal{O}(mn). These proposals can be used as an important subroutine for estimating entanglement spectroscopy Yirka2021Qubit , quantum metrology Yamamoto2022Error and calculating the nonlinear function of density matrix Ekert2002Direct . Moreover, we have applied the UMT to achieve the exponential error suppression for quantum error mitigation and numerically find that our circuits still work in the noise situation of the global depolarizing channel. Notice that the recent work Zhou2022A estimated the MT with randomized measurement and further reduced the overhead of qubits Elben2022The . However, the number of depth remains the same as the qubit-optimal method Ekert2002Direct . Our algorithm gives also an alternative of the quantum parts in Zhou2022A .

There are two proposals involving dual-state purification Huo2022Dual ; Cai2021Resource , in which only the single ancillary qubit and the implementation of the dual channel are required. The related framework utilizes qubit reset technique to reduce the number of qubits Cai2021Resource compared with our circuit for s=1s=1. However, the circuit depth is still 𝒪(m)\mathcal{O}(m). Our family of circuits provides alternatives to reduce the circuit depth under the consume of the number of qubits. In general, depth is a more important index than qubit overhead for a quantum circuit. It is also worth to remark that our approaches utilize ancillary system to estimate Tr(Oρm)\textrm{Tr}(O\rho^{m}). For m=2m=2, a well-known destructive SWAP test Escartin2013SWAP ; Cincio2018Learning achieves the same goal without ancillary qubit and, at the same time, also reduces the circuit depth to a constant. However, one requires to measure multiple qubits which may increase the measurement overhead.

Notably, we have dealt with the problem for arbitrary positive integer mm. Estimating a nonlinear function of quantum states is also of fundamental and practical interests. For instance, the fidelity involves a square root of a quantum state Nielsen2000Quantum , while the Tsallis entropies are defined by SαT(ρ)=11α[Tr(ρα)1]S^{T}_{\alpha}(\rho)=\frac{1}{1-\alpha}[\textrm{Tr}(\rho^{\alpha})-1] for α(0,1)(1,)\alpha\in(0,1)\cup(1,\infty) Tsallis1988Possible . Although direct estimation is hard on a quantum computer, a hybrid quantum-classical framework Tan2021Variational makes the computation plausible by combining quantum state learning Lee2018Learning and approximating fractional powers. The core idea is to minimize the quantum purity which involves an estimation of MT. Thus, our proposals can be generalized to calculate a nonlinear function of quantum states in a roundabout way.

Several interesting issues should further be investigated in the near future. The first one is to simulate the effects of different types of noise in the circuit implementation of estimating Tr(Oρm)\textrm{Tr}(O\rho^{m}) such as the amplitude damping and phase damping channels. It would be also interesting to explore circuit structures that reduce the width and depth simultaneously. However, for the calculation of MT estimation, the qubit-depth trade-off shows that the circuit depth and width are complementary computational resources. In particular, a reduction in circuit depth is often accompanied by an increase in width, and vice versa. Current available quantum computation devices often have small size in number of qubits and circuit depth. Thus, from the view of computational resources the circuit depth and width should be reduced as much as possible in quantum algorithm design. Due to the friendly decomposition of the cyclic permutation, the circuit depth in our algorithms is reduced coincidentally, although the number of ancilla qubits is increased in a control range. Our results may highlight further investigations on the depth reduction for general quantum circuits.


Acknowledgements: This work is supported by the National Natural Science Foundation of China (NSFC) under Grants 12075159 and 12171044; Beijing Natural Science Foundation (Grant No. Z190005); the Academician Innovation Platform of Hainan Province.

Appendix A The proof of Theorem 1

The Theorem 1 can be proved in a way similar to the one given in Quek2022Multivariate , by starting with the ss-qubit GHZ states. Suppose we have efficiently prepared an ss-qubit GHZ state |GHZs=12(|0s+|1s)|\mbox{GHZ}_{s}\rangle=\frac{1}{\sqrt{2}}(|0\rangle^{\otimes s}+|1\rangle^{\otimes s}) by a constant-depth quantum circuit Quek2022Multivariate . Eq. (2) provides a direct way to estimate MT by calculating the real part Re[Tr(S(m)ρ(n,m))]\textrm{Re}[\mbox{Tr}(S^{(m)}\rho^{(n,m)})] and the imaginary part Im[Tr(S(m)ρ(n,m))]\mbox{Im}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})], where ρ(n,m)=ρa1ρa2ρam\rho^{(n,m)}=\rho_{a_{1}}\otimes\rho_{a_{2}}\otimes\cdots\otimes\rho_{a_{m}} is an arrangement of mm states {ρ1,ρ2,,ρm}\{\rho_{1},\rho_{2},\cdots,\rho_{m}\}. Due to the equivalence of Propositions 1 and 2, here we only complete the proof according to the circuits from Proposition 1.

UMT estimation implements (m1)n=𝒪(mn)(m-1)n=\mathcal{O}(mn) controlled SWAP on an initial state Φ0=|GHZsGHZs|ρ(n,m)\Phi_{0}=|\mbox{GHZ}_{s}\rangle\langle\mbox{GHZ}_{s}|\otimes\rho^{(n,m)}, giving rise to the state

Φ1=12(Φ1(1)+Φ1(2)+Φ1(3)+Φ1(4)),\displaystyle\Phi_{1}=\frac{1}{2}(\Phi_{1}^{(1)}+\Phi_{1}^{(2)}+\Phi_{1}^{(3)}+\Phi_{1}^{(4)}), (22)

where

Φ1(1)=|00|sρ(n,m),\displaystyle\Phi_{1}^{(1)}=|0\rangle\langle 0|^{\otimes s}\otimes\rho^{(n,m)}, (23)
Φ1(2)=|01|sρ(n,m)(S(m)),\displaystyle\Phi_{1}^{(2)}=|0\rangle\langle 1|^{\otimes s}\otimes\rho^{(n,m)}(S^{(m)})^{{\dagger}}, (24)
Φ1(3)=|10|sS(m)ρ(n,m),\displaystyle\Phi_{1}^{(3)}=|1\rangle\langle 0|^{\otimes s}\otimes S^{(m)}\rho^{(n,m)}, (25)
Φ1(4)=|11|sS(m)ρ(n,m)(S(m)).\displaystyle\Phi_{1}^{(4)}=|1\rangle\langle 1|^{\otimes s}\otimes S^{(m)}\rho^{(n,m)}(S^{(m)})^{{\dagger}}. (26)

Next, we measure the ss ancillary qubits in the basis of Pauli operator σx\sigma_{x} and record 𝒬i=0\mathcal{Q}_{i}=0 or 𝒬i=1\mathcal{Q}_{i}=1, i=1,,si=1,\cdots,s, with respect to the measurement outcomes |+=12(|0+|1)|+\rangle=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle) or |=12(|0|1)|-\rangle=\frac{1}{\sqrt{2}}(|0\rangle-|1\rangle), respectively. After one time measurement, obtaining a bit string (𝒬1𝒬i𝒬s)(\mathcal{Q}_{1}\cdots\mathcal{Q}_{i}\cdots\mathcal{Q}_{s}), 𝒬i{0,1}\mathcal{Q}_{i}\in\{0,1\}, means that the state of the ancillary registers has collapsed to |𝒬1|𝒬i|𝒬s|\mathcal{Q}_{1}\rangle\otimes\cdots\otimes|\mathcal{Q}_{i}\rangle\otimes\cdots\otimes|\mathcal{Q}_{s}\rangle, where |𝒬i=12(|0+(1)𝒬i|1)|\mathcal{Q}_{i}\rangle=\frac{1}{\sqrt{2}}(|0\rangle+(-1)^{\mathcal{Q}_{i}}|1\rangle).

The probability of obtaining a bit string (𝒬1𝒬i𝒬s)(\mathcal{Q}_{1}\cdots\mathcal{Q}_{i}\cdots\mathcal{Q}_{s}) is given by

Pr(𝒬1𝒬s)\displaystyle\textrm{Pr}(\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}) =Tr[M𝒬1𝒬sΦ1]\displaystyle=\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}]
=12(Tr[M𝒬1𝒬sΦ1(1)]+Tr[M𝒬1𝒬sΦ1(2)]\displaystyle=\frac{1}{2}(\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(1)}]+\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(2)}]
+Tr[M𝒬1𝒬sΦ1(3)]+Tr[M𝒬1𝒬sΦ1(4)]),\displaystyle+\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(3)}]+\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(4)}]),

where the measurement operator

M𝒬1𝒬s=[i=1s|𝒬i𝒬i|]I.\displaystyle M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}=\Big{[}\bigotimes_{i=1}^{s}|\mathcal{Q}_{i}\rangle\langle\mathcal{Q}_{i}|\Big{]}\otimes I. (27)

Hence, we have

Tr[M𝒬1𝒬sΦ1(1)]=Tr[M𝒬1𝒬s|00|sρ(n,m)]=12s,\displaystyle\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(1)}]=\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}|0\rangle\langle 0|^{\otimes s}\otimes\rho^{(n,m)}]=\frac{1}{2^{s}},
Tr[M𝒬1𝒬sΦ1(4)]=Tr[M𝒬1𝒬s|11|sρ(n,m)]=12s,\displaystyle\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(4)}]=\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}|1\rangle\langle 1|^{\otimes s}\otimes\rho^{(n,m)}]=\frac{1}{2^{s}},
Tr[M𝒬1𝒬sΦ1(2)]=(1)i=1s𝒬i2sTr(ρ(n,m)(S(m))),\displaystyle\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(2)}]=\frac{(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}}{2^{s}}\textrm{Tr}(\rho^{(n,m)}(S^{(m)})^{{\dagger}}),
Tr[M𝒬1𝒬sΦ1(3)]=(1)i=1s𝒬i2sTr(S(m)ρ(n,m)).\displaystyle\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}^{(3)}]=\frac{(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}}{2^{s}}\textrm{Tr}(S^{(m)}\rho^{(n,m)}).

Now the probability takes the form,

Pr(𝒬1𝒬s)\displaystyle\textrm{Pr}(\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}) =Tr[M𝒬1𝒬sΦ1]\displaystyle=\textrm{Tr}[M_{\mathcal{Q}_{1}\cdots\mathcal{Q}_{s}}\Phi_{1}]
=1+(1)i=1s𝒬iRe[Tr(S(m)ρ(n,m))]2s.\displaystyle=\frac{1+(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}\textrm{Re}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})]}{2^{s}}. (28)

Thus, the mean of random variable 𝒬^=(𝒬1,,𝒬s)\hat{\mathcal{Q}}=(\mathcal{Q}_{1},\cdots,\mathcal{Q}_{s}) is

𝔼[𝒬^]\displaystyle\mathbb{E}[\hat{\mathcal{Q}}] =𝒬i{0,1}Pr(𝒬1𝒬s)(1)i=1s𝒬i\displaystyle=\sum_{\mathcal{Q}_{i}\in\{0,1\}}\textrm{Pr}(\mathcal{Q}_{1}\cdots\mathcal{Q}_{s})(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}
=𝒬i{0,1}(1)i=1s𝒬i+Re[Tr(S(m)ρ(n,m))]2s\displaystyle=\sum_{\mathcal{Q}_{i}\in\{0,1\}}\frac{(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}+\textrm{Re}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})]}{2^{s}}
=Re[Tr(S(m)ρ(n,m))],\displaystyle=\textrm{Re}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})], (29)

where in the last equality we have utilized the property 𝒬i{0,1}(1)i=1s𝒬i=0\sum_{\mathcal{Q}_{i}\in\{0,1\}}(-1)^{\sum_{i=1}^{s}\mathcal{Q}_{i}}=0. The variance of 𝒬^\hat{\mathcal{Q}} is given by

Var[𝒬^]\displaystyle\textrm{Var}[\hat{\mathcal{Q}}] =𝔼[𝒬^2](𝔼[𝒬^])2\displaystyle=\mathbb{E}[\hat{\mathcal{Q}}^{2}]-(\mathbb{E}[\hat{\mathcal{Q}}])^{2}
=1(Re[Tr(S(m)ρ(n,m))])2.\displaystyle=1-\Big{(}\textrm{Re}[\textrm{Tr}(S^{(m)}\rho^{(n,m)})]\Big{)}^{2}. (30)

Given a sample of size 𝒩\mathcal{N}. Consider 𝒩\mathcal{N} independent random variables Q^(1),,Q^(j),,Q^(𝒩)\hat{Q}^{(1)},\cdots,\hat{Q}^{(j)},\cdots,\hat{Q}^{(\mathcal{N})}, where each Q^(j)=(Q^1(j),,Q^i(j),,Q^s(j))\hat{Q}^{(j)}=(\hat{Q}_{1}^{(j)},\cdots,\hat{Q}_{i}^{(j)},\cdots,\hat{Q}_{s}^{(j)}), Q^i(j)[0,1]\hat{Q}_{i}^{(j)}\in[0,1] for j=1,,𝒩j=1,\cdots,\mathcal{N} and i=1,,si=1,\cdots,s, corresponding to one measurement results via running the above circuit one time. The mean of 𝒬^\hat{\mathcal{Q}} is then estimated by the sample mean,

Q^=j=1𝒩Q^(j)𝒩.\displaystyle\langle\hat{Q}\rangle=\sum_{j=1}^{\mathcal{N}}\frac{\hat{Q}^{(j)}}{\mathcal{N}}. (31)

Let ε0,1\varepsilon\in{0,1} be a precision and δ\delta a constant such that δ(0,1)\delta\in(0,1). From the Hoeffding’s inequality Hoeffding1963Probability we have

Pr(|Q^𝔼[Q^]|ε/2)1δ,\displaystyle\textrm{Pr}(|\langle\hat{Q}\rangle-\mathbb{E}[\hat{Q}]|\leq\varepsilon/2)\geq 1-\delta, (32)

and the sample complexity 𝒩=𝒪(ε2ln(δ1))\mathcal{N}=\mathcal{O}(\varepsilon^{-2}\ln(\delta^{-1})).

Similarly, one implements a phase gate (mapping |0|0|0\rangle\rightarrow|0\rangle and |1ι|1|1\rangle\rightarrow\iota|1\rangle, ι=1\iota=\sqrt{-1}) on each ancillary qubit before taking measurement to obtain the estimation of imaginary part. For the random variable R^\hat{R} we yield a similar result,

Pr(|R^𝔼[R^]|ε/2)1δ,\displaystyle\textrm{Pr}(|\langle\hat{R}\rangle-\mathbb{E}[\hat{R}]|\leq\varepsilon/2)\geq 1-\delta, (33)

where the expectation value 𝔼[R^]=Im[Tr(S(m)ρ(n,m))]\mathbb{E}[\hat{R}]=\mbox{Im}[\mbox{Tr}(S^{(m)}\rho^{(n,m)})] and the variance Var[R^]=1(Im[Tr(S(m)ρ(n,m))])2\mbox{Var}[\hat{R}]=1-(\mbox{Im}[\mbox{Tr}(S^{(m)}\rho^{(n,m)})])^{2}. Define V^={Q^,R^}\hat{V}=\{\hat{Q},\hat{R}\}. The mean of V^\hat{V} is an estimation of the MT and satisfies |V^𝔼[V^]|ε|\langle\hat{V}\rangle-\mathbb{E}[\hat{V}]|\leq\varepsilon, where the mean 𝔼[V^]=Tr(ρ1ρm)\mathbb{E}[\hat{V}]=\mbox{Tr}(\rho_{1}\cdots\rho_{m}). The variance of V^\hat{V} is given by Var[V^]=1(Tr(ρ1ρm))2\mbox{Var}[\hat{V}]=1-(\mbox{Tr}(\rho_{1}\cdots\rho_{m}))^{2}.

Appendix B The proof of Theorem 2

We observe that the numerator of Eq. (4) is

Tr(Oρm)=Tr(O~(i)S(m)ρρ),\displaystyle\mbox{Tr}(O\rho^{m})=\mbox{Tr}(\tilde{O}^{(i)}S^{(m)}\rho\otimes\cdots\otimes\rho), (34)

where the observable O~(i)=IO(i)I\tilde{O}^{(i)}=I\otimes\cdots O^{(i)}\otimes\cdots\otimes I and O(i)O^{(i)} denotes the operator OO acting on the iith register. Let O=k=1NoakPkO=\sum_{k=1}^{N_{o}}a_{k}P_{k}, aka_{k}\in\mathbb{R}, be an efficient decomposition of OO, where PkP_{k} are tensor products of Pauli operators. It is straightforward to show that the trace Tr(Oρm)\mbox{Tr}(O\rho^{m}) is a linear combination of NoN_{o} MT estimations,

Tr(Oρm)\displaystyle\mbox{Tr}(O\rho^{m}) =k=1NoakTr(Pkρm)\displaystyle=\sum_{k=1}^{N_{o}}a_{k}\mbox{Tr}(P_{k}\rho^{m}) (35)
=k=1NoakTr[Pk(i)S(m)(ρρ)].\displaystyle=\sum_{k=1}^{N_{o}}a_{k}\mbox{Tr}[P_{k}^{(i)}S^{(m)}(\rho\otimes\cdots\otimes\rho)]. (36)

The real and imaginary parts of Tr(Oρm)\mbox{Tr}(O\rho^{m}) can be estimated separately by using similar circuit procedure. Thus, we here only consider the estimation of the real part

Re[Tr(Oρm)]=k=1NoakRe(Tr[Pk(i)S(m)(ρρ)]).\displaystyle\mbox{Re}[\mbox{Tr}(O\rho^{m})]=\sum_{k=1}^{N_{o}}a_{k}\mbox{Re}(\mbox{Tr}[P_{k}^{(i)}S^{(m)}(\rho\otimes\cdots\otimes\rho)]). (37)

After implementing the sequences of controlled SWAP gate, we perform a controlled PkP_{k} on an arbitrary register storing the state ρ\rho. Theorem 1 calculates Re(Tr[Pk(i)S(m)(ρρ)])\mbox{Re}(\mbox{Tr}[P_{k}^{(i)}S^{(m)}(\rho\otimes\cdots\otimes\rho)]) by producing a random variable W^k\hat{W}_{k} that can be calculated by using 𝒪(εk2log(δ1))\mathcal{O}(\varepsilon_{k}^{-2}\log(\delta^{-1})) repetitions of a quantum circuit (designed via propositions 1 and 2) consisting of 𝒪(mn)\mathcal{O}(mn) controlled SWAP gates such that

Pr(|W^kRe(Tr(Pkρm))|εk)1δ,\displaystyle\mbox{Pr}(|\langle\hat{W}_{k}\rangle-\mbox{Re}(\mbox{Tr}(P_{k}\rho^{m}))|\leq\varepsilon_{k})\geq 1-\delta, (38)

where εk(0,1)\varepsilon_{k}\in(0,1), δ(0,1)\delta\in(0,1) and W^k\langle\hat{W}_{k}\rangle is the sample mean of variable W^k\hat{W}_{k}. The variance is Var[W^k]=1|Re(Tr(Pkρm))|2\mbox{Var}[\hat{W}_{k}]=1-|\textrm{Re}(\textrm{Tr}(P_{k}\rho^{m}))|^{2}. Let 𝒲^=k=1NoakW^k\hat{\mathcal{W}}=\sum_{k=1}^{N_{o}}a_{k}\hat{W}_{k} be a new random variable. The mean of variable 𝒲^\hat{\mathcal{W}} has the form,

𝔼[𝒲^]=k=1Noak𝔼[W^k]=k=1NoakRe(Tr(Pkρm)).\displaystyle\mathbb{E}[\hat{\mathcal{W}}]=\sum_{k=1}^{N_{o}}a_{k}\mathbb{E}[\hat{W}_{k}]=\sum_{k=1}^{N_{o}}a_{k}\mbox{Re}(\mbox{Tr}(P_{k}\rho^{m})). (39)

Its variance is given by

Var[𝒲^]\displaystyle\mbox{Var}[\hat{\mathcal{W}}] =k=1No|ak|2Var[W^k]\displaystyle=\sum_{k=1}^{N_{o}}|a_{k}|^{2}\mbox{Var}[\hat{W}_{k}]
=k=1No|ak|2[1(Re(Tr(Pkρm)))2]\displaystyle=\sum_{k=1}^{N_{o}}|a_{k}|^{2}[1-\left(\mbox{Re}(\mbox{Tr}(P_{k}\rho^{m}))\right)^{2}]
k=1No|ak|2,\displaystyle\leq\sum_{k=1}^{N_{o}}|a_{k}|^{2}, (40)

where the last inequality is due to the facts that W^1,,W^No\hat{W}_{1},\cdots,\hat{W}_{N_{o}} are independent and each quantity Var[W^k]1\mbox{Var}[\hat{W}_{k}]\leq 1. We remark that the quantity (k=1No|ak|)2\left(\sum_{k=1}^{N_{o}}|a_{k}|\right)^{2} indicates the spread of data from mean 𝔼[𝒲^]\mathbb{E}[\hat{\mathcal{W}}]. Again, the mean 𝔼[𝒲^]\mathbb{E}[\hat{\mathcal{W}}] can be calculated by repeating the procedure 𝒩f\mathcal{N}_{f} times, such that 𝔼[𝒲^]𝒲^=1𝒩fl=1𝒩f𝒲^(l)\mathbb{E}[\hat{\mathcal{W}}]\approx\langle\hat{\mathcal{W}}\rangle=\frac{1}{\mathcal{N}_{f}}\sum_{l=1}^{\mathcal{N}_{f}}\hat{\mathcal{W}}^{(l)}, where 𝒲^(l)\hat{\mathcal{W}}^{(l)} is the measurement result on the ll-th iteration. Moreover, the error of the estimator is

|𝒲^Re(Tr(Oρm))|\displaystyle|\langle\hat{\mathcal{W}}\rangle-\textrm{Re}(\mbox{Tr}(O\rho^{m}))| =|𝒲^𝔼[𝒲^]|\displaystyle=\left|\langle\hat{\mathcal{W}}\rangle-\mathbb{E}[\hat{\mathcal{W}}]\right|
=|k=1Noak(W^kRe[Tr(Pkρm)])|\displaystyle=\left|\sum_{k=1}^{N_{o}}a_{k}\left(\langle\hat{W}_{k}\rangle-\textrm{Re}[\mbox{Tr}(P_{k}\rho^{m})]\right)\right|
k=1No|ak|εk<ε,\displaystyle\leq\sum_{k=1}^{N_{o}}\left|a_{k}\right|\varepsilon_{k}<\varepsilon, (41)

where in the last inequality we have set ε1==εk=ε/k=1No|ak|\varepsilon_{1}=\cdots=\varepsilon_{k}=\varepsilon/\sum_{k=1}^{N_{o}}|a_{k}|, and the last equation utilizes the Eq. (38). Using the same trick, we can estimate the imaginary part.

For kk runs from 11 to NoN_{o}, the sample complexity is

𝒩f\displaystyle\mathcal{N}_{f} =k=1No1εk2ln1δ=No(k=1No|ak|)2ε2ln1δ\displaystyle=\sum_{k=1}^{N_{o}}\frac{1}{\varepsilon_{k}^{2}}\ln\frac{1}{\delta}=N_{o}\frac{(\sum_{k=1}^{N_{o}}|a_{k}|)^{2}}{\varepsilon^{2}}\ln\frac{1}{\delta}
=𝒪(Noc2ε2ln1δ).\displaystyle=\mathcal{O}\Big{(}\frac{N_{o}c^{2}}{\varepsilon^{2}}\ln\frac{1}{\delta}\Big{)}. (42)

Therefore, the total number of copies of ρ\rho is 𝒪(mNoc2ε2ln1δ)\mathcal{O}\Big{(}\frac{mN_{o}c^{2}}{\varepsilon^{2}}\ln\frac{1}{\delta}\Big{)}. We set the quantity k=1No|ak|=𝒪(c)\sum_{k=1}^{N_{o}}|a_{k}|=\mathcal{O}(c) bounded by a constant cc. Back to Eq. (B), the variance is also bounded since k=1No|ak|2(k=1No|ak|)2\sum_{k=1}^{N_{o}}|a_{k}|^{2}\leq\left(\sum_{k=1}^{N_{o}}|a_{k}|\right)^{2}.

References