This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Adaptivity can help exponentially for shadow tomography

Sitan Chen     Weiyuan Gong     Zhihan Zhang SEAS, Harvard University. Email: [email protected].SEAS, Harvard University. Email: [email protected].IIIS, Tsinghua University. Email: [email protected].
Abstract

In recent years there has been significant interest in understanding the statistical complexity of learning from quantum data under the constraint that one can only make unentangled measurements. While a key challenge in establishing tight lower bounds in this setting is to deal with the fact that the measurements can be chosen in an adaptive fashion, a recurring theme has been that adaptivity offers little advantage over more straightforward, nonadaptive protocols.

In this note, we offer a counterpoint to this. We show that for the basic task of shadow tomography, protocols that use adaptively chosen two-copy measurements can be exponentially more sample-efficient than any protocol that uses nonadaptive two-copy measurements.

1 Introduction

The ability to extract information from quantum data lies at the heart of quantum information science. While entangled measurements are powerful for this purpose, practical constraints often necessitate the use of unentangled or restricted entangled measurements. This restriction introduces a dichotomy in algorithms: adaptive algorithms can dynamically decide how to measure a new batch of input based on the previous history, while nonadaptive algorithms fix their measurement strategies ahead of time. Establishing tight sample complexity lower bounds is challenging in the adaptive setting.

Surprisingly, for a wide range of quantum learning tasks, adaptivity provably offers no improvement and one can achieve optimal statistical rates using nonadaptive protocols. Examples include quantum state tomography with respect to trace distance [13, 19, 10, 18], shadow tomography with unentangled measurements [15, 7], identity testing and mixedness testing [3, 11, 7], and purity testing [7, 12, 2]. Prior to this note, the only known example where adaptivity helps for quantum learning was for quantum state tomography with respect to infidelity γ\gamma using unentangled measurement, where adaptive protocols require O(23n/γ)O(2^{3n}/\gamma) copies [10] while nonadaptive protocols require Θ(23n/γ2)\Theta(2^{3n}/\gamma^{2}) copies [13]. Here we ask:

Are there natural quantum learning tasks where adaptive measurement strategies significantly outperform nonadaptive ones?

We answer this in the affirmative by showing that adaptivity can help exponentially for the task of Pauli shadow tomography using two-copy measurements. The problem can be formalized as follows:

Problem 1.1 (Pauli shadow tomography).

Given access to copies of an nn-qubit unknown state σ\sigma, as well as a subset AA of the Pauli group 𝒫n\mathcal{P}_{n}, the goal is to estimate the expectation values {tr(Pσ)}PA\{\mathrm{tr}(P\sigma)\}_{P\in A} to within ε\varepsilon additive error.

This problem arises naturally in various contexts like the variational quantum eigensolver, quantum resource theory, quantum process tomography, and stabilizer state learning. A review of this literature is beyond the scope of this note and the reader can refer to the related works section of [9] for a more detailed overview.

Pauli shadow tomography provides a clean testbed for exploring statistical overheads incurred by near-term device constraints when learning quantum states [7, 14, 16] and channels [5, 6, 8, 4]. For instance, it is known that Θ(2n/ε2)\Theta(2^{n}/\varepsilon^{2}) copies are necessary and sufficient for protocols that can only perform unentangled measurements [7], even adaptively chosen ones. On the other hand, there is a protocol which performs Bell basis (nonadaptive) measurements on two copies of σ\sigma at a time which can estimate |tr(Pσ)|\absolutevalue{\mathrm{tr}(P\sigma)} for all P𝒫nP\in\mathcal{P}_{n} using only O(n/ε4)O(n/\varepsilon^{4}) copies. Recently, [9] and [17] improved upon this to give a protocol with the same sample complexity which performs two-copy measurements and can estimate tr(Pσ)\mathrm{tr}(P\sigma) rather than just their absolute values.

Importantly, the latter protocol of [9, 17] is adaptive: in the first step, it uses the protocol of [16] to estimate the absolute values, and in the second, it uses these absolute values to prepare a certain single-copy measurement to learn the signs. Given the absence of strong separations between what can be done adaptively versus nonadaptively, it is reasonable to conjecture that there is a nonadaptive alternative to the protocol in [9, 17].

Theorem 1.2.

Any protocol that estimates tr(Pσ)\mathrm{tr}(P\sigma) for all P𝒫nP\in\mathcal{P}_{n} to additive error less than 11 using nonadaptive two-copy measurements requires Ω(2n/2)\Omega(2^{n/2}) copies of σ\sigma. On the other hand, there is a protocol making adaptively chosen two-copy measurements that only uses O(n)O(n) copies.

To our knowledge, this is the first result that proves an exponential separation between adaptive and nonadaptive protocols in quantum learning, answering a question posed by Aharonov, Cotler, and Qi [1]. The second half of Theorem 1.2 follows from [9, 17]. We provide the proof for the first half, i.e. the lower bound for nonadaptive protocols, in Section 2.2 below.

2 Proof of the main result

2.1 Preliminaries

We begin by recalling some standard notations in quantum information. An nn-qubit quantum state can be written as a positive semi-definite matrix σ2n×2n\sigma\in\mathbb{C}^{2^{n}\times 2^{n}} with tr(σ)=1\mathrm{tr}(\sigma)=1. When the state has rank 11 and thus tr(σ2)=1\mathrm{tr}(\sigma^{2})=1, it is a pure state and can be denoted as |ψ\ket{\psi}.

The nn-qubit Pauli group 𝒫n{I,X,Y,Z}n\mathcal{P}_{n}\coloneqq\{I,X,Y,Z\}^{\otimes n} is the set of nn-qubit Pauli strings, where

I=(1001),X=(0110),Y=(0ii0),Z=(1001)I=\begin{pmatrix}1&0\\ 0&1\end{pmatrix}\,,\qquad X=\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\,,\qquad Y=\begin{pmatrix}0&-i\\ i&0\end{pmatrix}\,,\qquad Z=\begin{pmatrix}1&0\\ 0&-1\end{pmatrix}

are single-qubit Pauli matrices. We will need the following fact for the sum of products of pairs of Pauli matrices (see e.g.  [7, Lemma 4.10] for proof):

P𝒫nPP=2nSWAPn,\displaystyle\sum_{P\in\mathcal{P}_{n}}P\otimes P=2^{n}\,\mathrm{SWAP}_{n}\,, (1)

where SWAPn\mathrm{SWAP}_{n} is the 2n2n-qubit SWAP operator.

A quantum measurement \mathcal{M} can be represented as a set of positive operator-valued measures (POVMs), i.e. a set of positive-semidefinite matrices {Fs}s\{F_{s}\}_{s} satisfying sFs=I\sum_{s}F_{s}=I. Here, each FsF_{s} is a POVM element corresponding to a measurement outcome ss. When we measure a quantum state σ\sigma using the POVM {Fs}s\{F_{s}\}_{s}, the probability of observing outcome ss is tr(Fsσ)\operatorname{tr}(F_{s}\sigma). When FsF_{s} is rank-11 for all ss, it is a rank-1 POVM[7, Lemma 4.8] shows that any POVM can be information-theoretically simulated by a rank-11 POVM and classical post-processing. In the following, we will thus assume without loss of generality when we prove our lower bounds that all measurements in the learning protocols we consider only use rank-11 POVMs. For measurements over two copies, these can be written as

{ws|ψsψs|}s,\displaystyle\{w_{s}\ket{\psi_{s}}\bra{\psi_{s}}\}_{s}\,,

for 2n2n-qubit pure states {|ψs}\{\ket{\psi_{s}}\} and nonnegative weights wsw_{s} with sws=22n\sum_{s}w_{s}=2^{2n}.

2.2 Proof of the nonadaptive lower bound

To prove the nonadaptive lower bound in Theorem 1.2, we consider a distinguishing task in which we are given access to copies of an unknown nn-qubit quantum state σ\sigma and want to distinguish between two cases:

  • σ\sigma is sampled from {σa+=I+Pa2n}a=14n1\Bigl{\{}\sigma_{a}^{+}=\frac{I+P_{a}}{2^{n}}\Bigr{\}}_{a=1}^{4^{n}-1} uniformly for Pa𝒫n\IP_{a}\in\mathcal{P}_{n}\backslash I.

  • σ\sigma is sampled from {σa=IPa2n}a=14n1\Bigl{\{}\sigma_{a}^{-}=\frac{I-P_{a}}{2^{n}}\Bigr{\}}_{a=1}^{4^{n}-1} uniformly for Pa𝒫n\IP_{a}\in\mathcal{P}_{n}\backslash I.

If we have a protocol that can solve 1.1 for all P𝒫nP\in\mathcal{P}_{n} to additive error less than 11 with high probability, then we can solve this distinguishing task using the same protocol. It thus suffices to prove a lower bound on the number of copies of σ\sigma needed to solve the distinguishing problem with high probability.

For any nonadaptive protocols using at most TT two-copy measurements, we can assume that the protocol nonadaptively picks the POVM measurement t={wst|ψstψst|}s\mathcal{M}_{t}=\{w_{s}^{t}\ket{\psi_{s}^{t}}\bra{\psi_{s}^{t}}\}_{s}, where each |ψst\ket{\psi_{s}^{t}} is a 2n2n-qubit pure state and swst=22n\sum_{s}w_{s}^{t}=2^{2n}, in the tt-th iteration. Given an a{1,,4n1}a\in\{1,...,4^{n}-1\}, we denote by pa,t+p^{+}_{a,t} and pa,tp^{-}_{a,t} the probability distributions over outcomes sts_{t} from measuring σa+\sigma_{a}^{+} and σa\sigma_{a}^{-} respectively using t\mathcal{M}_{t}. We also denote by pa+p^{+}_{a} and pap^{-}_{a} the probability distribution over the TT measurement outcomes s1,,sTs_{1},...,s_{T} for measuring σa+\sigma_{a}^{+} and σa\sigma_{a}^{-}.

By Le Cam’s lemma [20] for hypothesis testing, if the total variation distance between the average probability distributions 𝔼a[pa+]\mathbb{E}_{a}[p^{+}_{a}] and 𝔼a[pa]\mathbb{E}_{a}[p^{-}_{a}] is bounded by o(1)o(1) for any nonadaptive sequence of two-copy POVM measurements {t}t=1T\{\mathcal{M}_{t}\}_{t=1}^{T}, i.e.,

TV(𝔼a[pa+],𝔼a[pa])o(1),\displaystyle\text{TV}(\mathbb{E}_{a}[p^{+}_{a}],\mathbb{E}_{a}[p^{-}_{a}])\leqslant o(1),

then any nonadaptive protocols using at most TT two-copy measurements cannot distinguish between these two cases with high probability.

Note that for a sequence of nonadaptive measurements, the probability distribution pa+p^{+}_{a} and pap^{-}_{a} can be written as a tensor product of the probability distributions for each individual measurement pa,t+p^{+}_{a,t} and pa,tp^{-}_{a,t}. We thus have

TV(𝔼a[pa+],𝔼a[pa])\displaystyle\text{TV}(\mathbb{E}_{a}[p^{+}_{a}],\mathbb{E}_{a}[p^{-}_{a}]) 𝔼a[TV(pa+,pa)]=𝔼a[TV(t=1Tpa,t+,t=1Tpa,t)]𝔼a[t=1TTV(pa,t+,pa,t)]\displaystyle\leqslant\mathbb{E}_{a}[\text{TV}(p^{+}_{a},p^{-}_{a})]=\mathbb{E}_{a}\Bigl{[}\text{TV}\Bigl{(}\bigotimes_{t=1}^{T}p^{+}_{a,t},\bigotimes_{t=1}^{T}p^{-}_{a,t}\Bigr{)}\Bigr{]}\leqslant\mathbb{E}_{a}\Bigl{[}\sum_{t=1}^{T}\text{TV}(p^{+}_{a,t},p^{-}_{a,t})\Bigr{]}
=t=1T𝔼a[TV(pa,t+,pa,t)]Tmaxtwo copy t𝔼a[TV(pa,t+,pa,t)]\displaystyle=\sum_{t=1}^{T}\mathbb{E}_{a}[\text{TV}(p^{+}_{a,t},p^{-}_{a,t})]\leqslant T\max_{\text{two copy }\mathcal{M}_{t}}\mathbb{E}_{a}[\text{TV}(p^{+}_{a,t},p^{-}_{a,t})]

where the first line follows from Jensen’s inequality, the nonadaptivity of the protocol, and triangle inequality.

It thus suffices to bound the average total variation distance for a single two-copy measurements. For any rank-1 POVM t\mathcal{M}_{t} over two copies, we have

𝔼a[TV(pa,t+,pa,t)]\displaystyle\mathbb{E}_{a}[\text{TV}(p^{+}_{a,t},p^{-}_{a,t})]
=𝔼a[12s|tr[(I+Pa2n)2wst|ψstψst|]tr[(IPa2n)2wst|ψstψst|]|]\displaystyle=\mathbb{E}_{a}\Bigl{[}\frac{1}{2}\sum_{s}\Bigl{|}\mathrm{tr}\Bigl{[}\Bigl{(}\frac{I+P_{a}}{2^{n}}\Bigr{)}^{\otimes 2}w_{s}^{t}\ket{\psi_{s}^{t}}\bra{\psi_{s}^{t}}\Bigr{]}-\mathrm{tr}\Bigl{[}\Bigl{(}\frac{I-P_{a}}{2^{n}}\Bigr{)}^{\otimes 2}w_{s}^{t}\ket{\psi_{s}^{t}}\bra{\psi_{s}^{t}}\Bigr{]}\Bigr{|}\Bigr{]}
=𝔼a[122nswst|tr[(IPa+PaI)|ψstψst|]|]\displaystyle=\mathbb{E}_{a}\Bigl{[}\frac{1}{2^{2n}}\sum_{s}w_{s}^{t}\,\absolutevalue{\mathrm{tr}[(I\otimes P_{a}+P_{a}\otimes I)\ket{\psi_{s}^{t}}\bra{\psi_{s}^{t}}]}\Bigr{]}
122n𝔼a[swsttr2[(IPa+PaI)|ψstψst|]swst]\displaystyle\leqslant\frac{1}{2^{2n}}\mathbb{E}_{a}\Bigl{[}\sqrt{\sum_{s}w_{s}^{t}\,\mathrm{tr}^{2}[(I\otimes P_{a}+P_{a}\otimes I)\ket{\psi_{s}^{t}}\bra{\psi_{s}^{t}}]\cdot\sum_{s}w_{s}^{t}}\Bigr{]}
12n𝔼a[swsttr2[(IPa+PaI)|ψstψst|]]\displaystyle\leqslant\frac{1}{2^{n}}\sqrt{\mathbb{E}_{a}\Bigl{[}\sum_{s}w_{s}^{t}\,\mathrm{tr}^{2}\left[(I\otimes P_{a}+P_{a}\otimes I)\ket{\psi_{s}^{t}}\bra{\psi_{s}^{t}}\right]\Bigr{]}}
=12n𝔼a[swstψst|ψst|(IPa+PaI)2|ψst|ψst]\displaystyle=\frac{1}{2^{n}}\sqrt{\mathbb{E}_{a}\Bigl{[}\sum_{s}w_{s}^{t}\bra{\psi_{s}^{t}}\bra{\psi_{s}^{t}}(I\otimes P_{a}+P_{a}\otimes I)^{\otimes 2}\ket{\psi_{s}^{t}}\ket{\psi_{s}^{t}}\Bigr{]}}
=12n122n1swstψst|ψst|(2n(SWAP1,3+SWAP1,4+SWAP2,3+SWAP2,4)4I4)|ψst|ψst\displaystyle=\frac{1}{2^{n}}\sqrt{\frac{1}{2^{2n}-1}\sum_{s}w_{s}^{t}\bra{\psi_{s}^{t}}\bra{\psi_{s}^{t}}\left(2^{n}(\text{SWAP}_{1,3}+\text{SWAP}_{1,4}+\text{SWAP}_{2,3}+\text{SWAP}_{2,4})-4I^{\otimes 4}\right)\ket{\psi_{s}^{t}}\ket{\psi_{s}^{t}}}
12n122n1swst(42n4)\displaystyle\leqslant\frac{1}{2^{n}}\sqrt{\frac{1}{2^{2n}-1}\sum_{s}w_{s}^{t}(4\cdot 2^{n}-4)}
=12n422n2n+1=O(12n/2)\displaystyle=\frac{1}{2^{n}}\sqrt{\frac{4\cdot 2^{2n}}{2^{n}+1}}=O\Bigl{(}\frac{1}{2^{n/2}}\Bigr{)}

where the third step follows from the Cauchy-Swartz inequality, the fourth step follows from the fact that swst=22n\sum_{s}w_{s}^{t}=2^{2n} and Jensen’s inequality, and the sixth step follows from Eq. (1) (here SWAPi,j\text{SWAP}_{i,j} denotes the SWAP operator on the ii-copy and the jj-th copy). Therefore, we have

TV(𝔼a[pa+],𝔼a[pa])O(T2n/2).\displaystyle\text{TV}(\mathbb{E}_{a}[p^{+}_{a}],\mathbb{E}_{a}[p^{-}_{a}])\leqslant O(T\cdot 2^{-n/2}).

This indicates that any nonadaptive protocols with To(2n/2)T\leqslant o(2^{n/2}) two-copy measurement can not solve this distinguishing task with high probability, which yields the Ω(2n/2)\Omega(2^{n/2}) lower bound in Theorem 1.2 for two-copy nonadaptive protocols for solving 1.1 for 𝒫n\mathcal{P}_{n}.

3 Discussion

In this note, we proved the first exponential separation in sample complexity between learning protocols with and without adaptivity in the context of shadow tomography. Looking ahead, it would be interesting to study more fine-grained tradeoffs between sample complexity and adaptivity, e.g. as measured by the number of rounds of adaptivity. In the setting studied in this paper, one round of adaptivity is sufficient to obtain optimal dependence on nn [9, 17]. Are there natural quantum learning tasks where protocols with more rounds of adaptivity are strictly more powerful?

Another important question is to understand the computational complexity of optimally choosing adaptive measurement bases in quantum learning. While the adaptive protocols in [9, 17] are sample-efficient, their adaptive choice of measurement currently requires at least exponential classical computation. Is this inherent, or can one prove a separation between nonadaptive and adaptive computationally efficient protocols?

Acknowledgments

We thank Hsin-Yuan Huang for helpful discussions and for encouraging us to write up this finding. We thank Jordan Cotler and Jerry Li for providing feedback on this manuscript, and Robbie King and Qi Ye for helpful discussions.

References

  • [1] Dorit Aharonov, Jordan Cotler, and Xiao-Liang Qi, Quantum algorithmic measurement, Nature communications 13 (2022), no. 1, 2101.04634.
  • [2] Anurag Anshu, Zeph Landau, and Yunchao Liu, Distributed quantum inner product estimation, Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pp. 44–51, 2022, 2111.03273.
  • [3] Sebastien Bubeck, Sitan Chen, and Jerry Li, Entanglement is necessary for optimal quantum property testing, 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pp. 692–703, IEEE, 2020, 2004.07869.
  • [4] Matthias C Caro, Learning quantum processes and Hamiltonians via the Pauli transfer matrix, ACM Transactions on Quantum Computing 5 (2024), no. 2, 2212.04471.
  • [5] Senrui Chen, Changhun Oh, Sisi Zhou, Hsin-Yuan Huang, and Liang Jiang, Tight bounds on pauli channel learning without entanglement, Physical Review Letters 132 (2024), no. 18, 2309.13461.
  • [6] Senrui Chen, Sisi Zhou, Alireza Seif, and Liang Jiang, Quantum advantages for Pauli channel estimation, Physical Review A 105 (2022), no. 3, 2108.08488.
  • [7] Sitan Chen, Jordan Cotler, Hsin-Yuan Huang, and Jerry Li, Exponential separations between learning with and without quantum memory, 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pp. 574–585, IEEE, 2022, 2111.05881.
  • [8] Sitan Chen and Weiyuan Gong, Efficient Pauli channel estimation with logarithmic quantum memory, arXiv:2309.14326 (2023), 2309.14326.
  • [9] Sitan Chen, Weiyuan Gong, and Qi Ye, Optimal tradeoffs for estimating pauli observables, 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1086–1105, 2024, 2404.19105.
  • [10] Sitan Chen, Brice Huang, Jerry Li, Allen Liu, and Mark Sellke, When does adaptivity help for quantum state learning?, 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), IEEE, 2023, 2206.05265.
  • [11] Sitan Chen, Jerry Li, and Ryan O’Donnell, Toward instance-optimal state certification with incoherent measurements, Conference on Learning Theory, pp. 2541–2596, PMLR, 2022, 2102.13098.
  • [12] Weiyuan Gong, Jonas Haferkamp, Qi Ye, and Zhihan Zhang, On the sample complexity of purity and inner product estimation, arXiv:2410.12712 (2024), 2410.12712.
  • [13] Jeongwan Haah, Aram W Harrow, Zhengfeng Ji, Xiaodi Wu, and Nengkun Yu, Sample-optimal tomography of quantum states, Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pp. 913–925, 2016, 1508.01797.
  • [14] Hsin-Yuan Huang, Michael Broughton, Jordan Cotler, Sitan Chen, Jerry Li, Masoud Mohseni, Hartmut Neven, Ryan Babbush, Richard Kueng, John Preskill, et al., Quantum advantage in learning from experiments, Science 376 (2022), no. 6598, 2112.00778.
  • [15] Hsin-Yuan Huang, Richard Kueng, and John Preskill, Predicting many properties of a quantum system from very few measurements, Nature Physics 16 (2020), no. 10, 2002.08953.
  • [16]  , Information-theoretic bounds on quantum advantage in machine learning, Physical Review Letters 126 (2021), no. 19, 2101.02464.
  • [17] Robbie King, David Gosset, Robin Kothari, and Ryan Babbush, Triply efficient shadow tomography, arXiv:2404.19211 (2024), 2404.19211.
  • [18] Richard Kueng, Holger Rauhut, and Ulrich Terstiege, Low rank matrix recovery from rank one measurements, Applied and Computational Harmonic Analysis 42 (2017), no. 1, 1410.6913.
  • [19] Ryan O’Donnell and John Wright, Efficient quantum tomography, Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pp. 899–912, 2016, 1508.01907.
  • [20] Bin Yu, Assouad, fano, and le cam, Festschrift for Lucien Le Cam: research papers in probability and statistics (1997).