Adaptive Online Learning of Quantum States
Abstract
The problem of efficient quantum state learning, also called shadow tomography, aims to comprehend an unknown $d$-dimensional quantum state through POVMs. Yet, these states are rarely static; they evolve due to factors such as measurements, environmental noise, or inherent Hamiltonian state transitions. This paper leverages techniques from adaptive online learning to keep pace with such state changes.
The key metrics considered for learning in these mutable environments are enhanced notions of regret, specifically adaptive and dynamic regret. We present adaptive and dynamic regret bounds for online shadow tomography, which are polynomial in the number of qubits and sublinear in the number of measurements. To support our theoretical findings, we include numerical experiments that validate our proposed models.
1 Introduction
As opposed to their classical counterparts, quantum states over $n$ qubits require $2^n$ complex "amplitudes" for their description. Therefore, recovering the density matrix of a quantum system with $n$ qubits from measurements becomes an impractical task for even moderate values of $n$. The setting of shadow tomography [1] attempts to circumvent this issue: rather than recovering the entire state matrix, the goal is to accurately reconstruct the "shadow" that the density matrix casts on a set of measurements that are known in advance.
More precisely, let $\rho$ be an $n$-qubit quantum state: that is, a $2^n \times 2^n$ Hermitian positive semidefinite matrix with $\mathrm{tr}(\rho)=1$. We can measure $\rho$ by a two-outcome positive operator-valued measure (POVM) $E$, which can be represented by a Hermitian matrix with eigenvalues in $[0,1]$. Such a measurement accepts (returns 1) with probability $\mathrm{tr}(E\rho)$ and rejects (returns 0) with probability $1-\mathrm{tr}(E\rho)$. Given a sequence of POVMs $E_1,\dots,E_m$, shadow tomography uses copies of $\rho$ to estimate $\mathrm{tr}(E_i\rho)$ within additive error $\epsilon$ for all $i\in[m]$ with probability at least $1-\delta$.
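To make the measurement model concrete, here is a minimal numerical illustration (ours, not code from the paper) of the acceptance probability of a two-outcome POVM element:

```python
import numpy as np

def acceptance_probability(E: np.ndarray, rho: np.ndarray) -> float:
    """Probability tr(E rho) that the POVM element E 'accepts' the state rho."""
    return float(np.real(np.trace(E @ rho)))

# Example: a single qubit in the |0> state, measured with the projector onto |0>.
rho = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)   # |0><0|
E = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)     # projector onto |0>
assert abs(acceptance_probability(E, rho) - 1.0) < 1e-12
```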
The state-of-the-art result [2] can accomplish this task with $\tilde{O}(n\log^2 m/\epsilon^4)$ copies of $\rho$, exponentially improving upon full state tomography in terms of the dimension $2^n$. However, in the real world a POVM can be chosen based on previous actions from the observer (learner) and is not known in advance. Moreover, shadow tomography may be unfeasible on near-term quantum computers, as current methods [1, 3, 4, 2] all require a joint measurement on multiple copies of $\rho$, which must be implemented on a larger quantum system, beyond the scale of existing quantum devices.
In light of both concerns, it is natural to consider online learning of quantum states. In this problem, the measurements arrive sequentially, and the goal is to select a sequence of states that minimizes regret compared to the optimal quantum state in retrospect. This problem naturally fits into the online convex optimization framework, in which a player makes a decision at each step, is then shown a convex loss function, and suffers the loss of its decision. The player makes decisions to minimize the total loss. However, this problem is too difficult if the loss functions can be arbitrary, so we only expect to minimize the loss of the player minus the minimum loss of making the same decision at every step, which is called the regret of the player. In the online learning of quantum states setting, the player outputs a quantum state $x_t$ at step $t$ and suffers a loss measuring the difference between $x_t$ and the true state under the chosen POVM. The sequential nature of the setting gives rise to algorithms that treat each measurement separately, thereby eliminating the need for joint measurements. (In some shadow tomography algorithms using similar online learning schemes [1, 2], joint measurements are not needed when updating the state, but are needed when choosing the POVMs.) The existing algorithm of [5], along with subsequent works [6, 7, 8, 9], yields a sequence of states with regret $O(\sqrt{Tn})$ over $T$ iterations. In some cases, these regret bounds can be formalized as mistake bounds, which count the number of steps at which the player suffers a loss larger than a given threshold.
Nevertheless, existing literature on shadow tomography does not cover an important factor in current quantum computers: fluctuation of quantum states. In practical scenarios, we often cannot calibrate the quantum state constantly, and shadow tomography algorithms requiring multiple identical copies of the state may not be viable. For instance, on the IBM quantum computing platform, a quantum circuit with measurements is executed for hundreds or thousands of shots in a row, without calibration between such consecutive experiments.
In real-world experiments, a quantum state is typically obtained by evolving a control Hamiltonian $H$ for some time $t$ starting from an initial state $\rho_0$, i.e., $\rho=e^{-iHt}\rho_0 e^{iHt}$. If the device is not calibrated for a while, the control Hamiltonian implemented is a noisy one, $\tilde{H}$, and the actual quantum state we prepare is a perturbed state $\tilde{\rho}=e^{-i\tilde{H}t}\rho_0 e^{i\tilde{H}t}$. More discussions about noise in quantum states can be found in [10, 11, 12, 13, 14].
In light of these fluctuations, recent work on state tomography extends to temporal quantum tomography [15] with changing quantum states. Therefore, a fundamental question from our machine learning perspective is learning changing quantum states, which naturally fits into the framework of adaptive online learning. Such quantum algorithms are crucial for near-term noisy, intermediate-scale quantum (NISQ) computers, since the design of noise-resilient algorithms may extend the computational power of NISQ devices [16].
Adaptive online learning.
The motivation above suggests a learning problem that assumes the quantum state prepared at time step $t$ is $\rho_t$. Due to imperfect calibration, $\rho_t$ differs across time steps, and our goal is to learn it in an online fashion. The problem of online learning of a changing concept has received significant attention in the machine learning literature, and we adapt several of these techniques to the quantum setting, which involves high-dimensional spaces over the complex numbers.
The first metric we consider is the "dynamic regret" introduced by [17], which measures the difference between the learner's loss and that of a changing comparator. The dynamic regret bounds are usually characterized by how much the optimal comparator changes over time, known as the "path length."
Next, we consider a more sophisticated metric, the "adaptive regret," as introduced by [18]. This metric measures the maximum of the regret over all intervals, essentially taking into account a changing comparator. Many extensions and generalizations of the original technique have been presented in works such as [19, 20, 21]. Minimizing adaptive regret requires a combination of expert algorithms and continuous online learning methods, which we also adopt for the quantum setting.
Contributions.
In this paper, we systematically study adaptive online learning of quantum states. In quantum tomography, we might have prior knowledge about the state $\rho$, but in online learning it can be adversarial. As such, we expect regret guarantees that extend beyond comparing with the best fixed state. We study the following questions, summarized in Table 1:
• Dynamic regret: We consider minimizing regret under the assumption that the comparator changes slowly:
$$\text{D-Regret}_T(P)=\sum_{t=1}^{T}\ell_t(x_t)-\min_{\substack{x_1^*,\dots,x_T^*\in\mathcal{K}:\\ \sum_{t=2}^{T}\|x_t^*-x_{t-1}^*\|_1\le P}}\ \sum_{t=1}^{T}\ell_t(x_t^*). \quad (1)$$
Here $P$ is a bound on $\sum_{t=2}^{T}\|x_t^*-x_{t-1}^*\|_1$, which is defined as the path length; the functions $\ell_t$ are $L$-Lipschitz for all $t\in[T]$; and $\|\cdot\|_1$ is the $\ell_1$ norm of the singular values of a matrix, known as the nuclear norm. We propose an algorithm based on online mirror descent (OMD) that achieves a dynamic regret bound sublinear in $T$ and polynomial in $n$. Although the analysis shares the same spirit as the standard OMD proof, tackling optimization in the complex domain requires new tools from several areas. Specifically, we use Klein's inequality from matrix analysis to study the dual update in OMD, and the Fannes–Audenaert inequality [22] from quantum information theory to obtain the dynamic regret bounds. Compared to [5], our analysis is more general due to the introduction of these tools.
As an application, under the assumption that the state evolves according to a family of dynamical models, we can also prove dynamic regret bounds in terms of a variant of the path length defined in Eq. (6). This covers two natural scenarios: either the dynamics comes from a family of $K$ known models (which incurs an overhead growing only logarithmically with $K$), or we know that the dynamical model is an $O(1)$-local quantum channel, a natural assumption for current quantum computers fabricated on 2D chips [23, 24, 25], where a quantum gate can only influence a constant number of qubits.
• Adaptive regret: We also consider minimizing the strongly adaptive regret [19] of an algorithm $\mathcal{A}$ at time $T$ for interval length $\tau$:
$$\text{SA-Regret}_T(\mathcal{A},\tau)=\max_{[s,s+\tau-1]\subseteq[T]}\left(\sum_{t=s}^{s+\tau-1}\ell_t(x_t)-\min_{x^*\in\mathcal{K}}\sum_{t=s}^{s+\tau-1}\ell_t(x^*)\right), \quad (2)$$
where $x_t$ is the output of algorithm $\mathcal{A}$ at time $t$, and $\ell_t$ is a real loss function that is revealed to the learner, for example $\ell_t(x)=|\mathrm{tr}(E_tx)-b_t|$ or $\ell_t(x)=(\mathrm{tr}(E_tx)-b_t)^2$ for some value $b_t$. (Here we make no assumptions about how $\ell_t$ is revealed. In applications, for example [1], one first obtains $b_t$ by repeated measurements and then computes $\ell_t$ classically.) We show that combining the algorithms of [5] and [20] gives an adaptive regret bound sublinear in $\tau$ and polynomial in $n$. We then derive a $k$-shift regret bound when the comparator is allowed to shift $k$ times.
• Mistake bound: We are given a sequence of two-outcome measurements $E_1,E_2,\dots$, where each $E_t$ is followed by a value $b_t$ close to $\mathrm{tr}(E_t\rho_t)$. Assuming the ground truth $\rho_t$ is $k$-shifting or has bounded path length, we build upon the regret bounds to derive upper bounds on the number of mistakes made by an algorithm whose output is a sequence of hypothesis states $\omega_t$, where an output with $|\mathrm{tr}(E_t\omega_t)-\mathrm{tr}(E_t\rho_t)|>\epsilon$ is considered a mistake. For the dynamic regret and adaptive regret settings, we give mistake bounds in Corollary 1 and Corollary 4, respectively.
Table 1: Summary of our results; the precise regret bounds, which are polynomial in $n$ and sublinear in $T$, are stated in the referenced results.

Regret | Reference | Setting
---|---|---
Dynamic | Theorem 1 | Path length
Dynamic | Corollary 2 | $K$ known dynamical models
Dynamic | Corollary 3 | $O(1)$-local quantum channels
Adaptive | Lemma 1 | Any interval of length $\tau$
Adaptive | Theorem 2 | $k$-shift
We also note that our results can be extended from two-outcome POVMs to $m$-outcome POVMs for $m>2$, inspired by a recent work by [26]. An $m$-outcome measurement can be described by $m$ Hermitian PSD matrices $E^{(1)},\dots,E^{(m)}$ such that $\sum_{i=1}^{m}E^{(i)}=I$, and for any quantum state $\rho$, the measurement outputs $i$ with probability $\mathrm{tr}(E^{(i)}\rho)$. In our setting of online learning of quantum states, a sequence of measurements is given, and the measurement at time $t$ can be described by a set of Hermitian matrices $\{E_t^{(i)}\}_{i=1}^{m}$. In Appendix C.1, we show that our algorithms enjoy the same regret upper bounds for any $m$ if we use the total variation distance as the loss function.
We also conduct numerical experiments for our algorithms, testing their adaptive and dynamic regret bounds in the $k$-shift and path length settings, respectively. The results demonstrate that our algorithms improve upon non-adaptive algorithms when the quantum state is changing, confirming our theoretical results.
2 Preliminaries
In online convex optimization (refer to [27] for a comprehensive treatment), a player engages in a $T$-round game against an adversarial environment. In each round $t$, the player selects $x_t\in\mathcal{K}$, then the environment unveils the convex loss function $\ell_t$, and the player incurs the loss $\ell_t(x_t)$. The player's objective is to minimize the (static) regret:
$$\text{Regret}_T=\sum_{t=1}^{T}\ell_t(x_t)-\min_{x\in\mathcal{K}}\sum_{t=1}^{T}\ell_t(x). \quad (3)$$
We can model the quantum tomography problem within the framework of online convex optimization: the convex domain is taken to be $\mathcal{C}_n$, the set of all trace-1 positive semidefinite (PSD) complex matrices of dimension $2^n$:
$$\mathcal{C}_n=\big\{X\in\mathbb{C}^{2^n\times 2^n}:X\succeq 0,\ \mathrm{tr}(X)=1\big\}.$$
The loss function can be, for example, $\ell_t(x)=|\mathrm{tr}(E_tx)-\mathrm{tr}(E_t\rho_t)|$ or $\ell_t(x)=(\mathrm{tr}(E_tx)-\mathrm{tr}(E_t\rho_t))^2$, measuring the quality of approximating the ground truth $\rho_t$.
Another goal is to bound the maximum number of mistakes we make during the learning process. We consider the case where the value $b_t$ in the revealed loss function satisfies $|b_t-\mathrm{tr}(E_t\rho_t)|\le\epsilon/3$. We consider it a mistake if the algorithm outputs an $\omega_t$ such that $|\mathrm{tr}(E_t\omega_t)-\mathrm{tr}(E_t\rho_t)|>\epsilon$.
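As a small illustration of the loss and the mistake criterion above (our own variable names, with the threshold left as a parameter):

```python
import numpy as np

def l1_loss(E, x, b):
    """Loss |tr(E x) - b| for predicting state x, given the revealed value b."""
    return abs(float(np.real(np.trace(E @ x))) - b)

def is_mistake(E, x, rho, eps):
    """x is a mistake if its expectation under E deviates from the ground
    truth rho by more than eps."""
    return abs(float(np.real(np.trace(E @ (x - rho))))) > eps
```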
Notations.
Let $\|\cdot\|_1$ denote the nuclear norm, i.e., the $\ell_1$ norm of the singular values of a matrix. Let $\langle A,B\rangle=\mathrm{tr}(A^\dagger B)$ denote the trace inner product between $A$ and $B$. Denote by $\lambda_{\max}(A)$ the largest eigenvalue of a Hermitian matrix $A$. $A\succeq 0$ means that the matrix $A$ is positive semidefinite. For any positive integer $n$, denote $[n]=\{1,2,\dots,n\}$. Denote by $\otimes$ the tensor product: for matrices $A\in\mathbb{C}^{m_1\times n_1}$ and $B\in\mathbb{C}^{m_2\times n_2}$, $A\otimes B\in\mathbb{C}^{m_1m_2\times n_1n_2}$ is the block matrix whose $(i,j)$ block equals $A_{ij}B$ for $i\in[m_1]$, $j\in[n_1]$. Throughout the paper, $\tilde{O}(\cdot)$ omits poly-logarithmic terms.
3 Minimizing Dynamic Regret
In this section, we consider the case where the state may change over time. In particular, we do not assume a bound on the number of times the state changes, but instead consider the total change over time measured by the path length. To this end, we study dynamic regret, where the path length of the comparator sequence in nuclear norm is restricted:
$$\sum_{t=2}^{T}\|x_t^*-x_{t-1}^*\|_1\le P. \quad (4)$$
Note that the comparators $x_t^*$ do not have to be the ground truth states $\rho_t$; they can be any states satisfying the path length condition. We propose an algorithm achieving a dynamic regret bound sublinear in $T$, which instantiates several copies of the OMD algorithm with different learning rates as experts, and uses the multiplicative weights (MW) algorithm to select the best expert. The need for an expert algorithm arises because, to obtain the dynamic regret bound, the learning rate of the OMD algorithm has to be a function of the path length, which is unknown ahead of time. The main algorithm is given in Algorithm 1, and the expert OMD algorithm is given in Algorithm 2; a minimal sketch of the expert update appears below, and Theorem 1 gives the dynamic regret guarantee.
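The following sketch (ours, a simplification rather than a verbatim transcription of Algorithm 2) shows one expert update, assuming the $\ell_1$ loss and the matrix-exponentiated-gradient form of OMD with the von Neumann entropy regularizer:

```python
import numpy as np
from scipy.linalg import expm, logm

def omd_step(X, E, b, eta):
    """One OMD step with von Neumann entropy regularization (our sketch).

    Assumes the loss |tr(E X) - b| and that X is positive definite, e.g.,
    after mixing with the identity as in Appendix A.
    """
    p = float(np.real(np.trace(E @ X)))
    grad = np.sign(p - b) * E                # a subgradient of |tr(E X) - b|
    Y = expm(logm(X) - eta * grad)           # dual (mirror) update
    return Y / float(np.real(np.trace(Y)))   # project back to unit trace
```

Algorithm 1 then runs several such experts with different values of the learning rate and aggregates their predictions with multiplicative weights.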
Theorem 1.
Assume the path length of the comparator sequence is at most $P$ and that each loss $\ell_t$ is convex, $L$-Lipschitz, and maps $\mathcal{K}$ to $[0,1]$. Then the dynamic regret of Algorithm 1 is sublinear in $T$ and polynomial in $n$, growing with the path length bound $P$.
The proof of Theorem 1 is deferred to Appendix A; we discuss some technical challenges here. Ref. [28] gives an $O(\sqrt{T(1+P)})$ dynamic regret bound for online convex optimization:
$$\sum_{t=1}^{T}\ell_t(x_t)-\sum_{t=1}^{T}\ell_t(x_t^*)=O\big(\sqrt{T(1+P)}\big), \quad (5)$$
where $P$ is the path length of the comparators $x_t^*$ measured in $\ell_2$ norm. Directly adopting their result would lead to an exponential dependence on $n$, because the original algorithm uses the $\ell_2$ norm to measure the path length, which makes the dual norm of the gradient depend polynomially on the dimension ($2^n$ in this case). To avoid this issue, we use OMD with von Neumann entropy regularization to obtain a poly-logarithmic dependence on the dimension. From [5], we know that with the von Neumann entropy as the regularizer, the regret can be bounded by problem parameters measured in nuclear norm and spectral norm, which tend to be small for online shadow tomography. The analysis is in a similar vein to the classical OMD proof [27], but classical results often rely on theorems from calculus that cannot be straightforwardly applied to the complex domain.
Given the challenge of complex numbers, we use tools from functional analysis and quantum information theory to study the convergence of the algorithm. Our analysis deviates from the classical analysis in the following aspects:
• We use the notion of Gateaux differentiability and the Gateaux derivative to extend classical gradient-based methods to the quantum domain. In particular, we use the Gateaux derivative to define the Bregman divergence, a key ingredient of the OMD analysis. Subsequently, we show that the Bregman divergence obeys an Euler inequality that corresponds to the generalized Pythagorean theorem in the classical analysis [27].
• To give a quantitative bound on the Bregman divergence between the current iterate and the dual update, we cannot rely on the interpretation of the Bregman divergence as a notion of distance. Instead, we upper bound the quantity using the power series of the matrix exponential and the Golden–Thompson inequality.
• When bounding the dynamic regret, we need to bound the difference of von Neumann entropy between two quantum states. This can be seen as a type of continuity bound for the von Neumann entropy; it was first given by Fannes [29] and later sharpened by Audenaert [22]. The bound is known as the Fannes–Audenaert inequality, with numerous applications in quantum information theory [30, 31, 32, 33]; a quick numerical sanity check of the inequality follows this list.
Mistake bound of the path length setting.
We can obtain a mistake bound in this setting when we assume the ground truth has a bounded path length.
Corollary 1.
Let $\rho_1,\dots,\rho_T$ be a sequence of $n$-qubit mixed states whose path length is at most $P$, and let $E_1,\dots,E_T$ be a sequence of two-outcome measurements that are revealed to the learner one by one, each followed by a value $b_t$ such that $|b_t-\mathrm{tr}(E_t\rho_t)|\le\epsilon/3$. Then there is an explicit strategy for outputting hypothesis states $\omega_1,\dots,\omega_T$ such that $|\mathrm{tr}(E_t\omega_t)-\mathrm{tr}(E_t\rho_t)|>\epsilon$ for at most a number of rounds polynomial in $n$, $P$, and $1/\epsilon$.
Dynamic regret given dynamical models.
So far, we have not made assumptions on how the state evolves. However, in many cases, the quantum state only evolves under a family of dynamical models. A natural question is the following: If we know the family of possible dynamical models in advance, can we achieve a better dynamic regret?
First, we consider the case where there is a fixed finite family of dynamical models. The most general linear maps implementable on quantum computers are quantum channels, i.e., maps $\Phi$ satisfying:
• Completely positive: for any $k\ge 1$, the map $\Phi\otimes I_k$, where $I_k$ is the identity map on $k$-dimensional matrices, sends PSD matrices to PSD matrices;
• Trace preserving: $\mathrm{tr}(\Phi(\rho))=\mathrm{tr}(\rho)$ for any quantum state $\rho$.
A common example is $\Phi(\rho)=e^{-iHt}\rho e^{iHt}$, where $H$ is a background Hamiltonian that evolves the quantum state following the Schrödinger equation; a small sketch of this unitary channel follows.
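A minimal sketch (ours) of this unitary channel:

```python
import numpy as np
from scipy.linalg import expm

def hamiltonian_channel(H, t):
    """The unitary channel rho -> e^{-iHt} rho e^{iHt} (our sketch)."""
    U = expm(-1j * t * H)
    return lambda rho: U @ rho @ U.conj().T

H = np.array([[0.0, 1.0], [1.0, 0.0]])          # example: Pauli-X Hamiltonian
phi = hamiltonian_channel(H, t=0.1)
rho0 = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)
rho1 = phi(rho0)                                 # slightly rotated state
assert abs(np.trace(rho1) - 1.0) < 1e-12         # trace preserving
```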
We first consider the case with $K$ possible dynamical models $\Phi_1,\dots,\Phi_K$, each of which is a quantum channel. Following [34] and [28], we restrict a variant of the path length of the comparator: for some $\Phi\in\{\Phi_1,\dots,\Phi_K\}$,
$$\sum_{t=2}^{T}\|x_t^*-\Phi(x_{t-1}^*)\|_1\le P'. \quad (6)$$
Corollary 2.
Under the assumption that we have $K$ possible dynamical models and the path length variant in Eq. (6) is at most $P'$, there is an algorithm whose dynamic regret is sublinear in $T$, polynomial in $n$, and grows only logarithmically with $K$.
In general, the possible dynamical models can form a continuous set. Considering that current quantum computers [23, 24, 25] commonly use 2D chips where a quantum gate only influences a constant number of qubits, it is natural to consider $O(1)$-local quantum channels, where each channel only influences a constant number of qubits (and acts as the identity on the remaining qubits), and the following result holds. The proofs of Corollary 2 and Corollary 3 are deferred to Appendix B.
Corollary 3.
Under the assumption that the dynamical models are $O(1)$-local quantum channels and the corresponding path length variant is bounded, there is an algorithm with dynamic regret sublinear in $T$ and polynomial in $n$, where the comparator dynamics may be any channel in the set of all $O(1)$-local quantum channels.
4 Minimizing Adaptive Regret
In this section, we minimize the adaptive regret in Eq. (2) in the non-realizable case, i.e., the losses can be arbitrarily selected. We follow a meta-algorithm called Coin Betting for Changing Environment (CBCE), proposed by [20], with its black-box algorithm being the regularized follow-the-leader (RFTL) based algorithm introduced by [5] for learning quantum states. The algorithm is given in Algorithm 3; a sketch of the interval structure underlying such meta-algorithms follows.
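Strongly adaptive meta-algorithms of this kind maintain black-box instances on a set of geometrically nested intervals; the following sketch (ours, in the spirit of the constructions in [19, 20]) builds such a cover:

```python
def geometric_cover(T):
    """Geometric covering intervals (our sketch): level-j intervals have
    length 2^j and start at multiples of 2^j; any window [s, s + tau - 1]
    intersects only O(log T) levels, which is where the polylogarithmic
    overhead of strongly adaptive meta-algorithms comes from."""
    intervals = []
    j = 0
    while 2 ** j <= T:
        length = 2 ** j
        start = length
        while start <= T:
            intervals.append((start, min(start + length - 1, T)))
            start += length
        j += 1
    return intervals

# Example: for T = 8 this yields levels of length 1, 2, 4, and 8.
print(geometric_cover(8))
```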
The basic idea is that the CBCE meta-algorithm can transfer the regret bound of any black-box algorithm into a strongly adaptive regret bound, and the RFTL algorithm with von Neumann entropy regularization, which achieves an $O(L\sqrt{Tn})$ static regret bound, can serve as the black-box algorithm. Formally:
Lemma 1.
Let $E_1,E_2,\dots$ be a sequence of two-outcome measurements on an $n$-qubit state presented to the learner, and suppose each loss $\ell_t$ is convex, $L$-Lipschitz, and maps $\mathcal{K}$ to $[0,1]$. Then Algorithm 3 guarantees strongly adaptive regret sublinear in $\tau$ and polynomial in $n$ for all interval lengths $\tau\in[T]$.
$k$-shift regret bound.
As an application of Lemma 1, we derive a $k$-shift regret bound. In the $k$-shift setting, we assume that the comparator can change at most $k$ times, and we derive the following theorem.
Theorem 2.
Under the same setting as Lemma 1, if we allow the comparator to change at most $k$ times, we obtain a bound, sublinear in $T$ and polynomial in $n$ and $k$, on the $k$-shift regret
$$\sum_{t=1}^{T}\ell_t(x_t)-\min_{\substack{x_1^*,\dots,x_T^*\in\mathcal{K}\ \text{changing}\\ \text{at most }k\text{ times}}}\ \sum_{t=1}^{T}\ell_t(x_t^*). \quad (7)$$
Mistake bound of the $k$-shift setting.
With Theorem 2 in hand, when we further assume that the ground truth also shifts at most $k$ times, we can derive the following mistake bound.
Corollary 4.
Let $\rho_1,\dots,\rho_T$ be a sequence of $n$-qubit mixed states which changes at most $k$ times, and let $E_1,\dots,E_T$ be a sequence of two-outcome measurements that are revealed to the learner one by one, each followed by a value $b_t$ such that $|b_t-\mathrm{tr}(E_t\rho_t)|\le\epsilon/3$. Then there is an explicit strategy for outputting hypothesis states $\omega_1,\dots,\omega_T$ such that $|\mathrm{tr}(E_t\omega_t)-\mathrm{tr}(E_t\rho_t)|>\epsilon$ for at most a number of rounds polynomial in $n$, $k$, and $1/\epsilon$.
The proofs of the claims in this section are deferred to Appendix C.
5 Numerical Experiments
In this section, we present numerical results which support our regret bounds. (The implementation code can be found at https://github.com/EstelYang/Adaptive-Online-Learning-of-Quantum-States.) Specifically, we conduct experiments comparing with non-adaptive algorithms and testing our performance in the $k$-shift setting and the path length setting, respectively. All experiments are performed in Visual Studio Code 1.6.32 on a 2021 MacBook Pro (M1 Pro chip) with a 10-core CPU and 16 GB of memory. We generate random $n$-qubit quantum states using the QuTiP package [35, 36]; a sketch of this setup appears below. In the $k$-shift setting, each of the $k$ changes of the quantum state is placed uniformly at random among all $T$ iterations. In the path length setting, our quantum state evolves slowly following a background Hamiltonian, also generated by QuTiP. At each time step we generate a POVM measurement with eigenvalues chosen uniformly at random from $[0,1]$ and eigenvectors chosen as random $\ell_2$-norm unit vectors. Then, we apply the measurement, receive the loss, and update our prediction using Algorithm 1 and Algorithm 3. For all experiments, the instantaneous regret at each time step is computed using the ground truth state, which changes over each run depending on the setting.
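A sketch of this setup using QuTiP (our minimal reproduction; the parameter values are illustrative):

```python
import numpy as np
import qutip

rng = np.random.default_rng(0)
n = 2                                   # number of qubits (illustrative)
d = 2 ** n
rho = qutip.rand_dm(d).full()           # random mixed state as a numpy array

# Random two-outcome POVM element: eigenvalues uniform in [0, 1],
# eigenvectors taken from a Haar-random unitary.
U = qutip.rand_unitary(d).full()
E = U @ np.diag(rng.uniform(0.0, 1.0, size=d)) @ U.conj().T

b = float(np.real(np.trace(E @ rho)))   # noiseless revealed value
```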
Comparison to non-adaptive algorithms.
One eminent strength of adaptive online learning of quantum states is its ability to track changing states. Previous algorithms such as regularized follow-the-leader (RFTL) perform well for predicting fixed quantum states [5], but they are not adaptive when the quantum state evolves. We compare the cumulative regret of RFTL and our algorithms in the $k$-shift setting and the path length setting. The results are illustrated as follows.
[Figure 1: Cumulative regret of our algorithm vs. non-adaptive RFTL in the $k$-shift setting; subfigures (a)–(f).]
[Figure 2: Cumulative regret of DOMD and CBCE vs. non-adaptive RFTL in the path length setting; subfigures (a)–(d).]
The advantage of our algorithms in the $k$-shift setting is presented in Figure 1. In Subfigure (a), we plot the long-term behavior of the cumulative regret. The two curves have similar trends, but CBCE achieves smaller regret. In Subfigures (b)–(f), we choose a small time scale to display the details. It is clearly shown that the distance between the two curves grows as $T$ increases, which means our algorithm maintains its advantage in the long run.
Figure 2 shows the advantage of our algorithms in the path length setting. In Subfigures (a) and (b), we plot the comparison between our Algorithm 1 (denoted "DOMD") and the non-adaptive RFTL algorithm with different learning rates. It can be seen that, in a single run, our algorithm attains the minimum regret and is comparable to RFTL with the best learning rate. Indeed, given the similarity of RFTL and OMD, we expect RFTL with the optimal learning rate to do as well as DOMD in practice. However, since the optimal learning rate for OMD depends on the path length, it is unknown a priori, and non-adaptive methods require more tuning to achieve good performance. In Subfigures (c) and (d), we plot the comparison between our Algorithm 3 (denoted "CBCE") and the non-adaptive RFTL with different learning rates. There is a significant gap between the regret of CBCE and that of RFTL with the best learning rate from the grid search, demonstrating the interesting phenomenon that, empirically, CBCE with adaptive regret guarantees also does well in the path length setting.
Adaptive regret bound in the $k$-shift setting.
We also conduct experiments to verify our adaptive regret bound in the $k$-shift setting (Theorem 2). Results are shown in Figure 3.
[Figure 3: Average regret and regret ratio in the $k$-shift setting; subfigures (a)–(b).]
In Subfigure (a), we plot the average regret as $T$ grows. Each data point is obtained by running 100 random experiments and taking the maximum regret, since our guarantee is a worst-case upper bound. Under different parameter choices, the average regret always converges to 0 as $T$ increases, implying a sublinear regret. More concretely, curves with larger $k$ have larger average regret and converge more slowly, consistent with the intuition that more shifts make the learning task more difficult.
In Subfigure (b), to verify the precise bound, we plot the ratio between the achieved regret and the regret bound of Theorem 2. We repeat the experiments 50 times on random instances and plot the maximum of this ratio on every interval. As the figure shows, the ratio is bounded by a constant, which strongly supports our upper bound for every fixed $k$.
Dynamic regret bound in the path length setting.
In addition, we conduct experiments on the average regret and the ratio term to verify our dynamic regret bound in the path length setting (Theorem 1). Results are shown in Figure 4.
[Figure 4: Average regret and regret ratio in the path length setting; subfigures (a)–(b).]
In Subfigure (a), we plot the change of the average regret. We observe that as $T$ grows, it converges to 0 under different parameter pairs, indicating that the regret is sublinear in $T$. In addition, the blue curve, representing the average regret for a 2-qubit quantum state, lies above the orange curve, representing the average regret for a 1-qubit quantum state, consistent with the fact that our regret bound depends polynomially on the number of qubits.
In Subfigure (b), we choose different scales of $T$ to demonstrate how the ratio between the achieved regret and the bound of Theorem 1 evolves in the long term. For every fixed $T$, we execute 100 random experiments and take the maximum ratio. Although there are noticeable fluctuations, the curve is non-increasing and fluctuates only within the fixed interval from 0 to 0.065, which shows that the ratio has a constant upper bound. In conclusion, the ratio stays within the range of a small positive constant.
6 Conclusions
In this paper, we consider online learning of changing quantum states. We give algorithms with adaptive and dynamic regret bounds that are polynomial in the number of qubits and sublinear in the number of measurements, and verify our results with numerical experiments.
Our work leaves a couple of open questions for future investigation:
• Lower bounds in terms of $T$ and the path length are known [28], but to our knowledge there is no lower bound in terms of $n$. Can we prove a lower bound for the dynamic regret or adaptive regret of online learning of quantum states? We noticed a recent work [37] that proposed a lower bound for online learning of quantum channels. It would be intriguing to explore in future work whether these techniques can be adapted to the quantum state scenario.
• Although our algorithms do not depend on how the loss function is revealed, in practice a standard way is to measure several copies of the state at each step, so the copies need to follow the same dynamics. Accounting for different state evolutions across state copies is left as future work.
Acknowledgments
TL, XW, and RY were supported by the National Natural Science Foundation of China (Grant Numbers 92365117 and 62372006), and The Fundamental Research Funds for the Central Universities, Peking University.
References
- [1] Scott Aaronson. “Shadow tomography of quantum states”. SIAM Journal on Computing 49, STOC18–368–STOC18–394 (2020).
- [2] Costin Bădescu and Ryan O’Donnell. “Improved quantum data analysis”. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing. Pages 1398–1411. (2021).
- [3] Fernando G.S.L. Brandão, Amir Kalev, Tongyang Li, Cedric Yen-Yu Lin, Krysta M. Svore, and Xiaodi Wu. “Quantum SDP solvers: Large speed-ups, optimality, and applications to quantum learning”. In Proceedings of the 46th International Colloquium on Automata, Languages, and Programming. Volume 132 of Leibniz International Proceedings in Informatics, pages 27:1–27:14. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik (2019).
- [4] Scott Aaronson and Guy N Rothblum. “Gentle measurement of quantum states and differential privacy”. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing. Pages 322–333. (2019).
- [5] Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, and Ashwin Nayak. "Online learning of quantum states". Advances in Neural Information Processing Systems 31 (2018). url: https://proceedings.neurips.cc/paper_files/paper/2018/file/c1a3d34711ab5d85335331ca0e57f067-Paper.pdf.
- [6] Feidiao Yang, Jiaqing Jiang, Jialin Zhang, and Xiaoming Sun. “Revisiting online quantum state learning”. In Proceedings of the AAAI Conference on Artificial Intelligence. Volume 34, pages 6607–6614. (2020).
- [7] Yifang Chen and Xin Wang. “More practical and adaptive algorithms for online quantum state learning” (2020). arXiv:2006.01013
- [8] Josep Lumbreras, Erkka Haapasalo, and Marco Tomamichel. "Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states". Quantum 6, 749 (2022).
- [9] Julian Zimmert, Naman Agarwal, and Satyen Kale. "Pushing the efficiency-regret pareto frontier for online learning of portfolios and quantum states". In Conference on Learning Theory. Pages 182–226. PMLR (2022). url: https://proceedings.mlr.press/v178/zimmert22a.html.
- [10] Harrison Ball, Thomas M. Stace, Steven T. Flammia, and Michael J. Biercuk. "Effect of noise correlations on randomized benchmarking". Physical Review A 93, 022303 (2016).
- [11] Daniel Greenbaum and Zachary Dutton. "Modeling coherent errors in quantum error correction". Quantum Science and Technology 3, 015007 (2017).
- [12] Richard Kueng, David M. Long, Andrew C. Doherty, and Steven T. Flammia. "Comparing experiments to the fault-tolerance threshold". Physical Review Letters 117, 170502 (2016).
- [13] Joel J. Wallman. "Bounding experimental quantum error rates relative to fault-tolerant thresholds" (2015). arXiv:1511.00727.
- [14] Joel Wallman, Chris Granade, Robin Harper, and Steven T. Flammia. "Estimating the coherence of noise". New Journal of Physics 17, 113020 (2015).
- [15] Quoc Hoan Tran and Kohei Nakajima. "Learning temporal quantum tomography". Physical Review Letters 127, 260401 (2021).
- [16] John Preskill. "Quantum computing in the NISQ era and beyond". Quantum 2, 79 (2018).
- [17] Martin Zinkevich. "Online convex programming and generalized infinitesimal gradient ascent". In International Conference on Machine Learning. Pages 928–936. (2003). url: https://dl.acm.org/doi/10.5555/3041838.3041955.
- [18] Elad Hazan and Comandur Seshadhri. "Efficient learning algorithms for changing environments". In International Conference on Machine Learning. Pages 393–400. (2009).
- [19] Amit Daniely, Alon Gonen, and Shai Shalev-Shwartz. "Strongly adaptive online learning". In International Conference on Machine Learning. Pages 1405–1411. PMLR (2015). url: https://proceedings.mlr.press/v37/daniely15.html.
- [20] Kwang-Sung Jun, Francesco Orabona, Stephen Wright, and Rebecca Willett. "Improved strongly adaptive online learning using coin betting". In Artificial Intelligence and Statistics. Pages 943–951. PMLR (2017). url: https://proceedings.mlr.press/v54/jun17a.html.
- [21] Ashok Cutkosky. "Parameter-free, dynamic, and strongly-adaptive online learning". In International Conference on Machine Learning. Pages 2250–2259. PMLR (2020). url: https://proceedings.mlr.press/v119/cutkosky20a.html.
- [22] Koenraad M. R. Audenaert. "A sharp continuity estimate for the von Neumann entropy". Journal of Physics A: Mathematical and Theoretical 40, 8127–8136 (2007).
- [23] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, et al. "Quantum supremacy using a programmable superconducting processor". Nature 574, 505–510 (2019).
- [24] IBM Quantum. "Pushing quantum performance forward with our highest quantum volume yet". https://research.ibm.com/blog/quantum-volume-256 (2022). Accessed: 2022-05-13.
- [25] Qingling Zhu, Sirui Cao, Fusheng Chen, Ming-Cheng Chen, Xiawei Chen, Tung-Hsun Chung, Hui Deng, Yajie Du, Daojin Fan, Ming Gong, et al. "Quantum computational advantage via 60-qubit 24-cycle random circuit sampling". Science Bulletin 67, 240–245 (2022).
- [26] Weiyuan Gong and Scott Aaronson. "Learning distributions over quantum measurement outcomes". In International Conference on Machine Learning. Pages 11598–11613. PMLR (2023). url: https://proceedings.mlr.press/v202/gong23a.html.
- [27] Elad Hazan. "Introduction to online convex optimization". Foundations and Trends® in Optimization 2, 157–325 (2016).
- [28] Lijun Zhang, Shiyin Lu, and Zhi-Hua Zhou. "Adaptive online learning in dynamic environments". In Advances in Neural Information Processing Systems. Volume 31. (2018). arXiv:1810.10815.
- [29] M. Fannes. "A continuity property of the entropy density for spin lattice systems". Communications in Mathematical Physics 31, 291–294 (1973).
- [30] Michael M. Wolf, Frank Verstraete, Matthew B. Hastings, and J. Ignacio Cirac. "Area laws in quantum systems: mutual information and correlations". Physical Review Letters 100, 070502 (2008).
- [31] Johan Åberg. "Catalytic coherence". Physical Review Letters 113, 150402 (2014).
- [32] Andreas Winter and Dong Yang. "Operational resource theory of coherence". Physical Review Letters 116, 120404 (2016).
- [33] Eric Chitambar and Gilad Gour. "Quantum resource theories". Reviews of Modern Physics 91, 025001 (2019).
- [34] Eric Hall and Rebecca Willett. "Dynamical models and tracking regret in online convex programming". In International Conference on Machine Learning. Pages 579–587. PMLR (2013). url: https://proceedings.mlr.press/v28/hall13.html.
- [35] J.R. Johansson, P.D. Nation, and Franco Nori. "QuTiP: An open-source Python framework for the dynamics of open quantum systems". Computer Physics Communications 183, 1760–1772 (2012).
- [36] J.R. Johansson, P.D. Nation, and Franco Nori. "QuTiP 2: A Python framework for the dynamics of open quantum systems". Computer Physics Communications 184, 1234–1240 (2013).
- [37] Asad Raza, Matthias C. Caro, Jens Eisert, and Sumeet Khatri. "Online learning of quantum processes" (2024). arXiv:2406.04250.
- [38] Heinz H. Bauschke and Patrick L. Combettes. "Convex analysis and monotone operator theory in Hilbert spaces". Springer Publishing Company, Incorporated. (2011). 1st edition.
- [39] Eric P. Hanson and Nilanjana Datta. "Maximum and minimum entropy states yielding local continuity bounds". Journal of Mathematical Physics 59, 042204 (2018).
- [40] Joel A. Tropp. "From joint convexity of quantum relative entropy to a concavity theorem of Lieb". Proceedings of the American Mathematical Society 140, 1757–1760 (2012). url: http://www.jstor.org/stable/41505631.
- [41] Rajendra Bhatia. "Matrix analysis". Volume 169. Springer. New York (1997).
- [42] Göran Lindblad. "Completely positive maps and entropy inequalities". Communications in Mathematical Physics 40, 147–151 (1975).
- [43] William F. Stinespring. "Positive functions on C*-algebras". Proceedings of the American Mathematical Society 6, 211–216 (1955). url: https://community.ams.org/journals/proc/1955-006-02/S0002-9939-1955-0069403-4/S0002-9939-1955-0069403-4.pdf.
- [44] Emanuel Knill. "Approximation by quantum circuits" (1995). arXiv:quant-ph/9508006.
Appendix A Proof of Dynamic Regret Bounds for Quantum Tomography
We prove Theorem 1 in this section.
We use the online mirror descent (OMD) algorithm (similar to Algorithm 1 in [5]) to replace the online gradient descent (OGD) expert in [28]. We denote by $G_\infty$ an upper bound on the spectral norm of the gradients $\nabla\ell_t$, and by $D$ the diameter of $\mathcal{K}$ with respect to the regularization function $R$. The main motivation is that the dimension is now $2^n$ and we need logarithmic dependence on the dimension for our regret bound; the $\ell_2$ norm (used by OGD) is not the most appropriate choice and fails to control these quantities.
Instead, the OMD algorithm with the von Neumann entropy regularizer admits polynomial upper bounds on both $D$ and $G_\infty$, and the only question left is how to incorporate the path length into the analysis of OMD to achieve dynamic regret. We observe that such incorporation is natural, due to the essential similarity between OGD and OMD: we bound the path distance in nuclear norm, and bound the gradient of the regularizer in spectral norm.
A potential issue is that the gradient of the von Neumann entropy contains logarithmic terms, which might explode when an eigenvalue of the iterate is very small. To mitigate this issue, we mix the states with the maximally mixed state, replacing $x$ by $(1-\delta)x+\delta I/2^n$ for a small $\delta>0$, for both the iterates and the comparators. Notice that this mixing does not hurt the regret bound: mixing such a small fraction of the identity only increases the regret by $O(\delta LT)$, which is negligible for sufficiently small $\delta$, since the losses are $L$-Lipschitz. In addition, the path length can only decrease, since we are working over a smaller domain. Henceforth we consider the modified domain throughout the proof.
Our proof uses tools in convex analysis for Hilbert spaces. Since all matrices used in Algorithm 2 are Hermitian (by Proposition 1), it suffices to consider the Hilbert space of Hermitian matrices equipped with the trace inner product. Note that this is a real Hilbert space: the vector space of Hermitian matrices only allows scalar multiplication over the field of reals.
We first present some basic results for the von Neumann entropy. For completeness, we give the following definitions from [38]; see the same reference for a more comprehensive treatment of analysis in Hilbert spaces.
Definition 1 (Gateaux differentiability).
Let $C$ be a subset of a Hilbert space $\mathcal{X}$, and let $f:C\to\mathbb{R}$. Let $x\in C$ be such that for each $y\in\mathcal{X}$ there exists $\alpha>0$ with $x+\alpha y\in C$. Then $f$ is Gateaux differentiable at $x$ if there exists a bounded linear operator $Df(x):\mathcal{X}\to\mathbb{R}$, called the Gateaux derivative of $f$ at $x$, such that
$$Df(x)(y)=\lim_{\alpha\downarrow 0}\frac{f(x+\alpha y)-f(x)}{\alpha}\quad\text{for every }y\in\mathcal{X}.$$
Definition 2 (Directional derivatives).
For a Hilbert space $\mathcal{X}$, let $f:\mathcal{X}\to(-\infty,+\infty]$ be proper, and let $x\in\mathrm{dom}\,f$. The directional derivative of $f$ at $x$ in the direction $y$ is
$$f'(x;y)=\lim_{\alpha\downarrow 0}\frac{f(x+\alpha y)-f(x)}{\alpha},$$
if the limit exists on the extended real line.
It is well known that $f$ is Gateaux differentiable at $x$ if and only if all directional derivatives at $x$ exist and form a bounded linear operator. By the Riesz–Fréchet representation theorem, letting $\langle\cdot,\cdot\rangle$ denote the Hilbert space inner product, there exists a unique $\nabla f(x)\in\mathcal{X}$ such that for each $y\in\mathcal{X}$, $Df(x)(y)=\langle y,\nabla f(x)\rangle$.
Lemma 2 (Lemma VI.4 in Ref. [39]).
The von Neumann entropy $S(X)=-\mathrm{tr}(X\log X)$, viewed as a function on Hermitian PD matrices, is Gateaux differentiable, and its Gateaux derivative at $X$ is represented by $-\log X-I$.
The quantum analog of the classical Bregman divergence, with the regularizer $R(X)=-S(X)$, is therefore defined as
$$B_R(X\|Y)=R(X)-R(Y)-\langle\nabla R(Y),X-Y\rangle=\mathrm{tr}\big(X\log X-X\log Y-X+Y\big).$$
The above expression is also known as the quantum relative entropy of $X$ with respect to $Y$, the quantum information divergence, and the von Neumann divergence [40].
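For intuition, the divergence can be computed numerically; the following snippet (ours) also illustrates its nonnegativity, i.e., Klein's inequality used in Proposition 2 below:

```python
import numpy as np
from scipy.linalg import logm

def quantum_relative_entropy(X, Y):
    """tr(X log X - X log Y) for PD matrices of equal trace (our sketch)."""
    return float(np.real(np.trace(X @ (logm(X) - logm(Y)))))

X = np.diag([0.75, 0.25])
Y = np.eye(2) / 2
assert quantum_relative_entropy(X, Y) >= 0      # Klein's inequality
```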
Proposition 1.
The dual iterate $Y_t$ in Algorithm 2 is always Hermitian PD.
Proof.
By the definition of the dual update, $Y_t$ is the matrix exponential of a Hermitian matrix. Recall that the loss functions are of the form $\ell_t(X)=|\mathrm{tr}(E_tX)-b_t|$ or its square. Since $E_t$ is Hermitian, every (sub)gradient of $\ell_t$ is also Hermitian, so the argument of the exponential stays Hermitian. The claim follows by definition, since the exponential of a Hermitian matrix is Hermitian and positive definite. ∎
Proposition 2.
For all Hermitian PD matrices $X$ and $Y$ of unit trace, $B_R(X\|Y)\ge 0$. Moreover, for all $t$ and the iterates $X_t$ and $Y_t$ defined in Algorithm 2, the Bregman divergences appearing in the analysis are well defined.
Proof.
The first claim follows from the definition of $B_R$: since both matrices are positive definite, by Fact 3 in [40] (Klein's inequality), $B_R(X\|Y)\ge 0$. For the last claim, the positivity of the arguments holds by Proposition 1 for the dual iterates and by the mixing step for the primal iterates and comparators, and the remaining conditions follow by direct computation, using $\mathrm{tr}(X_t)=1$. ∎
Proposition 3 (Euler’s inequality).
Let $X_{t+1}$ be as defined in Algorithm 2, i.e., the Bregman projection of $Y_{t+1}$ onto $\mathcal{K}$; then for any $X^*\in\mathcal{K}$, we have
$$B_R(X^*\|Y_{t+1})\ge B_R(X^*\|X_{t+1}).$$
Proof.
For $\lambda\in[0,1]$, consider $X_\lambda=(1-\lambda)X_{t+1}+\lambda X^*$, which lies in $\mathcal{K}$ by convexity.
By Lemma 2, $R$ is Gateaux differentiable on the set of positive definite Hermitian matrices. Since $R$ is Gateaux differentiable, $B_R(\cdot\|Y_{t+1})$ is also Gateaux differentiable as a function of its first argument. By definition, $X_{t+1}$ minimizes $B_R(\cdot\|Y_{t+1})$ over $\mathcal{K}$, and therefore for any $\lambda\in(0,1]$,
$$B_R(X_\lambda\|Y_{t+1})\ge B_R(X_{t+1}\|Y_{t+1}).$$
Dividing by $\lambda$ and taking $\lambda\downarrow 0$, we have
$$\big\langle\nabla R(X_{t+1})-\nabla R(Y_{t+1}),\,X^*-X_{t+1}\big\rangle\ge 0.$$
Expanding the derivative, we observe the decomposition
$$B_R(X^*\|Y_{t+1})=B_R(X^*\|X_{t+1})+B_R(X_{t+1}\|Y_{t+1})+\big\langle\nabla R(X_{t+1})-\nabla R(Y_{t+1}),\,X^*-X_{t+1}\big\rangle,$$
which means $B_R(X^*\|Y_{t+1})\ge B_R(X^*\|X_{t+1})$, because the last two terms are nonnegative by Proposition 2 and the optimality condition above. ∎
We now prove the main lemma which gives a dynamic regret bound of the OMD algorithm, when the learning rate is chosen optimally.
Lemma 3.
Assume the path length of the comparators, measured in the nuclear norm $\|\cdot\|_1$, is at most $P$. Then the dynamic regret of Algorithm 2 is sublinear in $T$ and polynomial in $n$, if we choose the learning rate $\eta$ optimally as a function of $P$.
Proof.
We follow the standard analysis of OMD, assuming the losses $\ell_t$ are $L$-Lipschitz. By convexity of the loss $\ell_t$, we have
$$\ell_t(X_t)-\ell_t(X_t^*)\le\langle\nabla\ell_t(X_t),X_t-X_t^*\rangle. \quad (8)$$
The following property of Bregman divergences follows from the definition: for any Hermitian PD matrices $X,Y,Z$,
$$\langle\nabla R(Y)-\nabla R(Z),X-Y\rangle=B_R(X\|Z)-B_R(X\|Y)-B_R(Y\|Z). \quad (9)$$
Now, take $X=X_t^*$, $Y=X_t$, and $Z=Y_{t+1}$, and combine (8) and (9). Writing $\nabla_t=\nabla\ell_t(X_t)$ and using $\nabla R(Y_{t+1})=\nabla R(X_t)-\eta\nabla_t$, we have
$$\eta\big(\ell_t(X_t)-\ell_t(X_t^*)\big)\le B_R(X_t^*\|X_t)-B_R(X_t^*\|X_{t+1})+B_R(X_t\|Y_{t+1}),$$
where the second inequality is due to Proposition 3.
The following lemma gives an upper bound on one of the quantities in the inequality above, which we will use next in the proof. In vanilla OMD, this quantity is upper bounded using the fact that the Bregman divergence can be interpreted as a distance measured by a certain local norm; however, in the complex domain this fact is no longer immediate from the mean value theorem. We instead prove the upper bound using properties of the dual-space updates $Y_{t+1}$.
Lemma 4.
Let $G_\infty$ be an upper bound on the spectral norm of the gradients $\nabla_t$; then for sufficiently small $\eta$,
$$B_R(X_t\|Y_{t+1})=O(\eta^2G_\infty^2).$$
Proof.
First, note that by definition, $Y_{t+1}$ satisfies
$$Y_{t+1}=\exp\big(\log X_t-\eta\nabla_t\big). \quad (10)$$
Then, by the Golden–Thompson inequality,
$$\mathrm{tr}(Y_{t+1})\le\mathrm{tr}\big(X_te^{-\eta\nabla_t}\big). \quad (11)$$
Expanding the Bregman divergence and using (10) and (11), the divergence $B_R(X_t\|Y_{t+1})$ reduces to a trace of $X_t$ against a matrix function of $\eta\nabla_t$. By definition, the matrix exponential can be written as an infinite sum; using this power series in the last term of the inequality, and letting $X_t=U\Lambda U^\dagger$ be an eigendecomposition of $X_t$, the zeroth- and first-order terms cancel, and the remaining terms are bounded by $O(\eta^2G_\infty^2)$ if $\eta$ is sufficiently small. ∎
Summing over $t$ and telescoping, and observing that after mixing every eigenvalue of the iterates and comparators is bounded away from zero, we obtain a regret bound (12) consisting of three parts: a Bregman divergence term of order $D^2/\eta$, a term of order $\eta TG_\infty^2$ coming from Lemma 4, and a term controlled by the differences of von Neumann entropies between consecutive comparators.
From [5] we already know that $D^2=O(n)$ and $G_\infty=O(L)$, due to the use of the von Neumann entropy. We now compute an upper bound on the gradient of the regularizer: supposing $X=U\Lambda U^\dagger$ is the eigendecomposition of $X$, we have $\|\nabla R(X)\|=\|\log X+I\|=O\big(n+\log(1/\delta)\big)$. The inequality is due to the fact that, on the mixed domain, the eigenvalues of $X$ lie in $[\delta/2^n,1]$.
For the last term in the regret bound, the Fannes–Audenaert inequality [22] states that
$$|S(\rho)-S(\sigma)|\le T_0\log(2^n-1)+H(T_0,1-T_0),$$
where $T_0=\frac{1}{2}\|\rho-\sigma\|_1$ is the trace distance between $\rho$ and $\sigma$, and $H$ is the (binary) Shannon entropy.
We bound the binary entropy $H(\delta,1-\delta)$ by a linear function of $\delta$. Formally we have the following lemma:
Lemma 5.
For any $c>0$, we have $H(\delta,1-\delta)\le c\,\delta+b_c$ for all $\delta\in[0,1]$, where $b_c=H(\delta^*,1-\delta^*)-c\,\delta^*$ and $\delta^*=\frac{1}{1+e^c}$.
Proof.
Taking the derivative of $g(\delta)=c\,\delta+b_c-H(\delta,1-\delta)$, the minimum of $g$ occurs when $H'(\delta)=c$, which happens at $\delta^*=\frac{1}{1+e^c}$, and it is easy to verify that the minimal value of $g$ equals zero by the choice of $b_c$; therefore $c\,\delta+b_c$ is an upper bound of $H(\delta,1-\delta)$. ∎
Choosing the slope $c$ in Lemma 5 appropriately, we can upper bound the regret by the sum of a $1/\eta$ term, which is polynomial in $n$ and in the path length, and a term of order $\eta TG_\infty^2$; we refer to this bound as Eq. (13). Choosing $\eta$ to balance the two terms, the regret is sublinear in $T$.
Now we have proven the claimed dynamic regret bound; noticing that the mixing step changes both the regret and the path length only by lower-order terms, we conclude the statement of Lemma 3. ∎
For the rest of the section, we reason about choosing the optimal $\eta$, which depends on the actual path length that we do not know in advance. Algorithm 1 tackles this problem by constructing a set of candidate learning rates $\eta_1,\dots,\eta_N$, then initiating $N$ copies of Algorithm 2 with these different learning rates, and running the weighted majority algorithm on top of these experts.
The regret of such a two-layer algorithm can be decomposed as the sum of the regret of the best base expert and that of the meta MW algorithm [19]. Since the path length is polynomially bounded, we need only logarithmically many base experts, and the additional regret induced by the weighted majority algorithm is dominated by the dynamic regret of the best base expert, and is thus negligible in the final bound. Meanwhile, the optimal $\eta$ is guaranteed either to be so extreme that a boundary candidate is already good, or to lie between two candidate learning rates, so that one of them is near-optimal.
More concretely, the best expert has regret upper bounded as in Lemma 3, up to a constant factor, due to the existence of a near-optimal candidate $\eta_i$. The regret of the external MW algorithm in Algorithm 1 is upper bounded by $O(\sqrt{T\log N})$, a well-known regret bound of MW whose proof we omit here (see [27, Section 1.3] for example). Summing up, we get the final bound of Theorem 1.
Appendix B Proof of Dynamic Regret Bounds Given Dynamical Models
B.1 Proof of Corollary 2
Similar to Appendix A, we also need to mix the iterates and comparators with the maximally mixed state $I/2^n$. Notice that the quantum channels under consideration map the mixed domain into the set of quantum states, so we can use the same dynamical models in the new domain, and the path length of the mixed comparators can only decrease. Due to the $L$-Lipschitzness of the loss functions, the regret is increased by only a lower-order term.
We first prove a lemma which gives a dynamic regret bound of Algorithm 5 when the learning rate is chosen optimally.
Lemma 6.
Under the assumptions that the path length variant $\sum_{t=2}^{T}\|x_t^*-\Phi(x_{t-1}^*)\|_1$ is at most $P'$, where $\|\cdot\|_1$ is the nuclear norm and $\Phi$ is a quantum channel, the dynamic regret of Algorithm 5 is sublinear in $T$ and polynomial in $n$, if we choose $\eta$ optimally as a function of $P'$.
Proof.
In general, quantum channels can be seen as completely positive trace-preserving (CPTP) maps on the space of density operators. Ref. [42] gave the following result:
Lemma 7 (Contraction of CPTP maps with respect to relative entropy [42]).
Let $\mathcal{T}(\mathcal{H})$ be the set of operators on a Hilbert space $\mathcal{H}$ with finite trace, and let $\mathcal{T}_+(\mathcal{H})$ be the positive elements of $\mathcal{T}(\mathcal{H})$. If $\Phi$ is a completely positive trace-preserving map of $\mathcal{T}(\mathcal{H})$ into itself, then for all $X,Y\in\mathcal{T}_+(\mathcal{H})$,
$$S\big(\Phi(X)\,\big\|\,\Phi(Y)\big)\le S(X\|Y),$$
where $S(X\|Y)$ is the relative entropy between $X$ and $Y$.
Since density operators on a finite-dimensional Hilbert space are positive and have finite trace, and since the Bregman divergence $B_R$ coincides with the quantum relative entropy when $R$ is the negative von Neumann entropy, we can derive that if $\Phi$ is a quantum channel on $\mathcal{C}_n$, then
$$B_R\big(\Phi(X)\,\big\|\,\Phi(Y)\big)\le B_R(X\|Y). \quad (14)$$
Writing $\nabla_t=\nabla\ell_t(X_t)$ and arguing as in the per-step inequality of Appendix A, we can prove a per-step bound in which the comparator term involves $\|X_t^*-\Phi(X_{t-1}^*)\|_1$; the key step comes from Eq. (14), since applying $\Phi$ to both arguments of the Bregman divergence can only decrease it.
By telescoping sums, we obtain a regret bound of the same form as in Appendix A with the path length $P$ replaced by its variant $P'$, so we can follow the same proof of Eq. (13) and obtain a variant regret bound.
Choosing $\eta$ optimally as a function of $P'$, the dynamic regret of Algorithm 5 is sublinear in $T$ and polynomial in $n$. ∎
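Eq. (14) can be checked numerically on a simple CPTP map; the following script (ours) tests the contraction property with a depolarizing channel:

```python
import numpy as np
from scipy.linalg import logm

def rel_ent(X, Y):
    return float(np.real(np.trace(X @ (logm(X) - logm(Y)))))

def depolarize(rho, p=0.3):
    """A simple CPTP map: mix rho with the maximally mixed state."""
    d = rho.shape[0]
    return (1 - p) * rho + p * np.real(np.trace(rho)) * np.eye(d) / d

def rand_dm(d, rng):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.real(np.trace(rho))

rng = np.random.default_rng(1)
for _ in range(100):
    X, Y = rand_dm(2, rng), rand_dm(2, rng)
    assert rel_ent(depolarize(X), depolarize(Y)) <= rel_ent(X, Y) + 1e-9
```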
Proof of Corollary 2.
We will prove that Algorithm 4 achieves the dynamic regret bound in this corollary; in the following proof we use the notation from the description of Algorithm 4.
In Algorithm 4, we run the weighted majority algorithm over experts, each of which runs Algorithm 5 with one of the $K$ candidate quantum channels and one of the candidate step sizes.
In Lemma 1 of the arXiv version of [28], a regret bound of the weighted majority algorithm over $N$ experts is given: the overhead is $O(\sqrt{T\log N})$ for an appropriate choice of the meta learning rate.
Letting the optimal expert be the one whose channel and step size match the best dynamical model and a near-optimal $\eta$, the total regret is the sum of the MW overhead, which grows only logarithmically with $K$ inside the square root, and the dynamic regret of the best expert from Lemma 6, which concludes our result. ∎
B.2 Proof of Corollary 3
In general, the possible dynamical models can form a continuous set, which we can take to be the set of all local quantum channels. A quantum channel on $n$ qubits is said to be $O(1)$-local if it can be written as a product channel $\Phi=\Phi_1\otimes\cdots\otimes\Phi_j$, where each factor $\Phi_i$ only influences a constant number of qubits.
Lemma 8 (Stinespring's dilation theorem [43]).
Let $\Phi$ be a completely positive and trace-preserving map on the space of density operators of a Hilbert space $\mathcal{H}_A$. Then there exists a Hilbert space $\mathcal{H}_E$ with $\dim\mathcal{H}_E\le(\dim\mathcal{H}_A)^2$ and a unitary operation $U$ on $\mathcal{H}_A\otimes\mathcal{H}_E$ such that
$$\Phi(\rho)=\mathrm{tr}_E\big(U(\rho\otimes|0\rangle\langle 0|)U^\dagger\big)$$
for all $\rho$, where $\mathrm{tr}_E$ denotes the partial trace over $\mathcal{H}_E$.
By Stinespring's dilation theorem, we can represent a quantum channel on a $d$-dimensional Hilbert space by the partial trace of a unitary channel on a $d^3$-dimensional Hilbert space, so we only need to find an $\epsilon$-net of unitary operators.
Lemma 9 ($\epsilon$-net of unitaries, [44, Lemma 4.4]).
Let $\mathcal{U}(\mathcal{H})$ be the set of unitaries on a Hilbert space $\mathcal{H}$ with $\dim\mathcal{H}=d$. For any $\epsilon>0$, there is a subset $\mathcal{N}_\epsilon$ of $\mathcal{U}(\mathcal{H})$ with
$$|\mathcal{N}_\epsilon|\le(C/\epsilon)^{d^2}$$
for a universal constant $C$, such that for any $U\in\mathcal{U}(\mathcal{H})$, there exists $V\in\mathcal{N}_\epsilon$ with $\|U-V\|\le\epsilon$.
Lemma 10 ($\epsilon$-net of quantum channels).
Let $\mathcal{C}(\mathcal{H})$ be the set of all quantum channels on density operators of a Hilbert space $\mathcal{H}$ with $\dim\mathcal{H}=d$. For any $\epsilon>0$, there is a subset $\mathcal{M}_\epsilon$ of $\mathcal{C}(\mathcal{H})$ with
$$|\mathcal{M}_\epsilon|\le(C/\epsilon)^{d^6}$$
such that for any $\Phi\in\mathcal{C}(\mathcal{H})$, there exists $\Psi\in\mathcal{M}_\epsilon$ such that $\max_\rho\|\Phi(\rho)-\Psi(\rho)\|_1\le O(\epsilon)$.
Proof.
Let $\mathcal{H}'$ be a Hilbert space with $\dim\mathcal{H}'=d^3$, and let $\mathcal{N}_\epsilon$ be the $\epsilon$-net of unitaries on $\mathcal{H}'$ from Lemma 9.
We construct $\mathcal{M}_\epsilon$ by taking, for each $V\in\mathcal{N}_\epsilon$, the channel $\rho\mapsto\mathrm{tr}_E\big(V(\rho\otimes|0\rangle\langle 0|)V^\dagger\big)$.
For an arbitrary $\Phi\in\mathcal{C}(\mathcal{H})$, Lemma 8 shows that there exist a Hilbert space $\mathcal{H}_E$ with $\dim(\mathcal{H}\otimes\mathcal{H}_E)\le d^3$ and a unitary $U$ on $\mathcal{H}\otimes\mathcal{H}_E$ realizing $\Phi$ by partial trace.
Notice that Hilbert spaces of the same dimension are isomorphic, so we may regard $U$ as a unitary on $\mathcal{H}'$; from the definition of $\mathcal{N}_\epsilon$, there exists $V\in\mathcal{N}_\epsilon$ with $\|U-V\|\le\epsilon$.
A direct calculation then shows that the channel in $\mathcal{M}_\epsilon$ corresponding to $V$ is $O(\epsilon)$-close to $\Phi$ for every input state, since the partial trace does not increase the trace norm.
The size of $\mathcal{M}_\epsilon$ is the same as that of $\mathcal{N}_\epsilon$, which is $(C/\epsilon)^{d^6}$. ∎
With Lemma 10, we can construct an $\epsilon$-net of $O(1)$-local quantum channels.
Lemma 11 ($\epsilon$-net of $O(1)$-local quantum channels).
Let $\mathcal{L}_n$ be the set of all $O(1)$-local quantum channels on $n$ qubits. For any $\epsilon>0$, there is a subset $\mathcal{M}\subseteq\mathcal{L}_n$ with
$$|\mathcal{M}|\le\mathrm{poly}(n)\cdot(C/\epsilon)^{O(1)}$$
such that for any $\Phi\in\mathcal{L}_n$, there exists $\Psi\in\mathcal{M}$ with $\max_\rho\|\Phi(\rho)-\Psi(\rho)\|_1\le O(\epsilon)$.
Proof.
From Lemma 10, we can construct a collection of nets, one for each subset of constantly many qubits: each net contains the extensions of the channels from Lemma 10 acting on those qubits, with all operators in it acting as the identity on the remaining qubits, so each net has the size given by Lemma 10 on a constant-dimensional space.
Let $\mathcal{M}$ be the union of these nets over all polynomially many subsets of constantly many qubits; then for any $O(1)$-local quantum channel $\Phi$, there exists a nearby channel $\Psi\in\mathcal{M}$ as claimed. ∎
Proof of Corollary 3.
Let the candidate set of quantum channels be
$$\mathcal{M},\ \text{the }\epsilon\text{-net of }\mathcal{L}_n\text{ from Lemma 11}, \quad (15)$$
where $\mathcal{L}_n$ is the set of all $O(1)$-local quantum channels on $n$ qubits.
For any $\Phi\in\mathcal{L}_n$, there exists $\Psi\in\mathcal{M}$ such that $\max_\rho\|\Phi(\rho)-\Psi(\rho)\|_1\le O(\epsilon)$; therefore, the path length variant measured with respect to $\Psi$ exceeds the one measured with respect to $\Phi$ by at most $O(\epsilon T)$, for any comparator sequence.
From Corollary 2, the regret of Algorithm 4 with the candidate set of quantum channels taken to be $\mathcal{M}$ is bounded by the regret against the best channel in the net, which is within $O(\epsilon T)$ of the best $O(1)$-local channel, plus an overhead logarithmic in $|\mathcal{M}|$.
Choosing $\epsilon$ appropriately small as a function of $T$, we achieve the dynamic regret bound claimed in Corollary 3. ∎
Appendix C Other Proofs
C.1 $m$-outcome Measurement Cases
We can generalize the two-outcome measurements in our results to $m$-outcome measurements. The only issue in replacing two-outcome measurements with $m$-outcome measurements is that the output of an $m$-outcome measurement is a probability distribution rather than a scalar, so we need a new loss function to measure the regret. Let $p_t(x)=\big(\mathrm{tr}(E_t^{(1)}x),\dots,\mathrm{tr}(E_t^{(m)}x)\big)$ and let $q_t$ be the probability distribution corresponding to the output of the measurement acting on the ground truth state $\rho_t$. In this case, we can define the loss function at time $t$ to be the total variation distance
$$\ell_t(x)=\frac{1}{2}\sum_{i=1}^{m}\big|\mathrm{tr}(E_t^{(i)}x)-q_t(i)\big|. \quad (16)$$
This is a convex function, so the proofs in the previous sections still hold, and we only need to determine the new upper bound $G_\infty$ on the spectral norm of the subgradients.
First, we prove the following inequality on the spectral norm of linear combinations of the $E_t^{(i)}$. For any $m$-outcome measurement described by $\{E^{(i)}\}_{i=1}^{m}$ and real numbers $c_1,\dots,c_m$ such that $|c_i|\le 1$ for all $i$, we have
$$\Big\|\sum_{i=1}^{m}c_iE^{(i)}\Big\|\le\Big\|\sum_{i=1}^{m}|c_i|\,E^{(i)}\Big\|\le\Big\|\sum_{i=1}^{m}E^{(i)}\Big\|=\|I\|=1, \quad (17)$$
where we use the spectral norm and the first inequality comes from $E^{(i)}\succeq 0$. Then the subgradients of the loss in Eq. (16) satisfy
$$\|\nabla\ell_t(x)\|=\frac{1}{2}\Big\|\sum_{i=1}^{m}s_iE_t^{(i)}\Big\|\le\frac{1}{2}, \quad (18)$$
where $s_i=\mathrm{sgn}\big(\mathrm{tr}(E_t^{(i)}x)-q_t(i)\big)$ and $\mathrm{sgn}$ is the sign function.
Therefore, $G_\infty\le 1$ is a valid parameter in this case. Since we also had $G_\infty\le 1$ in the previous two-outcome case, we can simply reuse our two-outcome results with the loss function in Eq. (16) to obtain an algorithm for the $m$-outcome case, with the same regret upper bounds.
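A small illustration (ours) of the loss in Eq. (16):

```python
import numpy as np

def tv_loss(Es, x, q):
    """Total variation loss of Eq. (16) (our sketch): Es is the list of
    POVM elements, x the predicted state, q the observed distribution."""
    p = np.array([float(np.real(np.trace(E @ x))) for E in Es])
    return 0.5 * float(np.sum(np.abs(p - np.asarray(q))))
```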
C.2 Proof of Corollary 1
Proof.
To achieve the mistake bound, we run Algorithm 1 on a subset of the iterations: we feed a round to the algorithm and update only if the prediction deviates from the revealed value by more than the threshold implied by $\epsilon$. Since the deviation exceeds this threshold whenever the algorithm makes a mistake, the number of mistakes is at most the number of updates; a short sketch of this wrapper appears after the proof.
Let the subsequence of rounds on which the algorithm updates be $t_1<\dots<t_M$. Since the restriction of the comparator sequence to these rounds is a valid comparator sequence with path length at most $P$ (by the triangle inequality), the number of updates $M$ satisfies a regret inequality in which each update round contributes $\Omega(\epsilon)$ to the left-hand side, while the right-hand side is the regret bound of Theorem 1 over $M$ rounds. Solving this inequality for $M$ yields an upper bound on the number of mistakes that is polynomial in $n$, $P$, and $1/\epsilon$, which completes the proof. ∎
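The lazy-update wrapper used in this proof (and in the proof of Corollary 4 below) can be sketched as follows; the learner interface `A` and the threshold are our hypothetical stand-ins:

```python
import numpy as np

def lazy_learner(A, rounds, threshold):
    """Our sketch of the lazy-update strategy: feed a round to the online
    learner A only when the current prediction is off by more than the
    threshold, so the number of mistakes is at most the number of updates.
    A is a hypothetical learner with predict() and update(E, b) methods."""
    mistakes = 0
    x = A.predict()
    for E, b in rounds:                  # b approximates tr(E rho_t)
        if abs(float(np.real(np.trace(E @ x))) - b) > threshold:
            mistakes += 1                # this round may be a mistake; update
            A.update(E, b)
            x = A.predict()
    return mistakes
```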
C.3 Proof of Lemma 1
Proof.
The CBCE algorithm of [20] is a meta-algorithm which uses any online convex optimization algorithm as a black box and obtains strongly adaptive regret with only a small multiplicative overhead for any interval length. Its guarantee is formalized in the following lemma.
Lemma 12 (SA-Regret of CBCE, [20, Theorem 2]).
Assume each loss function $\ell_t$ is convex and bounded, and that the black-box algorithm $\mathcal{B}$ has static regret bounded by $R_{\mathcal{B}}(\tau)$ on any interval of length $\tau$. Let $I=[s,s+\tau-1]\subseteq[T]$. The SA-regret of the meta-algorithm CBCE with black-box $\mathcal{B}$ on the interval $I$ is bounded by
$$O\big((R_{\mathcal{B}}(\tau)+\sqrt{\tau})\sqrt{\log T}\big).$$
Note that this bound can be obtained for any bounded convex loss sequence.
For the choice of black-box algorithm, the RFTL-based algorithm of [5] that we use achieves an $O(L\sqrt{Tn})$ bound on the static regret when the losses are the $\ell_1$ or $\ell_2$ losses defined above.
Lemma 13 ([5], Theorem 2).
Let $E_1,E_2,\dots$ be a sequence of two-outcome measurements on an $n$-qubit state presented to the learner, and suppose each $\ell_t$ is convex and $L$-Lipschitz. Then there is an explicit learning strategy that guarantees regret $O(L\sqrt{Tn})$ for all $T$. Specifically, when applied to the $\ell_1$ loss and the $\ell_2$ loss, the RFTL algorithm achieves $O(\sqrt{Tn})$ regret for both.
Combining Lemma 12 with Lemma 13 yields Lemma 1. ∎
C.4 Proof of Theorem 2
Proof.
Given any $T$ and any partition of $[T]$ into $k+1$ contiguous intervals $I_1,\dots,I_{k+1}$ on which the comparator is fixed, Lemma 1 guarantees that the regret on each interval $I_i$ is bounded by the strongly adaptive regret for length $|I_i|$. Summing over the intervals, and applying the Cauchy–Schwarz inequality in the form
$$\sum_{i=1}^{k+1}\sqrt{|I_i|}\le\sqrt{(k+1)\sum_{i=1}^{k+1}|I_i|}=\sqrt{(k+1)T},$$
gives the claimed $k$-shift regret bound. ∎
C.5 Proof of Corollary 4
Proof.
To achieve the mistake bound, we run Algorithm 3 on a subset of the iterations, updating only when the prediction deviates from the revealed value beyond the threshold implied by $\epsilon$, exactly as in the proof of Corollary 1; the number of mistakes is again at most the number of updates.
Let the subsequence of rounds on which the algorithm updates be $t_1<\dots<t_M$. Since the restriction of the comparator sequence to these rounds is a valid comparator sequence (a sub-sequence of a $k$-shift sequence shifts at most $k$ times), from Theorem 2 we conclude that the number of updates $M$ satisfies a regret inequality whose left-hand side grows linearly in $\epsilon M$ and whose right-hand side is the $k$-shift regret bound over $M$ rounds. Solving for $M$ yields a mistake bound polynomial in $n$, $k$, and $1/\epsilon$, which completes the proof. ∎