
Adaptive Online Learning of Quantum States

Xinyi Chen Department of Computer Science, Princeton University, NJ 08540, USA Google DeepMind Princeton, NJ 08542, USA [email protected]    Elad Hazan Department of Computer Science, Princeton University, NJ 08540, USA Google DeepMind Princeton, NJ 08542, USA [email protected]    Tongyang Li Center on Frontiers of Computing Studies, Peking University, 100871 Beijing, China School of Computer Science, Peking University, 100871 Beijing, China [email protected]    Zhou Lu Department of Computer Science, Princeton University, NJ 08540, USA Google DeepMind Princeton, NJ 08542, USA [email protected]    Xinzhao Wang Center on Frontiers of Computing Studies, Peking University, 100871 Beijing, China School of Computer Science, Peking University, 100871 Beijing, China [email protected]    Rui Yang Center on Frontiers of Computing Studies, Peking University, 100871 Beijing, China School of Computer Science, Peking University, 100871 Beijing, China [email protected]
Abstract

The problem of efficient quantum state learning, also called shadow tomography, aims to learn an unknown $d$-dimensional quantum state through POVMs. Yet, these states are rarely static; they evolve due to factors such as measurements, environmental noise, or inherent Hamiltonian dynamics. This paper leverages techniques from adaptive online learning to keep pace with such state changes.

The key metrics considered for learning in these mutable environments are enhanced notions of regret, specifically adaptive and dynamic regret. We present adaptive and dynamic regret bounds for online shadow tomography, which are polynomial in the number of qubits and sublinear in the number of measurements. To support our theoretical findings, we include numerical experiments that validate our proposed models.

1 Introduction

In contrast to their classical counterparts, quantum states over $n$ qubits require $2^n$ “amplitudes” for their description. Therefore, recovering the density matrix of a quantum system with $n$ qubits from measurements becomes an impractical task for even moderate values of $n$. The setting of shadow tomography [1] attempts to circumvent this issue: rather than recovering the entire state matrix, the goal is to accurately reconstruct the “shadow” that the density matrix casts on a set of measurements that are known in advance.

More precisely, let $\rho$ be an $n$-qubit quantum state: that is, a $2^n\times 2^n$ Hermitian positive semidefinite matrix with $\operatorname{Tr}(\rho)=1$. We can measure $\rho$ by a two-outcome positive operator-valued measure (POVM) $E$, which can be represented by a $2^n\times 2^n$ Hermitian matrix with eigenvalues in $[0,1]$. Such a measurement $E$ accepts $\rho$ (returns 1) with probability $\operatorname{Tr}(E\rho)$ and rejects $\rho$ (returns 0) with probability $1-\operatorname{Tr}(E\rho)$. Given a sequence of POVMs $E_1,\ldots,E_m$, shadow tomography uses copies of $\rho$ to estimate $\operatorname{Tr}(E_i\rho)$ within $\epsilon$ for all $i\in[m]$ with probability at least $2/3$.

The state-of-the-art result [2] can accomplish this task with $\tilde{O}(n(\log m)^2/\epsilon^4)$ copies of $\rho$, exponentially improving upon full state tomography in terms of $n$. However, in the real world a POVM $E$ can be chosen based on previous actions of the observer (learner) and is not known in advance. Moreover, shadow tomography may be infeasible on near-term quantum computers, as current methods [1, 3, 4, 2] all require a joint measurement on multiple copies of $\rho$, which must be implemented on quantum systems larger than the scale of existing devices allows.

In light of both concerns, it is natural to consider online learning of quantum states. In this problem, the measurements arrive sequentially, and the goal is to select a sequence of states that minimizes regret compared to the optimal quantum state in hindsight. This problem naturally fits into the online convex optimization framework, in which a player makes a decision at each step, is then shown a convex loss function, and suffers the loss of its decision; the player makes decisions to minimize the total loss. Since the loss functions can be arbitrary, this goal is too difficult in general, so we instead aim to minimize the player's loss minus the minimum loss attainable by making the same decision at every step, which is called the regret of the player. In the online learning of quantum states setting, the player outputs a quantum state $\varphi_t$ at step $t$ and suffers a loss measuring the difference between $\varphi_t$ and the true state under the chosen POVM. The sequential nature of the setting gives rise to algorithms that treat each measurement separately, thereby eliminating the need for joint measurements. (In some shadow tomography algorithms using similar online learning schemes [1, 2], joint measurements are not needed to update the state but are needed to choose the POVMs.) The existing algorithm of [5], along with subsequent works [6, 7, 8, 9], yields a sequence of states with regret $O(\sqrt{Tn})$ for $T$ iterations. In some cases, these regret bounds can be formalized as mistake bounds, which count the number of steps at which the player suffers a loss larger than a given threshold.

Nevertheless, the existing literature on shadow tomography does not cover an important factor in current quantum computers: the fluctuation of quantum states. In practical scenarios, we often cannot calibrate the quantum state constantly, and shadow tomography algorithms requiring multiple identical copies of $\rho$ may not be viable. For instance, on the IBM quantum computing platform, a quantum circuit with measurements is executed for hundreds or thousands of shots in a row without calibration between these consecutive experiments.

In real-world experiments, a quantum state $\rho$ is typically obtained by evolving a control Hamiltonian $H$ for some time $\tau$ starting from an initial state $\rho_0$, i.e., $\rho=e^{iH\tau}\rho_0 e^{-iH\tau}$. If the device is not calibrated for a while, the control Hamiltonian actually implemented is a noisy one $H+\delta H$, and the quantum state we prepare is a perturbed state $\rho'=e^{i(H+\delta H)\tau}\rho_0 e^{-i(H+\delta H)\tau}$. More discussion of noise in quantum states can be found in [10, 11, 12, 13, 14].

In light of these fluctuations, recent work on state tomography has been extended to temporal quantum tomography [15] with changing quantum states. A fundamental question from the machine learning perspective is therefore how to learn changing quantum states, which naturally fits into the framework of adaptive online learning. Such quantum algorithms are crucial for near-term noisy intermediate-scale quantum (NISQ) computers, since the design of noise-resilient algorithms may extend the computational power of NISQ devices [16].

Adaptive online learning.

The motivation above suggests a learning problem in which the quantum state prepared at time step $t=1,2,\ldots,T$ is $\rho_t$. Due to imperfect calibration, $\rho_t$ differs across time steps, and our goal is to learn $\rho_t$ in an online fashion. The problem of online learning of a changing concept has received significant attention in the machine learning literature, and we adapt several of these techniques to the quantum setting, which involves high-dimensional spaces over the complex numbers.

The first metric we consider is the “dynamic regret” introduced by Zinkevich [17], which measures the difference between the learner’s loss and that of a changing comparator. Dynamic regret bounds are usually characterized by how much the optimal comparator changes over time, known as the “path length.”

Next, we consider a more sophisticated metric, the “adaptive regret,” introduced by Hazan and Seshadhri [18]. This metric measures the maximum of the regret over all intervals, essentially taking into account a changing comparator. Many extensions and generalizations of the original technique have been presented in works such as [19, 20, 21]. Minimizing adaptive regret requires a combination of expert algorithms and continuous online learning methods, which we also adapt to the quantum setting.

Contributions.

In this paper, we systematically study adaptive online learning of quantum states. In quantum tomography, we might have prior knowledge about $\rho_t$, but in online learning it can be adversarial. As such, we expect regret guarantees that extend beyond comparing with the best fixed $\varphi\in C_n$. We study the following questions, summarized in Table 1:

  • Dynamic regret: We consider minimizing regret under the assumption that the comparator sequence $\{\varphi_t\}$ changes slowly:

    $$\text{Regret}_{T,\mathcal{P}}^{\mathcal{A}}=\sum_{t=1}^{T}\ell_t\big(\operatorname{Tr}(E_tx_t^{\mathcal{A}})\big)-\min_{\varphi_t\in C_n}\sum_{t=1}^{T}\ell_t\big(\operatorname{Tr}(E_t\varphi_t)\big),\quad\text{where }\mathcal{P}=\sum_{t=1}^{T-1}\|\varphi_t-\varphi_{t+1}\|_1. \tag{1}$$

    Here $\mathcal{P}$ is bounded and defined as the path length, the functions $\ell_t$ are $L$-Lipschitz for all $t\in[T]$, and $\|\cdot\|_1$ is the $\ell_1$ norm of the singular values of a matrix, known as the nuclear norm. We propose an algorithm based on online mirror descent (OMD) that achieves an $\tilde{O}(\sqrt{nT\mathcal{P}})$ regret bound. Although the analysis shares the same spirit as the standard OMD proof, tackling optimization in the complex domain requires new tools from several areas. Specifically, we use Klein's inequality from matrix analysis to study the dual update in OMD, and the Fannes–Audenaert inequality [22] from quantum information theory to obtain the dynamic regret bounds. Compared to [5], our analysis is more general due to the introduction of these tools.

    As an application, under the assumption that $\varphi_t$ evolves according to a family of dynamical models, we can also prove $\tilde{O}(\sqrt{nT\mathcal{P}'})$ dynamic regret bounds, where $\mathcal{P}'$ is a variant of the path length defined in Eq. (6). This covers two natural scenarios: either the dynamics comes from a family of $M$ known models (which incurs a $\log M$ overhead), or we know that the dynamical model is an $O(1)$-local quantum channel, a natural assumption for current quantum computers fabricated on 2D chips [23, 24, 25], where a quantum gate can only influence a constant number of qubits.

  • Adaptive regret: We also consider minimizing the strongly adaptive regret [19] of an algorithm $\mathcal{A}$ at time $T$ for interval length $\tau$:

    $$\text{SA-Regret}_T^{\mathcal{A}}(\tau)=\max_{I\subseteq[T],|I|=\tau}\bigg(\sum_{t\in I}\ell_t\big(\operatorname{Tr}(E_tx_t^{\mathcal{A}})\big)-\min_{\varphi\in C_n}\sum_{t\in I}\ell_t\big(\operatorname{Tr}(E_t\varphi)\big)\bigg) \tag{2}$$

    where $x_t^{\mathcal{A}}$ is the output of algorithm $\mathcal{A}$ at time $t$, and $\ell_t$ is a real loss function that is revealed to the learner, for example $\ell_t(z)=|z-b_t|$ or $\ell_t(z)=(z-b_t)^2$ for some $b_t\in[0,1]$. (We make no assumption on how $\ell_t$ is revealed; in applications such as [1], one first obtains $b_t$ by repeated measurements and then computes $\ell_t$ classically.) We show that combining the algorithms of [5] and [20] gives an $O(\sqrt{n\tau\log(T)})$ bound on the strongly adaptive regret. We then derive an $O(\sqrt{knT\log(T)})$ regret bound when the comparator $\varphi$ is allowed to shift $k$ times.

  • Mistake bound: We are given a sequence of two-outcome measurements $E_1,E_2,\ldots,E_T$, where each $E_t$ is followed by a value $b_t$ such that $|\operatorname{Tr}(E_t\rho_t)-b_t|\leq\epsilon/3$. Assuming the ground truth $\rho_t$ has at most $k$ shifts or bounded path length $\mathcal{P}$, we build upon the regret bounds to derive upper bounds on the number of mistakes made by an algorithm whose output is a sequence of values $x_t^{\mathcal{A}}\in[0,1]$, where $|x_t^{\mathcal{A}}-\operatorname{Tr}(E_t\rho_t)|>\epsilon$ is counted as a mistake. For the dynamic regret and adaptive regret settings, we give mistake bounds in Corollary 1 and Corollary 4, respectively.

Regret | Reference | Setting | Bound
Dynamic | Theorem 1 | Path length | $O(L\sqrt{T(n+\log T)\mathcal{P}})$
Dynamic | Corollary 2 | Family of $M$ dynamical models | $O(L\sqrt{T(n+\log T)\mathcal{P}'}+\sqrt{T\log(M\log T)})$
Dynamic | Corollary 3 | $l$-local quantum channels | $O(L\sqrt{T(n+\log T)\mathcal{P}'}+2^{6l}\sqrt{T})$
Adaptive | Lemma 1 | Arbitrary interval of length $\tau$ | $O(L\sqrt{n\tau\log T})$
Adaptive | Theorem 2 | $k$-shift | $O(L\sqrt{knT\log T})$
Table 1: Adaptive and dynamic regret bounds for online learning of quantum states. Here $n$ is the number of qubits, $L$ is the Lipschitz constant of the loss functions, $T$ is the total number of iterations, $\mathcal{P}$ is the path length defined in Eq. (1), $\mathcal{P}'$ in Corollary 2 is a variant of the path length defined in Eq. (6), and $\mathcal{P}'$ in Corollary 3 is defined in Eq. (15). $l$-local quantum channels are explained in Section 3.

We also note that our results can be extended from two-outcome POVMs to $K$-outcome POVMs for $K\geq 2$, inspired by a recent work [26]. A $K$-outcome measurement $\mathcal{M}$ can be described by $K$ Hermitian matrices $E_1,\ldots,E_K$ of size $2^n\times 2^n$ such that $0\preceq E_j\preceq I$ and $\sum_{j=1}^{K}E_j=I$, and for any quantum state $\rho$, the measurement $\mathcal{M}$ outputs $j$ with probability $\operatorname{Tr}(E_j\rho)$. In our setting of online learning of quantum states, a sequence of measurements $\mathcal{M}_t$ is given, and each $\mathcal{M}_t$ can be described by a set of Hermitian matrices $\{E_{t,1},E_{t,2},\ldots,E_{t,K}\}$. In Appendix C.1, we show that our algorithm achieves the same regret upper bound for any $K$ if we use the total variation norm as the loss function.
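For concreteness, the following minimal NumPy sketch (the helper name sample_povm_outcome is ours, not from the paper) illustrates how a $K$-outcome POVM acts on a state: the elements sum to the identity, and outcome $j$ is drawn with probability $\operatorname{Tr}(E_j\rho)$.

```python
import numpy as np

def sample_povm_outcome(povm, rho, rng=np.random.default_rng()):
    """Sample the outcome j of a K-outcome POVM {E_1, ..., E_K} applied to state rho;
    outcome j occurs with probability Tr(E_j rho)."""
    d = rho.shape[0]
    assert np.allclose(sum(povm), np.eye(d))             # completeness: sum_j E_j = I
    probs = np.array([np.trace(E @ rho).real for E in povm])
    return rng.choice(len(povm), p=probs / probs.sum())  # probs sum to 1 up to rounding
```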

We also conduct numerical experiments for our algorithms, examining their dynamic and adaptive regret bounds in the $k$-shift and path length settings. The results demonstrate that our algorithms improve upon non-adaptive algorithms when the quantum state is changing, confirming our theoretical results.

2 Preliminaries

In online convex optimization (refer to [27] for a comprehensive treatment), a player engages in a $T$-round game against an adversarial environment. In each round $t$, the player selects $x_t\in\mathcal{K}$, then the environment reveals the convex loss function $\ell_t$, and the player incurs the loss $\ell_t(x_t)$. The player's objective is to minimize the (static) regret:

$$\text{Regret}=\sum_{t=1}^{T}\ell_t(x_t)-\min_{\varphi\in\mathcal{K}}\sum_{t=1}^{T}\ell_t(\varphi). \tag{3}$$

We can model the quantum tomography problem within the framework of online convex optimization: the convex domain $\mathcal{K}$ is taken to be $C_n$, the set of all trace-1 positive semidefinite (PSD) complex matrices of dimension $2^n$:

$$C_n=\{M\in\mathbb{C}^{2^n\times 2^n}:M=M^{\dagger},\ M\succeq 0,\ \operatorname{Tr}(M)=1\}.$$

The loss function can be, for example, $\ell_t(x)=|\operatorname{Tr}(E_tx)-\operatorname{Tr}(E_t\rho_t)|$ or $\ell_t(x)=(\operatorname{Tr}(E_tx)-\operatorname{Tr}(E_t\rho_t))^2$, measuring the quality of approximating the ground truth $\rho_t$.

Another goal is to bound the maximum number of mistakes we make during the learning process. We consider the case where, for all $t\in[T]$, the value $b_t$ in the revealed loss function $\ell_t$ satisfies $|b_t-\operatorname{Tr}(E_t\rho_t)|\leq\frac{1}{3}\epsilon$. We consider it a mistake if the algorithm outputs an $x_t$ such that $|\operatorname{Tr}(E_tx_t)-\operatorname{Tr}(E_t\rho_t)|>\epsilon$.
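As a minimal illustration of this bookkeeping (a NumPy sketch; the helper name is ours), the loss at step $t$ is computed from the revealed value $b_t$, while the mistake indicator uses the true value $\operatorname{Tr}(E_t\rho_t)$:

```python
import numpy as np

def loss_and_mistake(E_t, x_t, rho_t, b_t, eps):
    """Return (l_t(Tr(E_t x_t)), mistake flag) for the example loss l_t(z) = |z - b_t|;
    a mistake means the prediction is more than eps away from the true value Tr(E_t rho_t)."""
    pred = np.trace(E_t @ x_t).real
    truth = np.trace(E_t @ rho_t).real
    return abs(pred - b_t), abs(pred - truth) > eps
```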

Notations.

Let $\|\cdot\|_1$ denote the nuclear norm, i.e., the $\ell_1$ norm of the singular values of a matrix. Let $x\bullet y=\operatorname{Tr}(x^{\dagger}y)$ denote the trace inner product between $x$ and $y$. Denote by $\lambda_i(X)$ the $i$-th largest eigenvalue of a Hermitian matrix $X$. $X\succeq 0$ means that the matrix $X$ is positive semidefinite. For any positive integer $n$, denote $[n]:=\{0,1,\ldots,n-1\}$. Denote by $\otimes$ the tensor product, i.e., for matrices $A=(A_{i_1j_1})_{i_1\in[m_1],j_1\in[n_1]}\in\mathbb{C}^{m_1\times n_1}$ and $B=(B_{i_2j_2})_{i_2\in[m_2],j_2\in[n_2]}\in\mathbb{C}^{m_2\times n_2}$, we have $A\otimes B=(A_{i_1j_1}\cdot B_{i_2j_2})_{i\in[m_1m_2],j\in[n_1n_2]}\in\mathbb{C}^{m_1m_2\times n_1n_2}$, where $i=i_1m_2+i_2$ and $j=j_1n_2+j_2$. Throughout the paper, $\tilde{O}$ omits poly-logarithmic terms.
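As a quick sanity check of this index convention (a NumPy sketch; numpy.kron uses the same convention):

```python
import numpy as np

rng = np.random.default_rng(0)
m1, n1, m2, n2 = 2, 3, 4, 5
A = rng.standard_normal((m1, n1)) + 1j * rng.standard_normal((m1, n1))
B = rng.standard_normal((m2, n2)) + 1j * rng.standard_normal((m2, n2))
AB = np.kron(A, B)                       # A tensor B has shape (m1*m2, n1*n2)
i1, j1, i2, j2 = 1, 2, 3, 4
assert np.isclose(AB[i1 * m2 + i2, j1 * n2 + j2], A[i1, j1] * B[i2, j2])
```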

3 Minimizing Dynamic Regret

In this section, we consider the case where $\rho_t$ may change over time. In particular, we do not bound the number of times that $\rho_t$ changes, but instead consider the total change over time, measured by the path length. To this end, we study the dynamic regret, where the path length $\mathcal{P}$ of the comparator sequence, measured in the nuclear norm, is restricted:

$$\mathcal{P}=\sum_{t=1}^{T-1}\|\varphi_t-\varphi_{t+1}\|_1. \tag{4}$$

Note that the comparators $\{\varphi_t\}$ do not have to be the ground truth states $\{\rho_t\}$; they can be any states satisfying the path length condition. We propose an algorithm achieving an $O(L\sqrt{T(n+\log(T))\mathcal{P}})$ dynamic regret bound, which instantiates copies of the OMD algorithm with different learning rates as experts and uses the multiplicative weights algorithm to select the best expert. The need for an expert algorithm arises because, to obtain the dynamic regret bound, the learning rate of the OMD algorithm has to be a function of the path length, which is unknown ahead of time. The main algorithm is given in Algorithm 1, and the expert OMD algorithm is given in Algorithm 2. The following theorem gives the dynamic regret guarantee:

Theorem 1.

Assume the path length satisfies $\mathcal{P}=\sum_{t=1}^{T-1}\|\varphi_t-\varphi_{t+1}\|_1\geq 1$ and the loss $\ell_t$ is convex, $L$-Lipschitz, and maps to $[0,1]$. Then the dynamic regret of Algorithm 1 is bounded by $O(L\sqrt{T(n+\log(T))\mathcal{P}})$.

1:  Input: a candidate set of $\eta$, $S=\{2^{-k-1}\mid 1\leq k\leq\log T\}$, and a constant $\alpha$.
2:  Initialize $\log T$ experts $A_1,\ldots,A_{\log T}$, where $A_k$ is an instance of Algorithm 2 with $\eta=2^{-k-1}$, for all $k\in[\log T]$.
3:  Set initial weights $w_1(k)=\frac{1}{\log T}$ for all $k\in[\log T]$.
4:  for $t=1,\ldots,T$ do
5:     Predict $x_t=\frac{\sum_k w_t(k)x_t(k)}{\sum_k w_t(k)}$, where $x_t(k)$ is the output of the $k$-th expert.
6:     Observe the loss function $\ell_t(\cdot)$.
7:     Update the weights as
$w_{t+1}(k)=w_t(k)e^{-\alpha\ell_t(x_t(k))}.$
8:     Send the gradients $\nabla\ell_t(x_t(k))$ to each expert $A_k$.
9:  end for
Algorithm 1 Dynamic Regret for Quantum Tomography
1:  Input: domain $\mathcal{K}=(1-\frac{1}{T})C_n+\frac{1}{T2^n}I$, step size $\eta<\frac{1}{2L}$.
2:  Define $R(x)=\sum_{k=1}^{2^n}\lambda_k(x)\log\lambda_k(x)$, $\nabla R(x):=I+\log(x)$, and let $B_R$ denote the Bregman divergence defined by $R$.
3:  Set $x_1=2^{-n}I$, and $y_1$ to satisfy $\nabla R(y_1)=\mathbf{0}$.
4:  for $t=1,\ldots,T$ do
5:     Predict $x_t$ and receive the loss $\ell_t(\operatorname{Tr}(E_tx_t))$.
6:     Define $\nabla_t=\ell'_t(\operatorname{Tr}(E_tx_t))E_t$, where $\ell'_t(y)$ is a subderivative of $\ell_t$ with respect to $y$.
7:     Update $y_{t+1}$ such that $\nabla R(y_{t+1})=\nabla R(x_t)-\eta\nabla_t$.
8:     Update $x_{t+1}=\operatorname{argmin}_{x\in\mathcal{K}}B_R(x\|y_{t+1})$.
9:  end for
Algorithm 2 OMD for Quantum Tomography
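To make the two-layer scheme concrete, the following is a minimal NumPy sketch of Algorithms 1 and 2 (class and function names are ours, not from the reference implementation). It simplifies step 8 of Algorithm 2: the exact Bregman projection onto $\mathcal{K}$ is replaced by trace normalization (the relative-entropy projection onto $C_n$) followed by mixing with the identity, and it assumes each loss $\ell_t$ and a subderivative are supplied as Python callables.

```python
import numpy as np

def herm_fn(X, f):
    """Apply a scalar function f to a Hermitian matrix X via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * f(w)) @ V.conj().T

class OMDExpert:
    """Entropic online mirror descent over density matrices (cf. Algorithm 2)."""
    def __init__(self, n, T, eta):
        self.d, self.T, self.eta = 2 ** n, T, eta
        self.x = np.eye(self.d) / self.d                  # x_1 = 2^{-n} I

    def update(self, grad):                               # grad = l'_t(Tr(E_t x_t)) E_t
        y = herm_fn(herm_fn(self.x, np.log) - self.eta * grad, np.exp)  # dual step
        y /= np.trace(y).real                             # project back to trace one
        # crude surrogate for the Bregman projection onto K = (1 - 1/T) C_n + I/(T 2^n)
        self.x = (1 - 1 / self.T) * y + np.eye(self.d) / (self.T * self.d)

def dynamic_omd(n, T, E_seq, loss_seq, dloss_seq, alpha=1.0):
    """Multiplicative weights over OMD experts with rates 2^{-k-1} (cf. Algorithm 1)."""
    experts = [OMDExpert(n, T, eta=2.0 ** (-k - 1)) for k in range(1, int(np.log2(T)) + 1)]
    w = np.ones(len(experts)) / len(experts)
    total_loss = 0.0
    for t in range(T):
        preds = [e.x for e in experts]
        x_t = sum(wk * xk for wk, xk in zip(w, preds)) / w.sum()
        E_t = E_seq[t]
        total_loss += loss_seq[t](np.trace(E_t @ x_t).real)
        vals = [np.trace(E_t @ xk).real for xk in preds]
        w = w * np.exp(-alpha * np.array([loss_seq[t](v) for v in vals]))
        for e, v in zip(experts, vals):
            e.update(dloss_seq[t](v) * E_t)
    return total_loss
```

For the loss $\ell_t(z)=|z-b_t|$, for instance, one would pass `loss_seq[t] = lambda z: abs(z - b_t)` and `dloss_seq[t] = lambda z: np.sign(z - b_t)`.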

The proof of Theorem 1 is deferred to Appendix A, and we discuss some technical challenges here. Ref. [28] gives an $O(\sqrt{T\mathcal{P}})$ dynamic regret bound for online convex optimization:

$$\text{Dynamic Regret}=\sum_{t=1}^{T}\ell_t(x_t)-\min_{\varphi_1,\ldots,\varphi_T\in\mathcal{K}}\sum_{t=1}^{T}\ell_t(\varphi_t) \tag{5}$$

where $\sum_{t=1}^{T-1}\|\varphi_t-\varphi_{t+1}\|_2=\mathcal{P}$. Directly adopting their result would lead to an exponential dependence on $n$, because the original algorithm uses the $\ell_2$ norm to measure the path length, which makes the dual norm of the gradient depend polynomially on the dimension ($2^n$ in this case). To avoid this issue, we use OMD with von Neumann entropy regularization to obtain a poly-logarithmic dependence on the dimension. From [5], we know that with the von Neumann entropy as the regularizer, the regret can be bounded by problem parameters measured in the nuclear norm and the spectral norm, which tend to be small for online shadow tomography. The analysis is in a similar vein to the classical OMD proof [27], but classical results often rely on theorems from calculus that cannot be straightforwardly applied to the complex domain.

Given the challenge of complex numbers, we use tools from functional analysis and quantum information theory to study the convergence of the algorithm. Our analysis deviates from the classical analysis in the following aspects:

  • We use the notion of Gateaux differentiability and the Gateaux derivative to extend classical gradient-based methods to the quantum domain. In particular, we use the Gateaux derivative to define the Bregman divergence, a key ingredient of the OMD analysis. Subsequently, we show that the Bregman divergence obeys an Euler’s inequality that corresponds to the generalized Pythagorean theorem in the classical analysis [27].

  • To give a quantitative bound on the Bregman divergence between the current iterate and the dual update, we cannot rely on the interpretation of the Bregman divergence as a notion of distance. Instead, we upper bound the quantity using the power series of the matrix exponential, and the Golden-Thompson inequality.

  • When bounding the dynamic regret, we need to bound the difference of von Neumann entropy between two quantum states. This can be seen as a type of continuity bound for the von Neumann entropy, first given by Fannes [29] and later sharpened by Audenaert [22]. The bound is known as the Fannes–Audenaert inequality and has numerous applications in quantum information theory [30, 31, 32, 33].

Mistake bound of the path length setting.

We can obtain a mistake bound in this setting when we assume that the ground truth $\rho_t$ has a bounded path length.

Corollary 1.

Let $\{\rho_t\}$ be a sequence of $n$-qubit mixed states whose path length satisfies $P_T=\sum_{t=1}^{T-1}\|\rho_{t+1}-\rho_t\|_1\leq\mathcal{P}$, and let $E_1,E_2,\ldots$ be a sequence of two-outcome measurements that are revealed to the learner one by one, each followed by a value $b_t\in[0,1]$ such that $|\operatorname{Tr}(E_t\rho_t)-b_t|\leq\frac{1}{3}\epsilon$. Then there is an explicit strategy for outputting hypothesis states $x_1,x_2,\ldots$ such that $|\operatorname{Tr}(E_tx_t)-\operatorname{Tr}(E_t\rho_t)|>\epsilon$ for at most $\tilde{O}\big(\frac{L^2n\mathcal{P}}{\epsilon^2}\big)$ times.

Dynamic regret given dynamical models.

So far, we have not made assumptions on how the state evolves. However, in many cases, the quantum state only evolves under a family of dynamical models. A natural question is the following: If we know the family of possible dynamical models in advance, can we achieve a better dynamic regret?

First, we consider the case when there is a fixed family of dynamical models. The most general linear maps $\Phi\colon C_n\to C_n$ implementable on quantum computers are quantum channels, which satisfy:

  • Completely positive: For any positive integer $m$ and any positive semidefinite matrix $\sigma$ on the joint system, $(\Phi\otimes I_m)(\sigma)\succeq 0$, where $I_m$ denotes the identity map on $m\times m$ matrices;

  • Trace preserving: $\operatorname{Tr}(\Phi(\rho))=\operatorname{Tr}(\rho)=1$ for any quantum state $\rho\in C_n$.

A common example is $\Phi(x)=e^{iHt}xe^{-iHt}$, where $H$ is a background Hamiltonian that evolves the quantum state following the Schrödinger equation.
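For instance, a minimal NumPy/SciPy sketch of this unitary channel (the helper name hamiltonian_channel is ours), together with a check that it is trace preserving and maps states to states:

```python
import numpy as np
from scipy.linalg import expm

def hamiltonian_channel(H, t):
    """Phi(x) = e^{iHt} x e^{-iHt}: conjugation by the unitary generated by Hamiltonian H."""
    U = expm(1j * t * H)
    return lambda x: U @ x @ U.conj().T

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = (A + A.conj().T) / 2                 # a random 2-qubit Hamiltonian
rho = np.eye(4) / 4                      # maximally mixed 2-qubit state
sigma = hamiltonian_channel(H, 0.3)(rho)
assert np.isclose(np.trace(sigma).real, 1.0) and np.allclose(sigma, sigma.conj().T)
```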

We first consider the case with $M$ possible dynamical models $\{\Phi^{(1)},\ldots,\Phi^{(M)}\}$, each of which is a quantum channel. Following [34] and [28], we restrict a variant of the path length of the comparator:

$$\mathcal{P}'=\min_{\Phi^{(i)}\in\{\Phi^{(1)},\ldots,\Phi^{(M)}\}}\sum_{t=1}^{T-1}\|\varphi_{t+1}-\Phi^{(i)}(\varphi_t)\|_1. \tag{6}$$
Corollary 2.

Under the assumption that we have $M$ possible dynamical models $\{\Phi^{(1)},\ldots,\Phi^{(M)}\}$ and the path length satisfies $\mathcal{P}'=\min_{\Phi^{(i)}\in\{\Phi^{(1)},\ldots,\Phi^{(M)}\}}\sum_{t=1}^{T-1}\|\varphi_{t+1}-\Phi^{(i)}(\varphi_t)\|_1\geq 1$, there is an algorithm with dynamic regret $O(L\sqrt{T(n+\log(T))\mathcal{P}'}+\sqrt{T\log(M\log(T))})$.

In general, the possible dynamical models can form a continuous set. Considering that current quantum computers [23, 24, 25] commonly use 2D chips, where a quantum gate only influences a constant number of qubits, it is natural to consider $l$-local quantum channels, in which each channel acts on at most $l$ qubits (and acts as the identity on the other $n-l$ qubits); the following result then holds. The proofs of Corollary 2 and Corollary 3 are deferred to Appendix B.

Corollary 3.

Under the assumption that the dynamical models are $l$-local quantum channels, there is an algorithm with dynamic regret $\tilde{O}(L\sqrt{nT\mathcal{P}'}+2^{6l}\sqrt{T})$, where $\mathcal{P}'=\min_{\Phi\in S_{n,l}}\sum_{t=1}^{T-1}\|\varphi_{t+1}-\Phi(\varphi_t)\|_1\geq 1$ and $S_{n,l}$ is the set of all $l$-local quantum channels.

4 Minimizing Adaptive Regret

In this section, we minimize the adaptive regret in Eq. (2) in the non-realizable case, i.e., when $b_t$ can be arbitrarily selected. We follow a meta-algorithm called Coin Betting for Changing Environment (CBCE), proposed by [20], with its black-box algorithm $\mathcal{B}$ being the regularized follow-the-leader (RFTL) based algorithm introduced by [5] for learning quantum states. The algorithm is given in Algorithm 3.

1:  Input: CBCE from [20] as the meta-algorithm, RFTL from [5] as the black-box algorithm.
2:  Initialize experts $\mathcal{B}_J$, where $\mathcal{B}_J$ is an instance of the RFTL algorithm over the interval $J$, for each $J\in\mathcal{J}$, where $\mathcal{J}$ is the set of geometric intervals.
3:  Initialize a prior distribution over the experts $\mathcal{B}_J$.
4:  for $t=1,\ldots,T$ do
5:     Predict among the black-box experts J\mathcal{B}_{J} according to CBCE.
6:     Update the distribution over the black-box experts J\mathcal{B}_{J} according to CBCE.
7:  end for
Algorithm 3 Adaptive Regret for Quantum Tomography
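As an illustration of the interval structure only (a minimal Python sketch; the exact family $\mathcal{J}$ and the CBCE weighting follow [20] and [19], which we do not reproduce here), one common choice of geometric covering intervals is: for every $k\geq 0$, the consecutive length-$2^k$ intervals starting at multiples of $2^k$.

```python
def geometric_intervals(T):
    """Return geometric covering intervals of {1, ..., T}: for each k with 2^k <= T,
    the intervals [i * 2^k, (i + 1) * 2^k - 1] for i = 1, 2, ..., clipped to T."""
    intervals, length = [], 1
    while length <= T:
        start = length
        while start <= T:
            intervals.append((start, min(start + length - 1, T)))
            start += length
        length *= 2
    return intervals

# geometric_intervals(8) contains the singletons (1,1), ..., (8,8), the pairs
# (2,3), (4,5), (6,7), the block (4,7), and so on; every interval [s, e] within
# [1, T] can be covered by O(log T) members of this family.
```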

The basic idea is that the CBCE meta-algorithm can convert the regret bound of any black-box algorithm into a strongly adaptive regret bound, and that the RFTL algorithm with von Neumann entropy regularization achieves an $O(\sqrt{T})$ regret bound and can therefore serve as the black-box algorithm. Formally:

Lemma 1.

Let $E_1,E_2,\ldots$ be a sequence of two-outcome measurements on an $n$-qubit state presented to the learner, and suppose the loss $\ell_t$ is convex, $L$-Lipschitz, and maps to $[0,1]$. Then Algorithm 3 guarantees strongly adaptive regret $\text{SA-Regret}_T^{\mathcal{A}}(\tau)=O(L\sqrt{n\tau\log(T)})$ for all $T$.

$k$-shift regret bound.

As an application of Lemma 1, we derive a $k$-shift regret bound. In the $k$-shift setting, we assume that the comparator $\varphi$ can change at most $k$ times, and we derive the following theorem.

Theorem 2.

Under the same setting as Lemma 1, if we allow $\varphi$ to change at most $k$ times, we have an $O(L\sqrt{knT\log(T)})$ bound on the $k$-shift regret $R_{k\text{-shift}}^{\mathcal{A}}$:

$$R_{k\text{-shift}}^{\mathcal{A}}=\sum_{t=1}^{T}\ell_t\big(\operatorname{Tr}(E_tx_t^{\mathcal{A}})\big)-\min_{1=t_1\leq t_2\leq\cdots\leq t_{k+1}=T+1}\ \min_{\varphi_1,\varphi_2,\ldots,\varphi_k\in C_n}\sum_{j=1}^{k}\sum_{t=t_j}^{t_{j+1}-1}\ell_t\big(\operatorname{Tr}(E_t\varphi_j)\big). \tag{7}$$

Mistake bound of the $k$-shift setting.

With Theorem 2 in hand, if we further assume that the ground truth $\rho_t$ also shifts at most $k$ times, we can derive the following mistake bound.

Corollary 4.

Let $\{\rho_t\}$ be a sequence of $n$-qubit mixed states which changes at most $k$ times, and let $E_1,E_2,\ldots$ be a sequence of two-outcome measurements that are revealed to the learner one by one, each followed by a value $b_t\in[0,1]$ such that $|\operatorname{Tr}(E_t\rho_t)-b_t|\leq\frac{1}{3}\epsilon$. Then there is an explicit strategy for outputting hypothesis states $x_1,x_2,\ldots$ such that $|\operatorname{Tr}(E_tx_t)-\operatorname{Tr}(E_t\rho_t)|>\epsilon$ for at most $O\big(\frac{kn}{\epsilon^2}\log(\frac{kn}{\epsilon^2})\big)$ times.

The proofs of the claims in this section are deferred to Appendix C.

5 Numerical Experiments

In this section, we present numerical results that support our regret bounds. (The implementation code can be found at https://github.com/EstelYang/Adaptive-Online-Learning-of-Quantum-States.) Specifically, we conduct experiments comparing against non-adaptive algorithms and testing our performance in the $k$-shift setting and the path length setting, respectively. All experiments are performed in Visual Studio Code 1.6.32 on a 2021 MacBook Pro (M1 Pro chip) with a 10-core CPU and 16 GB of memory. We generate random $n$-qubit quantum states using the QuTiP package [35, 36]. In the $k$-shift setting, each of the $k$ changes of the quantum state occurs at a time chosen uniformly at random from all iterations $\{1,2,\ldots,T\}$. In the path length setting, our quantum state evolves slowly following a background Hamiltonian $H$, also generated by QuTiP. At time $t$ we generate a POVM measurement $E_t$ with eigenvalues chosen uniformly at random from $[0,1]$ and eigenvectors chosen as uniformly random $\ell_2$-norm unit vectors. Then we apply the measurement, receive the loss, and update our prediction using Algorithm 1 or Algorithm 3. For all experiments, the instantaneous regret at each time step is computed using the ground truth state, which changes over each run depending on the setting.
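As a minimal sketch of this setup (assuming QuTiP's rand_dm and rand_unitary as in QuTiP 4.x; the helper names are ours, and the exact experimental code is in the repository linked above):

```python
import numpy as np
import qutip as qt

def random_state(n):
    """Random n-qubit density matrix (the ground truth rho_t), generated with QuTiP."""
    return qt.rand_dm(2 ** n).full()

def random_two_outcome_povm(n, rng=np.random.default_rng()):
    """POVM element E_t with eigenvalues uniform in [0, 1] and random eigenvectors."""
    d = 2 ** n
    U = qt.rand_unitary(d).full()
    return U @ np.diag(rng.uniform(0.0, 1.0, size=d)) @ U.conj().T

n = 2
rho = random_state(n)
E = random_two_outcome_povm(n)
b = np.trace(E @ rho).real               # (noiseless) measurement value b_t
loss = lambda z: abs(z - b)              # example loss l_t(z) = |z - b_t|
```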

Comparison to non-adaptive algorithms.

One notable strength of adaptive online learning of quantum states is its ability to track changing states. Previous algorithms such as regularized follow-the-leader (RFTL) perform well for predicting fixed quantum states [5], but they are not adaptive when the quantum state evolves. We compare the cumulative regret of RFTL and of our algorithms in the $k$-shift setting and the path length setting. The results are illustrated as follows.

Figure 1: Comparison between Algorithm 3 and non-adaptive RFTL in the $k$-shift setting. Subfigures: (a) $k=80$, $n=2$; (b) $k=4$, $n=2$; (c) $k=4$, $n=3$; (d) $k=4$, $n=4$; (e) $k=4$, $n=5$; (f) $k=4$, $n=6$. The regret attained by our adaptive Algorithm 3 is denoted by “CBCE” (the blue curve with purple shadow), and the regret of the non-adaptive algorithm RFTL is shown as the orange curve with red shadow. The lower solid line shows the average regret over 10 random experiments and the upper line shows the maximum.
Figure 2: Comparison between our algorithms and non-adaptive RFTL in the path length setting. Subfigures: (a) Algorithm 1, $n=2$; (b) Algorithm 1, $n=3$; (c) Algorithm 3, $n=2$; (d) Algorithm 3, $n=3$. Our algorithms are denoted by “DOMD” / “CBCE”, and the non-adaptive algorithm RFTL with different learning rates $\eta$ is labeled by the value of $\eta$.

The advantage of our algorithms in the $k$-shift setting is shown in Figure 1. In Subfigure (a), we plot the long-term growth of the cumulative regret: the two curves have similar trends, but CBCE achieves smaller regret. In Subfigures (b)-(f), we choose a small time scale $T=20$ to display the details. The distance between the two curves grows as $T$ increases, which means our algorithm maintains its advantage in the long run.

Figure 2 shows the advantage of our algorithms in the path length setting. In Subfigures (a) and (b), we compare our Algorithm 1 (denoted “DOMD”) with the non-adaptive algorithm RFTL under different learning rates. In a single run, our algorithm attains the minimum regret and is comparable to RFTL with the best learning rate. Indeed, given the similarity of RFTL and OMD, we expect RFTL with the optimal learning rate to do as well as DOMD in practice. However, since the optimal learning rate for OMD depends on the path length, it is unknown a priori, and non-adaptive methods require more tuning to achieve good performance. In Subfigures (c) and (d), we compare our Algorithm 3 (denoted “CBCE”) with non-adaptive RFTL under different $\eta$. There is a significant gap between the regret of CBCE and that of RFTL with the best learning rate from the grid search, demonstrating the interesting phenomenon that, empirically, CBCE with adaptive regret guarantees also performs well in the path length setting.

Adaptive regret bound in the $k$-shift setting.

We also conduct experiments to verify our adaptive regret bound $O(\sqrt{knT\log T})$ in the $k$-shift setting (Theorem 2). Results are shown in Figure 3.

Figure 3: Adaptive regret bounds in the $k$-shift setting. Subfigures: (a) average regret $R_{\text{aver}}$; (b) ratio $C=R_{k\text{-shift}}/\sqrt{knT\log T}$.

In Subfigure (a), we plot the average regret $R_{\text{aver}}=\text{regret}_T/T$ as $T$ grows. Each data point is the maximum regret over 100 random experiments, since the regret bound is an upper bound. Under different parameter choices $(k,n)$, $R_{\text{aver}}$ always converges to 0 as $T$ increases, implying a sublinear regret $R_{k\text{-shift}}$. Moreover, curves with larger $k$ have larger $R_{\text{aver}}$ and converge more slowly, consistent with the intuition that more shifts make the learning task more difficult.

In Subfigure (b), to verify the precise bound, we plot the ratio $C=\text{regret}_T/\sqrt{knT\log T}$. We choose $n=2$, $k=10$, $T=200$, and repeat the experiment 50 times with different random seeds. We then plot the maximum of the ratio $C$ over every interval. As the figure shows, the ratio is bounded by a constant $C\leq 0.07$. This strongly supports our upper bound $O(\sqrt{knT\log T})$ on $\text{regret}_T$ for every fixed $n$.

Dynamic regret bound in the path length setting.

In addition, we conduct experiments on the average regret $R_{\text{aver}}$ and the ratio $C_{\text{path}}=R_{\text{path}}/\sqrt{T(n+\log T)\mathcal{P}}$ to test our dynamic regret bound $O(L\sqrt{T(n+\log T)\mathcal{P}})$ in the path length setting (Theorem 1). Results are shown in Figure 4.

Figure 4: Dynamic regret bound in the path length setting. Subfigures: (a) average regret $R_{\text{aver}}$; (b) ratio $C=R_{\text{path}}/\sqrt{T(n+\log T)\mathcal{P}}$.

In Subfigure (a), we plot the average regret $R_{\text{aver}}=\text{regret}_T/T$. We observe that as $T$ grows, $R_{\text{aver}}$ converges to 0 for different pairs of $(n,\mathcal{P})$, indicating that $R_{\text{path}}$ is sublinear in $T$. In addition, the blue curve, representing the average regret for a 2-qubit quantum state, lies above the orange curve, representing the average regret for a 1-qubit quantum state, consistent with the fact that our regret bound depends polynomially on the number of qubits.

In Subfigure (b), we choose different scales of $T$ to demonstrate how the ratio $C$ evolves in the long term. For every fixed $T$, we run 100 random experiments and take the maximum ratio $C$. Although there are noticeable fluctuations, the curve is non-increasing and fluctuates only within the interval from 0 to 0.065, which shows that $C$ has a constant upper bound $C'=0.065$. In conclusion, $R_{\text{path}}/\sqrt{T(n+\log T)\mathcal{P}}$ stays within a small positive constant.

6 Conclusions

In this paper, we consider online learning of changing quantum states. We give algorithms with adaptive and dynamic regret bounds that are polynomial in the number of qubits and sublinear in the number of measurements, and verify our results with numerical experiments.

Our work leaves a couple of open questions for future investigation:

  • Lower bounds in $T$ and the path length $\mathcal{P}$ are known [28], but to our knowledge there is no lower bound in $n$. Can we prove an $\Omega(\sqrt{n})$ lower bound for the dynamic or adaptive regret of online learning of quantum states? We note a recent work [37] that proposed a lower bound in terms of $n$ for quantum channels. It would be intriguing to explore in future work whether these techniques can be adapted to the quantum state scenario.

  • Although our algorithms do not care how the loss function is revealed, in practice, a standard way is to measure several copies of the state at each step, so the copies need to follow the same dynamics. Accounting for different state evolution for different state copies is left as future work.

Acknowledgments

TL, XW, and RY were supported by the National Natural Science Foundation of China (Grant Numbers 92365117 and 62372006), and The Fundamental Research Funds for the Central Universities, Peking University.

References

Appendix A Proof of Dynamic Regret Bounds for Quantum Tomography

We prove Theorem 1 in this section.

We use the online mirror descent (OMD) algorithm (similar to Algorithm 1 in [5]) to replace the online gradient descent (OGD) expert in [28]. We denote by $G_R$ an upper bound on the spectral norm of $\nabla_t=\ell'_t(\operatorname{Tr}(E_tx_t))E_t$, and by $D_R=\sqrt{\max_{x,y\in\mathcal{K}}\{R(x)-R(y)\}}$ the diameter of $\mathcal{K}$ with respect to the function $R$. The main motivation is that the dimension is now $2^n$ and we need a logarithmic dependence on the dimension in our regret bound; the $\ell_2$ norm (used by OGD) is not the most appropriate choice here and fails to control $G_R$.

Instead, the OMD algorithm with the von Neumann entropy regularizer admits polynomial upper bounds on both $D_R$ and $G_R$, and the only remaining question is how to incorporate the path length into the analysis of OMD to achieve a dynamic regret bound. We observe that such incorporation is natural, due to the essential similarity between OGD and OMD. We bound the path distance $\|\varphi_t-\varphi_{t+1}\|$ in the nuclear norm, and bound the gradient of the regularizer $\nabla R(x)$ in the spectral norm.

A potential issue is that the von Neumann entropy contains logarithmic terms, which might cause the gradient to explode when some eigenvalue $\lambda_k$ is very small. To mitigate this issue, we mix $x$ with $\frac{I}{T2^n}$: $\tilde{x}=(1-\frac{1}{T})x+\frac{I}{T2^n}$, for both $x_t$ and $\varphi_t$. Notice that this mixing does not hurt the regret bound: mixing in such a small fraction of the identity only increases the regret by $O(LT\cdot\frac{1}{T})=O(L)$, since the losses are $L$-Lipschitz. In addition, the path length can only decrease, since we are working over a smaller domain. Henceforth we consider the modified domain $\mathcal{K}=(1-\frac{1}{T})C_n+\frac{1}{T2^n}I$ throughout the proof.
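A quick numeric sanity check of this mixing step (a NumPy sketch): the eigenvalues of the mixed state are floored at $1/(T2^n)$, so the logarithmic terms in the regularizer stay bounded.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 1000
d = 2 ** n
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
x = A @ A.conj().T
x /= np.trace(x).real                                   # a random density matrix
x_tilde = (1 - 1 / T) * x + np.eye(d) / (T * d)         # mix with I / (T * 2^n)
w = np.linalg.eigvalsh(x_tilde)
assert w.min() >= 1 / (T * d) - 1e-12 and w.max() <= 1 + 1e-12
```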

Our proof uses tools in convex analysis for Hilbert spaces. Since all matrices used in Algorithm 2 are Hermitian (by Proposition 1), it suffices to consider the Hilbert space of Hermitian matrices equipped with the trace inner product. Note that this is a real Hilbert space: the vector space of Hermitian matrices only allows scalar multiplication over the field of reals.

We first present some basic results for the von Neumann entropy. For completeness, we give the following definitions from [38]; see [38] for a more comprehensive treatment of analysis in Hilbert spaces.

Definition 1 (Gateaux differentiability).

Let $C$ be a subset of a Hilbert space $\mathcal{H}$, and let $f:C\rightarrow[-\infty,\infty]$. Let $x\in C$ be such that for each $y\in\mathcal{H}$ there exists $\alpha>0$ such that $[x,x+\alpha y]\subset C$. Then $f$ is Gateaux differentiable at $x$ if there exists a bounded linear operator $Df(x)$, called the Gateaux derivative of $f$ at $x$, such that

$$Df(x)y=\lim_{\alpha\downarrow 0}\frac{f(x+\alpha y)-f(x)}{\alpha}.$$
Definition 2 (Directional derivatives).

For a Hilbert space $\mathcal{H}$, let $f:\mathcal{H}\rightarrow(-\infty,\infty]$ be proper, and let $x\in\operatorname{dom}f$ and $y\in\mathcal{H}$. The directional derivative of $f$ at $x$ in the direction $y$ is

$$f'(x;y)=\lim_{\alpha\downarrow 0}\frac{f(x+\alpha y)-f(x)}{\alpha},$$

if the limit exists on the extended real line.

It is well known that $f$ is Gateaux differentiable at $x$ if and only if all directional derivatives at $x$ exist and form a bounded linear operator. By the Riesz–Fréchet representation theorem, letting $\langle\cdot,\cdot\rangle$ denote the Hilbert space inner product, there exists a unique $\nabla f(x)\in\mathcal{H}$ such that for each $y\in\mathcal{H}$,

$$f'(x;y)=\langle y,\nabla f(x)\rangle.$$
Lemma 2 (Lemma VI.4 in Ref. [39]).

The von Neumann entropy $R(x)=\operatorname{Tr}(x\log x)$ for Hermitian PD matrices $x$ is Gateaux differentiable, and the Gateaux derivative is

$$\nabla R(x)=\log x+I.$$

The quantum analog of the classical Bregman divergence is therefore defined as

$$B_R(x\|y)=R(x)-R(y)-\nabla R(y)\bullet(x-y).$$

The above expression is also known as the quantum relative entropy of $x$ with respect to $y$, the quantum information divergence, and the von Neumann divergence [40].
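The following NumPy sketch (helper names ours) computes $B_R(x\|y)$ directly from this definition and numerically checks Klein's inequality, which is used in Proposition 2 below.

```python
import numpy as np

def neg_vn_entropy_and_grad(x):
    """Return R(x) = Tr(x log x) and its Gateaux derivative log(x) + I for Hermitian PD x."""
    w, V = np.linalg.eigh(x)
    return float(np.sum(w * np.log(w))), (V * np.log(w)) @ V.conj().T + np.eye(len(w))

def bregman(x, y):
    """Quantum relative entropy B_R(x || y) = R(x) - R(y) - <grad R(y), x - y>."""
    Rx, _ = neg_vn_entropy_and_grad(x)
    Ry, Gy = neg_vn_entropy_and_grad(y)
    return Rx - Ry - np.trace(Gy @ (x - y)).real

def random_density_matrix(d, rng):
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    x = A @ A.conj().T
    return x / np.trace(x).real

rng = np.random.default_rng(2)
x, y = random_density_matrix(4, rng), random_density_matrix(4, rng)
assert bregman(x, y) >= -1e-10          # Klein's inequality: B_R(x || y) >= 0
```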

Proposition 1.

The variable $y_t$ in Algorithm 2 is always Hermitian PD.

Proof.

By the definition of $y_t$ and $\nabla R(\cdot)$, we have $\log(y_{t+1})=\log(x_t)-\eta\nabla_t$. Recall that the loss functions are of the form $\ell_t(\rho)=\ell(E_t\bullet\rho)$. Since $E_t$ is Hermitian, $\nabla_t=\ell'(\cdot)E_t$ is also Hermitian. The proposition follows by definition. ∎

Proposition 2.

For all Hermitian PSD matrices $x,y$, $B_R(x\|y)\in\mathbb{R}$. For all $x\in\mathcal{K}$ and all Hermitian PD $y$, $B_R(x\|y)\geq 0$, and for all $\varphi_1\in\mathcal{K}$ and $x_1$ as defined in Algorithm 2, $B_R(\varphi_1\|x_1)\leq\max_{x,y\in\mathcal{K}}\{R(x)-R(y)\}$.

Proof.

The first claim follows from the definitions of $R$ and $\nabla R$. Since $x$ and $y$ are both positive definite, by Fact 3 in [40] (Klein's inequality), $B_R(x\|y)\geq 0$. For the last claim, let $\varphi_1\in\mathcal{K}$ be arbitrary; to show $B_R(\varphi_1\|x_1)=R(\varphi_1)-R(x_1)-\nabla R(x_1)\bullet(\varphi_1-x_1)\leq\max_{x,y\in\mathcal{K}}\{R(x)-R(y)\}$, we only need to show $\nabla R(x_1)\bullet(\varphi_1-x_1)\geq 0$. By direct computation,

$$\nabla R(x_1)\bullet(\varphi_1-x_1)=(I+\log(2^{-n}I))\bullet(\varphi_1-2^{-n}I)=((1-n\log 2)\,I)\bullet(\varphi_1-2^{-n}I)=0,$$

since $\operatorname{Tr}(\varphi_1)=\operatorname{Tr}(2^{-n}I)=1$. ∎

Proposition 3 (Euler’s inequality).

Let $x_t,y_t$ be as defined in Algorithm 2. Then for any $\varphi\in\mathcal{K}$, we have

$$B_R(\varphi\|x_t)\leq B_R(\varphi\|y_t).$$
Proof.

For $x\in\mathcal{K}$,

$$B_R(x\|y)=\operatorname{Tr}(x\log x-x\log y-x+y).$$

By Lemma 2, $\operatorname{Tr}(x\log x)$ is Gateaux differentiable on the set of positive definite Hermitian matrices. Since $\operatorname{Tr}(x\log y)$ and $\operatorname{Tr}(x)$ are both Gateaux differentiable, $B_R(x\|y)$ is also Gateaux differentiable as a function of $x$. By definition, $x_t=\operatorname{argmin}_{x\in\mathcal{K}}B_R(x\|y_t)$, and therefore for $\alpha\in(0,1]$ and any $\varphi\in\mathcal{K}$,

$$B_R(x_t\|y_t)\leq B_R((1-\alpha)x_t+\alpha\varphi\|y_t).$$

Dividing by $\alpha$ and taking $\alpha\downarrow 0$, we have

$$\nabla B_R(x_t\|y_t)\bullet(\varphi-x_t)\geq 0.$$

Expanding the derivative, $(\nabla R(x_t)-\nabla R(y_t))\bullet(\varphi-x_t)\geq 0$. Observe that we have the decomposition

$$(\nabla R(y_t)-\nabla R(x_t))\bullet(\varphi-x_t)=B_R(x_t\|y_t)-B_R(\varphi\|y_t)+B_R(\varphi\|x_t),$$

which means $B_R(\varphi\|y_t)\geq B_R(x_t\|y_t)+B_R(\varphi\|x_t)\geq B_R(\varphi\|x_t)$, because $B_R(x_t\|y_t)\geq 0$ by Proposition 2. ∎

We now prove the main lemma, which gives a dynamic regret bound for the OMD algorithm when the learning rate $\eta$ is chosen optimally.

Lemma 3.

Assume the path length satisfies $\mathcal{P}=\sum_{t=1}^{T-1}\|\varphi_t-\varphi_{t+1}\|_1\geq 1$, where $\|\cdot\|_1$ is the nuclear norm. Then the dynamic regret of Algorithm 2 is upper bounded by $O(L\sqrt{T(n+\log T)\mathcal{P}})$ if we choose $\eta=\sqrt{\frac{\mathcal{P}(n+\log T)}{TL^2}}$.

Proof.

We follow the standard analysis of OMD, assuming the losses $\ell_t$ are $L$-Lipschitz. By convexity of the loss $\ell_t$, we have

$$\ell_t(\operatorname{Tr}(E_tx_t))-\ell_t(\operatorname{Tr}(E_t\varphi_t))\leq\ell'_t(\operatorname{Tr}(E_tx_t))\big(\operatorname{Tr}(E_tx_t)-\operatorname{Tr}(E_t\varphi_t)\big)=\nabla_t\bullet(x_t-\varphi_t). \tag{8}$$

The following property of Bregman divergences follows from the definition: for any Hermitian PSD matrices $x,y,z$,

$$(x-y)\bullet(\nabla R(z)-\nabla R(y))=B_R(x\|y)-B_R(x\|z)+B_R(y\|z). \tag{9}$$

Now take $x=\varphi_t$, $y=x_t$, and $z=y_{t+1}$, and combine (8) and (9). Writing $\ell_t(x)=\ell_t(\operatorname{Tr}(E_tx))$, we have

$$\begin{split}
&\ell_t(x_t)-\ell_t(\varphi_t)\\
\leq\ &\frac{1}{\eta}\big(B_R(\varphi_t\|x_t)-B_R(\varphi_t\|y_{t+1})+B_R(x_t\|y_{t+1})\big)\\
\leq\ &\frac{1}{\eta}\big(B_R(\varphi_t\|x_t)-B_R(\varphi_t\|x_{t+1})+B_R(x_t\|y_{t+1})\big)\\
=\ &\frac{1}{\eta}\big(B_R(\varphi_t\|x_t)-B_R(\varphi_{t+1}\|x_{t+1})+B_R(\varphi_{t+1}\|x_{t+1})-B_R(\varphi_t\|x_{t+1})+B_R(x_t\|y_{t+1})\big),
\end{split}$$

where the second inequality is due to Proposition 3.

The following lemma gives an upper bound on one of the quantities in the inequality above, which we will use next in the proof. In vanilla OMD, this quantity is upper bounded using the fact that the Bregman divergence can be interpreted as a distance measured in a certain local norm; however, over the complex domain, this fact is no longer immediate from the mean value theorem. We instead prove the upper bound using properties of the dual space updates $y_{t+1}$.

Lemma 4.

Let $G_R$ be an upper bound on $\|\nabla_t\|$. Then $B_R(x_t\|y_{t+1})\leq\frac{1}{2}\eta^2G_R^2$.

Proof.

First, note that by definition, $y_{t+1}$ satisfies

$$\log(y_{t+1})=\log(x_t)-\eta\nabla_t. \tag{10}$$

Then we have $y_{t+1}=\exp(\log(x_t)-\eta\nabla_t)$, and by the Golden–Thompson inequality,

$$\operatorname{Tr}(y_{t+1})\leq\operatorname{Tr}(\exp(\log(x_t))\exp(-\eta\nabla_t))\leq\operatorname{Tr}(x_t\exp(-\eta\nabla_t)). \tag{11}$$

Expanding the Bregman divergence, we have

$$\begin{aligned}
B_R(x_t\|y_{t+1})&=\operatorname{Tr}(x_t\log(x_t))-\operatorname{Tr}(y_{t+1}\log(y_{t+1}))-(I+\log(y_{t+1}))\bullet(x_t-y_{t+1})\\
&=\operatorname{Tr}(x_t\log(x_t))-\operatorname{Tr}(x_t\log(y_{t+1}))-1+\operatorname{Tr}(y_{t+1})\\
&=\operatorname{Tr}(x_t\log(x_t))-\operatorname{Tr}(x_t(\log(x_t)-\eta\nabla_t))-1+\operatorname{Tr}(y_{t+1})&&\text{(by (10))}\\
&=\eta\operatorname{Tr}(x_t\nabla_t)-1+\operatorname{Tr}(y_{t+1})\\
&\leq\eta\operatorname{Tr}(x_t\nabla_t)-1+\operatorname{Tr}(x_t\exp(-\eta\nabla_t)).&&\text{(by (11))}
\end{aligned}$$

By definition, the matrix exponential can be written as an infinite sum,

$$\exp(-\eta\nabla_t)=\sum_{k=0}^{\infty}\frac{1}{k!}(-\eta\nabla_t)^k.$$

Using this expression in the last term of the inequality,

$$\operatorname{Tr}(x_t\exp(-\eta\nabla_t))=1-\eta\operatorname{Tr}(x_t\nabla_t)+\frac{1}{2}\eta^2\operatorname{Tr}(x_t\nabla_t^2)+\sum_{k=3}^{\infty}\frac{1}{k!}\operatorname{Tr}(x_t(-\eta\nabla_t)^k).$$

Now we proceed to show that $\sum_{k=3}^{\infty}\frac{1}{k!}\operatorname{Tr}(x_t(-\eta\nabla_t)^k)\leq 0$ if $\eta$ is sufficiently small:

$$\begin{aligned}
\sum_{k=3}^{\infty}\frac{1}{k!}\operatorname{Tr}(x_t(-\eta\nabla_t)^k)&=\sum_{k=1}^{\infty}\frac{1}{(2k+1)!}\operatorname{Tr}(x_t(-\eta\nabla_t)^{2k+1})+\frac{1}{(2k+2)!}\operatorname{Tr}(x_t(-\eta\nabla_t)^{2k+2})\\
&=\sum_{k=1}^{\infty}\frac{\eta^{2k+1}}{(2k+1)!}\operatorname{Tr}\Big(x_t\Big(-\nabla_t^{2k+1}+\frac{\eta}{2k+2}\nabla_t^{2k+2}\Big)\Big).
\end{aligned}$$

Let $\nabla_t=VQV^{\dagger}$ be the eigendecomposition of $\nabla_t$. Then

$$-\nabla_t^{2k+1}+\frac{\eta}{2k+2}\nabla_t^{2k+2}=VQ^{2k+1}\Big(-I+\frac{\eta}{2k+2}Q\Big)V^{\dagger}.$$

Since $\eta<\frac{1}{2L}$ and $\|\nabla_t\|\leq L$,

$$-I+\frac{\eta}{2k+2}Q\preceq 0,\qquad\text{and}\qquad\operatorname{Tr}\Big(x_t\Big(-\nabla_t^{2k+1}+\frac{\eta}{2k+2}\nabla_t^{2k+2}\Big)\Big)\leq 0.$$

Because the summands are nonpositive, we conclude that $\sum_{k=3}^{\infty}\frac{1}{k!}\operatorname{Tr}(x_t(-\eta\nabla_t)^k)\leq 0$. The lemma follows from applying the generalized Cauchy–Schwarz inequality (see [5, Claim 14] and [41, Exercise IV.1.14, page 90]):

$$B_R(x_t\|y_{t+1})\leq\frac{1}{2}\eta^2\operatorname{Tr}(x_t\nabla_t^2)\leq\frac{\eta^2}{2}\|x_t\|_1\|\nabla_t\|^2\leq\frac{\eta^2G_R^2}{2}. \qquad\square$$

Summing over $t$, telescoping, and observing that every eigenvalue is lower bounded by $\frac{1}{2^nT}$, we have

$$\begin{aligned}
\text{Regret}\leq\ &\frac{1}{\eta}\big(B_R(\varphi_1\|x_1)-B_R(\varphi_T\|x_T)\big)\\
&+\frac{1}{\eta}\sum_{t=1}^{T-1}\big|B_R(\varphi_t\|x_{t+1})-B_R(\varphi_{t+1}\|x_{t+1})\big|+\frac{1}{\eta}\sum_{t=1}^{T-1}B_R(x_t\|y_{t+1})\\
\leq\ &\frac{2(D_R^2+2n+\log T)}{\eta}+\frac{\eta TG_R^2}{2}+\frac{1}{\eta}\sum_{t=1}^{T-1}\big|B_R(\varphi_t\|x_{t+1})-B_R(\varphi_{t+1}\|x_{t+1})\big|\\
\leq\ &\frac{2(D_R^2+2n+\log T)}{\eta}+\frac{\eta TG_R^2}{2}+\frac{1}{\eta}\sum_{t=1}^{T-1}\big|R(\varphi_t)-R(\varphi_{t+1})-\nabla R(x_{t+1})\bullet(\varphi_t-\varphi_{t+1})\big|\\
\leq\ &\frac{2(D_R^2+2n+\log T)}{\eta}+\frac{\eta TG_R^2}{2}+\frac{1}{\eta}\sum_{t=1}^{T-1}\|\nabla R(x_{t+1})\|\,\|\varphi_t-\varphi_{t+1}\|_1+\frac{1}{\eta}\sum_{t=1}^{T-1}\big|R(\varphi_t)-R(\varphi_{t+1})\big|.
\end{aligned} \tag{12}$$

From [5] we already know that $D_R=O(\sqrt{n})$ and $G_R=O(L)$ due to the use of the von Neumann entropy. We now compute an upper bound on $\|\nabla R(x_{t+1})\|$. Let $x_{t+1}=VQV^{\dagger}$ be the eigendecomposition of $x_{t+1}$; then

$$\|\nabla R(x_{t+1})\|=\|I+\log(x_{t+1})\|=\|I+V\log(Q)V^{\dagger}\|\leq n\log(2)+\log(T).$$

The inequality is due to the fact that for $x_{t+1}\in\mathcal{K}$, the eigenvalues of $x_{t+1}$ lie in $[\frac{1}{2^nT},1]$.

For the last term in the regret bound, the Fannes–Audenaert inequality [22] states that

$$|R(\varphi_t)-R(\varphi_{t+1})|\leq\frac{n}{2}\|\varphi_t-\varphi_{t+1}\|_1+H\Big(\frac{1}{2}\|\varphi_t-\varphi_{t+1}\|_1\Big),$$

where $H(p)=-(p\log(p)+(1-p)\log(1-p))$ is the binary Shannon entropy.

We bound $H(p)$ by a linear function $g(p)=\lambda p+c$. Formally, we have the following lemma:

Lemma 5.

For any $\lambda>0$ and any $p\in(0,1)$,

$$H(p)\leq\lambda p+\log\frac{e^{\lambda}+1}{e^{\lambda}}.$$
Proof.

Take the derivative of g(p)H(p)g(p)-H(p), we have that the minimal happens when λ=log1pp\lambda=\log\frac{1-p}{p} which is p=1eλ+1p=\frac{1}{e^{\lambda}+1}, and it’s easy to verify that the minimal value of g(p)H(p)g(p)-H(p) equals

λeλ+1+c+1eλ+1log1eλ+1+eλeλ+1logeλeλ+1\frac{\lambda}{e^{\lambda}+1}+c+\frac{1}{e^{\lambda}+1}\log\frac{1}{e^{\lambda}+1}+\frac{e^{\lambda}}{e^{\lambda}+1}\log\frac{e^{\lambda}}{e^{\lambda}+1}

By taking c=\log\frac{e^{\lambda}+1}{e^{\lambda}}, the minimum value equals zero, and therefore g(p) is an upper bound on H(p). ∎
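As a sanity check (illustrative only), Lemma 5 can also be verified numerically; the value of \lambda below is an arbitrary choice.

```python
# Numerical check of Lemma 5: H(p) <= lambda*p + log((e^lambda + 1)/e^lambda),
# with equality at p = 1/(e^lambda + 1).  The choice lambda = log(100) is arbitrary.
import numpy as np

lam = np.log(100.0)
p = np.linspace(1e-6, 1 - 1e-6, 10_000)
H = -(p * np.log(p) + (1 - p) * np.log(1 - p))
g = lam * p + np.log((np.exp(lam) + 1) / np.exp(lam))
print(np.max(H - g))   # should be <= 0 up to floating-point error
```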

Choosing \lambda=\log T, we can upper bound the regret by

\textbf{Regret}\leq\frac{2(D_{R}^{2}+2n+\log T)}{\eta}+\frac{\eta TG_{R}^{2}}{2}+\frac{(n+\log T)\mathcal{P}}{\eta}+\frac{n\mathcal{P}+\log T\,\mathcal{P}+1}{2\eta}.  (13)

Choosing \eta=\sqrt{\frac{\mathcal{P}(n+\log T)}{TL^{2}}}, the regret is upper bounded by O(L\sqrt{T(n+\log T)\mathcal{P}}).

Now we have proven the following dynamic regret bound:

\text{Dynamic Regret}=\sum_{t=1}^{T}\ell_{t}(x_{t})-\min_{\varphi_{t}\in\mathcal{K}}\sum_{t=1}^{T}\ell_{t}(\varphi_{t})=O(L\sqrt{T(n+\log T)\mathcal{P}}).

Notice that

\min_{\varphi_{t}\in\mathcal{K}}\sum_{t=1}^{T}\ell_{t}(\varphi_{t})\leq\min_{\varphi_{t}\in C_{n}}\sum_{t=1}^{T}\ell_{t}(\varphi_{t})+O(L),

we conclude that

\text{Dynamic Regret}=\sum_{t=1}^{T}\ell_{t}(x_{t})-\min_{\varphi_{t}\in C_{n}}\sum_{t=1}^{T}\ell_{t}(\varphi_{t})=O(L\sqrt{T(n+\log T)\mathcal{P}}).

For the rest of the section, we reason about choosing the optimal \eta, which depends on the actual path length, a quantity we do not know in advance. Algorithm 1 tackles this problem by constructing a set S of candidate learning rates, say S=\{\frac{1}{4},\frac{1}{8},\ldots\}, initializing |S| instances of Algorithm 2, one for each \eta\in S, and running the weighted majority algorithm on top of these experts.

The regret of such a two-layer algorithm can be decomposed as the sum of the regret of the best base expert and the regret of the meta MW algorithm [19]. Since the path length \mathcal{P} is upper bounded by O(T), we need only |S|=O(\log T) base experts, and the additional regret induced by the weighted majority algorithm is only O(\sqrt{T\log\log T}). This term is dominated by the dynamic regret of the best base expert and is thus negligible in the final bound. Meanwhile, the optimal \eta is guaranteed either to be a small constant, which already suffices, or to lie between two candidate learning rates, so that one of them is near-optimal.
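The following is a minimal schematic (in Python) of this two-layer construction, not the exact implementation used in our experiments; `make_base_learner`, `observe_loss`, and the `predict`/`update` interface are placeholders for the base algorithm and the loss oracle.

```python
# Schematic of the two-layer scheme: one base learner per candidate step size
# in S = {1/4, 1/8, ...}, combined by multiplicative weights.  All interfaces
# below (make_base_learner, observe_loss, predict, update) are placeholders.
import numpy as np

def candidate_etas(T):
    """Geometric grid of O(log T) candidate learning rates."""
    return [2.0 ** (-k - 1) for k in range(1, int(np.log2(T)) + 1)]

def meta_mw(T, make_base_learner, observe_loss, alpha):
    experts = [make_base_learner(eta) for eta in candidate_etas(T)]
    w = np.ones(len(experts))
    for t in range(T):
        preds = [e.predict() for e in experts]                   # each expert's prediction
        x_t = sum(wi * p for wi, p in zip(w, preds)) / w.sum()   # weighted-average prediction
        loss_fn = observe_loss(t)                                # l_t revealed after predicting
        w = w * np.exp(-alpha * np.array([loss_fn(p) for p in preds]))
        for e in experts:
            e.update(loss_fn)                                    # each expert takes its own step
    return w
```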

More concretely, the best expert has regret upper bounded by O(L\sqrt{T(n+\log T)\mathcal{P}})=\tilde{O}(L\sqrt{Tn\mathcal{P}}), due to the existence of a near-optimal \eta\in S by Lemma 3. The regret of the external MW algorithm in Algorithm 1 is upper bounded by O(\sqrt{T\log(|S|)})=\tilde{O}(\sqrt{T}), a well-known regret bound for MW whose proof we omit here (see, e.g., [27, Section 1.3]). Summing up, we get the final result O(L\sqrt{T(n+\log(T))\mathcal{P}}).

Appendix B Proof of Dynamic Regret Bounds Given Dynamical Models

B.1 Proof of Corollary 2

1:  Input: a candidate set of quantum channels \{\Phi^{(j)}\}, 1\leq j\leq M, a candidate set of step sizes S=\{2^{-k-1}\mid 1\leq k\leq\log T\}, constant \alpha.
2:  Initialize \log(T)M experts E_{11},\ldots,E_{\log(T)M}, where E_{kj} is an instance of Algorithm 5 with \eta=2^{-k-1} and fixed quantum channel \Phi^{(j)}.
3:  Set initial weights w_{1}(k,j)=\frac{1}{\log(T)M}.
4:  for t=1,\ldots,T do
5:     Predict x_{t}=\frac{\sum_{k,j}w_{t}(k,j)x_{t}(k,j)}{\sum_{k,j}w_{t}(k,j)}, where x_{t}(k,j) is the output of expert E_{kj} at time t.
6:     Observe loss function \ell_{t}(\cdot).
7:     Update the weights as
w_{t+1}(k,j)=w_{t}(k,j)e^{-\alpha\ell_{t}(x_{t}(k,j))}.
8:     Send gradients \nabla\ell_{t}(x_{t}(k,j)) to each expert E_{kj}.
9:  end for
Algorithm 4 Dynamic Regret for Quantum Tomography Given Candidate Quantum Channels
1:  Input: domain \mathcal{K}=(1-\frac{1}{T})C_{n}+\frac{1}{T2^{n}}I, step size \eta<\frac{1}{2L}, quantum channel \Phi\colon C_{n}\mapsto C_{n}.
2:  Define R(x)=\sum_{k=1}^{2^{n}}\lambda_{k}(x)\log\lambda_{k}(x), \nabla R(x):=I+\log(x), and let B_{R} denote the Bregman divergence defined by R.
3:  Set x_{1}=2^{-n}I, and y_{1} to satisfy \nabla R(y_{1})=\mathbf{0}.
4:  for t=1,\ldots,T do
5:     Predict x_{t} and receive loss \ell_{t}(\operatorname{Tr}(E_{t}x)).
6:     Define \nabla_{t}=\ell^{\prime}_{t}(\operatorname{Tr}(E_{t}x_{t}))E_{t}, where \ell_{t}^{\prime}(y) is a subderivative of \ell_{t} with respect to y.
7:     Update y_{t+1} such that \nabla R(y_{t+1})=\nabla R(x_{t})-\eta\nabla_{t}.
8:     Update \hat{x}_{t+1}=\operatorname{argmin}_{x\in\mathcal{K}}B_{R}(x\|y_{t+1}).
9:     Update x_{t+1}=\Phi(\hat{x}_{t+1}).
10:  end for
Algorithm 5 DMD for Quantum Tomography Given Fixed Quantum Channel

Similar to Appendix A, we also need to mix x with \frac{I}{T2^{n}}, setting \tilde{x}=(1-\frac{1}{T})x+\frac{I}{T2^{n}}, for both x_{t} and \varphi_{t}. Notice that for any quantum channel \Phi, we have \Phi(\tilde{x})=(1-\frac{1}{T})\Phi(x)+\frac{I}{T2^{n}}=\widetilde{\Phi(x)}, so we can use the same dynamical model in the new domain \mathcal{K}=(1-\frac{1}{T})C_{n}+\frac{1}{T2^{n}}I, and the path length of the mixed comparators \{\widetilde{\varphi_{t}}\} can only decrease. By the L-Lipschitzness of the loss functions, the regret is degraded by only O(LT\cdot\frac{1}{T})=O(L).
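For concreteness, here is a minimal sketch (not the exact implementation) of one iteration of Algorithm 5: the mirror step with the entropic regularizer, an approximate Bregman projection via trace normalization and mixing with \frac{I}{T2^{n}}, and the application of the candidate channel \Phi.

```python
# Sketch of one DMD step (Algorithm 5).  The Bregman projection onto K is
# approximated here by trace normalization followed by mixing with I/(T 2^n);
# Phi is any callable implementing the fixed quantum channel.
import numpy as np
from scipy.linalg import expm, logm

def dmd_step(x_t, grad_t, eta, Phi, n, T):
    d = 2 ** n
    # mirror step: grad R(y_{t+1}) = grad R(x_t) - eta * grad_t, with grad R = I + log
    y = expm(logm(x_t) - eta * grad_t)
    # approximate projection back onto the domain K
    x_hat = y / np.trace(y).real
    x_hat = (1 - 1 / T) * x_hat + np.eye(d) / (T * d)
    # apply the dynamical model
    return Phi(x_hat)
```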

We first prove a lemma which gives a dynamic regret bound of Algorithm 5 when the learning rate \eta is chosen optimally.

Lemma 6.

Under the assumption that the path length \mathcal{P}=\sum_{t=1}^{T-1}\|\Phi(\varphi_{t})-\varphi_{t+1}\|_{1}\geq 1, where \|\cdot\|_{1} is the nuclear norm and \Phi\colon C_{n}\mapsto C_{n} is a quantum channel, the dynamic regret of Algorithm 5 is upper bounded by O(L\sqrt{T(n+\log T)\mathcal{P}}), if we choose \eta=\sqrt{\frac{\mathcal{P}(n+\log T)}{TL^{2}}}.

Proof.

In general, quantum channels can be viewed as completely positive trace-preserving (CPTP) maps on the space of density operators. Ref. [42] gives the following result:

Lemma 7 (Contraction of CPTP maps with respect to relative entropy [42]).

Let T(\mathcal{H}) be the set of operators on a Hilbert space \mathcal{H} with finite trace, and let T_{+}(\mathcal{H}) be the positive elements of T(\mathcal{H}). If \Phi is a completely positive trace-preserving map of T(\mathcal{H}) into itself, then for all A,B\in T_{+}(\mathcal{H}),

S(\Phi(A)\|\Phi(B))\leq S(A\|B),

where S(A\|B) is the relative entropy between A and B.

Since density operators on a finite-dimensional Hilbert space are positive and have finite trace, and B_{R}(x\|y)=S(x\|y) when R is the von Neumann entropy, we can derive that if \Phi\colon C_{n}\to C_{n} is a quantum channel on C_{n},

\forall x,y\in C_{n},\quad B_{R}(\Phi(x)\|\Phi(y))\leq B_{R}(x\|y).  (14)
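As a quick numerical illustration of Eq. (14) (not needed for the proof), the snippet below compares the relative entropy before and after a depolarizing channel; the states and the noise parameter are arbitrary choices.

```python
# Illustration of data processing: S(Phi(x)||Phi(y)) <= S(x||y) for a CPTP map,
# here the depolarizing channel Phi(x) = (1-p) x + p I/d.  States are random.
import numpy as np
from scipy.linalg import logm

def rel_entropy(a, b):
    return np.trace(a @ (logm(a) - logm(b))).real

def random_state(d, rng):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
d, p = 8, 0.3
depolarize = lambda x: (1 - p) * x + p * np.eye(d) / d

x, y = random_state(d, rng), random_state(d, rng)
print(rel_entropy(x, y), rel_entropy(depolarize(x), depolarize(y)))
# the second value should not exceed the first
```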

Writing \ell_{t}(x)=\ell_{t}(\operatorname{Tr}(E_{t}x)), and arguing as in the per-step inequality of Appendix A, we can prove that

\ell_{t}(x_{t})-\ell_{t}(\varphi_{t})
\leq \frac{1}{\eta}(B_{R}(\varphi_{t}\|x_{t})-B_{R}(\varphi_{t}\|y_{t+1})+B_{R}(x_{t}\|y_{t+1}))
\leq \frac{1}{\eta}(B_{R}(\varphi_{t}\|x_{t})-B_{R}(\varphi_{t}\|\hat{x}_{t+1})+B_{R}(x_{t}\|y_{t+1}))
= \frac{1}{\eta}(B_{R}(\varphi_{t}\|x_{t})-B_{R}(\varphi_{t+1}\|x_{t+1})+B_{R}(\varphi_{t+1}\|x_{t+1})-B_{R}(\varphi_{t}\|\hat{x}_{t+1})+B_{R}(x_{t}\|y_{t+1}))
\leq \frac{1}{\eta}(B_{R}(\varphi_{t}\|x_{t})-B_{R}(\varphi_{t+1}\|x_{t+1})+B_{R}(\varphi_{t+1}\|x_{t+1})-B_{R}(\Phi(\varphi_{t})\|x_{t+1})+B_{R}(x_{t}\|y_{t+1})),

where the last inequality comes from Eq. (14), since B_{R}(\varphi_{t}\|\hat{x}_{t+1})\geq B_{R}(\Phi(\varphi_{t})\|\Phi(\hat{x}_{t+1}))=B_{R}(\Phi(\varphi_{t})\|x_{t+1}).

By telescoping sums, we get

Regret
\leq \frac{1}{\eta}(B_{R}(\varphi_{1}\|x_{1})-B_{R}(\varphi_{T}\|x_{T}))+\frac{1}{\eta}\sum_{t=1}^{T-1}|B_{R}(\Phi(\varphi_{t})\|x_{t+1})-B_{R}(\varphi_{t+1}\|x_{t+1})|+\frac{1}{\eta}\sum_{t=1}^{T-1}B_{R}(x_{t}\|y_{t+1})
\leq \frac{1}{\eta}B_{R}(\varphi_{1}\|x_{1})+\frac{\eta TG_{R}^{2}}{2}+\frac{1}{\eta}\sum_{t=1}^{T-1}|B_{R}(\Phi(\varphi_{t})\|x_{t+1})-B_{R}(\varphi_{t+1}\|x_{t+1})|.

Notice that this inequality matches the regret decomposition in Appendix A with \varphi_{t} replaced by \Phi(\varphi_{t}), so we can follow the same proof as for Eq. (13) and obtain the variant regret bound

\textbf{Regret}\leq\frac{D_{R}^{2}}{\eta}+\frac{\eta TG_{R}^{2}}{2}+\frac{(n+\log T)\mathcal{P}}{\eta}+\frac{n\mathcal{P}+\log T\,\mathcal{P}+1}{2\eta},

where \mathcal{P}=\sum_{t=1}^{T-1}\|\varphi_{t+1}-\Phi(\varphi_{t})\|_{1}.

Choosing \eta=\sqrt{\frac{\mathcal{P}(n+\log T)}{TL^{2}}}, the regret is upper bounded by O(L\sqrt{T(n+\log T)\mathcal{P}}). ∎

Proof of Corollary 2.

We will prove that Algorithm 4 achieves the dynamic regret bound stated in this corollary; throughout the proof we use the notation from the description of Algorithm 4.

In Algorithm 4, we run the weighted majority algorithm over M\log(T) experts, each of which runs Algorithm 5 with a quantum channel \Phi^{(j)} and a step size from \{\frac{1}{4},\frac{1}{8},\ldots\}.

From Lemma 6, we conclude that the minimum of the dynamic regret bounds over all M\log(T) experts is

O(L\sqrt{T(n+\log(T))\mathcal{P}^{\prime}}),

where \mathcal{P}^{\prime}=\min_{\Phi^{(j)}\in\{\Phi^{(1)},\ldots,\Phi^{(M)}\}}\sum_{t=1}^{T-1}\|\varphi_{t+1}-\Phi^{(j)}(\varphi_{t})\|_{1}.

Lemma 1 of the arXiv version of [28] gives a regret bound for the weighted majority algorithm:

\sum_{t=1}^{T}\ell_{t}(x_{t})\leq\sum_{t=1}^{T}\ell_{t}(x_{t}(k,j))+\frac{c\sqrt{2T\ln(\frac{1}{w_{1}(k,j)})}}{4},

for any 1\leq k\leq\log(T), 1\leq j\leq M, choosing \alpha=\sqrt{\frac{8}{Tc^{2}\ln(\frac{1}{w_{1}(k,j)})}}.

Letting (k,j) index the optimal one of the M\log(T) experts, we have

\sum_{t=1}^{T}\ell_{t}(x_{t})\leq \sum_{t=1}^{T}\ell_{t}(\varphi_{t})+c_{1}L\sqrt{T(n+\log(T))\mathcal{P}^{\prime}}+\frac{c\sqrt{T\ln(\log(T)M)}}{4}
= \sum_{t=1}^{T}\ell_{t}(\varphi_{t})+O(L\sqrt{T(n+\log(T))\mathcal{P}^{\prime}}+\sqrt{T\log(M\log(T))}),

which concludes our result. ∎

B.2 Proof of Corollary 3

In general, the set of possible dynamical models can be continuous; for example, we can take it to be the set of all local quantum channels. A quantum channel on n qubits is said to be l-local if it can be written as a product channel \Phi=\Phi_{1}\otimes I, where \Phi_{1} acts nontrivially on only l qubits.

Lemma 8 (Stinespring's dilation theorem [43]).

Let \Phi\colon C_{n}\rightarrow C_{n} be a completely positive and trace-preserving map on the space of density operators of a Hilbert space \mathcal{H}_{s}. Then there exists a Hilbert space \mathcal{H}_{e} with \dim\mathcal{H}_{e}=(\dim\mathcal{H}_{s})^{2} and a unitary operation U on \mathcal{H}_{s}\otimes\mathcal{H}_{e} such that

\Phi(\rho)=\operatorname{Tr}_{e}(U(\rho\otimes|0\rangle\langle 0|_{e})U^{\dagger})

for all \rho\in C_{n}, where \operatorname{Tr}_{e} denotes the partial trace over \mathcal{H}_{e}.

By Stinespring's dilation theorem, we can represent a quantum channel on a 2^{l}-dimensional Hilbert space \mathcal{H}_{s} as the partial trace of a unitary channel on the 2^{3l}-dimensional Hilbert space \mathcal{H}_{s}\otimes\mathcal{H}_{e}, so we only need to find an \epsilon-net of unitary operators.
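A small sketch of this dilation picture, for illustration only: a Haar-random unitary U on the dilated space plays the role of the dilation, and the resulting map is checked to be trace preserving.

```python
# Stinespring picture: Phi(rho) = Tr_e( U (rho tensor |0><0|_e) U^dagger ).
# U below is a random unitary used purely for illustration.
import numpy as np
from scipy.stats import unitary_group

def stinespring_channel(U, rho, d_s, d_e):
    env0 = np.zeros((d_e, d_e)); env0[0, 0] = 1.0   # |0><0| on the environment
    big = U @ np.kron(rho, env0) @ U.conj().T
    big = big.reshape(d_s, d_e, d_s, d_e)           # indices (i, a, j, b)
    return np.einsum('iaja->ij', big)               # partial trace over the environment

l = 1
d_s, d_e = 2 ** l, 2 ** (2 * l)
U = unitary_group.rvs(d_s * d_e, random_state=0)
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
print(np.trace(stinespring_channel(U, rho, d_s, d_e)).real)   # trace preservation: ~1
```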

Lemma 9 (\epsilon-net of U_{l} [44, Lemma 4.4]).

Let U_{l} be the set of unitaries on a Hilbert space \mathcal{H} with \dim\mathcal{H}=2^{l}. For \delta>0, there is a subset U_{l,\delta} of U_{l} with

\left|U_{l,\delta}\right|\leq\left(\frac{2}{\delta}\right)^{2^{4l}}

such that for any U\in U_{l}, there exists V\in U_{l,\delta} with \|U-V\|_{2}\leq\delta.

From Lemma 8 and Lemma 9, we can construct an \epsilon-net of general quantum channels:

Lemma 10 (\epsilon-net of quantum channels).

Let S_{l} be the set of all quantum channels on density operators on a Hilbert space \mathcal{H} with \dim\mathcal{H}=2^{l}. For any 1>\delta>0, there is a subset S_{l,\delta} of S_{l} with

\left|S_{l,\delta}\right|\leq\left(\frac{2}{\delta}\right)^{2^{12l}},

such that for any \Phi\in S_{l}, there exists \tilde{\Phi}\in S_{l,\delta} such that \sup_{x\in C_{l}}\|\Phi(x)-\tilde{\Phi}(x)\|_{1}\leq 3\delta.

Proof.

Let \mathcal{H}_{e} be a Hilbert space with \dim(\mathcal{H}_{e})=2^{2l} and \mathcal{H}_{a}=\mathcal{H}\otimes\mathcal{H}_{e}. Let U_{3l,\delta} be the \epsilon-net of unitaries on \mathcal{H}_{a} in Lemma 9.

Then, we can construct S_{l,\delta}=\{\Phi\mid\exists U\in U_{3l,\delta},\forall x\in C_{l},\ \Phi(x)=\operatorname{Tr}_{e}(U(x\otimes|0\rangle\langle 0|_{e})U^{\dagger})\}.

For an arbitrary \Phi^{\prime}\in S_{l}, Lemma 8 shows that there exists a Hilbert space \mathcal{H}_{c} with \dim\mathcal{H}_{c}=2^{2l} and a unitary operation U^{\prime} on \mathcal{H}\otimes\mathcal{H}_{c} such that

\Phi^{\prime}(x)=\operatorname{Tr}_{c}(U^{\prime}(x\otimes|0\rangle\langle 0|_{c})U^{\prime\dagger})

for any x\in C_{l}.

Notice that \mathcal{H}\otimes\mathcal{H}_{e} and \mathcal{H}\otimes\mathcal{H}_{c} are isomorphic, since they are Hilbert spaces of the same dimension, so (identifying U^{\prime} with a unitary on \mathcal{H}\otimes\mathcal{H}_{e}) there exists V^{\prime}\in U_{3l,\delta} such that \|U^{\prime}-V^{\prime}\|_{2}\leq\delta.

From the definition of S_{l,\delta}, there exists \tilde{\Phi}\in S_{l,\delta} such that

\tilde{\Phi}(x)=\operatorname{Tr}_{e}(V^{\prime}(x\otimes|0\rangle\langle 0|_{e})V^{\prime\dagger})

for any x\in C_{l}.

Then, we can show that for any x\in C_{l},

\|\Phi^{\prime}(x)-\tilde{\Phi}(x)\|_{1}
= \|\operatorname{Tr}_{e}(U^{\prime}(x\otimes|0\rangle\langle 0|_{e})U^{\prime\dagger}-V^{\prime}(x\otimes|0\rangle\langle 0|_{e})V^{\prime\dagger})\|_{1}
\leq \|U^{\prime}(x\otimes|0\rangle\langle 0|_{e})U^{\prime\dagger}-V^{\prime}(x\otimes|0\rangle\langle 0|_{e})V^{\prime\dagger}\|_{1}
\leq 2\|(U^{\prime}-V^{\prime})(x\otimes|0\rangle\langle 0|_{e})V^{\prime\dagger}\|_{1}+\|(U^{\prime}-V^{\prime})(x\otimes|0\rangle\langle 0|_{e})(U^{\prime}-V^{\prime})^{\dagger}\|_{1}
\leq 2\|U^{\prime}-V^{\prime}\|_{2}\|(x\otimes|0\rangle\langle 0|_{e})V^{\prime\dagger}\|_{1}+\|U^{\prime}-V^{\prime}\|_{2}^{2}\|x\otimes|0\rangle\langle 0|_{e}\|_{1}\leq 2\delta+\delta^{2}\leq 3\delta,

where in the first equality we regard \Phi^{\prime} as being realized on \mathcal{H}\otimes\mathcal{H}_{e}, since \mathcal{H}_{c} and \mathcal{H}_{e} are isomorphic.

The size of S_{l,\delta} is at most |U_{3l,\delta}|, which is bounded by \left(\frac{2}{\delta}\right)^{2^{4\cdot 3l}}=\left(\frac{2}{\delta}\right)^{2^{12l}}. ∎

With Lemma 10, we can construct an \epsilon-net of l-local quantum channels.

Lemma 11 (\epsilon-net of l-local quantum channels).

Let S_{n,l} be the set of all l-local quantum channels on n qubits. For any 1>\delta>0, there is a subset S_{n,l,\delta} of S_{n,l} with

|S_{n,l,\delta}|\leq\binom{n}{l}\left(\frac{6}{\delta}\right)^{2^{12l}},

such that for any \Phi\in S_{n,l}, there exists \tilde{\Phi}\in S_{n,l,\delta} such that \sup_{x\in C_{n}}\|\Phi(x)-\tilde{\Phi}(x)\|_{1}\leq\delta.

Proof.

From Lemma 10, we can construct a collection of sets \{W_{i_{1},\ldots,i_{l}}\mid 1\leq i_{1}<\cdots<i_{l}\leq n\}. Each W_{i_{1},\ldots,i_{l}} is a set of quantum channels on C_{n}: it contains the extensions of the channels in S_{l,\frac{\delta}{3}} from Lemma 10 acting on qubits i_{1},\ldots,i_{l}, with the identity on the remaining n-l qubits, so |W_{i_{1},\ldots,i_{l}}|\leq\left(\frac{6}{\delta}\right)^{2^{12l}}.

Let S_{n,l,\delta} be the union of all W in \{W_{i_{1},\ldots,i_{l}}\mid 1\leq i_{1}<\cdots<i_{l}\leq n\}. Then for any quantum channel \Phi\in S_{n,l}, there exists \tilde{\Phi}\in S_{n,l,\delta} with \sup_{x\in C_{n}}\|\Phi(x)-\tilde{\Phi}(x)\|_{1}\leq\delta, and |S_{n,l,\delta}|\leq\sum_{i_{1},\ldots,i_{l}}|W_{i_{1},\ldots,i_{l}}|\leq\binom{n}{l}\left(\frac{6}{\delta}\right)^{2^{12l}}. ∎

Proof of Corollary 3.

Let

\mathcal{P}^{\prime}=\min_{\Phi\in S_{n,l}}\sum_{t=1}^{T-1}\|x_{t+1}-\Phi(x_{t})\|_{1},  (15)

where S_{n,l} is the set of all l-local quantum channels on n qubits.

For any \Phi\in S_{n,l}, there exists \tilde{\Phi}\in S_{n,l,\delta} such that \sup_{x\in C_{n}}\|\Phi(x)-\tilde{\Phi}(x)\|_{1}\leq\delta, so we have

\|x_{t+1}-\Phi(x_{t})\|_{1} \geq\|x_{t+1}-\tilde{\Phi}(x_{t})\|_{1}-\|\Phi(x_{t})-\tilde{\Phi}(x_{t})\|_{1}\geq\|x_{t+1}-\tilde{\Phi}(x_{t})\|_{1}-\delta,

for any 1\leq t\leq T-1. Thus, we have

\min_{\Phi\in S_{n,l}}\sum_{t=1}^{T-1}\|x_{t+1}-\Phi(x_{t})\|_{1}\geq\min_{\Phi\in S_{n,l,\delta}}\sum_{t=1}^{T-1}\|x_{t+1}-\Phi(x_{t})\|_{1}-T\delta.

From Corollary 2, the regret of Algorithm 4, with the candidate set of quantum channels taken to be S_{n,l,\delta} from Lemma 11, satisfies

Regret
\leq c_{1}L\sqrt{T(n+\log(T))\min_{\Phi\in S_{n,l,\delta}}\sum_{t=1}^{T-1}\|x_{t+1}-\Phi(x_{t})\|_{1}}+c_{2}\sqrt{T\log\log(T)+T\log\Bigl(\binom{n}{l}\bigl(\tfrac{6}{\delta}\bigr)^{2^{12l}}\Bigr)}
\leq c_{1}L\sqrt{T(n+\log(T))\Bigl(\min_{\Phi\in S_{n,l}}\sum_{t=1}^{T-1}\|x_{t+1}-\Phi(x_{t})\|_{1}+T\delta\Bigr)}+c_{2}\sqrt{T\log\log(T)+T\log\Bigl(\binom{n}{l}\bigl(\tfrac{6}{\delta}\bigr)^{2^{12l}}\Bigr)}
\leq c_{1}L\sqrt{T(n+\log(T))(\mathcal{P}^{\prime}+T\delta)}+c_{2}\sqrt{T\log\log(T)+T\left(2^{12l}\log(\tfrac{6}{\delta})+l\log(n)\right)}.

Choosing \delta=\frac{1}{T}, we achieve a regret of \tilde{O}(L\sqrt{nT\mathcal{P}^{\prime}}+2^{6l}\sqrt{T}) when \mathcal{P}^{\prime}\geq 1. ∎

Appendix C Other Proofs

C.1 KK-outcome Measurement Cases

We can generalize the two-outcome measurements in our results to K-outcome measurements. The only issue with this replacement is that the output of a K-outcome measurement is a probability distribution rather than a scalar, so we need a new loss function to measure the regret. Let \mathbf{p}_{t}=(\operatorname{Tr}(E_{t,1}x_{t}),\ldots,\operatorname{Tr}(E_{t,K}x_{t})), and let \mathbf{b}_{t}=(b_{t,i})_{i=1}^{K} be the probability distribution corresponding to the output of the measurement acting on the ground-truth state \rho_{t}. In this case, we can define the loss function at time t to be

\ell_{t}(x_{t})=d_{\mathrm{TV}}(\mathbf{p}_{t},\mathbf{b}_{t})=\frac{1}{2}\sum_{i=1}^{K}\left|\operatorname{Tr}\left(E_{t,i}x_{t}\right)-b_{t,i}\right|.  (16)

This is a convex function, so the proofs in the previous sections still hold; we only need to determine G_{R}, the upper bound on \|\nabla_{t}\|.

First, we prove the following inequality on the spectral norm of a linear combination of the E_{i}. For any K-outcome measurement \mathcal{M} described by \{E_{1},\ldots,E_{K}\} and K real numbers a_{1},\ldots,a_{K} such that |a_{i}|\leq L for all i\in[K], we have

\Bigl\|\sum_{i=1}^{K}a_{i}E_{i}\Bigr\| = \max_{v\in\mathbb{C}^{2^{n}},\|v\|_{2}=1}\Bigl|v^{\dagger}\Bigl(\sum_{i=1}^{K}a_{i}E_{i}\Bigr)v\Bigr| \leq\max_{v\in\mathbb{C}^{2^{n}},\|v\|_{2}=1}\sum_{i=1}^{K}|a_{i}|\,v^{\dagger}E_{i}v
\leq\max_{v\in\mathbb{C}^{2^{n}},\|v\|_{2}=1}Lv^{\dagger}\Bigl(\sum_{i=1}^{K}E_{i}\Bigr)v=L,  (17)

where \|\cdot\| denotes the spectral norm and the first inequality comes from 0\preceq E_{i}. Then we have

\|\nabla_{t}\|=\Bigl\|\sum_{i=1}^{K}\frac{1}{2}\mathrm{sgn}(\operatorname{Tr}(E_{t,i}x_{t})-b_{t,i})E_{t,i}\Bigr\|\leq\frac{1}{2},  (18)

where \mathrm{sgn}(x) is the sign function.

Therefore, G_{R}=\frac{1}{2} is a valid parameter in this case. Since we had G_{R}=L in the two-outcome case, we can simply replace every L with \frac{1}{2} in our two-outcome results to obtain an algorithm for the K-outcome case with the loss function in Eq. (16).
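A minimal sketch of this loss and its subgradient follows; the measurement and state in the toy usage are arbitrary choices for illustration.

```python
# K-outcome total-variation loss of Eq. (16) and its subgradient
# (1/2) sum_i sgn(Tr(E_i x) - b_i) E_i, whose spectral norm is at most 1/2.
import numpy as np

def tv_loss_and_grad(E_list, b, x):
    p = np.array([np.trace(E @ x).real for E in E_list])
    loss = 0.5 * np.sum(np.abs(p - b))
    grad = 0.5 * sum(np.sign(pi - bi) * E for E, pi, bi in zip(E_list, p, b))
    return loss, grad

# toy usage with a two-outcome measurement {E, I - E} on one qubit
E = np.array([[0.8, 0.1], [0.1, 0.4]])
x = np.array([[0.6, 0.0], [0.0, 0.4]])
loss, grad = tv_loss_and_grad([E, np.eye(2) - E], np.array([0.5, 0.5]), x)
print(loss, np.linalg.norm(grad, 2))   # the spectral norm stays <= 1/2
```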

C.2 Proof of Corollary 1

Proof.

To achieve the mistake bound, we run \mathcal{A} on a subset of the iterations. Specifically, we only update x_{t} if \ell_{t}(\operatorname{Tr}(E_{t}x_{t}))>\frac{2}{3}\epsilon. Since \ell_{t}(\operatorname{Tr}(E_{t}x_{t}))>\frac{2}{3}\epsilon whenever |\operatorname{Tr}(E_{t}x_{t})-\operatorname{Tr}(E_{t}\rho_{t})|>\epsilon, the number of mistakes \mathcal{A} makes is at most the number of updates \mathcal{A} makes (see the sketch after this proof).

Let \{t_{i}\}_{i=1}^{T} be the sequence of rounds on which \mathcal{A} updates x_{t}. Since \{\rho_{t_{i}}\} is a valid comparator sequence (a sub-sequence of a valid comparator sequence remains valid, with path length at most \mathcal{P}) with \sum_{i=1}^{T}\ell_{t_{i}}(\operatorname{Tr}(E_{t_{i}}\rho_{t_{i}}))\leq\frac{1}{3}\epsilon T, the number of updates T satisfies the inequality \frac{2}{3}\epsilon T\leq\frac{1}{3}\epsilon T+\tilde{O}(L\sqrt{Tn\mathcal{P}}), where the \tilde{O}(L\sqrt{Tn\mathcal{P}}) term comes from the fact that the algorithm's regret against the best expert is upper bounded by \tilde{O}(L\sqrt{Tn\mathcal{P}}). This implies T=\tilde{O}(\frac{L^{2}n\mathcal{P}}{\epsilon^{2}}), which completes the proof. ∎
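A minimal sketch of the update-on-mistake reduction used in this proof; the learner interface (`predict`/`update`) and the stream of rounds are placeholders.

```python
# The base learner A is advanced only on rounds where the observed loss exceeds
# 2*eps/3, so the number of updates upper-bounds the number of eps-mistakes.
def lazy_wrapper(A, rounds, eps):
    num_updates = 0
    for measurement, loss_fn in rounds:       # stream of (E_t, l_t)
        x_t = A.predict()
        if loss_fn(x_t) > 2 * eps / 3:        # counted as a potential mistake
            A.update(measurement, loss_fn)    # only then does A take a step
            num_updates += 1
    return num_updates
```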

C.3 Proof of Lemma 1

Proof.

The CBCE algorithm of [20] is a meta-algorithm that uses any online convex optimization algorithm \mathcal{B} as a black box and obtains a strongly adaptive regret with an O(\sqrt{\tau\log(T)}) overhead for any interval length \tau. Its guarantee is formalized in the following lemma.

Lemma 12 (SA-Regret of \text{CBCE}\langle\mathcal{B}\rangle [20, Theorem 2]).

Assume that for every t\in\mathbb{N} the loss function \ell_{t}(x), x\in\mathcal{K}, is convex and maps to [0,1], and that the black-box algorithm \mathcal{B} has static regret R_{T}^{\mathcal{B}} bounded by cT^{\alpha}, where \alpha\in(0,1). Let I=[I_{1},I_{2}]. The SA-Regret of CBCE with black box \mathcal{B} on the interval I is bounded as follows:

R^{\mathrm{CBCE}\langle\mathcal{B}\rangle}(I) = \max_{\varphi\in\mathcal{K}}\sum_{t\in I}\left(\ell_{t}\left(\mathbf{x}_{t}^{\mathcal{M}\langle\mathcal{B}\rangle}\right)-\ell_{t}(\varphi)\right)\leq\frac{4}{2^{\alpha}-1}c|I|^{\alpha}+8\sqrt{|I|\left(7\log\left(I_{2}\right)+5\right)}
= O\left(c|I|^{\alpha}+\sqrt{|I|\log(I_{2})}\right),

where \mathcal{M}\langle\mathcal{B}\rangle denotes the meta-algorithm applied to \mathcal{B}. Note that this bound holds for any bounded convex loss sequence \ell_{t}.

For the choice of black-box algorithm, the RFTL-based algorithm of [5], which we use as the black box, achieves an O(\sqrt{Tn}) bound on the static regret when \ell_{t} is the \ell_{1} or \ell_{2} loss.

Lemma 13 ([5, Theorem 2]).

Let E_{1},E_{2},\ldots be a sequence of two-outcome measurements on an n-qubit state presented to the learner, and suppose \ell_{t} is convex and L-Lipschitz. Then there is an explicit learning strategy that guarantees regret R_{T}=O(L\sqrt{Tn}) for all T. In particular, when applied to the \ell_{1} or \ell_{2} loss, the RFTL algorithm achieves regret O(\sqrt{Tn}).

Note that both the \ell_{1} and \ell_{2} versions of our loss function \ell_{t} are 2-Lipschitz and bounded. Therefore, we can directly obtain an algorithm with a strongly adaptive regret bound by combining Lemma 12 and Lemma 13 with \alpha=1/2. ∎

C.4 Proof of Theorem 2

Proof.

Given any 1=t_{1}\leq t_{2}\leq\cdots\leq t_{k+1}=T+1 and \varphi_{1},\varphi_{2},\ldots,\varphi_{k}\in C_{n}, \mathcal{A} guarantees that

\sum_{t=1}^{T}\ell_{t}(E_{t}x_{t}^{\mathcal{A}})-\sum_{j=1}^{k}\sum_{t=t_{j}}^{t_{j+1}-1}\ell_{t}(E_{t}\varphi_{j})
\leq \sum_{j=1}^{k}cL\sqrt{n(t_{j+1}-t_{j})\log(T)}
\leq cL\sqrt{knT\log(T)},

where c is some constant. The first inequality applies Lemma 1 to each interval [t_{j},t_{j+1}-1], and the second uses the Cauchy–Schwarz inequality. ∎
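For completeness, the Cauchy–Schwarz step in the last line reads

\sum_{j=1}^{k}\sqrt{n(t_{j+1}-t_{j})\log(T)}\leq\sqrt{k\sum_{j=1}^{k}n(t_{j+1}-t_{j})\log(T)}=\sqrt{knT\log(T)},

since \sum_{j=1}^{k}(t_{j+1}-t_{j})=T.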

C.5 Proof of Corollary 4

Proof.

To achieve the mistake bound, we run \mathcal{A} on a subset of the iterations. Specifically, we only update x_{t} if \ell_{t}(\operatorname{Tr}(E_{t}x_{t}))>\frac{2}{3}\epsilon. Since \ell_{t}(\operatorname{Tr}(E_{t}x_{t}))>\frac{2}{3}\epsilon whenever |\operatorname{Tr}(E_{t}x_{t})-\operatorname{Tr}(E_{t}\rho_{t})|>\epsilon, the number of mistakes \mathcal{A} makes is at most the number of updates \mathcal{A} makes.

Let \{t_{i}\}_{i=1}^{T} be the sequence of rounds on which \mathcal{A} updates x_{t}. Since \{\rho_{t_{i}}\} is a valid comparator sequence (a sub-sequence of a k-shift sequence shifts at most k times) with \sum_{i=1}^{T}\ell_{t_{i}}(\operatorname{Tr}(E_{t_{i}}\rho_{t_{i}}))\leq\frac{1}{3}\epsilon T, from Theorem 2 we conclude that the number of updates T satisfies the inequality \frac{2}{3}\epsilon T\leq\frac{1}{3}\epsilon T+c\sqrt{knT\log(T)}. This implies T=O(\frac{kn}{\epsilon^{2}}\log^{2}(\frac{kn}{\epsilon^{2}})), which completes the proof. ∎