
Maximum-Likelihood Quantum State Tomography by Cover’s Method with Non-Asymptotic Analysis

Chien-Ming Lin (Department of Computer Science and Information Engineering, National Taiwan University) · Hao-Chung Cheng (Department of Electrical Engineering, National Taiwan University; Department of Mathematics, National Taiwan University; Hon Hai (Foxconn) Quantum Computing Centre) · Yen-Huan Li (Department of Computer Science and Information Engineering, National Taiwan University; Department of Mathematics, National Taiwan University)
Abstract

We propose an iterative algorithm that computes the maximum-likelihood estimate in quantum state tomography. The optimization error of the algorithm converges to zero at an $O((1/k)\log D)$ rate, where $k$ denotes the number of iterations and $D$ denotes the dimension of the quantum state. The per-iteration computational complexity of the algorithm is $O(D^{3}+ND^{2})$, where $N$ denotes the number of measurement outcomes. The algorithm can be considered as a parameter-free correction of the $R\rho R$ method [A. I. Lvovsky. Iterative maximum-likelihood reconstruction in quantum homodyne tomography. J. Opt. B: Quantum Semiclass. Opt. 2004; G. Molina-Terriza et al. Triggered qutrits for quantum communication protocols. Phys. Rev. Lett. 2004].

1 Introduction

Quantum state tomography aims to estimate the quantum state of a physical system, given measurement outcomes (see, e.g., [Pv04] for a comprehensive survey). There are various approaches to quantum state tomography, such as trace regression [FGLE12, GLF+10, OWV97, YJZS20, YFT19], maximum-likelihood estimation [Hra97, HvFJ04], Bayesian estimation [BK10a, BK10b], and recently proposed deep learning-based methods [AMnNK20, QFN21]. (A confusion the authors frequently encounter is that many people conflate state tomography with the notion of shadow tomography introduced by Aaronson [ACH+18, Aar20]. State tomography aims at estimating the quantum state, whereas shadow tomography aims at estimating the probability distributions of measurement outcomes. Indeed, one interesting conclusion by Aaronson is that shadow tomography requires much less data than state tomography.) Among existing approaches, maximum-likelihood estimation has been standard in practice and enjoys favorable asymptotic statistical guarantees (see, e.g., [HvFJ04, SBK18]). The maximum-likelihood estimator is given by the optimization problem:

\hat{\rho}\in\operatorname*{\arg\min}_{\rho\in\mathcal{D}}f(\rho),\quad f(\rho)\coloneqq\frac{1}{N}\sum_{n=1}^{N}-\log\operatorname{\mathrm{Tr}}(M_{n}\rho), \qquad (1)

for some Hermitian positive semi-definite matrices $M_{n}$, where $\mathcal{D}$ denotes the set of quantum density matrices, i.e.,

\mathcal{D}\coloneqq\Set{\rho\in\mathbb{C}^{D\times D}}{\rho=\rho^{\dagger},\ \rho\geq 0,\ \operatorname{\mathrm{Tr}}(\rho)=1},

and $N$ denotes the number of measurement outcomes. We write $\rho^{\dagger}$ for the conjugate transpose of $\rho$.

$R\rho R$ is a numerical method developed to solve (1) [Lvo04, MTVv+04]. Given a positive definite initial iterate $\rho_{1}\in\mathcal{D}$, $R\rho R$ iterates as

\rho_{k+1}=\mathcal{N}(R(\rho_{k})\rho_{k}R(\rho_{k})),\quad R(\rho_{k})\coloneqq-\nabla f(\rho_{k})=\frac{1}{N}\sum_{n=1}^{N}\frac{M_{n}}{\operatorname{\mathrm{Tr}}(M_{n}\rho_{k})},\quad\forall k\in\mathbb{N},

where the mapping $\mathcal{N}$ scales its input so that $\operatorname{\mathrm{Tr}}(\rho_{k+1})=1$, and $\nabla f$ denotes the gradient mapping of $f$. $R\rho R$ is parameter-free (i.e., it does not require parameter tuning) and typically converges fast in practice. Unfortunately, one can construct a synthetic data set on which $R\rho R$ does not converge [vHKL07].
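To make the iteration concrete, here is a minimal NumPy sketch of one $R\rho R$ step. The helper name `rpr_step`, the list of measurement operators `Ms`, and the random rank-one operators in the usage example are illustrative placeholders, not part of the method itself.

```python
import numpy as np

def rpr_step(rho, Ms):
    """One RrhoR iteration: rho <- N(R(rho) rho R(rho))."""
    N = len(Ms)
    # R(rho) = (1/N) * sum_n M_n / Tr(M_n rho)
    R = sum(M / np.trace(M @ rho).real for M in Ms) / N
    rho_next = R @ rho @ R
    return rho_next / np.trace(rho_next).real  # the scaling map N(.)

# illustrative usage with random rank-one measurement operators
rng = np.random.default_rng(0)
D, N = 4, 50
vs = rng.normal(size=(N, D)) + 1j * rng.normal(size=(N, D))
Ms = [np.outer(v, v.conj()) for v in vs]
rho = np.eye(D) / D
for _ in range(100):
    rho = rpr_step(rho, Ms)
```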

According to [Lvo04], $R\rho R$ is inspired by Cover's method [Cov84]. (Cover's method coincides with the expectation maximization method for solving (2) and hence is typically called expectation maximization in the literature. Nevertheless, neither Cover's derivation and convergence analysis nor ours relies on, or is covered by, existing results on expectation maximization, so we do not call the method expectation maximization, to avoid possible confusion.) Cover's method solves the optimization problem:

x^{\star}\in\operatorname*{\arg\min}_{x\in\Delta}g(x),\quad g(x)\coloneqq\frac{1}{N}\sum_{n=1}^{N}-\log\braket{a_{n},x}, \qquad (2)

for some entry-wise non-negative vectors $a_{n}\in\mathbb{R}^{D}$, where $\Delta$ denotes the probability simplex in $\mathbb{R}^{D}$, i.e.,

\Delta\coloneqq\Set{x=(x_{1},\ldots,x_{D})\in\mathbb{R}^{D}}{x_{d}\geq 0\ \forall\, d,\ \sum_{d=1}^{D}x_{d}=1},

and the inner product is the one associated with the Euclidean norm. This optimization problem arises when one wants to compute the growth-optimal portfolio for long-term investment (see, e.g., [MTZ12]). Given an entry-wise strictly positive initial iterate $x_{1}\in\mathbb{R}_{++}^{D}$, Cover's method iterates as

x_{k+1}=x_{k}\circ\left(-\nabla g(x_{k})\right),\quad\forall k\in\mathbb{N},

where $\circ$ denotes the entry-wise (Schur) product. Cover's method is guaranteed to converge to the optimum [Cov84]. Indeed, if the matrices in (1) share the same eigenbasis, then it is easily checked that (1) is equivalent to (2), yet $R\rho R$ is not equivalent to Cover's method (Cover's method does not need the scaling mapping $\mathcal{N}$; one can check that its iterates already lie in $\Delta$). This explains why $R\rho R$ does not inherit the convergence guarantee of Cover's method.
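For comparison, a minimal NumPy sketch of Cover's update follows; the matrix `A` of stacked vectors $a_{n}$ is an illustrative placeholder. As noted above, no scaling map is needed: the iterates remain in $\Delta$.

```python
import numpy as np

def cover_step(x, A):
    """One step of Cover's method: x <- x o (-grad g(x))."""
    # -grad g(x) has entries (1/N) * sum_n a_{n,d} / <a_n, x>
    neg_grad = (A / (A @ x)[:, None]).mean(axis=0)
    return x * neg_grad  # entry-wise (Schur) product

# illustrative usage
rng = np.random.default_rng(0)
A = rng.random((50, 4))      # entry-wise non-negative vectors a_n as rows
x = np.full(4, 1.0 / 4)      # strictly positive initial iterate
for _ in range(100):
    x = cover_step(x, A)
print(x.sum())               # stays (numerically) equal to 1
```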

Řeháček et al. proposed a diluted correction of $R\rho R$ [vHKL07]. Given a positive definite initial iterate $\rho_{1}\in\mathcal{D}$, diluted $R\rho R$ iterates as

\rho_{k+1}=\mathcal{N}\left(\left(R(\rho_{k})+\alpha_{k}I\right)\rho_{k}\left(R(\rho_{k})+\alpha_{k}I\right)\right),

where the parameter $\alpha_{k}$ is chosen by exact line search. Later, Gonçalves et al. proposed a variant of diluted $R\rho R$ that adopts Armijo line search [GGRL14]. Both versions of diluted $R\rho R$ are guaranteed to converge to the optimum. Unfortunately, their convergence guarantees are asymptotic and do not allow us to characterize the iteration complexity, i.e., the number of iterations required to obtain an approximate solution of (1). In particular, the dimension $D$ grows exponentially with the number of qubits and can be huge, yet the dependence of the iteration complexities of the diluted $R\rho R$ methods on $D$ is unclear.
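The sketch below illustrates the diluted iteration; for simplicity it selects $\alpha_{k}$ by a crude search over a logarithmic grid, which is only a stand-in for the exact and Armijo line searches used in [vHKL07] and [GGRL14]. The helper names are hypothetical.

```python
import numpy as np

def neg_log_likelihood(rho, Ms):
    """f(rho) = -(1/N) sum_n log Tr(M_n rho)."""
    return -np.mean([np.log(np.trace(M @ rho).real) for M in Ms])

def diluted_rpr_step(rho, Ms, alphas=np.logspace(-2, 2, 30)):
    """One diluted RrhoR step; alpha picked by a crude grid search
    (a stand-in for the exact/Armijo line searches in the literature)."""
    D = rho.shape[0]
    R = sum(M / np.trace(M @ rho).real for M in Ms) / len(Ms)
    best_rho, best_val = rho, neg_log_likelihood(rho, Ms)
    for alpha in alphas:
        T = R + alpha * np.eye(D)
        cand = T @ rho @ T
        cand = cand / np.trace(cand).real
        val = neg_log_likelihood(cand, Ms)
        if val < best_val:
            best_rho, best_val = cand, val
    return best_rho
```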

We propose the following algorithm.

  • Set $\rho_{1}=I/D$, where $I$ denotes the identity matrix.

  • For each $k\in\mathbb{N}$, compute

    \rho_{k+1}=\mathcal{N}\left(\exp\left(\log(\rho_{k})+\log(R(\rho_{k}))\right)\right),

    where $\exp$ and $\log$ denote the matrix exponential and matrix logarithm, respectively.

Notice that the objective function implicitly requires the quantities $\operatorname{\mathrm{Tr}}(M_{n}\rho_{k})$ to be strictly positive; otherwise, $R(\rho_{k})$ does not exist, as $-\log\operatorname{\mathrm{Tr}}(M_{n}\rho_{k})$ is not well-defined. Our initialization and iteration rule guarantee that the iterates $\rho_{k}$ are full-rank and $\operatorname{\mathrm{Tr}}(M_{n}\rho_{k})$ are strictly positive.

Let us discuss the computational complexity of the proposed algorithm. Computing $\nabla f(\rho_{k})$ takes $O(ND^{2})$ arithmetic operations, and computing the matrix logarithms and the matrix exponential takes $O(D^{3})$. The per-iteration computational complexity is hence $O(D^{3}+ND^{2})$.
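A minimal SciPy sketch of one iteration of the proposed algorithm is given below; `expm` and `logm` are dense $O(D^{3})$ routines, the explicit symmetrization is a numerical safeguard not required by the analysis, and the random rank-one operators in the usage example are illustrative only.

```python
import numpy as np
from scipy.linalg import expm, logm

def proposed_step(rho, Ms):
    """One iteration: rho <- N(exp(log(rho) + log(R(rho))))."""
    N = len(Ms)
    R = sum(M / np.trace(M @ rho).real for M in Ms) / N  # R(rho) = -grad f(rho), O(N D^2)
    X = expm(logm(rho) + logm(R))                        # matrix log/exp, O(D^3) each
    X = (X + X.conj().T) / 2                             # symmetrize against round-off
    return X / np.trace(X).real                          # the scaling map N(.)

# illustrative usage starting from the maximally mixed state
rng = np.random.default_rng(0)
D, N = 4, 50
vs = rng.normal(size=(N, D)) + 1j * rng.normal(size=(N, D))
Ms = [np.outer(v, v.conj()) for v in vs]
rho = np.eye(D) / D                                      # rho_1 = I / D
for _ in range(100):
    rho = proposed_step(rho, Ms)
```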

Observe that the proposed algorithm recovers Cover's method when $\rho_{k}$ and $\nabla f(\rho_{k})$ share the same eigenbasis. We show that the proposed algorithm indeed converges and that its iteration complexity is logarithmic in the dimension $D$.

Theorem 1

Assume that $\bigcap_{n=1}^{N}\ker(M_{n})=\{0\}$. Let $(\rho_{k})_{k\in\mathbb{N}}$ be the sequence of iterates generated by the proposed method. Define $\overline{\rho}_{k}\coloneqq(\rho_{1}+\cdots+\rho_{k})/k$. Then, for every $\varepsilon>0$, we have $f(\overline{\rho}_{k})-f(\hat{\rho})\leq\varepsilon$ whenever $k\geq(1/\varepsilon)\log D$.

Remark 1

Suppose $\mathcal{K}\coloneqq\bigcap_{n=1}^{N}\ker(M_{n})\neq\{0\}$. Let $U$ be a matrix whose columns form an orthonormal basis of $\mathcal{K}^{\perp}$. Then, it suffices to solve (1) on a lower-dimensional space by replacing $M_{n}$ with $U^{\dagger}M_{n}U$ in the objective function.
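A hedged sketch of this reduction: since the $M_{n}$ are positive semi-definite, their common kernel equals the kernel of $\sum_{n}M_{n}$, so an orthonormal basis of $\mathcal{K}^{\perp}$ can be read off from an eigendecomposition. The tolerance `tol` is an illustrative choice.

```python
import numpy as np

def reduce_to_complement(Ms, tol=1e-12):
    """Restrict the M_n to the orthogonal complement of their common kernel."""
    S = sum(Ms)                       # PSD; ker(S) = common kernel of the M_n
    w, V = np.linalg.eigh(S)
    U = V[:, w > tol]                 # orthonormal basis of the complement
    return [U.conj().T @ M @ U for M in Ms], U
```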

Recall that (2) and Cover's method are special cases of (1) and the proposed algorithm, respectively. Moreover, Cover's method is equivalent to the expectation maximization method for Poisson inverse problems [VL93]. Theorem 1 is hence of independent interest even for computing the growth-optimal portfolio by Cover's method and for solving Poisson inverse problems by expectation maximization, showing that the iteration complexities of both are also $O(\varepsilon^{-1}\log D)$. This supplements the asymptotic convergence results in [Cov84] and [VSK85]. Whereas the same iteration complexity bound for Cover's method in growth-optimal portfolio computation is immediate from a lemma due to Iusem [Ius92, Lemma 2.2], it is currently unclear to us how to extend Iusem's analysis to the quantum setup.

2 Proof of Theorem 1

For convenience, let $M$ be a random matrix following the empirical probability distribution of $M_{1},\ldots,M_{N}$ (the derivation below is not restricted to the empirical probability distribution). Then, we have $f(\rho)=\mathsf{E}\left[-\log\braket{M,\rho}\right]$, where $\mathsf{E}$ denotes the mathematical expectation. Define

r(\rho)\coloneqq\frac{M}{\braket{M,\rho}},\quad R(\rho)\coloneqq-\nabla f(\rho)=\mathsf{E}\left[r(\rho)\right],

where the inner product, as well as all inner products in the rest of this section, is the Hilbert-Schmidt inner product. We start with an error upper bound.

Lemma 1

For any density matrix $\rho$ such that $R(\rho)$ exists,

f(\rho)-f(\hat{\rho})\leq\max_{\sigma\in\mathcal{D}}\braket{\log R(\rho),\sigma}.

Remark 2

Notice that $R(\rho)$ is always positive definite, so $\log R(\rho)$ is always well-defined. Indeed, suppose there exists some non-zero vector $u$ such that $\braket{u,R(\rho)u}=0$. Then, as the $M_{n}$ are positive semi-definite, we have $\braket{u,M_{n}u}=0$ for all $n$, violating the assumption that $\bigcap_{n=1}^{N}\ker(M_{n})=\{0\}$.

Proof (Lemma 1)

By Jensen’s inequality, we write

\begin{align*}
f(\rho)-f(\hat{\rho}) &= \mathsf{E}\left[\log\Braket{\frac{M}{\braket{M,\rho}},\hat{\rho}}\right]\\
&\leq \log\Braket{\mathsf{E}\left[\frac{M}{\braket{M,\rho}}\right],\hat{\rho}}\\
&= \log\braket{R(\rho),\hat{\rho}}\\
&\leq \log\lambda_{\max}(R(\rho))\\
&= \lambda_{\max}(\log R(\rho))\\
&= \max_{\sigma\in\mathcal{D}}\braket{\log R(\rho),\sigma}.
\end{align*}

Deriving the following lemma is the major technical challenge in our convergence analysis. The lemma shows that the mapping $\rho\mapsto\log R(\rho)$ is operator convex.

Lemma 2

For any density matrix $\sigma$, the function $\varphi(\rho)\coloneqq\braket{\log R(\rho),\sigma}$ is convex.

Proof

Equivalently, we want to show that $D^{2}\varphi(\rho)[\delta,\delta]\geq 0$ for all $\rho\in\operatorname{\mathrm{dom}}D^{2}\varphi$ and Hermitian $\delta\in\mathbb{C}^{D\times D}$. Define $A_{0}\coloneqq R(\rho)$. By [HP14, Example 3.22 and Exercise 3.24], we have

\begin{align*}
A_{1} &\coloneqq DR(\rho)[\delta]=-\mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}\right],\\
A_{2} &\coloneqq D^{2}R(\rho)[\delta,\delta]=2\,\mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}^{2}\right],\\
D\log(A_{0})[A_{2}] &= \int_{0}^{\infty}(A_{0}+sI)^{-1}A_{2}(A_{0}+sI)^{-1}\,\mathrm{d}s,\\
D^{2}\log(A_{0})[A_{1},A_{1}] &= -2\int_{0}^{\infty}(A_{0}+sI)^{-1}A_{1}(A_{0}+sI)^{-1}A_{1}(A_{0}+sI)^{-1}\,\mathrm{d}s.
\end{align*}

Define $\Phi(\rho)\coloneqq\log R(\rho)$. Recall the chain rule for the second-order Fréchet derivative (see, e.g., [Bha97, p. 316]):

D^{2}\Phi(\rho)[\delta,\delta]=D^{2}\log(R(\rho))[DR(\rho)[\delta],DR(\rho)[\delta]]+D\log(R(\rho))[D^{2}R(\rho)[\delta,\delta]].

Then, we write

\begin{align*}
D^{2}\varphi(\rho)[\delta,\delta] &= \operatorname{\mathrm{Tr}}\left(\sigma D^{2}\Phi(\rho)[\delta,\delta]\right)\\
&= -2\int_{0}^{\infty}\operatorname{\mathrm{Tr}}\left(\sigma(A_{0}+sI)^{-1}A_{1}(A_{0}+sI)^{-1}A_{1}(A_{0}+sI)^{-1}\right)\mathrm{d}s\\
&\quad+\int_{0}^{\infty}\operatorname{\mathrm{Tr}}\left(\sigma(A_{0}+sI)^{-1}A_{2}(A_{0}+sI)^{-1}\right)\mathrm{d}s\\
&= 2\int_{0}^{\infty}\operatorname{\mathrm{Tr}}\left(B_{s}\left((A_{0}+sI)^{-1}\sigma(A_{0}+sI)^{-1}\right)\right)\mathrm{d}s,
\end{align*}

where

B_{s}\coloneqq\frac{A_{2}}{2}-A_{1}(A_{0}+sI)^{-1}A_{1}.

Since $(A_{0}+sI)^{-1}\sigma(A_{0}+sI)^{-1}$ is obviously positive semi-definite, it suffices to show that $B_{s}$ is positive semi-definite for all $s\geq 0$. We write

\begin{align*}
B_{s} &\geq \frac{A_{2}}{2}-A_{1}A_{0}^{-1}A_{1}\\
&= \mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}^{2}\right]-\mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}\right]\left(\mathsf{E}\left[r(\rho)\right]\right)^{-1}\mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}\right],
\end{align*}

which is positive semi-definite by an extension of the Cauchy-Schwarz inequality due to Lavergne [Lav08] (whereas Lavergne considers the real matrix case, the proof extends directly to the Hermitian matrix case).
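As a quick numerical illustration of this matrix Cauchy-Schwarz step (on a random instance with rank-one measurement operators, which is illustrative only), one can check that $\frac{A_{2}}{2}-A_{1}A_{0}^{-1}A_{1}$ has no negative eigenvalues up to round-off:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 4, 200
# random PSD measurement operators, a full-rank state rho, and a Hermitian direction delta
Ms = []
for _ in range(N):
    v = rng.normal(size=D) + 1j * rng.normal(size=D)
    Ms.append(np.outer(v, v.conj()))
X = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
rho = X @ X.conj().T
rho = rho / np.trace(rho).real
Z = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
delta = (Z + Z.conj().T) / 2

rs = [M / np.trace(M @ rho).real for M in Ms]          # samples of r(rho)
cs = [np.trace(r @ delta).real for r in rs]            # <r(rho), delta>
E_r   = sum(rs) / N
E_rc  = sum(c * r for c, r in zip(cs, rs)) / N
E_rc2 = sum(c * c * r for c, r in zip(cs, rs)) / N
B = E_rc2 - E_rc @ np.linalg.inv(E_r) @ E_rc           # A2/2 - A1 A0^{-1} A1
print(np.linalg.eigvalsh((B + B.conj().T) / 2).min())  # >= 0 up to round-off
```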

Now, we are ready to prove Theorem 1.

Proof (Theorem 1)

By the Golden-Thompson inequality, we write

\operatorname{\mathrm{Tr}}\left(\exp\left(\log(\rho_{k})+\log(-\nabla f(\rho_{k}))\right)\right)\leq\operatorname{\mathrm{Tr}}\left(\rho_{k}\left(-\nabla f(\rho_{k})\right)\right)=\mathsf{E}\left[\frac{\braket{M,\rho_{k}}}{\braket{M,\rho_{k}}}\right]=1.

Therefore, by the iteration rule of the proposed algorithm and operator monotonicity of the matrix logarithm, we have

\log\rho_{k+1}\geq\log\rho_{k}+\log R(\rho_{k}). \qquad (3)

Then, for any $K\in\mathbb{N}$, we write

\begin{align*}
f(\overline{\rho}_{K})-f(\hat{\rho}) &\leq \max_{\sigma\in\mathcal{D}}\operatorname{\mathrm{Tr}}\left(\sigma\log R\left(\overline{\rho}_{K}\right)\right)\\
&\leq \max_{\sigma\in\mathcal{D}}\frac{1}{K}\sum_{k=1}^{K}\operatorname{\mathrm{Tr}}\left(\sigma\log R\left(\rho_{k}\right)\right)\\
&\leq \frac{1}{K}\max_{\sigma\in\mathcal{D}}\sum_{k=1}^{K}\operatorname{\mathrm{Tr}}\left(\sigma\left(\log(\rho_{k+1})-\log(\rho_{k})\right)\right)\\
&= \frac{1}{K}\max_{\sigma\in\mathcal{D}}\operatorname{\mathrm{Tr}}\left(\sigma\left(\log(\rho_{K+1})-\log(\rho_{1})\right)\right)\\
&\leq \frac{\log D}{K},
\end{align*}

where the first inequality follows from Lemma 1, the second from Lemma 2 and Jensen's inequality, the third from (3), and the last from the facts that $\log(\rho_{K+1})\leq 0$ and $-\log(\rho_{1})=(\log D)\,I$ since $\rho_{1}=I/D$.
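As a small numerical sanity check of the Golden-Thompson step used above (with a random full-rank density matrix and a random positive definite $R$, both illustrative), one can verify $\operatorname{\mathrm{Tr}}\exp(\log\rho+\log R)\leq\operatorname{\mathrm{Tr}}(\rho R)$:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
D = 5
# random full-rank density matrix rho and positive definite R
X = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
rho = X @ X.conj().T
rho = rho / np.trace(rho).real
Y = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
R = Y @ Y.conj().T + np.eye(D)

lhs = np.trace(expm(logm(rho) + logm(R))).real
rhs = np.trace(rho @ R).real
assert lhs <= rhs + 1e-10   # Golden-Thompson: Tr e^{A+B} <= Tr(e^A e^B)
```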

Acknowledgement

C.-M. Lin and Y.-H. Li are supported by the Young Scholar Fellowship (Einstein Program) of the Ministry of Science and Technology of Taiwan under grant numbers MOST 109-2636-E-002-025 and MOST 110-2636-E-002-012. H.-C. Cheng is supported by the Young Scholar Fellowship (Einstein Program) of the Ministry of Science and Technology in Taiwan (R.O.C.) under grant numbers MOST 109-2636-E-002-001 and MOST 110-2636-E-002-009, and by the Yushan Young Scholar Program of the Ministry of Education in Taiwan (R.O.C.) under grant numbers NTU-109V0904 and NTU-110V0904.

References

  • [Aar20] Scott Aaronson. Shadow tomography of quantum states. SIAM J. Comput., 49(5):STOC18–368–STOC18–394, 2020.
  • [ACH+18] Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, and Ashwin Nayak. Online learning of quantum states. In Adv. Neural Information Processing Systems 31, 2018.
  • [AMnNK20] Shahnawaz Ahmed, Carlos Sánchez Muñoz, Franco Nori, and Anton Frisk Kockum. Quantum state tomography with conditional generative adversarial networks, 2020. arXiv:2008.03240.
  • [Bha97] Rajendra Bhatia. Matrix Analysis. Springer, New York, NY, 1997.
  • [BK10a] Robin Blume-Kohout. Hedged maximum likelihood quantum state estimation. Phys. Rev. Lett., 105, 2010.
  • [BK10b] Robin Blume-Kohout. Optimal, reliable estimation of quantum states. New J. Phys., 12, 2010.
  • [Cov84] Thomas M. Cover. An algorithm for maximizing expected log investment return. IEEE Trans. Inf. Theory, IT-30(2):369–373, 1984.
  • [FGLE12] Steven T. Flammia, David Gross, Yi-Kai Liu, and Jens Eisert. Quantum tomography via compressed sensing: Error bounds, sample complexity and efficient estimators. New J. Phys., 14, 2012.
  • [GGRL14] D. S. Gonçalves, M. A. Gomes-Ruggiero, and C. Lavor. Global convergence of diluted iterations in maximum-likelihood quantum tomography. Quantum Inf. Comput., 14(11&12):966–980, 2014.
  • [GLF+10] David Gross, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, and Jens Eisert. Quantum state tomography via compressed sensing. Phys. Rev. Lett., 105, 2010.
  • [HP14] Fumio Hiai and Dénes Petz. Introduction to Matrix Analysis and Applications. Springer, Cham, 2014.
  • [Hra97] Z. Hradil. Quantum-state estimation. Phys. Rev. A, 55(3), 1997.
  • [HvFJ04] Zdeněk Hradil, Jaroslav Řeháček, Jaromír Fiurášek, and Miroslav Ježek. Maximum-likelihood methods in quantum mechanics. In Quantum State Estimation, chapter 3, pages 59–112. Springer, Berlin, 2004.
  • [Ius92] Alfredo N. Iusem. A short convergence proof of the EM algorithm for a specific Poisson model. Rev. Bras. Probab. Estat., 6(1):57–67, 1992.
  • [Lav08] Pascal Lavergne. A Cauchy-Schwarz inequality for expectation of matrices. Discussion Papers dp08-07, Department of Economics, Simon Fraser University, 2008.
  • [Lvo04] A. I. Lvovsky. Iterative maximum-likelihood reconstruction in quantum homodyne tomography. J. Opt. B: Quantum Semiclass. Opt., 6, 2004.
  • [MTVv+04] G. Molina-Terriza, A. Vaziri, J. Řeháček, Z. Hradil, and A. Zeilinger. Triggered qutrits for quantum communication protocols. Phys. Rev. Lett., 92(16), 2004.
  • [MTZ12] Leonard C. MacLean, Edward O. Thorp, and William T. Ziemba, editors. The Kelly Capital Growth Investment Criterion. World Scientific, Singapore, 2012.
  • [OWV97] T. Opatrný, D.-G. Welsch, and W. Vogel. Least-squares inversion for density-matrix reconstruction. Phys. Rev. A, 56(3), 1997.
  • [Pv04] Matteo Paris and Jaroslav Řeháček, editors. Quantum State Estimation. Springer, Berlin, 2004.
  • [QFN21] Yihui Quek, Stanislav Fort, and Hui Khoon Ng. Adaptive quantum state tomography with neural networks. npj Quantum Inf., 7, 2021.
  • [SBK18] Travis L. Scholten and Robin Blume-Kohout. Behavior of maximum likelihood in quantum state tomography. New J. Phys., 20, 2018.
  • [vHKL07] Jaroslav Řeháček, Zdeněk Hradil, E. Knill, and A. I. Lvovsky. Diluted maximum-likelihood algorithm for quantum tomography. Phys. Rev. A, 75, 2007.
  • [VL93] Y. Vardi and D. Lee. From image deblurring to optimal investments: Maximum likelihood solutions for positive linear inverse problems. J. R. Stat. Soc., Ser. B, 55(3):569–612, 1993.
  • [VSK85] Y. Vardi, L. A. Shepp, and L. Kaufman. A statistical model for positron emission tomography. J. Am. Stat. Assoc., 80(389):8–20, 1985.
  • [YFT19] Akram Youssry, Christopher Ferrie, and Marco Tomamichel. Efficient online quantum state estimation using a matrix-exponentiated gradient method. New J. Phys., 21(033006), 2019.
  • [YJZS20] Feidiao Yang, Jiaqing Jiang, Jialin Zhang, and Xiaoming Sun. Revisiting online quantum state learning. In Proc. AAAI Conf. Artificial Intelligence, 2020.