
Maximum-Likelihood Quantum State Tomography by Cover’s Method with Non-Asymptotic Analysis

Chien-Ming Lin (Department of Computer Science and Information Engineering, National Taiwan University) · Hao-Chung Cheng (Department of Electrical Engineering, National Taiwan University; Department of Mathematics, National Taiwan University; Hon Hai (Foxconn) Quantum Computing Centre) · Yen-Huan Li (Department of Computer Science and Information Engineering, National Taiwan University; Department of Mathematics, National Taiwan University)
Abstract

We propose an iterative algorithm that computes the maximum-likelihood estimate in quantum state tomography. The optimization error of the algorithm converges to zero at an $O((1/k)\log D)$ rate, where $k$ denotes the number of iterations and $D$ denotes the dimension of the quantum state. The per-iteration computational complexity of the algorithm is $O(D^{3}+ND^{2})$, where $N$ denotes the number of measurement outcomes. The algorithm can be considered as a parameter-free correction of the $R\rho R$ method [A. I. Lvovsky. Iterative maximum-likelihood reconstruction in quantum homodyne tomography. J. Opt. B: Quantum Semiclass. Opt. 2004; G. Molina-Terriza et al. Triggered qutrits for quantum communication protocols. Phys. Rev. Lett. 2004].

1 Introduction

Quantum state tomography aims to estimate the quantum state of a physical system, given measurement outcomes (see, e.g., [Pv04] for a comprehensive survey). There are various approaches to quantum state tomography, such as trace regression [FGLE12, GLF+10, OWV97, YJZS20, YFT19], maximum-likelihood estimation [Hra97, HvFJ04], Bayesian estimation [BK10a, BK10b], and recently proposed deep learning-based methods [AMnNK20, QFN21]. (A confusion the authors frequently encounter is that many people conflate state tomography with the notion of shadow tomography introduced by Aaronson [ACH+18, Aar20]. State tomography aims at estimating the quantum state, whereas shadow tomography aims at estimating the probability distributions of measurement outcomes. Indeed, one interesting conclusion by Aaronson is that shadow tomography requires much less data than state tomography.) Among existing approaches, maximum-likelihood estimation has been standard in practice and enjoys favorable asymptotic statistical guarantees (see, e.g., [HvFJ04, SBK18]). The maximum-likelihood estimator is given by the optimization problem:

\hat{\rho}\in\operatorname*{\arg\min}_{\rho\in\mathcal{D}}f(\rho),\quad f(\rho)\coloneqq\frac{1}{N}\sum_{n=1}^{N}-\log\operatorname{\mathrm{Tr}}(M_{n}\rho), \qquad (1)

for some Hermitian positive semi-definite matrices $M_{n}$, where $\mathcal{D}$ denotes the set of quantum density matrices, i.e.,

\mathcal{D}\coloneqq\Set{\rho\in\mathbb{C}^{D\times D}}{\rho=\rho^{\dagger},\ \rho\geq 0,\ \operatorname{\mathrm{Tr}}(\rho)=1},

and $N$ denotes the number of measurement outcomes. We write $\rho^{\dagger}$ for the conjugate transpose of $\rho$.

$R\rho R$ is a numerical method developed to solve (1) [Lvo04, MTVv+04]. Given a positive definite initial iterate $\rho_{1}\in\mathcal{D}$, $R\rho R$ iterates as

\rho_{k+1}=\mathcal{N}(R(\rho_{k})\rho_{k}R(\rho_{k})),\quad R(\rho_{k})\coloneqq-\nabla f(\rho_{k})=\frac{1}{N}\sum_{n=1}^{N}\frac{M_{n}}{\operatorname{\mathrm{Tr}}(M_{n}\rho_{k})},\quad\forall k\in\mathbb{N},

where the mapping $\mathcal{N}$ scales its input so that $\operatorname{\mathrm{Tr}}(\rho_{k+1})=1$, and $\nabla f$ denotes the gradient mapping of $f$. $R\rho R$ is parameter-free (i.e., it does not require parameter tuning) and typically converges fast in practice. Unfortunately, one can construct a synthetic data set on which $R\rho R$ does not converge [vHKL07].
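To make the iteration concrete, here is a minimal NumPy sketch of one $R\rho R$ step. The helper name `rpr_step`, the list of measurement operators `Ms`, and the random rank-one operators in the usage example are illustrative placeholders, not part of the method itself.

```python
import numpy as np

def rpr_step(rho, Ms):
    """One RrhoR iteration: rho <- N(R(rho) rho R(rho))."""
    N = len(Ms)
    # R(rho) = (1/N) * sum_n M_n / Tr(M_n rho)
    R = sum(M / np.trace(M @ rho).real for M in Ms) / N
    rho_next = R @ rho @ R
    return rho_next / np.trace(rho_next).real  # the scaling map N(.)

# illustrative usage with random rank-one measurement operators
rng = np.random.default_rng(0)
D, N = 4, 50
vs = rng.normal(size=(N, D)) + 1j * rng.normal(size=(N, D))
Ms = [np.outer(v, v.conj()) for v in vs]
rho = np.eye(D) / D
for _ in range(100):
    rho = rpr_step(rho, Ms)
```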

According to [Lvo04], $R\rho R$ is inspired by Cover's method [Cov84]. (Cover's method coincides with the expectation maximization method for solving (2) and hence is typically called expectation maximization in the literature. Nevertheless, neither Cover's derivation and convergence analysis nor ours relies on, or is covered by, existing results on expectation maximization, so we do not call the method expectation maximization, to avoid possible confusion.) Cover's method solves the optimization problem:

x^{\star}\in\operatorname*{\arg\min}_{x\in\Delta}g(x),\quad g(x)\coloneqq\frac{1}{N}\sum_{n=1}^{N}-\log\braket{a_{n},x}, \qquad (2)

for some entry-wise non-negative vectors $a_{n}\in\mathbb{R}^{D}$, where $\Delta$ denotes the probability simplex in $\mathbb{R}^{D}$, i.e.,

\Delta\coloneqq\Set{x=(x_{1},\ldots,x_{D})\in\mathbb{R}^{D}}{x_{d}\geq 0\ \forall\, d,\ \sum_{d=1}^{D}x_{d}=1},

and the inner product is the one associated with the Euclidean norm. This optimization problem arises when one wants to compute the growth-optimal portfolio for long-term investment (see, e.g., [MTZ12]). Given an entry-wise strictly positive initial iterate $x_{1}\in\mathbb{R}_{++}^{D}$, Cover's method iterates as

x_{k+1}=x_{k}\circ\left(-\nabla g(x_{k})\right),\quad\forall k\in\mathbb{N},

where $\circ$ denotes the entry-wise (Schur) product. Cover's method is guaranteed to converge to the optimum [Cov84]. Indeed, if the matrices in (1) share the same eigenbasis, then it is easily checked that (1) is equivalent to (2), yet $R\rho R$ is not equivalent to Cover's method (Cover's method does not need the scaling mapping $\mathcal{N}$; one can check that its iterates already lie in $\Delta$). This explains why $R\rho R$ does not inherit the convergence guarantee of Cover's method.
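For comparison, a minimal NumPy sketch of Cover's update follows; the matrix `A` of stacked vectors $a_{n}$ is an illustrative placeholder. As noted above, no scaling map is needed: the iterates remain in $\Delta$.

```python
import numpy as np

def cover_step(x, A):
    """One step of Cover's method: x <- x o (-grad g(x))."""
    # -grad g(x) has entries (1/N) * sum_n a_{n,d} / <a_n, x>
    neg_grad = (A / (A @ x)[:, None]).mean(axis=0)
    return x * neg_grad  # entry-wise (Schur) product

# illustrative usage
rng = np.random.default_rng(0)
A = rng.random((50, 4))      # entry-wise non-negative vectors a_n as rows
x = np.full(4, 1.0 / 4)      # strictly positive initial iterate
for _ in range(100):
    x = cover_step(x, A)
print(x.sum())               # stays (numerically) equal to 1
```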

Řeháček et al. proposed a diluted correction of $R\rho R$ [vHKL07]. Given a positive definite initial iterate $\rho_{1}\in\mathcal{D}$, diluted $R\rho R$ iterates as

\rho_{k+1}=\mathcal{N}\left(\left(R(\rho_{k})+\alpha_{k}I\right)\rho_{k}\left(R(\rho_{k})+\alpha_{k}I\right)\right),

where the parameter $\alpha_{k}$ is chosen by exact line search. Later, Gonçalves et al. proposed a variant of diluted $R\rho R$ that adopts Armijo line search [GGRL14]. Both versions of diluted $R\rho R$ are guaranteed to converge to the optimum. Unfortunately, their convergence guarantees are asymptotic and do not allow us to characterize the iteration complexity, i.e., the number of iterations required to obtain an approximate solution of (1). In particular, the dimension $D$ grows exponentially with the number of qubits and can be huge, yet the dependence of the iteration complexities of the diluted $R\rho R$ methods on $D$ is unclear.
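The sketch below illustrates the diluted iteration; for simplicity it selects $\alpha_{k}$ by a crude search over a logarithmic grid, which is only a stand-in for the exact and Armijo line searches used in [vHKL07] and [GGRL14]. The helper names are hypothetical.

```python
import numpy as np

def neg_log_likelihood(rho, Ms):
    """f(rho) = -(1/N) sum_n log Tr(M_n rho)."""
    return -np.mean([np.log(np.trace(M @ rho).real) for M in Ms])

def diluted_rpr_step(rho, Ms, alphas=np.logspace(-2, 2, 30)):
    """One diluted RrhoR step; alpha picked by a crude grid search
    (a stand-in for the exact/Armijo line searches in the literature)."""
    D = rho.shape[0]
    R = sum(M / np.trace(M @ rho).real for M in Ms) / len(Ms)
    best_rho, best_val = rho, neg_log_likelihood(rho, Ms)
    for alpha in alphas:
        T = R + alpha * np.eye(D)
        cand = T @ rho @ T
        cand = cand / np.trace(cand).real
        val = neg_log_likelihood(cand, Ms)
        if val < best_val:
            best_rho, best_val = cand, val
    return best_rho
```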

We propose the following algorithm.

  • Set $\rho_{1}=I/D$, where $I$ denotes the identity matrix.

  • For each $k\in\mathbb{N}$, compute

    \rho_{k+1}=\mathcal{N}\left(\exp\left(\log(\rho_{k})+\log(R(\rho_{k}))\right)\right),

    where $\exp$ and $\log$ denote the matrix exponential and matrix logarithm, respectively.

Notice that the objective function implicitly requires the quantities $\operatorname{\mathrm{Tr}}(M_{n}\rho_{k})$ to be strictly positive; otherwise, $R(\rho_{k})$ does not exist, as $-\log\operatorname{\mathrm{Tr}}(M_{n}\rho_{k})$ is not well-defined. Our initialization and iteration rule guarantee that the iterates $\rho_{k}$ are full-rank and $\operatorname{\mathrm{Tr}}(M_{n}\rho_{k})$ are strictly positive.

Let us discuss the computational complexity of the proposed algorithm. Computing $\nabla f(\rho_{k})$ takes $O(ND^{2})$ arithmetic operations, and computing the matrix logarithms and the matrix exponential takes $O(D^{3})$. The per-iteration computational complexity is hence $O(D^{3}+ND^{2})$.
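A minimal SciPy sketch of one iteration of the proposed algorithm is given below; `expm` and `logm` are dense $O(D^{3})$ routines, the explicit symmetrization is a numerical safeguard not required by the analysis, and the random rank-one operators in the usage example are illustrative only.

```python
import numpy as np
from scipy.linalg import expm, logm

def proposed_step(rho, Ms):
    """One iteration: rho <- N(exp(log(rho) + log(R(rho))))."""
    N = len(Ms)
    R = sum(M / np.trace(M @ rho).real for M in Ms) / N  # R(rho) = -grad f(rho), O(N D^2)
    X = expm(logm(rho) + logm(R))                        # matrix log/exp, O(D^3) each
    X = (X + X.conj().T) / 2                             # symmetrize against round-off
    return X / np.trace(X).real                          # the scaling map N(.)

# illustrative usage starting from the maximally mixed state
rng = np.random.default_rng(0)
D, N = 4, 50
vs = rng.normal(size=(N, D)) + 1j * rng.normal(size=(N, D))
Ms = [np.outer(v, v.conj()) for v in vs]
rho = np.eye(D) / D                                      # rho_1 = I / D
for _ in range(100):
    rho = proposed_step(rho, Ms)
```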

Observe that the proposed algorithm recovers Cover's method when $\rho_{k}$ and $\nabla f(\rho_{k})$ share the same eigenbasis. We show that the proposed algorithm indeed converges and that its iteration complexity is logarithmic in the dimension $D$.

Theorem 1

Assume that $\bigcap_{n=1}^{N}\ker(M_{n})=\{0\}$. Let $(\rho_{k})_{k\in\mathbb{N}}$ be the sequence of iterates generated by the proposed method. Define $\overline{\rho}_{k}\coloneqq(\rho_{1}+\cdots+\rho_{k})/k$. Then, for every $\varepsilon>0$, we have $f(\overline{\rho}_{k})-f(\hat{\rho})\leq\varepsilon$ whenever $k\geq(1/\varepsilon)\log D$.

Remark 1

Suppose $\mathcal{K}\coloneqq\bigcap_{n=1}^{N}\ker(M_{n})\neq\{0\}$. Let $U$ be a matrix whose columns form an orthonormal basis of $\mathcal{K}^{\perp}$. Then, it suffices to solve (1) on a lower-dimensional space by replacing $M_{n}$ with $U^{\dagger}M_{n}U$ in the objective function.
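A hedged sketch of this reduction: since the $M_{n}$ are positive semi-definite, their common kernel equals the kernel of $\sum_{n}M_{n}$, so an orthonormal basis of $\mathcal{K}^{\perp}$ can be read off from an eigendecomposition. The tolerance `tol` is an illustrative choice.

```python
import numpy as np

def reduce_to_complement(Ms, tol=1e-12):
    """Restrict the M_n to the orthogonal complement of their common kernel."""
    S = sum(Ms)                       # PSD; ker(S) = common kernel of the M_n
    w, V = np.linalg.eigh(S)
    U = V[:, w > tol]                 # orthonormal basis of the complement
    return [U.conj().T @ M @ U for M in Ms], U
```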

Recall that (2) and Cover's method are special cases of (1) and the proposed algorithm, respectively. Moreover, Cover's method is equivalent to the expectation maximization method for Poisson inverse problems [VL93]. Theorem 1 is hence of independent interest even for computing the growth-optimal portfolio by Cover's method and for solving Poisson inverse problems by expectation maximization, showing that the iteration complexities of both are also $O(\varepsilon^{-1}\log D)$. This supplements the asymptotic convergence results in [Cov84] and [VSK85]. Whereas the same iteration complexity bound for Cover's method in growth-optimal portfolio computation is immediate from a lemma due to Iusem [Ius92, Lemma 2.2], it is currently unclear to us how to extend Iusem's analysis to the quantum setup.

2 Proof of Theorem 1

For convenience, let $M$ be a random matrix following the empirical probability distribution of $M_{1},\ldots,M_{N}$ (the derivation below is not restricted to the empirical probability distribution). Then, we have $f(\rho)=\mathsf{E}\left[-\log\braket{M,\rho}\right]$, where $\mathsf{E}$ denotes the mathematical expectation. Define

r(\rho)\coloneqq\frac{M}{\braket{M,\rho}},\quad R(\rho)\coloneqq-\nabla f(\rho)=\mathsf{E}\left[r(\rho)\right],

where the inner product, as well as all inner products in the rest of this section, is the Hilbert-Schmidt inner product. We start with an error upper bound.

Lemma 1

For any density matrix $\rho$ such that $R(\rho)$ exists,

f(\rho)-f(\hat{\rho})\leq\max_{\sigma\in\mathcal{D}}\braket{\log R(\rho),\sigma}.

Remark 2

Notice that $R(\rho)$ is always positive definite, so $\log R(\rho)$ is always well-defined. Indeed, suppose there exists some non-zero vector $u$ such that $\braket{u,R(\rho)u}=0$. Then, as the $M_{n}$ are positive semi-definite, we have $\braket{u,M_{n}u}=0$ for all $n$, violating the assumption that $\bigcap_{n=1}^{N}\ker(M_{n})=\{0\}$.

Proof (Lemma 1)

By Jensen’s inequality, we write

\begin{align*}
f(\rho)-f(\hat{\rho}) &= \mathsf{E}\left[\log\Braket{\frac{M}{\braket{M,\rho}},\hat{\rho}}\right]\\
&\leq \log\Braket{\mathsf{E}\left[\frac{M}{\braket{M,\rho}}\right],\hat{\rho}}\\
&= \log\braket{R(\rho),\hat{\rho}}\\
&\leq \log\lambda_{\max}(R(\rho))\\
&= \lambda_{\max}(\log R(\rho))\\
&= \max_{\sigma\in\mathcal{D}}\braket{\log R(\rho),\sigma}.
\end{align*}

Deriving the following lemma is the major technical challenge in our convergence analysis. The lemma shows that the mapping $\rho\mapsto\log R(\rho)$ is operator convex.

Lemma 2

For any density matrix $\sigma$, the function $\varphi(\rho)\coloneqq\braket{\log R(\rho),\sigma}$ is convex.

Proof

Equivalently, we want to show that $D^{2}\varphi(\rho)[\delta,\delta]\geq 0$ for all $\rho\in\operatorname{\mathrm{dom}}D^{2}\varphi$ and Hermitian $\delta\in\mathbb{C}^{D\times D}$. Define $A_{0}\coloneqq R(\rho)$. By [HP14, Example 3.22 and Exercise 3.24], we have

\begin{align*}
A_{1} &\coloneqq DR(\rho)[\delta]=-\mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}\right],\\
A_{2} &\coloneqq D^{2}R(\rho)[\delta,\delta]=2\,\mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}^{2}\right],\\
D\log(A_{0})[A_{2}] &= \int_{0}^{\infty}(A_{0}+sI)^{-1}A_{2}(A_{0}+sI)^{-1}\,\mathrm{d}s,\\
D^{2}\log(A_{0})[A_{1},A_{1}] &= -2\int_{0}^{\infty}(A_{0}+sI)^{-1}A_{1}(A_{0}+sI)^{-1}A_{1}(A_{0}+sI)^{-1}\,\mathrm{d}s.
\end{align*}

Define $\Phi(\rho)\coloneqq\log R(\rho)$. Recall the chain rule for the second-order Fréchet derivative (see, e.g., [Bha97, p. 316]):

D^{2}\Phi(\rho)[\delta,\delta]=D^{2}\log(R(\rho))[DR(\rho)[\delta],DR(\rho)[\delta]]+D\log(R(\rho))[D^{2}R(\rho)[\delta,\delta]].

Then, we write

\begin{align*}
D^{2}\varphi(\rho)[\delta,\delta] &= \operatorname{\mathrm{Tr}}\left(\sigma D^{2}\Phi(\rho)[\delta,\delta]\right)\\
&= -2\int_{0}^{\infty}\operatorname{\mathrm{Tr}}\left(\sigma(A_{0}+sI)^{-1}A_{1}(A_{0}+sI)^{-1}A_{1}(A_{0}+sI)^{-1}\right)\mathrm{d}s\\
&\quad+\int_{0}^{\infty}\operatorname{\mathrm{Tr}}\left(\sigma(A_{0}+sI)^{-1}A_{2}(A_{0}+sI)^{-1}\right)\mathrm{d}s\\
&= 2\int_{0}^{\infty}\operatorname{\mathrm{Tr}}\left(B_{s}\left((A_{0}+sI)^{-1}\sigma(A_{0}+sI)^{-1}\right)\right)\mathrm{d}s,
\end{align*}

where

B_{s}\coloneqq\frac{A_{2}}{2}-A_{1}(A_{0}+sI)^{-1}A_{1}.

Since $(A_{0}+sI)^{-1}\sigma(A_{0}+sI)^{-1}$ is obviously positive semi-definite, it suffices to show that $B_{s}$ is positive semi-definite for all $s\geq 0$. We write

\begin{align*}
B_{s} &\geq \frac{A_{2}}{2}-A_{1}A_{0}^{-1}A_{1}\\
&= \mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}^{2}\right]-\mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}\right]\left(\mathsf{E}\left[r(\rho)\right]\right)^{-1}\mathsf{E}\left[r(\rho)\braket{r(\rho),\delta}\right],
\end{align*}

which is positive semi-definite by an extension of the Cauchy-Schwarz inequality due to Lavergne [Lav08] (whereas Lavergne considers the real matrix case, the proof extends directly to the Hermitian matrix case).
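As a quick numerical illustration of this matrix Cauchy-Schwarz step (on a random instance with rank-one measurement operators, which is illustrative only), one can check that $\frac{A_{2}}{2}-A_{1}A_{0}^{-1}A_{1}$ has no negative eigenvalues up to round-off:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 4, 200
# random PSD measurement operators, a full-rank state rho, and a Hermitian direction delta
Ms = []
for _ in range(N):
    v = rng.normal(size=D) + 1j * rng.normal(size=D)
    Ms.append(np.outer(v, v.conj()))
X = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
rho = X @ X.conj().T
rho = rho / np.trace(rho).real
Z = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
delta = (Z + Z.conj().T) / 2

rs = [M / np.trace(M @ rho).real for M in Ms]          # samples of r(rho)
cs = [np.trace(r @ delta).real for r in rs]            # <r(rho), delta>
E_r   = sum(rs) / N
E_rc  = sum(c * r for c, r in zip(cs, rs)) / N
E_rc2 = sum(c * c * r for c, r in zip(cs, rs)) / N
B = E_rc2 - E_rc @ np.linalg.inv(E_r) @ E_rc           # A2/2 - A1 A0^{-1} A1
print(np.linalg.eigvalsh((B + B.conj().T) / 2).min())  # >= 0 up to round-off
```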

Now, we are ready to prove Theorem 1.

Proof (Theorem 1)

By the Golden-Thompson inequality, we write

\operatorname{\mathrm{Tr}}\left(\exp\left(\log(\rho_{k})+\log(-\nabla f(\rho_{k}))\right)\right)\leq\operatorname{\mathrm{Tr}}\left(\rho_{k}\left(-\nabla f(\rho_{k})\right)\right)=\mathsf{E}\left[\frac{\braket{M,\rho_{k}}}{\braket{M,\rho_{k}}}\right]=1.

Therefore, by the iteration rule of the proposed algorithm and operator monotonicity of the matrix logarithm, we have

\log\rho_{k+1}\geq\log\rho_{k}+\log R(\rho_{k}). \qquad (3)

Then, for any $K\in\mathbb{N}$, we write

\begin{align*}
f(\overline{\rho}_{K})-f(\hat{\rho}) &\leq \max_{\sigma\in\mathcal{D}}\operatorname{\mathrm{Tr}}\left(\sigma\log R\left(\overline{\rho}_{K}\right)\right)\\
&\leq \max_{\sigma\in\mathcal{D}}\frac{1}{K}\sum_{k=1}^{K}\operatorname{\mathrm{Tr}}\left(\sigma\log R\left(\rho_{k}\right)\right)\\
&\leq \frac{1}{K}\max_{\sigma\in\mathcal{D}}\sum_{k=1}^{K}\operatorname{\mathrm{Tr}}\left(\sigma\left(\log(\rho_{k+1})-\log(\rho_{k})\right)\right)\\
&= \frac{1}{K}\max_{\sigma\in\mathcal{D}}\operatorname{\mathrm{Tr}}\left(\sigma\left(\log(\rho_{K+1})-\log(\rho_{1})\right)\right)\\
&\leq \frac{\log D}{K},
\end{align*}

where the first inequality follows from Lemma 1, the second from Lemma 2 and Jensen's inequality, the third from (3), and the last from the facts that $\log(\rho_{K+1})\leq 0$ and $-\log(\rho_{1})=(\log D)\,I$ since $\rho_{1}=I/D$.
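As a small numerical sanity check of the Golden-Thompson step used above (with a random full-rank density matrix and a random positive definite $R$, both illustrative), one can verify $\operatorname{\mathrm{Tr}}\exp(\log\rho+\log R)\leq\operatorname{\mathrm{Tr}}(\rho R)$:

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(0)
D = 5
# random full-rank density matrix rho and positive definite R
X = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
rho = X @ X.conj().T
rho = rho / np.trace(rho).real
Y = rng.normal(size=(D, D)) + 1j * rng.normal(size=(D, D))
R = Y @ Y.conj().T + np.eye(D)

lhs = np.trace(expm(logm(rho) + logm(R))).real
rhs = np.trace(rho @ R).real
assert lhs <= rhs + 1e-10   # Golden-Thompson: Tr e^{A+B} <= Tr(e^A e^B)
```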

Acknowledgement

C.-M. Lin and Y.-H. Li are supported by the Young Scholar Fellowship (Einstein Program) of the Ministry of Science and Technology of Taiwan under grant numbers MOST 109-2636-E-002-025 and MOST 110-2636-E-002-012. H.-C. Cheng is supported by the Young Scholar Fellowship (Einstein Program) of the Ministry of Science and Technology in Taiwan (R.O.C.) under grant numbers MOST 109-2636-E-002-001 and MOST 110-2636-E-002-009, and by the Yushan Young Scholar Program of the Ministry of Education in Taiwan (R.O.C.) under grant numbers NTU-109V0904 and NTU-110V0904.

References

  • [Aar20] Scott Aaronson. Shadow tomography of quantum states. SIAM J. Comput., 49(5):STOC18–368–STOC18–394, 2020.
  • [ACH+18] Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, and Ashwin Nayak. Online learning of quantum states. In Adv. Neural Information Processing Systems 31, 2018.
  • [AMnNK20] Shahnawaz Ahmed, Carlos Sánchez Muñoz, Franco Nori, and Anton Frisk Kockum. Quantum state tomography with conditional generative adversarial networks, 2020. arXiv:2008.03240.
  • [Bha97] Rajendra Bhatia. Matrix Analysis. Springer, New York, NY, 1997.
  • [BK10a] Robin Blume-Kohout. Hedged maximum likelihood quantum state estimation. Phys. Rev. Lett., 105, 2010.
  • [BK10b] Robin Blume-Kohout. Optimal, reliable estimation of quantum states. New J. Phys., 12, 2010.
  • [Cov84] Thomas M. Cover. An algorithm for maximizing expected log investment return. IEEE Trans. Inf. Theory, IT-30(2):369–373, 1984.
  • [FGLE12] Steven T. Flammia, David Gross, Yi-Kai Liu, and Jens Eisert. Quantum tomography via compressed sensing: Error bounds, sample complexity and efficient estimators. New J. Phys., 14, 2012.
  • [GGRL14] D. S. Gonçalves, M. A. Gomes-Ruggiero, and C. Lavor. Global convergence of diluted iterations in maximum-likelihood quantum tomography. Quantum Inf. Comput., 14(11&12):966–980, 2014.
  • [GLF+10] David Gross, Yi-Kai Liu, Steven T. Flammia, Stephen Becker, and Jens Eisert. Quantum state tomography via compressed sensing. Phys. Rev. Lett., 105, 2010.
  • [HP14] Fumio Hiai and Dénes Petz. Introduction to Matrix Analysis and Applications. Springer, Cham, 2014.
  • [Hra97] Z. Hradil. Quantum-state estimation. Phys. Rev. A, 55(3), 1997.
  • [HvFJ04] Zdeněk Hradil, Jaroslav Řeháček, Jaromír Fiurášek, and Miroslav Ježek. Maximum-likelihood methods in quantum mechanics. In Quantum State Estimation, chapter 3, pages 59–112. Springer, Berlin, 2004.
  • [Ius92] Alfredo N. Iusem. A short convergence proof of the EM algorithm for a specific Poisson model. Rev. Bras. Probab. Estat., 6(1):57–67, 1992.
  • [Lav08] Pascal Lavergne. A Cauchy-Schwarz inequality for expectation of matrices. Discussion Papers dp08-07, Department of Economics, Simon Fraser University, 2008.
  • [Lvo04] A. I. Lvovsky. Iterative maximum-likelihood reconstruction in quantum homodyne tomography. J. Opt. B: Quantum Semiclass. Opt., 6, 2004.
  • [MTVv+04] G. Molina-Terriza, A. Vaziri, J. Řeháček, Z. Hradil, and A. Zeilinger. Triggered qutrits for quantum communication protocols. Phys. Rev. Lett., 92(16), 2004.
  • [MTZ12] Leonard C. MacLean, Edward O. Thorp, and William T. Ziemba, editors. The Kelly Capital Growth Investment Criterion. World Scientific, Singapore, 2012.
  • [OWV97] T. Opatrný, D.-G. Welsch, and W. Vogel. Least-squares inversion for density-matrix reconstruction. Phys. Rev. A, 56(3), 1997.
  • [Pv04] Matteo Paris and Jaroslav Řeháček, editors. Quantum State Estimation. Springer, Berlin, 2004.
  • [QFN21] Yihui Quek, Stanislav Fort, and Hui Khoon Ng. Adaptive quantum state tomography with neural networks. npj Quantum Inf., 7, 2021.
  • [SBK18] Travis L. Scholten and Robin Blume-Kohout. Behavior of maximum likelihood in quantum state tomography. New J. Phys., 20, 2018.
  • [vHKL07] Jaroslav Řeháček, Zdeněk Hradil, E. Knill, and A. I. Lvovsky. Diluted maximum-likelihood algorithm for quantum tomography. Phys. Rev. A, 75, 2007.
  • [VL93] Y. Vardi and D. Lee. From image deblurring to optimal investments: Maximum likelihood solutions for positive linear inverse problems. J. R. Stat. Soc., Ser. B, 55(3):569–612, 1993.
  • [VSK85] Y. Vardi, L. A. Shepp, and L. Kaufman. A statistical model for positron emission tomography. J. Am. Stat. Assoc., 80(389):8–20, 1985.
  • [YFT19] Akram Youssry, Christopher Ferrie, and Marco Tomamichel. Efficient online quantum state estimation using a matrix-exponentiated gradient method. New J. Phys., 21(033006), 2019.
  • [YJZS20] Feidiao Yang, Jiaqing Jiang, Jialin Zhang, and Xiaoming Sun. Revisiting online quantum state learning. In Proc. AAAI Conf. Artificial Intelligence, 2020.