This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Quadratic improvement on accuracy of approximating pure quantum states and unitary gates by probabilistic implementation

Seiseki Akibue [email protected]    Go Kato [email protected]    Seiichiro Tani [email protected] NTT Communication Science Labs., NTT Corporation.
3–1, Morinosato-Wakamiya, Atsugi, Kanagawa 243-0198, Japan
Abstract

Pure quantum states are often approximately encoded as classical bit strings such as those representing probability amplitudes and those describing circuits that generate the quantum states. The crucial quantity is the minimum length of classical bit strings from which the original pure states are approximately reconstructible. We derive asymptotically tight bounds on the minimum bit length required for probabilistic encodings with which one can approximately reconstruct the original pure state as an ensemble of the quantum states encoded in classical strings. We also show that such a probabilistic encoding asymptotically halves the bit length required for “deterministic” ones. This is based on the fact that the accuracy of approximating pure states by using a given subset of pure states can be increased quadratically if we use ensembles of pure states in the subset. Moreover, we show that a similar fact holds when we consider the approximation of unitary gates by using a given subset of unitary gates. This improves the reduction rate of the circuit size by using probabilistic circuit synthesis compared to previous results. This also demonstrates that the reduction is possible even for low-accuracy circuit synthesis, which might improve the accuracy of various NISQ algorithms.

discrete geometry of the quantum state, high dimensional quantum states, numerical simulation
preprint: APS/123-QED

I Introduction

Pure quantum states are often approximately encoded in classical states in various quantum information processing tasks, such as classical bit strings storing probability amplitudes of pure states in the classical simulation of quantum circuits; classical data obtained by measurement in state tomography or state estimation; classical descriptions of quantum circuits that generate target pure states in quantum circuit synthesis [1, 2]. Recently, compact classical encodings from which one can predict probability distributions on outcomes obtained by measurements in certain classes have been developed [3, 4, 5].

The key issue is the minimum encoding. Here, we investigate it in terms of the minimum length nn of classical bit strings, with which one can approximately achieve any information processing task achievable with the original pure state on a dd-dimensional system. That is, from the classical strings, one can construct a quantum state ρ^\hat{\rho} that is indistinguishable from the original state ϕ\phi within a certain accuracy by any measurements. In addition to deterministic encodings, which deterministically associate ϕ\phi with a classical bit string, we consider probabilistic encodings, which associate ϕ\phi with one of multiple classical strings according to some distribution. That is, in probabilistic encodings, ϕ\phi is associated with an ensemble of classical strings from which ρ^\hat{\rho} is constructed as an ensemble of the quantum states encoded in the classical strings.

Besides providing fundamental limits for information processing tasks using classical encodings, the minimum length nn is a fundamental quantity in various theoretical subjects including communication complexity, computational complexity, and asymptotic geometric analysis. Indeed, in deterministic encodings, classical strings do nothing but encode elements in an ϵ\epsilon-covering (sometimes called an ϵ\epsilon-net) of the set of pure states. Thus, the minimum bit length nn is the logarithm of the minimum size of ϵ\epsilon-coverings (called the covering number). Due to its prominent role in algorithm design and asymptotic geometric analysis, the covering number has been well-studied [6, 7, 8], and it is known that n=O(d)n=O(d) bits are enough to encode an ϵ\epsilon-covering. On the other hand, a particular task in communication complexity called distributed quantum sampling, which aims to classically transmit a pure state so as to approximately sample outcomes of arbitrary quantum measurement, provides a lower bound on the minimum length required for probabilistic encodings as n=Ω(d)n=\Omega(d) [9]. Taking the two known facts into account, it seems that the minimum probabilistic encoding can be realized by a deterministic one. This intuition is supported by the fact that an ensemble of deterministic classical states (called a probabilistic classical state) represents our lack of knowledge about a classical system while a pure state represents our maximum knowledge about a quantum system.

Contrary to this intuition, we show that the minimum length nn required for probabilistic encodings is exactly half of the one required for deterministic encodings in the asymptotic limits of the dimension or accuracy. Thus, the minimum encoding must associate some pure states with ensembles of classical strings describing distinct quantum states, which may be counter-intuitive, considering that pure states themselves are not probabilistic mixtures of distinct quantum states. The excessive bits required for deterministic encodings can be interpreted as a consequence of their excessive predictive capability such that they can not only reconstruct quantum states within a certain accuracy but also deterministically compute the probability distribution of any measurements within the same accuracy. Such a deterministic computation is impossible by using either the minimum (probabilistic) encoding or the original quantum states.

The bit length reduction by using probabilistic encodings follows from our refined estimation of the covering number and the following fact we prove: for any finite set 𝒜\mathcal{A} of pure states, it is possible to quadratically increase the accuracy of approximating arbitrary pure states by using ensembles of pure states in 𝒜\mathcal{A}. Moreover, we show that a similar fact holds when we consider the approximation of unitary gates by using a finite set of unitary gates. Recently, it has been found that when we approximately implement arbitrary unitary gates by using a gate sequence over a finite universal gate set (called circuit synthesis), the length of the gate sequence or the number of TT gates can be reduced by using ensembles of gate sequences for high-accuracy circuit synthesis [10, 11]. Our result improves the reduction rate of the previous results and shows that the reduction is possible even for low-accuracy circuit synthesis, which might improve the accuracy of NISQ algorithms.

II Preliminaries

In this section, we summarize basic notations used throughout the paper. Note that we consider only finite-dimensional Hilbert spaces. In particular, two-dimensional Hilbert space 2\mathbb{C}^{2} is called a qubit. 𝐋()\mathbf{L}\left(\mathcal{H}\right) and 𝐏𝐨𝐬()\mathbf{Pos}\left(\mathcal{H}\right) represent the set of linear operators and positive semidefinite operators on Hilbert space \mathcal{H}, respectively. 𝐒():={ρ𝐏𝐨𝐬():tr[ρ]=1}\mathbf{S}\left(\mathcal{H}\right):=\left\{\rho\in\mathbf{Pos}\left(\mathcal{H}\right):\text{tr}\left[\rho\right]=1\right\} and 𝐏():={ρ𝐒():tr[ρ2]=1}\mathbf{P}\left(\mathcal{H}\right):=\left\{\rho\in\mathbf{S}\left(\mathcal{H}\right):\text{tr}\left[\rho^{2}\right]=1\right\} represent the set of quantum states and that of pure states, respectively. Pure state ϕ𝐏()\phi\in\mathbf{P}\left(\mathcal{H}\right) is sometimes alternatively represented by complex unit vector |ϕ|{\phi}\rangle\in\mathcal{H} satisfying ϕ=|ϕϕ|\phi=|{\phi}\rangle\langle{\phi}|. Any physical transformation of the quantum state can be represented by a completely positive and trace preserving (CPTP) linear mapping Γ:𝐋()𝐋()\Gamma:\mathbf{L}\left(\mathcal{H}\right)\rightarrow\mathbf{L}\left(\mathcal{H}^{\prime}\right).

The trace distance ρσtr\left\|\rho-\sigma\right\|_{\text{tr}} of two quantum states ρ,σ𝐒()\rho,\sigma\in\mathbf{S}\left(\mathcal{H}\right) is defined as Mtr:=12tr[MM]\left\|M\right\|_{\text{tr}}:=\frac{1}{2}\text{tr}\left[\sqrt{MM^{\dagger}}\right] for M𝐋()M\in\mathbf{L}\left(\mathcal{H}\right). It represents the maximum total variation distance between probability distributions obtained by measurements performed on two quantum states. A similar notion measuring the distinguishability of ρ\rho and σ\sigma is the fidelity function, defined by F(ρ,σ):=maxtr[ΦρΦσ]F\left(\rho,\sigma\right):=\max\text{tr}\left[\Phi^{\rho}\Phi^{\sigma}\right], where Φρ𝐏()\Phi^{\rho}\in\mathbf{P}\left(\mathcal{H}\otimes\mathcal{H}^{\prime}\right) is a purification of ρ\rho, i.e., ρ=tr[Φρ]\rho=\text{tr}_{\mathcal{H}^{\prime}}\left[\Phi^{\rho}\right], and the maximization is taken over all the purifications. Fuchs-van de Graaf inequalities [12] provide relationships between the two measures with respect to the distinguishability as follows:

1F(ρ,σ)ρσtr1F(ρ,σ)1-\sqrt{F\left(\rho,\sigma\right)}\leq\left\|\rho-\sigma\right\|_{\text{tr}}\leq\sqrt{1-F\left(\rho,\sigma\right)} (1)

holds for any states ρ,σ𝐒()\rho,\sigma\in\mathbf{S}\left(\mathcal{H}\right), where the equality of the right inequality holds when ρ\rho and σ\sigma are pure.

III Classical encoding of pure states

Refer to caption
Figure 1: Probabilistic encoding of pure state ϕ\phi on a dd-dimensional system using nn-bit strings and physical transformation Γ\Gamma corresponding to a decoder of the bit strings to quantum states. State ϕ\phi is randomly encoded in label xx in finite set XX according to probability distribution pϕ:X[0,1]p_{\phi}:X\rightarrow[0,1], where n=log2|X|n=\lceil\log_{2}|X|\rceil.

The existence of a probabilistic encoding is equivalent to the existence of physical transformation Γ\Gamma that can approximately reconstruct arbitrary pure state ϕ\phi from probabilistic classical state {(pϕ(x),x)}xX\{(p_{\phi}(x),x)\}_{x\in X} as shown in Fig. 1. Physical transformation Γ\Gamma can be regarded as a decoder of the classical states to quantum states, which outputs mixed state ρx𝐒(d)\rho_{x}\in\mathbf{S}\left(\mathbb{C}^{d}\right) when label xXx\in X is inputted. Formally, Γ\Gamma is represented by a classical-quantum channel [13], which is defined as Γ(σ)=xXx|σ|xρx\Gamma(\sigma)=\sum_{x\in X}\langle{x}|\sigma|{x}\rangle\rho_{x}, where {|x|X|}xX\left\{|{x}\rangle\in\mathbb{C}^{|X|}\right\}_{x\in X} is an orthonormal basis. We require that with the output state ρ^:=xpϕ(x)ρx\hat{\rho}:=\sum_{x}p_{\phi}(x)\rho_{x} of Γ\Gamma, one can approximately sample any measurement outcomes performed on ϕ\phi within total variation distance ϵ\epsilon, i.e., ϕρ^tr<ϵ\left\|\phi-\hat{\rho}\right\|_{\text{tr}}<\epsilon for given ϵ(0,1]\epsilon\in(0,1]. Thus, we say that a probabilistic encoding of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) with accuracy ϵ\epsilon exists if and only if there exists set {ρx𝐒(d)}xX\{\rho_{x}\in\mathbf{S}\left(\mathbb{C}^{d}\right)\}_{x\in X} of quantum states satisfying

maxϕ𝐏(d)minpϕxXp(x)ρxtr<ϵ,\max_{\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)}\min_{p}\left\|\phi-\sum_{x\in X}p(x)\rho_{x}\right\|_{\text{tr}}<\epsilon, (2)

where the minimum is taken over probability distribution pp over XX, i.e, xXp(x)=1\sum_{x\in X}p(x)=1 and p(x)0p(x)\geq 0. Note that since any mixed state is a probabilistic mixture of pure states and trace distance is convex, Eq. (2) also guarantees that an arbitrary mixed state is also approximately reconstructible within the same accuracy as pure states.

III.1 Minimum deterministic encoding

First, we consider deterministic encodings, which can be defined as particular probabilistic encodings. Concretely, every pure state ϕ\phi is encoded into a single label xϕXx_{\phi}\in X, which implies the output state of Γ\Gamma is ρ^=ρxϕ\hat{\rho}=\rho_{x_{\phi}}. Thus, we say that a deterministic encoding of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) with accuracy ϵ\epsilon exists if and only if there exists set {ρx𝐒(d)}xX\{\rho_{x}\in\mathbf{S}\left(\mathbb{C}^{d}\right)\}_{x\in X} of quantum states satisfying

maxϕ𝐏(d)minxXϕρxtr<ϵ,\max_{\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)}\min_{x\in X}\left\|\phi-\rho_{x}\right\|_{\text{tr}}<\epsilon, (3)

which is called an external ϵ\epsilon-covering of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right). A set of pure states {ρx𝐏(d)}xX\{\rho_{x}\in\mathbf{P}\left(\mathbb{C}^{d}\right)\}_{x\in X} satisfying Eq. (3) is called an internal ϵ\epsilon-covering of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right), which corresponds to particular deterministic encodings such as one storing probability amplitudes approximately representing ϕ\phi. The minimum size of internal (or external) ϵ\epsilon-coverings is called the internal (or external) covering number and denoted by IinI_{in} (or IexI_{ex}.) Note that IexIinI_{ex}\leq I_{in} by definition and the minimum bit length nn required for deterministic encodings equals to log2Iex\lceil\log_{2}I_{ex}\rceil.

Since the condition of the ϵ\epsilon-coverings in Eq. (3) is equivalent to that for the set of ϵ\epsilon-balls {Bϵ(ρx):={ψ𝐏(d):ψρxtr<ϵ}}x\left\{B_{\epsilon}\left(\rho_{x}\right):=\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):\left\|\psi-\rho_{x}\right\|_{\text{tr}}<\epsilon\right\}\right\}_{x} to cover 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right), a detailed analysis of the volume of the ϵ\epsilon-ball provides a good estimation of the covering numbers. As shown in Appendix A, the volume can be calculated as μ(Bϵ(ϕ))=ϵ2(d1)\mu(B_{\epsilon}\left(\phi\right))=\epsilon^{2(d-1)} with respect to the unitarily invariant probability measure μ\mu for any ϕ𝐏(d)\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right). This directly provides a lower bound on IinI_{in} and also its upper bound by applying the method of constructing an internal ϵ\epsilon-covering developed in [8]. We obtain the following estimation of IinI_{in}, which is tighter than previous estimations [6, 7] in large dimensions. For completeness, we provide a construction of an internal ϵ\epsilon-covering and an estimation of the parameters appearing in the construction in Appendix B.

Lemma 1.

For any ϵ(0,1]\epsilon\in(0,1] and a positive integer dd\in\mathbb{N} specified below, the internal covering number IinI_{in} of internal ϵ\epsilon-coverings of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) is bounded as follows: For any r>2r>2, there exists d0d_{0}\in\mathbb{N} such that

(1ϵ)2(d1)Iinrd(lnd)(1ϵ)2(d1),\left(\frac{1}{\epsilon}\right)^{2(d-1)}\leq I_{in}\leq rd(\ln d)\left(\frac{1}{\epsilon}\right)^{2(d-1)}, (4)

where the left inequality holds for any dd\in\mathbb{N}, and the right inequality holds for any dd0d\geq d_{0}. For example, if r=5r=5, we can set d0=2d_{0}=2.

To obtain a lower bound on IexI_{ex}, we use the following upper bounds on the volume of the ϵ\epsilon-ball as shown in Appendix C:

ϵ(0,12],μ(Bϵ(ρ)){(2ϵ)2(d1)for 3d1ϵ2(d1)ford4.\forall\epsilon\in\left(0,\frac{1}{2}\right],\mu(B_{\epsilon}\left(\rho\right))\leq\left\{\begin{array}[]{l}(2\epsilon)^{2(d-1)}\ \ \text{for}\ 3\geq d\geq 1\\ \epsilon^{2(d-1)}\ \ \ \ \ \text{for}\ d\geq 4.\end{array}\right. (5)

This bound and μ(Bϵ(ϕ))=ϵ2(d1)\mu(B_{\epsilon}\left(\phi\right))=\epsilon^{2(d-1)} imply that the volume of the ϵ\epsilon-ball can be maximized by setting its center as a pure state if d4d\geq 4, which is contrary to what happens in a qubit (d=2d=2), where Bϵ(ρ)B_{\epsilon}\left(\rho\right) corresponds to the intersection of the Bloch sphere and a ball centered at ρ\rho and the intersection is maximized not by a ball centered at a point on the Bloch sphere but by a ball centered at a point inside the Bloch ball. The qubit case also implies that the condition d4d\geq 4 for the second inequality cannot be fully relaxed. μ(Bϵ(σ))=1\mu(B_{\epsilon}\left(\sigma\right))=1 if ϵ>11d\epsilon>1-\frac{1}{d} with the maximally mixed state σ=1d𝕀\sigma=\frac{1}{d}\mathbb{I} implies another condition ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right] is also not fully removable. By using Eq. (5), we easily obtain the following lower bound on IexI_{ex}.

Lemma 2.

For any ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right] and a positive integer dd\in\mathbb{N} specified below, the external covering number IexI_{ex} of external ϵ\epsilon-coverings of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) is bounded as follows:

Iex{(12ϵ)2(d1)for 3d1(1ϵ)2(d1)ford4.I_{ex}\geq\left\{\begin{array}[]{l}\left(\frac{1}{2\epsilon}\right)^{2(d-1)}\ \ \text{for}\ 3\geq d\geq 1\\ \left(\frac{1}{\epsilon}\right)^{2(d-1)}\ \ \ \text{for}\ d\geq 4.\end{array}\right. (6)

Using the two lemmas by setting r=5r=5, we obtain the following theorem straightforwardly.

Theorem 1.

For any ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right] and an integer d2d\geq 2 specified below, the minimum size of label set XX used over all deterministic encodings of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) with accuracy ϵ\epsilon is bounded by

2l(d,2ϵ)log2|X|2l(d,ϵ)+log2(5dlnd),2\cdot l(d,2\epsilon)\leq\log_{2}|X|\leq 2\cdot l(d,\epsilon)+\log_{2}(5d\ln d), (7)

where l(d,ϵ):=(d1)log2(1ϵ)l(d,\epsilon):=\left(d-1\right)\log_{2}\left(\frac{1}{\epsilon}\right). Moreover, if d4d\geq 4, the lower bound can be strengthened as 2l(d,ϵ)log2|X|2\cdot l(d,\epsilon)\leq\log_{2}|X|.

Using Theorem 1 and n=log2|X|n=\lceil\log_{2}|X|\rceil, we obtain the asymptotic bit rate per dimension limdnd=2log2(1ϵ)\lim_{d\rightarrow\infty}\frac{n}{d}=2\log_{2}\left(\frac{1}{\epsilon}\right) of the minimum deterministic encoding for fixed ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right]. We can also obtain the asymptotic bit rate per accuracy limϵ0nlog2ϵ=2(d1)\lim_{\epsilon\rightarrow 0}\frac{n}{-\log_{2}\epsilon}=2(d-1) of the minimum deterministic encoding for fixed d2d\geq 2.

III.2 Minimum probabilistic encoding

We prove the existence of a probabilistic encoding that achieves exactly half the asymptotic bit length required for the minimum deterministic encoding, and its optimality. The main tool for the proof is the following minimax relationship between the fidelity and the trace distance.

Lemma 3.

For any CPTP linear mapping Λ:𝐋()𝐋()\Lambda:\mathbf{L}\left(\mathcal{H}^{\prime}\right)\rightarrow\mathbf{L}\left(\mathcal{H}\right), it holds that

maxϕ𝐏()minσ𝐒()ϕΛ(σ)tr\displaystyle\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\min_{\sigma\in\mathbf{S}\left(\mathcal{H}^{\prime}\right)}\left\|\phi-\Lambda(\sigma)\right\|_{\text{tr}}
=1minϕ𝐏()maxψ𝐏()F(Λ(ψ),ϕ).\displaystyle=1-\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{\psi\in\mathbf{P}\left(\mathcal{H}^{\prime}\right)}F\left(\Lambda(\psi),\phi\right). (8)
Proof.

We use the minimax theorem as follows:

(L.H.S.)\displaystyle(L.H.S.)
=maxϕ𝐏()minσ𝐒()max0M𝕀tr[M(ϕΛ(σ))]\displaystyle=\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\min_{\sigma\in\mathbf{S}\left(\mathcal{H}^{\prime}\right)}\max_{0\leq M\leq\mathbb{I}}\text{tr}\left[M(\phi-\Lambda(\sigma))\right]
=maxϕ𝐏()max0M𝕀minσ𝐒()tr[M(ϕΛ(σ))]\displaystyle=\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{0\leq M\leq\mathbb{I}}\min_{\sigma\in\mathbf{S}\left(\mathcal{H}^{\prime}\right)}\text{tr}\left[M(\phi-\Lambda(\sigma))\right]
=max0M𝕀(maxϕ𝐏()tr[Mϕ]maxψ𝐏()tr[MΛ(ψ)])\displaystyle=\max_{0\leq M\leq\mathbb{I}}\left(\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\text{tr}\left[M\phi\right]-\max_{\psi\in\mathbf{P}\left(\mathcal{H}^{\prime}\right)}\text{tr}\left[M\Lambda(\psi)\right]\right)
=(R.H.S.)\displaystyle=(R.H.S.) (9)

Note that the minimax theorem, used in the second equation, is applicable since f(σ,M):=tr[M(ϕΛ(σ))]f(\sigma,M):=\text{tr}\left[M(\phi-\Lambda(\sigma))\right] is affine with respect to each variable and the domain of MM and σ\sigma are compact and convex. The last equality holds since the maximum is achieved if rankM=1\text{rank}M=1. ∎

As a special case of Lemma 3, we obtain the following lemma about the relationship of the accuracy of approximating pure states by a finite number of pure states and that by their ensembles:

Corollary 1.

Let {ϕx𝐏()}xX\{\phi_{x}\in\mathbf{P}\left(\mathcal{H}\right)\}_{x\in X} be a finite set of pure states. Then, it holds that

maxϕ𝐏()minpϕxXp(x)ϕxtr=maxϕ𝐏()minxXϕϕxtr2,\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\min_{p}\left\|\phi-\sum_{x\in X}p(x)\phi_{x}\right\|_{\text{tr}}=\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\min_{x\in X}\left\|\phi-\phi_{x}\right\|_{\text{tr}}^{2}, (10)

where the first minimum is taken over probability distribution pp over XX.

Proof.

Suppose Λ\Lambda in Lemma 3 is a classical-quantum channel such that Λ(σ)=xXx|σ|xϕx\Lambda(\sigma)=\sum_{x\in X}\langle{x}|\sigma|{x}\rangle\phi_{x}, which corresponds to a particular decoder for a classical encoding that outputs pure state ϕx𝐏()\phi_{x}\in\mathbf{P}\left(\mathcal{H}\right) when label xx is inputted. Then, L.H.S. of Eq. (3) is equal to that of Eq. (10) by interchanging x|σ|x\langle{x}|\sigma|{x}\rangle and p(x)p(x). and R.H.S. of Eq. (3) is equal to

1minϕ𝐏()maxpF(xXp(x)ϕx,ϕ)\displaystyle 1-\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{p}F\left(\sum_{x\in X}p(x)\phi_{x},\phi\right)
=1minϕ𝐏()maxpxXp(x)F(ϕx,ϕ)\displaystyle=1-\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{p}\sum_{x\in X}p(x)F\left(\phi_{x},\phi\right)
=1minϕ𝐏()maxxXF(ϕx,ϕ),\displaystyle=1-\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{x\in X}F\left(\phi_{x},\phi\right), (11)

which is equal to R.H.S. of Eq. (10) since the equality of the second inequlity in Eq. (1) holds when ρ\rho and σ\sigma are pure. ∎

This corollary implies that an internal ϵ\sqrt{\epsilon}-covering is sufficient to approximate arbitrary pure state within accuracy ϵ\epsilon by using its probabilistic mixture. This can be intuitively understood by the curvature of the sphere as illustrated in Fig. 2. Indeed, Corollary 1 (for =2\mathcal{H}=\mathbb{C}^{2}) and the Bloch representation imply that for any compact and convex set KK whose extreme points ext(K)\text{ext}\left(K\right) reside on sphere SS with radius 12\frac{1}{2}, δ=ϵ\delta=\sqrt{\epsilon} holds, where ϵ\epsilon and δ\delta are the distance between KK and the farthest point on SS from KK and that between ext(K)\text{ext}\left(K\right) and the farthest point on SS from ext(K)\text{ext}\left(K\right), respectively. This can also be derived from elementary geometric observations.

Refer to caption
Figure 2: For any pure qubit state ϕ\phi, we can find probability mixture ρ^\hat{\rho} of six pure states, which are the eigenstates of the Pauli operators and represented by the extreme points of the octahedron on the Bloch sphere, such that ϕρ^trϵ=123(31)\left\|\phi-\hat{\rho}\right\|_{\text{tr}}\leq\epsilon=\frac{1}{2\sqrt{3}}\left(\sqrt{3}-1\right). If we represent the Bloch sphere by a sphere with radius 12\frac{1}{2}, ϵ\epsilon is the longest Euclidean distance between a point on the Bloch sphere and the octahedron since the trace distance between two quantum states is equal to the Euclidean distance between the corresponding points in the Bloch ball. On the other hand, the longest trace distance δ\delta between a pure state and the six pure states, which is equivalent to the longest Euclidean distance between a point on the Bloch sphere and the extreme points of the octahedron, satisfies δ=ϵ\delta=\sqrt{\epsilon}.

By using Lemma 3 and Corollary 1, we can derive following asymptotically tight bounds on the minimum bit length required for probabilistic encodings:

Theorem 2.

For any ϵ(0,1]\epsilon\in\left(0,1\right] and an integer d2d\geq 2, the minimum size of label set XX used over all probabilistic encodings of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) with accuracy ϵ\epsilon is bounded by

l(d,ϵ)log2dlog2|X|l(d,ϵ)+log2(5dlnd),l(d,\epsilon)-\log_{2}d\leq\log_{2}|X|\leq l(d,\epsilon)+\log_{2}(5d\ln d), (12)

where l(d,ϵ):=(d1)log2(1ϵ)l(d,\epsilon):=\left(d-1\right)\log_{2}\left(\frac{1}{\epsilon}\right).

Proof.

When {ϕx𝐏(d)}xX\{\phi_{x}\in\mathbf{P}\left(\mathbb{C}^{d}\right)\}_{x\in X} is an internal ϵ\sqrt{\epsilon}-covering, Corollary 1 implies {ϕx}xX\{\phi_{x}\}_{x\in X} satisfies Eq. (2). Thus, there exists a probabilistic encoding of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) with accuracy ϵ\epsilon and label set XX, whose size is upper bounded by using Lemma 1 with setting r=5r=5.

Next, we show the lower bound. Let {ρx𝐒(d)}xX\{\rho_{x}\in\mathbf{S}\left(\mathbb{C}^{d}\right)\}_{x\in X} satisfy Eq. (2). By using Lemma 3 and setting Λ\Lambda a classical-quantum channel such that Λ(σ)=xXx|σ|xρx\Lambda(\sigma)=\sum_{x\in X}\langle{x}|\sigma|{x}\rangle\rho_{x} as in the proof of Corollary 1, we obtain

1ϵ\displaystyle 1-\epsilon <\displaystyle< minϕ𝐏()maxpxXp(x)F(ρx,ϕ)\displaystyle\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{p}\sum_{x\in X}p(x)F\left(\rho_{x},\phi\right) (13)
=\displaystyle= minϕ𝐏()maxxXF(ρx,ϕ).\displaystyle\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{x\in X}F\left(\rho_{x},\phi\right).

By letting ρx=i=1dp(i|x)ϕi|x\rho_{x}=\sum_{i=1}^{d}p(i|x)\phi_{i|x}, we obtain that for any ϕ𝐏(d)\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right), there exists ii and xx such that

1ϵ\displaystyle 1-\epsilon <\displaystyle< F(ρx,ϕ)=j=1dp(j|x)F(ϕj|x,ϕ)\displaystyle F\left(\rho_{x},\phi\right)=\sum_{j=1}^{d}p(j|x)F\left(\phi_{j|x},\phi\right) (14)
\displaystyle\leq F(ϕi|x,ϕ)=1ϕϕi|xtr2.\displaystyle F\left(\phi_{i|x},\phi\right)=1-\left\|\phi-\phi_{i|x}\right\|_{\text{tr}}^{2}.

Thus, {ϕi|x}i,x\{\phi_{i|x}\}_{i,x} is an internal ϵ\sqrt{\epsilon}-covering of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right). Hence, the lower bound can be obtained by applying Lemma 1 as |X|d(1ϵ)2(d1)|X|d\geq\left(\frac{1}{\sqrt{\epsilon}}\right)^{2(d-1)}. ∎

Using Theorem 2 and n=log2|X|n=\lceil\log_{2}|X|\rceil, we obtain the asymptotic bit rate per dimension limdnd=log2(1ϵ)\lim_{d\rightarrow\infty}\frac{n}{d}=\log_{2}\left(\frac{1}{\epsilon}\right) of the minimum probabilistic encoding for fixed ϵ(0,1]\epsilon\in\left(0,1\right], and one per accuracy limϵ0nlog2ϵ=d1\lim_{\epsilon\rightarrow 0}\frac{n}{-\log_{2}\epsilon}=d-1 of the minimum probabilistic encoding for fixed d2d\geq 2, which are exactly half of those of the minimum deterministic encoding.

IV Probabilistic circuit synthesis

In the context of circuit synthesis using a finite universal gate set, it has recently been found that the length of the gate sequence or the number of TT gates can be reduced by using ensembles of gate sequences [10, 11]. This is based on the so-called mixing lemma, which shows that if finite set {Υx}x\{\Upsilon_{x}\}_{x} of unitary transformations approximates arbitrary unitary transformations within sufficient high accuracy, it is possible to increase the accuracy by using particular ensembles xp(x)Υx\sum_{x}p(x)\Upsilon_{x}.

Based on Corollary 1, where we derive the accuracy achieved by the optimal ensembles of pure states in approximating arbitrary pure states, we can derive bounds on the accuracy achieved by the optimal ensembles of unitary transformations in approximating arbitrary unitary transformations in the following theorem. Note that similarly to [10, 11], we measure the accuracy of the approximation by using the diamond norm (sometimes called the completely bounded trace norm) of hermitian preserving linear mappings, defined as

12Ξ:=maxΦ𝐏(11)Ξid1(Φ)tr,\frac{1}{2}\left\|\Xi\right\|_{\diamond}:=\max_{\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1}^{\prime}\right)}\left\|\Xi\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi)\right\|_{\text{tr}}, (15)

where Ξ:𝐋(1)𝐋(2)\Xi:\mathbf{L}\left(\mathcal{H}_{1}\right)\rightarrow\mathbf{L}\left(\mathcal{H}_{2}\right), idid_{\mathcal{H}} is the identity mapping on 𝐋()\mathbf{L}\left(\mathcal{H}\right) and dim1=dim1\dim\mathcal{H}_{1}^{\prime}=\dim\mathcal{H}_{1}.

Theorem 3.

For an integer d2d\geq 2 specified below, let {Υx}xX\{\Upsilon_{x}\}_{x\in X} be a finite set of unitary transformations on 𝐋(d)\mathbf{L}\left(\mathbb{C}^{d}\right). Then, it holds that

2dαmaxΥminp12ΥxXp(x)Υxα,\displaystyle\frac{2}{d}\alpha\leq\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}\leq\alpha, (16)
α=maxΥminxX(12ΥΥx)2,\displaystyle\alpha=\max_{\Upsilon}\min_{x\in X}\left(\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\right)^{2}, (17)

where the first minimum is taken over probability distribution pp over XX. Note that if d=2d=2, the equalities hold.

This theorem resembles Corollary 1 (especially when d=2d=2), which can be regarded as a consequence of the similarity between pure states and unitary transformations via the Choi-Jamiołkowski representation. Although a proof uses the minimax theorem as in the proof of Corollary 1, it requires additional work to some extent. We give a complete proof in Appendix D with the sharp lower bound.

This theorem implies that quantum circuit synthesis using probabilistically generated circuits formed from finite universal gate set 𝒞={g1,g2,}\mathcal{C}=\{g_{1},g_{2},\cdots\} can reduce the circuit size. To see this, first, let {Υx(n)}x\{\Upsilon_{x}^{(n)}\}_{x} be the set of unitary transformations representing the unitary circuit realized by gate sequences gi1ging_{i_{1}}\cdots g_{i_{n}} of length at most nn. Next, let n(ϵ)n(\epsilon) be the smallest length of gate sequences to approximate arbitrary unitary transformations within accuracy ϵ\epsilon, i.e., n(ϵ):=min{n:maxΥminx12ΥΥx(n)<ϵ}n(\epsilon):=\min\{n\in\mathbb{N}:\max_{\Upsilon}\min_{x}\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}^{(n)}\right\|_{\diamond}<\epsilon\}. Theorem 3 implies that by using the probabilistic implementation, we can implement arbitrary unitary transformation within accuracy ϵ\epsilon only with an n(ϵ)n(\sqrt{\epsilon})-size circuit.

The accuracy in approximating unitary transformations is often measured by using the operator norm X:=maxϕ𝐏()X|ϕ2\left\|X\right\|_{\infty}:=\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\left\|X|{\phi}\rangle\right\|_{2}. The celebrated Solovay-Kitaev theorem shows that for any finite universal gate set 𝒞\mathcal{C}, set {Ux(n)}x\{U_{x}^{(n)}\}_{x} of unitary circuits realized by gate sequences of length n=O(logc(1δ))n=O\left(\log^{c}\left(\frac{1}{\delta}\right)\right) (c1c\geq 1) is sufficient for approximating arbitrary unitary operators, i.e., maxUminxUUx(n)<δ\max_{U}\min_{x}\left\|U-U_{x}^{(n)}\right\|_{\infty}<\delta, where we denote a unitary circuit and the unitary operator representing the circuit by the same symbol Ux(n)U_{x}^{(n)}. On the other hand, a relationship between the operator norm and the diamond norm shown in Appendix E implies that ΥΥx<δ4δ2\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}<\delta\sqrt{4-\delta^{2}} if UUx<δ\left\|U-U_{x}\right\|_{\infty}<\delta, where Υ(ρ)=UρU\Upsilon(\rho)=U\rho U^{\dagger} and Υx(ρ)=UxρUx\Upsilon_{x}(\rho)=U_{x}\rho U_{x}^{\dagger}. Combining with Theorem 3, we can verify that arbitrary unitary transformations can be approximated by ensembles of {Υx}x\{\Upsilon_{x}\}_{x} such that

maxΥminp12Υxp(x)Υx<δ2(1(δ2)2)\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x}p(x)\Upsilon_{x}\right\|_{\diamond}<\delta^{2}\left(1-\left(\frac{\delta}{2}\right)^{2}\right) (18)

if maxUminxUUx<δ\max_{U}\min_{x}\left\|U-U_{x}\right\|_{\infty}<\delta. This bound is tighter than the previous bound obtained in [10, Theorem 1], maxΥminp12Υxp(x)Υx<5δ2\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x}p(x)\Upsilon_{x}\right\|_{\diamond}<5\delta^{2}. Moreover, our bound holds if δ2\delta\leq\sqrt{2} while the previous bound was shown to hold for δ<0.01\delta<0.01. In addition to the improved estimation, our lower bound reveals the limitation of the probabilistic implemetation.

As suggested by the Solovay-Kitaev theorem, if we assume n(ϵ)alogc(1ϵ)n(\epsilon)\sim a\log^{c}\left(\frac{1}{\epsilon}\right) in the high-accuracy regime (ϵ1\epsilon\ll 1), the probabilistic implementation reduces the length of gate sequences by about 1(12)c50%1-\left(\frac{1}{2}\right)^{c}\geq 50\% since c1c\geq 1 from a volume consideration. Even in the low-accuracy regime, our bound guarantees the reduction. For example, we show how the gate length can be reduced by using the probabilistic implementation to synthesize single-qubit unitary transformations with gate set {S,H,T}\{S,H,T\} in Appendix F.

V Conclusion

In this paper, we have considered the minimum probabilistic encoding so as to approximately reconstruct an arbitrary pure state ϕ𝐏(d)\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right) from an nn-bit string within accuracy ϵ\epsilon with respect to the trace distance. We then demonstrated that it cannot be realized by simply storing an element of the minimum ϵ\epsilon-covering. More precisely, we proved that the bit rate required for probabilistic encodings is exactly half of that of the minimum length of bits necessary to store elements of an ϵ\epsilon-covering of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) in asymptotic limit ϵ0\epsilon\rightarrow 0 or in limit dd\rightarrow\infty when ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right]. In limit dd\rightarrow\infty when ϵ(12,1]\epsilon\in\left(\frac{1}{2},1\right], the same result holds if we consider only internal ϵ\epsilon-coverings; however, in general, whether the same result holds or not is an open problem. Several numerical calculations suggest a positive answer.

Moreover, we show that similarly to the state encoding, for any finite set {Υx}x\{\Upsilon_{x}\}_{x} of unitary transformations, we can at least quadratically increase the accuracy of approximating arbitrary unitary transformations by using ensembles of {Υx}x\{\Upsilon_{x}\}_{x}. Particularly, we obtain bounds on the accuracy when one uses the optimal ensembles to reveal the possibility and the limitation of the probabilistic implementation of unitary transformations.

Our result could provide a new quantitative guiding principle to explore further capabilities and limitations of manipulating a quantum system as well as the foundations of quantum theory, including the following two related topics:

  1. 1.

    The results demonstrate an information-theoretical separation of the memory size to store a pure quantum state between strong simulations and weak ones, which are two types of classical simulation of a quantum computer [14, 15, 16] (the former approximately computes the probability distribution over the outcomes, whereas the latter only approximately samples the outcomes.)

  2. 2.

    The complex projective space representation of pure states can be regarded as a classical encoding of pure states in deterministic classical states describing operators in 𝐏()\mathbf{P}\left(\mathcal{H}\right). The fact that any distinct pure states are encoded indistinguishable classical states inclines us to think that the indistinguishability of non-orthogonal pure states results from our limited ability to measure them. To interpret the indistinguishability as an intrinsic feature of pure states, classical encodings of pure states in probabilistic classical states have been constructed in ψ\psi-epistemic models [17, 18], in which the encodings use indistinguishable and probabilistic classical states to encode some distinct pure states. Our results show that indistinguishable classical states encoding distinct elements in an ϵ\epsilon-covering are not only helpful for such an interpretation but also necessary for the minimum probabilistic encoding.

Acknowledgements.
We thank Yuki Takeuchi, Yasunari Suzuki, Yasuhiro Takahashi, and Adel Sohbi for helpful discussions. This work was partially supported by JST Moonshot R&D MILLENNIA Program (Grant No.JPMJMS2061). SA was partially supported by JST, PRESTO Grant No.JPMJPR2111 and JPMXS0120319794. GK was supported in part by the Grant-in-Aid for Scientific Research (C) No.20K03779, (C) No.21K03388, and (S) No.18H05237 of JSPS, CREST (Japan Science and Technology Agency) Grant No.JPMJCR1671. ST was partially supported by the Grant-in-Aid for Transformative Research Areas No.JP20H05966 of JSPS.

References

Appendix A Volume of ϵ\epsilon-ball in 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right)

To construct an ϵ\epsilon-covering, we first derive the volume of ϵ\epsilon-ball Bϵ(ϕ):={ψ𝐏(d):ψϕtr<ϵ}B_{\epsilon}\left(\phi\right):=\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):\left\|\psi-\phi\right\|_{\text{tr}}<\epsilon\right\} in 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) as follows:

d,ϵ(0,1],ϕ𝐏(d),μ(Bϵ(ϕ))=ϵ2(d1),\forall d\in\mathbb{N},\forall\epsilon\in(0,1],\forall\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right),\mu(B_{\epsilon}\left(\phi\right))=\epsilon^{2(d-1)}, (19)

where μ\mu is the unitarily invariant probability measure on the Borel sets of 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right).

When d=1d=1, Eq. (19) holds. By assuming d2d\geq 2, we proceed as follows:

μ(Bϵ(ϕ))\displaystyle\mu(B_{\epsilon}\left(\phi\right))
=μ({ψ𝐏(d):|00|ψtr<ϵ})\displaystyle=\mu\left(\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):\left\||{0}\rangle\langle{0}|-\psi\right\|_{\text{tr}}<\epsilon\right\}\right)
=μ({ψ𝐏(d):|0|ψ|2>1ϵ2})\displaystyle=\mu\left(\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):|\langle{0}|{\psi}\rangle|^{2}>1-\epsilon^{2}\right\}\right)
=ξ({x2d:x2=1x12+x22>1ϵ2}),\displaystyle=\xi\left(\left\{\vec{x}\in\mathbb{R}^{2d}:\left\|\vec{x}\right\|_{2}=1\wedge x_{1}^{2}+x_{2}^{2}>1-\epsilon^{2}\right\}\right), (20)

where the first equality uses fixed pure state |0|{0}\rangle and the unitary invariance of μ\mu and the trace distance, the second equality uses Eq. (1), and the third equality uses the relationship between μ\mu and the uniform spherical probability measure ξ\xi. Using a spherical coordinate system, we can proceed as follows:

μ(Bϵ(ϕ))\displaystyle\mu(B_{\epsilon}\left(\phi\right)) =\displaystyle= V(ϵ)V(1),\displaystyle\frac{V(\epsilon)}{V(1)}, (21)
whereV(ϵ)\displaystyle\text{where}\ V(\epsilon) :=\displaystyle:= Dϵsin2d2θsin2d3ϕdθdϕ\displaystyle\int_{D_{\epsilon}}\sin^{2d-2}\theta\sin^{2d-3}\phi d\theta d\phi (22)
=\displaystyle= 4D^ϵsin2d2θsin2d3ϕdθdϕ\displaystyle 4\int_{\hat{D}_{\epsilon}}\sin^{2d-2}\theta\sin^{2d-3}\phi d\theta d\phi

and the domain of the integration DϵD_{\epsilon} is given by {(θ,ϕ):θ,ϕ(0,π),sinθsinϕ<ϵ}\{(\theta,\phi):\theta,\phi\in(0,\pi),\sin\theta\sin\phi<\epsilon\}. Since the domain and that of the integrand have reflection symmetries about two lines θ=π2\theta=\frac{\pi}{2} and ϕ=π2\phi=\frac{\pi}{2}, it is sufficient to perform the integration in domain D^ϵ:={(θ,ϕ):θ,ϕ(0,π2),sinθsinϕ<ϵ}\hat{D}_{\epsilon}:=\{(\theta,\phi):\theta,\phi\in\left(0,\frac{\pi}{2}\right),\sin\theta\sin\phi<\epsilon\}. By changing the variables as (xy)=(sinθsinϕsinθ)\begin{pmatrix}x\\ y\end{pmatrix}=\begin{pmatrix}\sin\theta\sin\phi\\ \sin\theta\end{pmatrix}, we obtain

V(ϵ)\displaystyle V(\epsilon) =\displaystyle= 40ϵ𝑑xx2d3x1𝑑yy1y2y2x2\displaystyle 4\int_{0}^{\epsilon}dxx^{2d-3}\int_{x}^{1}dy\frac{y}{\sqrt{1-y^{2}}\sqrt{y^{2}-x^{2}}} (23)
=\displaystyle= 40ϵ𝑑xx2d3[arcsin1y21x2]1x\displaystyle 4\int_{0}^{\epsilon}dxx^{2d-3}\left[\arcsin\sqrt{\frac{1-y^{2}}{1-x^{2}}}\right]_{1}^{x}
=\displaystyle= πd1ϵ2(d1)\displaystyle\frac{\pi}{d-1}\epsilon^{2(d-1)}

for ϵ[0,1]\epsilon\in[0,1]. This completes the calculation.

Appendix B Upper bound for internal covering number IinI_{in}

We construct an internal ϵ\epsilon-covering (ϵ(0,1]\epsilon\in(0,1]) following the proof in [8, Corollary 5.5]. The construction is based on the fact that sufficiently many pure states randomly sampled form an ϵ\epsilon-covering. However, since the probability of a new random pure state residing in the uncovered region decreases when many random ϵ\epsilon-balls are sampled, it is better to stop sampling a pure state and change the strategy of the construction.

In the proof, we represent some parameters explicitly, which are tailored to the ϵ\epsilon-covering with respect to the trace distance. Assume d2d\geq 2 and let D=2(d1)(2)D=2(d-1)(\geq 2). Let {ϕj𝐏(d)}j=1JR\{\phi_{j}\in\mathbf{P}\left(\mathbb{C}^{d}\right)\}_{j=1}^{J_{R}} be a set of finite randomly sampled pure states with respect to product measure μJR\mu^{J_{R}}. The expected volume of the region not covered by A:=j=1JRBϵR(ϕj)A:=\cup_{j=1}^{J_{R}}B_{\epsilon_{R}}\left(\phi_{j}\right) (0<ϵR10<\epsilon_{R}\leq 1) can be calculated as follows:

𝑑μJRμ(Ac)\displaystyle\int d\mu^{J_{R}}\mu\left(A^{c}\right)
=𝑑μJR𝑑μ(ψ)j=1JRI[ψϕjtrϵR]\displaystyle=\int d\mu^{J_{R}}\int d\mu(\psi)\prod_{j=1}^{J_{R}}\text{I}\left[\left\|\psi-\phi_{j}\right\|_{\text{tr}}\geq\epsilon_{R}\right]
=𝑑μ(ψ)j=1JR𝑑μ(ϕj)I[ψϕjtrϵR]\displaystyle=\int d\mu(\psi)\prod_{j=1}^{J_{R}}\int d\mu(\phi_{j})\text{I}\left[\left\|\psi-\phi_{j}\right\|_{\text{tr}}\geq\epsilon_{R}\right]
=(1ϵRD)JRexp(JRϵRD),\displaystyle=\left(1-\epsilon_{R}^{D}\right)^{J_{R}}\leq\exp\left(-J_{R}\epsilon_{R}^{D}\right), (24)

where we use Fubini’s theorem and Eq. (19) in the second equation and the third equation, respectively. Note that I[X]{0,1}\text{I}\left[X\right]\in\{0,1\} is the indicator function, i.e., I[X]=1\text{I}\left[X\right]=1 iff XX is true.

Thus, there exists {ϕj}j=1JR\{\phi_{j}\}_{j=1}^{J_{R}} such that μ(Ac)exp(JRϵRD)\mu\left(A^{c}\right)\leq\exp\left(-J_{R}\epsilon_{R}^{D}\right). Pick {ψj}j=1JP\{\psi_{j}\}_{j=1}^{J_{P}} as much as possible such that BϵP(ψj)B_{\epsilon_{P}}\left(\psi_{j}\right) are disjoint and contained in AcA^{c}. When 0<ϵPϵR10<\epsilon_{P}\leq\epsilon_{R}\leq 1, we can verify that {ϕj}j=1JR{ψj}j=1JP\{\phi_{j}\}_{j=1}^{J_{R}}\cup\{\psi_{j}\}_{j=1}^{J_{P}} is an (ϵR+ϵP)(\epsilon_{R}+\epsilon_{P})-covering and its size J:=JR+JPJ:=J_{R}+J_{P} is upper bounded as

JJR+exp(JRϵRD)ϵPD.J\leq J_{R}+\frac{\exp\left(-J_{R}\epsilon_{R}^{D}\right)}{\epsilon_{P}^{D}}. (25)

By setting JR=DϵRDln(ϵRϵP)J_{R}=\left\lceil\frac{D}{\epsilon_{R}^{D}}\ln\left(\frac{\epsilon_{R}}{\epsilon_{P}}\right)\right\rceil, ϵP=ϵRx\epsilon_{P}=\frac{\epsilon_{R}}{x} and ϵR=x1+xϵ\epsilon_{R}=\frac{x}{1+x}\epsilon with x1x\geq 1, we obtain the following upper bound:

JDlnxϵRD+1ϵRD1ϵD{(1+1x)D(Dlnx+1)+1}=2dlndϵDα(d,x)2dlnd,J\leq\left\lceil\frac{D\ln x}{\epsilon_{R}^{D}}\right\rceil+\frac{1}{\epsilon_{R}^{D}}\leq\frac{1}{\epsilon^{D}}\left\{\left(1+\frac{1}{x}\right)^{D}(D\ln x+1)+1\right\}=\frac{2d\ln d}{\epsilon^{D}}\cdot\frac{\alpha(d,x)}{2d\ln d}, (26)

where α(d,x)=(1+1x)D(Dlnx+1)+1\alpha(d,x)=\left(1+\frac{1}{x}\right)^{D}(D\ln x+1)+1. Since limdα(d,DlnD)2dlnd=1\lim_{d\rightarrow\infty}\frac{\alpha(d,D\ln D)}{2d\ln d}=1, we obtain that for any r>2r>2 there exists d0d_{0}\in\mathbb{N} such that

dd0,ϵ(0,1],Jrd(lnd)(1ϵ)2(d1).\forall d\geq d_{0},\forall\epsilon\in(0,1],J\leq rd(\ln d)\left(\frac{1}{\epsilon}\right)^{2(d-1)}. (27)

For example, if r=5r=5, we can set d0=2d_{0}=2. This completes the proof.

Appendix C Lower bound for external covering number IexI_{ex}

We can derive a lower bound for the external covering number as a direct consequence of the following upper bounds on the volume of the maximum intersection of ϵ\epsilon-ball in 𝐒(d)\mathbf{S}\left(\mathbb{C}^{d}\right) and 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right), which will be proven in this section: For any ρ𝐒(d)\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right) and any ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right],

d1,μ(Bϵ(ρ))\displaystyle\forall d\geq 1,\mu(B_{\epsilon}\left(\rho\right)) \displaystyle\leq (2ϵ)2(d1),\displaystyle(2\epsilon)^{2(d-1)}, (28)
d4,μ(Bϵ(ρ))\displaystyle\forall d\geq 4,\mu(B_{\epsilon}\left(\rho\right)) \displaystyle\leq ϵ2(d1).\displaystyle\epsilon^{2(d-1)}. (29)

Combined with Eq. (19), Eq. (29) implies that the maximum intersection is achieved when ρ\rho is pure if two conditions d4d\geq 4 and ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right] are satisfied. These two conditions are not tight but cannot be fully relaxed since μ(Bϵ(σ))=1\mu(B_{\epsilon}\left(\sigma\right))=1 for any dd\in\mathbb{N} and ϵ>11d\epsilon>1-\frac{1}{d}, where σ\sigma is the maximally mixed state 1d𝕀\frac{1}{d}\mathbb{I}.

C.1 Proof of Eq. (28)

By defining ϕ:=argminϕ𝐏(d)ϕρtr\phi:=\arg\min_{\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)}\left\|\phi-\rho\right\|_{\text{tr}}, we obtain Bϵ(ρ)B2ϵ(ϕ)B_{\epsilon}\left(\rho\right)\subseteq B_{2\epsilon}\left(\phi\right). For ψϕtrψρtr+ϕρtr2ψρtr<2ϵ\left\|\psi-\phi\right\|_{\text{tr}}\leq\left\|\psi-\rho\right\|_{\text{tr}}+\left\|\phi-\rho\right\|_{\text{tr}}\leq 2\left\|\psi-\rho\right\|_{\text{tr}}<2\epsilon for any pure state ψBϵ(ρ)\psi\in B_{\epsilon}\left(\rho\right). This completes the proof since μ(Bϵ(ρ))μ(B2ϵ(ϕ))=(2ϵ)2(d1)\mu(B_{\epsilon}\left(\rho\right))\leq\mu(B_{2\epsilon}\left(\phi\right))=(2\epsilon)^{2(d-1)} for any ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right].

C.2 Proof of Eq. (29)

Let ρ=i=0d1pi|ii|\rho=\sum_{i=0}^{d-1}p_{i}|{i}\rangle\langle{i}|, where {|i}i\{|{i}\rangle\}_{i} is a set of eigenvectors of ρ\rho and eigenvalues are arranged in decreasing order, i.e., p0p1p_{0}\geq p_{1}\geq\cdots. Since μ(Bϵ(ρ))\mu(B_{\epsilon}\left(\rho\right)) depends not on the eigenvectors but on the eigenvalues of ρ\rho, it is sufficient to consider only diagonal ρ\rho with respect to a fixed basis. However, it is difficult to exactly calculate μ(Bϵ(ρ))\mu(B_{\epsilon}\left(\rho\right)) due to a complicated relationship between ψ\psi and the largest eigenvalue of ψρ\psi-\rho, resulting from the condition ϵ>ψρtr=λmax(ψρ)\epsilon>\left\|\psi-\rho\right\|_{\text{tr}}=\lambda_{\max}(\psi-\rho).

We derive lower bound fρ(ψ)f_{\rho}(\psi) of ψρtr\left\|\psi-\rho\right\|_{\text{tr}} and use the relationship μ(Bϵ(ρ))μ({ψ:fρ(ψ)<ϵ})\mu(B_{\epsilon}\left(\rho\right))\leq\mu\left(\{\psi:f_{\rho}(\psi)<\epsilon\}\right) to show Eq. (29), where fρf_{\rho} is a measurable function. Since simple bound fρ(ψ)=1F(ψ,ρ)f_{\rho}(\psi)=1-F\left(\psi,\rho\right) is too loose to show Eq. (29), we derive a tighter lower bound as follows: Let Π\Pi and Π\Pi^{\bot} be the Hermitian projectors on two-dimensional subspace 𝒱span({|0,|ψ})\mathcal{V}\supseteq\text{span}\left(\{|{0}\rangle,|{\psi}\rangle\}\right) and its orthogonal complement, respectively. We then obtain

ψρtr\displaystyle\left\|\psi-\rho\right\|_{\text{tr}} \displaystyle\geq Π(ψρ)Π+Π(ψρ)Πtr\displaystyle\left\|\Pi(\psi-\rho)\Pi+\Pi_{\bot}(\psi-\rho)\Pi_{\bot}\right\|_{\text{tr}} (30)
=\displaystyle= ψΠρΠtr+ΠρΠtr,\displaystyle\left\|\psi-\Pi\rho\Pi\right\|_{\text{tr}}+\left\|\Pi_{\bot}\rho\Pi_{\bot}\right\|_{\text{tr}},

where we use the monotonicity of the trace distance under a CPTP mapping in the first inequality. Define fρ(ψ)f_{\rho}(\psi) as the value in Eq. (30), which can be explicitly written as

fρ(ψ)=12(1+p0q)24(p0q)|0|ψ|2+12(1p0q),f_{\rho}(\psi)=\frac{1}{2}\sqrt{(1+p_{0}-q)^{2}-4(p_{0}-q)|\langle{0}|{\psi}\rangle|^{2}}+\frac{1}{2}(1-p_{0}-q), (31)

where q=0|ρ|0q=\langle{0_{\bot}}|\rho|{0_{\bot}}\rangle and {|0,|0}\{|{0}\rangle,|{0_{\bot}}\rangle\} is an orthonormal basis of 𝒱\mathcal{V}. The explicit formula implies fρf_{\rho} is uniquely defined (although neither 𝒱\mathcal{V} nor qq is uniquely defined if ψ=|00|\psi=|{0}\rangle\langle{0}|) and continuous, thus measurable. Since μ(Bϵ(ρ))=0\mu(B_{\epsilon}\left(\rho\right))=0, satisfying Eq. (29), if p01ϵp_{0}\leq 1-\epsilon, we consider the case p0>1ϵp_{0}>1-\epsilon. By further assuming ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right], we obtain

fρ(ψ)<ϵ|0|ψ|2>(ϵ+p0)(1qϵ)p0q,f_{\rho}(\psi)<\epsilon\Leftrightarrow|\langle{0}|{\psi}\rangle|^{2}>\frac{(\epsilon+p_{0})(1-q-\epsilon)}{p_{0}-q}, (32)

where the this condition is not trivial, i.e., (ϵ+p0)(1qϵ)p0q(0,1)\frac{(\epsilon+p_{0})(1-q-\epsilon)}{p_{0}-q}\in(0,1). Assuming d4d\geq 4, we calculate an upper bound on μ(Bϵ(ρ))\mu(B_{\epsilon}\left(\rho\right)) as follows: By defining 𝐔()\mathbf{U}\left(\mathcal{H}\right) as the set of unitary operators on \mathcal{H} and C:𝐔(d1)𝐔(d)C:\mathbf{U}\left(\mathbb{C}^{d-1}\right)\rightarrow\mathbf{U}\left(\mathbb{C}^{d}\right) as C(U):=|00|UC(U):=|{0}\rangle\langle{0}|\oplus U, we can show that for any unitary operator U𝐔(d1)U\in\mathbf{U}\left(\mathbb{C}^{d-1}\right),

μ({ψ𝐏(d):fρ(ψ)<ϵ})\displaystyle\mu\left(\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):f_{\rho}(\psi)<\epsilon\}\right)
=μ({ψ:fρ(C(U)ψC(U))<ϵ})\displaystyle=\mu\left(\{\psi:f_{\rho}(C(U)\psi C(U)^{\dagger})<\epsilon\}\right)
=𝐏(d)𝑑μ(ψ)I[|0|ψ|2>(ϵ+p0)(1qUϵ)p0qU]\displaystyle=\int_{\mathbf{P}\left(\mathbb{C}^{d}\right)}d\mu(\psi)\text{I}\left[|\langle{0}|{\psi}\rangle|^{2}>\frac{(\epsilon+p_{0})(1-q_{U}-\epsilon)}{p_{0}-q_{U}}\right] (33)

where we use the unitarily invariance of μ\mu in the first equality, qU=0|C(U)ρC(U)|0q_{U}=\langle{0_{\bot}}|C(U)^{\dagger}\rho C(U)|{0_{\bot}}\rangle, and I[X]{0,1}\text{I}\left[X\right]\in\{0,1\} is the indicator function, i.e., I[X]=1\text{I}\left[X\right]=1 iff XX is true. By integrating Eq. (33) with respect to the Haar measure on 𝐔(d1)\mathbf{U}\left(\mathbb{C}^{d-1}\right) and using Fubini’s theorem, we obtain

μ({ψ𝐏(d):fρ(ψ)<ϵ})=𝐏(d)𝑑μ(ψ)𝐏(d1)𝑑μ(ϕ)I[|0|ψ|2>(ϵ+p0)(1F(ρ,ϕ)ϵ)p0F(ρ,ϕ)],\mu\left(\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):f_{\rho}(\psi)<\epsilon\}\right)=\int_{\mathbf{P}\left(\mathbb{C}^{d}\right)}d\mu(\psi)\int_{\mathbf{P}\left(\mathbb{C}^{d-1}\right)}d\mu(\phi)\text{I}\left[|\langle{0}|{\psi}\rangle|^{2}>\frac{(\epsilon+p_{0})(1-F\left(\rho,\phi\right)-\epsilon)}{p_{0}-F\left(\rho,\phi\right)}\right], (34)

where ϕ𝐏(d1)\phi\in\mathbf{P}\left(\mathbb{C}^{d-1}\right) is identified with a pure state on 𝐏(d)\mathbf{P}\left(\mathbb{C}^{d}\right) acting on subspace span({|1,,|d1})\text{span}\left(\{|{1}\rangle,\cdots,|{d-1}\rangle\}\right). Using Fubini’s theorem again and Eq. (19), we can proceed with the calculation:

=𝐏(d1)𝑑μ(ϕ)δ(F(ρ,ϕ))2(d1)\displaystyle=\int_{\mathbf{P}\left(\mathbb{C}^{d-1}\right)}d\mu(\phi)\delta(F\left(\rho,\phi\right))^{2(d-1)}
=𝐏(d1)𝑑μ(ϕ)δ(i=1d1pi|i|ϕ|2)2(d1)\displaystyle=\int_{\mathbf{P}\left(\mathbb{C}^{d-1}\right)}d\mu(\phi)\delta\left(\sum_{i=1}^{d-1}p_{i}|\langle{i}|{\phi}\rangle|^{2}\right)^{2(d-1)}
𝐏(d1)𝑑μ(ϕ)δ((1p0)|1|ϕ|2)2(d1)\displaystyle\leq\int_{\mathbf{P}\left(\mathbb{C}^{d-1}\right)}d\mu(\phi)\delta\left((1-p_{0})|\langle{1}|{\phi}\rangle|^{2}\right)^{2(d-1)}
=(d2)01(1x)d3δ((1p0)x)2(d1)dx=:gd,ϵ(p0),\displaystyle=(d-2)\int_{0}^{1}(1-x)^{d-3}\delta((1-p_{0})x)^{2(d-1)}dx=:g_{d,\epsilon}(p_{0}), (35)

where δ(q)=(ϵ+q)(p0+ϵ1)p0q\delta(q)=\sqrt{\frac{(\epsilon+q)(p_{0}+\epsilon-1)}{p_{0}-q}}, we use the convexity of δ2(d1)\delta^{2(d-1)} and the unitary invariance of μ\mu in the last inequality, and we use the probability density of x=|1|ϕ|2x=|\langle{1}|{\phi}\rangle|^{2} derived by Eq. (19) in the last equality. To confirm the calculation, we plot a comparison between μ(Bϵ(ρ))\mu(B_{\epsilon}\left(\rho\right)) and its upper bound gd,ϵ(p0)g_{d,\epsilon}(p_{0}) for a particular ρ\rho as shown in Fig. 3, where we use the following explicit expression of g4,ϵ(p0)g_{4,\epsilon}(p_{0}):

g4,ϵ(p0)\displaystyle g_{4,\epsilon}(p_{0}) =\displaystyle= 2(p0+ϵ1)3{16bab22ab2+3(a+1)a(ba)(1abalogba)},\displaystyle 2(p_{0}+\epsilon-1)^{3}\bigg{\{}\frac{1-6b-ab^{2}}{2ab^{2}}+\frac{3(a+1)}{a(b-a)}\left(1-\frac{a}{b-a}\log\frac{b}{a}\right)\bigg{\}}, (36)

where a=2p01ϵ+p0a=\frac{2p_{0}-1}{\epsilon+p_{0}} and b=p0ϵ+p0b=\frac{p_{0}}{\epsilon+p_{0}} and p0(1ϵ,1)p_{0}\in(1-\epsilon,1). Note that g4,ϵ(1)=ϵ6=limp01g4,ϵ(p0)g_{4,\epsilon}(1)=\epsilon^{6}=\lim_{p_{0}\rightarrow 1}g_{4,\epsilon}(p_{0}).

Refer to caption
Figure 3: Plots of estimated values of μ(B12(ρ))\mu(B_{\frac{1}{2}}\left(\rho\right)) (dots) and g4,12(p0)g_{4,\frac{1}{2}}(p_{0}) (curve) for ρ=p0|00|+(1p0)|11|𝐒(4)\rho=p_{0}|{0}\rangle\langle{0}|+(1-p_{0})|{1}\rangle\langle{1}|\in\mathbf{S}\left(\mathbb{C}^{4}\right). μ(B12(ρ))\mu(B_{\frac{1}{2}}\left(\rho\right)) is estimated by uniformly sampling 10710^{7} pure states. The plots indicate μ(B12(ρ))\mu(B_{\frac{1}{2}}\left(\rho\right)) is well upper bounded by g4,12(p0)g_{4,\frac{1}{2}}(p_{0}).

It is sufficient to show that under the two conditions ϵ(0,12]\epsilon\in\left(0,\frac{1}{2}\right] and d4d\geq 4,

p0(1ϵ,1),dgd,ϵdp00\forall p_{0}\in(1-\epsilon,1),\frac{dg_{d,\epsilon}}{dp_{0}}\geq 0 (37)

since gd,ϵ(1)=ϵ2(d1)g_{d,\epsilon}(1)=\epsilon^{2(d-1)}. Since the integrand of gd,ϵg_{d,\epsilon} and its partial derivative with respect to p0p_{0} are continuous, we can interchange the partial differential and integral operators:

dgd,ϵdp0\displaystyle\frac{dg_{d,\epsilon}}{dp_{0}} =\displaystyle= (d2)01(1x)d3p0δ((1p0)x)2(d1)𝑑x\displaystyle(d-2)\int_{0}^{1}(1-x)^{d-3}\frac{\partial}{\partial p_{0}}\delta((1-p_{0})x)^{2(d-1)}dx (38)
=\displaystyle= αd,ϵ(p0)01βϵ(p0,x)γd,ϵ(p0,x)𝑑x,\displaystyle\alpha_{d,\epsilon}(p_{0})\int_{0}^{1}\beta_{\epsilon}(p_{0},x)\gamma_{d,\epsilon}(p_{0},x)dx,

where αd,ϵ(p0)=(d2)(d1)(p0+ϵ1)d2\alpha_{d,\epsilon}(p_{0})=(d-2)(d-1)(p_{0}+\epsilon-1)^{d-2}, βϵ(p0,x)=(1p0)2x2+(1ϵϵ2p02)x+(1ϵ)ϵ\beta_{\epsilon}(p_{0},x)=-(1-p_{0})^{2}x^{2}+(1-\epsilon-\epsilon^{2}-p_{0}^{2})x+(1-\epsilon)\epsilon and γd,ϵ(p0,x)=(1x)d3(ϵ+(1p0)x)d2(p0(1p0)x)d\gamma_{d,\epsilon}(p_{0},x)=\frac{(1-x)^{d-3}(\epsilon+(1-p_{0})x)^{d-2}}{(p_{0}-(1-p_{0})x)^{d}}. Since αd,ϵ\alpha_{d,\epsilon} and γd,ϵ\gamma_{d,\epsilon} are non-negative in the entire considered region R:={(p0,x):p0(1ϵ,1)x[0,1]}R:=\{(p_{0},x):p_{0}\in(1-\epsilon,1)\wedge x\in[0,1]\}, dgd,ϵdp00\frac{dg_{d,\epsilon}}{dp_{0}}\geq 0 if βϵ\beta_{\epsilon} is non-negative for all x[0,1]x\in[0,1]. However, βϵ\beta_{\epsilon} can be negative for some x[0,1]x\in[0,1] if and only if βϵ(p0,1)<0\beta_{\epsilon}(p_{0},1)<0. Taking account of considered region RR, it is sufficient to show dgd,ϵdp00\frac{dg_{d,\epsilon}}{dp_{0}}\geq 0 for all p0(1+14ϵ22,1)((1ϵ,1))p_{0}\in\left(\frac{1+\sqrt{1-4\epsilon^{2}}}{2},1\right)(\subseteq(1-\epsilon,1)), where βϵ\beta_{\epsilon} can be negative.

For fixed p(1+14ϵ22,1)p^{*}\in\left(\frac{1+\sqrt{1-4\epsilon^{2}}}{2},1\right), let x(0,1)x^{*}\in(0,1) satisfy βϵ(p,x)=0\beta_{\epsilon}(p^{*},x^{*})=0. Since βϵ(p,x)\beta_{\epsilon}(p^{*},x) is monotonically decreasing in x0x\geq 0, xx^{*} is uniquely defined, βϵ(p,x)>0\beta_{\epsilon}(p^{*},x)>0 if x[0,x)x\in[0,x^{*}) and βϵ(p,x)<0\beta_{\epsilon}(p^{*},x)<0 if x(x,1]x\in(x^{*},1]. Thus, showing

d4,c>0,{γd+1,ϵ(p,x)cγd,ϵ(p,x)forx[0,x)γd+1,ϵ(p,x)cγd,ϵ(p,x)forx(x,1]\forall d\geq 4,\exists c>0,\left\{\begin{array}[]{ll}\gamma_{d+1,\epsilon}(p^{*},x)\geq c\gamma_{d,\epsilon}(p^{*},x)&\text{for}\ x\in[0,x^{*})\\ \gamma_{d+1,\epsilon}(p^{*},x)\leq c\gamma_{d,\epsilon}(p^{*},x)&\text{for}\ x\in(x^{*},1]\end{array}\right. (39)

and

dg4,ϵdp0|p0=p0,\frac{dg_{4,\epsilon}}{dp_{0}}\bigg{|}_{p_{0}=p^{*}}\geq 0, (40)

is sufficient for d4,dgd,ϵdp0|p0=p0\forall d\geq 4,\frac{dg_{d,\epsilon}}{dp_{0}}\Big{|}_{p_{0}=p^{*}}\geq 0. For

αd+1,ϵ(p)1dgd+1,ϵdp0|p0=p\displaystyle\alpha_{d+1,\epsilon}(p^{*})^{-1}\frac{dg_{d+1,\epsilon}}{dp_{0}}\bigg{|}_{p_{0}=p^{*}} =\displaystyle= 0xβϵ(p,x)γd+1,ϵ(p,x)𝑑x+x1βϵ(p,x)γd+1,ϵ(p,x)𝑑x\displaystyle\int_{0}^{x^{*}}\beta_{\epsilon}(p^{*},x)\gamma_{d+1,\epsilon}(p^{*},x)dx+\int_{x^{*}}^{1}\beta_{\epsilon}(p^{*},x)\gamma_{d+1,\epsilon}(p^{*},x)dx (41)
\displaystyle\geq c{0xβϵ(p,x)γd,ϵ(p,x)𝑑x+x1βϵ(p,x)γd,ϵ(p,x)𝑑x}\displaystyle c\biggl{\{}\int_{0}^{x^{*}}\beta_{\epsilon}(p^{*},x)\gamma_{d,\epsilon}(p^{*},x)dx+\int_{x^{*}}^{1}\beta_{\epsilon}(p^{*},x)\gamma_{d,\epsilon}(p^{*},x)dx\biggr{\}}
=\displaystyle= cαd,ϵ(p)1dgd,ϵdp0|p0=p\displaystyle c\alpha_{d,\epsilon}(p^{*})^{-1}\frac{dg_{d,\epsilon}}{dp_{0}}\bigg{|}_{p_{0}=p^{*}}

holds for any d4.d\geq 4.

First, we show Eq. (39). By observing that for any d4d\geq 4,

γd+1,ϵ(p,x)cγd,ϵ(p,x)=γd,ϵ(p,x)((1x)(ϵ+(1p)x)p(1p)xc),\gamma_{d+1,\epsilon}(p^{*},x)-c\gamma_{d,\epsilon}(p^{*},x)=\gamma_{d,\epsilon}(p^{*},x)\left(\frac{(1-x)(\epsilon+(1-p^{*})x)}{p^{*}-(1-p^{*})x}-c\right), (42)

hϵ,p(x):=(1x)(ϵ+(1p)x)p(1p)xh_{\epsilon,p^{*}}(x):=\frac{(1-x)(\epsilon+(1-p^{*})x)}{p^{*}-(1-p^{*})x} is monotonically decreasing in x[x^,1]x\in[\hat{x},1] and hϵ,p(x)hϵ,p(x^)h_{\epsilon,p^{*}}(x)\geq h_{\epsilon,p^{*}}(\hat{x}) for x[0,x^]x\in[0,\hat{x}] with x^:=max{0,12p1p(1p)ϵ}(x)\hat{x}:=\max\left\{0,1-\frac{2p^{*}-1}{p^{*}(1-p^{*})}\epsilon\right\}(\leq x^{*}), setting c=hϵ,p(x)(>0)c=h_{\epsilon,p^{*}}(x^{*})(>0) implies Eq. (39).

Next, Eq. (40) can be verified by using the explicit expression Eq. (36).

Appendix D Proof for Thoerem 3 and its sharp lower bound

In this section, we prove Theorem 3 with a tighter lower bound as the following theorem. After the proof, we show that this lower bound is sharp by constructing set {Υx}x\{\Upsilon_{x}\}_{x} of unitary transformations such that maxΥminp12ΥxXp(x)Υx\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond} is arbitrarily close to its lower bound 4βd(1βd)=4d(11d)\frac{4\beta}{d}\left(1-\frac{\beta}{d}\right)=\frac{4}{d}\left(1-\frac{1}{d}\right) while the upper bound given in the theorem is α=1\alpha=1 (actually, each Υx\Upsilon_{x} in the set can be perfectly distinguished from the identity transformation,) where α\alpha and β\beta are defined in the following theorem.

Theorem 4.

For an integer d2d\geq 2 specified below, let {Υx}xX\{\Upsilon_{x}\}_{x\in X} be a finite set of unitary transformations on 𝐋(d)\mathbf{L}\left(\mathbb{C}^{d}\right). Then, it holds that

4βd(1βd)maxΥminp12ΥxXp(x)Υxα\displaystyle\frac{4\beta}{d}\left(1-\frac{\beta}{d}\right)\leq\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}\leq\alpha (43)
α=maxΥminxX(12ΥΥx)2,β=11α,\displaystyle\alpha=\max_{\Upsilon}\min_{x\in X}\left(\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\right)^{2},\ \ \beta=1-\sqrt{1-\alpha}, (44)

where the first minimum is taken over probability distribution pp over XX. Note that if d=2d=2, the equalities hold.

The lower bound of Theorem 3 can be derived from this theorem as follows:

4βd(1βd)4βd(1β2)=2dα.\frac{4\beta}{d}\left(1-\frac{\beta}{d}\right)\geq\frac{4\beta}{d}\left(1-\frac{\beta}{2}\right)=\frac{2}{d}\alpha. (45)

In the proof, we use the following fact about the diamond norm: For two CPTP linear mappings Γ\Gamma and Λ\Lambda from 𝐋(1)\mathbf{L}\left(\mathcal{H}_{1}\right) to 𝐋(2)\mathbf{L}\left(\mathcal{H}_{2}\right), we can verify

12ΓΛ=maxM𝐓(1:2)tr[(J(ΓΛ))M],\frac{1}{2}\left\|\Gamma-\Lambda\right\|_{\diamond}=\max_{M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})}\text{tr}\left[(J(\Gamma-\Lambda))M\right], (46)

where J(Ξ):=i,j|ij|Ξ(|ij|)𝐋(12)J(\Xi):=\sum_{i,j}|{i}\rangle\langle{j}|\otimes\Xi(|{i}\rangle\langle{j}|)\in\mathbf{L}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right) is the Choi-Jamiołkowski operator of linear mapping Ξ\Xi and 𝐓(1:2):={M𝐏𝐨𝐬(12):ρ𝐒(1),Mρ𝕀}\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2}):=\{M\in\mathbf{Pos}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right):\exists\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right),M\leq\rho\otimes\mathbb{I}\} is the set of measuring strategies [19] or that of quantum testers [20].

Proof.

Let Υ\Upsilon and Υx\Upsilon_{x} be unitary transformations from 𝐋(1)\mathbf{L}\left(\mathcal{H}_{1}\right) to 𝐋(2)\mathbf{L}\left(\mathcal{H}_{2}\right), defined as Υ(ρ)=UρU\Upsilon(\rho)=U\rho U^{\dagger} and Υx(ρ)=UxρUx\Upsilon_{x}(\rho)=U_{x}\rho U_{x}^{\dagger}, respectively.

First, we show

maxΥminp12ΥxXp(x)ΥxmaxΥminxX(12ΥΥx)2.\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}\leq\max_{\Upsilon}\min_{x\in X}\left(\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\right)^{2}. (47)

By using Eq. (46), we obtain

(L.H.S. of Ineq.(47))\displaystyle(\text{L.H.S.\ of\ Ineq.}\eqref{ineq:upperdiamond})
=maxΥminpmaxM𝐓(1:2)tr[J(ΥxXp(x)Υx)M]\displaystyle=\max_{\Upsilon}\min_{p}\max_{M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})}\text{tr}\left[J\left(\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right)M\right]
=maxΥmaxM𝐓(1:2)minptr[J(ΥxXp(x)Υx)M]\displaystyle=\max_{\Upsilon}\max_{M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})}\min_{p}\text{tr}\left[J\left(\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right)M\right]
=maxM𝐓(1:2){maxΥtr[J(Υ)M]maxxXtr[J(Υx)M]},\displaystyle=\max_{M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})}\left\{\max_{\Upsilon}\text{tr}\left[J\left(\Upsilon\right)M\right]-\max_{x\in X}\text{tr}\left[J\left(\Upsilon_{x}\right)M\right]\right\}, (48)

where we use the minimax theorem in the second equation since f(p,M):=tr[J(Υxp(x)Υx)M]f(p,M):=\text{tr}\left[J\left(\Upsilon-\sum_{x}p(x)\Upsilon_{x}\right)M\right] is affine with respect to each variable and the domain of pp and MM are compact and convex. Since it is known that the set of mappings Υtr[J(Υ)M]\Upsilon\mapsto\text{tr}\left[J\left(\Upsilon\right)M\right] associated to quantum testers MM is equivalent to that of mappings Υtr[Υid3(Φ)Π]\Upsilon\mapsto\text{tr}\left[\Upsilon\otimes id_{\mathcal{H}_{3}}(\Phi)\Pi\right] associated to pure states Φ\Phi and hermitian projectors Π\Pi [20, Theorem 10] for sufficiently large dimensional Hilbert space 3\mathcal{H}_{3} (to be self-contained, we provide a proof for the equivalence between the two mappings in Appendix G) we can proceed as follows:

=maxΦ𝐏(13)Π𝐏𝐫𝐨𝐣(23)(maxUtr[(U𝕀3)Φ(U𝕀3)Π]maxxXtr[(Ux𝕀3)Φ(Ux𝕀3)Π]),\displaystyle=\max_{\begin{subarray}{c}\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{3}\right)\\ \Pi\in\mathbf{Proj}(\mathcal{H}_{2}\otimes\mathcal{H}_{3})\end{subarray}}\Big{(}\max_{U}\text{tr}\left[(U\otimes\mathbb{I}_{\mathcal{H}_{3}})\Phi(U\otimes\mathbb{I}_{\mathcal{H}_{3}})^{\dagger}\Pi\right]-\max_{x\in X}\text{tr}\left[(U_{x}\otimes\mathbb{I}_{\mathcal{H}_{3}})\Phi(U_{x}\otimes\mathbb{I}_{\mathcal{H}_{3}})^{\dagger}\Pi\right]\Big{)}, (49)

where 𝐏𝐫𝐨𝐣()\mathbf{Proj}(\mathcal{H}) is the set of hermitian projectors on \mathcal{H}. Let Φ^\hat{\Phi}, Π^\hat{\Pi} and U^\hat{U} maximize Eq. (49). Since arbitrary unitary transformations cannot be represented by ensembles of finite set of unitary transformations, the first term cannot be 0, thus Π^U^|Φ^0\hat{\Pi}\hat{U}|{\hat{\Phi}}\rangle\neq 0. Let Ψ^\hat{\Psi} be the pure state such that |Ψ^Π^U^|Φ^|{\hat{\Psi}}\rangle\propto\hat{\Pi}\hat{U}|{\hat{\Phi}}\rangle. Then, we can verify that Eq. (49) is still maximized even if we replace Π^\hat{\Pi} by Ψ^\hat{\Psi}. Thus, Π\Pi in Eq. (49) can be restricted as a pure state, i.e., Π=Ψ𝐏(23)\Pi=\Psi\in\mathbf{P}\left(\mathcal{H}_{2}\otimes\mathcal{H}_{3}\right), and we proceed as follows:

=maxΦ𝐏(13)Ψ𝐏(23)(maxU|Ψ|U𝕀3|Φ|2maxxX|Ψ|Ux𝕀3|Φ|2).\displaystyle=\max_{\begin{subarray}{c}\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{3}\right)\\ \Psi\in\mathbf{P}\left(\mathcal{H}_{2}\otimes\mathcal{H}_{3}\right)\end{subarray}}\Big{(}\max_{U}|\langle{\Psi}|U\otimes\mathbb{I}_{\mathcal{H}_{3}}|{\Phi}\rangle|^{2}-\max_{x\in X}|\langle{\Psi}|U_{x}\otimes\mathbb{I}_{\mathcal{H}_{3}}|{\Phi}\rangle|^{2}\Big{)}. (50)

Before proceeding to the next step, we show the set of mappings fΦ,Ψ:U|Ψ|U𝕀3|Φ|f_{\Phi,\Psi}:U\mapsto|\langle{\Psi}|U\otimes\mathbb{I}_{\mathcal{H}_{3}}|{\Phi}\rangle| associated to pure states Φ\Phi and Ψ\Psi is equivalent to that of mappings gA:U|tr[AU]|g_{A}:U\mapsto\left|\text{tr}\left[AU\right]\right| associated to linear operator AA such that A11\left\|A\right\|_{1}\leq 1, where A1\left\|A\right\|_{1} is the Schatten 11-norm of AA. By using decompositions |Φ=i,jαij|i|j|{\Phi}\rangle=\sum_{i,j}\alpha_{ij}|{i}\rangle|{j}\rangle and |Ψ=i,jβij|i|j|{\Psi}\rangle=\sum_{i,j}\beta_{ij}|{i}\rangle|{j}\rangle with respect to orthonormal bases, we can verify that gAg_{A} with A=i,j,kαikβjk|ij|A=\sum_{i,j,k}\alpha_{ik}\beta^{*}_{jk}|{i}\rangle\langle{j}| equals to fΦ,Ψf_{\Phi,\Psi} and A1=maxUgA(U)=maxUfΦ,Ψ(U)1\left\|A\right\|_{1}=\max_{U}g_{A}(U)=\max_{U}f_{\Phi,\Psi}(U)\leq 1. On the other hand, By using the singular value decomposition A=ipi|xiyi|A=\sum_{i}p_{i}|{x_{i}}\rangle\langle{y_{i}}|, where A11\left\|A\right\|_{1}\leq 1 implies p+ipi=1p+\sum_{i}p_{i}=1 with some p0p\geq 0, we can verify that fΦ,Ψf_{\Phi,\Psi} with |Φ=p|0|+ipi|xi|i|{\Phi}\rangle=\sqrt{p}|{0}\rangle|{\bot}\rangle+\sum_{i}\sqrt{p_{i}}|{x_{i}}\rangle|{i}\rangle and |Ψ=p|0|+ipi|yi|i|{\Psi}\rangle=\sqrt{p}|{0}\rangle|{\bot^{\prime}}\rangle+\sum_{i}\sqrt{p_{i}}|{y_{i}}\rangle|{i}\rangle ({|i}i{|,|}\{|{i}\rangle\}_{i}\cup\{|{\bot}\rangle,|{\bot^{\prime}}\rangle\} is an orthonormal basis) equals to gAg_{A}.

By using the equivalent between two sets of mappings and A1=maxU|tr[AU]|\left\|A\right\|_{1}=\max_{U}\left|\text{tr}\left[AU\right]\right|, we proceed as follows:

=maxA:A11(A12maxxX|tr[AUx]|2)\displaystyle=\max_{A:\left\|A\right\|_{1}\leq 1}\left(\left\|A\right\|_{1}^{2}-\max_{x\in X}|\text{tr}\left[AU_{x}\right]|^{2}\right)
=maxV,ρ𝐒(1)q[0,1]q2(1maxxX|tr[ρVUx]|2)\displaystyle=\max_{\begin{subarray}{c}V,\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)\\ q\in[0,1]\end{subarray}}q^{2}\left(1-\max_{x\in X}|\text{tr}\left[\rho V^{\dagger}U_{x}\right]|^{2}\right)
=1minV,ρ𝐒(1)maxxX|tr[ρVUx]|2,\displaystyle=1-\min_{\begin{subarray}{c}V,\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)\end{subarray}}\max_{x\in X}|\text{tr}\left[\rho V^{\dagger}U_{x}\right]|^{2}, (51)

where we use the polar decomposition A=qρVA=q\rho V^{\dagger} in the second equation and V:12V:\mathcal{H}_{1}\rightarrow\mathcal{H}_{2} is a unitary operator. On the other hand, since 12ΥΥx=maxΦ𝐏(11)Υid1(Φ)Υxid1(Φ)tr=maxΦ𝐏(11)1F(Υid1(Φ),Υxid1(Φ))=1minΦ𝐏(11)|Φ|UUx𝕀1|Φ|2=1minρ𝐒(1)|tr[ρUUx]|2\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}=\max_{\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1}^{\prime}\right)}\left\|\Upsilon\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi)-\Upsilon_{x}\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi)\right\|_{\text{tr}}=\max_{\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1}^{\prime}\right)}\sqrt{1-F\left(\Upsilon\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi),\Upsilon_{x}\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi)\right)}=\sqrt{1-\min_{\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1}^{\prime}\right)}|\langle{\Phi}|U^{\dagger}U_{x}\otimes\mathbb{I}_{\mathcal{H}_{1}^{\prime}}|{\Phi}\rangle|^{2}}=\sqrt{1-\min_{\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)}|\text{tr}\left[\rho U^{\dagger}U_{x}\right]|^{2}}, it holds that

(R.H.S. of Ineq.(47))=1minVmaxxXminρ𝐒(1)|tr[ρVUx]|2.(\text{R.H.S.\ of\ Ineq.}\eqref{ineq:upperdiamond})=1-\min_{V}\max_{x\in X}\min_{\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)}|\text{tr}\left[\rho V^{\dagger}U_{x}\right]|^{2}. (52)

Since maxxminyf(x,y)minymaxxf(x,y)\max_{x}\min_{y}f(x,y)\leq\min_{y}\max_{x}f(x,y) for any ff if the maximum and the minimum exist, we obtain Ineq. (47).

Next, we show

maxΥminp12ΥxXp(x)Υx4βd(1βd),\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}\geq\frac{4\beta}{d}\left(1-\frac{\beta}{d}\right), (53)

where β=11maxΥminxX(12ΥΥx)2\beta=1-\sqrt{1-\max_{\Upsilon}\min_{x\in X}\left(\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\right)^{2}}. Due to Eq. (52), we can verify that β=1minVmaxxminρ|tr[ρVUx]|\beta=1-\min_{V}\max_{x}\min_{\rho}\left|\text{tr}\left[\rho V^{\dagger}U_{x}\right]\right|. First, observe that for any unitary operator WW on d\mathbb{C}^{d} (d2)(d\geq 2),

1d|tr[W]|=1d|i=1dλi(W)|2dminp|i=1dp(i)λi(W)|+d2d=2dminρ𝐒(d)|tr[ρW]|+d2d\displaystyle\frac{1}{d}\left|\text{tr}\left[W\right]\right|=\frac{1}{d}\left|\sum_{i=1}^{d}\lambda_{i}(W)\right|\leq\frac{2}{d}\min_{p}\left|\sum_{i=1}^{d}p(i)\lambda_{i}(W)\right|+\frac{d-2}{d}=\frac{2}{d}\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\left|\text{tr}\left[\rho W\right]\right|+\frac{d-2}{d} (54)

holds, where λi(W)\lambda_{i}(W) is the ii-th eigenvalue of WW, and in the inequality, we use the following two facts: (i) the minimization is achieved only if pp satisfies i,p(i)12\forall i,p(i)\leq\frac{1}{2} due to a geometric obseravation, and (ii) for such probability distribution p(12)p\left(\leq\frac{1}{2}\right) and complex numbers λi{z:|z|=1}\lambda_{i}\in\{z\in\mathbb{C}:|z|=1\}, |ip(i)λi||i12λi||i(12p(i))λi|12|iλi|i(12p(i))=12|iλi|d22\left|\sum_{i}p(i)\lambda_{i}\right|\geq\left|\sum_{i}\frac{1}{2}\lambda_{i}\right|-\left|\sum_{i}\left(\frac{1}{2}-p(i)\right)\lambda_{i}\right|\geq\frac{1}{2}\left|\sum_{i}\lambda_{i}\right|-\sum_{i}\left(\frac{1}{2}-p(i)\right)=\frac{1}{2}\left|\sum_{i}\lambda_{i}\right|-\frac{d-2}{2}. By using this, we obtain

minV,ρ𝐒(1)maxxX|tr[ρVUx]|minVmaxxX|tr[𝕀dVUx]|\displaystyle\min_{\begin{subarray}{c}V,\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)\end{subarray}}\max_{x\in X}|\text{tr}\left[\rho V^{\dagger}U_{x}\right]|\leq\min_{\begin{subarray}{c}V\end{subarray}}\max_{x\in X}\left|\text{tr}\left[\frac{\mathbb{I}}{d}V^{\dagger}U_{x}\right]\right|
2dminVmaxxXminρ𝐒(d)|tr[ρVUx]|+d2d=2d(1β)+d2d.\displaystyle\leq\frac{2}{d}\min_{\begin{subarray}{c}V\end{subarray}}\max_{x\in X}\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\left|\text{tr}\left[\rho V^{\dagger}U_{x}\right]\right|+\frac{d-2}{d}=\frac{2}{d}(1-\beta)+\frac{d-2}{d}. (55)

This and Eq. (51) implies

(L.H.S. of Ineq.(53))1(2d(1β)+d2d)2=(R.H.S. of Ineq.(53)).\displaystyle(\text{L.H.S.\ of\ Ineq.}\eqref{ineq:lowerdiamond})\geq 1-\left(\frac{2}{d}(1-\beta)+\frac{d-2}{d}\right)^{2}=(\text{R.H.S.\ of\ Ineq.}\eqref{ineq:lowerdiamond}). (56)

This completes the proof.

D.1 Sharpness of the lower bound

We show the equality in Ineq. (53) holds when we approximate arbitrary unitary transformations by using restricted set {Λ:Λ(ρ)=WρW,W𝐖}\{\Lambda:\Lambda(\rho)=W\rho W^{\dagger},W\in\mathbf{W}\} of unitary transformations, where

𝐖:={W:W is a unitary operator on d s.t.the convex hull of W’s eigenvalues contains 0.}.\displaystyle\mathbf{W}:=\left\{W:\begin{matrix}W\text{\ is\ a\ unitary\ operator\ on\ $\mathbb{C}^{d}$ s.t.}\\ \text{the\ convex\ hull\ of\ }W\text{'s\ eigenvalues\ contains\ }0.\end{matrix}\right\}. (57)

More precisely, since we assume {Υx}x\{\Upsilon_{x}\}_{x} is a finite set, we show that Ineq. (53) is getting tighter when ϵ0\epsilon\rightarrow 0 and {Υx:Υx(ρ)=UxρUx}x\{\Upsilon_{x}:\Upsilon_{x}(\rho)=U_{x}\rho U_{x}^{\dagger}\}_{x} is an ϵ\epsilon-covering of {Λ:Λ(ρ)=WρW,W𝐖}\{\Lambda:\Lambda(\rho)=W\rho W^{\dagger},W\in\mathbf{W}\}, i.e., maxΛminxΛΥx<ϵ\max_{\Lambda}\min_{x}\left\|\Lambda-\Upsilon_{x}\right\|_{\diamond}<\epsilon, where Ux𝐖U_{x}\in\mathbf{W}. This can be shown by the following two observations:

First, for any xXx\in X, there exists pure state ϕx𝐏(d)\phi_{x}\in\mathbf{P}\left(\mathbb{C}^{d}\right) such that tr[ϕxUx]=0\text{tr}\left[\phi_{x}U_{x}\right]=0 since 0 is contained in the convex hull of UxU_{x}’s eigenvalues. Then, we obtain

maxΥminxX12ΥΥxminxX12idΥxminxXϕxΥx(ϕx)tr\displaystyle\max_{\Upsilon}\min_{x\in X}\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\geq\min_{x\in X}\frac{1}{2}\left\|id-\Upsilon_{x}\right\|_{\diamond}\geq\min_{x\in X}\left\|\phi_{x}-\Upsilon_{x}(\phi_{x})\right\|_{\text{tr}}
=minxX1F(ϕx,UxϕxUx)=minxX1|tr[ϕxUx]|2=1.\displaystyle=\min_{x\in X}\sqrt{1-F\left(\phi_{x},U_{x}\phi_{x}U_{x}^{\dagger}\right)}=\min_{x\in X}\sqrt{1-|\text{tr}\left[\phi_{x}U_{x}\right]|^{2}}=1. (58)

This implies that β\beta in Ineq. (53) satisfies β=1\beta=1.

Second, by using Eq. (51), we obtain

maxΥminp12ΥxXp(x)Υx=1minV,ρ𝐒(d)maxxX|tr[ρVUx]|2<1minV,ρ𝐒(d)maxW𝐖|tr[ρVW]|2+ϵ,\displaystyle\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}=1-\min_{V,\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{x\in X}\left|\text{tr}\left[\rho V^{\dagger}U_{x}\right]\right|^{2}<1-\min_{V,\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{W\in\mathbf{W}}\left|\text{tr}\left[\rho V^{\dagger}W\right]\right|^{2}+\epsilon, (59)

where in the inequality, we use that for any W𝐖W\in\mathbf{W}, there exists Υx\Upsilon_{x} such that ϵ>12ΛΥxtr[Φ(VW𝕀)Φ(WV𝕀)]tr[Φ(VUx𝕀)Φ(UxV𝕀)]=|tr[ρVW]|2|tr[ρVUx]|2\epsilon>\frac{1}{2}\left\|\Lambda-\Upsilon_{x}\right\|_{\diamond}\geq\text{tr}\left[\Phi(V^{\dagger}W\otimes\mathbb{I})\Phi(W^{\dagger}V\otimes\mathbb{I})\right]-\text{tr}\left[\Phi(V^{\dagger}U_{x}\otimes\mathbb{I})\Phi(U_{x}^{\dagger}V\otimes\mathbb{I})\right]=\left|\text{tr}\left[\rho V^{\dagger}W\right]\right|^{2}-\left|\text{tr}\left[\rho V^{\dagger}U_{x}\right]\right|^{2}, where Λ(ρ)=WρW\Lambda(\rho)=W\rho W^{\dagger} and Φ\Phi is a purification of ρ\rho. Since ϵ\epsilon can be arbitrarily small positive number, showing

minV,ρ𝐒(d)maxW𝐖|tr[ρVW]|12d\min_{V,\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{W\in\mathbf{W}}\left|\text{tr}\left[\rho V^{\dagger}W\right]\right|\geq 1-\frac{2}{d} (60)

is sufficient to prove the sharpness of Ineq. (53).

For any unitary operator VV, there exists {Wi𝐖}i=1d\{W_{i}\in\mathbf{W}\}_{i=1}^{d} such that V,W1,,WdV,W_{1},\cdots,W_{d} are simultaneously diagonalizable and the jj-th eigenvalues of VV and WiW_{i} are the same for all jij\neq i. By letting VWi=𝕀(1zi)|ii|V^{\dagger}W_{i}=\mathbb{I}-\left(1-z_{i}\right)|{i}\rangle\langle{i}|, where complex number ziz_{i} satisfies |zi|=1|z_{i}|=1 and {|i}i=1d\{|{i}\rangle\}_{i=1}^{d} is an orthonormal basis, we can proceed as follows: for any unitary operator VV,

minρ𝐒(d)maxW𝐖|tr[ρVW]|minρ𝐒(d)max1id|tr[ρ(𝕀(1zi)|ii|)]|\displaystyle\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{W\in\mathbf{W}}\left|\text{tr}\left[\rho V^{\dagger}W\right]\right|\geq\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{1\leq i\leq d}\left|\text{tr}\left[\rho(\mathbb{I}-\left(1-z_{i}\right)|{i}\rangle\langle{i}|)\right]\right|
minρ𝐒(d)max1id(1|1zi|i|ρ|i)12maxρ𝐒(d)min1idi|ρi12d.\displaystyle\geq\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{1\leq i\leq d}\left(1-\left|1-z_{i}\right|\langle{i}|\rho|{i}\rangle\right)\geq 1-2\max_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\min_{1\leq i\leq d}\langle{i}|\rho|{i}\rangle\geq 1-\frac{2}{d}. (61)

This completes the proof.

Appendix E The operator norm and the diamond norm of unitary transformations

Suppose δ[0,2]\delta\in[0,\sqrt{2}]. In this section, we show that Υ1Υ2<δ4δ2\left\|\Upsilon_{1}-\Upsilon_{2}\right\|_{\diamond}<\delta\sqrt{4-\delta^{2}} if U1U2<δ\left\|U_{1}-U_{2}\right\|_{\infty}<\delta, where Υi(ρ):=UiρUi\Upsilon_{i}(\rho):=U_{i}\rho U_{i}^{\dagger} is a unitary transformation on 𝐋()\mathbf{L}\left(\mathcal{H}\right).

By using the unitarily invariance of the diamond norm and the operator norm, it is sufficient to show idΥ<δ4δ2\left\|id-\Upsilon\right\|_{\diamond}<\delta\sqrt{4-\delta^{2}} if 𝕀U<δ\left\|\mathbb{I}-U\right\|_{\infty}<\delta, where Υ(ρ):=UρU\Upsilon(\rho):=U\rho U^{\dagger}. Let λi{z:|z|=1}\lambda_{i}\in\{z\in\mathbb{C}:|z|=1\} be the ii-th eigenvalue of UU. Then, we obtain

δ>𝕀U=maxi{|1λi|}.\displaystyle\delta>\left\|\mathbb{I}-U\right\|_{\infty}=\max_{i}\{|1-\lambda_{i}|\}. (62)

On the other hand, by using the similar observation used for deriving Eq. (52),

idΥ\displaystyle\left\|id-\Upsilon\right\|_{\diamond} =\displaystyle= 21minρ𝐒()|tr[ρU]|2=21minp|ip(i)λi|2.\displaystyle 2\sqrt{1-\min_{\rho\in\mathbf{S}\left(\mathcal{H}\right)}|\text{tr}\left[\rho U\right]|^{2}}=2\sqrt{1-\min_{p}\left|\sum_{i}p(i)\lambda_{i}\right|^{2}}. (63)

By using an elementary geometric observation shown in Fig.4, we obtain minp|ip(i)λi|>ϵ=1δ24(4δ2)\min_{p}\left|\sum_{i}p(i)\lambda_{i}\right|>\epsilon=\sqrt{1-\frac{\delta^{2}}{4}\left(4-\delta^{2}\right)}. This completes the proof.

Refer to caption
Figure 4: Geometric relationship between the operator norm and the diamond norm. Ineq. (62) implies that eigenvalues λi\lambda_{i} of UU reside on the bold arc and their convex hull is contained in the shaded segment, which also implies that the shortest distance between the origin and the convex hull is greater than ϵ\epsilon.

Appendix F Circuit synthesis of single qubit unitary transformations by using gate set {S,H,T}\{S,H,T\}

Recall that n(ϵ)n(\epsilon) is the smallest length of gate sequences formed from gate set {S,H,T}\{S,H,T\} to approximate arbitrary single qubit unitary transformations within accuracy ϵ\epsilon, i.e., n(ϵ):=min{n:maxΥminx12ΥΥx(n)<ϵ}n(\epsilon):=\min\left\{n\in\mathbb{N}:\max_{\Upsilon}\min_{x}\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}^{(n)}\right\|_{\diamond}<\epsilon\right\}, where {Υx(n)}x\{\Upsilon_{x}^{(n)}\}_{x} is the set of unitary transformations representing the unitary circuit realized by the gate sequences of length at most nn. In this section, we perform a numerical calculation of n(ϵ)n(\epsilon) and show that the probabilistic implementation reduces the gate length owing to Theorem 3.

First, we generate {Υx(n)}x\{\Upsilon_{x}^{(n)}\}_{x}. Next, we randomly sample 2×1042\times 10^{4} single-qubit unitary transformations with respect to the Haar measure on the unitary group on 2\mathbb{C}^{2}. Third, for each randomly sampled unitary transformation Υ\Upsilon, we calculate the half ϵ^(Υ)\hat{\epsilon}(\Upsilon) of the minimum diamond norm between Υ\Upsilon and {Υx(n)}x\{\Upsilon_{x}^{(n)}\}_{x}. Then, we compute the maximum value ϵ^\hat{\epsilon} of ϵ^(Υ)\hat{\epsilon}(\Upsilon) over all the sampled Υ\Upsilon. Since another numerical experiment indicates that the set of randomly sampled 2×1042\times 10^{4} single-qubit unitary transformations is 0.10.1-covering of that of single-qubit unitary transformations with high probability, we assume that gate sequences of length at most nn approximate arbitrary single-qubit unitary transformations within accuracy ϵ^\hat{\epsilon}. (The true accuracy ϵ\epsilon satisfies ϵ^ϵ<ϵ^+0.1\hat{\epsilon}\leq\epsilon<\hat{\epsilon}+0.1.)

In Fig.5, we plot n(ϵ)n(\epsilon) and n(ϵ)n(\sqrt{\epsilon}), which correspond to the minimum length of gate sequences to synthesize single-qubit unitary transformations within accuracy ϵ\epsilon by using the deterministic implementation and the probabilistic one, respectively. This graph shows that the probabilistic implementation can reduce the circuit size in a wide range of accuracy ϵ\epsilon.

Refer to caption
Figure 5: The minimum gate length at most nn to synthesize single qubit unitary transformations with gate set {S,H,T}\{S,H,T\} within accuracy ϵ\epsilon. The solid graph and the dashed graph correspond to the deterministic implementation and the probabilistic one, respectively. Note that in both cases, the accuracy is measured by using the half of the diamond norm, i.e., maxΥminx12ΥΥx\max_{\Upsilon}\min_{x}\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond} and maxΥminp12Υxp(x)Υx\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x}p(x)\Upsilon_{x}\right\|_{\diamond}, respectively.

Appendix G Equivalence between quantum testers and quantum networks

Recall that the Choi-Jamiołkowski operator of linear mapping Ξ:𝐋(1)𝐋(2)\Xi:\mathbf{L}\left(\mathcal{H}_{1}\right)\rightarrow\mathbf{L}\left(\mathcal{H}_{2}\right) is defined as J(Ξ):=i,j|ij|Ξ(|ij|)𝐋(12)J(\Xi):=\sum_{i,j}|{i}\rangle\langle{j}|\otimes\Xi(|{i}\rangle\langle{j}|)\in\mathbf{L}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right), and the set of quantum testers is defined as 𝐓(1:2):={M𝐏𝐨𝐬(12):ρ𝐒(1),Mρ𝕀}\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2}):=\{M\in\mathbf{Pos}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right):\exists\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right),M\leq\rho\otimes\mathbb{I}\}. In this section, we show that the set of mappings fM:Ξtr[J(Ξ)M]f_{M}:\Xi\mapsto\text{tr}\left[J\left(\Xi\right)M\right] associated to quantum testers M𝐓(1:2)M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2}) is equivalent to that of mappings gΦ,Π:Ξtr[Ξid3(Φ)Π]g_{\Phi,\Pi}:\Xi\mapsto\text{tr}\left[\Xi\otimes id_{\mathcal{H}_{3}}(\Phi)\Pi\right] associated to pure states Φ\Phi and hermitian projectors Π\Pi for sufficiently large dimensional Hilbert space 3\mathcal{H}_{3}. Note a proof for more general quantum testers is given in [20, Theorem 10].

First, we show that for any Φ𝐏(13)\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{3}\right) and Π𝐏𝐫𝐨𝐣(23)\Pi\in\mathbf{Proj}(\mathcal{H}_{2}\otimes\mathcal{H}_{3}), there exists M𝐓(1:2)M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2}) such that fM=gΦ,Πf_{M}=g_{\Phi,\Pi} as follows: By letting M=tr3[(ΦT1𝕀2)(𝕀1Π)]M=\text{tr}_{3}\left[(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes\Pi)\right], we obtain

gΦ,Π(Ξ)=tr[Ξid3(Φ)Π]\displaystyle g_{\Phi,\Pi}(\Xi)=\text{tr}\left[\Xi\otimes id_{\mathcal{H}_{3}}(\Phi)\Pi\right] =\displaystyle= tr[(J(Ξ)𝕀3)(ΦT1𝕀2)(𝕀1Π)]=tr[J(Ξ)M]=fM(Ξ),\displaystyle\text{tr}\left[(J(\Xi)\otimes\mathbb{I}_{3})(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes\Pi)\right]=\text{tr}\left[J(\Xi)M\right]=f_{M}(\Xi), (64)

where ΦT1\Phi^{T_{1}} and tr3[]\text{tr}_{3}\left[\cdot\right] represent the partial transpose of Φ\Phi and the partial trace, and the subscript of the operator denotes the system where the operator acts on. We can also verify that M𝐓(1:2)M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2}) as follows: Let X=ijαij|j3i|1X=\sum_{ij}\alpha_{ij}|{j}\rangle_{3}\langle{i}|_{1}, where |Φ=ijαij|i1|j3|{\Phi}\rangle=\sum_{ij}\alpha_{ij}|{i}\rangle_{1}|{j}\rangle_{3} with the computational basis {|i1𝐏(1)}i\{|{i}\rangle_{1}\in\mathbf{P}\left(\mathcal{H}_{1}\right)\}_{i} and {|j3𝐏(3)}j\{|{j}\rangle_{3}\in\mathbf{P}\left(\mathcal{H}_{3}\right)\}_{j}. Then, we obtain that for any postive semidefinte operator P𝐏𝐨𝐬(12)P\in\mathbf{Pos}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right),

tr[PM]=tr[(P𝕀3)(ΦT1𝕀2)(𝕀1Π)]=tr[(X𝕀2)P(X𝕀2)Π]0,\text{tr}\left[PM\right]=\text{tr}\left[(P\otimes\mathbb{I}_{3})(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes\Pi)\right]=\text{tr}\left[(X\otimes\mathbb{I}_{2})P(X\otimes\mathbb{I}_{2})^{\dagger}\Pi\right]\geq 0, (65)

which implies M0M\geq 0. By letting ρ=tr3[ΦT1]=tr3[ΦT](𝐒(1))\rho=\text{tr}_{3}\left[\Phi^{T_{1}}\right]=\text{tr}_{3}\left[\Phi^{T}\right](\in\mathbf{S}\left(\mathcal{H}_{1}\right)), we can also verify that

ρ𝕀2M=tr3[(ΦT1𝕀2)(𝕀123𝕀1Π)]=tr3[(ΦT1𝕀2)(𝕀1Π)]0,\displaystyle\rho\otimes\mathbb{I}_{2}-M=\text{tr}_{3}\left[(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{123}-\mathbb{I}_{1}\otimes\Pi)\right]=\text{tr}_{3}\left[(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes\Pi_{\bot})\right]\geq 0, (66)

where Π𝐏𝐫𝐨𝐣(23)\Pi_{\bot}\in\mathbf{Proj}(\mathcal{H}_{2}\otimes\mathcal{H}_{3}) satisfies Π+Π=𝕀\Pi+\Pi_{\bot}=\mathbb{I}, and the last inequality can be verified by the fact that M0M\geq 0.

Next, we show that for any M𝐓(1:2)M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2}), there exist Φ𝐏(13)\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{3}\right) and Π𝐏𝐫𝐨𝐣(23)\Pi\in\mathbf{Proj}(\mathcal{H}_{2}\otimes\mathcal{H}_{3}) such that fM=gΦ,Πf_{M}=g_{\Phi,\Pi} as follows: Let Mρ1𝕀2M\leq\rho_{1}\otimes\mathbb{I}_{2}, Φ𝐏(11)\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1^{\prime}}\right) be a purification of ρ1T\rho_{1}^{T}, its singular value decomposition be |Φ=ip(i)|xi1|yi1|{\Phi}\rangle=\sum_{i}\sqrt{p(i)}|{x_{i}}\rangle_{1}|{y_{i}}\rangle_{1^{\prime}} (p(i)>0p(i)>0) and P𝐏𝐨𝐬(21)P\in\mathbf{Pos}\left(\mathcal{H}_{2}\otimes\mathcal{H}_{1^{\prime}}\right) be P=XMXP=XMX^{\dagger}, where X=i1p(i)|yi1xi|1X=\sum_{i}\frac{1}{\sqrt{p(i)}}|{y_{i}}\rangle_{1^{\prime}}\langle{x_{i}^{*}}|_{1} and |ϕ|{\phi^{*}}\rangle is the complex conjugate of |ϕ|{\phi}\rangle. Then we can verify that

fM(Ξ)=tr[J(Ξ)M]=tr[(J(Ξ)𝕀1)(ΦT1𝕀2)(𝕀1P)]=tr[Ξid1(Φ)P].f_{M}(\Xi)=\text{tr}\left[J(\Xi)M\right]=\text{tr}\left[(J(\Xi)\otimes\mathbb{I}_{1^{\prime}})(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes P)\right]=\text{tr}\left[\Xi\otimes id_{\mathcal{H}_{1^{\prime}}}(\Phi)P\right]. (67)

Since PX(ρ1𝕀2)X𝕀21P\leq X(\rho_{1}\otimes\mathbb{I}_{2})X^{\dagger}\leq\mathbb{I}_{21^{\prime}}, {P,𝕀P}\{P,\mathbb{I}-P\} is a positive operator-valued measure (POVM), which can be embedded in a larger Hilbert space 23\mathcal{H}_{2}\otimes\mathcal{H}_{3} as a projection-valued measure (PVM) {Π,Π}\{\Pi,\Pi_{\bot}\} owing to the Naimark’s extension. This completes the proof.