Quadratic improvement on accuracy of approximating pure quantum states and unitary gates by probabilistic implementation

Seiseki Akibue, Go Kato, and Seiichiro Tani; NTT Communication Science Labs., NTT Corporation.
3–1, Morinosato-Wakamiya, Atsugi, Kanagawa 243-0198, Japan

Abstract

Pure quantum states are often approximately encoded as classical bit strings such as those representing probability amplitudes and those describing circuits that generate the quantum states. The crucial quantity is the minimum length of classical bit strings from which the original pure states are approximately reconstructible. We derive asymptotically tight bounds on the minimum bit length required for probabilistic encodings with which one can approximately reconstruct the original pure state as an ensemble of the quantum states encoded in classical strings. We also show that such a probabilistic encoding asymptotically halves the bit length required for “deterministic” ones. This is based on the fact that the accuracy of approximating pure states by using a given subset of pure states can be increased quadratically if we use ensembles of pure states in the subset. Moreover, we show that a similar fact holds when we consider the approximation of unitary gates by using a given subset of unitary gates. This improves on previous results for the rate at which probabilistic circuit synthesis reduces circuit size. It also demonstrates that the reduction is possible even for low-accuracy circuit synthesis, which might improve the accuracy of various NISQ algorithms.

discrete geometry of the quantum state, high dimensional quantum states, numerical simulation


I Introduction

Pure quantum states are often approximately encoded in classical states in various quantum information processing tasks, such as classical bit strings storing probability amplitudes of pure states in the classical simulation of quantum circuits; classical data obtained by measurement in state tomography or state estimation; classical descriptions of quantum circuits that generate target pure states in quantum circuit synthesis [1, 2]. Recently, compact classical encodings from which one can predict probability distributions on outcomes obtained by measurements in certain classes have been developed [3, 4, 5].

The key issue is the minimum encoding. Here, we investigate it in terms of the minimum length $n$ of classical bit strings, with which one can approximately achieve any information processing task achievable with the original pure state on a $d$ -dimensional system. That is, from the classical strings, one can construct a quantum state $\hat{\rho}$ that is indistinguishable from the original state $\phi$ within a certain accuracy by any measurements. In addition to deterministic encodings, which deterministically associate $\phi$ with a classical bit string, we consider probabilistic encodings, which associate $\phi$ with one of multiple classical strings according to some distribution. That is, in probabilistic encodings, $\phi$ is associated with an ensemble of classical strings from which $\hat{\rho}$ is constructed as an ensemble of the quantum states encoded in the classical strings.

Besides providing fundamental limits for information processing tasks using classical encodings, the minimum length $n$ is a fundamental quantity in various theoretical subjects including communication complexity, computational complexity, and asymptotic geometric analysis. Indeed, in deterministic encodings, classical strings do nothing but encode elements in an $\epsilon$ -covering (sometimes called an $\epsilon$ -net) of the set of pure states. Thus, the minimum bit length $n$ is the logarithm of the minimum size of $\epsilon$ -coverings (called the covering number). Due to its prominent role in algorithm design and asymptotic geometric analysis, the covering number has been well studied [6, 7, 8], and it is known that $n=O(d)$ bits are enough to encode an $\epsilon$ -covering for fixed $\epsilon$ . On the other hand, a particular task in communication complexity called distributed quantum sampling, which aims to classically transmit a pure state so as to approximately sample outcomes of arbitrary quantum measurements, provides a lower bound $n=\Omega(d)$ on the minimum length required for probabilistic encodings [9]. Taking these two known facts into account, it seems that the minimum probabilistic encoding can be realized by a deterministic one. This intuition is supported by the fact that an ensemble of deterministic classical states (called a probabilistic classical state) represents our lack of knowledge about a classical system, while a pure state represents our maximum knowledge about a quantum system.

Contrary to this intuition, we show that the minimum length $n$ required for probabilistic encodings is exactly half of the one required for deterministic encodings in the asymptotic limits of the dimension or accuracy. Thus, the minimum encoding must associate some pure states with ensembles of classical strings describing distinct quantum states, which may be counter-intuitive, considering that pure states themselves are not probabilistic mixtures of distinct quantum states. The excessive bits required for deterministic encodings can be interpreted as a consequence of their excessive predictive capability such that they can not only reconstruct quantum states within a certain accuracy but also deterministically compute the probability distribution of any measurements within the same accuracy. Such a deterministic computation is impossible by using either the minimum (probabilistic) encoding or the original quantum states.

The bit length reduction by using probabilistic encodings follows from our refined estimation of the covering number and the following fact we prove: for any finite set $\mathcal{A}$ of pure states, it is possible to quadratically increase the accuracy of approximating arbitrary pure states by using ensembles of pure states in $\mathcal{A}$ . Moreover, we show that a similar fact holds when we consider the approximation of unitary gates by using a finite set of unitary gates. Recently, it has been found that when we approximately implement arbitrary unitary gates by using a gate sequence over a finite universal gate set (called circuit synthesis), the length of the gate sequence or the number of $T$ gates can be reduced by using ensembles of gate sequences for high-accuracy circuit synthesis [10, 11]. Our result improves the reduction rate of the previous results and shows that the reduction is possible even for low-accuracy circuit synthesis, which might improve the accuracy of NISQ algorithms.

II Preliminaries

In this section, we summarize basic notations used throughout the paper. Note that we consider only finite-dimensional Hilbert spaces. In particular, two-dimensional Hilbert space $\mathbb{C}^{2}$ is called a qubit. $\mathbf{L}\left(\mathcal{H}\right)$ and $\mathbf{Pos}\left(\mathcal{H}\right)$ represent the set of linear operators and positive semidefinite operators on Hilbert space $\mathcal{H}$ , respectively. $\mathbf{S}\left(\mathcal{H}\right):=\left\{\rho\in\mathbf{Pos}\left(\mathcal{H}\right):\text{tr}\left[\rho\right]=1\right\}$ and $\mathbf{P}\left(\mathcal{H}\right):=\left\{\rho\in\mathbf{S}\left(\mathcal{H}\right):\text{tr}\left[\rho^{2}\right]=1\right\}$ represent the set of quantum states and that of pure states, respectively. Pure state $\phi\in\mathbf{P}\left(\mathcal{H}\right)$ is sometimes alternatively represented by complex unit vector $|{\phi}\rangle\in\mathcal{H}$ satisfying $\phi=|{\phi}\rangle\langle{\phi}|$ . Any physical transformation of the quantum state can be represented by a completely positive and trace preserving (CPTP) linear mapping $\Gamma:\mathbf{L}\left(\mathcal{H}\right)\rightarrow\mathbf{L}\left(\mathcal{H}^{\prime}\right)$ .

The trace distance $\left\|\rho-\sigma\right\|_{\text{tr}}$ of two quantum states $\rho,\sigma\in\mathbf{S}\left(\mathcal{H}\right)$ is defined through the normalized trace norm $\left\|M\right\|_{\text{tr}}:=\frac{1}{2}\text{tr}\left[\sqrt{MM^{\dagger}}\right]$ for $M\in\mathbf{L}\left(\mathcal{H}\right)$ . It equals the maximum total variation distance between the probability distributions obtained by performing the same measurement on the two states. A related measure of the distinguishability of $\rho$ and $\sigma$ is the fidelity, defined by $F\left(\rho,\sigma\right):=\max\text{tr}\left[\Phi^{\rho}\Phi^{\sigma}\right]$ , where $\Phi^{\rho},\Phi^{\sigma}\in\mathbf{P}\left(\mathcal{H}\otimes\mathcal{H}^{\prime}\right)$ are purifications of $\rho$ and $\sigma$ , i.e., $\rho=\text{tr}_{\mathcal{H}^{\prime}}\left[\Phi^{\rho}\right]$ and $\sigma=\text{tr}_{\mathcal{H}^{\prime}}\left[\Phi^{\sigma}\right]$ , and the maximization is taken over all such purifications. The Fuchs-van de Graaf inequalities [12] relate the two measures as follows:

1-\sqrt{F\left(\rho,\sigma\right)}\leq\left\|\rho-\sigma\right\|_{\text{tr}}\leq\sqrt{1-F\left(\rho,\sigma\right)}

(1)

holds for any states $\rho,\sigma\in\mathbf{S}\left(\mathcal{H}\right)$ , and the right inequality holds with equality when $\rho$ and $\sigma$ are both pure.
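As a quick numerical sanity check of Eq. (1) under the conventions above (trace norm with the factor $\frac{1}{2}$ and the squared Uhlmann fidelity, which equals $\left(\text{tr}\left|\sqrt{\rho}\sqrt{\sigma}\right|\right)^{2}$ ), the following NumPy sketch evaluates both sides for random states. The helper names and the random-state construction are illustrative assumptions, not part of the paper.

```python
import numpy as np

def psd_sqrt(a):
    """Square root of a positive semidefinite Hermitian matrix via eigh."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def trace_distance(rho, sigma):
    """(1/2) * trace norm of rho - sigma, the normalisation used in the text."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

def fidelity(rho, sigma):
    """Squared Uhlmann fidelity (tr|sqrt(rho) sqrt(sigma)|)^2."""
    s = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False)
    return float(np.sum(s) ** 2)

def random_state(d, rank, rng):
    """Random density matrix of the given rank (rank=1 gives a pure state)."""
    g = rng.normal(size=(d, rank)) + 1j * rng.normal(size=(d, rank))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(1)
for _ in range(3):
    rho, sigma = random_state(3, 3, rng), random_state(3, 2, rng)
    t, f = trace_distance(rho, sigma), fidelity(rho, sigma)
    # Eq. (1): 1 - sqrt(F) <= trace distance <= sqrt(1 - F)
    print(f"{1 - np.sqrt(f):.4f} <= {t:.4f} <= {np.sqrt(1 - f):.4f}")
```

For two pure states the right-hand side is attained, which is the equality case used later in the proofs of Corollary 1 and Theorem 2.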

III Classical encoding of pure states

Figure 1: Probabilistic encoding of pure state $\phi$ on a $d$ -dimensional system using $n$ -bit strings and physical transformation $\Gamma$ corresponding to a decoder of the bit strings to quantum states. State $\phi$ is randomly encoded in label $x$ in finite set $X$ according to probability distribution $p_{\phi}:X\rightarrow[0,1]$ , where $n=\lceil\log_{2}|X|\rceil$ .

The existence of a probabilistic encoding is equivalent to the existence of a physical transformation $\Gamma$ that can approximately reconstruct an arbitrary pure state $\phi$ from the probabilistic classical state $\{(p_{\phi}(x),x)\}_{x\in X}$ , as shown in Fig. 1. Physical transformation $\Gamma$ can be regarded as a decoder of classical states to quantum states, which outputs mixed state $\rho_{x}\in\mathbf{S}\left(\mathbb{C}^{d}\right)$ when label $x\in X$ is input. Formally, $\Gamma$ is represented by a classical-quantum channel [13], defined as $\Gamma(\sigma)=\sum_{x\in X}\langle{x}|\sigma|{x}\rangle\rho_{x}$ , where $\left\{|{x}\rangle\in\mathbb{C}^{|X|}\right\}_{x\in X}$ is an orthonormal basis. We require that the output state $\hat{\rho}:=\sum_{x}p_{\phi}(x)\rho_{x}$ of $\Gamma$ reproduce the outcome statistics of any measurement performed on $\phi$ within total variation distance $\epsilon$ , i.e., $\left\|\phi-\hat{\rho}\right\|_{\text{tr}}<\epsilon$ for a given $\epsilon\in(0,1]$ . Thus, we say that a probabilistic encoding of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ with accuracy $\epsilon$ exists if and only if there exists a set $\{\rho_{x}\in\mathbf{S}\left(\mathbb{C}^{d}\right)\}_{x\in X}$ of quantum states satisfying

\max_{\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)}\min_{p}\left\|\phi-\sum_{x\in X}p(x)\rho_{x}\right\|_{\text{tr}}<\epsilon,

(2)

where the minimum is taken over probability distributions $p$ over $X$ , i.e., $\sum_{x\in X}p(x)=1$ and $p(x)\geq 0$ . Note that since any mixed state is a probabilistic mixture of pure states and the trace distance is convex, Eq. (2) also guarantees that an arbitrary mixed state is approximately reconstructible within the same accuracy as pure states.
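The decoder $\Gamma$ can be written out directly. The NumPy sketch below assumes a hypothetical three-label code whose decoded states are the qubit states $|0\rangle$ , $|1\rangle$ and $|+\rangle$ ; it implements $\Gamma(\sigma)=\sum_{x}\langle{x}|\sigma|{x}\rangle\rho_{x}$ and shows that feeding in the probabilistic classical state $\text{diag}(p)$ yields $\hat{\rho}=\sum_{x}p(x)\rho_{x}$ .

```python
import numpy as np

def cq_decoder(sigma, rhos):
    """Classical-quantum channel Gamma(sigma) = sum_x <x|sigma|x> rho_x.

    sigma: density matrix on the label register C^{|X|};
    rhos:  list of |X| density matrices rho_x on C^d (the decoded states).
    Only the diagonal of sigma matters; coherences between labels are discarded.
    """
    weights = np.real(np.diag(sigma))
    return sum(w * rho for w, rho in zip(weights, rhos))

def pure(v):
    """Density matrix |v><v| / <v|v> of a (not necessarily normalised) vector."""
    v = np.asarray(v, dtype=complex)
    return np.outer(v, v.conj()) / np.vdot(v, v).real

# Hypothetical three-label code with qubit outputs |0>, |1>, |+>.
rhos = [pure([1, 0]), pure([0, 1]), pure([1, 1])]
p = np.array([0.5, 0.3, 0.2])            # p_phi(x) for some target phi
rho_hat = cq_decoder(np.diag(p), rhos)   # equals sum_x p(x) rho_x
print(np.round(rho_hat.real, 3))
```

Whether this $\hat{\rho}$ is an acceptable encoding of a given $\phi$ is then exactly the trace-distance condition in Eq. (2).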

III.1 Minimum deterministic encoding

First, we consider deterministic encodings, which can be defined as particular probabilistic encodings. Concretely, every pure state $\phi$ is encoded into a single label $x_{\phi}\in X$ , which implies the output state of $\Gamma$ is $\hat{\rho}=\rho_{x_{\phi}}$ . Thus, we say that a deterministic encoding of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ with accuracy $\epsilon$ exists if and only if there exists set $\{\rho_{x}\in\mathbf{S}\left(\mathbb{C}^{d}\right)\}_{x\in X}$ of quantum states satisfying

\max_{\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)}\min_{x\in X}\left\|\phi-\rho_{x}\right\|_{\text{tr}}<\epsilon,

(3)

which is called an external $\epsilon$ -covering of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ . A set of pure states $\{\rho_{x}\in\mathbf{P}\left(\mathbb{C}^{d}\right)\}_{x\in X}$ satisfying Eq. (3) is called an internal $\epsilon$ -covering of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ , which corresponds to particular deterministic encodings such as one storing probability amplitudes approximately representing $\phi$ . The minimum size of internal (or external) $\epsilon$ -coverings is called the internal (or external) covering number and denoted by $I_{in}$ (or $I_{ex}$ ). Note that $I_{ex}\leq I_{in}$ by definition and that the minimum bit length $n$ required for deterministic encodings equals $\lceil\log_{2}I_{ex}\rceil$ .

Since the condition of the $\epsilon$ -coverings in Eq. (3) is equivalent to that for the set of $\epsilon$ -balls $\left\{B_{\epsilon}\left(\rho_{x}\right):=\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):\left\|\psi-\rho_{x}\right\|_{\text{tr}}<\epsilon\right\}\right\}_{x}$ to cover $\mathbf{P}\left(\mathbb{C}^{d}\right)$ , a detailed analysis of the volume of the $\epsilon$ -ball provides a good estimation of the covering numbers. As shown in Appendix A, the volume can be calculated as $\mu(B_{\epsilon}\left(\phi\right))=\epsilon^{2(d-1)}$ with respect to the unitarily invariant probability measure $\mu$ for any $\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)$ . This directly provides a lower bound on $I_{in}$ and also its upper bound by applying the method of constructing an internal $\epsilon$ -covering developed in [8]. We obtain the following estimation of $I_{in}$ , which is tighter than previous estimations [6, 7] in large dimensions. For completeness, we provide a construction of an internal $\epsilon$ -covering and an estimation of the parameters appearing in the construction in Appendix B.
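The volume formula $\mu(B_{\epsilon}\left(\phi\right))=\epsilon^{2(d-1)}$ can be checked by Monte Carlo sampling. The sketch below assumes NumPy and draws Haar-random pure states as normalized complex Gaussian vectors (a standard construction); the sample size and seed are arbitrary choices. Because the balls of an internal $\epsilon$ -covering must together cover a set of measure 1, this volume immediately gives $I_{in}\geq\epsilon^{-2(d-1)}$ .

```python
import numpy as np

def haar_random_pure_states(d, n, rng):
    """n Haar-random unit vectors in C^d (normalised complex Gaussians)."""
    z = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def ball_volume_mc(d, eps, n=200_000, seed=0):
    """Monte Carlo estimate of mu(B_eps(phi)) for a pure centre phi.

    By unitary invariance we may take phi = |0>, so |<phi|psi>|^2 = |psi_0|^2,
    and for pure states ||psi - phi||_tr = sqrt(1 - |<phi|psi>|^2).
    """
    rng = np.random.default_rng(seed)
    psi = haar_random_pure_states(d, n, rng)
    overlap_sq = np.abs(psi[:, 0]) ** 2
    dist = np.sqrt(np.clip(1.0 - overlap_sq, 0.0, None))
    return float(np.mean(dist < eps))

for d, eps in [(2, 0.5), (3, 0.5), (4, 0.7)]:
    # estimate vs. the exact value eps^(2(d-1))
    print(d, eps, ball_volume_mc(d, eps), eps ** (2 * (d - 1)))
```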

Lemma 1.

For any $\epsilon\in(0,1]$ and a positive integer $d\in\mathbb{N}$ specified below, the internal covering number $I_{in}$ of internal $\epsilon$ -coverings of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ is bounded as follows: For any $r>2$ , there exists $d_{0}\in\mathbb{N}$ such that

\left(\frac{1}{\epsilon}\right)^{2(d-1)}\leq I_{in}\leq rd(\ln d)\left(\frac{1}{\epsilon}\right)^{2(d-1)},

(4)

where the left inequality holds for any $d\in\mathbb{N}$ , and the right inequality holds for any $d\geq d_{0}$ . For example, if $r=5$ , we can set $d_{0}=2$ .

To obtain a lower bound on $I_{ex}$ , we use the following upper bounds on the volume of the $\epsilon$ -ball as shown in Appendix C:

\forall\epsilon\in\left(0,\frac{1}{2}\right],\mu(B_{\epsilon}\left(\rho\right))\leq\left\{\begin{array}[]{l}(2\epsilon)^{2(d-1)}\ \ \text{for}\ 3\geq d\geq 1\\ \epsilon^{2(d-1)}\ \ \ \ \ \text{for}\ d\geq 4.\end{array}\right.

(5)

This bound and $\mu(B_{\epsilon}\left(\phi\right))=\epsilon^{2(d-1)}$ imply that, if $d\geq 4$ , the volume of the $\epsilon$ -ball is maximized by centering it at a pure state. This contrasts with the qubit case ( $d=2$ ), where $B_{\epsilon}\left(\rho\right)$ corresponds to the intersection of the Bloch sphere with a ball centered at $\rho$ , and this intersection is maximized not by a ball centered at a point on the Bloch sphere but by one centered at a point inside the Bloch ball. The qubit case thus shows that the condition $d\geq 4$ for the second inequality cannot be fully relaxed. Similarly, since every pure state is at trace distance $1-\frac{1}{d}$ from the maximally mixed state $\sigma=\frac{1}{d}\mathbb{I}$ , we have $\mu(B_{\epsilon}\left(\sigma\right))=1$ whenever $\epsilon>1-\frac{1}{d}$ , which shows that the condition $\epsilon\in\left(0,\frac{1}{2}\right]$ cannot be fully removed either. By using Eq. (5), we easily obtain the following lower bound on $I_{ex}$ .

Lemma 2.

For any $\epsilon\in\left(0,\frac{1}{2}\right]$ and a positive integer $d\in\mathbb{N}$ specified below, the external covering number $I_{ex}$ of external $\epsilon$ -coverings of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ is bounded as follows:

I_{ex}\geq\left\{\begin{array}[]{l}\left(\frac{1}{2\epsilon}\right)^{2(d-1)}\ \ \text{for}\ 3\geq d\geq 1\\ \left(\frac{1}{\epsilon}\right)^{2(d-1)}\ \ \ \text{for}\ d\geq 4.\end{array}\right.

(6)
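The maximally mixed example used in the discussion of Eq. (5) to show that the condition $\epsilon\in\left(0,\frac{1}{2}\right]$ cannot be dropped rests on the identity $\left\|\phi-\frac{1}{d}\mathbb{I}\right\|_{\text{tr}}=1-\frac{1}{d}$ for every pure state $\phi$ . A short NumPy check, with randomly chosen $\phi$ as an illustrative assumption:

```python
import numpy as np

def trace_distance(rho, sigma):
    """(1/2) * trace norm of the Hermitian difference rho - sigma."""
    return 0.5 * float(np.sum(np.abs(np.linalg.eigvalsh(rho - sigma))))

rng = np.random.default_rng(0)
for d in range(2, 7):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    v /= np.linalg.norm(v)
    phi = np.outer(v, v.conj())                     # random pure state on C^d
    print(d, trace_distance(phi, np.eye(d) / d), 1 - 1 / d)
```

The difference $\phi-\frac{1}{d}\mathbb{I}$ has eigenvalue $1-\frac{1}{d}$ once and $-\frac{1}{d}$ with multiplicity $d-1$ , so the result does not depend on $\phi$ .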

Combining the two lemmas (with $r=5$ in Lemma 1) and $I_{ex}\leq I_{in}$ , we straightforwardly obtain the following theorem.

Theorem 1.

For any $\epsilon\in\left(0,\frac{1}{2}\right]$ and an integer $d\geq 2$ specified below, the minimum size of label set $X$ used over all deterministic encodings of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ with accuracy $\epsilon$ is bounded by

2\cdot l(d,2\epsilon)\leq\log_{2}|X|\leq 2\cdot l(d,\epsilon)+\log_{2}(5d\ln d),

(7)

where $l(d,\epsilon):=\left(d-1\right)\log_{2}\left(\frac{1}{\epsilon}\right)$ . Moreover, if $d\geq 4$ , the lower bound can be strengthened as $2\cdot l(d,\epsilon)\leq\log_{2}|X|$ .

Using Theorem 1 and $n=\lceil\log_{2}|X|\rceil$ , we obtain the asymptotic bit rate per dimension $\lim_{d\rightarrow\infty}\frac{n}{d}=2\log_{2}\left(\frac{1}{\epsilon}\right)$ of the minimum deterministic encoding for fixed $\epsilon\in\left(0,\frac{1}{2}\right]$ . We can also obtain the asymptotic bit rate per accuracy $\lim_{\epsilon\rightarrow 0}\frac{n}{-\log_{2}\epsilon}=2(d-1)$ of the minimum deterministic encoding for fixed $d\geq 2$ .
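The bounds of Theorem 1 are easy to tabulate. The following sketch uses only the Python standard library; the function names are hypothetical. It evaluates both bounds on $\log_{2}|X|$ and shows the per-dimension rate approaching $2\log_{2}\left(\frac{1}{\epsilon}\right)=4$ for $\epsilon=\frac{1}{4}$ .

```python
import math

def ell(d, eps):
    """l(d, eps) = (d - 1) * log2(1/eps), as defined after Eq. (7)."""
    return (d - 1) * math.log2(1 / eps)

def deterministic_bounds(d, eps):
    """Bounds on log2|X| from Theorem 1 (eps in (0, 1/2], integer d >= 2)."""
    lower = 2 * ell(d, eps) if d >= 4 else 2 * ell(d, 2 * eps)
    upper = 2 * ell(d, eps) + math.log2(5 * d * math.log(d))
    return lower, upper

eps = 0.25
for d in [10, 10**3, 10**6]:
    lo, hi = deterministic_bounds(d, eps)
    print(d, lo / d, hi / d)    # both tend to 2*log2(1/eps) = 4 as d grows
```

The additive term $\log_{2}(5d\ln d)$ grows only logarithmically in $d$ , which is why it disappears from both asymptotic rates.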

III.2 Minimum probabilistic encoding

We now prove that there exists a probabilistic encoding achieving exactly half the asymptotic bit length of the minimum deterministic encoding, and that this is optimal. The main tool is the following minimax relationship between the fidelity and the trace distance.

Lemma 3.

For any CPTP linear mapping $\Lambda:\mathbf{L}\left(\mathcal{H}^{\prime}\right)\rightarrow\mathbf{L}\left(\mathcal{H}\right)$ , it holds that

\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\min_{\sigma\in\mathbf{S}\left(\mathcal{H}^{\prime}\right)}\left\|\phi-\Lambda(\sigma)\right\|_{\text{tr}}=1-\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{\psi\in\mathbf{P}\left(\mathcal{H}^{\prime}\right)}F\left(\Lambda(\psi),\phi\right).

(8)

Proof.

We use the minimax theorem as follows:

\begin{aligned}
(\text{L.H.S.})&=\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\min_{\sigma\in\mathbf{S}\left(\mathcal{H}^{\prime}\right)}\max_{0\leq M\leq\mathbb{I}}\text{tr}\left[M(\phi-\Lambda(\sigma))\right]\\
&=\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{0\leq M\leq\mathbb{I}}\min_{\sigma\in\mathbf{S}\left(\mathcal{H}^{\prime}\right)}\text{tr}\left[M(\phi-\Lambda(\sigma))\right]\\
&=\max_{0\leq M\leq\mathbb{I}}\left(\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\text{tr}\left[M\phi\right]-\max_{\psi\in\mathbf{P}\left(\mathcal{H}^{\prime}\right)}\text{tr}\left[M\Lambda(\psi)\right]\right)\\
&=(\text{R.H.S.})
\end{aligned}

(9)

Note that the minimax theorem, used in the second equality, is applicable since $f(\sigma,M):=\text{tr}\left[M(\phi-\Lambda(\sigma))\right]$ is affine in each variable and the domains of $M$ and $\sigma$ are compact and convex. The last equality holds since the maximum over $M$ is attained by a rank-one $M$ . ∎

As a special case of Lemma 3, we obtain the following corollary, which relates the accuracy of approximating pure states by a finite set of pure states to the accuracy achieved by their ensembles:

Corollary 1.

Let $\{\phi_{x}\in\mathbf{P}\left(\mathcal{H}\right)\}_{x\in X}$ be a finite set of pure states. Then, it holds that

\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\min_{p}\left\|\phi-\sum_{x\in X}p(x)\phi_{x}\right\|_{\text{tr}}=\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\min_{x\in X}\left\|\phi-\phi_{x}\right\|_{\text{tr}}^{2},

(10)

where the first minimum is taken over probability distribution $p$ over $X$ .

Proof.

Suppose $\Lambda$ in Lemma 3 is the classical-quantum channel $\Lambda(\sigma)=\sum_{x\in X}\langle{x}|\sigma|{x}\rangle\phi_{x}$ , which corresponds to a particular decoder for a classical encoding that outputs pure state $\phi_{x}\in\mathbf{P}\left(\mathcal{H}\right)$ when label $x$ is input. Then, the L.H.S. of Eq. (8) is equal to that of Eq. (10) by identifying $\langle{x}|\sigma|{x}\rangle$ with $p(x)$ , and the R.H.S. of Eq. (8) is equal to

\begin{aligned}
&1-\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{p}F\left(\sum_{x\in X}p(x)\phi_{x},\phi\right)\\
&=1-\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{p}\sum_{x\in X}p(x)F\left(\phi_{x},\phi\right)\\
&=1-\min_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\max_{x\in X}F\left(\phi_{x},\phi\right),
\end{aligned}

(11)

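which is equal to the R.H.S. of Eq. (10) since the right inequality in Eq. (1) holds with equality for pure states. In Eq. (11), the first equality uses that $F\left(\rho,\phi\right)=\langle{\phi}|\rho|{\phi}\rangle$ is linear in $\rho$ for pure $\phi$ , and the second uses that a linear function of $p$ is maximized by a point distribution. ∎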

This corollary implies that an internal $\sqrt{\epsilon}$ -covering suffices to approximate an arbitrary pure state within accuracy $\epsilon$ by probabilistic mixtures of its elements. This can be understood intuitively through the curvature of the sphere, as illustrated in Fig. 2. Indeed, Corollary 1 (for $\mathcal{H}=\mathbb{C}^{2}$ ) and the Bloch representation imply that for any compact convex set $K$ whose extreme points $\text{ext}\left(K\right)$ lie on a sphere $S$ of radius $\frac{1}{2}$ , $\delta=\sqrt{\epsilon}$ holds, where $\epsilon$ is the distance between $K$ and the point on $S$ farthest from $K$ , and $\delta$ is the distance between $\text{ext}\left(K\right)$ and the point on $S$ farthest from $\text{ext}\left(K\right)$ . This can also be derived from elementary geometric observations.
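Corollary 1 can be checked numerically for a qubit. As an illustrative assumption, take the six Pauli eigenstates as $\{\phi_{x}\}$ ; their Bloch vectors $\pm e_{i}$ span an octahedron, i.e., the unit $\ell_{1}$ -ball. The sketch below uses unit-length Bloch vectors, so $\left\|\rho-\sigma\right\|_{\text{tr}}=|r-s|/2$ (rather than the radius- $\frac{1}{2}$ normalization above), and assumes NumPy. It estimates both sides of Eq. (10) by sampling pure states: the right-hand side reduces to $\max_{r}(1-\max_{i}|r_{i}|)/2$ , and the left-hand side is half the Euclidean distance to the octahedron, computed by a standard sort-and-threshold projection onto the $\ell_{1}$ -ball. Both should approach $(1-1/\sqrt{3})/2\approx 0.2113$ , attained at the face centers.

```python
import numpy as np

def project_onto_l1_ball(v, radius=1.0):
    """Euclidean projection of each row of v onto {w : ||w||_1 <= radius}
    using the standard sort-and-threshold method."""
    a = np.abs(v)
    u = -np.sort(-a, axis=1)                        # |v| in descending order
    css = np.cumsum(u, axis=1)
    k = np.arange(1, v.shape[1] + 1)
    positive = u - (css - radius) / k > 0
    # Index of the last k for which the threshold condition holds.
    last = v.shape[1] - 1 - np.argmax(positive[:, ::-1], axis=1)
    theta = (css[np.arange(v.shape[0]), last] - radius) / (last + 1)
    w = np.sign(v) * np.maximum(a - theta[:, None], 0.0)
    inside = a.sum(axis=1) <= radius
    return np.where(inside[:, None], v, w)

rng = np.random.default_rng(0)
r = rng.normal(size=(200_000, 3))
r /= np.linalg.norm(r, axis=1, keepdims=True)      # Bloch vectors of pure states

# Right side of Eq. (10): squared distance to the nearest Pauli eigenstate,
# (1 - r.r_x)/2 with r_x = +-e_i, i.e. (1 - max_i |r_i|)/2.
delta_sq = float(np.max((1.0 - np.max(np.abs(r), axis=1)) / 2))
# Left side of Eq. (10): half the Euclidean distance to the octahedron.
eps = float(np.max(np.linalg.norm(r - project_onto_l1_ball(r), axis=1) / 2))
print(eps, delta_sq, (1 - 1 / np.sqrt(3)) / 2)
```

Sampling only approximates the outer maximum from below, so the two estimates agree up to a small sampling error rather than exactly.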

By using Lemma 3 and Corollary 1, we can derive the following asymptotically tight bounds on the minimum bit length required for probabilistic encodings:

Theorem 2.

For any $\epsilon\in\left(0,1\right]$ and an integer $d\geq 2$ , the minimum size of label set $X$ used over all probabilistic encodings of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ with accuracy $\epsilon$ is bounded by

l(d,\epsilon)-\log_{2}d\leq\log_{2}|X|\leq l(d,\epsilon)+\log_{2}(5d\ln d),

(12)

where $l(d,\epsilon):=\left(d-1\right)\log_{2}\left(\frac{1}{\epsilon}\right)$ .

Proof.

When $\{\phi_{x}\in\mathbf{P}\left(\mathbb{C}^{d}\right)\}_{x\in X}$ is an internal $\sqrt{\epsilon}$ -covering, Corollary 1 implies $\{\phi_{x}\}_{x\in X}$ satisfies Eq. (2). Thus, there exists a probabilistic encoding of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ with accuracy $\epsilon$ and label set $X$ , whose size is upper bounded by using Lemma 1 with setting $r=5$ .

Next, we show the lower bound. Let $\{\rho_{x}\in\mathbf{S}\left(\mathbb{C}^{d}\right)\}_{x\in X}$ satisfy Eq. (2). Applying Lemma 3 with $\Lambda$ set to the classical-quantum channel $\Lambda(\sigma)=\sum_{x\in X}\langle{x}|\sigma|{x}\rangle\rho_{x}$ , as in the proof of Corollary 1, we obtain

\begin{aligned}
1-\epsilon&<\min_{\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)}\max_{p}\sum_{x\in X}p(x)F\left(\rho_{x},\phi\right)\\
&=\min_{\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)}\max_{x\in X}F\left(\rho_{x},\phi\right).
\end{aligned}

(13)

By writing each $\rho_{x}$ in its spectral decomposition $\rho_{x}=\sum_{i=1}^{d}p(i|x)\phi_{i|x}$ , we obtain that for any $\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)$ , there exist $i$ and $x$ such that

1-\epsilon<F\left(\rho_{x},\phi\right)=\sum_{j=1}^{d}p(j|x)F\left(\phi_{j|x},\phi\right)\leq F\left(\phi_{i|x},\phi\right)=1-\left\|\phi-\phi_{i|x}\right\|_{\text{tr}}^{2}.

(14)

Thus, $\{\phi_{i|x}\}_{i,x}$ , which has at most $d|X|$ elements, is an internal $\sqrt{\epsilon}$ -covering of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ . Hence, applying the left inequality of Eq. (4) with accuracy $\sqrt{\epsilon}$ gives $|X|d\geq\left(\frac{1}{\sqrt{\epsilon}}\right)^{2(d-1)}$ , which yields the lower bound. ∎
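For completeness, the arithmetic behind Eq. (12) can be spelled out: both bounds follow from Lemma 1 evaluated at accuracy $\sqrt{\epsilon}$ (with $r=5$ for the upper bound).

```latex
% Upper bound: an internal \sqrt{\epsilon}-covering suffices (Corollary 1); Lemma 1 with r = 5:
|X| \le 5d\ln d\left(\frac{1}{\sqrt{\epsilon}}\right)^{2(d-1)} = 5d\ln d\left(\frac{1}{\epsilon}\right)^{d-1}
  \;\Longrightarrow\; \log_{2}|X| \le l(d,\epsilon) + \log_{2}(5d\ln d)
% Lower bound: \{\phi_{i|x}\}_{i,x} has at most d|X| elements; left inequality of Eq. (4):
d\,|X| \ge \left(\frac{1}{\sqrt{\epsilon}}\right)^{2(d-1)} = \left(\frac{1}{\epsilon}\right)^{d-1}
  \;\Longrightarrow\; \log_{2}|X| \ge l(d,\epsilon) - \log_{2}d
```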

Using Theorem 2 and $n=\lceil\log_{2}|X|\rceil$ , we obtain the asymptotic bit rate per dimension $\lim_{d\rightarrow\infty}\frac{n}{d}=\log_{2}\left(\frac{1}{\epsilon}\right)$ of the minimum probabilistic encoding for fixed $\epsilon\in\left(0,1\right]$ , and one per accuracy $\lim_{\epsilon\rightarrow 0}\frac{n}{-\log_{2}\epsilon}=d-1$ of the minimum probabilistic encoding for fixed $d\geq 2$ , which are exactly half of those of the minimum deterministic encoding.
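The halving can also be seen numerically by comparing the bounds of Theorems 1 and 2. In the sketch below (Python standard library only; the function names are hypothetical), the ratios of the probabilistic bounds to the deterministic ones approach $\frac{1}{2}$ both for large $d$ at fixed $\epsilon$ and for small $\epsilon$ at fixed $d\geq 4$ .

```python
import math

def ell(d, eps):
    """l(d, eps) = (d - 1) * log2(1/eps)."""
    return (d - 1) * math.log2(1 / eps)

def deterministic_bounds(d, eps):
    """Theorem 1: bounds on log2|X|, eps in (0, 1/2], d >= 2."""
    lower = 2 * ell(d, eps) if d >= 4 else 2 * ell(d, 2 * eps)
    return lower, 2 * ell(d, eps) + math.log2(5 * d * math.log(d))

def probabilistic_bounds(d, eps):
    """Theorem 2: bounds on log2|X|, eps in (0, 1], d >= 2."""
    return ell(d, eps) - math.log2(d), ell(d, eps) + math.log2(5 * d * math.log(d))

for d, eps in [(10**6, 0.25), (4, 2.0 ** -1000)]:
    det_lo, det_hi = deterministic_bounds(d, eps)
    prob_lo, prob_hi = probabilistic_bounds(d, eps)
    # Both ratios tend to 1/2 in the respective asymptotic limits.
    print(d, eps, prob_hi / det_lo, prob_lo / det_hi)
```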

IV Probabilistic circuit synthesis

In the context of circuit synthesis using a finite universal gate set, it has recently been found that the length of the gate sequence or the number of $T$ gates can be reduced by using ensembles of gate sequences [10, 11]. This is based on the so-called mixing lemma, which shows that if a finite set $\{\Upsilon_{x}\}_{x}$ of unitary transformations approximates arbitrary unitary transformations with sufficiently high accuracy, the accuracy can be increased by using suitable ensembles $\sum_{x}p(x)\Upsilon_{x}$ .

Building on Corollary 1, which gives the accuracy achieved by the optimal ensembles of pure states in approximating arbitrary pure states, we derive in the following theorem bounds on the accuracy achieved by the optimal ensembles of unitary transformations in approximating arbitrary unitary transformations. Note that, similarly to [10, 11], we measure the accuracy of the approximation by the diamond norm (sometimes called the completely bounded trace norm) of Hermitian-preserving linear mappings, defined as

\frac{1}{2}\left\|\Xi\right\|_{\diamond}:=\max_{\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1}^{\prime}\right)}\left\|\Xi\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi)\right\|_{\text{tr}},

(15)

where $\Xi:\mathbf{L}\left(\mathcal{H}_{1}\right)\rightarrow\mathbf{L}\left(\mathcal{H}_{2}\right)$ , $id_{\mathcal{H}}$ is the identity mapping on $\mathbf{L}\left(\mathcal{H}\right)$ and $\dim\mathcal{H}_{1}^{\prime}=\dim\mathcal{H}_{1}$ .
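For unitary channels the maximization in Eq. (15) admits a closed form, which we take from standard treatments of the diamond norm rather than from this paper: $\frac{1}{2}\left\|\Upsilon_{U}-\Upsilon_{V}\right\|_{\diamond}=\sqrt{1-\nu^{2}}$ , where $\nu$ is the distance from the origin to the convex hull of the eigenvalues of $U^{\dagger}V$ . The NumPy sketch below implements this assumed formula (the function name is hypothetical) and may be convenient for experimenting with Theorem 3.

```python
import numpy as np

def unitary_diamond_distance(u, v):
    """Return (1/2)||U.U^dag - V.V^dag||_diamond, assuming the closed form
    sqrt(1 - nu^2), nu = distance from 0 to the hull of eig(U^dag V)."""
    phases = np.sort(np.angle(np.linalg.eigvals(u.conj().T @ v)))
    # Cyclic gaps between consecutive eigenphases on the unit circle.
    gaps = np.diff(np.concatenate([phases, [phases[0] + 2 * np.pi]]))
    # If the largest gap exceeds pi, all eigenvalues lie on an arc of length
    # L = 2*pi - gap < pi and the nearest hull point is the chord midpoint at
    # distance cos(L/2) = -cos(gap/2); otherwise 0 lies in the hull and nu = 0.
    nu = max(0.0, -np.cos(gaps.max() / 2))
    return float(np.sqrt(1.0 - nu ** 2))

identity = np.eye(2)
t_gate = np.diag([1.0, np.exp(1j * np.pi / 4)])
print(unitary_diamond_distance(identity, t_gate))   # equals sin(pi/8) up to rounding
```

For instance, the $T$ gate is at distance $\sin(\pi/8)\approx 0.383$ from the identity in this normalization, and a global phase does not change the distance.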

Theorem 3.

For an integer $d\geq 2$ specified below, let $\{\Upsilon_{x}\}_{x\in X}$ be a finite set of unitary transformations on $\mathbf{L}\left(\mathbb{C}^{d}\right)$ . Then, it holds that

\frac{2}{d}\alpha\leq\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}\leq\alpha,

(16)

\alpha=\max_{\Upsilon}\min_{x\in X}\left(\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\right)^{2},

(17)

where the first minimum is taken over probability distributions $p$ over $X$ . If $d=2$ , both inequalities hold with equality.

This theorem resembles Corollary 1 (especially when $d=2$ ), which can be understood as a consequence of the similarity between pure states and unitary transformations via the Choi-Jamiołkowski representation. Although the proof also relies on the minimax theorem, as does that of Corollary 1, it requires some additional work; we give a complete proof, together with the sharp lower bound, in Appendix D.

This theorem implies that quantum circuit synthesis using probabilistically generated circuits formed from finite universal gate set $\mathcal{C}=\{g_{1},g_{2},\cdots\}$ can reduce the circuit size. To see this, first let $\{\Upsilon_{x}^{(n)}\}_{x}$ be the set of unitary transformations representing the unitary circuits realized by gate sequences $g_{i_{1}}\cdots g_{i_{n}}$ of length at most $n$ . Next, let $n(\epsilon)$ be the smallest gate-sequence length that suffices to approximate arbitrary unitary transformations within accuracy $\epsilon$ , i.e., $n(\epsilon):=\min\{n\in\mathbb{N}:\max_{\Upsilon}\min_{x}\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}^{(n)}\right\|_{\diamond}<\epsilon\}$ . Theorem 3 implies that, by using the probabilistic implementation, we can implement an arbitrary unitary transformation within accuracy $\epsilon$ with circuits of size only $n(\sqrt{\epsilon})$ .

The accuracy in approximating unitary transformations is often measured by using the operator norm $\left\|X\right\|_{\infty}:=\max_{\phi\in\mathbf{P}\left(\mathcal{H}\right)}\left\|X|{\phi}\rangle\right\|_{2}$ . The celebrated Solovay-Kitaev theorem shows that for any finite universal gate set $\mathcal{C}$ , set $\{U_{x}^{(n)}\}_{x}$ of unitary circuits realized by gate sequences of length $n=O\left(\log^{c}\left(\frac{1}{\delta}\right)\right)$ ( $c\geq 1$ ) is sufficient for approximating arbitrary unitary operators, i.e., $\max_{U}\min_{x}\left\|U-U_{x}^{(n)}\right\|_{\infty}<\delta$ , where we denote a unitary circuit and the unitary operator representing the circuit by the same symbol $U_{x}^{(n)}$ . On the other hand, a relationship between the operator norm and the diamond norm shown in Appendix E implies that $\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}<\delta\sqrt{4-\delta^{2}}$ if $\left\|U-U_{x}\right\|_{\infty}<\delta$ , where $\Upsilon(\rho)=U\rho U^{\dagger}$ and $\Upsilon_{x}(\rho)=U_{x}\rho U_{x}^{\dagger}$ . Combining with Theorem 3, we can verify that arbitrary unitary transformations can be approximated by ensembles of $\{\Upsilon_{x}\}_{x}$ such that

\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x}p(x)\Upsilon_{x}\right\|_{\diamond}<\delta^{2}\left(1-\left(\frac{\delta}{2}\right)^{2}\right)

(18)

if $\max_{U}\min_{x}\left\|U-U_{x}\right\|_{\infty}<\delta$ . This bound is tighter than the previous bound obtained in [10, Theorem 1], $\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x}p(x)\Upsilon_{x}\right\|_{\diamond}<5\delta^{2}$ . Moreover, our bound holds for $\delta\leq\sqrt{2}$ , whereas the previous bound was shown to hold only for $\delta<0.01$ . In addition to this improved estimate, our lower bound reveals the limitation of the probabilistic implementation.
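The operator-norm-to-diamond-norm relation quoted from Appendix E can be spot-checked numerically. The NumPy sketch below assumes the closed form for unitary channels $\frac{1}{2}\left\|\Upsilon_{U}-\Upsilon_{V}\right\|_{\diamond}=\sqrt{1-\nu^{2}}$ ( $\nu$ being the distance from the origin to the convex hull of the eigenvalues of $U^{\dagger}V$ ), samples hypothetical nearby single-qubit pairs, and checks $\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\leq\frac{\delta}{2}\sqrt{4-\delta^{2}}$ whenever $\delta=\left\|U-U_{x}\right\|_{\infty}\leq\sqrt{2}$ . It also tabulates the guarantee of Eq. (18) next to the earlier $5\delta^{2}$ .

```python
import numpy as np

def unitary_diamond_distance(u, v):
    """(1/2)||U.U^dag - V.V^dag||_diamond via the assumed closed form sqrt(1 - nu^2)."""
    phases = np.sort(np.angle(np.linalg.eigvals(u.conj().T @ v)))
    gaps = np.diff(np.concatenate([phases, [phases[0] + 2 * np.pi]]))
    nu = max(0.0, -np.cos(gaps.max() / 2))   # distance from 0 to hull of eigenvalues
    return float(np.sqrt(1.0 - nu ** 2))

def random_unitary(d, rng):
    q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
    return q

def nearby_pair(d, max_phase, rng):
    """Hypothetical test pair: U random, V = U W, W with eigenphases in [-max_phase, max_phase]."""
    u, q = random_unitary(d, rng), random_unitary(d, rng)
    theta = rng.uniform(-max_phase, max_phase, size=d)
    w = q @ np.diag(np.exp(1j * theta)) @ q.conj().T
    return u, u @ w

rng = np.random.default_rng(0)
worst_gap = -np.inf
for _ in range(2000):
    u, v = nearby_pair(2, 1.2, rng)
    delta = np.linalg.norm(u - v, 2)                  # operator norm ||U - V||_inf
    if delta <= np.sqrt(2):
        bound = 0.5 * delta * np.sqrt(4 - delta ** 2)
        worst_gap = max(worst_gap, unitary_diamond_distance(u, v) - bound)
print("max violation (should be <= 0 up to rounding):", worst_gap)

for delta in [0.01, 0.1, 0.5, 1.0]:
    print(delta, delta ** 2 * (1 - (delta / 2) ** 2), 5 * delta ** 2)   # Eq. (18) vs 5*delta^2
```

The bound is attained, for example, by $U=\mathbb{I}$ and $U_{x}=\text{diag}(e^{-i\theta/2},e^{i\theta/2})$ , so it cannot be improved in general under these assumptions.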

As suggested by the Solovay-Kitaev theorem, if we assume $n(\epsilon)\sim a\log^{c}\left(\frac{1}{\epsilon}\right)$ in the high-accuracy regime ( $\epsilon\ll 1$ ), the probabilistic implementation reduces the length of gate sequences by about $1-\left(\frac{1}{2}\right)^{c}\geq 50\%$ since $c\geq 1$ from a volume consideration. Even in the low-accuracy regime, our bound guarantees the reduction. For example, we show how the gate length can be reduced by using the probabilistic implementation to synthesize single-qubit unitary transformations with gate set $\{S,H,T\}$ in Appendix F.
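Under the assumed scaling $n(\epsilon)\sim a\log^{c}\left(\frac{1}{\epsilon}\right)$ , the reduction factor follows from one line of arithmetic:

```latex
n(\sqrt{\epsilon}) \approx a\log^{c}\left(\frac{1}{\sqrt{\epsilon}}\right)
  = a\left(\frac{1}{2}\log\frac{1}{\epsilon}\right)^{c}
  = \left(\frac{1}{2}\right)^{c} n(\epsilon),
\qquad
1-\frac{n(\sqrt{\epsilon})}{n(\epsilon)} \approx 1-\left(\frac{1}{2}\right)^{c}.
```

For example, $c=1$ gives a reduction of about 50% and $c=2$ a reduction of about 75%.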

V Conclusion

In this paper, we have considered the minimum probabilistic encoding so as to approximately reconstruct an arbitrary pure state $\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right)$ from an $n$ -bit string within accuracy $\epsilon$ with respect to the trace distance. We then demonstrated that it cannot be realized by simply storing an element of the minimum $\epsilon$ -covering. More precisely, we proved that the bit rate required for probabilistic encodings is exactly half of that of the minimum length of bits necessary to store elements of an $\epsilon$ -covering of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ in asymptotic limit $\epsilon\rightarrow 0$ or in limit $d\rightarrow\infty$ when $\epsilon\in\left(0,\frac{1}{2}\right]$ . In limit $d\rightarrow\infty$ when $\epsilon\in\left(\frac{1}{2},1\right]$ , the same result holds if we consider only internal $\epsilon$ -coverings; however, in general, whether the same result holds or not is an open problem. Several numerical calculations suggest a positive answer.

Moreover, we have shown that, similarly to the state encoding, for any finite set $\{\Upsilon_{x}\}_{x}$ of unitary transformations, we can at least quadratically increase the accuracy of approximating arbitrary unitary transformations by using ensembles of $\{\Upsilon_{x}\}_{x}$ . In particular, we obtained bounds on the accuracy achieved by the optimal ensembles, which reveal both the possibility and the limitation of the probabilistic implementation of unitary transformations.

Our result could provide a new quantitative guiding principle to explore further capabilities and limitations of manipulating a quantum system as well as the foundations of quantum theory, including the following two related topics:

1. The results demonstrate an information-theoretical separation of the memory size to store a pure quantum state between strong simulations and weak ones, which are two types of classical simulation of a quantum computer [14, 15, 16] (the former approximately computes the probability distribution over the outcomes, whereas the latter only approximately samples the outcomes).
2.

The complex projective space representation of pure states can be regarded as a classical encoding of pure states in deterministic classical states describing operators in $\mathbf{P}\left(\mathcal{H}\right)$. The fact that distinct pure states are encoded in distinguishable classical states inclines us to think that the indistinguishability of non-orthogonal pure states results from our limited ability to measure them. To interpret this indistinguishability as an intrinsic feature of pure states, $\psi$-epistemic models [17, 18] construct classical encodings of pure states in probabilistic classical states, in which some distinct pure states are encoded in indistinguishable probabilistic classical states. Our results show that indistinguishable classical states encoding distinct elements of an $\epsilon$-covering are not only helpful for such an interpretation but also necessary for the minimum probabilistic encoding.

Acknowledgements.

We thank Yuki Takeuchi, Yasunari Suzuki, Yasuhiro Takahashi, and Adel Sohbi for helpful discussions. This work was partially supported by JST Moonshot R&D MILLENNIA Program (Grant No.JPMJMS2061). SA was partially supported by JST, PRESTO Grant No.JPMJPR2111 and JPMXS0120319794. GK was supported in part by the Grant-in-Aid for Scientific Research (C) No.20K03779, (C) No.21K03388, and (S) No.18H05237 of JSPS, CREST (Japan Science and Technology Agency) Grant No.JPMJCR1671. ST was partially supported by the Grant-in-Aid for Transformative Research Areas No.JP20H05966 of JSPS.

References

[1] V. Shende, S. Bullock, and I. Markov, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25, 1000 (2006).
[2] F. G. Brandão, W. Chemissany, N. Hunter-Jones, R. Kueng, and J. Preskill, PRX Quantum 2, 030316 (2021).
[3] H.-Y. Huang, R. Kueng, and J. Preskill, Nature Physics 16, 1050 (2020).
[4] R. Raz, in Proceedings of the 31st Annual ACM Symposium on Theory of Computing, STOC ’99 (Association for Computing Machinery, New York, NY, USA, 1999) pp. 358–367.
[5] D. Gosset and J. Smolin, in Proceedings of the 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (2019) pp. 8:1–8:9.
[6] P. Hayden, D. W. Leung, and A. Winter, Communications in Mathematical Physics 265, 95 (2006), arXiv:quant-ph/0407049.
[7] D. Gavinsky and T. Ito, Quantum Info. Comput. 13, 583 (2013).
[8] G. Aubrun and S. J. Szarek, Alice and Bob Meet Banach (American Mathematical Society, 2017).
[9] A. Montanaro, Quantum 3, 154 (2019).
[10] E. Campbell, Phys. Rev. A 95, 042306 (2017).
[11] M. B. Hastings, Quantum Info. Comput. 17, 488 (2017).
[12] C. Fuchs and J. van de Graaf, IEEE Transactions on Information Theory 45, 1216 (1999).
[13] M. Horodecki, P. W. Shor, and M. B. Ruskai, Reviews in Mathematical Physics 15, 629 (2003), https://doi.org/10.1142/S0129055X03001709.
[14] M. Van den Nest, Quantum Info. Comput. 10, 258 (2010).
[15] S. Bravyi, D. Browne, P. Calpin, E. Campbell, D. Gosset, and M. Howard, Quantum 3, 181 (2019).
[16] S. Hillmich, I. L. Markov, and R. Wille, in 2020 57th ACM/IEEE Design Automation Conference (DAC) (2020) pp. 1–6.
[17] P. G. Lewis, D. Jennings, J. Barrett, and T. Rudolph, Phys. Rev. Lett. 109, 150404 (2012).
[18] S. Aaronson, A. Bouland, L. Chua, and G. Lowther, Phys. Rev. A 88, 032111 (2013).
[19] G. Gutoski and J. Watrous, in Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing, STOC ’07 (Association for Computing Machinery, New York, NY, USA, 2007) pp. 565–574.
[20] G. Chiribella, G. M. D’Ariano, and P. Perinotti, Phys. Rev. A 80, 022339 (2009).

Appendix A Volume of $\epsilon$ -ball in $\mathbf{P}\left(\mathbb{C}^{d}\right)$

To construct an $\epsilon$-covering, we first derive the volume of the $\epsilon$-ball $B_{\epsilon}\left(\phi\right):=\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):\left\|\psi-\phi\right\|_{\text{tr}}<\epsilon\right\}$ in $\mathbf{P}\left(\mathbb{C}^{d}\right)$ as follows:

\forall d\in\mathbb{N},\forall\epsilon\in(0,1],\forall\phi\in\mathbf{P}\left(\mathbb{C}^{d}\right),\mu(B_{\epsilon}\left(\phi\right))=\epsilon^{2(d-1)},

(19)

where $\mu$ is the unitarily invariant probability measure on the Borel sets of $\mathbf{P}\left(\mathbb{C}^{d}\right)$ .

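When $d=1$, Eq. (19) holds trivially, since $\mathbf{P}\left(\mathbb{C}^{1}\right)$ consists of a single point and $\epsilon^{0}=1$. Assuming $d\geq 2$, we proceed as follows: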

	$\displaystyle\mu(B_{\epsilon}\left(\phi\right))$
	$\displaystyle=\mu\left(\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):\left\||0\rangle\langle 0|-\psi\right\|_{\text{tr}}<\epsilon\right\}\right)$
	$\displaystyle=\mu\left(\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):|\langle 0|\psi\rangle|^{2}>1-\epsilon^{2}\right\}\right)$
	$\displaystyle=\xi\left(\left\{\vec{x}\in\mathbb{R}^{2d}:\left\|\vec{x}\right\|_{2}=1\wedge x_{1}^{2}+x_{2}^{2}>1-\epsilon^{2}\right\}\right),$		(20)

where the first equality uses the fixed pure state $|{0}\rangle$ together with the unitary invariance of $\mu$ and of the trace distance, the second equality uses Eq. (1), and the third equality uses the relationship between $\mu$ and the uniform probability measure $\xi$ on the unit sphere of $\mathbb{R}^{2d}\cong\mathbb{C}^{d}$, where $x_{1}$ and $x_{2}$ are the real and imaginary parts of $\langle{0}|{\psi}\rangle$. Using a spherical coordinate system, we can proceed as follows:

$\displaystyle\mu(B_{\epsilon}\left(\phi\right))$	$\displaystyle=$	$\displaystyle\frac{V(\epsilon)}{V(1)},$	(21)
$\displaystyle\text{where}\ V(\epsilon)$	$\displaystyle:=$	$\displaystyle\int_{D_{\epsilon}}\sin^{2d-2}\theta\sin^{2d-3}\phi\, d\theta\, d\phi$
	$\displaystyle=$	$\displaystyle 4\int_{\hat{D}_{\epsilon}}\sin^{2d-2}\theta\sin^{2d-3}\phi\, d\theta\, d\phi$	(22)

and the domain of integration $D_{\epsilon}$ is given by $\{(\theta,\phi):\theta,\phi\in(0,\pi),\sin\theta\sin\phi<\epsilon\}$ (here $\theta$ and $\phi$ denote angles, not pure states). Since both the domain and the integrand are symmetric under reflections about the lines $\theta=\frac{\pi}{2}$ and $\phi=\frac{\pi}{2}$, it suffices to integrate over $\hat{D}_{\epsilon}:=\{(\theta,\phi):\theta,\phi\in\left(0,\frac{\pi}{2}\right),\sin\theta\sin\phi<\epsilon\}$, which yields the factor $4$. By changing the variables as $\begin{pmatrix}x\\ y\end{pmatrix}=\begin{pmatrix}\sin\theta\sin\phi\\ \sin\theta\end{pmatrix}$, we obtain

$\displaystyle V(\epsilon)$	$\displaystyle=$	$\displaystyle 4\int_{0}^{\epsilon}dxx^{2d-3}\int_{x}^{1}dy\frac{y}{\sqrt{1-y^{2}}\sqrt{y^{2}-x^{2}}}$	(23)
	$\displaystyle=$	$\displaystyle 4\int_{0}^{\epsilon}dxx^{2d-3}\left[\arcsin\sqrt{\frac{1-y^{2}}{1-x^{2}}}\right]_{1}^{x}$
	$\displaystyle=$	$\displaystyle\frac{\pi}{d-1}\epsilon^{2(d-1)}$

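for $\epsilon\in[0,1]$, where the bracket equals $\arcsin 1-\arcsin 0=\frac{\pi}{2}$ for every $x\in(0,1)$. Hence $\mu(B_{\epsilon}\left(\phi\right))=V(\epsilon)/V(1)=\epsilon^{2(d-1)}$, which completes the calculation.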
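Eq. (19) can be sanity-checked numerically. The following Python sketch is our own illustration, not part of the original derivation: it samples the unitarily invariant measure $\mu$ by normalizing complex Gaussian vectors (a standard construction) and estimates the fraction of pure states within trace distance $\epsilon$ of $|0\rangle$, using $\left\|\psi-\phi\right\|_{\text{tr}}=\sqrt{1-|\langle\phi|\psi\rangle|^{2}}$ for pure states. The function names and sample sizes are arbitrary choices.

```python
import numpy as np


def haar_pure_states(n, d, rng):
    """Return n Haar-random unit vectors in C^d (one per row) via normalized complex Gaussians."""
    z = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def ball_volume_mc(d, eps, n_samples=200_000, seed=0):
    """Monte Carlo estimate of mu(B_eps(|0>)) in P(C^d)."""
    rng = np.random.default_rng(seed)
    psi = haar_pure_states(n_samples, d, rng)
    overlap = np.abs(psi[:, 0]) ** 2                          # |<0|psi>|^2
    trace_dist = np.sqrt(np.clip(1.0 - overlap, 0.0, None))   # pure-state trace distance
    return float(np.mean(trace_dist < eps))


for d, eps in [(2, 0.5), (3, 0.7), (4, 0.9)]:
    # the estimate should be close to eps^(2(d-1)), up to sampling error
    print(d, eps, ball_volume_mc(d, eps), eps ** (2 * (d - 1)))
```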

Appendix B Upper bound for internal covering number $I_{in}$

We construct an internal $\epsilon$-covering ($\epsilon\in(0,1]$) following the proof of [8, Corollary 5.5]. The construction is based on the fact that sufficiently many randomly sampled pure states form an $\epsilon$-covering. However, as more random $\epsilon$-balls are sampled, the probability that a new random pure state lands in the still-uncovered region decreases; it is therefore more efficient to stop random sampling at some point and switch to a different strategy, namely packing disjoint balls into the uncovered region.

In the proof, we give some parameters explicitly, tailored to $\epsilon$-coverings with respect to the trace distance. Assume $d\geq 2$ and let $D=2(d-1)(\geq 2)$. Let $\{\phi_{j}\in\mathbf{P}\left(\mathbb{C}^{d}\right)\}_{j=1}^{J_{R}}$ be a finite set of pure states sampled independently according to $\mu$, i.e., with respect to the product measure $\mu^{J_{R}}$. The expected volume of the region not covered by $A:=\cup_{j=1}^{J_{R}}B_{\epsilon_{R}}\left(\phi_{j}\right)$ ($0<\epsilon_{R}\leq 1$) can be calculated as follows:

	$\displaystyle\int d\mu^{J_{R}}\mu\left(A^{c}\right)$
	$\displaystyle=\int d\mu^{J_{R}}\int d\mu(\psi)\prod_{j=1}^{J_{R}}\text{I}\left[\left\|\psi-\phi_{j}\right\|_{\text{tr}}\geq\epsilon_{R}\right]$
	$\displaystyle=\int d\mu(\psi)\prod_{j=1}^{J_{R}}\int d\mu(\phi_{j})\text{I}\left[\left\|\psi-\phi_{j}\right\|_{\text{tr}}\geq\epsilon_{R}\right]$
	$\displaystyle=\left(1-\epsilon_{R}^{D}\right)^{J_{R}}\leq\exp\left(-J_{R}\epsilon_{R}^{D}\right),$		(24)

where we use Fubini’s theorem in the second equality and Eq. (19) in the third equality. Here, $\text{I}\left[X\right]\in\{0,1\}$ is the indicator function, i.e., $\text{I}\left[X\right]=1$ iff $X$ is true.
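The identity in Eq. (24) can likewise be checked by simulation. The sketch below is illustrative only (the function `expected_uncovered` and all parameter values are our own choices): it samples $J_{R}$ random centers, estimates the uncovered fraction with random test states, and averages over independent trials. The average should approach $(1-\epsilon_{R}^{D})^{J_{R}}$, which is at most $\exp(-J_{R}\epsilon_{R}^{D})$.

```python
import numpy as np


def haar_pure_states(n, d, rng):
    """n Haar-random unit vectors in C^d, one per row."""
    z = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def expected_uncovered(d, eps_r, j_r, trials=200, n_test=2000, seed=1):
    """Average, over random centers phi_j, of the fraction of P(C^d) not covered by A."""
    rng = np.random.default_rng(seed)
    fractions = []
    for _ in range(trials):
        centers = haar_pure_states(j_r, d, rng)
        tests = haar_pure_states(n_test, d, rng)
        # |<phi_j|psi_i>|^2 for every test state psi_i (rows) and center phi_j (columns)
        overlaps = np.abs(tests @ centers.conj().T) ** 2
        dist = np.sqrt(np.clip(1.0 - overlaps, 0.0, None))
        covered = np.any(dist < eps_r, axis=1)
        fractions.append(1.0 - covered.mean())
    return float(np.mean(fractions))


d, eps_r, j_r = 2, 0.3, 20
D = 2 * (d - 1)
estimate = expected_uncovered(d, eps_r, j_r)
exact = (1 - eps_r ** D) ** j_r
print(estimate, exact, np.exp(-j_r * eps_r ** D))
```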

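Thus, there exists $\{\phi_{j}\}_{j=1}^{J_{R}}$ such that $\mu\left(A^{c}\right)\leq\exp\left(-J_{R}\epsilon_{R}^{D}\right)$. Next, pick a maximal family $\{\psi_{j}\}_{j=1}^{J_{P}}$ such that the balls $B_{\epsilon_{P}}\left(\psi_{j}\right)$ are pairwise disjoint and contained in $A^{c}$. When $0<\epsilon_{P}\leq\epsilon_{R}\leq 1$, we can verify that $\{\phi_{j}\}_{j=1}^{J_{R}}\cup\{\psi_{j}\}_{j=1}^{J_{P}}$ is an $(\epsilon_{R}+\epsilon_{P})$-covering: by maximality, for any pure state $\psi\notin A$, the ball $B_{\epsilon_{P}}\left(\psi\right)$ either intersects some $B_{\epsilon_{P}}\left(\psi_{j}\right)$ or intersects $A$. Moreover, since the $J_{P}$ disjoint balls, each of volume $\epsilon_{P}^{D}$ by Eq. (19), fit into $A^{c}$, the size $J:=J_{R}+J_{P}$ is upper bounded as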

J\leq J_{R}+\frac{\exp\left(-J_{R}\epsilon_{R}^{D}\right)}{\epsilon_{P}^{D}}.

(25)

By setting $J_{R}=\left\lceil\frac{D}{\epsilon_{R}^{D}}\ln\left(\frac{\epsilon_{R}}{\epsilon_{P}}\right)\right\rceil$ , $\epsilon_{P}=\frac{\epsilon_{R}}{x}$ and $\epsilon_{R}=\frac{x}{1+x}\epsilon$ with $x\geq 1$ , we obtain the following upper bound:

J\leq\left\lceil\frac{D\ln x}{\epsilon_{R}^{D}}\right\rceil+\frac{1}{\epsilon_{R}^{D}}\leq\frac{1}{\epsilon^{D}}\left\{\left(1+\frac{1}{x}\right)^{D}(D\ln x+1)+1\right\}=\frac{2d\ln d}{\epsilon^{D}}\cdot\frac{\alpha(d,x)}{2d\ln d},

(26)

where $\alpha(d,x)=\left(1+\frac{1}{x}\right)^{D}(D\ln x+1)+1$ . Since $\lim_{d\rightarrow\infty}\frac{\alpha(d,D\ln D)}{2d\ln d}=1$ , we obtain that for any $r>2$ there exists $d_{0}\in\mathbb{N}$ such that

\forall d\geq d_{0},\forall\epsilon\in(0,1],J\leq rd(\ln d)\left(\frac{1}{\epsilon}\right)^{2(d-1)}.

(27)

For example, if $r=5$ , we can set $d_{0}=2$ . This completes the proof.
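The claim that $r=5$ works with $d_{0}=2$ can be spot-checked numerically by evaluating $\alpha(d,D\ln D)$ from Eq. (26). The sketch below is our own illustration; it only covers $2\leq d\leq 2000$ and does not by itself prove the statement for all $d$. The helper names are hypothetical.

```python
import math


def alpha(d, x):
    """alpha(d, x) from Eq. (26), with D = 2(d - 1)."""
    D = 2 * (d - 1)
    return (1 + 1 / x) ** D * (D * math.log(x) + 1) + 1


def bound_holds(d, r=5.0):
    """Check alpha(d, D ln D) <= r d ln d, i.e. the covering-size bound of Eq. (27)."""
    D = 2 * (d - 1)
    x = D * math.log(D)  # the choice x = D ln D used in the text (x >= 1 for D >= 2)
    return alpha(d, x) <= r * d * math.log(d)


print(all(bound_holds(d) for d in range(2, 2001)))
```

With a smaller constant such as $r=4.5$, the same check already fails at $d=3$, so $r=5$ is not far from the smallest constant that works from $d_{0}=2$ on with this choice of $x$.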

Appendix C Lower bound for external covering number $I_{ex}$

We derive a lower bound for the external covering number as a direct consequence of the following upper bounds, proven in this section, on the volume of the intersection of an $\epsilon$-ball in $\mathbf{S}\left(\mathbb{C}^{d}\right)$ with $\mathbf{P}\left(\mathbb{C}^{d}\right)$, i.e., on $\mu(B_{\epsilon}\left(\rho\right))$, where $B_{\epsilon}\left(\rho\right)$ denotes the set of pure states within trace distance $\epsilon$ of $\rho$: For any $\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)$ and any $\epsilon\in\left(0,\frac{1}{2}\right]$,

	$\displaystyle\forall d\geq 1,\mu(B_{\epsilon}\left(\rho\right))$	$\displaystyle\leq$	$\displaystyle(2\epsilon)^{2(d-1)},$		(28)
	$\displaystyle\forall d\geq 4,\mu(B_{\epsilon}\left(\rho\right))$	$\displaystyle\leq$	$\displaystyle\epsilon^{2(d-1)}.$		(29)

Combined with Eq. (19), Eq. (29) implies that the maximum intersection is achieved when $\rho$ is pure if two conditions $d\geq 4$ and $\epsilon\in\left(0,\frac{1}{2}\right]$ are satisfied. These two conditions are not tight but cannot be fully relaxed since $\mu(B_{\epsilon}\left(\sigma\right))=1$ for any $d\in\mathbb{N}$ and $\epsilon>1-\frac{1}{d}$ , where $\sigma$ is the maximally mixed state $\frac{1}{d}\mathbb{I}$ .
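Eqs. (19) and (29), as well as the maximally mixed counterexample, can be illustrated by Monte Carlo sampling. The Python sketch below is our own illustration. It assumes the normalization $\left\|X\right\|_{\text{tr}}=\frac{1}{2}\sum|\text{eig}(X)|$ (consistent with Eq. (31)) and samples $\mu$ by normalizing complex Gaussian vectors; `mu_ball_mc`, the sample size, and the test states are arbitrary choices. For the mixed state $\text{diag}(0.7,0.1,0.1,0.1)$ used below, a short calculation gives the exact value $0.2^{3}=0.008$, well below $\epsilon^{6}$.

```python
import numpy as np


def mu_ball_mc(rho, eps, n_samples=100_000, seed=2):
    """Monte Carlo estimate of mu(B_eps(rho)): the fraction of Haar-random pure
    states psi with ||psi - rho||_tr < eps, where ||X||_tr = (1/2) * sum |eig(X)|."""
    d = rho.shape[0]
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n_samples, d)) + 1j * rng.normal(size=(n_samples, d))
    psi = z / np.linalg.norm(z, axis=1, keepdims=True)
    proj = psi[:, :, None] * psi.conj()[:, None, :]   # |psi><psi|, shape (n, d, d)
    eigs = np.linalg.eigvalsh(proj - rho)              # batched Hermitian eigenvalues
    trace_dist = 0.5 * np.abs(eigs).sum(axis=1)
    return float(np.mean(trace_dist < eps))


d = 4
pure = np.diag([1.0, 0.0, 0.0, 0.0])
mixed = np.diag([0.7, 0.1, 0.1, 0.1])
sigma = np.eye(d) / d
print(mu_ball_mc(pure, 0.5), 0.5 ** 6)    # Eq. (19): should be close to 0.5**6
print(mu_ball_mc(mixed, 0.5), 0.5 ** 6)   # Eq. (29): should not exceed 0.5**6
print(mu_ball_mc(sigma, 0.8))             # eps > 1 - 1/d: every pure state lies in the ball
```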

C.1 Proof of Eq. (28)

Let $\phi:=\arg\min_{\phi'\in\mathbf{P}\left(\mathbb{C}^{d}\right)}\left\|\phi'-\rho\right\|_{\text{tr}}$ be a pure state closest to $\rho$. Then $B_{\epsilon}\left(\rho\right)\subseteq B_{2\epsilon}\left(\phi\right)$, because $\left\|\psi-\phi\right\|_{\text{tr}}\leq\left\|\psi-\rho\right\|_{\text{tr}}+\left\|\phi-\rho\right\|_{\text{tr}}\leq 2\left\|\psi-\rho\right\|_{\text{tr}}<2\epsilon$ for any pure state $\psi\in B_{\epsilon}\left(\rho\right)$. Since $2\epsilon\leq 1$ for $\epsilon\in\left(0,\frac{1}{2}\right]$, Eq. (19) applies and gives $\mu(B_{\epsilon}\left(\rho\right))\leq\mu(B_{2\epsilon}\left(\phi\right))=(2\epsilon)^{2(d-1)}$, which completes the proof.

C.2 Proof of Eq. (29)

Let $\rho=\sum_{i=0}^{d-1}p_{i}|{i}\rangle\langle{i}|$, where $\{|{i}\rangle\}_{i}$ is an orthonormal eigenbasis of $\rho$ and the eigenvalues are arranged in decreasing order, i.e., $p_{0}\geq p_{1}\geq\cdots$. Since $\mu(B_{\epsilon}\left(\rho\right))$ depends only on the eigenvalues of $\rho$, not on its eigenvectors, it is sufficient to consider $\rho$ diagonal in a fixed basis. However, it is difficult to calculate $\mu(B_{\epsilon}\left(\rho\right))$ exactly because of the complicated relationship between $\psi$ and the largest eigenvalue of $\psi-\rho$, which enters through the condition $\epsilon>\left\|\psi-\rho\right\|_{\text{tr}}=\lambda_{\max}(\psi-\rho)$.

We derive a lower bound $f_{\rho}(\psi)$ on $\left\|\psi-\rho\right\|_{\text{tr}}$, where $f_{\rho}$ is a measurable function, and use the relationship $\mu(B_{\epsilon}\left(\rho\right))\leq\mu\left(\{\psi:f_{\rho}(\psi)<\epsilon\}\right)$ to show Eq. (29). Since the simple bound $f_{\rho}(\psi)=1-F\left(\psi,\rho\right)$ is too loose to show Eq. (29), we derive a tighter lower bound as follows: Let $\Pi$ and $\Pi_{\bot}$ be the Hermitian projectors onto a two-dimensional subspace $\mathcal{V}\supseteq\text{span}\left(\{|{0}\rangle,|{\psi}\rangle\}\right)$ and its orthogonal complement, respectively. We then obtain

	$\displaystyle\left\|\psi-\rho\right\|_{\text{tr}}$	$\displaystyle\geq$	$\displaystyle\left\|\Pi(\psi-\rho)\Pi+\Pi_{\bot}(\psi-\rho)\Pi_{\bot}\right\|_{\text{tr}}$
		$\displaystyle=$	$\displaystyle\left\|\psi-\Pi\rho\Pi\right\|_{\text{tr}}+\left\|\Pi_{\bot}\rho\Pi_{\bot}\right\|_{\text{tr}},$		(30)

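where the inequality uses the monotonicity of the trace distance under the CPTP pinching map $X\mapsto\Pi X\Pi+\Pi_{\bot}X\Pi_{\bot}$, and the equality uses $\Pi\psi\Pi=\psi$ and $\Pi_{\bot}\psi\Pi_{\bot}=0$. Define $f_{\rho}(\psi)$ as the right-hand side of Eq. (30), which can be written explicitly as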

f_{\rho}(\psi)=\frac{1}{2}\sqrt{(1+p_{0}-q)^{2}-4(p_{0}-q)|\langle{0}|{\psi}\rangle|^{2}}+\frac{1}{2}(1-p_{0}-q),

(31)

where $q=\langle{0_{\bot}}|\rho|{0_{\bot}}\rangle$ and $\{|{0}\rangle,|{0_{\bot}}\rangle\}$ is an orthonormal basis of $\mathcal{V}$. The explicit formula shows that $f_{\rho}$ is well defined (although neither $\mathcal{V}$ nor $q$ is uniquely determined if $\psi=|{0}\rangle\langle{0}|$, in which case $f_{\rho}(\psi)=1-p_{0}$ regardless of $q$) and continuous, and thus measurable. If $p_{0}\leq 1-\epsilon$, then $\left\|\psi-\rho\right\|_{\text{tr}}\geq 1-\langle{\psi}|\rho|{\psi}\rangle\geq 1-p_{0}\geq\epsilon$ for every pure state $\psi$, so $\mu(B_{\epsilon}\left(\rho\right))=0$ and Eq. (29) holds trivially; hence we consider the case $p_{0}>1-\epsilon$. By further assuming $\epsilon\in\left(0,\frac{1}{2}\right]$, we obtain

f_{\rho}(\psi)<\epsilon\Leftrightarrow|\langle{0}|{\psi}\rangle|^{2}>\frac{(\epsilon+p_{0})(1-q-\epsilon)}{p_{0}-q},

(32)

where this condition is nontrivial, i.e., $\frac{(\epsilon+p_{0})(1-q-\epsilon)}{p_{0}-q}\in(0,1)$. Assuming $d\geq 4$, we calculate an upper bound on $\mu(B_{\epsilon}\left(\rho\right))$ as follows: Defining $\mathbf{U}\left(\mathcal{H}\right)$ as the set of unitary operators on $\mathcal{H}$ and $C:\mathbf{U}\left(\mathbb{C}^{d-1}\right)\rightarrow\mathbf{U}\left(\mathbb{C}^{d}\right)$ as $C(U):=|{0}\rangle\langle{0}|\oplus U$, we can show that for any unitary operator $U\in\mathbf{U}\left(\mathbb{C}^{d-1}\right)$,

			$\displaystyle\mu\left(\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):f_{\rho}(\psi)<\epsilon\}\right)$
			$\displaystyle=\mu\left(\{\psi:f_{\rho}(C(U)\psi C(U)^{\dagger})<\epsilon\}\right)$
			$\displaystyle=\int_{\mathbf{P}\left(\mathbb{C}^{d}\right)}d\mu(\psi)\text{I}\left[|\langle{0}|{\psi}\rangle|^{2}>\frac{(\epsilon+p_{0})(1-q_{U}-\epsilon)}{p_{0}-q_{U}}\right],$		(33)

where we use the unitary invariance of $\mu$ in the first equality and Eq. (32) in the second, with $q_{U}=\langle{0_{\bot}}|C(U)^{\dagger}\rho C(U)|{0_{\bot}}\rangle$; the overlap with $|{0}\rangle$ is unchanged because $C(U)$ leaves $|{0}\rangle$ invariant, and $\text{I}\left[X\right]$ is the indicator function defined in Appendix B. By integrating Eq. (33) with respect to the Haar measure on $\mathbf{U}\left(\mathbb{C}^{d-1}\right)$ and using Fubini’s theorem, we obtain

\mu\left(\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):f_{\rho}(\psi)<\epsilon\}\right)=\int_{\mathbf{P}\left(\mathbb{C}^{d}\right)}d\mu(\psi)\int_{\mathbf{P}\left(\mathbb{C}^{d-1}\right)}d\mu(\phi)\text{I}\left[|\langle{0}|{\psi}\rangle|^{2}>\frac{(\epsilon+p_{0})(1-F\left(\rho,\phi\right)-\epsilon)}{p_{0}-F\left(\rho,\phi\right)}\right],

(34)

where $\phi\in\mathbf{P}\left(\mathbb{C}^{d-1}\right)$ is identified with a pure state in $\mathbf{P}\left(\mathbb{C}^{d}\right)$ supported on the subspace $\text{span}\left(\{|{1}\rangle,\cdots,|{d-1}\rangle\}\right)$, so that $F\left(\rho,\phi\right)=\langle{\phi}|\rho|{\phi}\rangle$ plays the role of $q_{U}$. Using Fubini’s theorem again together with Eq. (19), we can proceed with the calculation:

	$\displaystyle=\int_{\mathbf{P}\left(\mathbb{C}^{d-1}\right)}d\mu(\phi)\delta(F\left(\rho,\phi\right))^{2(d-1)}$
	$\displaystyle=\int_{\mathbf{P}\left(\mathbb{C}^{d-1}\right)}d\mu(\phi)\delta\left(\sum_{i=1}^{d-1}p_{i}|\langle{i}|{\phi}\rangle|^{2}\right)^{2(d-1)}$
	$\displaystyle\leq\int_{\mathbf{P}\left(\mathbb{C}^{d-1}\right)}d\mu(\phi)\delta\left((1-p_{0})|\langle{1}|{\phi}\rangle|^{2}\right)^{2(d-1)}$
	$\displaystyle=(d-2)\int_{0}^{1}(1-x)^{d-3}\delta((1-p_{0})x)^{2(d-1)}dx=:g_{d,\epsilon}(p_{0}),$		(35)

where $\delta(q)=\sqrt{\frac{(\epsilon+q)(p_{0}+\epsilon-1)}{p_{0}-q}}$. In the inequality, we write $\sum_{i=1}^{d-1}p_{i}|\langle{i}|{\phi}\rangle|^{2}$ as a convex combination of the values $(1-p_{0})|\langle{i}|{\phi}\rangle|^{2}$ with weights $\frac{p_{i}}{1-p_{0}}$ and use the convexity of $\delta^{2(d-1)}$ together with the unitary invariance of $\mu$; in the last equality, we use the probability density $(d-2)(1-x)^{d-3}$ of $x=|\langle{1}|{\phi}\rangle|^{2}$, which follows from Eq. (19). To confirm the calculation, we plot a comparison between $\mu(B_{\epsilon}\left(\rho\right))$ and its upper bound $g_{d,\epsilon}(p_{0})$ for a particular $\rho$ in Fig. 3, using the following explicit expression of $g_{4,\epsilon}(p_{0})$:

g_{4,\epsilon}(p_{0})=2(p_{0}+\epsilon-1)^{3}\bigg\{\frac{1-6b-ab^{2}}{2ab^{2}}+\frac{3(a+1)}{a(b-a)}\left(1-\frac{a}{b-a}\log\frac{b}{a}\right)\bigg\},

(36)

where $a=\frac{2p_{0}-1}{\epsilon+p_{0}}$, $b=\frac{p_{0}}{\epsilon+p_{0}}$, and $p_{0}\in(1-\epsilon,1)$. Note that $g_{4,\epsilon}(1)=\epsilon^{6}=\lim_{p_{0}\rightarrow 1}g_{4,\epsilon}(p_{0})$; the expression in Eq. (36) itself is undefined at $p_{0}=1$, where $a=b$.
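As a consistency check, the closed form in Eq. (36) can be compared with a direct numerical evaluation of the integral defining $g_{d,\epsilon}(p_{0})$ in Eq. (35) for $d=4$. The sketch below is our own illustration; it uses a plain composite Simpson rule so that it needs only the standard library, and the function names are hypothetical. The sample points avoid $p_{0}$ very close to $1$, where Eq. (36) suffers from cancellation as $b-a\to 0$.

```python
import math


def g_quadrature(d, eps, p0, n=2000):
    """g_{d,eps}(p0) from Eq. (35), by composite Simpson's rule on [0, 1] (n even)."""
    def delta_sq(q):
        return (eps + q) * (p0 + eps - 1) / (p0 - q)

    def integrand(x):
        return (d - 2) * (1 - x) ** (d - 3) * delta_sq((1 - p0) * x) ** (d - 1)

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3


def g4_closed_form(eps, p0):
    """Eq. (36); valid for p0 in (1 - eps, 1), where 0 < a < b."""
    a = (2 * p0 - 1) / (eps + p0)
    b = p0 / (eps + p0)
    bracket = (1 - 6 * b - a * b ** 2) / (2 * a * b ** 2) + 3 * (a + 1) / (a * (b - a)) * (
        1 - a / (b - a) * math.log(b / a)
    )
    return 2 * (p0 + eps - 1) ** 3 * bracket


for eps, p0 in [(0.5, 0.7), (0.5, 0.9), (0.3, 0.85), (0.1, 0.95)]:
    print(eps, p0, g4_closed_form(eps, p0), g_quadrature(4, eps, p0))
```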

It is sufficient to show that under the two conditions $\epsilon\in\left(0,\frac{1}{2}\right]$ and $d\geq 4$ ,

\forall p_{0}\in(1-\epsilon,1),\frac{dg_{d,\epsilon}}{dp_{0}}\geq 0

(37)

since $g_{d,\epsilon}(1)=\epsilon^{2(d-1)}$ and $g_{d,\epsilon}$ is continuous at $p_{0}=1$. Because the integrand of $g_{d,\epsilon}$ and its partial derivative with respect to $p_{0}$ are continuous, we can interchange differentiation and integration:

	$\displaystyle\frac{dg_{d,\epsilon}}{dp_{0}}$	$\displaystyle=$	$\displaystyle(d-2)\int_{0}^{1}(1-x)^{d-3}\frac{\partial}{\partial p_{0}}\delta((1-p_{0})x)^{2(d-1)}dx$
		$\displaystyle=$	$\displaystyle\alpha_{d,\epsilon}(p_{0})\int_{0}^{1}\beta_{\epsilon}(p_{0},x)\gamma_{d,\epsilon}(p_{0},x)dx,$		(38)

where $\alpha_{d,\epsilon}(p_{0})=(d-2)(d-1)(p_{0}+\epsilon-1)^{d-2}$, $\beta_{\epsilon}(p_{0},x)=-(1-p_{0})^{2}x^{2}+(1-\epsilon-\epsilon^{2}-p_{0}^{2})x+(1-\epsilon)\epsilon$ and $\gamma_{d,\epsilon}(p_{0},x)=\frac{(1-x)^{d-3}(\epsilon+(1-p_{0})x)^{d-2}}{(p_{0}-(1-p_{0})x)^{d}}$. Since $\alpha_{d,\epsilon}$ and $\gamma_{d,\epsilon}$ are non-negative in the entire region $R:=\{(p_{0},x):p_{0}\in(1-\epsilon,1)\wedge x\in[0,1]\}$, we have $\frac{dg_{d,\epsilon}}{dp_{0}}\geq 0$ whenever $\beta_{\epsilon}(p_{0},x)\geq 0$ for all $x\in[0,1]$. Because $\beta_{\epsilon}(p_{0},\cdot)$ is a concave quadratic with $\beta_{\epsilon}(p_{0},0)=(1-\epsilon)\epsilon>0$, it is negative for some $x\in[0,1]$ if and only if $\beta_{\epsilon}(p_{0},1)=2\left(p_{0}(1-p_{0})-\epsilon^{2}\right)<0$. Within $R$, this occurs exactly when $p_{0}\in\left(\frac{1+\sqrt{1-4\epsilon^{2}}}{2},1\right)(\subseteq(1-\epsilon,1))$, so it is sufficient to show $\frac{dg_{d,\epsilon}}{dp_{0}}\geq 0$ for $p_{0}$ in this interval.

For fixed $p^{*}\in\left(\frac{1+\sqrt{1-4\epsilon^{2}}}{2},1\right)$ , let $x^{*}\in(0,1)$ satisfy $\beta_{\epsilon}(p^{*},x^{*})=0$ . Since $\beta_{\epsilon}(p^{*},x)$ is monotonically decreasing in $x\geq 0$ , $x^{*}$ is uniquely defined, $\beta_{\epsilon}(p^{*},x)>0$ if $x\in[0,x^{*})$ and $\beta_{\epsilon}(p^{*},x)<0$ if $x\in(x^{*},1]$ . Thus, showing

\forall d\geq 4,\exists c>0,\left\{\begin{array}[]{ll}\gamma_{d+1,\epsilon}(p^{*},x)\geq c\gamma_{d,\epsilon}(p^{*},x)&\text{for}\ x\in[0,x^{*})\\ \gamma_{d+1,\epsilon}(p^{*},x)\leq c\gamma_{d,\epsilon}(p^{*},x)&\text{for}\ x\in(x^{*},1]\end{array}\right.

(39)

and

\frac{dg_{4,\epsilon}}{dp_{0}}\bigg{|}_{p_{0}=p^{*}}\geq 0,

(40)

is sufficient for $\forall d\geq 4,\frac{dg_{d,\epsilon}}{dp_{0}}\Big{|}_{p_{0}=p^{*}}\geq 0$ by induction on $d$ (note that $\alpha_{d,\epsilon}(p^{*})>0$ and $c>0$), because

$\displaystyle\alpha_{d+1,\epsilon}(p^{*})^{-1}\frac{dg_{d+1,\epsilon}}{dp_{0}}\bigg|_{p_{0}=p^{*}}$	$\displaystyle=$	$\displaystyle\int_{0}^{x^{*}}\beta_{\epsilon}(p^{*},x)\gamma_{d+1,\epsilon}(p^{*},x)dx+\int_{x^{*}}^{1}\beta_{\epsilon}(p^{*},x)\gamma_{d+1,\epsilon}(p^{*},x)dx$
	$\displaystyle\geq$	$\displaystyle c\biggl\{\int_{0}^{x^{*}}\beta_{\epsilon}(p^{*},x)\gamma_{d,\epsilon}(p^{*},x)dx+\int_{x^{*}}^{1}\beta_{\epsilon}(p^{*},x)\gamma_{d,\epsilon}(p^{*},x)dx\biggr\}$
	$\displaystyle=$	$\displaystyle c\,\alpha_{d,\epsilon}(p^{*})^{-1}\frac{dg_{d,\epsilon}}{dp_{0}}\bigg|_{p_{0}=p^{*}}$	(41)

holds for any $d\geq 4.$

First, we show Eq. (39). By observing that for any $d\geq 4$ ,

\gamma_{d+1,\epsilon}(p^{*},x)-c\gamma_{d,\epsilon}(p^{*},x)=\gamma_{d,\epsilon}(p^{*},x)\left(\frac{(1-x)(\epsilon+(1-p^{*})x)}{p^{*}-(1-p^{*})x}-c\right),

(42)

and that $h_{\epsilon,p^{*}}(x):=\frac{(1-x)(\epsilon+(1-p^{*})x)}{p^{*}-(1-p^{*})x}$ is monotonically decreasing in $x\in[\hat{x},1]$ and satisfies $h_{\epsilon,p^{*}}(x)\geq h_{\epsilon,p^{*}}(\hat{x})$ for $x\in[0,\hat{x}]$, where $\hat{x}:=\max\left\{0,1-\frac{2p^{*}-1}{p^{*}(1-p^{*})}\epsilon\right\}(\leq x^{*})$, we find that setting $c=h_{\epsilon,p^{*}}(x^{*})(>0)$ yields Eq. (39), because $\gamma_{d,\epsilon}(p^{*},x)\geq 0$.

Next, Eq. (40) can be verified by using the explicit expression Eq. (36).
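The monotonicity claim in Eq. (37) and the resulting bound $g_{d,\epsilon}(p_{0})\leq\epsilon^{2(d-1)}$ can be spot-checked on a grid of $p_{0}\in(1-\epsilon,1]$ by numerically integrating Eq. (35). The sketch below is our own numerical illustration for a few values of $d$ and $\epsilon$ and is not a substitute for the proof above; the helper names and grid size are hypothetical choices.

```python
def g_quadrature(d, eps, p0, n=2000):
    """g_{d,eps}(p0) from Eq. (35), by composite Simpson's rule on [0, 1] (n even)."""
    def delta_sq(q):
        return (eps + q) * (p0 + eps - 1) / (p0 - q)

    def integrand(x):
        return (d - 2) * (1 - x) ** (d - 3) * delta_sq((1 - p0) * x) ** (d - 1)

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3


def check_upper_bound(d, eps, n_grid=20):
    """Return (g <= eps^(2(d-1)) on the grid, g nondecreasing on the grid)."""
    target = eps ** (2 * (d - 1))
    grid = [1 - eps + eps * k / n_grid for k in range(1, n_grid + 1)]
    values = [g_quadrature(d, eps, p0) for p0 in grid]
    # small relative tolerances absorb quadrature and rounding error
    below = all(v <= target * (1 + 1e-9) for v in values)
    nondecreasing = all(v2 >= v1 * (1 - 1e-9) for v1, v2 in zip(values, values[1:]))
    return below, nondecreasing


print(check_upper_bound(4, 0.5))
```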

Appendix D Proof of Theorem 3 and its sharp lower bound

In this section, we prove Theorem 3 by proving the following theorem, which gives a tighter lower bound. After the proof, we show that this lower bound is sharp by constructing a set $\{\Upsilon_{x}\}_{x}$ of unitary transformations such that $\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}$ is arbitrarily close to its lower bound $\frac{4\beta}{d}\left(1-\frac{\beta}{d}\right)=\frac{4}{d}\left(1-\frac{1}{d}\right)$, while the upper bound given in the theorem is $\alpha=1$ (indeed, each $\Upsilon_{x}$ in the set can be perfectly distinguished from the identity transformation), where $\alpha$ and $\beta$ are defined in the following theorem.

Theorem 4.

For any integer $d\geq 2$, let $\{\Upsilon_{x}\}_{x\in X}$ be a finite set of unitary transformations on $\mathbf{L}\left(\mathbb{C}^{d}\right)$. Then, it holds that

	$\displaystyle\frac{4\beta}{d}\left(1-\frac{\beta}{d}\right)\leq\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}\leq\alpha,$		(43)
	$\displaystyle\alpha=\max_{\Upsilon}\min_{x\in X}\left(\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\right)^{2},\ \ \beta=1-\sqrt{1-\alpha},$		(44)

where the first minimum is taken over probability distributions $p$ on $X$. If $d=2$, both inequalities hold with equality.

The lower bound of Theorem 3 follows from this theorem, since $d\geq 2$ and $\beta\geq 0$:

\frac{4\beta}{d}\left(1-\frac{\beta}{d}\right)\geq\frac{4\beta}{d}\left(1-\frac{\beta}{2}\right)=\frac{2}{d}\alpha.

(45)
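The last equality in Eq. (45) is a short computation from the definition $\beta=1-\sqrt{1-\alpha}$ in Eq. (44); for convenience, it is spelled out below.

```latex
\begin{aligned}
\frac{4\beta}{d}\left(1-\frac{\beta}{2}\right)
  &= \frac{2}{d}\left(2\beta-\beta^{2}\right)
   = \frac{2}{d}\left(1-(1-\beta)^{2}\right) \\
  &= \frac{2}{d}\left(1-\left(\sqrt{1-\alpha}\right)^{2}\right)
   = \frac{2}{d}\,\alpha .
\end{aligned}
```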

In the proof, we use the following fact about the diamond norm: For two CPTP linear mappings $\Gamma$ and $\Lambda$ from $\mathbf{L}\left(\mathcal{H}_{1}\right)$ to $\mathbf{L}\left(\mathcal{H}_{2}\right)$ , we can verify

\frac{1}{2}\left\|\Gamma-\Lambda\right\|_{\diamond}=\max_{M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})}\text{tr}\left[(J(\Gamma-\Lambda))M\right],

(46)

where $J(\Xi):=\sum_{i,j}|{i}\rangle\langle{j}|\otimes\Xi(|{i}\rangle\langle{j}|)\in\mathbf{L}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right)$ is the Choi-Jamiołkowski operator of linear mapping $\Xi$ and $\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2}):=\{M\in\mathbf{Pos}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right):\exists\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right),M\leq\rho\otimes\mathbb{I}\}$ is the set of measuring strategies [19] or that of quantum testers [20].

Proof.

Let $\Upsilon$ and $\Upsilon_{x}$ be unitary transformations from $\mathbf{L}\left(\mathcal{H}_{1}\right)$ to $\mathbf{L}\left(\mathcal{H}_{2}\right)$ , defined as $\Upsilon(\rho)=U\rho U^{\dagger}$ and $\Upsilon_{x}(\rho)=U_{x}\rho U_{x}^{\dagger}$ , respectively.

First, we show

\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}\leq\max_{\Upsilon}\min_{x\in X}\left(\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\right)^{2}.

(47)

By using Eq. (46), we obtain

	$\displaystyle(\text{L.H.S.\ of\ Ineq.\ (47)})$
	$\displaystyle=\max_{\Upsilon}\min_{p}\max_{M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})}\text{tr}\left[J\left(\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right)M\right]$
	$\displaystyle=\max_{\Upsilon}\max_{M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})}\min_{p}\text{tr}\left[J\left(\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right)M\right]$
	$\displaystyle=\max_{M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})}\left\{\max_{\Upsilon}\text{tr}\left[J\left(\Upsilon\right)M\right]-\max_{x\in X}\text{tr}\left[J\left(\Upsilon_{x}\right)M\right]\right\},$		(48)

where we use the minimax theorem in the second equality, since $f(p,M):=\text{tr}\left[J\left(\Upsilon-\sum_{x}p(x)\Upsilon_{x}\right)M\right]$ is affine in each variable and the domains of $p$ and $M$ are compact and convex. It is known [20, Theorem 10] that, for a sufficiently large Hilbert space $\mathcal{H}_{3}$, the set of mappings $\Upsilon\mapsto\text{tr}\left[J\left(\Upsilon\right)M\right]$ associated with quantum testers $M$ coincides with the set of mappings $\Upsilon\mapsto\text{tr}\left[\Upsilon\otimes id_{\mathcal{H}_{3}}(\Phi)\Pi\right]$ associated with pure states $\Phi$ and Hermitian projectors $\Pi$ (to be self-contained, we provide a proof of this equivalence in Appendix G). Hence, we can proceed as follows:

\displaystyle=\max_{\begin{subarray}{c}\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{3}\right)\\ \Pi\in\mathbf{Proj}(\mathcal{H}_{2}\otimes\mathcal{H}_{3})\end{subarray}}\Big{(}\max_{U}\text{tr}\left[(U\otimes\mathbb{I}_{\mathcal{H}_{3}})\Phi(U\otimes\mathbb{I}_{\mathcal{H}_{3}})^{\dagger}\Pi\right]-\max_{x\in X}\text{tr}\left[(U_{x}\otimes\mathbb{I}_{\mathcal{H}_{3}})\Phi(U_{x}\otimes\mathbb{I}_{\mathcal{H}_{3}})^{\dagger}\Pi\right]\Big{)},

(49)

where $\mathbf{Proj}(\mathcal{H})$ is the set of Hermitian projectors on $\mathcal{H}$. Let $\hat{\Phi}$, $\hat{\Pi}$ and $\hat{U}$ maximize Eq. (49). Since not every unitary transformation can be represented as an ensemble of the finite set $\{\Upsilon_{x}\}_{x}$, the maximum in Eq. (49) is strictly positive, so the first term is nonzero at the maximizer, i.e., $\hat{\Pi}(\hat{U}\otimes\mathbb{I}_{\mathcal{H}_{3}})|{\hat{\Phi}}\rangle\neq 0$. Let $\hat{\Psi}$ be the pure state such that $|{\hat{\Psi}}\rangle\propto\hat{\Pi}(\hat{U}\otimes\mathbb{I}_{\mathcal{H}_{3}})|{\hat{\Phi}}\rangle$. Replacing $\hat{\Pi}$ by $\hat{\Psi}$ leaves the first term unchanged at $U=\hat{U}$ (so its maximum over $U$ does not decrease) and cannot increase the second term, since $\hat{\Psi}\leq\hat{\Pi}$; hence Eq. (49) is still maximized. Thus, $\Pi$ in Eq. (49) can be restricted to a pure state, i.e., $\Pi=\Psi\in\mathbf{P}\left(\mathcal{H}_{2}\otimes\mathcal{H}_{3}\right)$, and we proceed as follows:

\displaystyle=\max_{\begin{subarray}{c}\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{3}\right)\\ \Psi\in\mathbf{P}\left(\mathcal{H}_{2}\otimes\mathcal{H}_{3}\right)\end{subarray}}\Big{(}\max_{U}|\langle{\Psi}|U\otimes\mathbb{I}_{\mathcal{H}_{3}}|{\Phi}\rangle|^{2}-\max_{x\in X}|\langle{\Psi}|U_{x}\otimes\mathbb{I}_{\mathcal{H}_{3}}|{\Phi}\rangle|^{2}\Big{)}.

(50)

Before proceeding to the next step, we show that the set of mappings $f_{\Phi,\Psi}:U\mapsto|\langle{\Psi}|U\otimes\mathbb{I}_{\mathcal{H}_{3}}|{\Phi}\rangle|$ associated with pure states $\Phi$ and $\Psi$ coincides with the set of mappings $g_{A}:U\mapsto\left|\text{tr}\left[AU\right]\right|$ associated with linear operators $A$ such that $\left\|A\right\|_{1}\leq 1$, where $\left\|A\right\|_{1}$ is the Schatten $1$-norm of $A$. By using the decompositions $|{\Phi}\rangle=\sum_{i,j}\alpha_{ij}|{i}\rangle|{j}\rangle$ and $|{\Psi}\rangle=\sum_{i,j}\beta_{ij}|{i}\rangle|{j}\rangle$ with respect to orthonormal bases, we can verify that $g_{A}$ with $A=\sum_{i,j,k}\alpha_{ik}\beta^{*}_{jk}|{i}\rangle\langle{j}|$ equals $f_{\Phi,\Psi}$ and $\left\|A\right\|_{1}=\max_{U}g_{A}(U)=\max_{U}f_{\Phi,\Psi}(U)\leq 1$. Conversely, by using the singular value decomposition $A=\sum_{i}p_{i}|{x_{i}}\rangle\langle{y_{i}}|$, where $\left\|A\right\|_{1}\leq 1$ implies $p+\sum_{i}p_{i}=1$ for some $p\geq 0$, we can verify that $f_{\Phi,\Psi}$ with $|{\Phi}\rangle=\sqrt{p}|{0}\rangle|{\bot}\rangle+\sum_{i}\sqrt{p_{i}}|{x_{i}}\rangle|{i}\rangle$ and $|{\Psi}\rangle=\sqrt{p}|{0}\rangle|{\bot^{\prime}}\rangle+\sum_{i}\sqrt{p_{i}}|{y_{i}}\rangle|{i}\rangle$ (where $\{|{i}\rangle\}_{i}\cup\{|{\bot}\rangle,|{\bot^{\prime}}\rangle\}$ is an orthonormal basis) equals $g_{A}$.

By using the equivalence between the two sets of mappings and $\left\|A\right\|_{1}=\max_{U}\left|\text{tr}\left[AU\right]\right|$, we proceed as follows:

	$\displaystyle=\max_{A:\left\|A\right\|_{1}\leq 1}\left(\left\|A\right\|_{1}^{2}-\max_{x\in X}|\text{tr}\left[AU_{x}\right]|^{2}\right)$
	$\displaystyle=\max_{\begin{subarray}{c}V,\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)\\ q\in[0,1]\end{subarray}}q^{2}\left(1-\max_{x\in X}|\text{tr}\left[\rho V^{\dagger}U_{x}\right]|^{2}\right)$
	$\displaystyle=1-\min_{\begin{subarray}{c}V,\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)\end{subarray}}\max_{x\in X}|\text{tr}\left[\rho V^{\dagger}U_{x}\right]|^{2},$		(51)

where we use the polar decomposition $A=q\rho V^{\dagger}$, with $V:\mathcal{H}_{1}\rightarrow\mathcal{H}_{2}$ a unitary operator, in the second equality. On the other hand, since $\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}=\max_{\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1}^{\prime}\right)}\left\|\Upsilon\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi)-\Upsilon_{x}\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi)\right\|_{\text{tr}}=\max_{\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1}^{\prime}\right)}\sqrt{1-F\left(\Upsilon\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi),\Upsilon_{x}\otimes id_{\mathcal{H}_{1}^{\prime}}(\Phi)\right)}=\sqrt{1-\min_{\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1}^{\prime}\right)}|\langle{\Phi}|U^{\dagger}U_{x}\otimes\mathbb{I}_{\mathcal{H}_{1}^{\prime}}|{\Phi}\rangle|^{2}}=\sqrt{1-\min_{\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)}|\text{tr}\left[\rho U^{\dagger}U_{x}\right]|^{2}}$, it holds that

(\text{R.H.S.\ of\ Ineq.\ (47)})=1-\min_{V}\max_{x\in X}\min_{\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)}|\text{tr}\left[\rho V^{\dagger}U_{x}\right]|^{2}.

(52)

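Since $\max_{x}\min_{y}f(x,y)\leq\min_{y}\max_{x}f(x,y)$ for any $f$ whenever the maximum and minimum exist, applying this to the maximization over $x\in X$ and the minimization over $\rho$ and comparing Eqs. (51) and (52), we obtain Ineq. (47).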

Next, we show

\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}\geq\frac{4\beta}{d}\left(1-\frac{\beta}{d}\right),

(53)

where $\beta=1-\sqrt{1-\max_{\Upsilon}\min_{x\in X}\left(\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\right)^{2}}$ . Due to Eq. (52), we can verify that $\beta=1-\min_{V}\max_{x}\min_{\rho}\left|\text{tr}\left[\rho V^{\dagger}U_{x}\right]\right|$ . First, observe that for any unitary operator $W$ on $\mathbb{C}^{d}$ $(d\geq 2)$ ,

\displaystyle\frac{1}{d}\left|\text{tr}\left[W\right]\right|=\frac{1}{d}\left|\sum_{i=1}^{d}\lambda_{i}(W)\right|\leq\frac{2}{d}\min_{p}\left|\sum_{i=1}^{d}p(i)\lambda_{i}(W)\right|+\frac{d-2}{d}=\frac{2}{d}\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\left|\text{tr}\left[\rho W\right]\right|+\frac{d-2}{d}

(54)

holds, where $\lambda_{i}(W)$ is the $i$-th eigenvalue of $W$. In the inequality, we use the following two facts: (i) by a geometric observation, the minimum is attained by some $p$ satisfying $p(i)\leq\frac{1}{2}$ for all $i$; and (ii) for such a probability distribution $p$ and complex numbers $\lambda_{i}\in\{z\in\mathbb{C}:|z|=1\}$, $\left|\sum_{i}p(i)\lambda_{i}\right|\geq\left|\sum_{i}\frac{1}{2}\lambda_{i}\right|-\left|\sum_{i}\left(\frac{1}{2}-p(i)\right)\lambda_{i}\right|\geq\frac{1}{2}\left|\sum_{i}\lambda_{i}\right|-\sum_{i}\left(\frac{1}{2}-p(i)\right)=\frac{1}{2}\left|\sum_{i}\lambda_{i}\right|-\frac{d-2}{2}$. By using this, we obtain

	$\displaystyle\min_{V,\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)}\max_{x\in X}\left|\text{tr}\left[\rho V^{\dagger}U_{x}\right]\right|\leq\min_{V}\max_{x\in X}\left|\text{tr}\left[\frac{\mathbb{I}}{d}V^{\dagger}U_{x}\right]\right|$
	$\displaystyle\leq\frac{2}{d}\min_{V}\max_{x\in X}\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\left|\text{tr}\left[\rho V^{\dagger}U_{x}\right]\right|+\frac{d-2}{d}=\frac{2}{d}(1-\beta)+\frac{d-2}{d}.$		(55)

This inequality and Eq. (51) imply

\displaystyle(\text{L.H.S. of Ineq. (53)})\geq 1-\left(\frac{2}{d}(1-\beta)+\frac{d-2}{d}\right)^{2}=1-\left(1-\frac{2\beta}{d}\right)^{2}=(\text{R.H.S. of Ineq. (53)}).

(56)

This completes the proof.

∎
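A quick numerical sanity check of Eq. (54) is sketched below. This is our own illustration in Python with NumPy, not code from the paper; the helper names `haar_unitary`, `min_abs_convex_comb`, and `eq54_sides` are hypothetical. It evaluates $\min_{p}\left|\sum_{i}p(i)\lambda_{i}\right|$ in closed form, assuming the elementary fact that unit-modulus points whose convex hull avoids the origin lie on an arc of width $w<\pi$, in which case the closest point of the hull to the origin is the midpoint of the chord joining the two extreme points, at distance $\cos(w/2)$.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary: QR of a complex Gaussian matrix, phases fixed."""
    Z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    diag = np.diagonal(R)
    return Q * (diag / np.abs(diag))


def min_abs_convex_comb(eigs):
    """min over probability vectors p of |sum_i p(i) eigs[i]|, assuming |eigs[i]| = 1."""
    theta = np.sort(np.mod(np.angle(eigs), 2 * np.pi))
    gaps = np.diff(np.append(theta, theta[0] + 2 * np.pi))  # includes the wrap-around gap
    width = 2 * np.pi - gaps.max()                # smallest arc containing every eigenvalue
    return max(0.0, float(np.cos(width / 2)))     # 0 when the hull contains the origin


def eq54_sides(W):
    """Left- and right-hand sides of Eq. (54) for a unitary W."""
    lam = np.linalg.eigvals(W)
    d = len(lam)
    lhs = abs(lam.sum()) / d
    rhs = 2 / d * min_abs_convex_comb(lam) + (d - 2) / d
    return lhs, rhs


rng = np.random.default_rng(0)
for d in range(2, 7):
    for _ in range(200):
        lhs, rhs = eq54_sides(haar_unitary(d, rng))
        assert lhs <= rhs + 1e-12
```

For $d=2$ the two sides coincide, because $\frac{1}{2}|\lambda_{1}+\lambda_{2}|$ is exactly the distance from the origin to the chord joining $\lambda_{1}$ and $\lambda_{2}$.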

D.1 Sharpness of the lower bound

We show that Ineq. (53) is tight when we approximate arbitrary unitary transformations by using the restricted set $\{\Lambda:\Lambda(\rho)=W\rho W^{\dagger},W\in\mathbf{W}\}$ of unitary transformations, where

\displaystyle\mathbf{W}:=\left\{W:W\text{ is a unitary operator on }\mathbb{C}^{d}\text{ such that the convex hull of the eigenvalues of }W\text{ contains }0\right\}.

(57)

More precisely, since $\{\Upsilon_{x}\}_{x}$ is assumed to be a finite set, we show that Ineq. (53) becomes tight in the limit $\epsilon\rightarrow 0$, where $\{\Upsilon_{x}:\Upsilon_{x}(\rho)=U_{x}\rho U_{x}^{\dagger}\}_{x}$ with $U_{x}\in\mathbf{W}$ is an $\epsilon$-covering of $\{\Lambda:\Lambda(\rho)=W\rho W^{\dagger},W\in\mathbf{W}\}$, i.e., $\max_{\Lambda}\min_{x}\left\|\Lambda-\Upsilon_{x}\right\|_{\diamond}<\epsilon$. This can be shown by the following two observations.

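First, for any $x\in X$, there exists a pure state $\phi_{x}\in\mathbf{P}\left(\mathbb{C}^{d}\right)$ such that $\text{tr}\left[\phi_{x}U_{x}\right]=0$, since $0$ is contained in the convex hull of the eigenvalues of $U_{x}$: if $0=\sum_{i}p(i)\lambda_{i}$ for the eigenvalues $\lambda_{i}$ of $U_{x}$ with eigenvectors $|e_{i}\rangle$, one can take $|\phi_{x}\rangle=\sum_{i}\sqrt{p(i)}|e_{i}\rangle$. Then, we obtain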

	$\displaystyle\max_{\Upsilon}\min_{x\in X}\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\geq\min_{x\in X}\frac{1}{2}\left\|id-\Upsilon_{x}\right\|_{\diamond}\geq\min_{x\in X}\left\|\phi_{x}-\Upsilon_{x}(\phi_{x})\right\|_{\text{tr}}$
	$\displaystyle=\min_{x\in X}\sqrt{1-F\left(\phi_{x},U_{x}\phi_{x}U_{x}^{\dagger}\right)}=\min_{x\in X}\sqrt{1-|\text{tr}\left[\phi_{x}U_{x}\right]|^{2}}=1.$		(58)

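Since $\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}\leq 1$ for any pair of channels, Ineq. (58) implies $\max_{\Upsilon}\min_{x\in X}\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}=1$, and hence $\beta=1$ in Ineq. (53).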
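The construction of $\phi_{x}$ can be illustrated on a concrete example. The NumPy sketch below is our own illustration; the example operator and the helper names `probe_state` and `trace_distance` are assumptions. It builds a qutrit unitary $U_{x}\in\mathbf{W}$ with eigenvalues $1,\omega,\omega^{2}$ ($\omega=e^{2\pi i/3}$) in a random eigenbasis, forms $|\phi_{x}\rangle=\sum_{i}\sqrt{p(i)}|e_{i}\rangle$ with uniform weights, and checks that $\text{tr}\left[\phi_{x}U_{x}\right]=0$, so that $\phi_{x}$ and $U_{x}\phi_{x}U_{x}^{\dagger}$ are at trace distance 1, as in Ineq. (58).

```python
import numpy as np


def probe_state(eigvecs, p):
    """|phi> = sum_i sqrt(p_i)|e_i>, so that <phi|W|phi> = sum_i p_i lambda_i."""
    return eigvecs @ np.sqrt(p)


def trace_distance(rho, sigma):
    """Normalised trace distance (1/2)||rho - sigma||_1 of Hermitian matrices."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()


rng = np.random.default_rng(1)
d = 3
omega = np.exp(2j * np.pi / 3)
lam = np.array([1, omega, omega**2])           # sum is 0, so 0 lies in the convex hull
Z = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Q, _ = np.linalg.qr(Z)                         # random orthonormal eigenbasis (columns)
U_x = Q @ np.diag(lam) @ Q.conj().T            # an element of the set W in Eq. (57)

p = np.full(d, 1 / d)                          # weights with sum_i p_i lambda_i = 0
phi = probe_state(Q, p)
Phi = np.outer(phi, phi.conj())
overlap = phi.conj() @ U_x @ phi               # tr[phi_x U_x]
dist = trace_distance(Phi, U_x @ Phi @ U_x.conj().T)
```

For a general $U_{x}\in\mathbf{W}$, the weights $p$ would have to be found first (for example by a small linear program); the uniform choice works here only because $1+\omega+\omega^{2}=0$.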

Second, by using Eq. (51), we obtain

\displaystyle\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}=1-\min_{V,\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{x\in X}\left|\text{tr}\left[\rho V^{\dagger}U_{x}\right]\right|^{2}<1-\min_{V,\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{W\in\mathbf{W}}\left|\text{tr}\left[\rho V^{\dagger}W\right]\right|^{2}+\epsilon,

(59)

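where in the inequality, we use the fact that for any $W\in\mathbf{W}$, there exists $\Upsilon_{x}$ such that $\epsilon>\frac{1}{2}\left\|\Lambda-\Upsilon_{x}\right\|_{\diamond}\geq\text{tr}\left[\Phi(V^{\dagger}W\otimes\mathbb{I})\Phi(W^{\dagger}V\otimes\mathbb{I})\right]-\text{tr}\left[\Phi(V^{\dagger}U_{x}\otimes\mathbb{I})\Phi(U_{x}^{\dagger}V\otimes\mathbb{I})\right]=\left|\text{tr}\left[\rho V^{\dagger}W\right]\right|^{2}-\left|\text{tr}\left[\rho V^{\dagger}U_{x}\right]\right|^{2}$, where $\Lambda(\rho)=W\rho W^{\dagger}$ and $\Phi$ is a purification of $\rho$. Here, the second inequality holds because the two traces are the expectation values of the projector $(V\otimes\mathbb{I})\Phi(V^{\dagger}\otimes\mathbb{I})$ in the states $\Lambda\otimes id(\Phi)$ and $\Upsilon_{x}\otimes id(\Phi)$, and their difference is bounded by the trace distance between these states. Since $\epsilon$ can be an arbitrarily small positive number, showing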

\min_{V,\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{W\in\mathbf{W}}\left|\text{tr}\left[\rho V^{\dagger}W\right]\right|\geq 1-\frac{2}{d}

(60)

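is sufficient to prove the sharpness of Ineq. (53). Indeed, squaring Ineq. (60) (note that $1-\frac{2}{d}\geq 0$) and substituting it into Ineq. (59) gives $\max_{\Upsilon}\min_{p}\frac{1}{2}\left\|\Upsilon-\sum_{x\in X}p(x)\Upsilon_{x}\right\|_{\diamond}<1-\left(1-\frac{2}{d}\right)^{2}+\epsilon=\frac{4}{d}\left(1-\frac{1}{d}\right)+\epsilon$, and $\frac{4}{d}\left(1-\frac{1}{d}\right)$ is the R.H.S. of Ineq. (53) with $\beta=1$.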
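The two ingredients of the inequality in Ineq. (59), namely the purification identity $\text{tr}\left[\Phi(A\otimes\mathbb{I})\Phi(A^{\dagger}\otimes\mathbb{I})\right]=\left|\text{tr}\left[\rho A\right]\right|^{2}$ and the bound on the difference of expectation values of the projector $(V\otimes\mathbb{I})\Phi(V^{\dagger}\otimes\mathbb{I})$ by the trace distance, can be checked numerically. The NumPy sketch below does so for random $\rho$, $V$, $W$, and $U_{x}$; the unitaries are generic rather than restricted to $\mathbf{W}$, which should be enough because neither step uses that restriction. The helper names are hypothetical.

```python
import numpy as np


def random_unitary(d, rng):
    """Unitary from the QR decomposition of a complex Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
    return Q


def trace_distance(rho, sigma):
    """Normalised trace distance (1/2)||rho - sigma||_1 of Hermitian matrices."""
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()


rng = np.random.default_rng(2)
d = 3
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real

# Purification |Phi> = sum_i sqrt(p_i)|e_i>|i> of rho on C^d (x) C^d
p, E = np.linalg.eigh(rho)
phi = sum(np.sqrt(max(p[i], 0.0)) * np.kron(E[:, i], np.eye(d)[i]) for i in range(d))
Phi = np.outer(phi, phi.conj())

V, W, U_x = (random_unitary(d, rng) for _ in range(3))


def lift(A):
    """A (x) I, i.e. A acting on the first tensor factor only."""
    return np.kron(A, np.eye(d))


proj = lift(V) @ Phi @ lift(V).conj().T          # projector (V (x) I) Phi (V^dag (x) I)
out_W = lift(W) @ Phi @ lift(W).conj().T         # (Lambda (x) id)(Phi)
out_U = lift(U_x) @ Phi @ lift(U_x).conj().T     # (Upsilon_x (x) id)(Phi)
gain = np.trace(proj @ out_W).real - np.trace(proj @ out_U).real
```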

For any unitary operator $V$, there exist $\{W_{i}\in\mathbf{W}\}_{i=1}^{d}$ such that $V,W_{1},\cdots,W_{d}$ are simultaneously diagonalizable and, for each $i$, the $j$-th eigenvalues of $V$ and $W_{i}$ coincide for all $j\neq i$ (e.g., obtain $W_{i}$ from $V$ by replacing its $i$-th eigenvalue with minus one of its other eigenvalues, so that the convex hull of the eigenvalues of $W_{i}$ contains $0$). Writing $V^{\dagger}W_{i}=\mathbb{I}-\left(1-z_{i}\right)|{i}\rangle\langle{i}|$, where the complex number $z_{i}$ satisfies $|z_{i}|=1$ and $\{|{i}\rangle\}_{i=1}^{d}$ is the common orthonormal eigenbasis, we can proceed as follows: for any unitary operator $V$,

	$\displaystyle\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{W\in\mathbf{W}}\left|\text{tr}\left[\rho V^{\dagger}W\right]\right|\geq\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{1\leq i\leq d}\left|\text{tr}\left[\rho\left(\mathbb{I}-\left(1-z_{i}\right)|{i}\rangle\langle{i}|\right)\right]\right|$
	$\displaystyle\geq\min_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\max_{1\leq i\leq d}\left(1-\left|1-z_{i}\right|\langle{i}|\rho|{i}\rangle\right)\geq 1-2\max_{\rho\in\mathbf{S}\left(\mathbb{C}^{d}\right)}\min_{1\leq i\leq d}\langle{i}|\rho|{i}\rangle\geq 1-\frac{2}{d}.$		(61)

This completes the proof.
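Ineq. (61) can be spot-checked in the common eigenbasis of $V$, where $V$ is diagonal. The NumPy sketch below is our own illustration: it uses one hypothetical choice of $W_{1},\cdots,W_{d}$ (the $i$-th eigenvalue of $V$ is replaced by minus the next eigenvalue) and evaluates $\max_{i}\left|\text{tr}\left[\rho V^{\dagger}W_{i}\right]\right|$ on random states $\rho$. Random sampling only probes the minimum over $\rho$, so this illustrates rather than proves the bound $1-\frac{2}{d}$.

```python
import numpy as np


def random_state(d, rng):
    """Random full-rank density matrix G G^dag / tr[G G^dag]."""
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real


def w_family(v):
    """Hypothetical choice of W_1, ..., W_d for V = diag(v) with |v_j| = 1.

    W_i keeps every eigenvalue of V except the i-th, which is replaced by
    -v[(i + 1) % d]; W_i then has both v[(i + 1) % d] and its negative as
    eigenvalues, so 0 lies in the convex hull of its eigenvalues.
    """
    d = len(v)
    Ws = []
    for i in range(d):
        w = v.copy()
        w[i] = -v[(i + 1) % d]
        Ws.append(np.diag(w))
    return Ws


rng = np.random.default_rng(3)
d = 4
v = np.exp(2j * np.pi * rng.random(d))    # eigenvalues of V in its eigenbasis
V = np.diag(v)
Ws = w_family(v)

# Smallest observed value of max_i |tr[rho V^dag W_i]| over random states rho
worst = min(
    max(abs(np.trace(rho @ V.conj().T @ W)) for W in Ws)
    for rho in (random_state(d, rng) for _ in range(2000))
)
```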

Appendix E The operator norm and the diamond norm of unitary transformations

Suppose $\delta\in[0,\sqrt{2}]$ . In this section, we show that $\left\|\Upsilon_{1}-\Upsilon_{2}\right\|_{\diamond}<\delta\sqrt{4-\delta^{2}}$ if $\left\|U_{1}-U_{2}\right\|_{\infty}<\delta$ , where $\Upsilon_{i}(\rho):=U_{i}\rho U_{i}^{\dagger}$ is a unitary transformation on $\mathbf{L}\left(\mathcal{H}\right)$ .

By the unitary invariance of the diamond norm and the operator norm, it suffices to show that $\left\|id-\Upsilon\right\|_{\diamond}<\delta\sqrt{4-\delta^{2}}$ if $\left\|\mathbb{I}-U\right\|_{\infty}<\delta$, where $\Upsilon(\rho):=U\rho U^{\dagger}$. Let $\lambda_{i}\in\{z\in\mathbb{C}:|z|=1\}$ be the $i$-th eigenvalue of $U$. Then, we obtain

\displaystyle\delta>\left\|\mathbb{I}-U\right\|_{\infty}=\max_{i}\{|1-\lambda_{i}|\}.

(62)

On the other hand, by an observation similar to the one used to derive Eq. (52), we obtain

\displaystyle\left\|id-\Upsilon\right\|_{\diamond}=2\sqrt{1-\min_{\rho\in\mathbf{S}\left(\mathcal{H}\right)}|\text{tr}\left[\rho U\right]|^{2}}=2\sqrt{1-\min_{p}\left|\sum_{i}p(i)\lambda_{i}\right|^{2}}.

(63)

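By using an elementary geometric observation shown in Fig. 4, we obtain $\min_{p}\left|\sum_{i}p(i)\lambda_{i}\right|>\epsilon:=\sqrt{1-\frac{\delta^{2}}{4}\left(4-\delta^{2}\right)}=1-\frac{\delta^{2}}{2}$. Indeed, writing $\theta_{i}:=\arg\lambda_{i}\in(-\pi,\pi]$, Eq. (62) gives $|1-\lambda_{i}|=2|\sin(\theta_{i}/2)|<\delta$, i.e., $|\theta_{i}|<2\arcsin(\delta/2)\leq\pi/2$, so that $\text{Re}\,\lambda_{i}=\cos\theta_{i}>\cos\left(2\arcsin(\delta/2)\right)=1-\frac{\delta^{2}}{2}$; the same bound then holds for the real part, and hence for the modulus, of any convex combination $\sum_{i}p(i)\lambda_{i}$. Substituting this into Eq. (63) yields $\left\|id-\Upsilon\right\|_{\diamond}<2\sqrt{1-\epsilon^{2}}=\delta\sqrt{4-\delta^{2}}$. This completes the proof.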
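The bound of this appendix can be probed numerically. The NumPy sketch below is our own illustration with hypothetical helper names: it evaluates $\left\|id-\Upsilon\right\|_{\diamond}$ through Eq. (63), using the closed-form distance from the origin to the convex hull of unit-modulus eigenvalues (the smallest arc of width $w$ containing them gives $\cos(w/2)$ when $w<\pi$, and $0$ otherwise), and compares it with $\delta\sqrt{4-\delta^{2}}$ at $\delta=\left\|\mathbb{I}-U\right\|_{\infty}$, i.e., the non-strict form of the statement, for random unitaries whose eigenphases lie in $(-\pi/2,\pi/2)$ so that $\delta\leq\sqrt{2}$.

```python
import numpy as np


def min_abs_convex_comb(eigs):
    """Distance from 0 to the convex hull of unit-modulus points eigs."""
    theta = np.sort(np.mod(np.angle(eigs), 2 * np.pi))
    gaps = np.diff(np.append(theta, theta[0] + 2 * np.pi))
    width = 2 * np.pi - gaps.max()
    return max(0.0, float(np.cos(width / 2)))


def diamond_dist_to_identity(U):
    """||id - Upsilon||_diamond via Eq. (63): 2 sqrt(1 - min_p |sum_i p(i) lambda_i|^2)."""
    m = min_abs_convex_comb(np.linalg.eigvals(U))
    return 2 * np.sqrt(max(0.0, 1 - m**2))


def random_unitary_near_identity(d, theta_max, rng):
    """U = Q diag(e^{i theta_j}) Q^dag with |theta_j| < theta_max (<= pi/2)."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
    theta = rng.uniform(-theta_max, theta_max, size=d)
    return Q @ np.diag(np.exp(1j * theta)) @ Q.conj().T


rng = np.random.default_rng(4)
for _ in range(500):
    d = int(rng.integers(2, 6))
    U = random_unitary_near_identity(d, rng.uniform(0, np.pi / 2), rng)
    delta = np.linalg.norm(np.eye(d) - U, 2)       # operator norm ||I - U||_inf
    assert delta <= np.sqrt(2) + 1e-12
    assert diamond_dist_to_identity(U) <= delta * np.sqrt(4 - delta**2) + 1e-9
```

For $U=\text{diag}(e^{i\theta},e^{-i\theta})$ with $0<\theta<\pi/2$, both sides equal $2\sin\theta$, so the non-strict bound is attained.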

Appendix F Circuit synthesis of single qubit unitary transformations by using gate set $\{S,H,T\}$

Recall that $n(\epsilon)$ is the smallest length of gate sequences over the gate set $\{S,H,T\}$ that approximate arbitrary single-qubit unitary transformations within accuracy $\epsilon$, i.e., $n(\epsilon):=\min\left\{n\in\mathbb{N}:\max_{\Upsilon}\min_{x}\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}^{(n)}\right\|_{\diamond}<\epsilon\right\}$, where $\{\Upsilon_{x}^{(n)}\}_{x}$ is the set of unitary transformations realized by gate sequences of length at most $n$. In this section, we numerically estimate $n(\epsilon)$ and show that, owing to Theorem 3, the probabilistic implementation reduces the length of gate sequences.

First, we generate $\{\Upsilon_{x}^{(n)}\}_{x}$. Second, we randomly sample $2\times 10^{4}$ single-qubit unitary transformations according to the Haar measure on the unitary group on $\mathbb{C}^{2}$. Third, for each sampled unitary transformation $\Upsilon$, we calculate $\hat{\epsilon}(\Upsilon)$, half of the minimum diamond-norm distance between $\Upsilon$ and the elements of $\{\Upsilon_{x}^{(n)}\}_{x}$. Finally, we compute the maximum value $\hat{\epsilon}$ of $\hat{\epsilon}(\Upsilon)$ over all sampled $\Upsilon$. Since another numerical experiment indicates that, with high probability, the $2\times 10^{4}$ randomly sampled single-qubit unitary transformations form a $0.1$-covering of the set of all single-qubit unitary transformations, we assume that gate sequences of length at most $n$ approximate arbitrary single-qubit unitary transformations within accuracy $\hat{\epsilon}$. (The true accuracy $\epsilon$ satisfies $\hat{\epsilon}\leq\epsilon<\hat{\epsilon}+0.1$.)

In Fig. 5, we plot $n(\epsilon)$ and $n(\sqrt{\epsilon})$, which correspond to the minimum lengths of gate sequences needed to synthesize single-qubit unitary transformations within accuracy $\epsilon$ by the deterministic and the probabilistic implementation, respectively; the latter follows from Theorem 3, since probabilistic mixtures of sequences that achieve accuracy $\sqrt{\epsilon}$ deterministically achieve accuracy $\epsilon$. The graph shows that the probabilistic implementation reduces the circuit size over a wide range of accuracies $\epsilon$.
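The procedure above can be reproduced at a much smaller scale. The NumPy sketch below is our own illustration, not the authors' code, and does not attempt to reproduce Fig. 5: it enumerates all words over $\{S,H,T\}$ of length at most $n\leq 5$, samples only 200 Haar-random targets instead of $2\times 10^{4}$, and computes $\hat{\epsilon}$ for each $n$. For qubits, $\frac{1}{2}\left\|\Upsilon-\Upsilon_{x}\right\|_{\diamond}=\sqrt{1-\min_{\rho}\left|\text{tr}\left[\rho U^{\dagger}U_{x}\right]\right|^{2}}$ reduces to $\sin(\Delta/2)$, where $\Delta\in[0,\pi]$ is the angle between the two eigenvalues of $U^{\dagger}U_{x}$. The hypothetical helper `n_of(t)` returns the smallest tested $n$ with $\hat{\epsilon}<t$, so `n_of(np.sqrt(t))` plays the role of $n(\sqrt{\epsilon})$.

```python
import itertools

import numpy as np

# Standard matrices of the gates S, H, T
S = np.diag([1, 1j])
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])


def sequences_up_to(n):
    """Unitaries of all gate words over {S, H, T} of length 1..n (duplicates kept)."""
    mats = []
    for length in range(1, n + 1):
        for word in itertools.product((S, H, T), repeat=length):
            M = np.eye(2, dtype=complex)
            for g in word:
                M = g @ M
            mats.append(M)
    return np.array(mats)


def haar_unitary(rng):
    """Haar-random 2x2 unitary (QR of a complex Gaussian matrix, phases fixed)."""
    Z = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    diag = np.diagonal(R)
    return Q * (diag / np.abs(diag))


def half_diamond(U, Vs):
    """(1/2)||U(.)U^dag - V(.)V^dag||_diamond for every V in the stack Vs."""
    lam = np.linalg.eigvals(U.conj().T @ Vs)                 # shape (len(Vs), 2)
    delta = np.abs(np.angle(lam[:, 0] * lam[:, 1].conj()))  # eigenvalue angle in [0, pi]
    return np.sin(delta / 2)


def eps_hat(n, samples):
    """Max over sampled targets of the min distance to a word of length <= n."""
    Vs = sequences_up_to(n)
    return max(half_diamond(U, Vs).min() for U in samples)


rng = np.random.default_rng(0)
samples = [haar_unitary(rng) for _ in range(200)]
eps = {n: eps_hat(n, samples) for n in range(1, 6)}


def n_of(target):
    """Smallest tested n with eps_hat(n) < target, or None if none qualifies."""
    return next((n for n in sorted(eps) if eps[n] < target), None)


# Deterministic versus probabilistic length for a target accuracy of 0.5
n_det, n_prob = n_of(0.5), n_of(np.sqrt(0.5))
```

Because the same targets are reused for every $n$ and the word sets are nested, $\hat{\epsilon}$ is non-increasing in $n$, and `n_of(np.sqrt(t))` never exceeds `n_of(t)`.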

Appendix G Equivalence between quantum testers and quantum networks

Recall that the Choi-Jamiołkowski operator of a linear map $\Xi:\mathbf{L}\left(\mathcal{H}_{1}\right)\rightarrow\mathbf{L}\left(\mathcal{H}_{2}\right)$ is defined as $J(\Xi):=\sum_{i,j}|{i}\rangle\langle{j}|\otimes\Xi(|{i}\rangle\langle{j}|)\in\mathbf{L}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right)$, and the set of quantum testers is defined as $\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2}):=\{M\in\mathbf{Pos}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right):\exists\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right),M\leq\rho\otimes\mathbb{I}\}$. In this section, we show that the set of maps $f_{M}:\Xi\mapsto\text{tr}\left[J\left(\Xi\right)M\right]$ associated with quantum testers $M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})$ coincides with the set of maps $g_{\Phi,\Pi}:\Xi\mapsto\text{tr}\left[\Xi\otimes id_{\mathcal{H}_{3}}(\Phi)\Pi\right]$ associated with pure states $\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{3}\right)$ and Hermitian projectors $\Pi\in\mathbf{Proj}(\mathcal{H}_{2}\otimes\mathcal{H}_{3})$, for a Hilbert space $\mathcal{H}_{3}$ of sufficiently large dimension. Note that a proof for more general quantum testers is given in [20, Theorem 10].

First, we show that for any $\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{3}\right)$ and $\Pi\in\mathbf{Proj}(\mathcal{H}_{2}\otimes\mathcal{H}_{3})$, there exists $M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})$ such that $f_{M}=g_{\Phi,\Pi}$. Letting $M=\text{tr}_{3}\left[(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes\Pi)\right]$, we obtain

\displaystyle g_{\Phi,\Pi}(\Xi)=\text{tr}\left[\Xi\otimes id_{\mathcal{H}_{3}}(\Phi)\Pi\right]=\text{tr}\left[(J(\Xi)\otimes\mathbb{I}_{3})(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes\Pi)\right]=\text{tr}\left[J(\Xi)M\right]=f_{M}(\Xi),

(64)

where $\Phi^{T_{1}}$ and $\text{tr}_{3}\left[\cdot\right]$ denote the partial transpose of $\Phi$ on $\mathcal{H}_{1}$ and the partial trace over $\mathcal{H}_{3}$, respectively, and subscripts indicate the systems on which operators act. We can also verify that $M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})$ as follows. Let $X=\sum_{ij}\alpha_{ij}|{j}\rangle_{3}\langle{i}|_{1}$, where $|{\Phi}\rangle=\sum_{ij}\alpha_{ij}|{i}\rangle_{1}|{j}\rangle_{3}$ with the computational bases $\{|{i}\rangle_{1}\}_{i}$ of $\mathcal{H}_{1}$ and $\{|{j}\rangle_{3}\}_{j}$ of $\mathcal{H}_{3}$. Then, for any positive semidefinite operator $P\in\mathbf{Pos}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{2}\right)$, we obtain

\text{tr}\left[PM\right]=\text{tr}\left[(P\otimes\mathbb{I}_{3})(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes\Pi)\right]=\text{tr}\left[(X\otimes\mathbb{I}_{2})P(X\otimes\mathbb{I}_{2})^{\dagger}\Pi\right]\geq 0,

(65)

which implies $M\geq 0$ . By letting $\rho=\text{tr}_{3}\left[\Phi^{T_{1}}\right]=\text{tr}_{3}\left[\Phi^{T}\right](\in\mathbf{S}\left(\mathcal{H}_{1}\right))$ , we can also verify that

\displaystyle\rho\otimes\mathbb{I}_{2}-M=\text{tr}_{3}\left[(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{123}-\mathbb{I}_{1}\otimes\Pi)\right]=\text{tr}_{3}\left[(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes\Pi_{\bot})\right]\geq 0,

(66)

where $\Pi_{\bot}\in\mathbf{Proj}(\mathcal{H}_{2}\otimes\mathcal{H}_{3})$ satisfies $\Pi+\Pi_{\bot}=\mathbb{I}$, and the last inequality follows by the same argument as Eq. (65), with $\Pi$ replaced by $\Pi_{\bot}$.
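The first direction can be checked numerically. The NumPy sketch below is our own illustration (dimensions, seed, and helper names are assumptions): it draws a random linear map $\Xi$, a random pure state $\Phi$ on $\mathcal{H}_{1}\otimes\mathcal{H}_{3}$, and a random rank-3 projector $\Pi$ on $\mathcal{H}_{2}\otimes\mathcal{H}_{3}$, builds $M=\text{tr}_{3}\left[(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes\Pi)\right]$ by index contraction, and checks $f_{M}(\Xi)=g_{\Phi,\Pi}(\Xi)$ together with $M\geq 0$ and $\rho\otimes\mathbb{I}_{2}-M\geq 0$. Any linear map would do for the identity; a completely positive map in Kraus form is used only for convenience.

```python
import numpy as np

rng = np.random.default_rng(5)
d1, d2, d3 = 2, 2, 3


def crandn(*shape):
    """Complex Gaussian array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


# Random linear map Xi: L(H1) -> L(H2) in Kraus form
kraus = [crandn(d2, d1) for _ in range(2)]


def xi(A):
    return sum(K @ A @ K.conj().T for K in kraus)


# Random pure state Phi on H1 (x) H3 and rank-3 projector Pi on H2 (x) H3
psi = crandn(d1 * d3)
psi /= np.linalg.norm(psi)
Phi = np.outer(psi, psi.conj())
Qp, _ = np.linalg.qr(crandn(d2 * d3, 3))
Pi = Qp @ Qp.conj().T

# g_{Phi,Pi}(Xi) = tr[(Xi (x) id_3)(Phi) Pi]
out = sum(np.kron(K, np.eye(d3)) @ Phi @ np.kron(K, np.eye(d3)).conj().T for K in kraus)
g = np.trace(out @ Pi)

# Choi operator as a tensor J[i, b, j, B] = Xi(|i><j|)[b, B], ordering (1, 2)
J = np.zeros((d1, d2, d1, d2), dtype=complex)
for i in range(d1):
    for j in range(d1):
        Eij = np.zeros((d1, d1))
        Eij[i, j] = 1
        J[i, :, j, :] = xi(Eij)

# M = tr_3[(Phi^{T_1} (x) I_2)(I_1 (x) Pi)] as a tensor M[a, b, A, B]
PhiT1 = Phi.reshape(d1, d3, d1, d3).transpose(2, 1, 0, 3)   # partial transpose on H1
M = np.einsum('acAe,beBc->abAB', PhiT1, Pi.reshape(d2, d3, d2, d3))
f = np.einsum('ibjB,jBib->', J, M)                           # tr[J(Xi) M]

# Tester conditions: M >= 0 and rho (x) I_2 - M >= 0 with rho = tr_3[Phi^{T_1}]
rho = np.einsum('acAc->aA', PhiT1)
M_mat = M.reshape(d1 * d2, d1 * d2)
gap = np.kron(rho, np.eye(d2)) - M_mat
```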

Next, we show that for any $M\in\mathbf{T}(\mathcal{H}_{1}:\mathcal{H}_{2})$, there exist $\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{3}\right)$ and $\Pi\in\mathbf{Proj}(\mathcal{H}_{2}\otimes\mathcal{H}_{3})$ such that $f_{M}=g_{\Phi,\Pi}$, as follows. Let $\rho_{1}\in\mathbf{S}\left(\mathcal{H}_{1}\right)$ satisfy $M\leq\rho_{1}\otimes\mathbb{I}_{2}$, let $\Phi\in\mathbf{P}\left(\mathcal{H}_{1}\otimes\mathcal{H}_{1^{\prime}}\right)$ be a purification of $\rho_{1}^{T}$ with singular value decomposition $|{\Phi}\rangle=\sum_{i}\sqrt{p(i)}|{x_{i}}\rangle_{1}|{y_{i}}\rangle_{1^{\prime}}$ ($p(i)>0$), and let $P=XMX^{\dagger}\in\mathbf{Pos}\left(\mathcal{H}_{2}\otimes\mathcal{H}_{1^{\prime}}\right)$, where $X=\sum_{i}\frac{1}{\sqrt{p(i)}}|{y_{i}}\rangle_{1^{\prime}}\langle{x_{i}^{*}}|_{1}$ acts on $\mathcal{H}_{1}$ and $|{\phi^{*}}\rangle$ denotes the complex conjugate of $|{\phi}\rangle$. Then, we can verify that

f_{M}(\Xi)=\text{tr}\left[J(\Xi)M\right]=\text{tr}\left[(J(\Xi)\otimes\mathbb{I}_{1^{\prime}})(\Phi^{T_{1}}\otimes\mathbb{I}_{2})(\mathbb{I}_{1}\otimes P)\right]=\text{tr}\left[\Xi\otimes id_{\mathcal{H}_{1^{\prime}}}(\Phi)P\right].

(67)

Since $P\leq X(\rho_{1}\otimes\mathbb{I}_{2})X^{\dagger}\leq\mathbb{I}_{21^{\prime}}$, $\{P,\mathbb{I}-P\}$ is a positive operator-valued measure (POVM), which can be embedded in a larger Hilbert space $\mathcal{H}_{2}\otimes\mathcal{H}_{3}$ as a projection-valued measure (PVM) $\{\Pi,\Pi_{\bot}\}$ by Naimark's dilation theorem. This completes the proof.
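The converse construction can likewise be checked numerically. The NumPy sketch below is our own illustration (dimensions, seed, and names are assumptions): it builds a tester $M=(\sqrt{\rho_{1}}\otimes\mathbb{I})E(\sqrt{\rho_{1}}\otimes\mathbb{I})$ with $0\leq E\leq\mathbb{I}$, so that $M\leq\rho_{1}\otimes\mathbb{I}_{2}$, takes $|y_{i}\rangle$ to be the computational basis of $\mathcal{H}_{1^{\prime}}$, forms $X$ and $P$ as in the text, and checks Eq. (67) and $0\leq P\leq\mathbb{I}$. The Naimark extension of $\{P,\mathbb{I}-P\}$ is not implemented.

```python
import numpy as np

rng = np.random.default_rng(6)
d1, d2 = 2, 3


def crandn(*shape):
    """Complex Gaussian array of the given shape."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


# rho1 mixed with I/d1 so that all p(i) > 0 and X is well conditioned
G = crandn(d1, d1)
rho1 = 0.5 * G @ G.conj().T / np.trace(G @ G.conj().T).real + 0.5 * np.eye(d1) / d1
w, v = np.linalg.eigh(rho1)
Sq = np.kron(v @ np.diag(np.sqrt(w)) @ v.conj().T, np.eye(d2))   # sqrt(rho1) (x) I_2

# Tester M = Sq E Sq with 0 <= E <= I, hence 0 <= M <= rho1 (x) I_2
B = crandn(d1 * d2, d1 * d2)
E = B @ B.conj().T
E /= np.linalg.eigvalsh(E).max()
M = Sq @ E @ Sq

# Purification |Phi> = sum_i sqrt(p(i)) |x_i>_1 |i>_1' of rho1^T
p, xs = np.linalg.eigh(rho1.T)
phi = sum(np.sqrt(p[i]) * np.kron(xs[:, i], np.eye(d1)[i]) for i in range(d1))
Phi = np.outer(phi, phi.conj())

# X = sum_i p(i)^{-1/2} |i>_1' <x_i^*|_1 (the row vector <x_i^*| equals x_i^T)
X = sum(np.outer(np.eye(d1)[i], xs[:, i]) / np.sqrt(p[i]) for i in range(d1))
XI = np.kron(X, np.eye(d2))
P = (XI @ M @ XI.conj().T).reshape(d1, d2, d1, d2)
P = P.transpose(1, 0, 3, 2).reshape(d2 * d1, d2 * d1)          # reorder (1', 2) -> (2, 1')

# Random linear map Xi in Kraus form and its Choi operator J[i, b, j, B]
kraus = [crandn(d2, d1) for _ in range(2)]
J = np.zeros((d1, d2, d1, d2), dtype=complex)
for i in range(d1):
    for j in range(d1):
        Eij = np.zeros((d1, d1))
        Eij[i, j] = 1
        J[i, :, j, :] = sum(K @ Eij @ K.conj().T for K in kraus)

f = np.einsum('ibjB,jBib->', J, M.reshape(d1, d2, d1, d2))     # tr[J(Xi) M]
out = sum(np.kron(K, np.eye(d1)) @ Phi @ np.kron(K, np.eye(d1)).conj().T for K in kraus)
g = np.trace(out @ P)                                            # tr[(Xi (x) id_1')(Phi) P]
P_eigs = np.linalg.eigvalsh((P + P.conj().T) / 2)
```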

	$\displaystyle\mu(B_{\epsilon}\left(\phi\right))$
	$\displaystyle=\mu\left(\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):\left\||0\rangle\langle 0|-\psi\right\|_{\text{tr}}<\epsilon\right\}\right)$
	$\displaystyle=\mu\left(\left\{\psi\in\mathbf{P}\left(\mathbb{C}^{d}\right):|\langle 0|\psi\rangle|^{2}>1-\epsilon^{2}\right\}\right)$
	$\displaystyle=\xi\left(\left\{\vec{x}\in\mathbb{R}^{2d}:\left\|\vec{x}\right\|_{2}=1\wedge x_{1}^{2}+x_{2}^{2}>1-\epsilon^{2}\right\}\right),$		(20)

	$\displaystyle=\max_{A:\left\|A\right\|_{1}\leq 1}\left(\left\|A\right\|_{1}^{2}-\max_{x\in X}|\text{tr}\left[AU_{x}\right]|^{2}\right)$
	$\displaystyle=\max_{V,\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right),q\in[0,1]}q^{2}\left(1-\max_{x\in X}|\text{tr}\left[\rho V^{\dagger}U_{x}\right]|^{2}\right)$
	$\displaystyle=1-\min_{V,\rho\in\mathbf{S}\left(\mathcal{H}_{1}\right)}\max_{x\in X}|\text{tr}\left[\rho V^{\dagger}U_{x}\right]|^{2},$		(51)
