Observational entropy with general quantum priors

Ge Bai: Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543

Dominik Šafránek: Center for Theoretical Physics of Complex Systems, Institute for Basic Science (IBS), Daejeon 34126, Korea; Basic Science Program, Korea University of Science and Technology (UST), Daejeon 34113, Korea

Joseph Schindler: Física Teòrica: Informació i Fenòmens Quàntics, Departament de Física, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain

Francesco Buscemi: Department of Mathematical Informatics, Nagoya University, Furo-cho Chikusa-ku, Nagoya 464-8601, Japan

Valerio Scarani: Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543; Department of Physics, National University of Singapore, 2 Science Drive 3, Singapore 117542
Abstract

Observational entropy captures both the intrinsic uncertainty of a thermodynamic state and the lack of knowledge due to coarse-graining. We demonstrate two interpretations of observational entropy, one as the statistical deficiency resulting from a measurement, the other as the difficulty of inferring the input state from the measurement statistics by quantum Bayesian retrodiction. These interpretations show that the observational entropy implicitly includes a uniform reference prior. Since the uniform prior cannot be used when the system is infinite-dimensional or otherwise energy-constrained, we propose generalizations by replacing the uniform prior with arbitrary quantum states that may not even commute with the state of the system. We propose three candidates for this generalization, discuss their properties, and show that one of them gives a unified expression that relates both interpretations.

1 Introduction

A few pages after defining the entropy that nowadays bears his name, von Neumann warns the reader that the quantity he has just defined is, in fact, unable to capture the phenomenological behavior of thermodynamic entropy [1]. More precisely, while the von Neumann entropy $S(\rho):=-\operatorname{Tr}[\rho\ln\rho]$ is always invariant in a closed system as a consequence of its invariance under unitary evolutions, the thermodynamic entropy of a closed system can instead increase, as happens for example in the free expansion of an ideal gas. The explanation that von Neumann gives for this apparent paradox is the following: thermodynamic entropy includes not only the intrinsic ignorance associated with the microscopic state $\rho$ of the system, but also the lack of knowledge arising from a macroscopic coarse-graining of it. The latter lack of knowledge becomes worse as the gas expands. This observation leads him to introduce an alternative quantity, which he calls macroscopic entropy, for which an $H$-theorem can be proved [2].

In recent years, von Neumann’s macroscopic entropy and a generalization thereof called observational entropy (OE) have been the object of renewed interest [3, 4, 5, 6, 7, 8], finding a number of applications [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. So far, even when the narrative is based on a quantum state $\rho$ being subject to a measurement $\mathsf{M}$, all the definitions fit in classical stochastic thermodynamics.

In this paper, we explore possible generalizations of OE. We note that the original OE includes an implicit prior belief about the state, which is the uniform distribution. Since in several applications the uniform prior cannot be used, e.g., in infinite-dimensional or continuous variable systems, or does not play well with other physical constraints, e.g., in thermodynamic systems with a nondegenerate Hamiltonian at finite temperature, we allow the observer to have a non-uniform prior. More generally, we consider the possibility that the observer has a reference prior described by an arbitrary density operator, which may not even commute with the state of the system. In this case, classical probability distributions may not be sufficient to describe the non-commutativity between the state and the reference, and thus the original definition of OE is not applicable.

2 Classical OE and reference states

In what follows, we restrict our attention to finite-dimensional quantum systems, with Hilbert space $\mathbb{C}^{d}$, and finite measurements, i.e., positive operator-valued measures (POVMs) $\mathsf{M}=\{\Pi_{y}\}_{y}$ labeled by the elements of a finite set $y\in\{1,\dots,m\}$. In this context, the definition of OE is

$$S_{\mathsf{M}}(\rho)\coloneqq-\sum_{y=1}^{m}p_{y}\,\ln\frac{p_{y}}{V_{y}}\;, \tag{1}$$

where $p_{y}:=\operatorname{Tr}[\rho\Pi_{y}]$ and $V_{y}:=\operatorname{Tr}[\Pi_{y}]$. One of the conceptual advantages of OE is that it is able to “interpolate” between Boltzmann and Gibbs–Shannon entropies. On the one hand, if the measurement is so coarse-grained that one of its elements (say $\Pi_{1}$) is the projector on the support of $\rho$, then $S_{\mathsf{M}}(\rho)=\ln V_{1}$ takes the form of a Boltzmann entropy. If, on the other hand, the measurement is projective and rank-one (i.e., $V_{y}=1$ for all $y$), then $S_{\mathsf{M}}(\rho)$ coincides with the Shannon entropy of the probability distribution $\{p_{y}\}$, which is equal to $S(\rho)$ when $\rho=\sum_{y}p_{y}\Pi_{y}$.

In general, it holds that [4]

$$\Sigma_{\mathsf{M}}(\rho)\coloneqq S_{\mathsf{M}}(\rho)-S(\rho)\geq 0\,. \tag{2}$$

If $S(\rho)$ represents, in von Neumann’s original narrative, the least uncertainty that an observer, able to perform any measurement in principle allowed by quantum theory, has about the state of the system, then the additional uncertainty $\Sigma_{\mathsf{M}}(\rho)$ included in OE is a consequence of observing the system through the “lens” provided by the given measurement $\mathsf{M}$. Thus, in this sense, OE can be seen as a measure of how inadequate a given measurement $\mathsf{M}$ is with respect to the state $\rho$.
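As a minimal numerical sketch of Eqs. (1) and (2), the snippet below evaluates the OE of a state for three measurements and compares it with the von Neumann entropy. The qutrit state, the particular coarse-grainings, and the helper names `von_neumann_entropy` and `observational_entropy` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr[rho ln rho], computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]            # convention: 0 ln 0 = 0
    return float(-np.sum(evals * np.log(evals)))

def observational_entropy(rho, povm):
    """S_M(rho) = -sum_y p_y ln(p_y / V_y), as in Eq. (1)."""
    total = 0.0
    for Pi in povm:
        p = np.trace(rho @ Pi).real         # p_y = Tr[rho Pi_y]
        V = np.trace(Pi).real               # V_y = Tr[Pi_y]
        if p > 1e-12:
            total -= p * np.log(p / V)
    return float(total)

# Hypothetical example: a qutrit state diagonal in the computational basis.
rho = np.diag([0.5, 0.3, 0.2])
basis = [np.diag(row) for row in np.eye(3)]  # rank-one projective measurement
coarse = [basis[0], basis[1] + basis[2]]     # two-outcome coarse-graining
trivial = [np.eye(3)]                        # one outcome: projector on supp(rho)

S = von_neumann_entropy(rho)
print(observational_entropy(rho, basis) - S)   # Shannon limit: Sigma vanishes
print(observational_entropy(rho, coarse) - S)  # coarse-graining: Sigma > 0
print(observational_entropy(rho, trivial))     # Boltzmann limit: ln V_1 = ln 3
```

In this example the rank-one measurement in the eigenbasis of the state saturates the bound (2), while the coarser measurements add uncertainty on top of $S(\rho)$.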

2.1 OE from statistical deficiency

The above discussion suggests one possible generalization of OE, starting from the re-writing of (2) recently noticed by some of us [8]. Consider the measurement channel $\mathcal{M}$ associated with the measurement $\mathsf{M}$, defined as

$$\mathcal{M}(\rho)\coloneqq\sum_{y}\mathrm{Tr}[\Pi_{y}\rho]\,|y\rangle\langle y|\;, \tag{3}$$

where $\{\ket{y}\}$ is an arbitrary but fixed orthonormal basis of the system that records the measurement outcome. By further noticing that $V_{y}=d\operatorname{Tr}[\Pi_{y}u]$ with $u=\mathds{1}/d$ the maximally mixed state, one obtains

$$\Sigma_{\mathsf{M}}(\rho)=D(\rho\|u)-D(\mathcal{M}(\rho)\|\mathcal{M}(u))\;, \tag{4}$$

where

$$D(\rho\|\sigma)\coloneqq\mathrm{Tr}[\rho(\ln\rho-\ln\sigma)] \tag{5}$$

is the Umegaki quantum relative entropy between states $\rho$ and $\sigma$ [22, 23], which generalizes the relative entropy (a.k.a. Kullback–Leibler divergence)

$$D(p\|q)\coloneqq\sum_{i}p_{i}\ln\frac{p_{i}}{q_{i}} \tag{6}$$

between probability distributions $\{p_{i}\}$ and $\{q_{i}\}$ [24].

The expression (4) makes it clear that the quantity $\Sigma_{\mathsf{M}}(\rho)$ exactly equals the loss of distinguishability between the signal $\rho$ and the totally uniform background $u$ that occurs when the measurement $\mathsf{M}$ is used instead of the best possible measurement allowed by quantum theory. In statistical jargon, we thus say that $\Sigma_{\mathsf{M}}(\rho)$ measures the statistical deficiency of the measurement $\mathsf{M}$ in distinguishing $\rho$ against $u$.
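The identity (4) can be checked numerically. The sketch below does so for one qubit state and one non-projective two-outcome POVM, both chosen arbitrarily for illustration; the helpers `logm_pd`, `rel_entropy` and `measure` are hypothetical names and assume full-rank inputs.

```python
import numpy as np

def logm_pd(A):
    """Matrix logarithm of a positive-definite Hermitian matrix via eigh."""
    w, U = np.linalg.eigh(A)
    return (U * np.log(w)) @ U.conj().T

def rel_entropy(rho, sigma):
    """Umegaki relative entropy D(rho||sigma) of Eq. (5), full-rank inputs assumed."""
    return float(np.trace(rho @ (logm_pd(rho) - logm_pd(sigma))).real)

def measure(rho, povm):
    """Measurement channel of Eq. (3), returned as a diagonal matrix."""
    return np.diag([np.trace(Pi @ rho).real for Pi in povm])

# Illustrative (assumed) qubit state and POVM that do not commute.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
Pi1 = np.array([[0.6, 0.1], [0.1, 0.2]])
povm = [Pi1, np.eye(2) - Pi1]
u = np.eye(2) / 2                           # maximally mixed reference

p = np.array([np.trace(rho @ Pi).real for Pi in povm])
V = np.array([np.trace(Pi).real for Pi in povm])
S_M = -np.sum(p * np.log(p / V))            # Eq. (1)
S = -np.trace(rho @ logm_pd(rho)).real      # von Neumann entropy

lhs = S_M - S                                                   # Eq. (2)
rhs = rel_entropy(rho, u) - rel_entropy(measure(rho, povm),
                                        measure(u, povm))       # Eq. (4)
print(lhs, rhs)
```

The two printed numbers agree up to rounding, because the $\ln d$ contributions of the two relative entropies cancel.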

This observation sheds light on something implicit in the original definition (1) of OE: the coarse-graining is captured by the “volumes” $V_{y}=\operatorname{Tr}[\Pi_{y}]$ only because the maximally mixed state is chosen as the reference background. It is thus natural to try to incorporate more general references in the definition of OE. A direct generalization could be obtained, therefore, by replacing $u$ with another reference state $\gamma$ in (4), so that

\begin{align}
S_{\mathsf{M},\gamma}(\rho) &\coloneqq S(\rho)+\Sigma_{\mathsf{M},\gamma}(\rho) \nonumber\\
&\coloneqq S(\rho)+\mathfrak{D}(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\,, \tag{7}
\end{align}

where $\mathfrak{D}(\cdot\|\cdot)$ represents some non-commutative generalization of the Kullback–Leibler divergence, not necessarily Umegaki’s one.

2.2 OE from irretrodictability

There exists another evocative re-writing of (2), which in turn suggests that further structures may play a role in the definition of OE. Specifically, here we exhibit a dynamical interpretation of OE, based on a measurement process defined as follows.

Let $\rho=\sum_{x=1}^{d}\lambda_{x}|\psi_{x}\rangle\langle\psi_{x}|$ be a diagonal decomposition of the state of the system. We consider a stochastic map associated to a prepare-and-measure protocol: with probability $\lambda_{x}$, the state $\ket{\psi_{x}}$ is prepared, and it is then measured with the POVM $\mathsf{M}=\{\Pi_{y}\}_{y}$, yielding outcome $y$ with a probability given by the Born rule, that is

\begin{align}
P_{F}(y|x) &\coloneqq\bra{\psi_{x}}\Pi_{y}\ket{\psi_{x}}\,, \tag{8}\\
P_{F}(x,y) &\coloneqq\lambda_{x}\bra{\psi_{x}}\Pi_{y}\ket{\psi_{x}}\,. \tag{9}
\end{align}

The subscript $F$ stands for “forward”. This is because, as we will see in what follows, the quantity $\Sigma_{\mathsf{M}}(\rho)$ in (2) emerges also from a comparison between the forward map defined above and a suitably defined “reverse” map.

Traditionally, the definition of the reverse process, given a forward process, relies on a detailed knowledge of the physical dynamics involved [25, 26, 27]. In the absence of such knowledge, which is typically the case for a system interacting with a complex environment, one must resort to physical intuition, plausibility arguments, or, failing that, arbitrary assumptions. To avoid all this, a systematic recipe has recently been found [28, 29], which allows one to define the reverse process from the unavoidable rules of logical retrodiction alone: specifically, Jeffrey’s theory of probability kinematics [30] or, equivalently, Pearl’s virtual evidence method [31, 32].

The idea is as follows: given a forward conditional probability $P_{F}(y|x)$, how should information, obtained at later times about the final outcome and encoded in an arbitrary distribution $q_{y}$, be propagated back to the initial state in a way that ensures logical consistency? Jeffrey’s theory of probability kinematics stipulates that the only logically consistent back-propagation rule is what is now known as Jeffrey’s update: starting from an arbitrarily chosen reference prior on the initial state $x$, say $\gamma_{x}$, one constructs the Bayesian inverse of $P_{F}(y|x)$, i.e.,

$$P_{R}^{\gamma}(x|y):=\frac{\gamma_{x}P_{F}(y|x)}{\sum_{x^{\prime}}\gamma_{x^{\prime}}P_{F}(y|x^{\prime})}\,, \tag{10}$$

and uses that as a stochastic channel to back-propagate the new information $q_{y}$ from the final outcome $y$ to the initial state $x$, so that as the reverse process we obtain

$$P_{R}^{\gamma}(y,x)=q_{y}\,P_{R}^{\gamma}(x|y)\;. \tag{11}$$

Jeffrey’s update constitutes a generalization of Bayes’ theorem, as the latter is recovered as a special case of the former when $q_{y}=\delta_{y,y_{0}}$, i.e., when the information about the final outcome is definite [32].

An important point to emphasize here is that the reference prior $\gamma_{x}$ used to construct the retrodictive channel in (10) is merely a formal device needed to establish a mathematical correspondence between forward and backward process: it need not be related in any way to the “true” distribution $\lambda_{x}$. Likewise, the distribution $q_{y}$ represents new and completely arbitrary information, which need not correspond to any input distribution under $P_{F}$, that is, there may exist no distribution $q^{\prime}_{x}$ such that $q_{y}=\sum_{x}P_{F}(y|x)q^{\prime}_{x}$. Moreover, in principle, $q_{y}$ may also be incompatible with the reference prior $\gamma_{x}$, in the sense that it could happen that, for some $y$, $q_{y}>0$ but $\sum_{x}\gamma_{x}P_{F}(y|x)=0$. In such a situation, one would conclude that the data falsify the inferential model, but for simplicity we will avoid such cases by assuming that all probabilities are strictly greater than zero (though possibly arbitrarily small).
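The update rule of Eqs. (10) and (11) can be sketched in a few lines of plain Python. The function names `bayes_inverse` and `jeffrey_update` and the two-state forward channel are assumptions made for illustration; all probabilities are taken to be strictly positive, as in the text.

```python
def bayes_inverse(prior, forward):
    """Bayesian inverse P_R(x|y) of Eq. (10).

    forward[x][y] = P_F(y|x); the result is indexed as rev[y][x].
    """
    n_x, n_y = len(prior), len(forward[0])
    rev = []
    for y in range(n_y):
        norm = sum(prior[x] * forward[x][y] for x in range(n_x))
        rev.append([prior[x] * forward[x][y] / norm for x in range(n_x)])
    return rev

def jeffrey_update(prior, forward, q):
    """Marginal on x of the reverse process (11): sum_y q_y P_R(x|y)."""
    rev = bayes_inverse(prior, forward)
    return [sum(q[y] * rev[y][x] for y in range(len(q)))
            for x in range(len(prior))]

# Hypothetical two-state example.
forward = [[0.9, 0.1], [0.2, 0.8]]     # rows: x, columns: y
prior = [0.5, 0.5]                     # reference prior gamma_x

# Definite information q_y = delta_{y,0} reproduces ordinary Bayes' rule.
bayes = jeffrey_update(prior, forward, [1.0, 0.0])

# Feeding back the prior's own output marginal returns the prior itself.
marginal = [sum(prior[x] * forward[x][y] for x in range(2)) for y in range(2)]
unchanged = jeffrey_update(prior, forward, marginal)
print(bayes, unchanged)
```

The second check illustrates the consistency of the rule: if the new information coincides with what the reference prior already predicts, retrodiction leaves the prior untouched.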

We now go back to our specific forward process (9), i.e., $P_{F}(y|x)=\bra{\psi_{x}}\Pi_{y}\ket{\psi_{x}}$. If we choose as reference the uniform distribution $\gamma=u$, i.e., $\gamma_{x}=1/d$ for all $x$, and as new information the outcomes’ expected probability of occurrence, i.e., $q_{y}=p_{y}=\mathrm{Tr}[\rho\Pi_{y}]$, by direct substitution in (10) and (11), we obtain

$$P^{u}_{R}(y,x)=p_{y}\bra{\psi_{x}}\frac{\Pi_{y}}{\operatorname{Tr}[\Pi_{y}]}\ket{\psi_{x}}\;. \tag{12}$$

The above can also be read as a prepare-and-measure process, in which the state $\sigma_{y}\coloneqq\Pi_{y}/\operatorname{Tr}[\Pi_{y}]$ is prepared with probability $p_{y}$, and later measured in the basis $\{\ket{\psi_{x}}\}$. The process in (12) is the process that a retrodictive agent would infer, knowing only the forward process (9) and the outcome distribution $p_{y}$, but completely ignoring the actual distribution $\lambda_{x}$, so that the latter is replaced by the uniform distribution.

Using (9) and (12), it is straightforward to check that

$$\Sigma_{\mathsf{M}}(\rho)=D(P_{F}\|P^{u}_{R})\;. \tag{13}$$

The above relation suggests an alternative interpretation for the difference $\Sigma_{\mathsf{M}}(\rho)$, as the degree of statistical distinguishability between a predictive process, i.e., $P_{F}(x,y)$, and a retrodictive process constructed from a uniform reference, i.e., $P_{R}^{u}(y,x)$. Thus, the larger $\Sigma_{\mathsf{M}}(\rho)$, the more irretrodictable the process becomes [33, 34].
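Relation (13) can be verified directly by building the two joint distributions. The following sketch assumes an arbitrary full-rank qubit state and a full-rank two-outcome POVM; the function `reverse_joint` is a hypothetical helper implementing Eqs. (10) and (11) for array inputs.

```python
import numpy as np

def reverse_joint(prior, cond, q):
    """Reverse joint distribution of Eqs. (10)-(11), indexed [x, y].

    prior[x] = gamma_x, cond[x, y] = P_F(y|x), q[y] = new information.
    """
    weighted = prior[:, None] * cond
    return weighted / weighted.sum(axis=0, keepdims=True) * q[None, :]

# Illustrative (assumed) state and POVM.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
Pi1 = np.array([[0.6, 0.1], [0.1, 0.2]])
povm = [Pi1, np.eye(2) - Pi1]
d = rho.shape[0]

lam, psi = np.linalg.eigh(rho)            # columns of psi are |psi_x>
cond = np.array([[(psi[:, x].conj() @ Pi @ psi[:, x]).real for Pi in povm]
                 for x in range(d)])      # P_F(y|x), Eq. (8)
P_F = lam[:, None] * cond                 # P_F(x, y), Eq. (9)
p = P_F.sum(axis=0)                       # p_y = Tr[rho Pi_y]
V = np.array([np.trace(Pi).real for Pi in povm])

P_R = reverse_joint(np.full(d, 1.0 / d), cond, p)   # uniform prior, q = p
kl = float(np.sum(P_F * np.log(P_F / P_R)))         # D(P_F || P_R^u)
sigma = float(-np.sum(p * np.log(p / V)) + np.sum(lam * np.log(lam)))
print(kl, sigma)
```

With the uniform prior, `P_R` reduces to the closed form (12), and the Kullback–Leibler divergence `kl` matches $\Sigma_{\mathsf{M}}(\rho)=S_{\mathsf{M}}(\rho)-S(\rho)$ up to rounding.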

Eq. (13) also offers an alternative way of thinking about generalizations of OE, where the uniform reference is again replaced by an arbitrary state, as was done in Section 2.1, but this time for the purpose of constructing another reverse process. That is, one could also consider generalizations such as

$$\widetilde{S}_{\mathsf{M},\gamma}(\rho)\coloneqq S(\rho)+\mathfrak{D}(Q_{F}\|Q_{R}^{\gamma})\;, \tag{14}$$

where $\mathfrak{D}$ is again some quantum relative entropy (not necessarily Umegaki’s), $Q_{F}$ is an input-output description of the quantum process consisting in preparing the state $\rho$ and measuring it with the channel $\mathcal{M}$, and $Q_{R}^{\gamma}$ is the description of the corresponding reverse process computed with respect to the reference prior $\gamma$. All these ingredients will be rigorously defined in Section 4.

3 A definition of OE for priors $\gamma$ such that $[\rho,\gamma]=0$

As we have seen, in the case of conventional OE, the statistical deficiency approach and the irretrodictability approach coincide, i.e.,

$$\Sigma_{\mathsf{M}}(\rho)=D(\rho\|u)-D(\mathcal{M}(\rho)\|\mathcal{M}(u))=D(P_{F}\|P_{R}^{u})\,.$$

The same holds for any prior $\gamma$ that commutes with $\rho$. Indeed, assuming $[\rho,\gamma]=0$, let us write $\gamma=\sum_{x=1}^{d}\gamma_{x}|\psi_{x}\rangle\langle\psi_{x}|$ using the same vectors that diagonalize $\rho$. The reverse process of $P_{F}$ [Eq. (9)] becomes

\begin{align}
P_{R}^{\gamma}(x|y) &\coloneqq\frac{\gamma_{x}\bra{\psi_{x}}\Pi_{y}\ket{\psi_{x}}}{\mathrm{Tr}[\Pi_{y}\gamma]}\,, \tag{15}\\
P_{R}^{\gamma}(x,y) &=p_{y}P_{R}^{\gamma}(x|y)\,, \nonumber
\end{align}

and it is straightforward to verify that

$$D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))=D(P_{F}\|P_{R}^{\gamma})\,. \tag{16}$$

Therefore, when $[\rho,\gamma]=0$, the expression

\begin{align}
S^{\textrm{clax}}_{\mathsf{M},\gamma}(\rho) &\coloneqq S(\rho)+D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma)) \tag{17}\\
&= S(\rho)+D(P_{F}\|P_{R}^{\gamma}) \tag{18}\\
&= -\mathrm{Tr}[\rho\,\ln\gamma]-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma)) \tag{19}
\end{align}

is a generalized OE that fits both the statistical deficiency approach and the irretrodictability approach. As we are going to see, it is not obvious how to retain both interpretations when $[\rho,\gamma]\neq 0$.

Note that nothing has been said about the measurement $\mathsf{M}$, which may well not commute with either $\rho$ or $\gamma$. As a case study, let us look closely at the fully classical case in which, besides having $[\rho,\gamma]=0$, the measurement is a projective measurement in their common eigenbasis $\{\ket{k}\}$:

$$\mathsf{M}=\{\Pi_{y}\}_{y}\quad\textrm{with}\quad\Pi_{y}=\sum_{k\in K(y)}\ket{k}\bra{k}\,, \tag{20}$$

where the index sets $K(y)$ are disjoint and complete so as to form a POVM, i.e., $\bigcup_{y}K(y)=\{1,\dots,d\}$ and $K(y)\cap K(y^{\prime})=\emptyset$ for $y\neq y^{\prime}$. Then, denoting by $\{r_{k}\}$ and $\{g_{k}\}$ the eigenvalues of $\rho$ and $\gamma$ respectively, we have $p_{y}=\sum_{k\in K(y)}r_{k}$ and $\operatorname{Tr}[\gamma\Pi_{y}]\equiv G_{y}=\sum_{k\in K(y)}g_{k}$, and Eq. (19) yields

$$S^{\textrm{clax}}_{\mathsf{M},\gamma}(\rho)=-\sum_{k}r_{k}\ln g_{k}-D(\{p_{y}\}\|\{G_{y}\})\,. \tag{21}$$

While the second term depends only on the observed statistics $\{p_{y}\}$ by construction, the first term depends in general on the full information $\{r_{k}\}$. In fact, $S^{\textrm{clax}}_{\mathsf{M},\gamma}$ depends only on the $\{p_{y}\}$ if and only if the $g_{k}$ are uniform in each $K(y)$ subspace; that is, $g_{k}=G_{y}/|K(y)|$ for every $k\in K(y)$ and for every $y$, or, equivalently, $\gamma=\sum_{y}G_{y}\Pi_{y}/|K(y)|$. In this case, $S^{\textrm{clax}}_{\mathsf{M},\gamma}(\rho)=S_{\mathsf{M}}(\rho)$: the weights $G_{y}$ of the prior do not matter. The interpretation of this observation is clear: the observation gives precisely the weights $p_{y}$ to be attributed to each $\Pi_{y}$, trumping any prior belief on those weights.

When the dependence on the $\{r_{k}\}$ is present, it is a mild one: for instance, in the paradigmatic case where $\gamma$ is thermal, the term $-\sum_{k}r_{k}\ln g_{k}$ is (up to an additive constant) proportional to the average energy, which is often assumed to be known in thermodynamics. A purely “observational” character could be recovered with minor modifications of the definition; we leave this aside.
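The dependence of Eq. (21) on the prior can be illustrated with a small classical example. The sketch below uses made-up eigenvalues and a two-block coarse-graining; `s_clax_classical` and `s_obs` are hypothetical helper names.

```python
import math

def s_clax_classical(r, g, blocks):
    """Eq. (21): -sum_k r_k ln g_k - D({p_y} || {G_y}).

    r, g: eigenvalues of rho and gamma; blocks[y] lists the indices in K(y).
    """
    cross = -sum(rk * math.log(gk) for rk, gk in zip(r, g) if rk > 0)
    p = [sum(r[k] for k in K) for K in blocks]
    G = [sum(g[k] for k in K) for K in blocks]
    kl = sum(py * math.log(py / Gy) for py, Gy in zip(p, G) if py > 0)
    return cross - kl

def s_obs(r, blocks):
    """Conventional OE of Eq. (1) for the same projective measurement."""
    p = [sum(r[k] for k in K) for K in blocks]
    return -sum(py * math.log(py / len(K)) for py, K in zip(p, blocks) if py > 0)

blocks = [[0], [1, 2]]          # K(1) = {0}, K(2) = {1, 2} (0-based indices)
r1 = [0.5, 0.3, 0.2]
r2 = [0.5, 0.2, 0.3]            # same observed statistics p_y as r1
g_flat = [0.4, 0.3, 0.3]        # uniform inside each block
g_tilt = [0.4, 0.5, 0.1]        # not uniform inside the second block

print(s_clax_classical(r1, g_flat, blocks), s_obs(r1, blocks))
print(s_clax_classical(r1, g_tilt, blocks), s_clax_classical(r2, g_tilt, blocks))
```

With the block-uniform prior the result coincides with the conventional OE regardless of the weights $G_{y}$, whereas with the tilted prior two states sharing the same $\{p_{y}\}$ are assigned different values.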

4 Definitions of OE with an arbitrary quantum reference prior

In this section, we introduce the necessary mathematical notation and background, and propose some candidates for OE with a general reference prior state.

4.1 Input-output description of quantum processes

Let $A$ and $B$ respectively denote the input and output systems of the measurement channel $\mathcal{M}$ in Eq. (3). The general recipe for the retrodiction of a quantum process $\mathcal{M}$ [28, 8, 35] is defined via its Petz recovery map [36, 37] as

$$\widetilde{\mathcal{M}}^{\gamma}(\tau)\coloneqq\gamma^{1/2}\mathcal{M}^{\dagger}\!\left(\mathcal{M}(\gamma)^{-1/2}\,\tau\,\mathcal{M}(\gamma)^{-1/2}\right)\gamma^{1/2}\,, \tag{22}$$

where $\tau:=\sum_{y}q_{y}|y\rangle\langle y|$ encodes the distribution $\{q_{y}\}$, describing the retrodictor’s knowledge, cf. Eq. (11). We will mainly discuss the case where $\tau=\mathcal{M}(\rho)$, namely $q_{y}=p_{y}=\mathrm{Tr}[\Pi_{y}\rho]$. For the measurement channel given in (3), the Petz recovery map can be written as

$$\widetilde{\mathcal{M}}^{\gamma}(\tau)=\sum_{y}\frac{\bra{y}\tau\ket{y}}{\mathrm{Tr}[\Pi_{y}\gamma]}\sqrt{\gamma}\,\Pi_{y}\sqrt{\gamma}\,. \tag{23}$$
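A minimal sketch of the Petz recovery map in the explicit form (23) is given below. The qubit states, the POVM, and the helper names `sqrtm_psd` and `petz_recover` are assumptions for illustration; the input $\tau$ is passed as the list of its diagonal entries $q_{y}$.

```python
import numpy as np

def sqrtm_psd(A):
    """Square root of a positive-semidefinite Hermitian matrix via eigh."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def petz_recover(tau_diag, gamma, povm):
    """Eq. (23): sum_y q_y / Tr[Pi_y gamma] * sqrt(gamma) Pi_y sqrt(gamma)."""
    sg = sqrtm_psd(gamma)
    out = np.zeros(gamma.shape, dtype=complex)
    for q, Pi in zip(tau_diag, povm):
        out += q / np.trace(Pi @ gamma).real * (sg @ Pi @ sg)
    return out

# Illustrative (assumed) reference prior, state and POVM.
gamma = np.array([[0.6, 0.1], [0.1, 0.4]])
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
Pi1 = np.array([[0.6, 0.1], [0.1, 0.2]])
povm = [Pi1, np.eye(2) - Pi1]

q_gamma = [np.trace(Pi @ gamma).real for Pi in povm]   # diagonal of M(gamma)
q_rho = [np.trace(Pi @ rho).real for Pi in povm]       # diagonal of M(rho)

recovered_gamma = petz_recover(q_gamma, gamma, povm)
retrodicted_rho = petz_recover(q_rho, gamma, povm)
print(np.allclose(recovered_gamma, gamma))   # the reference is recovered exactly
print(np.trace(retrodicted_rho).real)        # the map preserves the trace
```

The reference prior is always a fixed point of the recovery, since $\sum_{y}\sqrt{\gamma}\,\Pi_{y}\sqrt{\gamma}=\gamma$; a generic $\rho$ is in general not recovered.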

As an ingredient for later constructions, we introduce the Choi operator [38], defined for the process $\mathcal{M}$ from system $A$ to system $B$ as

$$C_{\mathcal{M}}\coloneqq\sum_{i,j}\mathcal{M}(\ket{i}\bra{j})\otimes\ket{i}\bra{j}\,, \tag{24}$$

where $\ket{i}$ and $\ket{j}$ belong to an arbitrary but fixed orthonormal basis of the input Hilbert space $\mathcal{H}_{A}$ of system $A$. The reverse process has the following Choi operator

$$C_{\widetilde{\mathcal{M}}^{\gamma}}\coloneqq\sum_{k,l}\ket{k}\bra{l}\otimes\widetilde{\mathcal{M}}^{\gamma}(\ket{k}\bra{l})\,, \tag{25}$$

where $\ket{k}$ and $\ket{l}$ belong to an arbitrary but fixed orthonormal basis of the Hilbert space $\mathcal{H}_{B}$. Note that we put system $B$ first and system $A$ second, in order to have the same ordering of systems for both $C_{\mathcal{M}}$ and $C_{\widetilde{\mathcal{M}}^{\gamma}}$.

With such a definition, the Choi operators of the forward and reverse processes are related by the following lemma (proof in Appendix A):

Lemma 1.

For a quantum channel $\mathcal{M}$ and its Petz map $\widetilde{\mathcal{M}}^{\gamma}$, their Choi operators $C_{\mathcal{M}}$ and $C_{\widetilde{\mathcal{M}}^{\gamma}}$ are related as

$$C_{\widetilde{\mathcal{M}}^{\gamma}}^{T}=\left(\mathcal{M}(\gamma)^{-1/2}\otimes\sqrt{\gamma^{T}}\right)C_{\mathcal{M}}\left(\mathcal{M}(\gamma)^{-1/2}\otimes\sqrt{\gamma^{T}}\right)\;, \tag{26}$$

where the superscript ${\bullet}^{T}$ denotes the matrix transposition done with respect to the fixed bases used in Eqs. (24) and (25).

We now want to construct two objects, $Q_{F}$ and $Q_{R}^{\gamma}$, which, analogously to the joint distributions $P_{F}$ and $P^{\gamma}_{R}$, are able to capture both the input and output of the forward and reverse processes. Specifically, the marginals of the operator $Q_{F}$ should recover the input state $\rho$ and the output $\mathcal{M}(\rho)$ respectively, and analogously for $Q_{R}^{\gamma}$.

One choice is to define

$$Q_{F}:=(\mathds{1}_{B}\otimes\sqrt{\rho^{T}})\,C_{\mathcal{M}}\,(\mathds{1}_{B}\otimes\sqrt{\rho^{T}})\,. \tag{27}$$

Such an operator is indeed able to capture the input and output of the forward process, in the sense that:

$$\mathrm{Tr}_{A}[Q_{F}]=\mathcal{M}(\rho),\quad\mathrm{Tr}_{B}[Q_{F}]=\rho^{T}\,. \tag{28}$$

We define the representation for the reverse process $\widetilde{\mathcal{M}}^{\gamma}$ similarly as

$$Q_{R}^{\gamma}:=\left(\sqrt{\tau}\otimes\mathds{1}_{A}\right)C_{\widetilde{\mathcal{M}}^{\gamma}}^{T}\left(\sqrt{\tau}\otimes\mathds{1}_{A}\right)\,, \tag{29}$$

where $\tau=\mathcal{M}(\rho)$ is the input of the reverse process. We use the transpose of the Choi operator of the reverse process so that it can be linked to $C_{\mathcal{M}}$ by Lemma 1 in the following way:

$$Q_{R}^{\gamma}=\left(\sqrt{\tau}\,\sqrt{\mathcal{M}(\gamma)}^{\,-1}\otimes\sqrt{\gamma^{T}}\right)C_{\mathcal{M}}\left(\sqrt{\mathcal{M}(\gamma)}^{\,-1}\sqrt{\tau}\otimes\sqrt{\gamma^{T}}\right)\,.$$

This operator captures the input and output of the reverse process:

$$\mathrm{Tr}_{A}[Q_{R}^{\gamma}]=\tau,\quad\mathrm{Tr}_{B}[Q_{R}^{\gamma}]=[\widetilde{\mathcal{M}}^{\gamma}(\tau)]^{T}\,. \tag{30}$$

The operators $Q_{F}$ and $Q_{R}^{\gamma}$ just defined are analogous to the state over time proposed by Leifer and Spekkens [39, 40] up to a partial transpose.
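The forward operator of Eq. (27) and its marginals (28) can be checked numerically by building the Choi operator (24) from its definition. The sketch below is an illustration under assumed inputs: a complex qubit state (so that the transpose matters), a real two-outcome POVM, and hypothetical helper names `sqrtm_psd` and `choi_measurement`.

```python
import numpy as np

def sqrtm_psd(A):
    """Square root of a positive-semidefinite Hermitian matrix via eigh."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def choi_measurement(povm):
    """Choi operator of Eq. (24) for the channel (3); system B first, A second."""
    d, m = povm[0].shape[0], len(povm)
    C = np.zeros((m * d, m * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d), dtype=complex)
            E[i, j] = 1.0                                      # |i><j|
            out = np.diag([np.trace(Pi @ E) for Pi in povm])   # M(|i><j|)
            C += np.kron(out, E)
    return C

# Illustrative (assumed) state and POVM.
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])
Pi1 = np.array([[0.6, 0.1], [0.1, 0.2]])
povm = [Pi1, np.eye(2) - Pi1]
d, m = 2, len(povm)

C = choi_measurement(povm)
side = np.kron(np.eye(m), sqrtm_psd(rho.T))    # 1_B (x) sqrt(rho^T)
Q_F = side @ C @ side                          # Eq. (27)

Q4 = Q_F.reshape(m, d, m, d)                   # indices (b, a, b', a')
tr_A = np.einsum('iaja->ij', Q4)               # partial trace over A
tr_B = np.einsum('iaib->ab', Q4)               # partial trace over B

p = np.array([np.trace(Pi @ rho).real for Pi in povm])
print(np.allclose(tr_A, np.diag(p)), np.allclose(tr_B, rho.T))
```

For a measurement channel the Choi operator takes the block form $\sum_{y}|y\rangle\langle y|\otimes\Pi_{y}^{T}$, which is why the two marginals come out as $\mathcal{M}(\rho)$ and $\rho^{T}$.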

Other definitions of input-output operators may satisfy nice properties. An alternative choice is, for example,

$${}^{t}Q_{F}:=\sqrt{C_{\mathcal{M}}}\,(\mathds{1}_{B}\otimes\rho^{T})\,\sqrt{C_{\mathcal{M}}} \tag{31}$$

and

$${}^{t}Q_{R}^{\gamma}:=\sqrt{C_{\mathcal{M}}}\left(\mathcal{M}(\gamma)^{-1/2}\,\tau\,\mathcal{M}(\gamma)^{-1/2}\otimes\gamma^{T}\right)\sqrt{C_{\mathcal{M}}}\;. \tag{32}$$

The superscript ${}^{t}\bullet$ in (31) and (32) (not to be confused with $\bullet^{T}$) is used because the operators ${}^{t}Q_{F}$ and ${}^{t}Q_{R}^{\gamma}$ are, in a loose sense, a “transposition” of $Q_{F}$ and $Q_{R}^{\gamma}$, respectively. If $\Pi_{y},\rho,\gamma$ do not commute, in general ${}^{t}Q_{F}\neq Q_{F}$ and ${}^{t}Q_{R}^{\gamma}\neq Q_{R}^{\gamma}$. For example, $\mathrm{Tr}_{B}[{}^{t}Q_{F}]=(\sum_{y}\sqrt{\Pi_{y}}\,\rho\,\sqrt{\Pi_{y}})^{T}$, which in general differs from $\rho^{T}$. Yet, they are similar, in the sense that $Q_{F}$ and ${}^{t}Q_{F}$ (resp. $Q_{R}^{\gamma}$ and ${}^{t}Q_{R}^{\gamma}$) share the same eigenvalues, and are thus unitarily equivalent, as happens when doing a proper transposition. Therefore, ${}^{t}Q_{F}$ and ${}^{t}Q_{R}^{\gamma}$ can be viewed as legitimate representations (up to unitaries) of the forward and reverse processes, and they will be useful in the irretrodictability interpretation of OE.

4.2 Candidates for generalized OE

Eqs. (7) and (14) provide two forms of the observational entropy: Eq. (7), arising from the statistical deficiency approach, is the difference between relative entropies evaluated on the input system and the output system; Eq. (14), arising from the irretrodictability approach, is the relative entropy between the forward and reverse processes. In the remainder of this section, we will propose generalizations of OE that take either or both of these forms.

4.2.1 Candidate #1: difference between input/output Umegaki entropies

A first fully quantum generalisation of OE may just be obtained by replacing the reference state $u$ in Eq. (4) with a general reference state $\gamma$:

\begin{align}
S^{(1)}_{\mathsf{M},\gamma}(\rho) &\coloneqq S(\rho)+\Sigma^{(1)}_{\mathsf{M},\gamma}(\rho) \tag{33}\\
\textrm{with}\;\;\Sigma^{(1)}_{\mathsf{M},\gamma}(\rho) &= D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\,, \nonumber
\end{align}

that is, $S^{(1)}_{\mathsf{M},\gamma}(\rho)=-\mathrm{Tr}[\rho\,\ln\gamma]-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$, cf. Eq. (19), though this time it may be that $[\rho,\gamma]\neq 0$. This definition has the form of Eq. (7) with $\mathfrak{D}$ taken to be the Umegaki relative entropy (5). Notice that, while $D(\rho\|\gamma)$ is a fully quantum relative entropy, $D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$ is in fact classical, since all the outputs of the channel $\mathcal{M}$ are diagonal in the same basis.

4.2.2 Candidate #2: Umegaki relative entropy between forward/reverse processes

Another option is to define an OE through Eq. (14), thus choosing to prioritize irretrodictability. For this, one needs to choose a relative entropy and representations of the forward and reverse processes. Using the Umegaki relative entropy and the representations defined in (27) and (29), we get

\begin{align}
S_{\mathsf{M},\gamma}^{(2)}(\rho) &\coloneqq S(\rho)+\Sigma_{\mathsf{M},\gamma}^{(2)}(\rho) \tag{34}\\
\textrm{with}\;\;\Sigma_{\mathsf{M},\gamma}^{(2)}(\rho) &= D(Q_{F}\|Q_{R}^{\gamma})\,. \nonumber
\end{align}

However, we will show in the following sections that this candidate lacks some of the properties we desire: we introduce it mainly for comparison with the other candidates.

4.2.3 Candidate #3: Belavkin–Staszewski relative entropy

Besides the Umegaki relative entropy, there are other choices for the quantum relative entropy between the representations of the forward and reverse processes, and between the states $\rho$ and $\gamma$. One such choice is the Belavkin–Staszewski relative entropy [41], defined as

$$D_{\rm BS}(\rho\|\sigma)\coloneqq\mathrm{Tr}[\rho\,\ln(\rho\sigma^{-1})]\,. \tag{35}$$

The Belavkin–Staszewski relative entropy coincides with the Umegaki relative entropy and the classical relative entropy when $\rho$ and $\sigma$ commute; in general, it is never smaller than Umegaki’s. For a summary of the main properties of the Belavkin–Staszewski relative entropy, and its relations with other quantum relative entropies, we refer the interested reader to Ref. [42].

Inserting $D_{\rm BS}$ into Eq. (7), we obtain

\begin{align}
S_{\mathsf{M},\gamma}^{(3)}(\rho) &\coloneqq S(\rho)+\Sigma_{\mathsf{M},\gamma}^{(3)}(\rho) \tag{36}\\
\textrm{with}\;\;\Sigma_{\mathsf{M},\gamma}^{(3)}(\rho) &= D_{\rm BS}(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\,. \nonumber
\end{align}

Remarkably, it turns out that the above definition recovers the form of Eq. (14). Assuming that $Q_{R}^{\gamma}$, and thus ${}^{t}Q_{R}^{\gamma}$, is full-rank, one has

$$D_{\rm BS}(\rho\|\gamma)-D_{\rm BS}(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))=D_{\rm BS}\left({}^{t}Q_{F}\,\middle\|\,{}^{t}Q_{R}^{\gamma}\right), \tag{37}$$

where $D_{\rm BS}(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))=D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$ since those states commute, and where ${}^{t}Q_{F}$ and ${}^{t}Q_{R}^{\gamma}$ were defined in (31) and (32). The proof of the identity (37) is given in Appendix B. Thus, $\Sigma_{\mathsf{M},\gamma}^{(3)}$ indeed admits both the statistical deficiency and the irretrodictability interpretations.
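The two deficiency-type candidates $\Sigma^{(1)}_{\mathsf{M},\gamma}$ and $\Sigma^{(3)}_{\mathsf{M},\gamma}$ can be compared numerically. In the sketch below, the Belavkin–Staszewski relative entropy is evaluated in the symmetric form $\mathrm{Tr}[\rho\ln(\rho^{1/2}\sigma^{-1}\rho^{1/2})]$, which agrees with Eq. (35) for full-rank $\rho$ because $\rho\sigma^{-1}$ and $\rho^{1/2}\sigma^{-1}\rho^{1/2}$ are related by a similarity transformation; this choice, the example states, and the helper names are assumptions made for illustration.

```python
import numpy as np

def funm_pd(A, f):
    """Apply a scalar function f to a positive-definite Hermitian matrix."""
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.conj().T

def d_umegaki(rho, sigma):
    """Umegaki relative entropy, Eq. (5); full-rank inputs assumed."""
    return float(np.trace(rho @ (funm_pd(rho, np.log)
                                 - funm_pd(sigma, np.log))).real)

def d_bs(rho, sigma):
    """Belavkin-Staszewski relative entropy in its symmetric form."""
    r = funm_pd(rho, np.sqrt)
    K = r @ np.linalg.inv(sigma) @ r
    K = (K + K.conj().T) / 2                 # remove rounding asymmetry
    return float(np.trace(rho @ funm_pd(K, np.log)).real)

def measure(rho, povm):
    """Measurement channel of Eq. (3), returned as a diagonal matrix."""
    return np.diag([np.trace(Pi @ rho).real for Pi in povm])

# Illustrative (assumed) non-commuting state and prior.
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
gamma = np.diag([0.6, 0.4])
Pi1 = np.array([[0.6, 0.1], [0.1, 0.2]])
povm = [Pi1, np.eye(2) - Pi1]

out_term = d_umegaki(measure(rho, povm), measure(gamma, povm))
sigma1 = d_umegaki(rho, gamma) - out_term    # Eq. (33)
sigma3 = d_bs(rho, gamma) - out_term         # Eq. (36)
print(sigma1, sigma3)

# Commuting pair: the two relative entropies coincide.
rho_c = np.diag([0.7, 0.3])
print(d_umegaki(rho_c, gamma), d_bs(rho_c, gamma))
```

Since $D_{\rm BS}$ is never smaller than $D$, one expects $\Sigma^{(3)}_{\mathsf{M},\gamma}\geq\Sigma^{(1)}_{\mathsf{M},\gamma}\geq 0$ in this example, with equality of the two relative entropies in the commuting case.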

5 Properties

| Definition | Deficiency interpretation | Irretrodictability interpretation | Equal to $S^{\textrm{clax}}_{\mathsf{M},\gamma}(\rho)$ when | Petz recovery criterion | Non-decreasing under stochastic post-processing |
| --- | --- | --- | --- | --- | --- |
| $S_{\mathsf{M},\gamma}^{(1)}$ (33) | $D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$ | N/A | $[\rho,\gamma]=0$ | Yes | Yes |
| $S_{\mathsf{M},\gamma}^{(2)}$ (34) | N/A | $D(Q_{F}\|Q_{R}^{\gamma})$ | $\rho,\gamma,\Pi_{y}$ commute | Yes | No |
| $S_{\mathsf{M},\gamma}^{(3)}$ (36) | $D_{\rm BS}(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$ | $D_{\rm BS}({}^{t}Q_{F}\|{}^{t}Q_{R}^{\gamma})$ | $[\rho,\gamma]=0$ | Yes | Yes |

Table 1: Properties of $S^{(j)}_{\mathsf{M},\gamma}$. The expressions for the statistical deficiency and irretrodictability interpretations do not match if one uses the Umegaki relative entropy. On the other hand, the use of the Belavkin–Staszewski relative entropy gives an expression that unifies both interpretations.

We proceed now to discuss the properties of the three candidates S𝖬,γ(j)S^{(j)}_{\mathsf{M},\gamma} (j=1,2,3j=1,2,3) defined above, with a comparison between them summarized in Table 1. The main properties to consider for any candidate generalized OE are the following:

  1. (i)

    When the reference prior is the uniform distribution (maximally mixed state), the candidate should recover the original definition (1). This is true for $S^{(1)}_{\mathsf{M},\gamma}$ and $S^{(3)}_{\mathsf{M},\gamma}$, namely when $\gamma=u:=\mathds{1}/d$,

    $$S_{\mathsf{M},u}^{(1,3)}(\rho)=S_{\mathsf{M}}(\rho)\,. \tag{38}$$

    Instead, in order to recover the conventional OE, $S_{\mathsf{M},\gamma}^{(2)}$ further requires that $[\rho,\Pi_{y}]=0$ for all $y$.

  2. (ii)

    More generally, when $[\rho,\gamma]=0$, one has

    $$S_{\mathsf{M},\gamma}^{(1,3)}(\rho)=S^{\textrm{clax}}_{\mathsf{M},\gamma}(\rho)\,. \tag{39}$$

    Instead, the condition $S_{\mathsf{M},\gamma}^{(2)}(\rho)=S^{\textrm{clax}}_{\mathsf{M},\gamma}(\rho)$ in general requires $[\rho,\gamma]=[\rho,\Pi_{y}]=[\gamma,\Pi_{y}]=0$ for all $y$.

  3. (iii)

    Like the original OE, all of them are lower-bounded by the von Neumann entropy:

    $$S^{(j)}_{\mathsf{M},\gamma}(\rho)\geq S(\rho)\,. \tag{40}$$

    Thus, the OEs retain the desirable property that one cannot have less uncertainty than the von Neumann entropy.

The proofs of the above properties are in Appendix C.

Other non-essential, yet desirable properties include:

  1. (iv)

    $S_{\mathsf{M},\gamma}^{(j)}(\rho)$ admits both interpretations, as statistical deficiency, i.e., Eq. (7), and irretrodictability, i.e., Eq. (14). This property is satisfied by $S_{\mathsf{M},\gamma}^{(3)}$, with suitable definitions of the input-output descriptions.

  2. (v)

    $S^{(j)}_{\mathsf{M},\gamma}(\rho)$ satisfies the Petz recovery criterion: $S^{(j)}_{\mathsf{M},\gamma}(\rho)=S(\rho)$ if and only if $\widetilde{\mathcal{M}}^{\gamma}\big(\mathcal{M}(\rho)\big)=\rho$, where $\widetilde{\mathcal{M}}^{\gamma}$ is the Petz map of $\mathcal{M}$ with reference $\gamma$ defined in (22). This property is satisfied by all candidates, as shown in Appendix D.

  3. (vi)

    $S^{(j)}_{\mathsf{M},\gamma}(\rho)$ is non-decreasing under stochastic post-processing. We say that $\mathsf{M}^{\prime}=\{\Pi^{\prime}_{z}\}$ is a post-processing of $\mathsf{M}$ if its outcome can be obtained by applying a stochastic map to the outcome of $\mathsf{M}$, namely if there exists a stochastic matrix $w$, with $\sum_{z}w_{zy}=1$ for all $y$, satisfying

    $$\Pi^{\prime}_{z}=\sum_{y}w_{zy}\Pi_{y},\ \forall z\,. \tag{41}$$

    This property for $S^{(j)}_{\mathsf{M},\gamma}(\rho)$ says that, for any $\mathsf{M}^{\prime}$ that is a post-processing of $\mathsf{M}$, one has

    $$S^{(j)}_{\mathsf{M}^{\prime},\gamma}(\rho)\geq S^{(j)}_{\mathsf{M},\gamma}(\rho)\,. \tag{42}$$

    This property is satisfied by $j=1,3$, with proofs in Appendix E.
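As an illustration of property (vi), the following minimal Python sketch applies a column-stochastic matrix to outcome statistics and checks that the measured relative entropy $D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$ cannot increase, which is what Eq. (42) amounts to for $j=1,3$. The probability vectors `p` and `q`, the matrix `w`, and the helper names are hypothetical values chosen for illustration; they are not taken from the paper.

```python
import math

def relative_entropy(p, q):
    """Classical relative entropy D(p||q) in nats; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def post_process(w, p):
    """Apply a column-stochastic matrix w[z][y] to outcome probabilities p[y]."""
    return [sum(w[z][y] * p[y] for y in range(len(p))) for z in range(len(w))]

# Hypothetical outcome statistics of M on rho and on the prior gamma.
p = [0.5, 0.3, 0.2]  # plays the role of Tr[Pi_y rho]
q = [0.2, 0.3, 0.5]  # plays the role of Tr[Pi_y gamma]

# Column-stochastic w: every column sums to 1, as required below Eq. (41).
w = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]

d_fine = relative_entropy(p, q)
d_coarse = relative_entropy(post_process(w, p), post_process(w, q))

# For j = 1, 3 the entropy gain under post-processing equals this difference.
gain = d_fine - d_coarse
```

Under these assumptions, `gain` is the amount by which $S^{(1)}$ and $S^{(3)}$ grow when $\mathsf{M}$ is replaced by its post-processing, and it is non-negative by the classical data-processing inequality.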

Finally, we notice that while the original OE, Eq. (1), is upper bounded as $S_{\mathsf{M}}(\rho)\leq\ln d$, the same bound does not hold in general for a non-uniform reference $\gamma$, as expected. However,

$$S_{\mathsf{M},\gamma}^{(1)}(\rho)\leq S_{\mathsf{M},\gamma}^{(3)}(\rho) \tag{43}$$

holds because the Belavkin-Staszewski relative entropy bounds the Umegaki one from above [43, 44]. Also,

$$S_{\mathsf{M},\gamma}^{(1)}(\rho)\leq S_{\mathsf{M},\gamma}^{(2)}(\rho) \tag{44}$$

holds due to joint convexity of the relative entropy (proof in Appendix F).

6 Examples

6.1 Gibbs prior

In the presence of a Hamiltonian $H=\sum_{n=0}^{d-1}E_{n}|n\rangle\langle n|$, a very natural choice of non-uniform prior is the Gibbs state

$$\gamma:=e^{-\beta H}/\mathrm{Tr}[e^{-\beta H}],\ \beta>0. \tag{45}$$

We also consider the measurement in the energy eigenbasis $\mathsf{M}:=\{|0\rangle\langle 0|,\dots,|d-1\rangle\langle d-1|\}$, but to move far away from the classical case we assume that the input state is pure and maximally unbiased with respect to the energy eigenbasis:

$$\rho:=|\psi\rangle\langle\psi|\ \textrm{ with }\ \ket{\psi}:=\frac{1}{\sqrt{d}}\sum_{n=0}^{d-1}\ket{n}. \tag{46}$$

With these assumptions, the first definition yields

$$S_{\mathsf{M},\gamma}^{(1)}(\rho)=S_{\mathsf{M}}(\rho)=\ln d\,, \tag{47}$$

which is also the case if $\rho$ is a mixture of maximally unbiased states. Like $S^{\textrm{clax}}_{\mathsf{M},\gamma}$, $S_{\mathsf{M},\gamma}^{(1)}$ reduces to the original $S_{\mathsf{M}}$ when the prior is a convex sum of the measurement elements.

The second definition yields

$$S_{\mathsf{M},\gamma}^{(2)}(\rho)=\infty\,, \tag{48}$$

for any pure state, since the support of $Q_{R}^{\gamma}=\frac{1}{d}\sum_{n}|n\rangle\langle n|\otimes|n\rangle\langle n|$ does not contain the support of $Q_{F}=\frac{1}{d}\mathds{1}\otimes|\psi\rangle\langle\psi|$. We shall comment on this result after the next example.

Finally, the third definition yields

$$\begin{aligned}
S_{\mathsf{M},\gamma}^{(3)}(\rho)&=\ln\mathrm{Tr}[\gamma^{-1}]+\frac{1}{d}\mathrm{Tr}[\ln\gamma] && (49)\\
&=\ln\frac{1-e^{\beta\omega d}}{1-e^{\beta\omega}}-\frac{\beta\omega(d-1)}{2}\,, && (50)
\end{aligned}$$

where the first line is general, while the second is the expression for the equidistant spectrum $E_{n}=n\omega$. Thus $S_{\mathsf{M},\gamma}^{(3)}$ is more sensitive than $S_{\mathsf{M},\gamma}^{(1)}$ to quantum situations.
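The agreement between Eqs. (49) and (50) can be checked numerically. The sketch below is a minimal illustration in plain Python; the function names and the parameter values `beta`, `omega`, `d` are assumptions made here for the example and do not come from the paper.

```python
import math

def s3_general(beta, energies):
    """Eq. (49): ln Tr[gamma^{-1}] + (1/d) Tr[ln gamma] for a Gibbs prior gamma."""
    d = len(energies)
    z = sum(math.exp(-beta * e) for e in energies)           # partition function
    gamma = [math.exp(-beta * e) / z for e in energies]       # Gibbs eigenvalues
    return math.log(sum(1.0 / g for g in gamma)) + sum(math.log(g) for g in gamma) / d

def s3_equidistant(beta, omega, d):
    """Eq. (50): closed form for the equidistant spectrum E_n = n * omega."""
    x = beta * omega
    return math.log((1 - math.exp(x * d)) / (1 - math.exp(x))) - x * (d - 1) / 2

# Illustrative parameters (hypothetical, any beta, omega > 0 would do).
beta, omega, d = 0.7, 1.3, 5

general = s3_general(beta, [n * omega for n in range(d)])
closed = s3_equidistant(beta, omega, d)

# Difference between the third candidate and the first one, Eq. (47).
excess = general - math.log(d)
```

With these values, `general` and `closed` coincide up to rounding, and `excess` is positive, in line with Eq. (43).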

6.2 Three-qubit encoding

The following example is inspired by a simple error-correcting code, the three-qubit encoding of a pure qubit:

$$\alpha\ket{0}+\beta\ket{1}\mapsto\alpha\ket{000}+\beta\ket{111},\ |\alpha|^{2}+|\beta|^{2}=1\,. \tag{51}$$

Suppose $\rho$ is the encoded state

$$\rho:=(\alpha\ket{000}+\beta\ket{111})(\alpha^{*}\bra{000}+\beta^{*}\bra{111})\,. \tag{52}$$

We consider the measurement of each qubit in the $\{\ket{+},\ket{-}\}$ basis, i.e., the POVM elements are projectors on the basis vectors

$$\{\ket{+++},\ket{++-},\ket{+-+},\dots,\ket{---}\}\,. \tag{53}$$

As for the prior, we suppose that the observer knows the encoding of the error correction code, and expects $\rho$ to lie more probably in the subspace spanned by $\ket{000}$ and $\ket{111}$, possibly with a bias towards one of those product states; whence

$$\begin{aligned}
\gamma:=\ &p_{0}|000\rangle\langle 000|+p_{1}|111\rangle\langle 111|\\
&+\frac{1-p_{0}-p_{1}}{6}(\mathds{1}-|000\rangle\langle 000|-|111\rangle\langle 111|)\,,\\
&p_{0},p_{1}>0,\ p_{0}+p_{1}<1\,.
\end{aligned} \tag{54}$$

In this case, the three definitions proposed here yield

$$\begin{aligned}
S_{\mathsf{M},\gamma}^{(1)}(\rho)&=|\alpha|^{2}\ln\frac{1}{p_{0}}+|\beta|^{2}\ln\frac{1}{p_{1}}-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\,, && (55)\\
S_{\mathsf{M},\gamma}^{(2)}(\rho)&=\infty\,, && (56)\\
S_{\mathsf{M},\gamma}^{(3)}(\rho)&=\ln\left(\frac{|\alpha|^{2}}{p_{0}}+\frac{|\beta|^{2}}{p_{1}}\right)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma)) && (57)
\end{aligned}$$

with

$$D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))=|\alpha+\beta|^{2}\ln|\alpha+\beta|+|\alpha-\beta|^{2}\ln|\alpha-\beta|\,. \tag{58}$$

$S_{\mathsf{M},\gamma}^{(1)}$ and $S_{\mathsf{M},\gamma}^{(3)}$ differ in the first term, as long as $p_{0}\neq p_{1}$: for $p_{0}=p_{1}=p$, both first terms yield $\ln(1/p)$. In particular, when $p=\frac{1}{8}$, $\gamma=u$ and therefore $S_{\mathsf{M},\gamma}^{(1)}(\rho)=S_{\mathsf{M},\gamma}^{(3)}(\rho)=S_{\mathsf{M}}(\rho)$. We see that $S_{\mathsf{M},\gamma}^{(2)}$ is still infinite, for the same reason of support mismatch as in the previous example. From the examples, we observe that $S_{\mathsf{M},\gamma}^{(2)}$ is often overly sensitive to the non-commutativity between $\rho$ and $\gamma$. This suggests that, instead of the natural choice of $Q_{F}$ and $Q_{R}^{\gamma}$ as input-output representations, one could opt for representations whose supports are more aligned, such as ${}^{t}Q_{F}$ and ${}^{t}Q_{R}^{\gamma}$, which relate to $S_{\mathsf{M},\gamma}^{(3)}$ via Eq. (37).
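The closed forms (55), (57) and (58) can be cross-checked by enumerating the eight measurement outcomes directly. The following Python sketch is a minimal illustration; the function names and the sample values of `alpha`, `beta`, `p0`, `p1` are hypothetical choices made here, not values from the paper.

```python
import math
from itertools import product

def measured_relative_entropy(alpha, beta, p0, p1):
    """D(M(rho)||M(gamma)) obtained by summing over the eight +/- basis outcomes."""
    total = 0.0
    for signs in product((1, -1), repeat=3):
        s = signs[0] * signs[1] * signs[2]
        # <s1 s2 s3|000> = 1/(2*sqrt(2)) and <s1 s2 s3|111> = s/(2*sqrt(2)).
        p_rho = abs(alpha + s * beta) ** 2 / 8
        # Prior of Eq. (54): weights p0, p1 on |000>, |111>, the rest on the other six states.
        p_gamma = p0 / 8 + p1 / 8 + (1 - p0 - p1) / 6 * (1 - 2 / 8)
        if p_rho > 0:
            total += p_rho * math.log(p_rho / p_gamma)
    return total

def closed_form_eq58(alpha, beta):
    """Right-hand side of Eq. (58), with the convention 0 * ln 0 = 0."""
    terms = (abs(alpha + beta), abs(alpha - beta))
    return sum(t ** 2 * math.log(t) for t in terms if t > 0)

# Illustrative amplitudes and prior weights (hypothetical values).
alpha, beta = 0.6, 0.8
p0, p1 = 0.5, 0.3

d_meas = measured_relative_entropy(alpha, beta, p0, p1)
d_closed = closed_form_eq58(alpha, beta)

a2, b2 = abs(alpha) ** 2, abs(beta) ** 2
s1 = a2 * math.log(1 / p0) + b2 * math.log(1 / p1) - d_meas   # Eq. (55)
s3 = math.log(a2 / p0 + b2 / p1) - d_meas                     # Eq. (57)
```

For these inputs the enumeration reproduces Eq. (58), and the ordering $0=S(\rho)\leq S^{(1)}_{\mathsf{M},\gamma}(\rho)\leq S^{(3)}_{\mathsf{M},\gamma}(\rho)$ of Eqs. (40) and (43) is respected; the gap between `s1` and `s3` comes from the concavity of the logarithm in the first terms of Eqs. (55) and (57).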

7 Conclusions

The original definition [Eq. (1)] of observational entropy (OE) was known to be lower-bounded by the von Neumann entropy. Here we have first brought to the fore that the excess term $\Sigma_{\mathsf{M}}(\rho)$ can be interpreted in two ways: as a statistical deficiency (4), quantifying the decrease of state distinguishability induced by the measurement; and as irretrodictability (13), quantifying the hardness of retrodicting the input from the output statistics. While it is intuitive that recovering the input state is harder if the measurement makes states less distinguishable, the exact coincidence of the quantifiers is of interest.

In both interpretations, we observe that the uniform state $u$ plays the role of reference, or prior, knowledge. This may not represent the proper knowledge of the physical situation: for instance, for systems in contact with a thermal bath, it may be more natural to choose the Gibbs prior. Based on this, we have studied generalisations of OE, in which the prior knowledge can be an arbitrary state $\gamma$.

When $[\rho,\gamma]=0$, we find an obvious generalisation of the excess term [Eq. (16)] that retains both interpretations of statistical deficiency and irretrodictability. This is no longer straightforward for a general quantum prior. Technically, one of the main difficulties is that the irretrodictability quantifier is a relative entropy between joint input-and-output objects, whose definition in quantum theory is a topic of current research. We have explored three possible definitions of generalized OE (Table 1): two specifically designed to satisfy one of the interpretations but lacking the other; the third retaining both by replacing the usual Umegaki relative entropy with the Belavkin-Staszewski version. Thus we have a novel fully quantum object that quantifies simultaneously the loss of distinguishability caused by the measurement and the hardness of retrodicting the input knowing the output. Being built from information-theoretical considerations, our new formulation of OE may also hold significance in physical (thermodynamical) contexts, such as its relationship with work extraction [19]. We leave a deeper exploration of the physical implications of OE for future research.

Acknowledgments

We thank Clive Aw, Fumio Hiai, Anna Jenčová and Andreas Winter for discussions.

G.B. and V.S. are supported by the National Research Foundation, Singapore and A*STAR under its CQT Bridging Grant; and by the Ministry of Education, Singapore, under the Tier 2 grant “Bayesian approach to irreversibility” (Grant No. MOE-T2EP50123-0002). D.Š. acknowledges the support from the Institute for Basic Science in Korea (IBS-R024-D1). F.B. acknowledges support from MEXT Quantum Leap Flagship Program (MEXT QLEAP) Grant No. JPMXS0120319794, from MEXT-JSPS Grant-in-Aid for Transformative Research Areas (A) “Extreme Universe” No. 21H05183, and from JSPS KAKENHI, Grants No. 20K03746 and No. 23K03230. J.S. acknowledges support by MICIIN with funding from European Union NextGenerationEU (PRTR-C17.I1) and by Generalitat de Catalunya.

References

Appendix A Proof of Lemma 1

Lemma 1 relates the Choi operators of the forward and reverse processes. This can be shown using their definitions (24) and (25).

Proof.

Let the Kraus representation of $\mathcal{M}$ be $\mathcal{M}(\rho)=\sum_{k}K_{k}\rho K_{k}^{\dagger}$. First, observe the following identity

$$\sum_{i,j}A\ket{i}\bra{j}B\otimes\ket{i}\bra{j}=\sum_{i,j}\ket{i}\bra{j}\otimes A^{T}\ket{i}\bra{j}B^{T} \tag{59}$$

for any operators $A$ and $B$. Using this identity twice, the right-hand side of Eq. 26 equals

$$\begin{aligned}
&\left(\mathcal{M}(\gamma)^{-1/2}\otimes\sqrt{\gamma^{T}}\right)C_{\mathcal{M}}\left(\mathcal{M}(\gamma)^{-1/2}\otimes\sqrt{\gamma^{T}}\right)\\
&=\sum_{i,j}\mathcal{M}(\gamma)^{-1/2}\mathcal{M}(\ket{i}\bra{j})\mathcal{M}(\gamma)^{-1/2}\otimes\sqrt{\gamma^{T}}\ket{i}\bra{j}\sqrt{\gamma^{T}}\\
&=\sum_{i,j,k}\mathcal{M}(\gamma)^{-1/2}\ket{i}\bra{j}\mathcal{M}(\gamma)^{-1/2}\otimes\sqrt{\gamma^{T}}K_{k}^{T}\ket{i}\bra{j}K_{k}^{*}\sqrt{\gamma^{T}}\\
&=\sum_{i,j,k}\ket{i}\bra{j}\otimes\sqrt{\gamma^{T}}K_{k}^{T}(\mathcal{M}(\gamma)^{-1/2})^{T}\ket{i}\bra{j}(\mathcal{M}(\gamma)^{-1/2})^{T}K_{k}^{*}\sqrt{\gamma^{T}}\,.
\end{aligned} \tag{60}$$

On the other hand,

$$\begin{aligned}
C_{\widetilde{\mathcal{M}}^{\gamma}}&=\sum_{i,j}\ket{i}\bra{j}\otimes\widetilde{\mathcal{M}}^{\gamma}(\ket{i}\bra{j})\\
&=\sum_{i,j,k}\ket{i}\bra{j}\otimes\sqrt{\gamma}K_{k}^{\dagger}\mathcal{M}(\gamma)^{-1/2}\ket{i}\bra{j}\mathcal{M}(\gamma)^{-1/2}K_{k}\sqrt{\gamma}\\
&=\sum_{i,j,k}\ket{j}\bra{i}\otimes\sqrt{\gamma}K_{k}^{\dagger}\mathcal{M}(\gamma)^{-1/2}\ket{j}\bra{i}\mathcal{M}(\gamma)^{-1/2}K_{k}\sqrt{\gamma}\,,
\end{aligned} \tag{61}$$

where the last line relabels $i\leftrightarrow j$.

Notice that Eq. 61 is the transpose of Eq. 60. This proves Eq. 26. ∎

Appendix B Proof of Eq. 37

The key observation for Eq. 37 is that the factors $\sqrt{C_{\mathcal{M}}}$ in the definitions of ${}^{t}Q_{F}$ (31) and ${}^{t}Q_{R}^{\gamma}$ (32) partially cancel each other in the expression of $D_{\rm BS}\left({}^{t}Q_{F}\middle\|{}^{t}Q_{R}^{\gamma}\right)$, leaving a tensor product inside the logarithm. That is to say,

$$\begin{aligned}
&\ln{}^{t}Q_{F}({}^{t}Q_{R}^{\gamma})^{-1}\\
&=\ln\sqrt{C_{\mathcal{M}}}(\mathds{1}\otimes\rho^{T})\sqrt{C_{\mathcal{M}}}\left(\sqrt{C_{\mathcal{M}}}(\mathcal{M}(\gamma)^{-1/2}\tau\mathcal{M}(\gamma)^{-1/2}\otimes\gamma^{T})\sqrt{C_{\mathcal{M}}}\right)^{-1}\\
&=\ln\sqrt{C_{\mathcal{M}}}(\mathcal{M}(\gamma)^{1/2}\tau^{-1}\mathcal{M}(\gamma)^{1/2}\otimes\rho^{T}(\gamma^{T})^{-1})\sqrt{C_{\mathcal{M}}}^{-1}\\
&=\sqrt{C_{\mathcal{M}}}(\ln\mathcal{M}(\gamma)^{1/2}\tau^{-1}\mathcal{M}(\gamma)^{1/2}\otimes\rho^{T}(\gamma^{T})^{-1})\sqrt{C_{\mathcal{M}}}^{-1}\\
&=\sqrt{C_{\mathcal{M}}}\left(\ln\mathcal{M}(\gamma)^{1/2}\tau^{-1}\mathcal{M}(\gamma)^{1/2}\otimes\mathds{1}+\mathds{1}\otimes\ln\rho^{T}(\gamma^{T})^{-1}\right)\sqrt{C_{\mathcal{M}}}^{-1}\,.
\end{aligned} \tag{62}$$

Notice that ${}^{t}Q_{R}^{\gamma}$ being full-rank implies that $C_{\mathcal{M}}$ and $\gamma$ are full-rank. Putting Eq. 62 into the definition of $D_{\rm BS}$, the left-hand side of Eq. 37 is

$$\begin{aligned}
&D_{\rm BS}\left({}^{t}Q_{F}\middle\|{}^{t}Q_{R}^{\gamma}\right)\\
&=\mathrm{Tr}[{}^{t}Q_{F}\ \ln{}^{t}Q_{F}({}^{t}Q_{R}^{\gamma})^{-1}]\\
&=\mathrm{Tr}\left[(\mathds{1}\otimes\rho^{T})C_{\mathcal{M}}\left(\ln\mathcal{M}(\gamma)^{1/2}\tau^{-1}\mathcal{M}(\gamma)^{1/2}\otimes\mathds{1}+\mathds{1}\otimes\ln\rho^{T}(\gamma^{T})^{-1}\right)\right]\\
&=\mathrm{Tr}[\mathrm{Tr}_{B}[(\mathds{1}\otimes\rho^{T})C_{\mathcal{M}}]\ \ln\rho^{T}(\gamma^{T})^{-1}]+\mathrm{Tr}[\mathrm{Tr}_{A}[(\mathds{1}\otimes\rho^{T})C_{\mathcal{M}}]\ \ln\mathcal{M}(\gamma)^{1/2}\tau^{-1}\mathcal{M}(\gamma)^{1/2}]\\
&=\mathrm{Tr}[\rho^{T}\ \ln\rho^{T}(\gamma^{T})^{-1}]+\mathrm{Tr}[\mathcal{M}(\rho)\ \ln\mathcal{M}(\gamma)^{1/2}\tau^{-1}\mathcal{M}(\gamma)^{1/2}]\\
&=D_{\rm BS}(\rho\|\gamma)-\mathrm{Tr}[\mathcal{M}(\rho)\ \ln\mathcal{M}(\gamma)^{-1/2}\tau\mathcal{M}(\gamma)^{-1/2}]\,.
\end{aligned} \tag{63}$$

Recall that we choose the input of the reverse process to be $\tau=\mathcal{M}(\rho)$. Noting that $\mathcal{M}(\rho)$ commutes with $\mathcal{M}(\gamma)$, the second term equals

$$\begin{aligned}
&\mathrm{Tr}[\mathcal{M}(\rho)\ \ln\mathcal{M}(\gamma)^{-1/2}\tau\mathcal{M}(\gamma)^{-1/2}]\\
&=\mathrm{Tr}[\mathcal{M}(\rho)(\ln\mathcal{M}(\rho)-\ln\mathcal{M}(\gamma))]\\
&=D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\,.
\end{aligned} \tag{64}$$

This proves Eq. 37.

Appendix C Proof of properties (i)-(iii)

Since $S_{\mathsf{M},u}^{\rm clax}(\rho)=S_{\mathsf{M}}(\rho)$ and $[\rho,u]=0$, property (i) is a special case of property (ii), so we prove (ii) directly.

When $[\rho,\gamma]=0$, $D(\rho\|\gamma)$ and $D_{\rm BS}(\rho\|\gamma)$ are both equal to the relative entropy between the eigenvalues of $\rho$ and $\gamma$. By comparing the definitions in Eqs. 17, 33 and 36, we obtain $S_{\mathsf{M},\gamma}^{(1)}(\rho)=S_{\mathsf{M},\gamma}^{(3)}(\rho)=S^{\textrm{clax}}_{\mathsf{M},\gamma}(\rho)$.

For $S_{\mathsf{M},\gamma}^{(2)}$, we further need to use that $[\rho,\Pi_{y}]=[\gamma,\Pi_{y}]=0$. This condition implies that $C_{\mathcal{M}}$, $(\mathds{1}\otimes\rho)$ and $(\mathds{1}\otimes\gamma)$ all commute, and therefore $Q_{F}=(\mathds{1}\otimes\rho^{T})C_{\mathcal{M}}$, $Q_{R}^{\gamma}=(\mathcal{M}(\rho)\mathcal{M}(\gamma)^{-1}\otimes\gamma^{T})C_{\mathcal{M}}$, and

$$\begin{aligned}
\Sigma_{\mathsf{M},\gamma}^{(2)}(\rho)&=D(Q_{F}\|Q_{R}^{\gamma})\\
&=\mathrm{Tr}[Q_{F}(\ln Q_{F}-\ln Q_{R}^{\gamma})]\\
&=\mathrm{Tr}[Q_{F}(\ln(\mathds{1}\otimes\rho^{T})+\ln C_{\mathcal{M}})]-\mathrm{Tr}\left[Q_{F}\left(\ln(\mathcal{M}(\rho)\mathcal{M}(\gamma)^{-1}\otimes\gamma^{T})+\ln C_{\mathcal{M}}\right)\right]\\
&=\mathrm{Tr}\left[Q_{F}\left(\mathds{1}\otimes(\ln\rho^{T}-\ln\gamma^{T})\right)\right]-\mathrm{Tr}[Q_{F}(\ln\mathcal{M}(\rho)\mathcal{M}(\gamma)^{-1}\otimes\mathds{1})]\\
&=D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\\
&=S^{\textrm{clax}}_{\mathsf{M},\gamma}(\rho)-S(\rho)\,.
\end{aligned} \tag{65}$$

Notice that the above equality holds as long as the support of $Q_{F}$ is contained in that of $Q_{R}^{\gamma}$, without assuming $Q_{R}^{\gamma}$ to be invertible, since the Umegaki relative entropy is continuous with respect to both arguments [45]. Therefore, $S_{\mathsf{M},\gamma}^{(2)}(\rho)=S^{\textrm{clax}}_{\mathsf{M},\gamma}(\rho)$ and property (ii) holds for $j=2$.

Property (iii) is equivalent to saying that $\Sigma_{\mathsf{M},\gamma}^{(j)}$ is non-negative.

$\Sigma_{\mathsf{M},\gamma}^{(2)}(\rho)=D(Q_{F}\|Q_{R}^{\gamma})$ is non-negative by the non-negativity of the relative entropy between two unit-trace positive operators.

$\Sigma_{\mathsf{M},\gamma}^{(1)}(\rho)=D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$ is non-negative by the data-processing inequality of the Umegaki relative entropy. Last, by Eq. 43, $\Sigma_{\mathsf{M},\gamma}^{(3)}(\rho)\geq\Sigma_{\mathsf{M},\gamma}^{(1)}(\rho)\geq 0$.

Appendix D Proof of Petz recovery criterion (v)

We first show that both $S_{\mathsf{M},\gamma}^{(1)}$ and $S_{\mathsf{M},\gamma}^{(3)}$ are equal to $S(\rho)$ if and only if $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$. The $j=2$ case is addressed later.

D.1 Property (v) of $S_{\mathsf{M},\gamma}^{(1)}$

The property $D(\rho\|\gamma)=D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\Leftrightarrow\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$ is shown, for the larger family called $f$-divergences, by Theorem 5.1 in Ref. [46] and Theorem 3.18 in Ref. [44]. Therefore, property (v) for $S_{\mathsf{M},\gamma}^{(1)}$ is proved, since $S_{\mathsf{M},\gamma}^{(1)}(\rho)=S(\rho)$ is equivalent to $D(\rho\|\gamma)=D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$.

D.2 Property (v) of $S_{\mathsf{M},\gamma}^{(3)}$

For $S_{\mathsf{M},\gamma}^{(3)}$, first assume $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$. By the data-processing inequality of the Belavkin-Staszewski relative entropy [44, 47],

$$\begin{aligned}
D_{\rm BS}(\rho\|\gamma)&\geq D_{\rm BS}(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\\
&\geq D_{\rm BS}\left(\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))\middle\|\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\gamma))\right)\\
&=D_{\rm BS}(\rho\|\gamma)\,,
\end{aligned} \tag{66}$$

where $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\gamma))=\gamma$ by definition. Therefore, $D_{\rm BS}(\rho\|\gamma)=D_{\rm BS}(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$ and thus $S_{\mathsf{M},\gamma}^{(3)}(\rho)=S(\rho)$.

On the other hand, if $S_{\mathsf{M},\gamma}^{(3)}(\rho)=S(\rho)$, then by Eq. 43 and property (iii), i.e., $S_{\mathsf{M},\gamma}^{(3)}(\rho)\geq S_{\mathsf{M},\gamma}^{(1)}(\rho)\geq S(\rho)$, one infers $S_{\mathsf{M},\gamma}^{(1)}(\rho)=S(\rho)$, and therefore $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$ by property (v) of $S_{\mathsf{M},\gamma}^{(1)}$.

D.3 Property (v) of $S_{\mathsf{M},\gamma}^{(2)}$

Next, we prove property (v) for $S_{\mathsf{M},\gamma}^{(2)}$. Before that, we notice that the Petz recovery condition $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$ implies that $[\rho,\gamma]=0$.

Lemma 2.

For a measurement channel $\mathcal{M}$, $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$ implies $[\rho,\gamma]=0$.

Proof.

By property (v) of $S_{\mathsf{M},\gamma}^{(1)}$ and $S_{\mathsf{M},\gamma}^{(3)}$, one has $\Sigma_{\mathsf{M},\gamma}^{(1)}(\rho)=\Sigma_{\mathsf{M},\gamma}^{(3)}(\rho)=0$. That is,

$$\begin{aligned}
D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))&=D_{\rm BS}(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\,,\\
D(\rho\|\gamma)&=D_{\rm BS}(\rho\|\gamma)\,.
\end{aligned} \tag{67}$$

Because $D(\rho\|\gamma)=D_{\rm BS}(\rho\|\gamma)$ if and only if $[\rho,\gamma]=0$ (Ref. [44], Theorem 4.3), one has $[\rho,\gamma]=0$. ∎

Now, we prove property (v). We first prove the “only if” part.

Suppose $S_{\mathsf{M},\gamma}^{(2)}(\rho)=S(\rho)$. This is equivalent to $\Sigma_{\mathsf{M},\gamma}^{(2)}(\rho)=D(Q_{F}\|Q_{R}^{\gamma})=0$, which in turn is equivalent to $Q_{F}=Q_{R}^{\gamma}$. Taking the partial trace over system $B$, we get $\rho^{T}=\mathrm{Tr}_{B}[Q_{F}]=\mathrm{Tr}_{B}[Q_{R}^{\gamma}]=\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))^{T}$. Therefore, $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$.

For the “if” part, we will show that $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$ implies $Q_{F}=Q_{R}^{\gamma}$, which is equivalent to $S_{\mathsf{M},\gamma}^{(2)}(\rho)=S(\rho)$.

Suppose $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$. By Lemma 2, $[\rho,\gamma]=0$, and thus we can diagonalize them in the same basis:

$$\begin{aligned}
\rho&=\sum_{x}\lambda_{x}|\psi_{x}\rangle\langle\psi_{x}|\,, && (68)\\
\gamma&=\sum_{x}\gamma_{x}|\psi_{x}\rangle\langle\psi_{x}|\,. && (69)
\end{aligned}$$

Next, we construct a new POVM by taking the diagonal elements of the POVM $\mathsf{M}=\{\Pi_{y}\}$ in the above basis:

$$\mathsf{M}^{\prime}\coloneqq\{\Pi_{y}^{\prime}\},\quad\Pi_{y}^{\prime}\coloneqq\sum_{x}|\psi_{x}\rangle\langle\psi_{x}|\Pi_{y}|\psi_{x}\rangle\langle\psi_{x}|\,. \tag{70}$$

Let $\mathcal{M}^{\prime}(\rho)\coloneqq\sum_{y}\mathrm{Tr}[\Pi_{y}^{\prime}\rho]|y\rangle\langle y|$ be the measurement channel and $\widetilde{\mathcal{M}}^{\prime\gamma}$ be its Petz map with reference $\gamma$. Notice that $\mathrm{Tr}[\Pi_{y}\rho]=\mathrm{Tr}[\Pi_{y}^{\prime}\rho]$, $\mathrm{Tr}[\Pi_{y}\gamma]=\mathrm{Tr}[\Pi_{y}^{\prime}\gamma]$, and therefore $\mathcal{M}^{\prime}(\rho)=\mathcal{M}(\rho)$, $\mathcal{M}^{\prime}(\gamma)=\mathcal{M}(\gamma)$.

Since $[\rho,\gamma]=[\rho,\Pi^{\prime}_{y}]=[\gamma,\Pi^{\prime}_{y}]=0$, using property (ii) for $\mathcal{M}^{\prime}$, one has

$$\begin{aligned}
S_{\mathsf{M}^{\prime},\gamma}^{\textrm{clax}}(\rho)-S(\rho)&=D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\\
&=D(Q_{F^{\prime}}\|Q_{R^{\prime}}^{\gamma})\,,
\end{aligned} \tag{71}$$

where $Q_{F^{\prime}}\coloneqq(\mathds{1}\otimes\rho^{T})C_{\mathcal{M}^{\prime}}$ and $Q_{R^{\prime}}^{\gamma}\coloneqq(\mathcal{M}^{\prime}(\rho)\mathcal{M}^{\prime}(\gamma)^{-1}\otimes\gamma^{T})C_{\mathcal{M}^{\prime}}$ are the representations of $\mathcal{M}^{\prime}$ and $\widetilde{\mathcal{M}}^{\prime\gamma}$, which are simplified using the commutativity of $\rho$, $\gamma$ and $\Pi_{y}^{\prime}$.

Note that the expression in Eq. 71 is the same as $\Sigma_{\mathsf{M},\gamma}^{(1)}(\rho)=D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$. Since $\widetilde{\mathcal{M}}^{\gamma}(\mathcal{M}(\rho))=\rho$, by property (v) for $S_{\mathsf{M},\gamma}^{(1)}$, we have $\Sigma_{\mathsf{M},\gamma}^{(1)}(\rho)=0$. Combining this with Eq. 71 shows that $D(Q_{F^{\prime}}\|Q_{R^{\prime}}^{\gamma})=0$ and hence $Q_{F^{\prime}}=Q_{R^{\prime}}^{\gamma}$. Expanding this with respect to the basis $\{\ket{y}\otimes\ket{\psi_{x}}\}$, one gets

$$\begin{aligned}
\bra{y}\bra{\psi_{x}}Q_{F^{\prime}}\ket{y}\ket{\psi_{x}}&=\bra{y}\bra{\psi_{x}}Q_{R^{\prime}}^{\gamma}\ket{y}\ket{\psi_{x}}\,, && (72)\\
\varphi(y|x)\lambda_{x}&=\frac{\varphi(y|x)\gamma_{x}\mathrm{Tr}[\Pi^{\prime}_{y}\rho]}{\mathrm{Tr}[\Pi^{\prime}_{y}\gamma]},\ \forall x,y\,, && (73)
\end{aligned}$$

where $\varphi(y|x)=\bra{\psi_{x}}\Pi^{\prime}_{y}\ket{\psi_{x}}=\bra{\psi_{x}}\Pi_{y}\ket{\psi_{x}}$.

Now, fix $y$, and consider the support of $\Pi^{\prime}_{y}$, which is spanned by all $\ket{\psi_{x}}$ such that $\varphi(y|x)\neq 0$. For this subset of $x$, one can cancel $\varphi(y|x)$ on both sides, and Eq. 73 becomes

$$\lambda_{x}=\frac{\mathrm{Tr}[\Pi^{\prime}_{y}\rho]}{\mathrm{Tr}[\Pi^{\prime}_{y}\gamma]}\gamma_{x},\ \text{for }x:\ket{\psi_{x}}\in\text{Supp}(\Pi^{\prime}_{y})\,, \tag{74}$$

where $\text{Supp}(\Pi^{\prime}_{y})$ denotes the support of $\Pi^{\prime}_{y}$. Taking the square root of this equation, one gets

$$\sqrt{\lambda_{x}}=\sqrt{\frac{\mathrm{Tr}[\Pi^{\prime}_{y}\rho]}{\mathrm{Tr}[\Pi^{\prime}_{y}\gamma]}}\sqrt{\gamma_{x}},\ \text{for }x:\ket{\psi_{x}}\in\text{Supp}(\Pi^{\prime}_{y})\,. \tag{75}$$

Define $(\Pi_{y}^{\prime})^{0}\coloneqq\sum_{x:\varphi(y|x)\neq 0}|\psi_{x}\rangle\langle\psi_{x}|$ as the projector onto $\text{Supp}(\Pi^{\prime}_{y})$. The above equation can be rewritten as

$$\sqrt{\rho}(\Pi_{y}^{\prime})^{0}=\sqrt{\frac{\mathrm{Tr}[\Pi^{\prime}_{y}\rho]}{\mathrm{Tr}[\Pi^{\prime}_{y}\gamma]}}\sqrt{\gamma}(\Pi_{y}^{\prime})^{0}\,, \tag{76}$$

since multiplication with $(\Pi_{y}^{\prime})^{0}$ selects the eigenvectors of $\sqrt{\rho}$ and $\sqrt{\gamma}$ in $\text{Supp}(\Pi^{\prime}_{y})$, whose eigenvalues are the $\sqrt{\lambda_{x}}$ and $\sqrt{\gamma_{x}}$ in Eq. 75.

Since $\Pi_{y}$ is positive, its off-diagonal elements are cross terms in $\text{Supp}(\Pi^{\prime}_{y})$:

$$\Pi_{y}=\Pi^{\prime}_{y}+\sum_{\begin{subarray}{c}x_{1},x_{2}:x_{1}\neq x_{2},\\ \ket{\psi_{x_{1}}},\ket{\psi_{x_{2}}}\in\text{Supp}(\Pi_{y}^{\prime})\end{subarray}}\pi_{y,x_{1},x_{2}}\ket{\psi_{x_{1}}}\bra{\psi_{x_{2}}} \tag{77}$$

for some complex numbers $\pi_{y,x_{1},x_{2}}$. Therefore, $\Pi_{y}(\Pi_{y}^{\prime})^{0}=(\Pi_{y}^{\prime})^{0}\Pi_{y}=\Pi_{y}$.

By this and Eq. 76,

$$\begin{aligned}
\sqrt{\rho}\Pi_{y}\sqrt{\rho}&=\sqrt{\rho}(\Pi_{y}^{\prime})^{0}\Pi_{y}(\Pi_{y}^{\prime})^{0}\sqrt{\rho}\\
&=\frac{\mathrm{Tr}[\Pi^{\prime}_{y}\rho]}{\mathrm{Tr}[\Pi^{\prime}_{y}\gamma]}\sqrt{\gamma}(\Pi_{y}^{\prime})^{0}\Pi_{y}(\Pi_{y}^{\prime})^{0}\sqrt{\gamma}\\
&=\frac{\mathrm{Tr}[\Pi^{\prime}_{y}\rho]}{\mathrm{Tr}[\Pi^{\prime}_{y}\gamma]}\sqrt{\gamma}\Pi_{y}\sqrt{\gamma}\,.
\end{aligned} \tag{78}$$

Last, by $\mathrm{Tr}[\Pi_{y}\rho]=\mathrm{Tr}[\Pi_{y}^{\prime}\rho]$, $\mathrm{Tr}[\Pi_{y}\gamma]=\mathrm{Tr}[\Pi_{y}^{\prime}\gamma]$ and Eq. 78, one gets

$$\begin{aligned}
Q_{F}&=(\mathds{1}\otimes\sqrt{\rho^{T}})C_{\mathcal{M}}(\mathds{1}\otimes\sqrt{\rho^{T}})\\
&=\sum_{y}|y\rangle\langle y|\otimes\sqrt{\rho^{T}}\Pi_{y}^{T}\sqrt{\rho^{T}}\\
&=\sum_{y}|y\rangle\langle y|\otimes\frac{\mathrm{Tr}[\Pi^{\prime}_{y}\rho]}{\mathrm{Tr}[\Pi^{\prime}_{y}\gamma]}\sqrt{\gamma^{T}}\Pi_{y}^{T}\sqrt{\gamma^{T}}\\
&=\sum_{y}|y\rangle\langle y|\otimes\frac{\mathrm{Tr}[\Pi_{y}\rho]}{\mathrm{Tr}[\Pi_{y}\gamma]}\sqrt{\gamma^{T}}\Pi_{y}^{T}\sqrt{\gamma^{T}}\\
&=(\mathcal{M}(\rho)\mathcal{M}(\gamma)^{-1}\otimes\sqrt{\gamma^{T}})C_{\mathcal{M}}(\mathds{1}\otimes\sqrt{\gamma^{T}})\\
&=Q_{R}^{\gamma}\,.
\end{aligned} \tag{79}$$

Therefore, $Q_{F}=Q_{R}^{\gamma}$, and $S_{\mathsf{M},\gamma}^{(2)}(\rho)=S(\rho)$.

Appendix E Proof of monotonicity under stochastic post-processing (vi)

Let 𝒲\mathcal{W} be the linear map describing the post-processing ww, which satisfies

𝒲(|yy|)\displaystyle\mathcal{W}(|y\rangle\!\!\!\;\langle y|) =zwzy|zz|\displaystyle=\sum_{z}w_{zy}|z\rangle\!\!\!\;\langle z| (80)
𝒲(|yy|)\displaystyle\mathcal{W}(\ket{y}\!\!\!\;\bra{y^{\prime}}) =0,for yy.\displaystyle=0,\quad\text{for\leavevmode\nobreak\ }y\neq y^{\prime}\,. (81)

Since ww is a stochastic matrix, 𝒲\mathcal{W} is completely positive and trace-preserving. The measurement channel of 𝖬\mathsf{M}^{\prime} is then described by

(𝒫)(ρ)\displaystyle(\mathcal{P}\circ\mathcal{M})(\rho) =yTr[Πyρ]𝒫(|yy|)\displaystyle=\sum_{y}\mathrm{Tr}[\Pi_{y}\rho]\mathcal{P}(|y\rangle\!\!\!\;\langle y|)
=zTr[Πzρ]|zz|\displaystyle=\sum_{z}\mathrm{Tr}[\Pi_{z}^{\prime}\rho]|z\rangle\!\!\!\;\langle z| (82)

Property (vi) is equivalent to the statement that $S^{(j)}_{\mathsf{M}^{\prime},\gamma}(\rho)-S^{(j)}_{\mathsf{M},\gamma}(\rho)\geq 0$. For $j=1,3$, this difference takes the same form:

\begin{align}
&S^{(j)}_{\mathsf{M}^{\prime},\gamma}(\rho)-S^{(j)}_{\mathsf{M},\gamma}(\rho) \nonumber\\
&\quad= D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))-D((\mathcal{W}\circ\mathcal{M})(\rho)\|(\mathcal{W}\circ\mathcal{M})(\gamma)) \nonumber\\
&\quad\geq 0\,. \tag{83}
\end{align}

The inequality follows from the data-processing inequality for the relative entropy, applied to the channel $\mathcal{W}$.
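The inequality in Eq. (83) involves only the classical outcome distributions, so it can be probed with probability vectors alone. The sketch below is an illustration, not part of the proof: it draws random distributions standing in for $\mathcal{M}(\rho)$ and $\mathcal{M}(\gamma)$, applies a random column-stochastic matrix, and records the smallest observed gap. The function name `kl`, the sample sizes, and the Dirichlet sampling are assumptions of this example.

```python
import numpy as np

def kl(p, q):
    # Classical relative entropy D(p||q) in nats.
    # Assumes q > 0 wherever p > 0.
    mask = p > 0
    return float(np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask]))))

rng = np.random.default_rng(1)
n_y, n_z = 5, 3  # illustrative numbers of outcomes (assumption)

gaps = []
for _ in range(1000):
    p = rng.dirichlet(np.ones(n_y))  # stands in for the statistics of rho
    q = rng.dirichlet(np.ones(n_y))  # stands in for the statistics of gamma
    # Column-stochastic post-processing w[z, y].
    w = rng.dirichlet(np.ones(n_z), size=n_y).T
    gaps.append(kl(p, q) - kl(w @ p, w @ q))

min_gap = min(gaps)
print(min_gap >= -1e-12)
```

The small negative tolerance only absorbs floating-point rounding; the data-processing inequality itself guarantees a non-negative gap.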

Appendix F Proof of Eq. 44

By definition of $Q_{F}$ and $Q_{R}^{\gamma}$ in Eqs. (27) and (29),

\begin{align}
Q_{F} &= \sum_{y}|y\rangle\langle y|\otimes\sqrt{\rho^{T}}\,\Pi_{y}^{T}\sqrt{\rho^{T}}\,, \tag{84}\\
Q_{R}^{\gamma} &= \sum_{y}|y\rangle\langle y|\otimes\frac{\mathrm{Tr}[\Pi_{y}\rho]}{\mathrm{Tr}[\Pi_{y}\gamma]}\sqrt{\gamma^{T}}\,\Pi_{y}^{T}\sqrt{\gamma^{T}}\,, \tag{85}
\end{align}

and thus

\begin{align}
D(Q_{F}\|Q_{R}^{\gamma}) &= \mathrm{Tr}\left[\sum_{y}|y\rangle\langle y|\otimes\sqrt{\rho^{T}}\Pi_{y}^{T}\sqrt{\rho^{T}}\left(\ln\sqrt{\rho^{T}}\Pi_{y}^{T}\sqrt{\rho^{T}}-\ln\frac{\mathrm{Tr}[\Pi_{y}\rho]\sqrt{\gamma^{T}}\Pi_{y}^{T}\sqrt{\gamma^{T}}}{\mathrm{Tr}[\Pi_{y}\gamma]}\right)\right] \nonumber\\
&= \sum_{y}\mathrm{Tr}\left[\sqrt{\rho}\Pi_{y}\sqrt{\rho}\left(\ln\frac{\sqrt{\rho}\Pi_{y}\sqrt{\rho}}{\mathrm{Tr}[\Pi_{y}\rho]}-\ln\frac{\sqrt{\gamma}\Pi_{y}\sqrt{\gamma}}{\mathrm{Tr}[\Pi_{y}\rho]}-(\ln\mathrm{Tr}[\Pi_{y}\rho]-\ln\mathrm{Tr}[\Pi_{y}\gamma])\mathds{1}\right)\right] \nonumber\\
&= \sum_{y}\mathrm{Tr}[\Pi_{y}\rho]\,D\left(\frac{\sqrt{\rho}\Pi_{y}\sqrt{\rho}}{\mathrm{Tr}[\Pi_{y}\rho]}\middle\|\frac{\sqrt{\gamma}\Pi_{y}\sqrt{\gamma}}{\mathrm{Tr}[\Pi_{y}\rho]}\right)-\sum_{y}\mathrm{Tr}[\sqrt{\rho}\Pi_{y}\sqrt{\rho}]\,(\ln\mathrm{Tr}[\Pi_{y}\rho]-\ln\mathrm{Tr}[\Pi_{y}\gamma]) \nonumber\\
&\geq D\left(\sum_{y}\mathrm{Tr}[\Pi_{y}\rho]\frac{\sqrt{\rho}\Pi_{y}\sqrt{\rho}}{\mathrm{Tr}[\Pi_{y}\rho]}\middle\|\sum_{y}\mathrm{Tr}[\Pi_{y}\rho]\frac{\sqrt{\gamma}\Pi_{y}\sqrt{\gamma}}{\mathrm{Tr}[\Pi_{y}\rho]}\right)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma)) \nonumber\\
&= D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))\,, \tag{86}
\end{align}

where the second equality uses the invariance of the trace under transposition, the inequality comes from the joint convexity of the Umegaki relative entropy with weights $\mathrm{Tr}[\Pi_{y}\rho]$, and the last equality uses $\sum_{y}\Pi_{y}=\mathds{1}$. Similar inequalities hold for any other relative entropy that is jointly convex, such as the Belavkin-Staszewski one. Adding $S(\rho)$ to both sides of Eq. 86 gives Eq. 44.
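The bound in Eq. 86 can also be spot-checked numerically. The sketch below is a sanity check under stated assumptions rather than the authors' method: it samples full-rank $\rho$, $\gamma$ and a POVM with full-rank elements (so that all matrix logarithms are well defined), evaluates $D(Q_{F}\|Q_{R}^{\gamma})$ block by block, and compares it with $D(\rho\|\gamma)-D(\mathcal{M}(\rho)\|\mathcal{M}(\gamma))$. The helpers `ginibre`, `herm_fun` and `rel_ent` are hypothetical names introduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 3, 4  # illustrative dimension and number of outcomes (assumption)

def ginibre(dim):
    # Hypothetical helper: complex Gaussian random matrix.
    return rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))

def herm_fun(a, f):
    # Apply a scalar function f to the eigenvalues of a Hermitian matrix.
    vals, vecs = np.linalg.eigh(a)
    return (vecs * f(vals)) @ vecs.conj().T

def rel_ent(a, b):
    # Umegaki form Tr[a (ln a - ln b)] for full-rank positive a and b
    # (the operators need not be normalised).
    return np.trace(a @ (herm_fun(a, np.log) - herm_fun(b, np.log))).real

# Full-rank random states rho and gamma.
x = ginibre(d); rho = x @ x.conj().T; rho = rho / np.trace(rho).real
x = ginibre(d); gamma = x @ x.conj().T; gamma = gamma / np.trace(gamma).real

# Random POVM with full-rank elements summing to the identity.
g = [ginibre(d) for _ in range(n)]
g = [x @ x.conj().T for x in g]
s_inv_sqrt = herm_fun(sum(g), lambda v: v ** -0.5)
povm = [s_inv_sqrt @ gy @ s_inv_sqrt for gy in g]

sr, sg = herm_fun(rho, np.sqrt), herm_fun(gamma, np.sqrt)
p = np.array([np.trace(P @ rho).real for P in povm])    # statistics of rho
q = np.array([np.trace(P @ gamma).real for P in povm])  # statistics of gamma
kl = float(np.sum(p * np.log(p / q)))

# D(Q_F || Q_R^gamma): the operators are block diagonal in y, and the
# transposes are dropped because they do not change the trace.
lhs = sum(rel_ent(sr @ P @ sr, (py / qy) * (sg @ P @ sg))
          for P, py, qy in zip(povm, p, q))
rhs = rel_ent(rho, gamma) - kl

print(lhs >= rhs - 1e-9)
```

Restricting to full-rank operators sidesteps support issues in the logarithms; the general case would need the usual conventions for $D$ on operators with non-trivial kernel.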