Observational entropy with general quantum priors
Abstract
Observational entropy captures both the intrinsic uncertainty of a thermodynamic state and the lack of knowledge due to coarse-graining. We demonstrate two interpretations of observational entropy, one as the statistical deficiency resulting from a measurement, the other as the difficulty of inferring the input state from the measurement statistics by quantum Bayesian retrodiction. These interpretations show that the observational entropy implicitly includes a uniform reference prior. Since the uniform prior cannot be used when the system is infinite-dimensional or otherwise energy-constrained, we propose generalizations by replacing the uniform prior with arbitrary quantum states that may not even commute with the state of the system. We propose three candidates for this generalization, discuss their properties, and show that one of them gives a unified expression that relates both interpretations.
1 Introduction
A few pages after defining the entropy that nowadays bears his name, von Neumann warns the reader that the quantity that he just defined is, in fact, unable to capture the phenomenological behavior of thermodynamic entropy [1]. More precisely, while the von Neumann entropy is always constant in a closed system as a consequence of its invariance under unitary evolutions, the thermodynamic entropy of a closed system can instead increase, as happens for example in the free expansion of an ideal gas. The explanation that von Neumann gives for this apparent paradox is the following: thermodynamic entropy includes not only the intrinsic ignorance associated with the microscopic state of the system, but also the lack of knowledge arising from a macroscopic coarse-graining of it. The latter lack of knowledge becomes worse as the gas expands. This observation leads him to introduce an alternative quantity, which he calls macroscopic entropy, for which an H-theorem can be proved [2].
In recent years, von Neumann’s macroscopic entropy and a generalization thereof called observational entropy (OE) have been the object of renewed interest [3, 4, 5, 6, 7, 8], finding a number of applications [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. So far, even when the narrative is based on a quantum state being subject to a measurement, all the definitions fit in classical stochastic thermodynamics.
In this paper, we explore possible generalizations of OE. We note that the original OE includes an implicit prior belief about the state, which is the uniform distribution. Since in several applications the uniform prior cannot be used, e.g., in infinite-dimensional or continuous variable systems, or does not play well with other physical constraints, e.g., in thermodynamic systems with a nondegenerate Hamiltonian at finite temperature, we allow the observer to have a non-uniform prior. More generally, we consider the possibility that the observer has a reference prior described by an arbitrary density operator, which may not even commute with the state of the system. In this case, classical probability distributions may not be sufficient to describe the non-commutativity between the state and the reference, and thus the original definition of OE is not applicable.
2 Classical OE and reference states
In what follows, we restrict our attention to finite-dimensional quantum systems, with Hilbert space $\mathcal{H}$ of dimension $d$, and finite measurements, i.e., positive operator-valued measures (POVMs) $\Pi = \{\Pi_i\}_{i\in\mathcal{X}}$ labeled by the elements of a finite set $\mathcal{X}$. In this context, the definition of OE is

$$S_\Pi(\rho) = -\sum_{i\in\mathcal{X}} p_i \log\frac{p_i}{V_i}\,, \qquad (1)$$

where $p_i = \operatorname{Tr}[\Pi_i\rho]$ and $V_i = \operatorname{Tr}[\Pi_i]$. One of the conceptual advantages of OE is that it is able to “interpolate” between Boltzmann and Gibbs–Shannon entropies. On the one hand, if the measurement is so coarse-grained that one of its elements (say $\Pi_0$) is the projector on the support of $\rho$, then $S_\Pi(\rho) = \log V_0$ takes the form of a Boltzmann entropy. If, on the other hand, the measurement is projective and rank-one (i.e., $\Pi_i = |i\rangle\langle i|$ for all $i\in\mathcal{X}$), then $S_\Pi(\rho)$ coincides with the Shannon entropy of the probability distribution $(p_i)_{i}$, which is equal to the von Neumann entropy $S(\rho)$ when $[\rho,\Pi_i]=0$ for all $i$.
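As a quick numerical sanity check of these two limiting cases, here is a minimal sketch (our own code, not from the paper; `observational_entropy` and all variable names are ours, and natural log is used):

```python
import numpy as np

def observational_entropy(rho, povm):
    """S_Pi(rho) = -sum_i p_i log(p_i / V_i), with p_i = Tr[Pi_i rho], V_i = Tr[Pi_i]."""
    S = 0.0
    for Pi in povm:
        p = np.trace(Pi @ rho).real
        V = np.trace(Pi).real
        if p > 1e-12:          # convention: 0 log 0 = 0
            S -= p * np.log(p / V)
    return S

d = 3
rho = np.diag([0.5, 0.3, 0.2])

# Rank-one projective measurement: OE reduces to the Shannon entropy of (p_i).
basis = [np.outer(e, e) for e in np.eye(d)]
p = np.array([0.5, 0.3, 0.2])
assert np.isclose(observational_entropy(rho, basis), -np.sum(p * np.log(p)))

# Single element = projector onto the support of rho: Boltzmann form log V_0.
assert np.isclose(observational_entropy(rho, [np.eye(d)]), np.log(d))
```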
In general, it holds that [4]

$$S_\Pi(\rho) \geq S(\rho) := -\operatorname{Tr}[\rho\log\rho]\,. \qquad (2)$$

If $S(\rho)$ represents, in von Neumann’s original narrative, the least uncertainty that an observer, able to perform any measurement in principle allowed by quantum theory, has about the state of the system, then the additional uncertainty included in OE is a consequence of observing the system through the “lens” provided by the given measurement $\Pi$. Thus, in this sense, OE can be seen as a measure of how inadequate a given measurement is with respect to the state $\rho$.
2.1 OE from statistical deficiency
The above discussion suggests one possible generalization of OE, starting from the re-writing of (2) recently noticed by some of us [8]. Consider the measurement channel $\mathcal{M}$ associated to the measurement $\Pi$, defined as

$$\mathcal{M}(\rho) = \sum_{i\in\mathcal{X}} \operatorname{Tr}[\Pi_i\rho]\; |i\rangle\langle i|\,, \qquad (3)$$

where $\{|i\rangle\}_{i\in\mathcal{X}}$ is an arbitrary but fixed orthonormal basis of the system that records the measurement outcome. By further noticing that $V_i = d\operatorname{Tr}[\Pi_i u]$, with $u = \mathbb{1}/d$ the maximally mixed state, one obtains
$$S_\Pi(\rho) - S(\rho) = D(\rho\,\|\,u) - D(\mathcal{M}(\rho)\,\|\,\mathcal{M}(u))\,, \qquad (4)$$

where

$$D(\rho\,\|\,\sigma) = \operatorname{Tr}[\rho(\log\rho - \log\sigma)] \qquad (5)$$

is the Umegaki quantum relative entropy between states $\rho$ and $\sigma$ [22, 23], which generalizes the relative entropy (a.k.a. Kullback–Leibler divergence)

$$D(p\,\|\,q) = \sum_x p(x)\log\frac{p(x)}{q(x)} \qquad (6)$$

between probability distributions $p$ and $q$ [24].
The expression (4) makes it clear that the quantity $S_\Pi(\rho) - S(\rho)$ exactly equals the loss of distinguishability between the signal $\rho$ and the totally uniform background $u$ that occurs when the measurement $\Pi$ is used instead of the best possible measurement allowed by quantum theory. In statistical jargon, we thus say that $S_\Pi(\rho) - S(\rho)$ measures the statistical deficiency of the measurement $\Pi$ in distinguishing $\rho$ against $u$.
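The identity (4) is easy to verify numerically. The following sketch (our own code; all function and variable names are ours) checks it for a qubit measured in the $|\pm\rangle$ basis, computing the relative entropies by eigendecomposition:

```python
import numpy as np

def logm_h(A):
    # Matrix log of a positive-definite Hermitian matrix via eigendecomposition.
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.log(w)) @ U.conj().T

def vn_entropy(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return -np.sum(ev * np.log(ev))

def umegaki(rho, sigma):
    # D(rho||sigma) = Tr[rho (log rho - log sigma)]
    return np.trace(rho @ (logm_h(rho) - logm_h(sigma))).real

def meas_channel(X, povm):
    # M(X) = sum_i Tr[Pi_i X] |i><i|
    return np.diag([np.trace(Pi @ X).real for Pi in povm])

rho = np.array([[0.6, 0.2], [0.2, 0.4]])
povm = [np.array([[0.5, 0.5], [0.5, 0.5]]),       # |+><+|
        np.array([[0.5, -0.5], [-0.5, 0.5]])]     # |-><-|
u = np.eye(2) / 2

p = np.diag(meas_channel(rho, povm))
S_O = -np.sum(p * np.log(p))                      # V_i = 1 for this measurement

lhs = S_O - vn_entropy(rho)
rhs = umegaki(rho, u) - umegaki(meas_channel(rho, povm), meas_channel(u, povm))
assert np.isclose(lhs, rhs)
```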
This observation enlightens something implicit in the original definition (1) of OE: the coarse-graining is captured by the “volumes” $V_i$ only because the maximally mixed state is chosen as the reference background. It is thus natural to try to incorporate more general references in the definition of OE. A direct generalization could be obtained, therefore, by replacing $u$ with another reference state $\gamma$ in (4), so that

$$S^\gamma_\Pi(\rho) := S(\rho) + \mathbb{D}(\rho\,\|\,\gamma) - \mathbb{D}(\mathcal{M}(\rho)\,\|\,\mathcal{M}(\gamma))\,, \qquad (7)$$

where $\mathbb{D}$ represents some non-commutative generalization of the Kullback–Leibler divergence, not necessarily Umegaki’s one.
2.2 OE from irretrodictability
There exists another evocative re-writing of (2), which in turn suggests that further structures may play a role in the definition of OE. Specifically, here we exhibit a dynamical interpretation of OE, based on a measurement process defined as follows.

Let $\rho = \sum_x \lambda_x |x\rangle\langle x|$ be a diagonal decomposition of the state of the system. We consider a stochastic map associated to a prepare-and-measure protocol: with probability $\lambda_x$, the state $|x\rangle\langle x|$ is prepared, and it is then measured with the POVM $\Pi$, yielding outcome $z$ with a probability given by the Born rule, that is

$$P_F(x,z) = \lambda_x\,\varphi(z|x)\,, \qquad \varphi(z|x) := \langle x|\Pi_z|x\rangle\,. \qquad (8,9)$$

The subscript $F$ stands for “forward”. This is because, as we will see in what follows, the quantity $S_\Pi(\rho) - S(\rho)$ in (2) emerges also from a comparison between the forward map defined above and a suitably defined “reverse” map.
Traditionally, the definition of the reverse process, given a forward process, relies on a detailed knowledge of the physical dynamics involved [25, 26, 27]. In the absence of such knowledge, which is typically the case for a system interacting with a complex environment, one must resort to physical intuition, plausibility arguments, or, failing that, arbitrary assumptions. In order to avoid all this, a systematic recipe has recently been found [28, 29], which allows one to define the reverse process only from unavoidable rules of logical retrodiction: specifically, Jeffrey’s theory of probability kinematics [30] or, equivalently, Pearl’s virtual evidence method [31, 32].
The idea is as follows: given a forward conditional probability $\varphi(z|x)$, how should information, obtained at later times about the final outcome $z$ and encoded in an arbitrary distribution $\zeta(z)$, be propagated back to the initial state $x$ in a way that ensures logical consistency? Jeffrey’s theory of probability kinematics, which is equivalent to Pearl’s virtual evidence method [31, 32], stipulates that the only logically consistent back-propagation rule is what is now known as Jeffrey’s update: starting from an arbitrarily chosen reference prior on the initial state $x$, say $\xi(x)$, one constructs the Bayesian inverse of $\varphi$, i.e.,

$$\hat\varphi_\xi(x|z) = \frac{\varphi(z|x)\,\xi(x)}{\sum_{x'}\varphi(z|x')\,\xi(x')}\,, \qquad (10)$$

and uses that as a stochastic channel to back-propagate the new information $\zeta$ from the final outcome $z$ to the initial state $x$, so that as the reverse process we obtain

$$P_R(x,z) = \hat\varphi_\xi(x|z)\,\zeta(z)\,. \qquad (11)$$

Jeffrey’s update constitutes a generalization of Bayes’ theorem, as the latter is recovered as a special case of the former when $\zeta(z) = \delta_{z,z^*}$, i.e., when the information about the final outcome is definite [32].
An important point to emphasize here is that the reference prior $\xi$ used to construct the retrodictive channel in (10) is merely a formal device needed to establish a mathematical correspondence between forward and backward process: it need not be related in any way with the “true” distribution $\lambda$. Likewise, the distribution $\zeta$ represents new and completely arbitrary information, which need not correspond to any input distribution under $\varphi$, that is, there may exist no distribution $\pi$ such that $\zeta(z) = \sum_x \varphi(z|x)\pi(x)$. Moreover, in principle, $\zeta$ may also be incompatible with the reference prior $\xi$, in the sense that it could happen that, for some $z$, $\zeta(z) > 0$ but $\sum_x \varphi(z|x)\xi(x) = 0$. In such a situation, one would conclude that the data falsify the inferential model, but for simplicity we will avoid such cases by assuming that all probabilities are strictly greater than zero (though possibly arbitrarily small).
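Jeffrey's update (10)-(11) is straightforward to implement for finite alphabets. In the sketch below (our own code; the array layout `phi[z, x]` for $\varphi(z|x)$ and all names are our conventions), definite evidence recovers the ordinary Bayes posterior, as stated above:

```python
import numpy as np

def bayes_inverse(phi, xi):
    """Bayesian inverse, Eq. (10): phihat[x, z] = phi(z|x) xi(x) / sum_x' phi(z|x') xi(x')."""
    joint = phi * xi[None, :]                    # joint[z, x] = phi(z|x) xi(x)
    return (joint / joint.sum(axis=1, keepdims=True)).T

def jeffrey_update(phi, xi, zeta):
    """Jeffrey's update, Eq. (11): back-propagate output information zeta(z)."""
    return bayes_inverse(phi, xi) @ zeta

phi = np.array([[0.8, 0.3],                      # phi[z, x]: columns sum to 1
                [0.2, 0.7]])
xi = np.array([0.5, 0.5])                        # reference prior on x

# Definite evidence zeta = (1, 0) recovers the ordinary Bayes posterior for z = 0.
post = jeffrey_update(phi, xi, np.array([1.0, 0.0]))
assert np.allclose(post, phi[0] * xi / (phi[0] * xi).sum())
```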
We now go back to our specific forward process (9), i.e., $\varphi(z|x) = \langle x|\Pi_z|x\rangle$. If we choose as reference the uniform distribution, i.e., $\xi(x) = 1/d$ for all $x$, and as new information the outcomes’ expected probability of occurrence, i.e., $\zeta(z) = q_z := \operatorname{Tr}[\Pi_z\rho]$, by direct substitution in (10) and (11), we obtain

$$P_R(x,z) = \frac{\langle x|\Pi_z|x\rangle}{V_z}\, q_z\,. \qquad (12)$$

The above can also be read as a prepare-and-measure process, in which the state $\Pi_z/V_z$ is prepared with probability $q_z$, and later measured in the basis $\{|x\rangle\}$. The process in (12) is the process that a retrodictive agent would infer, knowing only the forward process (9) and the outcome distribution $q$, but completely ignoring the actual distribution $\lambda$, so that the latter is replaced by the uniform distribution.
Using (9) and (12), it is straightforward to check that

$$D(P_F\,\|\,P_R) = S_\Pi(\rho) - S(\rho)\,. \qquad (13)$$

The above relation suggests an alternative interpretation for the difference $S_\Pi(\rho) - S(\rho)$, as the degree of statistical distinguishability between a predictive process, i.e., $P_F$, and a retrodictive process constructed from a uniform reference, i.e., $P_R$. Thus, the larger $S_\Pi(\rho) - S(\rho)$, the more irretrodictable the process becomes [33, 34].
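The identity (13) can be checked numerically for a random full-rank state and a random rank-one projective measurement. This is our own sketch (all names are ours; with a fixed seed, the generic case has no vanishing probabilities):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
# Random full-rank state rho = sum_x p_x |x><x|.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real
p, U = np.linalg.eigh(rho)

# Random rank-one projective measurement (columns of a Haar-ish unitary).
Q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
povm = [np.outer(Q[:, z], Q[:, z].conj()) for z in range(d)]

born = np.array([[np.real(U[:, x].conj() @ Pi @ U[:, x]) for Pi in povm]
                 for x in range(d)])              # born[x, z] = <x|Pi_z|x>
q = p @ born                                      # outcome probabilities q_z
V = np.array([np.trace(Pi).real for Pi in povm])  # volumes (all 1 here)

mu_F = p[:, None] * born                          # forward joint, Eq. (8,9)
mu_R = born * (q / V)[None, :]                    # reverse joint, Eq. (12)

D = np.sum(mu_F * np.log(mu_F / mu_R))
S_O = -np.sum(q * np.log(q / V))
S_vN = -np.sum(p * np.log(p))
assert np.isclose(D, S_O - S_vN)                  # Eq. (13)
```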
Eq. (13) also offers an alternative way of thinking about generalizations of OE, where the uniform reference is again replaced by an arbitrary state $\gamma$, as was done in Section 2.1, but this time for the purpose of constructing another reverse process. That is, one could also consider generalizations such as

$$S^\gamma_\Pi(\rho) := S(\rho) + \mathbb{D}(F\,\|\,R_\gamma)\,, \qquad (14)$$

where $\mathbb{D}$ is again some quantum relative entropy (not necessarily Umegaki’s), $F$ is an input-output description of the quantum process consisting in preparing the state $\rho$ and measuring it with the channel $\mathcal{M}$, and $R_\gamma$ is the description of the corresponding reverse process computed with respect to the reference prior $\gamma$. All these ingredients will be rigorously defined in Section 4.
3 A definition of OE for priors such that $[\gamma,\rho]=0$

As we have seen, in the case of conventional OE, the statistical deficiency approach and the irretrodictability approach coincide, i.e.

$$D(\rho\,\|\,u) - D(\mathcal{M}(\rho)\,\|\,\mathcal{M}(u)) = D(P_F\,\|\,P_R)\,.$$

The same holds for any prior $\gamma$ that commutes with $\rho$. Indeed, assuming $[\gamma,\rho]=0$, let us write $\gamma = \sum_x g_x |x\rangle\langle x|$ using the same vectors that diagonalize $\rho$. The reverse process of [Eq. (9)] becomes

$$P_{R,\gamma}(x,z) = \frac{\langle x|\Pi_z|x\rangle\, g_x}{\operatorname{Tr}[\Pi_z\gamma]}\, q_z\,, \qquad (15)$$

and it is straightforward to verify that

$$D(P_F\,\|\,P_{R,\gamma}) = D(\rho\,\|\,\gamma) - D(\mathcal{M}(\rho)\,\|\,\mathcal{M}(\gamma))\,. \qquad (16)$$
Therefore, when $[\gamma,\rho]=0$, the expression

$$S^\gamma_\Pi(\rho) := S(\rho) + D(\rho\,\|\,\gamma) - D(\mathcal{M}(\rho)\,\|\,\mathcal{M}(\gamma)) \qquad (17,18)$$
$$\phantom{S^\gamma_\Pi(\rho) :} = S(\rho) + D(P_F\,\|\,P_{R,\gamma}) \qquad (19)$$

is a generalized OE that fits both the statistical deficiency approach and the irretrodictability approach. As we are going to see, it is not obvious to ensure both interpretations when $[\gamma,\rho]\neq 0$.
Note that nothing has been said about the measurement $\Pi$, which may well not commute with either $\rho$ or $\gamma$. As a case study, let us look closely at the fully classical case in which, besides having $[\gamma,\rho]=0$, the measurement is a projective measurement in their same eigenbasis $\{|x\rangle\}$:

$$\Pi_k = \sum_{x\in X_k} |x\rangle\langle x|\,, \qquad (20)$$

where the index sets $X_k$ are disjoint and complete so as to form a POVM, i.e. $\bigcup_k X_k = \mathcal{X}$ and $X_k \cap X_{k'} = \emptyset$ for $k\neq k'$. Then, denoting by $\lambda_x$ and $g_x$ the eigenvalues of $\rho$ and $\gamma$ respectively, we have $q_k = \sum_{x\in X_k}\lambda_x$ and $G_k = \sum_{x\in X_k} g_x$, and Eq. (19) yields

$$S^\gamma_\Pi(\rho) = -\sum_x \lambda_x \log g_x - \sum_k q_k \log\frac{q_k}{G_k}\,. \qquad (21)$$

While the second term depends only on the observed statistics $(q_k)_k$ by construction, the first term depends in general on the full information $\{\lambda_x\}$. In fact, $S^\gamma_\Pi(\rho)$ depends only on the $q_k$ if and only if the $g_x$ are uniform in each subspace; that is, $g_x = G_k/d_k$ for every $k$ and for every $x\in X_k$, with $d_k = \operatorname{Tr}[\Pi_k]$, or, equivalently, $\gamma = \sum_k (G_k/d_k)\,\Pi_k$. In this case, $S^\gamma_\Pi(\rho) = -\sum_k q_k\log(q_k/d_k) = S_\Pi(\rho)$: the weights $G_k$ of the prior do not matter. The interpretation of this observation is clear: the observation gives precisely the weights $q_k$ to be attributed to each subspace, trumping any prior belief on those weights.

When the dependence on the $\lambda_x$ is present, it is a mild one: for instance, in the paradigmatic case where $\gamma$ is thermal, the term $-\sum_x \lambda_x \log g_x$ is (up to an additive constant) proportional to the average energy, often assumed as known in thermodynamics. A purely “observational” character could be recovered with minor modifications of the definition; we leave this aside.
4 Definitions of OE with an arbitrary quantum reference prior
In this section, we introduce the mathematical notation and background, and propose some candidates for OE with a general reference prior state.
4.1 Input-output description of quantum processes
Let $A$ and $B$ respectively denote the input and output systems of the measurement channel $\mathcal{M}$ in Eq. (3). The general recipe for the retrodiction of a quantum process [28, 8, 35] is defined via its Petz recovery map [36, 37] as

$$\hat{\mathcal{M}}_\gamma(\sigma) = \gamma^{1/2}\,\mathcal{M}^\dagger\!\big(\mathcal{M}(\gamma)^{-1/2}\,\sigma\,\mathcal{M}(\gamma)^{-1/2}\big)\,\gamma^{1/2}\,, \qquad (22)$$

where $\sigma$ encodes the distribution $\zeta$, describing the retrodictor’s knowledge, cf. Eq. (11). We will mainly discuss the case where $\zeta(z) = q_z$, namely $\sigma = \mathcal{M}(\rho)$. For the measurement channel given in (3), the Petz recovery map can be written as

$$\hat{\mathcal{M}}_\gamma(\sigma) = \sum_z \langle z|\sigma|z\rangle\; \frac{\gamma^{1/2}\,\Pi_z\,\gamma^{1/2}}{\operatorname{Tr}[\Pi_z\gamma]}\,. \qquad (23)$$
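For a concrete feel of Eq. (23), here is a minimal numpy sketch (our own code; function names are our assumptions) of the Petz recovery map of a measurement channel. As a sanity check, the map sends $\mathcal{M}(\gamma)$ back exactly to $\gamma$, a defining property of the Petz recovery map:

```python
import numpy as np

def sqrtm_h(A):
    # Square root of a positive semidefinite Hermitian matrix.
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.sqrt(np.clip(w, 0, None))) @ U.conj().T

def meas_channel(X, povm):
    # M(X) = sum_z Tr[Pi_z X] |z><z|
    return np.diag([np.trace(Pi @ X).real for Pi in povm]).astype(complex)

def petz_map(sigma, povm, gamma):
    """Petz recovery of the measurement channel with prior gamma, as in Eq. (23).
    sigma is a diagonal outcome state; returns an operator on the input space."""
    g2 = sqrtm_h(gamma)
    out = np.zeros(gamma.shape, dtype=complex)
    for z, Pi in enumerate(povm):
        out += sigma[z, z].real * (g2 @ Pi @ g2) / np.trace(Pi @ gamma).real
    return out

gamma = np.array([[0.7, 0.1], [0.1, 0.3]], dtype=complex)   # a full-rank prior
povm = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]
rec = petz_map(meas_channel(gamma, povm), povm, gamma)
assert np.allclose(rec, gamma)
```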
As an ingredient for later constructions, we introduce the Choi operator [38], defined for the process $\mathcal{E}$ from system $A$ to system $B$ as

$$C_{\mathcal{E}} = \sum_{m,n} |m\rangle\langle n|_A \otimes \mathcal{E}(|m\rangle\langle n|)_B\,, \qquad (24)$$

where $|m\rangle$ and $|n\rangle$ belong to an arbitrary but fixed orthonormal basis of the input Hilbert space of system $A$. The reverse process $\hat{\mathcal{M}}_\gamma$ has the following Choi operator

$$C_{\hat{\mathcal{M}}_\gamma} = \sum_{m,n} \hat{\mathcal{M}}_\gamma(|m\rangle\langle n|)_A \otimes |m\rangle\langle n|_B\,, \qquad (25)$$

where $|m\rangle$ and $|n\rangle$ belong to an arbitrarily fixed orthonormal basis of the Hilbert space $\mathcal{H}_B$. Note that we put system $A$ first and system $B$ second, in order to have the same ordering of systems for both $C_{\mathcal{M}}$ and $C_{\hat{\mathcal{M}}_\gamma}$.
With such a definition, the Choi operators of the forward and reverse processes are related by the following lemma (proof in Appendix A):
Lemma 1.
We now want to construct two objects, $F$ and $R$, which, analogously to the joint distributions $P_F(x,z)$ and $P_R(x,z)$, are able to capture both the input and output of the forward and reverse processes. Specifically, the marginals of the operator $F$ should recover the input state $\rho$ and the output $\mathcal{M}(\rho)$ respectively, and analogously for $R$.
One choice is to define

$$F = \big(\rho^{1/2}\otimes\mathbb{1}\big)\, C_{\mathcal{M}}\, \big(\rho^{1/2}\otimes\mathbb{1}\big)\,. \qquad (27)$$

Such an operator is indeed able to capture the input and output of the forward process, in the sense that:

$$\operatorname{Tr}_B[F] = \rho\,, \qquad \operatorname{Tr}_A[F] = \mathcal{M}(\rho)\,. \qquad (28)$$
We define the representation for the reverse process similarly as

$$R = \big(\mathbb{1}\otimes\mathcal{M}(\rho)^{1/2}\big)\, C_{\hat{\mathcal{M}}_\gamma}^{T}\, \big(\mathbb{1}\otimes\mathcal{M}(\rho)^{1/2}\big)\,, \qquad (29)$$

where $\mathcal{M}(\rho)$ is the input of the reverse process. We use the transpose of the Choi operator of the reverse process so that it can be linked to $F$ by Lemma 1. This operator captures the input and output of the reverse process:

$$\operatorname{Tr}_A[R] = \mathcal{M}(\rho)\,, \qquad \operatorname{Tr}_B[R] = \hat{\mathcal{M}}_\gamma(\mathcal{M}(\rho))\,. \qquad (30)$$
The operators just defined are analogous to the state over time proposed by Leifer and Spekkens [39, 40] up to a partial transpose.
Other definitions of input-output operators may satisfy nice properties. An alternative choice is, for example,

(31)

and

(32)

The superscript $T$ in (31) and (32) (not to be confused with an actual matrix transposition) is used because the operators $F^T$ and $R^T$ are, in a loose sense, a “transposition” of $F$ and $R$, respectively. If the operators involved do not commute, in general $F^T \neq F$ and $R^T \neq R$: for example, the marginals of $F^T$ in general differ from those of $F$. Yet, they are similar, in the sense that $F$ and $F^T$ (resp. $R$ and $R^T$) share the same eigenvalues, and are thus unitarily equivalent, as happens when performing a proper transposition. Therefore, $F^T$ and $R^T$ can be viewed as legitimate representations (up to unitaries) of the forward and reverse processes, and they will be useful in the irretrodictability interpretation of OE.
4.2 Candidates for generalized OE
Eqs. (7) and (14) provide two forms of the observational entropy: Eq. (7), arising from the statistical deficiency approach, is the difference between relative entropies evaluated on the input system and the output system; Eq. (14), arising from the irretrodictability approach, is the relative entropy between the forward and reverse processes. In the remainder of this section, we will propose generalizations of OE that take either or both of these forms.
4.2.1 Candidate #1: difference between input/output Umegaki entropies
A first fully quantum generalisation of OE may just be obtained by replacing the reference state $u$ in Eq. (4) with a general reference state $\gamma$:

$$S_1^\gamma(\rho) := S(\rho) + D(\rho\,\|\,\gamma) - D(\mathcal{M}(\rho)\,\|\,\mathcal{M}(\gamma))\,, \qquad (33)$$

that is, $S_1^\gamma(\rho) = S^\gamma_\Pi(\rho)$, cf. Eq. (19), though this time it may be that $[\gamma,\rho]\neq 0$. This definition has the form of Eq. (7) with $\mathbb{D}$ taken to be the Umegaki relative entropy (5). Notice that, while $D(\rho\,\|\,\gamma)$ is a fully quantum relative entropy, $D(\mathcal{M}(\rho)\,\|\,\mathcal{M}(\gamma))$ is in fact classical, since all the outputs of the channel $\mathcal{M}$ are diagonal in the same basis.
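The following sketch (our own code; `candidate1` is our name for Eq. (33)) evaluates this candidate and verifies that with the maximally mixed prior it reduces to the original OE, as it must, since the uniform prior recovers Eq. (4):

```python
import numpy as np

def logm_h(A):
    # Matrix log of a positive-definite Hermitian matrix.
    w, U = np.linalg.eigh(A)
    return U @ np.diag(np.log(w)) @ U.conj().T

def vn(rho):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]
    return -np.sum(ev * np.log(ev))

def umegaki(rho, sigma):
    return np.trace(rho @ (logm_h(rho) - logm_h(sigma))).real

def candidate1(rho, povm, gamma):
    # S_1 = S(rho) + D(rho||gamma) - D(M(rho)||M(gamma)); the output term is classical.
    p = np.array([np.trace(Pi @ rho).real for Pi in povm])
    g = np.array([np.trace(Pi @ gamma).real for Pi in povm])
    return vn(rho) + umegaki(rho, gamma) - np.sum(p * np.log(p / g))

d = 2
rho = np.array([[0.6, 0.2], [0.2, 0.4]])
povm = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
p = np.array([0.6, 0.4])
S_O = -np.sum(p * np.log(p))            # original OE (V_i = 1 here)
assert np.isclose(candidate1(rho, povm, np.eye(d) / d), S_O)
```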
4.2.2 Candidate #2: Umegaki relative entropy between forward/reverse processes
Another option is to define an OE through Eq. (14), thus choosing to prioritize irretrodictability. For this, one needs to choose a relative entropy and representations of the forward and reverse processes. Using the Umegaki relative entropy and the representations defined in (27) and (29), we get

$$S_2^\gamma(\rho) := S(\rho) + D(F\,\|\,R)\,. \qquad (34)$$

However, we will show in the following sections that this candidate lacks some of the properties we desire: we introduce it mainly for comparison with the other candidates.
4.2.3 Candidate #3: Belavkin–Staszewski relative entropy
Besides the Umegaki relative entropy, there are other choices for the quantum relative entropy between the representations of the forward and reverse processes, and between the states $\rho$ and $\gamma$. One such choice is the Belavkin–Staszewski relative entropy [41], defined as

$$D_{BS}(\rho\,\|\,\sigma) = \operatorname{Tr}\!\big[\rho\,\log\!\big(\rho^{1/2}\,\sigma^{-1}\,\rho^{1/2}\big)\big]\,. \qquad (35)$$

The Belavkin–Staszewski relative entropy coincides with the Umegaki relative entropy and the classical relative entropy when $\rho$ and $\sigma$ commute; otherwise, in general, it is never smaller than Umegaki’s. For a summary of the main properties of the Belavkin–Staszewski relative entropy, and its relations with other quantum relative entropies, we refer the interested reader to Ref. [42].
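A small numerical sketch (our own code; all names are ours) of Eq. (35), computing $D_{BS}$ by functional calculus and checking both the commuting case and the ordering with respect to Umegaki's relative entropy:

```python
import numpy as np

def f_h(A, f):
    # Functional calculus on a Hermitian matrix.
    w, U = np.linalg.eigh(A)
    return U @ np.diag(f(w)) @ U.conj().T

def umegaki(rho, sigma):
    return np.trace(rho @ (f_h(rho, np.log) - f_h(sigma, np.log))).real

def belavkin_staszewski(rho, sigma):
    # D_BS(rho||sigma) = Tr[rho log(rho^{1/2} sigma^{-1} rho^{1/2})], Eq. (35)
    r2 = f_h(rho, np.sqrt)
    X = r2 @ f_h(sigma, lambda w: 1.0 / w) @ r2
    return np.trace(rho @ f_h(X, np.log)).real

# Commuting case: both coincide with the classical Kullback-Leibler divergence.
P, S = np.diag([0.6, 0.4]), np.diag([0.5, 0.5])
kl = 0.6 * np.log(0.6 / 0.5) + 0.4 * np.log(0.4 / 0.5)
assert np.isclose(belavkin_staszewski(P, S), kl)
assert np.isclose(umegaki(P, S), kl)

# Non-commuting case: D_BS is never smaller than Umegaki's relative entropy.
rho = np.array([[0.6, 0.2], [0.2, 0.4]])
sigma = np.array([[0.5, -0.1], [-0.1, 0.5]])
assert belavkin_staszewski(rho, sigma) >= umegaki(rho, sigma) - 1e-10
```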
Inserting $D_{BS}$ into Eq. (7), we obtain

$$S_3^\gamma(\rho) := S(\rho) + D_{BS}(\rho\,\|\,\gamma) - D_{BS}(\mathcal{M}(\rho)\,\|\,\mathcal{M}(\gamma))\,. \qquad (36)$$

Remarkably, it turns out that the above definition recovers the form of Eq. (14). Assuming that $\gamma$, and thus $\mathcal{M}(\gamma)$, is full-rank, one has

$$D_{BS}(\rho\,\|\,\gamma) - D_{BS}(\mathcal{M}(\rho)\,\|\,\mathcal{M}(\gamma)) = D_{BS}(F^T\,\|\,R^T)\,, \qquad (37)$$

where $D_{BS}(\mathcal{M}(\rho)\,\|\,\mathcal{M}(\gamma)) = D(\mathcal{M}(\rho)\,\|\,\mathcal{M}(\gamma))$ since those states commute, and where $F^T$ and $R^T$ were defined in (31) and (32). The proof of the identity (37) is given in Appendix B. Thus, $S_3^\gamma$ indeed admits both the statistical deficiency and the irretrodictability interpretations.
5 Properties
Definition | Deficiency interpretation | Irretrodictability interpretation | Equal to $S^\gamma_\Pi$ when $[\gamma,\rho]=0$ | Petz recovery criterion | Non-decreasing under stochastic post-processing
---|---|---|---|---|---
$S_1^\gamma$, Eq. (33) | Yes | N/A | Yes | Yes | Yes
$S_2^\gamma$, Eq. (34) | N/A | Yes | Only if all operators commute | Yes | No
$S_3^\gamma$, Eq. (36) | Yes | Yes | Yes | Yes | Yes
We proceed now to discuss the properties of the three candidates $S_1^\gamma$, $S_2^\gamma$, $S_3^\gamma$ defined above, with a comparison between them summarized in Table 1. The main properties to consider for any candidate generalized OE are the following:
- (i) When the reference prior is the uniform distribution (maximally mixed state), the candidate should recover the original definition (1). This is true for $S_1$ and $S_3$: namely, when $\gamma = u$,

  $$S_1^u(\rho) = S_3^u(\rho) = S_\Pi(\rho)\,. \qquad (38)$$

  Instead, in order to recover the conventional OE, $S_2$ further requires that $\rho$ commutes with $\Pi_i$ for all $i$.
- (ii) More generally, when $[\gamma,\rho]=0$, one has

  $$S_1^\gamma(\rho) = S_3^\gamma(\rho) = S^\gamma_\Pi(\rho)\,. \qquad (39)$$

  Instead, the condition $S_2^\gamma(\rho) = S^\gamma_\Pi(\rho)$ in general requires $[\rho,\Pi_i] = [\gamma,\Pi_i] = 0$ for all $i$.
- (iii) Like the original OE, all of them are lower-bounded by the von Neumann entropy:

  $$S_k^\gamma(\rho) \geq S(\rho)\,, \qquad k = 1, 2, 3\,. \qquad (40)$$

  Thus, the OEs retain the desirable property that one cannot have less uncertainty than the von Neumann entropy.
The proofs of the above properties are in Appendix C.
Other non-essential, yet desirable properties include:
- (iv)
- (v) $S^\gamma$ satisfies the Petz recovery criterion: $S^\gamma(\rho) = S(\rho)$ if and only if $\hat{\mathcal{M}}_\gamma(\mathcal{M}(\rho)) = \rho$, where $\hat{\mathcal{M}}_\gamma$ is the Petz map of $\mathcal{M}$ with reference $\gamma$ defined in (22). This property is satisfied by all candidates, as shown in Appendix D.
- (vi) $S^\gamma$ is non-decreasing under stochastic post-processing. We say $\Pi'$ is a post-processing of $\Pi$ if its outcome can be obtained by applying a stochastic map on the outcome of $\Pi$, namely there exists a stochastic matrix $\Lambda$ with $\Lambda_{j|i}\geq 0$ and $\sum_j \Lambda_{j|i} = 1$ for all $i$ satisfying

  $$\Pi'_j = \sum_i \Lambda_{j|i}\,\Pi_i\,. \qquad (41)$$

  This property for $S^\gamma$ says that, for any $\Pi'$ that is a post-processing of $\Pi$, one has

  $$S^\gamma_{\Pi'}(\rho) \geq S^\gamma_\Pi(\rho)\,. \qquad (42)$$

  This property is satisfied by $S_1^\gamma$ and $S_3^\gamma$, with proofs in Appendix E.
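For the original OE (uniform prior, a special case of the candidates above), monotonicity under the simplest post-processing, deterministically merging two outcomes, follows from the log-sum inequality. A numerical sketch of ours:

```python
import numpy as np

def oe(rho, povm):
    # Original OE, Eq. (1), natural log.
    S = 0.0
    for Pi in povm:
        p = np.trace(Pi @ rho).real
        if p > 1e-12:
            S -= p * np.log(p / np.trace(Pi).real)
    return S

rng = np.random.default_rng(7)
d = 3
A = rng.normal(size=(d, d))
rho = A @ A.T / np.trace(A @ A.T)             # random full-rank state
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
fine = [np.outer(Q[:, i], Q[:, i]) for i in range(d)]
coarse = [fine[0] + fine[1], fine[2]]         # Lambda merges outcomes 0 and 1

# Post-processed (coarser) measurement has larger-or-equal OE.
assert oe(rho, coarse) >= oe(rho, fine) - 1e-12
```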
Finally, we notice that while the original OE, Eq. (1), is upper bounded as $S_\Pi(\rho) \leq \log d$, in general, for a non-uniform reference $\gamma$, the same bound does not hold, as expected. However

$$S_3^\gamma(\rho) \geq S_1^\gamma(\rho) \qquad (43)$$

holds because the Belavkin–Staszewski relative entropy bounds the Umegaki one from above [43, 44]. Also

(44)

holds due to joint convexity of the relative entropy (proof in Appendix F).
6 Examples
6.1 Gibbs prior
In the presence of a Hamiltonian $H = \sum_x E_x |E_x\rangle\langle E_x|$, a very natural choice of non-uniform prior is the Gibbs state

$$\gamma = \frac{e^{-\beta H}}{Z}\,, \qquad Z = \operatorname{Tr}\!\big[e^{-\beta H}\big]\,. \qquad (45)$$

We also consider the measurement in the energy eigenbasis $\{|E_x\rangle\}$, but to move far away from the classical case we assume that the input state is pure and maximally unbiased with respect to the energy eigenbasis:

$$\rho = |\psi\rangle\langle\psi|\,, \qquad |\langle E_x|\psi\rangle|^2 = \frac{1}{d} \;\;\text{for all } x\,. \qquad (46)$$
With these assumptions, the first definition yields

$$S_1^\gamma(\rho) = \log d\,, \qquad (47)$$

which is also the case if $\rho$ is a mixture of maximally unbiased states. As in the commuting case, $S_1^\gamma$ reduces to the original OE when the prior is a convex sum of the measurement elements.
The second definition yields

$$S_2^\gamma(\rho) = +\infty \qquad (48)$$

for any pure state, since the support of $F$ does not contain the support of $R$. We shall comment on this result after the next example.
Finally, the third definition yields

$$S_3^\gamma(\rho) = \log\Big(\sum_x e^{\beta E_x}\Big) - \frac{\beta}{d}\sum_x E_x\,, \qquad (49)$$

(50)

where the first expression is general, while (50) specializes it to an equidistant spectrum. Thus $S_3^\gamma$ is more sensitive than $S_1^\gamma$ to quantum situations.
6.2 Three-qubit encoding
The following example is inspired by a simple error-correcting code, the three-qubit encoding of a pure qubit:

$$\alpha|0\rangle + \beta|1\rangle \;\longmapsto\; \alpha|000\rangle + \beta|111\rangle\,. \qquad (51)$$

Suppose $\rho$ is the encoded state

$$\rho = |\psi\rangle\langle\psi|\,, \qquad |\psi\rangle = \alpha|000\rangle + \beta|111\rangle\,. \qquad (52)$$
We consider the measurement of each qubit in the computational basis, i.e. the POVM elements are projectors on the basis vectors

$$\Pi_{z_1 z_2 z_3} = |z_1 z_2 z_3\rangle\langle z_1 z_2 z_3|\,, \qquad z_k \in \{0,1\}\,. \qquad (53)$$

As for the prior, we suppose that the observer knows the encoding of the error correction code, and expects the state to be more probably in the subspace spanned by $|000\rangle$ and $|111\rangle$, possibly with a bias towards one of those product states; whence

(54)
In this case, the three definitions proposed here yield
(55) | ||||
(56) | ||||
(57) |
with
(58) |
$S_1^\gamma$ and $S_3^\gamma$ differ in the first term as long as the prior is biased towards one of the two codewords; for an unbiased prior, both yield the same value. We see that $S_2^\gamma$ is still infinite, for the same reason of support mismatch as in the previous example. From the examples, we observe that $S_2^\gamma$ is often overly sensitive to the non-commutativity between $\rho$ and $\gamma$. This suggests that, instead of the natural choice of $F$ and $R$ as input-output representations, one could opt for representations whose supports are more aligned, such as $F^T$ and $R^T$, which relate to $S_3^\gamma$ via Eq. (37).
7 Conclusions
The original definition [Eq. (1)] of observational entropy (OE) was known to be lower-bounded by the von Neumann entropy. Here we have first brought to the fore that the excess term $S_\Pi(\rho) - S(\rho)$ can be interpreted in two ways: as a statistical deficiency (4), quantifying the decrease of state distinguishability induced by the measurement; and as irretrodictability (13), quantifying the hardness of retrodicting the input from the output statistics. While it is intuitive that recovering the input state is harder if the measurement makes states less distinguishable, the exact coincidence of the two quantifiers is of interest.
In both interpretations, we observe that the uniform state plays the role of reference, or prior, knowledge. This may not represent the proper knowledge of the physical situation: for instance, for systems in contact with a thermal bath, it may be more natural to choose the Gibbs prior. Based on this, we have studied generalisations of OE, in which the prior knowledge can be an arbitrary state $\gamma$.
When $[\gamma,\rho]=0$, we find an obvious generalisation of the excess term [Eq. (16)] that retains both interpretations of statistical deficiency and irretrodictability. This is no longer straightforward for a general quantum prior. Technically, one of the main difficulties lies in the fact that the irretrodictability quantifier is a relative entropy between joint input-and-output objects, whose definition in quantum theory is a current topic of research. We have explored three possible definitions of generalized OE (Table 1): two specifically designed to satisfy one of the interpretations but lacking the other; the third retaining both by replacing the usual Umegaki relative entropy with the Belavkin–Staszewski version. Thus we have a novel fully quantum object, which quantifies simultaneously the loss of distinguishability caused by the measurement and the hardness of retrodicting the input knowing the output. Being built from information-theoretical considerations, our new formulation of OE may also hold significance in physical (thermodynamical) contexts, such as its relationship with work extraction [19]. We leave a deeper exploration of the physical implications of OE for future research.
Acknowledgments
We thank Clive Aw, Fumio Hiai, Anna Jenčová and Andreas Winter for discussions.
G.B. and V.S. are supported by the National Research Foundation, Singapore and A*STAR under its CQT Bridging Grant; and by the Ministry of Education, Singapore, under the Tier 2 grant “Bayesian approach to irreversibility” (Grant No. MOE-T2EP50123-0002). D.Š. acknowledges the support from the Institute for Basic Science in Korea (IBS-R024-D1). F.B. acknowledges support from MEXT Quantum Leap Flagship Program (MEXT QLEAP) Grant No. JPMXS0120319794, from MEXT-JSPS Grant-in-Aid for Transformative Research Areas (A) “Extreme Universe” No. 21H05183, and from JSPS KAKENHI, Grants No. 20K03746 and No. 23K03230. J.S. acknowledges support by MICIIN with funding from European Union NextGenerationEU (PRTR-C17.I1) and by Generalitat de Catalunya.
References
- [1] John von Neumann. “Mathematical foundations of quantum mechanics”. Princeton university press. (1955).
- [2] John von Neumann. “Proof of the ergodic theorem and the H-theorem in quantum mechanics. Translation of: Beweis des Ergodensatzes und des H-Theorems in der neuen Mechanik”. European Physical Journal H 35, 201–237 (2010).
- [3] Dominik Šafránek, J. M. Deutsch, and Anthony Aguirre. “Quantum coarse-grained entropy and thermodynamics”. Phys. Rev. A 99, 010101 (2019). arXiv:1707.09722.
- [4] Dominik Šafránek, J. M. Deutsch, and Anthony Aguirre. “Quantum coarse-grained entropy and thermalization in closed systems”. Phys. Rev. A 99, 012103 (2019). arXiv:1803.00665.
- [5] Dominik Šafránek, Anthony Aguirre, Joseph Schindler, and J. M. Deutsch. “A Brief Introduction to Observational Entropy”. Foundations of Physics 51, 101 (2021). arXiv:2008.04409.
- [6] Philipp Strasberg and Andreas Winter. “First and second law of quantum thermodynamics: A consistent derivation based on a microscopic definition of entropy”. PRX Quantum 2, 030202 (2021).
- [7] Dominik Šafránek and Juzar Thingna. “Quantifying information extraction using generalized quantum measurements”. Physical Review A 108, 032413 (2023). arXiv:2007.07246.
- [8] Francesco Buscemi, Joseph Schindler, and Dominik Šafránek. “Observational entropy, coarse-grained states, and the petz recovery map: information-theoretic properties and bounds”. New Journal of Physics 25, 053002 (2023).
- [9] Andreu Riera-Campeny, Anna Sanpera, and Philipp Strasberg. “Quantum systems correlated with a finite bath: Nonequilibrium dynamics and thermodynamics”. PRX Quantum 2, 010340 (2021). arXiv:2008.02184.
- [10] Dominik Šafránek, Anthony Aguirre, and J. M. Deutsch. “Classical dynamical coarse-grained entropy and comparison with the quantum version”. Phys. Rev. E 102, 032106 (2020). arXiv:1905.03841.
- [11] Joshua M. Deutsch, Dominik Šafránek, and Anthony Aguirre. “Probabilistic bound on extreme fluctuations in isolated quantum systems”. Phys. Rev. E 101, 032112 (2020). arXiv:1806.08897.
- [12] Dana Faiez, Dominik Šafránek, J. M. Deutsch, and Anthony Aguirre. “Typical and extreme entropies of long-lived isolated quantum systems”. Phys. Rev. A 101, 052101 (2020). arXiv:1908.07083.
- [13] Charlie Nation and Diego Porras. “Taking snapshots of a quantum thermalization process: Emergent classicality in quantum jump trajectories”. Phys. Rev. E 102, 042115 (2020). arXiv:2003.08425.
- [14] Philipp Strasberg, María García Díaz, and Andreu Riera-Campeny. “Clausius inequality for finite baths reveals universal efficiency improvements”. Phys. Rev. E 104, L022103 (2021). arXiv:2012.03262.
- [15] Ryusuke Hamazaki. “Speed Limits for Macroscopic Transitions”. PRX Quantum 3, 020319 (2022). arXiv:2110.09716.
- [16] Ranjan Modak and S. Aravinda. “Observational-entropic study of anderson localization”. Phys. Rev. A 106, 062217 (2022).
- [17] Sreeram PG, Ranjan Modak, and S. Aravinda. “Witnessing quantum chaos using observational entropy”. Phys. Rev. E 107, 064204 (2023).
- [18] Joseph Schindler and Andreas Winter. “Continuity bounds on observational entropy and measured relative entropies”. Journal of Mathematical Physics 64 (2023). arXiv:2302.00400.
- [19] Dominik Šafránek, Dario Rosa, and Felix C. Binder. “Work extraction from unknown quantum sources”. Phys. Rev. Lett. 130, 210401 (2023).
- [20] Dominik Šafránek and Dario Rosa. “Measuring energy by measuring any other observable”. Phys. Rev. A 108, 022208 (2023).
- [21] Dominik Šafránek. “Ergotropic interpretation of entanglement entropy” (2023). arXiv:2306.08987.
- [22] Hisaharu Umegaki. “On information in operator algebras”. Proc. Japan Acad. 37, 459–461 (1961).
- [23] Hisaharu Umegaki. “Conditional expectation in an operator algebra. IV. Entropy and information”. Kodai Mathematical Journal 14 (1962).
- [24] S. Kullback and R. A. Leibler. “On information and sufficiency”. Ann. Math. Statist. 22, 79–86 (1951).
- [25] Gavin E. Crooks. “Nonequilibrium Measurements of Free Energy Differences for Microscopically Reversible Markovian Systems”. Journal of Statistical Physics 90, 1481–1487 (1998).
- [26] Massimiliano Esposito, Upendra Harbola, and Shaul Mukamel. “Nonequilibrium fluctuations, fluctuation theorems, and counting statistics in quantum systems”. Rev. Mod. Phys. 81, 1665–1702 (2009).
- [27] Gabriel T. Landi and Mauro Paternostro. “Irreversible entropy production: From classical to quantum”. Rev. Mod. Phys. 93, 035008 (2021).
- [28] Francesco Buscemi and Valerio Scarani. “Fluctuation theorems from bayesian retrodiction”. Phys. Rev. E 103, 052111 (2021).
- [29] Clive Cenxin Aw, Francesco Buscemi, and Valerio Scarani. “Fluctuation theorems with retrodiction rather than reverse processes”. AVS Quantum Science 3, 045601 (2021).
- [30] Richard Carl Jeffrey. “The logic of decision”. McGraw-Hill. (1965).
- [31] Judea Pearl. “Probabilistic reasoning in intelligent systems: networks of plausible inference”. Elsevier. (1988).
- [32] Hei Chan and Adnan Darwiche. “On the revision of probabilistic beliefs using uncertain evidence”. Artificial Intelligence 163, 67 – 90 (2005).
- [33] Satosi Watanabe. “Symmetry of physical laws. part iii. prediction and retrodiction”. Rev. Mod. Phys. 27, 179–186 (1955).
- [34] Satosi Watanabe. “Conditional probabilities in physics”. Progr. Theor. Phys. Suppl. E65, 135–160 (1965).
- [35] Arthur J. Parzygnat and Francesco Buscemi. “Axioms for retrodiction: achieving time-reversal symmetry with a prior”. Quantum 7, 1013 (2023).
- [36] Denes Petz. “Sufficient subalgebras and the relative entropy of states of a von neumann algebra”. Comm. Math. Phys. 105, 123–131 (1986).
- [37] Denes Petz. “Sufficiency of channels over von Neumann algebras”. The Quarterly Journal of Mathematics 39, 97–108 (1988).
- [38] Man-Duen Choi. “Completely positive linear maps on complex matrices”. Linear Algebra and its Applications 10, 285–290 (1975).
- [39] M. S. Leifer. “Conditional density operators and the subjectivity of quantum operations”. In AIP Conference Proceedings. Volume 889, page 172–186. AIP (2007).
- [40] M. S. Leifer and Robert W. Spekkens. “Towards a formulation of quantum theory as a causally neutral theory of Bayesian inference”. Physical Review A 88, 052130 (2013).
- [41] Viacheslav P. Belavkin and P. Staszewski. “C*-algebraic generalization of relative entropy and entropy”. In Annales de l’IHP Physique théorique. Volume 37 no. 1, pages 51–58. (1982). url: http://www.numdam.org/item/AIHPA_1982__37_1_51_0/.
- [42] Sumeet Khatri and Mark M. Wilde. “Principles of quantum communication theory: A modern approach” (2020). arXiv:2011.04672.
- [43] Keiji Matsumoto. “A new quantum version of f-divergence”. In Reality and Measurement in Algebraic Quantum Theory. Pages 229–273. Springer Singapore (2018).
- [44] Fumio Hiai and Milán Mosonyi. “Different quantum f-divergences and the reversibility of quantum operations”. Reviews in Mathematical Physics 29, 1750023 (2017).
- [45] Andreas Bluhm, Ángela Capel, Paul Gondolf, and Antonio Pérez-Hernández. “Continuity of quantum entropic quantities via almost convexity”. IEEE Transactions on Information Theory 69, 5869–5901 (2023).
- [46] Fumio Hiai, Milán Mosonyi, Dénes Petz, and Cédric Bény. “Quantum f-divergences and error correction”. Reviews in Mathematical Physics 23, 691–747 (2011).
- [47] Andreas Bluhm and Ángela Capel. “A strengthened data processing inequality for the Belavkin–Staszewski relative entropy”. Reviews in Mathematical Physics 32, 2050005 (2019).
Appendix A Proof of Lemma 1
Appendix B Proof of Eq. 37
Appendix C Proof of properties (i)-(iii)
When , and are both equal to the relative entropy between the eigenvalues of and . By comparing their definitions in Eqs. 17, 33 and 36, we obtain .
For , we further need to use that . This condition implies that all commute, and therefore , , and
(65)
Notice that the above equality holds as long as the support of is contained in that of without assuming to be invertible, since the Umegaki relative entropy is continuous with respect to both arguments [45]. Therefore, and property (ii) holds for .
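The commuting-case reduction invoked above can be checked numerically. The following sketch (our own illustration, not part of the proof; function names are ours) verifies that for two full-rank states with a common eigenbasis, the Umegaki relative entropy equals the classical relative entropy of their eigenvalues.

```python
import numpy as np

def logm_psd(h):
    """Matrix logarithm of a positive definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(h)
    return (v * np.log(w)) @ v.conj().T

def umegaki(rho, sigma):
    """Umegaki relative entropy D(rho || sigma) = Tr[rho (log rho - log sigma)]."""
    return float(np.real(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma)))))

# two commuting full-rank states: diagonal in the same (rotated) basis
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])
u, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(3, 3)))  # common eigenbasis
rho = u @ np.diag(p) @ u.T
sigma = u @ np.diag(q) @ u.T

kl = float(np.sum(p * np.log(p / q)))  # classical relative entropy of the eigenvalues
assert np.isclose(umegaki(rho, sigma), kl)
```

When the states do not commute, the two quantities differ in general; the equality above is special to a common eigenbasis.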
Property (iii) is equivalent to saying that is non-negative.
is non-negative by the non-negativity of relative entropy between two unit-trace positive operators.
is non-negative by the data-processing inequality of the Umegaki relative entropy. Lastly, by Eq. 43, .
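Both facts used in this argument, the non-negativity of the relative entropy between unit-trace positive operators (Klein's inequality) and the data-processing inequality, can be checked numerically. The sketch below (our own illustration; a basis pinching stands in for a generic measurement channel) assumes full-rank random states.

```python
import numpy as np

def logm_psd(h):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(h)
    return (v * np.log(w)) @ v.conj().T

def D(rho, sigma):
    """Umegaki relative entropy D(rho || sigma)."""
    return float(np.real(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma)))))

rng = np.random.default_rng(0)

def rand_state(d):
    """Random full-rank density matrix (normalized Wishart matrix)."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.real(np.trace(rho))

rho, sigma = rand_state(3), rand_state(3)
assert D(rho, sigma) >= 0  # Klein's inequality

# pinching (dephasing) in the computational basis: a measurement channel
def pinch(x):
    return np.diag(np.diag(x).real)

assert D(pinch(rho), pinch(sigma)) <= D(rho, sigma) + 1e-9  # data processing
```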
Appendix D Proof of the Petz recovery criterion (v)
We first show that both and are equal to if and only if . The case is addressed later.
D.1 Property (v) of
D.2 Property (v) of
D.3 Property (v) of
Next, we prove property (v) for . Before doing so, we notice that the Petz recovery condition implies that .
Lemma 2.
For a measurement channel , implies .
Proof.
Now, we prove property (v). We first prove the “only if” part.
Suppose . This is equivalent to , which in turn is equivalent to . Taking the partial trace over system , we get . Therefore, .
For the “if” part, we will show that implies , which is equivalent to .
Suppose . By Lemma 2, , and thus we can diagonalize them in the same basis:
(68)
(69)
Next, we construct a new POVM by taking the diagonal elements of the POVM in the above basis:
(70)
Let be the measurement channel and be its Petz map with reference . Notice that , , and therefore , .
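The structure of the Petz map of a measurement channel can also be checked numerically. The sketch below (our own illustration; it assumes the standard Petz recovery map with reference γ, which sends outcome i to γ^{1/2} P_i γ^{1/2} / Tr(P_i γ)) verifies that recovering from the measurement statistics of the reference state returns the reference state exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

def sqrtm_psd(h):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(h)
    return (v * np.sqrt(np.clip(w, 0, None))) @ v.conj().T

d = 3
# a two-outcome projective POVM {P, I - P}
P = np.zeros((d, d)); P[0, 0] = 1.0
povm = [P, np.eye(d) - P]

# a random full-rank reference state gamma
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
gamma = a @ a.conj().T
gamma /= np.real(np.trace(gamma))

g_half = sqrtm_psd(gamma)
# Petz map of the measurement channel with reference gamma:
# outcome i is mapped to gamma^{1/2} P_i gamma^{1/2} / Tr(P_i gamma)
recovered = [g_half @ Pi @ g_half / np.real(np.trace(Pi @ gamma)) for Pi in povm]

# feeding the map the statistics of gamma itself recovers gamma:
# sum_i Tr(P_i gamma) * R_i = gamma^{1/2} (sum_i P_i) gamma^{1/2} = gamma
p = [np.real(np.trace(Pi @ gamma)) for Pi in povm]
assert np.allclose(sum(pi * Ri for pi, Ri in zip(p, recovered)), gamma)
```

Each recovered operator is a unit-trace density matrix, as expected of a recovery channel acting on a measurement outcome.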
Since , using property (ii) for , one has
(71)
where and are the representations of and , which are simplified using the commutativity of , and .
Note that the expression in Eq. 71 is the same as . Since , by property (v) for , we have . Combining this with Eq. 71 implies that and . Expanding this with respect to the basis , one gets
(72)
(73)
where .
Now, fix , and consider the support of , which is spanned by all such that . Restricted to this subset of , one can cancel on both sides, and Eq. 73 becomes
(74)
where denotes the support of . Taking the square root of this equation, one gets
(75)
Define as the projector onto . The above equation can be rewritten as
(76)
since multiplication with selects the eigenvectors of and in , which are and in Eq. 75.
Since is positive, its off-diagonal elements are cross terms in :
(77)
for some complex numbers . Therefore, .
By this and Eq. 76,
(78)
Appendix E Proof of monotonicity under stochastic post-processing (vi)
Let be the linear map describing the post-processing , which satisfies
(80)
(81)
Since is a stochastic matrix, is completely positive and trace-preserving. The measurement channel of is then described by
(82)
Property (vi) is equivalent to . For , both quantities have the same form:
(83)
The inequality is due to the data-processing inequality of relative entropy.
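Since the post-processing acts on classical outcome distributions, the inequality here is the classical data-processing inequality: applying a stochastic matrix to both distributions can only decrease their relative entropy. A quick numerical check (our own sketch, with a column-stochastic matrix standing in for the post-processing):

```python
import numpy as np

rng = np.random.default_rng(2)

def kl(p, q):
    """Classical relative entropy D(p || q) for strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

# a column-stochastic matrix S (columns sum to 1) maps distributions to distributions
S = rng.random((4, 3))
S /= S.sum(axis=0, keepdims=True)

p = rng.random(3); p /= p.sum()
q = rng.random(3); q /= q.sum()

# monotonicity under stochastic post-processing
assert kl(S @ p, S @ q) <= kl(p, q) + 1e-12
```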
Appendix F Proof of Eq. 44
By the definitions of and in Eqs. 27 and 29,
(84)
(85)
and thus
(86)
where the inequality comes from the joint convexity of the Umegaki relative entropy. Similar inequalities hold for any other relative entropy satisfying joint convexity, such as the Belavkin–Staszewski one. Adding to both sides of Eq. 86 gives Eq. 44.
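The joint convexity used in the last step can itself be verified numerically. The following sketch (our own illustration; function names are ours) checks it for the Umegaki relative entropy on random full-rank states:

```python
import numpy as np

rng = np.random.default_rng(3)

def logm_psd(h):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(h)
    return (v * np.log(w)) @ v.conj().T

def D(rho, sigma):
    """Umegaki relative entropy D(rho || sigma)."""
    return float(np.real(np.trace(rho @ (logm_psd(rho) - logm_psd(sigma)))))

def rand_state(d):
    """Random full-rank density matrix."""
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T
    return rho / np.real(np.trace(rho))

d, lam = 3, 0.3
r1, r2, s1, s2 = (rand_state(d) for _ in range(4))

# joint convexity: D(mixture || mixture) <= mixture of D's
lhs = D(lam * r1 + (1 - lam) * r2, lam * s1 + (1 - lam) * s2)
rhs = lam * D(r1, s1) + (1 - lam) * D(r2, s2)
assert lhs <= rhs + 1e-9
```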