
Work, entropy production, and thermodynamics of information under protocol constraints

Artemy Kolchinsky, Santa Fe Institute, Santa Fe, New Mexico    David H. Wolpert, Complexity Science Hub, Vienna; Arizona State University, Tempe, Arizona; Santa Fe Institute, Santa Fe, New Mexico; http://davidwolpert.weebly.com
Abstract

In many real-world situations, there are constraints on the ways in which a physical system can be manipulated. We investigate the entropy production (EP) and extractable work involved in bringing a system from some initial distribution p to some final distribution p^{\prime}, given that the set of master equations available to the driving protocol obeys some constraints. We first derive general bounds on EP and extractable work, as well as a decomposition of the nonequilibrium free energy into an “accessible free energy” (which can be extracted as work, given a set of constraints) and an “inaccessible free energy” (which must be dissipated as EP). In a similar vein, we consider the thermodynamics of information in the presence of constraints, and decompose the information acquired in a measurement into “accessible” and “inaccessible” components. This decomposition allows us to consider the thermodynamic efficiency of different measurements of the same system, given a set of constraints. We use our framework to analyze protocols subject to symmetry, modularity, and coarse-grained constraints, and consider various examples including the Szilard box, the 2D Ising model, and a multi-particle flashing ratchet.

I Introduction

I.1 Background

One of the foundational issues in thermodynamics is quantifying how much work is required to transform a system between two thermodynamic states. Recent results in statistical physics have derived general bounds on work which hold even for transformations between nonequilibrium states (takara_generalization_2010; parrondo2015thermodynamics). In particular, suppose one wishes to transform a system with initial distribution p and energy function E to some final distribution p^{\prime} and energy function E^{\prime}. For an isothermal process, during which the system remains in contact with a single heat bath at inverse temperature \beta, the work extracted during this transformation obeys

W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(p)-F_{E^{\prime}}(p^{\prime}), (1)

where F_{E}(p):=\left\langle E\right\rangle_{p}-S(p)/\beta is the (nonequilibrium) free energy of distribution p given energy function E (takara_generalization_2010; parrondo2015thermodynamics; esposito2011second). This inequality comes from the second law of thermodynamics, which states that entropy production (EP), the total increase of the entropy of the system and all coupled reservoirs, is non-negative. For an isothermal process that carries out the transformation p\!\shortrightarrow\!p^{\prime}, EP is given by

\Sigma(p\!\shortrightarrow\!p^{\prime})=\beta[F_{E}(p)-F_{E^{\prime}}(p^{\prime})-W(p\!\shortrightarrow\!p^{\prime})]\geq 0. (2)

Eq. 1 follows from Eq. 2 by a simple rearrangement.

To extract work from a system, one must manipulate the system by applying a driving protocol. There are many different driving protocols that can be used to transform some initial distribution p to some final distribution p^{\prime}, which generally incur different amounts of EP and work. Achieving the fundamental bounds set by the second law, such as Eq. 1, typically requires idealized protocols, which make use of arbitrary energy functions, infinite timescales, etc. In many real-world scenarios, however, there are strong practical constraints on how one can manipulate a system, and such idealized protocols are unavailable.

The goal of this paper is to derive stronger bounds on the EP and work involved in carrying out the transformation p\rightarrow p^{\prime}, given constraints on the set of master equations available to the driving protocol. Ultimately, such stronger bounds on EP and work can provide new insights into various real-world thermodynamic processes and work-harvesting devices, ranging from biological organisms to artificial engines. They can also cast new light on some well-studied scenarios in statistical physics.

Figure 1: A two-dimensional Szilard box with a single Brownian particle, where a vertical partition (blue) can be positioned at different horizontal locations in the box. We demonstrate that only information about the particle’s horizontal position, not its vertical position, can be used to extract work from the system.

For example, consider a two-dimensional Szilard box connected to a heat bath, as shown in Fig. 1, which contains a single Brownian particle and a vertical partition, and suppose that the driving protocols can manipulate the horizontal position of this partition. (We use a Brownian model of the Szilard engine, which is similar to setups commonly employed in modern nonequilibrium statistical physics (berut2012experimental; roldan2014universal; koski2014experimental; shizume1995heat; gong2016stochastic; parrondo2015thermodynamics). This model can be justified by imagining a box that contains a large colloidal particle, as well as a medium of small solvent particles to which the vertical partition is permeable. Note that this model differs from Szilard’s original proposal (szilard1929entropieverminderung), in which the box contains a single particle in a vacuum, which has been analyzed in proesmans2015efficiency; hondou2007equation; bhat2017unusual.) Imagine that the particle is initially located in the left half of the box. How much work can be extracted by transforming this initial distribution to a uniform final distribution, assuming the system begins and ends with a uniform energy function? A simple application of Eq. 1 shows that the extractable work is upper bounded by (\ln 2)/\beta. This bound can be achieved by quickly moving the vertical partition to the middle of the box and then slowly moving it rightward. Now imagine an alternative scenario, in which the particle is initially located in the top half of the box. By Eq. 1, the work that can be extracted by bringing this initial distribution to a uniform final distribution is again upper bounded by (\ln 2)/\beta. Intuitively, however, it seems that this bound should not be achievable, given the constrained set of available protocols (i.e., one can only manipulate the system by moving the vertical partition left and right). Our results will make this intuition rigorous for the two-dimensional Szilard box, as well as various other systems that can only be manipulated by a constrained set of driving protocols.
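To see where the (\ln 2)/\beta figure comes from, here is a worked evaluation of Eq. 1 for this example, under the assumptions made explicit in Section V.1: the energy is uniform (E=E^{\prime}=0 inside the box), p is uniform over the left half, and p^{\prime}=u is uniform over the whole box; V denotes the box area (a symbol introduced only for this illustration):

F_{E}(p)=\left\langle E\right\rangle_{p}-S(p)/\beta=-\ln(V/2)/\beta,\qquad F_{E^{\prime}}(u)=-\ln(V)/\beta,

W(p\!\shortrightarrow\!u)\leq F_{E}(p)-F_{E^{\prime}}(u)=(\ln 2)/\beta.

Exactly the same arithmetic applies when p is instead uniform over the top half of the box, which is why Eq. 1 alone cannot distinguish the two scenarios.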

This phenomenon also occurs when the starting and ending distributions can depend on the outcome of a measurement of the system. This kind of setup, which was first used to analyze the thermodynamics of information in various kinds of Maxwellian demons, is sometimes called “feedback control” in the literature sagawa2008second; parrondo2015thermodynamics. Imagine that the state of some system X is first measured using some observation channel (conditional distribution) q(m|x), producing measurement outcome m with probability p(m)=\sum_{x}p(x)q(m|x). The system then undergoes a driving protocol which can depend on m. For simplicity, we assume that the system’s energy function begins as E and ends as E^{\prime} for all measurement outcomes. Let p_{X|m} and p_{X^{\prime}|m}^{\prime} indicate the system’s initial and final conditional distributions given measurement outcome m, and let p(x)=\sum_{m}p(m)p_{X|m}(x|m) and p^{\prime}(x^{\prime})=\sum_{m}p(m)p_{X^{\prime}|m}^{\prime}(x^{\prime}|m) indicate the system’s initial and final marginal distributions (for simplicity, below we often use notation like p instead of p(x)). We can then take expectations of both sides of Eq. 1 across measurement outcomes, thereby bounding the average extractable work as follows. (As is common in the literature, in Eq. 3 we consider only the work that is extractable from the system after the measurement is made; we do not account for the possible work cost of making the measurement, nor any work exchanges that may be incurred by the measurement apparatus during the driving.)

\left\langle W\right\rangle\leq\sum_{m}p(m)[F_{E}(p_{X|m})-F_{E^{\prime}}(p_{X^{\prime}|m}^{\prime})]. (3)

By adding and subtracting [S(p)-S(p^{\prime})]/\beta on the right hand side, we can further rewrite Eq. 3 in terms of the drop of the free energy in the marginal distribution, plus the loss of information between the measurement and the system over the course of the protocol,

\left\langle W\right\rangle\leq F_{E}(p)-F_{E^{\prime}}(p^{\prime})+[I(X;M)-I(X^{\prime};M)]/\beta, (4)

where I(X;M) and I(X^{\prime};M) indicate the mutual information under the conditional distributions p_{X|m} and p_{X^{\prime}|m}^{\prime} respectively. Comparing Eq. 1 and Eq. 4, the bound on average extractable work increases with the drop of mutual information. This is a classic result from the “thermodynamics of information” sagawa2008second; parrondo2015thermodynamics, which shows that information about the state of a system can be used to increase the work extracted from this system.
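For completeness, the step from Eq. 3 to Eq. 4 can be spelled out as follows, using \sum_{m}p(m)\langle E\rangle_{p_{X|m}}=\langle E\rangle_{p} and the identity I(X;M)=S(p)-\sum_{m}p(m)S(p_{X|m}) (and likewise for the final distributions):

\sum_{m}p(m)[F_{E}(p_{X|m})-F_{E^{\prime}}(p_{X^{\prime}|m}^{\prime})]=\langle E\rangle_{p}-\langle E^{\prime}\rangle_{p^{\prime}}-\Big[\sum_{m}p(m)S(p_{X|m})-\sum_{m}p(m)S(p_{X^{\prime}|m}^{\prime})\Big]/\beta

=F_{E}(p)-F_{E^{\prime}}(p^{\prime})+\Big[S(p)-\sum_{m}p(m)S(p_{X|m})\Big]/\beta-\Big[S(p^{\prime})-\sum_{m}p(m)S(p_{X^{\prime}|m}^{\prime})\Big]/\beta

=F_{E}(p)-F_{E^{\prime}}(p^{\prime})+[I(X;M)-I(X^{\prime};M)]/\beta.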

Just like Eq. 1, the bound in Eq. 4 is typically saturated by idealized protocols, which have access to arbitrary energy functions, infinite timescales, etc. As mentioned above, in the real world there are typically constraints on the available protocols, in which case the bound of Eq. 4 may not be achievable. For example, consider again the Szilard box shown in Fig. 1. Imagine measuring a bit of information about the location of the particle and then using this information to extract work from the system while driving it back to a uniform equilibrium distribution. In this case I(X;M)=\ln 2 and I(X^{\prime};M)=0, so if the system starts and ends with the uniform energy function, Eq. 4 states that \langle W\rangle\leq(\ln 2)/\beta. Intuitively, however, it seems that measuring the particle’s horizontal position should be useful for extracting work from the system, while measuring the particle’s vertical position should not be useful. The general bound of Eq. 4 does not distinguish between these two kinds of measurements. In fact, this bound depends only on the overall amount of information acquired by the measurement (as quantified by I(X;M)), and is therefore completely insensitive to the content of that information (i.e., the particular pattern of correlations quantified by I(X;M)).

I.2 Summary of results and roadmap

In this paper we derive bounds on extractable work and EP which arise when carrying out the transformation p\!\shortrightarrow\!p^{\prime} under constraints on the driving protocol. We consider a system coupled to a single heat bath which undergoes a driving protocol over some time interval t\in[0,1] (where the units of time are arbitrary). A driving protocol is represented as a continuous-time master equation L(t), where L(t) refers to the (infinitesimal) generator at time t. For example, a driving protocol could be a trajectory of time-dependent discrete-state rate matrices, or a trajectory of time-dependent Fokker-Planck operators for a continuous-state system.

We say that a driving protocol is constrained if there is some restricted set of generators \Lambda such that L(t)\in\Lambda at all times t\in[0,1]. As discussed below, the particular choice of \Lambda depends on the specific constraints being considered. For example, \Lambda might represent a set of generators that are invariant under some particular symmetry group (e.g., representing the dynamics of a set of indistinguishable particles, or a spin system on a lattice with symmetries).

Our analysis proceeds at three different “levels” of generality, which we summarize in the following subsections.

Level 1: General mathematical framework

In the first level of analysis, presented in Sections III and IV, we provide a general mathematical framework for deriving bounds on EP and work for constrained driving protocols.

To develop our framework, given some set of allowed generators \Lambda, we consider an associated operator \phi over distributions which satisfies two conditions: it obeys the so-called Pythagorean identity from information geometry, and it commutes with the dynamics generated by elements of \Lambda (Eqs. 14 and 16 below). Given such an operator \phi, in Section III we show that for any distribution p, the distribution \phi(p) contains only that part of the free energy in p which may be turned into work by a constrained driving protocol. Formally, we decompose the nonequilibrium free energy of distribution p and energy function E as

F_{E}(p)=F_{E}(\phi(p))+D(p\|\phi(p))/\beta, (5)

where D(\cdot\|\cdot) indicates the Kullback-Leibler divergence. Then, for any constrained driving protocol that carries out the transformation p\!\shortrightarrow\!p^{\prime}, the extractable work is bounded as

W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime})). (6)

We also demonstrate that EP can be lower bounded by the contraction of the Kullback-Leibler (KL) divergence between p and \phi(p) over the course of the protocol,

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime})). (7)

Given these bounds, it can be seen that Eq. 5 decomposes the nonequilibrium free energy F_{E}(p) into two terms: an accessible free energy F_{E}(\phi(p)), whose decrease over the course of the protocol may be extractable as work, and an inaccessible free energy D(p\|\phi(p))/\beta, whose decrease over the course of the protocol cannot be turned into work and must be dissipated as EP. The accessible free energy is never greater than the overall free energy, F_{E}(\phi(p))\leq F_{E}(p), which follows from Eq. 5 and the non-negativity of KL divergence. We also show that the right hand side of Eq. 7 is non-negative,

D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\geq 0, (8)

which implies that our bounds on EP and work, Eqs. 7 and 6 respectively, are stronger than the general bounds provided by the second law (\Sigma\geq 0 and Eq. 1). Note that Eq. 8 also implies an irreversibility condition on the dynamics: for any two distributions p and p^{\prime}, a constrained driving protocol can either carry out the transformation p\!\shortrightarrow\!p^{\prime} or the transformation p^{\prime}\!\shortrightarrow\!p, but not both, unless D(p\|\phi(p))=D(p^{\prime}\|\phi(p^{\prime})).

In Section IV, we show that the general framework summarized above has important implications for the thermodynamics of information. We consider the type of feedback-control setup discussed above: an observation apparatus first makes a measurement m of the system, then the system undergoes a driving protocol (which can depend on m) that carries out the transformation p_{X|m}\!\shortrightarrow\!p_{X^{\prime}|m}^{\prime}. Suppose that the driving protocols corresponding to all m obey bounds like Eq. 6 for the same operator \phi. This operator then gives rise to the “mapped” initial and final conditional distributions \phi(p_{X|m}) and \phi(p_{X^{\prime}|m}^{\prime}). We can then bound the average extractable work for feedback control under constraints as

\left\langle W\right\rangle\leq F_{E}(p)-F_{E^{\prime}}(p^{\prime})+[I_{\mathrm{acc}}^{\phi}(X;M)-I_{\mathrm{acc}}^{\phi}(X^{\prime};M)]/\beta,

where the accessible information component of the initial mutual information I(X;M) is defined as

I_{\mathrm{acc}}^{\phi}(X;M)=I(X;M)-D(p_{X|M}\|\phi(p_{X|M})), (9)

and similarly for I_{\mathrm{acc}}^{\phi}(X^{\prime};M). This bound is a refinement of Eq. 4 in the presence of protocol constraints, which shows that the amount of extractable work depends on the accessible information I_{\mathrm{acc}}^{\phi}(X;M), rather than the actual mutual information I(X;M). Loosely speaking, the accessible information reflects the “alignment” between the choice of measured observable and the way the system can be manipulated, given some protocol constraints. This means that, in the presence of constraints, the thermodynamic value of information depends not only on the amount of measured information, but also on the content of that information (corning1998thermodynamics; kolchinsky2018semantic). (See also kauffman2000investigations for a popular discussion of some related issues.)

It is important to note that at this general level of analysis, we do not describe how to construct the operator \phi, as this construction will typically depend on the structure of the set \Lambda. However, as described in the following subsection, we do provide explicit expressions for \phi for three broad classes of protocol constraints, which we term symmetry, modularity, and coarse-grained constraints.

Level 2: Symmetry, modularity, and coarse-grained constraints

At the second level of our analysis, we apply the general framework described above to derive bounds on EP and work for three broad classes of protocol constraints:

  • Section V considers symmetry constraints, when the available generators possess some symmetry group. Examples of systems with symmetry constraints include the Szilard box in Fig. 1, spin systems on lattices, and gases of indistinguishable particles. The operator \phi corresponding to symmetry constraints, defined in Eq. 42, maps distributions to their “symmetrized” versions (which are invariant under the action of the symmetry group).

  • Section VI considers modularity constraints, when the available generators cause different (though possibly overlapping) subsystems of a multivariate system to evolve independently of each other. Examples of systems with modularity constraints include digital circuits wolpert2020thermodynamic, ideal gases, and multi-particle Maxwellian demons. The operator \phi corresponding to modularity constraints, defined in Eq. 64, maps distributions to their “uncorrelated” versions, without statistical dependencies between independent subsystems.

  • Section VII considers coarse-grained constraints, when the available generators exhibit closed coarse-grained dynamics which obey some constraints (e.g., coarse-grained symmetry or modularity constraints). An example is provided by the Szilard box in Fig. 1: the particle’s vertical position (the coarse-grained macrostate) evolves in a way that does not depend on the horizontal position, and the macrostate equilibrium distribution cannot be controlled by moving the partition. Given a protocol that obeys coarse-grained constraints, we show that the EP can be lower bounded in terms of a “coarse-grained EP”, Eqs. 87, 88 and 89, and that this coarse-grained EP can itself be lower bounded by a coarse-grained version of Eq. 7.

We also discuss how tighter bounds on work and EP can be derived by combining different kinds of constraints (e.g., when a system obeys two different symmetry groups, or when it obeys both symmetry and modularity constraints).

Level 3: Concrete examples

At the third (and most concrete) level, we illustrate our results for symmetry, modularity, and coarse-grained constraints on several example systems:

  • In Section V.1, we use symmetry constraints to derive thermodynamic bounds for the Szilard box in Fig. 1, which possesses vertical reflection symmetry.

  • In Section V.2, we use symmetry constraints to derive thermodynamic bounds for the Ising model on a 2D lattice, which possesses translational symmetry.

  • In Section VI.1, we use modularity constraints to derive thermodynamic bounds for the Szilard box in Fig. 1, which are different from the bounds derived in Section V.1. We also demonstrate that stronger results can be derived by combining bounds arising from symmetry and modularity constraints.

  • In Sections VI.2 and VI.3, we use modularity constraints to derive bounds on work extraction for two multi-particle feedback-control protocols that have been proposed in the literature: a multi-particle Szilard box song2021optimal and a collective flashing ratchet cao2004feedback .

  • In Section VII.1, we use coarse-grained constraints to derive thermodynamic bounds for a version of the Szilard box in Fig. 1 in the presence of gravity. We also demonstrate that stronger results can be derived by combining bounds arising from coarse-grained and modularity constraints.

Literature review and discussion

After presenting the results summarized above, in Section VIII we discuss related prior literature. We also compare and contrast our results, such as the decomposition of nonequilibrium free energy in Eq. 5, to some relevant work in quantum thermodynamics janzing_quantum_2006 ; vaccaro_tradeoff_2008 . We conclude with a brief discussion in Section IX, which also touches upon how our approach generalizes beyond the assumption of a single heat bath. Proofs and derivations are in the appendices.

II Preliminaries

We consider a physical system with state space X, which can be either discrete or continuous (X=\mathbb{R}^{n}). The term “probability distribution” will refer to a probability mass function over X in the discrete case and to a probability density function over X in the continuous case. We interchangeably use notation like p(x) and p_{x} (as will be clear from context) to indicate the probability of state x. We use \mathcal{P} to refer to the set of all probability distributions over X.

The system evolves in a stochastic manner during a driving protocol over time t\in[0,1]. We will write p(t) to indicate the distribution at time t corresponding to some initial distribution p(0)=p, and p(1)=p^{\prime} to indicate the distribution at the end of the protocol. For a discrete-state system, the distribution at time t evolves according to the time-dependent master equation,

{\textstyle\partial_{t}}p_{x}(t)=\sum_{x^{\prime}}\left[L_{xx^{\prime}}(t)p_{x^{\prime}}(t)-L_{x^{\prime}x}(t)p_{x}(t)\right], (10)

where L_{x^{\prime}x}(t) is the transition rate from state x to state x^{\prime}. We assume that the system is coupled to a heat bath at inverse temperature \beta, and so each L(t) obeys local detailed balance (see Section IX for a generalization of this assumption). Formally, this means that \pi^{L(t)}_{x^{\prime}}L_{xx^{\prime}}(t)=\pi^{L(t)}_{x}L_{x^{\prime}x}(t) for all x, x^{\prime}, and t, where \pi^{L(t)} is the stationary distribution of rate matrix L(t), which we assume is unique. (This latter assumption can be relaxed as long as the operator \phi, as discussed in Section III, satisfies the following weak technical condition: for all p\in\mathcal{P} and each stationary distribution \pi of each L\in\Lambda, D(p\|\phi(\pi))<\infty whenever D(p\|\pi)<\infty. Note that \phi(\pi) is also a stationary distribution of L by Lemma 1 in Appendix A, so this condition is automatically satisfied when the generators have unique stationary distributions, since in that case \pi=\phi(\pi). Note also that if some L\in\Lambda have multiple stationary distributions \pi, the corresponding EP rate in Eq. 11 can be equivalently defined using any \pi such that D(p\|\pi)<\infty.)

The rate of entropy production (EP rate) incurred at time t can be written as (Eq. 33 in esposito2010three)

\dot{\Sigma}(p(t),L(t))=-\sum_{x}{\textstyle\partial_{t}}p_{x}(t)\ln\frac{p_{x}(t)}{\pi^{L(t)}_{x}}\geq 0, (11)

where {\textstyle\partial_{t}}p_{x}(t) is defined in Eq. 10. Note that the right side of Eq. 11 is sometimes called the “nonadiabatic EP rate” in stochastic thermodynamics, and it is equal to the overall EP rate for a system coupled to a single bath and obeying detailed balance esposito2010three. The total EP incurred by a time-extended protocol over t\in[0,1] that carries out the transformation p\!\shortrightarrow\!p^{\prime} is given by the integral of the EP rate,

\Sigma(p\!\shortrightarrow\!p^{\prime})=\int_{0}^{1}\dot{\Sigma}(p(t),L(t))\,dt. (12)
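As a minimal numerical illustration of Eqs. 10, 11 and 12 (our sketch, not part of the original text; the three-state rate matrix and the time grid are arbitrary illustrative choices), the following Python snippet computes the EP rate and total EP for a free relaxation under a fixed generator obeying local detailed balance, and compares the result with the contraction of KL divergence toward the stationary distribution:

```python
import numpy as np
from scipy.linalg import expm

# A 3-state generator obeying local detailed balance w.r.t. pi = (0.5, 0.3, 0.2):
# off-diagonal rates L[x, x'] = pi[x], diagonals chosen so columns sum to zero,
# which gives the master equation of Eq. (10) in matrix form, dp/dt = L p.
pi = np.array([0.5, 0.3, 0.2])
L = np.tile(pi[:, None], (1, 3))
np.fill_diagonal(L, 0.0)
np.fill_diagonal(L, -L.sum(axis=0))

def ep_rate(p):
    """EP rate of Eq. (11): -sum_x (dp/dt)_x * ln(p_x / pi_x)."""
    return -np.sum((L @ p) * np.log(p / pi))

# Total EP of a free relaxation from p0 over t in [0, 1], via Eq. (12).
p0 = np.array([0.8, 0.1, 0.1])
ts = np.linspace(0.0, 1.0, 4000)
dt = ts[1] - ts[0]
total_ep = sum(ep_rate(expm(L * t) @ p0) for t in ts) * dt

# For a fixed generator this equals the contraction of KL divergence toward pi.
D = lambda p, q: np.sum(p * np.log(p / q))
p1 = expm(L * 1.0) @ p0
print(total_ep, D(p0, pi) - D(p1, pi))
```

The two printed numbers agree up to time-discretization error, reflecting the fact (used in the proof sketch after Theorem 1) that the total EP of a free relaxation equals the drop in D(p\|\pi).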

The work extracted during a protocol can be calculated by using Eqs. 12 and 2, once the initial and final nonequilibrium free energies, F_{E}(p) and F_{E^{\prime}}(p^{\prime}), are specified. To define these free energies, we assume that there is some fixed pair of energy functions, E and E^{\prime}, which specify the Boltzmann equilibrium distributions of L(0) and L(1) respectively.

For a continuous-state system evolving under a continuous master equation (van1992stochastic; risken1996fokker), the sums in Eqs. 10 and 11 should be replaced by integrals (see Eq. 31 in van2010three). A prototypical example of a continuous master equation, which we will use below, is a Fokker-Planck equation (ermak1978brownian; van1992stochastic),

{\textstyle\partial_{t}}p(x,t)=-\nabla\cdot(\mathsf{A}(x,t)p(x,t)-\mathsf{D}(x,t)\nabla p(x,t)), (13)

where \mathsf{A} and \mathsf{D} are drift and diffusion terms.

We will often write dynamical equations like Eqs. 10 and 13 using the notation {\textstyle\partial_{t}}p(t)=L(t)p(t), where L(t) is a bounded linear operator that is called the (infinitesimal) generator of the dynamics at time t. Note that for a continuous-state system in phase space, it may be that the system is isolated from the bath for some t\in[0,1], in which case {\textstyle\partial_{t}}p(t)=L(t)p(t) should be understood in terms of the Liouville equation (for example, if a system is first isolated and evolves in a Hamiltonian manner, and is then brought in contact with a bath at inverse temperature \beta and allowed to equilibrate).

III General framework

We begin by presenting our general mathematical framework. The application of this framework to concrete situations is described in later sections.

A driving protocol \{L(t):t\in[0,1]\} is said to be constrained if there is some restricted set of generators \Lambda such that L(t)\in\Lambda at all t. For a given set of allowed generators \Lambda, we consider an associated operator \phi:\mathcal{P}\to\mathcal{P} which satisfies two conditions. The first condition states that

D(p\|q)=D(p\|\phi(p))+D(\phi(p)\|q) (14)

for all p\in\mathcal{P} and q\in\mathrm{img}\;\phi with D(p\|q)<\infty (where \mathrm{img}\;\phi=\{\phi(p):p\in\mathcal{P}\} is the image of the operator \phi). Eq. 14 is sometimes called the Pythagorean identity of KL divergence in information geometry amari2016information. Any \phi that obeys Eq. 14 can be written in terms of the following projection (this is because D(p\|q)\geq D(p\|\phi(p)) for any q\in\mathrm{img}\;\phi, which follows from Eq. 14 and the non-negativity of KL divergence):

\phi(p)=\operatorname*{\arg\,\min}_{q\in\mathrm{img}\;\phi}D(p\|q), (15)

which shows that D(p\|\phi(p)) is the minimal information-theoretic distance from p to the set of distributions \mathrm{img}\;\phi.

The second condition is that \phi obeys the following commutativity relation for all L\in\Lambda:

e^{\tau L}\phi(p)=\phi(e^{\tau L}p)\quad\forall\tau\geq 0,p\in\mathcal{P}. (16)

In other words, given any initial distribution p, the same final distribution is reached regardless of whether p first relaxes under L for time \tau and then undergoes \phi, or instead first undergoes \phi and then relaxes under L for time \tau.

Note that the Pythagorean identity in Eq. 14 concerns only the operator \phi, while the commutativity relation in Eq. 16 concerns the relationship between \phi and the generators in \Lambda (and therefore all of the generators L(t) in the driving protocol, since L(t)\in\Lambda at all t by assumption). Beyond these two conditions, the operator \phi can be arbitrary, and may be linear or nonlinear. In the following sections of this paper, we will show how to choose \phi for various types of constrained protocols.

Importantly, any \phi that satisfies the two conditions above maps any distribution p to a corresponding “accessible” distribution \phi(p), which controls the amount of work that can be extracted from p by a constrained driving protocol. To prove this, we first show that for any L\in\Lambda that obeys Eq. 16, the equilibrium distribution \pi^{L} satisfies (Lemma 1 in Appendix A)

\pi^{L}\in\mathrm{img}\;\phi. (17)

We also derive the following mathematical result, which will be central to much of what follows: if \phi obeys Eq. 14 and Eq. 16 for some generator L, then the EP rate incurred by any distribution p under L can be written as the sum of two non-negative terms: the EP rate incurred by \phi(p) under L, and the instantaneous contraction of the KL divergence between p and \phi(p).

Figure 2: Visual explanation of Theorem 1: distribution p freely relaxes under L for time \tau (solid gray line). The EP incurred during this relaxation (contraction of purple lines) can be decomposed into the contraction of the KL divergence between p and \phi(p) (contraction of green lines), plus the EP incurred during the free relaxation of \phi(p) (contraction of the red lines). The free relaxation of \phi(p) under L is represented by the dotted gray line.
Theorem 1.

If \phi obeys Eq. 14 and Eq. 16 for some generator L, then for all p\in\mathcal{P},

\dot{\Sigma}(p,L)=\dot{\Sigma}(\phi(p),L)-{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t))),

and -{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))\geq 0, where {\textstyle\partial_{t}}p(t)=Lp.

We sketch the proof of this theorem in terms of a discrete-time relaxation over interval \tau, as shown in Fig. 2 (see Appendix A for details). Consider some distribution p that relaxes for time \tau under the generator L, thereby reaching the distribution e^{\tau L}p (solid gray line). The EP incurred by this relaxation is given by the contraction of KL divergence to the equilibrium distribution \pi, \Sigma(p\!\shortrightarrow\!e^{\tau L}p)=D(p\|\pi)-D(e^{\tau L}p\|\pi) (contraction of purple lines) esposito2010three; van2010three. Given Eq. 17, we can apply the Pythagorean identity, Eq. 14, to both D(p\|\pi) and D(e^{\tau L}p\|\pi), which lets us rewrite \Sigma(p\!\shortrightarrow\!e^{\tau L}p) as the sum of two terms: D(p\|\phi(p))-D(e^{\tau L}p\|\phi(e^{\tau L}p)) (green lines) and D(\phi(p)\|\pi)-D(\phi(e^{\tau L}p)\|\pi) (red lines). Applying the commutativity relation, Eq. 16, shows that the first term is non-negative by the data-processing inequality and that the second term is equal to \Sigma(\phi(p)\!\shortrightarrow\!e^{\tau L}\phi(p)), the EP incurred by letting \phi(p) relax freely under L. The continuous-time statement found in Theorem 1 follows by taking the appropriate \tau\to 0 limit, while noting that the EP rate, Eq. 11, can be rewritten in terms of the limit \lim_{\tau\to 0}\frac{1}{\tau}[D(p\|\pi)-D(e^{\tau L}p\|\pi)].
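To make the two conditions and the proof sketch concrete, here is a small self-contained numerical check (our illustration, with an arbitrarily chosen four-state example): the symmetry swaps states 0 and 1 and states 2 and 3, \phi averages a distribution with its image under that swap, and the rate matrix is invariant under the swap and obeys local detailed balance. The script verifies the Pythagorean identity (Eq. 14), the commutativity relation (Eq. 16), and the EP-rate decomposition of Theorem 1, with the time derivative of the KL divergence estimated by finite differences:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Swap symmetry g: exchange states 0<->1 and 2<->3 (an illustrative choice).
P = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
phi = lambda p: 0.5 * (p + P @ p)   # twirling over the two-element group {identity, g}

# A g-invariant rate matrix obeying local detailed balance w.r.t. a g-invariant pi:
pi = np.array([0.3, 0.3, 0.2, 0.2])
L = np.tile(pi[:, None], (1, 4))
np.fill_diagonal(L, 0.0)
np.fill_diagonal(L, -L.sum(axis=0))   # columns sum to zero: dp/dt = L p (Eq. 10)

D = lambda p, q: np.sum(p * np.log(p / q))               # KL divergence
ep_rate = lambda p: -np.sum((L @ p) * np.log(p / pi))    # Eq. (11)

p = rng.dirichlet(np.ones(4))
tau, dt = 0.3, 1e-6

# Eq. (16): twirling commutes with relaxation under L.
print(np.allclose(expm(tau * L) @ phi(p), phi(expm(tau * L) @ p)))

# Eq. (14): Pythagorean identity, for any q in the image of phi.
q = phi(rng.dirichlet(np.ones(4)))
print(np.isclose(D(p, q), D(p, phi(p)) + D(phi(p), q)))

# Theorem 1: EP-rate decomposition (finite-difference estimate of the KL derivative).
p_next = expm(dt * L) @ p
dKL_dt = (D(p_next, phi(p_next)) - D(p, phi(p))) / dt
print(np.isclose(ep_rate(p), ep_rate(phi(p)) - dKL_dt, atol=1e-4))
```

All three checks print True; replacing the rate matrix with one that breaks the swap symmetry generically makes the commutativity check, and with it the decomposition of Theorem 1, fail.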

Now suppose that Eq. 16 holds, so that the assumptions of Theorem 1 are satisfied during the entire protocol. In that case, as we show in Lemma 3 in Appendix A, any constrained protocol that carries out the transformation p\!\shortrightarrow\!p^{\prime} must also transform the initial distribution \phi(p) to the final distribution \phi(p^{\prime}). We can then, in essence, integrate Theorem 1 over time and derive the following result about total EP.

Theorem 2.

If \phi obeys Eq. 14 and Eq. 16 for all L\in\Lambda, then for any constrained protocol that transforms p\!\shortrightarrow\!p^{\prime},

\Sigma(p\!\shortrightarrow\!p^{\prime})=\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime}))+\left[D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\right]

and D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\geq 0.

Figure 3: Illustration of Theorem 2. Given an appropriate operator \phi, \Sigma(p\!\shortrightarrow\!p^{\prime}) (the EP incurred during some desired transformation p\!\shortrightarrow\!p^{\prime}; solid gray line) is equal to \Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})) (the EP incurred by that protocol when transforming \phi(p)\!\shortrightarrow\!\phi(p^{\prime}); dashed gray line) plus the contraction of the KL divergence D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime})) (contraction of green lines). This contraction of KL divergence is a non-negative lower bound on \Sigma(p\!\shortrightarrow\!p^{\prime}), as in Eq. 18.

We use Theorem 2 to derive several useful bounds on EP and work. First, since \Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime}))\geq 0 by the non-negativity of EP, the contraction of KL divergence between p and \phi(p) bounds the EP incurred by a constrained driving protocol that carries out the transformation p\!\shortrightarrow\!p^{\prime},

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\geq 0, (18)

which appeared as Eq. 7 in the introduction. Furthermore, D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\geq 0 immediately implies that

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})). (19)

We can also derive the decomposition of free energy and the bound on extractable work, which appeared as Eqs. 5 and 6 in the introduction. Consider some transformation p\!\shortrightarrow\!p^{\prime}, and write the initial nonequilibrium free energy as

F_{E}(p)=F_{E}(\pi)+D(p\|\pi)/\beta, (20)

where \pi\propto e^{-\beta E} is the Boltzmann distribution for the initial energy function E, and F_{E}(\pi) is the equilibrium free energy (esposito2011second). Using Eq. 17 and the Pythagorean identity, Eq. 14, we decompose the nonequilibrium free energy into a sum of the accessible free energy and the inaccessible free energy,

F_{E}(p)=F_{E}(\pi)+[D(p\|\phi(p))+D(\phi(p)\|\pi)]/\beta
=F_{E}(\phi(p))+D(p\|\phi(p))/\beta. (21)

Using a similar derivation, we can write the nonequilibrium free energy at the end of the protocol as

F_{E^{\prime}}(p^{\prime})=F_{E^{\prime}}(\phi(p^{\prime}))+D(p^{\prime}\|\phi(p^{\prime}))/\beta. (22)

Subtracting Eq. 22 from Eq. 21 shows that the drop in the nonequilibrium free energy during p\!\shortrightarrow\!p^{\prime} is given by

F_{E}(p)-F_{E^{\prime}}(p^{\prime})=F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime}))+\left[D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\right]/\beta. (23)

Combining this result with Theorem 2 and Eq. 2, and then rearranging, shows that the work involved in carrying out p\!\shortrightarrow\!p^{\prime} is equal to the work involved in carrying out the accessible transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime}):

W(p\!\shortrightarrow\!p^{\prime})=W(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})). (24)
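Explicitly, writing Eq. 2 as W(p\!\shortrightarrow\!p^{\prime})=F_{E}(p)-F_{E^{\prime}}(p^{\prime})-\Sigma(p\!\shortrightarrow\!p^{\prime})/\beta and substituting Eq. 23 for the free-energy drop and Theorem 2 for the EP, the KL terms cancel:

W(p\!\shortrightarrow\!p^{\prime})=F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime}))+\left[D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\right]/\beta-\left[\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime}))+D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\right]/\beta

=F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime}))-\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime}))/\beta=W(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})),

where the last equality is Eq. 2 applied to the transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime}).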

Finally, by combining Eq. 24 with Eq. 1 (applied to the transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime})), we arrive at an upper bound on the work that can be extracted by a constrained protocol:

W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime})), (25)

which is tighter than the bound given by the second law, Eq. 1.

The bounds in Eqs. 18 and 25, as well as the decomposition of free energy in Eq. 21, are the main theoretical results arising from our general framework. Fig. 3 provides a schematic way of understanding these results. Theorem 2 states that, for a constrained protocol that carries out the map p\!\shortrightarrow\!p^{\prime}, the EP incurred during the system’s actual trajectory (solid gray line) is given by the EP that would be incurred by a “projected trajectory” that carries out the transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime}) (dashed gray line), plus the drop in the KL divergence from the system’s distribution to the set \mathrm{img}\;\phi over the course of the protocol (contraction of green lines). Since the EP of the projected trajectory must be non-negative, the drop in the distance from the system’s distribution to \mathrm{img}\;\phi serves as a lower bound on EP, as in Eq. 18. In addition, Theorem 2 states that this decrease in the KL divergence must be non-negative, meaning that the system’s distribution cannot move farther from \mathrm{img}\;\phi over the course of the protocol.

Following Fig. 3, it can be helpful to think of the trajectory p\!\shortrightarrow\!p^{\prime} as composed of three segments: (1) from p down to \phi(p), (2) from \phi(p) to \phi(p^{\prime}) while staying within \mathrm{img}\;\phi, and (3) from \phi(p^{\prime}) up to p^{\prime} (note that this decomposition is useful for accounting purposes, but does not generally reflect the actual trajectory the system takes in going from p to p^{\prime}). The first and third segments contribute (positively and negatively, respectively) only to EP, while the projected second segment \phi(p)\!\shortrightarrow\!\phi(p^{\prime}) contributes both to EP and to work. Thus, the work involved in p\!\shortrightarrow\!p^{\prime} is determined entirely by the work involved in the second segment, as stated in Eq. 24.

Note also the formal similarity between our decomposition of the drop in free energy, Eq. 23, and the decomposition of EP in Theorem 2. Indeed, like Theorem 2, the result Eq. 23 can be illustrated with Fig. 3: during the transformation p\!\shortrightarrow\!p^{\prime} (solid gray line), the drop in free energy is given by the drop in free energy incurred by the transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime}) (dashed gray line), plus the contraction of the KL divergence from the system’s distribution to the set \mathrm{img}\;\phi (green lines).

In general, our bounds on EP and work will not always be achievable. Suppose, however, that the final distribution p^{\prime} is an equilibrium distribution, so p^{\prime}=\phi(p^{\prime}) by Eq. 17. Eq. 18 then gives

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\phi(p)). (26)

This bound is achievable if the generators in \Lambda have a continuous curve of equilibrium distributions from \phi(p) to p^{\prime}=\phi(p^{\prime}). Imagine a protocol in which the initial distribution p first relaxes to the equilibrium distribution \phi(p), and then undergoes quasistatic driving from \phi(p) to \phi(p^{\prime}) while remaining in equilibrium throughout (in terms of Fig. 3, the system first relaxes along the green arrow connecting p to \phi(p), then follows the dashed line to \phi(p^{\prime}) quasistatically). The relaxation step incurs D(p\|\phi(p)) of EP, while the quasistatic step incurs a vanishing amount of EP, so the bound in Eq. 26 will be achieved.

III.1 Choice of the \phi operator

In general, the operator \phi associated with a given set of generators \Lambda is not unique. For instance, for any driving protocol, the identity map \phi(p)=p always satisfies Eq. 14 and Eq. 16. Choosing \phi to be the identity map, however, reduces the results in Theorem 2 to trivial identities and the lower bound on EP in Eq. 18 to 0.

At a high level, those \phi which have smaller \mathrm{img}\;\phi will generally give tighter bounds on EP (since, given Eq. 15, a smaller image leads to larger values of D(p\|\phi(p))). To illustrate this phenomenon, consider the extreme case where all L\in\Lambda have the same equilibrium distribution \pi, so that any constrained driving protocol must be a free relaxation toward \pi. Then, the operator \phi(p)=\pi for all p (so \mathrm{img}\;\phi is a singleton) satisfies Eqs. 16 and 14 and, when plugged into Eq. 18, gives the following bound on EP:

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\pi)-D(p^{\prime}\|\pi). (27)

In fact, the right hand side is an exact expression for the EP incurred by the free relaxation, meaning that it is the tightest possible bound. If, however, the generators L\in\Lambda have different equilibrium distributions, then the operator \phi(p)=\pi (for whatever \pi) generally violates the commutativity relation in Eq. 16, and bounds like Eq. 27 will no longer hold.

In the following sections, we show how to use our results to derive thermodynamic bounds for \Lambda that obey some kind of symmetry group, modular decomposition, or coarse-grained structure. In more general, possibly unstructured cases, it is an open question whether a non-trivial operator \phi exists, and if so how to identify it. We explore related issues in a companion paper kolchinsky_constraints_paper2, where we use numerical optimization techniques to derive bounds on EP similar to Eq. 18.

Importantly, when there are multiple different operators that all satisfy the Pythagorean identity and the commutativity relation for the available generators \Lambda, one can derive tighter bounds on EP and work by applying our decompositions in an “iterative” manner. For instance, imagine that there are two different operators \phi_{1} and \phi_{2} that satisfy Eqs. 14 and 16 (for example, these might represent operators arising from symmetry constraints and modularity constraints, respectively, as described below). Applying Theorem 2 iteratively leads to “stacked” bounds on EP analogous to Eq. 18,

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq\big[D(p\|\phi_{1}(p))+D(\phi_{1}(p)\|\phi_{2}(\phi_{1}(p)))\big]-\big[D(p^{\prime}\|\phi_{1}(p^{\prime}))+D(\phi_{1}(p^{\prime})\|\phi_{2}(\phi_{1}(p^{\prime})))\big]\geq 0. (28)

Similarly, applying Eq. 24 iteratively leads to stacked bounds on extractable work analogous to Eq. 25,

W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(\phi_{2}(\phi_{1}(p)))-F_{E^{\prime}}(\phi_{2}(\phi_{1}(p^{\prime}))). (29)

Such stacked bounds are generally tighter than the bounds provided by either \phi_{1} or \phi_{2} alone. (Note that one can also reverse the order of operations, and consider the composition \phi_{1}(\phi_{2}(p)) rather than \phi_{2}(\phi_{1}(p)) in Eqs. 28 and 29, which will in general lead to different bounds.)
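As a simple illustration of such a composition (our example, anticipating the symmetry constraints of Section V): if \phi_{1} and \phi_{2} are the twirling operators for two commuting reflections of a two-dimensional box, say x_{2}\mapsto-x_{2} and x_{1}\mapsto-x_{1}, then

\phi_{2}(\phi_{1}(p))(x_{1},x_{2})=\tfrac{1}{4}\left[p(x_{1},x_{2})+p(x_{1},-x_{2})+p(-x_{1},x_{2})+p(-x_{1},-x_{2})\right],

which is the twirling over the four-element group generated by the two reflections, and in this case \phi_{1}(\phi_{2}(p))=\phi_{2}(\phi_{1}(p)). Moreover, since \phi_{2}(\phi_{1}(p))\in\mathrm{img}\;\phi_{1}, the Pythagorean identity Eq. 14 gives D(p\|\phi_{1}(p))+D(\phi_{1}(p)\|\phi_{2}(\phi_{1}(p)))=D(p\|\phi_{2}(\phi_{1}(p))), so the bracketed terms in Eq. 28 reduce to the asymmetry of p and p^{\prime} with respect to the larger group.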

III.2 Fluctuating entropy production

As we show in detail in Section A.2, our results also have implications for stochastic fluctuations of trajectory-level EP, as considered in stochastic thermodynamics seifert2012stochastic .

Consider any constrained driving protocol over t\in[0,1] with an associated operator \phi. Let \bm{x} indicate some stochastically sampled trajectory of the system visited during the driving protocol, and let \sigma_{p}(\bm{x}) indicate the fluctuating EP incurred by trajectory \bm{x} when initial states are sampled from the initial distribution p. In the appendix, we consider the difference between this fluctuating EP and the fluctuating EP incurred by the same trajectory when initial states are sampled from the accessible initial distribution \phi(p),

m_{p}(\bm{x}):=\sigma_{p}(\bm{x})-\sigma_{\phi(p)}(\bm{x}). (30)

By combining Theorem 2 with recent results in stochastic thermodynamics kolchinsky2021state; kwon2019fluctuation, we show that the expectation of m_{p}(\bm{x}) is equal to the difference of expected EPs, \langle m_{p}(\bm{x})\rangle=\Sigma(p\!\shortrightarrow\!p^{\prime})-\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})), where \langle\cdot\rangle indicates expectation over trajectories sampled from initial distribution p. We also show that m_{p}(\bm{x}) obeys a detailed fluctuation theorem, which implies a trajectory-level version of Eq. 19: the probability that the fluctuating EP under initial distribution p is \xi less than the fluctuating EP under the accessible initial distribution \phi(p) is exponentially small (i.e., it is less than e^{-\xi}). We leave further exploration of the connection between our framework and stochastic thermodynamics for future work.

IV Thermodynamics of information under protocol constraints

The framework introduced in the previous section has implications for the thermodynamics of information under constraints. Consider the type of feedback control setup described in the introduction: first an observation apparatus M measures some system observable, then the system undergoes a driving protocol that depends on the measurement outcome m. Let L^{(m)}(t) indicate the driving protocol conditioned on m, and p_{X|m} and p_{X^{\prime}|m}^{\prime} indicate the distributions over system states at the beginning and end of the corresponding driving protocol. As is standard in the literature parrondo2015thermodynamics, for simplicity we assume that all protocols start and end with the same energy functions, E and E^{\prime}, and that during the protocols the measurement apparatus M and the system X are energetically decoupled and M does not change state.

Given the above assumptions, it is straightforward to show that the EP incurred by the joint “supersystem” X\times M obeys

\Sigma_{XM}=\sum_{m}p(m)\Sigma_{m}, (31)

where \Sigma_{m} is the EP incurred by protocol L^{(m)}(t) in carrying out the transformation p_{X|m}\!\shortrightarrow\!p_{X^{\prime}|m}^{\prime}. Similarly, by taking expectations of Eq. 2 and rearranging (see derivation of Eq. 4), the average extracted work under feedback control can be written as

\langle W\rangle=\Delta F+[I(X;M)\!-\!I(X^{\prime};M)]/\beta-\frac{1}{\beta}\sum_{m}p(m)\Sigma_{m}, (32)

where for notational convenience we’ve used \Delta F=F_{E}(p)-F_{E^{\prime}}(p^{\prime}) to indicate the drop of marginal free energy. Thus, any lower bounds on \Sigma_{m} (the EP values incurred by the individual protocols L^{(m)}(t)) can be translated into bounds on the overall EP and average extractable work for a feedback control setup.

For example, suppose that there is some single set of constraints that applies to all of the driving protocols, in that there is some set of generators \Lambda such that L^{(m)}(t)\in\Lambda for all t and m, as well as an operator \phi that obeys the Pythagorean identity, Eq. 14, and the commutativity relation, Eq. 16, for all L\in\Lambda. In that case, the framework described in Section III leads to bounds on each \Sigma_{m} term. In particular, using Eqs. 18 and 31 gives the bound

\Sigma_{XM}\geq D(p_{X|M}\|\phi(p_{X|M}))-D(p_{X^{\prime}|M}^{\prime}\|\phi(p_{X^{\prime}|M}^{\prime}))\geq 0, (33)

where we’ve defined the conditional KL divergence D(p_{X|M}\|\phi(p_{X|M}))=\sum_{m}p(m)D(p_{X|m}\|\phi(p_{X|m})), and similarly for D(p_{X^{\prime}|M}^{\prime}\|\phi(p_{X^{\prime}|M}^{\prime})). Plugging into Eq. 32 gives the following bound on average extractable work:

\left\langle W\right\rangle\leq\Delta F+[I_{\mathrm{acc}}^{\phi}(X;M)-I_{\mathrm{acc}}^{\phi}(X^{\prime};M)]/\beta, (34)

where I_{\mathrm{acc}}^{\phi}(X;M) is given by

I_{\mathrm{acc}}^{\phi}(X;M)=I(X;M)-D(p_{X|M}\|\phi(p_{X|M})), (35)

and similarly for I_{\mathrm{acc}}^{\phi}(X^{\prime};M).

We refer to I_{\mathrm{acc}}^{\phi}(X;M) as the accessible information in measurement M, since any decrease in accessible information can contribute to work extraction (Eq. 34). We refer to the conditional KL divergence D(p_{X|M}\|\phi(p_{X|M})) as the inaccessible information, since any decrease in inaccessible information must be dissipated as EP, and not extracted as work (Eq. 33). The inaccessible information is non-negative by properties of KL divergence, so I_{\mathrm{acc}}^{\phi}(X;M)\leq I(X;M). In addition, whenever p\in\mathrm{img}\;\phi (e.g., when p is an equilibrium distribution, by Eq. 17), the accessible information can be rewritten in simpler form as

I_{\mathrm{acc}}^{\phi}(X;M)=D(\phi(p_{X|M})\|p), (36)

as follows from Eq. 35 by writing I(X;M)=D(p_{X|M}\|p) and applying the Pythagorean identity, Eq. 14.
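Spelled out: since p\in\mathrm{img}\;\phi, Eq. 14 with q=p gives, for each outcome m,

D(p_{X|m}\|p)=D(p_{X|m}\|\phi(p_{X|m}))+D(\phi(p_{X|m})\|p),

and averaging over p(m), then subtracting the inaccessible information as in Eq. 35, leaves I_{\mathrm{acc}}^{\phi}(X;M)=\sum_{m}p(m)D(\phi(p_{X|m})\|p)=D(\phi(p_{X|M})\|p).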

In general, measurements of different observables on the same system will give rise to different amounts of accessible and inaccessible information. At a high level, one should choose measurements that maximize the accessible information I_{\mathrm{acc}}^{\phi}(X;M), or alternatively the “efficiency” quantified as bits of accessible information per bit of measured information, I_{\mathrm{acc}}^{\phi}(X;M)/I(X;M)\leq 1. Optimal measurements satisfy I_{\mathrm{acc}}^{\phi}(X;M)=I(X;M), which happens when the conditional distributions over system states p_{X|m} are invariant under the action of \phi (i.e., when \phi(p_{X|m})=p_{X|m} for each m).

Note that similar results can also be derived using other kinds of bounds on \Sigma_{m} (e.g., when the individual protocols obey a combination of constraints, so that Eq. 28 holds).

V Symmetry constraints

We now use the general framework introduced above to derive bounds on EP under symmetry constraints.

Consider a compact group \mathcal{G} that has a measurable action over X, such that each {g}\in\mathcal{G} is a bijection X\to X (a compact group \mathcal{G} has a measurable action over X if the action \mathcal{G}\times X\to X is a measurable function, where we assume \mathcal{G} and X are endowed with their respective Borel algebras). For continuous X, we assume that each g\in\mathcal{G} is a rigid transformation. For notational convenience, for each g\in\mathcal{G} we define the composition operator \Phi_{g}, so that for any function f:X\to\mathbb{R},

\Phi_{g}(f)(x)=f(g(x)). (37)

We say that a set of generators \Lambda obeys symmetry constraints (with respect to the action of group \mathcal{G}) if the following commutativity relation holds for all L\in\Lambda:

\Phi_{g}L=L\Phi_{g}\qquad\forall{g}\in\mathcal{G}. (38)

In other words, \Lambda obeys symmetry constraints when, for each L\in\Lambda and {g}\in\mathcal{G}, it does not matter whether one first applies the generator L and then the bijection {g} over the state space, or first applies the bijection {g} over the state space and then the generator L. In more concrete terms, for a (continuous or discrete) master equation L, Eq. 38 holds if the transition rates are invariant under the action of \mathcal{G}:

L_{xx^{\prime}}=L_{{g}(x){g}(x^{\prime})}\qquad\forall x,x^{\prime}\in X,g\in\mathcal{G}. (39)

We can also derive simple sufficient conditions for potential-driven Fokker-Planck equations of the type

Lp=\nabla\cdot(\nabla E_{L})p+\beta^{-1}\Delta p, (40)

where E_{L} is the energy function of generator L. Then, Eq. 38 holds if all available energy functions are invariant under the action of \mathcal{G},

E_{L}(x)=E_{L}({g}(x))\quad\forall x\in X,g\in\mathcal{G},L\in\Lambda\,. (41)

(Eq. 38 is derived from Eqs. 39 and 41 in Appendix B).

We now define a linear operator \phi_{\mathcal{G}} which satisfies the Pythagorean identity and the commutativity relation, Eqs. 14 and 16, for symmetry constraints. Let \phi_{\mathcal{G}} map each p\in\mathcal{P} to its average under the action of \mathcal{G},

\phi_{\mathcal{G}}(p)(x):=\int_{\mathcal{G}}p(g(x))\,d\mu(g), (42)

where \mu is the uniform (normalized Haar) measure over \mathcal{G}. (Technically, the definition of the twirling operator in Eq. 42 applies only when p is a finite-valued probability density function, which excludes things such as the Dirac delta “function”. A more general formulation of our results can be developed in terms of probability measures rather than probability densities; see Ch. 3 in eaton_group_1989 for a version of Eq. 42 defined in terms of probability measures.) For a finite group, the integral in Eq. 42 should be replaced by a summation. Following the terminology in quantum physics, we sometimes refer to \phi_{\mathcal{G}} as a twirling operator vollbrecht2001entanglement; vaccaro_tradeoff_2008. Intuitively, \phi_{\mathcal{G}}(p) symmetrizes p, removing all information in p concerning the state of the system along the “coordinates” specified by the symmetry constraints.

In Appendix B, we show that \phi_{\mathcal{G}} obeys the Pythagorean identity and, as long as Eq. 38 holds, the commutativity relation of Eq. 16. Thus, any protocol that carries out the transformation p\!\shortrightarrow\!p^{\prime} while obeying symmetry constraints with respect to \mathcal{G} permits the decomposition of EP found in Theorem 2, with \phi=\phi_{\mathcal{G}}, and satisfies all the bounds on work and EP that follow from that result.

In particular, using Eq. 21, we can decompose the free energy F_{E}(p) of any distribution p into the accessible free energy F_{E}(\phi_{\mathcal{G}}(p)), which is the free energy in the twirled (and therefore symmetric) version of p, and the inaccessible free energy D(p\|\phi_{\mathcal{G}}(p))/\beta. Note that D(p\|\phi_{\mathcal{G}}(p)) is a non-negative measure of the asymmetry in distribution p with respect to the symmetry group \mathcal{G}, which vanishes when p is invariant under \phi_{\mathcal{G}}. Thus, for any protocol that obeys symmetry constraints, the first inequality in Eq. 18 states that any “drop in asymmetry” must be dissipated as EP, and not turned into work. The second inequality in Eq. 18 states that the asymmetry in the system’s distribution can only decrease during the protocol. (Some of the above results for symmetry constraints have been previously uncovered in quantum thermodynamics vaccaro_tradeoff_2008; janzing_quantum_2006; see Section VIII.)

We finish by discussing thermodynamics of information under symmetry constraints. In general, the results derived in Section IV apply to the twirling operator \phi_{\mathcal{G}} as a special case. We can also exploit special properties of \phi_{\mathcal{G}} to further simplify the expression of the inaccessible information term in Eqs. 35 and 33. Suppose that distribution p is invariant under \phi_{\mathcal{G}}, so p=\phi_{\mathcal{G}}(p) (e.g., if p is an equilibrium distribution). As shown in Section B.4, we can then rewrite the inaccessible information term as

D(p_{X|M}\|\phi_{\mathcal{G}}(p_{X|M}))=\Bigg\langle\!\ln\frac{q(m|x)}{\int_{\mathcal{G}}q(m|g(x))d\mu(g)}\!\Bigg\rangle, (43)

where q(m|x) is the measurement channel and \langle\cdot\rangle indicates expectation under the joint distribution p(x,m)=p(x)q(m|x). Eq. 43 conveniently expresses the inaccessible information in terms of the asymmetry of the measurement channel relative to the action of \mathcal{G} (the right side of Eq. 43 vanishes when q(m|x) is invariant under that action), which we will exploit in some of our examples below.
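As a preview of how Eq. 43 distinguishes measurements (a worked illustration using the reflection symmetry of the Szilard box treated in Section V.1): let \mathcal{G}=\{e,g\} with g(x_{1},x_{2})=(x_{1},-x_{2}) and let p=u be uniform. For an error-free measurement of the sign of the vertical coordinate, q(m|x)=1 when m=\mathrm{sgn}(x_{2}) and 0 otherwise, the symmetrized channel is \int_{\mathcal{G}}q(m|g(x))\,d\mu(g)=1/2 for every x, so

D(p_{X|M}\|\phi_{\mathcal{G}}(p_{X|M}))=\left\langle\ln\frac{1}{1/2}\right\rangle=\ln 2=I(X;M),

and the entire measured bit is inaccessible. For an error-free measurement of the sign of the horizontal coordinate, q(m|g(x))=q(m|x) for all g, so the right side of Eq. 43 vanishes and the entire bit is accessible.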

V.1 Example: Szilard box with symmetry constraints

Figure 4: A Szilard box with energy functions as in Eq. 44.

We demonstrate our results on symmetry constraints using the Szilard box shown in Fig. 1. We assume that the box is coupled to a single heat bath at inverse temperature β=1\beta=1, and that the particle inside the box has overdamped Fokker-Planck dynamics, so that all generators have the form of Eq. 40. The system’s state is represented by a horizontal and a vertical coordinate, x=(x1,x2)2x=(x_{1},x_{2})\in\mathbb{R}^{2}.

Suppose that all energy functions have the form

Eλ(x1,x2)=Vp(x1λ)+Vw(|x1|)+Vw(|x2|),E_{\lambda}(x_{1},x_{2})=V_{\mathrm{p}}(x_{1}-\lambda)+V_{\mathrm{w}}(|x_{1}|)+V_{\mathrm{w}}(|x_{2}|), (44)

where λ\lambda\in\mathbb{R} is a controllable parameter that determines the location of the vertical partition, VpV_{\mathrm{p}} is the partition’s repulsion potential, and VwV_{\mathrm{w}} is the repulsion potential of the box walls:

Vw(a)={0if a1otherwiseV_{\mathrm{w}}(a)=\begin{cases}0&\text{if }a\leq 1\\ \infty&\text{otherwise}\end{cases} (45)

meaning that the box extends over (x1,x2)[1,1]2(x_{1},x_{2})\in[-1,1]^{2}. (Technically, the wall potential as defined in Eq. 45 is non-differentiable; to be more accurate, one should imagine it in terms of the limit Vw(|x|)=limα|x|αV_{\mathrm{w}}(|x|)=\lim_{\alpha\to\infty}|x|^{\alpha} dhar2019run .) Assume that there is some value of λ\lambda for which Vp(x1λ)=0V_{\mathrm{p}}(x_{1}-\lambda)=0 for all x1x_{1} in the box (i.e., the partition can be removed by setting λ\lambda outside the box). For such λ\lambda, let EE^{\varnothing} indicate the corresponding energy function, and note that it obeys E(x1,x2)=0E^{\varnothing}(x_{1},x_{2})=0 within the box (and infinity elsewhere), corresponding to a uniform equilibrium distribution u(x1,x2)=𝟏[1,1]2(x1,x2)/4u(x_{1},x_{2})=\mathbf{1}_{[-1,1]^{2}}(x_{1},x_{2})/4 (where 𝟏\mathbf{1} is the indicator function). This Szilard box is shown schematically in Fig. 4.

The energy functions in Eq. 44 obey the vertical reflection symmetry E(x1,x2)=E(x1,x2)E(x_{1},x_{2})=E(x_{1},-x_{2}), corresponding to the two-element symmetric group S2S_{2} whose action is generated by g(x1,x2)=(x1,x2)g(x_{1},x_{2})=(x_{1},-x_{2}). The corresponding twirling of pp is the uniform mixture of pp and its reflection,

ϕ𝒢(p)(x1,x2)=(p(x1,x2)+p(x1,x2))/2.\displaystyle\phi_{\mathcal{G}}(p)(x_{1},x_{2})=(p(x_{1},x_{2})+p(x_{1},-x_{2}))/2. (46)

We can use our results to derive bounds on the work that can be extracted from this Szilard box. Intuitively, the set of allowed generators LL — that is, Fokker-Planck operators with energy functions as in Eq. 44, corresponding to different horizontal locations of the vertical partition — all obey vertical reflection symmetry. Thus, the dynamics generated by those Fokker-Planck operators commute with ϕ𝒢\phi_{\mathcal{G}}, the twirling operator defined in Eq. 46. Using Eq. 25, we can bound the work extracted during any transformation ppp\!\shortrightarrow\!p^{\prime} in terms of the decrease of the accessible free energy, FE(ϕ𝒢(p))FE(ϕ𝒢(p))F_{E}(\phi_{\mathcal{G}}(p))-F_{E^{\prime}}(\phi_{\mathcal{G}}(p^{\prime})).

In more detail, consider some driving protocol which starts and ends with the partition removed. At intermediate times, the driving protocol manipulates the location of the partition so as to bring the system from some initial distribution pp to a final equilibrium distribution p=up^{\prime}=u while extracting work. The second law gives bounds on EP, Σ(pp)0\Sigma(p\!\shortrightarrow\!p^{\prime})\geq 0, and work:

W(pu)\displaystyle W(p\!\shortrightarrow\!u) FE(p)FE(u)=D(pu),\displaystyle\leq F_{E^{\varnothing}}(p)-F_{E^{\varnothing}}(u)=D(p\|u), (47)

which follows from Eqs. 1 and 20. However, this bound can be too optimistic due to the protocol constraints. Given Eq. 18, as well as the fact that the final distribution obeys ϕ𝒢(u)=u\phi_{\mathcal{G}}(u)=u, we know that Σ(pp)D(pϕ𝒢(p))\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\phi_{\mathcal{G}}(p)). Similarly, Eq. 25 gives a tighter bound on extractable work

W(pu)FE(ϕ𝒢(p))FE(u)=D(ϕ𝒢(p)u),\displaystyle W(p\!\shortrightarrow\!u)\leq F_{E^{\varnothing}}(\phi_{\mathcal{G}}(p))-F_{E^{\varnothing}}(u)=D(\phi_{\mathcal{G}}(p)\|u), (48)

where the second equality follows from Eq. 20.

Figure 5: (a)(a) A non-equilibrium distribution pθp_{\theta} that is “rotated” by an arbitrary angle θ\theta, Eq. 49. (b)(b) The distribution in (a) under the action of the vertical reflection twirling operator, ϕ𝒢(pθ)\phi_{\mathcal{G}}(p_{\theta}).

It is easy to use these results to resolve the question raised in the introduction: can one show that work can only be extracted from a measurement of whether the particle is in the left or right half of the box, rather than a measurement of whether the particle is in the top or bottom half of the box? Suppose that the particle’s initial distribution pp is uniform across the left or right half of the box. Such a distribution pp is invariant under vertical reflection, so p=ϕ𝒢(p)p=\phi_{\mathcal{G}}(p) and Eq. 48 gives W(pu)D(pu)=ln2W(p\!\shortrightarrow\!u)\leq D(p\|u)=\ln 2, the same as the bound set by the second law, Eq. 47. This bound can be achieved by quickly moving the partition to the middle of the box and then slowly moving it toward the empty half (e.g., rightward if the particle is in the left half). Conversely, suppose that under the initial distribution pp, the particle is uniformly distributed across the top or bottom half of the box. The twirling of such a distribution is a uniform distribution over the box, ϕ𝒢(p)=u\phi_{\mathcal{G}}(p)=u. In this case, Eq. 48 gives W(pu)0W(p\!\shortrightarrow\!u)\leq 0, meaning that no work can be extracted.

We now demonstrate the power of our approach by analyzing extractable work given a more complex family of initial distributions (while using the same energy functions as above). Suppose that the initial distribution is concentrated within half the box, as determined by a separating line that is rotated by an arbitrary angle θ[π,π]\theta\in[-\pi,\pi] (see Fig. 5(a)). This initial distribution can be written formally as

pθ(x1,x2)\displaystyle p_{\theta}(x_{1},x_{2}) =𝟏[1,1]2(x1,x2)2Θ(x2sinθx1cosθ),\displaystyle=\frac{\mathbf{1}_{[-1,1]^{2}}(x_{1},x_{2})}{2}\Theta(x_{2}\sin\theta-x_{1}\cos\theta), (49)

where Θ\Theta is the Heaviside function. For instance, pθp_{\theta} for θ=0\theta=0 corresponds to the particle being in the left half of the box, while pθp_{\theta} for θ=π/2\theta=\pi/2 corresponds to the particle being in the top half of the box.

Because we are considering the same set of generators as above, we can bound the extractable work in a given pθp_{\theta} using the same twirling operator as defined above in Eq. 46. (For a sample pθp_{\theta}, the twirling ϕ𝒢(pθ)\phi_{\mathcal{G}}(p_{\theta}) is illustrated in Fig. 5(b).) Using Eq. 48, the extractable work obeys W(pθu)D(ϕ𝒢(pθ)u)W(p_{\theta}\!\shortrightarrow\!u)\leq D(\phi_{\mathcal{G}}(p_{\theta})\|u). Moreover, as we show in Section B.5, this KL divergence can be written in closed form as

D(ϕ𝒢(pθ)u)=ln2{12|tan(θπ2)||θ|(π4,3π4)112|tanθ|otherwise.\displaystyle D(\phi_{\mathcal{G}}(p_{\theta})\|u)\!=\!{\ln 2}\cdot\!\begin{cases}\frac{1}{2}|\tan(\theta-\frac{\pi}{2})|&\!\text{$|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4})$}\\ 1-\frac{1}{2}|\tan\theta|&\text{otherwise.}\end{cases} (50)

This result is plotted as a function of θ\theta in Fig. 6.
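
Eq. 50 can also be checked numerically by discretizing the box on a fine grid. The following sketch is purely illustrative (the grid resolution and function names are arbitrary choices): it constructs the rotated half-box distribution of Eq. 49, applies the reflection twirl of Eq. 46 by flipping the vertical axis, and evaluates the KL divergence to the uniform distribution.

import numpy as np

def work_bound(theta, n=2000):
    # Discretized estimate of D(phi_G(p_theta) || u) on the box [-1,1]^2, cf. Eq. 50.
    xs = (np.arange(n) + 0.5) / n * 2 - 1          # grid-cell centers in [-1, 1]
    x1, x2 = np.meshgrid(xs, xs, indexing="ij")
    cell = (2.0 / n) ** 2                          # area of one grid cell
    p = (x2 * np.sin(theta) - x1 * np.cos(theta) > 0).astype(float)
    p /= p.sum() * cell                            # normalize to a probability density
    phi_p = 0.5 * (p + p[:, ::-1])                 # reflection twirl x2 -> -x2, Eq. 46
    u = 0.25                                       # uniform density on the box
    mask = phi_p > 0
    return float(np.sum(phi_p[mask] * np.log(phi_p[mask] / u)) * cell)

# theta = 0 (left/right measurement), pi/4, pi/2 (top/bottom measurement):
for theta in (0.0, np.pi / 4, np.pi / 2):
    print(theta, work_bound(theta))    # approximately ln 2, (ln 2)/2, and 0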

Figure 6: Szilard box with symmetry constraints: the bound on extractable work as a function of θ\theta, Eq. 50.

We can also analyze the thermodynamics of information for different measurements of the Szilard box. Imagine that, starting from a uniform equilibrium distribution, one measures which side of the box contains the particle, as determined by a separating line at some arbitrary angle θ[π,π]\theta\in[-\pi,\pi]. For this measurement, the conditional distribution over system states pX|mp_{X|m} is equal to pθp_{\theta} half the time (as in Fig. 5(a)), and equal to pθ+πp_{\theta+\pi} the other half of the time. Then, for both measurement outcomes, one manipulates the vertical partition so as to drive the particle back to the equilibrium distribution p=up^{\prime}=u while extracting work. For simplicity, we assume that the initial and final energy functions are the same.

The general bound on average extractable work for feedback control, Eq. 4, gives

WI(X;M)=ln2,\displaystyle\langle W\rangle\leq I(X;M)=\ln 2, (51)

where we’ve used that p=pp=p^{\prime} and I(X;M)=0I(X^{\prime};M)=0. Our results provide a tighter bound, showing that the average extractable work is bounded by the accessible information in the measurement,

WIaccϕ𝒢(X;M)=D(ϕ𝒢(pθ)u)+D(ϕ𝒢(pθ+π)u)2,\displaystyle\!\langle W\rangle\!\leq\!I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)\!=\!\frac{D(\phi_{\mathcal{G}}(p_{\theta})\|u)\!\!+\!\!D(\phi_{\mathcal{G}}(p_{\theta+\pi})\|u)}{2}, (52)

where we used Eqs. 34 and 36. It can be verified from Eq. 50 that D(ϕ𝒢(pθ)u)=D(ϕ𝒢(pθ+π)u)D(\phi_{\mathcal{G}}(p_{\theta})\|u)=D(\phi_{\mathcal{G}}(p_{\theta+\pi})\|u). Thus, the accessible information for a given θ\theta is simply equal to D(ϕ𝒢(pθ)u)D(\phi_{\mathcal{G}}(p_{\theta})\|u), as given by the right side of Eq. 50 and plotted in Fig. 6. As expected, the accessible information achieves a maximum of ln2\ln 2 at θ=0\theta=0 (or θ=±π\theta=\pm\pi), which corresponds to a measurement of whether the particle is on the left or right side of the box. The accessible information falls nonlinearly (but continuously) to a minimum of 0 at θ=±π/2\theta=\pm\pi/2, which corresponds to a measurement of whether the particle is on the top or bottom of the box.

In the example above, the accessible information quantifies in a very literal way the “alignment” between the choice of measurement and the way the system can be manipulated. More generally, this example illustrates how our bounds on EP and work depend on the interplay between the operator ϕ\phi, the initial/final distributions pp and pp^{\prime}, and (for feedback control protocols) the choice of measurement MM. This interplay can give rise to highly non-trivial thermodynamic bounds, such as in Eqs. 50 and 6, even for very simple operators ϕ\phi, such as in Eq. 46.

Finally, we note that our analysis above only assumes that the energy functions are vertically symmetric, which includes many energy functions that do not have the form of the vertical partition defined in Eq. 44. Furthermore, while the bounds on work and EP which we derive here are achievable by some vertically symmetric energy functions, they are not necessarily achievable by manipulating the location of a vertical partition. For instance, achieving the extractable work bound for a given θ\theta, Eq. 50, generally requires that the corresponding twirled distribution ϕ𝒢(p)\phi_{\mathcal{G}}(p), such as the one shown in Fig. 5(b), is an equilibrium distribution for some available energy function.

We analyze the same system using a different set of constraints in Sections VI.1 and VII.1 below. (Also see still2021partially for a different recent analysis of the thermodynamics of the Szilard box with rotated measurements, though from the point of view of partial observability rather than protocol constraints.)

V.2 Example: Feedback control on the Ising model

Our bounds on symmetry constraints can be useful for various multi-particle systems with symmetries, such as gases of indistinguishable particles and spin systems with symmetries. We demonstrate this by analyzing the thermodynamics of feedback control on an Ising model. The reader may also be interested in Section B.6, where we analyze a simpler and more pedagogical example of a discrete-state system with symmetry constraints.

Consider a 2D Ising model on a square lattice on a torus, containing a total of N2=N×NN^{2}=N\times N spins. The state of the lattice is indicated as x(x1,,xN2)x\equiv(x_{1},\dots,x_{N^{2}}), where xi{1,1}x_{i}\in\{-1,1\} is the state of the spin at location ii. We assume that the energy functions have the following form,

E(x)=J(i,j)𝒩xixjHixi.\displaystyle E(x)=-J\sum_{\mathclap{(i,j)\in\mathcal{N}}}x_{i}x_{j}-H\sum_{i}x_{i}. (53)

where 𝒩\mathcal{N} is the set of all nearest neighbors on the lattice, JJ is the coupling strength, and HH is the external magnetic field.

Energy functions like these are invariant under the symmetry group 𝒢\mathcal{G} corresponding to horizontal and vertical translations of the lattice (for simplicity, we ignore other symmetries of the lattice, such as reflections and rotations). The action of this group is given by a set of N2N^{2} bijections ga,b:XXg_{a,b}:X\to X for a,b{0,,N1}a,b\in\{0,\dots,N-1\}, where ga,b(x)g_{a,b}(x) translates the lattice state xx to the right by aa spins and upward by bb spins (with periodic boundary conditions). We assume that the system evolves according to Glauber dynamics krapivsky_kinetic_2010 , or some other dynamics that respects the translational symmetry of the 2D lattice, such that Eq. 39 is satisfied.

Given these assumptions, we can derive thermodynamic bounds for the 2D Ising model in terms of the following twirling operator,

ϕ𝒢(p)(x)=N2a=0N1b=0N1p(ga,b(x)).\displaystyle\phi_{\mathcal{G}}(p)(x)=N^{-2}\sum_{a=0}^{N-1}\sum_{b=0}^{N-1}p(g_{a,b}(x)). (54)

We use this twirling operator to analyze the thermodynamics of the following feedback-control setup on the Ising model, also shown in Fig. 7. The lattice is initially in equilibrium pp at some inverse temperature β\beta and J=1,H=0J=1,H=0 (no external field). The state of the spin at location 11 is then measured under the measurement channel q(m|x)=δm(x1)q(m|x)=\delta_{m}(x_{1}), where δ\delta is the Kronecker delta. Since there is no initial external field, the two outcomes m{1,1}m\in\{-1,1\} have equal probability and I(X;M)=ln2I(X;M)=\ln 2. The measured outcome is then used to select a driving protocol, which extracts work from the system by manipulating the control parameters JJ and HH. At the end of the protocol corresponding to each outcome, the system is brought back to the original equilibrium (so pX|m=pp_{X^{\prime}|m}^{\prime}=p for all mm). For simplicity, we assume that the initial and final energy functions are the same.

Figure 7: Thermodynamics of information on a 2D Ising model. Left: a measurement MM is made of the state of a single spin (green), and then used to drive the system while extracting work (blue). Right: the accessible information Iaccϕ𝒢(X;M)I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M) increases with inverse temperature after the critical value βc0.44\beta_{c}\approx 0.44 (grey circles from Monte Carlo simulations, black line from closed-form expression, Eq. 56). Inset shows the bound on extractable work, Iaccϕ𝒢(X;M)/βI_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)/\beta, which peaks at β0.547\beta\approx 0.547 (red cross).

Under this setup, one can verify that Iaccϕ𝒢(X;M)=0I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X^{\prime};M)=0 and FE(p)=FE(p)F_{E}(p)=F_{E^{\prime}}(p^{\prime}), so Eq. 34 bounds average extractable work as WIaccϕ𝒢(X;M)/β\langle W\rangle\leq I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)/\beta, where Iaccϕ𝒢(X;M)I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M) is the accessible information from Eq. 35. Using Eqs. 35 and 43, we can write this accessible information as

Iaccϕ𝒢(X;M)=ln2lnq(m|x)N2a,bq(m|ga,b(x)),\displaystyle I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)=\ln 2-\Bigg{\langle}\ln\frac{q(m|x)}{N^{-2}\sum_{a,b}q(m|g_{a,b}(x))}\Bigg{\rangle}, (55)

where \langle\cdot\rangle indicates expectation over the joint distribution p(x)q(m|x)p(x)q(m|x), in which p(x)p(x) is the initial equilibrium distribution at inverse temperature β\beta and J=1,H=0J=1,H=0. We emphasize that the accessible information depends on β\beta (though we leave this dependence implicit in the notation).
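
As a small-scale illustration of Eq. 55 (not a substitute for the Monte Carlo estimates or the closed form discussed next), the following sketch evaluates Eq. 55 exactly on a tiny lattice by enumerating all spin configurations; the lattice size is chosen purely for tractability. For the single-spin channel, the translation-averaged channel in the denominator of Eq. 55 equals the fraction of spins in the configuration that agree with the measured spin.

import numpy as np
from itertools import product

def accessible_info_exact(L, beta, J=1.0):
    # Exact evaluation of Eq. 55 on a small L x L periodic lattice by brute-force
    # enumeration of all 2^(L*L) spin configurations.  For q(m|x) = delta_m(x_1),
    # the translation-averaged channel is the fraction of spins agreeing with spin 1.
    states = np.array(list(product([-1, 1], repeat=L * L)))
    grid = states.reshape(-1, L, L)
    E = -J * ((grid * np.roll(grid, 1, axis=1)).sum(axis=(1, 2))
              + (grid * np.roll(grid, 1, axis=2)).sum(axis=(1, 2)))
    p = np.exp(-beta * E)
    p /= p.sum()
    frac_agree = (states == states[:, [0]]).mean(axis=1)
    return np.log(2) - np.sum(p * (-np.log(frac_agree)))

for beta in (0.2, 0.44, 0.8):
    print(beta, accessible_info_exact(L=4, beta=beta))

On such a small lattice the numerical values differ from the thermodynamic-limit expression given below, but the qualitative increase of the accessible information with β is already visible.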

In general, one can estimate the accessible information in Eq. 55 using various numerical techniques (e.g., by sampling from the initial equilibrium distribution using Monte Carlo methods). It is also possible to use Onsager’s well-known solution of the 2D Ising model to calculate the accessible information in closed form. In particular, in Section B.7 we show that in the thermodynamic limit NN\to\infty,

Iaccϕ𝒢(X;M)={0for ββcln2h2(1+1(sinh2β)482)for β>βc.\displaystyle I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)=\begin{cases}0&\text{for $\beta\leq\beta_{c}$}\\ \ln 2-h_{2}\Big{(}\frac{1+\sqrt[8]{1-(\sinh 2\beta)^{-4}}}{2}\Big{)}&\text{for $\beta>\beta_{c}$.}\end{cases} (56)

where h2(x)=xlnx(1x)ln(1x)h_{2}(x)=-x\ln x-(1-x)\ln(1-x) is the binary entropy function and βc=ln(1+2)/20.44\beta_{c}=\ln(1+\sqrt{2})/2\approx 0.44 is the critical inverse temperature of the 2D Ising model. This result is verified in Fig. 7, where we compare Eq. 56 with a Monte Carlo estimate of Eq. 55 on a 100×100 lattice. It can be seen that in the high temperature (low β\beta) regime, the accessible information vanishes. In the low temperature (high β\beta) regime, the amount of accessible information increases, approaching ln2\ln 2 as β\beta\to\infty.

We also plot the bound on average extractable work, WIaccϕ𝒢(X;M)/β\langle W\rangle\leq I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)/\beta, in the inset in Fig. 7. This bound is the ratio of two terms: the accessible information Iaccϕ𝒢(X;M)I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M) and the inverse temperature β\beta, both of which are increasing in β\beta. In fact, it can be seen from Fig. 7 that the bound on extractable work peaks at a finite value of β\beta, the optimal inverse temperature for work extraction. Using Eq. 56 and numerical techniques, we find this optimal value to be β0.547\beta\approx 0.547, leading to the bound W1.06\langle W\rangle\leq 1.06 (in units of the coupling constant JJ).
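
These numbers can be reproduced directly from Eq. 56; the following minimal sketch (illustrative only, with arbitrary search bounds) evaluates the closed-form accessible information and numerically locates the inverse temperature that maximizes the work bound.

import numpy as np
from scipy.optimize import minimize_scalar

BETA_C = np.log(1 + np.sqrt(2)) / 2            # critical inverse temperature, ~0.44

def binary_entropy(x):
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

def accessible_info(beta):
    # Closed-form accessible information of Eq. 56 (thermodynamic limit).
    if beta <= BETA_C:
        return 0.0
    m = (1 - np.sinh(2 * beta) ** -4) ** (1 / 8)    # Onsager spontaneous magnetization
    return np.log(2) - binary_entropy((1 + m) / 2)

# The work bound is I_acc / beta; maximize it over beta > beta_c.
res = minimize_scalar(lambda b: -accessible_info(b) / b,
                      bounds=(BETA_C + 1e-6, 3.0), method="bounded")
print(res.x, -res.fun)    # ~0.547 and ~1.06 (in units of the coupling J)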

The above analysis shows that the amount of accessible information provided by a given measurement can depend on the structure of correlations in the system, and therefore vary dramatically as the system undergoes a phase transition. At a high level, any driving protocol that is restricted to energy functions like Eq. 53 can only extract work from “global” (i.e., translationally invariant) information. If the measurement acquires such information (e.g., if it directly measures the spatially-averaged magnetization), then in principle all of the acquired information may be extractable as work. Measurement of the state of a single spin, however, in general provides only local information. The temperature dependence observed in Eq. 56 and Fig. 7 arises from the presence of long-range order in the magnetized regime (β>βc\beta>\beta_{c}). In this regime, the state of each spin is highly correlated with the magnetization of the entire lattice, so local and global information are equivalent. In the high temperature regime (β<βc\beta<\beta_{c}), the state of a single spin is not correlated with any kind of global information, and so most of the measured information is inaccessible.

For a different kind of analysis of the thermodynamics of a 1D Ising model under constraints, see lekscha_quantum_2018 .

VI Modularity constraints

Many systems of interest exhibit modular organization, meaning that their degrees of freedom can be grouped into decoupled subsystems. Examples of modular systems include computational devices such as digital circuits (gershenfeld1996signal, ; Boyd:2018aa, ; wolpert2020thermodynamic, ), regulatory networks in biology (schlosser2004modularity, ), and brain networks (sporns2016modular, ).

We use our framework to derive bounds on work and EP for modular systems. We begin by introducing some terminology and notation. Consider a system whose degrees of freedom are indexed by the set VV, such that the overall state space is the Cartesian product X=vVXvX=\prod_{v\in V}X_{v}, where XvX_{v} is the state space of degree of freedom vv. We use the term subsystem to refer to any subset of the degrees of freedom, AVA\subseteq V. We use XAX_{A} to indicate the random variable representing the state of subsystem AA and xAx_{A} to indicate an actual state of AA. Given some distribution pp over the entire system, we use pAp_{A} to indicate a marginal distribution over subsystem AA, and [Lp]A[Lp]_{A} to indicate the derivative of the marginal distribution of subsystem AA under the generator LL.

We use the term modular decomposition to refer to a set of subsystems 𝒞\mathcal{C}, such that each vVv\in V belongs to at least one subsystem A𝒞A\in\mathcal{C}. Note that some of the degrees of freedom vVv\in V can belong to more than one subsystem in 𝒞\mathcal{C}. We use

O(𝒞)=A,B𝒞:AB(AB)\displaystyle O(\mathcal{C})=\bigcup_{A,B\in\mathcal{C}:A\neq B}(A\cap B) (57)

to indicate those degrees of freedom that belong to more than one subsystem in 𝒞\mathcal{C}, which we refer to as the overlap. We will often write OO instead of O(𝒞)O(\mathcal{C}) for notational simplicity.

We say that the available driving protocols obey modularity constraints (with respect to the modular decomposition 𝒞\mathcal{C}) if each generator LΛL\in\Lambda can be written as a sum of generators of the different subsystems in 𝒞\mathcal{C},

L=A𝒞L(A),\displaystyle L=\sum_{A\in\mathcal{C}}L^{(A)}, (58)

and each L(A)L^{(A)} obeys two properties: the dynamics over the marginal distribution pAp_{A} are closed under L(A)L^{(A)} (depend only on the marginal distribution over AA),

pA=qA[L(A)p]A=[L(A)q]Ap,q𝒫,p_{A}=q_{A}\implies[L^{(A)}p]_{A}=[L^{(A)}q]_{A}\qquad\forall p,q\in\mathcal{P}, (59)

and the distribution over other subsystems besides AA does not change under L(A)L^{(A)},

[L(A)p]B=0p𝒫,B𝒞{A}.[L^{(A)}p]_{B}=0\qquad\quad\forall p\in\mathcal{P},B\in\mathcal{C}\setminus\{A\}. (60)

In other words, we require that each subsystem evolves independently, and does not affect the other subsystems.

The role of the degrees of freedom in the overlap is somewhat subtle. It can be verified that Eq. 60 implies that the degrees of freedom in the overlap cannot change state when evolving under LL. Importantly, however, the overlap may influence the dynamics of those degrees of freedom that can change state. For example, consider an inclusive model of a feedback control setup: there are two nested subsystems, 𝒞={A,B}\mathcal{C}=\{A,B\} with BAB\subseteq A, and the degrees of freedom in O=BO=B (the controller) cannot change state but can influence the evolution of ABA\setminus B. More elaborate feedback control setups, in which the same controller can control multiple subsystems, can be modeled using decompositions with multiple non-nested subsystems. Other examples of modular decompositions with overlap include circuits wolpert2020thermodynamic , spin systems where some spins are pinned by local magnetic fields, and many-particle systems where some particles have no mobility.

We can also provide more concrete conditions under which Eqs. 59 and 60 hold for discrete-state master equations and Fokker-Planck equations. For discrete-state master equations, it can be verified by inspection that Eqs. 59 and 60 hold when all LΛL\in\Lambda can be written in the form

Lxx=A𝒞RxA,xA(A)δxVA(xVA),\displaystyle L_{x^{\prime}x}=\sum_{A\in\mathcal{C}}R^{(A)}_{x_{A}^{\prime},x_{A}}\delta_{x_{V\setminus A}}(x_{V\setminus A}^{\prime}), (61)

where δ\delta is the Kronecker delta and R(A)R^{(A)} is some rate matrix over subsystem AA that does not allow the degrees of freedom in the overlap to change state (RxA,xA(A)=0R^{(A)}_{x_{A}^{\prime},x_{A}}=0 if xAOxAOx_{A\cap O}\neq x_{A\cap O}^{\prime}).

For Fokker-Planck equations, for simplicity consider overdamped dynamics of the form

Lp=vVγvLxv[(xvEL)p+β1xvp],Lp=\sum_{v\in V}\gamma_{v}^{L}\partial_{x_{v}}\Big{[}(\partial_{x_{v}}E_{L})p+\beta^{-1}\partial_{x_{v}}p\Big{]}, (62)

where γvL\gamma_{v}^{L} is the mobility coefficient along dimension vv and ELE_{L} is the potential energy function associated with generator LL. Such equations can represent potential-driven Brownian particles coupled to a heat bath, where the different mobility coefficients represent different particle masses or sizes. (One can also apply the results in this section to Fokker-Planck equations that can be put in the form of Eq. 62 via an appropriate change of variables, see (risken1996fokker, , Sec. 4.9).) Now imagine that for all LΛL\in\Lambda, the energy functions are additive over the subsystems, and that the degrees of freedom in the overlap have no mobility:

EL(x)=A𝒞EL(A)(xA),γvL=0vO.\displaystyle E_{L}(x)=\sum_{A\in\mathcal{C}}E^{(A)}_{L}(x_{A}),\quad\;\;\;\gamma_{v}^{L}=0\;\;\;\forall v\in O. (63)

In that case, Eq. 62 can be rewritten in the form of Eq. 58, with L(A)p=vAOγvLxv[(xvEL(A))pA+β1xvpA]L^{(A)}p=\sum_{v\in A\setminus O}\gamma_{v}^{L}\partial_{x_{v}}[(\partial_{x_{v}}E^{(A)}_{L})p_{A}+\beta^{-1}\partial_{x_{v}}p_{A}], and satisfies Eqs. 59 and 60.

We now define the following nonlinear operator ϕ𝒞\phi_{{\mathcal{C}}}:

ϕ𝒞(p)=pOA𝒞pAO|AO.\displaystyle\phi_{{\mathcal{C}}}(p)=p_{O}\prod_{{A\in\mathcal{C}}}p_{A\setminus O|A\cap O}. (64)

This operator preserves the statistical correlations within each subsystem A𝒞A\in\mathcal{C}, as well as within the overlap OO, while destroying all other statistical correlations. As a simple example, if all the subsystems in 𝒞\mathcal{C} are non-overlapping, then ϕ𝒞(p)\phi_{{\mathcal{C}}}(p) has the product form ϕ𝒞(p)=A𝒞pA\phi_{{\mathcal{C}}}(p)=\prod_{A\in\mathcal{C}}p_{A}. In Appendix C, we show that ϕ𝒞\phi_{{\mathcal{C}}} obeys the Pythagorean identity, Eq. 14. We also show that if some generator L(t)L(t) obeys Eqs. 59 and 60, then eτL(t)e^{\tau L(t)} commutes with ϕ𝒞\phi_{{\mathcal{C}}}, so Eq. 16 holds.

This means that for any protocol that carries out the transformation ppp\!\shortrightarrow\!p^{\prime} while obeying modularity constraints, the decompositions and bounds for EP and work derived in Section III are satisfied for ϕ=ϕ𝒞\phi=\phi_{{\mathcal{C}}}. In particular, using Eq. 21, we can decompose the free energy FE(p)F_{E}(p) of any distribution pp into the accessible free energy FE(ϕ𝒞(p))F_{E}(\phi_{{\mathcal{C}}}(p)) and the inaccessible free energy D(pϕ𝒞(p))/βD(p\|\phi_{{\mathcal{C}}}(p))/\beta. Note that D(pϕ𝒞(p))D(p\|\phi_{{\mathcal{C}}}(p)) is a non-negative measure of the amount of statistical correlations between the subsystems of 𝒞\mathcal{C} under distribution pp, which vanishes when the subsystems are conditionally independent of one another given the overlap OO. Thus, for a protocol that obeys modularity constraints, Eq. 18 states that the drop in those statistical correlations is a lower bound on EP, and that the amount of statistical correlation between the subsystems of 𝒞\mathcal{C} cannot increase over the course of the protocol. (There is a fair amount of closely related prior work; see Section VIII.)

A particularly simple application of our bounds occurs when 𝒞\mathcal{C} contains two (possibly overlapping) subsystems, 𝒞={A,B}\mathcal{C}=\{A,B\}. In that case, the bounds in Eq. 18 can be rewritten in terms of the drop of a conditional mutual information between the two subsystems,

Σ(pp)I(XA;XB|XAB)I(XA;XB|XAB)0.\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq I(X_{A};X_{B}|X_{A\cap B})-I(X_{A}^{\prime};X_{B}^{\prime}|X_{A\cap B}^{\prime})\geq 0. (65)

If the subsystems do not overlap, this can be further rewritten as the drop of the regular mutual information,

Σ(pp)I(XA;XB)I(XA;XB)0.\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq I(X_{A};X_{B})-I(X_{A}^{\prime};X_{B}^{\prime})\geq 0. (66)

More generally, if 𝒞\mathcal{C} contains an arbitrary number of non-overlapping subsystems, the EP can be bounded as

Σ(pp)(p)(p)0,\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq\mathcal{I}(p)-\mathcal{I}(p^{\prime})\geq 0, (67)

where (p)=(A𝒞S(pA))S(p)\mathcal{I}(p)=\big{(}\sum_{A\in\mathcal{C}}S(p_{A})\big{)}-S(p) is the multi-information in distribution pp with respect to partition 𝒞\mathcal{C}. (The multi-information is a well-known generalization of mutual information, which is also sometimes called “total correlation” (watanabe1960information, ).)
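
To make the operator of Eq. 64 concrete, the following illustrative sketch (the three-variable example and the variable names are arbitrary choices) constructs it for two overlapping subsystems and verifies numerically that the resulting KL divergence equals the conditional mutual information appearing in Eq. 65.

import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                        # joint distribution over (x1, x2, x3)

# Decomposition C = {A, B} with A = {1,2}, B = {2,3}, so the overlap is O = {2}.
p2 = p.sum(axis=(0, 2))             # p(x2)
p12 = p.sum(axis=2)                 # p(x1, x2)
p23 = p.sum(axis=0)                 # p(x2, x3)

# phi_C(p)(x) = p(x2) p(x1|x2) p(x3|x2), cf. Eq. 64
phi = (p2[None, :, None]
       * (p12 / p2[None, :])[:, :, None]
       * (p23 / p2[:, None])[None, :, :])

kl = np.sum(p * np.log(p / phi))    # D(p || phi_C(p))

# Conditional mutual information I(X1; X3 | X2), the quantity in Eq. 65
cmi = np.sum(p * np.log(p * p2[None, :, None] / (p12[:, :, None] * p23[None, :, :])))
print(kl, cmi)                      # the two values agree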

We finish by discussing thermodynamics of information under modularity constraints. In general, the results derived in Section IV apply to modularity constraints as a special case. However, we can also exploit special properties of the operator ϕ𝒞\phi_{{\mathcal{C}}} to further simplify the expression of accessible information. Suppose that the distribution pp is invariant under ϕ𝒞\phi_{{\mathcal{C}}}, so p=ϕ𝒞(p)p=\phi_{{\mathcal{C}}}(p) (e.g., if pp is an equilibrium distribution, see Eq. 17). Using Eq. 64, we can then rewrite Eq. 36 as

Iaccϕ𝒞(X;M)\displaystyle I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) =I(XO;M)+A𝒞I(XA;M|XAO).\displaystyle=I(X_{O};M)+\sum_{\mathclap{A\in\mathcal{C}}}I(X_{A};M|X_{A\cap O}). (68)

Thus, the accessible information in measurement MM is the information that MM provides about the overlap, plus the conditional mutual information between each subsystem and MM given the relevant part of the overlap. This means that only information about individual subsystems — not about inter-subsystem correlations — can be turned into work. If there is no overlap, Eq. 68 can be further simplified as

Iaccϕ𝒞(X;M)\displaystyle I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) =A𝒞I(XA;M).\displaystyle=\sum_{\mathclap{A\in\mathcal{C}}}I(X_{A};M). (69)

We will use these expressions in some of our examples below.

VI.1 Example: Szilard box with modularity constraints

We illustrate our results for modularity constraints on a Szilard box. In doing so, we will demonstrate two important concepts: first, how the same set of generators Λ\Lambda can be analyzed under different constraints, resulting in different bounds on work and EP (compare this section to Section V.1); second, how bounds arising from multiple constraints can be stacked on top of each other in an iterative manner, as in Eq. 28 (we will combine bounds from modularity and symmetry constraints).

We consider the same setup as in Section V.1: there is a single overdamped particle in a box coupled to a bath at inverse temperature β=1\beta=1, which evolves under potential energy functions as in Eq. 44. This system is driven from some initial distribution pp to a final uniform equilibrium distribution, p=up^{\prime}=u while extracting work.

Figure 8: (a)(a) Given a “rotated” distribution pθp_{\theta}, as shown above in Fig. 5(a), this shows the decorrelated distribution ϕ𝒞(pθ)\phi_{{\mathcal{C}}}(p_{\theta}), as in Eq. 70. (b)(b) The decorrelated and twirled distribution, ϕ𝒢(ϕ𝒞(pθ))\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta})).
Figure 9: Bounds on extractable work as a function of θ\theta, as derived from only modularity constraints (in green, Eq. 74), a combination of modularity+symmetry constraints (in orange, Eq. 76), and only symmetry constraints (in blue, Eq. 50).

Note that the energy functions in Eq. 44 have no interaction terms between x1x_{1} (the horizontal position of the particle) and x2x_{2} (the vertical position of the particle). That means that the allowed driving protocols obey modularity constraints for a decomposition of the system into two subsystems, 𝒞={{X1},{X2}}\mathcal{C}=\{\{X_{1}\},\{X_{2}\}\} (since Eq. 63 is satisfied for the decomposition). This allows us to analyze EP and work using an operator ϕ𝒞\phi_{{\mathcal{C}}} which maps each joint distribution over X1×X2X_{1}\times X_{2} into a product distribution,

ϕ𝒞(p)(x1,x2)=p(x1)p(x2).\displaystyle\phi_{{\mathcal{C}}}(p)(x_{1},x_{2})=p(x_{1})p(x_{2}). (70)

In particular, using the same derivation as in Eq. 48, we can bound the extractable work in terms of the accessible free energy in pp,

W(pu)D(ϕ𝒞(p)u).\displaystyle W(p\!\shortrightarrow\!u)\leq D(\phi_{{\mathcal{C}}}(p)\|u). (71)

As discussed in Section V.1, this system also obeys symmetry constraints, corresponding to the vertical reflection twirling operator ϕ𝒢\phi_{\mathcal{G}} defined in Eq. 46. We can use Eq. 29 to bound the extractable work using a combination of ϕ𝒞\phi_{{\mathcal{C}}} and ϕ𝒢\phi_{\mathcal{G}},

W(pu)\displaystyle W(p\!\shortrightarrow\!u) D(ϕ𝒞(ϕ𝒢(p))u)\displaystyle\leq D(\phi_{{\mathcal{C}}}(\phi_{\mathcal{G}}(p))\|u) (72)
W(pu)\displaystyle W(p\!\shortrightarrow\!u) D(ϕ𝒢(ϕ𝒞(p))u).\displaystyle\leq D(\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p))\|u). (73)

For concreteness, imagine that the initial distribution pp is concentrated within half the box, as determined by a separating line rotated by some arbitrary angle θ[π,π]\theta\in[-\pi,\pi], so p=pθp=p_{\theta} from Eq. 49 (see Fig. 5(a) for an illustration).

We consider the extractable work bound in Eq. 71 for the initial distribution pθp_{\theta}. For a given pθp_{\theta}, the corresponding decorrelated initial distribution ϕ𝒞(pθ)\phi_{{\mathcal{C}}}(p_{\theta}) is illustrated in Fig. 8(a). Then, the accessible free energy in Eq. 71 can be expressed in closed form as (see Section C.3),

D(ϕ𝒞(pθ)u)=ln412[min{|tanθ|,|tan(π/2θ)|}+f(max{|tanθ|,|tan(π/2θ)|})],D(\phi_{{\mathcal{C}}}(p_{\theta})\|u)=\ln 4-\frac{1}{2}\Big{[}\min\{|\tan\theta|,|\tan(\pi/2-\theta)|\}\\ +f(\max\{|\tan\theta|,|\tan(\pi/2-\theta)|\})\Big{]}, (74)

where for notational convenience we’ve defined

f(x)=11+x22xlnx+1x1lnx214x2.\displaystyle f(x)=1-\frac{1+x^{2}}{2x}\ln\frac{x+1}{x-1}-\ln\frac{x^{2}-1}{4x^{2}}. (75)

Eq. 74 is plotted in Fig. 9 in green. Note that this function peaks both at θ{π,0,π}\theta\in\{-\pi,0,\pi\} (i.e., when the particle is in the left or right half of the box) as well as θ{π/2,π/2}\theta\in\{-\pi/2,\pi/2\} (i.e., when the particle is in the top or bottom half of the box) — precisely those θ\theta for which pθp_{\theta} has no correlations between the horizontal and vertical position of the particle.

Next, we consider the extractable work bound in Eq. 72 for the initial distribution pθp_{\theta}. It can be verified that ϕ𝒢(ϕ𝒞(pθ))(x1,x2)=pθ(x1)u(x2)\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))(x_{1},x_{2})=p_{\theta}(x_{1})u(x_{2}), which is illustrated in Fig. 8(b). The right hand side of Eq. 72 can again be expressed in closed form as (see Section C.3)

D(ϕ𝒢(ϕ𝒞(pθ))u)=ln212{f(|tanθ|)if |θ|(π4,3π4)|tanθ|otherwise\displaystyle D(\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))\|u)=\ln 2-\frac{1}{2}\begin{cases}f(|\tan\theta|)&\text{if $|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4})$}\\ |\tan\theta|&\text{otherwise}\end{cases} (76)

with ff defined as in Eq. 75. This result is shown in Fig. 9 in orange. Note also that ϕ𝒢(ϕ𝒞(pθ))=ϕ𝒞(ϕ𝒢(pθ))\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))=\phi_{{\mathcal{C}}}(\phi_{\mathcal{G}}(p_{\theta})) for all pθp_{\theta}, so the bounds in Eqs. 72 and 73 are equivalent.

For comparison we also plot the extractable work bound derived using symmetry constraints, Eq. 50 (Fig. 9 in blue). It is clear that the bound derived by exploiting a combination of modularity and symmetry constraints (in orange) is strictly tighter than the bounds derived by using either only modularity (green) or only symmetry constraints (blue) individually.
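
The closed-form expressions in Eqs. 74 and 76 can be checked by the same kind of grid discretization used for Eq. 50 above; the following sketch (illustrative, with an arbitrary grid resolution) computes both bounds for a given angle.

import numpy as np

def modularity_bounds(theta, n=1000):
    # Discretized estimates of D(phi_C(p_theta)||u) (Eq. 74) and
    # D(phi_G(phi_C(p_theta))||u) (Eq. 76) on the box [-1,1]^2.
    xs = (np.arange(n) + 0.5) / n * 2 - 1
    x1, x2 = np.meshgrid(xs, xs, indexing="ij")
    cell = (2.0 / n) ** 2
    p = (x2 * np.sin(theta) - x1 * np.cos(theta) > 0).astype(float)
    p /= p.sum() * cell
    m1 = p.sum(axis=1) * (2.0 / n)                   # marginal density of x1
    m2 = p.sum(axis=0) * (2.0 / n)                   # marginal density of x2
    phi_C_p = np.outer(m1, m2)                       # product of marginals, Eq. 70
    phi_GC_p = 0.5 * (phi_C_p + phi_C_p[:, ::-1])    # then reflect x2, Eq. 46

    def kl_to_uniform(q):
        mask = q > 0
        return float(np.sum(q[mask] * np.log(q[mask] / 0.25)) * cell)

    return kl_to_uniform(phi_C_p), kl_to_uniform(phi_GC_p)

print(modularity_bounds(np.pi / 3))    # compare against Eqs. 74 and 76 at theta = pi/3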

One can also use the bounds derived in this section to analyze the accessible information in a measurement of the Szilard box. Imagine that, starting from a uniform equilibrium distribution, one measures which side of the box contains the particle, as determined by a separating line at some arbitrary angle θ[π,π]\theta\in[-\pi,\pi]. For this measurement, the conditional distribution over system states pX|mp_{X|m} is equal to pθp_{\theta} half the time and equal to pθ+πp_{\theta+\pi} the other half the time. One can then derive bounds on accessible information such as Eq. 52, while using the bounds derived in this section (Eqs. 71, 72 and 73).

VI.2 Example: Generalized Szilard box

Figure 10: A generalized Szilard box with multiple particles song2021optimal .

Our results on modularity constraints can be useful for analyzing the thermodynamics of multi-particle systems. As an example, consider the “generalized Szilard box” feedback-control scenario analyzed in song2021optimal . Here, a box containing an ideal gas of NN particles, which are indexed by vVv\in V, begins in uniform equilibrium with a heat bath at inverse temperature β\beta. Several partitions are inserted into the box, separating it into several volumes, and a measurement MM is made of the number of particles in each volume (see the illustration in Fig. 10). The box is then separated from the bath and, depending on the outcome of the measurement, the partitions are moved so as to equalize the pressures in the different volumes while extracting work. To make the process repeatable, suppose that at the end of the protocol, the partitions are removed and the box is again equilibrated with the bath (note that this last step does not contribute to extracted work).

The ideal gas assumption means that the particles do not interact, so by Eqs. 59 and 60 the protocol obeys modularity constraints with respect to a decomposition in which each particle is a separate subsystem. The corresponding operator ϕ𝒞\phi_{{\mathcal{C}}} is given by

ϕ𝒞(p)(x)=v=1Np(xv).\displaystyle\phi_{{\mathcal{C}}}(p)(x)=\prod_{v=1}^{N}p(x_{v}). (77)

Given Eq. 34, the average extractable work for the above feedback-control scenario is bounded by WIaccϕ𝒞(X;M)/β\langle W\rangle\leq I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/\beta, which can also be written in terms of the information provided by the measurement MM about each individual particle,

Wv=1NI(Xv;M)/β,\displaystyle\langle W\rangle\leq\sum_{v=1}^{N}I(X_{v};M)/\beta, (78)

as follows from Eq. 69. In fact, by symmetry of the initial distribution, the measurement provides the same information about each particle, I(Xv;M)=I(X1;M)I(X_{v};M)=I(X_{1};M) for all vv, so we can further rewrite Eq. 78 as WNI(X1;M)/β\langle W\rangle\leq N\cdot I(X_{1};M)/\beta.

This shows that Eq. 78, which is reported as one of the main results of song2021optimal (Eq. 5), follows immediately from our framework. Moreover, our derivation holds under a broader set of conditions than those considered in song2021optimal , since it does not rely on any of the details of the setup (such as the type of partitions, the particular work extraction protocol, or even the assumption that the particles are identical).
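
As a minimal worked instance of Eq. 78 (a special case, not the full setup of song2021optimal ), suppose there is a single partition at position c in a unit box, the particles are independently and uniformly distributed, and the measurement M is the number of particles to the left of the partition. Because M depends on particle v only through the indicator of whether that particle lies to the left, the information per particle reduces to a mutual information between binomial variables, which the following illustrative sketch computes exactly.

import numpy as np
from scipy.stats import binom

def info_per_particle(N, c):
    # I(X_v; M) for N particles i.i.d. uniform on [0,1], one partition at c,
    # M = number of particles to the left of the partition.
    # Since M depends on X_v only through B_v = 1[X_v < c], I(X_v;M) = I(B_v;M).
    def H(q):
        q = np.asarray(q, dtype=float)
        q = q[q > 0]
        return -np.sum(q * np.log(q))
    pM = binom.pmf(np.arange(N + 1), N, c)                # M ~ Binomial(N, c)
    pmf_rest = binom.pmf(np.arange(N), N - 1, c)          # Binomial(N-1, c)
    pM_left = np.concatenate(([0.0], pmf_rest))           # M = 1 + Bin(N-1,c) given B_v = 1
    pM_right = np.concatenate((pmf_rest, [0.0]))          # M = Bin(N-1,c)     given B_v = 0
    return H(pM) - c * H(pM_left) - (1 - c) * H(pM_right)

N, c = 10, 0.3
print(N * info_per_particle(N, c))   # right side of Eq. 78 (times beta) for this setup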

VI.3 Example: Collective flashing ratchet

As a final example of modularity constraints, we consider the “collective flashing ratchet”, a classic model in the literature on the thermodynamics of information cao2004feedback ; craig_feedback_2008 . This system involves NN overdamped particles evolving under an additive potential

E(x)=λv=1NV(xv).\displaystyle E(x)=\lambda\sum_{v=1}^{N}V(x_{v}). (79)

where VV is a single-particle potential and λ{0,1}\lambda\in\{0,1\} is a control parameter that can be used to turn the potential on/off. The single-particle potential VV is chosen as an asymmetrical sawtooth “ratchet” pattern, shown in Fig. 11, where α[0,1/2]\alpha\in[0,1/2] parameterizes the degree of asymmetry.

By manipulating λ\lambda over time, possibly in a way that depends on measurements of the system, the particles can be driven so as to have a net directional flux, or to do work against an externally applied force feito_information_2007 . For instance, in a feedback control setup, λ\lambda is determined by the outcome of some measurement MM. The most common strategy involves turning the ratchet potential on when the net force on the particles is positive, and turning it off otherwise, according to the following measurement channel cao2004feedback :

q(m|x)=δm[Θ(vV(xv))],\displaystyle q(m|x)=\delta_{m}\big{[}{\textstyle\Theta\big{(}\sum_{v}V^{\prime}(x_{v})\big{)}}\big{]}, (80)

where Θ\Theta is the Heaviside function. Note that this system has been experimentally realized lopez_realization_2008 .

Suppose that starting from some initial distribution pp, the measurement in Eq. 80 is performed. As is common in the literature cao2004feedback , we assume that under pp the particles are independently and identically distributed, and that each particle is in the increasing part of the potential (V(xv)0V^{\prime}(x_{v})\geq 0) with probability α\alpha (see Fig. 11). The measurement outcome is then used to drive the system back to distribution pp while extracting work by manipulating the system’s energy function, all while coupled to a heat bath at inverse temperature β\beta. We assume that the driving protocols start and end on the same energy function, and that only additive potentials (without interaction terms) are applied to the system during the driving (this assumption allows for potentials such as Eq. 79, as well as many others).


Figure 11: The sawtooth potential of the flashing ratchet, from cao2004feedback .

The driving protocols obey Eq. 63 for a decomposition where each particle is its own subsystem, corresponding to the same type of ϕ𝒞\phi_{{\mathcal{C}}} as in Eq. 77, ϕ𝒞(p)(x)=vVp(xv)\phi_{{\mathcal{C}}}(p)(x)=\prod_{v\in V}p(x_{v}). As in Section VI.1, we can use Eq. 34 to bound average extractable work as WIaccϕ𝒞(X;M)/β\langle W\rangle\leq I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/\beta. Using Eq. 69,

Iaccϕ𝒞(X;M)=v=1NI(Xv;M)=NI(X1;M),\displaystyle I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)=\sum_{v=1}^{N}I(X_{v};M)=N\cdot I(X_{1};M), (81)

where we’ve used that the measurement provides the same information about each particle, I(Xv;M)=I(X1;M)I(X_{v};M)=I(X_{1};M) for all vv (as follows from a symmetry argument).

In Section C.4, we show that Iaccϕ𝒞(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) can be computed in closed form. Values of Iaccϕ𝒞(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) for different values of NN (the number of particles) and α\alpha (the asymmetry parameter) are plotted in Fig. 12(left). Note that the accessible information shows a non-monotonic behavior in the number of particles for α0.5\alpha\neq 0.5. This occurs because for a highly asymmetric potential, the total amount of acquired information grows with NN: I(X;M)I(X;M) grows from a minimum value of h2(α)h_{2}(\alpha) for N=1N=1 to a maximum value of ln2\ln 2 as NN\to\infty. Given this observation, we also calculate the “efficiency” of the measurements in terms of the ratio Iaccϕ𝒞(X;M)/I(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/I(X;M). This is shown in Fig. 12(right) for various values of NN and α\alpha. Interestingly, lower values of α\alpha (higher values of asymmetry) have higher efficiency values.

In the NN\to\infty limit, the accessible information and the efficiency each converge to a limiting value that is independent of α\alpha. In Section C.4, we show that the accessible information Iaccϕ𝒞(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) converges to 1/π0.321/\pi\approx 0.32 nats, while the efficiency Iaccϕ𝒞(X;M)/I(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/I(X;M) converges to 1/(πln2)0.461/(\pi\ln 2)\approx 0.46 (dotted lines in Fig. 12).
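
The quantities plotted in Fig. 12 can be reproduced under explicit additional assumptions about the sawtooth: we take the rising branch to occupy a fraction α of each period with slope proportional to 1/α, the falling branch to have slope proportional to -1/(1-α), and we break the tie in Eq. 80 (zero net sum) in favor of m=1. Under these assumptions the measurement outcome depends on the configuration only through the number of particles on the rising branch, and the following illustrative sketch computes the accessible information, the acquired information, and the efficiency exactly.

import numpy as np
from scipy.stats import binom

def H(q):
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return -np.sum(q * np.log(q))

def ratchet_info(N, alpha):
    # Accessible information N*I(X_1;M) and acquired information I(X;M) = H(M),
    # assuming M = 1 iff k >= N*alpha, where k = #particles on the rising branch.
    k = np.arange(N + 1)
    pm1 = binom.pmf(k, N, alpha)[k >= N * alpha].sum()
    I_XM = H([pm1, 1 - pm1])                   # M is a deterministic function of x
    j = np.arange(N)                           # k = s_1 + j with j ~ Binomial(N-1, alpha)
    q = binom.pmf(j, N - 1, alpha)
    pm1_rise = q[1 + j >= N * alpha].sum()     # P(M=1 | particle 1 on rising branch)
    pm1_fall = q[j >= N * alpha].sum()         # P(M=1 | particle 1 on falling branch)
    H_M_given_s = (alpha * H([pm1_rise, 1 - pm1_rise])
                   + (1 - alpha) * H([pm1_fall, 1 - pm1_fall]))
    return N * (I_XM - H_M_given_s), I_XM

for N in (1, 5, 20, 200):
    acc, tot = ratchet_info(N, alpha=0.4)
    print(N, acc, tot, acc / tot)   # accessible info approaches ~1/pi, efficiency ~1/(pi ln 2)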

For a different (and complementary) theoretical analysis of extracted work in a feedback controlled flashing ratchet, see feito_information_2007 .

Figure 12: Left: accessible information Iaccϕ𝒞(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) for the collective flashing ratchet, as a function of NN (number of particles) and α\alpha (asymmetry). Right: the efficiency of the measurements, Iaccϕ𝒞(X;M)/I(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/I(X;M).

VII Coarse-grained constraints

In our final results section, we consider bounds on EP and work that arise from coarse-grained constraints.

We begin by introducing some notation and preliminaries. Let ξ:XZ\xi:X\to Z be some coarse-graining of the microscopic state space XX, where ZZ is a set of macrostates. For any distribution pp over XX, we use pZ(z)=δξ(x)(z)p(x)𝑑xp_{Z}(z)=\int\delta_{\xi(x)}(z)p(x)\,dx to indicate the corresponding distribution over the macrostates ZZ, pX|Z(x|z)=p(x)/pZ(z)p_{X|Z}(x|z)=p(x)/p_{Z}(z) to indicate the conditional probability distribution of microstates within macrostates, and 𝒫Z:={pZ:p𝒫}{{\mathcal{P}}_{Z}}:=\{p_{Z}:p\in\mathcal{P}\} to indicate the set of all coarse-grained distributions. Finally, for any generator LL and distribution pp, we use [Lp]Z[Lp]_{Z} to indicate the resulting instantaneous dynamics of the coarse-grained distribution pZp_{Z}.

To derive our bounds, we suppose that the dynamics over the coarse-grained distributions are closed, i.e., for all LΛL\in\Lambda,

pZ=qZ[Lp]Z=[Lq]Zp,q𝒫.\displaystyle p_{Z}=q_{Z}\implies[Lp]_{Z}=[Lq]_{Z}\qquad\forall p,q\in\mathcal{P}. (82)

Given this assumption, the evolution of the coarse-grained distribution pZp_{Z} can be represented by a coarse-grained generator, which we write as tpZ=L^pZ{\textstyle\partial_{t}}p_{Z}=\hat{L}p_{Z} (discussed in detail below).

We can specify more concrete conditions that guarantee that Eq. 82 holds for a given generator LL (see Appendix D for details). For a discrete-state rate matrix LL, it is satisfied when

x:ξ(x)=zLxx=L^z,ξ(x)x,zξ(x),\sum_{{x:\xi(x)=z}}L_{xx^{\prime}}={\hat{L}}_{z,\xi(x^{\prime})}\quad\forall x^{\prime},z\neq\xi(x^{\prime}), (83)

where L^z,z{\hat{L}}_{z,z^{\prime}} is some coarse-grained transition rate from macrostate zz^{\prime} to macrostate zz. Eq. 83 states that for each microstate xx^{\prime}, the total rate of transitions from xx^{\prime} to microstates located in another macrostate zξ(x)z\neq\xi(x^{\prime}) depends only on the macrostate ξ(x)\xi(x^{\prime}), not on xx^{\prime} directly. This condition has sometimes been called “lumpability” in the literature nicolis2011transformation .
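
The lumpability condition in Eq. 83 can be verified mechanically for a given rate matrix. The following illustrative sketch (the example matrix, the partition of microstates, and the column-based sign convention are arbitrary choices) checks Eq. 83 and, when it holds, returns the coarse-grained generator.

import numpy as np

def coarse_grain_generator(L, xi, n_macro):
    # L is a rate matrix with convention L[x_to, x_from] and columns summing to zero;
    # xi maps each microstate to a macrostate index.  Checks Eq. 83 and returns L_hat.
    n = L.shape[0]
    L_hat = np.full((n_macro, n_macro), np.nan)
    for xp in range(n):                       # xp plays the role of x' in Eq. 83
        zp = xi[xp]
        for z in range(n_macro):
            if z == zp:
                continue
            rate = L[xi == z, xp].sum()       # total rate from xp into macrostate z
            if np.isnan(L_hat[z, zp]):
                L_hat[z, zp] = rate
            elif not np.isclose(L_hat[z, zp], rate):
                raise ValueError("Eq. 83 is violated: the dynamics are not lumpable")
    L_hat = np.nan_to_num(L_hat)              # unfilled diagonal entries -> 0
    L_hat -= np.diag(L_hat.sum(axis=0))       # set diagonal so columns sum to zero
    return L_hat

# Example: 4 microstates grouped into 2 macrostates.  Transitions between the two
# blocks occur at total rate 1.0 regardless of the microstate within the block.
L = np.array([[-2.0,  1.0,  0.5,  0.5],
              [ 1.0, -2.0,  0.5,  0.5],
              [ 0.5,  0.5, -3.0,  2.0],
              [ 0.5,  0.5,  2.0, -3.0]])
print(coarse_grain_generator(L, np.array([0, 0, 1, 1]), 2))   # [[-1, 1], [1, -1]]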

For a continuous-state master equation, Eq. 82 is satisfied when a continuous-state version of Eq. 83 (with sums replaced by integrals) holds. Moreover, for certain Fokker-Planck equations and linear coarse-graining functions, Eq. 83 can be replaced by a simple coarse-graining condition on the energy functions. Suppose each LΛL\in\Lambda is a Fokker-Planck operator of the form

Lp=(EL)p+β1Δp,Lp=\nabla\cdot(\nabla E_{L})p+\beta^{-1}\Delta p, (84)

and that ξ\xi is a linear function, ξ(x)=Wx\xi(x)=Wx (where WW is some full-rank m×nm\times n matrix, mnm\leq n). Without loss of generality, we assume that WW is scaled so that WWT=IWW^{T}=I. (If ξ(x)=Wx\xi(x)=Wx and WWTIWW^{T}\neq I, one can define an equivalent, rescaled coarse-graining function ξ(x)=Wx\xi^{\prime}(x)=W^{\prime}x, where W:=(WWT)1/2WW^{\prime}:=(WW^{T})^{-1/2}W, which obeys WWT=IW^{\prime}W^{\prime T}=I.) In addition, suppose that each energy function satisfies

WEL(x)=F^(ξ(x))x\displaystyle W\nabla E_{L}(x)=-\hat{F}(\xi(x))\quad\forall x (85)

for some arbitrary macrostate drift function F^:Z\hat{F}:Z\to\mathbb{R}. Then, the coarse-grained generator L^\hat{L} itself will have a Fokker-Planck form (see duong2018quantification and Appendix D),

L^pZ=F^pZ+β1ΔpZ.\displaystyle\hat{L}p_{Z}=-\nabla\cdot\hat{F}p_{Z}+\beta^{-1}\Delta p_{Z}. (86)

The right side of Eq. 86 depends only on pZp_{Z} and not the full microstate distribution pp, so Eq. 82 will be satisfied.

Importantly, if Eq. 82 holds, the EP rate at time tt can be bounded as (see Appendix D):

Σ˙(p(t),L(t))ztpZ(z,t)lnpZ(z,t)πZL(t)(z)0,\displaystyle\dot{\Sigma}(p(t),L(t))\geq-\sum_{z}{\textstyle\partial_{t}}p_{Z}(z,t)\ln\frac{p_{Z}(z,t)}{\pi_{Z}^{L(t)}(z)}\geq 0, (87)

where tpZ(t)=L^pZ(t){\textstyle\partial_{t}}p_{Z}(t)=\hat{L}p_{Z}(t) and πZL(t)\pi_{Z}^{L(t)} is the coarse-grained version of πL(t)\pi^{L(t)}, the stationary distribution of L(t)L(t). The middle expression in Eq. 87 is the coarse-grained version of Eq. 11, which arises from the macrostate distribution pZp_{Z} being out of equilibrium. We then define the total “coarse-grained EP” over the course of the protocol as the time integral of this middle expression,

Σ^(pZpZ)=01ztpZ(z,t)lnpZ(z,t)πZL(t)(z)dt.\displaystyle\hat{\Sigma}(p_{Z}\!\shortrightarrow\!{p_{Z}^{\prime}})=\int_{0}^{1}-\sum_{z}{\textstyle\partial_{t}}p_{Z}(z,t)\ln\frac{p_{Z}(z,t)}{\pi_{Z}^{L(t)}(z)}\;dt. (88)

Given Eq. 87, the coarse-grained EP serves as a non-negative lower bound on the total EP,

Σ(pp)Σ^(pZpZ)0.\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq\hat{\Sigma}(p_{Z}\!\shortrightarrow\!{p_{Z}^{\prime}})\geq 0. (89)

Note that esposito2012stochastic previously derived a coarse-grained EP rate for discrete-state master equations, which differs from the expression appearing in Eq. 87; however, the expression in Eq. 87 can be seen as the “nonadiabatic component” of the coarse-grained EP rate from esposito2012stochastic , and is thus a lower bound on it esposito2010three .

We say that the available driving protocols obey coarse-grained constraints if the generators LΛL\in\Lambda exhibit closed dynamics over ZZ, Eq. 82, and there is some operator ϕ^:𝒫Z𝒫Z\hat{\phi}:{{\mathcal{P}}_{Z}}\to{{\mathcal{P}}_{Z}} that obeys the Pythagorean identity, Eq. 14, and the commutativity relation, Eq. 16, with respect to all L^\hat{L}. For example, this coarse-grained operator ϕ^\hat{\phi} might reflect the presence of symmetry or modularity constraints on the coarse-grained dynamics.

We can then use Eq. 89 and the framework developed in Section III to derive bounds on work and EP. In particular, Eq. 18 implies the following bound on coarse-grained EP, Σ^(pZpZ)D(pZϕ^(pZ))D(pZϕ^(pZ))0\hat{\Sigma}(p_{Z}\!\shortrightarrow\!{p_{Z}^{\prime}})\geq D(p_{Z}\|\hat{\phi}(p_{Z}))-D({p_{Z}^{\prime}}\|\hat{\phi}({p_{Z}^{\prime}}))\geq 0. Combined with Eq. 89, this lets us bound overall EP as

Σ(pp)D(pZϕ^(pZ))D(pZϕ^(pZ))0.\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p_{Z}\|\hat{\phi}(p_{Z}))-D({p_{Z}^{\prime}}\|\hat{\phi}({p_{Z}^{\prime}}))\geq 0. (90)

Via Eq. 2, this also gives a corresponding bound on extractable work,

W(pp)FE(p)FE(p)[D(pZϕ^(pZ))D(pZϕ^(pZ))]/β.W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(p)-F_{E^{\prime}}(p^{\prime})-\\ [D(p_{Z}\|\hat{\phi}(p_{Z}))-D({p_{Z}^{\prime}}\|\hat{\phi}({p_{Z}^{\prime}}))]/\beta. (91)

Eqs. 90 and 91 can also be used to derive bounds on average work extraction in feedback control protocols, using the strategy described in Section IV.

If ϕ^\hat{\phi} represents coarse-grained symmetry or modularity constraints, then Eq. 90 implies that any asymmetry or inter-subsystem correlation in the macrostate distribution can only be dissipated away, not turned into work. Another simple application occurs when all LΛL\in\Lambda have the same coarse-grained equilibrium distribution, i.e., there is some πZ\pi_{Z} such that L^πZ=0\hat{L}\pi_{Z}=0 for all LL. In this case, the constant operator ϕ^(pZ)=πZ\hat{\phi}(p_{Z})=\pi_{Z} satisfies Eqs. 14 and 16 at the coarse-grained level (compare to the derivation of Eq. 27 above). Applying Eq. 90 then gives

Σ(pp)D(pZπZ)D(pZπZ)0,\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p_{Z}\|\pi_{Z})-D({p_{Z}^{\prime}}\|\pi_{Z})\geq 0, (92)

as well as a corresponding extractable work bound, as in Eq. 91. This shows that if the coarse-grained equilibrium distribution πZ\pi_{Z} cannot change, then any deviation between the actual coarse-grained distribution pZp_{Z} and πZ\pi_{Z} must be dissipated as EP, not turned into work.

VII.1 Example: Szilard box

Figure 13: A two-dimensional Szilard box with a Brownian particle, in the presence of gravity.

We demonstrate our results on coarse-grained constraints using the Szilard box. We consider a similar setup as in Sections V.1 and VI.1, where there is a single overdamped particle in a box coupled to a bath at inverse temperature β=1\beta=1. However, we now assume that there is a vertical gravitational force, as illustrated in Fig. 13. Formally, this means that the available potential energy functions have the form

Eλ(x1,x2)=Vp(x1λ)+Vw(|x1|)+Vw(|x2|)+κx2,E_{\lambda}(x_{1},x_{2})=V_{\mathrm{p}}(x_{1}-\lambda)+V_{\mathrm{w}}(|x_{1}|)+V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}, (93)

where κ\kappa is a fixed constant that determines the strength of gravity. Unlike Eq. 44, the energy function in Eq. 93 does not obey the reflection symmetry (x1,x2)(x1,x2)(x_{1},x_{2})\mapsto(x_{1},-x_{2}).

The microstate of the particle is represented by the horizontal and vertical position, x=(x1,x2)x=(x_{1},x_{2}). We consider a coarse-graining in which the macrostate is the vertical coordinate of the particle Z=X2Z=X_{2}, corresponding to the coarse-graining function ξ(x1,x2)=Wx=x2\xi(x_{1},x_{2})=Wx=x_{2} with W=[0 1]W=[0\;1]. It is easy to check that the potential energy functions in Eq. 93 satisfy

WEλ(x)=x2[Vw(|x2|)+κx2],\displaystyle W\nabla E_{\lambda}(x)=\partial_{x_{2}}[V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}], (94)

which obeys Eq. 85 and therefore guarantees that the coarse-grained dynamics are closed. In fact, the coarse-grained generators have the Fokker-Planck form of Eq. 86 with the coarse-grained drift function F^(x2)=x2[Vw(|x2|)+κx2]\hat{F}(x_{2})=-\partial_{x_{2}}[V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}], which leads to the following Boltzmann stationary distribution:

πX2(x2)\displaystyle\pi_{X_{2}}(x_{2}) eβ[Vw(|x2|)+κx2]\displaystyle\propto e^{-\beta[V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}]}
=𝟏[1,1](x2)eβκx2,\displaystyle=\mathbf{1}_{[-1,1]}(x_{2})e^{-\beta\kappa x_{2}}, (95)

where in the second line we used the form of Vw()V_{\mathrm{w}}(\cdot) from Eq. 45. Since the coarse-grained equilibrium distribution is the same for all energy functions having the form Eq. 93, we can use the EP bound in Eq. 92.

Suppose that the system starts from some initial distribution pp and is then driven to a final equilibrium distribution pp^{\prime} while extracting work. We assume that the partition is removed at the beginning and end of the protocol, corresponding to the energy function E(x1,x2)=Vw(|x1|)+Vw(|x2|)+κx2E^{\varnothing}(x_{1},x_{2})=V_{\mathrm{w}}(|x_{1}|)+V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}, with the Boltzmann distribution

π(x1,x2)𝟏[1,1]2(x1,x2)eβκx2.\displaystyle\pi^{\varnothing}(x_{1},x_{2})\propto\mathbf{1}_{[-1,1]^{2}}(x_{1},x_{2})e^{-\beta\kappa x_{2}}. (96)

We will also assume that the final distribution is in equilibrium, so p=πp^{\prime}=\pi^{\varnothing}. Then, the extractable work involved in this transformation can be expressed as

W(pπ)\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing}) =FE(p)FE(π)Σ(pπ)\displaystyle=F_{E^{\varnothing}}(p)-F_{E^{\varnothing}}(\pi^{\varnothing})-\Sigma(p\!\shortrightarrow\!\pi^{\varnothing})
=D(pπ)Σ(pπ),\displaystyle=D(p\|\pi^{\varnothing})-\Sigma(p\!\shortrightarrow\!\pi^{\varnothing}), (97)

where we used Eqs. 2 and 5. We can then upper bound extractable work by combining Eq. 97 with various lower bounds on Σ(pπ)\Sigma(p\!\shortrightarrow\!\pi^{\varnothing}).

For instance, the second law states that Σ(pπ)0\Sigma(p\!\shortrightarrow\!\pi^{\varnothing})\geq 0, so

W(pπ)D(pπ).\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing})\leq D(p\|\pi^{\varnothing}). (98)

We can also derive a stronger bound by exploiting coarse-grained constraints. For the coarse-graining described above, Eq. 92 implies that Σ(pπ)D(pX2πX2)\Sigma(p\!\shortrightarrow\!\pi^{\varnothing})\geq D(p_{X_{2}}\|\pi_{X_{2}}), which gives the bound

W(pπ)\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing}) D(pπ)D(pX2πX2)\displaystyle\leq D(p\|\pi^{\varnothing})-D(p_{X_{2}}\|\pi_{X_{2}})
=D(pX1|X2πX1|X2).\displaystyle=D(p_{X_{1}|X_{2}}\|\pi^{\varnothing}_{X_{1}|X_{2}}). (99)

We can also bound EP and work using other kinds of constraints. For instance, the energy functions in Eq. 93 have no interaction terms between x1x_{1} and x2x_{2}, and therefore obey modularity constraints for the decomposition 𝒞={{X1},{X2}}\mathcal{C}=\{\{X_{1}\},\{X_{2}\}\} (see the analysis in Section VI.1). This allows us to bound EP and work using the operator ϕ𝒞\phi_{{\mathcal{C}}}, as defined above in Eq. 70. In particular, using Theorem 2, we have that

Σ(pπ)\displaystyle\Sigma(p\!\shortrightarrow\!\pi^{\varnothing}) =D(pϕ𝒞(p))+Σ(ϕ𝒞(p)π)\displaystyle=D(p\|\phi_{{\mathcal{C}}}(p))+\Sigma(\phi_{{\mathcal{C}}}(p)\!\shortrightarrow\!\pi^{\varnothing}) (100)
D(pϕ𝒞(p)).\displaystyle\geq D(p\|\phi_{{\mathcal{C}}}(p)).

which implies the extractable work bound

W(pπ)\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing}) D(pπ)D(pϕ𝒞(p))=D(ϕ𝒞(p)π).\displaystyle\leq D(p\|\pi^{\varnothing})-D(p\|\phi_{{\mathcal{C}}}(p))=D(\phi_{{\mathcal{C}}}(p)\|\pi^{\varnothing}). (101)

Finally, we can also combine modularity and coarse-grained constraints. The coarse-grained constraints imply that Σ(ϕ𝒞(p)π)D(ϕ𝒞(p)X2πX2)\Sigma(\phi_{{\mathcal{C}}}(p)\!\shortrightarrow\!\pi^{\varnothing})\geq D(\phi_{{\mathcal{C}}}(p)_{X_{2}}\|\pi_{X_{2}}) by Eq. 92. Plugging this into Eq. 100 gives

Σ(pπ)D(pϕ𝒞(p))+D(ϕ𝒞(p)X2πX2),\displaystyle\Sigma(p\!\shortrightarrow\!\pi^{\varnothing})\geq D(p\|\phi_{{\mathcal{C}}}(p))+D(\phi_{{\mathcal{C}}}(p)_{X_{2}}\|\pi_{X_{2}}), (102)

resulting in the extractable work bound

W(pπ)D(ϕ𝒞(p)X1|X2πX1|X2),\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing})\leq D(\phi_{{\mathcal{C}}}(p)_{X_{1}|X_{2}}\|\pi^{\varnothing}_{X_{1}|X_{2}}), (103)

where we have again used the chain rule of KL divergence.

Figure 14: Szilard box with gravity: bounds on extractable work as a function of θ\theta, as derived from the second law (in blue, Eq. 98), coarse-grained constraints (in orange, Eq. 99), modularity constraints (in green, Eq. 101), and a combination of modularity+coarse-grained constraints (in red, Eq. 103).

We now illustrate these bounds using a concrete set of initial distributions. Imagine that the initial distribution pp is the equilibrium distribution π\pi^{\varnothing} restricted to half the box, as determined by a rotated separating line at some angle θ[π,π]\theta\in[-\pi,\pi],

pθ(x1,x2)=12π(x1,x2)Θ(x2sinθx1cosθ).\displaystyle p_{\theta}(x_{1},x_{2})=\frac{1}{2}\pi^{\varnothing}(x_{1},x_{2})\Theta(x_{2}\sin\theta-x_{1}\cos\theta). (104)

(Compare to Eq. 49, for the Szilard box without gravity). For these initial distributions and gravity parameter κ=1\kappa=1, we plot the four extractable work bounds derived above, Eqs. 98, 99, 101 and 103, as a function of θ\theta in Fig. 14 (values are calculated numerically). Note that, unlike the results presented in Figs. 6 and 9, the plots are no longer symmetric under the transformation θθ\theta\mapsto-\theta. This arises because gravity breaks the vertical reflection symmetry, so the nonequilibrium free energy of a distribution concentrated on the top half of the box (θ=π/2\theta=\pi/2) is greater than the nonequilibrium free energy of a distribution concentrated on the bottom half of the box (θ=π/2\theta=-\pi/2). It can also be seen that work bounds derived from coarse-grained constraints, Eq. 99 (orange), can be either weaker or stronger than the work bounds derived from modularity constraints, Eq. 101 (green), depending on the value of θ\theta. For all θ\theta, however, the work bound derived by combining both constraints, Eq. 103 (red), is stronger than the work bound derived from either constraint individually.
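
For readers who wish to reproduce the curves in Fig. 14, the following is a minimal numerical sketch (in Python with NumPy; the grid resolution and the choice β=1 are our own illustrative assumptions, not taken from the text) that evaluates the right-hand sides of Eqs. 98, 99, 101 and 103 on a discretized box. All distributions are normalized numerically on the grid, which sidesteps the normalization constant in Eq. 104.

import numpy as np

def kl(p, q):
    # Discrete KL divergence D(p||q) in nats, with the convention 0 ln(0/q) = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def szilard_gravity_bounds(theta, beta=1.0, kappa=1.0, n=400):
    # Grid over the box [-1,1]^2; axis 0 indexes x1, axis 1 indexes x2.
    x = np.linspace(-1.0, 1.0, n)
    x1, x2 = np.meshgrid(x, x, indexing="ij")

    # Equilibrium distribution pi0 proportional to exp(-beta*kappa*x2) on the box (Eq. 96).
    pi0 = np.exp(-beta * kappa * x2)
    pi0 /= pi0.sum()

    # Initial distribution: pi0 restricted to one side of the line at angle theta (Eq. 104).
    p = pi0 * (x2 * np.sin(theta) - x1 * np.cos(theta) > 0)
    p /= p.sum()

    # Marginal over x2, and the modular decomposition phi_C(p) = p_{X1} p_{X2} (Eq. 70).
    p2, pi2 = p.sum(axis=0), pi0.sum(axis=0)
    phi_p = np.outer(p.sum(axis=1), p2)

    bound_98 = kl(p, pi0)                  # second law, Eq. 98
    bound_99 = bound_98 - kl(p2, pi2)      # coarse-grained constraint, Eq. 99
    bound_101 = kl(phi_p, pi0)             # modularity constraint, Eq. 101
    bound_103 = bound_101 - kl(p2, pi2)    # modularity + coarse-graining, Eq. 103
    return bound_98, bound_99, bound_101, bound_103

for theta in np.linspace(-np.pi, np.pi, 9):
    print(round(theta, 3), [round(b, 4) for b in szilard_gravity_bounds(theta)])

The returned values are KL divergences in nats, matching the right-hand sides of the four bounds; consistent with the observation above, the combined bound of Eq. 103 is never larger than the bounds of Eq. 99 or Eq. 101.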

VIII Relevant literature

In previous work on the general topic of thermodynamic bounds under constraints, Wilming et al. (wilming_second_2016, ) considered how extractable work depends on constraints on the Hamiltonian, given a quantum system coupled to a finite-sized heat bath. That paper derived an upper bound on the work that could be extracted by carrying out a physical process which consists of sequences of (1) unitary transformations of the system and bath, and (2) total relaxations of the system to some equilibrium Gibbs state (see also a similar setup for closed systems in perarnau-llobet_work_2016 ). Building on (wilming_second_2016, ), lekscha_quantum_2018 analyzed the efficiency of a heat engine coupled to two baths and subject to “local control” constraints (i.e., a many-particle system where local Hamiltonians can be changed but the interaction Hamiltonians cannot). In contrast to these works, we consider a classical system coupled to idealized reservoir(s). We then derive bounds on EP and work for a much broader set of protocols.

At a high level, our approach complements previous research on the relationship between EP, extractable work and different aspects of the driving protocol, such as temporal duration (esposito2010finite, ; sivak2012thermodynamic, ; shiraishi_speed_2018, ; gomez2008optimal, ; then2008computing, ; zulkowski2014optimal, ; schmiedl2007optimal, ), stochasticity of control parameters (machta2015dissipation, ), non-idealized work reservoirs (verley_work_2014, ), cyclic protocols schmiedl2007optimal ; allahverdyan2004maximal , the presence of additional conservation laws uzdin2021passivity , and the design of “optimal protocols” (solon2018phase, ; gingrich2016near, ; aurell2011optimal, ).

There is also previous work related to our analysis of thermodynamics of information under constraints in Section IV. (still2020thermodynamic, ) recently analyzed the thermodynamics of feedback control under a somewhat different formulation of constraints. (That paper proposed to divide the system into two subsystems YY and Y¯\bar{Y}, such that the accessible information is given by I(M;Y)I(M;Y), under three assumptions: (1) the system’s marginal distributions remain constant during all steps of feedback control, (2) the conditional distribution of Y¯\bar{Y} given the system and the measurement does not change during the driving, and (3) all conditional information about Y¯\bar{Y} is lost by the time that driving begins. After private communication with the author of still2020thermodynamic , we think that condition (3) may need to be formalized as p(y¯(t2)|y(t2),z(t0))=p(y¯(t2)|y(t2))p(\bar{y}(t_{2})|y(t_{2}),z(t_{0}))=p(\bar{y}(t_{2})|y(t_{2})), although this equation does not appear in that paper.) In this work, we analyze the thermodynamics of information for a broader set of constraints. It is not immediately clear how the framework in still2020thermodynamic compares to ours, or whether it could be applied to the examples considered in this paper, although such a comparison is an interesting direction for future work.

Some of our results concerning work extraction under modularity constraints in Section VI have appeared in prior literature. Eq. 66 was derived in (Boyd:2018aa, ) for the special case of an isothermal process with two non-overlapping subsystems, where one of the subsystems is held fixed. For the more general case of an arbitrary discrete-state system coupled to one or more reservoirs which have rate matrices as in Eq. 61, Eq. 66 was also previously derived in wolpert_thermo_comp_review_2019 ; wolpert2020thermodynamic , while Eq. 67 was previously derived in wolpert_thermo_comp_review_2019 ; wolpert.thermo.bayes.nets.2020 ; wolpert2020thermodynamic . Decompositions with overlap were previously considered in wolpert2020fluctuation ; wolpert2020minimal . In addition, Example 1 in wolpert2020strengthened can be used to derive the first inequality Eq. 65 for discrete-state systems. (The reader should be aware that those papers used different terminology from this paper: in wolpert2020fluctuation ; wolpert2020minimal , each degree of freedom vVv\in V is called a “subsystem”, the modular decomposition 𝒞\mathcal{C} is called a “unit structure”, and each A𝒞A\in\mathcal{C} is called a “unit”.)

Those papers also derived some results that are more general than the ones derived here, in that they apply even if the overlap changes state. Our paper nonetheless goes beyond this previous work by including continuous-state systems, and by deriving inequalities such as D(pϕ𝒞(p))D(pϕ𝒞(p))0D(p\|\phi_{{\mathcal{C}}}(p))-D(p^{\prime}\|\phi_{{\mathcal{C}}}(p^{\prime}))\geq 0, albeit for the more restricted scenario where the overlap does not change state.

Some of our results concerning work extraction under symmetry constraints, presented in Section V, appeared in previous work on quantum thermodynamics. For a finite-state quantum system coupled to a work reservoir and heat bath, Vaccaro et al. vaccaro_tradeoff_2008 investigated how much work can be extracted by bringing some initial quantum state ρ\rho to a maximally mixed state, with a uniform initial and final Hamiltonian, using discrete-time operations that commute with the action of some symmetry group 𝒢\mathcal{G}. It was shown that the work that can be extracted from ρ\rho under such transformations is equal to the work that can be extracted from the (quantum) twirling ϕ𝒢(ρ)\phi_{\mathcal{G}}(\rho), analogous to Eq. 24 for symmetry constraints. This research also derived an operational measure of asymmetry that is the quantum equivalent of D(pϕ𝒢(p))D(p\|\phi_{\mathcal{G}}(p)), and showed that asymmetry can only decrease under operations that commute with 𝒢\mathcal{G}. Janzing janzing_quantum_2006 extended vaccaro_tradeoff_2008 to consider arbitrary Hamiltonians, in the process deriving analogues of our decomposition of free energy (Eq. 21) for the special case of the twirling operator ϕ𝒢\phi_{\mathcal{G}}. A similar decomposition of free energy into coherent and incoherent components has recently appeared in lostaglio_description_2015 ; santos_role_2019 (this is a special case of the result in janzing_quantum_2006 , since a decohering map is a twirling operator elphick2019spectral ). Finally, the idea of probability distributions that are invariant under symmetry groups, as well as a version of the twirling operator ϕ𝒢\phi_{\mathcal{G}}, is a topic of research in probability and statistics; for details, see Ch. 3 in eaton_group_1989 .

While our approach is restricted to classical systems, in some respects our results for symmetry constraints are more general than this earlier work, since they hold for arbitrary (discrete and/or uncountably infinite) state spaces and for systems coupled to more than one reservoir (see Section IX). Moreover, for Fokker-Planck dynamics, we derive simple conditions for symmetry constraints stated in terms of the energy functions, which makes these results applicable to a large set of problems in stochastic thermodynamics and biophysics.

More fundamentally, one of the ways in which we go beyond previous literature on symmetry and modularity constraints is by providing a unified mathematical framework that applies to a broad set of constraints, including symmetry, modularity, and coarse-grained constraints (as well as their combinations) as special cases. A key idea in our framework is that the information-geometric Pythagorean identity, Eq. 14, is the essential property that allows an operator ϕ\phi to uncover the thermodynamically accessible part of any distribution pp (assuming also that ϕ\phi commutes with the dynamics). The Pythagorean identity is satisfied by many ϕ\phi, including both linear operators such as twirling operators ϕ𝒢\phi_{\mathcal{G}} and nonlinear operators such as modular decomposition operators ϕ𝒞\phi_{{\mathcal{C}}}. We believe this idea can be extended to the quantum domain, though we leave this for future work.

Finally, our approach is also related to “resource theories”, which are an active area of research in various areas of quantum physics chitambar2019quantum , including quantum thermodynamics  wilming_second_2016 ; gallego_thermodynamic_2016 ; brandao_resource_2013 ; lostaglio_stochastic_2015 ; faist_fundamental_2018 ; yunger_halpern_beyond_2016 . A resource theory quantifies a physical resource in an operational way, in terms of what transformations are possible when the resource is available. Most resource theories are based on a common set of formal elements, such as a resource quantifier (a real-valued function that measures the amount of a resource), a set of free states (statistical states that lack the resource), and free operations (transformations between statistical states that do not increase the amount of resource). In fact, some previous work on symmetry constraints in quantum thermodynamics vaccaro_tradeoff_2008 ; janzing_quantum_2006 can be seen as part of a broader literature on the resource theory of asymmetry marvian_extending_2014 ; marvian_asymmetry_2014 ; marvian_modes_2014 .

Our approach has similar operational motivations as resource theories; for example, we define “accessible free energy” in an operational way, as a quantity that governs extractable work under protocol constraints. Moreover, many elements of our framework are analogous to elements of the resource theory framework: the set of allowed generators (which we call Λ\Lambda) plays the role of the free operations, the image of the operator ϕ\phi plays the role of the set of free states, and the KL divergence D(pϕ(p))D(p\|\phi(p)) serves as the resource quantifier. In addition, the commutativity relation Eq. 16 (see Section III) has recently appeared in work on so-called resource destroying maps liu2017resource . However, unlike most resource theories, our focus is on the thermodynamics of classical systems modeled as driven continuous-time open systems. Further exploration of the connection between our approach and resource theories is left for future work.

IX Discussion

In this paper, we analyzed the EP and work incurred by a driving protocol that carries out some transformation ppp\!\shortrightarrow\!p^{\prime}, while subject to constraints on the set of available generators. We constructed a general framework that allowed us to derive several decompositions and bounds on EP and extractable work, and demonstrated that this framework has implications for the thermodynamics of feedback control under constraints. Finally, we used our framework to analyze three broad classes of protocol constraints, reflecting symmetry, modularity, and coarse-graining.

Note that our bounds on EP and extractable work, such as Eqs. 18 and 25, are expressed in terms of state functions, i.e., they depend only on the initial and final distributions pp and pp^{\prime} and not on the path that the system takes in going from pp to pp^{\prime}. In general, it may be possible to derive other bounds on work and EP that are not written in this form, which may be tighter. Nonetheless, bounds written in terms of state functions have some important advantages. In particular, they allow one to quantify the inherent “thermodynamic value” (in terms of EP and work) of a distribution pp relative to a set of available generators, irrespective of what protocol brought the system there or what future protocols that system may undergo (as long as those protocols obey the relevant constraints).

For simplicity, our results were derived for isothermal protocols, where the system is coupled to a single heat bath at a constant inverse temperature β\beta and obeys local detailed balance (LDB). Nonetheless, many of our results continue to hold for more general protocols, in which the system is coupled to any number of thermodynamic reservoirs and/or violates LDB. For a general protocol, our EP rate in Eq. 11 refers to the so-called nonadiabatic EP rate van2010three ; esposito2010three ; lee_fluctuation_2013 , which is a non-negative quantity that reflects the contribution to EP that is due to the system being out of the stationary distribution. In the general case, our decompositions in Theorems 1 and 2, as well as EP lower bounds in Eqs. 18 and 33, apply to nonadiabatic EP, rather than overall EP. Importantly, the nonadiabatic EP rate is a lower bound on the overall EP rate whenever the stationary distribution of LL is symmetric under conjugation of odd-parity variables lee_fluctuation_2013 , which holds in most cases of interest such as discrete-state master equations (which typically have no odd variables), overdamped dynamics (which have no odd variables), and many types of underdamped dynamics. In such cases, Eqs. 18 and 33 provide lower bounds not only on the nonadiabatic EP, but also on the overall EP, regardless of the number of coupled reservoirs or LDB. However, the relationship between work and EP in Eq. 2, as well as our bounds on work which make use of this relationship such as Eqs. 24 and 25, hold only for isothermal protocols. Note that our EP bound for closed coarse-grained dynamics, Eq. 87, concerns the overall EP rate, not the nonadiabatic EP rate, even for non-isothermal protocols (see Section D.2 for details).

There are several possible directions for future research.

First, it remains an open question whether our framework can also be used to analyze other classes of constraints, beyond the three classes (symmetry, modularity, and coarse-graining) considered in this paper.

Second, our results point to a novel connection between entropy production, which plays a central role in nonequilibrium thermodynamics, and the Pythagorean identity in Eq. 14, which plays a central role in information geometry. This contributes to the growing body of results that demonstrate formal relationships between information geometry and nonequilibrium thermodynamics ito2018stochastic ; takahashi2017shortcuts ; ito2020unified ; nicholson2018nonequilibrium ; ito2020stochastic ; nakamura2019reconsideration . One direction for future work would be to extend the framework developed in this work from classical to quantum systems. In this extension, one would derive bounds on quantum work and EP by considering a quantum operator ϕ\phi over density matrices which obeys quantum analogues of the Pythagorean identity in Eq. 14 (petzQuantumInformationTheory2008, , p. 44) and the commutativity relation in Eq. 16.

Finally, our results may also lead to some new treatments of foundational questions in thermodynamics. In stochastic thermodynamics, probability distributions over system states are usually interpreted in a “subjective” sense, in that the distribution pp assigned to a system typically reflects what one knows about the system (for this reason, this distribution changes once a measurement is made of the system’s state parrondo2015thermodynamics ). At the same time, our results show that for constrained driving protocols, one can often assign a different distribution to the system, ϕ(p)\phi(p), which reflects what one can control about the system. This also leads to the difference between the overall nonequilibrium free energy, defined in terms of the distribution pp, and the accessible free energy, defined in terms of the distribution ϕ(p)\phi(p). Note that thermodynamic entropy is often understood in an operational way, e.g., in terms of constrained macroscopic control, as has been previously discussed by Jaynes jaynes1992gibbs and others. An interesting direction for future work would be to explore whether the distinction between the distributions pp and ϕ(p)\phi(p) maps onto the distinction between (microscopic) statistical mechanical entropy and (macroscopic) thermodynamic entropy. In particular, one might ask whether this mapping can resolve some classic paradoxes concerning the relationship between statistical mechanical and thermodynamic entropy, such as the Gibbs paradox jaynes1992gibbs (mixing of indistinguishable particles increases statistical mechanical entropy but not thermodynamic entropy) and Loschmidt’s paradox (for an isolated Hamiltonian system, statistical mechanical entropy remains constant while the thermodynamic entropy can increase). This direction could also be related to a recent axiomatic treatment of thermodynamic entropy which has been developed within the framework of quantum resource theory weilenmann2016axiomatic .

Acknowledgments

We thank Massimiliano Esposito and Henrik Wilming for helpful discussions. This research was supported by grant number FQXi-RFP-IPW-1912 from the Foundational Questions Institute and Fetzer Franklin Fund, a donor advised fund of Silicon Valley Community Foundation. The authors thank the Santa Fe Institute for helping to support this research.

References

  • (1) K. Takara, H.-H. Hasegawa, and D. Driebe, “Generalization of the second law for a transition between nonequilibrium states,” Physics Letters A, vol. 375, no. 2, pp. 88–92, Dec. 2010.
  • (2) J. M. R. Parrondo, J. M. Horowitz, and T. Sagawa, “Thermodynamics of information,” Nature Physics, vol. 11, no. 2, pp. 131–139, 2015.
  • (3) M. Esposito and C. Van den Broeck, “Second law and Landauer principle far from equilibrium,” EPL (Europhysics Letters), vol. 95, no. 4, p. 40004, 2011.
  • (4) We use a Brownian model of the Szilard engine, which is similar to setups commonly employed in modern nonequilibrium statistical physics berut2012experimental ; roldan2014universal ; koski2014experimental ; shizume1995heat ; gong2016stochastic ; parrondo2015thermodynamics , as shown in Fig. 1. This model can be justified by imagining a box that contains a large colloidal particle, as well as a medium of small solvent particles to which the vertical partition is permeable. Note that this model differs from Szilard’s original proposal szilard1929entropieverminderung , in which the box contains a single particle in a vacuum, which has been analyzed in proesmans2015efficiency ; hondou2007equation ; bhat2017unusual .
  • (5) T. Sagawa and M. Ueda, “Second law of thermodynamics with discrete quantum feedback control,” Physical Review Letters, vol. 100, no. 8, p. 080403, 2008.
  • (6) As common in the literature, in Eq. 3 we consider only the work that is extractable from the system after the measurement is made. We do not account for the possible work cost of making the measurement, nor any work exchanges that may be incurred by the measurement apparatus during the driving.
  • (7) P. A. Corning and S. J. Kline, “Thermodynamics, information and life revisited, part II: ‘Thermoeconomics’ and ‘Control information’,” Systems Research and Behavioral Science: The Official Journal of the International Federation for Systems Research, vol. 15, no. 6, pp. 453–482, 1998.
  • (8) A. Kolchinsky and D. H. Wolpert, “Semantic information, autonomous agency and non-equilibrium statistical physics,” Interface focus, vol. 8, no. 6, p. 20180041, 2018.
  • (9) S. A. Kauffman, Investigations.   Oxford University Press, 2000.
  • (10) D. H. Wolpert and A. Kolchinsky, “Thermodynamics of computing with circuits,” New Journal of Physics, vol. 22, no. 6, p. 063047, 2020.
  • (11) J. Song, S. Still, R. Díaz Hernández Rojas, I. Pérez Castillo, and M. Marsili, “Optimal work extraction and mutual information in a generalized Szilárd engine,” Phys. Rev. E, vol. 103, p. 052121, May 2021.
  • (12) F. J. Cao, L. Dinis, and J. M. R. Parrondo, “Feedback control in a collective flashing ratchet,” Physical Review Letters, vol. 93, no. 4, p. 040603, 2004.
  • (13) D. Janzing, “Quantum Thermodynamics with Missing Reference Frames: Decompositions of Free Energy Into Non-Increasing Components,” Journal of Statistical Physics, vol. 125, no. 3, pp. 761–776, Nov. 2006.
  • (14) J. A. Vaccaro, F. Anselmi, H. M. Wiseman, and K. Jacobs, “Tradeoff between extractable mechanical work, accessible entanglement, and ability to act as a reference system, under arbitrary superselection rules,” Phys. Rev. A, vol. 77, p. 032114, Mar 2008.
  • (15) The assumption of unique stationary distributions can be relaxed as long as the operator ϕ\phi (as discussed in Section III) satisfies the following weak technical condition: for all p𝒫p\in\mathcal{P} and each stationary distribution π\pi of each LΛL\in\Lambda, D(pϕ(π))<D(p\|\phi(\pi))<\infty whenever D(pπ)<D(p\|\pi)<\infty. Note that ϕ(π)\phi(\pi) is also a stationary distribution of LL by Lemma 1 in Appendix A, so this condition is automatically satisfied when the generators have unique stationary distributions (since in that case π=ϕ(π)\pi=\phi(\pi)). Note also that if some LΛL\in\Lambda have multiple stationary distributions π\pi, the corresponding EP rate in Eq. 11 can be equivalently defined using any π\pi such that D(pπ)<D(p\|\pi)<\infty.
  • (16) M. Esposito and C. Van den Broeck, “Three faces of the second law. I. Master equation formulation,” Physical Review E, vol. 82, no. 1, p. 011143, 2010.
  • (17) N. G. Van Kampen, Stochastic processes in physics and chemistry.   Elsevier, 1992, vol. 1.
  • (18) H. Risken, “Fokker-Planck equation,” in The Fokker-Planck Equation.   Springer, 1996, pp. 63–95.
  • (19) C. Van den Broeck and M. Esposito, “Three faces of the second law. II. Fokker-Planck formulation,” Physical Review E, vol. 82, no. 1, p. 011144, 2010.
  • (20) D. L. Ermak and J. A. McCammon, “Brownian dynamics with hydrodynamic interactions,” The Journal of chemical physics, vol. 69, no. 4, pp. 1352–1360, 1978.
  • (21) S.-i. Amari, Information geometry and its applications.   Springer, 2016, vol. 194.
  • (22) This is because D(pq)D(pϕ(p))D(p\|q)\geq D(p\|\phi(p)) for any qimgϕq\in\mathrm{img}\;\phi, which follows from Eq. 14 and the non-negativity of KL divergence.
  • (23) A. Kolchinsky and D. H. Wolpert, “Entropy production given constraints on the energy functions,” Phys. Rev. E, vol. 104, p. 034129, Sep 2021.
  • (24) U. Seifert, “Stochastic thermodynamics, fluctuation theorems and molecular machines,” Reports on Progress in Physics, vol. 75, no. 12, p. 126001, 2012.
  • (25) A. Kolchinsky and D. H. Wolpert, “The state dependence of integrated, instantaneous, and fluctuating entropy production in quantum and classical processes,” arXiv preprint arXiv:2103.05734, 2021.
  • (26) H. Kwon and M. S. Kim, “Fluctuation theorems for a quantum channel,” Physical Review X, vol. 9, no. 3, p. 031029, 2019.
  • (27) A compact group 𝒢\mathcal{G} has a measurable action over XX if the action 𝒢×XX\mathcal{G}\times X\to X is a measurable function, where we assume 𝒢\mathcal{G} and XX are endowed with their respective Borel algebras.
  • (28) Technically, the definition of the twirling operator in Eq. 42 applies only when pp is a finite-valued probability density function (which excludes things such as the Dirac delta “function”). A more general formulation of our results can be developed in terms of probability measures rather than probability densities (see Ch. 3 in eaton_group_1989 for a version of Eq. 42 defined in terms of probability measures).
  • (29) K. G. H. Vollbrecht and R. F. Werner, “Entanglement measures under symmetry,” Physical Review A, vol. 64, no. 6, p. 062307, 2001.
  • (30) Technically, the wall potential as defined in Eq. 45 is non-differentiable. To be more accurate, one should imagine it in terms of the limit Vw(|x|)=limα|x|αV_{\mathrm{w}}(|x|)=\lim_{\alpha\to\infty}|x|^{\alpha} dhar2019run .
  • (31) S. Still and D. Daimer, “Partially observable Szilard engines,” arXiv preprint arXiv:2103.15803, 2021.
  • (32) P. L. Krapivsky, S. Redner, and E. Ben-Naim, A Kinetic View of Statistical Physics.   Cambridge University Press, Nov. 2010.
  • (33) J. Lekscha, H. Wilming, J. Eisert, and R. Gallego, “Quantum thermodynamics with local control,” Physical Review E, vol. 97, no. 2, p. 022142, Feb. 2018.
  • (34) N. Gershenfeld, “Signal entropy and the thermodynamics of computation,” IBM Systems Journal, vol. 35, no. 3.4, pp. 577–586, 1996.
  • (35) A. B. Boyd, D. Mandal, and J. P. Crutchfield, “Thermodynamics of modularity: Structural costs beyond the Landauer bound,” Phys. Rev. X, vol. 8, p. 031036, Aug 2018.
  • (36) G. Schlosser and G. P. Wagner, Modularity in development and evolution.   University of Chicago Press, 2004.
  • (37) O. Sporns and R. F. Betzel, “Modular brain networks,” Annual review of psychology, vol. 67, pp. 613–640, 2016.
  • (38) One can also apply the results in this section to Fokker-Planck equations that can be put in the form of Eq. 62 via an appropriate change of variables, see (risken1996fokker, , Sec. 4.9).
  • (39) The multi-information is a well-known generalization of mutual information, which is also sometimes called “total correlation” (watanabe1960information, ).
  • (40) E. Craig, N. Kuwada, B. Lopez, and H. Linke, “Feedback control in flashing ratchets,” Annalen der Physik, vol. 17, no. 2-3, pp. 115–129, Feb. 2008.
  • (41) M. Feito and F. J. Cao, “Information and maximum power in a feedback controlled Brownian ratchet,” The European Physical Journal B, vol. 59, no. 1, pp. 63–68, Sep. 2007.
  • (42) B. J. Lopez, N. J. Kuwada, E. M. Craig, B. R. Long, and H. Linke, “Realization of a feedback controlled flashing ratchet,” Physical Review Letters, vol. 101, no. 22, p. 220601, Nov. 2008.
  • (43) G. Nicolis, “Transformation properties of entropy production,” Physical Review E, vol. 83, no. 1, p. 011112, 2011.
  • (44) If ξ(x)=Wx\xi(x)=Wx and WWTIWW^{T}\neq I, one can define an equivalent, rescaled coarse-graining function ξ(x)=Wx\xi^{\prime}(x)=W^{\prime}x, where W:=(WWT)1/2WW^{\prime}:=(WW^{T})^{-1/2}W, which obeys WWT=IW^{\prime}W^{\prime T}=I.
  • (45) M. H. Duong, A. Lamacz, M. A. Peletier, A. Schlichting, and U. Sharma, “Quantification of coarse-graining error in Langevin and overdamped Langevin dynamics,” Nonlinearity, vol. 31, no. 10, p. 4517, 2018.
  • (46) M. Esposito, “Stochastic thermodynamics under coarse graining,” Physical Review E, vol. 85, no. 4, p. 041125, 2012.
  • (47) H. Wilming, R. Gallego, and J. Eisert, “Second law of thermodynamics under control restrictions,” Phys. Rev. E, vol. 93, p. 042126, Apr 2016.
  • (48) M. Perarnau-Llobet, A. Riera, R. Gallego, H. Wilming, and J. Eisert, “Work and entropy production in generalised Gibbs ensembles,” New Journal of Physics, vol. 18, no. 12, p. 123035, Dec. 2016.
  • (49) M. Esposito, R. Kawai, K. Lindenberg, and C. Van den Broeck, “Finite-time thermodynamics for a single-level quantum dot,” EPL (Europhysics Letters), vol. 89, no. 2, p. 20003, 2010.
  • (50) D. A. Sivak and G. E. Crooks, “Thermodynamic metrics and optimal paths,” Physical Review Letters, vol. 108, no. 19, p. 190602, 2012.
  • (51) N. Shiraishi, K. Funo, and K. Saito, “Speed limit for classical stochastic processes,” Phys. Rev. Lett., vol. 121, p. 070601, Aug 2018.
  • (52) A. Gomez-Marin, T. Schmiedl, and U. Seifert, “Optimal protocols for minimal work processes in underdamped stochastic thermodynamics,” The Journal of chemical physics, vol. 129, no. 2, p. 024114, 2008.
  • (53) H. Then and A. Engel, “Computing the optimal protocol for finite-time processes in stochastic thermodynamics,” Physical Review E, vol. 77, no. 4, p. 041105, 2008.
  • (54) P. R. Zulkowski and M. R. DeWeese, “Optimal finite-time erasure of a classical bit,” Physical Review E, vol. 89, no. 5, p. 052140, 2014.
  • (55) T. Schmiedl and U. Seifert, “Optimal finite-time processes in stochastic thermodynamics,” Physical Review Letters, vol. 98, no. 10, p. 108301, 2007.
  • (56) B. B. Machta, “Dissipation bound for thermodynamic control,” Physical Review Letters, vol. 115, no. 26, p. 260603, 2015.
  • (57) G. Verley, C. V. d. Broeck, and M. Esposito, “Work statistics in stochastically driven systems,” New Journal of Physics, vol. 16, no. 9, p. 095001, 2014.
  • (58) A. E. Allahverdyan, R. Balian, and T. M. Nieuwenhuizen, “Maximal work extraction from finite quantum systems,” EPL (Europhysics Letters), vol. 67, no. 4, p. 565, 2004.
  • (59) R. Uzdin and S. Rahav, “Passivity deformation approach for the thermodynamics of isolated quantum setups,” PRX Quantum, vol. 2, no. 1, p. 010336, 2021.
  • (60) A. P. Solon and J. M. Horowitz, “Phase transition in protocols minimizing work fluctuations,” Physical Review Letters, vol. 120, no. 18, p. 180605, 2018.
  • (61) T. R. Gingrich, G. M. Rotskoff, G. E. Crooks, and P. L. Geissler, “Near-optimal protocols in complex nonequilibrium transformations,” Proceedings of the National Academy of Sciences, vol. 113, no. 37, pp. 10 263–10 268, 2016.
  • (62) E. Aurell, C. Mejía-Monasterio, and P. Muratore-Ginanneschi, “Optimal protocols and optimal transport in stochastic thermodynamics,” Physical Review Letters, vol. 106, no. 25, p. 250601, 2011.
  • (63) S. Still, “Thermodynamic cost and benefit of memory,” Physical Review Letters, vol. 124, no. 5, p. 050601, 2020.
  • (64) That paper proposed to divide the system into two subsystems YY and Y¯\bar{Y}, such that the accessible information is given by I(M;Y)I(M;Y), under three assumption: (1) the system’s marginal distributions remains constant during all steps of feedback control, (2) the conditional distribution of Y¯\bar{Y} given the system and the measurement does not change during the driving, and (3) all conditional information about Y¯\bar{Y} is lost by the time that driving begins. After private communication with the author of still2020thermodynamic , we think that condition (3) may need to be formalized as p(y¯(t2)|y(t2),z(t0))=p(y¯(t2)|y(t2))p(\bar{y}(t_{2})|y(t_{2}),z(t_{0}))=p(\bar{y}(t_{2})|y(t_{2})), although this equation does not appear in that paper.
  • (65) D. H. Wolpert, “The stochastic thermodynamics of computation,” Journal of Physics A: Mathematical and Theoretical, 2019.
  • (66) ——, “Uncertainty relations and fluctuation theorems for Bayes nets,” Phys. Rev. Lett., vol. 125, p. 200602, Nov 2020.
  • (67) ——, “Fluctuation theorems for multipartite processes,” arXiv:2003.11144, 2020.
  • (68) ——, “Minimal entropy production rate of interacting systems,” New Journal of Physics, vol. 22, no. 11, p. 113013, 2020.
  • (69) ——, “Strengthened Landauer bound for composite systems,” arXiv:2007.10950, 2020.
  • (70) The reader should be aware that those papers used different terminology from this paper. In wolpert2020fluctuation ; wolpert2020minimal , each degree of freedom vVv\in V is called a “subsystem”, the modular decomposition 𝒞\mathcal{C} is called a “unit structure”, while each A𝒞A\in\mathcal{C} is called a “unit”.
  • (71) M. Lostaglio, D. Jennings, and T. Rudolph, “Description of quantum coherence in thermodynamic processes requires constraints beyond free energy,” Nature Communications, vol. 6, no. 1, p. 6383, May 2015.
  • (72) J. P. Santos, L. C. Céleri, G. T. Landi, and M. Paternostro, “The role of quantum coherence in non-equilibrium entropy production,” npj Quantum Information, vol. 5, no. 1, p. 23, Dec. 2019.
  • (73) C. Elphick and P. Wocjan, “Spectral lower bounds for the quantum chromatic number of a graph,” Journal of Combinatorial Theory, Series A, vol. 168, pp. 338–347, 2019.
  • (74) M. L. Eaton, “Group invariance applications in statistics,” in Regional conference series in Probability and Statistics.   JSTOR, 1989, pp. i–133.
  • (75) E. Chitambar and G. Gour, “Quantum resource theories,” Reviews of Modern Physics, vol. 91, no. 2, p. 025001, 2019.
  • (76) R. Gallego, J. Eisert, and H. Wilming, “Thermodynamic work from operational principles,” New Journal of Physics, vol. 18, no. 10, p. 103017, Oct. 2016.
  • (77) F. G. S. L. Brandão, M. Horodecki, J. Oppenheim, J. M. Renes, and R. W. Spekkens, “Resource theory of quantum states out of thermal equilibrium,” Phys. Rev. Lett., vol. 111, p. 250404, Dec 2013.
  • (78) M. Lostaglio, M. P. Müller, and M. Pastena, “Stochastic independence as a resource in small-scale thermodynamics,” Phys. Rev. Lett., vol. 115, p. 150402, Oct 2015.
  • (79) P. Faist and R. Renner, “Fundamental work cost of quantum processes,” Phys. Rev. X, vol. 8, p. 021011, Apr 2018.
  • (80) N. Yunger Halpern and J. M. Renes, “Beyond heat baths: Generalized resource theories for small-scale thermodynamics,” Phys. Rev. E, vol. 93, p. 022126, Feb 2016.
  • (81) I. Marvian and R. W. Spekkens, “Extending Noether’s theorem by quantifying the asymmetry of quantum states,” Nature Communications, vol. 5, no. 1, Sep. 2014.
  • (82) ——, “Asymmetry properties of pure quantum states,” Phys. Rev. A, vol. 90, p. 014102, Jul 2014.
  • (83) ——, “Modes of asymmetry: The application of harmonic analysis to symmetric quantum dynamics and quantum reference frames,” Phys. Rev. A, vol. 90, p. 062110, Dec 2014.
  • (84) Z.-W. Liu, X. Hu, and S. Lloyd, “Resource destroying maps,” Physical Review Letters, vol. 118, no. 6, p. 060502, 2017.
  • (85) H. K. Lee, C. Kwon, and H. Park, “Fluctuation theorems and entropy production with odd-parity variables,” Physical Review Letters, vol. 110, no. 5, p. 050602, 2013.
  • (86) S. Ito, “Stochastic thermodynamic interpretation of information geometry,” Physical Review Letters, vol. 121, no. 3, p. 030605, 2018.
  • (87) K. Takahashi, “Shortcuts to adiabaticity applied to nonequilibrium entropy production: an information geometry viewpoint,” New Journal of Physics, vol. 19, no. 11, p. 115007, 2017.
  • (88) S. Ito, M. Oizumi, and S.-i. Amari, “Unified framework for the entropy production and the stochastic interaction based on information geometry,” Physical Review Research, vol. 2, no. 3, p. 033048, 2020.
  • (89) S. B. Nicholson, A. del Campo, and J. R. Green, “Nonequilibrium uncertainty principle from information geometry,” Physical Review E, vol. 98, no. 3, p. 032106, 2018.
  • (90) S. Ito and A. Dechant, “Stochastic time evolution, information geometry, and the Cramér-Rao bound,” Physical Review X, vol. 10, no. 2, p. 021056, 2020.
  • (91) T. Nakamura, H. Hasegawa, and D. Driebe, “Reconsideration of the generalized second law based on information geometry,” Journal of Physics Communications, vol. 3, no. 1, p. 015015, 2019.
  • (92) D. Petz, Quantum Information Theory and Quantum Statistics, ser. Theoretical and Mathematical Physics.   Berlin: Springer, 2008.
  • (93) E. T. Jaynes, “The Gibbs paradox,” in Maximum Entropy and Bayesian Methods.   Springer, 1992, pp. 1–21.
  • (94) M. Weilenmann, L. Kraemer, P. Faist, and R. Renner, “Axiomatic relation between thermodynamic and information-theoretic entropies,” Physical Review Letters, vol. 117, no. 26, p. 260601, 2016.
  • (95) A. Bérut, A. Arakelyan, A. Petrosyan, S. Ciliberto, R. Dillenschneider, and E. Lutz, “Experimental verification of Landauer’s principle linking information and thermodynamics,” Nature, vol. 483, no. 7388, pp. 187–189, 2012.
  • (96) É. Roldán, I. A. Martinez, J. M. Parrondo, and D. Petrov, “Universal features in the energetics of symmetry breaking,” Nature Physics, vol. 10, no. 6, pp. 457–461, 2014.
  • (97) J. V. Koski, V. F. Maisi, T. Sagawa, and J. P. Pekola, “Experimental observation of the role of mutual information in the nonequilibrium dynamics of a Maxwell demon,” Physical Review Letters, vol. 113, no. 3, p. 030601, 2014.
  • (98) K. Shizume, “Heat generation required by information erasure,” Physical Review E, vol. 52, no. 4, p. 3495, 1995.
  • (99) Z. Gong, Y. Lan, and H. T. Quan, “Stochastic thermodynamics of a particle in a box,” Physical Review Letters, vol. 117, no. 18, p. 180603, 2016.
  • (100) L. Szilard, “Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen,” Zeitschrift für Physik, vol. 53, no. 11-12, pp. 840–856, 1929.
  • (101) K. Proesmans, C. Driesen, B. Cleuren, and C. Van den Broeck, “Efficiency of single-particle engines,” Physical review E, vol. 92, no. 3, p. 032105, 2015.
  • (102) T. Hondou, “Equation of state in a small system: Violation of an assumption of Maxwell’s demon,” EPL (Europhysics Letters), vol. 80, no. 5, p. 50001, 2007.
  • (103) D. Bhat, S. Sabhapandit, A. Kundu, and A. Dhar, “Unusual equilibration of a particle in a potential with a thermal wall,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2017, no. 11, p. 113210, 2017.
  • (104) A. Dhar, A. Kundu, S. N. Majumdar, S. Sabhapandit, and G. Schehr, “Run-and-tumble particle in one-dimensional confining potentials: Steady-state, relaxation, and first-passage properties,” Physical Review E, vol. 99, no. 3, p. 032132, 2019.
  • (105) S. Watanabe, “Information theoretical analysis of multivariate correlation,” IBM Journal of research and development, vol. 4, no. 1, pp. 66–82, 1960.
  • (106) I. Csiszar and J. Körner, Information theory: coding theorems for discrete memoryless systems.   Cambridge University Press, 2011.
  • (107) T. M. Cover and J. A. Thomas, Elements of information theory.   John Wiley & Sons, 2006.
  • (108) C. Van den Broeck and M. Esposito, “Ensemble and trajectory thermodynamics: A brief introduction,” Physica A: Statistical Mechanics and its Applications, vol. 418, pp. 6–16, 2015.
  • (109) U. Seifert, “Entropy production along a stochastic trajectory and an integral fluctuation theorem,” Physical Review Letters, vol. 95, no. 4, p. 040602, 2005.
  • (110) M. Esposito and C. Van den Broeck, “Three detailed fluctuation theorems,” Phys. Rev. Lett., vol. 104, p. 090601, Mar 2010.
  • (111) R. E. Spinney and I. J. Ford, “Entropy production in full phase space for continuous stochastic dynamics,” Physical Review E, vol. 85, no. 5, p. 051113, 2012.
  • (112) Y. Murashita, K. Funo, and M. Ueda, “Nonequilibrium equalities in absolutely irreversible processes,” Physical Review E, vol. 90, no. 4, p. 042110, 2014.
  • (113) C. Jarzynski, “Equalities and inequalities: irreversibility and the second law of thermodynamics at the nanoscale,” Annu. Rev. Condens. Matter Phys., vol. 2, no. 1, pp. 329–351, 2011.
  • (114) J. J. Benedetto and W. Czaja, Integration and Modern Analysis.   Boston: Birkhäuser Boston, 2009.
  • (115) A. C. Barato and U. Seifert, “Coherence of biochemical oscillations is bounded by driving force and network topology,” Physical Review E, vol. 95, no. 6, p. 062409, 2017.
  • (116) C. N. Yang, “The spontaneous magnetization of a two-dimensional Ising model,” Physical Review, vol. 85, no. 5, p. 808, 1952.
  • (117) K.-J. Engel and R. Nagel, One-Parameter Semigroups for Linear Evolution Equations, ser. Graduate Texts in Mathematics.   New York: Springer-Verlag, 2000.
  • (118) A. Gomez-Marin, J. M. Parrondo, and C. Van den Broeck, “Lower bounds on dissipation upon coarse graining,” Physical Review E, vol. 78, no. 1, p. 011107, 2008.

Appendix A Derivations for Sections III and IV

A.1 Proofs of Theorems 1 and 2

We first prove a few helpful lemmas.

Lemma 1.

If LL obeys eLϕ(p)=ϕ(eLp)e^{L}\phi(p)=\phi(e^{L}p) for all p𝒫p\in\mathcal{P}, then LL has a stationary distribution πimgϕ\pi\in\mathrm{img}\;\phi.

Proof.

Let qq be some stationary distribution of LL. Then,

eLϕ(q)=ϕ(eLq)=ϕ(q).\displaystyle e^{L}\phi(q)=\phi(e^{L}q)=\phi(q). (105)

Thus, ϕ(q)imgϕ\phi(q)\in\mathrm{img}\;\phi is stationary under LL. ∎

Lemma 2.

If eτLϕ(p)=ϕ(eτLp)e^{\tau L}\phi(p)=\phi(e^{\tau L}p) for all p𝒫p\in\mathcal{P} and τ0\tau\geq 0, then for any r,s𝒫r,s\in\mathcal{P},

ddtD(r(t)ϕ(s(t)))0,\displaystyle-{\textstyle\frac{d}{dt}}D(r(t)\|\phi(s(t)))\geq 0,

where tr=Lr{\textstyle\partial_{t}}r=Lr and ts=Ls{\textstyle\partial_{t}}s=Ls.

Proof.

Expand the derivative as

ddtD(r(t)ϕ(s(t)))\displaystyle-{\textstyle\frac{d}{dt}}D(r(t)\|\phi(s(t)))
=limτ01τ[D(rϕ(s))D(eτLrϕ(eτLs))]\displaystyle\quad=\lim_{\tau\to 0}\frac{1}{\tau}\left[D(r\|\phi(s))-D(e^{\tau L}r\|\phi(e^{\tau L}s))\right]
=limτ01τ[D(rϕ(s))D(eτLreτLϕ(s))]0.\displaystyle\quad=\lim_{\tau\to 0}\frac{1}{\tau}\left[D(r\|\phi(s))-D(e^{\tau L}r\|e^{\tau L}\phi(s))\right]\geq 0.

where in the last line we used the commutativity relation and the data processing inequality for KL divergence csiszar_information_2011 . ∎

Lemma 3.

Consider a protocol {L(t):t[0,1]}\{L(t):t\in[0,1]\} and an operator ϕ\phi that obeys Eqs. 14 and 16. Then

ϕ(p(t))=ϕ(p)(t),\phi(p(t))=\phi(p)(t),

where p(t)p(t) is the distribution at time tt given initial distribution pp, and ϕ(p)(t)\phi(p)(t) is the distribution at time tt given initial distribution ϕ(p)\phi(p).

Proof.

Using Lemma 2 with r=ϕ(p)(t)r=\phi(p)(t) and s=p(t)s=p(t),

ddtD(ϕ(p)(t)ϕ(p(t)))0.\displaystyle{\textstyle\frac{d}{dt}}D(\phi(p)(t)\|\phi(p(t)))\leq 0. (106)

Note that

D([ϕ(p)](0)ϕ(p(0)))=D(ϕ(p)ϕ(p))=0,D([\phi(p)](0)\|\phi(p(0)))=D(\phi(p)\|\phi(p))=0,

and that D(ϕ(p)(t)ϕ(p(t)))0D(\phi(p)(t)\|\phi(p(t)))\geq 0 for all tt by non-negativity of KL divergence. Combined with Eq. 106, this implies D(ϕ(p)(t)ϕ(p(t)))=0D(\phi(p)(t)\|\phi(p(t)))=0 for all tt, and therefore ϕ(p)(t)=ϕ(p(t))\phi(p)(t)=\phi(p(t)) (cover_elements_2006, , Thm. 8.6.1). ∎

We are now ready to prove Theorems 1 and 2. Note that in the proof of Theorem 1, we make the assumption that there is some stationary distribution πL\pi^{L} of LL such that D(pπL)<D(p\|\pi^{L})<\infty, and similarly in Theorem 2 we make the assumption that D(p(t)πL(t))<D(p(t)\|\pi^{L(t)})<\infty at all t[0,1]t\in[0,1]. These are weak and physically realistic assumptions, which essentially mean that we restrict our attention to distributions with finite nonequilibrium free energy (see Eq. 20).

In addition, in these proofs we will use that the EP rate incurred by distribution pp under the generator LL with stationary distribution π\pi can be written as

Σ˙(p,L)\displaystyle\dot{\Sigma}(p,L) =limτ01τ[D(pπ)D(eτLpπ)].\displaystyle=\lim_{\tau\to 0}\frac{1}{\tau}\left[D(p\|\pi)-D(e^{\tau L}p\|\pi)\right]. (107)

This can be derived from Eq. 11, by noting that the KL divergence can be written as

D(pπ)=S(p)𝔼p[lnπ],\displaystyle D(p\|\pi)=-S(p)-\mathbb{E}_{p}\big{[}\ln\pi\big{]}, (108)

where 𝔼p\mathbb{E}_{p} indicates expectation under the distribution pp, and then using that

xtpx(t)lnpx=limτ01τ[S(eτLp)S(p)]\displaystyle-\sum_{x}{\textstyle\partial_{t}}p_{x}(t)\ln p_{x}=\lim_{\tau\to 0}\frac{1}{\tau}\left[S(e^{\tau L}p)-S(p)\right] (109)
xtpx(t)lnπx=limτ01τ[𝔼eτLp[lnπ]𝔼p[lnπ]],\displaystyle\sum_{x}{\textstyle\partial_{t}}p_{x}(t)\ln\pi_{x}=\lim_{\tau\to 0}\frac{1}{\tau}\left[\mathbb{E}_{e^{\tau L}p}[\ln\pi]-\mathbb{E}_{p}[\ln\pi]\right], (110)

where tpx(t){\textstyle\partial_{t}}p_{x}(t) is defined as in Eq. 10. (As usual, summations should be replaced by integrals for continuous-state systems.)
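
As a concrete check of Eq. 107, the following sketch (in Python, assuming NumPy and SciPy; the three-state rate matrix is our own illustrative choice) compares the finite-τ difference of KL divergences with the equivalent instantaneous form −∑_x (Lp)_x ln(p_x/π_x), which is obtained by combining Eqs. 108, 109 and 110.

import numpy as np
from scipy.linalg import expm

# Illustrative 3-state rate matrix L (columns sum to zero, so dp/dt = L p conserves
# probability) with stationary distribution pi; detailed balance holds by construction.
pi = np.array([0.2, 0.3, 0.5])
L = np.tile(pi[:, None], (1, 3))      # rate(j -> i) = pi_i for i != j
np.fill_diagonal(L, 0.0)
np.fill_diagonal(L, -L.sum(axis=0))

p = np.array([0.7, 0.2, 0.1])         # a nonequilibrium initial distribution

def kl(a, b):
    return float(np.sum(a * np.log(a / b)))

# Instantaneous form of the EP rate: -d/dt D(p(t)||pi) = -sum_x (Lp)_x ln(p_x/pi_x).
ep_rate = -float((L @ p) @ np.log(p / pi))

# The finite-tau version of Eq. 107 converges to the same value as tau -> 0.
for tau in (1e-1, 1e-2, 1e-3, 1e-4):
    p_tau = expm(tau * L) @ p
    print(tau, (kl(p, pi) - kl(p_tau, pi)) / tau, "->", ep_rate)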

Proof of Theorem 1.

Consider a generator LL with a stationary distribution π\pi, and some distribution p𝒫p\in\mathcal{P} such that D(pπ)<D(p\|\pi)<\infty. By Lemma 1, ϕ(π)imgϕ\phi(\pi)\in\mathrm{img}\;\phi is also a stationary distribution of LL. If LL has a unique stationary distribution, then π=ϕ(π)\pi=\phi(\pi) and so πimgϕ\pi\in\mathrm{img}\;\phi; otherwise, as long as D(pϕ(π))<D(p\|\phi(\pi))<\infty (see Note3 ), we can assume that ϕ(π)=π\phi(\pi)=\pi in Eq. 107. Then, assuming that πimgϕ\pi\in\mathrm{img}\;\phi, we rewrite the term in the brackets in Eq. 107 as

D(pϕ(p))+D(ϕ(p)π)\displaystyle D(p\|\phi(p))+D(\phi(p)\|\pi)
D(eτLpϕ(eτLp))D(ϕ(eτLp)π)\displaystyle\qquad\qquad-D(e^{\tau L}p\|\phi(e^{\tau L}p))-D(\phi(e^{\tau L}p)\|\pi)
=D(pϕ(p))D(eτLpϕ(eτLp))\displaystyle=D(p\|\phi(p))-D(e^{\tau L}p\|\phi(e^{\tau L}p))
+D(ϕ(p)π)D(ϕ(eτLp)π)\displaystyle\qquad\qquad+D(\phi(p)\|\pi)-D(\phi(e^{\tau L}p)\|\pi)
=D(pϕ(p))D(eτLpϕ(eτLp))\displaystyle=D(p\|\phi(p))-D(e^{\tau L}p\|\phi(e^{\tau L}p))
+D(ϕ(p)π)D(eτLϕ(p)π),\displaystyle\qquad\qquad+D(\phi(p)\|\pi)-D(e^{\tau L}\phi(p)\|\pi),

where we used the Pythagorean identity of Eq. 14, rearranged, and then used the commutativity relation of Eq. 16. Plugging into Eq. 107 gives

Σ˙(p,L)\displaystyle\dot{\Sigma}(p,L) =limτ01τ[D(pϕ(p))D(eτLpϕ(eτLp))]\displaystyle=\lim_{\tau\to 0}\frac{1}{\tau}\left[D(p\|\phi(p))-D(e^{\tau L}p\|\phi(e^{\tau L}p))\right]
+limτ01τ[D(ϕ(p)π)D(eτLϕ(p)π)]\displaystyle\quad+\lim_{\tau\to 0}\frac{1}{\tau}\left[D(\phi(p)\|\pi)-D(e^{\tau L}\phi(p)\|\pi)\right]
=ddtD(p(t)ϕ(p(t)))+Σ˙(ϕ(p),L).\displaystyle=-{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))+\dot{\Sigma}(\phi(p),L).

The non-negativity of ddtD(p(t)ϕ(p(t)))-{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t))) follows by taking r=s=pr=s=p in Lemma 2. ∎

Proof of Theorem 2.

Using Eq. 12 and Theorem 1, write

Σ(pp)=01Σ˙(p(t),L(t))𝑑t\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})=\int_{0}^{1}\dot{\Sigma}(p(t),L(t))\,dt
=01ddtD(p(t)ϕ(p(t)))𝑑t+01Σ˙(ϕ(p(t)),L(t))𝑑t.\displaystyle\;\;=-\int_{0}^{1}{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))\,dt+\int_{0}^{1}\dot{\Sigma}(\phi(p(t)),L(t))\,dt.

Both integrals have a simple expression. First, by the fundamental theorem of calculus,

01ddtD(p(t)ϕ(p(t)))𝑑t=D(pϕ(p))D(pϕ(p)).\displaystyle-\int_{0}^{1}{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))\,dt=D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime})).

This expression is non-negative, since ddtD(p(t)ϕ(p(t)))0-{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))\geq 0 by Lemma 2. Second, using Lemma 3,

01Σ˙(ϕ(p(t)),L(t))𝑑t\displaystyle\int_{0}^{1}\dot{\Sigma}(\phi(p(t)),L(t))\,dt =01Σ˙(ϕ(p)(t),L(t))𝑑t\displaystyle=\int_{0}^{1}\dot{\Sigma}(\phi(p)(t),L(t))\,dt
=Σ(ϕ(p)ϕ(p)).\displaystyle\qquad=\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})).

A.2 Trajectory-level version of Eq. 19

Stochastic thermodynamics has shown that thermodynamic properties of physical processes (such as heat, work, and EP) can be defined as stochastically fluctuating quantities at the level of individual trajectories. We first briefly review the basic concepts of stochastic thermodynamics (for more details, the reader should consult van2015ensemble ; seifert2012stochastic ; Seifert2005 ; esposito_three_2010 ).

Let 𝒙=(x,,x)\bm{x}=(x,\dots,x^{\prime}) indicate a continuous-time trajectory of system states 𝒙\bm{x} over time interval t[0,1]t\in[0,1], where xx and xx^{\prime} indicate the initial and final system states respectively, and let P(𝒙|x)P(\bm{x}|x) indicate the conditional probability of observing trajectory 𝒙\bm{x} given initial state xx. For a given initial distribution p(x)p(x), the probability of observing trajectory 𝒙\bm{x} is then given by p(𝒙)=p(x)P(𝒙|x)p(\bm{x})=p(x)P(\bm{x}|x), and the corresponding final distribution is given by p(x)=P(x|x)p(x)𝑑xp^{\prime}(x^{\prime})=\int P(x^{\prime}|x)p(x)dx. In addition, let P~(𝒙~|x)\tilde{P}(\tilde{\bm{x}}|x^{\prime}) indicate the conditional probability of observing the time-reversed trajectory 𝒙~=(x,,x)\tilde{\bm{x}}=({x^{\prime}},\dots,{x}) given the final state x{x^{\prime}} under a “time-reversed” driving protocol seifert2012stochastic .

Trajectory-level EP is then defined in terms of the asymmetry between forward and reversed trajectory probabilities,

σp(𝒙)=lnp(x)lnp(x)+lnP(𝒙|x)P~(𝒙~|x),\displaystyle\sigma_{p}(\bm{x})=\ln p(x)-\ln p^{\prime}(x^{\prime})+\ln\frac{P(\bm{x}|x)}{\tilde{P}(\tilde{\bm{x}}|x^{\prime})}, (111)

which is sometimes referred to as a detailed fluctuation theorem. (The above expression should be slightly modified in the presence of odd-parity variables such as momentum, though in a way which does not change our derivations; see spinney2012entropy .) The expectation of trajectory-level EP across all trajectories is equal to the standard expression for integrated EP as used in the main text,

σp(𝒙)=Σ(pp),\displaystyle\langle\sigma_{p}(\bm{x})\rangle=\Sigma(p\!\shortrightarrow\!p^{\prime}), (112)

where \langle\cdot\rangle refers to expectations under the trajectory distribution p(𝒙)p(\bm{x}). Furthermore, by a simple manipulation, the detailed fluctuation theorem in Eq. 111 leads to the following integral fluctuation theorem for EP,

eσp\displaystyle\langle e^{-\sigma_{p}}\rangle =p(x)>0p(x)P(𝒙|x)p(x)P~(𝒙~|x)p(x)P(𝒙|x)D𝒙\displaystyle=\int_{p(x)>0}p(x)P(\bm{x}|x)\frac{p^{\prime}(x^{\prime})\tilde{P}(\tilde{\bm{x}}|x^{\prime})}{p(x)P(\bm{x}|x)}D\bm{x}
=p(x)>0p(x)P~(𝒙~|x)D𝒙=γ,\displaystyle=\int_{p(x)>0}p^{\prime}(x^{\prime})\tilde{P}(\tilde{\bm{x}}|x^{\prime})D\bm{x}=\gamma, (113)

where D𝒙\int\;\cdot\;D\bm{x} is the path integral. In this result, γ(0,1]\gamma\in(0,1] reflects the “absolute irreversibility” of the process under initial distribution pp murashita2014nonequilibrium . When pp has full support, γ=1\gamma=1, giving the standard integral fluctuation theorem, eσp=1\langle e^{-\sigma_{p}}\rangle=1.
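
To illustrate Eqs. 111, 112 and 113 concretely, the following minimal sketch (in Python with NumPy) uses a single-step, discrete-time process as a simplified stand-in for a full continuous-time protocol: for a transition matrix T obeying detailed balance with respect to π, the reverse kernel equals T itself, so σ(x→x′)=ln p(x)−ln p′(x′)+ln[T(x′|x)/T(x|x′)]. This single-step setup is our own illustrative simplification, not part of the derivation above.

import numpy as np

pi = np.array([0.2, 0.3, 0.5])
T = 0.5 * np.tile(pi[:, None], (1, 3))       # T[x', x]: probability of a jump x -> x'
np.fill_diagonal(T, 0.0)
np.fill_diagonal(T, 1.0 - T.sum(axis=0))     # columns sum to one
assert np.allclose(T * pi[None, :], (T * pi[None, :]).T)   # detailed balance w.r.t. pi

p = np.array([0.6, 0.3, 0.1])                # full-support initial distribution
p_final = T @ p

# Trajectory-level EP for each one-step trajectory (x, x'), as in Eq. 111.
sigma = np.log(p)[None, :] - np.log(p_final)[:, None] + np.log(T) - np.log(T.T)
path_prob = T * p[None, :]                   # joint probability of (x, x')

print("average EP   :", np.sum(path_prob * sigma))            # >= 0, cf. Eq. 112
print("<exp(-sigma)>:", np.sum(path_prob * np.exp(-sigma)))   # = 1, cf. Eq. 113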

Now consider the extra trajectory-level EP incurred by some trajectory 𝒙\bm{x} on initial distribution pp, additional to the trajectory-level EP incurred by the same trajectory on initial distribution ϕ(p)\phi(p),

m(𝒙)\displaystyle m(\bm{x}) :=σp(𝒙)σϕ(p)(𝒙)\displaystyle:=\sigma_{p}(\bm{x})-\sigma_{\phi(p)}(\bm{x}) (114)
=lnp(x)ϕ(p)(x)lnp(x)ϕ(p)(x)\displaystyle=\ln\frac{p(x)}{\phi(p)(x)}-\ln\frac{p^{\prime}(x^{\prime})}{\phi(p)^{\prime}(x^{\prime})} (115)
=lnp(x)ϕ(p)(x)lnp(x)ϕ(p)(x)\displaystyle=\ln\frac{p(x)}{\phi(p)(x)}-\ln\frac{p^{\prime}(x^{\prime})}{\phi(p^{\prime})(x^{\prime})} (116)

where in the second line we used that the last term in Eq. 111 cancels (as it does not depend on the initial or final distributions) and in the third line we used that ϕ(p)=ϕ(p)\phi(p)^{\prime}=\phi(p^{\prime}) by Lemma 3. Eq. 114 appears in the main text as Eq. 30. It is easy to verify that m(𝒙)m(\bm{x}) agrees in expectation with the contraction of KL divergence between pp and ϕ(p)\phi(p),

m\displaystyle\langle m\rangle =D(pϕ(p))D(pϕ(p)),\displaystyle=D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime})), (117)

where, as before, \langle\cdot\rangle refers to expectations under the trajectory distribution p(𝒙)p(\bm{x}). Then, given Theorem 2, this implies that the expectation of m(𝒙)m(\bm{x}) is also equal to the extra total EP incurred by initial distribution pp rather than the accessible distribution ϕ(p)\phi(p),

m\displaystyle\langle m\rangle =Σ(pp)Σ(ϕ(p)ϕ(p)).\displaystyle=\Sigma(p\!\shortrightarrow\!p^{\prime})-\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})). (118)

In kolchinsky2021state , it is shown that m(𝒙)m(\bm{x}) obeys a fluctuation theorem (see also kwon2019fluctuation ). We re-derive the relevant results here. First, a simple rearrangement of Eq. 115 gives the following detailed fluctuation theorem,

m(𝒙)\displaystyle m(\bm{x}) :=lnp(x)p(x)+lnP(𝒙|x)Q(𝒙~|x),\displaystyle:=\ln\frac{p(x)}{p^{\prime}(x^{\prime})}+\ln\frac{P(\bm{x}|x)}{Q(\tilde{\bm{x}}|x^{\prime})}, (119)

where the conditional distribution Q(𝒙~|x)Q(\tilde{\bm{x}}|x^{\prime}) is given by

Q(𝒙~|x):=P(𝒙|x)ϕ(p)(x)ϕ(p)(x).\displaystyle Q(\tilde{\bm{x}}|x^{\prime}):=\frac{P(\bm{x}|x)\phi(p)(x)}{\phi(p)^{\prime}(x^{\prime})}.

In words, Q(\tilde{\bm{x}}|x^{\prime}) is the Bayesian posterior probability of trajectory \bm{x} given final state x^{\prime}, when the process begins on initial distribution \phi(p). A similar derivation as in Eq. 113 shows that m obeys an integral fluctuation theorem,

\langle e^{-m}\rangle=\int_{p(x)>0}p^{\prime}(x^{\prime})Q(\tilde{\bm{x}}|x^{\prime})\,D\bm{x}=\chi.

Here \chi\in(0,1] indicates the absolute irreversibility of the process on initial distribution p relative to initial distribution \phi(p). \chi is equal to 1 when p and \phi(p) have the same support, which then leads to a standard integral fluctuation theorem, \langle e^{-m}\rangle=1.

Importantly, this integral fluctuation theorem implies that the probability that the trajectory-level EP on initial distribution p is \xi less than the trajectory-level EP on initial distribution \phi(p) is exponentially suppressed,

\mathrm{P}[\sigma_{p}<\sigma_{\phi(p)}-\xi]\stackrel{(a)}{=}\mathrm{P}[m<-\xi]\stackrel{(b)}{\leq}\chi e^{-\xi}\stackrel{(c)}{\leq}e^{-\xi}.

Here, (a) uses the definition of m(\bm{x}), (b) uses a standard derivation in stochastic thermodynamics (see jarzynski_equalities_2011 , or the appendix in kolchinsky2021state ), while (c) uses that \chi\in(0,1].

Appendix B Symmetry constraints

B.1 ϕ𝒢\phi_{\mathcal{G}} obeys the Pythagorean identity, Eq. 14

In the following derivations, all integrals should be understood in the Lebesgue sense. For discrete-state systems, integrals over XX can be replaced by summations.

The state space XX is assumed to be Borel measurable. Similarly, we assume that the action of the group 𝒢\mathcal{G} (i.e., the function 𝒢×XX:(g,x)g(x)\mathcal{G}\times X\to X:(g,x)\mapsto g(x)) is Borel measurable. Note that these assumptions imply that for any probability distribution p𝒫p\in\mathcal{P}, the function (g,x)p(g(x))(g,x)\mapsto p(g(x)) is measurable, since it is the composition of two Borel measurable functions: (g,x)g(x)(g,x)\mapsto g(x) and xp(x)x\mapsto p(x).

We begin with a few intermediate results.

Lemma 4.

For any p𝒫p\in\mathcal{P}, g𝒢g\in\mathcal{G}, and xXx\in X,

ϕ𝒢(p)(x)=ϕ𝒢(p)(g(x)).\phi_{\mathcal{G}}(p)(x)=\phi_{\mathcal{G}}(p)(g(x)).
Proof.

Using the definition of ϕ𝒢\phi_{\mathcal{G}} in Eq. 42, write

ϕ𝒢(p)(g(x))\displaystyle\phi_{\mathcal{G}}(p)(g(x)) =𝒢p(g(g(x)))𝑑μ(g)\displaystyle=\int_{\mathcal{G}}p(g^{\prime}(g(x)))\,d\mu(g^{\prime})
=𝒢p(g(x))𝑑μ(g)=ϕ𝒢(p)(x),\displaystyle={\textstyle\int_{\mathcal{G}}}\,p(g^{\prime}(x))\,d\mu(g^{\prime})=\phi_{\mathcal{G}}(p)(x),

where we performed a change of variables xg1(x)x\mapsto g^{-1}(x) and used the invariance properties of 𝒢\mathcal{G} and the Haar measure μ\mu. ∎
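
For a finite state space and a finite group, both Lemma 4 and the Pythagorean identity established below in Proposition 1 can be checked numerically. The following minimal sketch (in Python with NumPy) does so for the illustrative choice of the two-element reflection group acting on a discrete line, for which the twirling operator of Eq. 42 reduces to an average over the orbit of each state.

import numpy as np

rng = np.random.default_rng(0)
n = 8

def twirl(p):
    # Twirling over G = {identity, reflection x -> n-1-x}: average over the group orbit.
    return 0.5 * (p + p[::-1])

def kl(a, b):
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()

# Lemma 4: the twirled distribution is invariant under the group action.
assert np.allclose(twirl(p), twirl(p)[::-1])

# Proposition 1: D(p || phi(q)) = D(p || phi(p)) + D(phi(p) || phi(q)).
lhs = kl(p, twirl(q))
rhs = kl(p, twirl(p)) + kl(twirl(p), twirl(q))
print(lhs, rhs)   # agree up to floating-point error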

Lemma 5.

For any p𝒫p\in\mathcal{P}, measurable set ΩX\Omega\subseteq X, and function f:Xf:X\to\mathbb{R},

\int_{\Omega}p(x)f(x)\,dx=\int_{\Omega}\phi_{\mathcal{G}}(p)(x)f(x)\,dx (120)

if the following three conditions hold: (1) g(Ω)=Ωg(\Omega)=\Omega for all g𝒢g\in\mathcal{G}, (2) f(x)=f(g(x))f(x)=f(g(x)) for all xXx\in X and g𝒢g\in\mathcal{G}, (3) either |Ωp(x)f(x)𝑑x|<|\int_{\Omega}p(x)f(x)\,dx|<\infty, or ff is measurable and non-negative.

Proof.

To begin, write the left hand side of Eq. 120 as

Ωp(x)f(x)𝑑x\displaystyle\int_{\Omega}p(x)f(x)\,dx =𝒢[Ωp(x)f(x)𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{\Omega}p(x)f(x)\,dx\right]d\mu(g)
=𝒢[g1(Ω)p(g(x))f(g(x))𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{g^{-1}(\Omega)}p(g(x))f(g(x))\,dx\right]d\mu(g)
=𝒢[Ωp(g(x))f(x)𝑑x]𝑑μ(g).\displaystyle=\int_{\mathcal{G}}\left[\int_{\Omega}p(g(x))f(x)\,dx\right]d\mu(g). (121)

In the second line, we substituted xg(x)x\mapsto g(x) within each inner integral, while using that each gg is a rigid transformation (so the absolute value of its Jacobian is 1). In the last line, we used conditions (1) and (2).

We now show that we can exchange the order of integrals in Eq. 121 using condition (3) and Tonelli’s theorem. First, if ff is measurable and non-negative, then the function xp(g(x))f(x)x\mapsto p(g(x))f(x) is non-negative and measurable (since it is a product of two non-negative measurable functions), so the integrals can be exchanged by (Thm 3.7.7, benedetto_integration_2009, ). Alternatively, assume that |Ωp(x)f(x)𝑑x|<|\int_{\Omega}p(x)f(x)\,dx|<\infty, which means that the function xp(x)f(x)x\mapsto p(x)f(x) is integrable. This implies that

\displaystyle\infty >Ωp(x)|f(x)|𝑑x\displaystyle>\int_{\Omega}p(x)|f(x)|\,dx
=𝒢[Ωp(x)|f(x)|𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{\Omega}p(x)|f(x)|\,dx\right]d\mu(g) (122)
=𝒢[g1(Ω)p(g(x))|f(g(x))|𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{g^{-1}(\Omega)}p(g(x))|f(g(x))|\,dx\right]d\mu(g)
=𝒢[Ωp(g(x))|f(x)|𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{\Omega}p(g(x))|f(x)|\,dx\right]d\mu(g) (123)

where the first line follows from definition of Lebesgue integrability, while the rest follows from the same steps as Eq. 121. Given Eq. 123, the function (g,x)p(g(x))f(x)(g,x)\mapsto p(g(x))f(x) must be integrable, which again allows us to exchange the order of the integrals in Eq. 121 (Thm 3.7.8, benedetto_integration_2009, ).

We then derive our result by rewriting Eq. 121 as

Ωp(x)f(x)𝑑x\displaystyle\int_{\Omega}p(x)f(x)\,dx =Ω[𝒢p(g(x))f(x)𝑑μ(g)]𝑑x\displaystyle=\int_{\Omega}\left[\int_{\mathcal{G}}p(g(x))f(x)\,d\mu(g)\right]dx
=Ωϕ𝒢(p)(x)f(x)𝑑x,\displaystyle=\int_{\Omega}\phi_{\mathcal{G}}(p)(x)f(x)\,dx,

where we used the definition of ϕ𝒢\phi_{\mathcal{G}}. ∎

Finally, we prove that ϕ𝒢\phi_{\mathcal{G}} obeys the Pythagorean identity.

Proposition 1.

For any p,q𝒫p,q\in\mathcal{P} such that D(pϕ𝒢(q))<D(p\|\phi_{\mathcal{G}}(q))<\infty,

D(pϕ𝒢(q))=D(pϕ𝒢(p))+D(ϕ𝒢(p)ϕ𝒢(q)).\displaystyle D(p\|\phi_{\mathcal{G}}(q))=D(p\|\phi_{\mathcal{G}}(p))+D(\phi_{\mathcal{G}}(p)\|\phi_{\mathcal{G}}(q)). (124)
Proof.

For any p𝒫p\in\mathcal{P}, we indicate the support set as suppp={xX:p(x)>0}\mathrm{supp}\;p=\{x\in X:p(x)>0\}. We first prove that

supppsuppϕ𝒢(p)suppϕ𝒢(q).\mathrm{supp}\;p\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(q)}. (125)

By the definition of ϕ𝒢\phi_{\mathcal{G}} in Eq. 42, if ϕ𝒢(p)(x)>0\phi_{\mathcal{G}}(p)(x)>0 for some xXx\in X, then p(g(x))>0p(g(x))>0 for that xx and some g𝒢g\in\mathcal{G}. In addition, the assumption that D(pϕ𝒢(q))<D(p\|\phi_{\mathcal{G}}(q))<\infty implies that supppsuppϕ𝒢(q)\mathrm{supp}\;p\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(q)} (cover_elements_2006, ) (except for a set of measure 0, which we can safely ignore). Combining these facts implies that if ϕ𝒢(p)(x)>0\phi_{\mathcal{G}}(p)(x)>0 for some xx, then ϕ𝒢(q)(g(x))>0\phi_{\mathcal{G}}(q)(g(x))>0 for that xx — and therefore also ϕ𝒢(q)(x)>0\phi_{\mathcal{G}}(q)(x)>0 since ϕ𝒢(q)\phi_{\mathcal{G}}(q) is invariant under 𝒢\mathcal{G}, Lemma 4. This proves that suppϕ𝒢(p)suppϕ𝒢(q)\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(q)}. Finally, by Lemma 4 and Lemma 5,

suppϕ𝒢(p)p(x)𝑑x=suppϕ𝒢(p)ϕ𝒢(p)(x)𝑑x=1,\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\,dx=\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}\phi_{\mathcal{G}}(p)(x)\,dx=1,

which implies that supppsuppϕ𝒢(p)\mathrm{supp}\;p\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(p)} (up to a set of measure 0).

Next, write the KL divergence on the left hand side of Eq. 124 as (Eq. 8.58, cover_elements_2006, )

D(pϕ𝒢(q))=supppp(x)lnp(x)ϕ𝒢(q)(x)dx\displaystyle D(p\|\phi_{\mathcal{G}}(q))=\int_{\mathrm{supp}\;p}p(x)\ln\frac{p(x)}{\phi_{\mathcal{G}}(q)(x)}dx
=D(pϕ𝒢(p))+supppp(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx\displaystyle\quad=D(p\|\phi_{\mathcal{G}}(p))+\int_{\mathrm{supp}\;p}p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx
=D(pϕ𝒢(p))+suppϕ𝒢(p)p(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx,\displaystyle\quad=D(p\|\phi_{\mathcal{G}}(p))+\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx, (126)

where the last line uses Eq. 125 (in particular, that supppsuppϕ𝒢(p)\mathrm{supp}\;p\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(p)} and p(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)=0p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}=0 for xsuppϕ𝒢(p)supppx\in\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}\setminus\mathrm{supp}\;p).

The integral in Eq. 126 is bounded from above by $D(p\|\phi_{\mathcal{G}}(q))<\infty$, since $D(p\|\phi_{\mathcal{G}}(p))\geq 0$. We also show that this integral is bounded from below. Note that $\phi_{\mathcal{G}}(p)(x)$ and $\phi_{\mathcal{G}}(q)(x)$ are both non-negative measurable functions, which follows from the fact that $x\mapsto p(g(x))$ and $x\mapsto q(g(x))$ are non-negative measurable functions, the definition of $\phi_{\mathcal{G}}$, and Tonelli’s theorem (Thm 3.7.7, benedetto_integration_2009, ). Thus, the function $x\mapsto\frac{\phi_{\mathcal{G}}(q)(x)}{\phi_{\mathcal{G}}(p)(x)}$ is also non-negative and measurable, letting us bound the integral in the following way:

suppϕ𝒢(p)p(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx\displaystyle\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx
ln[suppϕ𝒢(p)p(x)ϕ𝒢(q)(x)ϕ𝒢(p)(x)𝑑x]\displaystyle\qquad\geq-\ln\left[\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\frac{\phi_{\mathcal{G}}(q)(x)}{\phi_{\mathcal{G}}(p)(x)}dx\right]
=ln[suppϕ𝒢(p)ϕ𝒢(p)(x)ϕ𝒢(q)(x)ϕ𝒢(p)(x)𝑑x]\displaystyle\qquad=-\ln\left[\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}\phi_{\mathcal{G}}(p)(x)\frac{\phi_{\mathcal{G}}(q)(x)}{\phi_{\mathcal{G}}(p)(x)}dx\right]
=ln[suppϕ𝒢(p)ϕ𝒢(q)(x)𝑑x]ln1=0.\displaystyle\qquad=-\ln\left[\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}\phi_{\mathcal{G}}(q)(x)\,dx\right]\geq-\ln 1=0.

where in the second line we used Jensen’s inequality, while in the third line we applied Lemma 5. Finally, we use Lemma 5 to rewrite the integral in Eq. 126 as

suppϕ𝒢(p)p(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx=suppϕ𝒢(p)ϕ𝒢(p)(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx=D(ϕ𝒢(p)ϕ𝒢(q)).\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx=\\ \int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}\phi_{\mathcal{G}}(p)(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx=D(\phi_{\mathcal{G}}(p)\|\phi_{\mathcal{G}}(q)).
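As a quick numerical illustration of Proposition 1 (a minimal sketch, not part of the proof): for a finite state space, with the twirling taken over the group generated by a cyclic shift, Eq. 124 can be checked directly on randomly generated full-support distributions. The helper names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, shift = 12, 3                         # cyclic group generated by x -> x + 3 (mod 12)

def twirl(p):
    """phi_G(p): average of p over the four shifts generated by the group."""
    return np.mean([np.roll(p, -k * shift) for k in range(n // shift)], axis=0)

def kl(a, b):
    """KL divergence D(a||b) in nats, with the convention 0 ln 0 = 0."""
    mask = a > 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask]))

p, q = rng.random(n), rng.random(n)
p, q = p / p.sum(), q / q.sum()

lhs = kl(p, twirl(q))
rhs = kl(p, twirl(p)) + kl(twirl(p), twirl(q))
print(lhs, rhs)                          # the two values agree, as in Eq. 124
```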

B.2 ϕ𝒢\phi_{\mathcal{G}} obeys the commutativity relation, Eq. 16

It is easy to verify that Φg\Phi_{g} is a linear operator. It then follows that if Φg\Phi_{g} commutes with the linear operator LL, as in Eq. 38, then it also commutes with the exponential eτL=k1k!τkLke^{\tau L}=\sum_{k}\frac{1}{k!}\tau^{k}L^{k}. We then have

eτLϕ𝒢(p)\displaystyle e^{\tau L}\phi_{\mathcal{G}}(p) =eτLΦgp𝑑μ(g)\displaystyle=e^{\tau L}\int\Phi_{g}p\,d\mu(g)
=eτLΦgp𝑑μ(g)\displaystyle=\int e^{\tau L}\Phi_{g}p\,d\mu(g)
=ΦgeτLp𝑑μ(g)\displaystyle=\int\Phi_{g}e^{\tau L}p\,d\mu(g)
=ϕ𝒢(eτLp)\displaystyle=\phi_{\mathcal{G}}(e^{\tau L}p)

where in the second line we exchanged the bounded operator eτLe^{\tau L} and the (Bochner) integral, and in the third line we used that Φg\Phi_{g} and eτLe^{\tau L} commute.

B.3 Derivation of Eq. 38 from Eq. 39 and Eq. 41

Consider some f:Xf:X\to\mathbb{R} and a continuous-state master equation LL such that

[Lf](x)=[Lxxf(x)Lxxf(x)]𝑑x.\displaystyle[Lf](x)=\int\left[L_{xx^{\prime}}f(x^{\prime})-L_{x^{\prime}x}f(x)\right]\,dx^{\prime}. (127)

(The derivation for discrete-state master equations, as in Eq. 10, is the same, but with integrals replaced with summations). Then,

[ΦgLf](x)=[Lf](g(x))\displaystyle[\Phi_{g}Lf](x)=[Lf]({g}(x))
=[Lg(x)xf(x)Lxg(x)f(g(x))]𝑑x\displaystyle\quad=\int[L_{{g}(x)x^{\prime}}f(x^{\prime})-L_{x^{\prime}{g}(x)}f({g}(x))]dx^{\prime} (128)
=[Lg(x)g(x)f(g(x))Lg(x)g(x)f(g(x))]𝑑x\displaystyle\quad=\int[L_{{g}(x){g}(x^{\prime})}f({g}(x^{\prime}))-L_{{g}(x^{\prime}){g}(x)}f({g}(x))]dx^{\prime} (129)
=[Lxxf(g(x))Lxxf(g(x))]𝑑x\displaystyle\quad=\int[L_{xx^{\prime}}f({g}(x^{\prime}))-L_{x^{\prime}x}f({g}(x))]dx^{\prime} (130)
=\int[L_{xx^{\prime}}[\Phi_{g}f](x^{\prime})-L_{x^{\prime}x}[\Phi_{g}f](x)]dx^{\prime} (131)
=[LΦgf](x),\displaystyle\quad=[L\Phi_{g}f](x), (132)

which implies ΦgL=LΦg\Phi_{g}L=L\Phi_{g}, Eq. 38. Here we used the definition of Φg\Phi_{g} in the first line and Eq. 127 in Eq. 128. In Eq. 129, we used the variable substitution xg(x)x^{\prime}\mapsto{g}(x^{\prime}), along with the fact that g{g} is volume preserving. In Eq. 130, we used Eq. 39.
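In the discrete-state setting, this commutation relation is also easy to check numerically. The following minimal sketch (illustrative only) uses a cyclic shift symmetry on six states; any circulant choice of rates satisfies Eq. 39 for this group and therefore commutes with $\Phi_{g}$:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n, s = 6, 2                                   # six states, shift symmetry g(x) = x + 2 (mod 6)

# Circulant off-diagonal rates: K[x, x'] depends only on (x - x') mod n,
# so K[g(x), g(x')] = K[x, x'] (Eq. 39 for the cyclic shift group).
c = rng.random(n)
K = np.array([[c[(x - xp) % n] for xp in range(n)] for x in range(n)])
np.fill_diagonal(K, 0.0)
K -= np.diag(K.sum(axis=0))                   # generator of the master equation d/dt p = K p

# Phi_g as a permutation matrix: (Phi_g f)(x) = f(g(x))
P = np.zeros((n, n))
for x in range(n):
    P[x, (x + s) % n] = 1.0

print(np.allclose(P @ K, K @ P))              # Phi_g L = L Phi_g, Eq. 38
print(np.allclose(P @ expm(K), expm(K) @ P))  # hence Phi_g also commutes with e^{tau L}
```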

Next, we show that Eq. 41 is sufficient for Eq. 38 to hold, assuming that all $g\in\mathcal{G}$ are rigid transformations and the $L\in\Lambda$ refer to Fokker-Planck equations of the form Eq. 40. First, given some (sufficiently smooth) function $f:X\to\mathbb{R}$, write Eq. 40 as

tf=Lf=((E)f)+β1Δf.{\textstyle\partial_{t}}f=Lf=\nabla\cdot((\nabla E)f)+\beta^{-1}\Delta f. (133)

For any g𝒢g\in\mathcal{G}, write the diffusion term in Eq. 133 as

Δf=Δ(Φgfg1)=Δ(Φgf)g1,\displaystyle\Delta f=\Delta(\Phi_{g}f\circ{g^{-1}})=\Delta(\Phi_{g}f)\circ{g^{-1}}, (134)

where we used the identity f=Φg1Φgf=Φgfg1f=\Phi_{{g^{-1}}}\Phi_{g}f=\Phi_{g}f\circ{g^{-1}} and that the Laplace operator commutes with rigid transformations. Now consider the drift term in Eq. 133. Using the product rule,

((E)f)=(f)T(E)+fΔE.\nabla\cdot((\nabla E)f)=(\nabla f)^{T}(\nabla E)+f\Delta E. (135)

We can rewrite the second term above as

fΔE\displaystyle f\Delta E =(Φgfg1)ΔE\displaystyle=(\Phi_{g}f\circ{g^{-1}})\Delta E
=(Φgfg1)Δ(Eg1)\displaystyle=(\Phi_{g}f\circ{g^{-1}})\Delta(E\circ{g^{-1}})
=(Φgfg1)((ΔE)g1)\displaystyle=(\Phi_{g}f\circ{g^{-1}})((\Delta E)\circ{g^{-1}})
=((Φgf)(ΔE))g1,\displaystyle=((\Phi_{g}f)(\Delta E))\circ{g^{-1}}, (136)

where we used f=Φgfg1f=\Phi_{g}f\circ{g^{-1}}, the invariance of EE under 𝒢\mathcal{G} (Eq. 41), and in the third line that the Laplace operator commutes with rigid transformations. Now consider the first term on the right hand side of Eq. 135:

(\nabla f)^{T}(\nabla E)=(\nabla(\Phi_{g}f\circ{g^{-1}}))^{T}\nabla(E\circ{g^{-1}})
=(JT((Φgf)g1))T(JT((E)g1))\displaystyle=(J^{T}(\nabla(\Phi_{g}f)\circ{g^{-1}}))^{T}(J^{T}((\nabla E)\circ{g^{-1}}))
=((Φgf)g1)TJJT((E)g1)\displaystyle=(\nabla(\Phi_{g}f)\circ{g^{-1}})^{T}JJ^{T}((\nabla E)\circ{g^{-1}})
=((Φgf)g1)T((E)g1)\displaystyle=(\nabla(\Phi_{g}f)\circ{g^{-1}})^{T}((\nabla E)\circ{g^{-1}})
=((Φgf)T(E))g1,\displaystyle=(\nabla(\Phi_{g}f)^{T}(\nabla E))\circ{g^{-1}}, (137)

where JJ indicates the Jacobian of g1{g^{-1}}. In the first line, we again used the identity f=Φgfg1f=\Phi_{g}f\circ{g^{-1}} and the invariance of EE under 𝒢\mathcal{G}, in the second line we used the chain rule, and in the fourth line we used that JJT=IJJ^{T}=I for rigid transformations. Plugging Eqs. 136 and 137 back into Eq. 135 and rearranging gives

((E)f)=((E)(Φgf))g1.\displaystyle\nabla\cdot((\nabla E)f)=\nabla\cdot((\nabla E)(\Phi_{g}f))\circ{g^{-1}}. (138)

Combined with Eqs. 134 and 133, this in turn implies that $Lf=(L\Phi_{g}f)\circ{g^{-1}}$, or in other words that

ΦgLf=LΦgf.\Phi_{g}Lf=L\Phi_{g}f.

B.4 Derivation of Eq. 43

First, write the inaccessible information term in Eq. 35 as

D(pX|Mϕ𝒢(pX|M))=mp(m)D(pX|mϕ𝒢(pX|m))\displaystyle D(p_{X|M}\|\phi_{\mathcal{G}}(p_{X|M}))=\sum_{m}p(m)D(p_{X|m}\|\phi_{\mathcal{G}}(p_{X|m}))
=\sum_{m,x}p(m,x)\ln\frac{p(x|m)}{\int p(g(x)|m)\,d\mu(g)}
=\sum_{m,x}p(m,x)\ln\frac{p(x)q(m|x)/p(m)}{\int p(g(x))q(m|g(x))/p_{g}(m)\,d\mu(g)}, (139)

where we’ve defined p(m)=xp(x)q(m|x)p(m)=\sum_{x}p(x)q(m|x) and pg(m)=xp(g(x))q(m|x)p_{g}(m)=\sum_{x}p(g(x))q(m|x), and used the definition of ϕ𝒢\phi_{\mathcal{G}} in Eq. 42. (Here we assume for simplicity that both XX and MM are discrete valued; otherwise the summations in Eq. 139 should be replaced with integrals.)

Recall that we assumed that pp is invariant under 𝒢\mathcal{G}, so ϕ𝒢(p)=p\phi_{\mathcal{G}}(p)=p. By Lemma 4, p(x)=p(g(x))p(x)=p(g(x)) for all xx and g𝒢g\in\mathcal{G}, which in turn implies that p(m)=pg(m)p(m)=p_{g}(m). Plugging into Eq. 139 then gives

D(p_{X|M}\|\phi_{\mathcal{G}}(p_{X|M}))=\sum_{m,x}p(m,x)\ln\frac{q(m|x)}{\int q(m|g(x))\,d\mu(g)},

which appears in the main text as Eq. 43.

B.5 Example: Szilard box, derivation of Eq. 50

We derive Eq. 50 using a simple geometric argument.

Consider the twirling of pθp_{\theta}, as shown in Fig. 5(b). From the definition of ϕ𝒢\phi_{\mathcal{G}} and Eq. 49, it is easy to see that

  1. 1.

    The dark gray areas in Fig. 5(b) (where both pθ(x1,x2)=1/2p_{\theta}(x_{1},x_{2})=1/2 and pθ(x1,x2)=1/2p_{\theta}(x_{1},-x_{2})=1/2) have probability density ϕ𝒢(pθ)(x1,x2)=1/2\phi_{\mathcal{G}}(p_{\theta})(x_{1},x_{2})=1/2.

  2. 2.

The light gray areas in Fig. 5(b) (where either $p_{\theta}(x_{1},x_{2})=1/2$ or $p_{\theta}(x_{1},-x_{2})=1/2$, but not both) have probability density $\phi_{\mathcal{G}}(p_{\theta})(x_{1},x_{2})=1/4=u(x_{1},x_{2})$.

  3. 3.

    The white areas in Fig. 5(b) (where pθ(x1,x2)=0p_{\theta}(x_{1},x_{2})=0 and pθ(x1,x2)=0p_{\theta}(x_{1},-x_{2})=0) have probability density ϕ𝒢(pθ)(x1,x2)=0\phi_{\mathcal{G}}(p_{\theta})(x_{1},x_{2})=0.

Given this,

D(ϕ𝒢(pθ)u)=ln2Pθ,\displaystyle D(\phi_{\mathcal{G}}(p_{\theta})\|u)=\ln 2\cdot P_{\theta}, (140)

where $P_{\theta}$ is the probability assigned by $p_{\theta}$ to the dark gray areas (i.e., those $(x_{1},x_{2})$ where $p_{\theta}(x_{1},x_{2})=p_{\theta}(x_{1},-x_{2})=1/2$).

Figure 15: The twirling $\phi_{\mathcal{G}}(p_{\theta})$ for two cases. Left: $|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4})$. Right: $|\theta|\in[-\pi,\pi]\setminus(\frac{\pi}{4},\frac{3\pi}{4})$.

To calculate the value of $P_{\theta}$, it suffices to consider two separate cases:

  1. 1.

    |θ|[π,π](π4,3π4)|\theta|\in[-\pi,\pi]\setminus(\frac{\pi}{4},\frac{3\pi}{4})

  2. 2.

    |θ|(π4,3π4)|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4})

which are shown visually in Fig. 15. Using this figure, and a bit of trigonometry, it can be shown that Pθ=112|tanθ|P_{\theta}=1-\frac{1}{2}|\tan\theta| in the first case, and Pθ=12|tan(θπ/2)|P_{\theta}=\frac{1}{2}|\tan(\theta-\pi/2)| in the second case. Combining these results with Eq. 140 gives Eq. 50.

B.6 Example: Symmetry constraints on a discrete-state master equation

Here we demonstrate our results on symmetry constraints using a simple finite-state system. The system contains $n$ states, $x\in\{0,\dots,n-1\}$. We consider a group generated by circular shifts, representing $m$-fold circular symmetry:

g(x)=x+n/mmodn.\displaystyle{g}(x)=x+n/m\quad\mathrm{mod}\quad n. (141)

Assume that the driving protocol obeys the following symmetry group at all t[0,1]t\in[0,1]:

Lxx(t)=Lg(x)g(x)(t),L_{x^{\prime}x}(t)=L_{{g}(x^{\prime}){g}(x)}(t), (142)

An example of such a master equation would be a unicyclic network, where the nn states are arranged in a ring, and transitions between nearest-neighbor states obey Eq. 142. Such unicyclic networks are often used to model biochemical oscillators and similar biological systems (barato2017coherence, ). This kind of system is illustrated in Fig. 16, with n=12n=12 and m=4m=4.

Imagine that this system starts from the initial distribution $p(x)\propto x$, so the probability grows linearly from 0 (for $x=0$) to its maximum (for $x=n-1$). For the 12-state system with 4-fold symmetry, this initial distribution is given by

p(x)=xx=011x=x66,p(x)=\frac{x}{\sum_{x^{\prime}=0}^{11}x^{\prime}}=\frac{x}{66},

and is shown on the left hand side of Fig. 16. How much work can be extracted by bringing this initial distribution to some other distribution pp^{\prime}, while using rate matrices of the form Eq. 142? This is bounded by the drop of the accessible free energy, via Eq. 25:

W(pp)FE(ϕ𝒢(p))FE(ϕ𝒢(p)).\displaystyle W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(\phi_{\mathcal{G}}(p))-F_{E^{\prime}}(\phi_{\mathcal{G}}(p^{\prime})). (143)

Using the example system with 12 states and 4-fold symmetry, the twirled distribution ϕ𝒢(p)\phi_{\mathcal{G}}(p) is given by

ϕ𝒢(p)(x)=x+(x+3 mod 12)+(x+6 mod 12)+(x+9 mod 12)4×66.\phi_{\mathcal{G}}(p)(x)=\\ \frac{x+(x+3\text{ mod }12)+(x+6\text{ mod }12)+(x+9\text{ mod }12)}{4\times 66}.

For example, for the distribution p(x)=x/66p(x)=x/66,

ϕ𝒢(p)(0)\displaystyle\phi_{\mathcal{G}}(p)(0) =(0+3+6+9)/(4×66)\displaystyle=(0+3+6+9)/(4\times 66) 0.068\displaystyle\approx 0.068
ϕ𝒢(p)(1)\displaystyle\phi_{\mathcal{G}}(p)(1) =(1+4+7+10)/(4×66)\displaystyle=(1+4+7+10)/(4\times 66) 0.083\displaystyle\approx 0.083
ϕ𝒢(p)(2)\displaystyle\phi_{\mathcal{G}}(p)(2) =(2+5+8+11)/(4×66)\displaystyle=(2+5+8+11)/(4\times 66) 0.098\displaystyle\approx 0.098
ϕ𝒢(p)(3)\displaystyle\phi_{\mathcal{G}}(p)(3) =(3+6+9+0)/(4×66)\displaystyle=(3+6+9+0)/(4\times 66) 0.068\displaystyle\approx 0.068
\displaystyle\dots \displaystyle\dots

This twirled distribution is shown on the right panel of Fig. 16.

Figure 16: A unicyclic master equation over 12 states with 4-fold symmetry, as in Eq. 142. Left: an initial distribution p(x)xp({x})\propto x which does not respect the 4-fold symmetry. Right: the twirling ϕ𝒢(p)\phi_{\mathcal{G}}(p), which is invariant to the symmetry. (Colors indicate relative probability assigned to each of the 12 states.) The extractable work depends on the accessible free energy in pp, which is given by FE(ϕ𝒢(p))F_{E}(\phi_{\mathcal{G}}(p)).
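A minimal numerical sketch of this example follows (illustrative only). It computes the twirled distribution and the quantity $D(p\|\phi_{\mathcal{G}}(p))$; when the energy function is invariant under the 4-fold shifts (as required by Eq. 41), one can check that this KL divergence equals $\beta[F_{E}(p)-F_{E}(\phi_{\mathcal{G}}(p))]$, the part of the free energy that is inaccessible under the symmetry constraint.

```python
import numpy as np

n, m = 12, 4                     # 12 states with 4-fold circular symmetry (Fig. 16)
shift = n // m                   # group generator g(x) = x + 3 (mod 12)

p = np.arange(n) / np.arange(n).sum()        # initial distribution p(x) = x / 66

# Twirl: average p over the m circular shifts generated by g
phi_p = np.mean([np.roll(p, -k * shift) for k in range(m)], axis=0)
print(np.round(phi_p, 3))        # 0.068, 0.083, 0.098, 0.068, 0.083, 0.098, ...

# Inaccessible free energy in units of 1/beta: D(p || phi_G(p))
mask = p > 0
print(np.sum(p[mask] * np.log(p[mask] / phi_p[mask])))
```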

B.7 Example: 2D Ising model, derivation of Eq. 56

We begin by recalling the expression for accessible information in our feedback-control protocol over the 2D Ising model, which appears as Eq. 55 in the main text:

Iaccϕ𝒢(X;M)=ln2lnq(m|x)N2a,bq(m|ga,b(x)).\displaystyle I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)\!=\ln 2-\Big{\langle}\!\ln\frac{q(m|x)}{N^{-2}\sum_{a,b}q(m|g_{a,b}(x))}\Big{\rangle}. (144)

Using q(m|x)=δm(x1)q(m|x)=\delta_{m}(x_{1}), the expectation term in Eq. 144 can be rewritten as

xp(x)m{1,1}δm(x1)ln[N2a,bδm(ga,b(x)1)].\displaystyle-\sum_{x}p(x)\sum_{\mathclap{m\in\{-1,1\}}}\delta_{m}(x_{1})\ln\big{[}N^{-2}{\sum_{a,b}\delta_{m}(g_{a,b}(x)_{1})}\big{]}. (145)

Let z(x)=(1+ixi/N2)/2z(x)=(1+\sum_{i}x_{i}/N^{2})/2 indicate the magnetization of lattice state xx, normalized to lie between 0 and 1. Note that for any lattice state xx, the frequency that spin 1 is in state 1 averaged across all translations is equal to the magnetization of xx,

N2a,bδ1(ga,b(x)1)=z(x).N^{-2}\sum_{a,b}\delta_{1}(g_{a,b}(x)_{1})=z(x).

In addition, by symmetry, the probability that spin 1 is in state 1 averaged across all states that have magnetization zz is equal to zz,

xp(x|z)δ1(x1)=z.\sum_{x}p(x|z)\delta_{1}(x_{1})=z.

Using these results and δ1(x)=1δ1(x)\delta_{-1}(x)=1-\delta_{1}(x), we can rewrite the expression in Eq. 145 as

xp(x)[δ1(x1)lnz(x)+(1δ1(x1))ln(1z(x))]\displaystyle-\sum_{x}p(x)[\delta_{1}(x_{1})\ln z(x)+(1-\delta_{1}(x_{1}))\ln(1-z(x))]
=zp(z)[zlnz(1z)ln(1z)]h2(z),\displaystyle=\sum_{z}p(z)[-z\ln z-(1-z)\ln(1-z)]\equiv\langle h_{2}(z)\rangle, (146)

where p(z)=xp(x)δz(z(x))p(z^{\prime})=\sum_{x}p(x)\delta_{z^{\prime}}(z(x)) is the probability that the system has magnetization zz^{\prime} and h2h_{2} is the binary entropy function.

We now consider the NN\to\infty limit, and use Onsager’s expression for the spontaneous magnetization for the 2D Ising model yang1952spontaneous . When β\beta is below the critical inverse temperature, βc=ln(1+2)/20.44\beta_{c}=\ln(1+\sqrt{2})/2\approx 0.44, the magnetization distribution p(z)p(z) concentrates at z=1/2z=1/2, so Eq. 146 approaches h2(1/2)=ln2h_{2}(1/2)=\ln 2. When β>βc\beta>\beta_{c}, the magnetization distribution concentrates on a uniform mixture of two delta functions at z=f(β)z=f(\beta) and z=1f(β)z=1-f(\beta), where f(β)=(1+1(sinh2β)48)/2f(\beta)=(1+\sqrt[8]{1-(\sinh 2\beta)^{-4}})/2. In this case, Eq. 146 approaches (h2(f(β))+h2(1f(β)))/2=h2(f(β))(h_{2}(f(\beta))+h_{2}(1-f(\beta)))/2=h_{2}(f(\beta)). Combining these results with Eq. 144 implies that Iaccϕ𝒢(X;M)=0I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)=0 for ββc\beta\leq\beta_{c} and Iaccϕ𝒢(X;M)=ln2h2(f(β))I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)=\ln 2-h_{2}(f(\beta)) for β>βc\beta>\beta_{c}, which appears as Eq. 56 in the main text.
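For reference, this final expression is straightforward to evaluate numerically; a minimal sketch (in nats, helper names illustrative):

```python
import numpy as np

def h2(z):
    """Binary entropy in nats, with 0 ln 0 = 0."""
    z = np.clip(z, 1e-12, 1 - 1e-12)
    return -z * np.log(z) - (1 - z) * np.log(1 - z)

beta_c = np.log(1 + np.sqrt(2)) / 2                        # critical inverse temperature, ~0.4407

def accessible_info(beta):
    """Eq. 56 in the N -> infinity limit: 0 below beta_c, ln 2 - h2(f(beta)) above."""
    if beta <= beta_c:
        return 0.0
    f = (1 + (1 - np.sinh(2 * beta) ** -4) ** 0.125) / 2   # Onsager spontaneous magnetization
    return np.log(2) - h2(f)

for beta in [0.3, 0.44, 0.5, 0.7, 1.0]:
    print(beta, accessible_info(beta))
```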

Appendix C Modularity constraints

C.1 ϕ𝒞\phi_{{\mathcal{C}}} obeys the Pythagorean identity, Eq. 14

We show that ϕ𝒞\phi_{\mathcal{C}} obeys the Pythagorean identity:

D(pϕ𝒞(q))=D(pϕ𝒞(p))+D(ϕ𝒞(p)ϕ𝒞(q)).\displaystyle D(p\|\phi_{\mathcal{C}}(q))=D(p\|\phi_{\mathcal{C}}(p))+D(\phi_{\mathcal{C}}(p)\|\phi_{\mathcal{C}}(q)). (147)

for all $p,q\in\mathcal{P}$ such that $D(p\|\phi_{\mathcal{C}}(q))<\infty$. For any $p,r\in\mathcal{P}$,

𝔼p[lnϕ𝒞(r)]=𝔼p[lnrO]+A𝒞𝔼p[lnrAO|AO]\displaystyle\mathbb{E}_{p}[\ln\phi_{{\mathcal{C}}}(r)]=\mathbb{E}_{p}[\ln r_{O}]+\sum_{{A\in\mathcal{C}}}\mathbb{E}_{p}[\ln r_{A\setminus O|A\cap O}]
=𝔼ϕ𝒞(p)[lnrO]+A𝒞𝔼ϕ𝒞(p)[lnrAO|AO]\displaystyle\quad=\mathbb{E}_{\phi_{{\mathcal{C}}}(p)}[\ln r_{O}]+\sum_{{A\in\mathcal{C}}}\mathbb{E}_{\phi_{{\mathcal{C}}}(p)}[\ln r_{A\setminus O|A\cap O}] (148)
=𝔼ϕ𝒞(p)[lnϕ𝒞(r)],\displaystyle\quad=\mathbb{E}_{\phi_{{\mathcal{C}}}(p)}[\ln\phi_{{\mathcal{C}}}(r)], (149)

where, for a distribution $r$, $r_{O}$ and $r_{A\setminus O|A\cap O}$ indicate its marginal and conditional distributions, respectively. In Eq. 148, we used that $p$ and $\phi_{{\mathcal{C}}}(p)$ have the same marginals over all subsystems $A\in\mathcal{C}$ as well as the overlap $O$ (this can be verified from the definition of $\phi_{{\mathcal{C}}}$, Eq. 64). Then,

D(pϕ𝒞(q))\displaystyle D(p\|\phi_{{\mathcal{C}}}(q)) =D(pϕ𝒞(p))+𝔼p[lnϕ𝒞(p)lnϕ𝒞(q)]\displaystyle=D(p\|\phi_{{\mathcal{C}}}(p))+\mathbb{E}_{p}[\ln\phi_{{\mathcal{C}}}(p)-\ln\phi_{{\mathcal{C}}}(q)]
=D(pϕ𝒞(p))+𝔼ϕ𝒞(p)[lnϕ𝒞(p)lnϕ𝒞(q)]\displaystyle=D(p\|\phi_{{\mathcal{C}}}(p))+\mathbb{E}_{\phi_{{\mathcal{C}}}(p)}[\ln\phi_{{\mathcal{C}}}(p)-\ln\phi_{{\mathcal{C}}}(q)]
=D(pϕ𝒞(p))+D(ϕ𝒞(p)ϕ𝒞(q)),\displaystyle=D(p\|\phi_{{\mathcal{C}}}(p))+D(\phi_{{\mathcal{C}}}(p)\|\phi_{{\mathcal{C}}}(q)),

where the second line follows by applying Eq. 149 twice, first taking r=pr=p and then taking r=qr=q.

C.2 ϕ𝒞\phi_{{\mathcal{C}}} commutes with eτLe^{\tau L}

We show that if for some generator LL, Eqs. 59 and 60 hold for all A𝒞A\in\mathcal{C}, then ϕ𝒞\phi_{\mathcal{C}} and eτLe^{\tau L} obey the commutativity relation of Eq. 16. We assume that all L(A)L^{(A)} in Eq. 60 are bounded linear operators.

Before deriving our result, we introduce some helpful notation:

  1. 1.

    δx(x)\delta_{x}(x^{\prime}) indicates the delta function distribution over XX centered at xx (this is the Dirac delta for continuous XX, and the Kronecker delta for discrete XX). For any subsystem SVS\subseteq V, δxS(xS)\delta_{x_{S}}(x^{\prime}_{S}) indicates the delta function distribution over XSX_{S} centered at xSx_{S}.

  2. 2.

    Tτ(A)(x|x)=[eτL(A)δx](x)T^{(A)}_{\tau}(x^{\prime}|x)=[e^{\tau L^{(A)}}\delta_{x}](x^{\prime}) indicates the conditional distribution over XX, given that the system starts on state xx and then evolves under L(A)L^{(A)} for time τ\tau.

  3. 3.

    For any A𝒞A\in\mathcal{C},

    𝑨:=A(B𝒞{A}B)=AO(𝒞){\textstyle{\bm{A}}:=A\setminus\big{(}\bigcup_{B\in\mathcal{C}\setminus\{A\}}B\big{)}=A\setminus O(\mathcal{C})}

    indicates the set of degrees of freedom that belong exclusively to A𝒞A\in\mathcal{C} (and no other subsystems), and

    𝑨c:=V𝑨=B𝒞{A}B.{\textstyle{{\bm{A}}^{c}}:=V\setminus{\bm{A}}=\bigcup_{B\in\mathcal{C}\setminus\{A\}}B}.

indicates the complement of ${\bm{A}}$, which is the set of degrees of freedom that fall into at least one of the other subsystems besides $A$.

To derive the commutativity relation, we proceed in three steps, which are described in detail in the subsections below. In the first step, we show that, for all τ0\tau\geq 0 and A𝒞A\in\mathcal{C}, the conditional distribution Tτ(A)(x|x)T^{(A)}_{\tau}(x^{\prime}|x) can be written in the following product form:

Tτ(A)(x|x)=Tτ(A)(x𝑨|xA)δx𝑨c(x𝑨c).\displaystyle T^{(A)}_{\tau}(x^{\prime}|x)=T^{(A)}_{\tau}({x_{\bm{A}}^{\prime}|x_{A}})\delta_{x_{{\bm{A}}^{c}}}(x^{\prime}_{{\bm{A}}^{c}}). (150)

In the second step, we show that Eq. 150 implies the following commutativity relation for any p𝒫p\in\mathcal{P} and each A𝒞A\in\mathcal{C}:

eτL(A)ϕ𝒞(p)=ϕ𝒞(eτL(A)p).\displaystyle e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)=\phi_{{\mathcal{C}}}(e^{\tau L^{(A)}}p). (151)

In the third step, we show that the generators corresponding to all subsystems commute:

L(A)L(B)=L(B)L(A)A,B𝒞.\displaystyle L^{(A)}L^{(B)}=L^{(B)}L^{(A)}\qquad\forall A,B\in\mathcal{C}. (152)

We then combine these three results to show that ϕ𝒞\phi_{{\mathcal{C}}} and eτLe^{\tau L} commute. Write

eτLϕ𝒞(p)=eA𝒞τL(A)ϕ𝒞(p)=A𝒞eτL(A)ϕ𝒞(p).\displaystyle e^{\tau L}\phi_{{\mathcal{C}}}(p)=e^{\sum_{A\in\mathcal{C}}\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)=\prod_{A\in\mathcal{C}}e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p).

where we used Eqs. 58 and 152 to expand the operator exponential. Then, using Eq. 151, write

A𝒞eτL(A)ϕ𝒞(p)=ϕ𝒞(A𝒞eτL(A)p)=ϕ𝒞(eτLp).\prod_{A\in\mathcal{C}}e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)=\phi_{{\mathcal{C}}}\Bigg{(}\prod_{A\in\mathcal{C}}e^{\tau L^{(A)}}p\Bigg{)}=\phi_{{\mathcal{C}}}(e^{\tau L}p).

Combining these two results implies that eτLϕ𝒞(p)=ϕ𝒞(eτLp)e^{\tau L}\phi_{{\mathcal{C}}}(p)=\phi_{{\mathcal{C}}}(e^{\tau L}p) for all p𝒫p\in\mathcal{P} and τ0\tau\geq 0, as in Eq. 16.

C.2.1 Derivation of Eq. 150

To derive Eq. 150, consider the conditional distribution over 𝑨{\bm{A}} given initial state xx, as induced by L(A)L^{(A)}:

Tτ(A)(x𝑨|x)\displaystyle T^{(A)}_{\tau}(x^{\prime}_{\bm{A}}|x) =[eτL(A)δx]𝑨(x𝑨)\displaystyle=[e^{\tau L^{(A)}}\delta_{x}]_{\bm{A}}(x^{\prime}_{\bm{A}})
=[δx]𝑨(x𝑨)+k1τkk![L(A)kδx]𝑨(x𝑨)\displaystyle=[\delta_{x}]_{\bm{A}}(x^{\prime}_{\bm{A}})+\sum_{k\geq 1}\frac{\tau^{k}}{k!}[{L^{(A)}}^{k}\delta_{x}]_{\bm{A}}(x^{\prime}_{\bm{A}})
=δx𝑨(x𝑨)+k1τkk![L(A)kδx]𝑨(x𝑨).\displaystyle=\delta_{x_{\bm{A}}}(x^{\prime}_{\bm{A}})+\sum_{k\geq 1}\frac{\tau^{k}}{k!}[{L^{(A)}}^{k}\delta_{x}]_{\bm{A}}(x^{\prime}_{\bm{A}}). (153)

where in the second line we expanded the operator exponential as eτL(A)=kτkL(A)k/k!e^{\tau L^{(A)}}=\sum_{k}\tau^{k}{L^{(A)}}^{k}/k!. Note that 𝑨A{\bm{A}}\subseteq A, so [L(A)δx]𝑨[L^{(A)}\delta_{x}]_{{\bm{A}}} is a function of [L(A)δx]A[L^{(A)}\delta_{x}]_{A}, which in turn is a function of xAx_{A} by Eq. 59. Similarly, δx𝑨(x𝑨)\delta_{x_{\bm{A}}}(x^{\prime}_{\bm{A}}) depends only on xAx_{A}, not xx. This means the right hand side of Eq. 153 depends only on xAx_{A}, which we indicate by

Tτ(A)(x𝑨|x)=Tτ(A)(x𝑨|xA).\displaystyle T^{(A)}_{\tau}(x^{\prime}_{\bm{A}}|x)=T^{(A)}_{\tau}({x_{\bm{A}}^{\prime}|x_{A}}). (154)

Now consider the conditional distribution over any other subsystem BAB\neq A given initial state xx, as induced by L(A)L^{(A)}:

Tτ(A)(xB|x)\displaystyle T^{(A)}_{\tau}(x^{\prime}_{B}|x) =δxB(xB)+k1τkk![L(A)kδx]B(xB)\displaystyle=\delta_{x_{B}}(x^{\prime}_{B})+\sum_{k\geq 1}\frac{\tau^{k}}{k!}[{L^{(A)}}^{k}\delta_{x}]_{B}(x^{\prime}_{B})
=δxB(xB),\displaystyle=\delta_{x_{B}}(x^{\prime}_{B}), (155)

where we used that [L(A)δx]B=0[L^{(A)}\delta_{x}]_{B}=0 by Eq. 60.

Now, it is straightforward to show that if some distribution $p$ over $X_{V}$ has delta function marginals $p_{B}=\delta_{x_{B}}$ for all $B\neq A$, then $p$ must have the following product form:

p(x)=p𝑨(x𝑨)δx𝑨c(x𝑨c),\displaystyle p(x^{\prime})=p_{{\bm{A}}}(x_{{\bm{A}}}^{\prime})\,\delta_{x_{{\bm{A}}^{c}}}(x^{\prime}_{{\bm{A}}^{c}}), (156)

where we used that ${{\bm{A}}^{c}}=\bigcup_{B\in\mathcal{C}\setminus\{A\}}B$. Eq. 150 follows by taking $p(x^{\prime})=T^{(A)}_{\tau}(x^{\prime}|x)$ in Eq. 156, while using Eq. 154.

C.2.2 Derivation of Eq. 151

Consider any τ0\tau\geq 0 and A𝒞A\in\mathcal{C}. Using Eq. 59 and the identity eτL(A)=kτkL(A)k/k!e^{\tau L^{(A)}}=\sum_{k}\tau^{k}{L^{(A)}}^{k}/k!, one can show that whenever two distributions p,q𝒫p,q\in\mathcal{P} obey pA=qAp_{A}=q_{A}, it must be that [eτL(A)p]A=[eτL(A)q]A[e^{\tau L^{(A)}}p]_{A}=[e^{\tau L^{(A)}}q]_{A}. Since pA=[ϕ𝒞(p)]Ap_{A}=[\phi_{{\mathcal{C}}}(p)]_{A} (see the definition of ϕ𝒞\phi_{{\mathcal{C}}} in Eq. 64),

[eτL(A)p]A=[eτL(A)ϕ(p)]A.\displaystyle[e^{\tau L^{(A)}}p]_{A}=[e^{\tau L^{(A)}}\phi(p)]_{A}. (157)

In addition, given Eq. 155, we have [eτL(A)p]𝑨c=p𝑨c[e^{\tau L^{(A)}}p]_{{\bm{A}}^{c}}=p_{{\bm{A}}^{c}}. Given that B𝑨cB\subseteq{{\bm{A}}^{c}} for each BAB\neq A, we have

[eτL(A)p]B\displaystyle[e^{\tau L^{(A)}}p]_{B} =pB=ϕ(p)B=[eτL(A)ϕ(p)]B.\displaystyle=p_{B}=\phi(p)_{B}=[e^{\tau L^{(A)}}\phi(p)]_{B}. (158)

Similarly, O(𝒞)𝑨cO(\mathcal{C})\subseteq{{\bm{A}}^{c}} and therefore

[eτL(A)p]O(𝒞)\displaystyle[e^{\tau L^{(A)}}p]_{O(\mathcal{C})} =[eτL(A)ϕ(p)]O(𝒞).\displaystyle=[e^{\tau L^{(A)}}\phi(p)]_{O(\mathcal{C})}. (159)

Now, observe that the distribution ϕ𝒞(p)\phi_{{\mathcal{C}}}(p) does not depend on the full distribution pp, but only on the marginal distributions pO(𝒞)p_{O(\mathcal{C})} and {pA}A𝒞\{p_{A}\}_{A\in\mathcal{C}}. By Eqs. 157, 158 and 159, these marginals are the same for eτL(A)pe^{\tau L^{(A)}}p and eτL(A)ϕ𝒞(p)e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p), which means that

ϕ𝒞(eτL(A)p)=ϕ𝒞(eτL(A)ϕ𝒞(p)).\displaystyle\phi_{{\mathcal{C}}}(e^{\tau L^{(A)}}p)=\phi_{{\mathcal{C}}}(e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)). (160)

Next, using Eq. 150 and some simple (but rather tedious) algebra, it can be shown that

eτL(A)ϕ𝒞(p)=pAO|AOpOBApBO|BO,\displaystyle e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)=p_{A\setminus O|A\cap O}^{\prime}\;p_{O}\;\prod_{{B\neq A}}p_{B\setminus O|B\cap O}\;, (161)

where

pAO|AO(xAO|xAO)=Tτ(A)(x𝑨|x𝑨,xAO)p(x𝑨|xAO)𝑑x𝑨,p_{A\setminus O|A\cap O}^{\prime}(x_{A\setminus O}^{\prime}|x_{A\cap O}^{\prime})=\\ \int T^{(A)}_{\tau}({x_{\bm{A}}^{\prime}|x_{\bm{A}},x_{A\cap O}}^{\prime})p(x_{{\bm{A}}}|x_{A\cap O}^{\prime})dx_{{\bm{A}}}, (162)

and we used the conditional distribution Tτ(A)(x𝑨|x𝑨,xAO)T^{(A)}_{\tau}({x_{\bm{A}}^{\prime}|x_{\bm{A}},x_{A\cap O}}) from Eq. 154. The right hand side of Eq. 161 has the form of the right hand side of Eq. 64, so it is invariant under ϕ𝒞\phi_{\mathcal{C}}:

ϕ𝒞(eτL(A)ϕ𝒞(p))=eτL(A)ϕ𝒞(p).\displaystyle\phi_{\mathcal{C}}(e^{\tau L^{(A)}}\phi_{\mathcal{C}}(p))=e^{\tau L^{(A)}}\phi_{\mathcal{C}}(p). (163)

Eq. 151 follows by combining Eqs. 160 and 163.

C.2.3 Derivation of Eq. 152

Using Eq. 150 and some algebra, one can verify that for all τ0\tau\geq 0 and A,B𝒞A,B\in\mathcal{C},

Tτ(A)(x′′|x)Tτ(B)(x|x)𝑑x=Tτ(B)(x′′|x)Tτ(A)(x|x)𝑑x,\int T^{(A)}_{\tau}(x^{\prime\prime}|x^{\prime})T^{(B)}_{\tau}(x^{\prime}|x)\,dx^{\prime}\\ =\int T^{(B)}_{\tau}(x^{\prime\prime}|x^{\prime})T^{(A)}_{\tau}(x^{\prime}|x)\,dx^{\prime}, (164)

which in operator notation can be written as

eτL(A)eτL(B)δx=eτL(B)eτL(A)δx.\displaystyle e^{\tau L^{(A)}}e^{\tau L^{(B)}}\delta_{x}=e^{\tau L^{(B)}}e^{\tau L^{(A)}}\delta_{x}. (165)

Then, for any function f=f(x)δx𝑑xf=\int f(x)\delta_{x}\,dx, write

eτL(A)eτL(B)f\displaystyle e^{\tau L^{(A)}}e^{\tau L^{(B)}}f =eτL(A)eτL(B)f(x)δx𝑑x\displaystyle=e^{\tau L^{(A)}}e^{\tau L^{(B)}}\int f(x)\delta_{x}\,dx
=f(x)eτL(A)eτL(B)δx𝑑x\displaystyle=\int f(x)e^{\tau L^{(A)}}e^{\tau L^{(B)}}\delta_{x}\,dx
=f(x)eτL(B)eτL(A)δx𝑑x\displaystyle=\int f(x)e^{\tau L^{(B)}}e^{\tau L^{(A)}}\delta_{x}\,dx
=eτL(B)eτL(A)f(x)δx𝑑x\displaystyle=e^{\tau L^{(B)}}e^{\tau L^{(A)}}\int f(x)\delta_{x}\,dx
=eτL(B)eτL(A)f,\displaystyle=e^{\tau L^{(B)}}e^{\tau L^{(A)}}f,

where we exchanged the order of the bounded operators eτL(A)eτL(B)e^{\tau L^{(A)}}e^{\tau L^{(B)}} and eτL(B)eτL(A)e^{\tau L^{(B)}}e^{\tau L^{(A)}} with the (Bochner) integral f(x)δx𝑑x\int f(x)\delta_{x}\,dx, and used Eq. 165. This shows that eτL(A)e^{\tau L^{(A)}} and eτL(B)e^{\tau L^{(B)}} commute for all τ0\tau\geq 0, so their inverses eτL(A)e^{-\tau L^{(A)}} and eτL(B)e^{-\tau L^{(B)}} must also commute. Given that eτL(A)e^{\tau L^{(A)}} and eτL(B)e^{\tau L^{(B)}} commute for all τ\tau\in\mathbb{R}, L(A)L^{(A)} and L(B)L^{(B)} must commute (engel_one-parameter_2000, , p. 23).

C.3 Szilard box: derivation of Eqs. 74 and 76

We first derive Eq. 74. Using Eq. 70 and some rearrangement, write

D(ϕ𝒞(pθ)u)=ln4S(pθ(X1))S(pθ(X2)),\displaystyle D(\phi_{{\mathcal{C}}}(p_{\theta})\|u)=\ln 4-S(p_{\theta}(X_{1}))-S({p_{\theta}}(X_{2})), (166)

where S(pθ(X1))S(p_{\theta}(X_{1})) and S(pθ(X2))S({p_{\theta}}(X_{2})) refer to the marginal entropies under pθp_{\theta}. It is easy to see that by symmetry,

S(pθ(X1))=S(pπ2θ(X2)).\displaystyle S(p_{\theta}(X_{1}))=S(p_{\frac{\pi}{2}-\theta}(X_{2})). (167)

Therefore, we will derive a closed-form expression for D(ϕ𝒞(pθ)u)D(\phi_{{\mathcal{C}}}(p_{\theta})\|u) by finding a closed-form expression for

S(pθ(X1)):=11pθ(x1)lnpθ(x1)𝑑x1.\displaystyle S(p_{\theta}(X_{1})):=-\int_{-1}^{1}p_{\theta}(x_{1})\ln p_{\theta}(x_{1})\,dx_{1}. (168)

First, consider the case of θ[π/2,π/2]\theta\in[-\pi/2,\pi/2], and define Aθ:=|tanθ|A_{\theta}:=|\tan\theta|. It can be verified from Eq. 49 that the marginal distribution pθ(x1)p_{\theta}(x_{1}) always has a piecewise linear form. In particular, if Aθ<1A_{\theta}<1, then for any x1[1,1]x_{1}\in[-1,1],

pθ(x1)={1if 1x1AθAθx12Aθif Aθx1Aθ0if x1>Aθ\displaystyle p_{\theta}(x_{1})=\begin{cases}1&\text{if $-1\leq x_{1}\leq-A_{\theta}$}\\ \frac{A_{\theta}-x_{1}}{2A_{\theta}}&\text{if $-A_{\theta}\leq x_{1}\leq A_{\theta}$}\\ 0&\text{if $x_{1}>A_{\theta}$}\end{cases} (169)

Otherwise, if Aθ>1A_{\theta}>1, then for any x1[1,1]x_{1}\in[-1,1],

pθ(x1)=Aθx12Aθ.\displaystyle p_{\theta}(x_{1})=\frac{A_{\theta}-x_{1}}{2A_{\theta}}. (170)

Plugged into Eq. 168, this gives

S(pθ(X1))\displaystyle S({p_{\theta}}(X_{1})) ={11Aθx12AθlnAθx12Aθdx1if Aθ>1AθAθAθx12AθlnAθx12Aθdx1otherwise\displaystyle=\begin{cases}-\int_{-1}^{1}\frac{A_{\theta}-x_{1}}{2A_{\theta}}\ln\frac{A_{\theta}-x_{1}}{2A_{\theta}}\,dx_{1}&\text{if $A_{\theta}>1$}\\ -\int_{-A_{\theta}}^{A_{\theta}}\frac{A_{\theta}-x_{1}}{2A_{\theta}}\ln\frac{A_{\theta}-x_{1}}{2A_{\theta}}\,dx_{1}&\text{otherwise}\end{cases}

Integrating these two cases separately in Mathematica, and plugging in the definition of AθA_{\theta}, gives

S(pθ(X1))\displaystyle S({p_{\theta}}(X_{1})) =12{f(|tanθ|)if |tanθ|>1|tanθ|otherwise\displaystyle=\frac{1}{2}\begin{cases}f(|\tan\theta|)&\text{if $|\tan\theta|>1$}\\ |\tan\theta|&\text{otherwise}\end{cases} (171)

where for convenience we’ve defined

f(x)=11+x22xlnx+1x1lnx214x2.\displaystyle f(x)=1-\frac{1+x^{2}}{2x}\ln\frac{x+1}{x-1}-\ln\frac{x^{2}-1}{4x^{2}}. (172)

Recall that so far we assumed that θ[π/2,π/2]\theta\in[-\pi/2,\pi/2]. However, by Eq. 49, pθ(x1,x2)=p±πθ(x1,x2)p_{\theta}(x_{1},x_{2})=p_{\pm\pi-\theta}(-x_{1},x_{2}), which implies that pθ(x1)=pπθ(x1)=pπθ(x1)p_{\theta}(x_{1})=p_{\pi-\theta}(-x_{1})=p_{-\pi-\theta}(-x_{1}) and S(pθ(X1))=S(pπθ(X1))=S(pπθ(X1))S({p_{\theta}}(X_{1}))=S(p_{\pi-\theta}(X_{1}))=S(p_{-\pi-\theta}(X_{1})). It can also be verified that |tanθ|=|tan(πθ)|=|tan(πθ)||\tan\theta|=|\tan(\pi-\theta)|=|\tan(-\pi-\theta)|, so in fact Eq. 171 holds for all θ[π,π]\theta\in[-\pi,\pi].
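The closed form in Eq. 171 can be checked against a direct numerical integration of the piecewise-linear marginal in Eqs. 169 and 170. A minimal sketch (using scipy; helper names illustrative):

```python
import numpy as np
from scipy.integrate import quad

def f(x):
    """The function f from Eq. 172."""
    return (1 - (1 + x**2) / (2 * x) * np.log((x + 1) / (x - 1))
            - np.log((x**2 - 1) / (4 * x**2)))

def entropy_closed(theta):
    """Closed form for S(p_theta(X1)), Eq. 171."""
    A = abs(np.tan(theta))
    return 0.5 * f(A) if A > 1 else 0.5 * A

def entropy_numeric(theta):
    """Numerical integral of -p ln p for the marginal of Eqs. 169 and 170."""
    A = abs(np.tan(theta))
    def p1(x1):
        if A >= 1:
            return (A - x1) / (2 * A)
        if x1 <= -A:
            return 1.0
        return (A - x1) / (2 * A) if x1 <= A else 0.0
    integrand = lambda x1: 0.0 if p1(x1) <= 0 else -p1(x1) * np.log(p1(x1))
    val, _ = quad(integrand, -1, 1, points=([-A, A] if A < 1 else None))
    return val

for theta in [0.2, 0.6, 1.0, 1.3]:
    print(theta, entropy_closed(theta), entropy_numeric(theta))
```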

Finally, if |θ|(π4,3π4)|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4}), then Eqs. 171 and 167 imply

|tanθ|>1,\displaystyle|\tan\theta|>1,\quad S(pθ(X1))=12f(|tanθ|)\displaystyle S({p_{\theta}}(X_{1}))=\frac{1}{2}f(|\tan\theta|)
|tan(π2θ)|1,\displaystyle|\tan({\textstyle\frac{\pi}{2}-\theta})|\leq 1,\quad S(pθ(X2))=12|tan(π2θ)|\displaystyle S({p_{\theta}}(X_{2}))=\frac{1}{2}|\tan({\textstyle\frac{\pi}{2}-\theta})|

Conversely, if |θ|[0,π](π4,3π4)|\theta|\in[0,\pi]\setminus(\frac{\pi}{4},\frac{3\pi}{4}), then

|tanθ|1,\displaystyle|\tan\theta|\leq 1,\quad S(pθ(X1))=12|tanθ|\displaystyle S({p_{\theta}}(X_{1}))=\frac{1}{2}|\tan\theta|
|tan(π2θ)|>1,\displaystyle|\tan({\textstyle\frac{\pi}{2}-\theta})|>1,\quad S(pθ(X2))=12f(|tan(π2θ)|)\displaystyle S({p_{\theta}}(X_{2}))=\frac{1}{2}f(|\tan({\textstyle\frac{\pi}{2}-\theta})|)

Eq. 74 follows by combining these results and rearranging.

To derive Eq. 76, use ϕ𝒢(ϕ𝒞(pθ))(x1,x2)=pθ(x1)u(x2)\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))(x_{1},x_{2})=p_{\theta}(x_{1})u(x_{2}) to write

D(ϕ𝒢(ϕ𝒞(pθ))u)\displaystyle D(\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))\|u) =ln4S(pθ(X1))S(u(X2))\displaystyle=\ln 4-S(p_{\theta}(X_{1}))-S(u(X_{2}))
=ln2S(pθ(X1)),\displaystyle=\ln 2-S(p_{\theta}(X_{1})), (173)

where we used that S(u(X2))=ln2S(u(X_{2}))=\ln 2. Eq. 76 then follows by combining Eqs. 173 and 171.

C.4 Example: Feedback controlled flashing ratchet

Here we derive a closed-form expression for the accessible information in the feedback-controlled collective flashing ratchet.

For notational convenience, let a=1/αa=1/\alpha indicate the slope of the increasing part of VV in Fig. 10(b), and b=1/(1α)b=-1/(1-\alpha) indicate the slope of the decreasing part of VV. Note that the net force vV(xv)\sum_{v}V^{\prime}(x_{v}) can be seen as the sum of NN random variables, where by assumption each V(xv)V^{\prime}(x_{v}) is equal to a=1/αa=1/\alpha with probability α\alpha and equal to b=1/(1α)b=-1/(1-\alpha) with probability 1α1-\alpha. This implies that the expectation of V(xv)V^{\prime}(x_{v}) is 0 and the variance is 1/(α(1α))1/(\alpha(1-\alpha)).

We will first compute the accessible information Iaccϕ𝒞(X;M)=vI(Xv;M)=NI(X1;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)=\sum_{v}I(X_{v};M)=N\cdot I(X_{1};M). The mutual information between MM and the state of a single particle X1X_{1} is given by

I(X1;M)=S(M)S(M|X)\displaystyle I(X_{1};M)=S(M)-S(M|X)
=h2(p(1))αh2(p(1|a))(1α)h2(p(1|b)),\displaystyle\quad=h_{2}(p(1))-\alpha h_{2}(p(1|a))-(1-\alpha)h_{2}(p(1|b)), (174)

where p(1)p(1) is the probability that the net force is positive, p(1|a)p(1|a) is the probability that the net force is positive given that particle X1X_{1} experiences force aa, and p(1|b)p(1|b) is the probability that the net force is positive given that the particle X1X_{1} experiences force bb. We can compute p(1)p(1) by considering the case when k=0,1,2,k=0,1,2,\dots particles experience force aa. Assuming the particles are independent, this is given by

p(1)=k=0NBN,α(k)Θ(ka+(Nk)b)\displaystyle p(1)=\sum_{\mathclap{k=0}}^{N}B_{N,\alpha}(k)\Theta(ka+(N-k)b) (175)

where BN,αB_{N,\alpha} is the binomial probability of kk successes, given NN trials with success probability α\alpha. To compute p(1|a)p(1|a), note that, given that X1X_{1} experiences force aa, M=1M=1 whenever the other N1N-1 particles experience a net force larger than a-a. The probability of this event is

p(1|a)=k=0N1BN1,α(k)Θ(ka+(N1k)b+a).\displaystyle p(1|a)=\sum_{\mathclap{k=0}}^{N-1}B_{N-1,\alpha}(k)\Theta(ka+(N-1-k)b+a). (176)

Conversely, if X1X_{1} experiences force bb, then M=1M=1 if the other N1N-1 particles experience a net force larger than b-b, which has probability

p(1|b)=k=0N1BN1,α(k)Θ(ka+(N1k)b+b).\displaystyle p(1|b)=\sum_{\mathclap{k=0}}^{N-1}B_{N-1,\alpha}(k)\Theta(ka+(N-1-k)b+b). (177)

Plugging Eqs. 175, 176 and 177 into Eq. 174 gives I(X1;M)I(X_{1};M). Multiplying by NN gives the accessible information,

Iaccϕ𝒞(X;M)=NI(X1;M)=\displaystyle I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)=N\cdot I(X_{1};M)= (178)
N[h2(k=0NBN,α(k)Θ(ka+(Nk)b))\displaystyle N\Bigg{[}h_{2}\Bigg{(}\sum_{\mathclap{k=0}}^{N}B_{N,\alpha}(k)\Theta(ka+(N-k)b)\Bigg{)}-
αh2(k=0N1BN1,α(k)Θ(ka+(N1k)b+a))\displaystyle\alpha h_{2}\Bigg{(}\sum_{\mathclap{k=0}}^{N-1}B_{N-1,\alpha}(k)\Theta(ka+(N-1-k)b+a)\Bigg{)}-
(1α)h2(k=0N1BN1,α(k)Θ(ka+(N1k)b+b))],\displaystyle(1-\alpha)h_{2}\Bigg{(}\sum_{\mathclap{k=0}}^{N-1}B_{N-1,\alpha}(k)\Theta(ka+(N-1-k)b+b)\Bigg{)}\Bigg{]},

This is shown in Fig. 12(left) for different values of NN and α\alpha.
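A minimal numerical sketch of Eq. 178 follows (in nats, helper names illustrative). The convention $\Theta(0)=0$ for exact ties in the net force is an assumption of this sketch; ties carry vanishing probability as $N$ grows and do not affect the large-$N$ behavior.

```python
import numpy as np
from scipy.stats import binom

def h2(p):
    """Binary entropy in nats."""
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def accessible_info(N, alpha):
    """N * I(X1; M) from Eq. 178, using Eqs. 175-177."""
    a, b = 1 / alpha, -1 / (1 - alpha)
    step = lambda x: 1.0 if x > 0 else 0.0      # Theta, with Theta(0) = 0 assumed
    p1 = sum(binom.pmf(k, N, alpha) * step(k * a + (N - k) * b)
             for k in range(N + 1))
    p1a = sum(binom.pmf(k, N - 1, alpha) * step(k * a + (N - 1 - k) * b + a)
              for k in range(N))
    p1b = sum(binom.pmf(k, N - 1, alpha) * step(k * a + (N - 1 - k) * b + b)
              for k in range(N))
    return N * (h2(p1) - alpha * h2(p1a) - (1 - alpha) * h2(p1b))

for N in [1, 2, 5, 10, 50, 200]:
    print(N, accessible_info(N, alpha=0.5))
print("asymptote 1/pi =", 1 / np.pi)
```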

To compute the efficiency values in Fig. 12(right), we simply divide $I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)$ by $I(X;M)$, the total mutual information between the measurement and all particles. Since the measurement in Eq. 80 is deterministic, this mutual information is given by the entropy of $M$,

I(X;M)=S(M)=h2(p(1)),\displaystyle I(X;M)=S(M)=h_{2}(p(1)), (179)

which can be computed using Eq. 175.

We now compute the asymptotic value of accessible information and efficiency in the NN\to\infty limit. The sum of a large number of independent random variables with mean 0 and variance 1/(α(1α))1/(\alpha(1-\alpha)) approaches a Gaussian with mean 0 and variance N/(α(1α))N/(\alpha(1-\alpha)). Thus, in the NN\to\infty limit, the probability that the force is positive converges to p(1)=1/2p(1)=1/2, so I(X;M)=S(M)I(X;M)=S(M) converges to ln2\ln 2. Recall that p(1|a)p(1|a) is given by the probability that N1N-1 particles experience a net force larger than a-a. In the NN\to\infty limit, this conditional probability converges to

p(1|a)=1Φα,N1(a)=Φα,N1(a).p(1|a)=1-\Phi_{\alpha,N-1}(-a)=\Phi_{\alpha,N-1}(a).

where Φα,N1\Phi_{\alpha,N-1} is the cumulative distribution function of a Gaussian with mean 0 and variance N/(α(1α)){N}/(\alpha(1-\alpha)). We can similarly calculate

p(1|b)=1Φα,N1(b)=Φα,N1(b).p(1|b)=1-\Phi_{\alpha,N-1}(-b)=\Phi_{\alpha,N-1}(b).

Plugging into Eq. 174 gives

I(X1;M)=ln2αh2(Φα,N1(a))(1α)h2(Φα,N1(b)).I(X_{1};M)=\\ \ln 2-\alpha h_{2}\big{(}\Phi_{\alpha,N-1}(a)\big{)}-(1-\alpha)h_{2}\big{(}\Phi_{\alpha,N-1}(b)\big{)}. (180)

Using $a=1/\alpha$ and $b=-1/(1-\alpha)$, some analysis (e.g., taking limits in Mathematica) shows that

limNNI(X1;M)=1π,\displaystyle\lim_{N\to\infty}N\cdot I(X_{1};M)=\frac{1}{\pi}, (181)

irrespective of α\alpha. This is the asymptotic accessible information, which appears as the dotted line in Fig. 12(left). The asymptotic efficiency, which appears as the dotted line in Fig. 12(right), is given by 1/(πln2)1/(\pi\ln 2) (since I(X;M)=ln2I(X;M)=\ln 2 in the NN\to\infty limit).

Appendix D Coarse-grained constraints

D.1 Derivation of Eq. 82 from Eqs. 83 and 85

In general, the microstate distribution $p$ evolves according to some generator $L$, ${\partial_{t}}p(t)=Lp(t)$, while the macrostate distribution $p_{Z}$ evolves according to a coarse-grained generator $\hat{L}^{p}$. The coarse-grained dynamics need not be closed, meaning that $\hat{L}^{p}$ can depend on the microstate distribution $p$. In this section, we provide concrete conditions on the generators that guarantee that the coarse-grained dynamics are closed. In the following derivations, for notational simplicity, we omit the dependence of $p(x,t)$ and $p(z,t)$ on $t$.

For discrete-state master equations, the coarse-grained dynamics are given by (esposito2012stochastic, )

{\partial_{t}}p_{Z}(z)=\hat{L}^{p}p_{Z}(z)=\sum_{z^{\prime}}\Big[\hat{L}^{p}_{zz^{\prime}}p_{Z}(z^{\prime})-\hat{L}^{p}_{z^{\prime}z}p_{Z}(z)\Big], (182)

where L^zzp\hat{L}^{p}_{zz^{\prime}} is the transition rate from macrostate zz^{\prime} to zz,

L^zzp=xp(x|z)xδξ(x)(z)Lxx.\hat{L}^{p}_{zz^{\prime}}=\sum_{x^{\prime}}p({x^{\prime}|z^{\prime}})\sum_{x}\delta_{\xi(x)}(z)L_{xx^{\prime}}. (183)

By plugging Eq. 83 into Eq. 183 and simplifying, one can verify that L^zzp\hat{L}^{p}_{zz^{\prime}} does not depend on the microstate distribution pp, therefore Eq. 82 holds.
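A minimal numerical sketch of Eq. 183 for a toy four-state system (the specific rates and coarse-graining are illustrative): it computes the coarse-grained rates for two different choices of $p(x^{\prime}|z^{\prime})$. Closure of the coarse-grained dynamics, Eq. 82, corresponds to these agreeing for every such choice; a generic rate matrix, as below, will not be closed.

```python
import numpy as np

rng = np.random.default_rng(0)
nX, nZ = 4, 2
xi = np.array([0, 0, 1, 1])               # coarse-graining function z = xi(x)

# Off-diagonal microstate rates L[x, x'] = rate from x' to x (the diagonal
# never enters the gain/loss form of Eq. 182, so it is left at zero).
L = rng.random((nX, nX))
np.fill_diagonal(L, 0.0)

def coarse_rates(p_cond):
    """Eq. 183, with p_cond[x'] = p(x' | z' = xi(x'))."""
    Lhat = np.zeros((nZ, nZ))
    for xp in range(nX):
        for z in range(nZ):
            Lhat[z, xi[xp]] += p_cond[xp] * L[xi == z, xp].sum()
    return Lhat

def random_conditional():
    """A random p(x'|z'), normalized within each macrostate."""
    w = rng.random(nX)
    for z in range(nZ):
        w[xi == z] /= w[xi == z].sum()
    return w

print(coarse_rates(random_conditional()))
print(coarse_rates(random_conditional()))   # differs: these dynamics are not closed
```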

A similar approach can be used for continuous-state master equations.

We now consider Fokker-Planck equations of the form Eq. 84, given a linear coarse-graining function ξ(x)=Wx\xi(x)=Wx. Using (duong2018quantification, , Prop. 2.8), we write the evolution of the coarse-grained distribution pZp_{Z} as

tpZ(z)=(𝖠^(z)pZ(z))+β1tr(HT(𝖣^(z)pZ(z))),{\textstyle\partial_{t}}p_{Z}(z)=\nabla\cdot(\hat{\mathsf{A}}(z)p_{Z}(z))+\beta^{-1}\mathrm{tr}(H^{T}(\hat{\mathsf{D}}(z)p_{Z}(z))), (184)

where HH is the Hessian matrix of second derivative operators, and we’ve defined

𝖠^(z)\displaystyle\hat{\mathsf{A}}(z) :=[p(x|z)WE(x)β1Δξ(x)]𝑑x\displaystyle:=\int\left[p(x|z)W\nabla E(x)-\beta^{-1}\Delta\xi(x)\right]dx (185)
=[p(x|z)WE(x)]𝑑x\displaystyle=\int\left[p(x|z)W\nabla E(x)\right]dx (186)
=F^(z),\displaystyle=-\hat{F}(z), (187)
𝖣^(z)\displaystyle\hat{\mathsf{D}}(z) :=p(x|z)WWT𝑑x=I.\displaystyle:=\int p(x|z)WW^{T}\,dx=I. (188)

We used Eq. 2.29 from duong2018quantification in Eq. 185, the linearity of $\xi$ in Eq. 186, and Eq. 85 in Eq. 187. We used Eq. 2.30 from duong2018quantification and the assumption that $WW^{T}=I$ in Eq. 188. It is easy to check that $\mathrm{tr}(H^{T}(Ip_{Z}))=\Delta p_{Z}$; combined with Eqs. 187, 188 and 184, this gives Eq. 86. Since the right hand side of Eq. 86 does not depend on the microstate distribution, the coarse-grained dynamics are closed.

D.2 Derivation of Eq. 87

Our derivation below does not assume isothermal protocols, so the inequality in Eq. 87 holds both for isothermal protocols and for protocols connected to any number of thermodynamic reservoirs.

To derive this result for a given $L$, we make two assumptions. First, as described in the main text, we assume that the coarse-grained dynamics are closed, Eq. 82. Second, we assume that the coarse-grained stationary distribution $\pi_{Z}$ (where $\pi$ is the stationary distribution of $L$) is invariant under conjugation of odd-parity variables,

πZ(ξ(x))=πZ(ξ(x))xX\pi_{Z}(\xi(x))=\pi_{Z}(\xi(x^{\dagger}))\qquad\forall x\in X (189)

where $x^{\dagger}$ indicates the conjugation of state $x$ in which all odd-parity variables (such as momentum) have their sign flipped. For an isothermal protocol, the stationary distributions are equilibrium distributions, and Eq. 189 is satisfied lee_fluctuation_2013 . For more general protocols, Eq. 189 holds if there are no odd-parity variables (e.g., overdamped dynamics), so that $x=x^{\dagger}$. It also holds if the coarse-graining function maps each $x$ and its conjugate to the same macrostate, $\xi(x)=\xi(x^{\dagger})$, as well as in some other cases.

Now imagine a system that starts from some initial distribution pp at time t=0t=0, and then undergoes free relaxation under LL towards a (possibly nonequilibrium) stationary distribution π\pi, reaching a final distribution pp^{\prime} by time t=τt=\tau. Next, we use existing results in stochastic thermodynamics esposito_three_2010 ; lee_fluctuation_2013 and write the EP incurred over time interval t[0,τ]t\in[0,\tau] as

Σ(τ)=D(p(𝒙,𝝂)p~(𝒙~,𝝂~)),\displaystyle\Sigma(\tau)=D(p(\bm{x},\bm{\nu})\|\tilde{p}(\tilde{\bm{x}}^{\dagger},\tilde{\bm{\nu}})), (190)

(see also Section A.2), where:

  1. 1.

    𝒙=(x,,x)\bm{x}=(x,\dots,x^{\prime}) indicate a continuous-time trajectory of system states over time interval t[0,τ]t\in[0,\tau], where xx and xx^{\prime} indicate the initial and final system states respectively, and 𝒙~=(x,,x)\tilde{\bm{x}}^{\dagger}=({x^{\prime}}^{\dagger},\dots,x^{\dagger}) is the corresponding time-reversed and conjugated trajectory;

  2. 2.

    𝝂\bm{\nu} is a sequence of reservoirs which exchange conserved quantities with the system during t[0,τ]t\in[0,\tau] and 𝝂~\tilde{\bm{\nu}} is the corresponding time-reversed sequence esposito_three_2010 ; van2010three ; esposito2010three ;

  3. 3.

    p(𝒙,𝝂)=P(𝒙,𝝂|x)p(x)p(\bm{x},\bm{\nu})=P(\bm{x},\bm{\nu}|x)p(x) is the probability of forward trajectory (𝒙,𝝂)(\bm{x},\bm{\nu}) given initial distribution pp, where P(𝒙,𝝂|x)P(\bm{x},\bm{\nu}|x) is the conditional distribution generated by the free relaxation;

  4. 4.

    p~(𝒙~,𝝂~)=P(𝒙~,𝝂~|x)p(x)\tilde{p}(\tilde{\bm{x}}^{\dagger},\tilde{\bm{\nu}})=P(\tilde{\bm{x}}^{\dagger},\tilde{\bm{\nu}}|{x^{\prime}}^{\dagger})p^{\prime}(x^{\prime}) is the probability of reverse trajectory (𝒙~,𝝂~)(\tilde{\bm{x}}^{\dagger},\tilde{\bm{\nu}}) under a free relaxation that starts with the following distribution:

    p(x)=P(x|x)p(x)𝑑x.\displaystyle p^{\prime}(x^{\prime})=\int P(x^{\prime}|x)p(x)dx. (191)

Using the fact that EP decreases under state-space and temporal coarse-graining esposito2012stochastic ; gomez2008cg , we bound Eq. 190 as

\Sigma(\tau)\geq D(p(\bm{x})\|\tilde{p}(\tilde{\bm{x}}^{\dagger}))\geq D(p(z,z^{\prime})\|\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})), (192)

where z=ξ(x)z=\xi(x), z=ξ(x)z^{\prime}=\xi(x^{\prime}), z=ξ(x)z^{\dagger}=\xi(x^{\dagger}), and z=ξ(x){z^{\prime}}^{\dagger}=\xi({x^{\prime}}^{\dagger}). The final KL divergence can be decomposed as

D(p(z,z)p~(z,z))=[D(pZπZ)D(pZπZ)]+p(z,z)ln[p(z,z)πZ(z)pZ(z)p~(z,z)pZ(z)πZ(z)]𝑑z𝑑z.D(p(z,z^{\prime})\|\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger}))=\left[D(p_{Z}\|\pi_{Z})-D({p_{Z}^{\prime}}\|\pi_{Z})\right]+\\ \int p(z,z^{\prime})\ln\left[\frac{p(z,z^{\prime})\pi_{Z}(z){p_{Z}^{\prime}}(z^{\prime})}{\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})p_{Z}(z)\pi_{Z}(z^{\prime})}\right]dz\,dz^{\prime}. (193)

Using Jensen’s inequality, we lower bound the integral term as

p(z,z)ln[p(z,z)πZ(z)pZ(z)p~(z,z)pZ(z)πZ(z)]𝑑z𝑑z\displaystyle\int p(z,z^{\prime})\ln\left[\frac{p(z,z^{\prime})\pi_{Z}(z){p_{Z}^{\prime}}(z^{\prime})}{\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})p_{Z}(z)\pi_{Z}(z^{\prime})}\right]dz\,dz^{\prime}
=p(z,z)ln[p~(z,z)pZ(z)πZ(z)p(z,z)πZ(z)pZ(z)]𝑑z𝑑z\displaystyle\quad=-\int p(z,z^{\prime})\ln\left[\frac{\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})p_{Z}(z)\pi_{Z}(z^{\prime})}{p(z,z^{\prime})\pi_{Z}(z){p_{Z}^{\prime}}(z^{\prime})}\right]dz\,dz^{\prime}
ln[p~(z,z)pZ(z)πZ(z)πZ(z)pZ(z)𝑑z𝑑z].\displaystyle\quad\geq-\ln\left[\int\frac{\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})p_{Z}(z)\pi_{Z}(z^{\prime})}{\pi_{Z}(z){p_{Z}^{\prime}}(z^{\prime})}dz\,dz^{\prime}\right]. (194)

Note that πZ(z)=πZ(z)\pi_{Z}(z^{\prime})=\pi_{Z}({z^{\prime}}^{\dagger}) by Eq. 189, and p~Z(z)=pZ(z)\tilde{p}_{Z}({z^{\prime}}^{\dagger})={p_{Z}^{\prime}}(z^{\prime}) by the definition of pZ{p_{Z}^{\prime}} in Eq. 191, allowing us to rewrite the RHS of Eq. 194 as

ln[pZ(z)πZ(z)[p~(z|z)πZ(z)𝑑z]𝑑z].-\ln\left[\int\frac{p_{Z}(z)}{\pi_{Z}(z)}\left[\int\tilde{p}(z^{\dagger}|{z^{\prime}}^{\dagger})\pi_{Z}({z^{\prime}}^{\dagger})dz^{\prime}\right]dz\right]. (195)

The inner integral can be further rewritten as

p~(z|z)πZ(z)𝑑z\displaystyle\int\tilde{p}(z^{\dagger}|{z^{\prime}}^{\dagger})\pi_{Z}({z^{\prime}}^{\dagger})dz^{\prime} =P(z|x)p~(x|z)πZ(z)𝑑x\displaystyle=\int P(z^{\dagger}|{x^{\prime}}^{\dagger})\tilde{p}({x^{\prime}}^{\dagger}|{z^{\prime}}^{\dagger})\pi_{Z}({z^{\prime}}^{\dagger})dx^{\prime}
=πZ(z)\displaystyle=\pi_{Z}(z^{\dagger})
=πZ(z),\displaystyle=\pi_{Z}(z),

where in the second line we used the assumption of closed dynamics (Eq. 82) and the stationarity of π\pi under P(|)P(\cdot|\cdot), and in the third line we used Eq. 189. We can then rewrite Eq. 195 as

ln[πZ(z)πZ(z)πZ(z)𝑑z]=0.-\ln\left[\int\frac{\pi_{Z}(z)}{\pi_{Z}(z)}\pi_{Z}(z)\,dz\right]=0.

Combined with Eq. 194, this implies that the integral term in Eq. 193 is non-negative. Combining with Eq. 192 gives

Σ(τ)D(pZπZ)D(pZπZ).\Sigma(\tau)\geq D(p_{Z}\|\pi_{Z})-D({p_{Z}^{\prime}}\|\pi_{Z}).

Finally, using the definition of the EP rate and the results above,

Σ˙(p,L)\displaystyle\dot{\Sigma}(p,L) :=limτ01τΣ(τ)\displaystyle:=\lim_{\tau\to 0}\frac{1}{\tau}\Sigma(\tau)
limτ01τ[D(pZπZ)D(pZπZ)]\displaystyle\geq\lim_{\tau\to 0}\frac{1}{\tau}[D(p_{Z}\|\pi_{Z})-D({p_{Z}^{\prime}}\|\pi_{Z})]
=tpZ(t)(z)lnpZ(z)πZ(z)dz0,\displaystyle=-\int{\textstyle\partial_{t}}p_{Z}(t)(z)\ln\frac{p_{Z}(z)}{\pi_{Z}(z)}\,dz\geq 0, (196)

where ${\partial_{t}}p_{Z}(t)=\hat{L}p_{Z}$. Eq. 196 follows from Eqs. 107, 108, 109 and 110 above (with summations replaced by integrals). The discrete-state form of Eq. 196, with $p$ and $L$ written as explicitly time-dependent, appears in the main text as Eq. 87.
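As a simple numerical illustration of this last inequality (a minimal sketch for a generic discrete-state coarse-grained generator; the matrix below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
nZ = 5

# Random coarse-grained generator Lhat with columns summing to zero: d/dt p_Z = Lhat p_Z
Lhat = rng.random((nZ, nZ))
np.fill_diagonal(Lhat, 0.0)
Lhat -= np.diag(Lhat.sum(axis=0))

# Stationary distribution pi_Z: the (normalized) null eigenvector of Lhat
w, v = np.linalg.eig(Lhat)
pi = np.real(v[:, np.argmin(np.abs(w))])
pi = pi / pi.sum()

p = rng.random(nZ)
p /= p.sum()
dp = Lhat @ p                                   # d/dt p_Z under the closed dynamics

# Lower bound on the EP rate from Eq. 196: -sum_z (d/dt p_Z)(z) ln(p_Z(z)/pi_Z(z)) >= 0
print(-np.sum(dp * np.log(p / pi)))
```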