
Work, entropy production, and thermodynamics of information under protocol constraints

Artemy Kolchinsky, Santa Fe Institute, Santa Fe, New Mexico    David H. Wolpert, Complexity Science Hub, Vienna; Arizona State University, Tempe, Arizona; Santa Fe Institute, Santa Fe, New Mexico; http://davidwolpert.weebly.com
Abstract

In many real-world situations, there are constraints on the ways in which a physical system can be manipulated. We investigate the entropy production (EP) and extractable work involved in bringing a system from some initial distribution p to some final distribution p^{\prime}, given that the set of master equations available to the driving protocol obeys some constraints. We first derive general bounds on EP and extractable work, as well as a decomposition of the nonequilibrium free energy into an “accessible free energy” (which can be extracted as work, given a set of constraints) and an “inaccessible free energy” (which must be dissipated as EP). In a similar vein, we consider the thermodynamics of information in the presence of constraints, and decompose the information acquired in a measurement into “accessible” and “inaccessible” components. This decomposition allows us to consider the thermodynamic efficiency of different measurements of the same system, given a set of constraints. We use our framework to analyze protocols subject to symmetry, modularity, and coarse-grained constraints, and consider various examples including the Szilard box, the 2D Ising model, and a multi-particle flashing ratchet.

I Introduction

I.1 Background

One of the foundational issues in thermodynamics is quantifying how much work is required to transform a system between two thermodynamic states. Recent results in statistical physics have derived general bounds on work which hold even for transformations between nonequilibrium states (takara_generalization_2010; parrondo2015thermodynamics). In particular, suppose one wishes to transform a system with initial distribution p and energy function E to some final distribution p^{\prime} and energy function E^{\prime}. For an isothermal process, during which the system remains in contact with a single heat bath at inverse temperature \beta, the work extracted during this transformation obeys

W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(p)-F_{E^{\prime}}(p^{\prime}), (1)

where F_{E}(p):=\left\langle E\right\rangle_{p}-S(p)/\beta is the (nonequilibrium) free energy of distribution p given energy function E (takara_generalization_2010; parrondo2015thermodynamics; esposito2011second). This inequality comes from the second law of thermodynamics, which states that entropy production (EP), the total increase of the entropy of the system and all coupled reservoirs, is non-negative. For an isothermal process that carries out the transformation p\!\shortrightarrow\!p^{\prime}, EP is given by

\Sigma(p\!\shortrightarrow\!p^{\prime})=\beta[F_{E}(p)-F_{E^{\prime}}(p^{\prime})-W(p\!\shortrightarrow\!p^{\prime})]\geq 0. (2)

Eq. 1 follows from Eq. 2 by a simple rearrangement.

To extract work from a system, one must manipulate the system by applying a driving protocol. There are many different driving protocols that can be used to transform some initial distribution p to some final distribution p^{\prime}, which generally incur different amounts of EP and work. Achieving the fundamental bounds set by the second law, such as Eq. 1, typically requires idealized protocols, which make use of arbitrary energy functions, infinite timescales, etc. In many real-world scenarios, however, there are strong practical constraints on how one can manipulate a system, and such idealized protocols are unavailable.

The goal of this paper is to derive stronger bounds on the EP and work involved in carrying out the transformation p\rightarrow p^{\prime}, given constraints on the set of master equations available to the driving protocol. Ultimately, such stronger bounds on EP and work can provide new insights into various real-world thermodynamic processes and work-harvesting devices, ranging from biological organisms to artificial engines. They can also cast new light on some well-studied scenarios in statistical physics.

Figure 1: A two-dimensional Szilard box with a single Brownian particle, where a vertical partition (blue) can be positioned at different horizontal locations in the box. We demonstrate that only information about the particle’s horizontal position, not its vertical position, can be used to extract work from the system.

For example, consider a two-dimensional Szilard box connected to a heat bath, as shown in Fig. 1, which contains a single Brownian particle and a vertical partition, and suppose that the driving protocols can manipulate the horizontal position of this partition. (We use a Brownian model of the Szilard engine, which is similar to setups commonly employed in modern nonequilibrium statistical physics (berut2012experimental; roldan2014universal; koski2014experimental; shizume1995heat; gong2016stochastic; parrondo2015thermodynamics). This model can be justified by imagining a box that contains a large colloidal particle, as well as a medium of small solvent particles to which the vertical partition is permeable. Note that this model differs from Szilard’s original proposal (szilard1929entropieverminderung), in which the box contains a single particle in a vacuum, which has been analyzed in proesmans2015efficiency; hondou2007equation; bhat2017unusual.) Imagine that the particle is initially located in the left half of the box. How much work can be extracted by transforming this initial distribution to a uniform final distribution, assuming the system begins and ends with a uniform energy function? A simple application of Eq. 1 shows that the extractable work is upper bounded by (\ln 2)/\beta. This bound can be achieved by quickly moving the vertical partition to the middle of the box and then slowly moving it rightward. Now imagine an alternative scenario, in which the particle is initially located in the top half of the box. By Eq. 1, the work that can be extracted by bringing this initial distribution to a uniform final distribution is again upper bounded by (\ln 2)/\beta. Intuitively, however, it seems that this bound should not be achievable, given the constrained set of available protocols (i.e., one can only manipulate the system by moving the vertical partition left and right). Our results will make this intuition rigorous for the two-dimensional Szilard box, as well as various other systems that can only be manipulated by a constrained set of driving protocols.
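To see where the (\ln 2)/\beta figure comes from, here is a worked evaluation of Eq. 1 for this example, under the assumptions made explicit in Section V.1: the energy is uniform (E=E^{\prime}=0 inside the box), p is uniform over the left half, and p^{\prime}=u is uniform over the whole box; V denotes the box area (a symbol introduced only for this illustration):

F_{E}(p)=\left\langle E\right\rangle_{p}-S(p)/\beta=-\ln(V/2)/\beta,\qquad F_{E^{\prime}}(u)=-\ln(V)/\beta,

W(p\!\shortrightarrow\!u)\leq F_{E}(p)-F_{E^{\prime}}(u)=(\ln 2)/\beta.

Exactly the same arithmetic applies when p is instead uniform over the top half of the box, which is why Eq. 1 alone cannot distinguish the two scenarios.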

This phenomenon also occurs when the starting and ending distributions can depend on the outcome of a measurement of the system. This kind of setup, which was first used to analyze the thermodynamics of information in various kinds of Maxwellian demons, is sometimes called “feedback control” in the literature sagawa2008second; parrondo2015thermodynamics. Imagine that the state of some system X is first measured using some observation channel (conditional distribution) q(m|x), producing measurement outcome m with probability p(m)=\sum_{x}p(x)q(m|x). The system then undergoes a driving protocol which can depend on m. For simplicity, we assume that the system’s energy function begins as E and ends as E^{\prime} for all measurement outcomes. Let p_{X|m} and p_{X^{\prime}|m}^{\prime} indicate the system’s initial and final conditional distributions given measurement outcome m, and let p(x)=\sum_{m}p(m)p_{X|m}(x|m) and p^{\prime}(x^{\prime})=\sum_{m}p(m)p_{X^{\prime}|m}^{\prime}(x^{\prime}|m) indicate the system’s initial and final marginal distributions (for simplicity, below we often use notation like p instead of p(x)). We can then take expectations of both sides of Eq. 1 across measurement outcomes, thereby bounding the average extractable work as follows. (As is common in the literature, in Eq. 3 we consider only the work that is extractable from the system after the measurement is made; we do not account for the possible work cost of making the measurement, nor any work exchanges that may be incurred by the measurement apparatus during the driving.)

\left\langle W\right\rangle\leq\sum_{m}p(m)[F_{E}(p_{X|m})-F_{E^{\prime}}(p_{X^{\prime}|m}^{\prime})]. (3)

By adding and subtracting [S(p)-S(p^{\prime})]/\beta on the right hand side, we can further rewrite Eq. 3 in terms of the drop of the free energy in the marginal distribution, plus the loss of information between the measurement and the system over the course of the protocol,

\left\langle W\right\rangle\leq F_{E}(p)-F_{E^{\prime}}(p^{\prime})+[I(X;M)-I(X^{\prime};M)]/\beta, (4)

where I(X;M) and I(X^{\prime};M) indicate the mutual information under the conditional distributions p_{X|m} and p_{X^{\prime}|m}^{\prime} respectively. Comparing Eq. 1 and Eq. 4, the bound on average extractable work increases with the drop of mutual information. This is a classic result from the “thermodynamics of information” sagawa2008second; parrondo2015thermodynamics, which shows that information about the state of a system can be used to increase the work extracted from this system.
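For completeness, the step from Eq. 3 to Eq. 4 can be spelled out as follows, using \sum_{m}p(m)\langle E\rangle_{p_{X|m}}=\langle E\rangle_{p} and the identity I(X;M)=S(p)-\sum_{m}p(m)S(p_{X|m}) (and likewise for the final distributions):

\sum_{m}p(m)[F_{E}(p_{X|m})-F_{E^{\prime}}(p_{X^{\prime}|m}^{\prime})]=\langle E\rangle_{p}-\langle E^{\prime}\rangle_{p^{\prime}}-\Big[\sum_{m}p(m)S(p_{X|m})-\sum_{m}p(m)S(p_{X^{\prime}|m}^{\prime})\Big]/\beta

=F_{E}(p)-F_{E^{\prime}}(p^{\prime})+\Big[S(p)-\sum_{m}p(m)S(p_{X|m})\Big]/\beta-\Big[S(p^{\prime})-\sum_{m}p(m)S(p_{X^{\prime}|m}^{\prime})\Big]/\beta

=F_{E}(p)-F_{E^{\prime}}(p^{\prime})+[I(X;M)-I(X^{\prime};M)]/\beta.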

Just like Eq. 1, the bound in Eq. 4 is typically saturated by idealized protocols, which have access to arbitrary energy functions, infinite timescales, etc. As mentioned above, in the real world there are typically constraints on the available protocols, in which case the bound of Eq. 4 may not be achievable. For example, consider again the Szilard box shown in Fig. 1. Imagine measuring a bit of information about the location of the particle and then using this information to extract work from the system while driving it back to a uniform equilibrium distribution. In this case I(X;M)=\ln 2 and I(X^{\prime};M)=0, so if the system starts and ends with the uniform energy function, Eq. 4 states that \langle W\rangle\leq(\ln 2)/\beta. Intuitively, however, it seems that measuring the particle’s horizontal position should be useful for extracting work from the system, while measuring the particle’s vertical position should not be useful. The general bound of Eq. 4 does not distinguish between these two kinds of measurements. In fact, this bound depends only on the overall amount of information acquired by the measurement (as quantified by I(X;M)), and is therefore completely insensitive to the content of that information (i.e., the particular pattern of correlations quantified by I(X;M)).

I.2 Summary of results and roadmap

In this paper we derive bounds on extractable work and EP which arise when carrying out the transformation p\!\shortrightarrow\!p^{\prime} under constraints on the driving protocol. We consider a system coupled to a single heat bath which undergoes a driving protocol over some time interval t\in[0,1] (where the units of time are arbitrary). A driving protocol is represented as a continuous-time master equation L(t), where L(t) refers to the (infinitesimal) generator at time t. For example, a driving protocol could be a trajectory of time-dependent discrete-state rate matrices, or a trajectory of time-dependent Fokker-Planck operators for a continuous-state system.

We say that a driving protocol is constrained if there is some restricted set of generators \Lambda such that L(t)\in\Lambda at all times t\in[0,1]. As discussed below, the particular choice of \Lambda depends on the specific constraints being considered. For example, \Lambda might represent a set of generators that are invariant under some particular symmetry group (e.g., representing the dynamics of a set of indistinguishable particles, or a spin system on a lattice with symmetries).

Our analysis proceeds at three different “levels” of generality, which we summarize in the following subsections.

Level 1: General mathematical framework

In the first level of analysis, presented in Sections III and IV, we provide a general mathematical framework for deriving bounds on EP and work for constrained driving protocols.

To develop our framework, given some set of allowed generators \Lambda, we consider an associated operator \phi over distributions which satisfies two conditions: it obeys the so-called Pythagorean identity from information geometry, and it commutes with the dynamics generated by elements of \Lambda (Eqs. 14 and 16 below). Given such an operator \phi, in Section III we show that for any distribution p, the distribution \phi(p) contains only that part of the free energy in p which may be turned into work by a constrained driving protocol. Formally, we decompose the nonequilibrium free energy of distribution p and energy function E as

F_{E}(p)=F_{E}(\phi(p))+D(p\|\phi(p))/\beta, (5)

where D(\cdot\|\cdot) indicates the Kullback-Leibler divergence. Then, for any constrained driving protocol that carries out the transformation p\!\shortrightarrow\!p^{\prime}, the extractable work is bounded as

W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime})). (6)

We also demonstrate that EP can be lower bounded by the contraction of the Kullback-Leibler (KL) divergence between p and \phi(p) over the course of the protocol,

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime})). (7)

Given these bounds, it can be seen that Eq. 5 decomposes the nonequilibrium free energy F_{E}(p) into two terms: an accessible free energy F_{E}(\phi(p)), whose decrease over the course of the protocol may be extractable as work, and an inaccessible free energy D(p\|\phi(p))/\beta, whose decrease over the course of the protocol cannot be turned into work and must be dissipated as EP. The accessible free energy is never greater than the overall free energy, F_{E}(\phi(p))\leq F_{E}(p), which follows from Eq. 5 and the non-negativity of KL divergence. We also show that the right hand side of Eq. 7 is non-negative,

D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\geq 0, (8)

which implies that our bounds on EP and work, Eqs. 7 and 6 respectively, are stronger than the general bounds provided by the second law (\Sigma\geq 0 and Eq. 1). Note that Eq. 8 also implies an irreversibility condition on the dynamics: for any two distributions p and p^{\prime}, a constrained driving protocol can either carry out the transformation p\!\shortrightarrow\!p^{\prime} or the transformation p^{\prime}\!\shortrightarrow\!p, but not both, unless D(p\|\phi(p))=D(p^{\prime}\|\phi(p^{\prime})).

In Section IV, we show that the general framework summarized above has important implications for the thermodynamics of information. We consider the type of feedback-control setup discussed above: an observation apparatus first makes a measurement m of the system, then the system undergoes a driving protocol (which can depend on m) that carries out the transformation p_{X|m}\!\shortrightarrow\!p_{X^{\prime}|m}^{\prime}. Suppose that the driving protocols corresponding to all m obey bounds like Eq. 6 for the same operator \phi. This operator then gives rise to the “mapped” initial and final conditional distributions \phi(p_{X|m}) and \phi(p_{X^{\prime}|m}^{\prime}). We can then bound the average extractable work for feedback control under constraints as

\left\langle W\right\rangle\leq F_{E}(p)-F_{E^{\prime}}(p^{\prime})+[I_{\mathrm{acc}}^{\phi}(X;M)-I_{\mathrm{acc}}^{\phi}(X^{\prime};M)]/\beta,

where the accessible information component of the initial mutual information I(X;M) is defined as

I_{\mathrm{acc}}^{\phi}(X;M)=I(X;M)-D(p_{X|M}\|\phi(p_{X|M})), (9)

and similarly for I_{\mathrm{acc}}^{\phi}(X^{\prime};M). This bound is a refinement of Eq. 4 in the presence of protocol constraints, which shows that the amount of extractable work depends on the accessible information I_{\mathrm{acc}}^{\phi}(X;M), rather than the actual mutual information I(X;M). Loosely speaking, the accessible information reflects the “alignment” between the choice of measured observable and the way the system can be manipulated, given some protocol constraints. This means that, in the presence of constraints, the thermodynamic value of information depends not only on the amount of measured information, but also on the content of that information (corning1998thermodynamics; kolchinsky2018semantic). (See also kauffman2000investigations for a popular discussion of some related issues.)

It is important to note that at this general level of analysis, we do not describe how to construct the operator \phi, as this construction will typically depend on the structure of the set \Lambda. However, as described in the following subsection, we do provide explicit expressions for \phi for three broad classes of protocol constraints, which we term symmetry, modularity, and coarse-grained constraints.

Level 2: Symmetry, modularity, and coarse-grained constraints

At the second level of our analysis, we apply the general framework described above to derive bounds on EP and work for three broad classes of protocol constraints:

  • Section V considers symmetry constraints, when the available generators possess some symmetry group. Examples of systems with symmetry constraints include the Szilard box in Fig. 1, spin systems on lattices, and gases of indistinguishable particles. The operator \phi corresponding to symmetry constraints, defined in Eq. 42, maps distributions to their “symmetrized” versions (which are invariant under the action of the symmetry group).

  • Section VI considers modularity constraints, when the available generators cause different (though possibly overlapping) subsystems of a multivariate system to evolve independently of each other. Examples of systems with modularity constraints include digital circuits wolpert2020thermodynamic, ideal gases, and multi-particle Maxwellian demons. The operator \phi corresponding to modularity constraints, defined in Eq. 64, maps distributions to their “uncorrelated” versions, without statistical dependencies between independent subsystems.

  • Section VII considers coarse-grained constraints, when the available generators exhibit closed coarse-grained dynamics which obey some constraints (e.g., coarse-grained symmetry or modularity constraints). An example is provided by the Szilard box in Fig. 1: the particle’s vertical position (the coarse-grained macrostate) evolves in a way that does not depend on the horizontal position, and the macrostate equilibrium distribution cannot be controlled by moving the partition. Given a protocol that obeys coarse-grained constraints, we show that the EP can be lower bounded in terms of a “coarse-grained EP”, Eqs. 87, 88 and 89, and that this coarse-grained EP can itself be lower bounded by a coarse-grained version of Eq. 7.

We also discuss how tighter bounds on work and EP can be derived by combining different kinds of constraints (e.g., when a system obeys two different symmetry groups, or when it obeys both symmetry and modularity constraints).

Level 3: Concrete examples

At the third (and most concrete) level, we illustrate our results for symmetry, modularity, and coarse-grained constraints on several example systems:

  • In Section V.1, we use symmetry constraints to derive thermodynamic bounds for the Szilard box in Fig. 1, which possesses vertical reflection symmetry.

  • In Section V.2, we use symmetry constraints to derive thermodynamic bounds for the Ising model on a 2D lattice, which possesses translational symmetry.

  • In Section VI.1, we use modularity constraints to derive thermodynamic bounds for the Szilard box in Fig. 1, which are different from the bounds derived in Section V.1. We also demonstrate that stronger results can be derived by combining bounds arising from symmetry and modularity constraints.

  • In Sections VI.2 and VI.3, we use modularity constraints to derive bounds on work extraction for two multi-particle feedback-control protocols that have been proposed in the literature: a multi-particle Szilard box song2021optimal and a collective flashing ratchet cao2004feedback .

  • In Section VII.1, we use coarse-grained constraints to derive thermodynamic bounds for a version of the Szilard box in Fig. 1 in the presence of gravity. We also demonstrate that stronger results can be derived by combining bounds arising from coarse-grained and modularity constraints.

Literature review and discussion

After presenting the results summarized above, in Section VIII we discuss related prior literature. We also compare and contrast our results, such as the decomposition of nonequilibrium free energy in Eq. 5, to some relevant work in quantum thermodynamics janzing_quantum_2006 ; vaccaro_tradeoff_2008 . We conclude with a brief discussion in Section IX, which also touches upon how our approach generalizes beyond the assumption of a single heat bath. Proofs and derivations are in the appendices.

II Preliminaries

We consider a physical system with state space X, which can be either discrete or continuous (X=\mathbb{R}^{n}). The term “probability distribution” will refer to a probability mass function over X in the discrete case and to a probability density function over X in the continuous case. We interchangeably use notation like p(x) and p_{x} (as will be clear from context) to indicate the probability of state x. We use \mathcal{P} to refer to the set of all probability distributions over X.

The system evolves in a stochastic manner during a driving protocol over time t\in[0,1]. We will write p(t) to indicate the distribution at time t corresponding to some initial distribution p(0)=p, and p(1)=p^{\prime} to indicate the distribution at the end of the protocol. For a discrete-state system, the distribution at time t evolves according to the time-dependent master equation,

{\textstyle\partial_{t}}p_{x}(t)=\sum_{x^{\prime}}\left[L_{xx^{\prime}}(t)p_{x^{\prime}}(t)-L_{x^{\prime}x}(t)p_{x}(t)\right], (10)

where L_{x^{\prime}x}(t) is the transition rate from state x to state x^{\prime}. We assume that the system is coupled to a heat bath at inverse temperature \beta, and so each L(t) obeys local detailed balance (see Section IX for a generalization of this assumption). Formally, this means that \pi^{L(t)}_{x^{\prime}}L_{xx^{\prime}}(t)=\pi^{L(t)}_{x}L_{x^{\prime}x}(t) for all x, x^{\prime}, and t, where \pi^{L(t)} is the stationary distribution of rate matrix L(t), which we assume is unique. (This latter assumption can be relaxed as long as the operator \phi, as discussed in Section III, satisfies the following weak technical condition: for all p\in\mathcal{P} and each stationary distribution \pi of each L\in\Lambda, D(p\|\phi(\pi))<\infty whenever D(p\|\pi)<\infty. Note that \phi(\pi) is also a stationary distribution of L by Lemma 1 in Appendix A, so this condition is automatically satisfied when the generators have unique stationary distributions, since in that case \pi=\phi(\pi). Note also that if some L\in\Lambda have multiple stationary distributions \pi, the corresponding EP rate in Eq. 11 can be equivalently defined using any \pi such that D(p\|\pi)<\infty.)

The rate of entropy production (EP rate) incurred at time t can be written as (Eq. 33 in esposito2010three)

\dot{\Sigma}(p(t),L(t))=-\sum_{x}{\textstyle\partial_{t}}p_{x}(t)\ln\frac{p_{x}(t)}{\pi^{L(t)}_{x}}\geq 0, (11)

where {\textstyle\partial_{t}}p_{x}(t) is defined in Eq. 10. Note that the right side of Eq. 11 is sometimes called the “nonadiabatic EP rate” in stochastic thermodynamics, and it is equal to the overall EP rate for a system coupled to a single bath and obeying detailed balance esposito2010three. The total EP incurred by a time-extended protocol over t\in[0,1] that carries out the transformation p\!\shortrightarrow\!p^{\prime} is given by the integral of the EP rate,

\Sigma(p\!\shortrightarrow\!p^{\prime})=\int_{0}^{1}\dot{\Sigma}(p(t),L(t))\,dt. (12)
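As a minimal numerical illustration of Eqs. 10, 11 and 12 (our sketch, not part of the original text; the three-state rate matrix and the time grid are arbitrary illustrative choices), the following Python snippet computes the EP rate and total EP for a free relaxation under a fixed generator obeying local detailed balance, and compares the result with the contraction of KL divergence toward the stationary distribution:

```python
import numpy as np
from scipy.linalg import expm

# A 3-state generator obeying local detailed balance w.r.t. pi = (0.5, 0.3, 0.2):
# off-diagonal rates L[x, x'] = pi[x], diagonals chosen so columns sum to zero,
# which gives the master equation of Eq. (10) in matrix form, dp/dt = L p.
pi = np.array([0.5, 0.3, 0.2])
L = np.tile(pi[:, None], (1, 3))
np.fill_diagonal(L, 0.0)
np.fill_diagonal(L, -L.sum(axis=0))

def ep_rate(p):
    """EP rate of Eq. (11): -sum_x (dp/dt)_x * ln(p_x / pi_x)."""
    return -np.sum((L @ p) * np.log(p / pi))

# Total EP of a free relaxation from p0 over t in [0, 1], via Eq. (12).
p0 = np.array([0.8, 0.1, 0.1])
ts = np.linspace(0.0, 1.0, 4000)
dt = ts[1] - ts[0]
total_ep = sum(ep_rate(expm(L * t) @ p0) for t in ts) * dt

# For a fixed generator this equals the contraction of KL divergence toward pi.
D = lambda p, q: np.sum(p * np.log(p / q))
p1 = expm(L * 1.0) @ p0
print(total_ep, D(p0, pi) - D(p1, pi))
```

The two printed numbers agree up to time-discretization error, reflecting the fact (used in the proof sketch after Theorem 1) that the total EP of a free relaxation equals the drop in D(p\|\pi).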

The work extracted during a protocol can be calculated by using Eqs. 12 and 2, once the initial and final nonequilibrium free energies, F_{E}(p) and F_{E^{\prime}}(p^{\prime}), are specified. To define these free energies, we assume that there is some fixed pair of energy functions, E and E^{\prime}, which specify the Boltzmann equilibrium distributions of L(0) and L(1) respectively.

For a continuous-state system evolving under a continuous master equation (van1992stochastic; risken1996fokker), the sums in Eqs. 10 and 11 should be replaced by integrals (see Eq. 31 in van2010three). A prototypical example of a continuous master equation, which we will use below, is a Fokker-Planck equation (ermak1978brownian; van1992stochastic),

{\textstyle\partial_{t}}p(x,t)=-\nabla\cdot(\mathsf{A}(x,t)p(x,t)-\mathsf{D}(x,t)\nabla p(x,t)), (13)

where \mathsf{A} and \mathsf{D} are drift and diffusion terms.

We will often write dynamical equations like Eqs. 10 and 13 using the notation {\textstyle\partial_{t}}p(t)=L(t)p(t), where L(t) is a bounded linear operator that is called the (infinitesimal) generator of the dynamics at time t. Note that for a continuous-state system in phase space, it may be that the system is isolated from the bath for some t\in[0,1], in which case {\textstyle\partial_{t}}p(t)=L(t)p(t) should be understood in terms of the Liouville equation (for example, if a system is first isolated and evolves in a Hamiltonian manner, and is then brought in contact with a bath at inverse temperature \beta and allowed to equilibrate).

III General framework

We begin by presenting our general mathematical framework. The application of this framework to concrete situations is described in later sections.

A driving protocol \{L(t):t\in[0,1]\} is said to be constrained if there is some restricted set of generators \Lambda such that L(t)\in\Lambda at all t. For a given set of allowed generators \Lambda, we consider an associated operator \phi:\mathcal{P}\to\mathcal{P} which satisfies two conditions. The first condition states that

D(p\|q)=D(p\|\phi(p))+D(\phi(p)\|q) (14)

for all p\in\mathcal{P} and q\in\mathrm{img}\;\phi with D(p\|q)<\infty (where \mathrm{img}\;\phi=\{\phi(p):p\in\mathcal{P}\} is the image of the operator \phi). Eq. 14 is sometimes called the Pythagorean identity of KL divergence in information geometry amari2016information. Any \phi that obeys Eq. 14 can be written in terms of the following projection (this is because D(p\|q)\geq D(p\|\phi(p)) for any q\in\mathrm{img}\;\phi, which follows from Eq. 14 and the non-negativity of KL divergence):

\phi(p)=\operatorname*{\arg\,\min}_{q\in\mathrm{img}\;\phi}D(p\|q), (15)

which shows that D(p\|\phi(p)) is the minimal information-theoretic distance from p to the set of distributions \mathrm{img}\;\phi.

The second condition is that \phi obeys the following commutativity relation for all L\in\Lambda:

e^{\tau L}\phi(p)=\phi(e^{\tau L}p)\quad\forall\tau\geq 0,p\in\mathcal{P}. (16)

In other words, given any initial distribution p, the same final distribution is reached regardless of whether p first relaxes under L for time \tau and then undergoes \phi, or instead first undergoes \phi and then relaxes under L for time \tau.

Note that the Pythagorean identity in Eq. 14 concerns only the operator \phi, while the commutativity relation in Eq. 16 concerns the relationship between \phi and the generators in \Lambda (and therefore all of the generators L(t) in the driving protocol, since L(t)\in\Lambda at all t by assumption). Beyond these two conditions, the operator \phi can be arbitrary, and may be linear or nonlinear. In the following sections of this paper, we will show how to choose \phi for various types of constrained protocols.

Importantly, any \phi that satisfies the two conditions above maps any distribution p to a corresponding “accessible” distribution \phi(p), which controls the amount of work that can be extracted from p by a constrained driving protocol. To prove this, we first show that for any L\in\Lambda that obeys Eq. 16, the equilibrium distribution \pi^{L} satisfies (Lemma 1 in Appendix A)

\pi^{L}\in\mathrm{img}\;\phi. (17)

We also derive the following mathematical result, which will be central to much of what follows: if \phi obeys Eq. 14 and Eq. 16 for some generator L, then the EP rate incurred by any distribution p under L can be written as the sum of two non-negative terms: the EP rate incurred by \phi(p) under L, and the instantaneous contraction of the KL divergence between p and \phi(p).

Figure 2: Visual explanation of Theorem 1: distribution p freely relaxes under L for time \tau (solid gray line). The EP incurred during this relaxation (contraction of purple lines) can be decomposed into the contraction of the KL divergence between p and \phi(p) (contraction of green lines), plus the EP incurred during the free relaxation of \phi(p) (contraction of the red lines). The free relaxation of \phi(p) under L is represented by the dotted gray line.
Theorem 1.

If \phi obeys Eq. 14 and Eq. 16 for some generator L, then for all p\in\mathcal{P},

\dot{\Sigma}(p,L)=\dot{\Sigma}(\phi(p),L)-{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t))),

and -{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))\geq 0, where {\textstyle\partial_{t}}p(t)=Lp.

We sketch the proof of this theorem in terms of a discrete-time relaxation over interval \tau, as shown in Fig. 2 (see Appendix A for details). Consider some distribution p that relaxes for time \tau under the generator L, thereby reaching the distribution e^{\tau L}p (solid gray line). The EP incurred by this relaxation is given by the contraction of KL divergence to the equilibrium distribution \pi, \Sigma(p\!\shortrightarrow\!e^{\tau L}p)=D(p\|\pi)-D(e^{\tau L}p\|\pi) (contraction of purple lines) esposito2010three; van2010three. Given Eq. 17, we can apply the Pythagorean identity, Eq. 14, to both D(p\|\pi) and D(e^{\tau L}p\|\pi), which lets us rewrite \Sigma(p\!\shortrightarrow\!e^{\tau L}p) as the sum of two terms: D(p\|\phi(p))-D(e^{\tau L}p\|\phi(e^{\tau L}p)) (green lines) and D(\phi(p)\|\pi)-D(\phi(e^{\tau L}p)\|\pi) (red lines). Applying the commutativity relation, Eq. 16, shows that the first term is non-negative by the data-processing inequality and that the second term is equal to \Sigma(\phi(p)\!\shortrightarrow\!e^{\tau L}\phi(p)), the EP incurred by letting \phi(p) relax freely under L. The continuous-time statement found in Theorem 1 follows by taking the appropriate \tau\to 0 limit, while noting that the EP rate, Eq. 11, can be rewritten in terms of the limit \lim_{\tau\to 0}\frac{1}{\tau}[D(p\|\pi)-D(e^{\tau L}p\|\pi)].
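To make the two conditions and the proof sketch concrete, here is a small self-contained numerical check (our illustration, with an arbitrarily chosen four-state example): the symmetry swaps states 0 and 1 and states 2 and 3, \phi averages a distribution with its image under that swap, and the rate matrix is invariant under the swap and obeys local detailed balance. The script verifies the Pythagorean identity (Eq. 14), the commutativity relation (Eq. 16), and the EP-rate decomposition of Theorem 1, with the time derivative of the KL divergence estimated by finite differences:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Swap symmetry g: exchange states 0<->1 and 2<->3 (an illustrative choice).
P = np.array([[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
phi = lambda p: 0.5 * (p + P @ p)   # twirling over the two-element group {identity, g}

# A g-invariant rate matrix obeying local detailed balance w.r.t. a g-invariant pi:
pi = np.array([0.3, 0.3, 0.2, 0.2])
L = np.tile(pi[:, None], (1, 4))
np.fill_diagonal(L, 0.0)
np.fill_diagonal(L, -L.sum(axis=0))   # columns sum to zero: dp/dt = L p (Eq. 10)

D = lambda p, q: np.sum(p * np.log(p / q))               # KL divergence
ep_rate = lambda p: -np.sum((L @ p) * np.log(p / pi))    # Eq. (11)

p = rng.dirichlet(np.ones(4))
tau, dt = 0.3, 1e-6

# Eq. (16): twirling commutes with relaxation under L.
print(np.allclose(expm(tau * L) @ phi(p), phi(expm(tau * L) @ p)))

# Eq. (14): Pythagorean identity, for any q in the image of phi.
q = phi(rng.dirichlet(np.ones(4)))
print(np.isclose(D(p, q), D(p, phi(p)) + D(phi(p), q)))

# Theorem 1: EP-rate decomposition (finite-difference estimate of the KL derivative).
p_next = expm(dt * L) @ p
dKL_dt = (D(p_next, phi(p_next)) - D(p, phi(p))) / dt
print(np.isclose(ep_rate(p), ep_rate(phi(p)) - dKL_dt, atol=1e-4))
```

All three checks print True; replacing the rate matrix with one that breaks the swap symmetry generically makes the commutativity check, and with it the decomposition of Theorem 1, fail.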

Now suppose that Eq. 16 holds, so that the assumptions of Theorem 1 are satisfied during the entire protocol. In that case, as we show in Lemma 3 in Appendix A, any constrained protocol that carries out the transformation p\!\shortrightarrow\!p^{\prime} must also transform the initial distribution \phi(p) to the final distribution \phi(p^{\prime}). We can then, in essence, integrate Theorem 1 over time and derive the following result about total EP.

Theorem 2.

If \phi obeys Eq. 14 and Eq. 16 for all L\in\Lambda, then for any constrained protocol that transforms p\!\shortrightarrow\!p^{\prime},

\Sigma(p\!\shortrightarrow\!p^{\prime})=\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime}))+\left[D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\right]

and D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\geq 0.

Figure 3: Illustration of Theorem 2. Given an appropriate operator \phi, \Sigma(p\!\shortrightarrow\!p^{\prime}) (the EP incurred during some desired transformation p\!\shortrightarrow\!p^{\prime}; solid gray line) is equal to \Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})) (the EP incurred by that protocol when transforming \phi(p)\!\shortrightarrow\!\phi(p^{\prime}); dashed gray line) plus the contraction of the KL divergence D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime})) (contraction of green lines). This contraction of KL divergence is a non-negative lower bound on \Sigma(p\!\shortrightarrow\!p^{\prime}), as in Eq. 18.

We use Theorem 2 to derive several useful bounds on EP and work. First, since \Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime}))\geq 0 by the non-negativity of EP, the contraction of KL divergence between p and \phi(p) bounds the EP incurred by a constrained driving protocol that carries out the transformation p\!\shortrightarrow\!p^{\prime},

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\geq 0, (18)

which appeared as Eq. 7 in the introduction. Furthermore, D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\geq 0 immediately implies that

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})). (19)

We can also derive the decomposition of free energy and the bound on extractable work, which appeared as Eqs. 5 and 6 in the introduction. Consider some transformation p\!\shortrightarrow\!p^{\prime}, and write the initial nonequilibrium free energy as

F_{E}(p)=F_{E}(\pi)+D(p\|\pi)/\beta, (20)

where \pi\propto e^{-\beta E} is the Boltzmann distribution for the initial energy function E, and F_{E}(\pi) is the equilibrium free energy (esposito2011second). Using Eq. 17 and the Pythagorean identity, Eq. 14, we decompose the nonequilibrium free energy into a sum of the accessible free energy and the inaccessible free energy,

F_{E}(p)=F_{E}(\pi)+[D(p\|\phi(p))+D(\phi(p)\|\pi)]/\beta
=F_{E}(\phi(p))+D(p\|\phi(p))/\beta. (21)

Using a similar derivation, we can write the nonequilibrium free energy at the end of the protocol as

F_{E^{\prime}}(p^{\prime})=F_{E^{\prime}}(\phi(p^{\prime}))+D(p^{\prime}\|\phi(p^{\prime}))/\beta. (22)

Subtracting Eq. 22 from Eq. 21 shows that the drop in the nonequilibrium free energy during p\!\shortrightarrow\!p^{\prime} is given by

F_{E}(p)-F_{E^{\prime}}(p^{\prime})=F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime}))+\left[D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\right]/\beta. (23)

Combining this result with Theorem 2 and Eq. 2, and then rearranging, shows that the work involved in carrying out p\!\shortrightarrow\!p^{\prime} is equal to the work involved in carrying out the accessible transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime}):

W(p\!\shortrightarrow\!p^{\prime})=W(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})). (24)
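Explicitly, writing Eq. 2 as W(p\!\shortrightarrow\!p^{\prime})=F_{E}(p)-F_{E^{\prime}}(p^{\prime})-\Sigma(p\!\shortrightarrow\!p^{\prime})/\beta and substituting Eq. 23 for the free-energy drop and Theorem 2 for the EP, the KL terms cancel:

W(p\!\shortrightarrow\!p^{\prime})=F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime}))+\left[D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\right]/\beta-\left[\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime}))+D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime}))\right]/\beta

=F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime}))-\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime}))/\beta=W(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})),

where the last equality is Eq. 2 applied to the transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime}).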

Finally, by combining Eq. 24 with Eq. 1 (applied to the transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime})), we arrive at an upper bound on the work that can be extracted by a constrained protocol:

W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(\phi(p))-F_{E^{\prime}}(\phi(p^{\prime})), (25)

which is tighter than the bound given by the second law, Eq. 1.

The bounds in Eqs. 18 and 25, as well as the decomposition of free energy in Eq. 21, are the main theoretical results arising from our general framework. Fig. 3 provides a schematic way of understanding these results. Theorem 2 states that, for a constrained protocol that carries out the map p\!\shortrightarrow\!p^{\prime}, the EP incurred during the system’s actual trajectory (solid gray line) is given by the EP that would be incurred by a “projected trajectory” that carries out the transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime}) (dashed gray line), plus the drop in the KL divergence from the system’s distribution to the set \mathrm{img}\;\phi over the course of the protocol (contraction of green lines). Since the EP of the projected trajectory must be non-negative, the drop in the distance from the system’s distribution to \mathrm{img}\;\phi serves as a lower bound on EP, as in Eq. 18. In addition, Theorem 2 states that this decrease in the KL divergence must be non-negative, meaning that the system’s distribution cannot move farther from \mathrm{img}\;\phi over the course of the protocol.

Following Fig. 3, it can be helpful to think of the trajectory p\!\shortrightarrow\!p^{\prime} as composed of three segments: (1) from p down to \phi(p), (2) from \phi(p) to \phi(p^{\prime}) while staying within \mathrm{img}\;\phi, and (3) from \phi(p^{\prime}) up to p^{\prime} (note that this decomposition is useful for accounting purposes, but does not generally reflect the actual trajectory the system takes in going from p to p^{\prime}). The first and third segments contribute (positively and negatively, respectively) only to EP, while the projected second segment \phi(p)\!\shortrightarrow\!\phi(p^{\prime}) contributes both to EP and to work. Thus, the work involved in p\!\shortrightarrow\!p^{\prime} is determined entirely by the work involved in the second segment, as stated in Eq. 24.

Note also the formal similarity between our decomposition of the drop in free energy, Eq. 23, and the decomposition of EP in Theorem 2. Indeed, like Theorem 2, the result Eq. 23 can be illustrated with Fig. 3: during the transformation p\!\shortrightarrow\!p^{\prime} (solid gray line), the drop in free energy is given by the drop in free energy incurred by the transformation \phi(p)\!\shortrightarrow\!\phi(p^{\prime}) (dashed gray line), plus the contraction of the KL divergence from the system’s distribution to the set \mathrm{img}\;\phi (green lines).

In general, our bounds on EP and work will not always be achievable. Suppose, however, that the final distribution p^{\prime} is an equilibrium distribution, so p^{\prime}=\phi(p^{\prime}) by Eq. 17. Eq. 18 then gives

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\phi(p)). (26)

This bound is achievable if the generators in \Lambda have a continuous curve of equilibrium distributions from \phi(p) to p^{\prime}=\phi(p^{\prime}). Imagine a protocol in which the initial distribution p first relaxes to the equilibrium distribution \phi(p), and then undergoes quasistatic driving from \phi(p) to \phi(p^{\prime}) while remaining in equilibrium throughout (in terms of Fig. 3, the system first relaxes along the green arrow connecting p to \phi(p), then follows the dashed line to \phi(p^{\prime}) quasistatically). The relaxation step incurs D(p\|\phi(p)) of EP, while the quasistatic step incurs a vanishing amount of EP, so the bound in Eq. 26 will be achieved.

III.1 Choice of the \phi operator

In general, the operator \phi associated with a given set of generators \Lambda is not unique. For instance, for any driving protocol, the identity map \phi(p)=p always satisfies Eq. 14 and Eq. 16. Choosing \phi to be the identity map, however, reduces the results in Theorem 2 to trivial identities and the lower bound on EP in Eq. 18 to 0.

At a high level, those \phi which have smaller \mathrm{img}\;\phi will generally give tighter bounds on EP (since, given Eq. 15, a smaller image leads to larger values of D(p\|\phi(p))). To illustrate this phenomenon, consider the extreme case where all L\in\Lambda have the same equilibrium distribution \pi, so that any constrained driving protocol must be a free relaxation toward \pi. Then, the operator \phi(p)=\pi for all p (so \mathrm{img}\;\phi is a singleton) satisfies Eqs. 16 and 14 and, when plugged into Eq. 18, gives the following bound on EP:

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\pi)-D(p^{\prime}\|\pi). (27)

In fact, the right hand side is an exact expression for the EP incurred by the free relaxation, meaning that it is the tightest possible bound. If, however, the generators L\in\Lambda have different equilibrium distributions, then the operator \phi(p)=\pi (for whatever \pi) generally violates the commutativity relation in Eq. 16, and bounds like Eq. 27 will no longer hold.

In the following sections, we show how to use our results to derive thermodynamic bounds for \Lambda that obey some kind of symmetry group, modular decomposition, or coarse-grained structure. In more general, possibly unstructured cases, it is an open question whether a non-trivial operator \phi exists, and if so how to identify it. We explore related issues in a companion paper kolchinsky_constraints_paper2, where we use numerical optimization techniques to derive bounds on EP similar to Eq. 18.

Importantly, when there are multiple different operators that all satisfy the Pythagorean identity and the commutativity relation for the available generators \Lambda, one can derive tighter bounds on EP and work by applying our decompositions in an “iterative” manner. For instance, imagine that there are two different operators \phi_{1} and \phi_{2} that satisfy Eqs. 14 and 16 (for example, these might represent operators arising from symmetry constraints and modularity constraints, respectively, as described below). Applying Theorem 2 iteratively leads to “stacked” bounds on EP analogous to Eq. 18,

\Sigma(p\!\shortrightarrow\!p^{\prime})\geq\big[D(p\|\phi_{1}(p))+D(\phi_{1}(p)\|\phi_{2}(\phi_{1}(p)))\big]-\big[D(p^{\prime}\|\phi_{1}(p^{\prime}))+D(\phi_{1}(p^{\prime})\|\phi_{2}(\phi_{1}(p^{\prime})))\big]\geq 0. (28)

Similarly, applying Eq. 24 iteratively leads to stacked bounds on extractable work analogous to Eq. 25,

W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(\phi_{2}(\phi_{1}(p)))-F_{E^{\prime}}(\phi_{2}(\phi_{1}(p^{\prime}))). (29)

Such stacked bounds are generally tighter than the bounds provided by either \phi_{1} or \phi_{2} alone. (Note that one can also reverse the order of operations, and consider the composition \phi_{1}(\phi_{2}(p)) rather than \phi_{2}(\phi_{1}(p)) in Eqs. 28 and 29, which will in general lead to different bounds.)
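As a simple illustration of such a composition (our example, anticipating the symmetry constraints of Section V): if \phi_{1} and \phi_{2} are the twirling operators for two commuting reflections of a two-dimensional box, say x_{2}\mapsto-x_{2} and x_{1}\mapsto-x_{1}, then

\phi_{2}(\phi_{1}(p))(x_{1},x_{2})=\tfrac{1}{4}\left[p(x_{1},x_{2})+p(x_{1},-x_{2})+p(-x_{1},x_{2})+p(-x_{1},-x_{2})\right],

which is the twirling over the four-element group generated by the two reflections, and in this case \phi_{1}(\phi_{2}(p))=\phi_{2}(\phi_{1}(p)). Moreover, since \phi_{2}(\phi_{1}(p))\in\mathrm{img}\;\phi_{1}, the Pythagorean identity Eq. 14 gives D(p\|\phi_{1}(p))+D(\phi_{1}(p)\|\phi_{2}(\phi_{1}(p)))=D(p\|\phi_{2}(\phi_{1}(p))), so the bracketed terms in Eq. 28 reduce to the asymmetry of p and p^{\prime} with respect to the larger group.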

III.2 Fluctuating entropy production

As we show in detail in Section A.2, our results also have implications for stochastic fluctuations of trajectory-level EP, as considered in stochastic thermodynamics seifert2012stochastic .

Consider any constrained driving protocol over t\in[0,1] with an associated operator \phi. Let \bm{x} indicate some stochastically sampled trajectory of the system visited during the driving protocol, and let \sigma_{p}(\bm{x}) indicate the fluctuating EP incurred by trajectory \bm{x} when initial states are sampled from the initial distribution p. In the appendix, we consider the difference between this fluctuating EP and the fluctuating EP incurred by the same trajectory when initial states are sampled from the accessible initial distribution \phi(p),

m_{p}(\bm{x}):=\sigma_{p}(\bm{x})-\sigma_{\phi(p)}(\bm{x}). (30)

By combining Theorem 2 with recent results in stochastic thermodynamics kolchinsky2021state; kwon2019fluctuation, we show that the expectation of m_{p}(\bm{x}) is equal to the difference of expected EPs, \langle m_{p}(\bm{x})\rangle=\Sigma(p\!\shortrightarrow\!p^{\prime})-\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})), where \langle\cdot\rangle indicates expectation over trajectories sampled from initial distribution p. We also show that m_{p}(\bm{x}) obeys a detailed fluctuation theorem, which implies a trajectory-level version of Eq. 19: the probability that the fluctuating EP under initial distribution p is \xi less than the fluctuating EP under the accessible initial distribution \phi(p) is exponentially small (i.e., it is less than e^{-\xi}). We leave further exploration of the connection between our framework and stochastic thermodynamics for future work.

IV Thermodynamics of information under protocol constraints

The framework introduced in the previous section has implications for the thermodynamics of information under constraints. Consider the type of feedback control setup described in the introduction: first an observation apparatus M measures some system observable, then the system undergoes a driving protocol that depends on the measurement outcome m. Let L^{(m)}(t) indicate the driving protocol conditioned on m, and p_{X|m} and p_{X^{\prime}|m}^{\prime} indicate the distributions over system states at the beginning and end of the corresponding driving protocol. As is standard in the literature parrondo2015thermodynamics, for simplicity we assume that all protocols start and end with the same energy functions, E and E^{\prime}, and that during the protocols the measurement apparatus M and the system X are energetically decoupled and M does not change state.

Given the above assumptions, it is straightforward to show that the EP incurred by the joint “supersystem” X\times M obeys

\Sigma_{XM}=\sum_{m}p(m)\Sigma_{m}, (31)

where \Sigma_{m} is the EP incurred by protocol L^{(m)}(t) in carrying out the transformation p_{X|m}\!\shortrightarrow\!p_{X^{\prime}|m}^{\prime}. Similarly, by taking expectations of Eq. 2 and rearranging (see derivation of Eq. 4), the average extracted work under feedback control can be written as

\langle W\rangle=\Delta F+[I(X;M)\!-\!I(X^{\prime};M)]/\beta-\frac{1}{\beta}\sum_{m}p(m)\Sigma_{m}, (32)

where for notational convenience we’ve used \Delta F=F_{E}(p)-F_{E^{\prime}}(p^{\prime}) to indicate the drop of marginal free energy. Thus, any lower bounds on \Sigma_{m} (the EP values incurred by the individual protocols L^{(m)}(t)) can be translated into bounds on the overall EP and average extractable work for a feedback control setup.

For example, suppose that there is some single set of constraints that applies to all of the driving protocols, in that there is some set of generators \Lambda such that L^{(m)}(t)\in\Lambda for all t and m, as well as an operator \phi that obeys the Pythagorean identity, Eq. 14, and the commutativity relation, Eq. 16, for all L\in\Lambda. In that case, the framework described in Section III leads to bounds on each \Sigma_{m} term. In particular, using Eqs. 18 and 31 gives the bound

\Sigma_{XM}\geq D(p_{X|M}\|\phi(p_{X|M}))-D(p_{X^{\prime}|M}^{\prime}\|\phi(p_{X^{\prime}|M}^{\prime}))\geq 0, (33)

where we’ve defined the conditional KL divergence D(p_{X|M}\|\phi(p_{X|M}))=\sum_{m}p(m)D(p_{X|m}\|\phi(p_{X|m})), and similarly for D(p_{X^{\prime}|M}^{\prime}\|\phi(p_{X^{\prime}|M}^{\prime})). Plugging into Eq. 32 gives the following bound on average extractable work:

\left\langle W\right\rangle\leq\Delta F+[I_{\mathrm{acc}}^{\phi}(X;M)-I_{\mathrm{acc}}^{\phi}(X^{\prime};M)]/\beta, (34)

where I_{\mathrm{acc}}^{\phi}(X;M) is given by

I_{\mathrm{acc}}^{\phi}(X;M)=I(X;M)-D(p_{X|M}\|\phi(p_{X|M})), (35)

and similarly for I_{\mathrm{acc}}^{\phi}(X^{\prime};M).

We refer to I_{\mathrm{acc}}^{\phi}(X;M) as the accessible information in measurement M, since any decrease in accessible information can contribute to work extraction (Eq. 34). We refer to the conditional KL divergence D(p_{X|M}\|\phi(p_{X|M})) as the inaccessible information, since any decrease in inaccessible information must be dissipated as EP, and not extracted as work (Eq. 33). The inaccessible information is non-negative by properties of KL divergence, so I_{\mathrm{acc}}^{\phi}(X;M)\leq I(X;M). In addition, whenever p\in\mathrm{img}\;\phi (e.g., when p is an equilibrium distribution, by Eq. 17), the accessible information can be rewritten in simpler form as

I_{\mathrm{acc}}^{\phi}(X;M)=D(\phi(p_{X|M})\|p), (36)

as follows from Eq. 35 by writing I(X;M)=D(p_{X|M}\|p) and applying the Pythagorean identity, Eq. 14.
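Spelled out: since p\in\mathrm{img}\;\phi, Eq. 14 with q=p gives, for each outcome m,

D(p_{X|m}\|p)=D(p_{X|m}\|\phi(p_{X|m}))+D(\phi(p_{X|m})\|p),

and averaging over p(m), then subtracting the inaccessible information as in Eq. 35, leaves I_{\mathrm{acc}}^{\phi}(X;M)=\sum_{m}p(m)D(\phi(p_{X|m})\|p)=D(\phi(p_{X|M})\|p).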

In general, measurements of different observables on the same system will give rise to different amounts of accessible and inaccessible information. At a high level, one should choose measurements that maximize the accessible information I_{\mathrm{acc}}^{\phi}(X;M), or alternatively the “efficiency” quantified as bits of accessible information per bit of measured information, I_{\mathrm{acc}}^{\phi}(X;M)/I(X;M)\leq 1. Optimal measurements satisfy I_{\mathrm{acc}}^{\phi}(X;M)=I(X;M), which happens when the conditional distributions over system states p_{X|m} are invariant under the action of \phi (i.e., when \phi(p_{X|m})=p_{X|m} for each m).

Note that similar results can also be derived using other kinds of bounds on \Sigma_{m} (e.g., when the individual protocols obey a combination of constraints, so that Eq. 28 holds).

V Symmetry constraints

We now use the general framework introduced above to derive bounds on EP under symmetry constraints.

Consider a compact group \mathcal{G} that has a measurable action over X, such that each {g}\in\mathcal{G} is a bijection X\to X (a compact group \mathcal{G} has a measurable action over X if the action \mathcal{G}\times X\to X is a measurable function, where we assume \mathcal{G} and X are endowed with their respective Borel algebras). For continuous X, we assume that each g\in\mathcal{G} is a rigid transformation. For notational convenience, for each g\in\mathcal{G} we define the composition operator \Phi_{g}, so that for any function f:X\to\mathbb{R},

\Phi_{g}(f)(x)=f(g(x)). (37)

We say that a set of generators \Lambda obeys symmetry constraints (with respect to the action of group \mathcal{G}) if the following commutativity relation holds for all L\in\Lambda:

\Phi_{g}L=L\Phi_{g}\qquad\forall{g}\in\mathcal{G}. (38)

In other words, \Lambda obeys symmetry constraints when, for each L\in\Lambda and {g}\in\mathcal{G}, it does not matter whether one first applies the generator L and then the bijection {g} over the state space, or first applies the bijection {g} over the state space and then the generator L. In more concrete terms, for a (continuous or discrete) master equation L, Eq. 38 holds if the transition rates are invariant under the action of \mathcal{G}:

L_{xx^{\prime}}=L_{{g}(x){g}(x^{\prime})}\qquad\forall x,x^{\prime}\in X,g\in\mathcal{G}. (39)

We can also derive simple sufficient conditions for potential-driven Fokker-Planck equations of the type

Lp=\nabla\cdot(\nabla E_{L})p+\beta^{-1}\Delta p, (40)

where E_{L} is the energy function of generator L. Then, Eq. 38 holds if all available energy functions are invariant under the action of \mathcal{G},

E_{L}(x)=E_{L}({g}(x))\quad\forall x\in X,g\in\mathcal{G},L\in\Lambda\,. (41)

(Eq. 38 is derived from Eqs. 39 and 41 in Appendix B).

We now define a linear operator \phi_{\mathcal{G}} which satisfies the Pythagorean identity and the commutativity relation, Eqs. 14 and 16, for symmetry constraints. Let \phi_{\mathcal{G}} map each p\in\mathcal{P} to its average under the action of \mathcal{G},

\phi_{\mathcal{G}}(p)(x):=\int_{\mathcal{G}}p(g(x))\,d\mu(g), (42)

where \mu is the uniform (normalized Haar) measure over \mathcal{G}. (Technically, the definition of the twirling operator in Eq. 42 applies only when p is a finite-valued probability density function, which excludes things such as the Dirac delta “function”. A more general formulation of our results can be developed in terms of probability measures rather than probability densities; see Ch. 3 in eaton_group_1989 for a version of Eq. 42 defined in terms of probability measures.) For a finite group, the integral in Eq. 42 should be replaced by a summation. Following the terminology in quantum physics, we sometimes refer to \phi_{\mathcal{G}} as a twirling operator vollbrecht2001entanglement; vaccaro_tradeoff_2008. Intuitively, \phi_{\mathcal{G}}(p) symmetrizes p, removing all information in p concerning the state of the system along the “coordinates” specified by the symmetry constraints.

In Appendix B, we show that \phi_{\mathcal{G}} obeys the Pythagorean identity and, as long as Eq. 38 holds, the commutativity relation of Eq. 16. Thus, any protocol that carries out the transformation p\!\shortrightarrow\!p^{\prime} while obeying symmetry constraints with respect to \mathcal{G} permits the decomposition of EP found in Theorem 2, with \phi=\phi_{\mathcal{G}}, and satisfies all the bounds on work and EP that follow from that result.

In particular, using Eq. 21, we can decompose the free energy F_{E}(p) of any distribution p into the accessible free energy F_{E}(\phi_{\mathcal{G}}(p)), which is the free energy in the twirled (and therefore symmetric) version of p, and the inaccessible free energy D(p\|\phi_{\mathcal{G}}(p))/\beta. Note that D(p\|\phi_{\mathcal{G}}(p)) is a non-negative measure of the asymmetry in distribution p with respect to the symmetry group \mathcal{G}, which vanishes when p is invariant under \phi_{\mathcal{G}}. Thus, for any protocol that obeys symmetry constraints, the first inequality in Eq. 18 states that any “drop in asymmetry” must be dissipated as EP, and not turned into work. The second inequality in Eq. 18 states that the asymmetry in the system’s distribution can only decrease during the protocol. (Some of the above results for symmetry constraints have been previously uncovered in quantum thermodynamics vaccaro_tradeoff_2008; janzing_quantum_2006; see Section VIII.)

We finish by discussing thermodynamics of information under symmetry constraints. In general, the results derived in Section IV apply to the twirling operator \phi_{\mathcal{G}} as a special case. We can also exploit special properties of \phi_{\mathcal{G}} to further simplify the expression of the inaccessible information term in Eqs. 35 and 33. Suppose that distribution p is invariant under \phi_{\mathcal{G}}, so p=\phi_{\mathcal{G}}(p) (e.g., if p is an equilibrium distribution). As shown in Section B.4, we can then rewrite the inaccessible information term as

D(p_{X|M}\|\phi_{\mathcal{G}}(p_{X|M}))=\Bigg\langle\!\ln\frac{q(m|x)}{\int_{\mathcal{G}}q(m|g(x))d\mu(g)}\!\Bigg\rangle, (43)

where q(m|x) is the measurement channel and \langle\cdot\rangle indicates expectation under the joint distribution p(x,m)=p(x)q(m|x). Eq. 43 conveniently expresses the inaccessible information in terms of the asymmetry of the measurement channel relative to the action of \mathcal{G} (the right side of Eq. 43 vanishes when q(m|x) is invariant under that action), which we will exploit in some of our examples below.
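As a preview of how Eq. 43 distinguishes measurements (a worked illustration using the reflection symmetry of the Szilard box treated in Section V.1): let \mathcal{G}=\{e,g\} with g(x_{1},x_{2})=(x_{1},-x_{2}) and let p=u be uniform. For an error-free measurement of the sign of the vertical coordinate, q(m|x)=1 when m=\mathrm{sgn}(x_{2}) and 0 otherwise, the symmetrized channel is \int_{\mathcal{G}}q(m|g(x))\,d\mu(g)=1/2 for every x, so

D(p_{X|M}\|\phi_{\mathcal{G}}(p_{X|M}))=\left\langle\ln\frac{1}{1/2}\right\rangle=\ln 2=I(X;M),

and the entire measured bit is inaccessible. For an error-free measurement of the sign of the horizontal coordinate, q(m|g(x))=q(m|x) for all g, so the right side of Eq. 43 vanishes and the entire bit is accessible.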

V.1 Example: Szilard box with symmetry constraints

Figure 4: A Szilard box with energy functions as in Eq. 44.

We demonstrate our results on symmetry constraints using the Szilard box shown in Fig. 1. We assume that the box is coupled to a single heat bath at inverse temperature β=1\beta=1, and that the particle inside the box has overdamped Fokker-Planck dynamics, so that all generators have the form of Eq. 40. The system’s state is represented by a horizontal and a vertical coordinate, x=(x1,x2)2x=(x_{1},x_{2})\in\mathbb{R}^{2}.

Suppose that all energy functions have the form

Eλ(x1,x2)=Vp(x1λ)+Vw(|x1|)+Vw(|x2|),E_{\lambda}(x_{1},x_{2})=V_{\mathrm{p}}(x_{1}-\lambda)+V_{\mathrm{w}}(|x_{1}|)+V_{\mathrm{w}}(|x_{2}|), (44)

where λ\lambda\in\mathbb{R} is a controllable parameter that determines the location of the vertical partition, VpV_{\mathrm{p}} is the partition’s repulsion potential, and VwV_{\mathrm{w}} is the repulsion potential of the box walls:

Vw(a)={0if a1otherwiseV_{\mathrm{w}}(a)=\begin{cases}0&\text{if }a\leq 1\\ \infty&\text{otherwise}\end{cases} (45)

meaning that the box extends over (x1,x2)[1,1]2(x_{1},x_{2})\in[-1,1]^{2}. (Technically, the wall potential as defined in Eq. 45 is non-differentiable; to be more accurate, one should imagine it in terms of the limit Vw(|x|)=limα|x|αV_{\mathrm{w}}(|x|)=\lim_{\alpha\to\infty}|x|^{\alpha} dhar2019run .) Assume that there is some value of λ\lambda for which Vp(x1λ)=0V_{\mathrm{p}}(x_{1}-\lambda)=0 for all x1x_{1} in the box (i.e., the partition can be removed by setting λ\lambda outside the box). For such λ\lambda, let EE^{\varnothing} indicate the corresponding energy function, and note that it obeys E(x1,x2)=0E^{\varnothing}(x_{1},x_{2})=0 within the box (and infinity elsewhere), corresponding to a uniform equilibrium distribution u(x1,x2)=𝟏[1,1]2(x1,x2)/4u(x_{1},x_{2})=\mathbf{1}_{[-1,1]^{2}}(x_{1},x_{2})/4 (where 𝟏\mathbf{1} is the indicator function). This Szilard box is shown schematically in Fig. 4.

The energy functions in Eq. 44 obey the vertical reflection symmetry E(x1,x2)=E(x1,x2)E(x_{1},x_{2})=E(x_{1},-x_{2}), corresponding to the two-element symmetric group S2S_{2} whose action is generated by g(x1,x2)=(x1,x2)g(x_{1},x_{2})=(x_{1},-x_{2}). The corresponding twirling of pp is the uniform mixture of pp and its reflection,

ϕ𝒢(p)(x1,x2)=(p(x1,x2)+p(x1,x2))/2.\displaystyle\phi_{\mathcal{G}}(p)(x_{1},x_{2})=(p(x_{1},x_{2})+p(x_{1},-x_{2}))/2. (46)

We can use our results to derive bounds on the work that can be extracted from this Szilard box. Intuitively, the set of allowed generators LL — that is, Fokker-Planck operators with energy functions as in Eq. 44, corresponding to different horizontal locations of the vertical partition — all obey vertical reflection symmetry. Thus, the dynamics generated by those Fokker-Planck operators commute with ϕ𝒢\phi_{\mathcal{G}}, the twirling operator defined in Eq. 46. Using Eq. 25, we can bound the work extracted during any transformation ppp\!\shortrightarrow\!p^{\prime} in terms of the decrease of the accessible free energy, FE(ϕ𝒢(p))FE(ϕ𝒢(p))F_{E}(\phi_{\mathcal{G}}(p))-F_{E^{\prime}}(\phi_{\mathcal{G}}(p^{\prime})).

In more detail, consider some driving protocol which starts and ends with the partition removed. At intermediate times, the driving protocol manipulates the location of the partition so as to bring the system from some initial distribution pp to a final equilibrium distribution p=up^{\prime}=u while extracting work. The second law gives bounds on EP, Σ(pp)0\Sigma(p\!\shortrightarrow\!p^{\prime})\geq 0, and work:

W(pu)\displaystyle W(p\!\shortrightarrow\!u) FE(p)FE(u)=D(pu),\displaystyle\leq F_{E^{\varnothing}}(p)-F_{E^{\varnothing}}(u)=D(p\|u), (47)

which follows from Eqs. 1 and 20. However, this bound can be too optimistic due to the protocol constraints. Given Eq. 18, as well as the fact that the final distribution obeys ϕ𝒢(u)=u\phi_{\mathcal{G}}(u)=u, we know that Σ(pp)D(pϕ𝒢(p))\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p\|\phi_{\mathcal{G}}(p)). Similarly, Eq. 25 gives a tighter bound on extractable work

W(pu)FE(ϕ𝒢(p))FE(u)=D(ϕ𝒢(p)u),\displaystyle W(p\!\shortrightarrow\!u)\leq F_{E^{\varnothing}}(\phi_{\mathcal{G}}(p))-F_{E^{\varnothing}}(u)=D(\phi_{\mathcal{G}}(p)\|u), (48)

where the second equality follows from Eq. 20.

Figure 5: (a)(a) A non-equilibrium distribution pθp_{\theta} that is “rotated” by an arbitrary angle θ\theta, Eq. 49. (b)(b) The distribution in (a) under the action of the vertical reflection twirling operator, ϕ𝒢(pθ)\phi_{\mathcal{G}}(p_{\theta}).

It is easy to use these results to resolve the question raised in the introduction: can one show that work can only be extracted from a measurement of whether the particle is in the left or right half of the box, rather than a measurement of whether the particle is in the top or bottom half of the box? Suppose that the particle’s initial distribution pp is uniform across the left or right half of the box. Such a distribution pp is invariant under vertical reflection, so p=ϕ𝒢(p)p=\phi_{\mathcal{G}}(p) and Eq. 48 gives W(pu)D(pu)=ln2W(p\!\shortrightarrow\!u)\leq D(p\|u)=\ln 2, the same as the bound set by the second law, Eq. 47. This bound can be achieved by quickly moving the partition to the middle of the box and then slowly moving it toward the empty half (e.g., rightward if the particle is in the left half). Conversely, suppose that under the initial distribution pp, the particle is uniformly distributed across the top or bottom half of the box. The twirling of such a distribution is a uniform distribution over the box, ϕ𝒢(p)=u\phi_{\mathcal{G}}(p)=u. In this case, Eq. 48 gives W(pu)0W(p\!\shortrightarrow\!u)\leq 0, meaning that no work can be extracted.

We now demonstrate the power of our approach by analyzing extractable work given a more complex family of initial distributions (while using the same energy functions as above). Suppose that the initial distribution is concentrated within half the box, as determined by a separating line that is rotated by an arbitrary angle θ[π,π]\theta\in[-\pi,\pi] (see Fig. 5(a)). This initial distribution can be written formally as

pθ(x1,x2)\displaystyle p_{\theta}(x_{1},x_{2}) =𝟏[1,1]2(x1,x2)2Θ(x2sinθx1cosθ),\displaystyle=\frac{\mathbf{1}_{[-1,1]^{2}}(x_{1},x_{2})}{2}\Theta(x_{2}\sin\theta-x_{1}\cos\theta), (49)

where Θ\Theta is the Heaviside function. For instance, pθp_{\theta} for θ=0\theta=0 corresponds to the particle being in the left half of the box, while pθp_{\theta} for θ=π/2\theta=\pi/2 corresponds to the particle being in the top half of the box.

Because we are considering the same set of generators as above, we can bound the extractable work in a given pθp_{\theta} using the same twirling operator as defined above in Eq. 46. (For a sample pθp_{\theta}, the twirling ϕ𝒢(pθ)\phi_{\mathcal{G}}(p_{\theta}) is illustrated in Fig. 5(b).) Using Eq. 48, the extractable work obeys W(pθu)D(ϕ𝒢(pθ)u)W(p_{\theta}\!\shortrightarrow\!u)\leq D(\phi_{\mathcal{G}}(p_{\theta})\|u). Moreover, as we show in Section B.5, this KL divergence can be written in closed form as

D(ϕ𝒢(pθ)u)=ln2{12|tan(θπ2)||θ|(π4,3π4)112|tanθ|otherwise.\displaystyle D(\phi_{\mathcal{G}}(p_{\theta})\|u)\!=\!{\ln 2}\cdot\!\begin{cases}\frac{1}{2}|\tan(\theta-\frac{\pi}{2})|&\!\text{$|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4})$}\\ 1-\frac{1}{2}|\tan\theta|&\text{otherwise.}\end{cases} (50)

This result is plotted as a function of θ\theta in Fig. 6.
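
Eq. 50 can also be checked numerically by discretizing the box on a fine grid. The following sketch is purely illustrative (the grid resolution and function names are arbitrary choices): it constructs the rotated half-box distribution of Eq. 49, applies the reflection twirl of Eq. 46 by flipping the vertical axis, and evaluates the KL divergence to the uniform distribution.

import numpy as np

def work_bound(theta, n=2000):
    # Discretized estimate of D(phi_G(p_theta) || u) on the box [-1,1]^2, cf. Eq. 50.
    xs = (np.arange(n) + 0.5) / n * 2 - 1          # grid-cell centers in [-1, 1]
    x1, x2 = np.meshgrid(xs, xs, indexing="ij")
    cell = (2.0 / n) ** 2                          # area of one grid cell
    p = (x2 * np.sin(theta) - x1 * np.cos(theta) > 0).astype(float)
    p /= p.sum() * cell                            # normalize to a probability density
    phi_p = 0.5 * (p + p[:, ::-1])                 # reflection twirl x2 -> -x2, Eq. 46
    u = 0.25                                       # uniform density on the box
    mask = phi_p > 0
    return float(np.sum(phi_p[mask] * np.log(phi_p[mask] / u)) * cell)

# theta = 0 (left/right measurement), pi/4, pi/2 (top/bottom measurement):
for theta in (0.0, np.pi / 4, np.pi / 2):
    print(theta, work_bound(theta))    # approximately ln 2, (ln 2)/2, and 0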

Figure 6: Szilard box with symmetry constraints: the bound on extractable work as a function of θ\theta, Eq. 50.

We can also analyze the thermodynamics of information for different measurements of the Szilard box. Imagine that, starting from a uniform equilibrium distribution, one measures which side of the box contains the particle, as determined by a separating line at some arbitrary angle θ[π,π]\theta\in[-\pi,\pi]. For this measurement, the conditional distribution over system states pX|mp_{X|m} is equal to pθp_{\theta} half the time (as in Fig. 5(a)), and equal to pθ+πp_{\theta+\pi} the other half of the time. Then, for both measurement outcomes, one manipulates the vertical partition so as to drive the particle back to the equilibrium distribution p=up^{\prime}=u while extracting work. For simplicity, we assume that the initial and final energy functions are the same.

The general bound on average extractable work for feedback control, Eq. 4, gives

WI(X;M)=ln2,\displaystyle\langle W\rangle\leq I(X;M)=\ln 2, (51)

where we’ve used that p=pp=p^{\prime} and I(X;M)=0I(X^{\prime};M)=0. Our results provide a tighter bound, showing that the average extractable work is bounded by the accessible information in the measurement,

WIaccϕ𝒢(X;M)=D(ϕ𝒢(pθ)u)+D(ϕ𝒢(pθ+π)u)2,\displaystyle\!\langle W\rangle\!\leq\!I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)\!=\!\frac{D(\phi_{\mathcal{G}}(p_{\theta})\|u)\!\!+\!\!D(\phi_{\mathcal{G}}(p_{\theta+\pi})\|u)}{2}, (52)

where we used Eqs. 34 and 36. It can be verified from Eq. 50 that D(ϕ𝒢(pθ)u)=D(ϕ𝒢(pθ+π)u)D(\phi_{\mathcal{G}}(p_{\theta})\|u)=D(\phi_{\mathcal{G}}(p_{\theta+\pi})\|u). Thus, the accessible information for a given θ\theta is simply equal to D(ϕ𝒢(pθ)u)D(\phi_{\mathcal{G}}(p_{\theta})\|u), as given by the right side of Eq. 50 and plotted in Fig. 6. As expected, the accessible information achieves a maximum of ln2\ln 2 at θ=0\theta=0 (or θ=±π\theta=\pm\pi), which corresponds to a measurement of whether the particle is on the left or right side of the box. The accessible information falls nonlinearly (but continuously) to a minimum of 0 at θ=±π/2\theta=\pm\pi/2, which corresponds to a measurement of whether the particle is on the top or bottom of the box.

In the example above, the accessible information quantifies in a very literal way the “alignment” between the choice of measurement and the way the system can be manipulated. More generally, this example illustrates how our bounds on EP and work depend on the interplay between the operator ϕ\phi, the initial/final distributions pp and pp^{\prime}, and (for feedback control protocols) the choice of measurement MM. This interplay can give rise to highly non-trivial thermodynamic bounds, such as in Eqs. 50 and 6, even for very simple operators ϕ\phi, such as in Eq. 46.

Finally, we note that our analysis above only assumes that the energy functions are vertically symmetric, which includes many energy functions that do not have the form of the vertical partition defined in Eq. 44. Furthermore, while the bounds on work and EP which we derive here are achievable by some vertically symmetric energy functions, they are not necessarily achievable by manipulating the location of a vertical partition. For instance, achieving the extractable work bound for a given θ\theta, Eq. 50, generally requires that the corresponding twirled distribution ϕ𝒢(p)\phi_{\mathcal{G}}(p), such as the one shown in Fig. 5(b), is an equilibrium distribution for some available energy function.

We analyze the same system using a different set of constraints in Sections VI.1 and VII.1 below. (Also see still2021partially for a different recent analysis of the thermodynamics of the Szilard box with rotated measurements, though from the point of view of partial observability rather than protocol constraints.)

V.2 Example: Feedback control on the Ising model

Our bounds on symmetry constraints can be useful for various multi-particle systems with symmetries, such as gases of indistinguishable particles and spin systems with symmetries. We demonstrate this by analyzing the thermodynamics of feedback control on an Ising model. The reader may also be interested in Section B.6, where we analyze a simpler and more pedagogical example of a discrete-state system with symmetry constraints.

Consider a 2D Ising model on a square lattice on a torus, containing a total of N2=N×NN^{2}=N\times N spins. The state of the lattice is indicated as x(x1,,xN2)x\equiv(x_{1},\dots,x_{N^{2}}), where xi{1,1}x_{i}\in\{-1,1\} is the state of the spin at location ii. We assume that the energy functions have the following form,

E(x)=J(i,j)𝒩xixjHixi.\displaystyle E(x)=-J\sum_{\mathclap{(i,j)\in\mathcal{N}}}x_{i}x_{j}-H\sum_{i}x_{i}. (53)

where 𝒩\mathcal{N} is the set of all nearest neighbors on the lattice, JJ is the coupling strength, and HH is the external magnetic field.

Energy functions like these are invariant under the symmetry group 𝒢\mathcal{G} corresponding to horizontal and vertical translations of the lattice (for simplicity, we ignore other symmetries of the lattice, such as reflections and rotations). The action of this group is given by a set of N2N^{2} bijections ga,b:XXg_{a,b}:X\to X for a,b{0,,N1}a,b\in\{0,\dots,N-1\}, where ga,b(x)g_{a,b}(x) translates the lattice state xx to the right by aa spins and upward by bb spins (with periodic boundary conditions). We assume that the system evolves according to Glauber dynamics krapivsky_kinetic_2010 , or some other dynamics that respects the translational symmetry of the 2D lattice, such that Eq. 39 is satisfied.

Given these assumptions, we can derive thermodynamic bounds for the 2D Ising model in terms of the following twirling operator,

ϕ𝒢(p)(x)=N2a=0N1b=0N1p(ga,b(x)).\displaystyle\phi_{\mathcal{G}}(p)(x)=N^{-2}\sum_{a=0}^{N-1}\sum_{b=0}^{N-1}p(g_{a,b}(x)). (54)

We use this twirling operator to analyze the thermodynamics of the following feedback-control setup on the Ising model, also shown in Fig. 7. The lattice is initially in equilibrium pp at some inverse temperature β\beta and J=1,H=0J=1,H=0 (no external field). The state of the spin at location 11 is then measured under the measurement channel q(m|x)=δm(x1)q(m|x)=\delta_{m}(x_{1}), where δ\delta is the Kronecker delta. Since there is no initial external field, the two outcomes m{1,1}m\in\{-1,1\} have equal probability and I(X;M)=ln2I(X;M)=\ln 2. The measured outcome is then used to select a driving protocol, which extracts work from the system by manipulating the control parameters JJ and HH. At the end of the protocol corresponding to each outcome, the system is brought back to the original equilibrium (so pX|m=pp_{X^{\prime}|m}^{\prime}=p for all mm). For simplicity, we assume that the initial and final energy functions are the same.

Figure 7: Thermodynamics of information on a 2D Ising model. Left: a measurement MM is made of the state of a single spin (green), and then used to drive the system while extracting work (blue). Right: the accessible information Iaccϕ𝒢(X;M)I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M) increases with inverse temperature after the critical value βc0.44\beta_{c}\approx 0.44 (grey circles from Monte Carlo simulations, black line from closed-form expression, Eq. 56). Inset shows the bound on extractable work, Iaccϕ𝒢(X;M)/βI_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)/\beta, which peaks at β0.547\beta\approx 0.547 (red cross).

Under this setup, one can verify that Iaccϕ𝒢(X;M)=0I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X^{\prime};M)=0 and FE(p)=FE(p)F_{E}(p)=F_{E^{\prime}}(p^{\prime}), so Eq. 34 bounds average extractable work as WIaccϕ𝒢(X;M)/β\langle W\rangle\leq I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)/\beta, where Iaccϕ𝒢(X;M)I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M) is the accessible information from Eq. 35. Using Eqs. 35 and 43, we can write this accessible information as

Iaccϕ𝒢(X;M)=ln2lnq(m|x)N2a,bq(m|ga,b(x)),\displaystyle I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)=\ln 2-\Bigg{\langle}\ln\frac{q(m|x)}{N^{-2}\sum_{a,b}q(m|g_{a,b}(x))}\Bigg{\rangle}, (55)

where \langle\cdot\rangle indicates expectation over the joint distribution p(x)q(m|x)p(x)q(m|x), in which p(x)p(x) is the initial equilibrium distribution at inverse temperature β\beta and J=1,H=0J=1,H=0. We emphasize that the accessible information depends on β\beta (though we leave this dependence implicit in the notation).
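
As a small-scale illustration of Eq. 55 (not a substitute for the Monte Carlo estimates or the closed form discussed next), the following sketch evaluates Eq. 55 exactly on a tiny lattice by enumerating all spin configurations; the lattice size is chosen purely for tractability. For the single-spin channel, the translation-averaged channel in the denominator of Eq. 55 equals the fraction of spins in the configuration that agree with the measured spin.

import numpy as np
from itertools import product

def accessible_info_exact(L, beta, J=1.0):
    # Exact evaluation of Eq. 55 on a small L x L periodic lattice by brute-force
    # enumeration of all 2^(L*L) spin configurations.  For q(m|x) = delta_m(x_1),
    # the translation-averaged channel is the fraction of spins agreeing with spin 1.
    states = np.array(list(product([-1, 1], repeat=L * L)))
    grid = states.reshape(-1, L, L)
    E = -J * ((grid * np.roll(grid, 1, axis=1)).sum(axis=(1, 2))
              + (grid * np.roll(grid, 1, axis=2)).sum(axis=(1, 2)))
    p = np.exp(-beta * E)
    p /= p.sum()
    frac_agree = (states == states[:, [0]]).mean(axis=1)
    return np.log(2) - np.sum(p * (-np.log(frac_agree)))

for beta in (0.2, 0.44, 0.8):
    print(beta, accessible_info_exact(L=4, beta=beta))

On such a small lattice the numerical values differ from the thermodynamic-limit expression given below, but the qualitative increase of the accessible information with β is already visible.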

In general, one can estimate the accessible information in Eq. 55 using various numerical techniques (e.g., by sampling from the initial equilibrium distribution using Monte Carlo methods). It is also possible to use Onsager’s well-known solution of the 2D Ising model to calculate the accessible information in closed form. In particular, in Section B.7 we show that in the thermodynamic limit NN\to\infty,

Iaccϕ𝒢(X;M)={0for ββcln2h2(1+1(sinh2β)482)for β>βc.\displaystyle I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)=\begin{cases}0&\text{for $\beta\leq\beta_{c}$}\\ \ln 2-h_{2}\Big{(}\frac{1+\sqrt[8]{1-(\sinh 2\beta)^{-4}}}{2}\Big{)}&\text{for $\beta>\beta_{c}$.}\end{cases} (56)

where h2(x)=xlnx(1x)ln(1x)h_{2}(x)=-x\ln x-(1-x)\ln(1-x) is the binary entropy function and βc=ln(1+2)/20.44\beta_{c}=\ln(1+\sqrt{2})/2\approx 0.44 is the critical inverse temperature of the 2D Ising model. This result is verified in Fig. 7, where we compare Eq. 56 with a Monte Carlo estimate of Eq. 55 on a 100×100 lattice. It can be seen that in the high temperature (low β\beta) regime, the accessible information vanishes. In the low temperature (high β\beta) regime, the amount of accessible information increases, approaching ln2\ln 2 as β\beta\to\infty.

We also plot the bound on average extractable work, WIaccϕ𝒢(X;M)/β\langle W\rangle\leq I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)/\beta, in the inset in Fig. 7. This bound is the ratio of two terms: the accessible information Iaccϕ𝒢(X;M)I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M) and the inverse temperature β\beta, both of which are increasing in β\beta. In fact, it can be seen from Fig. 7 that the bound on extractable work peaks at a finite value of β\beta, the optimal inverse temperature for work extraction. Using Eq. 56 and numerical techniques, we find this optimal value to be β0.547\beta\approx 0.547, leading to the bound W1.06\langle W\rangle\leq 1.06 (in units of the coupling constant JJ).
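
These numbers can be reproduced directly from Eq. 56; the following minimal sketch (illustrative only, with arbitrary search bounds) evaluates the closed-form accessible information and numerically locates the inverse temperature that maximizes the work bound.

import numpy as np
from scipy.optimize import minimize_scalar

BETA_C = np.log(1 + np.sqrt(2)) / 2            # critical inverse temperature, ~0.44

def binary_entropy(x):
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

def accessible_info(beta):
    # Closed-form accessible information of Eq. 56 (thermodynamic limit).
    if beta <= BETA_C:
        return 0.0
    m = (1 - np.sinh(2 * beta) ** -4) ** (1 / 8)    # Onsager spontaneous magnetization
    return np.log(2) - binary_entropy((1 + m) / 2)

# The work bound is I_acc / beta; maximize it over beta > beta_c.
res = minimize_scalar(lambda b: -accessible_info(b) / b,
                      bounds=(BETA_C + 1e-6, 3.0), method="bounded")
print(res.x, -res.fun)    # ~0.547 and ~1.06 (in units of the coupling J)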

The above analysis shows that the amount of accessible information provided by a given measurement can depend on the structure of correlations in the system, and therefore vary dramatically as the system undergoes a phase transition. At a high level, any driving protocol that is restricted to energy functions like Eq. 53 can only extract work from “global” (i.e., translationally invariant) information. If the measurement acquires such information (e.g., if it directly measures the spatially-averaged magnetization), then in principle all of the acquired information may be extractable as work. Measurement of the state of a single spin, however, in general provides only local information. The temperature dependence observed in Eq. 56 and Fig. 7 arises from the presence of long-range order in the magnetized regime (β>βc\beta>\beta_{c}). In this regime, the state of each spin is highly correlated with the magnetization of the entire lattice, so local and global information are equivalent. In the high temperature regime (β<βc\beta<\beta_{c}), the state of a single spin is not correlated with any kind of global information, and so most of the measured information is inaccessible.

For a different kind of analysis of the thermodynamics of a 1D Ising model under constraints, see lekscha_quantum_2018 .

VI Modularity constraints

Many systems of interest exhibit modular organization, meaning that their degrees of freedom can be grouped into decoupled subsystems. Examples of modular systems include computational devices such as digital circuits (gershenfeld1996signal, ; Boyd:2018aa, ; wolpert2020thermodynamic, ), regulatory networks in biology (schlosser2004modularity, ), and brain networks (sporns2016modular, ).

We use our framework to derive bounds on work and EP for modular systems. We begin by introducing some terminology and notation. Consider a system whose degrees of freedom are indexed by the set VV, such that the overall state space is the Cartesian product X=vVXvX=\prod_{v\in V}X_{v}, where XvX_{v} is the state space of degree of freedom vv. We use the term subsystem to refer to any subset of the degrees of freedom, AVA\subseteq V. We use XAX_{A} to indicate the random variable representing the state of subsystem AA and xAx_{A} to indicate an actual state of AA. Given some distribution pp over the entire system, we use pAp_{A} to indicate a marginal distribution over subsystem AA, and [Lp]A[Lp]_{A} to indicate the derivative of the marginal distribution of subsystem AA under the generator LL.

We use the term modular decomposition to refer to a set of subsystems 𝒞\mathcal{C}, such that each vVv\in V belongs to at least one subsystem A𝒞A\in\mathcal{C}. Note that some of the degrees of freedom vVv\in V can belong to more than one subsystem in 𝒞\mathcal{C}. We use

O(𝒞)=A,B𝒞:AB(AB)\displaystyle O(\mathcal{C})=\bigcup_{A,B\in\mathcal{C}:A\neq B}(A\cap B) (57)

to indicate those degrees of freedom that belong to more than one subsystem in 𝒞\mathcal{C}, which we refer to as the overlap. We will often write OO instead of O(𝒞)O(\mathcal{C}) for notational simplicity.

We say that the available driving protocols obey modularity constraints (with respect to the modular decomposition 𝒞\mathcal{C}) if each generator LΛL\in\Lambda can be written as a sum of generators of the different subsystems in 𝒞\mathcal{C},

L=A𝒞L(A),\displaystyle L=\sum_{A\in\mathcal{C}}L^{(A)}, (58)

and each L(A)L^{(A)} obeys two properties: the dynamics over the marginal distribution pAp_{A} are closed under L(A)L^{(A)} (depend only on the marginal distribution over AA),

pA=qA[L(A)p]A=[L(A)q]Ap,q𝒫,p_{A}=q_{A}\implies[L^{(A)}p]_{A}=[L^{(A)}q]_{A}\qquad\forall p,q\in\mathcal{P}, (59)

and the distribution over other subsystems besides AA does not change under L(A)L^{(A)},

[L(A)p]B=0p𝒫,B𝒞{A}.[L^{(A)}p]_{B}=0\qquad\quad\forall p\in\mathcal{P},B\in\mathcal{C}\setminus\{A\}. (60)

In other words, we require that each subsystem evolves independently, and does not affect the other subsystems.

The role of the degrees of freedom in the overlap is somewhat subtle. It can be verified that Eq. 60 implies that the degrees of freedom in the overlap cannot change state when evolving under LL. Importantly, however, the overlap may influence the dynamics of those degrees of freedom that can change state. For example, consider an inclusive model of a feedback control setup: there are two nested subsystems, 𝒞={A,B}\mathcal{C}=\{A,B\} with BAB\subseteq A, and the degrees of freedom in O=BO=B (the controller) cannot change state but can influence the evolution of ABA\setminus B. More elaborate feedback control setups, in which the same controller can control multiple subsystems, can be modeled using decompositions with multiple non-nested subsystems. Other examples of modular decompositions with overlap include circuits wolpert2020thermodynamic , spin systems where some spins are pinned by local magnetic fields, and many-particle systems where some particles have no mobility.

We can also provide more concrete conditions under which Eqs. 59 and 60 hold for discrete-state master equations and Fokker-Planck equations. For discrete-state master equations, it can be verified by inspection that Eqs. 59 and 60 hold when all LΛL\in\Lambda can be written in the form

Lxx=A𝒞RxA,xA(A)δxVA(xVA),\displaystyle L_{x^{\prime}x}=\sum_{A\in\mathcal{C}}R^{(A)}_{x_{A}^{\prime},x_{A}}\delta_{x_{V\setminus A}}(x_{V\setminus A}^{\prime}), (61)

where δ\delta is the Kronecker delta and R(A)R^{(A)} is some rate matrix over subsystem AA that does not allow the degrees of freedom in the overlap to change state (RxA,xA(A)=0R^{(A)}_{x_{A}^{\prime},x_{A}}=0 if xAOxAOx_{A\cap O}\neq x_{A\cap O}^{\prime}).

For Fokker-Planck equations, for simplicity consider overdamped dynamics of the form

Lp=vVγvLxv[(xvEL)p+β1xvp],Lp=\sum_{v\in V}\gamma_{v}^{L}\partial_{x_{v}}\Big{[}(\partial_{x_{v}}E_{L})p+\beta^{-1}\partial_{x_{v}}p\Big{]}, (62)

where γvL\gamma_{v}^{L} is the mobility coefficient along dimension vv and ELE_{L} is the potential energy function associated with generator LL. Such equations can represent potential-driven Brownian particles coupled to a heat bath, where the different mobility coefficients represent different particle masses or sizes. (One can also apply the results in this section to Fokker-Planck equations that can be put in the form of Eq. 62 via an appropriate change of variables, see (risken1996fokker, , Sec. 4.9).) Now imagine that for all LΛL\in\Lambda, the energy functions are additive over the subsystems, and that the degrees of freedom in the overlap have no mobility:

EL(x)=A𝒞EL(A)(xA),γvL=0vO.\displaystyle E_{L}(x)=\sum_{A\in\mathcal{C}}E^{(A)}_{L}(x_{A}),\quad\;\;\;\gamma_{v}^{L}=0\;\;\;\forall v\in O. (63)

In that case, Eq. 62 can be rewritten in the form of Eq. 58, with L(A)p=vAOγvLxv[(xvEL(A))pA+β1xvpA]L^{(A)}p=\sum_{v\in A\setminus O}\gamma_{v}^{L}\partial_{x_{v}}[(\partial_{x_{v}}E^{(A)}_{L})p_{A}+\beta^{-1}\partial_{x_{v}}p_{A}], and satisfies Eqs. 59 and 60.

We now define the following nonlinear operator ϕ𝒞\phi_{{\mathcal{C}}}:

ϕ𝒞(p)=pOA𝒞pAO|AO.\displaystyle\phi_{{\mathcal{C}}}(p)=p_{O}\prod_{{A\in\mathcal{C}}}p_{A\setminus O|A\cap O}. (64)

This operator preserves the statistical correlations within each subsystem A𝒞A\in\mathcal{C}, as well as within the overlap OO, while destroying all other statistical correlations. As a simple example, if all the subsystems in 𝒞\mathcal{C} are non-overlapping, then ϕ𝒞(p)\phi_{{\mathcal{C}}}(p) has the product form ϕ𝒞(p)=A𝒞pA\phi_{{\mathcal{C}}}(p)=\prod_{A\in\mathcal{C}}p_{A}. In Appendix C, we show that ϕ𝒞\phi_{{\mathcal{C}}} obeys the Pythagorean identity, Eq. 14. We also show that if some generator L(t)L(t) obeys Eqs. 59 and 60, then eτL(t)e^{\tau L(t)} commutes with ϕ𝒞\phi_{{\mathcal{C}}}, so Eq. 16 holds.

This means that for any protocol that carries out the transformation ppp\!\shortrightarrow\!p^{\prime} while obeying modularity constraints, the decompositions and bounds for EP and work derived in Section III are satisfied for ϕ=ϕ𝒞\phi=\phi_{{\mathcal{C}}}. In particular, using Eq. 21, we can decompose the free energy FE(p)F_{E}(p) of any distribution pp into the accessible free energy FE(ϕ𝒞(p))F_{E}(\phi_{{\mathcal{C}}}(p)) and the inaccessible free energy D(pϕ𝒞(p))/βD(p\|\phi_{{\mathcal{C}}}(p))/\beta. Note that D(pϕ𝒞(p))D(p\|\phi_{{\mathcal{C}}}(p)) is a non-negative measure of the amount of statistical correlations between the subsystems of 𝒞\mathcal{C} under distribution pp, which vanishes when the subsystems are conditionally independent of one another given the overlap OO. Thus, for a protocol that obeys modularity constraints, Eq. 18 states that the drop in those statistical correlations is a lower bound on EP, and that the amount of statistical correlation between the subsystems of 𝒞\mathcal{C} cannot increase over the course of the protocol. (There is a fair amount of closely related prior work; see Section VIII.)

A particularly simple application of our bounds occurs when 𝒞\mathcal{C} contains two (possibly overlapping) subsystems, 𝒞={A,B}\mathcal{C}=\{A,B\}. In that case, the bounds in Eq. 18 can be rewritten in terms of the drop of a conditional mutual information between the two subsystems,

Σ(pp)I(XA;XB|XAB)I(XA;XB|XAB)0.\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq I(X_{A};X_{B}|X_{A\cap B})-I(X_{A}^{\prime};X_{B}^{\prime}|X_{A\cap B}^{\prime})\geq 0. (65)

If the subsystems do not overlap, this can be further rewritten as the drop of the regular mutual information,

Σ(pp)I(XA;XB)I(XA;XB)0.\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq I(X_{A};X_{B})-I(X_{A}^{\prime};X_{B}^{\prime})\geq 0. (66)

More generally, if 𝒞\mathcal{C} contains an arbitrary number of non-overlapping subsystems, the EP can be bounded as

Σ(pp)(p)(p)0,\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq\mathcal{I}(p)-\mathcal{I}(p^{\prime})\geq 0, (67)

where (p)=(A𝒞S(pA))S(p)\mathcal{I}(p)=\big{(}\sum_{A\in\mathcal{C}}S(p_{A})\big{)}-S(p) is the multi-information in distribution pp with respect to partition 𝒞\mathcal{C}. (The multi-information is a well-known generalization of mutual information, which is also sometimes called “total correlation” (watanabe1960information, ).)
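
To make the operator of Eq. 64 concrete, the following illustrative sketch (the three-variable example and the variable names are arbitrary choices) constructs it for two overlapping subsystems and verifies numerically that the resulting KL divergence equals the conditional mutual information appearing in Eq. 65.

import numpy as np

rng = np.random.default_rng(0)
p = rng.random((2, 2, 2))
p /= p.sum()                        # joint distribution over (x1, x2, x3)

# Decomposition C = {A, B} with A = {1,2}, B = {2,3}, so the overlap is O = {2}.
p2 = p.sum(axis=(0, 2))             # p(x2)
p12 = p.sum(axis=2)                 # p(x1, x2)
p23 = p.sum(axis=0)                 # p(x2, x3)

# phi_C(p)(x) = p(x2) p(x1|x2) p(x3|x2), cf. Eq. 64
phi = (p2[None, :, None]
       * (p12 / p2[None, :])[:, :, None]
       * (p23 / p2[:, None])[None, :, :])

kl = np.sum(p * np.log(p / phi))    # D(p || phi_C(p))

# Conditional mutual information I(X1; X3 | X2), the quantity in Eq. 65
cmi = np.sum(p * np.log(p * p2[None, :, None] / (p12[:, :, None] * p23[None, :, :])))
print(kl, cmi)                      # the two values agree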

We finish by discussing thermodynamics of information under modularity constraints. In general, the results derived in Section IV apply to modularity constraints as a special case. However, we can also exploit special properties of the operator ϕ𝒞\phi_{{\mathcal{C}}} to further simplify the expression of accessible information. Suppose that the distribution pp is invariant under ϕ𝒞\phi_{{\mathcal{C}}}, so p=ϕ𝒞(p)p=\phi_{{\mathcal{C}}}(p) (e.g., if pp is an equilibrium distribution, see Eq. 17). Using Eq. 64, we can then rewrite Eq. 36 as

Iaccϕ𝒞(X;M)\displaystyle I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) =I(XO;M)+A𝒞I(XA;M|XAO).\displaystyle=I(X_{O};M)+\sum_{\mathclap{A\in\mathcal{C}}}I(X_{A};M|X_{A\cap O}). (68)

Thus, the accessible information in measurement MM is the information that MM provides about the overlap, plus the conditional mutual information between each subsystem and MM given the relevant part of the overlap. This means that only information about individual subsystems — not about inter-subsystem correlations — can be turned into work. If there is no overlap, Eq. 68 can be further simplified as

Iaccϕ𝒞(X;M)\displaystyle I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) =A𝒞I(XA;M).\displaystyle=\sum_{\mathclap{A\in\mathcal{C}}}I(X_{A};M). (69)

We will use these expressions in some of our examples below.

VI.1 Example: Szilard box with modularity constraints

We illustrate our results for modularity constraints on a Szilard box. In doing so, we will demonstrate two important concepts: first, how the same set of generators Λ\Lambda can be analyzed under different constraints, resulting in different bounds on work and EP (compare this section to Section V.1); second, how bounds arising from multiple constraints can be stacked on top of each other in an iterative manner, as in Eq. 28 (we will combine bounds from modularity and symmetry constraints).

We consider the same setup as in Section V.1: there is a single overdamped particle in a box coupled to a bath at inverse temperature β=1\beta=1, which evolves under potential energy functions as in Eq. 44. This system is driven from some initial distribution pp to a final uniform equilibrium distribution, p=up^{\prime}=u while extracting work.

Figure 8: (a)(a) Given a “rotated” distribution pθp_{\theta}, as shown above in Fig. 5(a), this shows the decorrelated distribution ϕ𝒞(pθ)\phi_{{\mathcal{C}}}(p_{\theta}), as in Eq. 70. (b)(b) The decorrelated and twirled distribution, ϕ𝒢(ϕ𝒞(pθ))\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta})).
Figure 9: Bounds on extractable work as a function of θ\theta, as derived from only modularity constraints (in green, Eq. 74), a combination of modularity+symmetry constraints (in orange, Eq. 76), and only symmetry constraints (in blue, Eq. 50).

Note that the energy functions in Eq. 44 have no interaction terms between x1x_{1} (the horizontal position of the particle) and x2x_{2} (the vertical position of the particle). That means that the allowed driving protocols obey modularity constraints for a decomposition of the system into two subsystems, 𝒞={{X1},{X2}}\mathcal{C}=\{\{X_{1}\},\{X_{2}\}\} (since Eq. 63 is satisfied for the decomposition). This allows us to analyze EP and work using an operator ϕ𝒞\phi_{{\mathcal{C}}} which maps each joint distribution over X1×X2X_{1}\times X_{2} into a product distribution,

ϕ𝒞(p)(x1,x2)=p(x1)p(x2).\displaystyle\phi_{{\mathcal{C}}}(p)(x_{1},x_{2})=p(x_{1})p(x_{2}). (70)

In particular, using the same derivation as in Eq. 48, we can bound the extractable work in terms of the accessible free energy in pp,

W(pu)D(ϕ𝒞(p)u).\displaystyle W(p\!\shortrightarrow\!u)\leq D(\phi_{{\mathcal{C}}}(p)\|u). (71)

As discussed in Section V.1, this system also obeys symmetry constraints, corresponding to the vertical reflection twirling operator ϕ𝒢\phi_{\mathcal{G}} defined in Eq. 46. We can use Eq. 29 to bound the extractable work using a combination of ϕ𝒞\phi_{{\mathcal{C}}} and ϕ𝒢\phi_{\mathcal{G}},

W(pu)\displaystyle W(p\!\shortrightarrow\!u) D(ϕ𝒞(ϕ𝒢(p))u)\displaystyle\leq D(\phi_{{\mathcal{C}}}(\phi_{\mathcal{G}}(p))\|u) (72)
W(pu)\displaystyle W(p\!\shortrightarrow\!u) D(ϕ𝒢(ϕ𝒞(p))u).\displaystyle\leq D(\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p))\|u). (73)

For concreteness, imagine that the initial distribution pp is concentrated within half the box, as determined by a separating line rotated by some arbitrary angle θ[π,π]\theta\in[-\pi,\pi], so p=pθp=p_{\theta} from Eq. 49 (see Fig. 5(a) for an illustration).

We consider the extractable work bound in Eq. 71 for the initial distribution pθp_{\theta}. For a given pθp_{\theta}, the corresponding decorrelated initial distribution ϕ𝒞(pθ)\phi_{{\mathcal{C}}}(p_{\theta}) is illustrated in Fig. 8(a). Then, the accessible free energy in Eq. 71 can be expressed in closed form as (see Section C.3),

D(ϕ𝒞(pθ)u)=ln412[min{|tanθ|,|tan(π/2θ)|}+f(max{|tanθ|,|tan(π/2θ)|})],D(\phi_{{\mathcal{C}}}(p_{\theta})\|u)=\ln 4-\frac{1}{2}\Big{[}\min\{|\tan\theta|,|\tan(\pi/2-\theta)|\}\\ +f(\max\{|\tan\theta|,|\tan(\pi/2-\theta)|\})\Big{]}, (74)

where for notational convenience we’ve defined

f(x)=11+x22xlnx+1x1lnx214x2.\displaystyle f(x)=1-\frac{1+x^{2}}{2x}\ln\frac{x+1}{x-1}-\ln\frac{x^{2}-1}{4x^{2}}. (75)

Eq. 74 is plotted in Fig. 9 in green. Note that this function peaks both at θ{π,0,π}\theta\in\{-\pi,0,\pi\} (i.e., when the particle is in the left or right half of the box) as well as θ{π/2,π/2}\theta\in\{-\pi/2,\pi/2\} (i.e., when the particle is in the top or bottom half of the box) — precisely those θ\theta for which pθp_{\theta} has no correlations between the horizontal and vertical position of the particle.

Next, we consider the extractable work bound in Eq. 72 for the initial distribution pθp_{\theta}. It can be verified that ϕ𝒢(ϕ𝒞(pθ))(x1,x2)=pθ(x1)u(x2)\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))(x_{1},x_{2})=p_{\theta}(x_{1})u(x_{2}), which is illustrated in Fig. 8(b). The right hand side of Eq. 72 can again be expressed in closed form as (see Section C.3)

D(ϕ𝒢(ϕ𝒞(pθ))u)=ln212{f(|tanθ|)if |θ|(π4,3π4)|tanθ|otherwise\displaystyle D(\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))\|u)=\ln 2-\frac{1}{2}\begin{cases}f(|\tan\theta|)&\text{if $|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4})$}\\ |\tan\theta|&\text{otherwise}\end{cases} (76)

with ff defined as in Eq. 75. This result is shown in Fig. 9 in orange. Note also that ϕ𝒢(ϕ𝒞(pθ))=ϕ𝒞(ϕ𝒢(pθ))\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))=\phi_{{\mathcal{C}}}(\phi_{\mathcal{G}}(p_{\theta})) for all pθp_{\theta}, so the bounds in Eqs. 72 and 73 are equivalent.

For comparison we also plot the extractable work bound derived using symmetry constraints, Eq. 50 (Fig. 9 in blue). It is clear that the bound derived by exploiting a combination of modularity and symmetry constraints (in orange) is strictly tighter than the bounds derived by using either only modularity (green) or only symmetry constraints (blue) individually.
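
The closed-form expressions in Eqs. 74 and 76 can be checked by the same kind of grid discretization used for Eq. 50 above; the following sketch (illustrative, with an arbitrary grid resolution) computes both bounds for a given angle.

import numpy as np

def modularity_bounds(theta, n=1000):
    # Discretized estimates of D(phi_C(p_theta)||u) (Eq. 74) and
    # D(phi_G(phi_C(p_theta))||u) (Eq. 76) on the box [-1,1]^2.
    xs = (np.arange(n) + 0.5) / n * 2 - 1
    x1, x2 = np.meshgrid(xs, xs, indexing="ij")
    cell = (2.0 / n) ** 2
    p = (x2 * np.sin(theta) - x1 * np.cos(theta) > 0).astype(float)
    p /= p.sum() * cell
    m1 = p.sum(axis=1) * (2.0 / n)                   # marginal density of x1
    m2 = p.sum(axis=0) * (2.0 / n)                   # marginal density of x2
    phi_C_p = np.outer(m1, m2)                       # product of marginals, Eq. 70
    phi_GC_p = 0.5 * (phi_C_p + phi_C_p[:, ::-1])    # then reflect x2, Eq. 46

    def kl_to_uniform(q):
        mask = q > 0
        return float(np.sum(q[mask] * np.log(q[mask] / 0.25)) * cell)

    return kl_to_uniform(phi_C_p), kl_to_uniform(phi_GC_p)

print(modularity_bounds(np.pi / 3))    # compare against Eqs. 74 and 76 at theta = pi/3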

One can also use the bounds derived in this section to analyze the accessible information in a measurement of the Szilard box. Imagine that, starting from a uniform equilibrium distribution, one measures which side of the box contains the particle, as determined by a separating line at some arbitrary angle θ[π,π]\theta\in[-\pi,\pi]. For this measurement, the conditional distribution over system states pX|mp_{X|m} is equal to pθp_{\theta} half the time and equal to pθ+πp_{\theta+\pi} the other half the time. One can then derive bounds on accessible information such as Eq. 52, while using the bounds derived in this section (Eqs. 71, 72 and 73).

VI.2 Example: Generalized Szilard box

Figure 10: A generalized Szilard box with multiple particles song2021optimal .

Our results on modularity constraints can be useful for analyzing the thermodynamics of multi-particle systems. As an example, consider the “generalized Szilard box” feedback-control scenario analyzed in song2021optimal . Here, a box containing an ideal gas of NN particles, which are indexed by vVv\in V, begins in uniform equilibrium with a heat bath at inverse temperature β\beta. Several partitions are inserted into the box, separating it into several volumes, and a measurement MM is made of the number of particles in each volume (see the illustration in Fig. 10). The box is then separated from the bath and, depending on the outcome of the measurement, the partitions are moved so as to equalize the pressures in the different volumes while extracting work. To make the process repeatable, suppose that at the end of the protocol, the partitions are removed and the box is again equilibrated with the bath (note that this last step does not contribute to extracted work).

The ideal gas assumption means that the particles do not interact, so by Eqs. 59 and 60 the protocol obeys modularity constraints with respect to a decomposition in which each particle is a separate subsystem. The corresponding operator ϕ𝒞\phi_{{\mathcal{C}}} is given by

ϕ𝒞(p)(x)=v=1Np(xv).\displaystyle\phi_{{\mathcal{C}}}(p)(x)=\prod_{v=1}^{N}p(x_{v}). (77)

Given Eq. 34, the average extractable work for the above feedback-control scenario is bounded by WIaccϕ𝒞(X;M)/β\langle W\rangle\leq I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/\beta, which can also be written in terms of the information provided by the measurement MM about each individual particle,

Wv=1NI(Xv;M)/β,\displaystyle\langle W\rangle\leq\sum_{v=1}^{N}I(X_{v};M)/\beta, (78)

as follows from Eq. 69. In fact, by symmetry of the initial distribution, the measurement provides the same information about each particle, I(Xv;M)=I(X1;M)I(X_{v};M)=I(X_{1};M) for all vv, so we can further rewrite Eq. 78 as WNI(X1;M)/β\langle W\rangle\leq N\cdot I(X_{1};M)/\beta.

This shows that Eq. 78, which is reported as one of the main results of song2021optimal (Eq. 5), follows immediately from our framework. Moreover, our derivation holds under a broader set of conditions than those considered in song2021optimal , since it does not rely on any of the details of the setup (such as the type of partitions, the particular work extraction protocol, or even the assumption that the particles are identical).
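
As a minimal worked instance of Eq. 78 (a special case, not the full setup of song2021optimal ), suppose there is a single partition at position c in a unit box, the particles are independently and uniformly distributed, and the measurement M is the number of particles to the left of the partition. Because M depends on particle v only through the indicator of whether that particle lies to the left, the information per particle reduces to a mutual information between binomial variables, which the following illustrative sketch computes exactly.

import numpy as np
from scipy.stats import binom

def info_per_particle(N, c):
    # I(X_v; M) for N particles i.i.d. uniform on [0,1], one partition at c,
    # M = number of particles to the left of the partition.
    # Since M depends on X_v only through B_v = 1[X_v < c], I(X_v;M) = I(B_v;M).
    def H(q):
        q = np.asarray(q, dtype=float)
        q = q[q > 0]
        return -np.sum(q * np.log(q))
    pM = binom.pmf(np.arange(N + 1), N, c)                # M ~ Binomial(N, c)
    pmf_rest = binom.pmf(np.arange(N), N - 1, c)          # Binomial(N-1, c)
    pM_left = np.concatenate(([0.0], pmf_rest))           # M = 1 + Bin(N-1,c) given B_v = 1
    pM_right = np.concatenate((pmf_rest, [0.0]))          # M = Bin(N-1,c)     given B_v = 0
    return H(pM) - c * H(pM_left) - (1 - c) * H(pM_right)

N, c = 10, 0.3
print(N * info_per_particle(N, c))   # right side of Eq. 78 (times beta) for this setup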

VI.3 Example: Collective flashing ratchet

As a final example of modularity constraints, we consider the “collective flashing ratchet”, a classic model in the literature on the thermodynamics of information cao2004feedback ; craig_feedback_2008 . This system involves NN overdamped particles evolving under an additive potential

E(x)=λv=1NV(xv).\displaystyle E(x)=\lambda\sum_{v=1}^{N}V(x_{v}). (79)

where VV is a single-particle potential and λ{0,1}\lambda\in\{0,1\} is a control parameter that can be used to turn the potential on/off. The single-particle potential VV is chosen as an asymmetrical sawtooth “ratchet” pattern, shown in Fig. 11, where α[0,1/2]\alpha\in[0,1/2] parameterizes the degree of asymmetry.

By manipulating λ\lambda over time, possibly in a way that depends on measurements of the system, the particles can be driven so as to have a net directional flux, or to do work against an externally applied force feito_information_2007 . For instance, in a feedback control setup, λ\lambda is determined by the outcome of some measurement MM. The most common strategy involves turning the ratchet potential on when the net force on the particles is positive, and turning it off otherwise, according to the following measurement channel cao2004feedback :

q(m|x)=δm[Θ(vV(xv))],\displaystyle q(m|x)=\delta_{m}\big{[}{\textstyle\Theta\big{(}\sum_{v}V^{\prime}(x_{v})\big{)}}\big{]}, (80)

where Θ\Theta is the Heaviside function. Note that this system has been experimentally realized lopez_realization_2008 .

Suppose that starting from some initial distribution pp, the measurement in Eq. 80 is performed. As is common in the literature cao2004feedback , we assume that under pp the particles are independently and identically distributed, and that each particle is in the increasing part of the potential (V(xv)0V^{\prime}(x_{v})\geq 0) with probability α\alpha (see Fig. 11). The measurement outcome is then used to drive the system back to distribution pp while extracting work by manipulating the system’s energy function, all while coupled to a heat bath at inverse temperature β\beta. We assume that the driving protocols start and end on the same energy function, and that only additive potentials (without interaction terms) are applied to the system during the driving (this assumption allows for potentials such as Eq. 79, as well as many others).


Figure 11: The sawtooth potential of the flashing ratchet, from cao2004feedback .

The driving protocols obey Eq. 63 for a decomposition where each particle is its own subsystem, corresponding to the same type of ϕ𝒞\phi_{{\mathcal{C}}} as in Eq. 77, ϕ𝒞(p)(x)=vVp(xv)\phi_{{\mathcal{C}}}(p)(x)=\prod_{v\in V}p(x_{v}). As in Section VI.1, we can use Eq. 34 to bound average extractable work as WIaccϕ𝒞(X;M)/β\langle W\rangle\leq I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/\beta. Using Eq. 69,

Iaccϕ𝒞(X;M)=v=1NI(Xv;M)=NI(X1;M),\displaystyle I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)=\sum_{v=1}^{N}I(X_{v};M)=N\cdot I(X_{1};M), (81)

where we’ve used that the measurement provides the same information about each particle, I(Xv;M)=I(X1;M)I(X_{v};M)=I(X_{1};M) for all vv (as follows from a symmetry argument).

In Section C.4, we show that Iaccϕ𝒞(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) can be computed in closed form. Values of Iaccϕ𝒞(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) for different values of NN (the number of particles) and α\alpha (the asymmetry parameter) are plotted in Fig. 12(left). Note that the accessible information shows a non-monotonic behavior in the number of particles for α0.5\alpha\neq 0.5. This occurs because for a highly asymmetric potential, the total amount of acquired information grows with NN: I(X;M)I(X;M) grows from a minimum value of h2(α)h_{2}(\alpha) for N=1N=1 to a maximum value of ln2\ln 2 as NN\to\infty. Given this observation, we also calculate the “efficiency” of the measurements in terms of the ratio Iaccϕ𝒞(X;M)/I(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/I(X;M). This is shown in Fig. 12(right) for various values of NN and α\alpha. Interestingly, lower values of α\alpha (higher values of asymmetry) have higher efficiency values.

In the NN\to\infty limit, the accessible information and the efficiency each converge to a limiting value that is independent of α\alpha. In Section C.4, we show that the accessible information Iaccϕ𝒞(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) converges to 1/π0.321/\pi\approx 0.32 nats, while the efficiency Iaccϕ𝒞(X;M)/I(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/I(X;M) converges to 1/(πln2)0.461/(\pi\ln 2)\approx 0.46 (dotted lines in Fig. 12).
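
The quantities plotted in Fig. 12 can be reproduced under explicit additional assumptions about the sawtooth: we take the rising branch to occupy a fraction α of each period with slope proportional to 1/α, the falling branch to have slope proportional to -1/(1-α), and we break the tie in Eq. 80 (zero net sum) in favor of m=1. Under these assumptions the measurement outcome depends on the configuration only through the number of particles on the rising branch, and the following illustrative sketch computes the accessible information, the acquired information, and the efficiency exactly.

import numpy as np
from scipy.stats import binom

def H(q):
    q = np.asarray(q, dtype=float)
    q = q[q > 0]
    return -np.sum(q * np.log(q))

def ratchet_info(N, alpha):
    # Accessible information N*I(X_1;M) and acquired information I(X;M) = H(M),
    # assuming M = 1 iff k >= N*alpha, where k = #particles on the rising branch.
    k = np.arange(N + 1)
    pm1 = binom.pmf(k, N, alpha)[k >= N * alpha].sum()
    I_XM = H([pm1, 1 - pm1])                   # M is a deterministic function of x
    j = np.arange(N)                           # k = s_1 + j with j ~ Binomial(N-1, alpha)
    q = binom.pmf(j, N - 1, alpha)
    pm1_rise = q[1 + j >= N * alpha].sum()     # P(M=1 | particle 1 on rising branch)
    pm1_fall = q[j >= N * alpha].sum()         # P(M=1 | particle 1 on falling branch)
    H_M_given_s = (alpha * H([pm1_rise, 1 - pm1_rise])
                   + (1 - alpha) * H([pm1_fall, 1 - pm1_fall]))
    return N * (I_XM - H_M_given_s), I_XM

for N in (1, 5, 20, 200):
    acc, tot = ratchet_info(N, alpha=0.4)
    print(N, acc, tot, acc / tot)   # accessible info approaches ~1/pi, efficiency ~1/(pi ln 2)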

For a different (and complementary) theoretical analysis of extracted work in a feedback controlled flashing ratchet, see feito_information_2007 .

Figure 12: Left: accessible information Iaccϕ𝒞(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M) for the collective flashing ratchet, as a function of NN (number of particles) and α\alpha (asymmetry). Right: the efficiency of the measurements, Iaccϕ𝒞(X;M)/I(X;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)/I(X;M).

VII Coarse-grained constraints

In our final results section, we consider bounds on EP and work that arise from coarse-grained constraints.

We begin by introducing some notation and preliminaries. Let ξ:XZ\xi:X\to Z be some coarse-graining of the microscopic state space XX, where ZZ is a set of macrostates. For any distribution pp over XX, we use pZ(z)=δξ(x)(z)p(x)𝑑xp_{Z}(z)=\int\delta_{\xi(x)}(z)p(x)\,dx to indicate the corresponding distribution over the macrostates ZZ, pX|Z(x|z)=p(x)/pZ(z)p_{X|Z}(x|z)=p(x)/p_{Z}(z) to indicate the conditional probability distribution of microstates within macrostates, and 𝒫Z:={pZ:p𝒫}{{\mathcal{P}}_{Z}}:=\{p_{Z}:p\in\mathcal{P}\} to indicate the set of all coarse-grained distributions. Finally, for any generator LL and distribution pp, we use [Lp]Z[Lp]_{Z} to indicate the resulting instantaneous dynamics of the coarse-grained distribution pZp_{Z}.

To derive our bounds, we suppose that the dynamics over the coarse-grained distributions are closed, i.e., for all LΛL\in\Lambda,

pZ=qZ[Lp]Z=[Lq]Zp,q𝒫.\displaystyle p_{Z}=q_{Z}\implies[Lp]_{Z}=[Lq]_{Z}\qquad\forall p,q\in\mathcal{P}. (82)

Given this assumption, the evolution of the coarse-grained distribution pZp_{Z} can be represented by a coarse-grained generator, which we write as tpZ=L^pZ{\textstyle\partial_{t}}p_{Z}=\hat{L}p_{Z} (discussed in detail below).

We can specify more concrete conditions that guarantee that Eq. 82 holds for a given generator LL (see Appendix D for details). For a discrete-state rate matrix LL, it is satisfied when

x:ξ(x)=zLxx=L^z,ξ(x)x,zξ(x),\sum_{{x:\xi(x)=z}}L_{xx^{\prime}}={\hat{L}}_{z,\xi(x^{\prime})}\quad\forall x^{\prime},z\neq\xi(x^{\prime}), (83)

where L^z,z{\hat{L}}_{z,z^{\prime}} is some coarse-grained transition rate from macrostate zz^{\prime} to macrostate zz. Eq. 83 states that for each microstate xx^{\prime}, the total rate of transitions from xx^{\prime} to microstates located in another macrostate zξ(x)z\neq\xi(x^{\prime}) depends only on the macrostate ξ(x)\xi(x^{\prime}), not on xx^{\prime} directly. This condition has sometimes been called “lumpability” in the literature nicolis2011transformation .
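
The lumpability condition in Eq. 83 can be verified mechanically for a given rate matrix. The following illustrative sketch (the example matrix, the partition of microstates, and the column-based sign convention are arbitrary choices) checks Eq. 83 and, when it holds, returns the coarse-grained generator.

import numpy as np

def coarse_grain_generator(L, xi, n_macro):
    # L is a rate matrix with convention L[x_to, x_from] and columns summing to zero;
    # xi maps each microstate to a macrostate index.  Checks Eq. 83 and returns L_hat.
    n = L.shape[0]
    L_hat = np.full((n_macro, n_macro), np.nan)
    for xp in range(n):                       # xp plays the role of x' in Eq. 83
        zp = xi[xp]
        for z in range(n_macro):
            if z == zp:
                continue
            rate = L[xi == z, xp].sum()       # total rate from xp into macrostate z
            if np.isnan(L_hat[z, zp]):
                L_hat[z, zp] = rate
            elif not np.isclose(L_hat[z, zp], rate):
                raise ValueError("Eq. 83 is violated: the dynamics are not lumpable")
    L_hat = np.nan_to_num(L_hat)              # unfilled diagonal entries -> 0
    L_hat -= np.diag(L_hat.sum(axis=0))       # set diagonal so columns sum to zero
    return L_hat

# Example: 4 microstates grouped into 2 macrostates.  Transitions between the two
# blocks occur at total rate 1.0 regardless of the microstate within the block.
L = np.array([[-2.0,  1.0,  0.5,  0.5],
              [ 1.0, -2.0,  0.5,  0.5],
              [ 0.5,  0.5, -3.0,  2.0],
              [ 0.5,  0.5,  2.0, -3.0]])
print(coarse_grain_generator(L, np.array([0, 0, 1, 1]), 2))   # [[-1, 1], [1, -1]]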

For a continuous-state master equation, Eq. 82 is satisfied when a continuous-state version of Eq. 83 (with sums replaced by integrals) holds. Moreover, for certain Fokker-Planck equations and linear coarse-graining functions, Eq. 83 can be replaced by a simple coarse-graining condition on the energy functions. Suppose each LΛL\in\Lambda is a Fokker-Planck operator of the form

Lp=(EL)p+β1Δp,Lp=\nabla\cdot(\nabla E_{L})p+\beta^{-1}\Delta p, (84)

and that ξ\xi is a linear function, ξ(x)=Wx\xi(x)=Wx (where WW is some full-rank m×nm\times n matrix, mnm\leq n). Without loss of generality, we assume that WW is scaled so that WWT=IWW^{T}=I. (If ξ(x)=Wx\xi(x)=Wx and WWTIWW^{T}\neq I, one can define an equivalent, rescaled coarse-graining function ξ(x)=Wx\xi^{\prime}(x)=W^{\prime}x, where W:=(WWT)1/2WW^{\prime}:=(WW^{T})^{-1/2}W, which obeys WWT=IW^{\prime}W^{\prime T}=I.) In addition, suppose that each energy function satisfies

WEL(x)=F^(ξ(x))x\displaystyle W\nabla E_{L}(x)=-\hat{F}(\xi(x))\quad\forall x (85)

for some arbitrary macrostate drift function F^:Z\hat{F}:Z\to\mathbb{R}. Then, the coarse-grained generator L^\hat{L} itself will have a Fokker-Planck form (see duong2018quantification and Appendix D),

L^pZ=F^pZ+β1ΔpZ.\displaystyle\hat{L}p_{Z}=-\nabla\cdot\hat{F}p_{Z}+\beta^{-1}\Delta p_{Z}. (86)

The right side of Eq. 86 depends only on pZp_{Z} and not the full microstate distribution pp, so Eq. 82 will be satisfied.

Importantly, if Eq. 82 holds, the EP rate at time tt can be bounded as (see Appendix D):

Σ˙(p(t),L(t))ztpZ(z,t)lnpZ(z,t)πZL(t)(z)0,\displaystyle\dot{\Sigma}(p(t),L(t))\geq-\sum_{z}{\textstyle\partial_{t}}p_{Z}(z,t)\ln\frac{p_{Z}(z,t)}{\pi_{Z}^{L(t)}(z)}\geq 0, (87)

where tpZ(t)=L^pZ(t){\textstyle\partial_{t}}p_{Z}(t)=\hat{L}p_{Z}(t) and πZL(t)\pi_{Z}^{L(t)} is the coarse-grained version of πL(t)\pi^{L(t)}, the stationary distribution of L(t)L(t). The middle expression in Eq. 87 is the coarse-grained version of Eq. 11, which arises from the macrostate distribution pZp_{Z} being out of equilibrium. We then define the total “coarse-grained EP” over the course of the protocol as the time integral of this middle expression,

Σ^(pZpZ)=01ztpZ(z,t)lnpZ(z,t)πZL(t)(z)dt.\displaystyle\hat{\Sigma}(p_{Z}\!\shortrightarrow\!{p_{Z}^{\prime}})=\int_{0}^{1}-\sum_{z}{\textstyle\partial_{t}}p_{Z}(z,t)\ln\frac{p_{Z}(z,t)}{\pi_{Z}^{L(t)}(z)}\;dt. (88)

Given Eq. 87, the coarse-grained EP serves as a non-negative lower bound on the total EP,

Σ(pp)Σ^(pZpZ)0.\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq\hat{\Sigma}(p_{Z}\!\shortrightarrow\!{p_{Z}^{\prime}})\geq 0. (89)

Note that esposito2012stochastic previously derived a coarse-grained EP rate for discrete-state master equations, which differs from the expression appearing in Eq. 87; however, the expression in Eq. 87 can be seen as the “nonadiabatic component” of the coarse-grained EP rate from esposito2012stochastic , and is thus a lower bound on it esposito2010three .

We say that the available driving protocols obey coarse-grained constraints if the generators LΛL\in\Lambda exhibit closed dynamics over ZZ, Eq. 82, and there is some operator ϕ^:𝒫Z𝒫Z\hat{\phi}:{{\mathcal{P}}_{Z}}\to{{\mathcal{P}}_{Z}} that obeys the Pythagorean identity, Eq. 14, and the commutativity relation, Eq. 16, with respect to all L^\hat{L}. For example, this coarse-grained operator ϕ^\hat{\phi} might reflect the presence of symmetry or modularity constraints on the coarse-grained dynamics.

We can then use Eq. 89 and the framework developed in Section III to derive bounds on work and EP. In particular, Eq. 18 implies the following bound on coarse-grained EP, Σ^(pZpZ)D(pZϕ^(pZ))D(pZϕ^(pZ))0\hat{\Sigma}(p_{Z}\!\shortrightarrow\!{p_{Z}^{\prime}})\geq D(p_{Z}\|\hat{\phi}(p_{Z}))-D({p_{Z}^{\prime}}\|\hat{\phi}({p_{Z}^{\prime}}))\geq 0. Combined with Eq. 89, this lets us bound overall EP as

Σ(pp)D(pZϕ^(pZ))D(pZϕ^(pZ))0.\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p_{Z}\|\hat{\phi}(p_{Z}))-D({p_{Z}^{\prime}}\|\hat{\phi}({p_{Z}^{\prime}}))\geq 0. (90)

Via Eq. 2, this also gives a corresponding bound on extractable work,

W(pp)FE(p)FE(p)[D(pZϕ^(pZ))D(pZϕ^(pZ))]/β.W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(p)-F_{E^{\prime}}(p^{\prime})-\\ [D(p_{Z}\|\hat{\phi}(p_{Z}))-D({p_{Z}^{\prime}}\|\hat{\phi}({p_{Z}^{\prime}}))]/\beta. (91)

Eqs. 90 and 91 can also be used to derive bounds on average work extraction in feedback control protocols, using the strategy described in Section IV.

If ϕ^\hat{\phi} represents coarse-grained symmetry or modularity constraints, then Eq. 90 implies that any asymmetry or inter-subsystem correlation in the macrostate distribution can only be dissipated away, not turned into work. Another simple application occurs when all LΛL\in\Lambda have the same coarse-grained equilibrium distribution, i.e., there is some πZ\pi_{Z} such that L^πZ=0\hat{L}\pi_{Z}=0 for all LL. In this case, the constant operator ϕ^(pZ)=πZ\hat{\phi}(p_{Z})=\pi_{Z} satisfies Eqs. 14 and 16 at the coarse-grained level (compare to the derivation of Eq. 27 above). Applying Eq. 90 then gives

Σ(pp)D(pZπZ)D(pZπZ)0,\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})\geq D(p_{Z}\|\pi_{Z})-D({p_{Z}^{\prime}}\|\pi_{Z})\geq 0, (92)

as well as a corresponding extractable work bound, as in Eq. 91. This shows that if the coarse-grained equilibrium distribution πZ\pi_{Z} cannot change, then any deviation between the actual coarse-grained distribution pZp_{Z} and πZ\pi_{Z} must be dissipated as EP, not turned into work.

VII.1 Example: Szilard box

Figure 13: A two-dimensional Szilard box with a Brownian particle, in the presence of gravity.

We demonstrate our results on coarse-grained constraints using the Szilard box. We consider a similar setup as in Sections V.1 and VI.1, where there is a single overdamped particle in a box coupled to a bath at inverse temperature β=1\beta=1. However, we now assume that there is a vertical gravitational force, as illustrated in Fig. 13. Formally, this means that the available potential energy functions have the form

Eλ(x1,x2)=Vp(x1λ)+Vw(|x1|)+Vw(|x2|)+κx2,E_{\lambda}(x_{1},x_{2})=V_{\mathrm{p}}(x_{1}-\lambda)+V_{\mathrm{w}}(|x_{1}|)+V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}, (93)

where κ\kappa is a fixed constant that determines the strength of gravity. Unlike Eq. 44, the energy function in Eq. 93 does not obey the reflection symmetry (x1,x2)(x1,x2)(x_{1},x_{2})\mapsto(x_{1},-x_{2}).

The microstate of the particle is represented by the horizontal and vertical position, x=(x1,x2)x=(x_{1},x_{2}). We consider a coarse-graining in which the macrostate is the vertical coordinate of the particle Z=X2Z=X_{2}, corresponding to the coarse-graining function ξ(x1,x2)=Wx=x2\xi(x_{1},x_{2})=Wx=x_{2} with W=[0 1]W=[0\;1]. It is easy to check that the potential energy functions in Eq. 93 satisfy

WEλ(x)=x2[Vw(|x2|)+κx2],\displaystyle W\nabla E_{\lambda}(x)=\partial_{x_{2}}[V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}], (94)

which obeys Eq. 85 and therefore guarantees that the coarse-grained dynamics are closed. In fact, the coarse-grained generators have the Fokker-Planck form of Eq. 86 with the coarse-grained drift function F^(x2)=x2[Vw(|x2|)+κx2]\hat{F}(x_{2})=-\partial_{x_{2}}[V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}], which leads to the following Boltzmann stationary distribution:

πX2(x2)\displaystyle\pi_{X_{2}}(x_{2}) eβ[Vw(|x2|)+κx2]\displaystyle\propto e^{-\beta[V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}]}
=𝟏[1,1](x2)eβκx2,\displaystyle=\mathbf{1}_{[-1,1]}(x_{2})e^{-\beta\kappa x_{2}}, (95)

where in the second line we used the form of Vw()V_{\mathrm{w}}(\cdot) from Eq. 45. Since the coarse-grained equilibrium distribution is the same for all energy functions having the form Eq. 93, we can use the EP bound in Eq. 92.

Suppose that the system starts from some initial distribution pp and is then driven to a final equilibrium distribution pp^{\prime} while extracting work. We assume that the partition is removed at the beginning and end of the protocol, corresponding to the energy function E(x1,x2)=Vw(|x1|)+Vw(|x2|)+κx2E^{\varnothing}(x_{1},x_{2})=V_{\mathrm{w}}(|x_{1}|)+V_{\mathrm{w}}(|x_{2}|)+\kappa x_{2}, with the Boltzmann distribution

π(x1,x2)𝟏[1,1]2(x1,x2)eβκx2.\displaystyle\pi^{\varnothing}(x_{1},x_{2})\propto\mathbf{1}_{[-1,1]^{2}}(x_{1},x_{2})e^{-\beta\kappa x_{2}}. (96)

We will also assume that the final distribution is in equilibrium, so p=πp^{\prime}=\pi^{\varnothing}. Then, the extractable work involved in this transformation can be expressed as

W(pπ)\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing}) =FE(p)FE(π)Σ(pπ)\displaystyle=F_{E^{\varnothing}}(p)-F_{E^{\varnothing}}(\pi^{\varnothing})-\Sigma(p\!\shortrightarrow\!\pi^{\varnothing})
=D(pπ)Σ(pπ),\displaystyle=D(p\|\pi^{\varnothing})-\Sigma(p\!\shortrightarrow\!\pi^{\varnothing}), (97)

where we used Eqs. 2 and 5. We can then upper bound extractable work by combining Eq. 97 with various lower bounds on Σ(pπ)\Sigma(p\!\shortrightarrow\!\pi^{\varnothing}).

For instance, the second law states that Σ(pπ)0\Sigma(p\!\shortrightarrow\!\pi^{\varnothing})\geq 0, so

W(pπ)D(pπ).\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing})\leq D(p\|\pi^{\varnothing}). (98)

We can also derive a stronger bound by exploiting coarse-grained constraints. For the coarse-graining described above, Eq. 92 implies that Σ(pπ)D(pX2πX2)\Sigma(p\!\shortrightarrow\!\pi^{\varnothing})\geq D(p_{X_{2}}\|\pi_{X_{2}}), which gives the bound

W(pπ)\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing}) D(pπ)D(pX2πX2)\displaystyle\leq D(p\|\pi^{\varnothing})-D(p_{X_{2}}\|\pi_{X_{2}})
=D(pX1|X2πX1|X2).\displaystyle=D(p_{X_{1}|X_{2}}\|\pi^{\varnothing}_{X_{1}|X_{2}}). (99)

We can also bound EP and work using other kinds of constraints. For instance, the energy functions in Eq. 93 have no interaction terms between x1x_{1} and x2x_{2}, and therefore obey modularity constraints for the decomposition 𝒞={{X1},{X2}}\mathcal{C}=\{\{X_{1}\},\{X_{2}\}\} (see the analysis in Section VI.1). This allows us to bound EP and work using the operator ϕ𝒞\phi_{{\mathcal{C}}}, as defined above in Eq. 70. In particular, using Theorem 2, we have that

Σ(pπ)\displaystyle\Sigma(p\!\shortrightarrow\!\pi^{\varnothing}) =D(pϕ𝒞(p))+Σ(ϕ𝒞(p)π)\displaystyle=D(p\|\phi_{{\mathcal{C}}}(p))+\Sigma(\phi_{{\mathcal{C}}}(p)\!\shortrightarrow\!\pi^{\varnothing}) (100)
D(pϕ𝒞(p)).\displaystyle\geq D(p\|\phi_{{\mathcal{C}}}(p)).

which implies the extractable work bound

W(pπ)\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing}) D(pπ)D(pϕ𝒞(p))=D(ϕ𝒞(p)π).\displaystyle\leq D(p\|\pi^{\varnothing})-D(p\|\phi_{{\mathcal{C}}}(p))=D(\phi_{{\mathcal{C}}}(p)\|\pi^{\varnothing}). (101)

Finally, we can also combine modularity and coarse-grained constraints. The coarse-grained constraints imply that Σ(ϕ𝒞(p)π)D(ϕ𝒞(p)X2πX2)\Sigma(\phi_{{\mathcal{C}}}(p)\!\shortrightarrow\!\pi^{\varnothing})\geq D(\phi_{{\mathcal{C}}}(p)_{X_{2}}\|\pi_{X_{2}}) by Eq. 92. Plugging this into Eq. 100 gives

Σ(pπ)D(pϕ𝒞(p))+D(ϕ𝒞(p)X2πX2),\displaystyle\Sigma(p\!\shortrightarrow\!\pi^{\varnothing})\geq D(p\|\phi_{{\mathcal{C}}}(p))+D(\phi_{{\mathcal{C}}}(p)_{X_{2}}\|\pi_{X_{2}}), (102)

resulting in the extractable work bound

W(pπ)D(ϕ𝒞(p)X1|X2πX1|X2),\displaystyle W(p\!\shortrightarrow\!\pi^{\varnothing})\leq D(\phi_{{\mathcal{C}}}(p)_{X_{1}|X_{2}}\|\pi^{\varnothing}_{X_{1}|X_{2}}), (103)

where we have again used the chain rule of KL divergence.

Figure 14: Szilard box with gravity: bounds on extractable work as a function of θ\theta, as derived from the second law (in blue, Eq. 98), coarse-grained constraints (in orange, Eq. 99), modularity constraints (in green, Eq. 101), and a combination of modularity+coarse-grained constraints (in red, Eq. 103).

We now illustrate these bounds using a concrete set of initial distributions. Imagine that the initial distribution pp is the equilibrium distribution π\pi^{\varnothing} restricted to half the box, as determined by a rotated separating line at some angle θ[π,π]\theta\in[-\pi,\pi],

pθ(x1,x2)=12π(x1,x2)Θ(x2sinθx1cosθ).\displaystyle p_{\theta}(x_{1},x_{2})=\frac{1}{2}\pi^{\varnothing}(x_{1},x_{2})\Theta(x_{2}\sin\theta-x_{1}\cos\theta). (104)

(Compare to Eq. 49, for the Szilard box without gravity). For these initial distributions and gravity parameter κ=1\kappa=1, we plot the four extractable work bounds derived above, Eqs. 98, 99, 101 and 103, as a function of θ\theta in Fig. 14 (values are calculated numerically). Note that, unlike the results presented in Figs. 6 and 9, the plots are no longer symmetric under the transformation θθ\theta\mapsto-\theta. This arises because gravity breaks the vertical reflection symmetry, so the nonequilibrium free energy of a distribution concentrated on the top half of the box (θ=π/2\theta=\pi/2) is greater than the nonequilibrium free energy of a distribution concentrated on the bottom half of the box (θ=π/2\theta=-\pi/2). It can also be seen that work bounds derived from coarse-grained constraints, Eq. 99 (orange), can be either weaker or stronger than the work bounds derived from modularity constraints, Eq. 101 (green), depending on the value of θ\theta. For all θ\theta, however, the work bound derived by combining both constraints, Eq. 103 (red), is stronger than the work bound derived from either constraint individually.
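
For readers who wish to reproduce the curves in Fig. 14, the following is a minimal numerical sketch (in Python with NumPy; the grid resolution and the choice β=1 are our own illustrative assumptions, not taken from the text) that evaluates the right-hand sides of Eqs. 98, 99, 101 and 103 on a discretized box. All distributions are normalized numerically on the grid, which sidesteps the normalization constant in Eq. 104.

import numpy as np

def kl(p, q):
    # Discrete KL divergence D(p||q) in nats, with the convention 0 ln(0/q) = 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def szilard_gravity_bounds(theta, beta=1.0, kappa=1.0, n=400):
    # Grid over the box [-1,1]^2; axis 0 indexes x1, axis 1 indexes x2.
    x = np.linspace(-1.0, 1.0, n)
    x1, x2 = np.meshgrid(x, x, indexing="ij")

    # Equilibrium distribution pi0 proportional to exp(-beta*kappa*x2) on the box (Eq. 96).
    pi0 = np.exp(-beta * kappa * x2)
    pi0 /= pi0.sum()

    # Initial distribution: pi0 restricted to one side of the line at angle theta (Eq. 104).
    p = pi0 * (x2 * np.sin(theta) - x1 * np.cos(theta) > 0)
    p /= p.sum()

    # Marginal over x2, and the modular decomposition phi_C(p) = p_{X1} p_{X2} (Eq. 70).
    p2, pi2 = p.sum(axis=0), pi0.sum(axis=0)
    phi_p = np.outer(p.sum(axis=1), p2)

    bound_98 = kl(p, pi0)                  # second law, Eq. 98
    bound_99 = bound_98 - kl(p2, pi2)      # coarse-grained constraint, Eq. 99
    bound_101 = kl(phi_p, pi0)             # modularity constraint, Eq. 101
    bound_103 = bound_101 - kl(p2, pi2)    # modularity + coarse-graining, Eq. 103
    return bound_98, bound_99, bound_101, bound_103

for theta in np.linspace(-np.pi, np.pi, 9):
    print(round(theta, 3), [round(b, 4) for b in szilard_gravity_bounds(theta)])

The returned values are KL divergences in nats, matching the right-hand sides of the four bounds; consistent with the observation above, the combined bound of Eq. 103 is never larger than the bounds of Eq. 99 or Eq. 101.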

VIII Relevant literature

In previous work on the general topic of thermodynamic bounds under constraints, Wilming et al. (wilming_second_2016, ) considered how extractable work depends on constraints on the Hamiltonian, given a quantum system coupled to a finite-sized heat bath. That paper derived an upper bound on the work that could be extracted by carrying out a physical process which consists of sequences of (1) unitary transformations of the system and bath, and (2) total relaxations of the system to some equilibrium Gibbs state (see also a similar setup for closed systems in perarnau-llobet_work_2016 ). Building on (wilming_second_2016, ), lekscha_quantum_2018 analyzed the efficiency of a heat engine coupled to two baths and subject to “local control” constraints (i.e., a many-particle system where local Hamiltonians can be changed but the interaction Hamiltonians cannot). In contrast to these works, we consider a classical system coupled to idealized reservoir(s). We then derive bounds on EP and work for a much broader set of protocols.

At a high level, our approach complements previous research on the relationship between EP, extractable work and different aspects of the driving protocol, such as temporal duration (esposito2010finite, ; sivak2012thermodynamic, ; shiraishi_speed_2018, ; gomez2008optimal, ; then2008computing, ; zulkowski2014optimal, ; schmiedl2007optimal, ), stochasticity of control parameters (machta2015dissipation, ), non-idealized work reservoirs (verley_work_2014, ), cyclic protocols schmiedl2007optimal ; allahverdyan2004maximal , the presence of additional conservation laws uzdin2021passivity , and the design of “optimal protocols” (solon2018phase, ; gingrich2016near, ; aurell2011optimal, ).

There is also previous work related to our analysis of thermodynamics of information under constraints in Section IV. (still2020thermodynamic, ) recently analyzed the thermodynamics of feedback control under a somewhat different formulation of constraints. (That paper proposed to divide the system into two subsystems YY and Y¯\bar{Y}, such that the accessible information is given by I(M;Y)I(M;Y), under three assumptions: (1) the system’s marginal distributions remain constant during all steps of feedback control, (2) the conditional distribution of Y¯\bar{Y} given the system and the measurement does not change during the driving, and (3) all conditional information about Y¯\bar{Y} is lost by the time that driving begins. After private communication with the author of still2020thermodynamic , we think that condition (3) may need to be formalized as p(y¯(t2)|y(t2),z(t0))=p(y¯(t2)|y(t2))p(\bar{y}(t_{2})|y(t_{2}),z(t_{0}))=p(\bar{y}(t_{2})|y(t_{2})), although this equation does not appear in that paper.) In this work, we analyze the thermodynamics of information for a broader set of constraints. It is not immediately clear how the framework in still2020thermodynamic compares to ours, or whether it could be applied to the examples considered in this paper, although such a comparison is an interesting direction for future work.

Some of our results concerning work extraction under modularity constraints in Section VI have appeared in prior literature. Eq. 66 was derived in (Boyd:2018aa, ) for the special case of an isothermal process with two non-overlapping subsystems, where one of the subsystems is held fixed. For the more general case of an arbitrary discrete-state system coupled to one or more reservoirs which have rate matrices as in Eq. 61, Eq. 66 was also previously derived in wolpert_thermo_comp_review_2019 ; wolpert2020thermodynamic , while Eq. 67 was previously derived in wolpert_thermo_comp_review_2019 ; wolpert.thermo.bayes.nets.2020 ; wolpert2020thermodynamic . Decompositions with overlap were previously considered in wolpert2020fluctuation ; wolpert2020minimal . In addition, Example 1 in wolpert2020strengthened can be used to derive the first inequality Eq. 65 for discrete-state systems. (The reader should be aware that those papers used different terminology from this paper: in wolpert2020fluctuation ; wolpert2020minimal , each degree of freedom vVv\in V is called a “subsystem”, the modular decomposition 𝒞\mathcal{C} is called a “unit structure”, and each A𝒞A\in\mathcal{C} is called a “unit”.)

Those papers also derived some results that are more general than the ones derived here, in that they apply even if the overlap changes state. Our paper nonetheless goes beyond this previous work by including continuous-state systems, and by deriving inequalities such as D(pϕ𝒞(p))D(pϕ𝒞(p))0D(p\|\phi_{{\mathcal{C}}}(p))-D(p^{\prime}\|\phi_{{\mathcal{C}}}(p^{\prime}))\geq 0, albeit for the more restricted scenario where the overlap does not change state.

Some of our results concerning work extraction under symmetry constraints, presented in Section V, appeared in previous work on quantum thermodynamics. For a finite-state quantum system coupled to a work reservoir and heat bath, Vaccaro et al. vaccaro_tradeoff_2008 investigated how much work can be extracted by bringing some initial quantum state ρ\rho to a maximally mixed state, with a uniform initial and final Hamiltonian, using discrete-time operations that commute with the action of some symmetry group 𝒢\mathcal{G}. It was shown that the work that can be extracted from ρ\rho under such transformations is equal to the work that can be extracted from the (quantum) twirling ϕ𝒢(ρ)\phi_{\mathcal{G}}(\rho), analogous to Eq. 24 for symmetry constraints. This research also derived an operational measure of asymmetry that is the quantum equivalent of D(pϕ𝒢(p))D(p\|\phi_{\mathcal{G}}(p)), and showed that asymmetry can only decrease under operations that commute with 𝒢\mathcal{G}. Janzing janzing_quantum_2006 extended vaccaro_tradeoff_2008 to consider arbitrary Hamiltonians, in the process deriving analogues of our decomposition of free energy (Eq. 21) for the special case of the twirling operator ϕ𝒢\phi_{\mathcal{G}}. A similar decomposition of free energy into coherent and incoherent components has recently appeared in lostaglio_description_2015 ; santos_role_2019 (this is a special case of the result in janzing_quantum_2006 , since a decohering map is a twirling operator elphick2019spectral ). Finally, the idea of probability distributions that are invariant under symmetry groups, as well as a version of the twirling operator ϕ𝒢\phi_{\mathcal{G}}, is a topic of research in probability and statistics; for details, see Ch. 3 in eaton_group_1989 .

While our approach is restricted to classical systems, in some respects our results for symmetry constraints are more general than this earlier work, since they hold for arbitrary (discrete and/or uncountably infinite) state spaces and for systems coupled to more than one reservoir (see Section IX). Moreover, for Fokker-Planck dynamics, we derive simple conditions for symmetry constraints stated in terms of the energy functions, which makes these results applicable to a large set of problems in stochastic thermodynamics and biophysics.

More fundamentally, one of the ways in which we go beyond previous literature on symmetry and modularity constraints is by providing a unified mathematical framework that applies to a broad set of constraints, including symmetry, modularity, and coarse-grained constraints (as well as their combinations) as special cases. A key idea in our framework is that the information-geometric Pythagorean identity, Eq. 14, is the essential property that allows an operator ϕ\phi to uncover the thermodynamically accessible part of any distribution pp (assuming also that ϕ\phi commutes with the dynamics). The Pythagorean identity is satisfied by many ϕ\phi, including both linear operators such as twirling operators ϕ𝒢\phi_{\mathcal{G}} and nonlinear operators such as modular decomposition operators ϕ𝒞\phi_{{\mathcal{C}}}. We believe this idea can be extended to the quantum domain, though we leave this for future work.

Finally, our approach is also related to “resource theories”, which are an active area of research in various areas of quantum physics chitambar2019quantum , including quantum thermodynamics  wilming_second_2016 ; gallego_thermodynamic_2016 ; brandao_resource_2013 ; lostaglio_stochastic_2015 ; faist_fundamental_2018 ; yunger_halpern_beyond_2016 . A resource theory quantifies a physical resource in an operational way, in terms of what transformations are possible when the resource is available. Most resource theories are based on a common set of formal elements, such as a resource quantifier (a real-valued function that measures the amount of a resource), a set of free states (statistical states that lack the resource), and free operations (transformations between statistical states that do not increase the amount of resource). In fact, some previous work on symmetry constraints in quantum thermodynamics vaccaro_tradeoff_2008 ; janzing_quantum_2006 can be seen as part of a broader literature on the resource theory of asymmetry marvian_extending_2014 ; marvian_asymmetry_2014 ; marvian_modes_2014 .

Our approach has similar operational motivations as resource theories; for example, we define “accessible free energy” in an operational way, as a quantity that governs extractable work under protocol constraints. Moreover, many elements of our framework are analogous to elements of the resource theory framework: the set of allowed generators (which we call Λ\Lambda) plays the role of the free operations, the image of the operator ϕ\phi plays the role of the set of free states, and the KL divergence D(pϕ(p))D(p\|\phi(p)) serves as the resource quantifier. In addition, the commutativity relation Eq. 16 (see Section III) has recently appeared in work on so-called resource destroying maps liu2017resource . However, unlike most resource theories, our focus is on the thermodynamics of classical systems modeled as driven continuous-time open systems. Further exploration of the connection between our approach and resource theories is left for future work.

IX Discussion

In this paper, we analyzed the EP and work incurred by a driving protocol that carries out some transformation ppp\!\shortrightarrow\!p^{\prime}, while subject to constraints on the set of available generators. We constructed a general framework that allowed us to derive several decompositions and bounds on EP and extractable work, and demonstrated that this framework has implications for the thermodynamics of feedback control under constraints. Finally, we used our framework to analyze three broad classes of protocol constraints, reflecting symmetry, modularity, and coarse-graining.

Note that our bounds on EP and extractable work, such as Eqs. 18 and 25, are expressed in terms of state functions, i.e., they depend only on the initial and final distributions pp and pp^{\prime} and not on the path that the system takes in going from pp to pp^{\prime}. In general, it may be possible to derive other bounds on work and EP that are not written in this form, which may be tighter. Nonetheless, bounds written in terms of state functions have some important advantages. In particular, they allow one to quantify the inherent “thermodynamic value” (in terms of EP and work) of a distribution pp relative to a set of available generators, irrespective of what protocol brought the system there or what future protocols that system may undergo (as long as those protocols obey the relevant constraints).

For simplicity, our results were derived for isothermal protocols, where the system is coupled to a single heat bath at a constant inverse temperature β\beta and obeys local detailed balance (LDB). Nonetheless, many of our results continue to hold for more general protocols, in which the system is coupled to any number of thermodynamic reservoirs and/or violates LDB. For a general protocol, our EP rate in Eq. 11 refers to the so-called nonadiabatic EP rate van2010three ; esposito2010three ; lee_fluctuation_2013 , which is a non-negative quantity that reflects the contribution to EP that is due to the system being out of the stationary distribution. In the general case, our decompositions in Theorems 1 and 2, as well as EP lower bounds in Eqs. 18 and 33, apply to nonadiabatic EP, rather than overall EP. Importantly, the nonadiabatic EP rate is a lower bound on the overall EP rate whenever the stationary distribution of LL is symmetric under conjugation of odd-parity variables lee_fluctuation_2013 , which holds in most cases of interest such as discrete-state master equations (which typically have no odd variables), overdamped dynamics (which have no odd variables), and many types of underdamped dynamics. In such cases, Eqs. 18 and 33 provide lower bounds not only on the nonadiabatic EP, but also on the overall EP, regardless of the number of coupled reservoirs or LDB. However, the relationship between work and EP in Eq. 2, as well as our bounds on work which make use of this relationship such as Eqs. 24 and 25, hold only for isothermal protocols. Note that our EP bound for closed coarse-grained dynamics, Eq. 87, concerns the overall EP rate, not the nonadiabatic EP rate, even for non-isothermal protocols (see Section D.2 for details).

There are several possible directions for future research.

First, it remains an open question whether our framework can also be used to analyze other classes of constraints, beyond the three classes (symmetry, modularity, and coarse-graining) considered in this paper.

Second, our results point to a novel connection between entropy production, which plays a central role in nonequilibrium thermodynamics, and the Pythagorean identity in Eq. 14, which plays a central role in information geometry. This contributes to the growing body of results that demonstrate formal relationships between information geometry and nonequilibrium thermodynamics ito2018stochastic ; takahashi2017shortcuts ; ito2020unified ; nicholson2018nonequilibrium ; ito2020stochastic ; nakamura2019reconsideration . One direction for future work would be to extend the framework developed in this work from classical to quantum systems. In this extension, one would derive bounds on quantum work and EP by considering a quantum operator ϕ\phi over density matrices which obeys quantum analogues of the Pythagorean identity in Eq. 14 (petzQuantumInformationTheory2008, , p. 44) and the commutativity relation in Eq. 16.

Finally, our results may also lead to some new treatments of foundational questions in thermodynamics. In stochastic thermodynamics, probability distributions over system states are usually interpreted in a “subjective” sense, in that the distribution pp assigned to a system typically reflects what one knows about the system (for this reason, this distribution changes once a measurement is made of the system’s state parrondo2015thermodynamics ). At the same time, our results show that for constrained driving protocols, one can often assign a different distribution to the system, ϕ(p)\phi(p), which reflects what one can control about the system. This also leads to the difference between the overall nonequilibrium free energy, defined in terms of the distribution pp, and the accessible free energy, defined in terms of the distribution ϕ(p)\phi(p). Note that thermodynamic entropy is often understood in an operational way, e.g., in terms of constrained macroscopic control, as has been previously discussed by Jaynes jaynes1992gibbs and others. An interesting direction for future work would be to explore whether the distinction between the distributions pp and ϕ(p)\phi(p) maps onto the distinction between (microscopic) statistical mechanical entropy and (macroscopic) thermodynamic entropy. In particular, one might ask whether this mapping can resolve some classic paradoxes concerning the relationship between statistical mechanical and thermodynamic entropy, such as the Gibbs paradox jaynes1992gibbs (mixing of indistinguishable particles increases statistical mechanical entropy but not thermodynamic entropy) and Loschmidt’s paradox (for an isolated Hamiltonian system, statistical mechanical entropy remains constant while the thermodynamic entropy can increase). This direction could also be related to a recent axiomatic treatment of thermodynamic entropy which has been developed within the framework of quantum resource theory weilenmann2016axiomatic .

Acknowledgments

We thank Massimiliano Esposito and Henrik Wilming for helpful discussions. This research was supported by grant number FQXi-RFP-IPW-1912 from the Foundational Questions Institute and Fetzer Franklin Fund, a donor advised fund of Silicon Valley Community Foundation. The authors thank the Santa Fe Institute for helping to support this research.

References

  • (1) K. Takara, H.-H. Hasegawa, and D. Driebe, “Generalization of the second law for a transition between nonequilibrium states,” Physics Letters A, vol. 375, no. 2, pp. 88–92, Dec. 2010.
  • (2) J. M. R. Parrondo, J. M. Horowitz, and T. Sagawa, “Thermodynamics of information,” Nature Physics, vol. 11, no. 2, pp. 131–139, 2015.
  • (3) M. Esposito and C. Van den Broeck, “Second law and Landauer principle far from equilibrium,” EPL (Europhysics Letters), vol. 95, no. 4, p. 40004, 2011.
  • (4) We use a Brownian model of the Szilard engine, which is similar to setups commonly employed in modern nonequilibrium statistical physics berut2012experimental ; roldan2014universal ; koski2014experimental ; shizume1995heat ; gong2016stochastic ; parrondo2015thermodynamics , as shown in Fig. 1. This model can be justified by imagining a box that contains a large colloidal particle, as well as a medium of small solvent particles to which the vertical partition is permeable. Note that this model differs from Szilard’s original proposal szilard1929entropieverminderung , in which the box contains a single particle in a vacuum, which has been analyzed in proesmans2015efficiency ; hondou2007equation ; bhat2017unusual .
  • (5) T. Sagawa and M. Ueda, “Second law of thermodynamics with discrete quantum feedback control,” Physical Review Letters, vol. 100, no. 8, p. 080403, 2008.
  • (6) As common in the literature, in Eq. 3 we consider only the work that is extractable from the system after the measurement is made. We do not account for the possible work cost of making the measurement, nor any work exchanges that may be incurred by the measurement apparatus during the driving.
  • (7) P. A. Corning and S. J. Kline, “Thermodynamics, information and life revisited, part II: ‘Thermoeconomics’ and ‘Control information’,” Systems Research and Behavioral Science: The Official Journal of the International Federation for Systems Research, vol. 15, no. 6, pp. 453–482, 1998.
  • (8) A. Kolchinsky and D. H. Wolpert, “Semantic information, autonomous agency and non-equilibrium statistical physics,” Interface focus, vol. 8, no. 6, p. 20180041, 2018.
  • (9) S. A. Kauffman, Investigations.   Oxford University Press, 2000.
  • (10) D. H. Wolpert and A. Kolchinsky, “Thermodynamics of computing with circuits,” New Journal of Physics, vol. 22, no. 6, p. 063047, 2020.
  • (11) J. Song, S. Still, R. Díaz Hernández Rojas, I. Pérez Castillo, and M. Marsili, “Optimal work extraction and mutual information in a generalized Szilárd engine,” Phys. Rev. E, vol. 103, p. 052121, May 2021.
  • (12) F. J. Cao, L. Dinis, and J. M. R. Parrondo, “Feedback control in a collective flashing ratchet,” Physical Review Letters, vol. 93, no. 4, p. 040603, 2004.
  • (13) D. Janzing, “Quantum Thermodynamics with Missing Reference Frames: Decompositions of Free Energy Into Non-Increasing Components,” Journal of Statistical Physics, vol. 125, no. 3, pp. 761–776, Nov. 2006.
  • (14) J. A. Vaccaro, F. Anselmi, H. M. Wiseman, and K. Jacobs, “Tradeoff between extractable mechanical work, accessible entanglement, and ability to act as a reference system, under arbitrary superselection rules,” Phys. Rev. A, vol. 77, p. 032114, Mar 2008.
  • (15) The assumption of unique stationary distributions can be relaxed as long as the operator ϕ\phi (as discussed in Section III) satisfies the following weak technical condition: for all p𝒫p\in\mathcal{P} and each stationary distribution π\pi of each LΛL\in\Lambda, D(pϕ(π))<D(p\|\phi(\pi))<\infty whenever D(pπ)<D(p\|\pi)<\infty. Note that ϕ(π)\phi(\pi) is also a stationary distribution of LL by Lemma 1 in Appendix A, so this condition is automatically satisfied when the generators have unique stationary distributions (since in that case π=ϕ(π)\pi=\phi(\pi)). Note also that if some LΛL\in\Lambda have multiple stationary distributions π\pi, the corresponding EP rate in Eq. 11 can be equivalently defined using any π\pi such that D(pπ)<D(p\|\pi)<\infty.
  • (16) M. Esposito and C. Van den Broeck, “Three faces of the second law. I. Master equation formulation,” Physical Review E, vol. 82, no. 1, p. 011143, 2010.
  • (17) N. G. Van Kampen, Stochastic processes in physics and chemistry.   Elsevier, 1992, vol. 1.
  • (18) H. Risken, “Fokker-Planck equation,” in The Fokker-Planck Equation.   Springer, 1996, pp. 63–95.
  • (19) C. Van den Broeck and M. Esposito, “Three faces of the second law. II. Fokker-Planck formulation,” Physical Review E, vol. 82, no. 1, p. 011144, 2010.
  • (20) D. L. Ermak and J. A. McCammon, “Brownian dynamics with hydrodynamic interactions,” The Journal of chemical physics, vol. 69, no. 4, pp. 1352–1360, 1978.
  • (21) S.-i. Amari, Information geometry and its applications.   Springer, 2016, vol. 194.
  • (22) This is because D(pq)D(pϕ(p))D(p\|q)\geq D(p\|\phi(p)) for any qimgϕq\in\mathrm{img}\;\phi, which follows from Eq. 14 and the non-negativity of KL divergence.
  • (23) A. Kolchinsky and D. H. Wolpert, “Entropy production given constraints on the energy functions,” Phys. Rev. E, vol. 104, p. 034129, Sep 2021.
  • (24) U. Seifert, “Stochastic thermodynamics, fluctuation theorems and molecular machines,” Reports on Progress in Physics, vol. 75, no. 12, p. 126001, 2012.
  • (25) A. Kolchinsky and D. H. Wolpert, “The state dependence of integrated, instantaneous, and fluctuating entropy production in quantum and classical processes,” arXiv preprint arXiv:2103.05734, 2021.
  • (26) H. Kwon and M. S. Kim, “Fluctuation theorems for a quantum channel,” Physical Review X, vol. 9, no. 3, p. 031029, 2019.
  • (27) A compact group 𝒢\mathcal{G} has a measurable action over XX if the action 𝒢×XX\mathcal{G}\times X\to X is a measurable function, where we assume 𝒢\mathcal{G} and XX are endowed with their respective Borel algebras.
  • (28) Technically, the definition of the twirling operator in Eq. 42 applies only when pp is a finite-valued probability density function (which excludes things such as the Dirac delta “function”). A more general formulation of our results can be developed in terms of probability measures rather than probability densities (see Ch. 3 in eaton_group_1989 for a version of Eq. 42 defined in terms of probability measures).
  • (29) K. G. H. Vollbrecht and R. F. Werner, “Entanglement measures under symmetry,” Physical Review A, vol. 64, no. 6, p. 062307, 2001.
  • (30) Technically, the wall potential as defined in Eq. 45 is non-differentiable. To be more accurate, one should imagine it in terms of the limit Vw(|x|)=limα|x|αV_{\mathrm{w}}(|x|)=\lim_{\alpha\to\infty}|x|^{\alpha} dhar2019run .
  • (31) S. Still and D. Daimer, “Partially observable Szilard engines,” arXiv preprint arXiv:2103.15803, 2021.
  • (32) P. L. Krapivsky, S. Redner, and E. Ben-Naim, A Kinetic View of Statistical Physics.   Cambridge University Press, Nov. 2010.
  • (33) J. Lekscha, H. Wilming, J. Eisert, and R. Gallego, “Quantum thermodynamics with local control,” Physical Review E, vol. 97, no. 2, p. 022142, Feb. 2018.
  • (34) N. Gershenfeld, “Signal entropy and the thermodynamics of computation,” IBM Systems Journal, vol. 35, no. 3.4, pp. 577–586, 1996.
  • (35) A. B. Boyd, D. Mandal, and J. P. Crutchfield, “Thermodynamics of modularity: Structural costs beyond the Landauer bound,” Phys. Rev. X, vol. 8, p. 031036, Aug 2018.
  • (36) G. Schlosser and G. P. Wagner, Modularity in development and evolution.   University of Chicago Press, 2004.
  • (37) O. Sporns and R. F. Betzel, “Modular brain networks,” Annual review of psychology, vol. 67, pp. 613–640, 2016.
  • (38) One can also apply the results in this section to Fokker-Planck equations that can be put in the form of Eq. 62 via an appropriate change of variables, see (risken1996fokker, , Sec. 4.9).
  • (39) The multi-information is a well-known generalization of mutual information, which is also sometimes called “total correlation” (watanabe1960information, ).
  • (40) E. Craig, N. Kuwada, B. Lopez, and H. Linke, “Feedback control in flashing ratchets,” Annalen der Physik, vol. 17, no. 2-3, pp. 115–129, Feb. 2008.
  • (41) M. Feito and F. J. Cao, “Information and maximum power in a feedback controlled Brownian ratchet,” The European Physical Journal B, vol. 59, no. 1, pp. 63–68, Sep. 2007.
  • (42) B. J. Lopez, N. J. Kuwada, E. M. Craig, B. R. Long, and H. Linke, “Realization of a feedback controlled flashing ratchet,” Physical Review Letters, vol. 101, no. 22, p. 220601, Nov. 2008.
  • (43) G. Nicolis, “Transformation properties of entropy production,” Physical Review E, vol. 83, no. 1, p. 011112, 2011.
  • (44) If ξ(x)=Wx\xi(x)=Wx and WWTIWW^{T}\neq I, one can define an equivalent, rescaled coarse-graining function ξ(x)=Wx\xi^{\prime}(x)=W^{\prime}x, where W:=(WWT)1/2WW^{\prime}:=(WW^{T})^{-1/2}W, which obeys WWT=IW^{\prime}W^{\prime T}=I.
  • (45) M. H. Duong, A. Lamacz, M. A. Peletier, A. Schlichting, and U. Sharma, “Quantification of coarse-graining error in Langevin and overdamped Langevin dynamics,” Nonlinearity, vol. 31, no. 10, p. 4517, 2018.
  • (46) M. Esposito, “Stochastic thermodynamics under coarse graining,” Physical Review E, vol. 85, no. 4, p. 041125, 2012.
  • (47) H. Wilming, R. Gallego, and J. Eisert, “Second law of thermodynamics under control restrictions,” Phys. Rev. E, vol. 93, p. 042126, Apr 2016.
  • (48) M. Perarnau-Llobet, A. Riera, R. Gallego, H. Wilming, and J. Eisert, “Work and entropy production in generalised Gibbs ensembles,” New Journal of Physics, vol. 18, no. 12, p. 123035, Dec. 2016.
  • (49) M. Esposito, R. Kawai, K. Lindenberg, and C. Van den Broeck, “Finite-time thermodynamics for a single-level quantum dot,” EPL (Europhysics Letters), vol. 89, no. 2, p. 20003, 2010.
  • (50) D. A. Sivak and G. E. Crooks, “Thermodynamic metrics and optimal paths,” Physical Review Letters, vol. 108, no. 19, p. 190602, 2012.
  • (51) N. Shiraishi, K. Funo, and K. Saito, “Speed limit for classical stochastic processes,” Phys. Rev. Lett., vol. 121, p. 070601, Aug 2018.
  • (52) A. Gomez-Marin, T. Schmiedl, and U. Seifert, “Optimal protocols for minimal work processes in underdamped stochastic thermodynamics,” The Journal of chemical physics, vol. 129, no. 2, p. 024114, 2008.
  • (53) H. Then and A. Engel, “Computing the optimal protocol for finite-time processes in stochastic thermodynamics,” Physical Review E, vol. 77, no. 4, p. 041105, 2008.
  • (54) P. R. Zulkowski and M. R. DeWeese, “Optimal finite-time erasure of a classical bit,” Physical Review E, vol. 89, no. 5, p. 052140, 2014.
  • (55) T. Schmiedl and U. Seifert, “Optimal finite-time processes in stochastic thermodynamics,” Physical Review Letters, vol. 98, no. 10, p. 108301, 2007.
  • (56) B. B. Machta, “Dissipation bound for thermodynamic control,” Physical Review Letters, vol. 115, no. 26, p. 260603, 2015.
  • (57) G. Verley, C. V. d. Broeck, and M. Esposito, “Work statistics in stochastically driven systems,” New Journal of Physics, vol. 16, no. 9, p. 095001, 2014.
  • (58) A. E. Allahverdyan, R. Balian, and T. M. Nieuwenhuizen, “Maximal work extraction from finite quantum systems,” EPL (Europhysics Letters), vol. 67, no. 4, p. 565, 2004.
  • (59) R. Uzdin and S. Rahav, “Passivity deformation approach for the thermodynamics of isolated quantum setups,” PRX Quantum, vol. 2, no. 1, p. 010336, 2021.
  • (60) A. P. Solon and J. M. Horowitz, “Phase transition in protocols minimizing work fluctuations,” Physical Review Letters, vol. 120, no. 18, p. 180605, 2018.
  • (61) T. R. Gingrich, G. M. Rotskoff, G. E. Crooks, and P. L. Geissler, “Near-optimal protocols in complex nonequilibrium transformations,” Proceedings of the National Academy of Sciences, vol. 113, no. 37, pp. 10 263–10 268, 2016.
  • (62) E. Aurell, C. Mejía-Monasterio, and P. Muratore-Ginanneschi, “Optimal protocols and optimal transport in stochastic thermodynamics,” Physical Review Letters, vol. 106, no. 25, p. 250601, 2011.
  • (63) S. Still, “Thermodynamic cost and benefit of memory,” Physical Review Letters, vol. 124, no. 5, p. 050601, 2020.
  • (64) That paper proposed to divide the system into two subsystems YY and Y¯\bar{Y}, such that the accessible information is given by I(M;Y)I(M;Y), under three assumption: (1) the system’s marginal distributions remains constant during all steps of feedback control, (2) the conditional distribution of Y¯\bar{Y} given the system and the measurement does not change during the driving, and (3) all conditional information about Y¯\bar{Y} is lost by the time that driving begins. After private communication with the author of still2020thermodynamic , we think that condition (3) may need to be formalized as p(y¯(t2)|y(t2),z(t0))=p(y¯(t2)|y(t2))p(\bar{y}(t_{2})|y(t_{2}),z(t_{0}))=p(\bar{y}(t_{2})|y(t_{2})), although this equation does not appear in that paper.
  • (65) D. H. Wolpert, “The stochastic thermodynamics of computation,” Journal of Physics A: Mathematical and Theoretical, 2019.
  • (66) ——, “Uncertainty relations and fluctuation theorems for Bayes nets,” Phys. Rev. Lett., vol. 125, p. 200602, Nov 2020.
  • (67) ——, “Fluctuation theorems for multipartite processes,” arXiv:2003.11144, 2020.
  • (68) ——, “Minimal entropy production rate of interacting systems,” New Journal of Physics, vol. 22, no. 11, p. 113013, 2020.
  • (69) ——, “Strengthened Landauer bound for composite systems,” arXiv:2007.10950, 2020.
  • (70) The reader should be aware that those papers used different terminology from this paper. In wolpert2020fluctuation ; wolpert2020minimal , each degree of freedom vVv\in V is called a “subsystem”, the modular decomposition 𝒞\mathcal{C} is called a “unit structure”, while each A𝒞A\in\mathcal{C} is called a “unit”.
  • (71) M. Lostaglio, D. Jennings, and T. Rudolph, “Description of quantum coherence in thermodynamic processes requires constraints beyond free energy,” Nature Communications, vol. 6, no. 1, p. 6383, May 2015.
  • (72) J. P. Santos, L. C. Céleri, G. T. Landi, and M. Paternostro, “The role of quantum coherence in non-equilibrium entropy production,” npj Quantum Information, vol. 5, no. 1, p. 23, Dec. 2019.
  • (73) C. Elphick and P. Wocjan, “Spectral lower bounds for the quantum chromatic number of a graph,” Journal of Combinatorial Theory, Series A, vol. 168, pp. 338–347, 2019.
  • (74) M. L. Eaton, “Group invariance applications in statistics,” in Regional conference series in Probability and Statistics.   JSTOR, 1989, pp. i–133.
  • (75) E. Chitambar and G. Gour, “Quantum resource theories,” Reviews of Modern Physics, vol. 91, no. 2, p. 025001, 2019.
  • (76) R. Gallego, J. Eisert, and H. Wilming, “Thermodynamic work from operational principles,” New Journal of Physics, vol. 18, no. 10, p. 103017, Oct. 2016.
  • (77) F. G. S. L. Brandão, M. Horodecki, J. Oppenheim, J. M. Renes, and R. W. Spekkens, “Resource theory of quantum states out of thermal equilibrium,” Phys. Rev. Lett., vol. 111, p. 250404, Dec 2013.
  • (78) M. Lostaglio, M. P. Müller, and M. Pastena, “Stochastic independence as a resource in small-scale thermodynamics,” Phys. Rev. Lett., vol. 115, p. 150402, Oct 2015.
  • (79) P. Faist and R. Renner, “Fundamental work cost of quantum processes,” Phys. Rev. X, vol. 8, p. 021011, Apr 2018.
  • (80) N. Yunger Halpern and J. M. Renes, “Beyond heat baths: Generalized resource theories for small-scale thermodynamics,” Phys. Rev. E, vol. 93, p. 022126, Feb 2016.
  • (81) I. Marvian and R. W. Spekkens, “Extending Noether’s theorem by quantifying the asymmetry of quantum states,” Nature Communications, vol. 5, no. 1, Sep. 2014.
  • (82) ——, “Asymmetry properties of pure quantum states,” Phys. Rev. A, vol. 90, p. 014102, Jul 2014.
  • (83) ——, “Modes of asymmetry: The application of harmonic analysis to symmetric quantum dynamics and quantum reference frames,” Phys. Rev. A, vol. 90, p. 062110, Dec 2014.
  • (84) Z.-W. Liu, X. Hu, and S. Lloyd, “Resource destroying maps,” Physical Review Letters, vol. 118, no. 6, p. 060502, 2017.
  • (85) H. K. Lee, C. Kwon, and H. Park, “Fluctuation theorems and entropy production with odd-parity variables,” Physical Review Letters, vol. 110, no. 5, p. 050602, 2013.
  • (86) S. Ito, “Stochastic thermodynamic interpretation of information geometry,” Physical Review Letters, vol. 121, no. 3, p. 030605, 2018.
  • (87) K. Takahashi, “Shortcuts to adiabaticity applied to nonequilibrium entropy production: an information geometry viewpoint,” New Journal of Physics, vol. 19, no. 11, p. 115007, 2017.
  • (88) S. Ito, M. Oizumi, and S.-i. Amari, “Unified framework for the entropy production and the stochastic interaction based on information geometry,” Physical Review Research, vol. 2, no. 3, p. 033048, 2020.
  • (89) S. B. Nicholson, A. del Campo, and J. R. Green, “Nonequilibrium uncertainty principle from information geometry,” Physical Review E, vol. 98, no. 3, p. 032106, 2018.
  • (90) S. Ito and A. Dechant, “Stochastic time evolution, information geometry, and the Cramér-Rao bound,” Physical Review X, vol. 10, no. 2, p. 021056, 2020.
  • (91) T. Nakamura, H. Hasegawa, and D. Driebe, “Reconsideration of the generalized second law based on information geometry,” Journal of Physics Communications, vol. 3, no. 1, p. 015015, 2019.
  • (92) D. Petz, Quantum Information Theory and Quantum Statistics, ser. Theoretical and Mathematical Physics.   Berlin: Springer, 2008.
  • (93) E. T. Jaynes, “The Gibbs paradox,” in Maximum Entropy and Bayesian Methods.   Springer, 1992, pp. 1–21.
  • (94) M. Weilenmann, L. Kraemer, P. Faist, and R. Renner, “Axiomatic relation between thermodynamic and information-theoretic entropies,” Physical Review Letters, vol. 117, no. 26, p. 260601, 2016.
  • (95) A. Bérut, A. Arakelyan, A. Petrosyan, S. Ciliberto, R. Dillenschneider, and E. Lutz, “Experimental verification of Landauer’s principle linking information and thermodynamics,” Nature, vol. 483, no. 7388, pp. 187–189, 2012.
  • (96) É. Roldán, I. A. Martinez, J. M. Parrondo, and D. Petrov, “Universal features in the energetics of symmetry breaking,” Nature Physics, vol. 10, no. 6, pp. 457–461, 2014.
  • (97) J. V. Koski, V. F. Maisi, T. Sagawa, and J. P. Pekola, “Experimental observation of the role of mutual information in the nonequilibrium dynamics of a Maxwell demon,” Physical Review Letters, vol. 113, no. 3, p. 030601, 2014.
  • (98) K. Shizume, “Heat generation required by information erasure,” Physical Review E, vol. 52, no. 4, p. 3495, 1995.
  • (99) Z. Gong, Y. Lan, and H. T. Quan, “Stochastic thermodynamics of a particle in a box,” Physical Review Letters, vol. 117, no. 18, p. 180603, 2016.
  • (100) L. Szilard, “Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen,” Zeitschrift für Physik, vol. 53, no. 11-12, pp. 840–856, 1929.
  • (101) K. Proesmans, C. Driesen, B. Cleuren, and C. Van den Broeck, “Efficiency of single-particle engines,” Physical review E, vol. 92, no. 3, p. 032105, 2015.
  • (102) T. Hondou, “Equation of state in a small system: Violation of an assumption of Maxwell’s demon,” EPL (Europhysics Letters), vol. 80, no. 5, p. 50001, 2007.
  • (103) D. Bhat, S. Sabhapandit, A. Kundu, and A. Dhar, “Unusual equilibration of a particle in a potential with a thermal wall,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2017, no. 11, p. 113210, 2017.
  • (104) A. Dhar, A. Kundu, S. N. Majumdar, S. Sabhapandit, and G. Schehr, “Run-and-tumble particle in one-dimensional confining potentials: Steady-state, relaxation, and first-passage properties,” Physical Review E, vol. 99, no. 3, p. 032132, 2019.
  • (105) S. Watanabe, “Information theoretical analysis of multivariate correlation,” IBM Journal of research and development, vol. 4, no. 1, pp. 66–82, 1960.
  • (106) I. Csiszar and J. Körner, Information theory: coding theorems for discrete memoryless systems.   Cambridge University Press, 2011.
  • (107) T. M. Cover and J. A. Thomas, Elements of information theory.   John Wiley & Sons, 2006.
  • (108) C. Van den Broeck and M. Esposito, “Ensemble and trajectory thermodynamics: A brief introduction,” Physica A: Statistical Mechanics and its Applications, vol. 418, pp. 6–16, 2015.
  • (109) U. Seifert, “Entropy production along a stochastic trajectory and an integral fluctuation theorem,” Physical Review Letters, vol. 95, no. 4, p. 040602, 2005.
  • (110) M. Esposito and C. Van den Broeck, “Three detailed fluctuation theorems,” Phys. Rev. Lett., vol. 104, p. 090601, Mar 2010.
  • (111) R. E. Spinney and I. J. Ford, “Entropy production in full phase space for continuous stochastic dynamics,” Physical Review E, vol. 85, no. 5, p. 051113, 2012.
  • (112) Y. Murashita, K. Funo, and M. Ueda, “Nonequilibrium equalities in absolutely irreversible processes,” Physical Review E, vol. 90, no. 4, p. 042110, 2014.
  • (113) C. Jarzynski, “Equalities and inequalities: irreversibility and the second law of thermodynamics at the nanoscale,” Annu. Rev. Condens. Matter Phys., vol. 2, no. 1, pp. 329–351, 2011.
  • (114) J. J. Benedetto and W. Czaja, Integration and Modern Analysis.   Boston: Birkhäuser Boston, 2009.
  • (115) A. C. Barato and U. Seifert, “Coherence of biochemical oscillations is bounded by driving force and network topology,” Physical Review E, vol. 95, no. 6, p. 062409, 2017.
  • (116) C. N. Yang, “The spontaneous magnetization of a two-dimensional Ising model,” Physical Review, vol. 85, no. 5, p. 808, 1952.
  • (117) K.-J. Engel and R. Nagel, One-Parameter Semigroups for Linear Evolution Equations, ser. Graduate Texts in Mathematics.   New York: Springer-Verlag, 2000.
  • (118) A. Gomez-Marin, J. M. Parrondo, and C. Van den Broeck, “Lower bounds on dissipation upon coarse graining,” Physical Review E, vol. 78, no. 1, p. 011107, 2008.

Appendix A Derivations for Sections III and IV

A.1 Proofs of Theorems 1 and 2

We first prove a few helpful lemmas.

Lemma 1.

If LL obeys eLϕ(p)=ϕ(eLp)e^{L}\phi(p)=\phi(e^{L}p) for all p𝒫p\in\mathcal{P}, then LL has a stationary distribution πimgϕ\pi\in\mathrm{img}\;\phi.

Proof.

Let qq be some stationary distribution of LL. Then,

eLϕ(q)=ϕ(eLq)=ϕ(q).\displaystyle e^{L}\phi(q)=\phi(e^{L}q)=\phi(q). (105)

Thus, ϕ(q)imgϕ\phi(q)\in\mathrm{img}\;\phi is stationary under LL. ∎

Lemma 2.

If eτLϕ(p)=ϕ(eτLp)e^{\tau L}\phi(p)=\phi(e^{\tau L}p) for all p𝒫p\in\mathcal{P} and τ0\tau\geq 0, then for any r,s𝒫r,s\in\mathcal{P},

ddtD(r(t)ϕ(s(t)))0,\displaystyle-{\textstyle\frac{d}{dt}}D(r(t)\|\phi(s(t)))\geq 0,

where tr=Lr{\textstyle\partial_{t}}r=Lr and ts=Ls{\textstyle\partial_{t}}s=Ls.

Proof.

Expand the derivative as

ddtD(r(t)ϕ(s(t)))\displaystyle-{\textstyle\frac{d}{dt}}D(r(t)\|\phi(s(t)))
=limτ01τ[D(rϕ(s))D(eτLrϕ(eτLs))]\displaystyle\quad=\lim_{\tau\to 0}\frac{1}{\tau}\left[D(r\|\phi(s))-D(e^{\tau L}r\|\phi(e^{\tau L}s))\right]
=limτ01τ[D(rϕ(s))D(eτLreτLϕ(s))]0.\displaystyle\quad=\lim_{\tau\to 0}\frac{1}{\tau}\left[D(r\|\phi(s))-D(e^{\tau L}r\|e^{\tau L}\phi(s))\right]\geq 0.

where in the last line we used the commutativity relation and the data processing inequality for KL divergence csiszar_information_2011 . ∎

Lemma 3.

Consider a protocol {L(t):t[0,1]}\{L(t):t\in[0,1]\} and an operator ϕ\phi that obeys Eqs. 14 and 16. Then

ϕ(p(t))=ϕ(p)(t),\phi(p(t))=\phi(p)(t),

where p(t)p(t) is the distribution at time tt given initial distribution pp, and ϕ(p)(t)\phi(p)(t) is the distribution at time tt given initial distribution ϕ(p)\phi(p).

Proof.

Using Lemma 2 with r=ϕ(p)(t)r=\phi(p)(t) and s=p(t)s=p(t),

ddtD(ϕ(p)(t)ϕ(p(t)))0.\displaystyle{\textstyle\frac{d}{dt}}D(\phi(p)(t)\|\phi(p(t)))\leq 0. (106)

Note that

D([ϕ(p)](0)ϕ(p(0)))=D(ϕ(p)ϕ(p))=0,D([\phi(p)](0)\|\phi(p(0)))=D(\phi(p)\|\phi(p))=0,

and that D(ϕ(p)(t)ϕ(p(t)))0D(\phi(p)(t)\|\phi(p(t)))\geq 0 for all tt by non-negativity of KL divergence. Combined with Eq. 106, this implies D(ϕ(p)(t)ϕ(p(t)))=0D(\phi(p)(t)\|\phi(p(t)))=0 for all tt, and therefore ϕ(p)(t)=ϕ(p(t))\phi(p)(t)=\phi(p(t)) (cover_elements_2006, , Thm. 8.6.1). ∎

We are now ready to prove Theorems 1 and 2. Note that in the proof of Theorem 1, we make the assumption that there is some stationary distribution πL\pi^{L} of LL such that D(pπL)<D(p\|\pi^{L})<\infty, and similarly in Theorem 2 we make the assumption that D(p(t)πL(t))<D(p(t)\|\pi^{L(t)})<\infty at all t[0,1]t\in[0,1]. These are weak and physically realistic assumptions, which essentially mean that we restrict our attention to distributions with finite nonequilibrium free energy (see Eq. 20).

In addition, in these proofs we will use that the EP rate incurred by distribution pp under the generator LL with stationary distribution π\pi can be written as

Σ˙(p,L)\displaystyle\dot{\Sigma}(p,L) =limτ01τ[D(pπ)D(eτLpπ)].\displaystyle=\lim_{\tau\to 0}\frac{1}{\tau}\left[D(p\|\pi)-D(e^{\tau L}p\|\pi)\right]. (107)

This can be derived from Eq. 11, by noting that the KL divergence can be written as

D(pπ)=S(p)𝔼p[lnπ],\displaystyle D(p\|\pi)=-S(p)-\mathbb{E}_{p}\big{[}\ln\pi\big{]}, (108)

where 𝔼p\mathbb{E}_{p} indicates expectation under the distribution pp, and then using that

xtpx(t)lnpx=limτ01τ[S(eτLp)S(p)]\displaystyle-\sum_{x}{\textstyle\partial_{t}}p_{x}(t)\ln p_{x}=\lim_{\tau\to 0}\frac{1}{\tau}\left[S(e^{\tau L}p)-S(p)\right] (109)
xtpx(t)lnπx=limτ01τ[𝔼eτLp[lnπ]𝔼p[lnπ]],\displaystyle\sum_{x}{\textstyle\partial_{t}}p_{x}(t)\ln\pi_{x}=\lim_{\tau\to 0}\frac{1}{\tau}\left[\mathbb{E}_{e^{\tau L}p}[\ln\pi]-\mathbb{E}_{p}[\ln\pi]\right], (110)

where tpx(t){\textstyle\partial_{t}}p_{x}(t) is defined as in Eq. 10. (As usual, summations should be replaced by integrals for continuous-state systems.)
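
As a concrete check of Eq. 107, the following sketch (in Python, assuming NumPy and SciPy; the three-state rate matrix is our own illustrative choice) compares the finite-τ difference of KL divergences with the equivalent instantaneous form −∑_x (Lp)_x ln(p_x/π_x), which is obtained by combining Eqs. 108, 109 and 110.

import numpy as np
from scipy.linalg import expm

# Illustrative 3-state rate matrix L (columns sum to zero, so dp/dt = L p conserves
# probability) with stationary distribution pi; detailed balance holds by construction.
pi = np.array([0.2, 0.3, 0.5])
L = np.tile(pi[:, None], (1, 3))      # rate(j -> i) = pi_i for i != j
np.fill_diagonal(L, 0.0)
np.fill_diagonal(L, -L.sum(axis=0))

p = np.array([0.7, 0.2, 0.1])         # a nonequilibrium initial distribution

def kl(a, b):
    return float(np.sum(a * np.log(a / b)))

# Instantaneous form of the EP rate: -d/dt D(p(t)||pi) = -sum_x (Lp)_x ln(p_x/pi_x).
ep_rate = -float((L @ p) @ np.log(p / pi))

# The finite-tau version of Eq. 107 converges to the same value as tau -> 0.
for tau in (1e-1, 1e-2, 1e-3, 1e-4):
    p_tau = expm(tau * L) @ p
    print(tau, (kl(p, pi) - kl(p_tau, pi)) / tau, "->", ep_rate)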

Proof of Theorem 1.

Consider a generator LL with a stationary distribution π\pi, and some distribution p𝒫p\in\mathcal{P} such that D(pπ)<D(p\|\pi)<\infty. By Lemma 1, ϕ(π)imgϕ\phi(\pi)\in\mathrm{img}\;\phi is also a stationary distribution of LL. If LL has a unique stationary distribution, then π=ϕ(π)\pi=\phi(\pi) and so πimgϕ\pi\in\mathrm{img}\;\phi; otherwise, as long as D(pϕ(π))<D(p\|\phi(\pi))<\infty (see Note3 ), we can assume that ϕ(π)=π\phi(\pi)=\pi in Eq. 107. Then, assuming that πimgϕ\pi\in\mathrm{img}\;\phi, we rewrite the term in the brackets in Eq. 107 as

D(pϕ(p))+D(ϕ(p)π)\displaystyle D(p\|\phi(p))+D(\phi(p)\|\pi)
D(eτLpϕ(eτLp))D(ϕ(eτLp)π)\displaystyle\qquad\qquad-D(e^{\tau L}p\|\phi(e^{\tau L}p))-D(\phi(e^{\tau L}p)\|\pi)
=D(pϕ(p))D(eτLpϕ(eτLp))\displaystyle=D(p\|\phi(p))-D(e^{\tau L}p\|\phi(e^{\tau L}p))
+D(ϕ(p)π)D(ϕ(eτLp)π)\displaystyle\qquad\qquad+D(\phi(p)\|\pi)-D(\phi(e^{\tau L}p)\|\pi)
=D(pϕ(p))D(eτLpϕ(eτLp))\displaystyle=D(p\|\phi(p))-D(e^{\tau L}p\|\phi(e^{\tau L}p))
+D(ϕ(p)π)D(eτLϕ(p)π),\displaystyle\qquad\qquad+D(\phi(p)\|\pi)-D(e^{\tau L}\phi(p)\|\pi),

where we used the Pythagorean identity of Eq. 14, rearranged, and then used the commutativity relation of Eq. 16. Plugging into Eq. 107 gives

Σ˙(p,L)\displaystyle\dot{\Sigma}(p,L) =limτ01τ[D(pϕ(p))D(eτLpϕ(eτLp))]\displaystyle=\lim_{\tau\to 0}\frac{1}{\tau}\left[D(p\|\phi(p))-D(e^{\tau L}p\|\phi(e^{\tau L}p))\right]
+limτ01τ[D(ϕ(p)π)D(eτLϕ(p)π)]\displaystyle\quad+\lim_{\tau\to 0}\frac{1}{\tau}\left[D(\phi(p)\|\pi)-D(e^{\tau L}\phi(p)\|\pi)\right]
=ddtD(p(t)ϕ(p(t)))+Σ˙(ϕ(p),L).\displaystyle=-{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))+\dot{\Sigma}(\phi(p),L).

The non-negativity of ddtD(p(t)ϕ(p(t)))-{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t))) follows by taking r=s=pr=s=p in Lemma 2. ∎

Proof of Theorem 2.

Using Eq. 12 and Theorem 1, write

Σ(pp)=01Σ˙(p(t),L(t))𝑑t\displaystyle\Sigma(p\!\shortrightarrow\!p^{\prime})=\int_{0}^{1}\dot{\Sigma}(p(t),L(t))\,dt
=01ddtD(p(t)ϕ(p(t)))𝑑t+01Σ˙(ϕ(p(t)),L(t))𝑑t.\displaystyle\;\;=-\int_{0}^{1}{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))\,dt+\int_{0}^{1}\dot{\Sigma}(\phi(p(t)),L(t))\,dt.

Both integrals have a simple expression. First, by the fundamental theorem of calculus,

01ddtD(p(t)ϕ(p(t)))𝑑t=D(pϕ(p))D(pϕ(p)).\displaystyle-\int_{0}^{1}{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))\,dt=D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime})).

This expression is non-negative, since ddtD(p(t)ϕ(p(t)))0-{\textstyle\frac{d}{dt}}D(p(t)\|\phi(p(t)))\geq 0 by Lemma 2. Second, using Lemma 3,

01Σ˙(ϕ(p(t)),L(t))𝑑t\displaystyle\int_{0}^{1}\dot{\Sigma}(\phi(p(t)),L(t))\,dt =01Σ˙(ϕ(p)(t),L(t))𝑑t\displaystyle=\int_{0}^{1}\dot{\Sigma}(\phi(p)(t),L(t))\,dt
=Σ(ϕ(p)ϕ(p)).\displaystyle\qquad=\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})).

A.2 Trajectory-level version of Eq. 19

Stochastic thermodynamics has shown that thermodynamic properties of physical processes (such as heat, work, and EP) can be defined as stochastically fluctuating quantities at the level of individual trajectories. We first briefly review the basic concepts of stochastic thermodynamics (for more details, the reader should consult van2015ensemble ; seifert2012stochastic ; Seifert2005 ; esposito_three_2010 ).

Let 𝒙=(x,,x)\bm{x}=(x,\dots,x^{\prime}) indicate a continuous-time trajectory of system states 𝒙\bm{x} over time interval t[0,1]t\in[0,1], where xx and xx^{\prime} indicate the initial and final system states respectively, and let P(𝒙|x)P(\bm{x}|x) indicate the conditional probability of observing trajectory 𝒙\bm{x} given initial state xx. For a given initial distribution p(x)p(x), the probability of observing trajectory 𝒙\bm{x} is then given by p(𝒙)=p(x)P(𝒙|x)p(\bm{x})=p(x)P(\bm{x}|x), and the corresponding final distribution is given by p(x)=P(x|x)p(x)𝑑xp^{\prime}(x^{\prime})=\int P(x^{\prime}|x)p(x)dx. In addition, let P~(𝒙~|x)\tilde{P}(\tilde{\bm{x}}|x^{\prime}) indicate the conditional probability of observing the time-reversed trajectory 𝒙~=(x,,x)\tilde{\bm{x}}=({x^{\prime}},\dots,{x}) given the final state x{x^{\prime}} under a “time-reversed” driving protocol seifert2012stochastic .

Trajectory-level EP is then defined in terms of the asymmetry between forward and reversed trajectory probabilities,

σp(𝒙)=lnp(x)lnp(x)+lnP(𝒙|x)P~(𝒙~|x),\displaystyle\sigma_{p}(\bm{x})=\ln p(x)-\ln p^{\prime}(x^{\prime})+\ln\frac{P(\bm{x}|x)}{\tilde{P}(\tilde{\bm{x}}|x^{\prime})}, (111)

which is sometimes referred to as a detailed fluctuation theorem. (The above expression should be slightly modified in the presence of odd-parity variables such as momentum, though in a way which does not change our derivations; see spinney2012entropy .) The expectation of trajectory-level EP across all trajectories is equal to the standard expression for integrated EP as used in the main text,

σp(𝒙)=Σ(pp),\displaystyle\langle\sigma_{p}(\bm{x})\rangle=\Sigma(p\!\shortrightarrow\!p^{\prime}), (112)

where \langle\cdot\rangle refers to expectations under the trajectory distribution p(𝒙)p(\bm{x}). Furthermore, by a simple manipulation, the detailed fluctuation theorem in Eq. 111 leads to the following integral fluctuation theorem for EP,

eσp\displaystyle\langle e^{-\sigma_{p}}\rangle =p(x)>0p(x)P(𝒙|x)p(x)P~(𝒙~|x)p(x)P(𝒙|x)D𝒙\displaystyle=\int_{p(x)>0}p(x)P(\bm{x}|x)\frac{p^{\prime}(x^{\prime})\tilde{P}(\tilde{\bm{x}}|x^{\prime})}{p(x)P(\bm{x}|x)}D\bm{x}
=p(x)>0p(x)P~(𝒙~|x)D𝒙=γ,\displaystyle=\int_{p(x)>0}p^{\prime}(x^{\prime})\tilde{P}(\tilde{\bm{x}}|x^{\prime})D\bm{x}=\gamma, (113)

where D𝒙\int\;\cdot\;D\bm{x} is the path integral. In this result, γ(0,1]\gamma\in(0,1] reflects the “absolute irreversibility” of the process under initial distribution pp murashita2014nonequilibrium . When pp has full support, γ=1\gamma=1, giving the standard integral fluctuation theorem, eσp=1\langle e^{-\sigma_{p}}\rangle=1.
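
To illustrate Eqs. 111, 112 and 113 concretely, the following minimal sketch (in Python with NumPy) uses a single-step, discrete-time process as a simplified stand-in for a full continuous-time protocol: for a transition matrix T obeying detailed balance with respect to π, the reverse kernel equals T itself, so σ(x→x′)=ln p(x)−ln p′(x′)+ln[T(x′|x)/T(x|x′)]. This single-step setup is our own illustrative simplification, not part of the derivation above.

import numpy as np

pi = np.array([0.2, 0.3, 0.5])
T = 0.5 * np.tile(pi[:, None], (1, 3))       # T[x', x]: probability of a jump x -> x'
np.fill_diagonal(T, 0.0)
np.fill_diagonal(T, 1.0 - T.sum(axis=0))     # columns sum to one
assert np.allclose(T * pi[None, :], (T * pi[None, :]).T)   # detailed balance w.r.t. pi

p = np.array([0.6, 0.3, 0.1])                # full-support initial distribution
p_final = T @ p

# Trajectory-level EP for each one-step trajectory (x, x'), as in Eq. 111.
sigma = np.log(p)[None, :] - np.log(p_final)[:, None] + np.log(T) - np.log(T.T)
path_prob = T * p[None, :]                   # joint probability of (x, x')

print("average EP   :", np.sum(path_prob * sigma))            # >= 0, cf. Eq. 112
print("<exp(-sigma)>:", np.sum(path_prob * np.exp(-sigma)))   # = 1, cf. Eq. 113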

Now consider the extra trajectory-level EP incurred by some trajectory 𝒙\bm{x} on initial distribution pp, additional to the trajectory-level EP incurred by the same trajectory on initial distribution ϕ(p)\phi(p),

m(𝒙)\displaystyle m(\bm{x}) :=σp(𝒙)σϕ(p)(𝒙)\displaystyle:=\sigma_{p}(\bm{x})-\sigma_{\phi(p)}(\bm{x}) (114)
=lnp(x)ϕ(p)(x)lnp(x)ϕ(p)(x)\displaystyle=\ln\frac{p(x)}{\phi(p)(x)}-\ln\frac{p^{\prime}(x^{\prime})}{\phi(p)^{\prime}(x^{\prime})} (115)
=lnp(x)ϕ(p)(x)lnp(x)ϕ(p)(x)\displaystyle=\ln\frac{p(x)}{\phi(p)(x)}-\ln\frac{p^{\prime}(x^{\prime})}{\phi(p^{\prime})(x^{\prime})} (116)

where in the second line we used that the last term in Eq. 111 cancels (as it does not depend on the initial or final distributions) and in the third line we used that ϕ(p)=ϕ(p)\phi(p)^{\prime}=\phi(p^{\prime}) by Lemma 3. Eq. 114 appears in the main text as Eq. 30. It is easy to verify that m(𝒙)m(\bm{x}) agrees in expectation with the contraction of KL divergence between pp and ϕ(p)\phi(p),

m\displaystyle\langle m\rangle =D(pϕ(p))D(pϕ(p)),\displaystyle=D(p\|\phi(p))-D(p^{\prime}\|\phi(p^{\prime})), (117)

where, as before, \langle\cdot\rangle refers to expectations under the trajectory distribution p(𝒙)p(\bm{x}). Then, given Theorem 2, this implies that the expectation of m(𝒙)m(\bm{x}) is also equal to the extra total EP incurred by initial distribution pp rather than the accessible distribution ϕ(p)\phi(p),

m\displaystyle\langle m\rangle =Σ(pp)Σ(ϕ(p)ϕ(p)).\displaystyle=\Sigma(p\!\shortrightarrow\!p^{\prime})-\Sigma(\phi(p)\!\shortrightarrow\!\phi(p^{\prime})). (118)

In kolchinsky2021state , it is shown that m(𝒙)m(\bm{x}) obeys a fluctuation theorem (see also kwon2019fluctuation ). We re-derive the relevant results here. First, a simple rearrangement of Eq. 115 gives the following detailed fluctuation theorem,

m(𝒙)\displaystyle m(\bm{x}) :=lnp(x)p(x)+lnP(𝒙|x)Q(𝒙~|x),\displaystyle:=\ln\frac{p(x)}{p^{\prime}(x^{\prime})}+\ln\frac{P(\bm{x}|x)}{Q(\tilde{\bm{x}}|x^{\prime})}, (119)

where the conditional distribution Q(𝒙~|x)Q(\tilde{\bm{x}}|x^{\prime}) is given by

Q(𝒙~|x):=P(𝒙|x)ϕ(p)(x)ϕ(p)(x).\displaystyle Q(\tilde{\bm{x}}|x^{\prime}):=\frac{P(\bm{x}|x)\phi(p)(x)}{\phi(p)^{\prime}(x^{\prime})}.

In words, Q(\tilde{\bm{x}}|x^{\prime}) is the Bayesian posterior probability of trajectory \bm{x} given final state x^{\prime}, when the process begins on initial distribution \phi(p). A similar derivation as in Eq. 113 shows that m obeys an integral fluctuation theorem,

\langle e^{-m}\rangle=\int_{p(x)>0}p^{\prime}(x^{\prime})Q(\tilde{\bm{x}}|x^{\prime})\,D\bm{x}=\chi.

Here \chi\in(0,1] indicates the absolute irreversibility of the process on initial distribution p relative to initial distribution \phi(p). \chi is equal to 1 when p and \phi(p) have the same support, which then leads to a standard integral fluctuation theorem, \langle e^{-m}\rangle=1.

Importantly, this integral fluctuation theorem implies that the probability that the trajectory-level EP on initial distribution p is \xi less than the trajectory-level EP on initial distribution \phi(p) is exponentially suppressed,

\mathrm{P}[\sigma_{p}<\sigma_{\phi(p)}-\xi]\stackrel{(a)}{=}\mathrm{P}[m<-\xi]\stackrel{(b)}{\leq}\chi e^{-\xi}\stackrel{(c)}{\leq}e^{-\xi}.

Here, (a) uses the definition of m(\bm{x}), (b) uses a standard derivation in stochastic thermodynamics (see jarzynski_equalities_2011 , or the appendix in kolchinsky2021state ), while (c) uses that \chi\in(0,1].

Appendix B Symmetry constraints

B.1 ϕ𝒢\phi_{\mathcal{G}} obeys the Pythagorean identity, Eq. 14

In the following derivations, all integrals should be understood in the Lebesgue sense. For discrete-state systems, integrals over XX can be replaced by summations.

The state space XX is assumed to be Borel measurable. Similarly, we assume that the action of the group 𝒢\mathcal{G} (i.e., the function 𝒢×XX:(g,x)g(x)\mathcal{G}\times X\to X:(g,x)\mapsto g(x)) is Borel measurable. Note that these assumptions imply that for any probability distribution p𝒫p\in\mathcal{P}, the function (g,x)p(g(x))(g,x)\mapsto p(g(x)) is measurable, since it is the composition of two Borel measurable functions: (g,x)g(x)(g,x)\mapsto g(x) and xp(x)x\mapsto p(x).

We begin with a few intermediate results.

Lemma 4.

For any p𝒫p\in\mathcal{P}, g𝒢g\in\mathcal{G}, and xXx\in X,

ϕ𝒢(p)(x)=ϕ𝒢(p)(g(x)).\phi_{\mathcal{G}}(p)(x)=\phi_{\mathcal{G}}(p)(g(x)).
Proof.

Using the definition of ϕ𝒢\phi_{\mathcal{G}} in Eq. 42, write

ϕ𝒢(p)(g(x))\displaystyle\phi_{\mathcal{G}}(p)(g(x)) =𝒢p(g(g(x)))𝑑μ(g)\displaystyle=\int_{\mathcal{G}}p(g^{\prime}(g(x)))\,d\mu(g^{\prime})
=𝒢p(g(x))𝑑μ(g)=ϕ𝒢(p)(x),\displaystyle={\textstyle\int_{\mathcal{G}}}\,p(g^{\prime}(x))\,d\mu(g^{\prime})=\phi_{\mathcal{G}}(p)(x),

where we performed a change of variables xg1(x)x\mapsto g^{-1}(x) and used the invariance properties of 𝒢\mathcal{G} and the Haar measure μ\mu. ∎
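
For a finite state space and a finite group, both Lemma 4 and the Pythagorean identity established below in Proposition 1 can be checked numerically. The following minimal sketch (in Python with NumPy) does so for the illustrative choice of the two-element reflection group acting on a discrete line, for which the twirling operator of Eq. 42 reduces to an average over the orbit of each state.

import numpy as np

rng = np.random.default_rng(0)
n = 8

def twirl(p):
    # Twirling over G = {identity, reflection x -> n-1-x}: average over the group orbit.
    return 0.5 * (p + p[::-1])

def kl(a, b):
    mask = a > 0
    return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

p = rng.random(n); p /= p.sum()
q = rng.random(n); q /= q.sum()

# Lemma 4: the twirled distribution is invariant under the group action.
assert np.allclose(twirl(p), twirl(p)[::-1])

# Proposition 1: D(p || phi(q)) = D(p || phi(p)) + D(phi(p) || phi(q)).
lhs = kl(p, twirl(q))
rhs = kl(p, twirl(p)) + kl(twirl(p), twirl(q))
print(lhs, rhs)   # agree up to floating-point error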

Lemma 5.

For any p𝒫p\in\mathcal{P}, measurable set ΩX\Omega\subseteq X, and function f:Xf:X\to\mathbb{R},

\int_{\Omega}p(x)f(x)\,dx=\int_{\Omega}\phi_{\mathcal{G}}(p)(x)f(x)\,dx (120)

if the following three conditions hold: (1) g(Ω)=Ωg(\Omega)=\Omega for all g𝒢g\in\mathcal{G}, (2) f(x)=f(g(x))f(x)=f(g(x)) for all xXx\in X and g𝒢g\in\mathcal{G}, (3) either |Ωp(x)f(x)𝑑x|<|\int_{\Omega}p(x)f(x)\,dx|<\infty, or ff is measurable and non-negative.

Proof.

To begin, write the left hand side of Eq. 120 as

Ωp(x)f(x)𝑑x\displaystyle\int_{\Omega}p(x)f(x)\,dx =𝒢[Ωp(x)f(x)𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{\Omega}p(x)f(x)\,dx\right]d\mu(g)
=𝒢[g1(Ω)p(g(x))f(g(x))𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{g^{-1}(\Omega)}p(g(x))f(g(x))\,dx\right]d\mu(g)
=𝒢[Ωp(g(x))f(x)𝑑x]𝑑μ(g).\displaystyle=\int_{\mathcal{G}}\left[\int_{\Omega}p(g(x))f(x)\,dx\right]d\mu(g). (121)

In the second line, we substituted xg(x)x\mapsto g(x) within each inner integral, while using that each gg is a rigid transformation (so the absolute value of its Jacobian is 1). In the last line, we used conditions (1) and (2).

We now show that we can exchange the order of integrals in Eq. 121 using condition (3) and Tonelli’s theorem. First, if ff is measurable and non-negative, then the function xp(g(x))f(x)x\mapsto p(g(x))f(x) is non-negative and measurable (since it is a product of two non-negative measurable functions), so the integrals can be exchanged by (Thm 3.7.7, benedetto_integration_2009, ). Alternatively, assume that |Ωp(x)f(x)𝑑x|<|\int_{\Omega}p(x)f(x)\,dx|<\infty, which means that the function xp(x)f(x)x\mapsto p(x)f(x) is integrable. This implies that

\displaystyle\infty >Ωp(x)|f(x)|𝑑x\displaystyle>\int_{\Omega}p(x)|f(x)|\,dx
=𝒢[Ωp(x)|f(x)|𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{\Omega}p(x)|f(x)|\,dx\right]d\mu(g) (122)
=𝒢[g1(Ω)p(g(x))|f(g(x))|𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{g^{-1}(\Omega)}p(g(x))|f(g(x))|\,dx\right]d\mu(g)
=𝒢[Ωp(g(x))|f(x)|𝑑x]𝑑μ(g)\displaystyle=\int_{\mathcal{G}}\left[\int_{\Omega}p(g(x))|f(x)|\,dx\right]d\mu(g) (123)

where the first line follows from definition of Lebesgue integrability, while the rest follows from the same steps as Eq. 121. Given Eq. 123, the function (g,x)p(g(x))f(x)(g,x)\mapsto p(g(x))f(x) must be integrable, which again allows us to exchange the order of the integrals in Eq. 121 (Thm 3.7.8, benedetto_integration_2009, ).

We then derive our result by rewriting Eq. 121 as

Ωp(x)f(x)𝑑x\displaystyle\int_{\Omega}p(x)f(x)\,dx =Ω[𝒢p(g(x))f(x)𝑑μ(g)]𝑑x\displaystyle=\int_{\Omega}\left[\int_{\mathcal{G}}p(g(x))f(x)\,d\mu(g)\right]dx
=Ωϕ𝒢(p)(x)f(x)𝑑x,\displaystyle=\int_{\Omega}\phi_{\mathcal{G}}(p)(x)f(x)\,dx,

where we used the definition of ϕ𝒢\phi_{\mathcal{G}}. ∎

Finally, we prove that ϕ𝒢\phi_{\mathcal{G}} obeys the Pythagorean identity.

Proposition 1.

For any p,q𝒫p,q\in\mathcal{P} such that D(pϕ𝒢(q))<D(p\|\phi_{\mathcal{G}}(q))<\infty,

D(pϕ𝒢(q))=D(pϕ𝒢(p))+D(ϕ𝒢(p)ϕ𝒢(q)).\displaystyle D(p\|\phi_{\mathcal{G}}(q))=D(p\|\phi_{\mathcal{G}}(p))+D(\phi_{\mathcal{G}}(p)\|\phi_{\mathcal{G}}(q)). (124)
Proof.

For any p𝒫p\in\mathcal{P}, we indicate the support set as suppp={xX:p(x)>0}\mathrm{supp}\;p=\{x\in X:p(x)>0\}. We first prove that

supppsuppϕ𝒢(p)suppϕ𝒢(q).\mathrm{supp}\;p\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(q)}. (125)

By the definition of ϕ𝒢\phi_{\mathcal{G}} in Eq. 42, if ϕ𝒢(p)(x)>0\phi_{\mathcal{G}}(p)(x)>0 for some xXx\in X, then p(g(x))>0p(g(x))>0 for that xx and some g𝒢g\in\mathcal{G}. In addition, the assumption that D(pϕ𝒢(q))<D(p\|\phi_{\mathcal{G}}(q))<\infty implies that supppsuppϕ𝒢(q)\mathrm{supp}\;p\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(q)} (cover_elements_2006, ) (except for a set of measure 0, which we can safely ignore). Combining these facts implies that if ϕ𝒢(p)(x)>0\phi_{\mathcal{G}}(p)(x)>0 for some xx, then ϕ𝒢(q)(g(x))>0\phi_{\mathcal{G}}(q)(g(x))>0 for that xx — and therefore also ϕ𝒢(q)(x)>0\phi_{\mathcal{G}}(q)(x)>0 since ϕ𝒢(q)\phi_{\mathcal{G}}(q) is invariant under 𝒢\mathcal{G}, Lemma 4. This proves that suppϕ𝒢(p)suppϕ𝒢(q)\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(q)}. Finally, by Lemma 4 and Lemma 5,

suppϕ𝒢(p)p(x)𝑑x=suppϕ𝒢(p)ϕ𝒢(p)(x)𝑑x=1,\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\,dx=\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}\phi_{\mathcal{G}}(p)(x)\,dx=1,

which implies that supppsuppϕ𝒢(p)\mathrm{supp}\;p\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(p)} (up to a set of measure 0).

Next, write the KL divergence on the left hand side of Eq. 124 as (Eq. 8.58, cover_elements_2006, )

D(pϕ𝒢(q))=supppp(x)lnp(x)ϕ𝒢(q)(x)dx\displaystyle D(p\|\phi_{\mathcal{G}}(q))=\int_{\mathrm{supp}\;p}p(x)\ln\frac{p(x)}{\phi_{\mathcal{G}}(q)(x)}dx
=D(pϕ𝒢(p))+supppp(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx\displaystyle\quad=D(p\|\phi_{\mathcal{G}}(p))+\int_{\mathrm{supp}\;p}p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx
=D(pϕ𝒢(p))+suppϕ𝒢(p)p(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx,\displaystyle\quad=D(p\|\phi_{\mathcal{G}}(p))+\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx, (126)

where the last line uses Eq. 125 (in particular, that supppsuppϕ𝒢(p)\mathrm{supp}\;p\subseteq\mathrm{supp}\;{\phi_{\mathcal{G}}(p)} and p(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)=0p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}=0 for xsuppϕ𝒢(p)supppx\in\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}\setminus\mathrm{supp}\;p).

The integral in Eq. 126 is bounded from above by $D(p\|\phi_{\mathcal{G}}(q))<\infty$, since $D(p\|\phi_{\mathcal{G}}(p))\geq 0$. We also show that this integral is bounded from below. Note that $\phi_{\mathcal{G}}(p)(x)$ and $\phi_{\mathcal{G}}(q)(x)$ are both non-negative measurable functions, which follows from the fact that $x\mapsto p(g(x))$ and $x\mapsto q(g(x))$ are non-negative measurable functions, the definition of $\phi_{\mathcal{G}}$, and Tonelli’s theorem (Thm 3.7.7, benedetto_integration_2009, ). Thus, the function $x\mapsto\frac{\phi_{\mathcal{G}}(q)(x)}{\phi_{\mathcal{G}}(p)(x)}$ is also non-negative and measurable, letting us bound the integral in the following way:

suppϕ𝒢(p)p(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx\displaystyle\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx
ln[suppϕ𝒢(p)p(x)ϕ𝒢(q)(x)ϕ𝒢(p)(x)𝑑x]\displaystyle\qquad\geq-\ln\left[\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\frac{\phi_{\mathcal{G}}(q)(x)}{\phi_{\mathcal{G}}(p)(x)}dx\right]
=ln[suppϕ𝒢(p)ϕ𝒢(p)(x)ϕ𝒢(q)(x)ϕ𝒢(p)(x)𝑑x]\displaystyle\qquad=-\ln\left[\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}\phi_{\mathcal{G}}(p)(x)\frac{\phi_{\mathcal{G}}(q)(x)}{\phi_{\mathcal{G}}(p)(x)}dx\right]
=ln[suppϕ𝒢(p)ϕ𝒢(q)(x)𝑑x]ln1=0.\displaystyle\qquad=-\ln\left[\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}\phi_{\mathcal{G}}(q)(x)\,dx\right]\geq-\ln 1=0.

where in the second line we used Jensen’s inequality, while in the third line we applied Lemma 5. Finally, we use Lemma 5 to rewrite the integral in Eq. 126 as

suppϕ𝒢(p)p(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx=suppϕ𝒢(p)ϕ𝒢(p)(x)lnϕ𝒢(p)(x)ϕ𝒢(q)(x)dx=D(ϕ𝒢(p)ϕ𝒢(q)).\int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}p(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx=\\ \int_{\mathrm{supp}\;{\phi_{\mathcal{G}}(p)}}\phi_{\mathcal{G}}(p)(x)\ln\frac{\phi_{\mathcal{G}}(p)(x)}{\phi_{\mathcal{G}}(q)(x)}dx=D(\phi_{\mathcal{G}}(p)\|\phi_{\mathcal{G}}(q)).
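As a quick numerical illustration of Proposition 1 (a minimal sketch, not part of the proof): for a finite state space, with the twirling taken over the group generated by a cyclic shift, Eq. 124 can be checked directly on randomly generated full-support distributions. The helper names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, shift = 12, 3                         # cyclic group generated by x -> x + 3 (mod 12)

def twirl(p):
    """phi_G(p): average of p over the four shifts generated by the group."""
    return np.mean([np.roll(p, -k * shift) for k in range(n // shift)], axis=0)

def kl(a, b):
    """KL divergence D(a||b) in nats, with the convention 0 ln 0 = 0."""
    mask = a > 0
    return np.sum(a[mask] * np.log(a[mask] / b[mask]))

p, q = rng.random(n), rng.random(n)
p, q = p / p.sum(), q / q.sum()

lhs = kl(p, twirl(q))
rhs = kl(p, twirl(p)) + kl(twirl(p), twirl(q))
print(lhs, rhs)                          # the two values agree, as in Eq. 124
```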

B.2 ϕ𝒢\phi_{\mathcal{G}} obeys the commutativity relation, Eq. 16

It is easy to verify that Φg\Phi_{g} is a linear operator. It then follows that if Φg\Phi_{g} commutes with the linear operator LL, as in Eq. 38, then it also commutes with the exponential eτL=k1k!τkLke^{\tau L}=\sum_{k}\frac{1}{k!}\tau^{k}L^{k}. We then have

eτLϕ𝒢(p)\displaystyle e^{\tau L}\phi_{\mathcal{G}}(p) =eτLΦgp𝑑μ(g)\displaystyle=e^{\tau L}\int\Phi_{g}p\,d\mu(g)
=eτLΦgp𝑑μ(g)\displaystyle=\int e^{\tau L}\Phi_{g}p\,d\mu(g)
=ΦgeτLp𝑑μ(g)\displaystyle=\int\Phi_{g}e^{\tau L}p\,d\mu(g)
=ϕ𝒢(eτLp)\displaystyle=\phi_{\mathcal{G}}(e^{\tau L}p)

where in the second line we exchanged the bounded operator eτLe^{\tau L} and the (Bochner) integral, and in the third line we used that Φg\Phi_{g} and eτLe^{\tau L} commute.

B.3 Derivation of Eq. 38 from Eq. 39 and Eq. 41

Consider some f:Xf:X\to\mathbb{R} and a continuous-state master equation LL such that

[Lf](x)=[Lxxf(x)Lxxf(x)]𝑑x.\displaystyle[Lf](x)=\int\left[L_{xx^{\prime}}f(x^{\prime})-L_{x^{\prime}x}f(x)\right]\,dx^{\prime}. (127)

(The derivation for discrete-state master equations, as in Eq. 10, is the same, but with integrals replaced with summations). Then,

[ΦgLf](x)=[Lf](g(x))\displaystyle[\Phi_{g}Lf](x)=[Lf]({g}(x))
=[Lg(x)xf(x)Lxg(x)f(g(x))]𝑑x\displaystyle\quad=\int[L_{{g}(x)x^{\prime}}f(x^{\prime})-L_{x^{\prime}{g}(x)}f({g}(x))]dx^{\prime} (128)
=[Lg(x)g(x)f(g(x))Lg(x)g(x)f(g(x))]𝑑x\displaystyle\quad=\int[L_{{g}(x){g}(x^{\prime})}f({g}(x^{\prime}))-L_{{g}(x^{\prime}){g}(x)}f({g}(x))]dx^{\prime} (129)
=[Lxxf(g(x))Lxxf(g(x))]𝑑x\displaystyle\quad=\int[L_{xx^{\prime}}f({g}(x^{\prime}))-L_{x^{\prime}x}f({g}(x))]dx^{\prime} (130)
=\int[L_{xx^{\prime}}[\Phi_{g}f](x^{\prime})-L_{x^{\prime}x}[\Phi_{g}f](x)]dx^{\prime} (131)
=[LΦgf](x),\displaystyle\quad=[L\Phi_{g}f](x), (132)

which implies ΦgL=LΦg\Phi_{g}L=L\Phi_{g}, Eq. 38. Here we used the definition of Φg\Phi_{g} in the first line and Eq. 127 in Eq. 128. In Eq. 129, we used the variable substitution xg(x)x^{\prime}\mapsto{g}(x^{\prime}), along with the fact that g{g} is volume preserving. In Eq. 130, we used Eq. 39.
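In the discrete-state setting, this commutation relation is also easy to check numerically. The following minimal sketch (illustrative only) uses a cyclic shift symmetry on six states; any circulant choice of rates satisfies Eq. 39 for this group and therefore commutes with $\Phi_{g}$:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n, s = 6, 2                                   # six states, shift symmetry g(x) = x + 2 (mod 6)

# Circulant off-diagonal rates: K[x, x'] depends only on (x - x') mod n,
# so K[g(x), g(x')] = K[x, x'] (Eq. 39 for the cyclic shift group).
c = rng.random(n)
K = np.array([[c[(x - xp) % n] for xp in range(n)] for x in range(n)])
np.fill_diagonal(K, 0.0)
K -= np.diag(K.sum(axis=0))                   # generator of the master equation d/dt p = K p

# Phi_g as a permutation matrix: (Phi_g f)(x) = f(g(x))
P = np.zeros((n, n))
for x in range(n):
    P[x, (x + s) % n] = 1.0

print(np.allclose(P @ K, K @ P))              # Phi_g L = L Phi_g, Eq. 38
print(np.allclose(P @ expm(K), expm(K) @ P))  # hence Phi_g also commutes with e^{tau L}
```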

Next, we show that Eq. 41 is sufficient for Eq. 38 to hold, assuming that all $g\in\mathcal{G}$ are rigid transformations and the $L\in\Lambda$ refer to Fokker-Planck equations of the form Eq. 40. First, given some (sufficiently smooth) function $f:X\to\mathbb{R}$, write Eq. 40 as

tf=Lf=((E)f)+β1Δf.{\textstyle\partial_{t}}f=Lf=\nabla\cdot((\nabla E)f)+\beta^{-1}\Delta f. (133)

For any g𝒢g\in\mathcal{G}, write the diffusion term in Eq. 133 as

Δf=Δ(Φgfg1)=Δ(Φgf)g1,\displaystyle\Delta f=\Delta(\Phi_{g}f\circ{g^{-1}})=\Delta(\Phi_{g}f)\circ{g^{-1}}, (134)

where we used the identity f=Φg1Φgf=Φgfg1f=\Phi_{{g^{-1}}}\Phi_{g}f=\Phi_{g}f\circ{g^{-1}} and that the Laplace operator commutes with rigid transformations. Now consider the drift term in Eq. 133. Using the product rule,

((E)f)=(f)T(E)+fΔE.\nabla\cdot((\nabla E)f)=(\nabla f)^{T}(\nabla E)+f\Delta E. (135)

We can rewrite the second term above as

fΔE\displaystyle f\Delta E =(Φgfg1)ΔE\displaystyle=(\Phi_{g}f\circ{g^{-1}})\Delta E
=(Φgfg1)Δ(Eg1)\displaystyle=(\Phi_{g}f\circ{g^{-1}})\Delta(E\circ{g^{-1}})
=(Φgfg1)((ΔE)g1)\displaystyle=(\Phi_{g}f\circ{g^{-1}})((\Delta E)\circ{g^{-1}})
=((Φgf)(ΔE))g1,\displaystyle=((\Phi_{g}f)(\Delta E))\circ{g^{-1}}, (136)

where we used f=Φgfg1f=\Phi_{g}f\circ{g^{-1}}, the invariance of EE under 𝒢\mathcal{G} (Eq. 41), and in the third line that the Laplace operator commutes with rigid transformations. Now consider the first term on the right hand side of Eq. 135:

(\nabla f)^{T}(\nabla E)=(\nabla(\Phi_{g}f\circ{g^{-1}}))^{T}\nabla(E\circ{g^{-1}})
=(JT((Φgf)g1))T(JT((E)g1))\displaystyle=(J^{T}(\nabla(\Phi_{g}f)\circ{g^{-1}}))^{T}(J^{T}((\nabla E)\circ{g^{-1}}))
=((Φgf)g1)TJJT((E)g1)\displaystyle=(\nabla(\Phi_{g}f)\circ{g^{-1}})^{T}JJ^{T}((\nabla E)\circ{g^{-1}})
=((Φgf)g1)T((E)g1)\displaystyle=(\nabla(\Phi_{g}f)\circ{g^{-1}})^{T}((\nabla E)\circ{g^{-1}})
=((Φgf)T(E))g1,\displaystyle=(\nabla(\Phi_{g}f)^{T}(\nabla E))\circ{g^{-1}}, (137)

where JJ indicates the Jacobian of g1{g^{-1}}. In the first line, we again used the identity f=Φgfg1f=\Phi_{g}f\circ{g^{-1}} and the invariance of EE under 𝒢\mathcal{G}, in the second line we used the chain rule, and in the fourth line we used that JJT=IJJ^{T}=I for rigid transformations. Plugging Eqs. 136 and 137 back into Eq. 135 and rearranging gives

((E)f)=((E)(Φgf))g1.\displaystyle\nabla\cdot((\nabla E)f)=\nabla\cdot((\nabla E)(\Phi_{g}f))\circ{g^{-1}}. (138)

Combined with Eqs. 134 and 133, this in turn implies that $Lf=(L\Phi_{g}f)\circ{g^{-1}}$, or in other words that

ΦgLf=LΦgf.\Phi_{g}Lf=L\Phi_{g}f.

B.4 Derivation of Eq. 43

First, write the inaccessible information term in Eq. 35 as

D(pX|Mϕ𝒢(pX|M))=mp(m)D(pX|mϕ𝒢(pX|m))\displaystyle D(p_{X|M}\|\phi_{\mathcal{G}}(p_{X|M}))=\sum_{m}p(m)D(p_{X|m}\|\phi_{\mathcal{G}}(p_{X|m}))
=\sum_{m,x}p(m,x)\ln\frac{p(x|m)}{\int p(g(x)|m)\,d\mu(g)}
=\sum_{m,x}p(m,x)\ln\frac{p(x)q(m|x)/p(m)}{\int p(g(x))q(m|g(x))/p_{g}(m)\,d\mu(g)}, (139)

where we’ve defined p(m)=xp(x)q(m|x)p(m)=\sum_{x}p(x)q(m|x) and pg(m)=xp(g(x))q(m|x)p_{g}(m)=\sum_{x}p(g(x))q(m|x), and used the definition of ϕ𝒢\phi_{\mathcal{G}} in Eq. 42. (Here we assume for simplicity that both XX and MM are discrete valued; otherwise the summations in Eq. 139 should be replaced with integrals.)

Recall that we assumed that pp is invariant under 𝒢\mathcal{G}, so ϕ𝒢(p)=p\phi_{\mathcal{G}}(p)=p. By Lemma 4, p(x)=p(g(x))p(x)=p(g(x)) for all xx and g𝒢g\in\mathcal{G}, which in turn implies that p(m)=pg(m)p(m)=p_{g}(m). Plugging into Eq. 139 then gives

D(p_{X|M}\|\phi_{\mathcal{G}}(p_{X|M}))=\sum_{m,x}p(m,x)\ln\frac{q(m|x)}{\int q(m|g(x))\,d\mu(g)},

which appears in the main text as Eq. 43.

B.5 Example: Szilard box, derivation of Eq. 50

We derive Eq. 50 using a simple geometric argument.

Consider the twirling of pθp_{\theta}, as shown in Fig. 5(b). From the definition of ϕ𝒢\phi_{\mathcal{G}} and Eq. 49, it is easy to see that

  1. 1.

    The dark gray areas in Fig. 5(b) (where both pθ(x1,x2)=1/2p_{\theta}(x_{1},x_{2})=1/2 and pθ(x1,x2)=1/2p_{\theta}(x_{1},-x_{2})=1/2) have probability density ϕ𝒢(pθ)(x1,x2)=1/2\phi_{\mathcal{G}}(p_{\theta})(x_{1},x_{2})=1/2.

  2. 2.

The light gray areas in Fig. 5(b) (where either $p_{\theta}(x_{1},x_{2})=1/2$ or $p_{\theta}(x_{1},-x_{2})=1/2$, but not both) have probability density $\phi_{\mathcal{G}}(p_{\theta})(x_{1},x_{2})=1/4=u(x_{1},x_{2})$.

  3. 3.

    The white areas in Fig. 5(b) (where pθ(x1,x2)=0p_{\theta}(x_{1},x_{2})=0 and pθ(x1,x2)=0p_{\theta}(x_{1},-x_{2})=0) have probability density ϕ𝒢(pθ)(x1,x2)=0\phi_{\mathcal{G}}(p_{\theta})(x_{1},x_{2})=0.

Given this,

D(ϕ𝒢(pθ)u)=ln2Pθ,\displaystyle D(\phi_{\mathcal{G}}(p_{\theta})\|u)=\ln 2\cdot P_{\theta}, (140)

where $P_{\theta}$ is the probability assigned by $p_{\theta}$ to the dark gray areas (i.e., those $(x_{1},x_{2})$ where $p_{\theta}(x_{1},x_{2})=p_{\theta}(x_{1},-x_{2})=1/2$).

Figure 15: The twirling $\phi_{\mathcal{G}}(p_{\theta})$ for two cases. Left: $|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4})$. Right: $|\theta|\in[-\pi,\pi]\setminus(\frac{\pi}{4},\frac{3\pi}{4})$.

To calculate the value of $P_{\theta}$, it suffices to consider two separate cases:

  1. 1.

    |θ|[π,π](π4,3π4)|\theta|\in[-\pi,\pi]\setminus(\frac{\pi}{4},\frac{3\pi}{4})

  2. 2.

    |θ|(π4,3π4)|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4})

which are shown visually in Fig. 15. Using this figure, and a bit of trigonometry, it can be shown that Pθ=112|tanθ|P_{\theta}=1-\frac{1}{2}|\tan\theta| in the first case, and Pθ=12|tan(θπ/2)|P_{\theta}=\frac{1}{2}|\tan(\theta-\pi/2)| in the second case. Combining these results with Eq. 140 gives Eq. 50.

B.6 Example: Symmetry constraints on a discrete-state master equation

Here we demonstrate our results on symmetry constraints using a simple finite-state system. The system contains $n$ states, $x\in\{0,\dots,n-1\}$. We consider a group generated by circular shifts, representing $m$-fold circular symmetry:

g(x)=x+n/mmodn.\displaystyle{g}(x)=x+n/m\quad\mathrm{mod}\quad n. (141)

Assume that the driving protocol obeys the following symmetry group at all t[0,1]t\in[0,1]:

Lxx(t)=Lg(x)g(x)(t),L_{x^{\prime}x}(t)=L_{{g}(x^{\prime}){g}(x)}(t), (142)

An example of such a master equation would be a unicyclic network, where the nn states are arranged in a ring, and transitions between nearest-neighbor states obey Eq. 142. Such unicyclic networks are often used to model biochemical oscillators and similar biological systems (barato2017coherence, ). This kind of system is illustrated in Fig. 16, with n=12n=12 and m=4m=4.

Imagine that this system starts from the initial distribution $p(x)\propto x$, so the probability grows linearly from 0 (for $x=0$) to its maximum (for $x=n-1$). For the 12-state system with 4-fold symmetry, this initial distribution is given by

p(x)=xx=011x=x66,p(x)=\frac{x}{\sum_{x^{\prime}=0}^{11}x^{\prime}}=\frac{x}{66},

and is shown on the left hand side of Fig. 16. How much work can be extracted by bringing this initial distribution to some other distribution pp^{\prime}, while using rate matrices of the form Eq. 142? This is bounded by the drop of the accessible free energy, via Eq. 25:

W(pp)FE(ϕ𝒢(p))FE(ϕ𝒢(p)).\displaystyle W(p\!\shortrightarrow\!p^{\prime})\leq F_{E}(\phi_{\mathcal{G}}(p))-F_{E^{\prime}}(\phi_{\mathcal{G}}(p^{\prime})). (143)

Using the example system with 12 states and 4-fold symmetry, the twirled distribution ϕ𝒢(p)\phi_{\mathcal{G}}(p) is given by

ϕ𝒢(p)(x)=x+(x+3 mod 12)+(x+6 mod 12)+(x+9 mod 12)4×66.\phi_{\mathcal{G}}(p)(x)=\\ \frac{x+(x+3\text{ mod }12)+(x+6\text{ mod }12)+(x+9\text{ mod }12)}{4\times 66}.

For example, for the distribution p(x)=x/66p(x)=x/66,

ϕ𝒢(p)(0)\displaystyle\phi_{\mathcal{G}}(p)(0) =(0+3+6+9)/(4×66)\displaystyle=(0+3+6+9)/(4\times 66) 0.068\displaystyle\approx 0.068
ϕ𝒢(p)(1)\displaystyle\phi_{\mathcal{G}}(p)(1) =(1+4+7+10)/(4×66)\displaystyle=(1+4+7+10)/(4\times 66) 0.083\displaystyle\approx 0.083
ϕ𝒢(p)(2)\displaystyle\phi_{\mathcal{G}}(p)(2) =(2+5+8+11)/(4×66)\displaystyle=(2+5+8+11)/(4\times 66) 0.098\displaystyle\approx 0.098
ϕ𝒢(p)(3)\displaystyle\phi_{\mathcal{G}}(p)(3) =(3+6+9+0)/(4×66)\displaystyle=(3+6+9+0)/(4\times 66) 0.068\displaystyle\approx 0.068
\displaystyle\dots \displaystyle\dots

This twirled distribution is shown on the right panel of Fig. 16.

Figure 16: A unicyclic master equation over 12 states with 4-fold symmetry, as in Eq. 142. Left: an initial distribution p(x)xp({x})\propto x which does not respect the 4-fold symmetry. Right: the twirling ϕ𝒢(p)\phi_{\mathcal{G}}(p), which is invariant to the symmetry. (Colors indicate relative probability assigned to each of the 12 states.) The extractable work depends on the accessible free energy in pp, which is given by FE(ϕ𝒢(p))F_{E}(\phi_{\mathcal{G}}(p)).
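A minimal numerical sketch of this example follows (illustrative only). It computes the twirled distribution and the quantity $D(p\|\phi_{\mathcal{G}}(p))$; when the energy function is invariant under the 4-fold shifts (as required by Eq. 41), one can check that this KL divergence equals $\beta[F_{E}(p)-F_{E}(\phi_{\mathcal{G}}(p))]$, the part of the free energy that is inaccessible under the symmetry constraint.

```python
import numpy as np

n, m = 12, 4                     # 12 states with 4-fold circular symmetry (Fig. 16)
shift = n // m                   # group generator g(x) = x + 3 (mod 12)

p = np.arange(n) / np.arange(n).sum()        # initial distribution p(x) = x / 66

# Twirl: average p over the m circular shifts generated by g
phi_p = np.mean([np.roll(p, -k * shift) for k in range(m)], axis=0)
print(np.round(phi_p, 3))        # 0.068, 0.083, 0.098, 0.068, 0.083, 0.098, ...

# Inaccessible free energy in units of 1/beta: D(p || phi_G(p))
mask = p > 0
print(np.sum(p[mask] * np.log(p[mask] / phi_p[mask])))
```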

B.7 Example: 2D Ising model, derivation of Eq. 56

We begin by recalling the expression for accessible information in our feedback-control protocol over the 2D Ising model, which appears as Eq. 55 in the main text:

Iaccϕ𝒢(X;M)=ln2lnq(m|x)N2a,bq(m|ga,b(x)).\displaystyle I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)\!=\ln 2-\Big{\langle}\!\ln\frac{q(m|x)}{N^{-2}\sum_{a,b}q(m|g_{a,b}(x))}\Big{\rangle}. (144)

Using q(m|x)=δm(x1)q(m|x)=\delta_{m}(x_{1}), the expectation term in Eq. 144 can be rewritten as

xp(x)m{1,1}δm(x1)ln[N2a,bδm(ga,b(x)1)].\displaystyle-\sum_{x}p(x)\sum_{\mathclap{m\in\{-1,1\}}}\delta_{m}(x_{1})\ln\big{[}N^{-2}{\sum_{a,b}\delta_{m}(g_{a,b}(x)_{1})}\big{]}. (145)

Let z(x)=(1+ixi/N2)/2z(x)=(1+\sum_{i}x_{i}/N^{2})/2 indicate the magnetization of lattice state xx, normalized to lie between 0 and 1. Note that for any lattice state xx, the frequency that spin 1 is in state 1 averaged across all translations is equal to the magnetization of xx,

N2a,bδ1(ga,b(x)1)=z(x).N^{-2}\sum_{a,b}\delta_{1}(g_{a,b}(x)_{1})=z(x).

In addition, by symmetry, the probability that spin 1 is in state 1 averaged across all states that have magnetization zz is equal to zz,

xp(x|z)δ1(x1)=z.\sum_{x}p(x|z)\delta_{1}(x_{1})=z.

Using these results and δ1(x)=1δ1(x)\delta_{-1}(x)=1-\delta_{1}(x), we can rewrite the expression in Eq. 145 as

xp(x)[δ1(x1)lnz(x)+(1δ1(x1))ln(1z(x))]\displaystyle-\sum_{x}p(x)[\delta_{1}(x_{1})\ln z(x)+(1-\delta_{1}(x_{1}))\ln(1-z(x))]
=zp(z)[zlnz(1z)ln(1z)]h2(z),\displaystyle=\sum_{z}p(z)[-z\ln z-(1-z)\ln(1-z)]\equiv\langle h_{2}(z)\rangle, (146)

where p(z)=xp(x)δz(z(x))p(z^{\prime})=\sum_{x}p(x)\delta_{z^{\prime}}(z(x)) is the probability that the system has magnetization zz^{\prime} and h2h_{2} is the binary entropy function.

We now consider the NN\to\infty limit, and use Onsager’s expression for the spontaneous magnetization for the 2D Ising model yang1952spontaneous . When β\beta is below the critical inverse temperature, βc=ln(1+2)/20.44\beta_{c}=\ln(1+\sqrt{2})/2\approx 0.44, the magnetization distribution p(z)p(z) concentrates at z=1/2z=1/2, so Eq. 146 approaches h2(1/2)=ln2h_{2}(1/2)=\ln 2. When β>βc\beta>\beta_{c}, the magnetization distribution concentrates on a uniform mixture of two delta functions at z=f(β)z=f(\beta) and z=1f(β)z=1-f(\beta), where f(β)=(1+1(sinh2β)48)/2f(\beta)=(1+\sqrt[8]{1-(\sinh 2\beta)^{-4}})/2. In this case, Eq. 146 approaches (h2(f(β))+h2(1f(β)))/2=h2(f(β))(h_{2}(f(\beta))+h_{2}(1-f(\beta)))/2=h_{2}(f(\beta)). Combining these results with Eq. 144 implies that Iaccϕ𝒢(X;M)=0I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)=0 for ββc\beta\leq\beta_{c} and Iaccϕ𝒢(X;M)=ln2h2(f(β))I_{\mathrm{acc}}^{\phi_{\mathcal{G}}}(X;M)=\ln 2-h_{2}(f(\beta)) for β>βc\beta>\beta_{c}, which appears as Eq. 56 in the main text.
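For reference, this final expression is straightforward to evaluate numerically; a minimal sketch (in nats, helper names illustrative):

```python
import numpy as np

def h2(z):
    """Binary entropy in nats, with 0 ln 0 = 0."""
    z = np.clip(z, 1e-12, 1 - 1e-12)
    return -z * np.log(z) - (1 - z) * np.log(1 - z)

beta_c = np.log(1 + np.sqrt(2)) / 2                        # critical inverse temperature, ~0.4407

def accessible_info(beta):
    """Eq. 56 in the N -> infinity limit: 0 below beta_c, ln 2 - h2(f(beta)) above."""
    if beta <= beta_c:
        return 0.0
    f = (1 + (1 - np.sinh(2 * beta) ** -4) ** 0.125) / 2   # Onsager spontaneous magnetization
    return np.log(2) - h2(f)

for beta in [0.3, 0.44, 0.5, 0.7, 1.0]:
    print(beta, accessible_info(beta))
```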

Appendix C Modularity constraints

C.1 ϕ𝒞\phi_{{\mathcal{C}}} obeys the Pythagorean identity, Eq. 14

We show that ϕ𝒞\phi_{\mathcal{C}} obeys the Pythagorean identity:

D(pϕ𝒞(q))=D(pϕ𝒞(p))+D(ϕ𝒞(p)ϕ𝒞(q)).\displaystyle D(p\|\phi_{\mathcal{C}}(q))=D(p\|\phi_{\mathcal{C}}(p))+D(\phi_{\mathcal{C}}(p)\|\phi_{\mathcal{C}}(q)). (147)

for all $p,q\in\mathcal{P}$ such that $D(p\|\phi_{\mathcal{C}}(q))<\infty$. For any $p,r\in\mathcal{P}$,

𝔼p[lnϕ𝒞(r)]=𝔼p[lnrO]+A𝒞𝔼p[lnrAO|AO]\displaystyle\mathbb{E}_{p}[\ln\phi_{{\mathcal{C}}}(r)]=\mathbb{E}_{p}[\ln r_{O}]+\sum_{{A\in\mathcal{C}}}\mathbb{E}_{p}[\ln r_{A\setminus O|A\cap O}]
=𝔼ϕ𝒞(p)[lnrO]+A𝒞𝔼ϕ𝒞(p)[lnrAO|AO]\displaystyle\quad=\mathbb{E}_{\phi_{{\mathcal{C}}}(p)}[\ln r_{O}]+\sum_{{A\in\mathcal{C}}}\mathbb{E}_{\phi_{{\mathcal{C}}}(p)}[\ln r_{A\setminus O|A\cap O}] (148)
=𝔼ϕ𝒞(p)[lnϕ𝒞(r)],\displaystyle\quad=\mathbb{E}_{\phi_{{\mathcal{C}}}(p)}[\ln\phi_{{\mathcal{C}}}(r)], (149)

where, for a distribution $r$, $r_{O}$ and $r_{A\setminus O|A\cap O}$ indicate its marginal and conditional distributions, respectively. In Eq. 148, we used that $p$ and $\phi_{{\mathcal{C}}}(p)$ have the same marginals over all subsystems $A\in\mathcal{C}$ as well as the overlap $O$ (this can be verified from the definition of $\phi_{{\mathcal{C}}}$, Eq. 64). Then,

D(pϕ𝒞(q))\displaystyle D(p\|\phi_{{\mathcal{C}}}(q)) =D(pϕ𝒞(p))+𝔼p[lnϕ𝒞(p)lnϕ𝒞(q)]\displaystyle=D(p\|\phi_{{\mathcal{C}}}(p))+\mathbb{E}_{p}[\ln\phi_{{\mathcal{C}}}(p)-\ln\phi_{{\mathcal{C}}}(q)]
=D(pϕ𝒞(p))+𝔼ϕ𝒞(p)[lnϕ𝒞(p)lnϕ𝒞(q)]\displaystyle=D(p\|\phi_{{\mathcal{C}}}(p))+\mathbb{E}_{\phi_{{\mathcal{C}}}(p)}[\ln\phi_{{\mathcal{C}}}(p)-\ln\phi_{{\mathcal{C}}}(q)]
=D(pϕ𝒞(p))+D(ϕ𝒞(p)ϕ𝒞(q)),\displaystyle=D(p\|\phi_{{\mathcal{C}}}(p))+D(\phi_{{\mathcal{C}}}(p)\|\phi_{{\mathcal{C}}}(q)),

where the second line follows by applying Eq. 149 twice, first taking r=pr=p and then taking r=qr=q.

C.2 ϕ𝒞\phi_{{\mathcal{C}}} commutes with eτLe^{\tau L}

We show that if for some generator LL, Eqs. 59 and 60 hold for all A𝒞A\in\mathcal{C}, then ϕ𝒞\phi_{\mathcal{C}} and eτLe^{\tau L} obey the commutativity relation of Eq. 16. We assume that all L(A)L^{(A)} in Eq. 60 are bounded linear operators.

Before deriving our result, we introduce some helpful notation:

  1. 1.

    δx(x)\delta_{x}(x^{\prime}) indicates the delta function distribution over XX centered at xx (this is the Dirac delta for continuous XX, and the Kronecker delta for discrete XX). For any subsystem SVS\subseteq V, δxS(xS)\delta_{x_{S}}(x^{\prime}_{S}) indicates the delta function distribution over XSX_{S} centered at xSx_{S}.

  2. 2.

    Tτ(A)(x|x)=[eτL(A)δx](x)T^{(A)}_{\tau}(x^{\prime}|x)=[e^{\tau L^{(A)}}\delta_{x}](x^{\prime}) indicates the conditional distribution over XX, given that the system starts on state xx and then evolves under L(A)L^{(A)} for time τ\tau.

  3. 3.

    For any A𝒞A\in\mathcal{C},

    𝑨:=A(B𝒞{A}B)=AO(𝒞){\textstyle{\bm{A}}:=A\setminus\big{(}\bigcup_{B\in\mathcal{C}\setminus\{A\}}B\big{)}=A\setminus O(\mathcal{C})}

    indicates the set of degrees of freedom that belong exclusively to A𝒞A\in\mathcal{C} (and no other subsystems), and

    𝑨c:=V𝑨=B𝒞{A}B.{\textstyle{{\bm{A}}^{c}}:=V\setminus{\bm{A}}=\bigcup_{B\in\mathcal{C}\setminus\{A\}}B}.

indicates the complement of ${\bm{A}}$, which is the set of degrees of freedom that fall into at least one of the other subsystems besides $A$.

To derive the commutativity relation, we proceed in three steps, which are described in detail in the subsections below. In the first step, we show that, for all τ0\tau\geq 0 and A𝒞A\in\mathcal{C}, the conditional distribution Tτ(A)(x|x)T^{(A)}_{\tau}(x^{\prime}|x) can be written in the following product form:

Tτ(A)(x|x)=Tτ(A)(x𝑨|xA)δx𝑨c(x𝑨c).\displaystyle T^{(A)}_{\tau}(x^{\prime}|x)=T^{(A)}_{\tau}({x_{\bm{A}}^{\prime}|x_{A}})\delta_{x_{{\bm{A}}^{c}}}(x^{\prime}_{{\bm{A}}^{c}}). (150)

In the second step, we show that Eq. 150 implies the following commutativity relation for any p𝒫p\in\mathcal{P} and each A𝒞A\in\mathcal{C}:

eτL(A)ϕ𝒞(p)=ϕ𝒞(eτL(A)p).\displaystyle e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)=\phi_{{\mathcal{C}}}(e^{\tau L^{(A)}}p). (151)

In the third step, we show that the generators corresponding to all subsystems commute:

L(A)L(B)=L(B)L(A)A,B𝒞.\displaystyle L^{(A)}L^{(B)}=L^{(B)}L^{(A)}\qquad\forall A,B\in\mathcal{C}. (152)

We then combine these three results to show that ϕ𝒞\phi_{{\mathcal{C}}} and eτLe^{\tau L} commute. Write

eτLϕ𝒞(p)=eA𝒞τL(A)ϕ𝒞(p)=A𝒞eτL(A)ϕ𝒞(p).\displaystyle e^{\tau L}\phi_{{\mathcal{C}}}(p)=e^{\sum_{A\in\mathcal{C}}\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)=\prod_{A\in\mathcal{C}}e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p).

where we used Eqs. 58 and 152 to expand the operator exponential. Then, using Eq. 151, write

A𝒞eτL(A)ϕ𝒞(p)=ϕ𝒞(A𝒞eτL(A)p)=ϕ𝒞(eτLp).\prod_{A\in\mathcal{C}}e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)=\phi_{{\mathcal{C}}}\Bigg{(}\prod_{A\in\mathcal{C}}e^{\tau L^{(A)}}p\Bigg{)}=\phi_{{\mathcal{C}}}(e^{\tau L}p).

Combining these two results implies that eτLϕ𝒞(p)=ϕ𝒞(eτLp)e^{\tau L}\phi_{{\mathcal{C}}}(p)=\phi_{{\mathcal{C}}}(e^{\tau L}p) for all p𝒫p\in\mathcal{P} and τ0\tau\geq 0, as in Eq. 16.

C.2.1 Derivation of Eq. 150

To derive Eq. 150, consider the conditional distribution over 𝑨{\bm{A}} given initial state xx, as induced by L(A)L^{(A)}:

Tτ(A)(x𝑨|x)\displaystyle T^{(A)}_{\tau}(x^{\prime}_{\bm{A}}|x) =[eτL(A)δx]𝑨(x𝑨)\displaystyle=[e^{\tau L^{(A)}}\delta_{x}]_{\bm{A}}(x^{\prime}_{\bm{A}})
=[δx]𝑨(x𝑨)+k1τkk![L(A)kδx]𝑨(x𝑨)\displaystyle=[\delta_{x}]_{\bm{A}}(x^{\prime}_{\bm{A}})+\sum_{k\geq 1}\frac{\tau^{k}}{k!}[{L^{(A)}}^{k}\delta_{x}]_{\bm{A}}(x^{\prime}_{\bm{A}})
=δx𝑨(x𝑨)+k1τkk![L(A)kδx]𝑨(x𝑨).\displaystyle=\delta_{x_{\bm{A}}}(x^{\prime}_{\bm{A}})+\sum_{k\geq 1}\frac{\tau^{k}}{k!}[{L^{(A)}}^{k}\delta_{x}]_{\bm{A}}(x^{\prime}_{\bm{A}}). (153)

where in the second line we expanded the operator exponential as eτL(A)=kτkL(A)k/k!e^{\tau L^{(A)}}=\sum_{k}\tau^{k}{L^{(A)}}^{k}/k!. Note that 𝑨A{\bm{A}}\subseteq A, so [L(A)δx]𝑨[L^{(A)}\delta_{x}]_{{\bm{A}}} is a function of [L(A)δx]A[L^{(A)}\delta_{x}]_{A}, which in turn is a function of xAx_{A} by Eq. 59. Similarly, δx𝑨(x𝑨)\delta_{x_{\bm{A}}}(x^{\prime}_{\bm{A}}) depends only on xAx_{A}, not xx. This means the right hand side of Eq. 153 depends only on xAx_{A}, which we indicate by

Tτ(A)(x𝑨|x)=Tτ(A)(x𝑨|xA).\displaystyle T^{(A)}_{\tau}(x^{\prime}_{\bm{A}}|x)=T^{(A)}_{\tau}({x_{\bm{A}}^{\prime}|x_{A}}). (154)

Now consider the conditional distribution over any other subsystem BAB\neq A given initial state xx, as induced by L(A)L^{(A)}:

Tτ(A)(xB|x)\displaystyle T^{(A)}_{\tau}(x^{\prime}_{B}|x) =δxB(xB)+k1τkk![L(A)kδx]B(xB)\displaystyle=\delta_{x_{B}}(x^{\prime}_{B})+\sum_{k\geq 1}\frac{\tau^{k}}{k!}[{L^{(A)}}^{k}\delta_{x}]_{B}(x^{\prime}_{B})
=δxB(xB),\displaystyle=\delta_{x_{B}}(x^{\prime}_{B}), (155)

where we used that [L(A)δx]B=0[L^{(A)}\delta_{x}]_{B}=0 by Eq. 60.

Now, it is straightforward to show that if some distribution $p$ over $X_{V}$ has delta function marginals $p_{B}=\delta_{x_{B}}$ for all $B\neq A$, then $p$ must have the following product form:

p(x)=p𝑨(x𝑨)δx𝑨c(x𝑨c),\displaystyle p(x^{\prime})=p_{{\bm{A}}}(x_{{\bm{A}}}^{\prime})\,\delta_{x_{{\bm{A}}^{c}}}(x^{\prime}_{{\bm{A}}^{c}}), (156)

where we used that ${{\bm{A}}^{c}}=\bigcup_{B\in\mathcal{C}\setminus\{A\}}B$. Eq. 150 follows by taking $p(x^{\prime})=T^{(A)}_{\tau}(x^{\prime}|x)$ in Eq. 156, while using Eq. 154.

C.2.2 Derivation of Eq. 151

Consider any τ0\tau\geq 0 and A𝒞A\in\mathcal{C}. Using Eq. 59 and the identity eτL(A)=kτkL(A)k/k!e^{\tau L^{(A)}}=\sum_{k}\tau^{k}{L^{(A)}}^{k}/k!, one can show that whenever two distributions p,q𝒫p,q\in\mathcal{P} obey pA=qAp_{A}=q_{A}, it must be that [eτL(A)p]A=[eτL(A)q]A[e^{\tau L^{(A)}}p]_{A}=[e^{\tau L^{(A)}}q]_{A}. Since pA=[ϕ𝒞(p)]Ap_{A}=[\phi_{{\mathcal{C}}}(p)]_{A} (see the definition of ϕ𝒞\phi_{{\mathcal{C}}} in Eq. 64),

[eτL(A)p]A=[eτL(A)ϕ(p)]A.\displaystyle[e^{\tau L^{(A)}}p]_{A}=[e^{\tau L^{(A)}}\phi(p)]_{A}. (157)

In addition, given Eq. 155, we have [eτL(A)p]𝑨c=p𝑨c[e^{\tau L^{(A)}}p]_{{\bm{A}}^{c}}=p_{{\bm{A}}^{c}}. Given that B𝑨cB\subseteq{{\bm{A}}^{c}} for each BAB\neq A, we have

[eτL(A)p]B\displaystyle[e^{\tau L^{(A)}}p]_{B} =pB=ϕ(p)B=[eτL(A)ϕ(p)]B.\displaystyle=p_{B}=\phi(p)_{B}=[e^{\tau L^{(A)}}\phi(p)]_{B}. (158)

Similarly, O(𝒞)𝑨cO(\mathcal{C})\subseteq{{\bm{A}}^{c}} and therefore

[eτL(A)p]O(𝒞)\displaystyle[e^{\tau L^{(A)}}p]_{O(\mathcal{C})} =[eτL(A)ϕ(p)]O(𝒞).\displaystyle=[e^{\tau L^{(A)}}\phi(p)]_{O(\mathcal{C})}. (159)

Now, observe that the distribution ϕ𝒞(p)\phi_{{\mathcal{C}}}(p) does not depend on the full distribution pp, but only on the marginal distributions pO(𝒞)p_{O(\mathcal{C})} and {pA}A𝒞\{p_{A}\}_{A\in\mathcal{C}}. By Eqs. 157, 158 and 159, these marginals are the same for eτL(A)pe^{\tau L^{(A)}}p and eτL(A)ϕ𝒞(p)e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p), which means that

ϕ𝒞(eτL(A)p)=ϕ𝒞(eτL(A)ϕ𝒞(p)).\displaystyle\phi_{{\mathcal{C}}}(e^{\tau L^{(A)}}p)=\phi_{{\mathcal{C}}}(e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)). (160)

Next, using Eq. 150 and some simple (but rather tedious) algebra, it can be shown that

eτL(A)ϕ𝒞(p)=pAO|AOpOBApBO|BO,\displaystyle e^{\tau L^{(A)}}\phi_{{\mathcal{C}}}(p)=p_{A\setminus O|A\cap O}^{\prime}\;p_{O}\;\prod_{{B\neq A}}p_{B\setminus O|B\cap O}\;, (161)

where

pAO|AO(xAO|xAO)=Tτ(A)(x𝑨|x𝑨,xAO)p(x𝑨|xAO)𝑑x𝑨,p_{A\setminus O|A\cap O}^{\prime}(x_{A\setminus O}^{\prime}|x_{A\cap O}^{\prime})=\\ \int T^{(A)}_{\tau}({x_{\bm{A}}^{\prime}|x_{\bm{A}},x_{A\cap O}}^{\prime})p(x_{{\bm{A}}}|x_{A\cap O}^{\prime})dx_{{\bm{A}}}, (162)

and we used the conditional distribution Tτ(A)(x𝑨|x𝑨,xAO)T^{(A)}_{\tau}({x_{\bm{A}}^{\prime}|x_{\bm{A}},x_{A\cap O}}) from Eq. 154. The right hand side of Eq. 161 has the form of the right hand side of Eq. 64, so it is invariant under ϕ𝒞\phi_{\mathcal{C}}:

ϕ𝒞(eτL(A)ϕ𝒞(p))=eτL(A)ϕ𝒞(p).\displaystyle\phi_{\mathcal{C}}(e^{\tau L^{(A)}}\phi_{\mathcal{C}}(p))=e^{\tau L^{(A)}}\phi_{\mathcal{C}}(p). (163)

Eq. 151 follows by combining Eqs. 160 and 163.

C.2.3 Derivation of Eq. 152

Using Eq. 150 and some algebra, one can verify that for all τ0\tau\geq 0 and A,B𝒞A,B\in\mathcal{C},

Tτ(A)(x′′|x)Tτ(B)(x|x)𝑑x=Tτ(B)(x′′|x)Tτ(A)(x|x)𝑑x,\int T^{(A)}_{\tau}(x^{\prime\prime}|x^{\prime})T^{(B)}_{\tau}(x^{\prime}|x)\,dx^{\prime}\\ =\int T^{(B)}_{\tau}(x^{\prime\prime}|x^{\prime})T^{(A)}_{\tau}(x^{\prime}|x)\,dx^{\prime}, (164)

which in operator notation can be written as

eτL(A)eτL(B)δx=eτL(B)eτL(A)δx.\displaystyle e^{\tau L^{(A)}}e^{\tau L^{(B)}}\delta_{x}=e^{\tau L^{(B)}}e^{\tau L^{(A)}}\delta_{x}. (165)

Then, for any function f=f(x)δx𝑑xf=\int f(x)\delta_{x}\,dx, write

eτL(A)eτL(B)f\displaystyle e^{\tau L^{(A)}}e^{\tau L^{(B)}}f =eτL(A)eτL(B)f(x)δx𝑑x\displaystyle=e^{\tau L^{(A)}}e^{\tau L^{(B)}}\int f(x)\delta_{x}\,dx
=f(x)eτL(A)eτL(B)δx𝑑x\displaystyle=\int f(x)e^{\tau L^{(A)}}e^{\tau L^{(B)}}\delta_{x}\,dx
=f(x)eτL(B)eτL(A)δx𝑑x\displaystyle=\int f(x)e^{\tau L^{(B)}}e^{\tau L^{(A)}}\delta_{x}\,dx
=eτL(B)eτL(A)f(x)δx𝑑x\displaystyle=e^{\tau L^{(B)}}e^{\tau L^{(A)}}\int f(x)\delta_{x}\,dx
=eτL(B)eτL(A)f,\displaystyle=e^{\tau L^{(B)}}e^{\tau L^{(A)}}f,

where we exchanged the order of the bounded operators eτL(A)eτL(B)e^{\tau L^{(A)}}e^{\tau L^{(B)}} and eτL(B)eτL(A)e^{\tau L^{(B)}}e^{\tau L^{(A)}} with the (Bochner) integral f(x)δx𝑑x\int f(x)\delta_{x}\,dx, and used Eq. 165. This shows that eτL(A)e^{\tau L^{(A)}} and eτL(B)e^{\tau L^{(B)}} commute for all τ0\tau\geq 0, so their inverses eτL(A)e^{-\tau L^{(A)}} and eτL(B)e^{-\tau L^{(B)}} must also commute. Given that eτL(A)e^{\tau L^{(A)}} and eτL(B)e^{\tau L^{(B)}} commute for all τ\tau\in\mathbb{R}, L(A)L^{(A)} and L(B)L^{(B)} must commute (engel_one-parameter_2000, , p. 23).

C.3 Szilard box: derivation of Eqs. 74 and 76

We first derive Eq. 74. Using Eq. 70 and some rearrangement, write

D(ϕ𝒞(pθ)u)=ln4S(pθ(X1))S(pθ(X2)),\displaystyle D(\phi_{{\mathcal{C}}}(p_{\theta})\|u)=\ln 4-S(p_{\theta}(X_{1}))-S({p_{\theta}}(X_{2})), (166)

where S(pθ(X1))S(p_{\theta}(X_{1})) and S(pθ(X2))S({p_{\theta}}(X_{2})) refer to the marginal entropies under pθp_{\theta}. It is easy to see that by symmetry,

S(pθ(X1))=S(pπ2θ(X2)).\displaystyle S(p_{\theta}(X_{1}))=S(p_{\frac{\pi}{2}-\theta}(X_{2})). (167)

Therefore, we will derive a closed-form expression for D(ϕ𝒞(pθ)u)D(\phi_{{\mathcal{C}}}(p_{\theta})\|u) by finding a closed-form expression for

S(pθ(X1)):=11pθ(x1)lnpθ(x1)𝑑x1.\displaystyle S(p_{\theta}(X_{1})):=-\int_{-1}^{1}p_{\theta}(x_{1})\ln p_{\theta}(x_{1})\,dx_{1}. (168)

First, consider the case of θ[π/2,π/2]\theta\in[-\pi/2,\pi/2], and define Aθ:=|tanθ|A_{\theta}:=|\tan\theta|. It can be verified from Eq. 49 that the marginal distribution pθ(x1)p_{\theta}(x_{1}) always has a piecewise linear form. In particular, if Aθ<1A_{\theta}<1, then for any x1[1,1]x_{1}\in[-1,1],

pθ(x1)={1if 1x1AθAθx12Aθif Aθx1Aθ0if x1>Aθ\displaystyle p_{\theta}(x_{1})=\begin{cases}1&\text{if $-1\leq x_{1}\leq-A_{\theta}$}\\ \frac{A_{\theta}-x_{1}}{2A_{\theta}}&\text{if $-A_{\theta}\leq x_{1}\leq A_{\theta}$}\\ 0&\text{if $x_{1}>A_{\theta}$}\end{cases} (169)

Otherwise, if Aθ>1A_{\theta}>1, then for any x1[1,1]x_{1}\in[-1,1],

pθ(x1)=Aθx12Aθ.\displaystyle p_{\theta}(x_{1})=\frac{A_{\theta}-x_{1}}{2A_{\theta}}. (170)

Plugged into Eq. 168, this gives

S(pθ(X1))\displaystyle S({p_{\theta}}(X_{1})) ={11Aθx12AθlnAθx12Aθdx1if Aθ>1AθAθAθx12AθlnAθx12Aθdx1otherwise\displaystyle=\begin{cases}-\int_{-1}^{1}\frac{A_{\theta}-x_{1}}{2A_{\theta}}\ln\frac{A_{\theta}-x_{1}}{2A_{\theta}}\,dx_{1}&\text{if $A_{\theta}>1$}\\ -\int_{-A_{\theta}}^{A_{\theta}}\frac{A_{\theta}-x_{1}}{2A_{\theta}}\ln\frac{A_{\theta}-x_{1}}{2A_{\theta}}\,dx_{1}&\text{otherwise}\end{cases}

Integrating these two cases separately in Mathematica, and plugging in the definition of AθA_{\theta}, gives

S(pθ(X1))\displaystyle S({p_{\theta}}(X_{1})) =12{f(|tanθ|)if |tanθ|>1|tanθ|otherwise\displaystyle=\frac{1}{2}\begin{cases}f(|\tan\theta|)&\text{if $|\tan\theta|>1$}\\ |\tan\theta|&\text{otherwise}\end{cases} (171)

where for convenience we’ve defined

f(x)=11+x22xlnx+1x1lnx214x2.\displaystyle f(x)=1-\frac{1+x^{2}}{2x}\ln\frac{x+1}{x-1}-\ln\frac{x^{2}-1}{4x^{2}}. (172)

Recall that so far we assumed that θ[π/2,π/2]\theta\in[-\pi/2,\pi/2]. However, by Eq. 49, pθ(x1,x2)=p±πθ(x1,x2)p_{\theta}(x_{1},x_{2})=p_{\pm\pi-\theta}(-x_{1},x_{2}), which implies that pθ(x1)=pπθ(x1)=pπθ(x1)p_{\theta}(x_{1})=p_{\pi-\theta}(-x_{1})=p_{-\pi-\theta}(-x_{1}) and S(pθ(X1))=S(pπθ(X1))=S(pπθ(X1))S({p_{\theta}}(X_{1}))=S(p_{\pi-\theta}(X_{1}))=S(p_{-\pi-\theta}(X_{1})). It can also be verified that |tanθ|=|tan(πθ)|=|tan(πθ)||\tan\theta|=|\tan(\pi-\theta)|=|\tan(-\pi-\theta)|, so in fact Eq. 171 holds for all θ[π,π]\theta\in[-\pi,\pi].
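The closed form in Eq. 171 can be checked against a direct numerical integration of the piecewise-linear marginal in Eqs. 169 and 170. A minimal sketch (using scipy; helper names illustrative):

```python
import numpy as np
from scipy.integrate import quad

def f(x):
    """The function f from Eq. 172."""
    return (1 - (1 + x**2) / (2 * x) * np.log((x + 1) / (x - 1))
            - np.log((x**2 - 1) / (4 * x**2)))

def entropy_closed(theta):
    """Closed form for S(p_theta(X1)), Eq. 171."""
    A = abs(np.tan(theta))
    return 0.5 * f(A) if A > 1 else 0.5 * A

def entropy_numeric(theta):
    """Numerical integral of -p ln p for the marginal of Eqs. 169 and 170."""
    A = abs(np.tan(theta))
    def p1(x1):
        if A >= 1:
            return (A - x1) / (2 * A)
        if x1 <= -A:
            return 1.0
        return (A - x1) / (2 * A) if x1 <= A else 0.0
    integrand = lambda x1: 0.0 if p1(x1) <= 0 else -p1(x1) * np.log(p1(x1))
    val, _ = quad(integrand, -1, 1, points=([-A, A] if A < 1 else None))
    return val

for theta in [0.2, 0.6, 1.0, 1.3]:
    print(theta, entropy_closed(theta), entropy_numeric(theta))
```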

Finally, if |θ|(π4,3π4)|\theta|\in(\frac{\pi}{4},\frac{3\pi}{4}), then Eqs. 171 and 167 imply

|tanθ|>1,\displaystyle|\tan\theta|>1,\quad S(pθ(X1))=12f(|tanθ|)\displaystyle S({p_{\theta}}(X_{1}))=\frac{1}{2}f(|\tan\theta|)
|tan(π2θ)|1,\displaystyle|\tan({\textstyle\frac{\pi}{2}-\theta})|\leq 1,\quad S(pθ(X2))=12|tan(π2θ)|\displaystyle S({p_{\theta}}(X_{2}))=\frac{1}{2}|\tan({\textstyle\frac{\pi}{2}-\theta})|

Conversely, if |θ|[0,π](π4,3π4)|\theta|\in[0,\pi]\setminus(\frac{\pi}{4},\frac{3\pi}{4}), then

|tanθ|1,\displaystyle|\tan\theta|\leq 1,\quad S(pθ(X1))=12|tanθ|\displaystyle S({p_{\theta}}(X_{1}))=\frac{1}{2}|\tan\theta|
|tan(π2θ)|>1,\displaystyle|\tan({\textstyle\frac{\pi}{2}-\theta})|>1,\quad S(pθ(X2))=12f(|tan(π2θ)|)\displaystyle S({p_{\theta}}(X_{2}))=\frac{1}{2}f(|\tan({\textstyle\frac{\pi}{2}-\theta})|)

Eq. 74 follows by combining these results and rearranging.

To derive Eq. 76, use ϕ𝒢(ϕ𝒞(pθ))(x1,x2)=pθ(x1)u(x2)\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))(x_{1},x_{2})=p_{\theta}(x_{1})u(x_{2}) to write

D(ϕ𝒢(ϕ𝒞(pθ))u)\displaystyle D(\phi_{\mathcal{G}}(\phi_{{\mathcal{C}}}(p_{\theta}))\|u) =ln4S(pθ(X1))S(u(X2))\displaystyle=\ln 4-S(p_{\theta}(X_{1}))-S(u(X_{2}))
=ln2S(pθ(X1)),\displaystyle=\ln 2-S(p_{\theta}(X_{1})), (173)

where we used that S(u(X2))=ln2S(u(X_{2}))=\ln 2. Eq. 76 then follows by combining Eqs. 173 and 171.

C.4 Example: Feedback controlled flashing ratchet

Here we derive a closed-form expression for the accessible information in the feedback-controlled collective flashing ratchet.

For notational convenience, let a=1/αa=1/\alpha indicate the slope of the increasing part of VV in Fig. 10(b), and b=1/(1α)b=-1/(1-\alpha) indicate the slope of the decreasing part of VV. Note that the net force vV(xv)\sum_{v}V^{\prime}(x_{v}) can be seen as the sum of NN random variables, where by assumption each V(xv)V^{\prime}(x_{v}) is equal to a=1/αa=1/\alpha with probability α\alpha and equal to b=1/(1α)b=-1/(1-\alpha) with probability 1α1-\alpha. This implies that the expectation of V(xv)V^{\prime}(x_{v}) is 0 and the variance is 1/(α(1α))1/(\alpha(1-\alpha)).

We will first compute the accessible information Iaccϕ𝒞(X;M)=vI(Xv;M)=NI(X1;M)I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)=\sum_{v}I(X_{v};M)=N\cdot I(X_{1};M). The mutual information between MM and the state of a single particle X1X_{1} is given by

I(X1;M)=S(M)S(M|X)\displaystyle I(X_{1};M)=S(M)-S(M|X)
=h2(p(1))αh2(p(1|a))(1α)h2(p(1|b)),\displaystyle\quad=h_{2}(p(1))-\alpha h_{2}(p(1|a))-(1-\alpha)h_{2}(p(1|b)), (174)

where p(1)p(1) is the probability that the net force is positive, p(1|a)p(1|a) is the probability that the net force is positive given that particle X1X_{1} experiences force aa, and p(1|b)p(1|b) is the probability that the net force is positive given that the particle X1X_{1} experiences force bb. We can compute p(1)p(1) by considering the case when k=0,1,2,k=0,1,2,\dots particles experience force aa. Assuming the particles are independent, this is given by

p(1)=k=0NBN,α(k)Θ(ka+(Nk)b)\displaystyle p(1)=\sum_{\mathclap{k=0}}^{N}B_{N,\alpha}(k)\Theta(ka+(N-k)b) (175)

where BN,αB_{N,\alpha} is the binomial probability of kk successes, given NN trials with success probability α\alpha. To compute p(1|a)p(1|a), note that, given that X1X_{1} experiences force aa, M=1M=1 whenever the other N1N-1 particles experience a net force larger than a-a. The probability of this event is

p(1|a)=k=0N1BN1,α(k)Θ(ka+(N1k)b+a).\displaystyle p(1|a)=\sum_{\mathclap{k=0}}^{N-1}B_{N-1,\alpha}(k)\Theta(ka+(N-1-k)b+a). (176)

Conversely, if X1X_{1} experiences force bb, then M=1M=1 if the other N1N-1 particles experience a net force larger than b-b, which has probability

p(1|b)=k=0N1BN1,α(k)Θ(ka+(N1k)b+b).\displaystyle p(1|b)=\sum_{\mathclap{k=0}}^{N-1}B_{N-1,\alpha}(k)\Theta(ka+(N-1-k)b+b). (177)

Plugging Eqs. 175, 176 and 177 into Eq. 174 gives I(X1;M)I(X_{1};M). Multiplying by NN gives the accessible information,

Iaccϕ𝒞(X;M)=NI(X1;M)=\displaystyle I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)=N\cdot I(X_{1};M)= (178)
N[h2(k=0NBN,α(k)Θ(ka+(Nk)b))\displaystyle N\Bigg{[}h_{2}\Bigg{(}\sum_{\mathclap{k=0}}^{N}B_{N,\alpha}(k)\Theta(ka+(N-k)b)\Bigg{)}-
αh2(k=0N1BN1,α(k)Θ(ka+(N1k)b+a))\displaystyle\alpha h_{2}\Bigg{(}\sum_{\mathclap{k=0}}^{N-1}B_{N-1,\alpha}(k)\Theta(ka+(N-1-k)b+a)\Bigg{)}-
(1α)h2(k=0N1BN1,α(k)Θ(ka+(N1k)b+b))],\displaystyle(1-\alpha)h_{2}\Bigg{(}\sum_{\mathclap{k=0}}^{N-1}B_{N-1,\alpha}(k)\Theta(ka+(N-1-k)b+b)\Bigg{)}\Bigg{]},

This is shown in Fig. 12(left) for different values of NN and α\alpha.
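A minimal numerical sketch of Eq. 178 follows (in nats, helper names illustrative). The convention $\Theta(0)=0$ for exact ties in the net force is an assumption of this sketch; ties carry vanishing probability as $N$ grows and do not affect the large-$N$ behavior.

```python
import numpy as np
from scipy.stats import binom

def h2(p):
    """Binary entropy in nats."""
    p = np.clip(p, 1e-15, 1 - 1e-15)
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def accessible_info(N, alpha):
    """N * I(X1; M) from Eq. 178, using Eqs. 175-177."""
    a, b = 1 / alpha, -1 / (1 - alpha)
    step = lambda x: 1.0 if x > 0 else 0.0      # Theta, with Theta(0) = 0 assumed
    p1 = sum(binom.pmf(k, N, alpha) * step(k * a + (N - k) * b)
             for k in range(N + 1))
    p1a = sum(binom.pmf(k, N - 1, alpha) * step(k * a + (N - 1 - k) * b + a)
              for k in range(N))
    p1b = sum(binom.pmf(k, N - 1, alpha) * step(k * a + (N - 1 - k) * b + b)
              for k in range(N))
    return N * (h2(p1) - alpha * h2(p1a) - (1 - alpha) * h2(p1b))

for N in [1, 2, 5, 10, 50, 200]:
    print(N, accessible_info(N, alpha=0.5))
print("asymptote 1/pi =", 1 / np.pi)
```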

To compute the efficiency values in Fig. 12(right), we simply divide $I_{\mathrm{acc}}^{\phi_{{\mathcal{C}}}}(X;M)$ by $I(X;M)$, the total mutual information between the measurement and all particles. Since the measurement in Eq. 80 is deterministic, this mutual information is given by the entropy of $M$,

I(X;M)=S(M)=h2(p(1)),\displaystyle I(X;M)=S(M)=h_{2}(p(1)), (179)

which can be computed using Eq. 175.

We now compute the asymptotic value of accessible information and efficiency in the NN\to\infty limit. The sum of a large number of independent random variables with mean 0 and variance 1/(α(1α))1/(\alpha(1-\alpha)) approaches a Gaussian with mean 0 and variance N/(α(1α))N/(\alpha(1-\alpha)). Thus, in the NN\to\infty limit, the probability that the force is positive converges to p(1)=1/2p(1)=1/2, so I(X;M)=S(M)I(X;M)=S(M) converges to ln2\ln 2. Recall that p(1|a)p(1|a) is given by the probability that N1N-1 particles experience a net force larger than a-a. In the NN\to\infty limit, this conditional probability converges to

p(1|a)=1Φα,N1(a)=Φα,N1(a).p(1|a)=1-\Phi_{\alpha,N-1}(-a)=\Phi_{\alpha,N-1}(a).

where Φα,N1\Phi_{\alpha,N-1} is the cumulative distribution function of a Gaussian with mean 0 and variance N/(α(1α)){N}/(\alpha(1-\alpha)). We can similarly calculate

p(1|b)=1Φα,N1(b)=Φα,N1(b).p(1|b)=1-\Phi_{\alpha,N-1}(-b)=\Phi_{\alpha,N-1}(b).

Plugging into Eq. 174 gives

I(X1;M)=ln2αh2(Φα,N1(a))(1α)h2(Φα,N1(b)).I(X_{1};M)=\\ \ln 2-\alpha h_{2}\big{(}\Phi_{\alpha,N-1}(a)\big{)}-(1-\alpha)h_{2}\big{(}\Phi_{\alpha,N-1}(b)\big{)}. (180)

Using $a=1/\alpha$ and $b=-1/(1-\alpha)$, some analysis (e.g., taking limits in Mathematica) shows that

limNNI(X1;M)=1π,\displaystyle\lim_{N\to\infty}N\cdot I(X_{1};M)=\frac{1}{\pi}, (181)

irrespective of α\alpha. This is the asymptotic accessible information, which appears as the dotted line in Fig. 12(left). The asymptotic efficiency, which appears as the dotted line in Fig. 12(right), is given by 1/(πln2)1/(\pi\ln 2) (since I(X;M)=ln2I(X;M)=\ln 2 in the NN\to\infty limit).

Appendix D Coarse-grained constraints

D.1 Derivation of Eq. 82 from Eqs. 83 and 85

In general, the microstate distribution $p$ evolves according to some generator $L$, ${\partial_{t}}p(t)=Lp(t)$, while the macrostate distribution $p_{Z}$ evolves according to a coarse-grained generator $\hat{L}^{p}$. The coarse-grained dynamics need not be closed, meaning that $\hat{L}^{p}$ can depend on the microstate distribution $p$. In this section, we provide concrete conditions on the generators that guarantee that the coarse-grained dynamics are closed. In the following derivations, for notational simplicity, we omit the dependence of $p(x,t)$ and $p(z,t)$ on $t$.

For discrete-state master equations, the coarse-grained dynamics are given by (esposito2012stochastic, )

{\partial_{t}}p_{Z}(z)=\hat{L}^{p}p_{Z}(z)=\sum_{z^{\prime}}\Big[\hat{L}^{p}_{zz^{\prime}}p_{Z}(z^{\prime})-\hat{L}^{p}_{z^{\prime}z}p_{Z}(z)\Big], (182)

where L^zzp\hat{L}^{p}_{zz^{\prime}} is the transition rate from macrostate zz^{\prime} to zz,

L^zzp=xp(x|z)xδξ(x)(z)Lxx.\hat{L}^{p}_{zz^{\prime}}=\sum_{x^{\prime}}p({x^{\prime}|z^{\prime}})\sum_{x}\delta_{\xi(x)}(z)L_{xx^{\prime}}. (183)

By plugging Eq. 83 into Eq. 183 and simplifying, one can verify that L^zzp\hat{L}^{p}_{zz^{\prime}} does not depend on the microstate distribution pp, therefore Eq. 82 holds.
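A minimal numerical sketch of Eq. 183 for a toy four-state system (the specific rates and coarse-graining are illustrative): it computes the coarse-grained rates for two different choices of $p(x^{\prime}|z^{\prime})$. Closure of the coarse-grained dynamics, Eq. 82, corresponds to these agreeing for every such choice; a generic rate matrix, as below, will not be closed.

```python
import numpy as np

rng = np.random.default_rng(0)
nX, nZ = 4, 2
xi = np.array([0, 0, 1, 1])               # coarse-graining function z = xi(x)

# Off-diagonal microstate rates L[x, x'] = rate from x' to x (the diagonal
# never enters the gain/loss form of Eq. 182, so it is left at zero).
L = rng.random((nX, nX))
np.fill_diagonal(L, 0.0)

def coarse_rates(p_cond):
    """Eq. 183, with p_cond[x'] = p(x' | z' = xi(x'))."""
    Lhat = np.zeros((nZ, nZ))
    for xp in range(nX):
        for z in range(nZ):
            Lhat[z, xi[xp]] += p_cond[xp] * L[xi == z, xp].sum()
    return Lhat

def random_conditional():
    """A random p(x'|z'), normalized within each macrostate."""
    w = rng.random(nX)
    for z in range(nZ):
        w[xi == z] /= w[xi == z].sum()
    return w

print(coarse_rates(random_conditional()))
print(coarse_rates(random_conditional()))   # differs: these dynamics are not closed
```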

A similar approach can be used for continuous-state master equations.

We now consider Fokker-Planck equations of the form Eq. 84, given a linear coarse-graining function ξ(x)=Wx\xi(x)=Wx. Using (duong2018quantification, , Prop. 2.8), we write the evolution of the coarse-grained distribution pZp_{Z} as

tpZ(z)=(𝖠^(z)pZ(z))+β1tr(HT(𝖣^(z)pZ(z))),{\textstyle\partial_{t}}p_{Z}(z)=\nabla\cdot(\hat{\mathsf{A}}(z)p_{Z}(z))+\beta^{-1}\mathrm{tr}(H^{T}(\hat{\mathsf{D}}(z)p_{Z}(z))), (184)

where HH is the Hessian matrix of second derivative operators, and we’ve defined

𝖠^(z)\displaystyle\hat{\mathsf{A}}(z) :=[p(x|z)WE(x)β1Δξ(x)]𝑑x\displaystyle:=\int\left[p(x|z)W\nabla E(x)-\beta^{-1}\Delta\xi(x)\right]dx (185)
=[p(x|z)WE(x)]𝑑x\displaystyle=\int\left[p(x|z)W\nabla E(x)\right]dx (186)
=F^(z),\displaystyle=-\hat{F}(z), (187)
𝖣^(z)\displaystyle\hat{\mathsf{D}}(z) :=p(x|z)WWT𝑑x=I.\displaystyle:=\int p(x|z)WW^{T}\,dx=I. (188)

We used Eq. 2.29 from duong2018quantification in Eq. 185, the linearity of $\xi$ in Eq. 186, and Eq. 85 in Eq. 187. We used Eq. 2.30 from duong2018quantification and the assumption that $WW^{T}=I$ in Eq. 188. It is easy to check that $\mathrm{tr}(H^{T}(Ip_{Z}))=\Delta p_{Z}$; combined with Eqs. 187, 188 and 184, this gives Eq. 86. Since the right hand side of Eq. 86 does not depend on the microstate distribution, the coarse-grained dynamics are closed.

D.2 Derivation of Eq. 87

Our derivation below does not assume isothermal protocols, so the inequality in Eq. 87 holds both for isothermal protocols and for protocols connected to any number of thermodynamic reservoirs.

To derive this result for a given $L$, we make two assumptions. First, as described in the main text, we assume that the coarse-grained dynamics are closed, Eq. 82. Second, we assume that the coarse-grained stationary distribution $\pi_{Z}$ (where $\pi$ is the stationary distribution of $L$) is invariant under conjugation of odd-parity variables,

πZ(ξ(x))=πZ(ξ(x))xX\pi_{Z}(\xi(x))=\pi_{Z}(\xi(x^{\dagger}))\qquad\forall x\in X (189)

where $x^{\dagger}$ indicates the conjugation of state $x$ in which all odd-parity variables (such as momentum) have their sign flipped. For an isothermal protocol, the stationary distributions are equilibrium distributions, and Eq. 189 is satisfied lee_fluctuation_2013 . For more general protocols, Eq. 189 holds if there are no odd-parity variables (e.g., overdamped dynamics), so that $x=x^{\dagger}$. It also holds if the coarse-graining function maps each $x$ and its conjugate to the same macrostate, $\xi(x)=\xi(x^{\dagger})$, as well as in some other cases.

Now imagine a system that starts from some initial distribution pp at time t=0t=0, and then undergoes free relaxation under LL towards a (possibly nonequilibrium) stationary distribution π\pi, reaching a final distribution pp^{\prime} by time t=τt=\tau. Next, we use existing results in stochastic thermodynamics esposito_three_2010 ; lee_fluctuation_2013 and write the EP incurred over time interval t[0,τ]t\in[0,\tau] as

Σ(τ)=D(p(𝒙,𝝂)p~(𝒙~,𝝂~)),\displaystyle\Sigma(\tau)=D(p(\bm{x},\bm{\nu})\|\tilde{p}(\tilde{\bm{x}}^{\dagger},\tilde{\bm{\nu}})), (190)

(see also Section A.2), where:

  1. 1.

    𝒙=(x,,x)\bm{x}=(x,\dots,x^{\prime}) indicate a continuous-time trajectory of system states over time interval t[0,τ]t\in[0,\tau], where xx and xx^{\prime} indicate the initial and final system states respectively, and 𝒙~=(x,,x)\tilde{\bm{x}}^{\dagger}=({x^{\prime}}^{\dagger},\dots,x^{\dagger}) is the corresponding time-reversed and conjugated trajectory;

  2. 2.

    𝝂\bm{\nu} is a sequence of reservoirs which exchange conserved quantities with the system during t[0,τ]t\in[0,\tau] and 𝝂~\tilde{\bm{\nu}} is the corresponding time-reversed sequence esposito_three_2010 ; van2010three ; esposito2010three ;

  3. 3.

    p(𝒙,𝝂)=P(𝒙,𝝂|x)p(x)p(\bm{x},\bm{\nu})=P(\bm{x},\bm{\nu}|x)p(x) is the probability of forward trajectory (𝒙,𝝂)(\bm{x},\bm{\nu}) given initial distribution pp, where P(𝒙,𝝂|x)P(\bm{x},\bm{\nu}|x) is the conditional distribution generated by the free relaxation;

  4. 4.

    p~(𝒙~,𝝂~)=P(𝒙~,𝝂~|x)p(x)\tilde{p}(\tilde{\bm{x}}^{\dagger},\tilde{\bm{\nu}})=P(\tilde{\bm{x}}^{\dagger},\tilde{\bm{\nu}}|{x^{\prime}}^{\dagger})p^{\prime}(x^{\prime}) is the probability of reverse trajectory (𝒙~,𝝂~)(\tilde{\bm{x}}^{\dagger},\tilde{\bm{\nu}}) under a free relaxation that starts with the following distribution:

    p(x)=P(x|x)p(x)𝑑x.\displaystyle p^{\prime}(x^{\prime})=\int P(x^{\prime}|x)p(x)dx. (191)

Using the fact that EP decreases under state-space and temporal coarse-graining esposito2012stochastic ; gomez2008cg , we bound Eq. 190 as

\Sigma(\tau)\geq D(p(\bm{x})\|\tilde{p}(\tilde{\bm{x}}^{\dagger}))\geq D(p(z,z^{\prime})\|\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})), (192)

where z=ξ(x)z=\xi(x), z=ξ(x)z^{\prime}=\xi(x^{\prime}), z=ξ(x)z^{\dagger}=\xi(x^{\dagger}), and z=ξ(x){z^{\prime}}^{\dagger}=\xi({x^{\prime}}^{\dagger}). The final KL divergence can be decomposed as

D(p(z,z)p~(z,z))=[D(pZπZ)D(pZπZ)]+p(z,z)ln[p(z,z)πZ(z)pZ(z)p~(z,z)pZ(z)πZ(z)]𝑑z𝑑z.D(p(z,z^{\prime})\|\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger}))=\left[D(p_{Z}\|\pi_{Z})-D({p_{Z}^{\prime}}\|\pi_{Z})\right]+\\ \int p(z,z^{\prime})\ln\left[\frac{p(z,z^{\prime})\pi_{Z}(z){p_{Z}^{\prime}}(z^{\prime})}{\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})p_{Z}(z)\pi_{Z}(z^{\prime})}\right]dz\,dz^{\prime}. (193)

Using Jensen’s inequality, we lower bound the integral term as

p(z,z)ln[p(z,z)πZ(z)pZ(z)p~(z,z)pZ(z)πZ(z)]𝑑z𝑑z\displaystyle\int p(z,z^{\prime})\ln\left[\frac{p(z,z^{\prime})\pi_{Z}(z){p_{Z}^{\prime}}(z^{\prime})}{\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})p_{Z}(z)\pi_{Z}(z^{\prime})}\right]dz\,dz^{\prime}
=p(z,z)ln[p~(z,z)pZ(z)πZ(z)p(z,z)πZ(z)pZ(z)]𝑑z𝑑z\displaystyle\quad=-\int p(z,z^{\prime})\ln\left[\frac{\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})p_{Z}(z)\pi_{Z}(z^{\prime})}{p(z,z^{\prime})\pi_{Z}(z){p_{Z}^{\prime}}(z^{\prime})}\right]dz\,dz^{\prime}
ln[p~(z,z)pZ(z)πZ(z)πZ(z)pZ(z)𝑑z𝑑z].\displaystyle\quad\geq-\ln\left[\int\frac{\tilde{p}(z^{\dagger},{z^{\prime}}^{\dagger})p_{Z}(z)\pi_{Z}(z^{\prime})}{\pi_{Z}(z){p_{Z}^{\prime}}(z^{\prime})}dz\,dz^{\prime}\right]. (194)

Note that πZ(z)=πZ(z)\pi_{Z}(z^{\prime})=\pi_{Z}({z^{\prime}}^{\dagger}) by Eq. 189, and p~Z(z)=pZ(z)\tilde{p}_{Z}({z^{\prime}}^{\dagger})={p_{Z}^{\prime}}(z^{\prime}) by the definition of pZ{p_{Z}^{\prime}} in Eq. 191, allowing us to rewrite the RHS of Eq. 194 as

ln[pZ(z)πZ(z)[p~(z|z)πZ(z)𝑑z]𝑑z].-\ln\left[\int\frac{p_{Z}(z)}{\pi_{Z}(z)}\left[\int\tilde{p}(z^{\dagger}|{z^{\prime}}^{\dagger})\pi_{Z}({z^{\prime}}^{\dagger})dz^{\prime}\right]dz\right]. (195)

The inner integral can be further rewritten as

p~(z|z)πZ(z)𝑑z\displaystyle\int\tilde{p}(z^{\dagger}|{z^{\prime}}^{\dagger})\pi_{Z}({z^{\prime}}^{\dagger})dz^{\prime} =P(z|x)p~(x|z)πZ(z)𝑑x\displaystyle=\int P(z^{\dagger}|{x^{\prime}}^{\dagger})\tilde{p}({x^{\prime}}^{\dagger}|{z^{\prime}}^{\dagger})\pi_{Z}({z^{\prime}}^{\dagger})dx^{\prime}
=πZ(z)\displaystyle=\pi_{Z}(z^{\dagger})
=πZ(z),\displaystyle=\pi_{Z}(z),

where in the second line we used the assumption of closed dynamics (Eq. 82) and the stationarity of π\pi under P(|)P(\cdot|\cdot), and in the third line we used Eq. 189. We can then rewrite Eq. 195 as

ln[πZ(z)πZ(z)πZ(z)𝑑z]=0.-\ln\left[\int\frac{\pi_{Z}(z)}{\pi_{Z}(z)}\pi_{Z}(z)\,dz\right]=0.

Combined with Eq. 194, this implies that the integral term in Eq. 193 is non-negative. Combining with Eq. 192 gives

Σ(τ)D(pZπZ)D(pZπZ).\Sigma(\tau)\geq D(p_{Z}\|\pi_{Z})-D({p_{Z}^{\prime}}\|\pi_{Z}).

Finally, using the definition of the EP rate and the results above,

Σ˙(p,L)\displaystyle\dot{\Sigma}(p,L) :=limτ01τΣ(τ)\displaystyle:=\lim_{\tau\to 0}\frac{1}{\tau}\Sigma(\tau)
limτ01τ[D(pZπZ)D(pZπZ)]\displaystyle\geq\lim_{\tau\to 0}\frac{1}{\tau}[D(p_{Z}\|\pi_{Z})-D({p_{Z}^{\prime}}\|\pi_{Z})]
=tpZ(t)(z)lnpZ(z)πZ(z)dz0,\displaystyle=-\int{\textstyle\partial_{t}}p_{Z}(t)(z)\ln\frac{p_{Z}(z)}{\pi_{Z}(z)}\,dz\geq 0, (196)

where ${\partial_{t}}p_{Z}(t)=\hat{L}p_{Z}$. Eq. 196 follows from Eqs. 107, 108, 109 and 110 above (with summations replaced by integrals). The discrete-state form of Eq. 196, with $p$ and $L$ written as explicitly time-dependent, appears in the main text as Eq. 87.
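As a simple numerical illustration of this last inequality (a minimal sketch for a generic discrete-state coarse-grained generator; the matrix below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
nZ = 5

# Random coarse-grained generator Lhat with columns summing to zero: d/dt p_Z = Lhat p_Z
Lhat = rng.random((nZ, nZ))
np.fill_diagonal(Lhat, 0.0)
Lhat -= np.diag(Lhat.sum(axis=0))

# Stationary distribution pi_Z: the (normalized) null eigenvector of Lhat
w, v = np.linalg.eig(Lhat)
pi = np.real(v[:, np.argmin(np.abs(w))])
pi = pi / pi.sum()

p = rng.random(nZ)
p /= p.sum()
dp = Lhat @ p                                   # d/dt p_Z under the closed dynamics

# Lower bound on the EP rate from Eq. 196: -sum_z (d/dt p_Z)(z) ln(p_Z(z)/pi_Z(z)) >= 0
print(-np.sum(dp * np.log(p / pi)))
```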