
Thermodynamic optimization of finite-time feedback protocols for Markov jump systems

Rihito Nagase, Department of Applied Physics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan; Takahiro Sagawa, Department of Applied Physics, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan, and Quantum-Phase Electronics Center (QPEC), The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
Abstract

In recent advances in finite-time thermodynamics, optimization of the entropy production required for finite-time information processing is an important issue. In this work, we consider finite-time feedback processes in classical discrete systems described by Markov jump processes, and derive achievable bounds on entropy production for feedback processes controlled by Maxwell's demons. The key ingredients of our approach are optimal transport theory and an achievable Fano's inequality, by which we optimize the Wasserstein distance over final distributions under fixed consumed information. Our study reveals the minimum entropy production for consuming a certain amount of information, and moreover, the optimal feedback protocol to achieve it. These results are expected to lead to design principles for information processing in various stochastic systems with discrete states.

I Introduction

One of the key concepts in stochastic thermodynamics is entropy production [1, 2, 3], which quantifies the dissipation accompanying the irreversibility of dynamics. Its fundamental bound is imposed by the second law of thermodynamics, which can be achieved in the quasi-static limit requiring infinite time. Beyond the quasi-static regime, recent advancements in stochastic thermodynamics have established various bounds on entropy production by incorporating finite-time effects, represented by thermodynamic speed limits [4, 5, 6, 7, 8, 9, 10, 11] and thermodynamic uncertainty relations [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]. In particular, the speed limits based on optimal transport theory [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36] provide achievable bounds for any finite operation time, by explicitly identifying the optimal protocols.

Recently, there has been progress in applying these finite-time frameworks to thermodynamics of information [37]. In the finite-time regime, the speed limits and the thermodynamic uncertainty relations have been extended to situations incorporating information processing [38, 39, 40, 41, 42, 43, 44, 45, 46, 47]. In particular, optimal transport theory provides achievable bounds on the finite-time entropy production for various information processing tasks including measurement, feedback and information erasure, when the initial and final probability distributions are fixed [38, 41, 44, 46, 47]. However, optimizing the finite-time entropy production for a given amount of processed information has been addressed only for measurement processes [45], and the optimal entropy production for consuming a given amount of information through feedback processes has not been elucidated.

In this study, we consider optimization of feedback processes in classical discrete systems obeying Markov jump processes. Specifically, we determine the optimal entropy production for a fixed amount of consumed mutual information $|\Delta I|$. The optimization consists of two stages. The first stage is based on optimal transport theory, where we optimize entropy production over time-dependent protocols under fixed initial and final distributions [36]. In the second stage, we further optimize entropy production over final distributions under fixed consumed information $|\Delta I|$, and obtain the achievable lower bounds on entropy production and the optimal final distributions. These are derived by exploiting an achievable Fano's inequality [48], which bounds the conditional Shannon entropy by a function of the error rate. Our results are relevant to experimental platforms such as single electron systems [49, 50, 51].

The organization of this paper is as follows. In Sec. II, we describe our setup along with a brief review of thermodynamics of information, optimal transport theory, and the thermodynamic speed limits. In Sec. III, we introduce an achievable Fano's inequality as a mathematical tool, state our main mathematical Theorem, and provide its proof. In Sec. IV, we present two bounds on the entropy production of feedback processes as physical consequences of the main Theorem. We also identify the optimal protocols to achieve these bounds. In Sec. V, we numerically demonstrate our bounds and the optimal protocols in a coupled two-level system. In Sec. VI, we summarize the results of this paper and discuss future prospects.

II Setup

II.1 Thermodynamics of information

We consider a bipartite classical stochastic system consisting of subsystems $X$ and $Y$, which take discrete states. The entire system is in contact with a heat bath at inverse temperature $\beta$, and its dynamics is described by a Markov jump process. System $X$ is the target of feedback control, while $Y$ plays the role of Maxwell's demon applying feedback operations to $X$ based on the measurement results on $X$'s state.

The set of possible states for $X$ is $\mathcal{X}\coloneqq\{1,2,\dots,n\}$, and for $Y$ is $\mathcal{Y}\coloneqq\{1,2,\dots,n\}$. Here $y\in\mathcal{Y}$ corresponds to the measurement result on $x\in\mathcal{X}$. The joint state of the entire system $XY$ is represented by the pair $(x,y)\in\mathcal{X}\times\mathcal{Y}$, and the probability to find the system in state $(x,y)$ at time $t$ is denoted as $p_{t}^{XY}(x,y)$. The marginal probabilities for $X$ and $Y$ are defined as $p_{t}^{X}(x)=\sum_{y}p_{t}^{XY}(x,y)$ and $p_{t}^{Y}(y)=\sum_{x}p_{t}^{XY}(x,y)$, respectively.

Since we focus on the feedback process, we assume that $X$ and $Y$ are correlated at the initial time $t=0$, as a consequence of the measurement performed beforehand. Here, we suppose that $Y$ is a controller that stores and uses the information of the target system $X$. Moreover, we make a simple assumption that the measurement is error-free. That is, the initial distribution is given in the form

p_{0}^{XY}(x,y)=p^{X}(x)\delta_{x,y}  (1)

with $\delta_{x,y}$ being the Kronecker delta, which guarantees that $x=y$ holds with unit probability. During the feedback, the state $y$ of $Y$ is assumed to remain unchanged, i.e., no transitions occur between different states in $Y$. That is, transitions from $(x^{\prime},y^{\prime})$ to $(x,y)$ are prohibited if $y\neq y^{\prime}$. This assumption is reasonable for our classical stochastic processes, where the feedback operation on $X$ does not influence $Y$'s state itself [37]. Under this assumption, the marginal distribution $p_{t}^{Y}(y)$ is fixed to a certain distribution $p^{Y}(y)$ throughout the process, and the initial joint distribution is given by $p_{0}^{XY}(x,y)=\delta_{x,y}p^{Y}(y)$.

The time evolution of the entire system during the feedback process, from $t=0$ to $t=\tau$, is described by the master equation:

\frac{\mathrm{d}}{\mathrm{d}t}p_{t}^{XY}(x,y)=\sum_{x^{\prime}:(x^{\prime},x)\in\mathcal{N}_{y}}\Big[R_{t}^{X|y}(x,x^{\prime})p_{t}^{XY}(x^{\prime},y)-R_{t}^{X|y}(x^{\prime},x)p_{t}^{XY}(x,y)\Big],  (2)

where $R_{t}^{X|y}(x,x^{\prime})$ is the transition rate from $(x^{\prime},y)$ to $(x,y)$ at time $t$. This rate describes the feedback operation that $Y$ applies to $x$ when the measurement result is $y$. The set $\mathcal{N}_{y}$ denotes the pairs of different $X$'s states $(x^{\prime},x)$ that are allowed to transition when the measurement result is $y$. These transitions are assumed to be bidirectional, meaning $(x^{\prime},x)\in\mathcal{N}_{y}$ implies $(x,x^{\prime})\in\mathcal{N}_{y}$. The stochastic heat $Q_{t}^{X|y}(x,x^{\prime})$ absorbed by $X$ during a transition from $(x^{\prime},y)$ to $(x,y)$ satisfies the local detailed balance condition:

\ln\frac{R_{t}^{X|y}(x,x^{\prime})}{R_{t}^{X|y}(x^{\prime},x)}=-\beta Q_{t}^{X|y}(x,x^{\prime}).  (3)
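As a concrete illustration of the dynamics (2) (not taken from the paper), the following minimal Python sketch integrates the master equation for a two-state bipartite system with simple Euler steps; the rate values, time step, and variable names are illustrative assumptions, and $Y$ never jumps, as assumed above.

```python
import numpy as np

# Minimal sketch: Euler integration of the master equation (2) for n = 2.
# p[x, y] is the joint distribution; rates[y][(x, xp)] is an illustrative
# transition rate for the jump (xp, y) -> (x, y); Y itself never jumps.
n = 2
dt, T = 1e-4, 1.0

def step(p, rates, dt):
    """One Euler step of Eq. (2)."""
    dp = np.zeros_like(p)
    for y in range(n):
        for (x, xp), R in rates[y].items():
            flux = R * p[xp, y]          # probability flux of (xp, y) -> (x, y)
            dp[x, y] += flux
            dp[xp, y] -= flux
    return p + dt * dp

# error-free initial condition of Eq. (1) with p^Y = (0.3, 0.7)
p = np.diag([0.3, 0.7]).astype(float)

# illustrative constant rates; the ratio R(x,x')/R(x',x) fixes the heat
# through the local detailed balance condition (3)
rates = {0: {(1, 0): 2.0, (0, 1): 0.5},
         1: {(0, 1): 2.0, (1, 0): 0.5}}

for _ in range(int(T / dt)):
    p = step(p, rates, dt)

print(p, p.sum())   # total probability stays 1
```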

We next introduce mutual information, which represents the amount of information shared between $X$ and $Y$. The mutual information between $X$ and $Y$ at time $t$ is defined as

I_{t}^{X:Y}\coloneqq S(p_{t}^{X})+S(p_{t}^{Y})-S(p_{t}^{XY}),  (4)

where $S(p)$ is the Shannon entropy of the probability distribution $p$. If $X$ and $Y$ are uncorrelated, $I_{t}^{X:Y}=0$, while $I_{t}^{X:Y}>0$ otherwise. In our setup, with the assumption (1), $S(p_{0}^{X})=S(p_{0}^{Y})=S(p_{0}^{XY})=S(p^{Y})$, yielding $I_{0}^{X:Y}=S(p^{Y})$, which is the maximum mutual information for fixed $p^{Y}$. This means that $Y$ has fully acquired the information of $X$ at time $t=0$.
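For concreteness, Eq. (4) can be evaluated numerically from a joint distribution as in the following small sketch (the function names and the example distribution are our own, not from the paper):

```python
import numpy as np

def shannon(p):
    """Shannon entropy S(p) = -sum p ln p in nats, ignoring zero entries."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(p_xy):
    """I^{X:Y} = S(p^X) + S(p^Y) - S(p^{XY}) for a joint array p_xy[x, y], Eq. (4)."""
    return shannon(p_xy.sum(axis=1)) + shannon(p_xy.sum(axis=0)) - shannon(p_xy)

# for the error-free initial state of Eq. (1), I_0^{X:Y} = S(p^Y)
p0 = np.diag([0.3, 0.7])
print(mutual_information(p0), shannon([0.3, 0.7]))   # both ~0.611
```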

We also introduce entropy production, which represents the thermodynamic cost required for the feedback process. The entropy production from time $0$ to $\tau$, denoted as $\Sigma_{\tau}^{XY}$, is defined using the total heat absorbed by $X$ up to time $\tau$, $Q_{\tau}^{X}\coloneqq\int_{0}^{\tau}\sum_{x,x^{\prime},y}Q_{t}^{X|y}(x,x^{\prime})R_{t}^{X|y}(x,x^{\prime})p_{t}^{XY}(x^{\prime},y)\,\mathrm{d}t$, as

\Sigma_{\tau}^{XY}\coloneqq S(p_{\tau}^{XY})-S(p_{0}^{XY})-\beta Q_{\tau}^{X}.  (5)

Here, the first two terms represent the entropy change of system $XY$, and the third term represents the entropy change of the heat bath. Since $Y$ does not undergo transitions, $Q_{\tau}^{X}$ equals the total heat absorbed by the system $XY$. Thus, $\Sigma_{\tau}^{XY}$ represents the entropy change of the entire system including the heat bath, and describes dissipation due to irreversibility.

We here introduce the probability flux of transitions from state $(x^{\prime},y)$ to $(x,y)$, denoted as $j_{t}^{X|y}(x,x^{\prime})\coloneqq R_{t}^{X|y}(x,x^{\prime})p_{t}^{XY}(x^{\prime},y)$, and the probability current from state $(x^{\prime},y)$ to $(x,y)$, defined as $J_{t}^{X|y}(x,x^{\prime})=j_{t}^{X|y}(x,x^{\prime})-j_{t}^{X|y}(x^{\prime},x)$. We also define the thermodynamic force for the transition from $(x^{\prime},y)$ to $(x,y)$ as

F_{t}^{X|y}(x,x^{\prime})\coloneqq\ln\frac{j_{t}^{X|y}(x,x^{\prime})}{j_{t}^{X|y}(x^{\prime},x)}.  (6)

Then, the entropy production rate $\sigma_{t}^{XY}\coloneqq\mathrm{d}\Sigma_{t}^{XY}/\mathrm{d}t$ can be expressed as

\sigma_{t}^{XY}=\sum_{y,\ x>x^{\prime}}J_{t}^{X|y}(x,x^{\prime})F_{t}^{X|y}(x,x^{\prime}),  (7)

which satisfies the second law of thermodynamics $\sigma_{t}^{XY}\geq 0$.

The entropy production for subsystem $X$, $\Sigma_{\tau}^{X}$, is defined as

\Sigma_{\tau}^{X}\coloneqq S(p_{\tau}^{X})-S(p_{0}^{X})-\beta Q_{\tau}^{X}.  (8)

Using this and the change in the mutual information $\Delta I_{\tau}^{X:Y}\coloneqq I_{\tau}^{X:Y}-I_{0}^{X:Y}$, the entropy production can be decomposed as $\Sigma_{\tau}^{XY}=\Sigma_{\tau}^{X}-\Delta I_{\tau}^{X:Y}$ [52]. In the present setup, since the maximum mutual information is stored at the initial time, we have $\Delta I_{\tau}^{X:Y}\leq 0$. Therefore, the decomposition can be rewritten as

\Sigma_{\tau}^{XY}=\Sigma_{\tau}^{X}+|\Delta I_{\tau}^{X:Y}|.  (9)

By substituting this decomposition into the second law, we obtain

\Sigma_{\tau}^{X}\geq-|\Delta I_{\tau}^{X:Y}|.  (10)

This indicates that by consuming mutual information, it is possible to achieve a lower entropy production than that determined by the second law for the case where $Y$ is absent, $\Sigma_{\tau}^{X}\geq 0$. We emphasize that the equality in (10) is achieved in the quasi-static limit, which requires infinite time [37].

II.2 Optimal transport theory and the speed limits

We next briefly overview optimal transport theory. A key insight of optimal transport theory is that the minimized transport cost, known as the Wasserstein distance, serves as a metric between distributions. Optimal transport theory finds applications in fields such as image processing [53], machine learning [54], and biology [55]. Applying it to thermodynamics reveals that the achievable speed limits on entropy production can be expressed in terms of the Wasserstein distance between the initial and final distributions, enabling the identification of the optimal protocols that achieve the equality.

The Wasserstein distance between two probability distributions $p_{0}^{XY}$ and $p_{\tau}^{XY}$ is defined as

\mathcal{W}(p_{0}^{XY},p_{\tau}^{XY})\coloneqq\min_{\pi\in\Pi(p_{0}^{XY},p_{\tau}^{XY})}\sum_{x,x^{\prime},y}d_{(x,y),(x^{\prime},y)}\pi_{(x,y),(x^{\prime},y)},  (11)

where $\pi_{(x,y),(x^{\prime},y)}(\geq 0)$ represents the probability transported from state $(x^{\prime},y)$ to state $(x,y)$. The collection of these probabilities for all states, $\pi=\{\pi_{(x,y),(x^{\prime},y)}\}_{x,x^{\prime},y}$, is referred to as the transport plan. We denote by $\Pi(p_{0}^{XY},p_{\tau}^{XY})$ the set of all transport plans that transform the initial distribution $p_{0}^{XY}$ into the final distribution $p_{\tau}^{XY}$. This set satisfies the probability conservation conditions $\sum_{x\in\mathcal{X}}\pi_{(x,y),(x^{\prime},y)}=p_{0}^{XY}(x^{\prime},y)$ and $\sum_{x^{\prime}\in\mathcal{X}}\pi_{(x,y),(x^{\prime},y)}=p_{\tau}^{XY}(x,y)$. The coefficient $d_{(x,y),(x^{\prime},y)}$ is the minimum number of transitions required for moving from state $(x^{\prime},y)$ to state $(x,y)$ under the Markov jump process defined in Eq. (2), which determines the cost of transport per unit probability. Thus, the definition in Eq. (11) indicates that by fixing the initial distribution $p_{0}^{XY}$ and the final distribution $p_{\tau}^{XY}$, the minimum total cost over all possible transport plans $\pi$ can be defined as the distance between the distributions. The transport plan $\pi$ that achieves this minimum cost is referred to as the optimal transport plan.
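Since transitions only connect states that share the same $y$, the minimization in Eq. (11) decomposes into an independent optimal transport problem for each $y$. As a rough numerical sketch (our own code, with hypothetical function names), each block-wise distance can be computed as a small linear program, for instance with SciPy:

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein(p0, p1, d):
    """Discrete Wasserstein distance, Eq. (11), between distributions p0 and p1
    (length m) with cost matrix d[i, j] = minimum number of transitions j -> i."""
    m = len(p0)
    c = d.reshape(-1)                        # minimize sum_{i,j} d[i,j] * pi[i,j]
    A_eq, b_eq = [], []
    for j in range(m):                       # column sums: sum_i pi[i, j] = p0[j]
        row = np.zeros((m, m)); row[:, j] = 1
        A_eq.append(row.reshape(-1)); b_eq.append(p0[j])
    for i in range(m):                       # row sums: sum_j pi[i, j] = p1[i]
        row = np.zeros((m, m)); row[i, :] = 1
        A_eq.append(row.reshape(-1)); b_eq.append(p1[i])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
    return res.fun

# two-level example: cost 1 for a single jump, 0 for staying put
d = np.array([[0.0, 1.0], [1.0, 0.0]])
print(wasserstein(np.array([0.3, 0.7]), np.array([0.5, 0.5]), d))   # 0.2
```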

When the probability distribution $p_{0}^{XY}$ evolves into $p_{\tau}^{XY}$ following Eq. (2) from time $t=0$ to $t=\tau$, the entropy production is bounded as [36]

\Sigma_{\tau}^{XY}\geq\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)f\left(\frac{\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)}{D\tau}\right),  (12)

where $f(x)$ is a function determined by the choice of the fixed timescale $D$.

We introduce two quantities representing the timescale: activity and mobility. We define the activity $a_{t}$ as

a_{t}\coloneqq\sum_{y,\ x\neq x^{\prime}}j_{t}^{X|y}(x,x^{\prime})=\sum_{y,\ x>x^{\prime}}a_{t}^{X|y}(x,x^{\prime}).  (13)

Here, the local activity between states $(x,y)$ and $(x^{\prime},y)$ is defined as $a_{t}^{X|y}(x,x^{\prime})\coloneqq j_{t}^{X|y}(x,x^{\prime})+j_{t}^{X|y}(x^{\prime},x)$, which corresponds to the number of jumps per unit time at time $t$. Additionally, the time integral representing the total number of jumps from time $0$ to $\tau$ is expressed as $A_{\tau}\coloneqq\int_{0}^{\tau}a_{t}\,\mathrm{d}t$, and its time average is given by $\langle a\rangle_{\tau}\coloneqq A_{\tau}/\tau$. We define the mobility $m_{t}$ as

m_{t}\coloneqq\sum_{y,\ x>x^{\prime}}\frac{J_{t}^{X|y}(x,x^{\prime})}{F_{t}^{X|y}(x,x^{\prime})}.  (14)

This represents the probability currents induced by the thermodynamic forces applied to the system. As with the activity, the time integral is expressed as $M_{\tau}\coloneqq\int_{0}^{\tau}m_{t}\,\mathrm{d}t$, and the time average is given by $\langle m\rangle_{\tau}\coloneqq M_{\tau}/\tau$.

Depending on whether $D$ corresponds to either of these two quantities, the function $f$ in the lower bound (12) changes. If $D$ represents the time-averaged mobility $\langle m\rangle_{\tau}$, then $f(x)=x$; if $D$ represents the time-averaged activity $\langle a\rangle_{\tau}$, then $f(x)=2\tanh^{-1}(x)$.
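A small helper makes the two choices of $D$ in the bound (12) concrete (a sketch with our own function name; the numerical arguments are placeholders):

```python
import numpy as np

def sigma_bound(W, D, tau, kind="mobility"):
    """Right-hand side of the speed limit (12): W * f(W / (D * tau)), with
    f(x) = x when D is the time-averaged mobility and
    f(x) = 2 * arctanh(x) when D is the time-averaged activity."""
    x = W / (D * tau)
    if kind == "mobility":
        return W * x
    if kind == "activity":
        return 2.0 * W * np.arctanh(x)
    raise ValueError("kind must be 'mobility' or 'activity'")

print(sigma_bound(0.3, 1.0, 1.0, "mobility"))   # 0.09
print(sigma_bound(0.3, 1.0, 1.0, "activity"))   # ~0.186
```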

For either choice of $D$, the optimal protocol $\{R_{t}^{X|y}(x,x^{\prime})\}$ that achieves the equality in inequality (12) can be constructed by the condition that the probability is transported from $p_{0}^{XY}$ to $p_{\tau}^{XY}$ along the optimal transport plan with a uniform and constant thermodynamic force.

By substituting the decomposition (9) into inequality (12), we obtain the fundamental bound on the entropy production of $X$ for the fixed initial distribution $p_{0}^{XY}$ and final distribution $p_{\tau}^{XY}$

\Sigma_{\tau}^{X}\geq-|\Delta I_{\tau}^{X:Y}|+\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)f\left(\frac{\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)}{D\tau}\right).  (15)

The equality is achieved by the same protocol that achieves the equality in (12).

The minimum entropy production for consuming a certain amount of information via feedback processes is not determined by inequality (15) alone. This is because, for a fixed consumed information $|\Delta I_{\tau}^{X:Y}|$, there exist infinitely many possible final distributions $p_{\tau}^{XY}$, which have different Wasserstein distances $\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$. Therefore, we can further minimize the right-hand side of (15), which depends on $\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$, over $p_{\tau}^{XY}$ under fixed $|\Delta I_{\tau}^{X:Y}|$, and determine the truly minimum entropy production for consuming $|\Delta I_{\tau}^{X:Y}|$.

III Fundamental bounds on the consumed information

III.1 Fano’s inequality

To address the above problem, we first introduce the mathematical tool called Fano's inequality. It enables us to evaluate the error that arises when a sender $Y$ sends a message to a receiver $X$ through a classical communication channel. Let the set of messages to be transmitted and received be $\{1,2,\dots,n\}$. Due to the presence of errors in the communication channel, when the sender $Y$ sends a message $y$, the receiver $X$ may not always receive $y$ with probability 1. The performance of the communication channel is characterized by the conditional probability distribution $p^{Y|x}(y)\coloneqq p^{XY}(x,y)/p^{X}(x)$, which indicates the ambiguity about the original message $y$ given the received message $x$.

Consider the case where the sender $Y$ stochastically transmits messages according to a distribution $p^{Y}(y)$. Denote the joint probability that $Y$ sends $y$ and $X$ receives $x$ as $p^{XY}(x,y)$. Now we aim to evaluate the error probability $\mathcal{E}\coloneqq\mathrm{Prob}(x\neq y)=\sum_{x\neq y}p^{XY}(x,y)$ based on the channel performance $p^{Y|x}(y)$. This evaluation is provided by Fano's inequality, and the traditional form is given by

S^{Y|X}\leq-\mathcal{E}\ln\mathcal{E}-(1-\mathcal{E})\ln(1-\mathcal{E})+\mathcal{E}\ln(n-1).  (16)

Here, the left-hand side represents the conditional Shannon entropy $S^{Y|X}\coloneqq\sum_{x}p^{X}(x)S(p^{Y|x})$, which quantifies the randomness of the communication channel's performance $p^{Y|x}(y)$. The right-hand side is an increasing function of the error probability $\mathcal{E}$. Thus, Fano's inequality indicates that the randomness $S^{Y|X}$ of the channel's performance leads to a higher error probability. The equality holds when the conditional probabilities $p^{Y|x}(y)$ satisfy

p^{Y|x}(y)=\begin{cases}1-\mathcal{E},&x=y,\\ \mathcal{E}/(n-1),&x\neq y.\end{cases}  (17)

However, depending on the probability distribution $p^{Y}$, it might not be possible to construct $p^{Y|x}(y)$ in accordance with Eq. (17). Therefore, the traditional Fano inequality (16) is not generally achievable.

On this matter, a previous study [48] has introduced a tighter version of Fano's inequality that can be achieved for any $p^{Y}$. Applying it to the above setup of the communication channel, we can summarize the achievable Fano's inequality as follows. For fixed $p^{Y}$ and $\mathcal{E}$, there exists a probability distribution $\tilde{p}_{\mathcal{E}}^{Y}$ (whose construction will be explained below) determined by $p^{Y}$ and $\mathcal{E}$, such that

S^{Y|X}\leq S\left(\tilde{p}_{\mathcal{E}}^{Y}\right),  (18)

which is achievable for any $p^{Y}$.

Here, the construction of $\tilde{p}_{\mathcal{E}}^{Y}$ is as follows (a numerical sketch is given after the steps):

Fig. 1: The construction of $\tilde{p}_{\mathcal{E}}^{Y}$. (a) Histogram of $p^{Y}$, the original probability distribution of $Y$. (b) Histogram of $p_{\downarrow}^{Y}$, the probability distribution obtained by rearranging $p^{Y}$ in descending order. (c) Histogram of $\tilde{p}_{\mathcal{E}}^{Y}$.
(i) Let $p_{\downarrow}^{Y}$ be the probability distribution obtained by rearranging $p^{Y}$ (Fig. 1(a)) in descending order, $p_{\downarrow}^{Y}(1)\geq p_{\downarrow}^{Y}(2)\geq\cdots\geq p_{\downarrow}^{Y}(n)$ (Fig. 1(b)). If $p_{\downarrow}^{Y}(1)\geq 1-\mathcal{E}$, then $\tilde{p}_{\mathcal{E}}^{Y}$ is defined as $\tilde{p}_{\mathcal{E}}^{Y}=p_{\downarrow}^{Y}$. If $p_{\downarrow}^{Y}(1)<1-\mathcal{E}$, then $\tilde{p}_{\mathcal{E}}^{Y}$ is constructed according to the following steps.

(ii) Define a function $M_{\mathcal{E}}$ that specifies a state of $Y$ corresponding to $\mathcal{E}$ (the construction of the function $M_{\mathcal{E}}$ will be provided in step (iii)). Then, $\tilde{p}_{\mathcal{E}}^{Y}$ is defined as

\tilde{p}_{\mathcal{E}}^{Y}(y)\coloneqq\begin{cases}1-\mathcal{E},&y=1,\\ \frac{\mathcal{E}-\sum_{y^{\prime}=M_{\mathcal{E}}+1}^{n}p_{\downarrow}^{Y}(y^{\prime})}{M_{\mathcal{E}}-1},&2\leq y\leq M_{\mathcal{E}},\\ p_{\downarrow}^{Y}(y),&M_{\mathcal{E}}<y\leq n.\end{cases}  (19)

    The corresponding histogram is shown in Fig. 1(c).

(iii) $M_{\mathcal{E}}$ is defined as the largest $m\in\{1,\cdots,n\}$ that satisfies

\frac{\mathcal{E}-\sum_{y=m+1}^{n}p_{\downarrow}^{Y}(y)}{m-1}<p_{\downarrow}^{Y}(m).  (20)

When $p_{\downarrow}^{Y}(1)<1-\mathcal{E}$, such $M_{\mathcal{E}}$ is uniquely determined in the range $2\leq M_{\mathcal{E}}\leq n$. This can be interpreted as the largest $M_{\mathcal{E}}$ that ensures the probability distribution $\tilde{p}_{\mathcal{E}}^{Y}$ [defined in Eq. (19)] is in descending order.
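The steps (i)-(iii) above can be turned into a short numerical routine. The following sketch (our own code, with an illustrative $p^{Y}$) constructs $\tilde{p}_{\mathcal{E}}^{Y}$ and checks that it is normalized:

```python
import numpy as np

def p_tilde(pY, eps):
    """Construct tilde{p}_eps^Y of Eq. (19) from pY and the error probability eps,
    following steps (i)-(iii)."""
    p = np.sort(np.asarray(pY, dtype=float))[::-1]      # p_downarrow^Y
    n = len(p)
    if p[0] >= 1.0 - eps:                                # step (i)
        return p
    # step (iii): largest m with (eps - sum_{y > m} p(y)) / (m - 1) < p(m)
    M = max(m for m in range(2, n + 1)
            if (eps - p[m:].sum()) / (m - 1) < p[m - 1])
    out = p.copy()                                       # step (ii), Eq. (19)
    out[0] = 1.0 - eps
    out[1:M] = (eps - p[M:].sum()) / (M - 1)
    return out

pY = [0.5, 0.3, 0.15, 0.05]
for eps in (0.1, 0.3):
    pt = p_tilde(pY, eps)
    print(eps, pt, pt.sum())   # each tilde{p} is normalized and in descending order
```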

The achievable Fano's inequality (18) can be regarded as an inequality that evaluates the upper bound of the conditional Shannon entropy $S^{Y|X}$ when the error probability $\mathcal{E}$ and the marginal distribution $p^{Y}$ are fixed. Therefore, this inequality can be applied not only to the communication channel setting described above, but also to general joint systems $XY$ characterized by any joint probability distribution $p^{XY}(x,y)$.

III.2 Main theorem

To obtain the fundamental bounds on the entropy production required to consume information by the feedback processes, we consider the minimization problem of the second term on the right-hand side of inequality (15) for a fixed $|\Delta I_{\tau}^{X:Y}|$. Since this term is a monotonically increasing function of $\mathcal{W}$ for both $f(x)=x$ and $f(x)=2\tanh^{-1}(x)$, it suffices to minimize $\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$.

Our main theorem is now stated as follows, which will be used to solve the above problem in the next section.

Theorem.

For a fixed $p^{Y}$ and $\mathcal{W}=\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$,

|\Delta I_{\tau}^{X:Y}|\leq S\left(\tilde{p}_{\mathcal{W}}^{Y}\right)  (21)

holds regardless of the structure of the entire system $XY$. Here, $\tilde{p}_{\mathcal{W}}^{Y}$ is determined by identifying $\mathcal{E}=\mathcal{W}$ in Eq. (19). The equality can be achieved by setting the final distribution to a certain distribution (the explicit form will be described in the following section) if the entire system $XY$ satisfies the following condition (C):

• (C) For all states $y\in\mathcal{Y}$, the direct transition in system $X$ from the state $x=y$ to any other state $x^{\prime}(\neq y)$ is possible. In other words, the direct transition from state $(y,y)$ to $(x^{\prime},y)$ is possible.

In our setup, we assumed that at $t=0$, $x=y$ holds with probability 1 through an error-free measurement. Therefore, the condition (C) indicates that regardless of the measurement result $y$, the direct transition from the initial state $x=y$ to any other state $x^{\prime}(\neq y)$ is allowed in the feedback control. Inequality (21) provides an upper bound on the mutual information that can be consumed in a finite-time feedback process when the Wasserstein distance is fixed. Since the right-hand side, $S\left(\tilde{p}_{\mathcal{W}}^{Y}\right)$, is an increasing function of $\mathcal{W}$, this implies that the larger the Wasserstein distance between the initial and final distributions, the more mutual information can be consumed by the feedback process.

This Theorem comprises two statements: first, that inequality (21) holds for any system $XY$ which satisfies our setup; second, that if the system $XY$ meets the condition (C), the equality in (21) can be achieved by setting $p_{\tau}^{XY}$ to an optimal distribution. We give the proofs of these two statements in the following two subsections, respectively.

III.3 Proof of inequality (21)

We here prove inequality (21). By transforming Eq. (4), the mutual information can be rewritten as

I_{t}^{X:Y}=S\left(p_{t}^{Y}\right)-S_{t}^{Y|X},

where $S_{t}^{Y|X}$ denotes the conditional Shannon entropy of $Y$ given $X$ at time $t$. In the present setting, $p_{t}^{Y}=p^{Y}$ is fixed, and initially, $x=y$ with probability 1. Therefore, $S_{0}^{Y|X}=0$. Thus, we obtain

|\Delta I_{\tau}^{X:Y}|=S_{\tau}^{Y|X}.  (22)

Let the probability that $x\neq y$ holds at time $\tau$ be $\mathcal{E}_{\tau}\coloneqq\sum_{x\neq y}p_{\tau}^{XY}(x,y)$. From the achievable Fano's inequality (18), we obtain

|\Delta I_{\tau}^{X:Y}|\leq S\left(\tilde{p}_{\mathcal{E}_{\tau}}^{Y}\right).  (23)

Here, $S\left(\tilde{p}_{\mathcal{E}_{\tau}}^{Y}\right)$ is an increasing function of $\mathcal{E}_{\tau}$, and $\mathcal{E}_{\tau}\leq\mathcal{W}$ holds (the proofs are provided in the Appendix). Substituting these into inequality (23) yields the main Theorem (21).

III.4 Proof of the optimality

We prove that the equality in (21) can be achieved by setting $p_{\tau}^{XY}$ to an optimal distribution if the entire system $XY$ satisfies the condition (C). First, we will describe the method for constructing the optimal final distribution $p_{\tau}^{XY}$. Next, we will prove that this final distribution satisfies the constraints $p_{\tau}^{Y}=p^{Y}$ and $\mathcal{W}(p_{0}^{XY},p_{\tau}^{XY})=\mathcal{W}$. Finally, we will show that this final distribution achieves the equality in (21).

The optimal final distribution $p_{\tau}^{XY}$ can be constructed as follows. Since the joint probability distribution of the entire system can be expressed as $p_{\tau}^{XY}(x,y)=p_{\tau}^{X}(x)p_{\tau}^{Y|x}(y)$, it suffices to define $p_{\tau}^{X}(x)$ and $p_{\tau}^{Y|x}(y)$. Here, let $\sigma_{p^{Y}}$ denote the permutation that rearranges the states $y$ in descending order of the probability distribution $p^{Y}$. In other words, $\sigma_{p^{Y}}(y)=n$ means that $y$ corresponds to the $n$-th largest value of $p^{Y}(y)$. If there are states with equal probabilities, their ordering in $\sigma_{p^{Y}}$ can be chosen arbitrarily; once $\sigma_{p^{Y}}$ is defined, it must be fixed throughout the following discussion.

The optimal $p_{\tau}^{X}(x)$ is then defined as

p_{\tau}^{X}(x)=\begin{cases}\displaystyle\frac{p^{Y}(x)-\tilde{p}_{\mathcal{W}}^{Y}(2)}{\tilde{p}_{\mathcal{W}}^{Y}(1)-\tilde{p}_{\mathcal{W}}^{Y}(2)},&1\leq\sigma_{p^{Y}}(x)\leq M_{\mathcal{W}},\\ 0,&\sigma_{p^{Y}}(x)>M_{\mathcal{W}}.\end{cases}  (24)

From the definition of $\tilde{p}_{\mathcal{W}}^{Y}$, we have $\tilde{p}_{\mathcal{W}}^{Y}(1)=1-\mathcal{W}$ and $\tilde{p}_{\mathcal{W}}^{Y}(2)=\frac{\mathcal{W}-\sum_{y=M_{\mathcal{W}}+1}^{n}p_{\downarrow}^{Y}(y)}{M_{\mathcal{W}}-1}$. The optimal conditional probability distribution $p_{\tau}^{Y|x}(y)$ is given as

p_{\tau}^{Y|x}(y)=\begin{cases}\tilde{p}_{\mathcal{W}}^{Y}(1),&y=x,\\ \tilde{p}_{\mathcal{W}}^{Y}(\sigma_{p^{Y}}(x)),&\sigma_{p^{Y}}(y)=1,\\ \tilde{p}_{\mathcal{W}}^{Y}(\sigma_{p^{Y}}(y)),&\mathrm{otherwise.}\end{cases}  (25)
Fig. 2: (a) Histogram of $\tilde{p}_{\mathcal{W}}^{Y}$. (b) Histogram of the probability distribution obtained by rearranging $\tilde{p}_{\mathcal{W}}^{Y}$ such that the descending order of probability matches that in $p^{Y}$. (c) Histogram of the optimal conditional probability distribution $p_{\tau}^{Y|x}$, constructed by swapping the probabilities of the state with the highest probability in $\tilde{p}_{\mathcal{W}}^{Y}$ and the state $y=x$.

This procedure for constructing the optimal probability distribution can also be visualized graphically as follows: begin with $\tilde{p}_{\mathcal{W}}^{Y}$ (Fig. 2(a)), which by definition is a probability distribution arranged in descending order along the states $y$. Apply the permutation $\sigma_{p^{Y}}^{-1}$ to $\tilde{p}_{\mathcal{W}}^{Y}$ (Fig. 2(b)), which rearranges the states such that the descending order of probabilities matches that of $p^{Y}$. Then, swap the probability of the state $y=x$ with the probability of the state with the largest probability (i.e., the state satisfying $\sigma_{p^{Y}}(y)=1$) (Fig. 2(c)). From the second operation, the optimal conditional probability distribution $p_{\tau}^{Y|x}$ satisfies $p_{\tau}^{Y|x}(x)=\tilde{p}_{\mathcal{W}}^{Y}(1)=1-\mathcal{W}$.
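A numerical sketch of Eqs. (24) and (25) is given below; for simplicity it assumes that the states of $Y$ are already labeled in descending order of $p^{Y}$, so that $\sigma_{p^{Y}}$ is the identity (the function name and the example values are our own):

```python
import numpy as np

def optimal_final_distribution(pY, W):
    """Optimal p_tau^{XY} of Eqs. (24)-(25) for a fixed Wasserstein distance W,
    assuming pY is already sorted in descending order (sigma is the identity)."""
    p = np.asarray(pY, dtype=float)
    n = len(p)
    assert W < 1.0 - p[0], "sketch assumes the nontrivial regime of Eq. (19)"
    # tilde{p}_W^Y as in Eq. (19)
    M = max(m for m in range(2, n + 1)
            if (W - p[m:].sum()) / (m - 1) < p[m - 1])
    pt = p.copy()
    pt[0], pt[1:M] = 1.0 - W, (W - p[M:].sum()) / (M - 1)
    # Eq. (24): marginal of X
    pX = np.where(np.arange(n) < M, (p - pt[1]) / (pt[0] - pt[1]), 0.0)
    # Eq. (25): conditional of Y given x, obtained by swapping entries 0 and x
    p_xy = np.zeros((n, n))
    for x in range(n):
        cond = pt.copy()
        cond[0], cond[x] = pt[x], pt[0]
        p_xy[x, :] = pX[x] * cond
    return p_xy

pY = np.array([0.5, 0.3, 0.15, 0.05])
p_xy = optimal_final_distribution(pY, 0.2)
print(p_xy.sum(axis=0))          # reproduces pY
print(1.0 - np.trace(p_xy))      # error probability equals W = 0.2
```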

We next show that its marginal distribution matches the fixed $p^{Y}$. For $y$ such that $\sigma_{p^{Y}}(y)>M_{\mathcal{W}}$ holds, from Eq. (25), $p_{\tau}^{Y|x}(y)=\tilde{p}_{\mathcal{W}}^{Y}(\sigma_{p^{Y}}(y))=p^{Y}(y)$ holds independently of $x$, which yields $\sum_{x}p_{\tau}^{XY}(x,y)=p^{Y}(y)$. For $y$ such that $\sigma_{p^{Y}}(y)\leq M_{\mathcal{W}}$ holds, from Eq. (25) and the definition of $\tilde{p}_{\mathcal{W}}^{Y}$, $p_{\tau}^{Y|x}(y)$ takes the value $\tilde{p}_{\mathcal{W}}^{Y}(1)$ for $y=x$ and $\tilde{p}_{\mathcal{W}}^{Y}(2)$ for $y\neq x$. Therefore,

\sum_{x=1}^{n}p_{\tau}^{XY}(x,y)=\sum_{x=1}^{n}p_{\tau}^{X}(x)p_{\tau}^{Y|x}(y)  (26)
=p_{\tau}^{X}(y)\tilde{p}_{\mathcal{W}}^{Y}(1)+[1-p_{\tau}^{X}(y)]\tilde{p}_{\mathcal{W}}^{Y}(2)  (27)
=p^{Y}(y).  (28)

Here, we used Eq. (24) for the transformation from Eq. (27) to Eq. (28).

We next show that under the condition (C), the Wasserstein distance between the constructed $p_{\tau}^{XY}$ and $p_{0}^{XY}$ equals the fixed value $\mathcal{W}$. When the condition (C) is satisfied, $\mathcal{E}_{\tau}=\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$ holds (the proof is provided in the Appendix). Therefore, we obtain

\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)=\mathcal{E}_{\tau}  (29)
=1-\sum_{x,y:x=y}p_{\tau}^{XY}(x,y)  (30)
=1-\sum_{x}p_{\tau}^{X}(x)p_{\tau}^{Y|x}(x)  (31)
=1-\tilde{p}_{\mathcal{W}}^{Y}(1)  (32)
=\mathcal{W}.  (33)

We finally show that the constructed $p_{\tau}^{XY}$ is optimal, i.e., it achieves the equality in (21). From Eq. (25), since $S\left(p_{\tau}^{Y|x}\right)=S\left(\tilde{p}_{\mathcal{W}}^{Y}\right)$ holds regardless of $x$,

|\Delta I_{\tau}^{X:Y}|=S_{\tau}^{Y|X}  (34)
=\sum_{x}p_{\tau}^{X}(x)S\left(p_{\tau}^{Y|x}\right)  (35)
=S\left(\tilde{p}_{\mathcal{W}}^{Y}\right)  (36)

holds. Therefore, the equality in (21) can be achieved.

IV Fundamental bound on entropy production

IV.1 Speed limit for fixed mobility

In the previous section, we considered the maximization problem for the consumed information $|\Delta I_{\tau}^{X:Y}|$ for fixed $\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$, whose solution is given by the main Theorem (21). Since the original problem of minimizing $\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$ for fixed $|\Delta I_{\tau}^{X:Y}|$ corresponds to the dual problem, its solution is directly derived from the main Theorem. From this dual solution, we can obtain the fundamental bound on the entropy production required to consume $|\Delta I_{\tau}^{X:Y}|$.

First, the right-hand side of inequality (21), $S\left(\tilde{p}_{\mathcal{W}}^{Y}\right)$, is a strictly increasing function of $\mathcal{W}$ in the range $0\leq\mathcal{W}\leq 1-p_{\downarrow}^{Y}(1)$, and takes the constant value $S\left(p^{Y}\right)$ for $\mathcal{W}\geq 1-p_{\downarrow}^{Y}(1)$ (the proof is provided in the Appendix). Therefore, given a fixed $p^{Y}$, the inverse function $\mathcal{W}_{p^{Y}}^{\min}:[0,S\left(p^{Y}\right)]\to[0,1-p_{\downarrow}^{Y}(1)]$ can be defined by $S\big(\tilde{p}_{\mathcal{W}_{p^{Y}}^{\min}(I)}^{Y}\big)=I$. This function provides the minimum Wasserstein distance $\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$ for the fixed amount of consumed information $|\Delta I_{\tau}^{X:Y}|$.
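Numerically, $\mathcal{W}_{p^{Y}}^{\min}$ can be evaluated by bisection, since $S(\tilde{p}_{\mathcal{W}}^{Y})$ is strictly increasing in $\mathcal{W}$ on the relevant interval (Proposition A.2). The following is a sketch with our own function names and an illustrative $p^{Y}$:

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, float); p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def p_tilde(p_sorted, eps):
    """tilde{p}_eps^Y of Eq. (19); p_sorted must already be in descending order."""
    n = len(p_sorted)
    if p_sorted[0] >= 1.0 - eps:
        return p_sorted.copy()
    M = max(m for m in range(2, n + 1)
            if (eps - p_sorted[m:].sum()) / (m - 1) < p_sorted[m - 1])
    out = p_sorted.copy()
    out[0], out[1:M] = 1.0 - eps, (eps - p_sorted[M:].sum()) / (M - 1)
    return out

def W_min(pY, dI, tol=1e-10):
    """Invert S(tilde{p}_W^Y) = |Delta I| by bisection on W in [0, 1 - p_max]."""
    p = np.sort(np.asarray(pY, float))[::-1]
    lo, hi = 0.0, 1.0 - p[0]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if entropy(p_tilde(p, mid)) < dI else (lo, mid)
    return 0.5 * (lo + hi)

pY = [0.25, 0.75]
W = W_min(pY, 0.3)
print(W, entropy(p_tilde(np.array([0.75, 0.25]), W)))   # second value ~0.3
```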

Here, considering Eq. (15) with $D$ chosen as the time-averaged mobility $\langle m\rangle_{\tau}$, the right-hand side becomes an increasing function of $\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$. Therefore, minimizing $\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)$ is equivalent to minimizing the lower bound on the entropy production $\Sigma_{\tau}^{X}$. This allows us to derive the speed limit in the feedback process as

\Sigma_{\tau}^{X}\geq-|\Delta I_{\tau}^{X:Y}|+\frac{{\mathcal{W}_{p^{Y}}^{\min}\left(|\Delta I_{\tau}^{X:Y}|\right)}^{2}}{\tau\langle m\rangle_{\tau}},  (37)

which provides the minimum entropy production required to consume the information $|\Delta I_{\tau}^{X:Y}|$ through feedback processes keeping the time-averaged mobility $\langle m\rangle_{\tau}$ fixed.

The equality in inequality (37) is achieved by simultaneously satisfying the equalities in inequalities (12) and (21). First, the equality in (21) is achieved by setting the final distribution $p_{\tau}^{XY}$ to the form determined by Eqs. (24) and (25), while satisfying condition (C). Next, the equality in (12) is achieved by constructing a protocol $\{R_{t}^{X|y}(x,x^{\prime})\}_{t=0}^{\tau}$ that transports the probability between $p_{0}^{XY}$ and $p_{\tau}^{XY}$ along the optimal transport plan, with a uniform and constant thermodynamic force. Consequently, the equality in (37) is achieved. This highlights the thermodynamic implication of our main Theorem.

IV.2 Speed limit for fixed activity

When we choose the time-averaged activity $\langle a\rangle_{\tau}$ as $D$, the second term on the right-hand side of inequality (15) becomes $2\mathcal{W}\tanh^{-1}[\mathcal{W}/(\tau\langle a\rangle_{\tau})]$. Similarly to the case where the mobility is fixed, this term is an increasing function of $\mathcal{W}$ for any fixed $\tau$ and $\langle a\rangle_{\tau}$. Therefore, by applying the main Theorem, we can derive another speed limit

\Sigma_{\tau}^{X}\geq-|\Delta I_{\tau}^{X:Y}|+2\mathcal{W}_{p^{Y}}^{\min}\left(|\Delta I_{\tau}^{X:Y}|\right)\tanh^{-1}\frac{\mathcal{W}_{p^{Y}}^{\min}\left(|\Delta I_{\tau}^{X:Y}|\right)}{\tau\langle a\rangle_{\tau}},  (38)

which provides the minimum entropy production required to consume the information $|\Delta I_{\tau}^{X:Y}|$ in feedback control when the time-averaged activity $\langle a\rangle_{\tau}$ is fixed.
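For a two-state controller ($n=2$), $S(\tilde{p}_{\mathcal{W}}^{Y})$ reduces to the binary entropy of $\mathcal{W}$, so $\mathcal{W}_{p^{Y}}^{\min}$ and hence the right-hand sides of (37) and (38) are easy to evaluate. The following sketch uses our own function names and illustrative parameters:

```python
import numpy as np

def W_min_two_level(dI, p_max, tol=1e-12):
    """For n = 2, S(tilde{p}_W^Y) = -W ln W - (1-W) ln(1-W), so W_min is the
    inverse of the binary entropy on [0, 1 - p_max], obtained by bisection."""
    h = lambda w: -w * np.log(w) - (1 - w) * np.log(1 - w) if 0 < w < 1 else 0.0
    lo, hi = 0.0, 1.0 - p_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) < dI else (lo, mid)
    return 0.5 * (lo + hi)

def bound_37(dI, p_max, tau, m_avg):
    W = W_min_two_level(dI, p_max)
    return -dI + W**2 / (tau * m_avg)       # right-hand side of (37)

def bound_38(dI, p_max, tau, a_avg):
    W = W_min_two_level(dI, p_max)
    return -dI + 2.0 * W * np.arctanh(W / (tau * a_avg))   # right-hand side of (38)

for dI in (0.1, 0.3, 0.5):
    print(dI, bound_37(dI, 0.75, 1.0, 1.0), bound_38(dI, 0.75, 1.0, 1.0))
```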

The equality in inequality (38) is achieved by simultaneously achieving the equalities in inequalities (12) and (21). First, the equality in (21) is realized by setting the final distribution $p_{\tau}^{XY}$ to the form determined by Eqs. (24) and (25), while satisfying condition (C). Next, the equality in (12) is achieved by constructing a protocol $\{R_{t}^{X|y}(x,x^{\prime})\}_{t=0}^{\tau}$ that transports the probability between the specified initial and final distributions $p_{0}^{XY}$ and $p_{\tau}^{XY}$ along the optimal transport plan, with a uniform and constant thermodynamic force. Consequently, the equality in (38) is achieved. This again highlights the thermodynamic implication of the main Theorem.

V Example: Coupled two-level system

We demonstrate our main inequalities (37) and (38) by numerical simulation of a coupled system which can take two states, $x=1,2$ and $y=1,2$. Such a setup may be realized, for instance, by confining colloidal particles to two positions corresponding to $1$ and $2$ using an optical potential, with coupling mediated by Coulomb interactions. In this case, if the time evolution of the entire system is governed by Eq. (2), the controllable transition rates are given by the set $\{R_{t}^{X|1}(2,1),R_{t}^{X|1}(1,2),R_{t}^{X|2}(2,1),R_{t}^{X|2}(1,2)\}$. Throughout this section, we represent the probability distribution as a matrix $p_{t}^{XY}=\left[p_{t}^{XY}(x,y)\right]_{x,y}$.

First, when fixing $p^{Y}(1)=p\ (<1/2)$, the initial distribution $p_{0}^{XY}$ is given as

p_{0}^{XY}=\left[\begin{array}{cc}p&0\\ 0&1-p\end{array}\right].  (41)

The equality in the bound (37) is achieved when the equalities in both (12) and (21) are simultaneously satisfied. By setting $\mathcal{W}=\mathcal{W}_{p^{Y}}^{\mathrm{min}}\left(|\Delta I_{\tau}^{X:Y}|\right)$ with respect to the consumed mutual information $|\Delta I_{\tau}^{X:Y}|$, the optimal final distribution that satisfies the equality in (21) is obtained as

p_{\tau}^{XY}=\left[\begin{array}{cc}\frac{p-\mathcal{W}}{1-2\mathcal{W}}(1-\mathcal{W})&\frac{p-\mathcal{W}}{1-2\mathcal{W}}\mathcal{W}\\ \frac{1-p-\mathcal{W}}{1-2\mathcal{W}}\mathcal{W}&\frac{1-p-\mathcal{W}}{1-2\mathcal{W}}(1-\mathcal{W})\end{array}\right].  (44)

For this final distribution, the protocol that achieves the equality in (12) is realized by transporting probabilities along the optimal transport plan from $p_{0}^{XY}$ to $p_{\tau}^{XY}$ under a constant and uniform thermodynamic force $F$. Given the protocol $\{R_{t}^{X|y}(x,x^{\prime})\}$ (the detailed form of the protocol is provided in the Appendix), we can calculate the entropy production $\Sigma_{\tau}^{X}$ for it. A numerical calculation of $\Sigma_{\tau}^{X}$ for each $|\Delta I_{\tau}^{X:Y}|$ is shown as the orange line in Fig. 3(a).
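As a sanity check of this construction, the optimal final distribution (44) and the corresponding consumed information can be reproduced with a few lines of code (a sketch with our own function names; the values of $p$ and $\mathcal{W}$ below are illustrative):

```python
import numpy as np

def entropy(p):
    p = np.asarray(p, float).ravel(); p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def optimal_final_two_level(p, W):
    """Optimal final joint distribution of Eq. (44), rows indexed by x, columns by y."""
    return np.array([[(p - W) / (1 - 2*W) * (1 - W), (p - W) / (1 - 2*W) * W],
                     [(1 - p - W) / (1 - 2*W) * W, (1 - p - W) / (1 - 2*W) * (1 - W)]])

def consumed_information(p_xy, pY):
    """|Delta I| = S(p^Y) - I_tau, which equals the conditional entropy S_tau^{Y|X}."""
    I_tau = entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy)
    return entropy(pY) - I_tau

p, W = 0.75, 0.1
p_tau = optimal_final_two_level(p, W)
print(p_tau.sum(axis=0))                 # marginal of Y stays (p, 1 - p)
print(consumed_information(p_tau, [p, 1 - p]))
print(entropy([1 - W, W]))               # equals S(tilde{p}_W^Y), saturating (21)
```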

Fig. 3: (a) Numerical results with fixed mobility. Entropy production for non-optimal final distributions does not achieve the bound (37), whereas entropy production for optimal final distributions achieves it. The parameters used are $p=0.75$, $\langle m\rangle_{\tau}=1$, and $\tau=1$. (b) Numerical results with fixed activity. Entropy production for non-optimal final distributions does not achieve the bound (38), whereas entropy production for optimal final distributions does achieve it. The parameters used are $p=0.75$, $\langle a\rangle_{\tau}=1$, and $\tau=1$.

Comparing this result with the right-hand side of (37) computed for each $|\Delta I_{\tau}^{X:Y}|$ (shown as the green dotted line in Fig. 3(a)), it is evident that this protocol achieves the equality in the bound (37). Notably, the fact that the entropy production is negative indicates that the feedback using information enables achieving an entropy production smaller than that imposed by the conventional thermodynamic second law, $\Sigma_{\tau}^{X}\geq 0$. In other words, it implies the ability to extract work exceeding the negative of the change in nonequilibrium free energy.

We also show examples of non-optimal protocols. Specifically, we consider the case where the equality in (12) is achieved but the equality in (21) is not. For instance, for the same initial distribution, constructing a final distribution of the form

p_{\tau}^{XY}=\left[\begin{array}{cc}p-\Delta&0\\ \Delta&1-p\end{array}\right],  (47)

by choosing $\Delta$ such that the mutual information change remains fixed at $|\Delta I_{\tau}^{X:Y}|$ yields a non-optimal final distribution for consuming $|\Delta I_{\tau}^{X:Y}|$. For this distribution, numerical calculations of the entropy production $\Sigma_{\tau}^{X}$ under a protocol $\{R_{t}^{X|y}(x,x^{\prime})\}$ that achieves the equality in (12) (the detailed form of the protocol is provided in the Appendix) are shown as the gray line in Fig. 3(a). Since it does not achieve the equality in (37), it is evident that this is not an optimal feedback protocol for consuming $|\Delta I_{\tau}^{X:Y}|$.

We next numerically demonstrate the bound (38) for fixed activity. For the initial distribution given by (41), we can take an optimal final distribution given by (44) and a non-optimal final distribution given by (47). For these final distributions, the protocols that achieve the equality in (12) transport probabilities from $p_{0}^{XY}$ to $p_{\tau}^{XY}$ along the optimal transport plan under a uniform and constant thermodynamic force. When $D$ represents the time-averaged activity $\langle a\rangle_{\tau}$ fixed as $\langle a\rangle_{\tau}=A$, maintaining a constant thermodynamic force corresponds to setting the probability current $J_{t}^{X|y}(x,x^{\prime})$ to a constant value $J$. Numerical calculations of the entropy production $\Sigma_{\tau}^{X}$ for such a protocol $\{R_{t}^{X|y}(x,x^{\prime})\}$ (the detailed form of the protocol is provided in the Appendix) are shown in Fig. 3(b). The orange line corresponds to the optimal final distribution, while the gray line corresponds to the non-optimal final distribution. Comparing these results with the right-hand side of (38) computed for each $|\Delta I_{\tau}^{X:Y}|$ (shown as the green dotted line in Fig. 3(b)), it can be seen that the bound (38) is achieved for the optimal final distribution, while it is not achieved for the non-optimal final distribution.

VI Conclusion

In this study, we have derived the main Theorem (21), which provides the upper bound on the change in mutual information under the fixed error-free initial distribution, the fixed marginal distribution of the controller, and the fixed Wasserstein distance. The equality can be achieved under the condition (C) by setting the optimal final distribution. This Theorem gives the maximum change in mutual information that can be induced under a fixed minimum dissipation when the controller performs feedback control based on error-free measurement outcomes. Based on this Theorem, we have derived the speed limits (37) and (38), which give the lower bounds on entropy production for consuming a fixed amount of information through the feedback process. These bounds are achievable under the condition (C), and we have identified the optimal protocols. While a similar optimization problem for measurement processes has been solved in our previous work [45], the present work for feedback processes is based on largely different mathematical techniques including the achievable Fano's inequality (18) [48], and therefore separate treatments are required for measurement and feedback processes.

Our results theoretically advance thermodynamics of information by specifying the form of the contribution of processed information to finite-time entropy production. The optimal feedback protocols identified in this study can be applied to a wide range of feedback processes described by Markov jump processes, which could be experimentally implemented with single electron devices [49, 50, 51], double quantum dots in the classical regime [56, 57], and effective discrete dynamics of Brownian nanoparticles [58]. Future directions also include extending our theoretical framework to continuous or quantum systems.

ACKNOWLEDGEMENTS

We thank Kosuke Kumasaki and Yosuke Mitsuhashi for valuable discussions. R.N. is supported by the World-leading Innovative Graduate Study Program for Materials Research, Industry, and Technology (MERIT-WINGS) of the University of Tokyo. T.S. is supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant No. JP19H05796, JST, CREST Grant No. JPMJCR20C1 and JST ERATO-FS Grant No. JPMJER2204. T.S. is also supported by the Institute of AI and Beyond of the University of Tokyo and JST ERATO Grant No. JPMJER2302, Japan.

Appendix A Derivation of the main Theorem and the speed limits

We provide the full proof of the main Theorem (21) and the speed limits (37) and (38).

A.1 Majorization

We introduce the concept of majorization, which is used to provide an upper bound on mutual information. For two probability distributions $p^{X}$ and $q^{X}$, we denote that $p^{X}$ majorizes $q^{X}$, or $q^{X}\prec p^{X}$, when they satisfy [60, 59]

q^{X}\prec p^{X}\Longleftrightarrow\forall x\in\mathcal{X},\ \sum_{x^{\prime}=1}^{x}q_{\downarrow}^{X}(x^{\prime})\leq\sum_{x^{\prime}=1}^{x}p_{\downarrow}^{X}(x^{\prime}).  (48)

Here, $p_{\downarrow}^{X}$ and $q_{\downarrow}^{X}$ are the distributions obtained by rearranging $p^{X}$ and $q^{X}$ in descending order, respectively. When $q^{X}\prec p^{X}$, the following relationship holds for any convex function $f$ [60, 59]:

\sum_{x\in\mathcal{X}}f\left(q^{X}(x)\right)\leq\sum_{x\in\mathcal{X}}f\left(p^{X}(x)\right).  (49)

By setting $f(x)=x\ln x$ and taking the negative of both sides, we obtain the following relation for the Shannon entropy:

q^{X}\prec p^{X}\implies S(q^{X})\geq S(p^{X}).  (50)
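The majorization criterion (48) and the entropy relation (50) can be checked numerically as in the following sketch (our own code, with an illustrative pair of distributions):

```python
import numpy as np

def majorizes(p, q):
    """Return True if p majorizes q (q < p in the sense of Eq. (48)): partial sums
    of the descending rearrangement of p dominate those of q."""
    ps = np.sort(np.asarray(p, float))[::-1].cumsum()
    qs = np.sort(np.asarray(q, float))[::-1].cumsum()
    return bool(np.all(qs <= ps + 1e-12))

def shannon(p):
    p = np.asarray(p, float); p = p[p > 0]
    return float(-(p * np.log(p)).sum())

p = [0.6, 0.25, 0.1, 0.05]
q = [0.4, 0.3, 0.2, 0.1]
print(majorizes(p, q))            # True: q is majorized by p
print(shannon(q) >= shannon(p))   # True, consistent with Eq. (50)
```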

A.2 Proof of the main Theorem

The proof of the main Theorem is structured into two propositions.

Proposition A.1.

\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)\geq\mathcal{E}_{\tau}.

Proof.

Since the Wasserstein distance is greater than or equal to the total variation distance, we obtain

\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)\geq\frac{1}{2}\sum_{x,y}\left|p^{XY}_{\tau}(x,y)-p^{XY}_{0}(x,y)\right|.  (51)

Since the marginal distribution of $Y$ is fixed as $p_{0}^{Y}(y)=p_{\tau}^{Y}(y)=p^{Y}(y)$, the absolute value on the right-hand side can be calculated as

\left|p^{XY}_{\tau}(x,y)-p^{XY}_{0}(x,y)\right|=\begin{cases}p^{Y}(y)-p^{XY}_{\tau}(y,y)&x=y,\\ p^{XY}_{\tau}(x,y)&x\neq y.\end{cases}  (52)

By using $p^{Y}(y)-p^{XY}_{\tau}(y,y)=\sum_{x(\neq y)}p^{XY}_{\tau}(x,y)$, we get

\frac{1}{2}\sum_{x,y}\left|p^{XY}_{\tau}(x,y)-p^{XY}_{0}(x,y)\right|=\sum_{x\neq y}p^{XY}_{\tau}(x,y)=\mathcal{E}_{\tau}.  (53)

∎

Proposition A.2.

$S\left(\tilde{p}^{Y}_{\mathcal{E}}\right)$ is monotonically and strictly increasing in $\mathcal{E}\in[0,1-p_{\downarrow}^{Y}(1)]$.

Proof.

Recalling the definition of $M_{\mathcal{E}}$, if there exists a maximum $m\in\{1,2,\cdots,n\}$ that satisfies $\sum_{y=1}^{m}p_{\downarrow}^{Y}(y)-(m-1)p_{\downarrow}^{Y}(m)<1-\mathcal{E}$, this $m$ is defined as $M_{\mathcal{E}}$. First, we prove that $M_{\mathcal{E}}$ decreases with respect to $\mathcal{E}$. Let $f(m)\coloneqq\sum_{y=1}^{m}p_{\downarrow}^{Y}(y)-mp_{\downarrow}^{Y}(m)$. For $m^{\prime}>m$, we have

f(m^{\prime})=\sum_{y=1}^{m^{\prime}}p_{\downarrow}^{Y}(y)-m^{\prime}p_{\downarrow}^{Y}(m^{\prime})  (54)
=\sum_{y=1}^{m^{\prime}}\left[p_{\downarrow}^{Y}(y)-p_{\downarrow}^{Y}(m^{\prime})\right]  (55)
\geq\sum_{y=1}^{m}\left[p_{\downarrow}^{Y}(y)-p_{\downarrow}^{Y}(m^{\prime})\right]  (56)
\geq\sum_{y=1}^{m}\left[p_{\downarrow}^{Y}(y)-p_{\downarrow}^{Y}(m)\right]  (57)
=f(m).  (58)

Therefore, $f(m)$ increases with respect to $m$. Consequently, $M_{\mathcal{E}}$, which is determined as the maximum $m$ satisfying $f(m)<1-\mathcal{E}$, decreases with respect to $\mathcal{E}$.

We next recall the definition of $\tilde{p}_{\mathcal{E}}^{Y}$:

\tilde{p}_{\mathcal{E}}^{Y}(y)=\begin{cases}p^{Y}(y),&\sigma_{p^{Y}}(y)>M_{\mathcal{E}},\\ \displaystyle\frac{\sum_{y^{\prime}=1}^{M_{\mathcal{E}}}p_{\downarrow}^{Y}(y^{\prime})-(1-\mathcal{E})}{M_{\mathcal{E}}-1},&2\leq\sigma_{p^{Y}}(y)\leq M_{\mathcal{E}},\\ 1-\mathcal{E},&\sigma_{p^{Y}}(y)=1.\end{cases}  (59)

We note that the permutation $\sigma_{p^{Y}}$, which rearranges $p^{Y}$ in descending order, coincides with $\sigma_{\tilde{p}_{\mathcal{E}}^{Y}}$, the permutation rearranging $\tilde{p}_{\mathcal{E}}^{Y}$ in descending order. Now, consider $\mathcal{E}^{\prime}>\mathcal{E}$. We aim to show that $\tilde{p}^{Y}_{\mathcal{E}^{\prime}}\prec\tilde{p}^{Y}_{\mathcal{E}}$, i.e.,

\forall y,\quad\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E},\downarrow}^{Y}(y^{\prime})\geq\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E}^{\prime},\downarrow}^{Y}(y^{\prime}).  (60)

First, when $y\geq M_{\mathcal{E}}$, $y\geq M_{\mathcal{E}^{\prime}}$ also holds. Therefore, we have

\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E},\downarrow}^{Y}(y^{\prime})=\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E}^{\prime},\downarrow}^{Y}(y^{\prime})=\sum_{y^{\prime}=1}^{y}p_{\downarrow}^{Y}(y^{\prime}).  (61)

When $M_{\mathcal{E}^{\prime}}\leq y\leq M_{\mathcal{E}}$,

\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E},\downarrow}^{Y}(y^{\prime})=\sum_{y^{\prime}=1}^{M_{\mathcal{E}}}p_{\downarrow}^{Y}(y^{\prime})-(M_{\mathcal{E}}-y)\tilde{p}_{\mathcal{E},\downarrow}^{Y}(M_{\mathcal{E}}),  (62)
\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E}^{\prime},\downarrow}^{Y}(y^{\prime})=\sum_{y^{\prime}=1}^{y}p_{\downarrow}^{Y}(y^{\prime})  (63)

holds, which yields

\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E},\downarrow}^{Y}(y^{\prime})-\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E}^{\prime},\downarrow}^{Y}(y^{\prime})  (64)
=\sum_{y^{\prime}=y+1}^{M_{\mathcal{E}}}p_{\downarrow}^{Y}(y^{\prime})-(M_{\mathcal{E}}-y)\tilde{p}_{\mathcal{E},\downarrow}^{Y}(M_{\mathcal{E}})  (65)
=\sum_{y^{\prime}=y+1}^{M_{\mathcal{E}}}\left[p_{\downarrow}^{Y}(y^{\prime})-\tilde{p}_{\mathcal{E},\downarrow}^{Y}(M_{\mathcal{E}})\right]  (66)
\geq 0.  (67)

Additionally, when $2\leq y\leq M_{\mathcal{E}^{\prime}}$,

\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E},\downarrow}^{Y}(y^{\prime})=\sum_{y^{\prime}=1}^{M_{\mathcal{E}}}p_{\downarrow}^{Y}(y^{\prime})-(M_{\mathcal{E}}-y)\tilde{p}_{\mathcal{E},\downarrow}^{Y}(M_{\mathcal{E}}),  (68)
\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E}^{\prime},\downarrow}^{Y}(y^{\prime})=\sum_{y^{\prime}=1}^{M_{\mathcal{E}^{\prime}}}p_{\downarrow}^{Y}(y^{\prime})-(M_{\mathcal{E}^{\prime}}-y)\tilde{p}_{\mathcal{E}^{\prime},\downarrow}^{Y}(M_{\mathcal{E}^{\prime}})  (69)

holds. Therefore,

\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E},\downarrow}^{Y}(y^{\prime})-\sum_{y^{\prime}=1}^{y}\tilde{p}_{\mathcal{E}^{\prime},\downarrow}^{Y}(y^{\prime})  (71)
=\sum_{y^{\prime}=M_{\mathcal{E}^{\prime}}+1}^{M_{\mathcal{E}}}\left[p_{\downarrow}^{Y}(y^{\prime})-\tilde{p}_{\mathcal{E},\downarrow}^{Y}(M_{\mathcal{E}})\right]+\sum_{y^{\prime}=y+1}^{M_{\mathcal{E}^{\prime}}}\left[\tilde{p}_{\mathcal{E}^{\prime},\downarrow}^{Y}(M_{\mathcal{E}^{\prime}})-\tilde{p}_{\mathcal{E},\downarrow}^{Y}(M_{\mathcal{E}})\right]  (72)
\geq 0.  (73)

From the above, it follows that $\tilde{p}^{Y}_{\mathcal{E}^{\prime}}\prec\tilde{p}^{Y}_{\mathcal{E}}$. Moreover, by definition, when $\mathcal{E}^{\prime}\neq\mathcal{E}$, $\tilde{p}^{Y}_{\mathcal{E}^{\prime}}\neq\tilde{p}^{Y}_{\mathcal{E}}$ holds. ∎

From Propositions A.1 and A.2, the upper bound (21) on the consumed mutual information is obtained when $\mathcal{W}\left(p_{0}^{XY},p_{\tau}^{XY}\right)=\mathcal{W}$ is fixed.

A.3 Formalism by Lorenz curves

While $\tilde{p}_{\mathcal{E}}^{Y}$ was constructed using histograms of probability distributions in Sec. III, this construction can also be explained using Lorenz curves. A Lorenz curve is a polyline uniquely associated with a probability distribution and is constructed as follows [59]. First, arrange the probability distribution $p^{Y}$ in descending order and obtain $p_{\downarrow}^{Y}$. Next, plot the cumulative probability $\sum_{y^{\prime}=1}^{y}p_{\downarrow}^{Y}(y^{\prime})$ as a function of the state $y$. The polyline obtained by connecting these points in addition to the origin $(0,0)$ is the Lorenz curve for the probability distribution $p^{Y}$.

The Lorenz curve visualizes the concentration of probabilities in the distribution. This can be understood as follows: consider two probability distributions $p^{Y}$ and $q^{Y}$, and assume that the Lorenz curve of $p^{Y}$ lies above that of $q^{Y}$. That is, for all $y$, $\sum_{y^{\prime}=1}^{y}p_{\downarrow}^{Y}(y^{\prime})\geq\sum_{y^{\prime}=1}^{y}q_{\downarrow}^{Y}(y^{\prime})$. Then, from the definition of majorization introduced in Appendix A.1, $p^{Y}\succ q^{Y}$ holds, which implies $S(p^{Y})\leq S(q^{Y})$. Therefore, a probability distribution with a Lorenz curve higher on the graph has smaller Shannon entropy and is more concentrated.

Using this property of Lorenz curves, the construction of $\tilde{p}_{\mathcal{E}}^{Y}$ can be intuitively explained. First, if $1-\mathcal{E}<p_{\downarrow}^{Y}(1)$, $\tilde{p}_{\mathcal{E}}^{Y}=p^{Y}$. For $1-\mathcal{E}\geq p_{\downarrow}^{Y}(1)$, first draw the Lorenz curve of $p^{Y}$ (the black line in Fig. 4(a)). Next, extend a half-line upward and to the right from the point $\mathrm{P}_{1}$ defined by the coordinate $(1,1-\mathcal{E})$ (the red point in Fig. 4(a)), and rotate it around $\mathrm{P}_{1}$ to approach the Lorenz curve of $p^{Y}$ from above (the red dotted line in Fig. 4(a)). When the half-line touches the Lorenz curve of $p^{Y}$, stop the rotation, and call the point of contact $\mathrm{P}_{2}$ (Fig. 4(b)). The state $y$ corresponding to the point $\mathrm{P}_{2}$ is $M_{\mathcal{E}}$ as defined in Sec. III. Finally, connect the three points (the origin, $\mathrm{P}_{1}$, and $\mathrm{P}_{2}$) with a polyline, and extend the curve beyond $\mathrm{P}_{2}$ along the Lorenz curve of $p^{Y}$. The probability distribution corresponding to this modified Lorenz curve (the red line in Fig. 4(c)) is defined as $\tilde{p}_{\mathcal{E}}^{Y}$.

Fig. 4: (a) The Lorenz curve of p^{Y} and the half-line extended from the point \mathrm{P}_{1} with coordinates (1,1-\mathcal{E}). (b) The construction of the point \mathrm{P}_{2}, where the half-line from \mathrm{P}_{1} touches the Lorenz curve of p^{Y}. (c) The construction of the Lorenz curve of \tilde{p}_{\mathcal{E}}.
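The geometric construction above translates directly into a numerical recipe. The following Python sketch builds the distribution corresponding to the modified Lorenz curve from a given p^{Y} and error rate \mathcal{E}; it is our own illustration of Fig. 4, and the tie-breaking rule for the contact point \mathrm{P}_{2} (the largest maximizer) is an assumption, not taken from the paper.

```python
import numpy as np

def tilde_p(pY, eps):
    """Distribution whose Lorenz curve is the straight segment P1-P2 followed
    by the original Lorenz curve, as in Fig. 4 (returned in descending order)."""
    p = np.sort(pY)[::-1]                     # p_down: descending order
    if 1.0 - eps < p[0]:
        return p                              # no modification needed
    L = np.cumsum(p)                          # Lorenz-curve heights L(y), y = 1..n
    y = np.arange(2, len(p) + 1)              # candidate contact points y >= 2
    slopes = (L[1:] - (1.0 - eps)) / (y - 1.0)
    # the rotating half-line from P1 = (1, 1 - eps) first touches the Lorenz
    # curve at the y maximizing this slope (largest such y if there are ties)
    M = int(y[len(slopes) - 1 - np.argmax(slopes[::-1])])
    s = (L[M - 1] - (1.0 - eps)) / (M - 1)    # slope of the straight segment P1-P2
    ptilde = p.copy()
    ptilde[0] = 1.0 - eps                     # height of P1
    ptilde[1:M] = s                           # flat part of the distribution up to M
    return ptilde                             # entries beyond M follow p_down

print(tilde_p(np.array([0.5, 0.3, 0.15, 0.05]), 0.3))   # [0.7, 0.125, 0.125, 0.05]
```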

Appendix B Optimal protocol for a two-level system

In the example of the two-level system discussed in Sec. V, we constructed the optimal transport from the initial distribution defined by Eq. (41) to the optimal final distribution defined by Eq. (44), as well as to the non-optimal final distribution defined by Eq. (47). To provide a general method for constructing these protocols, we present the protocol for the optimal transport from the initial distribution defined by Eq. (41) to the final distribution defined by

p_{\tau}^{XY}=\left[\begin{array}{cc}p-\Delta_{1}&\Delta_{2}\\ \Delta_{1}&1-p-\Delta_{2}\end{array}\right].   (76)

B.1 For fixed mobility

The protocol that achieves the fundamental limit (37) for fixed mobility transports probabilities along the optimal transport plan from the initial distribution to the final distribution under a constant and uniform thermodynamic force F. The optimal transport plan is to transport a probability \Delta_{1} from state (1,1) to (2,1) and a probability \Delta_{2} from state (2,2) to (1,2).

To implement this using a constant thermodynamic force F by controlling the transition rates of the Markov jump process, the time interval [0,\tau] is divided into two intervals, t\in[0,\tau/2] and t\in[\tau/2,\tau]. The transports of \Delta_{1} and \Delta_{2} are performed in the first and second intervals, respectively.

First, in the interval t\in[0,\tau/2], only the transport of the probability \Delta_{1} from state (1,1) to (2,1) is carried out. An example of the time evolution implemented in this interval is

p_{t}^{XY}(2,1)=\frac{\Delta_{1}t}{\tau/2},   (77)
p_{t}^{XY}(1,1)=p-\frac{\Delta_{1}t}{\tau/2},   (78)

while keeping the probabilities p_{t}^{XY}(1,2) and p_{t}^{XY}(2,2) constant. To implement this under a constant thermodynamic force F, the transition rates R_{t}^{X|1}(x,x') should be set as

R_{t}^{X|1}(2,1)=\frac{\Delta_{1}}{\tau/2}\frac{1}{1-e^{-F}}\frac{1}{p-\frac{\Delta_{1}t}{\tau/2}},   (79)
R_{t}^{X|1}(1,2)=\frac{e^{-F}}{1-e^{-F}}\frac{1}{t}.   (80)

The remaining transition rates are set to R_{t}^{X|2}(x,x')=0.

In the interval t\in[\tau/2,\tau], the transport of the probability \Delta_{2} from state (2,2) to (1,2) is performed. An example of the time evolution to be implemented in this interval is

p_{t}^{XY}(1,2)=\frac{\Delta_{2}(t-\tau/2)}{\tau/2},   (81)
p_{t}^{XY}(2,2)=1-p-\frac{\Delta_{2}(t-\tau/2)}{\tau/2},   (82)

with p_{t}^{XY}(1,1) and p_{t}^{XY}(2,1) kept constant. To implement this under a constant and uniform thermodynamic force F, the transition rates R_{t}^{X|2}(x,x') should be set as

R_{t}^{X|2}(1,2)=\frac{\Delta_{2}}{\tau/2}\frac{1}{1-e^{-F}}\frac{1}{1-p-\frac{\Delta_{2}(t-\tau/2)}{\tau/2}},   (83)
R_{t}^{X|2}(2,1)=\frac{e^{-F}}{1-e^{-F}}\frac{1}{t-\tau/2}.   (84)

The remaining transition rates are set to R_{t}^{X|1}(x,x')=0. The entropy production generated by the entire protocol over the two intervals is given by

\Sigma_{\tau}^{XY}=\int_{0}^{\tau}\mathrm{d}t\sum_{x\neq x',y}J_{t}^{X|y}(x,x')F_{t}^{X|y}(x,x')   (85)
=(\Delta_{1}+\Delta_{2})F   (86)
=\frac{\mathcal{W}(p_{0}^{XY},p_{\tau}^{XY})^{2}}{\langle m\rangle_{\tau}},   (87)

which achieves the bound (37). The orange and gray solid lines representing the entropy production \Sigma_{\tau}^{X} in Fig. 3(a) are plotted using \Sigma_{\tau}^{XY} calculated from Eq. (85), with \Sigma_{\tau}^{X}=\Sigma_{\tau}^{XY}-|\Delta I_{\tau}^{X:Y}|.
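As a consistency check of this protocol, the following Python sketch (our own illustration; the parameter values are arbitrary and not taken from the paper) evaluates the current and thermodynamic force implied by Eqs. (77)-(84) on a time grid and integrates the entropy production rate, recovering Eq. (86):

```python
import numpy as np

# Illustrative parameters (assumed values, not taken from the paper)
p, d1, d2 = 0.6, 0.1, 0.2      # initial occupation p and transported probabilities
F, tau = 2.0, 1.0              # constant thermodynamic force and operation time

def sigma_interval(q0, delta, T, n=200_000):
    """Integrate the entropy production rate J*F over one interval of length T,
    using the analytic occupations of Eqs. (77)-(78) and the rates of
    Eqs. (79)-(80): source occupation q0 -> q0 - delta, target 0 -> delta."""
    s = (np.arange(n) + 0.5) * T / n                  # midpoint time grid
    q_src = q0 - delta * s / T                        # e.g. p_t(1,1), Eq. (78)
    q_tgt = delta * s / T                             # e.g. p_t(2,1), Eq. (77)
    R_fwd = (delta / T) / (1.0 - np.exp(-F)) / q_src  # Eq. (79)
    R_bwd = np.exp(-F) / (1.0 - np.exp(-F)) / s       # Eq. (80)
    J = R_fwd * q_src - R_bwd * q_tgt                 # probability current (= delta/T)
    Fr = np.log(R_fwd * q_src / (R_bwd * q_tgt))      # thermodynamic force (= F)
    return float(np.sum(J * Fr) * T / n)              # integral of J*F over the interval

sigma = sigma_interval(p, d1, tau / 2) + sigma_interval(1 - p, d2, tau / 2)
print(sigma, (d1 + d2) * F)    # the two numbers agree, cf. Eq. (86)
```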

B.2 For fixed activity

Next, we present the protocol that achieves the fundamental bound (38) for fixed activity. This protocol transports probabilities along the optimal transport plan from the initial distribution to the final distribution under a constant and uniform activity A and probability current J. As in the case of fixed mobility, the optimal transport plan is to transport a probability \Delta_{1} from state (1,1) to (2,1) and a probability \Delta_{2} from state (2,2) to (1,2).

To achieve this under a constant activity A and probability current J=(\Delta_{1}+\Delta_{2})/\tau (which is uniquely determined by the fixed operation time \tau and the initial and final distributions), the time interval [0,\tau] is divided into the two intervals t\in[0,\tau_{1}] and t\in[\tau_{1},\tau], where \tau_{1}\equiv\frac{\Delta_{1}}{\Delta_{1}+\Delta_{2}}\tau. The transports of \Delta_{1} and \Delta_{2} are performed in the first and second intervals, respectively.

First, in the interval t\in[0,\tau_{1}], the transport of a probability \Delta_{1} from state (1,1) to (2,1) is performed. The time evolution to be implemented is

p_{t}^{XY}(2,1)=\frac{(\Delta_{1}+\Delta_{2})t}{\tau},\quad p_{t}^{XY}(1,1)=p-\frac{(\Delta_{1}+\Delta_{2})t}{\tau}   (88)

with p_{t}^{XY}(1,2) and p_{t}^{XY}(2,2) kept constant. To implement this under a constant activity A and probability current J=(\Delta_{1}+\Delta_{2})/\tau, the transition rates R_{t}^{X|1}(x,x') should be set as

R_{t}^{X|1}(2,1)=\frac{A+J}{2(p-Jt)},   (89)
R_{t}^{X|1}(1,2)=\frac{A-J}{2Jt}.   (90)

The remaining transition rates are set to R_{t}^{X|2}(x,x')=0.

In the interval t\in[\tau_{1},\tau], the transport of a probability \Delta_{2} from state (2,2) to (1,2) is conducted. The time evolution to be implemented in this interval is

p_{t}^{XY}(1,2)=\frac{(\Delta_{1}+\Delta_{2})(t-\tau_{1})}{\tau},   (91)
p_{t}^{XY}(2,2)=1-p-\frac{(\Delta_{1}+\Delta_{2})(t-\tau_{1})}{\tau},   (92)

with p_{t}^{XY}(1,1) and p_{t}^{XY}(2,1) kept constant. To implement this under a constant activity A and probability current J=(\Delta_{1}+\Delta_{2})/\tau, the transition rates R_{t}^{X|2}(x,x') should be set as

R_{t}^{X|2}(1,2)=\frac{A+J}{2\left(1-p-J(t-\tau_{1})\right)},   (93)
R_{t}^{X|2}(2,1)=\frac{A-J}{2J(t-\tau_{1})}.   (94)

The remaining transition rates are set to R_{t}^{X|1}(x,x')=0. The total entropy production generated by the entire protocol is calculated as

\Sigma_{\tau}^{XY}=\int_{0}^{\tau}\mathrm{d}t\sum_{x\neq x',y}J_{t}^{X|y}(x,x')F_{t}^{X|y}(x,x')   (95)
=2(\Delta_{1}+\Delta_{2})\tanh^{-1}\left(\frac{\Delta_{1}+\Delta_{2}}{A\tau}\right)   (96)
=\mathcal{W}(p_{0}^{XY},p_{\tau}^{XY})\tanh^{-1}\frac{\mathcal{W}(p_{0}^{XY},p_{\tau}^{XY})}{\langle a\rangle_{\tau}},   (97)

which achieves the bound (38). The orange and gray solid lines representing the entropy production \Sigma_{\tau}^{X} in Fig. 3(b) are plotted using \Sigma_{\tau}^{XY} calculated from Eq. (95), with \Sigma_{\tau}^{X}=\Sigma_{\tau}^{XY}-|\Delta I_{\tau}^{X:Y}|.
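An analogous check can be made for the fixed-activity protocol. The sketch below (again our own illustration with arbitrary parameter values) verifies that the rates of Eqs. (89)-(90) and (93)-(94) keep the activity and current constant and that the integrated entropy production reproduces Eq. (96):

```python
import numpy as np

# Illustrative parameters (assumed values, not taken from the paper)
p, d1, d2, tau, A = 0.6, 0.1, 0.2, 1.0, 1.5
J = (d1 + d2) / tau                                   # constant probability current

def sigma_interval(q0, T, n=200_000):
    """Integrate J*F over one interval of length T with source occupation q0,
    using the occupations of Eqs. (88), (91)-(92) and the rates of Eqs. (89)-(90)."""
    s = (np.arange(n) + 0.5) * T / n                  # time elapsed within the interval
    q_src, q_tgt = q0 - J * s, J * s                  # analytic occupations
    R_fwd = (A + J) / (2.0 * q_src)                   # Eq. (89) / Eq. (93)
    R_bwd = (A - J) / (2.0 * q_tgt)                   # Eq. (90) / Eq. (94)
    Jt = R_fwd * q_src - R_bwd * q_tgt                # current, equals J at all times
    At = R_fwd * q_src + R_bwd * q_tgt                # activity, equals A at all times
    assert np.allclose(Jt, J) and np.allclose(At, A)
    Ft = np.log((R_fwd * q_src) / (R_bwd * q_tgt))    # force, equals 2 artanh(J/A)
    return float(np.sum(Jt * Ft) * T / n)

tau1 = d1 / (d1 + d2) * tau                           # length of the first interval
sigma = sigma_interval(p, tau1) + sigma_interval(1 - p, tau - tau1)
print(sigma, 2 * (d1 + d2) * np.arctanh((d1 + d2) / (A * tau)))   # cf. Eq. (96)
```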

References