The Mathematics of the Ensemble Theory

Xiang Gao (高翔), 6889 Rochelle Ave, Newark, CA 94560, United States
Abstract

This study shows that the generalized Boltzmann distribution is the only distribution mathematically consistent with thermodynamics when the system is described by an ensemble of a certain mathematical form. This form is very general: the canonical, grand-canonical, and isothermal-isobaric ensemble theories are all special cases of it. Compared with the standard textbook formalism of statistical mechanics (SM), this approach does not require a prior distribution, does not assume the functional form or the maximization of entropy, and employs fewer assumptions. This insight therefore challenges the belief that a prior distribution is required in SM and provides a new way to derive the Boltzmann distribution. The study also reveals the logical and mathematical constraints on SM’s fundamental components; it could therefore benefit researchers working on non-Boltzmann-Gibbs SM and philosophers studying the foundations of SM.

I Introduction

Statistical mechanics (SM) dates back to Boltzmann [Boltzmann (2012)] and Gibbs [Gibbs (1902)] and has become the core of modern physics for describing matter and radiation. The role that SM plays in modern physics is to fill the gap between thermodynamics and microscopic theories (classical or quantum mechanics). Although it is widely believed that, given the current microscopic state of a system, it is possible to predict the microscopic state of that system at any time in the future, there is no known method to derive thermodynamics from microscopic theories alone. Indeed, even defining thermodynamic concepts is already very tricky:

Consider an isolated system $\mathcal{S}$ with $N$ particles whose dynamics are given by classical mechanics. Alice has some superpower to know the positions $q_{1},\ldots,q_{3N}$ and momenta $p_{1},\ldots,p_{3N}$ of all the $N$ particles in $\mathcal{S}$; that is, Alice knows at which exact point of the $\Gamma$ space the system currently is. How should Alice compute the volume, pressure, temperature, internal energy, entropy, etc., as a function $\Gamma\to\mathbb{R}$ of $q_{1},\ldots,q_{3N},p_{1},\ldots,p_{3N}$? How should Alice judge whether this system is in equilibrium or not, as a function $\Gamma\to\left\{\text{true},\text{false}\right\}$ of $q_{1},\ldots,q_{3N},p_{1},\ldots,p_{3N}$? How can Alice derive the second law of thermodynamics, $S\left(q_{1},\ldots,q_{3N},p_{1},\ldots,p_{3N}\right)\leq S\left(q^{\prime}_{1},\ldots,q^{\prime}_{3N},p^{\prime}_{1},\ldots,p^{\prime}_{3N}\right)$ for $t^{\prime}>t$ (the primed coordinates being those at the later time $t^{\prime}$), from classical mechanics?

The ensemble theory was invented to overcome these difficulties. Instead of studying the system at a single microscopic state, the ensemble theory considers the system’s microscopic state as unknown and employs a probability density function to describe each state’s probability. It is widely believed that the probability density cannot be derived from the microscopic theory and has to be obtained from additional assumptions (which is why it is sometimes referred to as “a prior distribution”). For example, isolated systems are assumed to have uniform distributions; closed systems are assumed to be a small part of a much larger isolated system with a uniform distribution. With these assumptions, the microcanonical ensemble is proved to have a uniform distribution, and the canonical, grand-canonical, and isothermal-isobaric ensembles are proved to have generalized-Boltzmann distributions. (Footnote 1: The term “generalized-Boltzmann distribution” is introduced in Gao et al. (2019). It refers to a distribution of the form $\Pr\left(\omega\right)\propto\exp\left[\sum_{\eta=1}^{n}\frac{X_{\eta}x_{\eta}^{\left(\omega\right)}}{k_{B}T}-\frac{E^{\left(\omega\right)}}{k_{B}T}\right]$.) Further assumptions are required to obtain thermodynamic state functions. For example, the entropy of isolated systems is assumed to be $S=k_{B}\log\left|\Omega\right|$; closed systems’ internal energy is assumed to be the ensemble average $\sum_{\omega}\Pr(\omega)E^{(\omega)}$.

This article shows that, if a system is described by an ensemble theory of a specific mathematical form, then the generalized-Boltzmann distribution is the only distribution that can reproduce the thermodynamics of that system. This mathematical form is very general, and the canonical, grand-canonical, and isothermal-isobaric ensemble theories are special cases of it. Compared to how textbooks [Kardar (2007); Callen (1998); Chandler (1987); Landau and Lifshitz (2013); Balescu (1991); Tolman (1979)] formalize SM, this approach does not require the assumption of a prior distribution. The set of assumptions in this approach can be considered a strict subset of the assumptions of textbook approaches. With this insight, this article makes the following contributions:

First, it challenges the belief that a prior distribution is required in SM. Instead, this article indicates that it is the mathematical form and the consistency with thermodynamics that determine an ensemble’s probability density.

Second, it introduces a new method to derive the generalized-Boltzmann distribution. This derivation is based only on the mathematical form of the ensemble. Compared to standard textbook approaches and newer approaches such as Gao et al. (2019), this derivation of the generalized-Boltzmann distribution requires fewer assumptions.

Third, this article could provide new insights into non-Boltzmann-Gibbs (non-BG) SM such as the Tsallis statistics [Tsallis (1988); Curado and Tsallis (1991); Tsallis et al. (1998); Tsallis (2019, 2009); Boon and Tsallis (2005)]. Non-BG SM is widely used in the study of complex systems. A notable difficulty facing non-BG SM researchers is how to construct a theory that is self-consistent and, at the same time, consistent with thermodynamics.

Fourth, the conclusion of this article might be helpful for philosophers studying the foundations of SM. Interested readers are referred to Frigg (2011) for a comprehensive field review. Earlier reviews [Haar (1955); Penrose (1979)] might also be helpful.

II Definitions, notations, and assumptions

Before moving to our theory’s formal statement, let us first revisit an ensemble theory’s essential components:

The first and foremost is the set of microstates, denoted by $\Omega$. The set of microstates is the underlying set of the measure space and is determined by the ensemble theory and the underlying microscopic theory together. For example, for a microcanonical ensemble, the set of microstates contains all the points in the $\Gamma$ space of classical mechanics whose coordinates satisfy the volume constraints and whose energy is a constant $E$ or lies in a small range $\left[E,E+\Delta E\right]$, or all eigenstates of the Hamiltonian operator of quantum mechanics that satisfy this energy constraint.

The second is the different roles that different thermodynamic state functions play in the ensemble theory. They are classified as: parameters determining $\Omega$, parameters determining the probability density, quantities associated with random variables, and other statistical quantities. Take the canonical ensemble as an example. The volume and the number of particles are parameters determining $\Omega$. These parameters, together with the underlying microscopic theory, define which microstates are contained in $\Omega$, but they do not directly appear in the equation of the probability density. The temperature is also a parameter; it has nothing to do with $\Omega$, but it appears as a parameter of the probability density function. The system’s energy is a quantity associated with a random variable: each microstate has a corresponding energy. Other statistical quantities include pressure and entropy. Since neither $\Omega$ nor the probability density directly depends on these two quantities, they are not parameters. They are not quantities associated with random variables either, because a single microstate does not have a well-defined pressure or entropy.

The third is the probability density. It is a function of random variables and parameters determining the probability density, but not the other quantities. For the example of the canonical ensemble, the probability density is a function $f\left(E,T\right)$ of $E$ and $T$, but not of $N$, $V$, $S$, or $p$ directly.

Last but not least is the set of rules connecting each random variable with its corresponding thermodynamic state function. For the canonical, grand-canonical, or isothermal-isobaric ensembles, these rules are $U=\sum_{\omega}\Pr(\omega)E^{(\omega)}$, $N=\sum_{\omega}\Pr(\omega)N^{(\omega)}$, etc., but for the Tsallis statistics, the rule is more complicated [Boon and Tsallis (2005)].

Being aware of these essential components, we are now ready to define the mathematical form of our theory formally:

Definition 1.

In this article, we will study a thermodynamic system with generalized forces $X_{1},\ldots,X_{n}$, $Y_{1},\ldots,Y_{m}$ and generalized coordinates $\chi_{1},\ldots,\chi_{n}$, $y_{1},\ldots,y_{m}$. The first law of thermodynamics for this system states that

$$dU=TdS+\sum_{\eta=1}^{n}X_{\eta}d\chi_{\eta}+\sum_{\eta=1}^{m}Y_{\eta}dy_{\eta}. \tag{1}$$

We want to describe this thermodynamic system with an ensemble parametrized by $T,X_{1},\ldots,X_{n},y_{1},\ldots,y_{m}$, where $y_{1},\ldots,y_{m}$ determine the set of microstates, and $T,X_{1},\ldots,X_{n}$ determine the probability density. In this setup, $E,x_{1},\ldots,x_{n}$ are random variables; their corresponding thermodynamic state functions are $U,\chi_{1},\ldots,\chi_{n}$. $Y_{1},\ldots,Y_{m},S$ are other statistical quantities of that ensemble. For a microstate $\omega\in\Omega$, we denote the value of the random variables at $\omega$ by $E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}$. For clarity, we have used $X$ vs $Y$ to distinguish ensemble parameters from statistical quantities. In addition, we use $\chi$ vs $x$ and $U$ vs $E$ to distinguish thermodynamic state functions from random variables. Specific heats of the system are assumed to be positive. (Footnote 2: There do exist systems with negative specific heat. These systems have odd properties. For example, they are never extensive, and they cannot achieve thermal equilibrium with a large heat bath. Discussion of such systems is beyond the scope of this article. Interested readers are referred to Lynden-Bell (1999).)

The mathematical form stated in definition 1 is very general, and the canonical, grand-canonical, or isothermal-isobaric ensemble theories are all special cases of this form. Let us take a look at a few examples:

  1. The canonical ensemble of a single-component system is also called an $NVT$ ensemble. This ensemble has $y_{1}=N$, $y_{2}=V$, $Y_{1}=\mu$, $Y_{2}=-p$. There are no $\chi$, $X$, or $x$. The first law for this system reads $dU=TdS-pdV+\mu dN$.

  2. The grand canonical ensemble of a two-component system is also called a $\mu_{1}\mu_{2}VT$ ensemble. This ensemble has $y_{1}=V$, $Y_{1}=-p$, $\chi_{1}=N_{1}$, $\chi_{2}=N_{2}$, $X_{1}=\mu_{1}$, $X_{2}=\mu_{2}$, $x_{1}^{\left(\omega\right)}=N_{1}^{\left(\omega\right)}$, and $x_{2}^{\left(\omega\right)}=N_{2}^{\left(\omega\right)}$. The first law for this system reads $dU=TdS-pdV+\mu_{1}dN_{1}+\mu_{2}dN_{2}$.

  3. The isothermal-isobaric ensemble of a single-component system is also called an $NpT$ ensemble. It has $y_{1}=N$, $Y_{1}=\mu$, $\chi_{1}=V$, $X_{1}=-p$, and $x_{1}^{\left(\omega\right)}=V^{\left(\omega\right)}$. The first law for this system reads $dU=TdS-pdV+\mu dN$.
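The role assignments in the three examples above can be written down as plain data. The following is only an illustrative sketch: the class name `EnsembleSpec` and its field names are hypothetical labels chosen here, not notation from this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnsembleSpec:
    """Hypothetical container for the roles assigned in definition 1."""
    name: str
    microstate_params: tuple  # y_1..y_m: determine the set of microstates
    other_forces: tuple       # Y_1..Y_m: conjugate to y, "other statistical quantities"
    density_forces: tuple     # X_1..X_n: enter the probability density together with T
    random_variables: tuple   # x_1..x_n: take a value on each microstate (besides E)


ENSEMBLES = [
    EnsembleSpec("NVT", ("N", "V"), ("mu", "-p"), (), ()),
    EnsembleSpec("mu1 mu2 VT", ("V",), ("-p",), ("mu1", "mu2"), ("N1", "N2")),
    EnsembleSpec("NpT", ("N",), ("mu",), ("-p",), ("V",)),
]

for e in ENSEMBLES:
    # Each X is paired with one random variable x, and each y with one Y.
    assert len(e.density_forces) == len(e.random_variables)
    assert len(e.microstate_params) == len(e.other_forces)
    print(e.name, "n =", len(e.density_forces), "m =", len(e.microstate_params))
```

In every case the temperature $T$ and the energy $E$ play the same roles, so they are left out of the container.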

In textbooks [Kardar (2007); Callen (1998); Chandler (1987); Landau and Lifshitz (2013); Balescu (1991); Tolman (1979)], the probability density $\Pr\left(\omega\right)$ (i.e. the generalized Boltzmann distribution) of the ensemble defined in definition 1 is usually derived by assuming:

  1. The microcanonical ensemble has a uniform distribution.

  2. The entropy of a microcanonical ensemble is given by $S=k_{B}\log\left|\Omega\right|$.

  3. The ensemble defined in definition 1 can be considered as a system in equilibrium with a reservoir. The interaction between the system and the reservoir is weak. Furthermore, the system, together with the reservoir as a whole, can be described by a microcanonical ensemble. Or, alternatively,

  4. The entropy $S=-k_{B}\sum_{\omega}\Pr(\omega)\log\Pr(\omega)$ is maximal with respect to the probability density function $\Pr(\omega)$ under the constraints that $U=\sum_{\omega}\Pr(\omega)E^{\left(\omega\right)}$, $\chi_{1}=\sum_{\omega}\Pr(\omega)x_{1}^{\left(\omega\right)}$, $\cdots$, $\chi_{n}=\sum_{\omega}\Pr(\omega)x_{n}^{\left(\omega\right)}$ are constant.

This article employs a different set of assumptions. The reader will soon find that each of these assumptions is either already part of the standard textbook approaches or common sense. We will show that the generalized Boltzmann distribution can be derived from this small set alone.

Assumption 1.

The probability density function $\Pr\left(\omega\right)$ is proportional to a function

$$\Pr\left(\omega\right)\propto f\left(E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)};T,X_{1},\ldots,X_{n}\right) \tag{2}$$

The function will be denoted by $f\left(\omega\right)$ or $f_{\omega}$ for short.

This is a standard assumption in textbooks. In textbooks, the system being studied is assumed to be in contact with a reservoir. The interaction between the system and the reservoir is assumed to be weak. This weak-interaction assumption means that the microstates of system+reservoir are the Cartesian product of the microstates of the system and the reservoir. This implies that the probability density is proportional to the number of microstates in the reservoir.

Assumption 2.

Random variables are connected to their corresponding state functions through the ensemble average:

$$\begin{array}{c}U=\sum_{\omega}\Pr(\omega)E^{\left(\omega\right)}\\ \chi_{1}=\sum_{\omega}\Pr(\omega)x_{1}^{\left(\omega\right)}\\ \vdots\\ \chi_{n}=\sum_{\omega}\Pr(\omega)x_{n}^{\left(\omega\right)}\end{array} \tag{3}$$

The above assumption is also standard in textbooks. There it is not used to derive the generalized Boltzmann distribution, but it is required to obtain thermodynamic state functions after obtaining the distribution. Only with this assumption can one derive the connection between the partition function and a thermodynamic state function, and then derive the remaining state functions by taking advantage of natural variables. There exist non-BG statistical mechanics where this assumption does not hold [Boon and Tsallis (2005)].
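As a concrete reading of equation 3, the sketch below computes ensemble averages for a finite set of microstates. The three-microstate system and all of its numbers are made up for illustration, and the helper name `ensemble_average` is likewise hypothetical.

```python
def ensemble_average(probabilities, values):
    """Return sum over microstates of Pr(omega) * value(omega)."""
    if abs(sum(probabilities) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in zip(probabilities, values))


# Hypothetical three-microstate system; the numbers are illustrative only.
probabilities = [0.5, 0.3, 0.2]   # Pr(omega)
energies = [0.0, 1.0, 2.0]        # E^(omega)
particle_numbers = [1, 1, 2]      # x_1^(omega), e.g. a particle number

U = ensemble_average(probabilities, energies)              # internal energy
chi_1 = ensemble_average(probabilities, particle_numbers)  # mean of x_1
print(U, chi_1)
```

Note that the probabilities here are arbitrary: assumption 2 only states how a state function is obtained from a distribution, not what the distribution is.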

Assumption 3.

At infinite temperature, all the microstates have the same probability.

People usually consider this assumption common sense instead of writing it out in textbooks. In the author’s opinion, this assumption should be viewed as a qualitative definition of infinite temperature.

It is worth mentioning that the only assumption about entropy made in this article is that the entropy is a state function satisfying equation 1 in definition 1. The functional form of entropy is not assumed; instead, it follows from theorem 1, as shown in theorem 2 and its proof. Nor does this article assume that the entropy is maximized.

This article does not assume the form of the underlying microscopic theory, so the conclusions of this article should fit well in both classical mechanics and quantum mechanics.

III The Theory

The main result of this article is the following theorem.

Theorem 1.

An ensemble that is defined as in definition 1 and satisfies assumptions 1, 2, and 3 obeys the generalized Boltzmann distribution:

$$\Pr\left(\omega\right)\propto\exp\left[\sum_{\eta=1}^{n}\frac{X_{\eta}x_{\eta}^{\left(\omega\right)}}{k_{B}T}-\frac{E^{\left(\omega\right)}}{k_{B}T}\right]. \tag{4}$$
Proof.

Our proof employs lemmas that are stated and proved in section IV. From assumption 1, we can write $\Pr\left(\omega\right)$ as follows:

$$\Pr\left(\omega\right)\propto f\left(E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)};T,X_{1},\ldots,X_{n}\right) \tag{5}$$

Let $\beta=\frac{1}{k_{B}T}$, $\tilde{X}_{\eta}=\beta X_{\eta}$, and $\tilde{Y}_{\eta}=\beta Y_{\eta}$. Instead of writing $f$ as a function of $\left(T,X_{1},\ldots,X_{n}\right)$, we will write it as a function of $\left(\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}\right)$:

$$\Pr\left(\omega\right)\propto f\left(E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)};\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}\right) \tag{6}$$

Rewriting the first law of thermodynamics (equation 1) in terms of $\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}$, we get

$$\frac{dS}{k_{B}}=\beta\,dU-\sum_{\eta=1}^{n}\tilde{X}_{\eta}\,d\chi_{\eta}-\sum_{\eta=1}^{m}\tilde{Y}_{\eta}\,dy_{\eta} \tag{7}$$

Performing a Legendre transformation to obtain a state function $B$ with natural variables $\beta$, $\tilde{X}_{1},\ldots,\tilde{X}_{n}$, $y_{1},\ldots,y_{m}$, we have

$$B=\frac{S}{k_{B}}-\beta U+\sum_{\eta=1}^{n}\tilde{X}_{\eta}\chi_{\eta} \tag{8}$$

$$dB=-U\,d\beta+\sum_{\eta=1}^{n}\chi_{\eta}\,d\tilde{X}_{\eta}-\sum_{\eta=1}^{m}\tilde{Y}_{\eta}\,dy_{\eta} \tag{9}$$

Therefore, together with assumption 2,

$$U=\sum_{\omega}\Pr(\omega)E^{\left(\omega\right)}=-\frac{\partial B}{\partial\beta} \tag{10}$$

$$\chi_{\eta}=\sum_{\omega}\Pr(\omega)x_{\eta}^{\left(\omega\right)}=\frac{\partial B}{\partial\tilde{X}_{\eta}} \tag{11}$$

The normalization constant (partition function) for equation 6 is

$$Z=\sum_{\omega}f_{\omega} \tag{12}$$

where $f_{\omega}$ is short for

$$f\left(E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)};\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}\right) \tag{13}$$

Then equation 10 and equation 11 become

$$\sum_{\omega}\frac{E^{\left(\omega\right)}f_{\omega}}{Z}=-\frac{\partial B}{\partial\beta} \tag{14}$$

$$\sum_{\omega}\frac{x_{\eta}^{\left(\omega\right)}f_{\omega}}{Z}=\frac{\partial B}{\partial\tilde{X}_{\eta}} \tag{15}$$

From basic multivariable calculus, we have $\frac{\partial^{2}B}{\partial\tilde{X}_{\eta}\partial\beta}=\frac{\partial^{2}B}{\partial\beta\partial\tilde{X}_{\eta}}$. Therefore

$$\frac{\partial}{\partial\tilde{X}_{\eta}}\sum_{\omega}\frac{E^{\left(\omega\right)}f_{\omega}}{Z}+\frac{\partial}{\partial\beta}\sum_{\omega}\frac{x_{\eta}^{\left(\omega\right)}f_{\omega}}{Z}=0 \tag{16}$$

which simplifies to

$$\sum_{\omega}\left[E^{\left(\omega\right)}\frac{\partial\left(f_{\omega}/Z\right)}{\partial\tilde{X}_{\eta}}+x_{\eta}^{\left(\omega\right)}\frac{\partial\left(f_{\omega}/Z\right)}{\partial\beta}\right]=0 \tag{17}$$

The above equality should hold regardless of the details of the system and its microstates. The only way to guarantee this is to have

$$E^{\left(\omega\right)}\frac{\partial\left(f_{\omega}/Z\right)}{\partial\tilde{X}_{\eta}}+x_{\eta}^{\left(\omega\right)}\frac{\partial\left(f_{\omega}/Z\right)}{\partial\beta}=0 \tag{18}$$

for all $\omega$. Applying the same argument to $\frac{\partial^{2}B}{\partial\tilde{X}_{i}\partial\tilde{X}_{j}}=\frac{\partial^{2}B}{\partial\tilde{X}_{j}\partial\tilde{X}_{i}}$ and using lemma 1, we find that $f$ must have the form $g\left(\zeta,E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right)$, where

$$\zeta=\beta E^{\left(\omega\right)}-\sum_{\eta=1}^{n}\tilde{X}_{\eta}x_{\eta}^{\left(\omega\right)} \tag{19}$$

Let $G$ be an antiderivative of $g$ with respect to $\zeta$, that is,

$$G^{\prime}=g\left(\zeta,E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right) \tag{20}$$

We use the prime exclusively for the derivative with respect to the first argument $\zeta$ while keeping the other arguments $E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}$ constant. Let $K=\sum_{\omega}G\left(\zeta;E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right)$. It is easy to show that

$$\frac{\partial K}{\partial\beta}=\sum_{\omega}E^{\left(\omega\right)}g_{\omega}=Z\cdot\sum_{\omega}\Pr(\omega)E^{\left(\omega\right)}=-Z\cdot\frac{\partial B}{\partial\beta} \tag{21}$$

$$\frac{\partial K}{\partial\tilde{X}_{\eta}}=-\sum_{\omega}x_{\eta}^{\left(\omega\right)}g_{\omega}=-Z\cdot\sum_{\omega}\Pr(\omega)x_{\eta}^{\left(\omega\right)}=-Z\cdot\frac{\partial B}{\partial\tilde{X}_{\eta}} \tag{22}$$

where $g_{\omega}$ is short for

$$g\left(\beta E^{\left(\omega\right)}-\sum_{\eta=1}^{n}\tilde{X}_{\eta}x_{\eta}^{\left(\omega\right)},E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right) \tag{23}$$

Note that $K$, $Z$, and $B$ all have the same set of natural variables $\beta$, $\tilde{X}_{1},\ldots,\tilde{X}_{n}$, $y_{1},\ldots,y_{m}$, so equation 21 and equation 22 can be condensed as

$$dK=-Z\cdot dB \tag{24}$$

The properties of exact differentials require that $K$, $Z$, and $B$ be functionally related to one another.

Besides, $K$ and $Z$ are both functionals with parameters $\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}$ that map functions of $\omega$ (the random variables $E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}$) to numbers. If the random variables change by a small amount $\delta E^{\left(\omega\right)},\delta x_{1}^{\left(\omega\right)},\ldots,\delta x_{n}^{\left(\omega\right)}$, then these functionals change as follows:

$$\delta K=\sum_{\omega}\left[\left(\frac{\partial G}{\partial E^{\left(\omega\right)}}+\beta g_{\omega}\right)\delta E^{\left(\omega\right)}+\sum_{\eta}\left(\frac{\partial G}{\partial x_{\eta}^{\left(\omega\right)}}-\tilde{X}_{\eta}g_{\omega}\right)\delta x_{\eta}^{\left(\omega\right)}\right] \tag{25}$$

$$\delta Z=\sum_{\omega}\left[\left(\frac{\partial g}{\partial E^{\left(\omega\right)}}+\beta g^{\prime}_{\omega}\right)\delta E^{\left(\omega\right)}+\sum_{\eta}\left(\frac{\partial g}{\partial x_{\eta}^{\left(\omega\right)}}-\tilde{X}_{\eta}g^{\prime}_{\omega}\right)\delta x_{\eta}^{\left(\omega\right)}\right] \tag{26}$$

where $\frac{\partial}{\partial E^{\left(\omega\right)}}$ and $\frac{\partial}{\partial x_{\eta}^{\left(\omega\right)}}$ are partial derivatives keeping $\zeta$ constant:

$$\left.\frac{\partial}{\partial E^{\left(\omega\right)}}\right|_{\zeta,x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}} \tag{27}$$

$$\left.\frac{\partial}{\partial x_{\eta}^{\left(\omega\right)}}\right|_{\zeta,E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{\eta-1}^{\left(\omega\right)},x_{\eta+1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}} \tag{28}$$

The functional relationship between $K$ and $Z$ requires $\delta K=C\left(\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}\right)\delta Z$ to hold for all $\delta E^{\left(\omega\right)},\delta x_{1}^{\left(\omega\right)},\ldots,\delta x_{n}^{\left(\omega\right)}$, where $C\left(\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}\right)$ is a factor that must not depend on $E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}$ but could depend on $\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}$. Then

$$\frac{\partial G}{\partial E^{\left(\omega\right)}}+\beta g_{\omega}=C\left(\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}\right)\cdot\left[\frac{\partial g}{\partial E^{\left(\omega\right)}}+\beta g^{\prime}_{\omega}\right] \tag{29}$$

$$\frac{\partial G}{\partial x_{\eta}^{\left(\omega\right)}}-\tilde{X}_{\eta}g_{\omega}=C\left(\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}\right)\cdot\left[\frac{\partial g}{\partial x_{\eta}^{\left(\omega\right)}}-\tilde{X}_{\eta}g^{\prime}_{\omega}\right] \tag{30}$$

From lemma 2, $C\left(\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}\right)$ is a constant that does not depend on $\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}$. Denote it as $C_{1}$. Define

$$\hat{\mathcal{L}}=\frac{\partial}{\partial E^{\left(\omega\right)}}+\beta\frac{\partial}{\partial\zeta} \tag{31}$$

Then equation 29 can be written as $\hat{\mathcal{L}}G=C_{1}\hat{\mathcal{L}}g$. Since $\hat{\mathcal{L}}$ is a linear operator, we have $\hat{\mathcal{L}}\left(G-C_{1}g\right)=0$. The kernel of $\hat{\mathcal{L}}$ contains functions of the form $\varphi\left(\zeta-\beta E^{\left(\omega\right)}\right)$. Applying the same argument to equation 30, we see that $G-C_{1}g$ must have the form

$$G-C_{1}g=\varphi\left(\zeta-\beta E^{\left(\omega\right)}+\sum_{\eta=1}^{n}\tilde{X}_{\eta}x_{\eta}^{\left(\omega\right)}\right). \tag{32}$$

Since $\zeta-\beta E^{\left(\omega\right)}+\sum_{\eta=1}^{n}\tilde{X}_{\eta}x_{\eta}^{\left(\omega\right)}\equiv 0$, we then have $G-C_{1}g=C_{3}$, where $C_{3}$ denotes another constant. Taking the derivative of both sides with respect to $\zeta$, we get $g=C_{1}g^{\prime}$, which immediately leads to

$$g\left(\zeta,E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right)=C_{2}\left(E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right)\cdot\exp\left(\zeta/C_{1}\right) \tag{33}$$

and

$$G\left(\zeta,E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right)=C_{1}\cdot C_{2}\left(E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right)\cdot\exp\left(\zeta/C_{1}\right), \tag{34}$$

where $C_{2}\left(E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right)$ denotes a factor that must not depend on $\beta,\tilde{X}_{1},\ldots,\tilde{X}_{n}$ but could depend on $E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}$. In order to have positive specific heat, we must have $C_{1}<0$. Since $C_{1}$ is a constant factor multiplying the temperature, lemma 3 allows us to choose $C_{1}=-1$ without loss of generality. By defining

$$\Lambda\left(\omega\right)=C_{2}\left(E^{\left(\omega\right)},x_{1}^{\left(\omega\right)},\ldots,x_{n}^{\left(\omega\right)}\right), \tag{35}$$

we have

$$\Pr\left(\omega\right)\propto\Lambda\left(\omega\right)\cdot\exp\left[\sum_{\eta=1}^{n}\frac{X_{\eta}x_{\eta}^{\left(\omega\right)}}{k_{B}T}-\frac{E^{\left(\omega\right)}}{k_{B}T}\right]. \tag{36}$$

As $T\to\infty$, the exponential factor tends to 1, so $\Pr\left(\omega\right)\propto\Lambda\left(\omega\right)$ in this limit. From assumption 3, $\Lambda\left(\omega\right)$ must be a constant. This completes the proof.
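The distribution in equation 4 can be evaluated directly for a small system. The following is a minimal sketch with a single generalized force ($n=1$) in units where $k_{B}=1$; the four microstates, the value of `mu`, and the function name `generalized_boltzmann` are all hypothetical choices made for illustration.

```python
import math


def generalized_boltzmann(energies, xs, X, kT):
    """Normalized weights proportional to exp[(X * x_w - E_w) / kT], for n = 1."""
    exponents = [(X * x - E) / kT for E, x in zip(energies, xs)]
    shift = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(a - shift) for a in exponents]
    Z = sum(weights)
    return [w / Z for w in weights]


# Hypothetical microstates given as (energy, particle number) pairs.
energies = [0.0, 1.0, 1.5, 3.0]
numbers = [0, 1, 1, 2]
mu = 0.5  # plays the role of X_1

for kT in (0.5, 5.0, 1e6):
    probs = generalized_boltzmann(energies, numbers, mu, kT)
    print(kT, [round(p, 4) for p in probs])
```

As `kT` grows, the printed probabilities approach the uniform distribution, which is the behavior that assumption 3 demands.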

The procedure to obtain all the other thermodynamic state functions is the same as in textbooks. With $C_{1}=-1$, equations 33 and 34 in the proof of theorem 1 give $G=-g$, so it is easy to see that $K=-Z$. Applying equation 24, we obtain $B=\log Z+C_{4}$. Since $B$ is extensive, $C_{4}$ must vanish, and therefore $B=\log Z$. Let

$$J=k_{B}T\cdot B=k_{B}T\log Z \tag{37}$$

Substituting this into equation 8 and equation 9, we get

$$J=TS-U+\sum_{\eta=1}^{n}X_{\eta}\chi_{\eta} \tag{38}$$

$$dJ=S\,dT+\sum_{\eta=1}^{n}\chi_{\eta}\,dX_{\eta}-\sum_{\eta=1}^{m}Y_{\eta}\,dy_{\eta} \tag{39}$$
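The identification $B=\log Z$ can be checked numerically against equations 10 and 11. The sketch below is only a sanity check on a made-up four-microstate system with $n=1$; it compares the ensemble averages with central finite differences of $\log Z$, and the step size `h` is an arbitrary choice.

```python
import math

E = [0.0, 1.0, 1.5, 3.0]  # hypothetical energies E^(omega)
x = [0, 1, 1, 2]          # hypothetical values of x_1^(omega)


def log_Z(beta, Xt):
    """log Z for weights exp(-beta * E + Xt * x), where Xt stands for X-tilde."""
    return math.log(sum(math.exp(-beta * e + Xt * v) for e, v in zip(E, x)))


def averages(beta, Xt):
    """Ensemble averages U and chi_1 under the generalized Boltzmann weights."""
    w = [math.exp(-beta * e + Xt * v) for e, v in zip(E, x)]
    Z = sum(w)
    U = sum(wi * e for wi, e in zip(w, E)) / Z
    chi = sum(wi * v for wi, v in zip(w, x)) / Z
    return U, chi


beta, Xt, h = 0.8, 0.3, 1e-5
U, chi = averages(beta, Xt)
dB_dbeta = (log_Z(beta + h, Xt) - log_Z(beta - h, Xt)) / (2 * h)
dB_dXt = (log_Z(beta, Xt + h) - log_Z(beta, Xt - h)) / (2 * h)
print(U, -dB_dbeta)  # the two numbers should agree (equation 10)
print(chi, dB_dXt)   # the two numbers should agree (equation 11)
```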

Then we can obtain all the state functions by taking advantage of natural variables. Let us take a look at entropy as an example:

Theorem 2.

The ensemble that is defined as in definition 1 and satisfies assumptions 1, 2, and 3 has entropy:

$$S=-k_{B}\sum_{\omega}\Pr\left(\omega\right)\log\Pr\left(\omega\right). \tag{40}$$
Proof.

For brevity, in the context of this proof, we will use $\frac{\partial}{\partial T}$ and $\frac{\partial}{\partial\beta}$ to denote $\left.\frac{\partial}{\partial T}\right|_{X_{1},\ldots,X_{n},y_{1},\ldots,y_{m}}$ and $\left.\frac{\partial}{\partial\beta}\right|_{X_{1},\ldots,X_{n},y_{1},\ldots,y_{m}}$.

From equation 39, we have

$$S=\frac{\partial J}{\partial T}=-\frac{1}{k_{B}T^{2}}\frac{\partial J}{\partial\beta} \tag{41}$$

From equation 37, we have

$$\frac{\partial J}{\partial\beta}=\frac{1}{\beta^{2}}\left(\frac{\beta}{Z}\frac{\partial Z}{\partial\beta}-\log Z\right). \tag{42}$$

Let $\gamma^{\left(\omega\right)}=\sum_{\eta=1}^{n}X_{\eta}x_{\eta}^{\left(\omega\right)}-E^{\left(\omega\right)}$. From equation 12 and equation 4, we have

$$Z=\sum_{\omega}\exp\left(\beta\gamma^{\left(\omega\right)}\right) \tag{43}$$

and

$$\Pr(\omega)=\frac{\exp\left(\beta\gamma^{\left(\omega\right)}\right)}{Z} \tag{44}$$

Therefore

$$\frac{\beta}{Z}\frac{\partial Z}{\partial\beta}=\frac{\beta}{Z}\sum_{\omega}\gamma^{\left(\omega\right)}\exp\left(\beta\gamma^{\left(\omega\right)}\right)=\sum_{\omega}\beta\gamma^{\left(\omega\right)}\Pr(\omega). \tag{45}$$

From equation 44, we have

$$\beta\gamma^{\left(\omega\right)}=\log Z+\log\Pr\left(\omega\right). \tag{46}$$

Substituting equation 46 and equation 45 back into equation 42, and using $\sum_{\omega}\Pr(\omega)=1$, we have

$$\frac{\partial J}{\partial\beta}=\frac{1}{\beta^{2}}\sum_{\omega}\Pr\left(\omega\right)\log\Pr\left(\omega\right). \tag{47}$$

Combining equation 41 and equation 47, we have

$$S=-k_{B}\sum_{\omega}\Pr\left(\omega\right)\log\Pr\left(\omega\right). \tag{48}$$

This completes the proof.
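Theorem 2 lends itself to a quick numerical check. The sketch below compares the entropy obtained as $\partial J/\partial T$ (equation 41, evaluated by a central finite difference) with the Gibbs form in equation 48. The values of $\gamma^{\left(\omega\right)}$, the temperature, and the step size `h` are hypothetical, and $k_{B}$ is set to 1 by default.

```python
import math


def log_Z(T, gammas, kB=1.0):
    """log Z with Z = sum_w exp(gamma_w / (kB * T)), as in equation 43."""
    a = [g / (kB * T) for g in gammas]
    m = max(a)  # log-sum-exp shift for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in a))


def gibbs_entropy(T, gammas, kB=1.0):
    """S = -kB * sum_w Pr(w) * log Pr(w), with Pr taken from equation 44."""
    lz = log_Z(T, gammas, kB)
    log_p = [g / (kB * T) - lz for g in gammas]
    return -kB * sum(math.exp(lp) * lp for lp in log_p)


def entropy_from_J(T, gammas, kB=1.0, h=1e-5):
    """S = dJ/dT with J = kB * T * log Z, by a central finite difference."""
    def J(t):
        return kB * t * log_Z(t, gammas, kB)
    return (J(T + h) - J(T - h)) / (2 * h)


gammas = [0.0, -0.5, -1.0, -1.0]  # hypothetical gamma^(omega) values
T = 1.3
print(gibbs_entropy(T, gammas), entropy_from_J(T, gammas))
```

The two printed values should agree up to the finite-difference error.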

IV Lemmas and their proof

Lemma 1.

For a function of 4 variables f(a,b,c,d)f\left(a,b,c,d\right), if

afb|acd+cfd|abc=0,a\left.\frac{\partial f}{\partial b}\right|_{acd}+c\left.\frac{\partial f}{\partial d}\right|_{abc}=0, (49)

then there exists a function gg such that f(a,b,c,d)=g(adbc,a,c)f\left(a,b,c,d\right)=g\left(ad-bc,a,c\right).

Proof.

Let us call (a,b,c,d)\left(a,b,c,d\right) the old coordinates and define new coordinates (u,v,w,x)\left(u,v,w,x\right) such that

{u=av=cw=adbcx=ad+bc\left\{\begin{array}[]{c}u=a\\ v=c\\ w=ad-bc\\ x=ad+bc\end{array}\right. (50)

then the reverse transformation is

{a=uc=vd=w+x2ub=xw2v\left\{\begin{array}[]{c}a=u\\ c=v\\ d=\frac{w+x}{2u}\\ b=\frac{x-w}{2v}\end{array}\right. (51)

Evaluating the partial derivative with respect to $x$ in the new coordinates, and using equation 49, we have

$$\begin{aligned}\left.\frac{\partial f}{\partial x}\right|_{uvw} &= \left.\frac{\partial f}{\partial d}\right|_{abc}\cdot\frac{1}{2a}+\left.\frac{\partial f}{\partial b}\right|_{acd}\cdot\frac{1}{2c}\\ &= \frac{1}{2ac}\left(c\left.\frac{\partial f}{\partial d}\right|_{abc}+a\left.\frac{\partial f}{\partial b}\right|_{acd}\right)=0,\end{aligned}$$ (52)

that is, $f$ does not depend on $x$. Therefore, it is a function of only $u$, $v$, and $w$, which is the claimed form $f\left(a,b,c,d\right)=g\left(ad-bc,a,c\right)$.
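As a sanity check of Lemma 1, the sketch below verifies the converse direction numerically: a function of the form $g\left(ad-bc,a,c\right)$ satisfies equation 49, while a function that depends on $ad+bc$ does not. This is only an illustration under stated assumptions, not a proof; the test function `g`, the sample point, and all names are hypothetical choices.

```python
import math

def g(w, a, c):
    """Hypothetical smooth test function of (ad - bc, a, c)."""
    return math.sin(w) + a * c * w**2 + math.exp(-a) * c

def f_good(a, b, c, d):
    """Depends on (a, b, c, d) only through ad - bc, a, and c."""
    return g(a * d - b * c, a, c)

def f_bad(a, b, c, d):
    """Depends on ad + bc, so it should violate equation 49."""
    return a * d + b * c

def lhs_eq49(f, a, b, c, d, h=1e-6):
    """Left-hand side of equation 49, using central finite differences."""
    df_db = (f(a, b + h, c, d) - f(a, b - h, c, d)) / (2 * h)
    df_dd = (f(a, b, c, d + h) - f(a, b, c, d - h)) / (2 * h)
    return a * df_db + c * df_dd

point = (1.2, -0.4, 0.7, 2.1)  # arbitrary sample values of (a, b, c, d)
residual_good = lhs_eq49(f_good, *point)
residual_bad = lhs_eq49(f_bad, *point)
print(residual_good, residual_bad)
```

For `f_bad`, the left-hand side of equation 49 equals $2ac$ analytically, so the second number is nonzero whenever $a$ and $c$ are both nonzero.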

Lemma 2.

Let $f$, $g$, and $C$ be functions, and let $x$, $y$, $a$, and $b$ be variables. Then $f\left(ax+by,x,y\right)=C\left(a,b\right)\cdot g\left(ax+by,x,y\right)$ implies that $C\left(a,b\right)$ is a constant that does not depend on $a$ and $b$.

Proof.

For fixed $x$ and $y$, the set of all values of $\left(a,b\right)$ that satisfy $ax+by=z$, where $z$ denotes a constant, is a line. When $x$, $y$, and $ax+by$ are all fixed, the values of $f$ and $g$ do not change. Therefore, $C\left(a,b\right)$ must also be constant on that line. This is true for all values of $x$, $y$, and $z$; that is, $C\left(a,b\right)$ is constant on every such line. Since lines with different directions intersect, $C\left(a,b\right)$ must be a constant that does not depend on $a$ and $b$.

Lemma 3.

If we scale the temperature and entropy by $\frac{1}{\alpha}$ and $\alpha$, respectively, we do not change any physics.

Proof.

Let us begin our proof by reviewing how the BG theory of equilibrium SM is built in textbooks. We start by defining the entropy of the microcanonical ensemble as $S=k_{B}\log\left|\Omega\right|$, and we place a system in thermal equilibrium with a reservoir that defines $T$. Thus, the number of microstates of the reservoir $\left|\Omega_{r}\right|$ is given by

$$\left|\Omega_{r}\right|=\exp\left(S_{r}/k_{B}\right).$$ (53)

Expanding $S_{r}$ around $E_{\text{total}}$ to first order in the energy of the system $E_{s}$, we get

$$S_{r}\left(E_{\text{total}}-E_{s}\right)=S_{r}\left(E_{\text{total}}\right)-\frac{\partial S_{r}}{\partial E_{r}}\cdot E_{s}.$$ (54)

The application of the first law of thermodynamics gives that

$$\frac{\partial S_{r}}{\partial E_{r}}=\frac{1}{T}.$$ (55)

By combining equations 53, 54, and 55, we obtain the following Boltzmann distribution:

$$\left|\Omega_{r}\right|=C\cdot\exp\left(-\frac{E_{s}}{k_{B}T}\right)$$ (56)

where $C=\exp\left(S_{r}\left(E_{\text{total}}\right)/k_{B}\right)$ denotes a constant that does not depend on $E_{s}$. In the above procedure, the temperature scale is introduced by defining entropy as $S=k_{B}\log\left|\Omega\right|$. The constant $k_{B}$ propagates along the logic chain and, together with the first law of thermodynamics, determines the temperature scale.

If we had instead started by defining $S^{\prime}=\alpha k_{B}\log\left|\Omega\right|$, then the following equation would be obtained by applying the same logic:

$$\left|\Omega_{r}\right|=\exp\left(\frac{S_{r}^{\prime}}{k_{B}\alpha}\right).$$ (57)

In this case, the first law of thermodynamics yields

$$\frac{\partial S_{r}^{\prime}}{\partial E_{r}}=\frac{1}{T^{\prime}},$$ (58)

where $T^{\prime}$ denotes the temperature in the new scale. Thus, the Boltzmann distribution in the new scale will look like

$$\left|\Omega_{r}\right|=C\cdot\exp\left(-\frac{E_{s}}{k_{B}\alpha T^{\prime}}\right).$$ (59)

Temperature scales are artificial, while probabilities are physical. Therefore, equation 59 must match equation 56. To prove this, we use the first law of thermodynamics to obtain the relationship between $T$ and $T^{\prime}$:

$$\left\{\begin{array}{l}dU=TdS-pdV\\ dU=T^{\prime}dS^{\prime}-pdV\\ S^{\prime}=\alpha S\end{array}\right.\Rightarrow T^{\prime}=\frac{T}{\alpha}.$$ (60)

By substituting equation 60 into equation 59, we obtain an exact match with equation 56. This completes the proof.
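The invariance stated in Lemma 3 can be illustrated with a short numerical sketch. It compares the normalized Boltzmann weights built from the exponent of equation 56 with those built from the exponent of equation 59 after applying $T^{\prime}=T/\alpha$ from equation 60. The energies, the temperature, the value of $\alpha$, and the function name are illustrative assumptions.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def boltzmann_probs(energies, temperature, alpha=1.0):
    """Normalized weights proportional to exp(-E / (k_B * alpha * T)).

    alpha = 1 gives the exponent of equation 56; a general alpha together
    with the rescaled temperature gives the exponent of equation 59.
    """
    exponents = [-e / (K_B * alpha * temperature) for e in energies]
    shift = max(exponents)  # shift for numerical stability
    weights = [math.exp(x - shift) for x in exponents]
    total = sum(weights)
    return [w / total for w in weights]

energies = [0.0, 1.0e-21, 2.5e-21, 4.0e-21]  # hypothetical energies in joules
T = 300.0            # temperature in the original scale
alpha = 3.7          # arbitrary rescaling factor
T_prime = T / alpha  # temperature in the new scale, equation 60

p_original = boltzmann_probs(energies, T)
p_rescaled = boltzmann_probs(energies, T_prime, alpha=alpha)
max_diff = max(abs(p - q) for p, q in zip(p_original, p_rescaled))
print(max_diff)  # expected to be at the level of floating-point rounding
```

The check works because only the product $\alpha T^{\prime}$ enters the exponent, so any $\alpha>0$ leaves the probabilities unchanged.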

V Discussion

In this article, we discussed what the basic components of an ensemble theory are. These basic components include the set of microstates, the different roles that different thermodynamic state functions play (these roles include parameters determining $\Omega$, parameters determining the probability density, quantities associated with random variables, and other statistical quantities), the probability density, and the set of rules connecting each random variable with its corresponding thermodynamic state function. We showed how the mathematical form of these basic components, together with some additional assumptions, can be used to derive the generalized Boltzmann distribution. This derivation uses only the mathematical form of these components and the consistency with thermodynamics; no prior distribution is required.

Since definition 1 is based on the textbook approach of equilibrium thermodynamics, the primary purpose of this article is to reveal the internal structure of the classical theory of equilibrium thermodynamics and statistical mechanics. However, the author would like to remind the reader that definition 1, the three assumptions, and the proofs of theorems 1 and 2 are all about mathematical forms and contain little about the physical interpretation of the mathematics. As a result, the argument in this article can be naturally extended to non-equilibrium theories that share the same mathematical form.

As a side note, the reader can verify that our arguments still hold if assumption 2 is replaced with the corresponding equation in Curado and Tsallis (1991), the conclusion of theorem 1 (i.e., equation 4) is replaced with the $q$-distribution, equation 20 is replaced with $G^{\prime}=g^{q}$, and the conclusion of theorem 2 (i.e., equation 40) is replaced with the $q$-entropy. This means that our approach can also be used to obtain the Tsallis statistics.
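To make the connection with equation 48 concrete, the sketch below compares the Boltzmann-Gibbs entropy with the $q$-entropy in the limit $q\to 1$. It assumes that the $q$-entropy referred to here is the standard Tsallis form $S_{q}=k\,\frac{1-\sum_{\omega}\Pr\left(\omega\right)^{q}}{q-1}$ from Tsallis (1988); the probability values and function names are hypothetical, and $k$ is set to 1 for simplicity.

```python
import math

def bg_entropy(probs, k=1.0):
    """Boltzmann-Gibbs entropy, -k * sum p log p, as in equation 48."""
    return -k * sum(p * math.log(p) for p in probs if p > 0)

def tsallis_entropy(probs, q, k=1.0):
    """Assumed standard Tsallis form: k * (1 - sum p^q) / (q - 1), for q != 1."""
    return k * (1.0 - sum(p**q for p in probs)) / (q - 1.0)

probs = [0.5, 0.3, 0.2]  # hypothetical normalized distribution

s_bg = bg_entropy(probs)
s_q_near_one = tsallis_entropy(probs, q=1.0 + 1e-6)
s_q_two = tsallis_entropy(probs, q=2.0)
print(s_bg, s_q_near_one, s_q_two)
```

For $q$ close to 1 the two entropies agree to within a correction of order $q-1$, while for $q=2$ the $q$-entropy reduces to $1-\sum_{\omega}\Pr\left(\omega\right)^{2}$.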

VI Acknowledgment

The author thanks Constantino Tsallis for the discussion on the connection with the Tsallis statistics. The author also thanks Enago (www.enago.com) for the English language review.

References

  • Boltzmann (2012) L. Boltzmann, Wissenschaftliche Abhandlungen, edited by F. Hasenöhrl (Cambridge University Press, Cambridge, 2012).
  • Gibbs (1902) J. W. Gibbs, Elementary Principles in Statistical Mechanics (Yale University Press, 1902) p. 207.
  • Note (1) The term “generalized-Boltzmann distributions” is introduced in Gao et al. (2019). It refers to the distribution of the form $\Pr\left(\omega\right)\propto\exp\left[\sum_{\eta=1}^{n}\frac{X_{\eta}x_{\eta}^{\left(\omega\right)}}{k_{B}T}-\frac{E^{\left(\omega\right)}}{k_{B}T}\right]$.
  • Kardar (2007) M. Kardar, Statistical physics of particles (Cambridge University Press, 2007).
  • Callen (1998) H. B. Callen, Thermodynamics and an introduction to thermostatistics (1998).
  • Chandler (1987) D. Chandler, Introduction to modern statistical mechanics (Oxford University Press, 1987).
  • Landau and Lifshitz (2013) L. D. Landau and E. M. Lifshitz, Course of theoretical physics: statistical physics (Elsevier, 2013).
  • Balescu (1991) R. Balescu, Equilibrium and Nonequilibrium Statistical Mechanics (Krieger, 1991).
  • Tolman (1979) R. C. Tolman, The principles of statistical mechanics (Courier Corporation, 1979).
  • Gao et al. (2019) X. Gao, E. Gallicchio, and A. E. Roitberg, Journal of Chemical Physics 151, 034113 (2019), arXiv:1903.02121.
  • Tsallis (1988) C. Tsallis, Journal of Statistical Physics 52, 479 (1988).
  • Curado and Tsallis (1991) E. M. F. Curado and C. Tsallis, Journal of Physics A: Mathematical and General 24, L69 (1991).
  • Tsallis et al. (1998) C. Tsallis, R. Mendes, and A. R. Plastino, Physica A: Statistical Mechanics and its Applications 261, 534 (1998).
  • Tsallis (2019) C. Tsallis, Entropy 21, 696 (2019).
  • Tsallis (2009) C. Tsallis, Introduction to nonextensive statistical mechanics: approaching a complex world (Springer Science & Business Media, 2009).
  • Boon and Tsallis (2005) J. Boon and C. Tsallis, Europhys. News 36, 185 (2005).
  • Frigg (2011) R. Frigg, in The Ashgate Companion to Contemporary Philosophy of Physics, edited by D. Rickles (2011) Chap. 3: A Field Guide to Recent Work on the Foundations of Statistical Mechanics.
  • Haar (1955) D. T. Haar, Reviews of Modern Physics 27, 289 (1955).
  • Penrose (1979) O. Penrose, Reports on Progress in Physics 42, 1937 (1979).
  • Note (2) There do exist systems with negative specific heat. These systems have odd properties. For example, they are never extensive, and they cannot achieve thermal equilibrium with a large heat bath. Discussion of such systems is beyond the scope of this article. Interested readers are referred to Lynden-Bell (1999).
  • Lynden-Bell (1999) D. Lynden-Bell, Physica A: Statistical Mechanics and its Applications 263, 293 (1999).