A geometric conjecture about phase transitions
Abstract
As phenomena that necessarily emerge from the collective behavior of interacting particles, phase transitions continue to be difficult to predict using statistical thermodynamics. A recent proposal called the topological hypothesis suggests that the existence of a phase transition could perhaps be inferred from changes to the topology of the accessible part of the configuration space. This paper instead suggests that such a topological change is often associated with a dramatic change in the configuration space geometry, and that the geometric change is the actual driver of the phase transition. More precisely, a geometric change that brings about a discontinuity in the mixing time required for an initial probability distribution on the configuration space to reach steady-state is conjectured to be related to the onset of a phase transition in the thermodynamic limit. This conjecture is tested by evaluating the diffusion diameter and -mixing time of the configuration spaces of hard disk and hard sphere systems of increasing size. Explicit geometries are constructed for the configuration spaces of these systems, and numerical evidence suggests that a discontinuity in the -mixing time coincides with the solid-fluid phase transition in the thermodynamic limit.
I Introduction
Phase transitions are essential to a variety of scientific and engineering applications, but continue to be difficult to predict from fundamental considerations. Instead, a phase transition is usually identified by an observed discontinuity in one of the derivatives of a thermodynamic potential. There have been several proposals concerning the origin of these discontinuities. For example, Landau theory Landau (1936); Landau and Lifshitz (2013) associates first-order phase transitions with spontaneous symmetry breaking as quantified by an appropriately-constructed order parameter. Such order parameters often need to be defined ex post facto though, after the characteristics of the phases involved in the phase transition are already known. An understanding of phase transitions that derives from more fundamental considerations would therefore be valuable.
Statistical thermodynamics suggests that thermodynamic observables can be calculated as time averages of the relevant microscopic quantity along the system’s trajectory through the phase space. The ergodic hypothesis Boltzmann (1871); Ehrenfest and Ehrenfest (1990) implies that, provided such time averages are conducted over a period longer than a characteristic mixing time, they can be replaced by an average over the entire phase space using a measure that is independent of the system’s initial microstate; that is, they can be replaced by an ensemble average. One question that has been raised about this procedure is whether the ensemble average could really be independent of the system’s initial conditions in general. Consider that any system with a disconnected configuration space (i.e., a system that is not metrically transitive Birkhoff (1931); Neumann (1932)) is necessarily confined to the component of the configuration space where it begins, meaning that the average behavior of an ensemble of such systems would not resemble the time-averaged behavior of any one system. Instead, the measure used for the average should depend on the system’s initial microstate, since that defines the component of the configuration space to which the system is confined. Following this line of thought further, changes to the connectivity of the configuration space could discontinuously change the support of the measure used for thermodynamic averages, and thereby lead to the discontinuities in thermodynamic observables required for a phase transition. This is one way to motivate the topological hypothesis Caiani et al. (1997); Franzosi and Pettini (2004) which roughly proposes that changes in the topology of the accessible part of the configuration space are necessary for a phase transition to occur.

The application of these ideas to a toy model is shown in Fig. 1. Suppose that there is an isolated system with the configuration space in the left panel, and that the potential energy is strictly monotone increasing with the -coordinate in the figure. The system’s internal energy then defines a value of called the filtration value above which the system’s trajectory cannot pass. Further suppose that the system’s initial microstate corresponds to a point in the bottom left corner of the configuration space, and that the observable of interest is the -coordinate in the figure. The middle panel shows the ensemble average of this observable as a function of filtration value for both the ergodic hypothesis and the topological hypothesis. Since the configuration space is symmetric about and the ergodic hypothesis stipulates that the measure should be independent of the initial microstate, the ensemble average of the observable is zero for all filtration values. By comparison, the topological hypothesis recognizes that for sufficiently small filtration values the configuration space is disconnected, the system is confined to the left side of the configuration space, and the ensemble average of the observable should be zero only when the filtration value increases enough for the space to become connected. Moreover, the discontinuous change in the integration measure at this filtration value leads to a discontinuous observable, precisely of the type that would be expected for a system undergoing a phase transition. It is significant that it was not necessary at any point to pass to the thermodynamic limit for there to be a discontinuity in the observable, though it is reasonable to suppose that the magnitude of the discontinuities and their location with respect to any control variables would vary systematically with size in a more realistic situation.
Let be the potential energy of a system, be the coordinates of the system’s particles, and be a smooth, stable, confining and short-range interaction potential. The topological hypothesis Franzosi et al. (2007); Franzosi and Pettini (2007) initially claimed that a topological change in the equipotential energy submanifolds of the configuration space was a necessary condition for a phase transition. Kastner and Mehta Kastner and Mehta (2011) observed a second-order phase transition in a two-dimensional system at an energy where no topological change occurred, disproving this claim. Gori et al. Gori et al. (2018) subsequently refined the hypothesis and observed that the phase transition in the two-dimensional system was caused by a diverging transition time between two parts of the configuration space, and that this was associated with an asymptotic topological change. That is, a continuous change in the configuration space geometry brought about a divergence in the mixing time, and while such events are often accompanied by topological changes they are not always. The significance of the configuration space geometry is supported by a recent study of the topological and geometric properties of the two-dimensional model by Bel-Hadj-Aissa et al. Bel-Hadj-Aissa et al. (2021); they computed the mean geometric curvature of the equipotential energy level sets and observed that the location of the phase transition could be inferred from the level set curvatures.
With this as background, the fundamental conjecture of this work is:
Conjecture 1.
A necessary condition for a first-order phase transition is a discontinuity in the mixing time on the configuration space.
This is not intended to suggest that the mixing time directly regulates the appearance of phase transitions, but merely that the mixing time is sensitive to any geometric changes that could discontinuously change the integration measure for thermodynamic averages.
A natural question at this point is whether it is actually possible to measure the mixing time of a thermodynamic system, or whether this will always remain a vague quantity significant only in that it is necessarily shorter than the time scales of quasiequilibrium processes. We hypothesize that the relevant geometric changes are so severe that any geometric quantity reasonably sensitive to the accessible volume of the configuration space and the length and number of paths connecting distant regions should exhibit measurable discontinuities for the same values of the control variable as the mixing time. The two quantities used here are the diffusion distance Coifman and Lafon (2006); Talmon et al. (2013) and the -mixing time. The first measures the difference between two distributions that start as Dirac delta distributions at different locations on the configuration space and evolve by diffusion; maximizing the diffusion distance over all starting locations of the two distributions gives the configuration space’s diffusion diameter. The second measures the time required for a Dirac delta distribution to diffuse to the steady-state distribution within a tolerance defined by , averaged over all starting locations of the distribution in the configuration space. This is intended to resemble the thermodynamic mixing time, though the distribution evolves by the diffusion equation rather than the Liouville equation and the similarity to the steady-state distribution is quantified by the Kullback–Leibler divergence Kullback and Leibler (1951).
The conjecture is tested by evaluating the diffusion diameter and -mixing time for the configuration spaces of hard disk and hard sphere systems, collectively called hard disk systems in the following. These systems are frequently used to model simple fluids Dyre (2016), and are governed by the hard disk potential for which the energy is infinite if any disks overlap and is zero otherwise. The phase transitions for these systems have been studied extensively, starting with the seminal work of Alder and Wainwright Alder and Wainwright (1962) who observed a phase transition in a system of hard disks as a function of packing fraction (the fraction of the area covered by the disks). The solid for is characterized by the presence of a long-range translational and orientational order, whereas the fluid for is characterized by the absence of any long-range order Weber et al. (1995); Mitus et al. (1997). The interval was initially believed to contain coexisting solid and fluid phases as would be expected of a first-order transition, but more recently was claimed to contain a hexatic phase Halperin and Nelson (1978); Young (1979). It is significant that there is still not agreement in the literature about the order of the transitions and the phases involved Weber et al. (1995); Mitus et al. (1997); Bernard and Krauth (2011); Engel et al. (2013) even for this simple system. Similar studies of hard spheres in three dimensions include those by Isobe and Krauth Isobe and Krauth (2015) and Pieprzyk et al. Pieprzyk et al. (2019) who observed solid-fluid phase coexistence in the interval.
The configuration spaces of hard disk systems have been studied before. Carlsson et al. Carlsson et al. (2012) explored the topology of the configuration space of five hard disks in a square box by regularizing the potential energy surface and using classical Morse theory Morse (1934); Milnor (2016). Baryshnikov et al. Baryshnikov et al. (2014) developed a min-type Morse theory for the configuration spaces of hard disk systems and proved that the critical points where the topology changes as a function of packing fraction are precisely the mechanically-balanced disk configurations. Ritchey Ritchey (2019) studied configuration spaces of hard disk in the hexagonal torus for disks, created a database of the critical points, defined their critical index (specifying the nature of the associated topological change), and found their plane symmetry groups. The authors previously Ericok and Mason (2021) proposed distance functions on the configuration spaces of hard disks quotiented by various symmetry groups. They subsequently Eriçok et al. (2021) used these distances to explicitly triangulate the quotients of the configuration space of two spheres in the rhombic dodecahedron and measure the diffusion diameters of the resulting spaces. They also observed an accumulation of critical points in the configuration space at packing fractions in the phase coexistence interval, suggesting the possibility of a topological and geometric catastrophe there in the thermodynamic limit.
The technical contribution of this work is a set of techniques described in Sec. II that collectively allow the configuration space geometries for hard disk systems with disks and hard sphere systems with spheres to be evaluated in practice. The diffusion diameters and -mixing times (defined in Sec. III) of these spaces are computed as functions of particle diameter and particle number, and in Sec. IV significant discontinuities are observed at the packing fractions of critical points close to the phase coexistence intervals. Along with the observation Eriçok et al. (2021) that critical points accumulate in the phase coexistence intervals with increasing particle number, this suggests that there should be a discontinuity in the mixing time at the packing fraction of the phase transition in the thermodynamic limit, offering preliminary support for Conj. 1.
II Configuration Spaces
II.1 Tautological function
The configuration space of points on a -dimensional torus is
(1) |
A two-dimensional torus for hard disks is obtained by identifying opposite edges of a regular hexagon, whereas a three-dimensional torus for hard spheres is obtained by identifying opposite faces of a rhombic dodecahedron. Figure 2 shows these domains with some of the periodic images of the fundamental unit cells. The center to center distance to the neighboring cells is one in both cases.

The tautological function is the maximum radius that the disks could have without overlapping, or
(2) |
where is half the geodesic distance between the centers of disks and . This allows the configuration space of disks of radius to be written as
(3) |
or the set of all configurations of points where the minimum point separation is at least .
II.2 Critical configurations
Morse theory Morse (1934); Milnor (2016) stipulates that the topology of the sublevel sets of a generic real-valued function defined on a smooth manifold can change only at the critical points of . Intuitively, a critical point is a point where the gradient of the function vanishes. The index of a critical point is defined as the number of negative eigenvalues of the Hessian matrix there, and roughly indicates the number of independent directions along which decreases to second order. Let denote a sublevel set of . One of the results of Morse theory is that the topology of and are equivalent if the interval doesn’t contain any critical points of .
Baryshnikov et al. Baryshnikov et al. (2014) began to develop a min-type Morse theory specifically for hard disk configuration spaces, and proved that the critical points of the tautological function are precisely the mechanically-balanced configurations. These can be shown to coincide with the critical points of the regularized hard-disk potential in Ref. Eriçok et al. (2021), found using a conjugate gradient algorithm to minimize the square of the potential gradient magnitude. Interactive databases of the critical points for the hard disk111https://github.com/burakericok/hard_disk_crits and hard sphere222https://github.com/burakericok/hard_sphere_crits configuration spaces are available online. The distributions of critical points as functions of index and packing fraction in Refs. Ritchey (2019); Eriçok et al. (2021) suggest that an accumulation of critical points in a narrow packing fraction interval with increasing number of disks could be associated with the onset of a phase transition; one of the purposes of this work is to explore this observation further.
II.3 Geometric representation
The procedure followed previously Ericok and Mason (2021); Eriçok et al. (2021) when studying hard disk configuration spaces involved repeatedly sampling points in the configuration space of points , connecting nearby points in the resulting point cloud to reconstruct , and restricting to the hard disk configuration space by retaining only those regions with suitable values of the tautological function. The difficulty with this procedure is that the dimension and complexity of increases rapidly with , quickly requiring a point cloud with an overwhelming number of points to accurately capture the geometric details. This section describes several techniques that, while they do not solve the problem, reduce the required number of points enough to allow us to study the geometry of configuration spaces with up to dimensions.
The first is to quotient the configuration space by various symmetry groups, thereby reducing the relevant volume. The three isometry groups considered here are the group of rigid translations , the group of disk label permutations , and the point group symmetries of the fundamental unit cell . As described previously Ericok and Mason (2021); Eriçok et al. (2021), the configuration space is quotiented by these groups by changing the distance function used to identify and connect nearby points in the point cloud. More precisely, there is a natural distance function on that is the sum of the geodesic distances each point in a configuration would need to travel to be converted into configuration . Given an isometry group , the distance in the quotient space can be written as Burago et al. (2001)
(4) |
Observe that is a continuous group, that Eq. 4 involves solving a global optimization problem when , and that this problem can have multiple local minima. While algorithms like Tabu search Glover (1989); Chelouah and Siarry (2000) can be used in such situations, the number of problems to be solved increases rapidly with the number of isometry groups considered. For example, when , the global optimization over rigid translations needs to be solved times to evaluate Eq. 4 where is the order of .
The second technique reduces the number of optimization problems that need to be solved in Eq. 4. The idea is to construct an approximation to the solution of the global optimization problem over rigid translations, allowing the discrete symmetry operations that are unlikely to realize the minimum to be quickly rejected. The procedure followed here samples a fixed number of random translations for a fixed discrete symmetry operation, and uses the minimum over the random translations as the approximation. The details of this procedure and various numerical results are provided in App. A. This reduces the computational cost enough to be able to evaluate a small number of distances , but not all the pairwise distances in the point cloud when identifying nearby points to reconstruct .
The third techniques reduces the number of times that needs to be evaluated. Points sampled from are initially mapped to a Cartesian space with coordinates given by descriptors that are invariant to the action of , , and Ericok and Mason (2021); Eriçok et al. (2021). This mapping is conjectured to be an embedding and therefore to preserve the neighborhood of every point in , but is not isometric in the sense that distances between configurations are distorted. That said, there are efficient algorithms to calculate -nearest neighbor graphs in the descriptor space with the Euclidean metric. Since neighborhoods are preserved, the -nearest neighbor graph of a point in the descriptor space should contain the -nearest neighbor graph of the point in for sufficiently large ; our numerical experiments suggest that is usually sufficient. The procedure therefore involves constructing the -nearest neighbor graph in the descriptor space, evaluating for each edge of this graph, and constructing the approximate -nearest neighbor graph in using these distances.
It is useful at this point to discuss the relationship of the -nearest neighbor graph in to the geometry of the underlying space. It is clear that the apparent connectivity of the space depends on the value of ; for very small the graph would likely contain many disconnected components, whereas for very large every point would share an edge with every other point, regardless of the actual properties of the space. While there does not seem to be an established canonical approach to selecting the value of , the strategy followed here involves the use of topological information. Observe that, for any filtration of the -nearest neighbor graph by the tautological function, the number of disconnected components in the resulting graph should be at most the number of index- critical points with disk radius larger than or equal to the filtration value. Moreover, when the filtration value is equal to the smallest value such that all of the index- critical points (which correspond to saddle points) are included, only a single connected component should remain. The smallest value of for which these conditions are satisfied seems a reasonable choice, and for the numbers of points in our point clouds (indicated in Fig. 3) is approximately where is the number of disks.

The fourth technique is to directly use the -nearest neighbor graph rather than a simplicial complex (a triangulation) to evaluate the geometric properties of . Previously, the configuration spaces for small numbers of hard disks and spheres were triangulated as -complexes Ericok and Mason (2021); Eriçok et al. (2021). While such simplicial complexes can accurately represent all the geometric properties of the underlying space, the number of simplices required generally increases exponentially with dimension, quickly making the computational memory requirements prohibitive Sheehy (2013); Buchet et al. (2016). Fortunately, the only geometric information required to calculate the diffusion diameter and the -mixing time are geodesic distances, and these can be reasonably approximated from the -nearest neighbor graph alone. Figure 4 shows the quotient space of two points on represented as an -complex (left) and as a graph (right). While the shortest path connecting two points is a straight line on the left, the path through the edges of the graph on the right is slightly longer. Since both the diffusion distance and the -mixing time are designed to be robust to small geometric perturbations like these, using the -nearest neighbor graph of directly is sufficient for our purposes.

The fifth and final technique is related to the construction of the point cloud on , and is intended to reduce the number of points required to accurately represent the relevant geometric features of . Figure 5 shows the distributions of tautological function values for point clouds on sampled by three different procedures, with the dashed lines indicating the radii of previously-identified critical points. The left panel shows the distribution of points sampled uniformly on , and reveals that the overwhelming majority of the space’s volume has comparatively small values of where only a single pair of disks would be in contact. The critical points are concentrated at relatively high values of though. This motivates the use of importance sampling Kahn and Harris (1951) to sample points more uniformly over , concentrating points in the regions where the topology and geometry of are most likely to change and giving the distribution in the middle panel. The sampling density around the critical points can be increased further by perturbing known critical configurations and adding the resulting configurations directly to the point cloud, giving the distribution in the right panel. The intention is to ensure that the density of sampled points is high enough to accurately reflect the parts of that are most relevant to the hypothesis without unnecessarily sampling points elsewhere in the space.

III Configuration space geometry
Diffusion is a smoothing process. Flows from regions of high concentration to low concentration necessarily make a concentration field more uniform with time. This also makes diffusion processes relatively insensitive to small perturbations in the initial concentration field or to the geometry of the underlying space. This property is likely one of the reasons that diffusive processes are often used to learn about the geometry of a space represented by a sampled point cloud, since the resulting insights do not depend sensitively on the distribution of the points.
III.1 Diffusion Distance
Let be the -nearest neighbor graph on constructed by the procedure in Sec. II.3. The diffusion distance on measures the distance between two distributions that begin as Dirac delta distributions on vertices and and diffuse on for a time . Let the kernel matrix have entries where is Dijkstra’s distance between vertices and and is twice the median length of edges in , and the diagonal degree matrix has entries . Then is the transition rate matrix of a continuous-time Markov process where the effect of raising to the th power is equivalent to propagating the process by a time . Let be the set of eigenvalues and eigenvectors of for where is the number of vertices. Since is normalized, the largest eigenvalue is associated with a constant eigenvector . Discarding and , the diffusion coordinates are defined as
(5) |
and encode a distribution starting as a Dirac delta distribution on vertex and diffusing for a time . The diffusion distance between vertices and is then defined as
(6) |
where is the Euclidean norm. Note that the significance of the time depends on the diffusion rate implicit in the kernel . Since the eigenspectrum of decays very quickly for sufficiently large , the diffusion distance can often be accurately approximated using only the first few eigenvalues and eigenvectors for a substantial time and memory savings Talmon et al. (2013). Details about this approximation are provided in App. B.
III.2 Mixing Time
Thermodynamics is generally concerned with equilibrium or quasiequilibrium systems, where there is no net redistribution of matter or energy and thermodynamic observables are time independent. From the standpoint of statistical thermodynamics, this means that the time average of any thermodynamic observable over the system’s trajectory through the phase space does not depend on the initial microstate provided that the averaging is performed over a sufficiently long time interval; the shortest such interval is known as the mixing time.
Let be the phase space of a thermodynamic system, and be the canonical coordinates and momenta, and be the probability distribution of microstates on at time . Suppose that the system has an arbitrary initial condition . Briefly setting aside the specifics of the time evolution of , the steady-state distribution defines the equilibrium condition on , and the mixing time could in principle be defined by the approach of to . That said, the literature does not seem to precisely define the necessary conditions for the two distributions to be effectively indistinguishable, nor the effect of the initial condition on the resulting mixing time. A likely reason for this is the historical emphasis on equilibrium states for which, by hypothesis, the observation time can be made as long as necessary for there to be complete mixing.
Classically, the time evolution of is governed by the Liouville equation Liouville (1838); Gibbs (1902). Liouville’s theorem states that the resulting convective derivative of the probability distribution is zero, or that the flow of probability density resembles that of an incompressible fluid. This raises questions relating to the convergence of probability distributions that are closely related to those about the origins of irreversibility, and that continue to be discussed in the literature. This discussion is avoided by simply supposing that evolves by the diffusion equation, while not endorsing this approach generally. One consequence of this supposition is that the marginal probability distributions on the configuration and momentum subspaces evolve independently, though only the marginal probability distribution on the configuration space has a limiting distribution (the momentum subspace is unbounded). This suggests that the mixing time be defined by the convergence of the marginal probability distribution on the configuration space to the limiting distribution .
The Kullback-Leibler divergence Kullback and Leibler (1951) is often called the relative entropy, and is a standard way to quantify how much a probability distribution differs from a reference probability distribution :
(7) |
We propose that the condition for to have converged to be that , where is an adjustable parameter analogous to a conventional convergence threshold.
Now suppose that is a Dirac delta distribution centered at , and that is the minimum time required for this initial condition to converge to on in the sense above. The -mixing time is defined as the weighted average of over all possible choices of the initial condition, or
(8) |
where the measure of integration is taken to be in the absence of a natural alternative. The -mixing time is regarded as a precisely-defined proxy for the thermodynamic mixing time on the configuration space in the following.
This leaves only the definition of a diffusive process occurring on a connected graph instead of on a continuous space . Let be the probability masses on the vertices of at time , and define the graph Laplacian as Chung and Graham (1997). The governing equation for a diffusive process is
(9) |
Let be the diagonal matrix of the eigenvalues of , and be the matrix of the corresponding eigenvectors where the first column is a constant vector. Since is a symmetric matrix, the columns of form an orthogonal basis and the solution to the diffusion equation is
(10) |
for the initial condition . The steady-state distribution is readily calculated using the fact that all of the diagonal elements of go to zero in the limit of long time except the first term which goes to one. This suggests that the steady-state distribution is always the uniform distribution over all vertices of .
The right panel of Fig. 1 shows the different behaviors of the diffusion distance and the mixing time in practice. Observe that while both exhibit a discontinuous jump at the filtration value where the two components of the space merge, the diffusion diameter has an additional peak at the filtration value of the bottleneck. The initial rise is attributed to the probability mass having difficulty diffusing to the point of the bottleneck through a small number of paths. Since the space is constructed by means of -nearest neighbor graphs, more paths spanning the neck appear with further increases in the filtration value, subsequently reducing the diffusion diameter. That is, the peak in the diffusion diameter is effectively an artifact of the construction of the space and the precise sequence in which the edges appear. By comparison, the average mixing time is unaffected by the bottleneck since the small volume of that region reduces its contribution to the average in Eq. 8. More details about the calculation of the -mixing time for this system are given in App. C.
IV Results
The hard disk equation of state usually appears in the literature as a function of packing fraction Mulero et al. (2008); Tian et al. (2020). The radius of the disks appearing in Eq. 3 can be converted to a packing fraction using for hard disks and for hard spheres, where and are respectively the area of the fundamental hexagon and the volume of the fundamental rhombic dodecahedron in Fig. 2.
The authors previously constructed quotient spaces of two hard disks Ericok and Mason (2021) and two hard spheres Eriçok et al. (2021) and studied the topological and geometric properties of both the original and quotient spaces as functions of packing fraction. They observed that while the geometric and topological features of the these spaces changed dramatically with the choice of symmetry groups by which to quotient, the general behavior of the diffusion diameter (and significantly the locations of the discontinuities) as a function of packing fraction did not. This suggests that the quotient space has the advantage of a substantially reduced volume while retaining the essential geometric and topological features of the full configuration space. If the behavior of the -mixing time is similarly robust to the choice of symmetry groups by which to quotient, this would allow Conj. 1 to be tested more easily and for systems with larger numbers of disks.
Figure 6 shows the -mixing times of the quotient spaces and for hard disks (top) and hard spheres (bottom). Since only relative changes in the -mixing time are significant, all the results in this section are normalized to their maximum value for ease of comparison. Initially consider the hard disk results in the top row. There are only two critical points, one index- and one index-. The volumes and -mixing times of both spaces initially grow slowly with decreasing , with a discontinuity appearing at precisely the packing fraction of the index- critical point; the discontinuity is stronger for since the geometric change is more severe, though at the price of greatly increased computational cost. The bottom row shows the results for hard spheres, for which there are two distinct index-, one index-, and one index- critical points. Since having distinct index- critical points results in the space having multiple disconnected components for certain intervals of , the -mixing times for each component are shown with an opacity that indicates the fraction of vertices participating in that component. As before, there is a discontinuity in the -mixing time at the packing fraction of the index- critical point, in this situation indicating that the disconnected components are joined. Notice that a similar discontinuity does not occur at the packing fraction of the index- critical point, nor indeed at any other critical point of the tautological function. Since the behavior of the diffusion diameter was previously found to be similarly robust to the choice of symmetry groups by which to quotient, the quotient spaces will be used exclusively in the following.

Figure 7 shows the diffusion diameters and -mixing times of the quotient spaces for hard disks as functions of packing fraction, with dashed lines indicating the packing fractions of the critical points. The first observation is that not all of the critical points correspond to substantial geometric changes, at least not ones to which the diffusion distance and -mixing time are sensitive. The second is that the diffusion diameter is generally much noisier than the -mixing time, and while the structure in the signal could perhaps be analyzed for further geometric information that is not the purpose of our study. The third is that while the discontinuities in the diffusion distance and -mixing time do not always occur at the same packing fractions, the largest discontinuities generally do.

Figure 8 shows the corresponding results to Fig. 7 for hard spheres. While the number of critical points is greatly increased relative to the hard disk systems in Fig. 7, the number of discontinuities in the diffusion diameter and -mixing time are approximately the same. This suggests that there is perhaps a small class of critical points associated with substantial geometric changes to the configuration space, and that the distribution of these critical points is the most relevant to the underlying hypothesis. Notice particularly that the packing fraction of the largest discontinuities appears to be approaching the packing fraction of the lower end of the phase-coexistence interval with increasing . Unfortunately, even using all of the techniques in Sec. II.3, the memory requirements increase so rapidly with that we could not realistically examine the spaces for . Even for five and six hard spheres, the volume of the space increases so rapidly with decreasing packing fraction that we cannot report diffusion diameters and -mixing times over the entire domain. This does not substantially affect our conclusions though, since the sampled domain of already extends well below the coexistence interval, and notably includes all of the index- critical points.

There is evidence that, at least for the hard disk systems, the substantial geometric changes leading to discontinuities in the -mixing time are often associated with the lowest-packing-fraction index- critical point. Figure 9 shows the packing fractions of the largest discontinuities in Figs. 7 and 8 as black squares, with the number of hard disks (top) and hard spheres (bottom) increasing on the vertical axes. The packing fractions of all known critical points are also shown, with index- critical points in purple, index- critical points in blue, and all others in green. While -mixing time data is only available for hard disks and hard spheres, populations of critical points for hard disks and hard spheres have been sampled using established techniques that are described elsewhere Ritchey (2019); Ericok and Mason (2021); Eriçok et al. (2021). Apart from the increasing concentration of low-index critical points around the phase coexistence interval with increasing , the most striking aspect of the figure is that the largest discontinuities almost always occur close to the packing fraction of the last index- critical point to appear with decreasing packing fraction (each of the black squares comes slightly before an index- critical point due to finite sampling). This is not entirely unexpected, since each index- critical point either joins previously disconnected components or add a new class of closed paths to the space. Supposing that discontinuities in the -mixing time are associated with the former, it also makes sense that the largest discontinuity would be observed at lower packing fractions where the disconnected components being joined had the opportunity to grow to substantial volumes. The increasing concentration of low-index critical points around the phase coexistence interval draws the discontinuity closer to the liquid limit with increasing , though that is admittedly a noisy trend for the small numbers of disks considered here. Nevertheless, this does provide evidence that something along the lines of Conj. 1 could be true for hard sphere systems, and perhaps for thermodynamics systems as well.

V Conclusion
A phase transition is necessarily related to a discontinuous change in the probability density function describing the distribution of the system’s microstates on the phase space. The question of how this could occur as the thermodynamic control variables are continuously varied has not been conclusively answered in the literature. One proposal called the topological hypothesis Caiani et al. (1997); Franzosi and Pettini (2004) suggests that topological changes to the accessible region of the configuration space is a necessary condition for a phase transition. This paper instead suggests that a substantial change to the geometry of the accessible region is a necessary condition for a phase transition, and that such a change is often (but not always) associated with a topological change. More specifically, Conj. 1 proposes that a discontinuity in the mixing time on the configuration space is a necessary condition for a first-order phase transition in the thermodynamic limit. Our main result is a preliminary test of this conjecture for hard disk and hard sphere systems with few enough disks that the configuration space geometry can be explicitly studied.
The configuration spaces of hard disks and hard spheres are represented as graphs with vertices representing specific configurations of disks and edges indicating the distances between the configurations. The vertices of the graph are sampled more densely around critical points, or locations where the topology of the configuration space is known to change as a function of disk radius, to more accurately capture the topological and geometric changes in those regions. The diffusion diameters and the -mixing times of the resulting graphs are calculated as functions of packing fraction for and hard spheres; the -mixing time is proposed here, and is designed to be an analogue to the thermodynamic mixing time that can be explicitly evaluated.
The geometric signals obtained for the diffusion diameter and the -mixing time are consistent in the sense that their discontinuities generally occur at the same packing fractions. Apart from relating more directly to the content of Conj. 1, the -mixing time is a much smoother function of packing fraction than the diffusion diameter. The discontinuities in the -mixing time are found to occur at comparatively few critical points; for the hard disk and hard sphere systems at least these are predominantly index- critical points. Along with the observation that the low-index critical points are increasingly concentrated around the phase coexistence interval with the number of disks, this suggests that a discontinuous change in the -mixing time could indeed coincide with the first-order phase transitions in the hard disk and hard sphere systems in the thermodynamic limit.
Future studies along these lines would likely need to extend the analysis to larger numbers of disks to make the trends in the approach to the thermodynamic limit clearer. Given the rate of increase in the computational requirements with number of disks, this would require at a minimum more efficient algorithms to search for critical points and calculate the distances between configurations. The computational requirements would also be reduced by using a further-reduced representation of the spaces by means of witness graphs Aronov et al. (2013), or by restricting to particular intervals of disk radius close to the phase coexistence region. Further development of a suitable min-type Morse theory would also be helpful Mori and Salvetti (2011); Baryshnikov et al. (2014).
Acknowledgements.
O.B.E. and J.K.M. were supported by the National Science Foundation under Grant No. 1839370.Appendix A Heuristic for the distance
Let in Eq. 4. Since is a continuous group and is discrete, Eq. 4 can be rewritten as
Evaluating involves solving a global optimization problem over rigid translations for a fixed discrete symmetry operation . Since evaluating involves solving of these global optimization problems, the computational cost could be significantly reduced if a suitable approximation for could be found to quickly reject some of the .
Such an approximation is constructed by, for a fixed number of iterations , randomly sampling a random translation , calculating the distance for that translation, and accepting that translation only if it reduces Wells et al. (2004). The resulting approximations for are sorted by increasing magnitude, and the full optimization problem is solved only for the first symmetry operations in the sorted list. The rationale for this procedure is that calculating the approximations using a moderate is less expensive than solving the global optimization problem.

For two generic configurations of hard disks there are symmetric versions of the second configuration due to the actions of and . Figure 10 shows the sorted true distances , and is the minimum of this set. Figure 11 shows the same configurations sorted by the approximate values of for , , , and plotted with the true values of on the vertical axis. Observe that with increasing these curves should converge to the one in Fig. 10, and that even for small the true minimum distance should appear within the first symmetry operations. Numerical experiments suggest that and are generally sufficient for an approximation with a relative error on the order of .

Appendix B Eigenspectrum analysis
The diffusion coordinates in Eq. 5 involve the th powers of the eigenvalues of sorted by decreasing magnitude. This means that for sufficiently large , the computational expense of calculating the diffusion distance can be considerably reduced with negligible loss of accuracy by truncating the eigenspectrum and only considering the first few diffusion coordinates Talmon et al. (2013). Figure 12 shows the th powers of the eigenvalues of for at times , , and . Observe that even for the relatively small most of the diffusion coordinates will be numerically indistinguishable from zero. Only the first diffusion coordinates are used here unless otherwise specified, changing the diffusion diameter to a relative precision of less than and making this source of error negligible compared to other sources.

Appendix C Mixing time of an example system
Figure 13 shows the -mixing times as a function of for the example system in Fig. 1. The -nearest neighbor graph is constructed with , the length of an edge is given by the Euclidean distance between the corresponding vertices, and is used since the steady-state probability mass at each vertex is (the inverse of the number of vertices). Points closer to the center have smaller -mixing times since the initial Dirac-delta distribution can spread to the upper and lower regions more easily as compared to distributions starting near the corners. The average mixing time for this example is around .

References
- Landau (1936) L. Landau, Nature 138, 840 (1936).
- Landau and Lifshitz (2013) L. D. Landau and E. M. Lifshitz, Statistical Physics: Volume 5, vol. 5 (Elsevier, 2013).
- Boltzmann (1871) L. E. Boltzmann, Wiener Berichte 63, 679 (1871).
- Ehrenfest and Ehrenfest (1990) P. Ehrenfest and T. Ehrenfest, The conceptual foundations of the statistical approach in mechanics (Courier Corporation, 1990).
- Birkhoff (1931) G. D. Birkhoff, Proceedings of the National Academy of Sciences 17, 656 (1931).
- Neumann (1932) J. v. Neumann, Proceedings of the National Academy of Sciences 18, 70 (1932).
- Caiani et al. (1997) L. Caiani, L. Casetti, C. Clementi, and M. Pettini, Physical review letters 79, 4361 (1997).
- Franzosi and Pettini (2004) R. Franzosi and M. Pettini, Physical review letters 92, 060601 (2004).
- Franzosi et al. (2007) R. Franzosi, M. Pettini, and L. Spinelli, Nuclear Physics B 782, 189 (2007).
- Franzosi and Pettini (2007) R. Franzosi and M. Pettini, Nuclear Physics B 782, 219 (2007).
- Kastner and Mehta (2011) M. Kastner and D. Mehta, Physical review letters 107, 160602 (2011).
- Gori et al. (2018) M. Gori, R. Franzosi, and M. Pettini, Journal of Statistical Mechanics: Theory and Experiment 2018, 093204 (2018).
- Bel-Hadj-Aissa et al. (2021) G. Bel-Hadj-Aissa, M. Gori, R. Franzosi, and M. Pettini, Journal of Statistical Mechanics: Theory and Experiment 2021, 023206 (2021).
- Coifman and Lafon (2006) R. R. Coifman and S. Lafon, Applied and computational harmonic analysis 21, 5 (2006).
- Talmon et al. (2013) R. Talmon, I. Cohen, S. Gannot, and R. R. Coifman, IEEE signal processing magazine 30, 75 (2013).
- Kullback and Leibler (1951) S. Kullback and R. A. Leibler, The annals of mathematical statistics 22, 79 (1951).
- Dyre (2016) J. C. Dyre, Journal of Physics: Condensed Matter 28, 323001 (2016).
- Alder and Wainwright (1962) B. Alder and T. Wainwright, Physical Review 127, 359 (1962).
- Weber et al. (1995) H. Weber, D. Marx, and K. Binder, Physical Review B 51, 14636 (1995).
- Mitus et al. (1997) A. Mitus, H. Weber, and D. Marx, Physical Review E 55, 6855 (1997).
- Halperin and Nelson (1978) B. Halperin and D. R. Nelson, Physical Review Letters 41, 121 (1978).
- Young (1979) A. Young, Physical Review B 19, 1855 (1979).
- Bernard and Krauth (2011) E. P. Bernard and W. Krauth, Physical review letters 107, 155704 (2011).
- Engel et al. (2013) M. Engel, J. A. Anderson, S. C. Glotzer, M. Isobe, E. P. Bernard, and W. Krauth, Physical Review E 87, 042134 (2013).
- Isobe and Krauth (2015) M. Isobe and W. Krauth, The Journal of chemical physics 143, 084509 (2015).
- Pieprzyk et al. (2019) S. Pieprzyk, M. N. Bannerman, A. C. Brańka, M. Chudak, and D. M. Heyes, Physical Chemistry Chemical Physics 21, 6886 (2019).
- Carlsson et al. (2012) G. Carlsson, J. Gorham, M. Kahle, and J. Mason, Physical Review E 85, 011303 (2012).
- Morse (1934) M. Morse, The calculus of variations in the large, vol. 18 (American Mathematical Soc., 1934).
- Milnor (2016) J. Milnor, Morse theory.(AM-51), vol. 51 (Princeton university press, 2016).
- Baryshnikov et al. (2014) Y. Baryshnikov, P. Bubenik, and M. Kahle, International Mathematics Research Notices 2014, 2577 (2014).
- Ritchey (2019) K. Ritchey, Ph.D. thesis, Ohio State University, Electronic Thesis or Dissertation (2019).
- Ericok and Mason (2021) O. B. Ericok and J. K. Mason, arXiv preprint arXiv:2101.00780 (2021).
- Eriçok et al. (2021) O. B. Eriçok, K. Ganesan, and J. K. Mason, Phys. Rev. E 104, 055304 (2021).
- Burago et al. (2001) D. Burago, I. D. Burago, Y. Burago, S. Ivanov, and S. A. Ivanov, A course in metric geometry, vol. 33 (American Mathematical Soc., 2001).
- Glover (1989) F. Glover, ORSA Journal on computing 1, 190–206 (1989).
- Chelouah and Siarry (2000) R. Chelouah and P. Siarry, European journal of operational research 123, 256–270 (2000).
- Sheehy (2013) D. R. Sheehy, Discrete & Computational Geometry 49, 778 (2013).
- Buchet et al. (2016) M. Buchet, F. Chazal, S. Y. Oudot, and D. R. Sheehy, Computational Geometry 58, 70 (2016).
- Kahn and Harris (1951) H. Kahn and T. E. Harris, National Bureau of Standards applied mathematics series 12, 27 (1951).
- Liouville (1838) J. Liouville, Journal de Mathématiques Pures et Appliquées 3, 342 (1838).
- Gibbs (1902) J. W. Gibbs, Elementary principles in statistical mechanics: Developed with especial reference to the rational foundations of thermodynamics (C. Scribner’s sons, 1902).
- Chung and Graham (1997) F. R. Chung and F. C. Graham, Spectral graph theory, 92 (American Mathematical Soc., 1997).
- Mulero et al. (2008) A. Mulero, C. Galán, M. Parra, and F. Cuadros, in Theory and Simulation of Hard-Sphere Fluids and Related Systems (Springer, 2008), pp. 37–109.
- Tian et al. (2020) J. Tian, H. Jiang, and A. Mulero, Molecular Physics 118, e1687948 (2020).
- Aronov et al. (2013) B. Aronov, M. Dulieu, and F. Hurtado, Computational Geometry 46, 894 (2013), ISSN 0925-7721, euroCG 2009.
- Mori and Salvetti (2011) F. Mori and M. Salvetti, Mathematical Research Letters 18, 39 (2011).
- Wells et al. (2004) M. T. Wells, G. Casella, and C. P. Robert, in A Festschrift for Herman Rubin (Institute of Mathematical Statistics, 2004), pp. 342–347.