Graphon-valued processes with
vertex-level fluctuations
Abstract.
We consider a class of graph-valued stochastic processes in which each vertex has a type that fluctuates randomly over time. Collectively, the paths of the vertex types up to a given time determine the probabilities that the edges are active or inactive at that time. Our focus is on the evolution of the associated empirical graphon in the limit as the number of vertices tends to infinity, in the setting where fluctuations in the graph-valued process are more likely to be caused by fluctuations in the vertex types than by fluctuations in the states of the edges given these types. We derive both sample-path large deviation principles and convergence of stochastic processes. We demonstrate the flexibility of our approach by treating a class of stochastic processes where the edge probabilities depend not only on the fluctuations in the vertex types but also on the state of the graph itself.
Key words.
Graphs, graphons, dynamics, sample paths, process convergence, large deviations, optimal paths.
MSC2010.
05C80, 60C05, 60F10.
Acknowledgment.
The work in this paper was supported by the Netherlands Organisation for Scientific Research (NWO) through Gravitation-grant NETWORKS-024.002.003.
1. Introduction
1.1. Background
Graphons arise as a powerful tool for characterising the limit of a sequence of dense graphs, i.e., graphs in which the number of edges scales as the square of the number of vertices. The theory describing these graphons (see e.g. [21], [22], [4], [5], [20]) focuses on the limiting properties of large dense graphs in terms of their subgraph densities. The literature covers both typical and atypical behaviour, a notable result being the large deviation principle (LDP) for homogeneous Erdős-Rényi random graphs and associated graphons [9], and their inhomogeneous counterparts [12].
While most of the existing theory focuses on static random graphons, the attention has gradually shifted to dynamic random graphons (see e.g. [26], [7], [1]). Two notable contributions in this area are [1], which presents a stochastic process limit in the space of graphons for a class of processes where the edges evolve in a dependent manner, and [6], which extends the LDP of [9] to a sample-path LDP for a dynamic random graph in which the edges switch on and off independently in a random fashion. In [6] the authors leave open the question whether a sample-path large deviation principle can be established for processes where the edges switch on and off in a dependent manner, such as in [1].
1.2. Motivation
The goal of the present paper is two-fold: (1) to answer the open question raised in [6] by establishing a sample-path large deviation principle for a class of processes in which edges evolve in a dependent manner; (2) to strengthen the results in [1] while working in a more general framework.
In the class of processes we consider, each vertex is assigned a type that changes randomly over time, and fluctuations in the types of the vertices determine how the edges interact with each other while switching on and off. Specifically, the paths of the types of all the vertices up to time determine the probability that the edges in the random graph are active at time . Collectively, these paths are called the driving process.
Our results generalise those of [1] in a number of directions:
- (i)
-
(ii)
We consider a general driving process and a general edge-switching dynamics, whereas [1] restricts attention to a specific driving process (the multi-type Moran model) and to a specific edge-switching dynamics (modulated by a fitness function).
- (iii)
-
(iv)
We allow for processes in which the probabilities that edges are active depend not only on the fluctuations in the types but also on the state of the graph itself, i.e., on which edges are active or not (Section 4.1).
On the way to proving our results for graphon-valued processes, we also prove a new large deviation principle for static random graphs (Theorem 2.6). This result can be viewed as a generalisation of the large deviation principle for inhomogeneous Erdős-Rényi random graphs in which each vertex is assigned a random type. Our proofs rely on concentrations estimates, coupling arguments, and continuous mapping. Along the way, several examples are presented.
The models analysed in this paper have a one-way dependence: the states of the edges depend on the types of the vertices, but the types of the vertices do not depend on the states of the edges. There are many natural models that fall into this framework, coming from statistical physics, population genetics and the social sciences, where the strengths of the interactions between particles, alleles or individuals generally depend on the type they carry. It is much harder to analyse models that exhibit a two-way dependence. Such models capture the evolution of spins, infections or opinions on dynamic random networks with mutual feedback, an area that so far remains largely unexplored.
1.3. Outline
In Section 2 we recall basic LDPs for graphons, and present three LDPs for what we call inhomogeneous random graphs with type dependence (IRGTs), which are static random objects. In Section 3 we look at their dynamic counterparts, which are graph-valued processes, the main result being a sample-path LDP in graphon space. We illustrate our results via a running example, and derive convergence of the graph-valued process to a graphon process. In Section 4 we describe various applications, and discuss possible extensions. Section 5 contains the proofs of our main theorems. Appendix A identifies the rate function in the LDP of the underlying driving process.
2. Large deviations for static random graphs
While the goal of the present paper is to study a specific class of dynamic random graphs, we begin by analyzing their static counterparts. The reason is that the marginal distributions of the dynamic random graphs to be considered (introduced in Section 3) at any given time corresponds to a random graph with type dependence (introduced in Section 2.3).
In Section 2.1 we recall a few basic definitions related to graphons. In Section 2.2 we introduce inhomogeneous Erdős-Rényi random graphs and recall the large deviation principle for their associated empirical graphons. In Section 2.3 we describe a generalisation of inhomogeneous Erdős-Rényi random graphs, referred to as inhomogeneous random graphs with type dependence (IRGT), which motivate the definition of the class of graph-valued stochastic processes that we will be working with from Section 3 onwards. In Section 2.4 we state a number of key assumptions that are needed along the way. In Section 2.5 we establish the large deviation principle for the associated empirical graphon processes under the assumption that the driving process satisfies the LDP. The latter assumption is investigated in detail in Appendix A.
2.1. Graphs and graphons
Let be the space of functions such that for all , formed after taking the quotient with respect to the equivalence relation of almost everywhere equality. A finite simple graph on vertices can be represented as a graphon by setting
(2.1) |
This object is referred to as an empirical graphon and has a block structure. The space of graphons is endowed with the cut distance
(2.2) |
It is noted that the space is not compact [19, Example F.6].
On there is a natural equivalence relation, referred to as ‘’. Letting denote the set of measure-preserving bijections , we write when there exists a such that for all . This equivalence relation induces the quotient space , where is the cut metric defined by
(2.3) |
Notably, the space is compact [21, Lemma 8].
2.2. Inhomogeneous Erdős-Rényi random graph
Let be a reference graphon. Fix and consider a random graph with vertex set , where the pair of vertices , , is connected by an edge with probability , independently of other pairs of vertices. Write to denote the law of . Use the same symbol to denote the law on induced by the map that associates with the graph its graphon . Write to denote the law of , the equivalence class associated with .
The following theorem is an extension of the LDP for homogeneous Erdős-Rényi random graphs established in [9]. It was first stated in [12] under additional assumptions. These assumptions were subsequently relaxed in [24], [3], [13]. The following version of the LDP corresponds to [13, Theorem 4.1].
Theorem 2.1.
Suppose that and are integrable. Then the sequence of probability measures satisfies the LDP on with rate and with rate function , i.e.,
(2.4) | ||||
where
(2.5) |
is any representative of , and
(2.6) |
with
(2.7) |
2.3. Inhomogeneous random graphs with type dependence
Consider the following generalisation of the inhomogeneous Erdős-Rényi random graph defined in Section 2.2. Suppose that each vertex is assigned a (possibly random) type . Denote the empirical type measure by
(2.8) |
and the empirical type distribution
(2.9) |
where is the indicator function of the event and . Let denote the space of measures on endowed with the topology of weak convergence.
The way the graph is constructed is as follows. Whether or not an edge is active depends on local properties, namely, the types and , as well as global properties, namely, the empirical type distribution . Concretely, we let edge be active with probability
(2.10) |
where is symmetric in its first two inputs. Given , the edge placement is independent for all vertex pairs. We label the resulting sequence of random graphs as , and refer to them as inhomogeneous random graphs with type dependence (IRGT).
Two relations with inhomogeneous Erdős-Rényi random graphs are worth mentioning:
-
Observe that if
(2.11) then the IRGT is equivalent to the inhomogeneous Erdős-Rényi random graph defined in Section 2.2.
-
Let denote the right-continuous generalised inverse of a distribution function with support , which is defined in the usual way as
(2.12) For , define the induced reference graphon by setting
(2.13) Given the type distribution , has the same distribution as the inhomogeneous Erdős-Rényi random graph with reference graphon . In other words, we have
(2.14) This observation is central to the LDP for IRGTs stated in Theorem 2.6 below.
2.4. Key assumptions
Before stating Theorem 2.6, we make a number of assumptions.
Assumption 2.2.
The sequence of type distributions satisfies the LDP on with rate and with rate function .
When , , are fixed and in as , as in (2.11), then Assumption 2.2 is satisfied with . Assumption 2.2 holds, for instance, when are i.i.d. random variables with distribution , in which case and is the relative entropy (or Kullback-Leibler divergence) of with respect to . Assumption 2.2 may also hold with not scaling linearly in .
To provide an example, we extend the setup in [13, Example 2.5]. Let , be a probability distribution on with bounded support, and be i.i.d. random variables with distribution . Let be such that
(2.15) |
and identify with its periodic extension to . Let be a smooth convolution kernel with compact support, and define
(2.16) |
In this case and
(2.17) |
We refer the reader to [13, Example 2.5] for the arguments underlying this result.
Assumption 2.3.
The function defined in (2.13) is a continuous mapping from to .
Assumption 2.3 holds, for example, when (i.e., there is no dependence on the type distribution) and is a continuous function. Assumption 2.3 also holds when, in addition, and are continuous functions, and
(2.18) |
In certain settings we require two further assumptions that are of a more technical nature.
Assumption 2.4.
For all , the induced graphon is away from the boundary, i.e.,
(2.19) |
for some .
Assumption 2.5.
The rate function has a unique zero, labelled .
2.5. LDP for IRGTs
Let
(2.20) |
and recall that is a function from to . We are now ready to state our LDP for IRGTs.
Theorem 2.6.
To understand where Theorem 2.6 comes from, it is instructive to realize that two random mechanisms play a role, and that the dominant mechanism determines the rare event behavior. Concretely, think of simulating outcomes of in two steps:
-
Simulate the types of the vertices, i.e., simulate the type distribution .
-
Simulate the edges given , i.e., simulate given the induced reference graphon .
Due to Assumption 2.2, large fluctuations in Step 1 are governed by the LDP with rate and with rate function , whereas due to (2.14) large fluctuations in Step 2 are governed by the LDP with rate and with rate function . In particular, this implies that when large fluctuations in are most likely to be caused by a rare event in Step 1, whereas when large fluctuations in are most likely to be caused by a rare event in Step 2. The regime can be viewed as ‘balanced’, in the sense that large fluctuations in are most likely to be caused by a combination of rare events in both Steps 1 and 2. When we say that the IRGT exhibits vertex-level fluctuations, whereas when we say that it exhibits edge-level fluctuations.
The IRGT is of interest in its own right. However, our primary motivation for introducing the IRGT is that it can be generalized in a natural way to a stochastic process. The rough idea behind its dynamic counterpart is that at each point in time the distribution of the process corresponds to an IRGT. We will focus primarily on processes that exhibit vertex-level fluctuations.
3. Graphon-valued processes
In Section 3.1 we first introduce the graph-valued process of interest, which can be viewed as the dynamic counterpart of the model discussed in Section 2. Section 3.2 describes an illustrative example. Section 3.3 states the sample-path LDP for the graph-valued process under the assumption that the driving process satisfies an LDP. Section 3.4 states the stochastic process limit for the graph-valued process under the assumption that the driving process satisfies a stochastic process limit. The latter assumption is investigated in Appendix A.
3.1. The model
For a given time horizon , let denote our graph-valued process. This process is constructed as follows. Suppose that each vertex has a type that may fluctuate randomly over time. Let denote the process of empirical type measures defined by
(3.1) |
i.e., the dynamic version of (2.8). This process evolves autonomously, i.e., independently of the graph-valued process .
In addition, let denote the process of empirical type distribution defined by
(3.2) |
i.e., the dynamic counterpart of (2.9). The process lives on , the space of -valued càdlàg paths. We suppose that, at any given time , edge is active with probability
(3.3) |
independently of all other edges at time , of , , and of , where
(3.4) |
This function gives rise to the induced reference graphon process , which, for , is characterised by
(3.5) |
Observe that, for any , given the outcome of the empirical type distribution , the distribution of corresponds to that of an inhomogeneous Erdős-Rényi random graph with reference graphon . In other words, for any ,
(3.6) |
where is the inhomogeneous Erdős-Rényi random graph defined in Section 2.2. We make the following assumption on the function , which due to (3.5) is an assumption on .
Assumption 3.1.
The map from to is continuous.
3.2. An illustrative example
Suppose that is characterised by the following dynamics:
-
•
is the empty graph.
-
•
Each vertex is assigned an independent Poisson clock with rate , i.e., the time intervals between two consecutive ring times are exponentially distributed with parameter . Each time the clock attached to vertex rings, all the edges are adjacent to become inactive.
-
•
If the edge is inactive, then it becomes active at rate , independently of anything else.
We first describe the associated driving process. Let denote the sequence of times at which the Poisson clock attached to vertex rings, and let
(3.7) |
denote the time since the clock last rung. The value of can be thought of as the age of vertex at time : each time the clock associated with rings, it dies and all its adjacent edges are lost. Recalling that we assumed that types take values in , we write
(3.8) |
to denote the type of vertex at time , where can be interpreted as the distribution function of an exponential random variable with rate .
The function can now also be identified. The probability that there is an active edge between vertices of ages and is . Putting and , we obtain, using that and ,
(3.9) |
Because is a continuous function of and , and is independent of and , it is straightforward to verify that Assumption 3.1 holds. A more involved example is given in Section 4.1.
3.3. Sample-path large deviations
Similarly as in Section 2.5, we assume that the driving process satisfies the LDP (which for the above illustrative example is established in Lemma A.1).
Assumption 3.2.
satisfies the LDP on with rate and with rate function .
To establish the sample-path LDP for the graphon-valued process , we need to:
-
(I)
establish the LDP in the pointwise topology;
-
(II)
strengthen this topology by establishing exponential tightness.
Step (I) is settled by the following result. For , let
(3.10) |
Proposition 3.3.
Note that Proposition 3.3 does not refer to any edge-switching dynamics. Specifically, if two process and have a common sequence of types and a common edge-connection function , then the marginal distributions are equivalent, i.e.,
(3.11) |
However, this does not necessarily mean that the joint distributions are equivalent, i.e., we may have
(3.12) |
because these depend on the specific edge-switching dynamics. Nonetheless, Proposition 3.3 implies that both and satisfy equivalent LDPs in the pointwise topology, i.e., the rate function depends only on the marginal distributions of the process and not on the specific edge-switching dynamics. In Sections 4.1 and 4.2 we provide examples of processes with equivalent marginals and different edge-switching dynamics.
The specific edge-switching dynamics do need to be taken into consideration when we want to perform step (II), i.e., strengthen the topology of the LDP in Proposition 3.3 by establishing exponential tightness. We next provide a condition that can be used to verify that are exponentially tight. Let
(3.13) |
and define
(3.14) |
In other words, is the number of edges that change (i.e., go from active to inactive or vice versa) at some time between and .
Proposition 3.4.
If, for all and ,
(3.15) |
then is exponentially tight.
Combining the above two propositions, we obtain the following.
Theorem 3.5.
3.4. Stochastic process convergence
In the sequel, denotes convergence in distribution, and convergence of the associated finite-dimensional distributions. We assume that the empirical type distribution has a stochastic process limit.
Assumption 3.6.
Suppose that as on .
We establish the stochastic process limit of as on , i.e., we no longer take the quotient with respect to the equivalence relation . To establish a stochastic process limit in this finer topology, as explained below, we need to ensure that the labels of the vertices update dynamically.
Assumption 3.7.
At any time the labels of the vertices are such that
(3.16) |
The importance of Assumption 3.7 is illustrated in Figure 3.4, where we consider the illustrative example from Section 3.2 with , and . The left panel shows an outcome of with a static labelling, where vertices are labelled arbitrarily at time and their labels do not change over time. We observe that has no discernible structure, and so with probability 1 the sequence does not converge to a limit in . The center panel of Figure 3.4 illustrates an outcome of under the dynamic labelling given in Assumption 3.7. Under this assumption has a discernible structure and in as , where is the smooth graphon illustrated in the right panel of Figure 3.4.



Given the dynamic labelling in Assumption 3.7 and the illustrative example, the motivation behind establishing our stochastic process limits on rather than on is clear: it provides insight into the natural question ‘Are the older vertices more connected than the younger vertices?’ If we establish a limit in , then we have a definitive answer, whereas if we establish a limit in , then we do not gain any insight.
Strengthening the topology to obtain convergence in distribution on is more difficult than in Section 3.3, because, unlike the space , the space is not Polish, and hence we cannot directly apply established sufficient conditions for tightness, such as those stated in [14, Sections 3.6–3.9]. Nonetheless, we are able to establish convergence directly by using [14, Corollary 3.3].
Assumption 3.9.
For any ,
(3.17) |
In view of Lemma A.1, the conditions of Theorem 3.10 can again be readily verified for the illustrative example. We establish a more general result in Proposition 4.2.
4. Applications and extensions
In this section we consider a class of processes that generalise the illustrative example. We use this class of processes to make three points that we believe apply more generally:
-
(I)
An additional layer of dependence between the edges can be introduced that cannot be captured by the types of the vertices (so that (3.6) no longer holds), but still allows to establish limiting results in the spirit of Section 3. Roughly speaking, this is the case when the additional layer of dependence between the edges is of mean-field type (see Section 4.1).
-
(II)
The specific edge-switching dynamics rarely affects the limiting path of the process (see Section 4.2).
-
(III)
The dependence between edges in inhomogeneous random graphs with type dependence leads to new behaviour in the corresponding variational problems, even in relatively simple settings (see Section 4.3).
4.1. Beyond conditional independence of edges
4.1.1. Model and LDP
Suppose that is characterised by the following dynamics:
-
•
is the empty graph.
-
•
Each vertex is assigned an independent rate- Poisson clock, and each time the clock associated with vertex rings all the edges that are adjacent to become inactive. Let denote the sequence of times at which the Poisson clock attached to vertex rings, and let, as before,
(4.1) denote the time since the clock last rung (i.e., the ‘age’ of vertex ). To ensure that we take . Note that the definition here differs slightly from that in the illustrative example (Section 3.2).
-
•
If edge is inactive, then it becomes active at rate .
-
•
If edge is active, then it becomes inactive at rate .
We assume that and are Lipshitz-continuous functions on .
If and do not depend on the current state of the unlabelled graph , i.e., if
(4.2) |
then the process fits into the framework of Section 3, otherwise it does not. To understand why, we compute the probability that edge is active under (4.2) at time given , and . We have
(4.3) | ||||
an explanation of the expression on the right-hand side is given below, when we discuss a related differential equation version. It is also easy to see that two edges and are independent given , , , , and , and hence the process indeed falls into the framework of Section 3. To recover the illustrative example, take and . Note that through (3.5) and (4.3), we see that the empirical reference graphon is characterised by
(4.4) |
An example of a choice for that does depend on the current state of the unlabelled graph is
(4.5) |
where is a simple graph (e.g. a triangle) and denotes the homomorphism density of in . Note that, by the counting lemma (see, for instance, [8, Proposition 2.2]), this particular choice of is Lipshitz-continuous. In this case, the two edges and are not independent given , , , , and . Indeed, if edge is active, then it may participate in additional copies of , which means that edge is more likely to be active. This dependence is inherent to the model, in that it cannot be removed by changing the definition of the types .
We continue by giving some background on the next main result, Theorem 4.1, which demonstrates that, despite the above observation, we can still establish a sample-path LDP for the process. In order to express the rate function of this LDP, we need to define a mapping such that can be interpreted as the (approximate) probability that there is an edge between vertices and at time . We proceed in a similar manner as above. Given , and , the probability that there is an edge between vertices and is given by
(4.6) | ||||
Because this expression depends on (in addition to and ), it cannot be used to define a mapping through (3.5). However, if we tacitly assume that is well approximated by , then we can use (4.6) with replaced by in combination with (3.5), to implicitly define the mapping by
(4.7) | ||||
The mapping in (4.7) is well-defined for all the relevant paths of (being such that , where is defined below), which can be seen by re-expressing it as the solution to a differential equation. For any , is a random variable on
(4.8) |
and is a random variable on
(4.9) |
Let denote the closure of , and observe that contains all possible paths of and their limits. If and for , then the mapping in (4.7) satisfies the differential equation: ,
(4.10) |
if and otherwise.
The differential equation in (4.10) can be understood as follows. Consider the process under Assumption 3.7, so that the vertices are labelled in order of increasing age. Informally, given , can be thought of as the probability that edge is active at time . For , under Assumption 3.7, at time these vertices had the labels , respectively (with given above (4.10)). The first term in the right-hand side of (4.10), , is the probability that the edge was active at time , the second term accounts for the event that the edge turned on during the time interval , while the third term accounts for the event that it turned off during the time interval .
Theorem 4.1.
Proposition 4.2.
4.1.2. Numerical illustration












(4.13) |
where denotes the triangle density in . In Figure 4.1.2 we plot, under Assumption 3.7, for and the corresponding fluid limit given in Proposition 4.2 for .
The evolution of the graphon valued processes illustrated in Figure 4.1.2 can be understood by analysing the rates in Equation (4.13). When is small (i.e., ), the constant term ‘4’ in is much larger than the terms and . This is because all vertices start with the same type (so that with high probability) and initially the graph contains no edges (so that ). Consequently, at , the graph is similar to what it would be if and , as in the illustrative example. As increases, both the triangle density and the squared difference in the ages (on average) increase, which means that the terms and have an increasing influence on the dynamics of the process. Initially, the term grows faster (on average) than and, consequently, when , the process has a high edge density. However, as increases further () the squared differences in the ages (on average) become increasingly dominant and, consequently, pairs of vertices with significantly different ages are much less likely to be connected by an edge. Due to Assumption 3.7 these vertices lie away from the diagonal connecting to , and therefore we observe fewer edges in these regions of the plots.


In the top panel of Figure 4.1.2 we plot the triangle density of (solid line), (dashed line), and (dotted line). To obtain the outcome of , we take the same outcome of the empirical type processes that was used to generate the outcome of and insert it into the differential equation in (4.10). We can therefore view the path of as having randomness in both the types of the vertices and the outcomes of the edges, and view the path of as the same path but with the randomness in the outcomes of the edges removed. Because there are only vertices and edges, it is much more computationally efficient to simulate than it is to simulate . We note that in the top panel of Figure 4.1.2 the solid and dashed curves follow each other closely. This is because, as Theorem 4.1 suggests, fluctuations in the process are considerably more likely to be caused by fluctuations in the types of the vertices than in the specific outcomes of the edges given these types. This means for the purposes of simulation, when is sufficiently large () it suffices to simulate . In the bottom panel of Figure 4.1.2 we simulate the triangle density of for and observe convergence to the fluid limit.
4.2. Different edge-switching dynamics, equivalent sample-path LDP
Next suppose that the process has the same underlying driving process (i.e., the same ) as in Section 4.1, but different edge-switching dynamics. In particular, we consider edge switching dynamics inspired by [1]. Let be a sequence of independent uniform variables on , and suppose that edge is active if
(4.14) |
where is given by (4.2) and (4.3). Observe that for any edge the random number is sampled just once, rendering this process not Markov. Nonetheless, by Proposition 3.3 we immediately have that, in the pointwise topology, the sequence of processes satisfies the same LDP as the processes described in Section 4.2. To strengthen the topology it remains to verify establish exponential tightness. This can be done by using Proposition 3.4, which leads to the following result.
Proposition 4.3.
4.3. The most likely path to an unusually small edge density
Consider the illustrative example described in Section 3.2 with . Observe that in this case (3.9) simplifies to
(4.15) |
Suppose that we would like to determine the most likely path the process takes to a prescribed edge density at time . The standard method involves two steps: first compute the most likely state of the process at time given this edge density, and afterwards use this computation to obtain the most likely trajectory of the process. Below we focus only on the first step and demonstrate that it is far simpler to numerically compute the most likely state of the process at time when is below the expected edge density than when it is above the expected edge density. Note that, given (4.15), the results in this section apply to the models introduced in Sections 4.1 and 4.2 (with the above simplification).
4.3.1. Most likely state of the process at time
Let denote the distribution of , where is defined in (3.8). Note that
(4.16) |
However, for the moment we will assume that is a general measure on . We first consider the event that has an unusually small edge density at time . By Theorem 2.6, the corresponding variational problem is
(4.17) |
Proposition 4.4.
The feasible region of the variational problem described in (4.17) is convex.
Because the objective function in (4.17) is strictly convex, Proposition 4.4 implies that the variational problem has a unique local minimum which is the global minumium. Consequently, there are several numerical methods that we can apply to find the global minimum.
If instead we consider the probability that has an unusually high edge density, then we must solve the same variational problem as described in (4.17) with ‘’ replaced by ‘’. Now the feasible region is no longer convex. Consequently, as we illustrate with a numerical example, the corresponding variational problem may have multiple local maxima and multiple local minima.
Let
(4.18) |
In Figure 4.3.1 we plot the rate (the value of the objective function evaluated at the maxima) against the edge density . When there are two distinct optimal solutions corresponding to
For , solutions near and are local minima. These local minima are illustrated by the dotted curve in Figure 4.3.1: values above correspond to solutions that are close to and values below correspond to solutions that are close to . Observe that we can restrict our search of an optimal measure to measures that are absolutely continuous with respect to . For given by (4.18), these measures live on the 2-dimensional simplex. Consequently, there can be at most two local minima. If has a continuous component (as it does in (4.16)), then there is, in principle, no bound on the number of local minima. This makes it difficult to determine if a global minimum has been reached using numerical methods.

5. Proofs
5.1. Proofs of the results in Section 2
We prove Theorem 2.6 via a sequence of lemmas. Recall that we use to denote an inhomogeneous Erdős–Rényi random graph (IRG) and to denote a inhomogeneous random graph with type dependence (IRGT).
5.1.1. Inhomogeneous Erdős–Rényi random graphs
The first lemma is similar to results presented in [12] and to [13, Theorem 4.1], the primary difference being that in [12, 13] there is a single reference graphon , i.e., for all . The generalisation comes at the cost of the addition of Assumption 2.4 in the lower bound (which is not made in [13, Theorem 4.1], but is in [12]).
Lemma 5.1.
Let denote the reference graphon for and suppose that in . Then
(5.1) |
and, subject to Assumption 2.4,
(5.2) |
Proof.
The upper bound follows from the same arguments as used in [13, Theorem 4.1] (by noting that the specific requirement that is not used there). The lower bound again follows from similar arguments. However, now instead of applying Jensen’s inequality we apply the dominated convergence theorem, which is possible because of Assumption 2.4. ∎
The previous lemma can be used to obtain the following concentration type result for inhomogeneous Erdős–Rényi random graphs. For , denote .
Lemma 5.2.
Let be an IRG with reference graphon . If , then, for any ,
(5.3) |
Proof.
Suppose that , and let be any member of the equivalence class . Using a Taylor expansion in the first inequality and Jensen’s inequality in the second inequality, we have
(5.4) | ||||
Since is open, its complement is closed, which implies that we can apply the upper bound in Lemma 5.1, from which the result follows. ∎
5.1.2. Inhomogeneous random graphs with type dependence
We next turn our attention to inhomogeneous random graphs with type dependence . We first use the previous lemma to show that is close to the induced reference graphon with high probability.
Lemma 5.3.
If Assumption 2.3 holds, then
(5.5) |
Proof.
Suppose that Assumption 2.3 holds. We proceed by contradiction. Suppose that (5.5) does not hold. Then there necessarily exist sequences and such that
(5.6) |
where, for each , is an empirical distribution function with data points. Since is compact, there exists a convergent subsequence of . Consequently, without loss of generality we may assume that there exists such that in as . Under Assumption 2.3 we therefore have
(5.7) |
Recalling that, due to (2.14) (i.e., conditional on the induced reference graphon the graph has the distribution of an inhomogeneous random graph), we can apply Lemma 5.2 to obtain a contradiction. ∎
The next lemma establishes a large deviation principle for the sequence of induced reference graphons .
Lemma 5.4.
Proof.
In the remainder of this subsection we make use of the above auxiliary results to prove Theorem 2.6.
Proof of Theorem 2.6: We prove (i), (ii), and (iii) separately. Recall that these correspond to the three cases , , and respectively.
(i) lower bound: Let be an open subset of , and let denote the largest open set whose -neighbourhood is contained in . We have
(5.9) |
Applying Lemmas 5.4 and 5.3 to the first and second terms on the right-hand-side of (5.9), respectively, we obtain
(5.10) |
(i) upper bound: Let be a closed subset of , and let denote the largest closed set that contains the -neighbourhoods of all the points in . We have
(5.11) |
Again applying Lemmas 5.4 and 5.3 to the first and second terms on the right-hand-side of (5.9), respectively, we obtain
(5.12) |
where in the first step we use the fact that .
(ii) lower bound: For , let
(5.13) |
and observe that, by Assumption 2.3, the set is measurable. Under Assumption 2.4 we therefore have
(5.14) | ||||
In the second inequality we use the fact that, since is compact, for any sequence in there exists a convergent subsequence, which allows us to apply Lemma 5.1. In the final step we use the lower semi-continuity of and the assumption that is a continuous mapping from to , in combination with the fact that, under Assumption 2.4, if , then uniformly over as (see [12, Lemma 2.3]). Because these arguments hold for any we can take the sharpest lower bound:
(5.15) |
(ii) upper bound: Let be the Lévy metric, let , and recall that metrises the weak topology (see [11, Theorem D.8]). Since is a compact space, for any we can construct a finite set with the property that for any there exists such that . We therefore have
(5.16) | ||||
In the second step we apply Assumption 2.2, Lemma 5.1 and Laplace’s method, using a similar justification as in the lower bound. In the fourth step we use lower semi-continuity of (Assumption 2.2) and apply [12, Lemma 2.3]).
(iii): In this case we can apply similar (albeit simpler) arguments as in case (i). ∎
5.2. Proofs of the results in Section 3
5.2.1. Large deviations
The next lemma will be used to prove Proposition 3.3.
Lemma 5.5.
Subject to Assumption 3.1, satisfies the LDP with rate and with rate function
(5.17) |
Proof.
The claim follows from the contraction principle (cf. Lemma 5.4). ∎
Proof of Proposition 3.3. To establish a multi-point LDP, we can follow similar arguments as in the proof of Theorem 2.6.(i). For example, to establish the lower bound, pick , let be an open subset of , and let be as in the proof of Theorem 2.6. We have
(5.18) | ||||
where is given by (5.17), and we use a similar justification as in the lower bound of Theorem 2.6.(i) (now applying Lemma 5.5 where we used to apply Lemma 5.4). The upper bound is again similar and is therefore omitted. With the multi-point LDP established, we can apply the Dawson-Gärtner projective limit theorem [11, Theorem 4.6.1] to establish an LDP in the pointwise topology, and [11, Lemma 4.6.5] to obtain the specific form of the rate function in Proposition 3.3. ∎
Proof of Proposition 3.4. For and , define the modulus of continuity in by
(5.19) |
where the infimum is over satisfying
(5.20) |
and . By [15, Theorem 4.1] (in combination with the compactness of ), the sequence of processes is exponentially tight if
(5.21) |
for all . Suppose that (3.15) holds, with given by (3.14). We will show that this entails that (5.21) holds with for . Indeed, observe that
(5.22) |
Consequently,
(5.23) |
Hence
(5.24) |
where in the final step we apply (3.15) and use the fact that . ∎
5.2.2. Weak convergence
Proof.
Proof of Proposition 3.8. By Lemma 5.6, we have on . From (3.6), Assumption 3.7, and the uniform bound in [8, Lemma 5.11] we know that, for any ,
(5.26) |
which implies that
(5.27) |
The claim therefore follows from [14, Corollary 3.3]. ∎
Proof of Theorem 3.10. By Lemma 5.6 and [14, Corollary 3.3], it suffices to prove that
(5.28) |
where is a metric that generates (see [14, Ch. 3.5] for an expression for ). Define such that, and
(5.29) |
Because, for any ,
(5.30) | ||||
it suffices to show that
(5.31) |
with probability 1. We first deal with the term . By the same arguments as in the proof of Proposition 3.8, for any ,
(5.32) |
Thus,
(5.33) | ||||
where in the final step we apply (5.32) in combination with Assumption 3.9. The fact that, with probability 1,
follows because, due to Assumptions 3.6 and 3.1, the limiting object of is a random variable on (i.e., its trajectories are càdlàg paths). ∎
5.3. Proofs of the results in Section 4
5.3.1. Proofs of the results in Section 4.1
To prove Theorem 3.10, we construct a graphon-valued process, , that mimics the behaviour of while still falling into the framework of Section 3. We couple the two processes and demonstrate that, under the coupling, the probability that the two processes deviate from each other significantly (in a way defined below) is .
Constructing a mimicking process: Suppose that the process is characterised by the following dynamics:
-
•
is the empty graph.
-
•
Each vertex is assigned an independent rate- Poisson clock. Each time the clock rings, all the edges that are adjacent to become inactive.
-
•
If edge is inactive, then it becomes active at rate λ(t,Y_i(t),Y_j(t),F_n(t; ⋅), ~g^[F_n](t; ⋅, ⋅)).
-
•
If edge is active, then it becomes inactive at rate μ(t,Y_i(t),Y_j(t),F_n(t; ⋅), ~g^[F_n](t; ⋅, ⋅)).
Here, is defined in (4.10). We point out that the induced reference graphon process of is indeed . Note that the only difference between the transition rates of and is that in the transition rate functions and we have replaced by .
Lemma 5.7.
There exists a coupling of and such that
(5.34) |
Proof.
The claim is proved in three steps.
Step 1: description of the coupling. Let be the maximal value that the rates and can take, i.e.,
(5.35) |
and observe that because and are Lipshitz continuous functions with a compact domain. Suppose that outcomes of and are generated in the following manner.
-
•
For each , vertex is assigned the same (coupled) rate- Poisson clock in both processes, so that if the clock associated with vertex rings in at time , then the clock associated with vertex also rings in at time (and vice-versa). When the clock associated to vertex then all edges adjacent to vertex become inactive (in both and simultaneously).
-
•
Assign each edge the same (coupled) Poisson rate- clock in both processes. When the Poisson clock associated with edge rings, generate an outcome of a Unif(0,1) distribution.
-
–
if , then the edge becomes active in .
-
–
if , then the edge becomes active in .
(If it was already active, then it remains active.)
-
–
-
•
Assign each edge a second (coupled) Poisson rate- clock in both processes. When the Poisson clock associated with edge rings, generate an outcome of a Unif(0,1) distribution.
-
–
if , then the edge becomes inactive in .
-
–
if , then the edge becomes inactive in .
(If it was already inactive, then it remains inactive.)
-
–
Step 2: majorization of the distance. Observe that if edge is inactive in both models and the clock associated with edge rings, then a difference is formed (i.e., edge is active in one process and inactive in the other) with probability
(5.36) |
At any time , by the Lipshitz continuity of ,
(5.37) |
where is the Lipshitz constant. Observe that an equivalent bound holds when is replaced by .
Let denote the number of differences between and , so that
(5.38) |
Using a standard property of the superposition of Poisson processes and (LABEL:eqn:rBg), we see that, at any time , increases by 1 at a rate that is bounded above by
(5.39) |
In addition, observe that if a clock associated to a vertex rings it can only decrease the number of differences because then, in both processes, all the edges adjacent to the vertex are inactive. Combining the above, we see that, for any , is stochastically dominated by the random quantity
(5.40) |
where is a pure-birth process, with initial state , characterized by the transition rate
(5.41) |
from state to state . Consequently, for any we have
(5.42) | ||||
where we have used the fact that is an increasing process.
Step 3: bounding the dominating process. So as to bound the second term on the right-hand-side of (5.42), we observe that, for any ,
(5.43) |
We can establish (5.43) following similar arguments as in the proof of Theorem 3.10, and hence these arguments are omitted.
For ease of notation below we write to denote . It suffices to show that, for any , there exists sufficiently small so that
(5.44) |
for some . Observe that the Markov chain described by the transition rates (5.41) is a continuous-time branching process with immigration. The initial population size is 0, immigrants arrive at rate , while individuals in the population give birth at rate and die at rate . Let denote the number of descendants that are alive at time of an individual that immigrated to the population at time . Let . It was shown by Yule (cf. [17, Chapter V.8]) that
(5.45) |
i.e., has a geometric distribution with success probability . Note that (since the death rate of individuals is zero) stochastically dominates for all . In addition, the total number of immigrants has a Poisson distribution with mean . Thus, if are i.i.d. copies of and , where are independent of everything else, then
(5.46) |
With
(5.47) | ||||
and , and applying the Chernoff bound, we therefore have
(5.48) |
The claim now follows by observing that can be selected sufficiently small to make the inequality on the right-hand-side of (5.48) strict, in which case . ∎
Lemma 5.8.
The process satisfies the conditions of Theorem 3.5.
Proof.
Assumption 3.2 follows from Proposition A.1. The conditions of Proposition 3.4 can be verified using similar (albeit simpler) arguments as in the proof of Proposition 4.3 below. To verify Assumption 3.1 we need to show that if in then in . Recall that, for any , , and let
(5.49) |
Below we assume that is a continuity point of . Applying (4.10) and the triangle inequality, we have
(5.50) | ||||
Using , the Lipschitz continuity of and and the fact that for any , , we see that there exists such that
(5.51) | ||||
for almost all , with the same inequality holding for . In addition because and are Lipschitz continuous functions with compact support they are uniformly bounded by some constant . Combining this with (5.50) and (5.51), we obtain
(5.52) | ||||
Using the fact that and repeatedly applying (LABEL:eq:CPP3), we obtain
(5.53) | ||||
Because we can replace and by and in (LABEL:eq:CPP4) almost everywhere. Since , we also have and almost everywhere. Consequently, from (LABEL:eq:CPP4) we obtain
(5.54) | ||||
Because , (LABEL:eq:CPP5) implies that for all , which completes the proof. ∎
5.3.2. Proofs of the results in Section 4.2
We proceed by giving the proof of Proposition 4.3.
Proof of Proposition 4.3. It remains to establish exponential tightness using Proposition 3.4. Recall that and are bounded functions and let denote their maximum. It directly follows that
(5.55) | ||||
Note that there are two ways that edges can change during the interval . In the first place, the Poisson clock of one of the adjacent vertices of the rings during . We let denote the number of edges (active or inactive) that are adjacent to such a vertex. Because the probability each vertex that each vertex rings during is , is stochastically dominated by a random variable . In the second place, the value of falls within the interval
(5.56) | ||||
We let denote the number of such edges. Because both and are bounded by a constant , it follows that
(5.57) | ||||
Consequently, is stochastically dominated by a random variable . We thus have
(5.58) | ||||
where in the last inequality we apply [25, Theorem 2.3] (which is essentially a Chernoff bound). ∎
5.3.3. Proofs of the results in Section 4.3
The proof of Proposition 4.4 relies on the following direct computation.
Proof of Proposition 4.4. Pick and suppose that
(5.59) |
Observe that if are independent random variables with distribution , then
(5.60) |
Let with . We have
(5.61) |
Hence it remains to show that . We have
(5.62) |
where in the second step we apply the Cauchy-Schwarz inequality, and in the final step use (5.59). ∎
Appendix A Rate function for the driving process
To establish an LDP for the illustrative example in Section 3.2 and the examples in Sections 4.1 and 4.2, we need to verify Assumption 2.2, i.e., we need to establish an LDP for the driving process. The latter are dealt with in Appendix A.1, the former in Appendix A.2.
A.1. LDP for driving process in Section 4
Recall that in Sections 4.1 and 4.2 each vertex is assigned a Poisson clock that rings at times , and
(A.1) |
with . Here we drop the restriction that , and suppose that . We also suppose that
(A.2) |
in . For and , we let
(A.3) |
Proposition A.1.
The sequence of processes satisfies the LDP on with rate and with rate function
(A.4) |
where if is absolutely continuous and otherwise.
Proof.
We prove Proposition A.1 by applying the standard method of proving sample-path LDPs:
-
(I)
Establish a finite-dimensional LDP for times .
-
(II)
Write down a limiting expression for the rate function when the gaps between the times shrink to zero.
-
(III)
Strengthen the topology by establishing exponential tightness.
Step (I): Let
(A.5) |
where with . We apply the following result, which is an immediate consequence of [10, Theorem 3.5]: If (A.2) holds, then the sequence of measures satisfies the LDP with rate and with rate function
(A.6) | ||||
where .
To apply (A.6), we need to write down a formula for . By the Markov property, this is essentially equivalent to writing down an expression for , i.e., for a single time step. If , then the probability that is (i.e., the probability that the Poisson clock associated with vertex does not ring in the time interval ). On the other hand, if , then the probability that is the probability that the Poisson clock associated with vertex rings in the time interval (which occurs with probability ), and afterwards does not ring again (which occurs with probability ). We thus have
(A.7) |
If we apply (A.6) for a single time step, then we obtain
(A.8) |
We would like to derive a closed form expression for . To do this, we consider the term under the supremum, i.e.,
(A.9) |
take the derivative with respect to for fixed , and set this to zero. This gives
(A.10) |
which leads to
(A.11) |
Substituting (A.11) into (A.9), we obtain
(A.12) |
We next optimise over , (note that previously we optimised over , ). To do this, we consider the last line, which reads
(A.13) | |||
(A.14) |
and take the derivative with respect to for fixed , and set this to zero. This gives
(A.15) |
which implies that
(A.16) |
Substituting (A.16) into (A.14), we obtain
(A.17) |
Combining this with (A.13), we obtain
Rearranging further, we obtain the following.
Lemma A.2.
(A.18) |
Note that if
then for absolutely continuous with respect to we have
which is the relative entropy from to .
The above arguments extend naturally to establish the following finite-dimensional large deviation principle.
Lemma A.3.
If (A.2) holds, then the sequence of measures satisfis the LDP with rate and with rate function , where and .
Step (II): By the Dawson-Gärtner projective limit LDP ([11, Theorem 4.6.1]), the sequence of measures satisfies the LDP in the pointwise topology with rate function
(A.19) |
The next step is therefore to prove the following lemma.
Lemma A.4.
for all with .
Proof.
The proof comes in two steps.
We start by demonstrating that for any . Consider the times , and let . The rate function of the finite-dimensional LDP is
(A.20) | ||||
(A.21) | ||||
(A.22) | ||||
(A.23) |
Recall the definition of from (A.3). For the first term we have
(A.24) |
For the second term we have
(A.25) |
where . Finally, for the third term we have
(A.26) |
Combining the three terms, we get the expression in (A.4).
To prove that , suppose to the contrary that for some . In that case there must exist a vector such that
(A.27) |
For , let ) be such that
-
•
for any there exists with ,
-
•
.
By the contraction principle, for any , we have
(A.28) |
On the other hand, following very similar arguments as those above, we get
(A.29) |
which in combination with (A.28) contradicts (A.27). The proof that when is not absolutely continuous follows from standard arguments and is therefore omitted. ∎
Step (III): We now establish exponential tightness.
Lemma A.5.
The sequence of measures is exponentially tight in .
Proof.
By [15] it suffices to show that
(A.30) |
where
(A.31) |
and is a metric which generates the weak topology. We equip the space with the Lévy metric, so that for two distribution functions we have
(A.32) |
Let denote the number of vertices that turn off at some point in the interval . Observe that
(A.33) |
Given (A.33) the result then follow from very similar arguments to those used in the proof of Proposition 4.3. ∎
∎
A.2. LDP for driving process in Section 3.2
Define by letting
(A.34) |
for all and , where is the cdf of and exponential random variable with rate . Note that with this transformation we recover the empirical type process in Section 3.2.
Proposition A.6.
The sequence of processes satisfies the LDP on with rate and with rate function
(A.35) |
where is defined in Proposition A.1.
References
- [1] Athreya, S., Hollander, F. D., and Röllin, A. (2021). Graphon-valued stochastic processes from population genetics. Annals of Applied Probability 31, 1724–1745.
- [2] Billingsley, P. (1979). Probability and Measure. Wiley, New York.
- [3] Borgs, C., Chayes, J., Gaudio, J., Petti, S., and Sen, S. (2020). A large deviation principle for block models. arXiv:2007.14508.
- [4] Borgs, C., Chayes, J., Lovász, L., Sós, V., and Vesztergombi, K. (2008). Convergent sequences of dense graphs I: Subgraph frequencies, metric properties and testing. Advances in Mathematics 219, 1801–1851.
- [5] Borgs, C., Chayes, J., Lovász, L., Sós, V., and Vesztergombi, K. (2012). Convergent sequences of dense graphs II: Multiway cuts and statistical physics. Annals of Mathematics 176, 151–219.
- [6] Braunsteins, P., den Hollander, F., and Mandjes, M. (2022+). A sample-path large deviation principle for dynamic Erdős–Rényi random graphs. arXiv:2009.12848. Annals of Applied Probability, to appear.
- [7] Černý, J., and Klimovsky, A. (2020). Markovian dynamics of exchangeable arrays. In: Genealogies of Interacting Particle Systems. Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore, pp. 209–228.
- [8] Chatterjee, S. (2017). Large Deviations for Random Graphs Lecture Notes in Mathematics 2197. Springer.
- [9] Chatterjee, S., and Varadhan, S.R.S. (2011). The large deviation principle for the Erdős-Rényi random graph. European Journal of Combinatorics 32 (7), 1000–1017.
- [10] Dawson, D A., and Gärtner, J. (1987). Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics: An International Journal of Probability and Stochastic Processes 20(4), 247–308.
- [11] Dembo, A., and Zeitouni, O. (1998). Large Deviations Techniques and Applications. Stochastic Modelling and Applied Probability 38. Springer, New York.
- [12] Dhara, S., and Sen, S. (2021). Large deviation for uniform graphs with given degrees. Communications in Mathematical Physics 382 (1), 123-171.
- [13] Dupuis, P., and Medvedev, G. (2020). The large deviation principle for interacting dynamical systems on random graphs. arXiv:2007.13899.
- [14] Ethier, S. N., and Kurtz, T.G. (2009). Markov Processes: Characterization and Convergence. Wiley Series in Probability and Mathematical Statistics 282. John Wiley & Sons.
- [15] Feng, J., and Kurtz, T.G. (2006). Large Deviations for Stochastic Processes. Mathematical Surveys and Monographs 131. American Mathematical Society.
- [16] Grebík, J., and Pikhurko, O. (2021). Large deviation principles for block and step graphon random graph models. arXiv:2101.07025.
- [17] Harris, T.E. (1963). The Theory of Branching Processes. Grundlehren der mathematischen Wissenschaften 196. Springer, Berlin.
- [18] den Hollander, F. (2000). Large Deviations. Fields Institute Monograph 14. American Mathematical Society, Providence RI, USA.
- [19] Janson, F. (2013). Graphons, Cut Norm and Distance, Couplings and Rearrangements. New York Journal of Mathematics Monographs, Vol. 4.
- [20] Lovász, L. (2012). Large Networks and Graph Limits. American Mathematical Society, Providence RI, USA.
- [21] Lovász, L., and Szegedy, B. (2006). Limits of dense graph sequences. Journal of Combinatorial Theory, Series B 96, 933–957.
- [22] Lovász, L., and Szegedy, B. (2007). Szemerédi’s lemma for the analyst. Geometric and Functional Analysis 17, 252–270.
- [23] Lubetzky, E., and Zhao, Y. (2015). On replica symmetry of large deviations in random graphs. Random Structures & Algorithms 47(1), 109–146.
- [24] Markering, M. (2020). The large deviation principle for inhomogeneous Erdő sRényi random graphs. arXiv:2010.03504.
- [25] McDiarmid, C. (1998). Concentration. In: Probabilistic methods for algorithmic discrete mathematics. Springer, Berlin, Heidelberg, pp. 195-248.
- [26] Ráth, B. (2012). Time evolution of dense multigraph limits under edge-conservative preferential attachment dynamics. Random Structures & Algorithms 41, 365–390.
- [27] Röllin, A., and Zhang, Z.S. (2021). Dense multigraphon-valued stochastic processes and edge-changing dynamics in the configuration model. arXiv:2104.13024.