The strong component structure of the barely subcritical directed configuration model

Matthew Coulson Department of Combinatorics & Optimization, University of Waterloo, Canada. Email: [email protected].

Abstract

We study the behaviour of the largest components of the directed configuration model in the barely subcritical regime. We show that with high probability all strongly connected components in this regime are either cycles or isolated vertices and give an asymptotic distribution of the size of the $k$ th largest cycle. This gives a configuration model analogue of a result of Łuczak and Seierstad for the binomial random digraph.

1 Introduction

1.1 The directed configuration model

Let $\mathbf{d}_{n}=(\mathbf{d}_{n}^{-},\mathbf{d}_{n}^{+})=((d_{1}^{-},d_{1}^{+}),\ldots,(d_{n}^{-},d_{n}^{+}))$ be a directed degree sequence on $n$ vertices and let $m_{n}=\sum d_{i}^{-}=\sum d_{i}^{+}$ . The directed configuration model on $\mathbf{d}_{n}$ , introduced by Cooper and Frieze [3], is the random directed multigraph on $[n]$ obtained by associating with vertex $i$ $d_{i}^{-}$ in-stubs and $d_{i}^{+}$ in-stubs, and then choosing a perfect matching of in- and out- stubs uniformly at random. We will denote this model by $G(\mathbf{d}_{n})$ . This is a directed generalisation of the configuration model of Bollobás [1] which since its introduction has become one of the most widely used random graph models.

A strongly connected component in a digraph is a maximal sub-digraph such that there exists a directed path between each ordered pair of vertices. In this paper we will consider the size of the largest strongly connected component in the barely subcritical regime. Note that these components are much smaller than the connected components which are found in the undirected case. This is due to the fact that the minimal strongly connected components are directed cycles as opposed to the trees in the directed case and so we require one additional edge on the same vertex set. We shall call any strongly connected digraph which is not a cycle complex. Note that any such digraph has strictly more edges than vertices.

Let $n_{k,\ell}$ be the number of copies of $(k,\ell)$ in $\mathbf{d}_{n}$ and let $\Delta_{n}=\max(d_{i}^{-},d_{i}^{+})$ be the maximum degree of $\mathbf{d}_{n}$ . Also, define the following parameters of the degree distribution. These parameters govern the behaviour of the size of the largest strongly connected component.

Definition 1.1.

\displaystyle Q_{n}:=\frac{1}{m_{n}}\sum_{i=1}^{n}d_{i}^{-}d_{i}^{+}-1

\displaystyle R_{n}^{-}:=\frac{1}{m_{n}}\sum_{i=1}^{n}d_{i}^{-}d_{i}^{+}(d_{i}^{-}-1)

\displaystyle R_{n}^{+}:=\frac{1}{m_{n}}\sum_{i=1}^{n}d_{i}^{-}d_{i}^{+}(d_{i}^{+}-1)

We shall assume the following conditions which ensure that the degree sequence is suitably “well behaved”.

Condition 1.2.

For each $n$ , $(d_{i}^{-},d_{i}^{+})_{i=1}^{n}=((d_{i}^{-},d_{i}^{+})^{(n)})_{i=1}^{n}$ is a sequence of ordered pairs of non-negative integers such that $\sum_{i=1}^{n}d_{i}^{-}=\sum_{i=1}^{n}d_{i}^{+}$ . Furthermore, $(p_{i,j})_{i,j=1}^{\infty}$ is a probability distribution such that for some $\varepsilon,\zeta>0$ ,

i)

$n_{i,j}/n\to p_{i,j}$ as $n\to\infty$ for each $i,j\geq 0$ ,
ii)

$m_{n}/n=\mu^{(n)}\to\mu=\sum_{i,j=1}^{\infty}ip_{i,j}=\sum_{i,j=1}^{\infty}jp_{i,j}$ ,
iii)

$n_{0,0}=0$ ,
iv)

$\sum_{i=1}^{\infty}n_{0,i}+n_{i,0}\leq(1-\varepsilon)n$ ,
v)

$n_{1,1}\leq(1-\varepsilon)n$ ,
vi)

$\Delta_{n}\leq n^{1/6}\log^{-1/4}(n)$ ,
vii)

$R_{n}^{-},R_{n}^{+}\geq\zeta$ .

It was shown by Cooper and Frieze [3] that $Q_{n}=0$ is the threshold for the existence of a giant strongly connected component under some mild conditions on the degree sequence similar to those observed in 1.2.

For the remainder of the paper, we shall omit the subscript $n$ for reasons of clarity - for example we write $\mathbf{d}=\mathbf{d}_{n}$ etc. Let $\mathcal{C}_{k}(\mathbf{d})$ be the size of the $k$ th largest strongly connected component of the directed configuration model with degree sequence $\mathbf{d}$ . We shall write $\mathcal{C}_{k}$ for this quantity when the degree sequence is clear.

Our main result is the following,

Theorem 1.3.

Suppose that $G(\mathbf{d})$ is a configuration model random digraph and suppose that $nQ^{3}(R^{-}R^{+})^{-1}\to-\infty$ . With high probability, there are no complex components or cycles of length $\omega(1/|Q|)$ . Furthermore,

\mathbb{P}\bigg{(}|\mathcal{C}_{k}|\geq\frac{\alpha}{|Q|}\bigg{)}=1-\sum_{i=0}^{k-1}\frac{\xi_{\alpha}^{i}}{i!}e^{-\xi_{\alpha}}+o(1).

where

\xi_{\alpha}=\int_{\alpha}^{\infty}\frac{e^{-x}}{x}dx.

The condition $nQ^{3}(R^{-}R^{+})^{-1}\to-\infty$ is likely best possible as the analogous condition in the undirected case discriminates the barely subcritical case from the critical window. Moreover, substituting $Q,R^{-}$ and $R^{+}$ obtained from a typical degree sequence of $D(n,p)$ also enters the critical window when we have $nQ^{3}(R^{-}R^{+})^{-1}\to c$ for some constant $c$ .

In prior work, Łuczak and Seierstad [17] considered the model $D(n,p)$ which is a random digraph model formed by including each possible arc with probability $p$ independently. They showed an analogous result to Theorem 1.3 for $p=(1-\varepsilon)/n$ with $\varepsilon=o(1)$ and $\varepsilon^{3}n\to\infty$ in this model.

1.2 Previous work

The study of the giant component in random graph models was initiated by the seminal paper of Erdős and Rényi [7] regarding the giant component in $G(n,p)$ . Since then the appearance of a giant component in various models has remained an active topic of study in the area.

When working with directed graphs, there are a number of types of connected component which are of interest. In this paper we concern ourselves with the strongly connected components. The study of strongly connected components in random digraphs began in the model $D(n,p)$ where we include each possible edge with probability $p$ independently. Karp [14] and Łuczak [16] independently showed that when $p=c/n$ , then $D(n,p)$ has all strongly connected components of size $O(1)$ if $c<1$ . If instead $c>1$ they showed that there exists a unique strongly connected component of linear order with all other components of size $O(1)$ . The case $p=(1+\varepsilon)/n$ with $\varepsilon=o(1)$ and $|\varepsilon|^{3}n\to\infty$ was studied by Łuczak and Seierstad [17]. They showed that if $\varepsilon^{3}n\to\infty$ , there is a unique strongly connected component of size $4\varepsilon^{2}n$ and other components all of size $O(1/\varepsilon)$ . When $\varepsilon^{3}n\to-\infty$ they showed an analogue of Theorem 1.3:

Theorem 1.4.

Let $np=1-\varepsilon$ where $\varepsilon\to 0$ but $\varepsilon^{3}n\to-\infty$ . Assume $\alpha>0$ is a constant and let $X_{k}$ denote the size of the $k$ th largest strongly connected component. Then, asymptotically almost surely, $D(n,p)$ contains no complex components and

\lim_{n\to\infty}\mathbb{P}(X_{k}\geq\alpha/\varepsilon)=1-\sum_{i=0}^{k-1}\frac{\xi_{\alpha}^{i}}{i!}e^{-\xi_{\alpha}},

where $\xi_{\alpha}=\int_{\alpha}^{\infty}\frac{e^{-x}}{x}dx$ .

The so-called critical window, when $p=(1+\lambda n^{-1/3})/n$ has also been the subject of some study. In [5] the author showed bounds on the size of the largest strongly connected component in this regime which are akin to bounds obtained by Nachmias and Peres for the undirected case [18]. Moreover, Goldschmidt and Stephenson [10] gave a scaling limit result for the largest strongly connected components in the critical window.

The directed configuration model has also been studied previously. It was first studied by Cooper and Frieze who showed that provided the maximum degree $\Delta<n^{1/12}/\log(n)$ then if $Q_{n}<0$ , there is no all strongly connected components are small and if $Q_{n}>0$ there is a giant strongly connected component of linear size. The assumptions on the degree sequence have subsequently been relaxed, Graf [11] showed $\Delta<n^{1/4}$ is enough to draw the same conclusion and Cai and Perarnau [2] improved this further to only require bounded second moments. A scaling limit result has been recently obtained at the exact point of criticality, $Q_{n}=0$ by Donderwinkel and Xie [6] in a very closely related model where vertices’ degrees are sampled from a limiting degree sequence.

1.3 Organization

The remainder of this paper is arranged as follows, in section 2 we prove some auxiliary results on subgraph counting within the directed configuration model. Then, in section 3 we enumerate certain types of strongly connected directed graphs of maximum degree $4$ . In section 4 we prove Theorem 1.3 which we break down into a few steps: first that there are no long cycles, next that there are no complex components, and finally we compute the probability the $k$ th largest component has size at least $\alpha/|Q|$ via Poisson approximation. We conclude the paper in section 5 with some open questions and future work.

2 Subgraph Bounds

We calculate probabilities of given subgraphs in the configuration model.

Suppose that $G(\mathbf{d})$ is a configuration model random graph and let $D=(D_{-},D_{+})$ be the degree of a uniformly random vertex of $G(\mathbf{d})$ . We call $D$ the degree distribution of $G(\mathbf{d})$ (or of $\mathbf{d}$ ). Define

\displaystyle\mu_{i,j}=\mathbb{E}(D_{-}^{i}D_{+}^{j}),

and

\displaystyle\rho_{i,j}=\mathbb{E}\bigg{(}\prod_{k=0}^{i-1}(D_{-}-k)\prod_{\ell=0}^{j-1}(D_{+}-\ell)\bigg{)}.

So that the $\mu_{i,j}$ and $\rho_{i,j}$ are the moments and factorial moments of $(D_{-},D_{+})$ respectively. We also define $\mu=\mu_{1,0}=\mu_{0,1}$ which is the average degree. Furthermore, observe that $\mu_{1,1}=\frac{m}{n}(1+Q)$ which is a fact we utilise in subsequent sections. We now state a general upper bound on the probability of finding certain subgraphs in the configuration model.

Lemma 2.1.

Let $G(\mathbf{d})$ be a configuration model random digraph with degree distribution $(D_{-},D_{+})$ . Suppose further that $G(\mathbf{d})$ has $n$ vertices and $m$ edges. Let $H$ be any digraph with $h$ vertices, $k$ edges and let $h_{i,j}$ be the number of vertices of $H$ with in-degree $i$ and out degree $j$ . Then the probability that a uniformly random injective map $\phi:V(H)\to V(G(\mathbf{d}))$ is a homomorphism is bounded above by

p^{+}(H,G(\mathbf{d})):=\frac{n^{h}}{(n)_{h}(m)_{k}}\prod_{i,j\in\mathbb{N}}\rho_{i,j}^{h_{i,j}}

(1)

Furthermore, the expected number of copies of $H$ in $G(\mathbf{d})$ is bounded above by

n^{+}(H,G(\mathbf{d})):=\frac{h!}{\text{aut}(H)}\binom{n}{h}p^{+}(H,G(\mathbf{d})).

(2)

Proof.

Note that (2) follows immediately from (1). Thus we shall focus on the proof of (1). Let $\psi:V(H)\to V(G(\mathbf{d}))$ be a fixed injective map. Arbitrarily order the edges of $H$ as $E_{1},E_{2},\ldots,E_{k}$ and for each edge $E_{i}$ define an event $\mathcal{E}_{i}:=\{\psi(E_{i})\in E(G(\mathbf{d}))\}$ . Then, $\psi$ is a homomorphism if and only if every event $\mathcal{E}_{1},\ldots,\mathcal{E}_{k}$ occurs. To simplify notation, let $\mathcal{F}_{1}=\mathcal{E}_{1}$ and $\mathcal{F}_{i}=\mathcal{E}_{i}|\cap_{j=1}^{i-1}\mathcal{E}_{j}$ for $i\geq 2$ and note that

\bigcap_{i=1}^{k}\mathcal{E}_{i}=\bigcap_{i=1}^{k}\mathcal{F}_{i}.

Suppose that $E_{i}=a_{i}b_{i}$ for some $a_{i},b_{i}\in V(H)$ . Define $s_{i}=|\{j<i:a_{i}=a_{j}\}|$ and $t_{i}=|\{j<i:b_{i}=b_{j}\}|$ . That is, $s_{i}$ and $t_{i}$ are the number of times that $a_{i}$ (resp. $b_{i}$ ) has previously appeared as the initial (terminal) vertex of an edge of $H$ . Also, suppose that $\psi(E_{i})=a_{i}^{\prime}b_{i}^{\prime}$ .

Claim 2.2.

\mathbb{P}(\mathcal{F}_{i})\leq\frac{\min(d^{+}(a_{i}^{\prime})-s_{i},0)\min(d^{-}(b_{i}^{\prime})-t_{i},0)}{m+1-i}

Proof.

To see this, note that if $d^{+}(a_{i}^{\prime})\leq s_{i}$ , then $\mathbb{P}(\mathcal{F}_{i})=0$ as there are not enough stubs at $a_{i}^{\prime}$ to create such a copy of $H$ . Similarly if $d^{-}(b_{i}^{\prime})\leq t_{i}$ , $\mathbb{P}(\mathcal{F}_{i})=0$ .

So, without loss of generality we may assume that $(d^{+}(a_{i}^{\prime})-s_{i}),(d^{-}(b_{i}^{\prime})-t_{i})>0$ . Now, by definition of $\mathcal{F}_{i}$ we have already chosen $i-1$ edges of the configuration model and we have precisely $d^{+}(a_{i}^{\prime})-s_{i}$ out-stubs remaining at $a_{i}^{\prime}$ and $d^{-}(b_{i}^{\prime})-t_{i}$ at $b_{i}^{\prime}$ . Now, consider the random matching on the $2(m+1-i)$ remaining stubs. The probability that a specified pair of stubs is matched is $(m+1-i)^{-1}$ . There are $(d^{+}(a_{i}^{\prime})-s_{i})(d^{-}(b_{i}^{\prime})-t_{i})$ such pairs which produce $a_{i}^{\prime}b_{i}^{\prime}$ . Thus the expected number of edges from $a_{i}^{\prime}$ to $b_{i}^{\prime}$ is $(d^{+}(a_{i}^{\prime})-s_{i})(d^{-}(b_{i}^{\prime})-t_{i})/(m+1-i)$ . The claim then follows by Markov’s inequality. ∎

Next we compute the probability that all of the $\mathcal{F}_{i}$ occur simultaneously. Due to the way in which we defined these events,

\mathbb{P}(\psi\text{ is a homomorphism})=\mathbb{P}\bigg{(}\bigcap_{i=1}^{k}\mathcal{F}_{i}\bigg{)}=\prod_{i=1}^{k}\mathbb{P}(\mathcal{F}_{i}).

To write down this probability succinctly, we will use the functions

f_{a,b}(x,y):=\prod_{i=1}^{a}\prod_{j=1}^{b}(x+1-i)(y+1-j)

It is a simple computation to check that

\mathbb{P}\bigg{(}\bigcap_{i=1}^{k}\mathcal{F}_{i}\bigg{)}\leq((m)_{k})^{-1}\prod_{v\in V(H)}f_{d_{H}^{-}(v),d_{H}^{+}(v)}(d_{G(\mathbf{d})}^{-}(\psi(v)),d_{G(\mathbf{d})}^{+}(\psi(v)))

(3)

Note that here we were able to remove the $\min$ function as if there are any negative contributions to the product, then there is also a contribution of value $0$ and furthermore, it is impossible for $\psi$ to be a homomorphism in this case so that (3) reduces to $0\leq 0$ in this case (which is clearly true).

To complete the proof, we extend the right hand side of equation (3) to allow any function $\psi:V(H)\to V(G(\mathbf{d}))$ . Choosing an uniformly random function in this way gives that the probability that a uniformly random injective function is a homomorphism is bounded above by

		$\displaystyle\;\frac{1}{(n)_{h}(m)_{k}}\sum_{\psi:V(H)\hookrightarrow V(G(\mathbf{d}))}\prod_{v\in V(H)}f_{d_{H}^{-}(v),d_{H}^{+}(v)}(d_{G(\mathbf{d})}^{-}(\psi(v)),d_{G(\mathbf{d})}^{+}(\psi(v)))$		(4)
	$\displaystyle\leq$	$\displaystyle\;\frac{1}{(n)_{h}(m)_{k}}\sum_{\psi:V(H)\to V(G(\mathbf{d}))}\prod_{v\in V(H)}f_{d_{H}^{-}(v),d_{H}^{+}(v)}(d_{G(\mathbf{d})}^{-}(\psi(v)),d_{G(\mathbf{d})}^{+}(\psi(v)))$		(5)

In moving from injective functions to arbitrary functions between (4) and (5), we move from an intractable space of functions to a product space which naturally splits over the vertices of $H$ . This allows us to rewrite (5) as

		$\displaystyle\frac{1}{(n)_{h}(m)_{k}}\sum_{\psi:V(H)\to V(G(\mathbf{d}))}\prod_{v\in V(H)}f_{d_{H}^{-}(v),d_{H}^{+}(v)}(d_{G(\mathbf{d})}^{-}(\psi(v)),d_{G(\mathbf{d})}^{+}(\psi(v)))$
	$\displaystyle=$	$\displaystyle\frac{1}{(n)_{h}(m)_{k}}\prod_{v\in V(H)}\sum_{w\in V(G(\mathbf{d}))}f_{d_{H}^{-}(v),d_{H}^{+}(v)}(d_{G(\mathbf{d})}^{-}(w),d_{G(\mathbf{d})}^{+}(w))$
	$\displaystyle=$	$\displaystyle\frac{n^{h}}{(n)_{h}(m)_{k}}\prod_{v\in V(H)}\rho_{d_{H}^{-}(v),d_{H}^{+}(v)}=\frac{n^{h}}{(n)_{h}(m)_{k}}\prod_{i,j\in\mathbb{N}}\rho_{i,j}^{h_{i,j}},$

which is the claimed upper bound, $p^{+}(H,G(\mathbf{d}))$ . ∎

We also need a lower bound in the case that $H$ is a cycle.

Lemma 2.3.

Let $G(\mathbf{d})$ be a configuration model random digraph whose degree sequence $\mathbf{d}$ satisfies 1.2. Suppose further that $G(\mathbf{d})$ has $n$ vertices and $m$ edges. Let $H$ be a directed cycle with $h$ vertices. Then the probability that a uniformly random injective map $\phi:V(H)\to V(G(\mathbf{d}))$ is a homomorphism is bounded from below by

p_{c}^{-}(H,G(\mathbf{d})):=\frac{(1+Q)^{h}}{(n)_{h}}\left(1-\frac{2h^{2}\Delta^{2}}{\varepsilon n}\right)\left(1-\frac{h\Delta^{2}}{2m}\right),

(6)

where the $\varepsilon$ in (6) comes from 1.2

Proof.

First, let us consider the probability of finding at least one edge from vertex $u$ of out-degree $a$ to vertex $v$ with in-degree $b$ . Let $\mathcal{H}_{k}$ be our knowledge of $G(\mathbf{d})$ up until now which consists only of a set of $k$ edges which are present. $\mathcal{H}_{k}$ satisfies that we have not observed any edges that affect either the out-degree of $u$ or the in-degree of $v$ and that $G(\mathbf{d})$ has $m$ edges. So let $X$ be the number of edges between $u$ and $v$ . Then,

	$\displaystyle\mathbb{P}(X=0\|\mathcal{H}_{k})=$	$\displaystyle\,\bigg{(}1-\frac{a}{m-k}\bigg{)}\bigg{(}1-\frac{a}{m-k-1}\bigg{)}\ldots\bigg{(}1-\frac{a}{m+1-k-b}\bigg{)}$
	$\displaystyle\leq$	$\displaystyle\,\bigg{(}1-\frac{a}{m}\bigg{)}^{b}$
	$\displaystyle\leq$	$\displaystyle\,1-\frac{ab}{m}+\frac{a^{2}b^{2}}{2m^{2}}.$

where the final line follows by the inequality, $(1+x)^{n}\leq 1-nx+\binom{n}{2}x^{2}$ which is valid for $n\in\mathbb{N}$ and $x\geq-1$ . This allows us to deduce that the probability of seeing an edge between $u$ and $v$ is at least $\frac{ab}{m}-\frac{a^{2}b^{2}}{2m^{2}}$ . Now, looking at each edge of $H$ in turn allows us to deduce that the probability $\phi$ is a homomorphism is at least

\frac{1}{(n)_{h}}\sum_{\phi:[h]\hookrightarrow[n]}\prod_{i=1}^{h}\bigg{(}\frac{d_{\phi(i)}^{-}d_{\phi(i+1)}^{+}}{m}-\frac{(d_{\phi(i)}^{-}d_{\phi(i+1)}^{+})^{2}}{2m^{2}}\bigg{)},

(7)

where we consider the argument of $\phi$ modulo $h$ in (7). Also, note that one can factorise the linear term in (7) and bound the second occurrence of $d_{\phi(i)}^{-}d_{\phi(i+1)}^{+}$ by $\Delta^{2}$ to get the lower bound

\frac{1}{(n)_{h}}\bigg{(}1-\frac{\Delta^{2}}{2m}\bigg{)}\sum_{\phi:[h]\hookrightarrow[n]}\prod_{i=1}^{h}\frac{d_{\phi(i)}^{-}d_{\phi(i)}^{+}}{m}

(8)

The idea is to argue that we can swap the order of the product and sum in (8) without changing the result very much. To this end, let $\Phi_{i}$ be the set of functions $\phi:[h]\to[n]$ such that

i)

$|\phi([n])|=h-i$ ,
ii)

$d_{\phi(j)}^{+}\neq 0$ for each $j\in[h]$ ,
iii)

$d_{\phi(j)}^{-}\neq 0$ for each $j\in[h]$ .

Note that for any function $\phi\not\in\bigcup_{i=0}^{h}\Phi_{i}$ then $\prod_{i=1}^{h}d_{\phi(i)}^{-}d_{\phi(i)}^{+}=0$ . As a result of this we observe that

\sum_{\phi:[h]\hookrightarrow[n]}\prod_{i=1}^{h}d_{\phi(i)}^{-}d_{\phi(i)}^{+}=\sum_{\phi\in\Phi_{0}}\prod_{i=1}^{h}d_{\phi(i)}^{-}d_{\phi(i)}^{+}.

As well as the fact that

\sum_{j=0}^{h}\sum_{\phi\in\Phi_{j}}\prod_{i=1}^{h}d_{\phi(i)}^{-}d_{\phi(i)}^{+}=\sum_{\phi:[h]\rightarrow[n]}\prod_{i=1}^{h}d_{\phi(i)}^{-}d_{\phi(i)}^{+}=\left(\sum_{i=1}^{n}d_{i}^{-}d_{i}^{+}\right)^{h}=m^{h}(1+Q)^{h}.

(9)

For a given function $\phi:[h]\to[n]$ we define its weight as $w(\phi):=\prod_{i=1}^{h}d_{\phi(i)}^{-}d_{\phi(i)}^{+}$ . We also define the weight of a set $S$ of functions in the natural way as $w(S)=\sum_{\phi\in S}w(\phi)$ . Next, we shall apply switching arguments to bound the weights of the sets $\Phi_{i}$ relative to one another.

Consider the auxiliary bipartite graphs, $\mathcal{G}_{i}$ with parts $\Phi_{i}$ and $\Phi_{i+1}$ . We connect $\phi_{i}\in\Phi_{i}$ to $\phi_{i+1}\in\Phi_{i+1}$ by an edge in $\mathcal{G}_{i}$ if $\phi_{i}$ and $\phi_{i+1}$ differ in precisely one coordinate. For each $\phi_{i}\in\Phi_{i}$ there are at most $h^{2}$ ways we can change one coordinate and decrease the size of the image. In particular we may pick any of the $h$ coordinates of $\phi_{i}$ and change it to $\phi_{i}(j)$ for some $j\in[h]$ . So, $\Delta_{\mathcal{G}_{i}}(\Phi_{i})\leq h^{2}$ . For each $\phi_{i+1}\in\Phi_{i+1}$ , we pick any of the at least $i$ coordinates at which $\phi_{i+1}$ is not injective and choose a new image for this coordinate. By 1.2 there are at least $\varepsilon n/2$ ways to choose the new image, $\delta_{\mathcal{G}_{i}}(\Phi_{i+1})\geq i\varepsilon n/2$ .

Combining these two results allows us to deduce that $i\varepsilon n/2|\Phi_{i+1}|\leq e(\mathcal{G}_{i})\leq h^{2}|\Phi_{i}|$ . Upon rearrangement we find $|\Phi_{i}|\geq\frac{\varepsilon n}{2h^{2}}|\Phi_{i+1}|$ . Note that two functions $\phi$ and $\psi$ which differ in one coordinate must also satisfy $w(\phi)\leq\Delta^{2}w(\psi)$ and vice versa. Hence $w(\Phi_{i})\geq\frac{\varepsilon n}{2h^{2}\Delta^{2}}w(\Phi_{i+1})$ . This allows us to apply induction to deduce that

\sum_{j=0}^{h}\sum_{\phi\in\Phi_{j}}\prod_{i=1}^{h}d_{\phi(i)}^{-}d_{\phi(i)}^{+}=\sum_{j=0}^{h}w(\Phi_{j})\leq w(\Phi_{0})\sum_{j=0}^{h}\left(\frac{2h^{2}\Delta^{2}}{\varepsilon n}\right)^{j}\leq\frac{w(\Phi_{0})}{1-\frac{2h^{2}\Delta^{2}}{\varepsilon n}}.

So by (9), $w(\Phi_{0})\geq(1-\frac{2h^{2}\Delta^{2}}{\varepsilon n})m^{h}(1+Q)^{h}$ . Combining this with (8) allows us to deduce the statement of the lemma, that the following is a lower bound on the probability of finding a cycle at a specified position in $G(\mathbf{d})$ :

\frac{(1+Q)^{h}}{(n)_{h}}\left(1-\frac{2h^{2}\Delta^{2}}{\varepsilon n}\right)\left(1-\frac{h\Delta^{2}}{2m}\right).

∎

3 Digraph Counting

In this section we will prove the following bound on the number of strongly connected multi-digraphs with maximum total degree $4$ . This is similar to [5, Lemma 2.3] although a weaker bound suffices here allowing us to simplify the proof somewhat.

Lemma 3.1.

Suppose that $n,m,a,b\in\mathbb{N}$ such that $n+a+b=m$ . Let $N(n,a,b)$ be the number of labelled strongly connected multi-digraphs with $n$ vertices and degree distribution given by

in-degree	out-degree	quantity
$1$	$1$	$n-2a-b$
$1$	$2$	a
$2$	$1$	a
$2$	$2$	b

Then, we have the following bound,

N(n,a,b)\leq(3a+2b)(m-1)!\binom{n}{a,a,b}

To prove this bound we will use the preheart configuration model of Pérez-Giménez and Wormald [19] which we shall define as follows.

A preheart is a multi-digraph with minimum semi-degree at least $1$ and no cycle components. The heart of a preheart $D$ is the multidigraph $H(D)$ formed by suppressing all vertices of $D$ which have in and out degree precisely $1$ . For a degree sequence $\mathbf{d}$ on vertex set $V$ , define

T=T(\mathbf{d})=\{v\in V:d^{+}(v)+d^{-}(v)\geq 3\}.

To form the preheart configuration model, first we apply the configuration model to $T$ to produce a heart $H$ . Given a heart configuration $H$ , we construct a preheart configuration $Q$ by assigning $V\setminus T$ to $E(H)$ such that the vertices assigned to each arc of $H$ are given a linear order. Denote this assignment including the orderings by $q$ . Then the preheart configuration model, $\mathcal{Q}(\mathbf{d})$ is the probability space of random preheart configurations formed by choosing $H$ and $q$ uniformly at random.

It is easy to see that every strongly connected digraph is produced by the preheart configuration model. Thus counting the number of possible outcomes from the preheart configuration model gives an upper bound for the number of strongly connected digraphs with the same degree sequence. We now count the number of preheart configurations using [5, Lemma 2.4].

Lemma 3.2.

In the preheart configuration model with $n$ vertices, $m$ edges and degree sequence $\mathbf{d}$ , let $n^{\prime}=|T(\mathbf{d})|$ be the number of vertices of the heart. Then there are a total of

\frac{n^{\prime}+m-n}{m}m!

preheart configurations.

From this lemma, we may prove Lemma 3.1.

Proof of Lemma 3.1.

First, we choose the degree sequence. So note that there are at most $\binom{n}{a,a,b}$ ways in which we can give $a$ vertices in-degree $1$ and out-degree $2$ , $a$ vertices in-degree $2$ and out-degree $1$ , $b$ vertices in-degree $2$ and out-degree $2$ and the remainder in-degree $1$ and out-degree $1$ . Having fixed this degree sequence, by Lemma 3.2 as the heart contains $2a+b$ vertices and there are $a+b$ more edges than vertices, the number of strongly connected digraphs with this degree sequence is bounded above by $\frac{3a+2b}{m}m!$ . Hence the number of strongly connected digraphs with degree distribution as in the statement of the lemma is at most

(3a+2b)(m-1)!\binom{n}{a,a,b}

as claimed. ∎

4 Proof of Theorem 1.3

In this section we prove Theorem 1.3 and show that every strongly connected component is a cycle and that these cycles are not particularly large. In particular this provides a configuration model analogue of [17, Theorem 7]. Working with the configuration model introduces several additional difficulties with the proof of this, foremost of these being that we do not have enough control of the subgraph counts to compute the factorial moments and show that they converge to those of a Poisson distribution. We will instead use the Chen-Stein method for Poisson approximation which only requires good control of the first moment and an upper bound on the second.

The proof of this theorem splits naturally into four parts. Choose functions $f(n)\gg g(n)$ which are defined such that $f(n)=\omega(\sqrt{m/|Q|})$ , $f(n)=o(m|Q|/R^{-})$ and $g(n)=\omega(1/|Q|)$ . Moreover for this section we shall assume that $R^{-}\geq R^{+}$ and if this is not the case, we swap the orientations of all edges to get an equivalent digraph with $R^{-}\geq R^{+}$ as desired. We will say that a cycle $C$ is

•

Long if $|C|\geq f(n)$ ,
•

Medium if $g(n)<|C|<f(n)$ ,
•

Short if $|C|\leq g(n)$ .

First we will show that there are no long or medium cycles in the directed configuration model. Next, we show that there are no complex components and finally we show the result on the distribution of the length of the $k$ th longest cycle.

4.1 Long Cycles

Lemma 4.1.

The configuration model, $G(\mathbf{d})$ has no long cycles.

To show that there are no long cycles, it suffices to show that the out-component of an arbitrary vertex is bounded above by $f(n)$ . Certainly, the longest cycle in a directed graph is at most the size of the largest out-component and so the lemma follows.

Thus, consider the following version of the branching process of Hatami and Molloy [12] for the out-component in a digraph. For a vertex $v$ we explore its out-component in the random digraph $G(\mathbf{d})$ as follows. We will have a partial subdigraph $C_{t}$ at time $t$ consisting of the vertices explored thus far. $C_{t}$ will consist of all in- and out-stubs of some vertices of $G(\mathbf{d})$ together with a matching of some of the stubs. If there are unmatched out-stubs in $C_{t}$ we will pick one at random and match it to some in-stub which yields an edge of $C_{t}$ . We define $Y_{t}$ as the number of unmatched out-stubs in $C_{t}$ . Thus, $Y_{t}=0$ indicates that we have explored an out-component in its entirety. Formally we define the exploration process as follows.

•

Choose a vertex $v$ and initialise $C_{0}=\{v\}$ and $Y_{0}=d^{+}(v)$ .
•
While $Y_{t}>0$ , choose an arbitrary unmatched out-stub of any vertex $v\in C_{t}$ . Pick a uniformly random unmatched in-stub and let $u$ be the vertex to which this in-stub belongs. Match these two stubs forming an edge of $G$ .
- –
  
  If $u\not\in C_{t}$ we add it so $C_{t+1}=C_{t}\cup\{u\}$ and $Y_{t+1}=Y_{t}+d^{+}(u)-1$ .
- –
  
  Otherwise, $C_{t+1}=C_{t}$ , $Y_{t+1}=Y_{t}-1$ .

Note that $Y_{t}$ does not depend on how we have exposed $C_{t}$ so $C_{t}$ and $Y_{t}$ are Markov processes. We define the following quantities.

•

$D_{t}:=Y_{t}+\sum_{u\not\in C_{t}}d^{+}(u)$ , the number of unmatched out-stubs at time $t$ . Note this is also the number of unmatched in-stubs at time $t$ .
•

$v_{t}:=\emptyset$ if $C_{t-1}$ and $C_{t}$ have the same vertex set. Otherwise it is the unique vertex in $C_{t}\setminus C_{t-1}$ .
•

$Q_{t}:=\frac{\sum_{u\not\in C_{t}}d^{-}(u)d^{+}(u)}{D_{t}}-1$ .

Note that initially $Q_{t}=Q$ . Also, for unvisited vertices $u\not\in C_{t}$ , the probability that we explore $u$ next is $\mathbb{P}(v_{t+1}=u)=\frac{d^{-}(u)}{D_{t}}$ . Hence, provided that $Y_{t}>0$ , the expected change in $Y_{t}$ is

\mathbb{E}(Y_{t+1}-Y_{t}|C_{t})=\sum_{u\not\in C_{t}}\mathbb{P}(v_{t+1}=u)(d^{+}(u)-1)=\frac{\sum_{u\not\in C_{t}}d^{-}(u)d^{+}(u)}{D_{t}}-1=Q_{t}.

(10)

As long as $Q_{t}$ remains close to $Q$ we expect that $Y_{t}$ is a random walk with drift approximately $Q$ . So in particular, for our setting of $Q<0$ , we expect that the random walk will quickly return to 0. Thus, we shall start by showing that the drift parameter $Q_{t}$ is indeed close to $Q$ with high probability. For this we shall use the following formulation of the Azuma-Hoeffding inequality (see [13, Theorem 2.25]).

Theorem 4.2 (Azuma-Hoeffding Inequality).

Let $(X_{k})_{k=0}^{n}$ be a martingale with $X=X_{n}$ , $X_{0}=\mathbb{E}(X)$ . Suppose there exist constants, $c_{k}>0$ such that

|X_{k}-X_{k-1}|\leq c_{k}\;\text{ for all }\;k\leq n.

Then for any $\lambda\geq 0$ ,

\mathbb{P}(|X_{n}-X_{0}|\geq\lambda)\leq 2\exp\left(-\frac{\lambda^{2}}{2\sum_{k=1}^{n}c_{k}^{2}}\right).

For $t\geq 1$ define $W_{t}:=Q_{t}-Q_{t-1}-\mathbb{E}(Q_{t}-Q_{t-1}|C_{t-1})$ . Also, we define $X_{0}=Q$ and for $t\geq 1$ let

X_{t}:=X_{0}+\sum_{i=1}^{t}W_{i}=Q_{t}-\sum_{i=1}^{t}\mathbb{E}(Q_{i}-Q_{i-1}|C_{i-1}).

(11)

It is a simple check that the $X_{t}$ forms a martingale. Furthermore, $|Q_{t}-Q|\leq|Q_{t}-X_{t}|+|X_{t}-Q|$ so to bound the probability that $|Q_{t}-Q|$ is large, we show that $|Q_{t}-X_{t}|$ is small and bound the probability that $|X_{t}-Q|$ is large.

For the second of these, consider the auxiliary random variables

\widetilde{Q}_{t}:=\frac{\sum_{u\not\in C_{t}}d^{-}(u)d^{+}(u)}{D_{t-1}}-1.

That is we change $Q_{t}$ to have the same denominator as $Q_{t-1}$ . As we assume that $t\leq m/2$ , $D_{t}\geq m/2$ so combining this with the fact that $|D_{t}-D_{t-1}|\leq 1$ we deduce that

|Q_{t}-\widetilde{Q}_{t}|=\frac{|D_{t}-D_{t-1}|\sum_{u\not\in C_{t}}d^{-}(u)d^{+}(u)}{D_{t}D_{t-1}}\leq\frac{4}{m}.

(12)

Also, as there is at most one vertex whose contributions are removed in moving from $Q_{t-1}$ to $Q_{t}$ , then

|Q_{t-1}-\widetilde{Q}_{t}|\leq\frac{4\Delta^{2}}{m}.

(13)

Combining equations (12) and (13) gives an upper bound on $|Q_{t}-Q_{t-1}|$ which we may then use to bound the martingale differences almost surely,

|X_{t}-X_{t-1}|\leq\frac{8\Delta^{2}+8}{m}\leq\frac{16\Delta^{2}}{m}.

(14)

Next, we will bound the terms $\mathbb{E}(Q_{t}-Q_{t-1}|C_{t-1})$ . It will be convenient to do this in two stages utilising the auxiliary random variables $\widetilde{Q}_{t}$ . So, first note that

	$\displaystyle 0\leq\mathbb{E}(Q_{t-1}-\widetilde{Q}_{t}\|C_{t-1})$	$\displaystyle=\sum_{u\not\in C_{t-1}}\mathbb{P}(v_{t}=u)\frac{d^{-}(u)d^{+}(u)}{D_{t-1}}\leq\sum_{u\in V(G)}\frac{(d^{-}(u))^{2}d^{+}(u)}{D_{t-1}^{2}}$
		$\displaystyle\leq\sum_{u\in V(G)}\frac{4(d^{-}(u))^{2}d^{+}(u)}{m^{2}}\leq\frac{4R^{-}+4}{m}.$		(15)

We can combine (15) with (12) and apply 1.2 to deduce that

|\mathbb{E}(Q_{t}-Q_{t-1}|C_{t-1})|\leq\frac{4R^{-}+8}{m}\leq\frac{12R^{-}}{\zeta m}.

(16)

This leaves us in a situation in which we can compare $X_{t}$ and $Q_{t}$ ,

|X_{t}-Q_{t}|\leq\sum_{i=1}^{t}|\mathbb{E}(Q_{i}-Q_{i-1}|C_{i-1})|\leq\frac{12R^{-}t}{\zeta m}.

(17)

So provided that $t\leq\frac{m\zeta|Q|}{48R^{-}}$ we have $|X_{t}-Q_{t}|\leq|Q|/4$ and in particular, for any such $t$ ,

		$\displaystyle\mathbb{P}\left(Q_{t}-Q\geq\frac{\|Q\|}{2}\right)=\mathbb{P}\left(Q_{t}-X_{t}+X_{t}-Q\geq\frac{\|Q\|}{2}\right)$
	$\displaystyle\leq\;$	$\displaystyle\mathbb{P}\left(X_{t}-Q\geq\frac{\|Q\|}{4}\right)\leq\mathbb{P}\left(\|X_{t}-Q\|\geq\frac{\|Q\|}{4}\right)$

Whereupon we can apply the Azuma-Hoeffding inequality as $X_{0}=Q$ . We can use the bound from (14) for the $c_{k}$ . Substituting into Theorem 4.2 yields,

\mathbb{P}\left(|X_{t}-Q|\geq\frac{|Q|}{4}\right)\leq\exp\left(-\frac{|Q|^{2}m^{2}}{8192t\Delta^{4}}\right)\leq\exp\left(-C|Q|mn^{-2/3}\log(n)\right),

(18)

where the second inequality in (18) comes from $t\leq\frac{m\zeta|Q|}{48R^{-}}$ , $\Delta\leq n^{1/6}\log^{-1/4}(n)$ and $R^{-}\geq\zeta$ . We could improve the dependence on $\Delta$ by using Freedman’s inequality [9] in place of the Azuma-Hoeffding inequality here however there are other points where we require $\Delta\leq n^{1/6}$ and so this would only remove the $\log^{-1/4}(n)$ term in 1.2. Note $mn^{-2/3}\geq m^{1/3}/2$ hence $|Q|mn^{-2/3}\to\infty$ and so for any large enough $n$ ,

\mathbb{P}\left(Q_{t}-Q\geq\frac{|Q|}{2}\right)=\mathbb{P}\left(Q_{t}\geq-\frac{|Q|}{2}\right)\leq n^{-2}.

(19)

Now that we have shown that $Q_{t}$ is concentrated around $Q$ , we can proceed to show that $G(\mathbf{d})$ has no large components with high probability via a stopping time argument. We will use the following version of Doob’s optional stopping theorem [20, Theorem 10.10]

Theorem 4.3 (Optional Stopping Theorem).

Let $X$ be a supermartingale and let $\tau$ be a stopping time. Then $X_{\tau}$ is integrable and furthermore,

\mathbb{E}(X_{\tau})\leq\mathbb{E}(X_{0})

whenever $\tau$ is bounded.

So, now let us show that there is no component of size larger than $f(n)$ . For each $v\in V(G)$ we shall consider the exploration process started at $v$ . Recall that $f(n)=o(n|Q|/R^{-})$ and so for sufficiently large $n$ , $f(n)\leq\frac{m\zeta|Q|}{48R^{-}}$ . Hence we have $Q_{t}\leq-|Q|/2$ with high probability for each $t\leq f(n)$ . Define the stopping time

\tau:=\min\{t\geq 0|Y_{t}=0\text{ or }Q_{t}\geq-|Q|/2\text{ or }t=f(n)\}

Recall that for $t\leq\frac{m\zeta|Q|}{48R^{-}}$ we have $\mathbb{E}(Y_{t}-Y_{t-1})=Q_{t-1}\leq-|Q|/2$ . Thus, $Y_{\min(t,\tau)}+|Q|\min(t,\tau)/2$ is a supermartingale. Clearly, $\tau$ is bounded by $f(n)$ so we may apply Theorem 4.3 to $Y_{\min(t,\tau)}+|Q|\min(t,\tau)/2$ from which we deduce that

\mathbb{E}\left(Y_{\tau}+\frac{|Q|\tau}{2}\right)\leq Y_{0}=d^{+}(v).

Upon rearrangement this yields,

\mathbb{E}(\tau)\leq 2\frac{d^{+}(v)-\mathbb{E}(Y_{\tau})}{|Q|}\leq\frac{2d^{+}(v)}{|Q|}.

By Markov’s inequality we can deduce

\mathbb{P}(\tau=f(n))\leq\frac{2d^{+}(v)}{|Q|f(n)}

The only other way in which we could have $Y_{\tau}\neq 0$ is if for some $i$ we have $Q_{i}\geq-|Q|/2$ . A union bound allows us to deduce that this occurs with probability at most $f(n)n^{-2}\leq n^{-1}$ . So for any large enough $n$ ,

\mathbb{P}(Y_{\tau}\neq 0)\leq\frac{2d^{+}(v)}{|Q|f(n)}+\frac{1}{n}.

Define $Z$ as the number of vertices of $G$ which lie in cycles of size at least $f(n)$ . Note that any such vertex must have out component of size at least $f(n)$ . Thus,

	$\displaystyle\mathbb{P}(\|\mathcal{C}_{1}\|\geq f(n))$	$\displaystyle=\mathbb{P}(Z\geq f(n))\leq\frac{\mathbb{E}(Z)}{f(n)}\leq\frac{1}{f(n)}\sum_{v\in V(G)}\mathbb{P}(\|C^{+}(v)\|\geq f(n))$
		$\displaystyle\leq\frac{1}{f(n)}\sum_{v\in V(G)}\left(\frac{2d^{+}(v)}{\|Q\|f(n)}+\frac{1}{n}\right)=\frac{m}{\|Q\|f(n)^{2}}+\frac{1}{f(n)}=o(1)$

as $f(n)\gg\sqrt{m/|Q|}$ . Thus there are no long cycles in $G(\mathbf{d})$ .

4.2 Medium Cycles

Lemma 4.4.

The configuration model, $G(\mathbf{d})$ has no medium cycles.

Our next step is to apply Lemma 2.1 to show there are no medium cycles. By Lemma 2.1, the probability that $G(\mathbf{d})$ has a cycle of length $h$ in any particular location is at most $\frac{n^{h}\mu_{1,1}^{h}}{(n)_{h}(m)_{h}}$ . Thus the expected number of cycles of length $h$ in $G(\mathbf{d})$ is at most

	$\displaystyle\frac{h!}{\|\text{aut}(\overrightarrow{C_{h}})\|}\binom{n}{h}\frac{n^{h}\mu_{1,1}^{h}}{(n)_{h}(m)_{h}}$	$\displaystyle=\frac{m^{h}}{(m)_{h}}\frac{(1+Q)^{h}}{h}\leq\bigg{(}1+\frac{h}{m-h}\bigg{)}^{h}\frac{(1+Q)^{h}}{h}$
		$\displaystyle\leq\bigg{(}1+\frac{2h}{m}\bigg{)}^{h}\frac{(1+Q)^{h}}{h}\leq\frac{e^{hQ+\frac{2h^{2}}{m}}}{h}.$		(20)

For any $g(n)\leq h\leq f(n)$ , we have $2h^{2}/m\leq 4h^{2}/n\leq h|Q|/2$ as $f(n)=o(n|Q|)$ . Then, the expected number of cycles of length between $g(n)$ and $f(n)$ is at most

\sum_{h=g(n)}^{f(n)}\frac{e^{hQ+\frac{2h^{2}}{m}}}{h}\leq\sum_{h=g(n)}^{f(n)}\frac{e^{\frac{hQ}{2}}}{h}\leq\int_{g(n)}^{\infty}\frac{e^{\frac{hQ}{2}}}{h}dh=\int_{-\frac{Qg(n)}{2}}^{\infty}\frac{e^{-\lambda}}{\lambda}d\lambda=E_{1}\bigg{(}-\frac{Qg(n)}{2}\bigg{)},

(21)

where $E_{1}(x)$ is the exponential integral function and the first equality follows by making the substitution $\lambda=-Qh/2$ (recall that $Q<0$ and so this substitution preserves positivity). It is straightforward to bound $0\leq E_{1}(x)\leq e^{-x}/x$ which allows us to conclude that $E_{1}(x)\to 0$ as $x\to\infty$ . Note that $g(n)=\omega(1/|Q|)$ and so $-Qg(n)\to\infty$ . Thus the expected number of cycles in $G(\mathbf{d})$ of length at least $g(n)$ is $o(1)$ . So by Markov’s inequality, there are no such cycles with high probability.

4.3 Complex Components

Lemma 4.5.

The configuration model, $G(\mathbf{d})$ has no complex components.

We begin by defining digraphs $S(a,b,c)$ and $T(a,b)$ for $a,b,c\in\mathbb{N}$ . Let $S(a,b,c)$ be the digraph with $a+b+c-1$ vertices consisting of vertices $u$ , $v$ and three internally disjoint paths. One of length $a$ from $u$ to $v$ , one of length $b$ from $u$ to $v$ and one of length $c$ from $v$ to $u$ . Let $T(a,b)$ be the digraph with $a+b-1$ vertices consisting of two cycles, one of length $a$ , one of length $b$ which intersect at a single vertex, $u$ . We can use the ear decomposition of a strongly connected digraph to deduce that if $G(\mathbf{d})$ contains any complex components, then it contains a subgraph which is either a copy of $S(a,b,c)$ or a copy of $T(a,b)$ . Note that both of these are the union of two cycles and and by the results of the previous two sections, there are no cycles with more than $g(n)$ vertices with high probability. Thus we only need to show there are none of these motifs on at most $2g(n)$ vertices to deduce that there are none in $G(\mathbf{d})$ .

Unlike Łuczak and Seierstad [17], we must treat these cases separately as in the first case, we have two vertices of total degree $3$ and the rest of total degree $2$ and in the second there is one vertex of total degree $4$ in place of the total degree $3$ vertices which changes the result of applying Lemma 2.1.

First let us consider $S(a,b,c)$ . There are at most $h^{2}h!$ ways of finding such subgraphs on $h$ vertices ( $\leq h^{2}$ ways of choosing path lengths connecting the two degree $3$ vertices and assuming the associated automorphism groups are all trivial gives this bound). Thus, we may apply Lemma 2.1 to deduce that the expected number of such subgraphs in the configuration model with parameters as in the statement of Theorem 1.3 is at most

\sum_{h=1}^{2g(n)}\frac{n^{h}}{(m)_{h+1}}h^{2}h!\mu_{1,1}^{h-2}\rho_{1,2}\rho_{2,1}=\frac{R^{-}R^{+}}{m}\sum_{h=1}^{2g(n)}\frac{m^{h+1}}{(m)_{h+1}}h^{2}(1+Q)^{h-2}\leq\frac{R^{-}R^{+}}{m}\int_{0}^{2g(n)}x^{2}e^{\frac{xQ}{2}}dx,

(22)

where we eliminate the term $\frac{m^{k+1}}{(m)_{k+1}}$ in the above in the same way as in (20). An integral of the form seen in (22) can be evaluated by integrating by parts twice to deduce the following (where $t>0$ ),

\int_{0}^{y}x^{2}e^{-tx}=\frac{2}{t^{3}}-e^{-ty}\bigg{(}\frac{2}{t^{3}}+\frac{2y}{t^{2}}+\frac{2y^{2}}{t}\bigg{)}\leq\frac{2}{t^{3}}.

Thus the expected number of these subgraphs in $G(\mathbf{d})$ can be bounded above by $\frac{16R^{-}R^{+}}{m|Q|^{3}}\to 0$ .

The second case is $T(a,b)$ where we have a degree $4$ vertex. In this case there are at most $hh!$ such subgraphs on $h$ vertices ( $\leq h$ choices of the two cycle lengths and assuming the associated automorphism groups are all trivial gives this bound). Again we apply Lemma 2.1 to compute the expected number of such subgraphs of size at most $2g(n)$ which this time is at most

\sum_{h=1}^{2g(n)}\frac{n^{h}}{(m)_{h+1}}h\mu_{1,1}^{h-1}\rho_{2,2}\leq\frac{R^{+}\Delta}{m}\sum_{h=1}^{2g(n)}\frac{m^{h+1}}{(m)_{h+1}}he^{-hQ}\leq\frac{R^{+}\Delta}{m}\int_{0}^{2g(n)}xe^{-\frac{xQ}{2}}dx

(23)

Note we may pick either $\rho_{2,2,}\leq\frac{m}{n}R^{-}\Delta$ or $\frac{m}{n}R^{+}\Delta$ here by selecting which part of the product $d_{i}^{-}(d_{i}^{-}-1)d_{i}^{+}(d_{i}^{+}-1)$ to bound by $\Delta$ in computing $\rho_{2,2}$ and so we pick the smaller of the two. We may proceed similarly to before, integrating by parts which allows us to bound integrals of the form found in (23) as

\int_{0}^{y}xe^{-tx}=\frac{1}{t^{2}}-e^{-ty}\bigg{(}\frac{1}{t^{2}}+\frac{y}{t}\bigg{)}\leq\frac{1}{t^{2}}.

So, we can bound (23) above by $\frac{4R^{+}\Delta}{m|Q|^{2}}\leq\frac{4(R^{+}R^{-})^{2/3}}{m^{2/3}\zeta^{1/3}|Q|^{2}}\frac{\Delta}{m^{1/3}}\to 0$ . Thus by Markov’s inequality there are no copies of $S(a,b,c)$ or $T(a,b)$ on at most $2g(n)$ vertices in $G(\mathbf{d})$ with high probability. Combining Lemma 4.1 and Lemma 4.4 we deduce there are no cycles of length at least $g(n)$ with high probability. As any complex strongly connected digraph contains a copy of at least one of these, we deduce that $G(\mathbf{d})$ contains no complex components with high probability.

4.4 Length of the kth largest cycle

Lemma 4.6.

The $kth$ longest cycle in the configuration model, $G(\mathbf{d})$ follows the distribution given in Theorem 1.3

The idea now will be to apply a local coupling version of the Chen-Stein method (see [8, Theorem 2.8]) to deduce that the number of cycles in $G(\mathbf{d})$ of length between $\alpha/|Q|$ and $g(n)$ converges to a Poisson distribution of mean $\xi_{\alpha}$ . Let us start by stating the version of the Chen-Stein method which we will apply,

Theorem 4.7.

Let $W=\sum_{i\in\Gamma}X_{i}$ be a sum of indicator variables and let $p_{i}:=\mathbb{E}(X_{i})$ . For each $i\in\Gamma$ , divide $\Gamma\setminus\{i\}$ into two sets, $\Gamma_{i}^{s}$ and $\Gamma_{i}^{w}$ . Define

Z_{i}:=\sum_{j\in\Gamma_{i}^{s}}X_{j}\;\text{ and }\;W_{i}:=\sum_{j\in\Gamma_{i}^{w}}X_{j}.

Suppose that there exist random variables, $W_{i}^{1}$ and $\widetilde{W}_{i}^{1}$ defined on the same probability space such that

\mathcal{L}(\widetilde{W}_{i}^{1})=\mathcal{L}(W_{i}|X_{i}=1)\;\text{ and }\;\mathcal{L}(W_{i}^{1})=\mathcal{L}(W_{i}).

Then,

d_{\text{TV}}(W,Po(\mathbb{E}(W)))\leq\min(1,\mathbb{E}(W)^{-1})\sum_{i\in\Gamma}\left(p_{i}\mathbb{E}(X_{i}+Z_{i})+\mathbb{E}(X_{i}Z_{i})+p_{i}\mathbb{E}|\widetilde{W}_{i}^{1}-W_{i}^{1}|\right)

(24)

Note that this lemma requires us to have a copy of $W_{i}|X_{i}=1$ . To create such a copy, we will use the following lemma to couple the configuration model with itself conditioned on the containment of a given subgraph.

Lemma 4.8.

Let $G=(A\cup B,E)$ be a balanced bipartite complete graph and let $\mathcal{M}$ be a uniformly chosen random perfect matching of $G$ . Suppose that $a_{1},a_{2},\ldots,a_{k}$ are distinct elements of $A$ and $b_{1},b_{2},\ldots,b_{k}$ are distinct elements of $B$ . Then, the following procedure gives a copy of $\mathcal{M}|(a_{1}b_{1},a_{2}b_{2},\ldots,a_{k}b_{k}\in\mathcal{M})$ :

1.

Sample an element $M$ from $\mathcal{M}$ and set $M_{0}=M$ .
2.
For each $i\leq k$ :
- •
  
  If $a_{i}b_{i}$ is an edge of $M_{i-1}$ set $M_{i}=M_{i-1}$ .
- •
  
  Otherwise, let $a_{i}^{\prime}$ be the unique neighbour of $b_{i}$ and $b_{i}^{\prime}$ be the unique neighbour of $a_{i}$ in $M_{i-1}$ . Let $M_{i}=(M_{i-1}\setminus\{a_{i}b_{i}^{\prime},a_{i}^{\prime}b_{i}\})\cup\{a_{i}b_{i},a_{i}^{\prime}b_{i}^{\prime}\}$ .

That is $M_{k}$ is sampled uniformly from $\mathcal{M}|(a_{1}b_{1},a_{2}b_{2},\ldots,a_{k}b_{k}\in\mathcal{M})$ .

Proof.

Note that it is sufficient to prove the lemma for $k=1$ as if $\mathcal{B}=\mathcal{M}|(a_{1}b_{1},a_{2}b_{2},\ldots,a_{k-1}b_{k-1}\in\mathcal{M})$ then clearly, $\mathcal{B}|(a_{k}b_{k}\in\mathcal{B})=\mathcal{M}|(a_{1}b_{1},a_{2}b_{2},\ldots,a_{k}b_{k}\in\mathcal{M})$ . So, the general case follows by induction on the $k=1$ case.

Now, we prove the lemma for $k=1$ . To do so, we will show that each of the $(n-1)!$ atoms of $\mathcal{M}|(a_{1}b_{1}\in\mathcal{M})$ comes from precisely $n$ atoms of $\mathcal{M}$ via this switching approach. So let $M$ be an atom of $\mathcal{M}|(a_{1}b_{1}\in\mathcal{M})$ and let $e=cd$ be an edge of $M$ with $c\in A,d\in B$ . Now, consider the inverse switching, $M\to(M\setminus\{ab,cd\})\cup\{ad,bc\}$ (note if we chose the edge $ab$ , then this is simply the identity $M\to M$ ). Thus, for each atom $M$ of $\mathcal{M}|(a_{1}b_{1}\in\mathcal{M})$ there are $n$ ways to get to an atom of $\mathcal{M}$ . Similarly we can show that each atom of $\mathcal{M}$ is mapped to a unique element of $\mathcal{M}|(a_{1}b_{1}\in\mathcal{M})$ . Thus, as $\mathcal{M}$ has the uniform distribution, so does the random variable obtained by our procedure above. ∎

Note that this lemma does not allow us to generate the configuration model conditioned on the existence of a subgraph specified in the usual way by the locations of its vertices unless all of the involved vertices have in- and out-degrees at most $1$ . This is due to the fact that Lemma 4.8 allows us to condition on which pairs of stubs are connected rather than which pairs of vertices. Instead we shall condition on the existence of principal subgraphs which we shall define as follows.

Definition 4.9.

Let $G(\mathbf{d})$ be a configuration model random digraph with degree distribution $\mathbf{d}$ . Let $M=M(G(\mathbf{d}))$ be the associated perfect matching of in- and out-stubs. A principal subgraph of $G(\mathbf{d})$ is an event of the form $M^{\prime}\subseteq M$ where $M^{\prime}$ is a partial matching of in- and out-stubs.

A similar notion was seen in [15] in which their canonical events are precisely the same as our principal subgraphs. Furthermore, note that the event corresponding to a subgraph in $G(\mathbf{d})$ may be written as a union of events corresponding to principal subgraphs. In particular, the existence of the cycle $v_{1}v_{2}\ldots v_{k}$ in $G(\mathbf{d})$ can be written as the union of $\prod_{i=1}^{k}d_{G}^{-}(v_{i})d_{G}^{+}(v_{i})$ events corresponding to principal cycles. For this reason, in places where there may be some ambiguity as to whether we are dealing with principal subgraphs or not, we shall refer to subgraphs in the usual sense as union subgraphs.

This leaves us in a setting to which we can apply Theorem 4.7. We shall show that the number of cycles in $G(\mathbf{d})$ of lengths between $\alpha/|Q|$ and $g(n)$ is Poisson distributed with mean $\xi_{\alpha}$ . Let $\Gamma$ be the set of all principal cycles which have lengths between $\alpha/|Q|$ and $g(n)$ . For each $C\in\Gamma$ and digraph $J$ on the same vertex set and stubs as $G(\mathbf{d})$ let $X_{C}(J)$ be the indicator function that $C$ is a principal subgraph of $J$ . Also, we split $\Gamma\setminus\{C\}$ into a set strongly dependent on $C$ and a set weakly dependent on $C$ . Define the strongly dependent set $\Gamma_{C}^{s}$ to be the set of all principal cycles which share at least one vertex with $C$ . The weakly dependent set $\Gamma_{C}^{w}$ contains all of the other principal cycles, it is the set of principal cycles which are vertex disjoint from $C$ . Finally, we define $W_{C}^{1}$ to be $W_{C}$ for an independent copy $G^{\prime}$ of $G(\mathbf{d})$ and $\widetilde{G}_{C}$ to be obtained from $G^{\prime}$ by applying a $4$ -cycle switching to each edge of $C$ in $G_{C}$ in turn and define $\widetilde{W}_{C}^{1}$ in the obvious way to be $W_{C}$ for the digraph $\widetilde{G}_{C}$ . That is,

W_{C}=\sum_{C^{\prime}\in\Gamma_{C}^{w}}X_{C^{\prime}}(G^{\prime})\qquad\qquad\qquad\widetilde{W}_{C}^{1}=\sum_{C^{\prime}\in\Gamma_{C}^{w}}X_{C^{\prime}}(\widetilde{G}_{C})

These variables clearly satisfy the assumptions of Theorem 4.7 and so we must bound the expectations in the statement of the theorem to compute an upper bound on $d_{\text{TV}}(W,Po(\mathbb{E}(W)))$ . The idea now is to reduce the whole problem to one of bounding the expected numbers of certain subgraphs being contained in $G(\mathbf{d})$ , $G^{\prime}$ and $\widetilde{G}_{C}$ . For the remainder of this section, we write $X_{C}$ for $X_{C}(G)$ and $p_{C}=\mathbb{E}(X_{C})$ unless specified otherwise.

We shall bound the three terms from (24) one by one. First let us consider the term

\eta:=\sum_{C\in\Gamma}p_{C}\mathbb{E}(X_{C}+Z_{C})=\sum_{C\in\Gamma}\sum_{C^{\prime}\in\Gamma_{C}^{s}\cup\{C\}}p_{C}p_{C}^{\prime}

(25)

Note that the set, $\{C^{\prime}\in\Gamma_{C}^{s}\cup\{C\}\}$ is the set of all cycles which share at least one vertex with $C$ . Thus, the union of $C$ and $C^{\prime}$ is a strongly connected digraph (here we allow multiple edges). In the computation of $\eta$ we will use the following generalisation of [5, Lemma 3.4]. No changes are required to the proof of the lemma as presented in [5] to generalise from digraphs to multi-digraphs.

Lemma 4.10.

Each strongly connected multi-digraph $D$ with excess $k$ may be formed in at most $27^{k}$ ways as the union of a pair of directed cycles $C_{1}$ and $C_{2}$ .

Define $\Xi$ to be the set of all strongly connected multi-digraphs with vertices a subset of $V(G(\mathbf{d}))$ such that all vertices have degrees $d^{+}(v)=d^{-}(v)=1$ or $d^{+}(v)=d^{-}(v)=2$ with at least one vertex which has $d^{+}(v)=d^{-}(v)=2$ . Note that $\Xi$ is precisely the set of multi-digraphs which can be formed as the edge disjoint union of the two cycles $C\in\Gamma$ and $C^{\prime}\in\Gamma_{C}^{s}$ . Furthermore, note that the excess of a multi-digraph in $\Xi$ is precisely the number of vertices which have $d^{+}(v)=d^{-}(v)=2$ . We let $\Xi_{k}$ be the set $\Xi_{k}^{h}:=\{F\in\Xi||F|=h\text{ and }\text{excess}(F)=k\}$ . Moreover, for each $F\in\Xi$ define

t(F):=\prod_{v\in V(F)}d^{-}_{G}(v)^{d^{-}_{F}(v)}d^{+}_{G}(v)^{d^{+}_{F}(v)}

(26)

Observe that for a given $C,C^{\prime}$ whose union is $F$ , $t(F)$ is the number of pairs of principal cycles $\widetilde{C},\widetilde{C}^{\prime}$ which are copies of the same cycles as $C,C^{\prime}$ . Thus, combining (26) with Lemma 4.10 allows us to bound $\eta$ as follows,

\eta=\sum_{C\in\Gamma}\sum_{C^{\prime}\in\Gamma_{C}^{s}\cup\{C\}}p_{C}p_{C^{\prime}}\leq\sum_{h=\frac{\alpha}{|Q|}}^{2g(n)}\sum_{k=1}^{h}\sum_{F\in\Xi_{k}^{h}}\frac{27^{k}t(F)}{(m-h-k)^{h+k}}

(27)

Now, let $\Lambda_{k}^{h}$ be the set of all strongly connected labelled multi-digraphs with $h$ vertices such that all vertices have degrees $d^{+}(v)=d^{-}(v)=1$ or $d^{+}(v)=d^{-}(v)=2$ with precisely $k$ vertices $v$ such that $d^{+}(v)=d^{-}(v)=2$ . This allows us to write

\sum_{F\in\Xi_{k}^{h}}t(F)=\frac{1}{h!}\sum_{F\in\Lambda_{k}^{h}}\sum_{\phi:V(F)\hookrightarrow V(G)}\prod_{v\in V(F)}d^{-}_{G}(v)^{d^{-}_{F}(v)}d^{+}_{G}(v)^{d^{+}_{F}(v)},

(28)

where the division by $h!$ comes from the fact that $\Lambda_{k}^{h}$ distinguishes between digraphs obtained from one-another by a permutation of the vertex set. Arguing similarly to Lemma 2.1, noting that we know the degree sequence of $F$ and using $\mu_{1,1}=\frac{m}{n}(1+Q)\leq\frac{m}{n}$ , we deduce that

\sum_{\phi:V(F)\hookrightarrow V(G)}\prod_{v\in V(F)}d^{-}_{G}(v)^{d^{-}_{F}(v)}d^{+}_{G}(v)^{d^{+}_{F}(v)}\leq n^{h}\mu_{1,1}^{h-k}\mu_{2,2}^{k}\leq m^{h}\mu_{2,2}^{k}\mu_{1,1}^{-k}.

Substituting into (28) and applying Lemma 3.1 we find

	$\displaystyle\sum_{F\in\Xi_{k}^{h}}t(F)\leq\frac{k}{h+k}\frac{(h+k)!}{h!}\binom{h}{k}m^{h}\mu_{2,2}^{k}\mu_{1,1}^{-k}$	$\displaystyle=\frac{k}{h+k}\binom{h+k}{2k}\frac{(2k)!}{k!}m^{h}\mu_{2,2}^{k}\mu_{1,1}^{-k}$
		$\displaystyle\leq\frac{(h+k)^{2k-1}}{(k-1)!}m^{h}\mu_{2,2}^{k}\mu_{1,1}^{-k}.$		(29)

To finish, we note that $((m-h-k)^{h+k})=(1+o(1))m^{h+k}$ as $h+k=o(m^{1/2})$ , this allows us to substitute the bound found in (29) into (27) where we deduce

	$\displaystyle\eta$	$\displaystyle\leq(1+o(1))\sum_{h=\frac{\alpha}{\|Q\|}}^{2g(n)}\sum_{k=1}^{h}\frac{27^{k}}{m^{h+k}}\frac{(h+k)^{2k-1}}{(k-1)!}m^{h}\mu_{2,2}^{k}\mu_{1,1}^{-k}$
		$\displaystyle\leq(1+o(1))\frac{216g(n)\mu_{2,2}}{m}\sum_{h=\frac{\alpha}{\|Q\|}}^{2g(n)}\sum_{k=0}^{h}\frac{108^{k}h^{2k}\mu_{2,2}^{k}}{m^{k}k!\mu_{1,1}^{k}}.$		(30)

Part of the expression in (30) is in the form of an exponential sum. Evaluating this sum allows us to deduce that $\eta=o(1)$ .

\eta\leq(1+o(1))\frac{216g(n)\mu_{2,2}}{m\mu_{1,1}}\sum_{h=\frac{\alpha}{|Q|}}^{2g(n)}e^{\frac{108h^{2}\mu_{2,2}}{m\mu_{1,1}}}\leq(1+o(1))\frac{432g(n)^{2}\mu_{2,2}}{m\mu_{1,1}}e^{\frac{432g(n)^{2}\mu_{2,2}}{m\mu_{1,1}}}=o(1),

(31)

where we deduce that this is $o(1)$ in the same way as when we bound (23).

Next we shall consider the term,

\theta:=\sum_{C\in\Gamma}\mathbb{E}(X_{C}Z_{C})=\sum_{C\in\Gamma}\sum_{C^{\prime}\in\Gamma_{C}^{s}\cup\{C\}}\mathbb{E}(X_{C}X_{C}^{\prime})

(32)

So note that $\mathbb{E}(X_{C}X_{C^{\prime}})$ is the probability that both $C$ and $C^{\prime}$ are simultaneously present. Furthermore, note that $C\cup C^{\prime}$ is a strongly connected (not necessarily simple) digraph with maximum total degree at most $4$ . Thus we can use a similar strategy to the one used to bound $\eta$ . So define $\Theta$ to be the set of all strongly connected multi-digraphs $F$ with $V(F)\subseteq V(G)$ and $\Delta(F)\leq 4$ . Furthermore for $a,b\in\mathbb{N}$ , define

\Theta_{a,b}^{h}:=\{F\in\Theta||F|=h,n_{1,2}(F)=n_{2,1}(F)=a,n_{2,2}(F)=b\},

where we define $\Theta_{0,0}^{h}=\emptyset$ . Define $t(F)$ for $F\in\Theta$ in the same way as (26). Then, for a given construction, $F=C\cup C^{\prime}$ from a pair of principal cycles, $t(F)$ is again the number of pairs of principal cycles $\widetilde{C},\widetilde{C}^{\prime}$ which are copies of the same cycles as $C,C^{\prime}$ . Thus, if we apply Lemma 4.10 we get the following bound on $\theta$ ,

\theta=\sum_{C\in\Gamma}\sum_{C^{\prime}\in\Gamma_{C}^{s}}\mathbb{E}(X_{C}X_{C^{\prime}})\leq\sum_{h=\frac{\alpha}{|Q|}}^{2g(n)}\sum_{a=0}^{h}\sum_{b=0}^{h}\sum_{F\in\Theta_{a,b}^{h}}\frac{27^{k}t(F)}{(m-h-a-b)^{h+a+b}}

(33)

Now, we define $\Lambda_{a,b}^{h}$ to be the set of all strongly connected labelled multi-digraphs $F$ with $h$ vertices such that $n_{1,1}(F)=h-2a-b$ , $n_{1,2}(F)=n_{2,1}(F)=a$ and $n_{2,2}(F)=b$ . (Note $\Lambda_{0,k}^{h}=\Lambda_{k}^{h}$ ). Thus, arguing as previously,

\sum_{F\in\Theta_{a,b}^{h}}t(F)=\frac{1}{h!}\sum_{F\in\Lambda_{a,b}^{h}}\sum_{\phi:V(F)\hookrightarrow V(G)}\prod_{v\in V(F)}d^{-}_{G}(v)^{d^{-}_{F}(v)}d^{+}_{G}(v)^{d^{+}_{F}(v)}

(34)

Again, we may argue similarly to Lemma 2.1 to deduce that for $F\in\Lambda_{a,b}^{h}$ ,

\sum_{\phi:V(F)\hookrightarrow V(G)}\prod_{v\in V(F)}d^{-}_{G}(v)^{d^{-}_{F}(v)}d^{+}_{G}(v)^{d^{+}_{F}(v)}\leq m^{h}\mu_{1,2}^{a}\mu_{2,1}^{a}\mu_{2,2}^{b}\mu_{1,1}^{-(2a+b)}.

We can substitute this into (33) and apply Lemma 3.1 to find

$\displaystyle\sum_{F\in\Theta_{a,b}^{h}}t(F)$	$\displaystyle\leq\frac{3a+2b}{h+a+b}\frac{(h+a+b)!}{h!}\binom{h}{a,a,b}m^{h}\mu_{1,2}^{a}\mu_{2,1}^{a}\mu_{2,2}^{b}\mu_{1,1}^{-(2a+b)}$
	$\displaystyle\leq\frac{3a+2b}{h}2^{a+b}\frac{h^{3a+2b}}{a!a!b!}m^{h}\mu_{1,2}^{a}\mu_{2,1}^{a}\mu_{2,2}^{b}\mu_{1,1}^{-(2a+b)}$
	$\displaystyle=\frac{3a+2b}{h}\frac{1}{a!a!b!}\bigg{(}\frac{2h^{3}\mu_{1,2}\mu_{2,1}}{\mu_{1,1}^{2}}\bigg{)}^{a}\bigg{(}\frac{2h^{2}\mu_{2,2}}{\mu_{1,1}}\bigg{)}^{b}m^{h}$	(35)

Note that this bound is a sum of two terms due to the factor $3a+2b$ in (35). So we will split this bound into $t_{a}(F)+t_{b}(F)$ in the obvious way. This allows us to bound $\theta\leq\theta_{a}+\theta_{b}$ by only considering the $t_{a}(F)$ or $t_{b}(F)$ terms which will be convenient for us. We substitute the bound from (35) into (33) and use that $(m-h-a-b)^{h+a+b}=(1+o(1))m^{h+a+b}$ to deduce

	$\displaystyle\theta_{a}$	$\displaystyle\leq(1+o(1))\sum_{h=\frac{\alpha}{\|Q\|}}^{2g(n)}\sum_{a=0}^{h}\sum_{b=0}^{h}\frac{3a}{h}\frac{1}{a!a!b!}\bigg{(}\frac{2\cdot 27^{2}h^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}\bigg{)}^{a}\bigg{(}\frac{54h^{2}\mu_{2,2}}{m\mu_{1,1}}\bigg{)}^{b}$		(36)
	$\displaystyle\theta_{b}$	$\displaystyle\leq(1+o(1))\sum_{h=\frac{\alpha}{\|Q\|}}^{2g(n)}\sum_{a=0}^{h}\sum_{b=0}^{h}\frac{2b}{h}\frac{1}{a!a!b!}\bigg{(}\frac{2\cdot 27^{2}h^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}\bigg{)}^{a}\bigg{(}\frac{54h^{2}\mu_{2,2}}{m\mu_{1,1}}\bigg{)}^{b}$		(37)

Note that in both cases we can evaluate the two inner sums in terms of exponential functions and modified Bessel functions of the first kind. However the latter of these is a little difficult to work with, hence we will replace the $a!a!$ with $(2a)!$ allowing us to use hyperbolic trigonometric functions instead. In particular, we have

\frac{1}{a!a!}\leq\frac{4^{a}}{(2a)!}\;\text{ and }\;\frac{1}{a!(a-1)!}\leq\frac{4^{a}}{(2a-1)!}.

Looking first at $\theta_{a}$ we get the bound

$\displaystyle\theta_{a}$	$\displaystyle\leq(1+o(1))\sum_{h=\frac{\alpha}{\|Q\|}}^{2g(n)}\sum_{a=0}^{h}\sum_{b=0}^{h}\frac{3}{h}\frac{1}{(2a-1)!b!}\bigg{(}\frac{8\cdot 27^{2}h^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}\bigg{)}^{a}\bigg{(}\frac{54h^{2}\mu_{2,2}}{m\mu_{1,1}}\bigg{)}^{b}$
	$\displaystyle=(1+o(1))162\sqrt{2}\sum_{h=\frac{\alpha}{\|Q\|}}^{2g(n)}\sqrt{\frac{h\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}}\sinh\left(54\sqrt{\frac{2h^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}}\right)e^{\frac{54h^{2}\mu_{2,2}}{m\mu_{1,1}}}$
	$\displaystyle\leq(1+o(1))648\sqrt{\frac{g(n)^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}}\sinh\left(216\sqrt{\frac{g(n)^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}}\right)e^{\frac{216g(n)^{2}\mu_{2,2}}{m\mu_{1,1}}},$	(38)

where the final inequality follows from the fact that $\sqrt{x}$ , $\sinh(x)$ and $e^{x}$ are all increasing for $x>0$ . Bounding $\theta_{b}$ is similar,

$\displaystyle\theta_{b}$	$\displaystyle\leq(1+o(1))\sum_{h=\frac{\alpha}{\|Q\|}}^{2g(n)}\sum_{a=0}^{h}\sum_{b=1}^{h}\frac{2}{h}\frac{1}{(2a)!(b-1)!}\bigg{(}\frac{8\cdot 27^{2}h^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}\bigg{)}^{a}\bigg{(}\frac{54h^{2}\mu_{2,2}}{m\mu_{1,1}}\bigg{)}^{b}$
	$\displaystyle=(1+o(1))108\sum_{h=\frac{\alpha}{\|Q\|}}^{2g(n)}\frac{h\mu_{2,2}}{m\mu_{1,1}}\cosh\left(54\sqrt{\frac{2h^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}}\right)e^{\frac{54h^{2}\mu_{2,2}}{m\mu_{1,1}}}$
	$\displaystyle\leq(1+o(1))432\frac{g(n)^{2}\mu_{2,2}}{m\mu_{1,1}}\cosh\left(216\sqrt{\frac{g(n)^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}}\right)e^{\frac{216g(n)^{2}\mu_{2,2}}{m\mu_{1,1}}}$	(39)

Both (38) and (39) are $o(1)$ which follows from the facts that

\frac{g(n)^{2}\mu_{2,2}}{m\mu_{1,1}}\to 0\qquad\text{ and }\qquad\frac{g(n)^{3}\mu_{1,2}\mu_{2,1}}{m\mu_{1,1}^{2}}\to 0.

The first of which we showed in (31) and the latter follows directly from the assumption that $g(n)^{3}=o(\frac{m}{R^{-}R^{+}})$ . Hence $\theta=o(1)$ . Finally, we will bound the terms,

\kappa_{C}:=\mathbb{E}|\widetilde{W}_{C}^{1}-W_{C}^{1}|.

(40)

We note that unlike in bounding $\eta$ and $\theta$ we will not need to look at the sum over $C\in\Gamma$ in computing the bound. This is due to the fact that $\kappa_{C}$ depends on a much more global structure than the terms $p_{C}\mathbb{E}(X_{C}+Z_{C})$ and $\mathbb{E}(X_{C}Z_{C})$ . For the bounding of $\kappa_{C}$ , the first thing to observe is that due to the choice of coupling and $\Gamma_{C}^{s}$ we have $\widetilde{W}_{C}^{1}\geq W_{C}^{1}$ . This is because all of the edges which are removed by the switchings are incident to a vertex of $C$ and therefore none of the principal cycles containing these edges are in $\Gamma_{C}^{w}$ which are the cycles contributing to $\widetilde{W}_{C}^{1}\geq W_{C}^{1}$ . This enables us to bound $\kappa_{C}$ by computing the number of expected cycles from $\Gamma_{C}^{w}$ which are added by the switchings.

In order to add a cycle $C^{\prime}$ with the switchings over $4$ -cycles which produce $\widetilde{G}_{C}$ from $G^{\prime}$ the following must be true for some $k$ ,

•

All but $k$ edges of $C^{\prime}$ must be present in $G^{\prime}$ .
•

$k$ of the edges of $C$ must add these edges after applying the switching.

Now let us compute the probability that this event comes to pass. So let us fix $k$ edges missing from $C^{\prime}$ which we match up with $k$ of the edges of $C$ which will be used to add them when we apply the switching. So if one such edge is $e=uv$ and this is matched with $e^{\prime}=u^{\prime}v^{\prime}$ , in order for the switching to add $e$ , before switching we must have edges $uv^{\prime}$ and $u^{\prime}v$ present in $G^{\prime}$ (note that because the edges are directed we do not need to consider the alternate switching using $uu^{\prime}$ and $vv^{\prime}$ like we would if working with graphs). Hence in total there are $|C^{\prime}|+k$ edges which we must find in $G^{\prime}$ before switching in order that $C$ is a cycle of $\widetilde{G}_{C}$ . The probability we find these edges is $((m)_{|C^{\prime}|+k})^{-1}$ .

Now, let us count how many such structures there are which produce the same union cycle as $C^{\prime}$ . As $C$ is a principal cycle, we know exactly which stubs we use for it and so there is no contribution to the number of copies from vertices of $C$ . However there are $d^{+}(u)d^{-}(v)$ switchings which add the union edge $uv$ and so the number of ways in which the switching structure with a union cycle which is that of $C^{\prime}$ can be found is

\prod_{v\in V(C^{\prime})}d^{-}(v)d^{+}(v).

Thus for any fixed set of missing edges for the union cycle with the same edges as $C^{\prime}$ and choice of edges from $C$ with which we add these edges, the expected number of additional principal cycles after the switchings is

\frac{1}{(m)_{|C^{\prime}|+k}}\prod_{v\in V(C^{\prime})}d^{-}(v)d^{+}(v).

(41)

There are $\binom{|C|}{k}$ ways to pick the edges of $C$ which we switch over to add the missing edges of $C^{\prime}$ . Also there are $\binom{|C^{\prime}|}{k}$ ways to pick the missing edges of $C^{\prime}$ and $k!$ ways to assign edges of $C$ to missing edges of $C^{\prime}$ . Thus, when we sum over principal cycles $C^{\prime}$ , missing edges of $C$ and $C^{\prime}$ and matchings of the missing edges, we find the expected number of such structures is at most

\mathbb{E}|\widetilde{W}_{C}^{1}-W_{C}^{1}|\leq\sum_{h=\frac{\alpha}{|Q|}}^{g(n)}\sum_{k=1}^{\min(|C|,h)}\frac{1}{h}\sum_{C^{\prime}\subseteq G;|C^{\prime}|=h}\binom{|C|}{k}\binom{h}{k}\frac{k!}{(m)_{h+k}}\prod_{v\in V(C^{\prime})}d^{-}(v)d^{+}(v).

(42)

Applying standard bounds on binomial coefficients and falling factorials and taking vertices from $V(G(\mathbf{d}))$ for $|C^{\prime}|$ with replacement rather than without allows us to bound (42) as follows

$\displaystyle\mathbb{E}\|\widetilde{W}_{C}^{1}-W_{C}^{1}\|$	$\displaystyle\leq(1+o(1))\sum_{h=\frac{\alpha}{\|Q\|}}^{g(n)}\sum_{k=1}^{\infty}\frac{\|C\|^{k}h^{k-1}}{k!m^{k}}(1+Q)^{h}$
	$\displaystyle\leq(1+o(1))\sum_{\|C^{\prime}\|=\frac{\alpha}{\|Q\|}}^{g(n)}\frac{e^{\frac{\|C\|h}{m}}-1}{\|C^{\prime}\|}$
	$\displaystyle\leq\frac{g(n)\|Q\|}{\alpha}\left(e^{\frac{g(n)^{2}}{m}}-1\right)\leq\frac{2g(n)^{3}\|Q\|}{\alpha m}=o(1),$	(43)

where we note that $g(n)^{2}/m<1$ allows us to use the bound $e^{x}-1\leq 2x$ in (43). So $\kappa_{C}\leq\kappa=o(1)$ for all $C$ where $\kappa=\frac{2g(n)^{3}|Q|}{\alpha m}$ . Finally, note that this implies $\sum_{C\in\Gamma}p_{C}\kappa_{C}\leq\kappa\mathbb{E}(W)$ . We subsequently show that $\mathbb{E}(W)\leq\xi_{\alpha}+o(1)$ from which it follows that $\sum_{C\in\Gamma}p_{C}\kappa_{C}=o(1)$ as $\xi_{\alpha}$ is a constant independent of $n$ . Using this and the fact that $\eta,\theta=o(1)$ , we conclude that

d_{\text{TV}}(W,Po(\mathbb{E}(W)))=o(1).

Note that $W$ is the number of cycles of $G(\mathbf{d})$ of length between $\alpha/|Q|$ and $g(n)$ . By Lemma 4.1 and Lemma 4.4, the probability that there are any longer cycles is $o(1)$ . Thus, if $W^{\prime}$ is the number of cycles of $G(\mathbf{d})$ of length at least $\alpha/|Q|$ , $d_{\text{TV}}(W^{\prime},Po(\mathbb{E}(W)))=d_{\text{TV}}(W,Po(\mathbb{E}(W)))+o(1)=o(1)$ . To show that the mean of the corresponding Poisson distribution can be taken to be $\xi_{\alpha}$ note that $d_{\text{TV}}(Po(\lambda),Po(\mu))\leq|\lambda-\mu|$ holds for all $\lambda,\mu\geq 0$ . This follows from coupling $\text{Bin}(n,\lambda/n)$ and $\text{Bin}(n,\mu/n)$ by coupling their constituent Bernoulli trials and noting that $d_{\text{TV}}(\text{Bin}(n,\lambda/n),Po(\lambda))=o_{n}(1)$ . Thus it suffices to show that $\mathbb{E}(W)=\xi_{\alpha}+o(1)$ . Upper bounding $\mathbb{E}(W)$ may be done in the same way as the proof of Lemma 4.4. However, we can be more precise than we are in (21) as in this case $2h^{2}/m=o(1)$ , so this line can be replaced by

\sum_{h=\frac{\alpha}{|Q|}}^{g(n)}\frac{e^{hQ+\frac{2h^{2}}{m}}}{h}\leq\sum_{h=\frac{\alpha}{|Q|}}^{g(n)}\frac{e^{hQ+o(1)}}{h}\leq(1+o(1))\int_{\frac{\alpha}{|Q|}}^{\infty}\frac{e^{hQ}}{h}dh=(1+o(1))\int_{\alpha}^{\infty}\frac{e^{-\lambda}}{\lambda}d\lambda=\xi_{\alpha}+o(1).

(44)

Next, we lower bound $\mathbb{E}(W)$ by $\xi_{\alpha}-o(1)$ . In order to do this we will use Lemma 2.3 in combination with the inequality

1-x\geq e^{-x-x^{2}}\;\text{ for }\;x\leq\frac{1}{2}.

This allows us to deduce that the probability of finding a cycle of length between $\alpha/|Q|$ and $g(n)$ is at least

\sum_{h=\frac{\alpha}{|Q|}}^{g(n)}\frac{h!}{h}\binom{n}{h}\frac{(1+Q)^{h}}{(n)_{h}}\left(1-\frac{2h^{2}\Delta^{2}}{\varepsilon n}\right)\left(1-\frac{h\Delta^{2}}{2m}\right)\geq\sum_{h=\frac{\alpha}{|Q|}}^{g(n)}\frac{(1+o(1))}{h}e^{h(Q-\frac{Q^{2}}{2})}

(45)

Now, that the exponent in the above is equal to $hQ-o(1)$ follows from the assumption $g(n)=o(1/|Q|^{2})$ . Thus in analogy with (44), we deduce that (45) is at most

(1+o(1))\sum_{h=\frac{\alpha}{|Q|}}^{g(n)}\frac{e^{hQ-o(1)}}{h}\geq(1-o(1))\int_{\frac{\alpha}{|Q|}}^{\infty}\frac{e^{hQ}}{h}dh=(1-o(1))\int_{\alpha}^{\infty}\frac{e^{-\lambda}}{\lambda}d\lambda=\xi_{\alpha}-o(1).

(46)

Combining (44) and (46) we deduce that $\mathbb{E}(W)=\xi_{\alpha}+o(1)$ as required.

4.5 Putting it all together

We can deduce the statement of the theorem from the previous sections as follows. First we apply Lemma 4.1, Lemma 4.4 and Lemma 4.5 to deduce there are no cycles with length at least $\omega(1/|Q|)$ or complex components. Then, we can apply Lemma 4.6 to deduce that the distribution of the number of cycles with at least $\alpha/|Q|$ vertices converges to a Poisson distribution with mean $\xi_{\alpha}$ from which the statement that

\mathbb{P}\left(|\mathcal{C}_{k}|\geq\frac{\alpha}{|Q|}\right)=1-\sum_{i=0}^{k-1}\frac{\xi_{\alpha}^{i}}{i!}e^{-\xi_{\alpha}}+o(1)

follows immediately as the above is simply the probability that a Poisson( $\xi_{\alpha}$ ) distribution is at least $k$ (plus the $o(1)$ term).

5 Concluding Remarks

In this paper, we showed that the largest component of the directed configuration model is of order $|Q|^{-1}$ when $nQ^{3}(R^{-}R^{+})\to-\infty$ and found the distribution of the size of the $k$ th largest component for any $k$ . In a subsequent paper [4] we shall show that under similar conditions to those in this paper that for degree sequences such that $nQ^{3}(R^{-}R^{+})^{-1}\to\infty$ the largest component is of order $nQ^{2}(R^{-}R^{+})^{-1}$ . This quantity matches the $1/|Q|$ we find in this paper if $n|Q|^{3}(R^{-}R^{+})^{-1}\to c$ for some constant $c$ and suggests that one may find a critical window phenomenon for degree sequences with such parameters. Note that in particular this is precisely the critical window of $D(n,p)$ (looking at typical degree sequences from the model $D(n,p)$ ). Moreover, the recent result of Donderwinkel and Xie certainly seems to indicate that the point $Q=0$ lies inside a critical window.

An interesting question for further study would be to ask about the joint distribution of the largest strongly connected components in the directed configuration model with parameters as in this paper. The fact that we find

\mathbb{P}\bigg{(}|\mathcal{C}_{k}|\geq\frac{\alpha}{|Q|}\bigg{)}=1-\sum_{i=0}^{k-1}\frac{\xi_{\alpha}^{i}}{i!}e^{-\xi_{\alpha}}+o(1)

seems rather suggestive of an underlying Poisson process. As such we make the following conjecture.

Conjecture 5.1.

Let $G$ be a directed configuration model with parameters as in Theorem 1.3 and suppose the sizes of its components in descending order are given by the random variables $Z_{1}\geq Z_{2}\geq\ldots$ . Suppose further that $X_{1}\leq X_{2}\leq\ldots$ are the points of a Poisson process of rate $1$ and $Y_{1}\geq Y_{2}\geq\ldots$ are the unique positive solutions to

X_{i}=\int_{Y_{i}}^{\infty}\frac{e^{-x}}{x}dx.

Then we have that

(|Q|Z_{1},|Q|Z_{2},\ldots)\to(Y_{1},Y_{2},\ldots)\text{ as }n\to\infty.

Note that the above is also an open question for $D(n,p)$ with $p=(1-\varepsilon)/n$ and $\varepsilon\to 0$ , $\varepsilon^{3}n\to\infty$ .

References

[1] B. Bollobás. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, 1(4):311–316, 1980.
[2] X. S. Cai and G. Perarnau. The giant component of the directed configuration model revisited. arXiv preprint arXiv:2004.04998, 2020.
[3] C. Cooper and A. Frieze. The size of the largest strongly connected component of a random digraph with a given degree sequence. Combinatorics, Probability and Computing, 13(3):319–337, 2004.
[4] M. Coulson. The strong component structure of the barely supercritical directed configuration model. Manuscript.
[5] M. Coulson. The critical window in random digraphs. arXiv preprint arXiv:1905.00624, 2019.
[6] S. Donderwinkel and Z. Xie. Universality for the directed configuration model with random degrees: metric space convergence of the strongly connected components at criticality. arXiv preprint arXiv:2105.11434, 2021.
[7] P. Erdos and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60, 1960.
[8] T. Erhardsson. Steins method for Poisson and compound Poisson approximation. An introduction to Stein’s method, 4:61–114, 2005.
[9] D. A. Freedman. On tail probabilities for martingales. the Annals of Probability, pages 100–118, 1975.
[10] C. Goldschmidt and R. Stephenson. The scaling limit of a critical random directed graph. arXiv preprint arXiv:1905.05397, 2019.
[11] A. Graf. On the strongly connected components of random directed graphs with given degree sequences. Master’s thesis, University of Waterloo, 2016.
[12] H. Hatami and M. Molloy. The scaling window for a random graph with a given degree sequence. Random Structures & Algorithms, 41(1):99–123, 2012.
[13] S. Janson, T. Łuczak, and A. Ruciński. Random graphs, volume 45. John Wiley & Sons, 2011.
[14] R. M. Karp. The transitive closure of a random digraph. Random Structures & Algorithms, 1(1):73–93, 1990.
[15] L. Lu and L. Székely. Using Lovász Local Lemma in the space of random injections. the electronic journal of combinatorics, page R63, 2007.
[16] T. Łuczak. The phase transition in the evolution of random digraphs. Journal of graph theory, 14(2):217–223, 1990.
[17] T. Łuczak and T. G. Seierstad. The critical behavior of random digraphs. Random Structures & Algorithms, 35(3):271–293, 2009.
[18] A. Nachmias and Y. Peres. Critical percolation on random regular graphs. Random Structures & Algorithms, 36(2):111–148, 2010.
[19] X. Pérez-Giménez and N. Wormald. Asymptotic enumeration of strongly connected digraphs by vertices and edges. Random Structures & Algorithms, 43(1):80–114, 2013.
[20] D. Williams. Probability with martingales. Cambridge university press, 1991.

$\displaystyle\mathbb{E}\|\widetilde{W}_{C}^{1}-W_{C}^{1}\|$	$\displaystyle\leq(1+o(1))\sum_{h=\frac{\alpha}{\|Q\|}}^{g(n)}\sum_{k=1}^{\infty}\frac{\|C\|^{k}h^{k-1}}{k!m^{k}}(1+Q)^{h}$
	$\displaystyle\leq(1+o(1))\sum_{\|C^{\prime}\|=\frac{\alpha}{\|Q\|}}^{g(n)}\frac{e^{\frac{\|C\|h}{m}}-1}{\|C^{\prime}\|}$
	$\displaystyle\leq\frac{g(n)\|Q\|}{\alpha}\left(e^{\frac{g(n)^{2}}{m}}-1\right)\leq\frac{2g(n)^{3}\|Q\|}{\alpha m}=o(1),$	(43)