Truncated Simulation and Inference in
Edge-Exchangeable Networks

Xinglong Lilabel=e1]xinglong.li@stat.ubc.ca [ Trevor Campbelllabel=e2]trevor@stat.ubc.ca [ Department of Statistics
The University of British Columbia
email: xinglong.li@stat.ubc.ca; trevor@stat.ubc.ca

Abstract

Edge-exchangeable probabilistic network models generate edges as an i.i.d. sequence from a discrete measure, providing a simple means for statistical inference of latent network properties. The measure is often constructed using the self-product of a realization from a Bayesian nonparametric (BNP) discrete prior; but unlike in standard BNP models, the self-product measure prior is not conjugate the likelihood, hindering the development of exact simulation and inference algorithms. Approximation via finite truncation of the discrete measure is a straightforward alternative, but incurs an unknown approximation error. In this paper, we develop methods for forward simulation and posterior inference in random self-product-measure models based on truncation, and provide theoretical guarantees on the quality of the results as a function of the truncation level. The techniques we present are general and extend to the broader class of discrete Bayesian nonparametric models.

truncation,

Bayesian nonparametrics,

edge-exchangeable,

networks,

Bayesian inference,

keywords:

\startlocaldefs\endlocaldefs

t1This work is supported by a National Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant and Discovery Launch Supplement.

and

1 Introduction

Probabilistic generative models have for many years been key tools in the analysis of network data [1, 2]. Recent work in the area [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] has begun to incorporate the use of nonparametric discrete measures, in an effort to address the limitations of traditional models in capturing the sparsity of real large-scale networks [14]. These models construct a discrete random measure $\Theta$ (often a completely random measure, or CRM [15]) on a space $\Psi$ , associate each atom of the measure with a vertex in the network, and then use the self-product of the measure—i.e., the measure $\Theta\times\Theta$ on $\Psi^{2}$ —to represent the magnitude of interaction between vertices.

While the inclusion of a nonparametric measure enables capturing sparsity, it also makes both generative simulation and posterior inference via Markov chain Monte Carlo (MCMC) [16; 17, Ch. 11, 12] significantly more challenging. In standard Bayesian models with discrete nonparametric measures—such as the Dirichlet process mixture model [18] or beta process latent feature model [19]—this issue is typically addressed by exploiting the conjugacy of the (normalized) completely random measure prior and the likelihood to marginalize the latent infinite discrete measure [20]. But in the case of nonparametric network models, however, there is no such reprieve; the self-product of a completely random measure is not a completely random measure, and exact marginalization is typically not possible.

Another option is to truncate the discrete CRM to have finitely many atoms, and perform simulation and inference based on the truncated CRM. Exact truncation schemes based on auxiliary variables [21, 22, 23] are limited to models where certain tail probabilities can be computed exactly. Fixed truncation [24, 25, 26, 27], on the other hand, is easy to apply to any CRM-based model; but it involves an approximation with potentially unknown error. Although the approximation error of truncated CRM models has been thoroughly studied in past work [28, 29, 30, 27, 31], these results apply only to generative simulation—i.e., not inference—and do not apply to self-product CRMs that commonly appear in network models.

In this work, we provide tractable methods for both generative simulation and posterior inference of discrete Bayesian nonparametric models based on truncation, as well as rigorous theoretical analyses of the error incurred by truncation in both scenarios. In particular, our theory and methods require only the ability to compute bounds on—instead of exact evaluation of—intractable tail probabilities. Our work focuses on the case of self-product-measure-based edge-exchangeable network sequences [3, 5, 32, 33], whose edges are simulated i.i.d. conditional on the discrete random product measure $\Theta\times\Theta$ , but the ideas here apply without much effort to the broader class of discrete Bayesian nonparametric models. As an intermediate step of possible independent interest, we also show that the nonzero rates generated from the rejection representation [34] of a Poisson process have the same distribution as the well-known but typically intractable inverse Lévy or Ferguson-Klass representation [35]. This provides a novel method for simulating the inverse Lévy representation, which has a wide variety of uses in applications of Poisson processes [36, 37, 38].

2 Background

2.1 Completely random measures and self-products

A completely random measure (CRM) $\Theta$ on $\Psi$ is a random measure such that for any collection of $K\in\mathbb{N}$ disjoint measurable sets $A_{1},...,A_{K}\subset\Psi$ , the values $\Theta(A_{1}),...,\Theta(A_{K})$ are independent random variables [15]. In this work, we focus on discrete CRMs taking the form $\Theta=\sum_{k}\theta_{k}\delta_{\psi_{k}}$ , where $\delta_{x}$ is a Dirac measure on $\Psi$ at location $x\in\Psi$ (i.e., $\delta_{x}(A)=1$ if $x\in A$ and 0 otherwise), and $(\theta_{k},\psi_{k})_{k=1}^{\infty}$ are a sequence of rates $\theta_{k}$ and labels $\psi_{k}$ generated from a Poisson process on $\mathbb{R}_{+}\times\Psi$ with mean measure $\nu(\mathrm{d}\theta)\times L(\mathrm{d}\psi)$ . Here $L$ is a diffuse probability measure, and $\nu$ is a $\sigma$ -finite measure satisfying $\nu(\mathbb{R}_{+})=\infty$ , which guarantees that the Poisson process has countably infinitely many points almost surely. The space $\Psi$ and distribution $L$ will not affect our analysis; thus as a shorthand, we write $\mathrm{CRM}(\nu)$ for the distribution of $\Theta$ :

\displaystyle\Theta:=\sum_{k}\theta_{k}\delta_{\psi_{k}}\sim\mathrm{CRM}(\nu).

(2)

One can construct a multidimensional measure $\Theta^{(d)}$ on $\Psi^{d}$ , $d\in\mathbb{N}$ from $\Theta$ defined in Eq. 2 by taking its self-product. In particular, we define

\displaystyle\Theta^{(d)}:=\sum_{i\in\mathbb{N}_{\neq}^{d}}\vartheta_{i}\delta_{\zeta_{i}},

\displaystyle\vartheta_{i}:=\prod_{j=1}^{d}\theta_{i_{j}},

\displaystyle\zeta_{i}:=(\psi_{i_{1}},\psi_{i_{2}},...,\psi_{i_{d}}),

(3)

where $i$ is a $d$ -dimensional multi-index, and $\mathbb{N}_{\neq}^{d}$ is the set of such indices with all distinct components. Note that $\Theta^{(d)}$ is no longer a CRM, as it does not satisfy the independence condition.

2.2 Series representations

To simulate a realization $\Theta\sim\mathrm{CRM}(\nu)$ —e.g., as a first step in the simulation of a self-product measure $\Theta^{(d)}$ —the rates $\theta_{k}$ and labels $\psi_{k}$ may be generated in sequence using a series representation [39] of the CRM. In particular, we begin by simulating the jumps of a unit-rate homogeneous Poisson process $(\Gamma_{k})_{k=1}^{\infty}$ on $\mathbb{R}_{+}$ in increasing order. For a given distribution $g$ on $\mathbb{R}_{+}$ and nonnegative measurable function $\tau:\mathbb{R}_{+}\times\mathbb{R}_{+}\to\mathbb{R}_{+}$ , we set

\displaystyle{\tiny}\Theta=\sum_{k=1}^{\infty}\theta_{k}\delta_{\psi_{k}},\quad\theta_{k}=\tau(U_{k},\Gamma_{k}),\quad U_{k}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}g,\quad\psi_{k}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}L.

(4)

Depending on the particular choice of $\tau$ and $g$ , one can construct several different series representations of a CRM [28]. For example, the inverse Lévy representation [35] has the form

\displaystyle\theta_{k}=\nu^{\leftarrow}(\Gamma_{k}),\quad\nu^{\leftarrow}(x):=\inf\left\{y:\nu\left(\left[y,\infty\right)\right)\leq x\right\}.

(5)

In many cases, computing $\nu^{\leftarrow}(x)$ is intractable, making it hard to generate $\theta_{k}$ in this manner. Alternatively, we can generate a series of rates from $\mathrm{CRM}(\nu)$ with the rejection representation [34], which has the form

\displaystyle\theta_{k}=T_{k}\mathds{1}\Big{(}\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(T_{k})\geq U_{k}\Big{)},\quad T_{k}=\mu^{\leftarrow}(\Gamma_{k}),\quad U_{k}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}\mathrm{Unif}[0,1],

(6)

where $\mu$ is a measure on $\mathbb{R}_{+}$ chosen such that $\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\leq 1$ uniformly and $\mu^{\leftarrow}(x)$ is easy to calculate in closed-form. While there are many other sequential representations of CRMs [28], the representations in Eqs. 4, 5 and 6 are broadly applicable and play a key role in our theoretical analysis.

2.3 Edge-exchangeable graphs

Self-product measures $\Theta^{(d)}$ of the form Eq. 3 with $d=2$ have recently been used as priors in a number of probabilistic network models [3, 4].¹¹1There are also network models based on self-product measure priors that do not generate edge-exchangeable sequences [8, 11]; it is likely that many of the techniques in the present work would extend to these models, but we leave the study of this to future work. The focus of the present work are those models that associate each $\psi_{k}$ with a vertex, each tuple $\zeta_{i}=(\psi_{i_{1}},\dots,\psi_{i_{d}})$ with a (hyper)edge, and then build a growing sequence of networks by sequentially generating edges from $\Theta^{(d)}$ in rounds $n=1,\dots,N$ . There are a number of choices for how to construct such a sequence. For example, in each round $n$ we may add multiple edges via an independent likelihood process $X_{n}\sim\mathrm{LP}(h,\Theta^{(d)})$ defined by

\displaystyle X_{n}:=\sum_{i\in\mathbb{N}_{\neq}^{d}}x_{ni}\delta_{\zeta_{i}},

\displaystyle x_{ni}\overset{\textrm{\tiny{indep}}}{\sim}h(\cdot|\vartheta_{i}),

(7)

where $x_{ni}=k$ denotes that there were $k$ copies of edge $\zeta_{i}$ added at round $n$ , and $h(\cdot|\vartheta)$ is a probability distribution on $\mathbb{N}\cup\{0\}$ . We denote the mean $\mu(\vartheta):=\sum_{k=0}^{\infty}k\cdot h(k\,|\,\vartheta)$ and probability of 0 under $h$ to be $\pi(\vartheta):=h(0|\vartheta)$ for convenience. By the Slivnyak-Mecke theorem [40], if $h$ satisfies

\displaystyle\int_{\mathbb{R}_{+}^{d}}\mu\left(\prod_{j=1}^{d}\theta_{j}\right)\prod_{j=1}^{d}\nu(\mathrm{d}\theta_{j})<\infty,

(8)

then finitely many edges are added to the graph in each round. We make this assumption throughout this work. Alternatively, if

\displaystyle\int_{\mathbb{R}_{+}}\min(1,\theta)\nu(\mathrm{d}\theta)<\infty,

(9)

then $\Omega:=\Theta^{(d)}(\Psi^{d})<\infty$ , and we may add only a single edge per round $n$ via a categorical likelihood process $X_{n}\sim\mathrm{Categorical}(\Theta^{(d)})$ defined by

\displaystyle X_{n}:=\delta_{\zeta_{I_{n}}},

\displaystyle I_{n}\sim\mathrm{Categorical}\left(\left(\frac{\vartheta_{i}}{\Omega}\right)_{i\in\mathbb{N}_{\neq}^{d}}\right).

(10)

This construction has appeared in [4], where $\Theta$ follows a Dirichlet process, which can be seen as a normalized gamma process [41]. Note that our definition of $\mathrm{Categorical}(\Theta^{(d)})$ involves simulating from the normalization; we use this definition to avoid introducing new notation for normalized processes.

Using either likelihood process, the edges in the network after $N$ rounds are

\displaystyle\sum_{n=1}^{N}X_{n}:=\sum_{i\in\mathbb{N}_{\neq}^{d}}x_{i}\delta_{\zeta_{i}},\qquad x_{i}:=\sum_{n=1}^{N}x_{ni},

(11)

i.e., $x_{i}\in\mathbb{N}\cup\{0\}$ represents the count of edge $i$ after $N$ rounds.

There are three points to note about this formulation. First, since the atom locations $\zeta_{i}$ are not used, we can represent the network using only its array of edge counts

\displaystyle E_{N}:=(x_{i})_{i\in\mathbb{N}_{\neq}^{d}}\in\mathcal{N}_{d},

(12)

where $\mathcal{N}_{d}$ denotes the set of integer arrays indexed by $\mathbb{N}_{\neq}^{d}$ with finitely many nonzero entries. Note that $\mathcal{N}_{d}$ is a countable set. Second, by construction, the distribution of $E_{N}$ is invariant to reorderings of the arrival of edges, and thus the network is edge-exchangeable [3, 5, 6, 7]. Finally, note that the network $E_{N}$ as formulated in Eq. 12 is in general a directed multigraph with no self-loops (due to the restriction to indices $i\in\mathbb{N}_{\neq}^{d}$ rather than $\mathbb{N}^{d}$ ). Although the main theoretical results in this work are developed in this setting, we provide an additional result in Section 3.1 to translate to other common network structures (e.g. binary undirected networks).

3 Truncated generative simulation

In this section, we consider the tractable generative simulation of network models via truncation, and analyze the approximation error incurred in doing so as a function of $K\in\mathbb{N}$ (the truncation level) and number of rounds of generation $N\in\mathbb{N}$ . In particular, to construct a truncated self-product measure, we first split the underlying CRM $\Theta$ into a truncation and tail component,

\displaystyle\Theta=\Theta_{K}+\Theta_{K+},

\displaystyle\Theta_{K}=\sum_{k=1}^{K}\theta_{k}\delta_{\psi_{k}},

\displaystyle\Theta_{K+}=\sum_{k=K+1}^{\infty}\theta_{k}\delta_{\psi_{k}},

(13)

and construct the self-product $\Theta_{K}^{(d)}$ from the truncation $\Theta_{K}$ as in Eq. 3. Fig. 1 provides an illustration of the truncation of $\Theta$ and $\Theta^{(2)}$ , showing that $\Theta^{(2)}$ can be decomposed into a sum of four parts,

\displaystyle\Theta^{(2)}=\Theta^{2}=(\Theta_{K}+\Theta_{K+})^{2}=\Theta_{K}^{(2)}+\left(\Theta_{K}\times\Theta_{K+}+\Theta_{K+}\times\Theta_{K}+\Theta_{K+}^{2}\right).

(14)

Thus, while we only discard $\Theta_{K+}$ in truncating $\Theta$ to $\Theta_{K}$ , we discard three parts in truncating $\Theta_{K}^{(2)}$ to $\Theta_{K}^{(2)}$ ; and in general, we discard $2^{d}-1$ parts of $\Theta^{(d)}$ when truncating it to $\Theta_{K}^{(d)}$ . We therefore intuitively might expect higher truncation error when approximating $\Theta_{K}^{(d)}\approx\Theta^{(d)}$ than when approximating $\Theta_{K}\approx\Theta$ ; in Sections 3.2 and 3.3, we will show that this is indeed the case.

Refer to caption — (a) Truncation of $\Theta$

Once the measure is truncated, a truncated network—based on $\Theta_{K}^{(d)}$ —can be constructed in the same manner as the original network using the independent likelihood process Eq. 7 or categorical likelihood process Eq. 10. We denote $E_{N,K}=(x_{i,K})_{i\in\mathbb{N}^{d}_{\neq}}\in\mathcal{N}_{d}$ to be the corresponding edge set of the truncated network up round $N$ , where $x_{i,K}=0$ for any index $i\in\mathbb{N}^{d}_{\neq}$ such that some component $i_{j}>K$ . We keep $E_{N}$ and $E_{N,K}$ in the same space in order to compare their distributions in Sections 3.1, 3.2 and 3.3.

3.1 Truncation error bound

We formulate the approximation error incurred by truncation as the total variation distance between the marginal network distributions, i.e., of $E_{N}$ and $E_{N,K}$ . The first step in the analysis of truncation error—provided by Lemma 3.1—is to show that this is bounded above by the probability that there are edges in the full network $E_{N}$ involving vertices beyond the truncation level $K$ . To this end, we denote the maximum vertex index of $E_{N}$ to be

\displaystyle I_{N}:=\max_{i\in\mathbb{N}_{\neq}^{d}}\left(\max_{j\in[d]}\,\,i_{j}\right)\quad\text{s.t.}\quad x_{i}>0,

(15)

and note that by definition, $I_{N}\leq K$ if and only if all edges in $E_{N}$ fall in the truncated region.

Lemma 3.1.

Let $\Theta=\sum_{k=1}^{\infty}\theta_{k}\delta_{\psi_{k}}$ be a random discrete measure, and $\Theta_{K}=\sum_{k=1}^{K}\theta_{k}\delta_{\psi_{k}}$ be its truncation to $K$ atoms. Let $\Theta^{(d)}$ be the self-product of $\Theta$ , and $\Theta^{(d)}_{K}$ be the self-product of $\Theta_{K}$ . Let $P_{N}$ and $P_{N,K}$ be the marginal distributions of edge sets $E_{N},E_{N,K}\in\mathcal{N}_{d}$ under either the independent or categorical likelihood process. Then

\displaystyle\mathrm{D_{TV}}\left(P_{N},P_{N,K}\right)\leq 1-\mathbb{P}\left(I_{N}\leq K\right).

(16)

As mentioned in Section 2.3, the network $E_{N}$ is in general a directed multigraph with no self loops. However, Lemma 3.1—and the downstream truncation error bounds presented in Sections 3.2 and 3.3—also apply to any graph $E^{\prime}_{N}=(x^{\prime}_{i})_{i\in\mathbb{N}^{d}_{\neq}}$ that is a function of the original graph $E^{\prime}_{N}=f(E_{N})$ such that truncation commutes with the function, i.e., $E^{\prime}_{N,K}=f(E_{N,K})$ . For example, to obtain a truncation error bound for the common setting of undirected binary graphs, we generate the directed multigraph $E_{N}$ as above and construct the undirected binary graph $E^{\prime}_{N}$ via

\displaystyle x^{\prime}_{i}=\mathds{1}_{x_{i}>0}\cdot\mathds{1}_{i_{1}<i_{2}<\dots<i_{d}},\quad i\in\mathbb{N}^{d}_{\neq}.

(17)

Corollary 3.2 provides the precise statement of the result; note that the bound is identical to that from Lemma 3.1.

Corollary 3.2.

Let $E^{\prime}_{N}:=(x^{\prime}_{i})_{i\in\mathbb{N}^{d}_{\neq}}\in\mathcal{N}_{d}$ be the set of edges of a network with truncation $E^{\prime}_{N,K}\in\mathcal{N}_{d}$ , and denote their marginal distributions $P^{\prime}_{N}$ and $P^{\prime}_{N,K}$ . If there exists a measurable function $f$ such that

\displaystyle E^{\prime}_{N}=f(E_{N})\qquad\text{and}\qquad E^{\prime}_{N,K}=f(E_{N,K}),

(18)

then

\displaystyle\mathrm{D_{TV}}\left(P^{\prime}_{N},P^{\prime}_{N,K}\right)\leq 1-\mathbb{P}\left(I_{N}\leq K\right).

(19)

3.2 Independent likelihood process

We now specialize Lemma 3.1 to the setting where $\Theta$ is a CRM generated by a series representation of the form Eq. 4, and the network is generated via the independent likelihood process from Eq. 7. As a first step towards a bound on the truncation error for general hypergraphs with $d>1$ in Theorem 3.4, we present a simpler corollary in the case where $d=2$ , which is of direct interest in analyzing the truncation error of popular edge-exchangeable networks.

Corollary 3.3.

In the setting of Lemma 3.1, suppose $\Theta$ is a CRM generated from the series representation Eq. 4, edges are generated from the independent likelihood process Eq. 7, and $d=2\leq K$ . Then

\displaystyle\mathbb{P}\left(I_{N}\leq K\right)\geq\exp\left(-N\cdot B_{K}\right),

(20)

where

$\displaystyle B_{K}$	$\displaystyle=B_{K,1}+B_{K,2}+B_{K,3}$	(21)
$\displaystyle B_{K,1}$	$\displaystyle=\mathbb{E}\left[\int_{\mathbb{R}_{+}^{2}}-\log\pi\left(\tau(U_{1},\gamma_{1}+\Gamma_{K})\tau(U_{2},\gamma_{2}+\Gamma_{K})\right)\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}\right]$	(22)
$\displaystyle B_{K,2}$	$\displaystyle=2\mathbb{E}\left[\int_{0}^{\Gamma_{K}}\frac{(K-1)}{\Gamma_{K}}\int_{\Gamma_{K}}^{\infty}-\log\pi\left(\tau(U_{1},\gamma_{1})\tau(U_{2},\gamma_{2})\right)\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}\right]$	(23)
$\displaystyle B_{K,3}$	$\displaystyle=2\mathbb{E}\left[\int_{\mathbb{R}_{+}}-\log\pi\left(\tau(U_{1},\Gamma_{K})\tau(U_{2},\gamma+\Gamma_{K})\right)\mathrm{d}\gamma\right].$	(24)

The proof of this result (and Theorem 3.4 below) in Appendix B follows by representing the tail of the CRM as a unit-rate Poisson process on $[\Gamma_{K},\infty)$ . Though perhaps complicated at first glance, an intuitive interpretation of the truncation error terms $B_{K,i}$ is provided by Fig. 1(b). $B_{K,1}$ corresponds to the truncation error arising from the upper right quadrant, where both vertices participating in the edge were in the discarded tail region. $B_{K,2}$ is the truncation error arising from the bottom right and upper left quadrants, where one of the two vertices participating in the edge was in the truncation, and the other was in the tail. Finally, $B_{K,3}$ represents the truncation error arising from edges in which one vertex was at the boundary of tail and truncation, and the other was in the tail.

Theorem 3.4 is the generalization of Corollary 3.3 from $d=2$ to the general setting of arbitrary hypergraphs with $d>1$ . The bound is analogous to that in Corollary 3.3—with $B_{K}$ expressed as a sum of terms, each corresponding to whether vertices were in the tail, boundary, or truncation region—except that there are $d>1$ vertices participating in each edge, resulting in more terms in the sum. Note that Theorem 3.4 also guarantees that the bound is not vacuous, and indeed converges to 0 as the truncation level $K\rightarrow\infty$ as expected.

Theorem 3.4.

In the setting of Lemma 3.1, suppose $\Theta$ is a CRM generated from the series representation Eq. 4, edges are generated from the independent likelihood process Eq. 7, and $1<d\leq K$ . Then

\displaystyle\mathbb{P}\left(I_{N}\leq K\right)\geq\exp\left(-N\cdot B_{K}\right),

(25)

where

$\displaystyle B_{K}$	$\displaystyle=B_{K,1}+B_{K,2}+B_{K,3},$	(26)
$\displaystyle B_{K,1}$	$\displaystyle=\mathbb{E}\left[\int_{[\Gamma_{K},\infty)^{d}}-\log\pi(\tilde{\theta})\mathrm{d}\gamma\right]$	(27)
$\displaystyle B_{K,2}$	$\displaystyle=\sum_{\ell=1}^{d-1}\binom{d}{\ell}\mathbb{E}\!\left[\frac{(K-1)\,!}{(K-1-\ell)\,!}\Gamma_{K}^{-\ell}\!\!\int_{[0,\Gamma_{K}]^{\ell}\times[\Gamma_{K},\infty)^{d-\ell}}\!\!-\log\pi(\tilde{\theta})\mathrm{d}\gamma\right]$	(28)
$\displaystyle B_{K,3}$	$\displaystyle=\sum_{\ell=1}^{d-1}\ell\binom{d}{\ell}\mathbb{E}\!\left[\frac{(K-1)\,!}{(K-\ell)\,!}\Gamma_{K}^{-(\ell-1)}\!\!\int_{[0,\Gamma_{K}]^{\ell}\times[\Gamma_{K},\infty)^{d-\ell}}\!\!-\delta_{\gamma_{\ell}=\Gamma_{K}}\log\pi(\tilde{\theta})\mathrm{d}\gamma\right],$	(29)

$\delta_{\cdot}$ is the Dirac delta, $\mathrm{d}\gamma:=\prod_{j=1}^{d}\mathrm{d}\gamma_{j}$ , and $\tilde{\theta}:=\prod_{j=1}^{d}\tau(U_{j},\gamma_{j})$ . Furthermore, $\emph{lim}_{K\rightarrow\infty}B_{K}=0$ .

The same geometric intuition from the $d=2$ -dimensional case applies to the general hypergraph truncation error in Eq. 26. $B_{K,1}$ corresponds to the error arising from the edges whose vertices all belong to the tail region $\Theta_{K+}$ . Each term in the summation in $B_{K,2}$ corresponds to the error arising from edges that have $\ell$ out of $d$ vertices belonging to the truncation $\Theta_{K}$ . Each term in the summation in $B_{K,3}$ corresponds to the error arising from the edges that have $\ell-1$ out of $d$ vertices belonging to the truncation $\Theta_{K}$ and have one vertex exactly on the boundary of the truncation. Note that we obtain Corollary 3.3 by taking $d=2$ in Eq. 26.

3.3 Categorical likelihood process

We may also specialize Lemma 3.1 to the setting where the network is generated via the single-edge-per-step categorical likelihood process in Eq. 10. However, truncation with the categorical likelihood process poses a few key challenges. From a practical angle, certain choices of series representation for generating $\Theta$ may be problematic. For instance, when using the rejection representation Eq. 6 in the typical case where $\mu\neq\nu$ , there is a nonzero probability that

\displaystyle\sum_{k=1}^{K}\mathds{1}\left[\theta_{k}>0\right]<d,

(30)

meaning there aren’t enough accepted vertices in the truncation to generate a single edge. In this case, the categorical likelihood process—which must generate exactly one edge per step—is ill-defined. An additional theoretical challenge arises from the normalization of the original and truncated networks in Eq. 10, which prevents the use of the usual theoretical tools for analyzing CRMs.

Fortunately, the inverse Lévy representation provides an avenue to address both issues. The rates $\theta_{k}$ are all guaranteed to be nonzero—meaning as long as $K\geq d$ , the categorical likelihood process is well-defined—and are decreasing, which enables our theoretical analysis in Section B.1. However, as mentioned earlier, the inverse Lévy representation is well-known to be intractable to use in most practical settings.

Theorem 3.5 provides a solution: we use the rejection representation to simulate the rates $\theta_{k}$ , but instead of simulating for iterations $k=1,\dots,K$ , we simulate until we obtain $K$ nonzero rates. This is no longer a sample of a truncated rejection representation; but Theorem 3.5 shows that the first $K$ nonzero rates have the same distribution as simulating $K$ iterations of the inverse Lévy representation. Therefore, we can tailor the analysis of truncation error for the categorical likelihood process in Theorem 3.6 to the inverse Lévy representation, and simulate its truncation for any $K$ using the tractable rejection representation in practice.

Theorem 3.5.

Let $\theta_{1},\dots,\theta_{K}$ be the first $K$ rates from the inverse Lévy representation of a CRM, and let $\theta^{\prime}_{1},\dots,\theta^{\prime}_{K}$ be the first $K$ nonzero rates from any rejection representation of the same CRM. Then

\displaystyle(\theta_{1},\dots,\theta_{K})\overset{d}{=}(\theta^{\prime}_{1},\dots,\theta^{\prime}_{K}).

(31)

Theorem 3.6.

In the setting of Lemma 3.1, suppose $\Theta$ is a CRM generated from the inverse Lévy representation Eq. 5, edges are generated from the categorical likelihood process Eq. 10, and $1<d\leq K$ . Then

\displaystyle\mathbb{P}\left(I_{N}\leq K\right)\geq(1-B_{K})^{Nd}\geq 0,

(32)

where

\displaystyle B_{K}:=\mathbb{E}\left[\int_{-\infty}^{\infty}Q(\Gamma_{K},x)\left(\int_{0}^{1}Q(\Gamma_{K}u,x)\mathrm{d}u\right)^{K-d}\left(\frac{\mathrm{d}}{\mathrm{d}x}e^{\int_{0}^{\infty}Q(\Gamma_{K}+\gamma,\,x)-1\,\mathrm{d}\gamma}\right)\mathrm{d}x\right],

(33)

and

\displaystyle Q(u,t)=\exp\left(-\nu^{\leftarrow}(u)e^{-t}\right)\quad\text{and}\quad\Gamma_{K}\sim{\sf{Gam}}(K,1).

(34)

Furthermore, $\lim_{K\rightarrow\infty}B_{K}=0$ .

3.4 Examples

We now apply the results of this section to binary undirected networks ( $d=2$ ) constructed via Eq. 17 from common edge-exchangeable networks [3]. We derive the convergence rate of truncation error, and provide simulations of the expected number of edges and vertices under the infinite and truncated network. In each simulation we use the rejection representation to simulate $\Theta$ , and run $N=10,000$ steps of network construction.

Beta-Bernoulli process network

Suppose $\Theta$ is generated from a beta process, and network $E_{N}$ is generated using the independent Bernoulli likelihood process [3]. The beta process $\mathrm{BP}(\gamma,\lambda,\alpha)$ [42] with discount parameter $\alpha\in[0,1)$ , concentration parameter $\lambda>-\alpha$ , and mass parameter $\gamma>0$ is a CRM with rate measure

\displaystyle\nu(\mathrm{d}\theta)=\gamma\frac{\Gamma(\lambda+1)}{\Gamma(1-\alpha)\Gamma(\lambda+\alpha)}\mathds{1}[\theta\leq 1]\theta^{-1-\alpha}(1-\theta)^{\lambda+\alpha-1}\mathrm{d}\theta.

(35)

The Bernoulli likelihood has the form

\displaystyle h(x|\theta)=\mathds{1}[x\leq 1]\theta^{x}(1-\theta)^{1-x}.

(36)

To simulate the process, we use a proposal rate measure $\mu$ given by

\displaystyle\mu(\mathrm{d}\theta)=\gamma^{\prime}\mathds{1}\left[\theta\leq 1\right]\theta^{-1-\alpha}\mathrm{d}\theta,\quad\gamma^{\prime}=\gamma\frac{\Gamma(\lambda+1)}{\Gamma(1-\alpha)\Gamma(\lambda+\alpha)}.

(37)

Dense network: When $\alpha=0$ , the binary beta-Bernoulli graph is dense and

\displaystyle\mu^{\leftarrow}(u)=e^{-u/\gamma^{\prime}},

\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=(1-\theta)^{\lambda-1}.

(38)

Therefore the rejection representation Eq. 6 of $\mathrm{BP}(\gamma,\lambda,0)$ can be written as

\displaystyle\theta_{k}=T_{k}\mathds{1}\left(U_{k}\leq(1-T_{k})^{\lambda-1}\right),\quad T_{k}=e^{-\Gamma_{k}/\gamma^{\prime}}.

(39)

In Section C.2, we show that there exists $K_{0}\in\mathbb{N}$ such that

\displaystyle\forall K\geq K_{0},\quad B_{K}\leq 12\gamma(1-e^{-1})^{\lambda-2}\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}.

(40)

This implies that the truncation error of the dense binary beta-Bernoulli network converges to 0 geometrically in $K$ . Fig. 2(a) corroborates this result in simulation with $\lambda=2$ and $\gamma=1$ ; it can be seen that for the dense beta-Bernoulli graph, truncated graphs with relatively low truncation level—in this case, $K\approx 50$ —approximate the true network model well.

Sparse network: When $\alpha\in(0,1)$ , the binary beta-Bernoulli graph is sparse and

\displaystyle\mu^{\leftarrow}(u)=\left(1+\frac{\alpha u}{\gamma^{\prime}}\right)^{-1/\alpha},

\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=(1-\theta)^{\lambda+\alpha-1}.

(41)

Therefore the rejection representation Eq. 6 of $\mathrm{BP}(\gamma,\lambda,\alpha)$ can be written as

\displaystyle\theta_{k}=T_{k}\mathds{1}\left(U_{k}\leq(1-T_{k})^{\lambda+\alpha-1}\right),

\displaystyle T_{k}=(1+\alpha\Gamma_{k}/\gamma^{\prime})^{-1/\alpha}.

(42)

In Section C.2, we show that there exists $K_{0}\in\mathbb{N}$ such that

\displaystyle\forall K\geq K_{0},\quad B_{K}\leq 6\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ (K-1)^{\frac{\alpha-1}{\alpha}}.

(43)

This bound suggests that the truncation error for the sparse binary beta-Bernoulli network converges to 0 much more slowly than for the dense graph. Fig. 2(b) corroborates this result in simulation with $\lambda=2$ , $\gamma=1$ , and $\alpha=0.6$ ; it can be seen that for the sparse beta-Bernoulli graph, truncated graphs behave significantly differently from the true graph for moderate truncation levels. In practice, one should select an appropriate (large) value of $K$ using our error bounds as guidance, and use sparse data structures to avoid undue memory burden.

Gamma-independent Poisson network

Next, consider the network with $\Theta$ generated from a gamma process, and the network $E_{N}$ generated using the independent Poisson likelihood process. The gamma process $\mathrm{\Gamma P}(\gamma,\lambda,\alpha)$ [43] with discount parameter $\alpha\in[0,1)$ , scale parameter $\lambda>0$ and mass parameter $\gamma>0$ has rate measure

\displaystyle\nu(\mathrm{d}\theta)=\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\theta^{-\alpha-1}e^{-\lambda\theta}\mathrm{d}\theta.

(44)

The Poisson likelihood has the form

\displaystyle h(x|\theta)=\frac{\theta^{x}}{x!}e^{-\theta}.

(45)

Dense network: When $\alpha=0$ , the gamma-Poisson graph is dense, and we choose the proposal measure $\mu$ to be

\displaystyle\mu(\mathrm{d}\theta)=\gamma\lambda\theta^{-1}(1+\lambda\theta)^{-1}\mathrm{d}\theta,

(46)

such that

\displaystyle\mu^{\leftarrow}(u)=1/\left(\lambda\left(e^{(\gamma\lambda)^{-1}u}-1\right)\right),\qquad\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=(1+\lambda\theta)e^{-\lambda\theta}.

(47)

Therefore, the rejection representation in Eq. 6 has the form

\displaystyle\theta_{k}=T_{k}\mathds{1}\left(U_{k}\leq(1+\lambda T_{k})e^{-\lambda T_{k}}\right),

\displaystyle T_{k}=\frac{1}{\lambda\left(e^{(\gamma\lambda)^{-1}\Gamma_{k}}-1\right)}.

(48)

In Section C.3, we show that there exists $K_{0}\in\mathbb{N}$ such that

\displaystyle\forall K\geq K_{0},\quad B_{K}\leq 6\frac{\gamma}{\lambda}\left(\frac{\gamma\lambda}{1+\gamma\lambda}\right)^{K-1}.

(49)

Again, for the dense network $B_{K}$ converges to 0 geometrically, indicating that truncation may provide reasonable approximations to the original network. Fig. 3(a) corroborates this result in simulation with $\lambda=2$ and $\gamma=1$ ; for $K\approx 50$ , no vertices are discarded on average by truncation.

Sparse network: When $\alpha\in(0,1)$ , the gamma-Poisson graph is sparse, and we choose the proposal measure $\mu$ to be

\displaystyle\mu(\mathrm{d}\theta)=\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\theta^{-1-\alpha}\mathrm{d}\theta,

(50)

such that

\displaystyle\mu^{\leftarrow}(u)=(\gamma^{\prime}u^{-1})^{1/d},\quad\gamma^{\prime}:=\gamma\frac{\lambda^{1-\alpha}}{\alpha\Gamma(1-\alpha)},\quad\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=e^{-\lambda\theta}.

(51)

Therefore the rejection representation in Eq. 6 has the form

\displaystyle\theta_{k}=T_{k}\mathds{1}\left(U_{k}\leq e^{-\lambda T_{k}}\right),\quad T_{k}=\left(\gamma^{\prime}\Gamma_{k}^{-1}\right)^{1/\alpha}.

(52)

In Section C.3, we show there exists $K_{0}\in\mathbb{N}$ such that

\displaystyle\forall K\geq K_{0},\quad B_{K}\leq\frac{12\gamma^{2}\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\frac{3\gamma^{\prime}}{K-1}\right)^{\frac{1-\alpha}{\alpha}}.

(53)

Again, for the sparse network $B_{K}$ converges to 0 slowly, suggesting that the truncation error for the sparse binary gamma-Poisson graph converges more slowly than for the dense graph. Fig. 3(b) corroborates this result in simulation with $\lambda=2$ , $\gamma=1$ , and $\alpha=0.6$ ; for a moderate range of truncation values $K\leq 100$ , the truncated graph behaves very differently from the true graph. Therefore in practice, one should follow the above guidance for the beta-Bernoulli network: select a value of $K$ using our error bounds, and avoid intractable memory requirements by using sparse data structures.

4 Truncated posterior inference

In this section, we develop a tractable approximate posterior inference method for network models via truncation, and analyze its approximation error. In particular, given an observed network $E_{N}$ , we want to simulate from the posterior distribution of the CRM rates and the parameters of the CRM rate measure. Since exact expressions of full conditional densities are not available, we truncate the model and run an approximate Markov chain Monte Carlo algorithm. We provide a rigorous theoretical justification for this simple approach by establishing a bound on the total variation distance between the truncated and exact posterior distribution. This in turn provides a method to select the truncation level in a principled manner.

Although this section focuses on self-product-CRM-based network models, the method and theory we develop are both general and extend easily to other CRM-based models, e.g. for clustering [44], feature allocation [45], and trait allocation [6]. In particular, the methodology only requires bounds on tail occupancy probabilities (e.g., in the present context, the probability that $I_{N}\leq K$ ) rather than the exact evaluation of these quantities.

4.1 Truncation error bound

We begin by examining the density of the posterior distribution of the $K^{\text{th}}$ rate from the inverse Lévy representation $\theta_{K}$ for some fixed $K\in\mathbb{N}$ , the unordered collection of rates $(\theta_{k})_{k=1}^{K-1}$ such that $\theta_{k}\geq\theta_{K}$ , and the parameters $\sigma\in\mathbb{R}^{m}$ of the CRM rate measure $\nu_{\sigma}$ given the observed set of edges $E_{N}$ . We denote $\nu_{\sigma}(\theta)$ to be the density of $\nu_{\sigma}(\mathrm{d}\theta)$ and $p(\sigma)$ to be the prior density of $\sigma$ , both with respect to the Lebesgue measure. Given these definitions the posterior density can be expressed as

	$\displaystyle p(\sigma,\theta_{1:K},X_{1:N})\propto p(\sigma)\cdot e^{-\nu_{\sigma}[\theta_{K},\infty)}\cdot\prod_{k=1}^{K}\mathds{1}[\theta_{K}\leq\theta_{k}]\nu_{\sigma}(\theta_{k})$		(54)
	$\displaystyle\hskip 85.35826pt\cdot\left(\prod_{n=1}^{N}\prod_{i\in[K]^{d}_{\neq}}h(x_{ni}\,\|\,{\scriptstyle\prod_{j=1}^{d}}\theta_{i_{j}})\right)\cdot p(x_{1:N}^{K+}\|\theta_{1:K},\sigma),$		(55)

where $[K]^{d}_{\neq}$ is the subset of $\mathbb{N}^{d}_{\neq}$ such that $\max_{j\in[d]}i_{j}\leq K$ , and $x_{1:N}^{K+}$ is shorthand for the set of $(x_{ni})_{n\in[N],i\in\mathbb{N}_{\neq}^{d}}$ such that $i\notin[K]^{d}_{\neq}$ . All the factors on the first row correspond to the prior of $\sigma$ and $(\theta_{k})_{k=1}^{K}$ , and the first factor on the second row corresponds to the likelihood of the portion of the network involving only vertices $1\dots K$ ; these are straightforward to evaluate, though $\nu_{\sigma}[\theta_{K},\infty)$ will occasionally require a 1-dimensional numerical integration. The last factor corresponds to the likelihood of the portion of the network involving vertices beyond $K$ , and is not generally possible to evaluate exactly.

To handle this term, suppose that $K$ is large enough that $x_{1:N}^{K+}=0$ . Then $p(x_{1:N}^{K+}|\theta_{1:K},\sigma)=\mathbb{P}\left(I_{N}\leq K\,|\,\Gamma_{1:K},\sigma\right)$ , i.e., the probability that no portion of the network involves vertices of index $>K$ . Theorem 4.1 provides upper and lower bounds on this probability akin to those of Theorem 3.4—indeed, Theorem 4.1 is an intermediate step in the proof of Theorem 3.4—that apply conditionally on the values of $U_{1:K}$ , $\Gamma_{1:K}$ rather than marginally. This theorem also makes the dependence of the bound on the rate measure parameters $\sigma$ notationally explicit via parametrized series representation components $\tau_{\sigma}$ and $g_{\sigma}$ from Eq. 4. Finally, though Theorem 4.1 applies to general series representations, we require only the specific instantiation for the inverse Lévy representation in the present context.

Theorem 4.1.

The conditional probability $\mathbb{P}\left(I_{N}\leq K\,|\,U_{1:K},\Gamma_{1:K},\sigma\right)$ satisfies

\displaystyle 1\geq\mathbb{P}\left(I_{N}\leq K\,|\,U_{1:K},\Gamma_{1:K},\sigma\right)\geq\exp\left(-N\cdot B(U_{1:K},\Gamma_{1:K},\sigma)\right),

(56)

where

	$\displaystyle B(U_{1:K},\Gamma_{1:K},\sigma)$	$\displaystyle=\sum_{\ell=0}^{d-1}\binom{d}{\ell}\sum_{\begin{subarray}{c}\mathcal{L}\subseteq[K]\\ \|\mathcal{L}\|=\ell\end{subarray}}B(U_{1:K},\Gamma_{1:K},\sigma,\mathcal{L})$		(57)
	$\displaystyle B(U_{1:K},\Gamma_{1:K},\sigma,\mathcal{L})$	$\displaystyle=\int_{[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\mathbb{E}\left[-\log\pi\left(\prod_{j\in\mathcal{L}}\theta_{j}\prod_{j=1}^{d-\ell}\tau_{\sigma}(U^{\prime}_{j},\gamma_{j})\right)\,\|\,\theta_{1:K}\right]\mathrm{d}\gamma,$		(58)

where $\mathrm{d}\gamma=\prod_{j=1}^{d-\ell}\mathrm{d}\gamma_{j}$ and $U^{\prime}_{j}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}g_{\sigma}$ .

Since $U_{1:K}$ is unused in the inverse Lévy representation, in the present context we use the notation $B(\Gamma_{1:K},\sigma)$ for brevity. The bound in Theorem 4.1 implies that as long as $K$ is set large enough such that both $x_{1:N}^{K+}=0$ and $B(\Gamma_{1:K},\sigma)\leq\epsilon/N$ for some $\epsilon>0$ then

\displaystyle 1-\epsilon\leq p(x_{1:N}^{K+}|\theta_{1:K},\sigma)\leq 1.

(59)

Therefore as $K$ increases, this term should become $\approx 1$ and have a negligible effect on the posterior density. We use this intuition to propose a truncated Metropolis–Hastings algorithm that sets $K>I_{N}$ , ignores the $p(x_{1:N}^{K+}|\theta_{1:K},\sigma)$ term in the acceptance ratio, and fixes $x_{N}^{K+}$ to 0. Theorem 4.2 provides a rigorous analysis of the error involved in using the truncated sampler.

Theorem 4.2.

Fix $K>I_{N}$ . Let $\Pi$ be the distribution of $\sigma,\theta_{1:K}$ given $E_{N}$ , and let $\hat{\Pi}$ be the distribution with density proportional to Eq. 55 without the $p(x_{1:N}^{K+}|\theta_{1:K},\sigma)$ term and with $x_{1:N}^{K+}$ fixed to 0. If for some $\eta\in[0,1)$ ,

\displaystyle\hat{\Pi}\left\{B(\Gamma_{1:K},\sigma)\leq\epsilon/N\right\}\geq 1-\eta,

(60)

then

\displaystyle\mathrm{D_{TV}}\left(\hat{\Pi},\Pi\right)

\displaystyle\leq\frac{3(\epsilon+\eta)}{2}-\epsilon\eta.

(61)

4.2 Adaptive truncated Metropolis–Hastings

Crucially, as long as one can obtain samples from the truncated posterior distribution, one can estimate the bound in Eq. 61 using sample estimates of the tail probability $\hat{\Pi}\left\{B(\Gamma_{1:K},\sigma)\leq\epsilon/N\right\}\geq 1-\eta$ . Therefore, one can compute the bound Eq. 61 without needing to evaluate $p(x_{1:N}^{K+}|\theta_{1:K},\sigma)$ exactly. This suggests the following adaptive truncation procedure:

1.

Pick a value of $K>I_{N}$ and desired total variation error $\xi\in(0,1)$ .
2.

Obtain samples from the truncated posterior.
3.

Minimize the bound in Eq. 61 over $\epsilon\in(0,1)$ , using the samples to estimate $\eta=1-\hat{\Pi}\left\{B(\Gamma_{1:K},\sigma)\leq\epsilon/N\right\}$ for each value of $\epsilon$ .
4.

If the minimum bound exceeds $\xi$ , increase $K$ and return to 2.
5.

Otherwise, return the samples.

In this work, we start by initializing $K$ to $I_{N}+1$ . In order to decide how much to increase $K$ by in each iteration, we use the sequential representation to extrapolate the total variation bound Eq. 61 to larger values of $K$ without actually performing MCMC sampling. In particular, for each posterior sample, we use its hyperparameters to generate additional rates from the sequential representation. We then use these extended samples to compute the total variation error guarantee as per step 3 above. We continue to generate additional rates (typically doubling the number each time) until the predicted total variation guarantee is below the desired threshold. Finally, we use linear interpolation between the last two predicted errors to find the next truncation level $K$ that matches the desired (log) error threshold. This fixes a new value of $K$ ; at this point, we return to step 2 above and iterate.

4.3 Experiments

In this section, we examine the properties of the proposed adaptive truncated inference algorithm for the beta-independent Bernoulli network model in Eqs. 35 and 36 with discount $\alpha\in(0,1)$ , concentration $\lambda>1$ , mass $\gamma>0$ , unordered collection of rates $\theta_{1:K-1}$ , and $K^{\text{th}}$ rate from the sequential representation $\theta_{K}$ . In order to simplify inference, we transform each of these parameters to an unconstrained version:

	$\displaystyle\alpha$	$\displaystyle=\frac{\exp(\alpha_{u})}{1+\exp(\alpha_{u})},$	$\displaystyle\lambda$	$\displaystyle=1+\exp(\lambda_{u}),$	$\displaystyle\gamma$	$\displaystyle=\exp(\gamma_{u}),$		(62)
	$\displaystyle\theta_{K}$	$\displaystyle=\frac{\exp(\theta_{u,K})}{1+\exp(\theta_{u,K})},$	$\displaystyle\theta_{k}$	$\displaystyle=\frac{\theta_{K}+\exp(\theta_{u,k})}{1+\exp(\theta_{u,k})}$	$\displaystyle k$	$\displaystyle=1,\dots,K-1.$		(63)

We use a Markov chain Monte Carlo algorithm that includes an exact Gibbs sampling move for $\gamma$ , and separate Gaussian random-walk Metropolis–Hastings moves for $\alpha_{u}$ , $\lambda_{u}$ , $\theta_{u,K}$ , all $\theta_{u,k}$ such that vertex $k$ has degree 0 (jointly), and each $\theta_{u,k}$ such that vertex $k$ has nonzero degree (individually).

Synthetic data

We first apply the model to synthetic data simulated from the generative model. We simulate a sparse network with parameters $\lambda=2$ , $\gamma=1$ , $\alpha=0.2$ , and $N=10^{5}$ , and a dense network with $\lambda=2$ , $\gamma=1$ , $\alpha=0$ , and $N=10^{7}$ . In both settings we set the truncation level for data generation to $500$ , the desired total variation bound from Eq. 61 to $0.01$ , and initialize the sampler with $\alpha=0.4$ , $\lambda=5$ , $\gamma=2$ and $\theta$ generated from the sequential representation. All Metropolis–Hastings moves have proposal standard deviation $0.1$ except the sparse network $\alpha_{u}$ move, which has standard deivation $0.03$ .

Figs. 4 and 5 show histograms of $5{,}000$ marginal posterior samples of the hyperparameters for the dense (true $\alpha=0$ ) and sparse (true $\alpha=0.2$ ) networks. In both cases, the approximate posterior in the first round of adaptation (orange histogram) does not concentrate on the true hyperparameter values, despite the relatively large number of generative rounds $N$ . Fig. 6—which displays the truncation error and predictive adaptation procedure—shows why this is the case. In both networks, the first adaptation iteration identifies a large truncation error. After a single round of adaptation, the approximate posterior distributions (blue histograms) in Figs. 4 and 5 concentrate more on the true values as expected, and the truncation errors fall well below the desired threshold ( $\log_{10}(0.01)=-2$ ). It is worth noting that the predictive extrapolation appears to be quite conservative in these examples, and especially in the dense network. This happens because the approximate posterior for the dense network (respectively, sparse network) assigns mass to higher values of $\alpha$ (respectively, $\gamma$ ) than it should, which results in larger truncation error and thus a larger predicted required truncation level.

Real network data

Next, we apply the model to a Facebook-like Social Network²²2Available at https://toreopsahl.com/datasets/ [46]. The original source network contains a sequence of $61{,}734$ weighted, time-stamped edges, and $1{,}899$ vertices. We preprocess the data by removing the edge weights, binning the edge sequence into 30-minute intervals, and removing the initial transient of network growth, resulting in $1{,}899$ vertices and $10{,}435$ edges over $N=6{,}427$ rounds of generation. We again set a desired total variation error guarantee of 0.01 during inference. All Metropolis–Hastings moves have proposal standard deviation $0.1$ except the $\alpha_{u}$ and degree-0 $\theta_{u}$ moves, which have standard deviation $0.04$ and $0.03$ , respectively. We initialize the sampler to $\alpha=0.1$ , $\gamma=2$ , $\lambda=20$ and sample rates $\theta$ from the prior sequential representation.

Fig. 7 shows the posterior marginal histograms for the hyperparameters $\alpha,\lambda,\gamma$ in both the first iteration (orange) and the second and final iteration (blue) of truncation adaptation. The posterior distribution suggests that the network is dense (i.e. $\alpha\approx 0$ ). This conclusion is supported both by the close match of 100 samples from the posterior predictive distribution, shown in Figs. 8(a) and 8(b), and the findings of past work using this data [8]. Further, as in the synthetic examples, the truncation adaptation terminates after two iterations; but in this case the histograms do not change very much between the two. This is essentially because the truncation error in the first iteration is relatively low ( $\approx 0.02$ ), leading to a reasonably accurate truncated posterior and hence accurate predictions of the truncation error at higher truncation levels.

5 Conclusion

In this paper, we developed methods for tractable generative simulation and posterior inference in statistical models with discrete nonparametric priors via finite truncation. We demonstrated that these approximate truncation-based approaches are sound via theoretical error analysis. In the process, we also showed that the nonzero rates of the (tractable) rejection representation of a Poisson process are equal in distribution to the rates of the (intractable) inverse Lévy representation. Simulated and real network examples demonstrated that the proposed methods are useful in selecting truncation levels for both forward generation and inference in practice.

References

[1] Paul Erdős and Alfréd Rényi. On random graphs I. Publicationes Mathematicae, 6:290–297, 1959.
[2] Paul Holland, Kathryn Laskey, and Samuel Leinhardt. Stochastic block models: first steps. Social Networks, 5:109–137, 1983.
[3] Diana Cai, Trevor Campbell, and Tamara Broderick. Edge-exchangeable graphs and sparsity. In Advances in Neural Information Processing Systems, pages 4249–4257, 2016.
[4] Sinead Williamson. Nonparametric network models for link prediction. The Journal of Machine Learning Research, 17(1):7102–7121, 2016.
[5] Harry Crane and Walter Dempsey. Edge exchangeable models for interaction networks. Journal of the American Statistical Association, 113(523):1311–1326, 2018.
[6] Trevor Campbell, Diana Cai, and Tamara Broderick. Exchangeable trait allocations. Electronic Journal of Statistics, 12(2):2290–2322, 2018.
[7] Svante Janson. On edge exchangeable random graphs. Journal of Statistical Physics, 173(3-4):448–484, 2018.
[8] François Caron and Emily Fox. Sparse graphs using exchangeable random measures. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(5):1295–1366, 2017.
[9] Adrien Todeschini, Xenia Miscouridou, and François Caron. Exchangeable random measures for sparse and modular graphs with overlapping communities. arXiv:1602.02114, 2016.
[10] Tue Herlau, Mikkel Schmidt, and Morten Mørup. Completely random measures for modelling block-structured sparse networks. In Advances in Neural Information Processing Systems, pages 4260–4268, 2016.
[11] Victor Veitch and Daniel Roy. The class of random graphs arising from exchangeable random measures. arXiv:1512.03099, 2015.
[12] Christian Borgs, Jennifer Chayes, Henry Cohn, and Nina Holden. Sparse exchangeable graphs and their limits via graphon processes. The Journal of Machine Learning Research, 18(1):7740–7810, 2017.
[13] François Caron and Judith Rousseau. On sparsity and power-law properties of graphs based on exchangeable point processes. arXiv:1708.03120, 2017.
[14] Peter Orbanz and Daniel Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):437–461, 2014.
[15] John Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.
[16] Christian Robert and George Casella. Monte Carlo Statistical Methods. Springer, 2nd edition, 2004.
[17] Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin. Bayesian data analysis. CRC Press, 3rd edition, 2013.
[18] Michael Escobar and Mike West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588, 1995.
[19] Thomas Griffiths and Zoubin Ghahramani. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems, 2005.
[20] Tamara Broderick, Ashia Wilson, and Michael Jordan. Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli, 24(4B):3181–3221, 2018.
[21] Yee Whye Teh, Dilan Görür, and Zoubin Ghahramani. Stick-breaking construction for the Indian buffet process. In Artificial Intelligence and Statistics, 2007.
[22] Maria Kalli, Jim Griffin, and Stephen Walker. Slice sampling mixture models. Statistics and Computing, 21:93–105, 2011.
[23] Peiyuan Zhu, Alexandre Bouchard-Côté, and Trevor Campbell. Slice sampling for general completely random measures. In Uncertainty in Artificial Intelligence, 2020.
[24] David Blei and Michael Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1):121–143, 2006.
[25] David Blei and John Lafferty. A correlated topic model of science. The Annals of Applied Statistics, 1(1):17–35, 2007.
[26] Chong Wang, John Paisley, and David Blei. Online variational inference for the hierarchical Dirichlet process. In International Conference on Artificial Intelligence and Statistics, 2011.
[27] Finale Doshi, Kurt Miller, Jurgen Van Gael, and Yee Whye Teh. Variational inference for the Indian buffet process. In Artificial Intelligence and Statistics, pages 137–144, 2009.
[28] Trevor Campbell, Jonathan Huggins, Jonathan How, and Tamara Broderick. Truncated random measures. Bernoulli, 25(2):1256–1288, 2019.
[29] Hemant Ishwaran and Lancelot James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173, 2001.
[30] Hemant Ishwaran and Mahmoud Zarepour. Exact and approximate sum representations for the Dirichlet process. Canadian Journal of Statistics, 30(2):269–283, 2002.
[31] Anirban Roychowdhury and Brian Kulis. Gamma Processes, Stick-Breaking, and Variational Inference. In International Conference on Artificial Intelligence and Statistics, 2015.
[32] Diana Cai and Tamara Broderick. Completely random measures for modeling power laws in sparse graphs. In NIPS 2015 Workshop on Networks in the Social and Informational Sciences, 2015.
[33] Harry Crane and Walter Dempsey. A framework for statistical network modeling. arXiv:1509.08185, 2015.
[34] Jan Rosiński. Series representations of Lévy processes from the perspective of point processes. In Lévy processes, pages 401–415. Springer, 2001.
[35] Thomas Ferguson and Michael Klass. A representation of independent increment processes without Gaussian components. The Annals of Mathematical Statistics, 43(5):1634–1643, 1972.
[36] Robert Wolpert and Katja Ickstadt. Poisson/gamma random field models for spatial statistics. Biometrika, 85(2):251–267, 1998.
[37] Yee Whye Teh and Dilan Gorur. Indian buffet processes with power-law behavior. In Advances in Neural Information Processing Systems, pages 1838–1846, 2009.
[38] Yee Whye Teh and Michael Jordan. Hierarchical Bayesian nonparametric models with applications. Bayesian Nonparametrics, 1:158–207, 2010.
[39] Jan Rosiński. On series representations of infinitely divisible random vectors. The Annals of Probability, pages 405–430, 1990.
[40] Günter Last and Mathew Penrose. Lectures on the Poisson process, volume 7. Cambridge University Press, 2017.
[41] Thomas Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1:209–230, 1973.
[42] Nils Lid Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. The Annals of Statistics, 18(3):1259–1294, 1990.
[43] Anders Brix. Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, 31(4):929–953, 1999.
[44] Charles Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2:1152–1174, 1974.
[45] Thomas Griffiths and Zoubin Ghahramani. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems, 2006.
[46] Pietro Panzarasa, Tore Opsahl, and Kathleen M Carley. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology, 60(5):911–932, 2009.
[47] John Kingman. Poisson Processes. Oxford Studies in Probability. Clarendon Press, 1992.

Appendix A Equivalence between nonzero rates from a rejection representation and the inverse Lévy representation

Proof of Theorem 3.5.

Denote $T_{k_{1}}$ be the first nonzero element that is generated from the rejection representation from Eq. 6 and correspondingly, denote $\Gamma_{k_{1}}$ be the jump of the unit-rate homogeneous Poisson process on $\mathbb{R}_{+}$ such that $T_{k_{1}}=\mu^{\leftarrow}(\Gamma_{k_{1}})$ , where $\mu$ is the proposal measure in the rejection representation. Let $f$ be a bounded continuous function. Then

$\displaystyle\mathbb{E}[f(T_{k_{1}})]$	$\displaystyle=\mathbb{E}[f(\mu^{\leftarrow}(\Gamma_{k_{1}}))]$	(64)
	$\displaystyle=\mathbb{E}\left[\sum_{j=1}^{\infty}f(\mu^{\leftarrow}(\Gamma_{j}))\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{j}))\geq U_{j})\right]\prod_{i=1}^{j-1}\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{i}))<U_{i}\right]\right]$	(65)
	$\displaystyle=\mathbb{E}\left[\sum_{j=1}^{\infty}f(\mu^{\leftarrow}(\Gamma_{j}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{j}))\prod_{i=1}^{j-1}\left(1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{i}))\right)\right]$	(66)
	$\displaystyle=\sum_{j=1}^{\infty}\mathbb{E}\left[f(\mu^{\leftarrow}(\Gamma_{j}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{j}))\mathbb{E}\left[\left.\prod_{i=1}^{j-1}\left(1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{i}))\right)\right\|\Gamma_{j}\right]\right]$	(67)

Note that given $\Gamma_{j}$ , $\Gamma_{i}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}{\sf{Unif}}(0,\Gamma_{j})$ , for $i=1,\cdots,j-1$ , so

\displaystyle\mathbb{E}\left[\left.\prod_{i=1}^{j-1}\left(1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{i}))\right)\right|\Gamma_{j}\right]=\mathbb{E}\left[1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(U))\,|\,\Gamma_{j}\right]^{j-1},

(68)

where $U\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}{\sf{Unif}}(0,\Gamma_{j})$ . Using the change of variable $y=\mu^{\leftarrow}(u)$ , we obtain

\displaystyle\mathbb{E}\left[1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(U))\,|\,\Gamma_{j}\right]=1-\frac{1}{\Gamma_{j}}\int_{0}^{\Gamma_{j}}\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(u))\mathrm{d}u=1-\frac{1}{\Gamma_{j}}\int_{\mu^{\leftarrow}(\Gamma_{j})}^{\infty}\mathrm{d}\nu.

(69)

Therefore, using the same change of variable trick,

$\displaystyle\mathbb{E}[f(T_{k_{1}})]$	$\displaystyle=\sum_{j=1}^{\infty}\mathbb{E}\left[f(\mu^{\leftarrow}(\Gamma_{j}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{j}))\left(1-\frac{1}{\Gamma_{j}}\int_{\mu^{\leftarrow}(\Gamma_{j})}^{\infty}\mathrm{d}\nu\right)^{j-1}\right]$	(70)
	$\displaystyle=\sum_{j=1}^{\infty}\int_{0}^{\infty}f(\mu^{\leftarrow}(\gamma))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma))\left(1-\frac{1}{\gamma}\int_{\mu^{\leftarrow}(\gamma)}^{\infty}\mathrm{d}\nu\right)^{j-1}\frac{\gamma^{j-1}}{(j-1)!}e^{-\gamma}\mathrm{d}\gamma$	(71)
	$\displaystyle=\sum_{j=1}^{\infty}\int_{0}^{\infty}f(y)\left(1-\mu[y,\infty)^{-1}\nu[y,\infty)\right)^{j-1}\frac{\mu[y,\infty)^{j-1}}{(j-1)!}e^{-\mu[y,\infty)}\nu(\mathrm{d}y)$	(72)
	$\displaystyle=\int_{0}^{\infty}f(y)\sum_{j=1}^{\infty}\frac{\left(\mu[y,\infty)-\nu[y,\infty)\right)^{j-1}}{(j-1)!}e^{-\mu[y,\infty)}\nu(\mathrm{d}y)$	(73)
	$\displaystyle=\int_{0}^{\infty}f(y)e^{-\nu[y,\infty)}\nu(\mathrm{d}y).$	(74)

Suppose that $\theta_{1}$ is the first rate generated using the inverse Lévy representation. Then

\displaystyle\mathbb{E}[f(\theta_{1})]=\int_{0}^{\infty}f(\nu^{\leftarrow}(\gamma))e^{-\gamma}\mathrm{d}\gamma.

(75)

Making the change of variable $y=\nu^{\leftarrow}(\gamma)$ , we obtain

\displaystyle\mathbb{E}[f(\theta_{1})]=\int_{0}^{\infty}f(y)e^{-\nu[\gamma,\infty)}\nu(\mathrm{d}y)=\mathbb{E}[f(T_{k_{1}})].

(76)

Therefore, the first nonzero rate $\theta_{k_{1}}$ from the rejection representation has the same marginal distribution as the first rate $\theta_{1}$ from the inverse Lévy representation.

We now employ an inductive argument. Suppose that we have shown that the marginal distribution of first nonzero $M$ elements $\Xi_{M}:=(T_{k_{1}},T_{k_{2}},\cdots,T_{k_{M}})$ from the rejection representation has the same marginal distribution as the first $M$ elements $\Theta_{M}:=(\theta_{1},\cdots,\theta_{M})$ from the inverse Lévy representation. To prove the same for $M+1$ elements, it suffices to show that the conditional distribution of $T_{k_{M+1}}$ given $\Xi_{M}$ is equal to the conditional distribution of $\theta_{M+1}$ given $\Theta_{M}$ when $\Xi_{M}=\Theta_{M}$ .

Denote $\Gamma^{\prime}_{j}=\sum_{i=1}^{j}e^{\prime}_{i}$ , where $e^{\prime}_{i}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}{\sf{Exp}}(1)$ , and $U^{\prime}_{i}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}{\sf{Unif}}[0,1]$ . Then

\displaystyle\mathbb{E}[f(T_{k_{M+1}})|\Xi_{M}]=\mathbb{E}[f(\mu^{\leftarrow}(\Gamma_{k_{M+1}}))|\Xi_{M}]=\mathbb{E}\left[\left.\sum_{j=1}^{\infty}f(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j}))\mathds{1}[\dots]\right|\Xi_{M}\right],

(77)

where $\mathds{1}[\dots]$ is shorthand for

\displaystyle\mathds{1}[\dots]=\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j}))\geq U^{\prime}_{j}\right]\prod_{i=1}^{j-1}\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma_{i}))<U^{\prime}_{i}\right].

(78)

Using steps similar to the base case,

\displaystyle\mathbb{E}[f(T_{k_{M+1}})|\Xi_{M}]=\sum_{j=1}^{\infty}\mathbb{E}\left[\left.f(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j}))\mathbb{E}[\dots]^{j-1}\right|\Xi_{M}\right],

(79)

where

\displaystyle\mathbb{E}[\dots]=\mathbb{E}\left[\left.1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+U))\right|\Gamma^{\prime}_{j},\Xi_{M}\right]\qquad U\sim{\sf{Unif}}[0,\Gamma^{\prime}_{j}].

(80)

Making the change of variable $y=\mu^{\leftarrow}(\Gamma_{k_{M}}+u)$ as before, we obtain

	$\displaystyle\mathbb{E}\left[\left.1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+U))\right\|\Gamma^{\prime}_{j},\Xi_{M}\right]$	$\displaystyle=1-\int_{0}^{\Gamma^{\prime}_{j}}\frac{1}{\Gamma^{\prime}_{j}}\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+u))\mathrm{d}u$		(81)
		$\displaystyle=1-\frac{1}{\Gamma^{\prime}_{j}}\int_{\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j})}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}\mathrm{d}\nu.$		(82)

Making another change of variables $y=\mu^{\leftarrow}(\Gamma_{k_{M}}+\gamma)$ in the original integral—and hence $\gamma=\mu[y,\infty)-\Gamma_{k_{M}}=\mu[y,\mu^{\leftarrow}(\Gamma_{k_{M}}))$ —yields

	$\displaystyle\mathbb{E}[f(T_{k_{M+1}})\|\Xi_{M}]$	(83)
$\displaystyle=$	$\displaystyle\sum_{j=1}^{\infty}\int_{0}^{\infty}f(\mu^{\leftarrow}(\Gamma_{k_{M}}+\gamma))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+\gamma))\mathbb{E}[\dots]^{j-1}\frac{\gamma^{j-1}}{(j-1)!}e^{-\gamma}\mathrm{d}\gamma$	(84)
$\displaystyle=$	$\displaystyle\sum_{j=1}^{\infty}\int_{0}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}f(y)\left(1-\frac{1}{\gamma}\int_{y}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}\mathrm{d}\nu\right)^{j-1}\frac{\gamma^{j-1}}{(j-1)!}e^{-\gamma}\nu(\mathrm{d}y)$	(85)
$\displaystyle=$	$\displaystyle\sum_{j=1}^{\infty}\int_{0}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}f(y)\frac{(\mu[y,\mu^{\leftarrow}(\Gamma_{k_{M}}))-\nu[y,\mu^{\leftarrow}(\Gamma_{k_{M}})))^{j-1}}{(j-1)!}e^{-\mu[y,\mu^{\leftarrow}(\Gamma_{k_{M}}))}\nu(\mathrm{d}y)$	(86)
$\displaystyle=$	$\displaystyle\int_{0}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}f(y)e^{-\nu[y,\mu^{\leftarrow}(\Gamma_{k_{M}}))}\nu(\mathrm{d}y)=\int_{0}^{T_{k_{M}}}f(y)e^{-\nu[y,T_{k_{M}})}\nu(\mathrm{d}y).$	(87)

On the other hand,

		$\displaystyle\mathbb{E}[f(\theta_{M+1})\|\Theta_{M}]=\mathbb{E}[f(\nu^{\leftarrow}(\Gamma_{M}+\Gamma^{\prime}_{1}))\|\Theta_{M}]=\int_{0}^{\infty}f(\nu^{\leftarrow}(\Gamma_{M}+\gamma))e^{-\gamma}\mathrm{d}\gamma$		(88)
	$\displaystyle=$	$\displaystyle\int_{0}^{\nu^{\leftarrow}(\Gamma_{M})}f(y)e^{-\nu[y,\nu^{\leftarrow}(\Gamma_{M}))}\nu(\mathrm{d}y)=\int_{0}^{\theta_{M}}f(y)e^{-\nu[y,\theta_{M})}\nu(\mathrm{d}y).$		(89)

Thus the distribution of the $(M+1)^{\text{th}}$ nonzero rate in the rejection representation $T_{k_{M+1}}$ given $\Xi_{M}$ is equal to the distribution of the $(M+1)^{\text{th}}$ rate from the inverse Lévy representation $\theta_{M+1}$ given $\Theta_{M}$ when $\Xi_{M}=\Theta_{M}$ . ∎

Appendix B Truncation error bounds for self-product measures

Proof of Lemma 3.1.

Denote $\tilde{\Theta}=\{\Theta^{(d)},\Theta_{K}^{(d)}\}$ . Denote the marginal probability mass function (PMF) of $E_{N}\in\mathcal{N}_{d}$ and $E_{N,K}\in\mathcal{N}_{d}$ as $P_{N}$ and $P_{N,K}$ , and denote their PMFs given $\tilde{\Theta}$ as $f(x|\tilde{\Theta})$ and $f_{K}(x|\tilde{\Theta})$ respectively.

	$\displaystyle\mathrm{D_{TV}}\left(P_{N},P_{N,K}\right)$		(90)
	$\displaystyle=\frac{1}{2}\sum_{x\in\mathcal{N}_{d}}\Big{\|}P_{N}(x)-P_{N,K}(x)\Big{\|}$		(91)
	$\displaystyle=\frac{1}{2}\sum_{x\in\mathcal{N}_{d}}\left\|\mathbb{E}[f(x\|\tilde{\Theta})]-\mathbb{E}[f_{K}(x\|\tilde{\Theta})]\right\|$		(92)
	$\displaystyle\leq\frac{1}{2}\mathbb{P}(I_{N}\leq K)\sum_{x\in\mathcal{N}_{d}}\left\|\mathbb{E}[f(x\|\tilde{\Theta})\|I_{N}\leq K]-\mathbb{E}[f_{K}(x\|\tilde{\Theta})\|I_{N}\leq K]\right\|$		(93)
	$\displaystyle\,\,\,+\frac{1}{2}\mathbb{P}(I_{N}>K)\sum_{x\in\mathcal{N}_{d}}\left\|\mathbb{E}[f(x\|\tilde{\Theta})\|I_{N}>K]-\mathbb{E}[f_{K}(x\|\tilde{\Theta})\|I_{N}>K]\right\|$		(94)

Conditioned on $I_{N}\leq K$ , $f(x|\tilde{\Theta})=f_{K}(x|\tilde{\Theta})$ under both the independent and categorical likelihood. So

\displaystyle\mathbb{E}[f(x|\tilde{\Theta})|I_{N}\leq K]-\mathbb{E}[f_{K}(x|\tilde{\Theta})|I_{N}\leq K]=0.

(96)

By Fubini’s Theorem,

	$\displaystyle\mathrm{D_{TV}}\left(P_{N},P_{N,K}\right)$	(97)
$\displaystyle\leq$	$\displaystyle\frac{1}{2}\mathbb{P}(I_{N}>K)\sum_{x\in\mathcal{N}_{d}}\left(\ \mathbb{E}[f(x\|\tilde{\Theta})\|I_{N}>K]+\mathbb{E}[f_{K}(x\|\tilde{\Theta})\|I_{N}>K]\right)$	(98)
$\displaystyle=$	$\displaystyle\frac{1}{2}\mathbb{P}(I_{N}>K)\left(\mathbb{E}\left[\left.\sum_{x\in\mathcal{N}_{d}}f(x\|\tilde{\Theta})\right\|I_{N}>K\right]+\mathbb{E}\left[\left.\sum_{x\in\mathcal{N}_{d}}f_{K}(x\|\tilde{\Theta})\right\|I_{N}>K\right]\right)$	(99)
$\displaystyle=$	$\displaystyle\mathbb{P}(I_{N}>K)=1-\mathbb{P}\left(I_{N}\leq K\right).$	(100)

∎

Proof of Theorems 3.4 and 4.1.

Denote the set of indices

\displaystyle\mathcal{I}_{\ell,K}:=\{i\in\mathbb{N}_{\neq}^{d}:1\leq i_{1},\cdots,i_{\ell}\leq K,\ K+1\leq i_{\ell+1},\cdots,i_{d}<\infty\}

(101)

such that $i\in\mathcal{I}_{\ell,K}$ indicates that the first $\ell$ elements of $i$ belong to the truncation, and the remaining $d-\ell$ elements belong to the tail. By Jensen’s inequality,

	$\displaystyle\mathbb{P}\left(I_{N}\leq K\,\|\,U_{1:K},\Gamma_{1:K}\right)$
	$\displaystyle=\mathbb{E}\left[\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right\}\,\|\,U_{1:K},\Gamma_{1:K}\right]$
	$\displaystyle\geq\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\mathbb{E}\left[\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\,\|\,U_{1:K},\Gamma_{1:K}\right]\right\}.$		(102)

This equation arises by noting that $I_{N}\leq K$ if and only if for all $i$ involving an index $i_{j}>K$ , the count of edge $i$ is 0 after $N$ rounds; the factor $\binom{d}{\ell}$ accounts for the fact that $\vartheta_{i}=\prod_{j=1}^{d}\theta_{i_{j}}$ is independent of the ordering of the $i_{j}$ .

Consider a single term $\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi(\vartheta_{i})$ in the above sum. Since we are conditioning on $U_{1:K},\Gamma_{1:K}$ , we have that $\theta_{i_{1}},\dots,\theta_{i_{\ell}}$ are fixed in the expectation, and the remaining steps $\Gamma_{K+1},\Gamma_{K+2},\cdots$ are the ordered jumps of a unit rate homogeneous Poisson process on $\left[\Gamma_{K},\infty\right)$ . By the marking property of the Poisson process [47], conditioned on $\Gamma_{K}$ , we have that $(U_{i},\Gamma_{i})_{i=K+1}^{\infty}$ is a Poisson process on $\mathbb{R}_{+}\times\left[\Gamma_{K},\infty\right)$ with rate measure $g(\mathrm{d}u)\mathrm{d}\gamma$ . Thus we apply the Slivnyak-Mecke theorem [40] to the remaining $d-\ell$ indices to obtain

	$\displaystyle\mathbb{E}\left[\left.\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right\|U_{1:K},\Gamma_{1:K}\right]$	(103)
	$\displaystyle\mathbb{E}\left[\left.\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\prod_{j=1}^{\ell}\theta_{i_{j}}\prod_{j=\ell+1}^{d}\tau(U_{i_{j}},\Gamma_{i_{j}})\right)\right\|U_{1:K},\Gamma_{1:K}\right]$	(104)
$\displaystyle=$	$\displaystyle\sum_{i_{1}\neq\cdots\neq i_{\ell}\leq K}\mathbb{E}\left[\int_{\mathbb{R}_{+}^{d-\ell}}\log\pi\left(\prod_{j=1}^{\ell}\theta_{i_{j}}\prod_{j=\ell+1}^{d}\tau(U_{i_{j}},\Gamma_{K}+\gamma_{j})\right)\prod_{j=\ell+1}^{d}\mathrm{d}\gamma_{j}\,\|\,U_{1:K},\Gamma_{1:K}\right]$	(105)
$\displaystyle=$	$\displaystyle\sum_{\begin{subarray}{c}\mathcal{L}\subseteq[K]\\ \|\mathcal{L}\|=\ell\end{subarray}}\int_{[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\mathbb{E}\left[\log\pi\left(\prod_{j\in\mathcal{L}}\theta_{j}\cdot\prod_{j=1}^{d-\ell}\tau(U^{\prime}_{j},\gamma_{j})\right)\,\|\,\theta_{1:K}\right]\mathrm{d}\gamma_{1:d-\ell},$	(106)

where $U^{\prime}_{j}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}g$ . Substitution of this expression into Eq. 102 yields the result of Theorem 4.1. Next, we consider the bound on the marginal probability $\mathbb{P}\left(I_{N}\leq K\right)$ . By Jensen’s inequality applied to Eq. 102 and following the previous derivation, we find that

\displaystyle\mathbb{P}\left(I_{N}\leq K\right)\geq\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\mathbb{E}\left[\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right]\right\},

(107)

and

		$\displaystyle\mathbb{E}\left[\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right]$		(108)
	$\displaystyle=$	$\displaystyle\sum_{i_{1}\neq\cdots\neq i_{\ell}\leq K}\mathbb{E}\left[\int_{[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\log\pi\left(\prod_{j=1}^{\ell}\theta_{i_{j}}\prod_{j=1}^{d-\ell}\tau(U^{\prime}_{j},\gamma_{j})\right)\mathrm{d}\gamma_{1:d-\ell}\right].$		(109)

Using the fact that conditioned on $\Gamma_{K}$ , $\Gamma_{1:K-1}$ are uniformly distributed on $[0,\Gamma_{K}]$ , and that at most one $i_{j}$ can be equal to $K$ ,

	$\displaystyle\sum_{i_{1}\neq\cdots\neq i_{\ell}\leq K}\mathbb{E}\left[\int_{[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\log\pi\left(\prod_{j=1}^{\ell}\theta_{i_{j}}\prod_{j=1}^{d-\ell}\tau(U^{\prime}_{j},\gamma_{j})\right)\mathrm{d}\gamma_{1:d-\ell}\right]$		(110)
	$\displaystyle=\frac{(K-1)!}{(K-1-\ell)!}\mathbb{E}\left[\Gamma_{K}^{-\ell}\int_{[0,\Gamma_{K}]^{\ell}\times[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\log\pi\left(\prod_{j=1}^{d}\tau(U_{j},\gamma_{j})\right)\mathrm{d}\gamma_{1:d}\right]$		(111)
	$\displaystyle+\ell\frac{(K-1)!}{(K-\ell)!}\mathbb{E}\left[\Gamma_{K}^{-(\ell-1)}\int_{[0,\Gamma_{K}]^{\ell}\times[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\delta_{\gamma_{1}=\Gamma_{K}}\log\pi\left(\prod_{j=1}^{d}\tau(U_{j},\gamma_{j})\right)\mathrm{d}\gamma_{1:d}\right],$		(112)

where the first and second terms arise from portions of the sum where all indices satisfy $i_{j}\neq K$ and one index satisfies $i_{j}=K$ , respectively.

To complete the result, we study the asymptotics of the marginal probability bound. It follows from Eq. 8 that $I_{N}<\infty$ almost surely. Therefore

\displaystyle\lim_{K\rightarrow\infty}\mathbb{E}\left[\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right\}\right]=\lim_{K\rightarrow\infty}\mathbb{P}(I_{N}\leq K)=1.

(113)

It then follows from [28, Lemma B.1] and continuous mapping theorem that

\displaystyle\sum_{\ell=0}^{d-1}\left[\binom{d}{\ell}\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right]\overset{p}{\to}0\qquad\text{as}\,\,K\to\infty.

(114)

Since this sequence is monotonically increasing in $K$ , we have that

\displaystyle B_{K}=\mathbb{E}\left[\sum_{\ell=0}^{d-1}\left[\binom{d}{\ell}\sum_{i\in\mathcal{I}_{\ell,K}}-\log\pi\left(\vartheta_{i}\right)\right]\right]\to 0\qquad\text{as}\,\,K\to\infty.

(115)

∎

B.1 Proof of Theorem 3.6

We first state an useful results which states that if one perturbs the probabilities of a countable discrete distribution by i.i.d. ${\sf{Gumbel}}(0,1)$ random variables, the arg max of the resulting set is a sample from that distribution.

Lemma B.1.

[28, Lemma 5.2] Let $(p_{j})_{j=1}^{\infty}$ be a countable collection of positive numbers such that $\sum_{j}p_{j}<\infty$ and let $\bar{p}_{j}=\frac{p_{j}}{\sum_{k}p_{k}}$ . If $(W_{j})_{j=1}^{\infty}$ are i.i.d ${\sf{Gumbel}}(0,1)$ random variables, then $\operatornamewithlimits{arg\,max}_{j\in\mathbb{N}}W_{j}+\log p_{j}$ exists, is unique a.s., and has distribution

\displaystyle\operatornamewithlimits{arg\,max}_{j\in\mathbb{N}}W_{j}+\log p_{j}\sim{\sf{Categorical}}\left((\bar{p}_{j})_{j=1}^{\infty}\right).

(116)

Proof of Theorem 3.6.

Since the $N$ edges from the categorical likelihood process are i.i.d. categorical random variables, by Jensen’s inequality,

\displaystyle\mathbb{P}(I_{N}\leq K)=\mathbb{E}\left[\mathbb{P}(I_{N}\leq K|\Theta)\right]=\mathbb{E}\left[\mathbb{P}(I_{1}\leq K|\Theta)^{N}\right]\geq\mathbb{E}\left[\mathbb{P}(I_{1}\leq K|\Theta)\right]^{N}.

(117)

Next, since $\vartheta_{i}=\prod_{j=1}^{d}\theta_{i_{j}}$ , we can simulate a categorical variable with probabilities proportional to $\vartheta_{i}$ , $i\in\mathbb{N}_{\neq}^{d}$ by sampling $d$ indices $(J_{1},\cdots,J_{d})$ independently from a categorical distribution with probabilities proportional to $(\theta_{1},\theta_{2},\dots)$ and discarding samples where $J_{j}=J_{k}$ for some $1\leq j,k\leq d$ . Denote $\theta^{\prime}_{k}=\theta_{k}/\sum_{k}\theta_{k}$ to be the normalized rates, $P_{J,K}:=\sum_{j=J}^{K}\theta^{\prime}_{j}$ , and the event $\mathcal{Q}:=\{J_{j}\neq J_{k},\forall 1\leq j,k\leq d\}$ . Then

\displaystyle\mathbb{P}(I_{1}\leq K|\Theta)=\mathbb{P}(1\leq J_{j}\leq K,\ 1\leq j\leq d\ |\ \mathcal{Q},\Theta).

(118)

Since the normalized rates $\theta^{\prime}_{k}$ are generated from the inverse Lévy representation, they are monotone decreasing. Therefore

	$\displaystyle\mathbb{P}(1\leq J_{j}\leq K,\ 1\leq j\leq d\ \|\ \mathcal{Q},\Theta)$	$\displaystyle\geq\frac{P_{1,K}}{1-0}\cdot\frac{P_{2,K}}{1-P_{1,1}}\cdots\frac{P_{d,K}}{1-P_{1,d-1}}$		(119)
		$\displaystyle\geq\left(\frac{P_{d,K}}{1-P_{1,d-1}}\right)^{d}.$		(120)

By Jensen’s inequality,

\displaystyle\mathbb{E}[\mathbb{P}(1\leq J_{j}\leq K,\ 1\leq j\leq d\ |\ \mathcal{Q},\Theta)]\geq\mathbb{E}\left[\frac{P_{d,K}}{1-P_{1,d-1}}\right]^{d}.

(121)

Note that for a categorical random variable $J$ with class probabilities given by $\theta^{\prime}_{j}/(1-P_{1,d-1})$ , $j\geq d$ , the quantity $P_{d,K}/(1-P_{1,d-1})$ is the probability that $d\leq J\leq K$ . So by the infinite Gumbel-max sampling lemma,

\displaystyle\mathbb{E}\left[\frac{P_{d,K}}{1-P_{1,d-1}}\right]=\mathbb{P}\left(d\leq\operatornamewithlimits{arg\,max}_{j\geq d}(\log\theta_{j}+W_{j})\leq K\right),\quad W_{j}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}{\sf{Gumbel}}(0,1),

(122)

where we can replace $\theta^{\prime}_{j}$ with the unnormalized $\theta_{j}$ because the normalization does not affect the $\operatornamewithlimits{arg\,max}$ . Denoting

\displaystyle M_{K}:=\max_{d\leq k\leq K}\log\nu^{\leftarrow}(\Gamma_{k})+W_{k},

\displaystyle M_{K+}:=\sup_{k>K}\log\nu^{\leftarrow}(\Gamma_{k})+W_{k},

(123)

we have that

\displaystyle\mathbb{P}(I_{N}\leq K)\geq\left(1-\mathbb{E}[\mathbb{P}(M_{K}<M_{K+}|\Gamma_{K})]\right)^{N\cdot d},

(124)

and so the remainder of the proof focuses on the conditional expectation. Conditioned on $\Gamma_{K}$ ,

\displaystyle M_{K}\overset{d}{=}\max\left\{\log\nu^{\leftarrow}(\Gamma_{K})+W_{K},\ \max_{d\leq k\leq K}\log\nu^{\leftarrow}(u_{k})+W_{k}\right\}.

(125)

The cumulative distribution function and the probability density function of the Gumbel distribution ${\sf{Gumbel}}(0,1)$ is

\displaystyle F(x)=e^{-e^{-x}},\quad f(x)=e^{-(x+e^{-x})}.

(126)

\displaystyle\mathbb{P}(\log\nu^{\leftarrow}(\Gamma_{K})+W_{K}\leq x\ |\ \Gamma_{K})=e^{-\nu^{\leftarrow}(\Gamma_{K})e^{-x}},

(127)

and

\displaystyle\mathbb{P}(\log\nu^{\leftarrow}(u_{k})+W_{k}\leq x\ |\ \Gamma_{K})=\int_{0}^{1}e^{-\nu^{\leftarrow}(\Gamma_{K}u)e^{-x}}\mathrm{d}u.

(128)

Therefore,

\displaystyle\mathbb{P}(M_{K}\leq x\ |\ \Gamma_{K})=\left(\int_{0}^{1}e^{-\nu^{\leftarrow}(\Gamma_{K}u)e^{-x}}\mathrm{d}u\right)^{K-d}e^{-\nu^{\leftarrow}(\Gamma_{K})e^{-x}}.

(129)

Denote $Q(u,t)=e^{-\nu^{\leftarrow}(u)e^{-t}}$ , and

\displaystyle\mathbb{P}(M_{K}\leq x\ |\ \Gamma_{K})=\left(\int_{0}^{1}Q(\Gamma_{K}u,x)\mathrm{d}u\right)^{K-d}Q(\Gamma_{K},x).

(130)

Conditioned on $\Gamma_{K}$ , the tail $\Gamma_{K+1},\Gamma_{K+2},\cdots$ is a unit rate homogeneous Poisson process on $[\Gamma_{K},\infty)$ that is independent of $\Gamma_{1},\cdots,\Gamma_{K-1}$ . So conditioned on $\Gamma_{K}$ ,

\displaystyle M_{K+}\overset{d}{=}\sup_{k\geq 1}\log\nu^{\leftarrow}(\Gamma_{K}+\Gamma_{k}^{\prime})+W_{k},

(131)

where $\Gamma_{k}^{\prime}$ is unit rate of homogeneous Poisson process on $\mathbb{R}_{+}$ . Since $\Gamma_{k}^{\prime}$ is a Poisson process, $\log\nu^{\leftarrow}(\Gamma_{K}+\Gamma_{k}^{\prime})+W_{k}$ is also a Poisson process with the rate measure

\displaystyle\left(\int_{0}^{\infty}e^{-(t-\log\nu^{\leftarrow}(\Gamma_{K}+\gamma))-e^{-(t-\log\nu^{\leftarrow}(\Gamma_{K}+\gamma))}}\mathrm{d}\gamma\right)\mathrm{d}t.

(132)

$\mathbb{P}(M_{K+}\leq x\ |\ \Gamma_{K})$ is the probability that no atom of the Poisson process is greater than $x$ . For a Poisson process with rate measure $\mu(\mathrm{d}t)$ , this probability is $e^{-\int_{x}^{\infty}\mu(\mathrm{d}t)}$ ,

$\displaystyle\mathbb{P}(M_{K+}\leq x\ \|\ \Gamma_{K})$	$\displaystyle=e^{-\int_{x}^{\infty}\left(\int_{0}^{\infty}e^{-(t-\log\nu^{\leftarrow}(\Gamma_{K}+\gamma))-e^{-(t-\log\nu^{\leftarrow}(\Gamma+\gamma))}}\mathrm{d}\gamma\right)\mathrm{d}t}$	(133)
	$\displaystyle=e^{\int_{0}^{\infty}\left(e^{-\nu^{\leftarrow}(\Gamma_{K}+\gamma)e^{-x}}-1\right)\mathrm{d}\gamma}$	(134)
	$\displaystyle=e^{\int_{0}^{\infty}Q(\Gamma_{K}+\gamma,\,x)-1\,\mathrm{d}\gamma},$	(135)

where the second equation comes from the fact that the inner integrand is the probability density function of a Gumbel distribution. Therefore,

	$\displaystyle\mathbb{P}(M_{K}<M_{K+}\ \|\ \Gamma_{K})$	(136)
$\displaystyle=$	$\displaystyle\int_{-\infty}^{\infty}\mathbb{P}(M_{K}\leq x\ \|\ \Gamma_{K})\frac{\mathrm{d}}{\mathrm{d}x}\mathbb{P}(M_{K+}\leq x\ \|\ \Gamma_{K})\mathrm{d}x$	(137)
$\displaystyle=$	$\displaystyle\int_{-\infty}^{\infty}Q(\Gamma_{K},x)\left(\int_{0}^{1}Q(\Gamma_{K}u,x)\mathrm{d}u\right)^{K-d}\left(\frac{\mathrm{d}}{\mathrm{d}x}e^{\int_{0}^{\infty}Q(\Gamma_{K}+\gamma,\,x)-1\,\mathrm{d}\gamma}\right)\mathrm{d}x.$	(138)

For the categorical variable $J$ with class probabilities given by $\theta^{\prime}_{j}/(1-P_{1,d-1})$ , $j\geq d$ , it holds that $\mathbb{P}(d\leq J\leq K)\uparrow 1$ as $K\to\infty$ . By the monotone convergence theorem

\displaystyle B_{K}=\mathbb{P}\left(\operatornamewithlimits{arg\,max}_{j\geq d}(\log\theta_{j}+W_{j})>K\right)=1-\mathbb{E}\left[\frac{P_{d,K}}{1-P_{1,d}}\right]\to 0.

(139)

∎

Appendix C Error bounds for edge-exchangeable networks

C.1 Rejection representation

We first derive the specific form of $B_{K}$ in Corollary 3.3 for the rejection representation from Eq. 6, where

\displaystyle\tau(U,\Gamma)=\mu^{\leftarrow}(\Gamma)\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma))\geq U\right],

(140)

and $U\sim{\sf{Unif}}[0,1]$ . So in Corollary 3.3,

	$\displaystyle B_{K,1}$	(141)
$\displaystyle=$	$\displaystyle\mathbb{E}\left[\int_{\mathbb{R}_{+}^{4}}-\log\pi\left(\tau(u_{1},\gamma_{1}+\Gamma_{K})\tau(u_{2},\gamma_{2}+\Gamma_{K})\right)\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}g(u_{1})\mathrm{d}u_{1}g(u_{2})\mathrm{d}u_{2}\right]$	(142)
$\displaystyle=$	$\displaystyle\mathbb{E}\left[\int_{\Gamma_{K}}^{\infty}\int_{\Gamma_{K}}^{\infty}\int_{0}^{1}\int_{0}^{1}-\log\pi\left(\tau(u_{1},\gamma_{1})\tau(u_{2},\gamma_{2})\right)\mathrm{d}u_{1}\mathrm{d}u_{2}\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}\right].$	(143)

Since $\pi(0)=1$ , $\log\pi(0)=0$ , we can take the indicator in $\tau$ out of the function $\log\pi(\tau(u_{1},\gamma_{1})\tau(u_{2},\gamma_{2}))$ to obtain

	$\displaystyle\int_{0}^{1}\int_{0}^{1}\log\pi\left(\tau(u_{1},\gamma_{1})\tau(u_{2},\gamma_{2})\right)\mathrm{d}u_{1}\mathrm{d}u_{2}$	(144)
$\displaystyle=$	$\displaystyle\int_{0}^{1}\int_{0}^{1}\log\pi\left(\mu^{\leftarrow}(\gamma_{1})\mu^{\leftarrow}(\gamma_{2})\right)\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{1}))\geq u_{1}\right]\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{2}))\geq u_{2}\right]\mathrm{d}u_{1}\mathrm{d}u_{2}$	(145)
$\displaystyle=$	$\displaystyle\log\pi\left(\mu^{\leftarrow}(\gamma_{1})\mu^{\leftarrow}(\gamma_{2})\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{1}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{2})).$	(146)

Transforming variables via $x_{1}=\mu^{\leftarrow}(\gamma_{1})$ and $x_{2}=\mu^{\leftarrow}(\gamma_{2})$ , and noting that $\mu^{\leftarrow}(\Gamma_{K})\geq x\iff\Gamma_{K}\leq\mu[x,\infty)$ ,

$\displaystyle B_{K,1}$	$\displaystyle=\mathbb{E}\left[\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}-\log\pi(x_{1}x_{2})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x_{1})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x_{2})\mu(\mathrm{d}x_{1})\mu(\mathrm{d}x_{2})\right]$
	$\displaystyle=\mathbb{E}\left[\int_{\mathbb{R}_{+}^{2}}-\log\pi(x_{1}x_{2})\mathds{1}[x_{1}\leq\mu^{\leftarrow}(\Gamma_{K})]\mathds{1}[x_{2}\leq\mu^{\leftarrow}(\Gamma_{K})]\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2})\right]$
	$\displaystyle=\int_{\mathbb{R}_{+}^{2}}-\log\pi(x_{1}x_{2})\mathbb{E}\left[\mathds{1}[\Gamma_{K}\leq\mu[\max\{x_{1},x_{2}\},\infty)]\right]\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2})$
	$\displaystyle=\int_{\mathbb{R}_{+}^{2}}-\log\pi(x_{1}x_{2})F_{K}\left(\mu[\max\{x_{1},x_{2}\},\infty)\right)\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}),$	(147)

where $F_{K}$ is the cumulative distribution function of $\Gamma_{K}$ . Using the same variable transformation again,

	$\displaystyle\frac{1}{2(K-1)}B_{K,2}$	(148)
$\displaystyle=$	$\displaystyle\mathbb{E}\left[\int_{0}^{\Gamma_{K}}\int_{\mathbb{R}_{+}^{3}}-\frac{1}{\Gamma_{K}}\log\pi\left(\tau(u_{1},\gamma_{1})\tau(u_{2},\gamma_{2}+\Gamma_{K})\right)g(u_{1})\mathrm{d}u_{1}g(u_{2})\mathrm{d}u_{2}\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}\right]$	(149)
$\displaystyle=$	$\displaystyle\mathbb{E}\left[\int_{\mu^{\leftarrow}(\Gamma_{K})}^{\infty}\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}-\frac{1}{\Gamma_{K}}\log\pi(x_{1}x_{2})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x_{1})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x_{2})\mu(\mathrm{d}x_{1})\mu(\mathrm{d}x_{2})\right]$	(150)
$\displaystyle=$	$\displaystyle\int_{\mathbb{R}_{+}^{2}}-\mathbb{E}\left[\frac{1}{\Gamma_{K}}\mathds{1}[\mu[x_{1},\infty)\leq\Gamma_{K}\leq\mu[x_{2},\infty)]\right]\log\pi(x_{1}x_{2})\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}).$	(151)

$\displaystyle B_{K,3}$	$\displaystyle=2\mathbb{E}\left[\int_{\mathbb{R}_{+}^{3}}-\log\pi\left(\tau(u_{1},\Gamma_{K})\tau(u_{2},\gamma_{2}+\Gamma_{K})\right)g(u_{1})\mathrm{d}u_{1}g(u_{2})\mathrm{d}u_{2}\mathrm{d}\gamma_{2}\right]$	(152)
	$\displaystyle=2\mathbb{E}\left[\int_{\Gamma_{K}}^{\infty}-\log\pi\left(\mu^{\leftarrow}(\Gamma_{K})\mu^{\leftarrow}(\gamma_{2})\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{K}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{2}))\mathrm{d}\gamma_{2}\right]$	(153)
	$\displaystyle=2\mathbb{E}\left[\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}-\log\pi\left(\mu^{\leftarrow}(\Gamma_{K})x\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{K}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x)\mu(\mathrm{d}x)\right]$	(154)
	$\displaystyle=2\mathbb{E}\left[\int_{\mathbb{R}_{+}}-\log\pi\left(\mu^{\leftarrow}(\Gamma_{K})x\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{K}))\mathds{1}[\Gamma_{K}\leq\mu[x,\infty)]\nu(\mathrm{d}x)\right].$	(155)

C.2 Beta-independent Bernoulli process network

Dense network

When $\alpha=0$ ,

\displaystyle\nu(\mathrm{d}\theta)=\gamma\lambda\theta^{-1}(1-\theta)^{\lambda-1}\mathrm{d}\theta,\quad\mu(\mathrm{d}\theta)=\gamma\lambda\theta^{-1}\mathrm{d}\theta,

(156)

and

\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=(1-\theta)^{\lambda-1},\quad\mu\left([x,1]\right)=-\gamma^{\prime}\log x,\quad\mu^{\leftarrow}(u)=e^{-u/\gamma^{\prime}}.

(157)

Substituting $\nu(\mathrm{d}x),\mu(\mathrm{d}x)$ and $\pi(\theta)$ into Eq. 147 and noting that the integrand is symmetric around the line $x_{1}=x_{2}$ ,

\displaystyle B_{K,1}

\displaystyle=2\int_{0}^{1}\int_{x_{1}}^{1}-\log(1-x_{1}x_{2})F_{K}\left(\mu[x_{2},1]\right)\gamma\lambda x_{2}^{-1}(1-x_{2})^{\lambda-1}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}).

(158)

Next, note that $0\leq F_{K}\left(\mu[x_{2},1]\right)\leq F_{K}\left(\mu[x_{1},1]\right)$ when $x_{2}\geq x_{1}$ and

\displaystyle 0\geq\log(1-x_{1}x_{2})\geq-\left(\frac{1}{1-x_{1}x_{2}}-1\right)=-\left(\frac{x_{1}x_{2}}{1-x_{1}x_{2}}\right)\geq-\left(\frac{x_{1}x_{2}}{1-x_{1}}\right).

(159)

	$\displaystyle B_{K,1}$	$\displaystyle\leq 2\gamma\lambda\int_{0}^{1}F_{K}\left(\mu([x_{1},1])\right)\int_{x_{1}}^{1}\frac{x_{1}x_{2}}{1-x_{1}}x_{2}^{-1}(1-x_{2})^{\lambda-1}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1})$		(160)
		$\displaystyle=2\gamma^{2}\lambda\int_{0}^{1}(1-x)^{2(\lambda-1)}F_{K}\left(\mu([x,1])\right)\mathrm{d}x.$		(161)

For any $a\in(0,1)$ , dividing the integral into two parts and bounding each part separately,

\displaystyle B_{K,1}\leq 2\gamma^{2}\lambda\left[\int_{0}^{a}(1-x)^{2(\lambda-1)}\mathrm{d}x+\int_{a}^{1}F_{K}\left(-\gamma^{\prime}\log x\right)(1-x)^{2(\lambda-1)}\mathrm{d}x\right].

(162)

Assume for the moment that $\lambda\neq 0.5$ . Use the fact that $F_{K}(t)\leq(3t/K)^{K}$ and note also that when $x\in[a,1]$ , $-\log x\leq\left[-\log a/(1-a)\right](1-x)$ ,

	$\displaystyle\frac{1}{2\gamma^{2}\lambda}B_{K,1}$	(163)
$\displaystyle\leq$	$\displaystyle\frac{1-(1-a)^{2\lambda-1}}{2\lambda-1}+\int_{a}^{1}\left(\frac{3\gamma^{\prime}\left[-\log a/(1-a)\right](1-x)}{K}\right)^{K}(1-x)^{2\lambda-2}\mathrm{d}x$	(164)
$\displaystyle=$	$\displaystyle\frac{1-(1-a)^{2\lambda-1}}{2\lambda-1}+\left(-\frac{3\gamma^{\prime}\log a}{K}\right)^{K}\frac{1}{K+2\lambda-1}(1-a)^{2\lambda-1}.$	(165)

It can be seen that there exists a constant value $c>1$ such that $3\gamma^{\prime}\log c=1/c$ . Setting $a=c^{-K}$ and using the first order Taylor’s expansion to approximate the first term, it can be seen that

\displaystyle\frac{1-(1-a)^{2\lambda-1}}{2\lambda-1}\sim c^{-K},\quad\left(-\frac{3\gamma^{\prime}\log a}{K}\right)^{K}\frac{1}{K+2\lambda-1}(1-a)^{2\lambda-1}\sim\frac{1}{K}c^{-K}.

(166)

Therefore, there exists $K_{0}$ such that when $K>K_{0}$ ,

\displaystyle B_{K,1}\leq 4\gamma^{2}\lambda c^{-K}.

(167)

If $\lambda=0.5$ ,

	$\displaystyle B_{K,1}$	$\displaystyle\leq 2\gamma^{2}\lambda\int_{0}^{1}(1-x)^{-1}F_{K}\left(\mu([x,1])\right)\mathrm{d}x$		(168)
		$\displaystyle\leq 2\gamma^{2}\lambda\left[-\log(1-a)+\left(\frac{3\gamma^{\prime}}{K}\right)^{K}\frac{1}{K}(-\log a)^{K}\right],$		(169)

which can be bounded similarly by choosing the same $c$ and setting $a=c^{-K}$ . Therefore we can find a constant $K_{0}$ and for $K>K_{0}$ ,

\displaystyle B_{K,1}\leq 4\gamma^{2}\lambda c^{-K}.

(170)

Next, the term $B_{K,2}$ is bounded via

	$\displaystyle\frac{1}{2}B_{K,2}$	(171)
$\displaystyle=$	$\displaystyle\int_{0}^{1}\int_{x_{1}}^{1}-\log(1-x_{1}x_{2})\left[F_{K-1}\left(\mu([x_{1},1])\right)-F_{K-1}\left(\mu([x_{2},1])\right)\right]\nu(\mathrm{d}x_{2})\nu(\mathrm{d}x_{1})$	(172)
$\displaystyle\leq$	$\displaystyle\int_{0}^{1}\int_{x_{1}}^{1}-\log(1-x_{1}x_{2})F_{K-1}\left(\mu[x_{1},1]\right)\nu(\mathrm{d}x_{2})\nu(\mathrm{d}x_{1}).$	(173)

This has exactly the same form as $B_{K,1}$ , except that the CDF of $\Gamma_{K}$ is replaced with that of $\Gamma_{K-1}$ . Therefore, it can be shown that for large $K$ ,

\displaystyle B_{K,2}\leq 4\gamma^{2}\lambda c^{-(K-1)}.

(174)

Finally, $B_{K,3}$ may be expressed as

	$\displaystyle\frac{1}{2}B_{K,3}$	(175)
$\displaystyle=$	$\displaystyle\mathbb{E}\left[\int_{0}^{1}-\log\left(1-\mu^{\leftarrow}(\Gamma_{K})x\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\left(\mu^{\leftarrow}(\Gamma_{K})\right)\mathds{1}\left[x\leq\mu^{\leftarrow}(\Gamma_{K})\right]\right]\nu(\mathrm{d}x)$	(176)
$\displaystyle=$	$\displaystyle\gamma\lambda\mathbb{E}\left[\int_{0}^{\mu^{\leftarrow}(\Gamma_{k})}-\log\left(1-\mu^{\leftarrow}(\Gamma_{K})x\right)\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda-1}x^{-1}(1-x)^{\lambda-1}\mathrm{d}x\right].$	(177)

Since $\log\left(1-\mu^{\leftarrow}(\Gamma_{K})x\right)\geq-\mu^{\leftarrow}(\Gamma_{K})x/\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)$ ,

$\displaystyle B_{K,3}$	$\displaystyle\leq 2\gamma\lambda\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda-2}\mu^{\leftarrow}(\Gamma_{K})\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}(1-x)^{\lambda-1}\mathrm{d}x\right]$	(178)
	$\displaystyle\geq 2\gamma\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda-2}\mu^{\leftarrow}(\Gamma_{K})\right]$	(179)
	$\displaystyle=2\gamma\int_{0}^{\infty}\left(1-e^{-x/\gamma^{\prime}}\right)^{\lambda-2}e^{-x/\gamma^{\prime}}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x.$	(180)

We split the analysis of this term into two cases. In the first case, assuming $\lambda\geq 2$ , we have that

\displaystyle B_{K,3}\leq 2\gamma\int_{0}^{\infty}\frac{x^{K-1}}{\Gamma(K)}e^{-x(1+\gamma^{\prime})/\gamma^{\prime}}\mathrm{d}x=2\gamma\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}.

(181)

On the other hand, if $\lambda<2$ , we bound the integral over $[0,\gamma^{\prime}]$ and over $[\gamma^{\prime},\infty)$ separately. Since $1-e^{-x}\geq x^{2}$ for $x\in[0,1]$ ,

	$\displaystyle\frac{1}{2\gamma}B_{K,3}$	(182)
$\displaystyle\leq$	$\displaystyle\int_{0}^{\gamma^{\prime}}\left(\frac{x}{\gamma^{\prime}}\right)^{2(\lambda-2)}\frac{x^{K-1}}{\Gamma(K)}e^{-x(1+\gamma^{\prime})/\gamma^{\prime}}\mathrm{d}x+(1-e^{-1})^{\lambda-2}\int_{\gamma^{\prime}}^{\infty}\frac{x^{K-1}}{\Gamma(K)}e^{-x(1+\gamma^{\prime})/\gamma^{\prime}}\mathrm{d}x$	(183)
$\displaystyle\leq$	$\displaystyle\gamma^{\prime 2(2-\lambda)}\int_{0}^{\gamma^{\prime}}\frac{x^{K-1+2(\lambda-2)}}{\Gamma(K)}\mathrm{d}x+(1-e^{-1})^{\lambda-2}\int_{0}^{\infty}\frac{x^{K-1}}{\Gamma(K)}e^{-x(1+\gamma^{\prime})/\gamma^{\prime}}\mathrm{d}x$	(184)
$\displaystyle=$	$\displaystyle\frac{1}{\Gamma(K)}\frac{\gamma^{\prime K}}{K+2(\lambda-2)}+(1-e^{-1})^{\lambda-2}\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}.$	(185)

As $K\rightarrow\infty$ , the second term will dominate the first term, so when $\lambda-2<0$ , the following inequality holds for large $K$ ,

\displaystyle B_{K,3}\leq 4\gamma(1-e^{-1})^{\lambda-2}\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}.

(186)

$B_{K}=B_{K,1}+B_{K,2}+B_{K,3}$ and as $K\rightarrow\infty$ , $B_{K,3}$ will dominate $B_{K,1}$ and $B_{K,2}$ . So there exists a $K_{0}\in\mathbb{N}$ such that for $K>K_{0}$ ,

\displaystyle B_{K}\leq 12\gamma(1-e^{-1})^{\lambda-2}\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}\longrightarrow 0.

(187)

Sparse network

When $\alpha>0$ ,

\displaystyle\nu(\mathrm{d}\theta)=\gamma^{\prime}\theta^{-1-\alpha}(1-\theta)^{\lambda+\alpha-1}\mathrm{d}\theta,\qquad\mu(\mathrm{d}\theta)=\gamma^{\prime}\theta^{-1-\alpha}\mathrm{d}\theta,

(188)

and

\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=(1-\theta)^{\lambda+\alpha-1},

\displaystyle\mu\left([x,1]\right)=\gamma^{\prime}\alpha^{-1}(x^{-\alpha}-1),

\displaystyle\mu^{\leftarrow}(u)=\left(1+\frac{\alpha u}{\gamma^{\prime}}\right)^{-1/\alpha}.

(189)

Similar to the case when $\alpha=0$ ,

\displaystyle B_{K,1}=2\int_{0}^{1}\int_{x_{1}}^{1}-\log(1-x_{1}x_{2})F_{K}\left(\mu([x_{2},1])\right)\gamma^{\prime}x_{2}^{-1-\alpha}(1-x_{2})^{\lambda+\alpha-1}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}).

(190)

Since $\log(1-x_{1}x_{2})\geq-x_{1}x_{2}/(1-x_{1})$ ,

$\displaystyle B_{K,1}$	$\displaystyle\leq 2\gamma^{\prime 2}\int_{0}^{1}F_{K}\left(\gamma^{\prime}\alpha^{-1}(x^{-\alpha}-1)\right)x^{-2\alpha}(1-x)^{2[(\lambda+\alpha)-1]}\mathrm{d}x$	(191)
	$\displaystyle\leq 2\gamma^{\prime 2}\int_{0}^{1}\frac{(\gamma^{\prime}\alpha^{-1}x^{-\alpha}(1-x^{\alpha}))^{K}}{\Gamma(K+1)}x^{-2\alpha}(1-x)^{2[(\lambda+\alpha)-1]}\mathrm{d}x$	(192)
	$\displaystyle\leq 2\gamma^{\prime 2}\int_{0}^{1}\frac{(\gamma^{\prime}\alpha^{-1}x^{-\alpha})^{K}}{\Gamma(K+1)}x^{-2\alpha}(1-x)^{2[(\lambda+\alpha)-1]}\mathrm{d}x$	(193)
	$\displaystyle\leq 2\alpha^{2}\int_{0}^{1}\frac{\left(\gamma^{\prime}\alpha^{-1}x^{-\alpha}\right)^{K+2}}{{\Gamma(K+1)}}\mathrm{d}x.$	(194)

Denoting $t=\gamma^{\prime}\alpha^{-1}x^{-\alpha}$ , then

$\displaystyle B_{K,1}$	$\displaystyle\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\int_{\gamma^{\prime}\alpha^{-1}}^{\infty}\frac{t^{K+1-1/\alpha}}{\Gamma(K+1)}\ \mathrm{d}t$	(195)
	$\displaystyle\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ \frac{\Gamma(K+2-1/\alpha)}{\Gamma(K+1)}\int_{\gamma^{\prime}\alpha^{-1}}^{\infty}\frac{t^{K+1-1/\alpha}}{\Gamma(K+2-1/\alpha)}e^{-t}\ \mathrm{d}t$	(196)
	$\displaystyle\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ \frac{\Gamma(K+2-1/\alpha)}{\Gamma(K+1)}.$	(197)

By Stirling’s formula,

\displaystyle B_{K,1}\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ K^{\frac{\alpha-1}{\alpha}}.

(198)

Now we consider the error bound for $B_{K,2}$ . As we have shown in the example when $\alpha=0$ , here we can obtain that

\displaystyle B_{K,2}\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ (K-1)^{\frac{\alpha-1}{\alpha}}.

(199)

Similar to $B_{K,1}$ and $B_{K,2}$ ,

	$\displaystyle B_{K,3}$	(200)
$\displaystyle=$	$\displaystyle 2\mathbb{E}\left[\int_{0}^{1}-\log\left(1-\mu^{\leftarrow}(\Gamma_{K})x\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\left(\mu^{\leftarrow}(\Gamma_{K})\right)\mathds{1}\left(\Gamma_{K}\leq\mu^{\leftarrow}[x,1]\right)\nu(\mathrm{d}x)\right]$	(201)
$\displaystyle\leq$	$\displaystyle 2\gamma^{\prime}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda+\alpha-1}\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}\frac{\mu^{\leftarrow}(\Gamma_{K})x}{1-x}x^{-1-\alpha}(1-x)^{\lambda+\alpha-1}\mathrm{d}x\right]$	(202)
$\displaystyle=$	$\displaystyle 2\gamma^{\prime}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda+\alpha-1}\mu^{\leftarrow}(\Gamma_{K})\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}x^{-\alpha}(1-x)^{\lambda+\alpha-2}\mathrm{d}x\right].$	(203)

We again split our analysis into two cases. First, suppose that $\lambda+\alpha-2\geq 0$ . Then

	$\displaystyle B_{K,3}$	(204)
$\displaystyle\leq$	$\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda+\alpha-1}\mu^{\leftarrow}(\Gamma_{K})^{2-\alpha}\right]$	(205)
$\displaystyle=$	$\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\int_{0}^{\infty}\left[1-\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{\lambda+\alpha-1}\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x.$	(206)

Since $\alpha<1$ ,

\displaystyle\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\leq\left(\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha},

(207)

and so

$\displaystyle B_{K,3}$	$\displaystyle\leq\frac{2\gamma^{\prime}}{1-\alpha}\int_{0}^{\infty}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\frac{x^{K-1-(2-\alpha)/\alpha}}{\Gamma(K)}e^{-x}\mathrm{d}x$	(208)
	$\displaystyle\leq\frac{2\gamma^{\prime}}{1-\alpha}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{\Gamma(K-(2-\alpha)/\alpha)}{\Gamma(K)}$	(209)
	$\displaystyle\sim\frac{2\gamma^{\prime}}{1-\alpha}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}K^{-(2-\alpha)/\alpha},$	(210)

where the last equation is obtained from Stirling’s formula.

On the other hand, if $\lambda+\alpha-2<0$ ,

	$\displaystyle B_{K,3}$	(211)
$\displaystyle\leq$	$\displaystyle 2\gamma^{\prime}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda+\alpha-1}\mu^{\leftarrow}(\Gamma_{K})\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}x^{-\alpha}(1-x)^{\lambda+\alpha-2}\mathrm{d}x\right]$	(212)
$\displaystyle\leq$	$\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{2(\lambda+\alpha)-3}\mu^{\leftarrow}(\Gamma_{K})^{2-\alpha}\right]$	(213)
$\displaystyle=$	$\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\int_{0}^{\infty}\left[1-\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{2(\lambda+\alpha)-3}\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x.$	(214)

If $2(\lambda+\alpha)-3\geq 0$ , we get the same result as in the case when $\lambda+\alpha-2\geq 0$ . When $2(\lambda+\alpha)-3\leq 0$ , note that we can find an $x_{0}$ such that when $x\in[0,x_{0}]$ ,

\displaystyle 1-\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-1/\alpha}\geq x^{2}.

(215)

	$\displaystyle B_{K,3}$	(216)
$\displaystyle\leq$	$\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\left\{\int_{0}^{x_{0}}x^{4(\lambda+\alpha)-6}\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x\right.$	(217)
	$\displaystyle\left.+\left[1-\left(1+\frac{\alpha x_{0}}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{2(\lambda+\alpha)-3}\int_{x_{0}}^{\infty}\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x\right\}$	(218)
$\displaystyle\leq$	$\displaystyle\frac{\gamma^{\prime}}{1-\alpha}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\left\{\int_{0}^{x_{0}}\frac{x^{K-1+4(\lambda+\alpha)-6-(2-\alpha)/\alpha}}{\Gamma(K)}e^{-x}\mathrm{d}x\right.$	(219)
	$\displaystyle\left.+\left[1-\left(1+\frac{\alpha x_{0}}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{2(\lambda+\alpha)-3}\frac{\Gamma(K-(2-\alpha)/\alpha)}{\Gamma(K)}\right\}.$	(220)

Because here we assume that $2(\lambda+\alpha)-3<0$ , the second term will dominate. So in this case we obtain that for large $K$ ,

\displaystyle B_{K,3}

\displaystyle\leq\frac{4\gamma^{\prime}}{1-\alpha}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\left[1-\left(1+\frac{\alpha x_{0}}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{2(\lambda+\alpha)-3}K^{-(2-\alpha)/\alpha}.

(221)

Asymptotically, $B_{K,2}$ will dominate $B_{K,1}$ and $B_{K,3}$ , so there exists $K_{0}\in\mathbb{N}$ such that when $K>K_{0}$ ,

\displaystyle B_{K}\leq 6\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ (K-1)^{\frac{\alpha-1}{\alpha}}\longrightarrow 0.

(222)

C.3 Gamma-independent Poisson network

Dense network

When $\alpha=0$ ,

\displaystyle\nu(\mathrm{d}\theta)=\gamma\lambda\theta^{-1}e^{-\lambda\theta}\mathrm{d}\theta,\qquad\mu(\mathrm{d}\theta)=\gamma\lambda\theta^{-1}(1+\lambda\theta)^{-1}\mathrm{d}\theta.

(223)

In this case,

\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=(1+\lambda\theta)e^{-\lambda\theta},

\displaystyle\mu[x,\infty)=\gamma\lambda\log(1+(\lambda x)^{-1}),

\displaystyle\mu^{\leftarrow}(u)=\frac{1}{\lambda(e^{(\gamma\lambda)^{-1}u}-1)}.

(224)

For Poisson distribution, $\pi(\theta)=e^{-\theta}$ , so

\displaystyle B_{K,1}=\int_{\mathbb{R}_{+}^{2}}x_{1}x_{2}F_{K}\left(\mu[\max\{x_{1},x_{2}\},\infty)\right)\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}).

(225)

Note that the integrand is symmetric around the line $x_{1}=x_{2}$ , so we only need to compute the integral above the line $x_{1}=x_{2}$ . In this region, $e^{-\lambda x_{2}}\leq e^{-\lambda x_{1}}$ and $0\leq F_{K}\left(\mu[x_{2},\infty)\right)\leq F_{K}\left(\mu[x_{1},\infty)\right)\leq 1$ . So

$\displaystyle B_{K,1}$	$\displaystyle=2\int_{\mathbb{R}_{+}}x_{1}\int_{x_{1}}^{\infty}x_{2}F_{K}\left(\mu[x_{2},\infty)\right)\lambda\gamma x_{2}^{-1}e^{-\lambda x_{2}}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1})$	(226)
	$\displaystyle\leq 2\int_{\mathbb{R}_{+}}x_{1}F_{K}\left(\mu[x_{1},\infty)\right)\gamma e^{-\lambda x_{1}}\nu(\mathrm{d}x_{1})$	(227)
	$\displaystyle=2\gamma^{2}\lambda\int_{\mathbb{R}_{+}}F_{K}\left(\mu[x,\infty)\right)e^{-2\lambda x}\mathrm{d}x.$	(228)

For any $a>0$ , we divide the integral into two parts and bound each part separately. We denote $b=\log(1+(\lambda a)^{-1})$ and use the fact that $\int_{0}^{a}e^{-2\lambda x}\mathrm{d}x\leq a$ and $F_{K}(t)\leq t^{K}/K!\leq(3t/K)^{K}$ . So

$\displaystyle B_{K,1}$	$\displaystyle\leq 2\gamma^{2}\lambda\left[\int_{0}^{a}e^{-2\lambda x}\mathrm{d}x+F_{K}\left(\mu[a,\infty)\right)\int_{a}^{\infty}e^{-2\lambda x}\mathrm{d}x\right]$	(229)
	$\displaystyle\leq 2\gamma^{2}\lambda\left[a+\frac{1}{2\lambda}F_{K}\left(\gamma\lambda\log(1+(\lambda a)^{-1})\right)\right]$	(230)
	$\displaystyle\leq 2\gamma^{2}\lambda\left[\left(\lambda(e^{b}-1)\right)^{-1}+\frac{1}{\lambda}\left(\frac{3\gamma\lambda b}{K}\right)^{K}\right].$	(231)

Setting two terms equal and use the fact that $(e^{b}-1)^{-1}\approx e^{-b}$ when $b$ is large, we get $b=KW_{0}((3\gamma\lambda)^{-1})$ and $W_{0}$ is defined by

\displaystyle W_{0}(y)=x\iff xe^{x}=y.

(232)

Therefore,

\displaystyle B_{K,1}\leq\frac{4\gamma^{2}}{e^{KW_{0}((3\gamma\lambda)^{-1})}-1}\sim 4\gamma^{2}e^{-KW_{0}\left((3\gamma\lambda)^{-1}\right)}.

(233)

Similarly,

		$\displaystyle\frac{1}{2(K-1)}B_{K,2}$		(234)
	$\displaystyle=$	$\displaystyle\int_{\mathbb{R}_{+}^{2}}x_{1}x_{2}\mathbb{E}\left[\frac{1}{\Gamma_{K}}\mathds{1}\left(\mu[x_{2},\infty)\leq\Gamma_{K}\leq\mu[x_{1},\infty)\right)\right]\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}).$		(235)

Note that

		$\displaystyle\mathbb{E}\left[\frac{1}{\Gamma_{K}}\mathds{1}\left(\mu[x_{2},\infty)\leq\Gamma_{K}\leq\mu[x_{1},\infty)\right)\right]$		(236)
	$\displaystyle=$	$\displaystyle\frac{1}{K-1}\left[F_{K-1}\left(\mu[x_{1},\infty)\right)-F_{K-1}\left(\mu[x_{2},\infty)\right)\right].$		(237)

Keeping only the positive part,

	$\displaystyle B_{K,2}$	$\displaystyle\leq 2\int_{\mathbb{R}_{+}}x_{1}F_{K-1}\left(\mu[x_{1},\infty)\right)\int_{x_{1}}^{\infty}\gamma\lambda e^{-\lambda x_{2}}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1})$		(238)
		$\displaystyle=2\gamma^{2}\lambda\int_{\mathbb{R}_{+}}F_{K-1}\left(\mu[x,\infty)\right)e^{-2\lambda x}\mathrm{d}x.$		(239)

This has the same form as $B_{K,1}$ , so

\displaystyle B_{K,2}\leq\frac{4\gamma^{2}}{e^{(K-1)W_{0}((3\gamma\lambda)^{-1})}-1}\sim 4\gamma^{2}e^{-(K-1)W_{0}\left((3\gamma\lambda)^{-1}\right)}.

(240)

Next,

	$\displaystyle B_{K,3}$	(241)
$\displaystyle=$	$\displaystyle 2\mathbb{E}\left[\int_{\mathbb{R}_{+}}x\mu^{\leftarrow}(\Gamma_{K})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\left(\mu^{\leftarrow}(\Gamma_{K})\right)\mathds{1}\left[\Gamma_{K}\leq\mu[x,\infty)\right]\right]\nu(\mathrm{d}x)$	(242)
$\displaystyle=$	$\displaystyle 2\mathbb{E}\left[\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}x\mu^{\leftarrow}(\Gamma_{K})\left(1+\lambda\mu^{\leftarrow}(\Gamma_{K})\right)e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\gamma\lambda x^{-1}e^{-\lambda x}\mathrm{d}x\right]$	(243)
$\displaystyle=$	$\displaystyle 2\gamma\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})\left(1+\lambda\mu^{\leftarrow}(\Gamma_{K})\right)e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\left(1-e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\right)\right]$	(244)
$\displaystyle\leq$	$\displaystyle 2\gamma\lambda\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})^{2}\left(1+\lambda\mu^{\leftarrow}(\Gamma_{K})\right)e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\right].$	(245)

Note that $(1+x)e^{-x}\leq 1$ , so

	$\displaystyle B_{K,3}$	$\displaystyle\leq 2\lambda\gamma\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})^{2}\right]$		(246)
		$\displaystyle\leq 2\frac{\gamma}{\lambda}\mathbb{E}\left[e^{-(\gamma\lambda)^{-1}\Gamma_{K}}\right]=2\frac{\gamma}{\lambda}\left(\frac{\gamma\lambda}{1+\gamma\lambda}\right)^{K-1}.$		(247)

Since $B_{K,3}$ will dominate $B_{K,1}$ and $B_{K,2}$ asymptotically, there exists $K_{0}\in\mathbb{N}$ such that for $K>K_{0}$ ,

\displaystyle B_{K}\leq 3B_{K,3}\leq 6\frac{\gamma}{\lambda}\left(\frac{\gamma\lambda}{1+\gamma\lambda}\right)^{K-1}.

(248)

Sparse network

When $\alpha>0$ ,

\displaystyle\nu(\mathrm{d}\theta)=\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\theta^{-\alpha-1}e^{-\lambda\theta}\mathrm{d}\theta,\qquad\mu(\mathrm{d}\theta)=\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\theta^{-\alpha-1}\mathrm{d}\theta,

(249)

and

\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=e^{-\lambda\theta},

\displaystyle\mu[x,\infty)=\gamma^{\prime}x^{-\alpha},

\displaystyle\mu^{\leftarrow}(u)=(\gamma^{\prime}u^{-1})^{1/\alpha},

\displaystyle\gamma^{\prime}=\gamma\frac{\lambda^{1-\alpha}}{\alpha\Gamma(1-\alpha)}.

(250)

Similar to the example when $\alpha=0$ ,

$\displaystyle B_{K,1}$	$\displaystyle=\int_{\mathbb{R}_{+}^{2}}x_{1}x_{2}F_{K}\left(\mu[\max\{x_{1},x_{2}\},\infty)\right)\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2})$	(251)
	$\displaystyle=2\int_{\mathbb{R}_{+}}x_{1}\int_{x_{1}}^{\infty}F_{K}\left(\mu[x_{2},\infty)\right)\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}x_{2}^{-\alpha}e^{-\lambda x_{2}}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1})$	(252)
	$\displaystyle\leq 2\gamma\int_{\mathbb{R}_{+}}x_{1}F_{K}\left(\mu[x_{1},\infty)\right)\int_{x_{1}}^{\infty}\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}x_{2}^{-\alpha}e^{-\lambda x_{2}}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}).$	(253)

Note that the integrand with respect to $x_{2}$ is the density function of the gamma distribution with shape $\alpha$ and rate $\lambda$ , so the integral is less than 1. We partition the outer integral into two parts and bound them separately,

$\displaystyle B_{K,1}$	$\displaystyle\leq 2\gamma\int_{\mathbb{R}_{+}}F_{K}\left(\mu[x,\infty)\right)\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}x^{-\alpha}e^{-\lambda x}\mathrm{d}x$	(254)
	$\displaystyle\leq 2\gamma^{2}\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\left[\int_{0}^{a}x^{-\alpha}\mathrm{d}x+F_{K}\left(\mu[a,\infty)\right)\int_{a}^{\infty}x^{-\alpha}e^{\lambda x}\mathrm{d}x\right]$	(255)
	$\displaystyle\leq 2\gamma^{2}\frac{1}{\Gamma(1-\alpha)}\left[\frac{\lambda^{1-\alpha}}{1-\alpha}a^{1-\alpha}+\Gamma(1-\alpha)\left(\frac{3\gamma^{\prime}a^{-\alpha}}{K}\right)^{K}\right].$	(256)

By setting the two terms in the brackets equal, we get

\displaystyle a=\left[\frac{(1-\alpha)\Gamma(1-\alpha)}{\lambda^{1-\alpha}}\right]^{\frac{1}{(K-1)\alpha+1}}\left(\frac{3\gamma^{\prime}}{K}\right)^{\frac{K}{(K-1)\alpha+1}}\sim\left(\frac{3\gamma^{\prime}}{K}\right)^{\frac{1}{\alpha}}.

(257)

\displaystyle B_{K,1}\leq\frac{4\gamma^{2}\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\frac{3\gamma^{\prime}}{K}\right)^{\frac{1-\alpha}{\alpha}}.

(258)

Similar to the last example where $\alpha=0$ , here

	$\displaystyle B_{K,2}$	(259)
$\displaystyle=$	$\displaystyle 2(K-1)\int_{\mathbb{R}_{+}^{2}}x_{1}x_{2}\mathbb{E}\left[\frac{1}{\Gamma_{K}}\mathds{1}\left(\mu[x_{2},\infty)\leq\Gamma_{K}\leq\mu[x_{1},\infty)\right)\right]\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2})$	(260)
$\displaystyle\leq$	$\displaystyle 2\int_{\mathbb{R}_{+}}x_{1}F_{K-1}\left(\mu[x_{1},\infty)\right)\int_{x_{1}}^{\infty}x_{2}\nu(\mathrm{d}x_{2})\nu(\mathrm{d}x_{1})$	(261)
$\displaystyle\leq$	$\displaystyle 2\gamma\int_{\mathbb{R}_{+}}x_{1}F_{K-1}\left(\mu[x_{1},\infty)\right)\nu(\mathrm{d}x_{1}).$	(262)

This has the same form as $B_{K,1}$ , and therefore

\displaystyle B_{K,2}\leq\frac{4\gamma^{2}\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\frac{3\gamma^{\prime}}{K-1}\right)^{\frac{1-\alpha}{\alpha}}.

(263)

Finally, since both $e^{-\lambda x}<1$ and $e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}<1$ ,

	$\displaystyle B_{K,3}$	(264)
$\displaystyle=$	$\displaystyle 2\mathbb{E}\left[\int_{\mathbb{R}_{+}}\mu^{\leftarrow}(\Gamma_{K})xe^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\mathds{1}\left[x\leq\mu^{\leftarrow}(\Gamma_{K})\right]\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}x^{-1-\alpha}e^{-\lambda x}\mathrm{d}x\right]$	(265)
$\displaystyle\leq$	$\displaystyle 2\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}x^{-\alpha}\mathrm{d}x\right]$	(266)
$\displaystyle\leq$	$\displaystyle 2\gamma\frac{\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})^{2-\alpha}\right]$	(267)
$\displaystyle=$	$\displaystyle 2\gamma\frac{\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\gamma^{\prime}\right)^{\frac{2-\alpha}{\alpha}}\frac{\Gamma(K-\frac{2-\alpha}{\alpha})}{\Gamma(K)}.$	(268)

By Stirling’s formula,

\displaystyle\frac{\Gamma(K-(2-\alpha)/\alpha)}{\Gamma(K)}\sim\frac{\mathchoice{{\hbox{$\displaystyle\sqrt{2\pi(K-\frac{2-\alpha}{\alpha})\,}$}\lower 0.4pt\hbox{\vrule height=8.59721pt,depth=-6.8778pt}}}{{\hbox{$\textstyle\sqrt{2\pi(K-\frac{2-\alpha}{\alpha})\,}$}\lower 0.4pt\hbox{\vrule height=7.5pt,depth=-6.00003pt}}}{{\hbox{$\scriptstyle\sqrt{2\pi(K-\frac{2-\alpha}{\alpha})\,}$}\lower 0.4pt\hbox{\vrule height=5.25pt,depth=-4.20003pt}}}{{\hbox{$\scriptscriptstyle\sqrt{2\pi(K-\frac{2-\alpha}{\alpha})\,}$}\lower 0.4pt\hbox{\vrule height=4.2986pt,depth=-3.4389pt}}}\left(\frac{K-(2-\alpha)/\alpha}{e}\right)^{K-(2-\alpha)/\alpha}}{\mathchoice{{\hbox{$\displaystyle\sqrt{2\pi K\,}$}\lower 0.4pt\hbox{\vrule height=6.83331pt,depth=-5.46667pt}}}{{\hbox{$\textstyle\sqrt{2\pi K\,}$}\lower 0.4pt\hbox{\vrule height=6.83331pt,depth=-5.46667pt}}}{{\hbox{$\scriptstyle\sqrt{2\pi K\,}$}\lower 0.4pt\hbox{\vrule height=4.78333pt,depth=-3.82668pt}}}{{\hbox{$\scriptscriptstyle\sqrt{2\pi K\,}$}\lower 0.4pt\hbox{\vrule height=3.41666pt,depth=-2.73334pt}}}\left(\frac{K}{e}\right)^{K}}\sim K^{-\frac{2-\alpha}{\alpha}}.

(269)

\displaystyle B_{K,3}\leq 2\gamma\frac{\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\gamma^{\prime}\right)^{\frac{2-\alpha}{\alpha}}K^{-\frac{2-\alpha}{\alpha}}.

(270)

$B_{K,2}$ dominates $B_{K,1}$ and $B_{K,3}$ asymptotically, so there exists $K_{0}\in\mathbb{N}$ such that for $K>K_{0}$ ,

\displaystyle B_{K}\leq\frac{12\gamma^{2}\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\frac{3\gamma^{\prime}}{K-1}\right)^{\frac{1-\alpha}{\alpha}}.

(271)

Appendix D Truncated inference

Proof of Theorem 4.2.

Fix $K\in\mathbb{N}$ and $\epsilon>0$ , and define the subset of state space $A=\left\{X_{N+1}^{K+}=0,B(\Gamma_{1:K},\sigma)\leq\frac{\epsilon}{N+1}\right\}$ . By assumption,

\displaystyle\hat{\Pi}\left(A\right)\geq 1-\eta.

(272)

Further, by applying the bound from Theorem 4.1, we know that for states in $A$ ,

\displaystyle 1-\epsilon\leq p(X_{1:N+1}^{K+}\,|\,\theta_{1:K},\sigma)\leq 1.

(273)

Suppose $p$ is the RHS of Eq. 55 that is proportional to the density of $\Pi$ , and $\hat{p}$ is the RHS of Eq. 55 removing the term $p(X_{1:N}^{K+}|\theta_{1:K},\sigma)$ , which is proportional to the density of $\hat{\Pi}$ :

\displaystyle p\leq\hat{p},\qquad\text{and within $A$,}\quad p\geq(1-\epsilon)\hat{p}.

(274)

Define the normalization constants $Z,\hat{Z}$ such that $\int p/Z=\int\hat{p}/\hat{Z}=1$ . Then the above bounds yield

\displaystyle\hat{Z}=\!\int\hat{p}\geq\!\int p=\!\int_{A}p+\!\int_{A^{c}}p\geq\!\int_{A}(1-\epsilon)\hat{p}=(1-\epsilon)\frac{\int_{A}\hat{p}}{\int\hat{p}}\!\int\hat{p}\geq(1-\epsilon)(1-\eta)\hat{Z}.

(275)

Therefore, $(1-\epsilon)(1-\eta)\hat{Z}\leq Z\leq\hat{Z}$ , and hence

\displaystyle\Pi(A)=\int_{A}p/Z\geq(1-\epsilon)\int_{A}\hat{p}/Z\geq(1-\epsilon)\int_{A}\hat{p}/\hat{Z}\geq(1-\epsilon)(1-\eta).

(276)

The above results yield the total variation bound via

$\displaystyle\frac{1}{2}\int\left\|\frac{p}{Z}-\frac{\hat{p}}{\hat{Z}}\right\|$	$\displaystyle=\frac{1}{2}\int_{A}\left\|\frac{p}{Z}-\frac{\hat{p}}{\hat{Z}}\right\|+\frac{1}{2}\int_{A^{c}}\left\|\frac{p}{Z}-\frac{\hat{p}}{\hat{Z}}\right\|$	(277)
	$\displaystyle\leq\frac{1}{2}\int_{A}\left\|\frac{p}{Z}-\frac{p}{\hat{Z}}\right\|+\frac{1}{2}\int_{A}\left\|\frac{p}{\hat{Z}}-\frac{\hat{p}}{\hat{Z}}\right\|+\frac{1}{2}\left(\Pi(A^{c})+\hat{\Pi}(A^{c})\right)$	(278)
	$\displaystyle=\frac{1}{2}\left\|\frac{1}{Z}-\frac{1}{\hat{Z}}\right\|\int_{A}p+\frac{1}{2}\frac{1}{\hat{Z}}\int_{A}(\hat{p}-p)+\frac{1}{2}\left(\Pi(A^{c})+\hat{\Pi}(A^{c})\right)$	(279)
	$\displaystyle\leq\frac{1}{2}\left(1-(1-\epsilon)(1-\eta)\right)+\frac{1}{2}\epsilon+\frac{1}{2}\left(1-(1-\eta)(1-\epsilon)+\eta\right)$	(280)
	$\displaystyle=\frac{3(\epsilon+\eta)}{2}-\epsilon\eta.$	(281)

∎

	$\displaystyle\mathbb{P}\left(I_{N}\leq K\,\|\,U_{1:K},\Gamma_{1:K}\right)$
	$\displaystyle=\mathbb{E}\left[\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right\}\,\|\,U_{1:K},\Gamma_{1:K}\right]$
	$\displaystyle\geq\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\mathbb{E}\left[\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\,\|\,U_{1:K},\Gamma_{1:K}\right]\right\}.$		(102)

Truncated Simulation and Inference in Edge-Exchangeable Networks

Abstract

keywords:

1 Introduction

2 Background

2.1 Completely random measures and self-products

2.2 Series representations

2.3 Edge-exchangeable graphs

3 Truncated generative simulation

3.1 Truncation error bound

Lemma 3.1.

Corollary 3.2.

3.2 Independent likelihood process

Corollary 3.3.

Theorem 3.4.

3.3 Categorical likelihood process

Theorem 3.5.

Theorem 3.6.

3.4 Examples

Beta-Bernoulli process network

Gamma-independent Poisson network

4 Truncated posterior inference

4.1 Truncation error bound

Theorem 4.1.

Theorem 4.2.

4.2 Adaptive truncated Metropolis–Hastings

4.3 Experiments

Synthetic data

Real network data

5 Conclusion

References

Appendix A Equivalence between nonzero rates from a rejection representation and the inverse Lévy representation

Proof of Theorem 3.5.

Appendix B Truncation error bounds for self-product measures

Proof of Lemma 3.1.

Proof of Theorems 3.4 and 4.1.

B.1 Proof of Theorem 3.6

Lemma B.1.

Proof of Theorem 3.6.

Appendix C Error bounds for edge-exchangeable networks

C.1 Rejection representation

C.2 Beta-independent Bernoulli process network

Dense network

Sparse network

C.3 Gamma-independent Poisson network

Dense network

Sparse network

Appendix D Truncated inference

Proof of Theorem 4.2.

Truncated Simulation and Inference in
Edge-Exchangeable Networks