
Truncated Simulation and Inference in
Edge-Exchangeable Networks

Xinglong Li and Trevor Campbell
Department of Statistics, The University of British Columbia
email: xinglong.li@stat.ubc.ca; trevor@stat.ubc.ca
Abstract

Edge-exchangeable probabilistic network models generate edges as an i.i.d. sequence from a discrete measure, providing a simple means for statistical inference of latent network properties. The measure is often constructed using the self-product of a realization from a Bayesian nonparametric (BNP) discrete prior; but unlike in standard BNP models, the self-product measure prior is not conjugate to the likelihood, hindering the development of exact simulation and inference algorithms. Approximation via finite truncation of the discrete measure is a straightforward alternative, but incurs an unknown approximation error. In this paper, we develop methods for forward simulation and posterior inference in random self-product-measure models based on truncation, and provide theoretical guarantees on the quality of the results as a function of the truncation level. The techniques we present are general and extend to the broader class of discrete Bayesian nonparametric models.

Keywords: truncation, Bayesian nonparametrics, edge-exchangeable, networks, Bayesian inference

This work is supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant and Discovery Launch Supplement.


1 Introduction

Probabilistic generative models have for many years been key tools in the analysis of network data [1, 2]. Recent work in the area [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13] has begun to incorporate the use of nonparametric discrete measures, in an effort to address the limitations of traditional models in capturing the sparsity of real large-scale networks [14]. These models construct a discrete random measure $\Theta$ (often a completely random measure, or CRM [15]) on a space $\Psi$, associate each atom of the measure with a vertex in the network, and then use the self-product of the measure—i.e., the measure $\Theta\times\Theta$ on $\Psi^2$—to represent the magnitude of interaction between vertices.

While the inclusion of a nonparametric measure enables capturing sparsity, it also makes both generative simulation and posterior inference via Markov chain Monte Carlo (MCMC) [16; 17, Ch. 11, 12] significantly more challenging. In standard Bayesian models with discrete nonparametric measures—such as the Dirichlet process mixture model [18] or beta process latent feature model [19]—this issue is typically addressed by exploiting the conjugacy of the (normalized) completely random measure prior and the likelihood to marginalize the latent infinite discrete measure [20]. In the case of nonparametric network models, however, there is no such reprieve; the self-product of a completely random measure is not a completely random measure, and exact marginalization is typically not possible.

Another option is to truncate the discrete CRM to have finitely many atoms, and perform simulation and inference based on the truncated CRM. Exact truncation schemes based on auxiliary variables [21, 22, 23] are limited to models where certain tail probabilities can be computed exactly. Fixed truncation [24, 25, 26, 27], on the other hand, is easy to apply to any CRM-based model; but it involves an approximation with potentially unknown error. Although the approximation error of truncated CRM models has been thoroughly studied in past work [28, 29, 30, 27, 31], these results apply only to generative simulation—i.e., not inference—and do not apply to self-product CRMs that commonly appear in network models.

In this work, we provide tractable methods for both generative simulation and posterior inference of discrete Bayesian nonparametric models based on truncation, as well as rigorous theoretical analyses of the error incurred by truncation in both scenarios. In particular, our theory and methods require only the ability to compute bounds on—instead of exact evaluation of—intractable tail probabilities. Our work focuses on the case of self-product-measure-based edge-exchangeable network sequences [3, 5, 32, 33], whose edges are simulated i.i.d. conditional on the discrete random product measure $\Theta\times\Theta$, but the ideas here apply without much effort to the broader class of discrete Bayesian nonparametric models. As an intermediate step of possible independent interest, we also show that the nonzero rates generated from the rejection representation [34] of a Poisson process have the same distribution as the well-known but typically intractable inverse Lévy or Ferguson-Klass representation [35]. This provides a novel method for simulating the inverse Lévy representation, which has a wide variety of uses in applications of Poisson processes [36, 37, 38].

2 Background

2.1 Completely random measures and self-products

A completely random measure (CRM) $\Theta$ on $\Psi$ is a random measure such that for any collection of $K\in\mathbb{N}$ disjoint measurable sets $A_1,\dots,A_K\subset\Psi$, the values $\Theta(A_1),\dots,\Theta(A_K)$ are independent random variables [15]. In this work, we focus on discrete CRMs taking the form $\Theta=\sum_{k}\theta_{k}\delta_{\psi_{k}}$, where $\delta_x$ is a Dirac measure on $\Psi$ at location $x\in\Psi$ (i.e., $\delta_x(A)=1$ if $x\in A$ and 0 otherwise), and $(\theta_k,\psi_k)_{k=1}^{\infty}$ are a sequence of rates $\theta_k$ and labels $\psi_k$ generated from a Poisson process on $\mathbb{R}_+\times\Psi$ with mean measure $\nu(\mathrm{d}\theta)\times L(\mathrm{d}\psi)$. Here $L$ is a diffuse probability measure, and $\nu$ is a $\sigma$-finite measure satisfying $\nu(\mathbb{R}_+)=\infty$, which guarantees that the Poisson process has countably infinitely many points almost surely. The space $\Psi$ and distribution $L$ will not affect our analysis; thus as a shorthand, we write $\mathrm{CRM}(\nu)$ for the distribution of $\Theta$:

$$\Theta := \sum_{k}\theta_{k}\delta_{\psi_{k}} \sim \mathrm{CRM}(\nu). \qquad (2)$$

One can construct a multidimensional measure $\Theta^{(d)}$ on $\Psi^d$, $d\in\mathbb{N}$, from $\Theta$ defined in Eq. 2 by taking its self-product. In particular, we define

$$\Theta^{(d)} := \sum_{i\in\mathbb{N}_{\neq}^{d}}\vartheta_{i}\delta_{\zeta_{i}}, \qquad \vartheta_{i} := \prod_{j=1}^{d}\theta_{i_{j}}, \qquad \zeta_{i} := (\psi_{i_{1}},\psi_{i_{2}},\dots,\psi_{i_{d}}), \qquad (3)$$

where $i$ is a $d$-dimensional multi-index, and $\mathbb{N}_{\neq}^{d}$ is the set of such indices with all distinct components. Note that $\Theta^{(d)}$ is no longer a CRM, as it does not satisfy the independence condition.
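For concreteness, the following minimal sketch (in Python with NumPy, assuming a finite vector of rates, e.g. from a truncated CRM; the function name and interface are illustrative only) enumerates the rates $\vartheta_i$ of the self-product measure in Eq. 3 for small $d$.

```python
import numpy as np
from itertools import permutations

def self_product_rates(theta, d=2):
    # Rates of the self-product measure restricted to the first K = len(theta)
    # atoms: vartheta_i = prod_j theta_{i_j} over multi-indices i in [K]^d
    # with all components distinct (Eq. 3).
    K = len(theta)
    idx = list(permutations(range(K), d))
    vartheta = np.array([np.prod([theta[j] for j in i]) for i in idx])
    return idx, vartheta

# toy example with three atoms and d = 2
idx, vartheta = self_product_rates(np.array([0.5, 0.3, 0.1]), d=2)
```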

2.2 Series representations

To simulate a realization $\Theta\sim\mathrm{CRM}(\nu)$—e.g., as a first step in the simulation of a self-product measure $\Theta^{(d)}$—the rates $\theta_k$ and labels $\psi_k$ may be generated in sequence using a series representation [39] of the CRM. In particular, we begin by simulating the jumps of a unit-rate homogeneous Poisson process $(\Gamma_k)_{k=1}^{\infty}$ on $\mathbb{R}_+$ in increasing order. For a given distribution $g$ on $\mathbb{R}_+$ and nonnegative measurable function $\tau:\mathbb{R}_+\times\mathbb{R}_+\to\mathbb{R}_+$, we set

$$\Theta = \sum_{k=1}^{\infty}\theta_{k}\delta_{\psi_{k}}, \qquad \theta_{k} = \tau(U_{k},\Gamma_{k}), \qquad U_{k}\overset{\text{i.i.d.}}{\sim} g, \qquad \psi_{k}\overset{\text{i.i.d.}}{\sim} L. \qquad (4)$$

Depending on the particular choice of $\tau$ and $g$, one can construct several different series representations of a CRM [28]. For example, the inverse Lévy representation [35] has the form

$$\theta_{k} = \nu^{\leftarrow}(\Gamma_{k}), \qquad \nu^{\leftarrow}(x) := \inf\left\{y : \nu\left([y,\infty)\right)\leq x\right\}. \qquad (5)$$

In many cases, computing $\nu^{\leftarrow}(x)$ is intractable, making it hard to generate $\theta_k$ in this manner. Alternatively, we can generate a series of rates from $\mathrm{CRM}(\nu)$ with the rejection representation [34], which has the form

$$\theta_{k} = T_{k}\,\mathds{1}\!\left(\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(T_{k})\geq U_{k}\right), \qquad T_{k} = \mu^{\leftarrow}(\Gamma_{k}), \qquad U_{k}\overset{\text{i.i.d.}}{\sim}\mathrm{Unif}[0,1], \qquad (6)$$

where $\mu$ is a measure on $\mathbb{R}_+$ chosen such that $\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\leq 1$ uniformly and $\mu^{\leftarrow}(x)$ is easy to calculate in closed-form. While there are many other sequential representations of CRMs [28], the representations in Eqs. 4, 5 and 6 are broadly applicable and play a key role in our theoretical analysis.
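To make the rejection representation concrete, here is a minimal sketch (assuming user-supplied, vectorized functions `mu_inverse` for $\mu^{\leftarrow}$ and `dnu_dmu` for $\mathrm{d}\nu/\mathrm{d}\mu$; names are illustrative) that simulates the first $K$ iterations of Eq. 6, setting rejected rates to zero.

```python
import numpy as np

def rejection_rates(mu_inverse, dnu_dmu, K, rng=None):
    # First K iterations of the rejection representation (Eq. 6):
    # Gamma_1 < Gamma_2 < ... are the jumps of a unit-rate Poisson process,
    # T_k = mu_inverse(Gamma_k), and T_k is kept iff U_k <= dnu/dmu(T_k).
    rng = np.random.default_rng(rng)
    gammas = np.cumsum(rng.exponential(1.0, size=K))
    T = mu_inverse(gammas)
    U = rng.uniform(size=K)
    return np.where(U <= dnu_dmu(T), T, 0.0)
```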

2.3 Edge-exchangeable graphs

Self-product measures $\Theta^{(d)}$ of the form Eq. 3 with $d=2$ have recently been used as priors in a number of probabilistic network models [3, 4]. (There are also network models based on self-product measure priors that do not generate edge-exchangeable sequences [8, 11]; it is likely that many of the techniques in the present work would extend to these models, but we leave the study of this to future work.) The present work focuses on those models that associate each $\psi_k$ with a vertex, each tuple $\zeta_i=(\psi_{i_1},\dots,\psi_{i_d})$ with a (hyper)edge, and then build a growing sequence of networks by sequentially generating edges from $\Theta^{(d)}$ in rounds $n=1,\dots,N$. There are a number of choices for how to construct such a sequence. For example, in each round $n$ we may add multiple edges via an independent likelihood process $X_n\sim\mathrm{LP}(h,\Theta^{(d)})$ defined by

$$X_{n} := \sum_{i\in\mathbb{N}_{\neq}^{d}}x_{ni}\delta_{\zeta_{i}}, \qquad x_{ni}\overset{\text{indep}}{\sim} h(\cdot\,|\,\vartheta_{i}), \qquad (7)$$

where $x_{ni}=k$ denotes that there were $k$ copies of edge $\zeta_i$ added at round $n$, and $h(\cdot|\vartheta)$ is a probability distribution on $\mathbb{N}\cup\{0\}$. We denote the mean $\mu(\vartheta):=\sum_{k=0}^{\infty}k\cdot h(k\,|\,\vartheta)$ and probability of 0 under $h$ to be $\pi(\vartheta):=h(0|\vartheta)$ for convenience. By the Slivnyak-Mecke theorem [40], if $h$ satisfies

$$\int_{\mathbb{R}_{+}^{d}}\mu\left(\prod_{j=1}^{d}\theta_{j}\right)\prod_{j=1}^{d}\nu(\mathrm{d}\theta_{j})<\infty, \qquad (8)$$

then finitely many edges are added to the graph in each round. We make this assumption throughout this work. Alternatively, if

$$\int_{\mathbb{R}_{+}}\min(1,\theta)\,\nu(\mathrm{d}\theta)<\infty, \qquad (9)$$

then $\Omega:=\Theta^{(d)}(\Psi^{d})<\infty$, and we may add only a single edge per round $n$ via a categorical likelihood process $X_n\sim\mathrm{Categorical}(\Theta^{(d)})$ defined by

$$X_{n} := \delta_{\zeta_{I_{n}}}, \qquad I_{n}\sim\mathrm{Categorical}\left(\left(\frac{\vartheta_{i}}{\Omega}\right)_{i\in\mathbb{N}_{\neq}^{d}}\right). \qquad (10)$$

This construction has appeared in [4], where $\Theta$ follows a Dirichlet process, which can be seen as a normalized gamma process [41]. Note that our definition of $\mathrm{Categorical}(\Theta^{(d)})$ involves simulating from the normalization; we use this definition to avoid introducing new notation for normalized processes.

Using either likelihood process, the edges in the network after $N$ rounds are

$$\sum_{n=1}^{N}X_{n} := \sum_{i\in\mathbb{N}_{\neq}^{d}}x_{i}\delta_{\zeta_{i}}, \qquad x_{i} := \sum_{n=1}^{N}x_{ni}, \qquad (11)$$

i.e., $x_i\in\mathbb{N}\cup\{0\}$ represents the count of edge $i$ after $N$ rounds.

There are three points to note about this formulation. First, since the atom locations $\zeta_i$ are not used, we can represent the network using only its array of edge counts

$$E_{N} := (x_{i})_{i\in\mathbb{N}_{\neq}^{d}}\in\mathcal{N}_{d}, \qquad (12)$$

where $\mathcal{N}_d$ denotes the set of integer arrays indexed by $\mathbb{N}_{\neq}^{d}$ with finitely many nonzero entries. Note that $\mathcal{N}_d$ is a countable set. Second, by construction, the distribution of $E_N$ is invariant to reorderings of the arrival of edges, and thus the network is edge-exchangeable [3, 5, 6, 7]. Finally, note that the network $E_N$ as formulated in Eq. 12 is in general a directed multigraph with no self-loops (due to the restriction to indices $i\in\mathbb{N}_{\neq}^{d}$ rather than $\mathbb{N}^{d}$). Although the main theoretical results in this work are developed in this setting, we provide an additional result in Section 3.1 to translate to other common network structures (e.g. binary undirected networks).
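As a rough illustration of the independent likelihood process in Eq. 7, the sketch below (assumptions: $d=2$, a finite truncated rate vector, and a user-supplied per-edge sampler for $h$; all names are illustrative) accumulates the edge-count array of Eq. 12 over $N$ rounds.

```python
import numpy as np

def truncated_network_counts(theta_K, h_sampler, N, rng=None):
    # Edge counts x_i (Eq. 12) for d = 2 over the first K vertices:
    # for each round n and each ordered pair (j, k) with j != k, draw
    # x_{n,(j,k)} ~ h(. | theta_j * theta_k) and accumulate.
    rng = np.random.default_rng(rng)
    vartheta = np.outer(theta_K, theta_K)
    np.fill_diagonal(vartheta, 0.0)  # indices must have distinct components (no self-loops)
    counts = np.zeros_like(vartheta, dtype=int)
    for _ in range(N):
        counts += h_sampler(vartheta, rng)
    return counts

# e.g. a Bernoulli likelihood, valid when all rates lie in [0, 1]
bernoulli_h = lambda v, rng: rng.binomial(1, np.clip(v, 0.0, 1.0))
```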

3 Truncated generative simulation

In this section, we consider the tractable generative simulation of network models via truncation, and analyze the approximation error incurred in doing so as a function of $K\in\mathbb{N}$ (the truncation level) and the number of rounds of generation $N\in\mathbb{N}$. In particular, to construct a truncated self-product measure, we first split the underlying CRM $\Theta$ into a truncation and tail component,

$$\Theta = \Theta_{K}+\Theta_{K+}, \qquad \Theta_{K} = \sum_{k=1}^{K}\theta_{k}\delta_{\psi_{k}}, \qquad \Theta_{K+} = \sum_{k=K+1}^{\infty}\theta_{k}\delta_{\psi_{k}}, \qquad (13)$$

and construct the self-product $\Theta_{K}^{(d)}$ from the truncation $\Theta_K$ as in Eq. 3. Fig. 1 provides an illustration of the truncation of $\Theta$ and $\Theta^{(2)}$, showing that $\Theta^{(2)}$ can be decomposed into a sum of four parts,

$$\Theta^{(2)} = \Theta^{2} = (\Theta_{K}+\Theta_{K+})^{2} = \Theta_{K}^{(2)}+\left(\Theta_{K}\times\Theta_{K+}+\Theta_{K+}\times\Theta_{K}+\Theta_{K+}^{2}\right). \qquad (14)$$

Thus, while we only discard $\Theta_{K+}$ in truncating $\Theta$ to $\Theta_K$, we discard three parts in truncating $\Theta^{(2)}$ to $\Theta_{K}^{(2)}$; and in general, we discard $2^d-1$ parts of $\Theta^{(d)}$ when truncating it to $\Theta_{K}^{(d)}$. We therefore intuitively might expect higher truncation error when approximating $\Theta_{K}^{(d)}\approx\Theta^{(d)}$ than when approximating $\Theta_{K}\approx\Theta$; in Sections 3.2 and 3.3, we will show that this is indeed the case.

Figure 1: An illustration of the difference between truncation of CRMs ($d=1$, panel (a): truncation of $\Theta$) and self-product CRMs with $d=2$ (panel (b): truncation of $\Theta^{(2)}$). Intuitively, increasing $d$ means that a higher proportion of mass is discarded in the truncation process.

Once the measure is truncated, a truncated network—based on $\Theta_{K}^{(d)}$—can be constructed in the same manner as the original network using the independent likelihood process Eq. 7 or categorical likelihood process Eq. 10. We denote $E_{N,K}=(x_{i,K})_{i\in\mathbb{N}^{d}_{\neq}}\in\mathcal{N}_{d}$ to be the corresponding edge set of the truncated network up to round $N$, where $x_{i,K}=0$ for any index $i\in\mathbb{N}^{d}_{\neq}$ such that some component $i_j>K$. We keep $E_N$ and $E_{N,K}$ in the same space in order to compare their distributions in Sections 3.1, 3.2 and 3.3.

3.1 Truncation error bound

We formulate the approximation error incurred by truncation as the total variation distance between the marginal network distributions, i.e., of $E_N$ and $E_{N,K}$. The first step in the analysis of truncation error—provided by Lemma 3.1—is to show that this is bounded above by the probability that there are edges in the full network $E_N$ involving vertices beyond the truncation level $K$. To this end, we denote the maximum vertex index of $E_N$ to be

$$I_{N} := \max_{i\in\mathbb{N}_{\neq}^{d}}\left(\max_{j\in[d]}\,i_{j}\right)\quad\text{s.t.}\quad x_{i}>0, \qquad (15)$$

and note that by definition, $I_N\leq K$ if and only if all edges in $E_N$ fall in the truncated region.

Lemma 3.1.

Let $\Theta=\sum_{k=1}^{\infty}\theta_{k}\delta_{\psi_{k}}$ be a random discrete measure, and $\Theta_{K}=\sum_{k=1}^{K}\theta_{k}\delta_{\psi_{k}}$ be its truncation to $K$ atoms. Let $\Theta^{(d)}$ be the self-product of $\Theta$, and $\Theta^{(d)}_{K}$ be the self-product of $\Theta_K$. Let $P_N$ and $P_{N,K}$ be the marginal distributions of edge sets $E_N,E_{N,K}\in\mathcal{N}_d$ under either the independent or categorical likelihood process. Then

$$\mathrm{D_{TV}}\left(P_{N},P_{N,K}\right)\leq 1-\mathbb{P}\left(I_{N}\leq K\right). \qquad (16)$$

As mentioned in Section 2.3, the network $E_N$ is in general a directed multigraph with no self-loops. However, Lemma 3.1—and the downstream truncation error bounds presented in Sections 3.2 and 3.3—also apply to any graph $E^{\prime}_{N}=(x^{\prime}_{i})_{i\in\mathbb{N}^{d}_{\neq}}$ that is a function of the original graph $E^{\prime}_{N}=f(E_{N})$ such that truncation commutes with the function, i.e., $E^{\prime}_{N,K}=f(E_{N,K})$. For example, to obtain a truncation error bound for the common setting of undirected binary graphs, we generate the directed multigraph $E_N$ as above and construct the undirected binary graph $E^{\prime}_{N}$ via

$$x^{\prime}_{i} = \mathds{1}_{x_{i}>0}\cdot\mathds{1}_{i_{1}<i_{2}<\dots<i_{d}},\qquad i\in\mathbb{N}^{d}_{\neq}. \qquad (17)$$
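In code, for $d=2$ this map is a one-liner; a minimal sketch (assuming a $K\times K$ matrix of edge counts as in the earlier sketch; the function name is illustrative):

```python
import numpy as np

def to_binary_undirected(counts):
    # Eq. 17 with d = 2: keep x'_{(i1,i2)} = 1[x_{(i1,i2)} > 0] only for
    # index pairs with i1 < i2, discarding all other orderings.
    return (np.triu(counts, k=1) > 0).astype(int)
```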

Corollary 3.2 provides the precise statement of the result; note that the bound is identical to that from Lemma 3.1.

Corollary 3.2.

Let $E^{\prime}_{N}:=(x^{\prime}_{i})_{i\in\mathbb{N}^{d}_{\neq}}\in\mathcal{N}_{d}$ be the set of edges of a network with truncation $E^{\prime}_{N,K}\in\mathcal{N}_{d}$, and denote their marginal distributions $P^{\prime}_{N}$ and $P^{\prime}_{N,K}$. If there exists a measurable function $f$ such that

$$E^{\prime}_{N}=f(E_{N})\qquad\text{and}\qquad E^{\prime}_{N,K}=f(E_{N,K}), \qquad (18)$$

then

$$\mathrm{D_{TV}}\left(P^{\prime}_{N},P^{\prime}_{N,K}\right)\leq 1-\mathbb{P}\left(I_{N}\leq K\right). \qquad (19)$$

3.2 Independent likelihood process

We now specialize Lemma 3.1 to the setting where $\Theta$ is a CRM generated by a series representation of the form Eq. 4, and the network is generated via the independent likelihood process from Eq. 7. As a first step towards a bound on the truncation error for general hypergraphs with $d>1$ in Theorem 3.4, we present a simpler corollary in the case where $d=2$, which is of direct interest in analyzing the truncation error of popular edge-exchangeable networks.

Corollary 3.3.

In the setting of Lemma 3.1, suppose $\Theta$ is a CRM generated from the series representation Eq. 4, edges are generated from the independent likelihood process Eq. 7, and $d=2\leq K$. Then

$$\mathbb{P}\left(I_{N}\leq K\right)\geq\exp\left(-N\cdot B_{K}\right), \qquad (20)$$

where

$$B_{K} = B_{K,1}+B_{K,2}+B_{K,3} \qquad (21)$$
$$B_{K,1} = \mathbb{E}\left[\int_{\mathbb{R}_{+}^{2}}-\log\pi\left(\tau(U_{1},\gamma_{1}+\Gamma_{K})\,\tau(U_{2},\gamma_{2}+\Gamma_{K})\right)\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}\right] \qquad (22)$$
$$B_{K,2} = 2\,\mathbb{E}\left[\int_{0}^{\Gamma_{K}}\frac{(K-1)}{\Gamma_{K}}\int_{\Gamma_{K}}^{\infty}-\log\pi\left(\tau(U_{1},\gamma_{1})\,\tau(U_{2},\gamma_{2})\right)\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}\right] \qquad (23)$$
$$B_{K,3} = 2\,\mathbb{E}\left[\int_{\mathbb{R}_{+}}-\log\pi\left(\tau(U_{1},\Gamma_{K})\,\tau(U_{2},\gamma+\Gamma_{K})\right)\mathrm{d}\gamma\right]. \qquad (24)$$

The proof of this result (and Theorem 3.4 below) in Appendix B follows by representing the tail of the CRM as a unit-rate Poisson process on $[\Gamma_K,\infty)$. Though perhaps complicated at first glance, an intuitive interpretation of the truncation error terms $B_{K,i}$ is provided by Fig. 1(b). $B_{K,1}$ corresponds to the truncation error arising from the upper right quadrant, where both vertices participating in the edge were in the discarded tail region. $B_{K,2}$ is the truncation error arising from the bottom right and upper left quadrants, where one of the two vertices participating in the edge was in the truncation, and the other was in the tail. Finally, $B_{K,3}$ represents the truncation error arising from edges in which one vertex was at the boundary of tail and truncation, and the other was in the tail.

Theorem 3.4 is the generalization of Corollary 3.3 from $d=2$ to the general setting of arbitrary hypergraphs with $d>1$. The bound is analogous to that in Corollary 3.3—with $B_K$ expressed as a sum of terms, each corresponding to whether vertices were in the tail, boundary, or truncation region—except that there are $d>1$ vertices participating in each edge, resulting in more terms in the sum. Note that Theorem 3.4 also guarantees that the bound is not vacuous, and indeed converges to 0 as the truncation level $K\rightarrow\infty$ as expected.

Theorem 3.4.

In the setting of Lemma 3.1, suppose $\Theta$ is a CRM generated from the series representation Eq. 4, edges are generated from the independent likelihood process Eq. 7, and $1<d\leq K$. Then

$$\mathbb{P}\left(I_{N}\leq K\right)\geq\exp\left(-N\cdot B_{K}\right), \qquad (25)$$

where

$$B_{K} = B_{K,1}+B_{K,2}+B_{K,3}, \qquad (26)$$
$$B_{K,1} = \mathbb{E}\left[\int_{[\Gamma_{K},\infty)^{d}}-\log\pi(\tilde{\theta})\,\mathrm{d}\gamma\right] \qquad (27)$$
$$B_{K,2} = \sum_{\ell=1}^{d-1}\binom{d}{\ell}\,\mathbb{E}\left[\frac{(K-1)!}{(K-1-\ell)!}\,\Gamma_{K}^{-\ell}\int_{[0,\Gamma_{K}]^{\ell}\times[\Gamma_{K},\infty)^{d-\ell}}-\log\pi(\tilde{\theta})\,\mathrm{d}\gamma\right] \qquad (28)$$
$$B_{K,3} = \sum_{\ell=1}^{d-1}\ell\binom{d}{\ell}\,\mathbb{E}\left[\frac{(K-1)!}{(K-\ell)!}\,\Gamma_{K}^{-(\ell-1)}\int_{[0,\Gamma_{K}]^{\ell}\times[\Gamma_{K},\infty)^{d-\ell}}-\delta_{\gamma_{\ell}=\Gamma_{K}}\log\pi(\tilde{\theta})\,\mathrm{d}\gamma\right], \qquad (29)$$

$\delta_{\cdot}$ is the Dirac delta, $\mathrm{d}\gamma:=\prod_{j=1}^{d}\mathrm{d}\gamma_{j}$, and $\tilde{\theta}:=\prod_{j=1}^{d}\tau(U_{j},\gamma_{j})$. Furthermore, $\lim_{K\rightarrow\infty}B_{K}=0$.

The same geometric intuition from the $d=2$ case applies to the general hypergraph truncation error in Eq. 26. $B_{K,1}$ corresponds to the error arising from the edges whose vertices all belong to the tail region $\Theta_{K+}$. Each term in the summation in $B_{K,2}$ corresponds to the error arising from edges that have $\ell$ out of $d$ vertices belonging to the truncation $\Theta_K$. Each term in the summation in $B_{K,3}$ corresponds to the error arising from the edges that have $\ell-1$ out of $d$ vertices belonging to the truncation $\Theta_K$ and have one vertex exactly on the boundary of the truncation. Note that we obtain Corollary 3.3 by taking $d=2$ in Eq. 26.

3.3 Categorical likelihood process

We may also specialize Lemma 3.1 to the setting where the network is generated via the single-edge-per-step categorical likelihood process in Eq. 10. However, truncation with the categorical likelihood process poses a few key challenges. From a practical angle, certain choices of series representation for generating $\Theta$ may be problematic. For instance, when using the rejection representation Eq. 6 in the typical case where $\mu\neq\nu$, there is a nonzero probability that

$$\sum_{k=1}^{K}\mathds{1}\left[\theta_{k}>0\right]<d, \qquad (30)$$

meaning there aren’t enough accepted vertices in the truncation to generate a single edge. In this case, the categorical likelihood process—which must generate exactly one edge per step—is ill-defined. An additional theoretical challenge arises from the normalization of the original and truncated networks in Eq. 10, which prevents the use of the usual theoretical tools for analyzing CRMs.

Fortunately, the inverse Lévy representation provides an avenue to address both issues. The rates $\theta_k$ are all guaranteed to be nonzero—meaning as long as $K\geq d$, the categorical likelihood process is well-defined—and are decreasing, which enables our theoretical analysis in Section B.1. However, as mentioned earlier, the inverse Lévy representation is well-known to be intractable to use in most practical settings.

Theorem 3.5 provides a solution: we use the rejection representation to simulate the rates $\theta_k$, but instead of simulating for iterations $k=1,\dots,K$, we simulate until we obtain $K$ nonzero rates. This is no longer a sample of a truncated rejection representation; but Theorem 3.5 shows that the first $K$ nonzero rates have the same distribution as simulating $K$ iterations of the inverse Lévy representation. Therefore, we can tailor the analysis of truncation error for the categorical likelihood process in Theorem 3.6 to the inverse Lévy representation, and simulate its truncation for any $K$ using the tractable rejection representation in practice.
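A minimal sketch of this sampling scheme (same assumed inputs as the rejection-representation sketch above; the function name is illustrative) simply runs the rejection representation until $K$ rates have been accepted:

```python
import numpy as np

def first_K_nonzero_rates(mu_inverse, dnu_dmu, K, rng=None):
    # Run the rejection representation (Eq. 6) until K rates are accepted;
    # by Theorem 3.5, the accepted rates are distributed as the first K
    # rates of the inverse Levy representation (Eq. 5).
    rng = np.random.default_rng(rng)
    rates, gamma = [], 0.0
    while len(rates) < K:
        gamma += rng.exponential(1.0)
        T = mu_inverse(gamma)
        if rng.uniform() <= dnu_dmu(T):
            rates.append(T)
    return np.array(rates)
```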

Theorem 3.5.

Let $\theta_1,\dots,\theta_K$ be the first $K$ rates from the inverse Lévy representation of a CRM, and let $\theta^{\prime}_{1},\dots,\theta^{\prime}_{K}$ be the first $K$ nonzero rates from any rejection representation of the same CRM. Then

$$(\theta_{1},\dots,\theta_{K})\overset{d}{=}(\theta^{\prime}_{1},\dots,\theta^{\prime}_{K}). \qquad (31)$$
Theorem 3.6.

In the setting of Lemma 3.1, suppose $\Theta$ is a CRM generated from the inverse Lévy representation Eq. 5, edges are generated from the categorical likelihood process Eq. 10, and $1<d\leq K$. Then

$$\mathbb{P}\left(I_{N}\leq K\right)\geq(1-B_{K})^{Nd}\geq 0, \qquad (32)$$

where

$$B_{K} := \mathbb{E}\left[\int_{-\infty}^{\infty}Q(\Gamma_{K},x)\left(\int_{0}^{1}Q(\Gamma_{K}u,x)\,\mathrm{d}u\right)^{K-d}\left(\frac{\mathrm{d}}{\mathrm{d}x}e^{\int_{0}^{\infty}Q(\Gamma_{K}+\gamma,\,x)-1\,\mathrm{d}\gamma}\right)\mathrm{d}x\right], \qquad (33)$$

and

$$Q(u,t) = \exp\left(-\nu^{\leftarrow}(u)e^{-t}\right)\quad\text{and}\quad\Gamma_{K}\sim\mathsf{Gam}(K,1). \qquad (34)$$

Furthermore, $\lim_{K\rightarrow\infty}B_{K}=0$.

3.4 Examples

We now apply the results of this section to binary undirected networks ($d=2$) constructed via Eq. 17 from common edge-exchangeable networks [3]. We derive the convergence rate of truncation error, and provide simulations of the expected number of edges and vertices under the infinite and truncated network. In each simulation we use the rejection representation to simulate $\Theta$, and run $N=10{,}000$ steps of network construction.

Beta-Bernoulli process network

Suppose $\Theta$ is generated from a beta process, and network $E_N$ is generated using the independent Bernoulli likelihood process [3]. The beta process $\mathrm{BP}(\gamma,\lambda,\alpha)$ [42] with discount parameter $\alpha\in[0,1)$, concentration parameter $\lambda>-\alpha$, and mass parameter $\gamma>0$ is a CRM with rate measure

$$\nu(\mathrm{d}\theta) = \gamma\frac{\Gamma(\lambda+1)}{\Gamma(1-\alpha)\Gamma(\lambda+\alpha)}\mathds{1}[\theta\leq 1]\,\theta^{-1-\alpha}(1-\theta)^{\lambda+\alpha-1}\,\mathrm{d}\theta. \qquad (35)$$

The Bernoulli likelihood has the form

$$h(x|\theta) = \mathds{1}[x\leq 1]\,\theta^{x}(1-\theta)^{1-x}. \qquad (36)$$

To simulate the process, we use a proposal rate measure $\mu$ given by

$$\mu(\mathrm{d}\theta) = \gamma^{\prime}\mathds{1}\left[\theta\leq 1\right]\theta^{-1-\alpha}\,\mathrm{d}\theta,\qquad \gamma^{\prime} = \gamma\frac{\Gamma(\lambda+1)}{\Gamma(1-\alpha)\Gamma(\lambda+\alpha)}. \qquad (37)$$

Dense network: When $\alpha=0$, the binary beta-Bernoulli graph is dense and

$$\mu^{\leftarrow}(u) = e^{-u/\gamma^{\prime}},\qquad \frac{\mathrm{d}\nu}{\mathrm{d}\mu} = (1-\theta)^{\lambda-1}. \qquad (38)$$

Therefore the rejection representation Eq. 6 of $\mathrm{BP}(\gamma,\lambda,0)$ can be written as

$$\theta_{k} = T_{k}\mathds{1}\left(U_{k}\leq(1-T_{k})^{\lambda-1}\right),\qquad T_{k} = e^{-\Gamma_{k}/\gamma^{\prime}}. \qquad (39)$$
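For reference, a minimal sketch of this representation in code (a direct specialization of the generic rejection-representation sketch above; when $\alpha=0$ the constant in Eq. 37 reduces to $\gamma^{\prime}=\gamma\lambda$; the function name is illustrative):

```python
import numpy as np

def bp_dense_rejection_rates(gamma, lam, K, rng=None):
    # Rejection representation (Eq. 39) of BP(gamma, lambda, 0):
    # T_k = exp(-Gamma_k / gamma'), accepted when U_k <= (1 - T_k)^(lambda - 1),
    # with gamma' = gamma * Gamma(lambda + 1) / (Gamma(1) * Gamma(lambda)) = gamma * lambda.
    rng = np.random.default_rng(rng)
    gamma_p = gamma * lam
    Gammas = np.cumsum(rng.exponential(1.0, size=K))
    T = np.exp(-Gammas / gamma_p)
    U = rng.uniform(size=K)
    return np.where(U <= (1.0 - T) ** (lam - 1.0), T, 0.0)
```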

In Section C.2, we show that there exists $K_0\in\mathbb{N}$ such that

$$\forall K\geq K_{0},\quad B_{K}\leq 12\gamma(1-e^{-1})^{\lambda-2}\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}. \qquad (40)$$

This implies that the truncation error of the dense binary beta-Bernoulli network converges to 0 geometrically in $K$. Fig. 2(a) corroborates this result in simulation with $\lambda=2$ and $\gamma=1$; it can be seen that for the dense beta-Bernoulli graph, truncated graphs with relatively low truncation level—in this case, $K\approx 50$—approximate the true network model well.

Figure 2: Beta-independent Bernoulli network. (a) Dense graph with $\alpha=0$; (b) sparse graph with $\alpha=0.6$.

Sparse network: When $\alpha\in(0,1)$, the binary beta-Bernoulli graph is sparse and

$$\mu^{\leftarrow}(u) = \left(1+\frac{\alpha u}{\gamma^{\prime}}\right)^{-1/\alpha},\qquad \frac{\mathrm{d}\nu}{\mathrm{d}\mu} = (1-\theta)^{\lambda+\alpha-1}. \qquad (41)$$

Therefore the rejection representation Eq. 6 of $\mathrm{BP}(\gamma,\lambda,\alpha)$ can be written as

$$\theta_{k} = T_{k}\mathds{1}\left(U_{k}\leq(1-T_{k})^{\lambda+\alpha-1}\right),\qquad T_{k} = (1+\alpha\Gamma_{k}/\gamma^{\prime})^{-1/\alpha}. \qquad (42)$$

In Section C.2, we show that there exists $K_0\in\mathbb{N}$ such that

$$\forall K\geq K_{0},\quad B_{K}\leq 6\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\,e^{\gamma^{\prime}\alpha^{-1}}\,(K-1)^{\frac{\alpha-1}{\alpha}}. \qquad (43)$$

This bound suggests that the truncation error for the sparse binary beta-Bernoulli network converges to 0 much more slowly than for the dense graph. Fig. 2(b) corroborates this result in simulation with $\lambda=2$, $\gamma=1$, and $\alpha=0.6$; it can be seen that for the sparse beta-Bernoulli graph, truncated graphs behave significantly differently from the true graph for moderate truncation levels. In practice, one should select an appropriate (large) value of $K$ using our error bounds as guidance, and use sparse data structures to avoid undue memory burden.

Gamma-independent Poisson network

Next, consider the network with $\Theta$ generated from a gamma process, and the network $E_N$ generated using the independent Poisson likelihood process. The gamma process $\mathrm{\Gamma P}(\gamma,\lambda,\alpha)$ [43] with discount parameter $\alpha\in[0,1)$, scale parameter $\lambda>0$ and mass parameter $\gamma>0$ has rate measure

$$\nu(\mathrm{d}\theta) = \gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\theta^{-\alpha-1}e^{-\lambda\theta}\,\mathrm{d}\theta. \qquad (44)$$

The Poisson likelihood has the form

$$h(x|\theta) = \frac{\theta^{x}}{x!}e^{-\theta}. \qquad (45)$$

Dense network: When $\alpha=0$, the gamma-Poisson graph is dense, and we choose the proposal measure $\mu$ to be

$$\mu(\mathrm{d}\theta) = \gamma\lambda\,\theta^{-1}(1+\lambda\theta)^{-1}\,\mathrm{d}\theta, \qquad (46)$$

such that

$$\mu^{\leftarrow}(u) = 1/\left(\lambda\left(e^{(\gamma\lambda)^{-1}u}-1\right)\right),\qquad \frac{\mathrm{d}\nu}{\mathrm{d}\mu} = (1+\lambda\theta)e^{-\lambda\theta}. \qquad (47)$$

Therefore, the rejection representation in Eq. 6 has the form

$$\theta_{k} = T_{k}\mathds{1}\left(U_{k}\leq(1+\lambda T_{k})e^{-\lambda T_{k}}\right),\qquad T_{k} = \frac{1}{\lambda\left(e^{(\gamma\lambda)^{-1}\Gamma_{k}}-1\right)}. \qquad (48)$$
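As with the beta process, a minimal sketch of this representation in code (illustrative function name; a direct specialization of the generic rejection-representation sketch above):

```python
import numpy as np

def gp_dense_rejection_rates(gamma, lam, K, rng=None):
    # Rejection representation (Eq. 48) of GammaP(gamma, lambda, 0):
    # T_k = 1 / (lambda * (exp(Gamma_k / (gamma * lambda)) - 1)),
    # accepted when U_k <= (1 + lambda * T_k) * exp(-lambda * T_k).
    rng = np.random.default_rng(rng)
    Gammas = np.cumsum(rng.exponential(1.0, size=K))
    T = 1.0 / (lam * np.expm1(Gammas / (gamma * lam)))
    U = rng.uniform(size=K)
    return np.where(U <= (1.0 + lam * T) * np.exp(-lam * T), T, 0.0)
```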

In Section C.3, we show that there exists $K_0\in\mathbb{N}$ such that

$$\forall K\geq K_{0},\quad B_{K}\leq 6\frac{\gamma}{\lambda}\left(\frac{\gamma\lambda}{1+\gamma\lambda}\right)^{K-1}. \qquad (49)$$

Again, for the dense network $B_K$ converges to 0 geometrically, indicating that truncation may provide reasonable approximations to the original network. Fig. 3(a) corroborates this result in simulation with $\lambda=2$ and $\gamma=1$; for $K\approx 50$, no vertices are discarded on average by truncation.

Figure 3: Gamma-independent Poisson graph. (a) Dense graph with $\alpha=0$; (b) sparse graph with $\alpha=0.6$.

Sparse network: When $\alpha\in(0,1)$, the gamma-Poisson graph is sparse, and we choose the proposal measure $\mu$ to be

$$\mu(\mathrm{d}\theta) = \gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\theta^{-1-\alpha}\,\mathrm{d}\theta, \qquad (50)$$

such that

$$\mu^{\leftarrow}(u) = (\gamma^{\prime}u^{-1})^{1/\alpha},\qquad \gamma^{\prime} := \gamma\frac{\lambda^{1-\alpha}}{\alpha\Gamma(1-\alpha)},\qquad \frac{\mathrm{d}\nu}{\mathrm{d}\mu} = e^{-\lambda\theta}. \qquad (51)$$

Therefore the rejection representation in Eq. 6 has the form

$$\theta_{k} = T_{k}\mathds{1}\left(U_{k}\leq e^{-\lambda T_{k}}\right),\qquad T_{k} = \left(\gamma^{\prime}\Gamma_{k}^{-1}\right)^{1/\alpha}. \qquad (52)$$

In Section C.3, we show there exists $K_0\in\mathbb{N}$ such that

$$\forall K\geq K_{0},\quad B_{K}\leq\frac{12\gamma^{2}\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\frac{3\gamma^{\prime}}{K-1}\right)^{\frac{1-\alpha}{\alpha}}. \qquad (53)$$

Again, for the sparse network $B_K$ converges to 0 slowly, suggesting that the truncation error for the sparse binary gamma-Poisson graph converges more slowly than for the dense graph. Fig. 3(b) corroborates this result in simulation with $\lambda=2$, $\gamma=1$, and $\alpha=0.6$; for a moderate range of truncation values $K\leq 100$, the truncated graph behaves very differently from the true graph. Therefore in practice, one should follow the above guidance for the beta-Bernoulli network: select a value of $K$ using our error bounds, and avoid intractable memory requirements by using sparse data structures.

4 Truncated posterior inference

In this section, we develop a tractable approximate posterior inference method for network models via truncation, and analyze its approximation error. In particular, given an observed network $E_N$, we want to simulate from the posterior distribution of the CRM rates and the parameters of the CRM rate measure. Since exact expressions of full conditional densities are not available, we truncate the model and run an approximate Markov chain Monte Carlo algorithm. We provide a rigorous theoretical justification for this simple approach by establishing a bound on the total variation distance between the truncated and exact posterior distribution. This in turn provides a method to select the truncation level in a principled manner.

Although this section focuses on self-product-CRM-based network models, the method and theory we develop are both general and extend easily to other CRM-based models, e.g. for clustering [44], feature allocation [45], and trait allocation [6]. In particular, the methodology only requires bounds on tail occupancy probabilities (e.g., in the present context, the probability that $I_N\leq K$) rather than the exact evaluation of these quantities.

4.1 Truncation error bound

We begin by examining the density of the posterior distribution of the $K^{\text{th}}$ rate from the inverse Lévy representation $\theta_K$ for some fixed $K\in\mathbb{N}$, the unordered collection of rates $(\theta_k)_{k=1}^{K-1}$ such that $\theta_k\geq\theta_K$, and the parameters $\sigma\in\mathbb{R}^m$ of the CRM rate measure $\nu_{\sigma}$ given the observed set of edges $E_N$. We denote $\nu_{\sigma}(\theta)$ to be the density of $\nu_{\sigma}(\mathrm{d}\theta)$ and $p(\sigma)$ to be the prior density of $\sigma$, both with respect to the Lebesgue measure. Given these definitions the posterior density can be expressed as

$$p(\sigma,\theta_{1:K},X_{1:N}) \propto p(\sigma)\cdot e^{-\nu_{\sigma}[\theta_{K},\infty)}\cdot\prod_{k=1}^{K}\mathds{1}[\theta_{K}\leq\theta_{k}]\,\nu_{\sigma}(\theta_{k}) \qquad (54)$$
$$\cdot\left(\prod_{n=1}^{N}\prod_{i\in[K]^{d}_{\neq}}h\left(x_{ni}\,\Big|\,\prod_{j=1}^{d}\theta_{i_{j}}\right)\right)\cdot p(x_{1:N}^{K+}|\theta_{1:K},\sigma), \qquad (55)$$

where $[K]^{d}_{\neq}$ is the subset of $\mathbb{N}^{d}_{\neq}$ such that $\max_{j\in[d]}i_{j}\leq K$, and $x_{1:N}^{K+}$ is shorthand for the set of $(x_{ni})_{n\in[N],i\in\mathbb{N}_{\neq}^{d}}$ such that $i\notin[K]^{d}_{\neq}$. All the factors on the first row correspond to the prior of $\sigma$ and $(\theta_k)_{k=1}^{K}$, and the first factor on the second row corresponds to the likelihood of the portion of the network involving only vertices $1\dots K$; these are straightforward to evaluate, though $\nu_{\sigma}[\theta_K,\infty)$ will occasionally require a 1-dimensional numerical integration. The last factor corresponds to the likelihood of the portion of the network involving vertices beyond $K$, and is not generally possible to evaluate exactly.

To handle this term, suppose that $K$ is large enough that $x_{1:N}^{K+}=0$. Then $p(x_{1:N}^{K+}|\theta_{1:K},\sigma)=\mathbb{P}\left(I_{N}\leq K\,|\,\Gamma_{1:K},\sigma\right)$, i.e., the probability that no portion of the network involves vertices of index $>K$. Theorem 4.1 provides upper and lower bounds on this probability akin to those of Theorem 3.4—indeed, Theorem 4.1 is an intermediate step in the proof of Theorem 3.4—that apply conditionally on the values of $U_{1:K}$, $\Gamma_{1:K}$ rather than marginally. This theorem also makes the dependence of the bound on the rate measure parameters $\sigma$ notationally explicit via parametrized series representation components $\tau_{\sigma}$ and $g_{\sigma}$ from Eq. 4. Finally, though Theorem 4.1 applies to general series representations, we require only the specific instantiation for the inverse Lévy representation in the present context.

Theorem 4.1.

The conditional probability $\mathbb{P}\left(I_{N}\leq K\,|\,U_{1:K},\Gamma_{1:K},\sigma\right)$ satisfies

$$1\geq\mathbb{P}\left(I_{N}\leq K\,|\,U_{1:K},\Gamma_{1:K},\sigma\right)\geq\exp\left(-N\cdot B(U_{1:K},\Gamma_{1:K},\sigma)\right), \qquad (56)$$

where

$$B(U_{1:K},\Gamma_{1:K},\sigma) = \sum_{\ell=0}^{d-1}\binom{d}{\ell}\sum_{\mathcal{L}\subseteq[K],\,|\mathcal{L}|=\ell}B(U_{1:K},\Gamma_{1:K},\sigma,\mathcal{L}) \qquad (57)$$
$$B(U_{1:K},\Gamma_{1:K},\sigma,\mathcal{L}) = \int_{[\Gamma_{K},\infty)^{d-\ell}}\mathbb{E}\left[-\log\pi\left(\prod_{j\in\mathcal{L}}\theta_{j}\prod_{j=1}^{d-\ell}\tau_{\sigma}(U^{\prime}_{j},\gamma_{j})\right)\,\Bigg|\,\theta_{1:K}\right]\mathrm{d}\gamma, \qquad (58)$$

where $\mathrm{d}\gamma=\prod_{j=1}^{d-\ell}\mathrm{d}\gamma_{j}$ and $U^{\prime}_{j}\overset{\text{i.i.d.}}{\sim}g_{\sigma}$.

Since $U_{1:K}$ is unused in the inverse Lévy representation, in the present context we use the notation $B(\Gamma_{1:K},\sigma)$ for brevity. The bound in Theorem 4.1 implies that as long as $K$ is set large enough such that both $x_{1:N}^{K+}=0$ and $B(\Gamma_{1:K},\sigma)\leq\epsilon/N$ for some $\epsilon>0$, then

$$1-\epsilon\leq p(x_{1:N}^{K+}|\theta_{1:K},\sigma)\leq 1. \qquad (59)$$

Therefore as $K$ increases, this term should become $\approx 1$ and have a negligible effect on the posterior density. We use this intuition to propose a truncated Metropolis–Hastings algorithm that sets $K>I_N$, ignores the $p(x_{1:N}^{K+}|\theta_{1:K},\sigma)$ term in the acceptance ratio, and fixes $x_{1:N}^{K+}$ to 0. Theorem 4.2 provides a rigorous analysis of the error involved in using the truncated sampler.

Theorem 4.2.

Fix $K>I_N$. Let $\Pi$ be the distribution of $\sigma,\theta_{1:K}$ given $E_N$, and let $\hat{\Pi}$ be the distribution with density proportional to Eq. 55 without the $p(x_{1:N}^{K+}|\theta_{1:K},\sigma)$ term and with $x_{1:N}^{K+}$ fixed to 0. If for some $\eta\in[0,1)$,

$$\hat{\Pi}\left\{B(\Gamma_{1:K},\sigma)\leq\epsilon/N\right\}\geq 1-\eta, \qquad (60)$$

then

$$\mathrm{D_{TV}}\left(\hat{\Pi},\Pi\right)\leq\frac{3(\epsilon+\eta)}{2}-\epsilon\eta. \qquad (61)$$

4.2 Adaptive truncated Metropolis–Hastings

Crucially, as long as one can obtain samples from the truncated posterior distribution, one can estimate the bound in Eq. 61 using sample estimates of the tail probability $\hat{\Pi}\left\{B(\Gamma_{1:K},\sigma)\leq\epsilon/N\right\}\geq 1-\eta$. Therefore, one can compute the bound Eq. 61 without needing to evaluate $p(x_{1:N}^{K+}|\theta_{1:K},\sigma)$ exactly. This suggests the following adaptive truncation procedure:

1. Pick a value of $K>I_N$ and desired total variation error $\xi\in(0,1)$.

2. Obtain samples from the truncated posterior.

3. Minimize the bound in Eq. 61 over $\epsilon\in(0,1)$, using the samples to estimate $\eta=1-\hat{\Pi}\left\{B(\Gamma_{1:K},\sigma)\leq\epsilon/N\right\}$ for each value of $\epsilon$.

4. If the minimum bound exceeds $\xi$, increase $K$ and return to step 2.

5. Otherwise, return the samples.

In this work, we start by initializing $K$ to $I_N+1$. In order to decide how much to increase $K$ by in each iteration, we use the sequential representation to extrapolate the total variation bound Eq. 61 to larger values of $K$ without actually performing MCMC sampling. In particular, for each posterior sample, we use its hyperparameters to generate additional rates from the sequential representation. We then use these extended samples to compute the total variation error guarantee as per step 3 above. We continue to generate additional rates (typically doubling the number each time) until the predicted total variation guarantee is below the desired threshold. Finally, we use linear interpolation between the last two predicted errors to find the next truncation level $K$ that matches the desired (log) error threshold. This fixes a new value of $K$; at this point, we return to step 2 above and iterate.
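The skeleton below sketches the core of this procedure. It is only illustrative: `run_mcmc(K)` (returning posterior samples of $(\Gamma_{1:K},\sigma)$) and `bound_B(sample)` (evaluating $B(\Gamma_{1:K},\sigma)$ from Theorem 4.1) are hypothetical helper functions, and the simple doubling rule stands in for the predictive extrapolation and interpolation described above.

```python
import numpy as np

def adaptive_truncation(run_mcmc, bound_B, N, K_init, xi=0.01):
    # Adaptive truncated MCMC: grow K until the estimated total variation
    # bound of Eq. 61 falls below the desired threshold xi.
    eps_grid = np.linspace(1e-4, 1.0 - 1e-4, 200)
    K = K_init
    while True:
        samples = run_mcmc(K)                          # step 2
        B_vals = np.array([bound_B(s) for s in samples])
        bounds = []
        for eps in eps_grid:                           # step 3: minimize Eq. 61 over eps
            eta = np.mean(B_vals > eps / N)            # eta = 1 - Pi_hat{B <= eps / N}
            bounds.append(1.5 * (eps + eta) - eps * eta)
        best = min(bounds)
        if best <= xi:                                 # steps 4-5
            return samples, K, best
        K = 2 * K                                      # simplified growth rule
```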

Figure 4: Posterior of $\alpha$, $\gamma$, $\lambda$ for the dense simulated network (true $\alpha=0$, $\gamma=1$, $\lambda=2$). Orange histograms depict the first adaptation iteration, and blue histograms depict the second and final iteration.

Figure 5: Posterior of $\alpha$, $\gamma$, $\lambda$ for the sparse simulated network (true $\alpha=0.2$, $\gamma=1$, $\lambda=2$). Orange histograms depict the first adaptation iteration, and blue histograms depict the second and final iteration.

4.3 Experiments

In this section, we examine the properties of the proposed adaptive truncated inference algorithm for the beta-independent Bernoulli network model in Eqs. 35 and 36 with discount $\alpha\in(0,1)$, concentration $\lambda>1$, mass $\gamma>0$, unordered collection of rates $\theta_{1:K-1}$, and $K^{\text{th}}$ rate from the sequential representation $\theta_K$. In order to simplify inference, we transform each of these parameters to an unconstrained version:

$$\alpha = \frac{\exp(\alpha_{u})}{1+\exp(\alpha_{u})},\qquad \lambda = 1+\exp(\lambda_{u}),\qquad \gamma = \exp(\gamma_{u}), \qquad (62)$$
$$\theta_{K} = \frac{\exp(\theta_{u,K})}{1+\exp(\theta_{u,K})},\qquad \theta_{k} = \frac{\theta_{K}+\exp(\theta_{u,k})}{1+\exp(\theta_{u,k})},\qquad k = 1,\dots,K-1. \qquad (63)$$

We use a Markov chain Monte Carlo algorithm that includes an exact Gibbs sampling move for $\gamma$, and separate Gaussian random-walk Metropolis–Hastings moves for $\alpha_u$, $\lambda_u$, $\theta_{u,K}$, all $\theta_{u,k}$ such that vertex $k$ has degree 0 (jointly), and each $\theta_{u,k}$ such that vertex $k$ has nonzero degree (individually).
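A minimal sketch of the constraining transform in Eqs. 62 and 63 (illustrative function name; here `theta_u` is assumed to stack $\theta_{u,1:K-1}$ followed by $\theta_{u,K}$ in its last entry):

```python
import numpy as np

def constrain(alpha_u, lambda_u, gamma_u, theta_u):
    # Map unconstrained parameters to the constrained space (Eqs. 62-63).
    alpha = np.exp(alpha_u) / (1.0 + np.exp(alpha_u))            # alpha in (0, 1)
    lam = 1.0 + np.exp(lambda_u)                                 # lambda > 1
    gamma = np.exp(gamma_u)                                      # gamma > 0
    theta_K = np.exp(theta_u[-1]) / (1.0 + np.exp(theta_u[-1]))  # theta_K in (0, 1)
    theta_rest = (theta_K + np.exp(theta_u[:-1])) / (1.0 + np.exp(theta_u[:-1]))  # theta_k >= theta_K
    return alpha, lam, gamma, np.append(theta_rest, theta_K)
```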

Synthetic data

Figure 6: Visualization of the truncation expansion procedure for the (a) dense and (b) sparse synthetic networks. Black circles denote the truncation level and error in each iteration of the adaptation. The adaptation stops when the truncation error falls below the desired threshold (here $\log_{10}(0.01)=-2$). Grey dashed lines and circles visualize the predicted truncation error using rates generated from the sequential representation. The vertical dotted line shows that the algorithm selects a value of $K$ that attempts to match the desired $\log_{10}$ error threshold of $-2$ using the predictions.

We first apply the model to synthetic data simulated from the generative model. We simulate a sparse network with parameters $\lambda=2$, $\gamma=1$, $\alpha=0.2$, and $N=10^5$, and a dense network with $\lambda=2$, $\gamma=1$, $\alpha=0$, and $N=10^7$. In both settings we set the truncation level for data generation to 500, the desired total variation bound from Eq. 61 to 0.01, and initialize the sampler with $\alpha=0.4$, $\lambda=5$, $\gamma=2$ and $\theta$ generated from the sequential representation. All Metropolis–Hastings moves have proposal standard deviation 0.1 except the sparse network $\alpha_u$ move, which has standard deviation 0.03.

Figs. 4 and 5 show histograms of 5,000 marginal posterior samples of the hyperparameters for the dense (true $\alpha=0$) and sparse (true $\alpha=0.2$) networks. In both cases, the approximate posterior in the first round of adaptation (orange histogram) does not concentrate on the true hyperparameter values, despite the relatively large number of generative rounds $N$. Fig. 6—which displays the truncation error and predictive adaptation procedure—shows why this is the case. In both networks, the first adaptation iteration identifies a large truncation error. After a single round of adaptation, the approximate posterior distributions (blue histograms) in Figs. 4 and 5 concentrate more on the true values as expected, and the truncation errors fall well below the desired threshold ($\log_{10}(0.01)=-2$). It is worth noting that the predictive extrapolation appears to be quite conservative in these examples, and especially in the dense network. This happens because the approximate posterior for the dense network (respectively, sparse network) assigns mass to higher values of $\alpha$ (respectively, $\gamma$) than it should, which results in larger truncation error and thus a larger predicted required truncation level.

Real network data

Next, we apply the model to a Facebook-like Social Network (available at https://toreopsahl.com/datasets/) [46]. The original source network contains a sequence of 61,734 weighted, time-stamped edges, and 1,899 vertices. We preprocess the data by removing the edge weights, binning the edge sequence into 30-minute intervals, and removing the initial transient of network growth, resulting in 1,899 vertices and 10,435 edges over $N=6{,}427$ rounds of generation. We again set a desired total variation error guarantee of 0.01 during inference. All Metropolis–Hastings moves have proposal standard deviation 0.1 except the $\alpha_u$ and degree-0 $\theta_u$ moves, which have standard deviation 0.04 and 0.03, respectively. We initialize the sampler to $\alpha=0.1$, $\gamma=2$, $\lambda=20$ and sample rates $\theta$ from the prior sequential representation.

Figure 7: Posterior of $\alpha$, $\gamma$, $\lambda$ in the Facebook-like network.

Fig. 7 shows the posterior marginal histograms for the hyperparameters $\alpha,\lambda,\gamma$ in both the first iteration (orange) and the second and final iteration (blue) of truncation adaptation. The posterior distribution suggests that the network is dense (i.e. $\alpha\approx 0$). This conclusion is supported both by the close match of 100 samples from the posterior predictive distribution, shown in Figs. 8(a) and 8(b), and the findings of past work using this data [8]. Further, as in the synthetic examples, the truncation adaptation terminates after two iterations; but in this case the histograms do not change very much between the two. This is essentially because the truncation error in the first iteration is relatively low ($\approx 0.02$), leading to a reasonably accurate truncated posterior and hence accurate predictions of the truncation error at higher truncation levels.

Figure 8: (a, b): The characteristics of the observed Facebook-like network (black) and 100 samples from the posterior predictive distribution (blue): (a) number of vertices vs. number of edges; (b) degree density. (c): The truncation adaptation process, with truncation level and error bound in each iteration (black circles) and predictive truncation errors (grey dashed lines).

5 Conclusion

In this paper, we developed methods for tractable generative simulation and posterior inference in statistical models with discrete nonparametric priors via finite truncation. We demonstrated that these approximate truncation-based approaches are sound via theoretical error analysis. In the process, we also showed that the nonzero rates of the (tractable) rejection representation of a Poisson process are equal in distribution to the rates of the (intractable) inverse Lévy representation. Simulated and real network examples demonstrated that the proposed methods are useful in selecting truncation levels for both forward generation and inference in practice.

References

  • [1] Paul Erdős and Alfréd Rényi. On random graphs I. Publicationes Mathematicae, 6:290–297, 1959.
  • [2] Paul Holland, Kathryn Laskey, and Samuel Leinhardt. Stochastic block models: first steps. Social Networks, 5:109–137, 1983.
  • [3] Diana Cai, Trevor Campbell, and Tamara Broderick. Edge-exchangeable graphs and sparsity. In Advances in Neural Information Processing Systems, pages 4249–4257, 2016.
  • [4] Sinead Williamson. Nonparametric network models for link prediction. The Journal of Machine Learning Research, 17(1):7102–7121, 2016.
  • [5] Harry Crane and Walter Dempsey. Edge exchangeable models for interaction networks. Journal of the American Statistical Association, 113(523):1311–1326, 2018.
  • [6] Trevor Campbell, Diana Cai, and Tamara Broderick. Exchangeable trait allocations. Electronic Journal of Statistics, 12(2):2290–2322, 2018.
  • [7] Svante Janson. On edge exchangeable random graphs. Journal of Statistical Physics, 173(3-4):448–484, 2018.
  • [8] François Caron and Emily Fox. Sparse graphs using exchangeable random measures. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(5):1295–1366, 2017.
  • [9] Adrien Todeschini, Xenia Miscouridou, and François Caron. Exchangeable random measures for sparse and modular graphs with overlapping communities. arXiv:1602.02114, 2016.
  • [10] Tue Herlau, Mikkel Schmidt, and Morten Mørup. Completely random measures for modelling block-structured sparse networks. In Advances in Neural Information Processing Systems, pages 4260–4268, 2016.
  • [11] Victor Veitch and Daniel Roy. The class of random graphs arising from exchangeable random measures. arXiv:1512.03099, 2015.
  • [12] Christian Borgs, Jennifer Chayes, Henry Cohn, and Nina Holden. Sparse exchangeable graphs and their limits via graphon processes. The Journal of Machine Learning Research, 18(1):7740–7810, 2017.
  • [13] François Caron and Judith Rousseau. On sparsity and power-law properties of graphs based on exchangeable point processes. arXiv:1708.03120, 2017.
  • [14] Peter Orbanz and Daniel Roy. Bayesian models of graphs, arrays and other exchangeable random structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2):437–461, 2014.
  • [15] John Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.
  • [16] Christian Robert and George Casella. Monte Carlo Statistical Methods. Springer, 2nd edition, 2004.
  • [17] Andrew Gelman, John Carlin, Hal Stern, David Dunson, Aki Vehtari, and Donald Rubin. Bayesian data analysis. CRC Press, 3rd edition, 2013.
  • [18] Michael Escobar and Mike West. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588, 1995.
  • [19] Thomas Griffiths and Zoubin Ghahramani. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems, 2005.
  • [20] Tamara Broderick, Ashia Wilson, and Michael Jordan. Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli, 24(4B):3181–3221, 2018.
  • [21] Yee Whye Teh, Dilan Görür, and Zoubin Ghahramani. Stick-breaking construction for the Indian buffet process. In Artificial Intelligence and Statistics, 2007.
  • [22] Maria Kalli, Jim Griffin, and Stephen Walker. Slice sampling mixture models. Statistics and Computing, 21:93–105, 2011.
  • [23] Peiyuan Zhu, Alexandre Bouchard-Côté, and Trevor Campbell. Slice sampling for general completely random measures. In Uncertainty in Artificial Intelligence, 2020.
  • [24] David Blei and Michael Jordan. Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1):121–143, 2006.
  • [25] David Blei and John Lafferty. A correlated topic model of science. The Annals of Applied Statistics, 1(1):17–35, 2007.
  • [26] Chong Wang, John Paisley, and David Blei. Online variational inference for the hierarchical Dirichlet process. In International Conference on Artificial Intelligence and Statistics, 2011.
  • [27] Finale Doshi, Kurt Miller, Jurgen Van Gael, and Yee Whye Teh. Variational inference for the Indian buffet process. In Artificial Intelligence and Statistics, pages 137–144, 2009.
  • [28] Trevor Campbell, Jonathan Huggins, Jonathan How, and Tamara Broderick. Truncated random measures. Bernoulli, 25(2):1256–1288, 2019.
  • [29] Hemant Ishwaran and Lancelot James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173, 2001.
  • [30] Hemant Ishwaran and Mahmoud Zarepour. Exact and approximate sum representations for the Dirichlet process. Canadian Journal of Statistics, 30(2):269–283, 2002.
  • [31] Anirban Roychowdhury and Brian Kulis. Gamma Processes, Stick-Breaking, and Variational Inference. In International Conference on Artificial Intelligence and Statistics, 2015.
  • [32] Diana Cai and Tamara Broderick. Completely random measures for modeling power laws in sparse graphs. In NIPS 2015 Workshop on Networks in the Social and Informational Sciences, 2015.
  • [33] Harry Crane and Walter Dempsey. A framework for statistical network modeling. arXiv:1509.08185, 2015.
  • [34] Jan Rosiński. Series representations of Lévy processes from the perspective of point processes. In Lévy processes, pages 401–415. Springer, 2001.
  • [35] Thomas Ferguson and Michael Klass. A representation of independent increment processes without Gaussian components. The Annals of Mathematical Statistics, 43(5):1634–1643, 1972.
  • [36] Robert Wolpert and Katja Ickstadt. Poisson/gamma random field models for spatial statistics. Biometrika, 85(2):251–267, 1998.
  • [37] Yee Whye Teh and Dilan Gorur. Indian buffet processes with power-law behavior. In Advances in Neural Information Processing Systems, pages 1838–1846, 2009.
  • [38] Yee Whye Teh and Michael Jordan. Hierarchical Bayesian nonparametric models with applications. Bayesian Nonparametrics, 1:158–207, 2010.
  • [39] Jan Rosiński. On series representations of infinitely divisible random vectors. The Annals of Probability, pages 405–430, 1990.
  • [40] Günter Last and Mathew Penrose. Lectures on the Poisson process, volume 7. Cambridge University Press, 2017.
  • [41] Thomas Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1:209–230, 1973.
  • [42] Nils Lid Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. The Annals of Statistics, 18(3):1259–1294, 1990.
  • [43] Anders Brix. Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, 31(4):929–953, 1999.
  • [44] Charles Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2:1152–1174, 1974.
  • [45] Thomas Griffiths and Zoubin Ghahramani. Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems, 2006.
  • [46] Pietro Panzarasa, Tore Opsahl, and Kathleen M Carley. Patterns and dynamics of users’ behavior and interaction: Network analysis of an online community. Journal of the American Society for Information Science and Technology, 60(5):911–932, 2009.
  • [47] John Kingman. Poisson Processes. Oxford Studies in Probability. Clarendon Press, 1992.

Appendix A Equivalence between nonzero rates from a rejection representation and the inverse Lévy representation

Proof of Theorem 3.5.

Let T_{k_{1}} denote the first nonzero element generated from the rejection representation in Eq. 6, and correspondingly let \Gamma_{k_{1}} denote the jump of the unit-rate homogeneous Poisson process on \mathbb{R}_{+} such that T_{k_{1}}=\mu^{\leftarrow}(\Gamma_{k_{1}}), where \mu is the proposal measure in the rejection representation. Let f be a bounded continuous function. Then

𝔼[f(Tk1)]\displaystyle\mathbb{E}[f(T_{k_{1}})] =𝔼[f(μ(Γk1))]\displaystyle=\mathbb{E}[f(\mu^{\leftarrow}(\Gamma_{k_{1}}))] (64)
=𝔼[j=1f(μ(Γj))𝟙[dνdμ(μ(Γj))Uj)]i=1j1𝟙[dνdμ(μ(Γi))<Ui]]\displaystyle=\mathbb{E}\left[\sum_{j=1}^{\infty}f(\mu^{\leftarrow}(\Gamma_{j}))\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{j}))\geq U_{j})\right]\prod_{i=1}^{j-1}\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{i}))<U_{i}\right]\right] (65)
=𝔼[j=1f(μ(Γj))dνdμ(μ(Γj))i=1j1(1dνdμ(μ(Γi)))]\displaystyle=\mathbb{E}\left[\sum_{j=1}^{\infty}f(\mu^{\leftarrow}(\Gamma_{j}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{j}))\prod_{i=1}^{j-1}\left(1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{i}))\right)\right] (66)
=j=1𝔼[f(μ(Γj))dνdμ(μ(Γj))𝔼[i=1j1(1dνdμ(μ(Γi)))|Γj]]\displaystyle=\sum_{j=1}^{\infty}\mathbb{E}\left[f(\mu^{\leftarrow}(\Gamma_{j}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{j}))\mathbb{E}\left[\left.\prod_{i=1}^{j-1}\left(1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{i}))\right)\right|\Gamma_{j}\right]\right] (67)

Note that given \Gamma_{j}, \Gamma_{i}\overset{\textrm{\tiny{i.i.d.}}}{\sim}{\sf{Unif}}(0,\Gamma_{j}) for i=1,\cdots,j-1, so

𝔼[i=1j1(1dνdμ(μ(Γi)))|Γj]=𝔼[1dνdμ(μ(U))|Γj]j1,\displaystyle\mathbb{E}\left[\left.\prod_{i=1}^{j-1}\left(1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{i}))\right)\right|\Gamma_{j}\right]=\mathbb{E}\left[1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(U))\,|\,\Gamma_{j}\right]^{j-1}, (68)

where U\sim{\sf{Unif}}(0,\Gamma_{j}). Using the change of variable y=\mu^{\leftarrow}(u), we obtain

𝔼[1dνdμ(μ(U))|Γj]=11Γj0Γjdνdμ(μ(u))du=11Γjμ(Γj)dν.\displaystyle\mathbb{E}\left[1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(U))\,|\,\Gamma_{j}\right]=1-\frac{1}{\Gamma_{j}}\int_{0}^{\Gamma_{j}}\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(u))\mathrm{d}u=1-\frac{1}{\Gamma_{j}}\int_{\mu^{\leftarrow}(\Gamma_{j})}^{\infty}\mathrm{d}\nu. (69)

Therefore, using the same change of variable trick,

𝔼[f(Tk1)]\displaystyle\mathbb{E}[f(T_{k_{1}})] =j=1𝔼[f(μ(Γj))dνdμ(μ(Γj))(11Γjμ(Γj)dν)j1]\displaystyle=\sum_{j=1}^{\infty}\mathbb{E}\left[f(\mu^{\leftarrow}(\Gamma_{j}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{j}))\left(1-\frac{1}{\Gamma_{j}}\int_{\mu^{\leftarrow}(\Gamma_{j})}^{\infty}\mathrm{d}\nu\right)^{j-1}\right] (70)
=j=10f(μ(γ))dνdμ(μ(γ))(11γμ(γ)dν)j1γj1(j1)!eγdγ\displaystyle=\sum_{j=1}^{\infty}\int_{0}^{\infty}f(\mu^{\leftarrow}(\gamma))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma))\left(1-\frac{1}{\gamma}\int_{\mu^{\leftarrow}(\gamma)}^{\infty}\mathrm{d}\nu\right)^{j-1}\frac{\gamma^{j-1}}{(j-1)!}e^{-\gamma}\mathrm{d}\gamma (71)
=j=10f(y)(1μ[y,)1ν[y,))j1μ[y,)j1(j1)!eμ[y,)ν(dy)\displaystyle=\sum_{j=1}^{\infty}\int_{0}^{\infty}f(y)\left(1-\mu[y,\infty)^{-1}\nu[y,\infty)\right)^{j-1}\frac{\mu[y,\infty)^{j-1}}{(j-1)!}e^{-\mu[y,\infty)}\nu(\mathrm{d}y) (72)
=0f(y)j=1(μ[y,)ν[y,))j1(j1)!eμ[y,)ν(dy)\displaystyle=\int_{0}^{\infty}f(y)\sum_{j=1}^{\infty}\frac{\left(\mu[y,\infty)-\nu[y,\infty)\right)^{j-1}}{(j-1)!}e^{-\mu[y,\infty)}\nu(\mathrm{d}y) (73)
=0f(y)eν[y,)ν(dy).\displaystyle=\int_{0}^{\infty}f(y)e^{-\nu[y,\infty)}\nu(\mathrm{d}y). (74)

Suppose that \theta_{1} is the first rate generated using the inverse Lévy representation. Then

𝔼[f(θ1)]=0f(ν(γ))eγdγ.\displaystyle\mathbb{E}[f(\theta_{1})]=\int_{0}^{\infty}f(\nu^{\leftarrow}(\gamma))e^{-\gamma}\mathrm{d}\gamma. (75)

Making the change of variable y=\nu^{\leftarrow}(\gamma), we obtain

\mathbb{E}[f(\theta_{1})]=\int_{0}^{\infty}f(y)e^{-\nu[y,\infty)}\nu(\mathrm{d}y)=\mathbb{E}[f(T_{k_{1}})]. (76)

Therefore, the first nonzero rate T_{k_{1}} from the rejection representation has the same marginal distribution as the first rate \theta_{1} from the inverse Lévy representation.

We now employ an inductive argument. Suppose we have shown that the first M nonzero elements \Xi_{M}:=(T_{k_{1}},T_{k_{2}},\cdots,T_{k_{M}}) from the rejection representation have the same joint distribution as the first M elements \Theta_{M}:=(\theta_{1},\cdots,\theta_{M}) from the inverse Lévy representation. To prove the same for M+1 elements, it suffices to show that the conditional distribution of T_{k_{M+1}} given \Xi_{M} is equal to the conditional distribution of \theta_{M+1} given \Theta_{M} when \Xi_{M}=\Theta_{M}.

Denote \Gamma^{\prime}_{j}=\sum_{i=1}^{j}e^{\prime}_{i}, where e^{\prime}_{i}\overset{\textrm{\tiny{i.i.d.}}}{\sim}{\sf{Exp}}(1), and let U^{\prime}_{i}\overset{\textrm{\tiny{i.i.d.}}}{\sim}{\sf{Unif}}[0,1]. Then

𝔼[f(TkM+1)|ΞM]=𝔼[f(μ(ΓkM+1))|ΞM]=𝔼[j=1f(μ(ΓkM+Γj))𝟙[]|ΞM],\displaystyle\mathbb{E}[f(T_{k_{M+1}})|\Xi_{M}]=\mathbb{E}[f(\mu^{\leftarrow}(\Gamma_{k_{M+1}}))|\Xi_{M}]=\mathbb{E}\left[\left.\sum_{j=1}^{\infty}f(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j}))\mathds{1}[\dots]\right|\Xi_{M}\right], (77)

where 𝟙[]\mathds{1}[\dots] is shorthand for

𝟙[]=𝟙[dνdμ(μ(ΓkM+Γj))Uj]i=1j1𝟙[dνdμ(μ(ΓkM+Γi))<Ui].\displaystyle\mathds{1}[\dots]=\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j}))\geq U^{\prime}_{j}\right]\prod_{i=1}^{j-1}\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma_{i}))<U^{\prime}_{i}\right]. (78)

Using steps similar to the base case,

𝔼[f(TkM+1)|ΞM]=j=1𝔼[f(μ(ΓkM+Γj))dνdμ(μ(ΓkM+Γj))𝔼[]j1|ΞM],\displaystyle\mathbb{E}[f(T_{k_{M+1}})|\Xi_{M}]=\sum_{j=1}^{\infty}\mathbb{E}\left[\left.f(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j}))\mathbb{E}[\dots]^{j-1}\right|\Xi_{M}\right], (79)

where

𝔼[]=𝔼[1dνdμ(μ(ΓkM+U))|Γj,ΞM]U𝖴𝗇𝗂𝖿[0,Γj].\displaystyle\mathbb{E}[\dots]=\mathbb{E}\left[\left.1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+U))\right|\Gamma^{\prime}_{j},\Xi_{M}\right]\qquad U\sim{\sf{Unif}}[0,\Gamma^{\prime}_{j}]. (80)

Making the change of variable y=\mu^{\leftarrow}(\Gamma_{k_{M}}+u) as before, we obtain

𝔼[1dνdμ(μ(ΓkM+U))|Γj,ΞM]\displaystyle\mathbb{E}\left[\left.1-\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+U))\right|\Gamma^{\prime}_{j},\Xi_{M}\right] =10Γj1Γjdνdμ(μ(ΓkM+u))du\displaystyle=1-\int_{0}^{\Gamma^{\prime}_{j}}\frac{1}{\Gamma^{\prime}_{j}}\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+u))\mathrm{d}u (81)
=11Γjμ(ΓkM+Γj)μ(ΓkM)dν.\displaystyle=1-\frac{1}{\Gamma^{\prime}_{j}}\int_{\mu^{\leftarrow}(\Gamma_{k_{M}}+\Gamma^{\prime}_{j})}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}\mathrm{d}\nu. (82)

Making another change of variables y=\mu^{\leftarrow}(\Gamma_{k_{M}}+\gamma) in the original integral, so that \gamma=\mu[y,\infty)-\Gamma_{k_{M}}=\mu[y,\mu^{\leftarrow}(\Gamma_{k_{M}})), yields

𝔼[f(TkM+1)|ΞM]\displaystyle\mathbb{E}[f(T_{k_{M+1}})|\Xi_{M}] (83)
=\displaystyle= j=10f(μ(ΓkM+γ))dνdμ(μ(ΓkM+γ))𝔼[]j1γj1(j1)!eγdγ\displaystyle\sum_{j=1}^{\infty}\int_{0}^{\infty}f(\mu^{\leftarrow}(\Gamma_{k_{M}}+\gamma))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{k_{M}}+\gamma))\mathbb{E}[\dots]^{j-1}\frac{\gamma^{j-1}}{(j-1)!}e^{-\gamma}\mathrm{d}\gamma (84)
=\displaystyle= j=10μ(ΓkM)f(y)(11γyμ(ΓkM)dν)j1γj1(j1)!eγν(dy)\displaystyle\sum_{j=1}^{\infty}\int_{0}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}f(y)\left(1-\frac{1}{\gamma}\int_{y}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}\mathrm{d}\nu\right)^{j-1}\frac{\gamma^{j-1}}{(j-1)!}e^{-\gamma}\nu(\mathrm{d}y) (85)
=\displaystyle= j=10μ(ΓkM)f(y)(μ[y,μ(ΓkM))ν[y,μ(ΓkM)))j1(j1)!eμ[y,μ(ΓkM))ν(dy)\displaystyle\sum_{j=1}^{\infty}\int_{0}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}f(y)\frac{(\mu[y,\mu^{\leftarrow}(\Gamma_{k_{M}}))-\nu[y,\mu^{\leftarrow}(\Gamma_{k_{M}})))^{j-1}}{(j-1)!}e^{-\mu[y,\mu^{\leftarrow}(\Gamma_{k_{M}}))}\nu(\mathrm{d}y) (86)
=\displaystyle= 0μ(ΓkM)f(y)eν[y,μ(ΓkM))ν(dy)=0TkMf(y)eν[y,TkM)ν(dy).\displaystyle\int_{0}^{\mu^{\leftarrow}(\Gamma_{k_{M}})}f(y)e^{-\nu[y,\mu^{\leftarrow}(\Gamma_{k_{M}}))}\nu(\mathrm{d}y)=\int_{0}^{T_{k_{M}}}f(y)e^{-\nu[y,T_{k_{M}})}\nu(\mathrm{d}y). (87)

On the other hand,

𝔼[f(θM+1)|ΘM]=𝔼[f(ν(ΓM+Γ1))|ΘM]=0f(ν(ΓM+γ))eγdγ\displaystyle\mathbb{E}[f(\theta_{M+1})|\Theta_{M}]=\mathbb{E}[f(\nu^{\leftarrow}(\Gamma_{M}+\Gamma^{\prime}_{1}))|\Theta_{M}]=\int_{0}^{\infty}f(\nu^{\leftarrow}(\Gamma_{M}+\gamma))e^{-\gamma}\mathrm{d}\gamma (88)
=\displaystyle= 0ν(ΓM)f(y)eν[y,ν(ΓM))ν(dy)=0θMf(y)eν[y,θM)ν(dy).\displaystyle\int_{0}^{\nu^{\leftarrow}(\Gamma_{M})}f(y)e^{-\nu[y,\nu^{\leftarrow}(\Gamma_{M}))}\nu(\mathrm{d}y)=\int_{0}^{\theta_{M}}f(y)e^{-\nu[y,\theta_{M})}\nu(\mathrm{d}y). (89)

Thus the distribution of the (M+1)^{\text{th}} nonzero rate T_{k_{M+1}} from the rejection representation given \Xi_{M} is equal to the distribution of the (M+1)^{\text{th}} rate \theta_{M+1} from the inverse Lévy representation given \Theta_{M} when \Xi_{M}=\Theta_{M}. ∎
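
The rejection representation in Eq. 6 is straightforward to simulate directly. The following is a minimal Python sketch, not part of the paper, that generates the first few accepted (nonzero) rates for a user-supplied proposal \mu and target \nu; the concrete choices below use the dense beta process forms dν/dμ(θ)=(1−θ)^{λ−1} and μ^←(u)=e^{−u/(γλ)} that appear in Appendix C.2, with illustrative hyperparameter values. By Theorem 3.5, the accepted rates have the same joint distribution as the leading rates of the inverse Lévy representation.

```python
import numpy as np

def rejection_representation(dnu_dmu, mu_inv, n_accept, seed=0):
    """Generate the first n_accept nonzero rates of the rejection representation:
    a candidate mu_inv(Gamma_j) is kept whenever dnu_dmu(mu_inv(Gamma_j)) >= U_j,
    where Gamma_1 < Gamma_2 < ... are the jumps of a unit-rate Poisson process."""
    rng = np.random.default_rng(seed)
    rates, gamma = [], 0.0
    while len(rates) < n_accept:
        gamma += rng.exponential(1.0)          # next jump Gamma_j
        theta = mu_inv(gamma)                  # proposal atom mu_inv(Gamma_j)
        if rng.uniform() <= dnu_dmu(theta):    # accept with probability dnu/dmu
            rates.append(theta)
    return np.array(rates)

# Dense beta process (Appendix C.2, alpha = 0), illustrative hyperparameters:
#   dnu/dmu(theta) = (1 - theta)^(lambda - 1),   mu_inv(u) = exp(-u / (gamma * lambda)).
gam, lam = 2.0, 1.5
theta_K = rejection_representation(
    dnu_dmu=lambda t: (1.0 - t) ** (lam - 1.0),
    mu_inv=lambda u: np.exp(-u / (gam * lam)),
    n_accept=50,
)
print(theta_K[:5])  # a few leading accepted rates; they are monotonically decreasing
```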

Appendix B Truncation error bounds for self-product measures

Proof of Lemma 3.1.

Denote \tilde{\Theta}=\{\Theta^{(d)},\Theta_{K}^{(d)}\}. Denote the marginal probability mass functions (PMFs) of E_{N}\in\mathcal{N}_{d} and E_{N,K}\in\mathcal{N}_{d} as P_{N} and P_{N,K}, and denote their PMFs given \tilde{\Theta} as f(x|\tilde{\Theta}) and f_{K}(x|\tilde{\Theta}), respectively.

DTV(PN,PN,K)\displaystyle\mathrm{D_{TV}}\left(P_{N},P_{N,K}\right) (90)
=12x𝒩d|PN(x)PN,K(x)|\displaystyle=\frac{1}{2}\sum_{x\in\mathcal{N}_{d}}\Big{|}P_{N}(x)-P_{N,K}(x)\Big{|} (91)
=12x𝒩d|𝔼[f(x|Θ~)]𝔼[fK(x|Θ~)]|\displaystyle=\frac{1}{2}\sum_{x\in\mathcal{N}_{d}}\left|\mathbb{E}[f(x|\tilde{\Theta})]-\mathbb{E}[f_{K}(x|\tilde{\Theta})]\right| (92)
12(INK)x𝒩d|𝔼[f(x|Θ~)|INK]𝔼[fK(x|Θ~)|INK]|\displaystyle\leq\frac{1}{2}\mathbb{P}(I_{N}\leq K)\sum_{x\in\mathcal{N}_{d}}\left|\mathbb{E}[f(x|\tilde{\Theta})|I_{N}\leq K]-\mathbb{E}[f_{K}(x|\tilde{\Theta})|I_{N}\leq K]\right| (93)
+12(IN>K)x𝒩d|𝔼[f(x|Θ~)|IN>K]𝔼[fK(x|Θ~)|IN>K]|\displaystyle\,\,\,+\frac{1}{2}\mathbb{P}(I_{N}>K)\sum_{x\in\mathcal{N}_{d}}\left|\mathbb{E}[f(x|\tilde{\Theta})|I_{N}>K]-\mathbb{E}[f_{K}(x|\tilde{\Theta})|I_{N}>K]\right| (94)

Conditioned on I_{N}\leq K, f(x|\tilde{\Theta})=f_{K}(x|\tilde{\Theta}) under both the independent and categorical likelihoods. So

𝔼[f(x|Θ~)|INK]𝔼[fK(x|Θ~)|INK]=0.\displaystyle\mathbb{E}[f(x|\tilde{\Theta})|I_{N}\leq K]-\mathbb{E}[f_{K}(x|\tilde{\Theta})|I_{N}\leq K]=0. (96)

By Fubini’s Theorem,

DTV(PN,PN,K)\displaystyle\mathrm{D_{TV}}\left(P_{N},P_{N,K}\right) (97)
\displaystyle\leq 12(IN>K)x𝒩d(𝔼[f(x|Θ~)|IN>K]+𝔼[fK(x|Θ~)|IN>K])\displaystyle\frac{1}{2}\mathbb{P}(I_{N}>K)\sum_{x\in\mathcal{N}_{d}}\left(\ \mathbb{E}[f(x|\tilde{\Theta})|I_{N}>K]+\mathbb{E}[f_{K}(x|\tilde{\Theta})|I_{N}>K]\right) (98)
=\displaystyle= 12(IN>K)(𝔼[x𝒩df(x|Θ~)|IN>K]+𝔼[x𝒩dfK(x|Θ~)|IN>K])\displaystyle\frac{1}{2}\mathbb{P}(I_{N}>K)\left(\mathbb{E}\left[\left.\sum_{x\in\mathcal{N}_{d}}f(x|\tilde{\Theta})\right|I_{N}>K\right]+\mathbb{E}\left[\left.\sum_{x\in\mathcal{N}_{d}}f_{K}(x|\tilde{\Theta})\right|I_{N}>K\right]\right) (99)
=\displaystyle= (IN>K)=1(INK).\displaystyle\mathbb{P}(I_{N}>K)=1-\mathbb{P}\left(I_{N}\leq K\right). (100)

Proof of Theorems 3.4 and 4.1.

Denote the set of indices

,K:={id:1i1,,iK,K+1i+1,,id<}\displaystyle\mathcal{I}_{\ell,K}:=\{i\in\mathbb{N}_{\neq}^{d}:1\leq i_{1},\cdots,i_{\ell}\leq K,\ K+1\leq i_{\ell+1},\cdots,i_{d}<\infty\} (101)

such that i\in\mathcal{I}_{\ell,K} indicates that the first \ell elements of i belong to the truncation, and the remaining d-\ell elements belong to the tail. By Jensen's inequality,

(INK|U1:K,Γ1:K)\displaystyle\mathbb{P}\left(I_{N}\leq K\,|\,U_{1:K},\Gamma_{1:K}\right)
=𝔼[exp{N=0d1(d)i,Klogπ(ϑi)}|U1:K,Γ1:K]\displaystyle=\mathbb{E}\left[\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right\}\,|\,U_{1:K},\Gamma_{1:K}\right]
exp{N=0d1(d)𝔼[i,Klogπ(ϑi)|U1:K,Γ1:K]}.\displaystyle\geq\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\mathbb{E}\left[\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\,|\,U_{1:K},\Gamma_{1:K}\right]\right\}. (102)

This expression arises by noting that I_{N}\leq K if and only if, for all i involving an index i_{j}>K, the count of edge i is 0 after N rounds; the factor \binom{d}{\ell} accounts for the fact that \vartheta_{i}=\prod_{j=1}^{d}\theta_{i_{j}} is independent of the ordering of the i_{j}.

Consider a single term \sum_{i\in\mathcal{I}_{\ell,K}}\log\pi(\vartheta_{i}) in the above sum. Since we are conditioning on U_{1:K},\Gamma_{1:K}, the rates \theta_{i_{1}},\dots,\theta_{i_{\ell}} are fixed in the expectation, and the remaining points \Gamma_{K+1},\Gamma_{K+2},\cdots are the ordered jumps of a unit-rate homogeneous Poisson process on [\Gamma_{K},\infty). By the marking property of the Poisson process [47], conditioned on \Gamma_{K}, (U_{i},\Gamma_{i})_{i=K+1}^{\infty} is a Poisson process on \mathbb{R}_{+}\times[\Gamma_{K},\infty) with rate measure g(\mathrm{d}u)\mathrm{d}\gamma. We thus apply the Slivnyak–Mecke theorem [40] to the remaining d-\ell indices to obtain

𝔼[i,Klogπ(ϑi)|U1:K,Γ1:K]\displaystyle\mathbb{E}\left[\left.\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right|U_{1:K},\Gamma_{1:K}\right] (103)
𝔼[i,Klogπ(j=1θijj=+1dτ(Uij,Γij))|U1:K,Γ1:K]\displaystyle\mathbb{E}\left[\left.\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\prod_{j=1}^{\ell}\theta_{i_{j}}\prod_{j=\ell+1}^{d}\tau(U_{i_{j}},\Gamma_{i_{j}})\right)\right|U_{1:K},\Gamma_{1:K}\right] (104)
=\displaystyle= i1iK𝔼[+dlogπ(j=1θijj=+1dτ(Uij,ΓK+γj))j=+1ddγj|U1:K,Γ1:K]\displaystyle\sum_{i_{1}\neq\cdots\neq i_{\ell}\leq K}\mathbb{E}\left[\int_{\mathbb{R}_{+}^{d-\ell}}\log\pi\left(\prod_{j=1}^{\ell}\theta_{i_{j}}\prod_{j=\ell+1}^{d}\tau(U_{i_{j}},\Gamma_{K}+\gamma_{j})\right)\prod_{j=\ell+1}^{d}\mathrm{d}\gamma_{j}\,|\,U_{1:K},\Gamma_{1:K}\right] (105)
=\displaystyle= [K]||=[ΓK,)d𝔼[logπ(jθjj=1dτ(Uj,γj))|θ1:K]dγ1:d,\displaystyle\sum_{\begin{subarray}{c}\mathcal{L}\subseteq[K]\\ |\mathcal{L}|=\ell\end{subarray}}\int_{[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\mathbb{E}\left[\log\pi\left(\prod_{j\in\mathcal{L}}\theta_{j}\cdot\prod_{j=1}^{d-\ell}\tau(U^{\prime}_{j},\gamma_{j})\right)\,|\,\theta_{1:K}\right]\mathrm{d}\gamma_{1:d-\ell}, (106)

where U^{\prime}_{j}\overset{\textrm{\tiny{i.i.d.}}}{\sim}g. Substituting this expression into Eq. 102 yields the result of Theorem 4.1. Next, we consider the bound on the marginal probability \mathbb{P}(I_{N}\leq K). Applying Jensen's inequality to Eq. 102 and following the previous derivation, we find that

(INK)exp{N=0d1(d)𝔼[i,Klogπ(ϑi)]},\displaystyle\mathbb{P}\left(I_{N}\leq K\right)\geq\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\mathbb{E}\left[\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right]\right\}, (107)

and

𝔼[i,Klogπ(ϑi)]\displaystyle\mathbb{E}\left[\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right] (108)
=\displaystyle= i1iK𝔼[[ΓK,)dlogπ(j=1θijj=1dτ(Uj,γj))dγ1:d].\displaystyle\sum_{i_{1}\neq\cdots\neq i_{\ell}\leq K}\mathbb{E}\left[\int_{[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\log\pi\left(\prod_{j=1}^{\ell}\theta_{i_{j}}\prod_{j=1}^{d-\ell}\tau(U^{\prime}_{j},\gamma_{j})\right)\mathrm{d}\gamma_{1:d-\ell}\right]. (109)

Using the fact that, conditioned on \Gamma_{K}, \Gamma_{1:K-1} are uniformly distributed on [0,\Gamma_{K}], and that at most one i_{j} can be equal to K,

i1iK𝔼[[ΓK,)dlogπ(j=1θijj=1dτ(Uj,γj))dγ1:d]\displaystyle\sum_{i_{1}\neq\cdots\neq i_{\ell}\leq K}\mathbb{E}\left[\int_{[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\log\pi\left(\prod_{j=1}^{\ell}\theta_{i_{j}}\prod_{j=1}^{d-\ell}\tau(U^{\prime}_{j},\gamma_{j})\right)\mathrm{d}\gamma_{1:d-\ell}\right] (110)
=(K1)!(K1)!𝔼[ΓK[0,ΓK]×[ΓK,)dlogπ(j=1dτ(Uj,γj))dγ1:d]\displaystyle=\frac{(K-1)!}{(K-1-\ell)!}\mathbb{E}\left[\Gamma_{K}^{-\ell}\int_{[0,\Gamma_{K}]^{\ell}\times[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\log\pi\left(\prod_{j=1}^{d}\tau(U_{j},\gamma_{j})\right)\mathrm{d}\gamma_{1:d}\right] (111)
+(K1)!(K)!𝔼[ΓK(1)[0,ΓK]×[ΓK,)dδγ1=ΓKlogπ(j=1dτ(Uj,γj))dγ1:d],\displaystyle+\ell\frac{(K-1)!}{(K-\ell)!}\mathbb{E}\left[\Gamma_{K}^{-(\ell-1)}\int_{[0,\Gamma_{K}]^{\ell}\times[\Gamma_{K},\infty)^{d-\ell}}\!\!\!\!\delta_{\gamma_{1}=\Gamma_{K}}\log\pi\left(\prod_{j=1}^{d}\tau(U_{j},\gamma_{j})\right)\mathrm{d}\gamma_{1:d}\right], (112)

where the first and second terms arise from the portions of the sum where all indices satisfy i_{j}\neq K and one index satisfies i_{j}=K, respectively.

To complete the result, we study the asymptotics of the marginal probability bound. It follows from Eq. 8 that I_{N}<\infty almost surely. Therefore

limK𝔼[exp{N=0d1(d)i,Klogπ(ϑi)}]=limK(INK)=1.\displaystyle\lim_{K\rightarrow\infty}\mathbb{E}\left[\exp\left\{N\sum_{\ell=0}^{d-1}\binom{d}{\ell}\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right\}\right]=\lim_{K\rightarrow\infty}\mathbb{P}(I_{N}\leq K)=1. (113)

It then follows from [28, Lemma B.1] and the continuous mapping theorem that

=0d1[(d)i,Klogπ(ϑi)]𝑝0asK.\displaystyle\sum_{\ell=0}^{d-1}\left[\binom{d}{\ell}\sum_{i\in\mathcal{I}_{\ell,K}}\log\pi\left(\vartheta_{i}\right)\right]\overset{p}{\to}0\qquad\text{as}\,\,K\to\infty. (114)

Since this sequence is monotonically increasing in K, we have that

BK=𝔼[=0d1[(d)i,Klogπ(ϑi)]]0asK.\displaystyle B_{K}=\mathbb{E}\left[\sum_{\ell=0}^{d-1}\left[\binom{d}{\ell}\sum_{i\in\mathcal{I}_{\ell,K}}-\log\pi\left(\vartheta_{i}\right)\right]\right]\to 0\qquad\text{as}\,\,K\to\infty. (115)

B.1 Proof of Theorem 3.6

We first state a useful result: if one perturbs the log-probabilities of a countable discrete distribution by i.i.d. {\sf{Gumbel}}(0,1) random variables, the arg max of the resulting sequence is a sample from that distribution.

Lemma B.1.

[28, Lemma 5.2] Let (p_{j})_{j=1}^{\infty} be a countable collection of positive numbers such that \sum_{j}p_{j}<\infty, and let \bar{p}_{j}=p_{j}/\sum_{k}p_{k}. If (W_{j})_{j=1}^{\infty} are i.i.d. {\sf{Gumbel}}(0,1) random variables, then \operatornamewithlimits{arg\,max}_{j\in\mathbb{N}}W_{j}+\log p_{j} exists, is unique a.s., and has distribution

argmaxjWj+logpj𝖢𝖺𝗍𝖾𝗀𝗈𝗋𝗂𝖼𝖺𝗅((p¯j)j=1).\displaystyle\operatornamewithlimits{arg\,max}_{j\in\mathbb{N}}W_{j}+\log p_{j}\sim{\sf{Categorical}}\left((\bar{p}_{j})_{j=1}^{\infty}\right). (116)
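
As a quick sanity check of Lemma B.1, the following minimal Python sketch (with an arbitrary illustrative weight vector, truncated to finitely many atoms purely for the demonstration) compares the empirical distribution of the Gumbel-perturbed arg max against the normalized weights.

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([5.0, 2.0, 1.0, 0.5])           # illustrative positive weights (finite for the demo)
p_bar = p / p.sum()

# argmax_j (W_j + log p_j) with W_j ~ Gumbel(0, 1) has distribution Categorical(p_bar)
n = 200_000
W = rng.gumbel(0.0, 1.0, size=(n, p.size))
draws = np.argmax(W + np.log(p), axis=1)

freq = np.bincount(draws, minlength=p.size) / n
print(np.round(freq, 3))    # empirical frequencies
print(np.round(p_bar, 3))   # exact probabilities; the two should agree closely
```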
Proof of Theorem 3.6.

Since the N edges from the categorical likelihood process are i.i.d. categorical random variables, by Jensen's inequality,

(INK)=𝔼[(INK|Θ)]=𝔼[(I1K|Θ)N]𝔼[(I1K|Θ)]N.\displaystyle\mathbb{P}(I_{N}\leq K)=\mathbb{E}\left[\mathbb{P}(I_{N}\leq K|\Theta)\right]=\mathbb{E}\left[\mathbb{P}(I_{1}\leq K|\Theta)^{N}\right]\geq\mathbb{E}\left[\mathbb{P}(I_{1}\leq K|\Theta)\right]^{N}. (117)

Next, since \vartheta_{i}=\prod_{j=1}^{d}\theta_{i_{j}}, we can simulate a categorical variable with probabilities proportional to \vartheta_{i}, i\in\mathbb{N}_{\neq}^{d}, by sampling d indices (J_{1},\cdots,J_{d}) independently from a categorical distribution with probabilities proportional to (\theta_{1},\theta_{2},\dots) and discarding samples where J_{j}=J_{k} for some 1\leq j<k\leq d. Let \theta^{\prime}_{k}=\theta_{k}/\sum_{k}\theta_{k} denote the normalized rates, define P_{J,K}:=\sum_{j=J}^{K}\theta^{\prime}_{j}, and define the event \mathcal{Q}:=\{J_{j}\neq J_{k},\ \forall 1\leq j<k\leq d\}. Then

(I1K|Θ)=(1JjK, 1jd|𝒬,Θ).\displaystyle\mathbb{P}(I_{1}\leq K|\Theta)=\mathbb{P}(1\leq J_{j}\leq K,\ 1\leq j\leq d\ |\ \mathcal{Q},\Theta). (118)

Since the normalized rates \theta^{\prime}_{k} are generated from the inverse Lévy representation, they are monotone decreasing. Therefore

(1JjK, 1jd|𝒬,Θ)\displaystyle\mathbb{P}(1\leq J_{j}\leq K,\ 1\leq j\leq d\ |\ \mathcal{Q},\Theta) P1,K10P2,K1P1,1Pd,K1P1,d1\displaystyle\geq\frac{P_{1,K}}{1-0}\cdot\frac{P_{2,K}}{1-P_{1,1}}\cdots\frac{P_{d,K}}{1-P_{1,d-1}} (119)
(Pd,K1P1,d1)d.\displaystyle\geq\left(\frac{P_{d,K}}{1-P_{1,d-1}}\right)^{d}. (120)

By Jensen’s inequality,

𝔼[(1JjK, 1jd|𝒬,Θ)]𝔼[Pd,K1P1,d1]d.\displaystyle\mathbb{E}[\mathbb{P}(1\leq J_{j}\leq K,\ 1\leq j\leq d\ |\ \mathcal{Q},\Theta)]\geq\mathbb{E}\left[\frac{P_{d,K}}{1-P_{1,d-1}}\right]^{d}. (121)

Note that for a categorical random variable J with class probabilities given by \theta^{\prime}_{j}/(1-P_{1,d-1}), j\geq d, the quantity P_{d,K}/(1-P_{1,d-1}) is the probability that d\leq J\leq K. So by the Gumbel-max sampling lemma (Lemma B.1),

𝔼[Pd,K1P1,d1]=(dargmaxjd(logθj+Wj)K),Wji.i.d.𝖦𝗎𝗆𝖻𝖾𝗅(0,1),\displaystyle\mathbb{E}\left[\frac{P_{d,K}}{1-P_{1,d-1}}\right]=\mathbb{P}\left(d\leq\operatornamewithlimits{arg\,max}_{j\geq d}(\log\theta_{j}+W_{j})\leq K\right),\quad W_{j}\overset{\textrm{\tiny{i.i.d.}\@}}{\sim}{\sf{Gumbel}}(0,1), (122)

where we can replace \theta^{\prime}_{j} with the unnormalized \theta_{j} because the normalization does not affect the \operatornamewithlimits{arg\,max}. Denoting

MK:=maxdkKlogν(Γk)+Wk,\displaystyle M_{K}:=\max_{d\leq k\leq K}\log\nu^{\leftarrow}(\Gamma_{k})+W_{k}, MK+:=supk>Klogν(Γk)+Wk,\displaystyle M_{K+}:=\sup_{k>K}\log\nu^{\leftarrow}(\Gamma_{k})+W_{k}, (123)

we have that

(INK)(1𝔼[(MK<MK+|ΓK)])Nd,\displaystyle\mathbb{P}(I_{N}\leq K)\geq\left(1-\mathbb{E}[\mathbb{P}(M_{K}<M_{K+}|\Gamma_{K})]\right)^{N\cdot d}, (124)

and so the remainder of the proof focuses on the conditional expectation. Conditioned on ΓK\Gamma_{K},

MK=𝑑max{logν(ΓK)+WK,maxdkKlogν(uk)+Wk}.\displaystyle M_{K}\overset{d}{=}\max\left\{\log\nu^{\leftarrow}(\Gamma_{K})+W_{K},\ \max_{d\leq k\leq K}\log\nu^{\leftarrow}(u_{k})+W_{k}\right\}. (125)

The cumulative distribution function and probability density function of the {\sf{Gumbel}}(0,1) distribution are

F(x)=eex,f(x)=e(x+ex).\displaystyle F(x)=e^{-e^{-x}},\quad f(x)=e^{-(x+e^{-x})}. (126)

So

(logν(ΓK)+WKx|ΓK)=eν(ΓK)ex,\displaystyle\mathbb{P}(\log\nu^{\leftarrow}(\Gamma_{K})+W_{K}\leq x\ |\ \Gamma_{K})=e^{-\nu^{\leftarrow}(\Gamma_{K})e^{-x}}, (127)

and

(logν(uk)+Wkx|ΓK)=01eν(ΓKu)exdu.\displaystyle\mathbb{P}(\log\nu^{\leftarrow}(u_{k})+W_{k}\leq x\ |\ \Gamma_{K})=\int_{0}^{1}e^{-\nu^{\leftarrow}(\Gamma_{K}u)e^{-x}}\mathrm{d}u. (128)

Therefore,

(MKx|ΓK)=(01eν(ΓKu)exdu)Kdeν(ΓK)ex.\displaystyle\mathbb{P}(M_{K}\leq x\ |\ \Gamma_{K})=\left(\int_{0}^{1}e^{-\nu^{\leftarrow}(\Gamma_{K}u)e^{-x}}\mathrm{d}u\right)^{K-d}e^{-\nu^{\leftarrow}(\Gamma_{K})e^{-x}}. (129)

Denote Q(u,t)=e^{-\nu^{\leftarrow}(u)e^{-t}}, so that

(MKx|ΓK)=(01Q(ΓKu,x)du)KdQ(ΓK,x).\displaystyle\mathbb{P}(M_{K}\leq x\ |\ \Gamma_{K})=\left(\int_{0}^{1}Q(\Gamma_{K}u,x)\mathrm{d}u\right)^{K-d}Q(\Gamma_{K},x). (130)

Conditioned on \Gamma_{K}, the tail \Gamma_{K+1},\Gamma_{K+2},\cdots is a unit-rate homogeneous Poisson process on [\Gamma_{K},\infty) that is independent of \Gamma_{1},\cdots,\Gamma_{K-1}. So conditioned on \Gamma_{K},

MK+=𝑑supk1logν(ΓK+Γk)+Wk,\displaystyle M_{K+}\overset{d}{=}\sup_{k\geq 1}\log\nu^{\leftarrow}(\Gamma_{K}+\Gamma_{k}^{\prime})+W_{k}, (131)

where (\Gamma_{k}^{\prime})_{k\geq 1} are the jumps of a unit-rate homogeneous Poisson process on \mathbb{R}_{+}. Since (\Gamma_{k}^{\prime})_{k\geq 1} is a Poisson process, by the marking and mapping theorems (\log\nu^{\leftarrow}(\Gamma_{K}+\Gamma_{k}^{\prime})+W_{k})_{k\geq 1} is also a Poisson process, with rate measure

(0e(tlogν(ΓK+γ))e(tlogν(ΓK+γ))dγ)dt.\displaystyle\left(\int_{0}^{\infty}e^{-(t-\log\nu^{\leftarrow}(\Gamma_{K}+\gamma))-e^{-(t-\log\nu^{\leftarrow}(\Gamma_{K}+\gamma))}}\mathrm{d}\gamma\right)\mathrm{d}t. (132)

\mathbb{P}(M_{K+}\leq x\ |\ \Gamma_{K}) is the probability that no atom of this Poisson process exceeds x. For a Poisson process with rate measure \mu(\mathrm{d}t), this probability is e^{-\int_{x}^{\infty}\mu(\mathrm{d}t)}, so

\displaystyle\mathbb{P}(M_{K+}\leq x\ |\ \Gamma_{K}) =e^{-\int_{x}^{\infty}\left(\int_{0}^{\infty}e^{-(t-\log\nu^{\leftarrow}(\Gamma_{K}+\gamma))-e^{-(t-\log\nu^{\leftarrow}(\Gamma_{K}+\gamma))}}\mathrm{d}\gamma\right)\mathrm{d}t} (133)
=e0(eν(ΓK+γ)ex1)dγ\displaystyle=e^{\int_{0}^{\infty}\left(e^{-\nu^{\leftarrow}(\Gamma_{K}+\gamma)e^{-x}}-1\right)\mathrm{d}\gamma} (134)
=e0Q(ΓK+γ,x)1dγ,\displaystyle=e^{\int_{0}^{\infty}Q(\Gamma_{K}+\gamma,\,x)-1\,\mathrm{d}\gamma}, (135)

where the second equality follows from the fact that the inner integrand is the probability density function of a Gumbel distribution. Therefore,

(MK<MK+|ΓK)\displaystyle\mathbb{P}(M_{K}<M_{K+}\ |\ \Gamma_{K}) (136)
=\displaystyle= (MKx|ΓK)ddx(MK+x|ΓK)dx\displaystyle\int_{-\infty}^{\infty}\mathbb{P}(M_{K}\leq x\ |\ \Gamma_{K})\frac{\mathrm{d}}{\mathrm{d}x}\mathbb{P}(M_{K+}\leq x\ |\ \Gamma_{K})\mathrm{d}x (137)
=\displaystyle= Q(ΓK,x)(01Q(ΓKu,x)du)Kd(ddxe0Q(ΓK+γ,x)1dγ)dx.\displaystyle\int_{-\infty}^{\infty}Q(\Gamma_{K},x)\left(\int_{0}^{1}Q(\Gamma_{K}u,x)\mathrm{d}u\right)^{K-d}\left(\frac{\mathrm{d}}{\mathrm{d}x}e^{\int_{0}^{\infty}Q(\Gamma_{K}+\gamma,\,x)-1\,\mathrm{d}\gamma}\right)\mathrm{d}x. (138)

For the categorical variable J with class probabilities given by \theta^{\prime}_{j}/(1-P_{1,d-1}), j\geq d, it holds that \mathbb{P}(d\leq J\leq K)\uparrow 1 as K\to\infty. By the monotone convergence theorem,

B_{K}=\mathbb{P}\left(\operatornamewithlimits{arg\,max}_{j\geq d}(\log\theta_{j}+W_{j})>K\right)=1-\mathbb{E}\left[\frac{P_{d,K}}{1-P_{1,d-1}}\right]\to 0. (139)

Appendix C Error bounds for edge-exchangeable networks

C.1 Rejection representation

We first derive the specific form of B_{K} in Corollary 3.3 for the rejection representation from Eq. 6, where

τ(U,Γ)=μ(Γ)𝟙[dνdμ(μ(Γ))U],\displaystyle\tau(U,\Gamma)=\mu^{\leftarrow}(\Gamma)\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma))\geq U\right], (140)

and U𝖴𝗇𝗂𝖿[0,1]U\sim{\sf{Unif}}[0,1]. So in Corollary 3.3,

BK,1\displaystyle B_{K,1} (141)
=\displaystyle= 𝔼[+4logπ(τ(u1,γ1+ΓK)τ(u2,γ2+ΓK))dγ1dγ2g(u1)du1g(u2)du2]\displaystyle\mathbb{E}\left[\int_{\mathbb{R}_{+}^{4}}-\log\pi\left(\tau(u_{1},\gamma_{1}+\Gamma_{K})\tau(u_{2},\gamma_{2}+\Gamma_{K})\right)\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}g(u_{1})\mathrm{d}u_{1}g(u_{2})\mathrm{d}u_{2}\right] (142)
=\displaystyle= 𝔼[ΓKΓK0101logπ(τ(u1,γ1)τ(u2,γ2))du1du2dγ1dγ2].\displaystyle\mathbb{E}\left[\int_{\Gamma_{K}}^{\infty}\int_{\Gamma_{K}}^{\infty}\int_{0}^{1}\int_{0}^{1}-\log\pi\left(\tau(u_{1},\gamma_{1})\tau(u_{2},\gamma_{2})\right)\mathrm{d}u_{1}\mathrm{d}u_{2}\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}\right]. (143)

Since \pi(0)=1 and hence \log\pi(0)=0, we can take the indicators in \tau outside of \log\pi(\tau(u_{1},\gamma_{1})\tau(u_{2},\gamma_{2})) to obtain

0101logπ(τ(u1,γ1)τ(u2,γ2))du1du2\displaystyle\int_{0}^{1}\int_{0}^{1}\log\pi\left(\tau(u_{1},\gamma_{1})\tau(u_{2},\gamma_{2})\right)\mathrm{d}u_{1}\mathrm{d}u_{2} (144)
=\displaystyle= 0101logπ(μ(γ1)μ(γ2))𝟙[dνdμ(μ(γ1))u1]𝟙[dνdμ(μ(γ2))u2]du1du2\displaystyle\int_{0}^{1}\int_{0}^{1}\log\pi\left(\mu^{\leftarrow}(\gamma_{1})\mu^{\leftarrow}(\gamma_{2})\right)\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{1}))\geq u_{1}\right]\mathds{1}\left[\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{2}))\geq u_{2}\right]\mathrm{d}u_{1}\mathrm{d}u_{2} (145)
=\displaystyle= logπ(μ(γ1)μ(γ2))dνdμ(μ(γ1))dνdμ(μ(γ2)).\displaystyle\log\pi\left(\mu^{\leftarrow}(\gamma_{1})\mu^{\leftarrow}(\gamma_{2})\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{1}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{2})). (146)

Transforming variables via x_{1}=\mu^{\leftarrow}(\gamma_{1}) and x_{2}=\mu^{\leftarrow}(\gamma_{2}), and noting that \mu^{\leftarrow}(\Gamma_{K})\geq x\iff\Gamma_{K}\leq\mu[x,\infty),

BK,1\displaystyle B_{K,1} =𝔼[0μ(ΓK)0μ(ΓK)logπ(x1x2)dνdμ(x1)dνdμ(x2)μ(dx1)μ(dx2)]\displaystyle=\mathbb{E}\left[\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}-\log\pi(x_{1}x_{2})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x_{1})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x_{2})\mu(\mathrm{d}x_{1})\mu(\mathrm{d}x_{2})\right]
=𝔼[+2logπ(x1x2)𝟙[x1μ(ΓK)]𝟙[x2μ(ΓK)]ν(dx1)ν(dx2)]\displaystyle=\mathbb{E}\left[\int_{\mathbb{R}_{+}^{2}}-\log\pi(x_{1}x_{2})\mathds{1}[x_{1}\leq\mu^{\leftarrow}(\Gamma_{K})]\mathds{1}[x_{2}\leq\mu^{\leftarrow}(\Gamma_{K})]\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2})\right]
=+2logπ(x1x2)𝔼[𝟙[ΓKμ[max{x1,x2},)]]ν(dx1)ν(dx2)\displaystyle=\int_{\mathbb{R}_{+}^{2}}-\log\pi(x_{1}x_{2})\mathbb{E}\left[\mathds{1}[\Gamma_{K}\leq\mu[\max\{x_{1},x_{2}\},\infty)]\right]\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2})
=+2logπ(x1x2)FK(μ[max{x1,x2},))ν(dx1)ν(dx2),\displaystyle=\int_{\mathbb{R}_{+}^{2}}-\log\pi(x_{1}x_{2})F_{K}\left(\mu[\max\{x_{1},x_{2}\},\infty)\right)\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}), (147)

where FKF_{K} is the cumulative distribution function of ΓK\Gamma_{K}. Using the same variable transformation again,

12(K1)BK,2\displaystyle\frac{1}{2(K-1)}B_{K,2} (148)
=\displaystyle= 𝔼[0ΓK+31ΓKlogπ(τ(u1,γ1)τ(u2,γ2+ΓK))g(u1)du1g(u2)du2dγ1dγ2]\displaystyle\mathbb{E}\left[\int_{0}^{\Gamma_{K}}\int_{\mathbb{R}_{+}^{3}}-\frac{1}{\Gamma_{K}}\log\pi\left(\tau(u_{1},\gamma_{1})\tau(u_{2},\gamma_{2}+\Gamma_{K})\right)g(u_{1})\mathrm{d}u_{1}g(u_{2})\mathrm{d}u_{2}\mathrm{d}\gamma_{1}\mathrm{d}\gamma_{2}\right] (149)
=\displaystyle= 𝔼[μ(ΓK)0μ(ΓK)1ΓKlogπ(x1x2)dνdμ(x1)dνdμ(x2)μ(dx1)μ(dx2)]\displaystyle\mathbb{E}\left[\int_{\mu^{\leftarrow}(\Gamma_{K})}^{\infty}\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}-\frac{1}{\Gamma_{K}}\log\pi(x_{1}x_{2})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x_{1})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x_{2})\mu(\mathrm{d}x_{1})\mu(\mathrm{d}x_{2})\right] (150)
=\displaystyle= +2𝔼[1ΓK𝟙[μ[x1,)ΓKμ[x2,)]]logπ(x1x2)ν(dx1)ν(dx2).\displaystyle\int_{\mathbb{R}_{+}^{2}}-\mathbb{E}\left[\frac{1}{\Gamma_{K}}\mathds{1}[\mu[x_{1},\infty)\leq\Gamma_{K}\leq\mu[x_{2},\infty)]\right]\log\pi(x_{1}x_{2})\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}). (151)
BK,3\displaystyle B_{K,3} =2𝔼[+3logπ(τ(u1,ΓK)τ(u2,γ2+ΓK))g(u1)du1g(u2)du2dγ2]\displaystyle=2\mathbb{E}\left[\int_{\mathbb{R}_{+}^{3}}-\log\pi\left(\tau(u_{1},\Gamma_{K})\tau(u_{2},\gamma_{2}+\Gamma_{K})\right)g(u_{1})\mathrm{d}u_{1}g(u_{2})\mathrm{d}u_{2}\mathrm{d}\gamma_{2}\right] (152)
=2𝔼[ΓKlogπ(μ(ΓK)μ(γ2))dνdμ(μ(ΓK))dνdμ(μ(γ2))dγ2]\displaystyle=2\mathbb{E}\left[\int_{\Gamma_{K}}^{\infty}-\log\pi\left(\mu^{\leftarrow}(\Gamma_{K})\mu^{\leftarrow}(\gamma_{2})\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{K}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\gamma_{2}))\mathrm{d}\gamma_{2}\right] (153)
=2𝔼[0μ(ΓK)logπ(μ(ΓK)x)dνdμ(μ(ΓK))dνdμ(x)μ(dx)]\displaystyle=2\mathbb{E}\left[\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}-\log\pi\left(\mu^{\leftarrow}(\Gamma_{K})x\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{K}))\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(x)\mu(\mathrm{d}x)\right] (154)
=2𝔼[+logπ(μ(ΓK)x)dνdμ(μ(ΓK))𝟙[ΓKμ[x,)]ν(dx)].\displaystyle=2\mathbb{E}\left[\int_{\mathbb{R}_{+}}-\log\pi\left(\mu^{\leftarrow}(\Gamma_{K})x\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}(\mu^{\leftarrow}(\Gamma_{K}))\mathds{1}[\Gamma_{K}\leq\mu[x,\infty)]\nu(\mathrm{d}x)\right]. (155)

C.2 Beta-independent Bernoulli process network

Dense network

When α=0\alpha=0,

ν(dθ)=γλθ1(1θ)λ1dθ,μ(dθ)=γλθ1dθ,\displaystyle\nu(\mathrm{d}\theta)=\gamma\lambda\theta^{-1}(1-\theta)^{\lambda-1}\mathrm{d}\theta,\quad\mu(\mathrm{d}\theta)=\gamma\lambda\theta^{-1}\mathrm{d}\theta, (156)

and

dνdμ=(1θ)λ1,μ([x,1])=γlogx,μ(u)=eu/γ.\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=(1-\theta)^{\lambda-1},\quad\mu\left([x,1]\right)=-\gamma^{\prime}\log x,\quad\mu^{\leftarrow}(u)=e^{-u/\gamma^{\prime}}. (157)

Substituting ν(dx),μ(dx)\nu(\mathrm{d}x),\mu(\mathrm{d}x) and π(θ)\pi(\theta) into Eq. 147 and noting that the integrand is symmetric around the line x1=x2x_{1}=x_{2},

BK,1\displaystyle B_{K,1} =201x11log(1x1x2)FK(μ[x2,1])γλx21(1x2)λ1dx2ν(dx1).\displaystyle=2\int_{0}^{1}\int_{x_{1}}^{1}-\log(1-x_{1}x_{2})F_{K}\left(\mu[x_{2},1]\right)\gamma\lambda x_{2}^{-1}(1-x_{2})^{\lambda-1}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}). (158)

Next, note that 0FK(μ[x2,1])FK(μ[x1,1])0\leq F_{K}\left(\mu[x_{2},1]\right)\leq F_{K}\left(\mu[x_{1},1]\right) when x2x1x_{2}\geq x_{1} and

0log(1x1x2)(11x1x21)=(x1x21x1x2)(x1x21x1).\displaystyle 0\geq\log(1-x_{1}x_{2})\geq-\left(\frac{1}{1-x_{1}x_{2}}-1\right)=-\left(\frac{x_{1}x_{2}}{1-x_{1}x_{2}}\right)\geq-\left(\frac{x_{1}x_{2}}{1-x_{1}}\right). (159)

So

BK,1\displaystyle B_{K,1} 2γλ01FK(μ([x1,1]))x11x1x21x1x21(1x2)λ1dx2ν(dx1)\displaystyle\leq 2\gamma\lambda\int_{0}^{1}F_{K}\left(\mu([x_{1},1])\right)\int_{x_{1}}^{1}\frac{x_{1}x_{2}}{1-x_{1}}x_{2}^{-1}(1-x_{2})^{\lambda-1}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}) (160)
=2γ2λ01(1x)2(λ1)FK(μ([x,1]))dx.\displaystyle=2\gamma^{2}\lambda\int_{0}^{1}(1-x)^{2(\lambda-1)}F_{K}\left(\mu([x,1])\right)\mathrm{d}x. (161)

For any a(0,1)a\in(0,1), dividing the integral into two parts and bounding each part separately,

BK,12γ2λ[0a(1x)2(λ1)dx+a1FK(γlogx)(1x)2(λ1)dx].\displaystyle B_{K,1}\leq 2\gamma^{2}\lambda\left[\int_{0}^{a}(1-x)^{2(\lambda-1)}\mathrm{d}x+\int_{a}^{1}F_{K}\left(-\gamma^{\prime}\log x\right)(1-x)^{2(\lambda-1)}\mathrm{d}x\right]. (162)

Assume for the moment that \lambda\neq 0.5. Using the fact that F_{K}(t)\leq(3t/K)^{K}, and noting that -\log x\leq\left[-\log a/(1-a)\right](1-x) for x\in[a,1], we obtain

12γ2λBK,1\displaystyle\frac{1}{2\gamma^{2}\lambda}B_{K,1} (163)
\displaystyle\leq 1(1a)2λ12λ1+a1(3γ[loga/(1a)](1x)K)K(1x)2λ2dx\displaystyle\frac{1-(1-a)^{2\lambda-1}}{2\lambda-1}+\int_{a}^{1}\left(\frac{3\gamma^{\prime}\left[-\log a/(1-a)\right](1-x)}{K}\right)^{K}(1-x)^{2\lambda-2}\mathrm{d}x (164)
=\displaystyle= 1(1a)2λ12λ1+(3γlogaK)K1K+2λ1(1a)2λ1.\displaystyle\frac{1-(1-a)^{2\lambda-1}}{2\lambda-1}+\left(-\frac{3\gamma^{\prime}\log a}{K}\right)^{K}\frac{1}{K+2\lambda-1}(1-a)^{2\lambda-1}. (165)

There exists a constant c>1 such that 3\gamma^{\prime}\log c=1/c. Setting a=c^{-K} and using a first-order Taylor expansion to approximate the first term, we find that

1(1a)2λ12λ1cK,(3γlogaK)K1K+2λ1(1a)2λ11KcK.\displaystyle\frac{1-(1-a)^{2\lambda-1}}{2\lambda-1}\sim c^{-K},\quad\left(-\frac{3\gamma^{\prime}\log a}{K}\right)^{K}\frac{1}{K+2\lambda-1}(1-a)^{2\lambda-1}\sim\frac{1}{K}c^{-K}. (166)

Therefore, there exists K0K_{0} such that when K>K0K>K_{0},

BK,14γ2λcK.\displaystyle B_{K,1}\leq 4\gamma^{2}\lambda c^{-K}. (167)

If λ=0.5\lambda=0.5,

BK,1\displaystyle B_{K,1} 2γ2λ01(1x)1FK(μ([x,1]))dx\displaystyle\leq 2\gamma^{2}\lambda\int_{0}^{1}(1-x)^{-1}F_{K}\left(\mu([x,1])\right)\mathrm{d}x (168)
2γ2λ[log(1a)+(3γK)K1K(loga)K],\displaystyle\leq 2\gamma^{2}\lambda\left[-\log(1-a)+\left(\frac{3\gamma^{\prime}}{K}\right)^{K}\frac{1}{K}(-\log a)^{K}\right], (169)

which can be bounded similarly by choosing the same c and setting a=c^{-K}. Therefore there exists a constant K_{0} such that for K>K_{0},

BK,14γ2λcK.\displaystyle B_{K,1}\leq 4\gamma^{2}\lambda c^{-K}. (170)
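
The constant c above is defined only implicitly by 3\gamma^{\prime}\log c=1/c. The following sketch (with illustrative values of \gamma^{\prime}, \gamma, and \lambda, not taken from the paper) solves for c numerically with a standard root-finder, after which the bound B_{K,1}\leq 4\gamma^{2}\lambda c^{-K} can be evaluated directly.

```python
import numpy as np
from scipy.optimize import brentq

gamma_prime = 2.0                   # illustrative value of gamma'
gam, lam = 2.0, 1.0                 # illustrative values of gamma and lambda

# Solve 3 * gamma' * log(c) = 1/c for c > 1; the left side increases from 0
# and the right side decreases from 1, so a unique root c > 1 exists.
c = brentq(lambda c: 3.0 * gamma_prime * np.log(c) - 1.0 / c, 1.0 + 1e-12, 1e6)

K = np.arange(10, 101, 10)
bound = 4.0 * gam**2 * lam * c ** (-K.astype(float))   # B_{K,1} <= 4 gamma^2 lambda c^{-K}
print(c)
print(bound)
```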

Next, the term BK,2B_{K,2} is bounded via

12BK,2\displaystyle\frac{1}{2}B_{K,2} (171)
=\displaystyle= 01x11log(1x1x2)[FK1(μ([x1,1]))FK1(μ([x2,1]))]ν(dx2)ν(dx1)\displaystyle\int_{0}^{1}\int_{x_{1}}^{1}-\log(1-x_{1}x_{2})\left[F_{K-1}\left(\mu([x_{1},1])\right)-F_{K-1}\left(\mu([x_{2},1])\right)\right]\nu(\mathrm{d}x_{2})\nu(\mathrm{d}x_{1}) (172)
\displaystyle\leq 01x11log(1x1x2)FK1(μ[x1,1])ν(dx2)ν(dx1).\displaystyle\int_{0}^{1}\int_{x_{1}}^{1}-\log(1-x_{1}x_{2})F_{K-1}\left(\mu[x_{1},1]\right)\nu(\mathrm{d}x_{2})\nu(\mathrm{d}x_{1}). (173)

This has exactly the same form as BK,1B_{K,1}, except that the CDF of ΓK\Gamma_{K} is replaced with that of ΓK1\Gamma_{K-1}. Therefore, it can be shown that for large KK,

BK,24γ2λc(K1).\displaystyle B_{K,2}\leq 4\gamma^{2}\lambda c^{-(K-1)}. (174)

Finally, BK,3B_{K,3} may be expressed as

12BK,3\displaystyle\frac{1}{2}B_{K,3} (175)
=\displaystyle= 𝔼[01log(1μ(ΓK)x)dνdμ(μ(ΓK))𝟙[xμ(ΓK)]]ν(dx)\displaystyle\mathbb{E}\left[\int_{0}^{1}-\log\left(1-\mu^{\leftarrow}(\Gamma_{K})x\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\left(\mu^{\leftarrow}(\Gamma_{K})\right)\mathds{1}\left[x\leq\mu^{\leftarrow}(\Gamma_{K})\right]\right]\nu(\mathrm{d}x) (176)
=\displaystyle= γλ𝔼[0μ(Γk)log(1μ(ΓK)x)(1μ(ΓK))λ1x1(1x)λ1dx].\displaystyle\gamma\lambda\mathbb{E}\left[\int_{0}^{\mu^{\leftarrow}(\Gamma_{k})}-\log\left(1-\mu^{\leftarrow}(\Gamma_{K})x\right)\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda-1}x^{-1}(1-x)^{\lambda-1}\mathrm{d}x\right]. (177)

Since log(1μ(ΓK)x)μ(ΓK)x/(1μ(ΓK))\log\left(1-\mu^{\leftarrow}(\Gamma_{K})x\right)\geq-\mu^{\leftarrow}(\Gamma_{K})x/\left(1-\mu^{\leftarrow}(\Gamma_{K})\right),

BK,3\displaystyle B_{K,3} 2γλ𝔼[(1μ(ΓK))λ2μ(ΓK)0μ(ΓK)(1x)λ1dx]\displaystyle\leq 2\gamma\lambda\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda-2}\mu^{\leftarrow}(\Gamma_{K})\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}(1-x)^{\lambda-1}\mathrm{d}x\right] (178)
\displaystyle\leq 2\gamma\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda-2}\mu^{\leftarrow}(\Gamma_{K})\right] (179)
=2γ0(1ex/γ)λ2ex/γxK1Γ(K)exdx.\displaystyle=2\gamma\int_{0}^{\infty}\left(1-e^{-x/\gamma^{\prime}}\right)^{\lambda-2}e^{-x/\gamma^{\prime}}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x. (180)

We split the analysis of this term into two cases. In the first case, assuming λ2\lambda\geq 2, we have that

BK,32γ0xK1Γ(K)ex(1+γ)/γdx=2γ(γ1+γ)K.\displaystyle B_{K,3}\leq 2\gamma\int_{0}^{\infty}\frac{x^{K-1}}{\Gamma(K)}e^{-x(1+\gamma^{\prime})/\gamma^{\prime}}\mathrm{d}x=2\gamma\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}. (181)

On the other hand, if λ<2\lambda<2, we bound the integral over [0,γ][0,\gamma^{\prime}] and over [γ,)[\gamma^{\prime},\infty) separately. Since 1exx21-e^{-x}\geq x^{2} for x[0,1]x\in[0,1],

12γBK,3\displaystyle\frac{1}{2\gamma}B_{K,3} (182)
\displaystyle\leq 0γ(xγ)2(λ2)xK1Γ(K)ex(1+γ)/γdx+(1e1)λ2γxK1Γ(K)ex(1+γ)/γdx\displaystyle\int_{0}^{\gamma^{\prime}}\left(\frac{x}{\gamma^{\prime}}\right)^{2(\lambda-2)}\frac{x^{K-1}}{\Gamma(K)}e^{-x(1+\gamma^{\prime})/\gamma^{\prime}}\mathrm{d}x+(1-e^{-1})^{\lambda-2}\int_{\gamma^{\prime}}^{\infty}\frac{x^{K-1}}{\Gamma(K)}e^{-x(1+\gamma^{\prime})/\gamma^{\prime}}\mathrm{d}x (183)
\displaystyle\leq γ2(2λ)0γxK1+2(λ2)Γ(K)dx+(1e1)λ20xK1Γ(K)ex(1+γ)/γdx\displaystyle\gamma^{\prime 2(2-\lambda)}\int_{0}^{\gamma^{\prime}}\frac{x^{K-1+2(\lambda-2)}}{\Gamma(K)}\mathrm{d}x+(1-e^{-1})^{\lambda-2}\int_{0}^{\infty}\frac{x^{K-1}}{\Gamma(K)}e^{-x(1+\gamma^{\prime})/\gamma^{\prime}}\mathrm{d}x (184)
=\displaystyle= 1Γ(K)γKK+2(λ2)+(1e1)λ2(γ1+γ)K.\displaystyle\frac{1}{\Gamma(K)}\frac{\gamma^{\prime K}}{K+2(\lambda-2)}+(1-e^{-1})^{\lambda-2}\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}. (185)

As KK\rightarrow\infty, the second term will dominate the first term, so when λ2<0\lambda-2<0, the following inequality holds for large KK,

BK,34γ(1e1)λ2(γ1+γ)K.\displaystyle B_{K,3}\leq 4\gamma(1-e^{-1})^{\lambda-2}\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}. (186)

BK=BK,1+BK,2+BK,3B_{K}=B_{K,1}+B_{K,2}+B_{K,3} and as KK\rightarrow\infty, BK,3B_{K,3} will dominate BK,1B_{K,1} and BK,2B_{K,2}. So there exists a K0K_{0}\in\mathbb{N} such that for K>K0K>K_{0},

BK12γ(1e1)λ2(γ1+γ)K0.\displaystyle B_{K}\leq 12\gamma(1-e^{-1})^{\lambda-2}\left(\frac{\gamma^{\prime}}{1+\gamma^{\prime}}\right)^{K}\longrightarrow 0. (187)

Sparse network

When α>0\alpha>0,

ν(dθ)=γθ1α(1θ)λ+α1dθ,μ(dθ)=γθ1αdθ,\displaystyle\nu(\mathrm{d}\theta)=\gamma^{\prime}\theta^{-1-\alpha}(1-\theta)^{\lambda+\alpha-1}\mathrm{d}\theta,\qquad\mu(\mathrm{d}\theta)=\gamma^{\prime}\theta^{-1-\alpha}\mathrm{d}\theta, (188)

and

dνdμ=(1θ)λ+α1,\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=(1-\theta)^{\lambda+\alpha-1}, μ([x,1])=γα1(xα1),\displaystyle\mu\left([x,1]\right)=\gamma^{\prime}\alpha^{-1}(x^{-\alpha}-1), μ(u)=(1+αuγ)1/α.\displaystyle\mu^{\leftarrow}(u)=\left(1+\frac{\alpha u}{\gamma^{\prime}}\right)^{-1/\alpha}. (189)

Similar to the case when α=0\alpha=0,

BK,1=201x11log(1x1x2)FK(μ([x2,1]))γx21α(1x2)λ+α1dx2ν(dx1).\displaystyle B_{K,1}=2\int_{0}^{1}\int_{x_{1}}^{1}-\log(1-x_{1}x_{2})F_{K}\left(\mu([x_{2},1])\right)\gamma^{\prime}x_{2}^{-1-\alpha}(1-x_{2})^{\lambda+\alpha-1}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}). (190)

Since log(1x1x2)x1x2/(1x1)\log(1-x_{1}x_{2})\geq-x_{1}x_{2}/(1-x_{1}),

BK,1\displaystyle B_{K,1} 2γ201FK(γα1(xα1))x2α(1x)2[(λ+α)1]dx\displaystyle\leq 2\gamma^{\prime 2}\int_{0}^{1}F_{K}\left(\gamma^{\prime}\alpha^{-1}(x^{-\alpha}-1)\right)x^{-2\alpha}(1-x)^{2[(\lambda+\alpha)-1]}\mathrm{d}x (191)
2γ201(γα1xα(1xα))KΓ(K+1)x2α(1x)2[(λ+α)1]dx\displaystyle\leq 2\gamma^{\prime 2}\int_{0}^{1}\frac{(\gamma^{\prime}\alpha^{-1}x^{-\alpha}(1-x^{\alpha}))^{K}}{\Gamma(K+1)}x^{-2\alpha}(1-x)^{2[(\lambda+\alpha)-1]}\mathrm{d}x (192)
2γ201(γα1xα)KΓ(K+1)x2α(1x)2[(λ+α)1]dx\displaystyle\leq 2\gamma^{\prime 2}\int_{0}^{1}\frac{(\gamma^{\prime}\alpha^{-1}x^{-\alpha})^{K}}{\Gamma(K+1)}x^{-2\alpha}(1-x)^{2[(\lambda+\alpha)-1]}\mathrm{d}x (193)
2α201(γα1xα)K+2Γ(K+1)dx.\displaystyle\leq 2\alpha^{2}\int_{0}^{1}\frac{\left(\gamma^{\prime}\alpha^{-1}x^{-\alpha}\right)^{K+2}}{{\Gamma(K+1)}}\mathrm{d}x. (194)

Setting t=\gamma^{\prime}\alpha^{-1}x^{-\alpha}, we obtain

BK,1\displaystyle B_{K,1} 2α(γα1)1/αγα1tK+11/αΓ(K+1)dt\displaystyle\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\int_{\gamma^{\prime}\alpha^{-1}}^{\infty}\frac{t^{K+1-1/\alpha}}{\Gamma(K+1)}\ \mathrm{d}t (195)
2α(γα1)1/αeγα1Γ(K+21/α)Γ(K+1)γα1tK+11/αΓ(K+21/α)etdt\displaystyle\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ \frac{\Gamma(K+2-1/\alpha)}{\Gamma(K+1)}\int_{\gamma^{\prime}\alpha^{-1}}^{\infty}\frac{t^{K+1-1/\alpha}}{\Gamma(K+2-1/\alpha)}e^{-t}\ \mathrm{d}t (196)
2α(γα1)1/αeγα1Γ(K+21/α)Γ(K+1).\displaystyle\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ \frac{\Gamma(K+2-1/\alpha)}{\Gamma(K+1)}. (197)

By Stirling’s formula,

BK,12α(γα1)1/αeγα1Kα1α.\displaystyle B_{K,1}\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ K^{\frac{\alpha-1}{\alpha}}. (198)

Now consider the error bound for B_{K,2}. By the same argument used in the \alpha=0 case, we obtain

BK,22α(γα1)1/αeγα1(K1)α1α.\displaystyle B_{K,2}\leq 2\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ (K-1)^{\frac{\alpha-1}{\alpha}}. (199)

Similar to BK,1B_{K,1} and BK,2B_{K,2},

BK,3\displaystyle B_{K,3} (200)
=\displaystyle= 2𝔼[01log(1μ(ΓK)x)dνdμ(μ(ΓK))𝟙(ΓKμ[x,1])ν(dx)]\displaystyle 2\mathbb{E}\left[\int_{0}^{1}-\log\left(1-\mu^{\leftarrow}(\Gamma_{K})x\right)\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\left(\mu^{\leftarrow}(\Gamma_{K})\right)\mathds{1}\left(\Gamma_{K}\leq\mu^{\leftarrow}[x,1]\right)\nu(\mathrm{d}x)\right] (201)
\displaystyle\leq 2γ𝔼[(1μ(ΓK))λ+α10μ(ΓK)μ(ΓK)x1xx1α(1x)λ+α1dx]\displaystyle 2\gamma^{\prime}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda+\alpha-1}\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}\frac{\mu^{\leftarrow}(\Gamma_{K})x}{1-x}x^{-1-\alpha}(1-x)^{\lambda+\alpha-1}\mathrm{d}x\right] (202)
=\displaystyle= 2γ𝔼[(1μ(ΓK))λ+α1μ(ΓK)0μ(ΓK)xα(1x)λ+α2dx].\displaystyle 2\gamma^{\prime}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda+\alpha-1}\mu^{\leftarrow}(\Gamma_{K})\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}x^{-\alpha}(1-x)^{\lambda+\alpha-2}\mathrm{d}x\right]. (203)

We again split our analysis into two cases. First, suppose that λ+α20\lambda+\alpha-2\geq 0. Then

BK,3\displaystyle B_{K,3} (204)
\displaystyle\leq 2γ1α𝔼[(1μ(ΓK))λ+α1μ(ΓK)2α]\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda+\alpha-1}\mu^{\leftarrow}(\Gamma_{K})^{2-\alpha}\right] (205)
=\displaystyle= 2γ1α0[1(1+αxγ)1/α]λ+α1(1+αxγ)(2α)/αxK1Γ(K)exdx.\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\int_{0}^{\infty}\left[1-\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{\lambda+\alpha-1}\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x. (206)

Since α<1\alpha<1,

(1+αxγ)(2α)/α(αxγ)(2α)/α,\displaystyle\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\leq\left(\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}, (207)

and so

BK,3\displaystyle B_{K,3} 2γ1α0(αγ)(2α)/αxK1(2α)/αΓ(K)exdx\displaystyle\leq\frac{2\gamma^{\prime}}{1-\alpha}\int_{0}^{\infty}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\frac{x^{K-1-(2-\alpha)/\alpha}}{\Gamma(K)}e^{-x}\mathrm{d}x (208)
2γ1α(αγ)(2α)/αΓ(K(2α)/α)Γ(K)\displaystyle\leq\frac{2\gamma^{\prime}}{1-\alpha}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{\Gamma(K-(2-\alpha)/\alpha)}{\Gamma(K)} (209)
2γ1α(αγ)(2α)/αK(2α)/α,\displaystyle\sim\frac{2\gamma^{\prime}}{1-\alpha}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}K^{-(2-\alpha)/\alpha}, (210)

where the final asymptotic relation follows from Stirling's formula.

On the other hand, if λ+α2<0\lambda+\alpha-2<0,

BK,3\displaystyle B_{K,3} (211)
\displaystyle\leq 2γ𝔼[(1μ(ΓK))λ+α1μ(ΓK)0μ(ΓK)xα(1x)λ+α2dx]\displaystyle 2\gamma^{\prime}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{\lambda+\alpha-1}\mu^{\leftarrow}(\Gamma_{K})\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}x^{-\alpha}(1-x)^{\lambda+\alpha-2}\mathrm{d}x\right] (212)
\displaystyle\leq 2γ1α𝔼[(1μ(ΓK))2(λ+α)3μ(ΓK)2α]\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\mathbb{E}\left[\left(1-\mu^{\leftarrow}(\Gamma_{K})\right)^{2(\lambda+\alpha)-3}\mu^{\leftarrow}(\Gamma_{K})^{2-\alpha}\right] (213)
=\displaystyle= 2γ1α0[1(1+αxγ)1/α]2(λ+α)3(1+αxγ)(2α)/αxK1Γ(K)exdx.\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\int_{0}^{\infty}\left[1-\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{2(\lambda+\alpha)-3}\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x. (214)

If 2(\lambda+\alpha)-3\geq 0, we get the same result as in the case \lambda+\alpha-2\geq 0. When 2(\lambda+\alpha)-3<0, note that we can find an x_{0} such that for x\in[0,x_{0}],

1(1+αxγ)1/αx2.\displaystyle 1-\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-1/\alpha}\geq x^{2}. (215)

So

BK,3\displaystyle B_{K,3} (216)
\displaystyle\leq 2γ1α{0x0x4(λ+α)6(1+αxγ)(2α)/αxK1Γ(K)exdx\displaystyle\frac{2\gamma^{\prime}}{1-\alpha}\left\{\int_{0}^{x_{0}}x^{4(\lambda+\alpha)-6}\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x\right. (217)
+[1(1+αx0γ)1/α]2(λ+α)3x0(1+αxγ)(2α)/αxK1Γ(K)exdx}\displaystyle\left.+\left[1-\left(1+\frac{\alpha x_{0}}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{2(\lambda+\alpha)-3}\int_{x_{0}}^{\infty}\left(1+\frac{\alpha x}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\ \frac{x^{K-1}}{\Gamma(K)}e^{-x}\mathrm{d}x\right\} (218)
\displaystyle\leq γ1α(αγ)(2α)/α{0x0xK1+4(λ+α)6(2α)/αΓ(K)exdx\displaystyle\frac{\gamma^{\prime}}{1-\alpha}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\left\{\int_{0}^{x_{0}}\frac{x^{K-1+4(\lambda+\alpha)-6-(2-\alpha)/\alpha}}{\Gamma(K)}e^{-x}\mathrm{d}x\right. (219)
+[1(1+αx0γ)1/α]2(λ+α)3Γ(K(2α)/α)Γ(K)}.\displaystyle\left.+\left[1-\left(1+\frac{\alpha x_{0}}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{2(\lambda+\alpha)-3}\frac{\Gamma(K-(2-\alpha)/\alpha)}{\Gamma(K)}\right\}. (220)

Because here we assume that 2(λ+α)3<02(\lambda+\alpha)-3<0, the second term will dominate. So in this case we obtain that for large KK,

BK,3\displaystyle B_{K,3} 4γ1α(αγ)(2α)/α[1(1+αx0γ)1/α]2(λ+α)3K(2α)/α.\displaystyle\leq\frac{4\gamma^{\prime}}{1-\alpha}\left(\frac{\alpha}{\gamma^{\prime}}\right)^{-(2-\alpha)/\alpha}\left[1-\left(1+\frac{\alpha x_{0}}{\gamma^{\prime}}\right)^{-1/\alpha}\right]^{2(\lambda+\alpha)-3}K^{-(2-\alpha)/\alpha}. (221)

Asymptotically, BK,2B_{K,2} will dominate BK,1B_{K,1} and BK,3B_{K,3}, so there exists K0K_{0}\in\mathbb{N} such that when K>K0K>K_{0},

BK6α(γα1)1/αeγα1(K1)α1α0.\displaystyle B_{K}\leq 6\alpha(\gamma^{\prime}\alpha^{-1})^{1/\alpha}\ e^{\gamma^{\prime}\alpha^{-1}}\ (K-1)^{\frac{\alpha-1}{\alpha}}\longrightarrow 0. (222)

C.3 Gamma-independent Poisson network

Dense network

When α=0\alpha=0,

ν(dθ)=γλθ1eλθdθ,μ(dθ)=γλθ1(1+λθ)1dθ.\displaystyle\nu(\mathrm{d}\theta)=\gamma\lambda\theta^{-1}e^{-\lambda\theta}\mathrm{d}\theta,\qquad\mu(\mathrm{d}\theta)=\gamma\lambda\theta^{-1}(1+\lambda\theta)^{-1}\mathrm{d}\theta. (223)

In this case,

dνdμ=(1+λθ)eλθ,\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=(1+\lambda\theta)e^{-\lambda\theta}, μ[x,)=γλlog(1+(λx)1),\displaystyle\mu[x,\infty)=\gamma\lambda\log(1+(\lambda x)^{-1}), μ(u)=1λ(e(γλ)1u1).\displaystyle\mu^{\leftarrow}(u)=\frac{1}{\lambda(e^{(\gamma\lambda)^{-1}u}-1)}. (224)

For the Poisson likelihood, \pi(\theta)=e^{-\theta}, so

BK,1=+2x1x2FK(μ[max{x1,x2},))ν(dx1)ν(dx2).\displaystyle B_{K,1}=\int_{\mathbb{R}_{+}^{2}}x_{1}x_{2}F_{K}\left(\mu[\max\{x_{1},x_{2}\},\infty)\right)\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}). (225)

Note that the integrand is symmetric around the line x_{1}=x_{2}, so we only need to compute the integral above that line. In this region, e^{-\lambda x_{2}}\leq e^{-\lambda x_{1}} and 0\leq F_{K}\left(\mu[x_{2},\infty)\right)\leq F_{K}\left(\mu[x_{1},\infty)\right)\leq 1. So

BK,1\displaystyle B_{K,1} =2+x1x1x2FK(μ[x2,))λγx21eλx2dx2ν(dx1)\displaystyle=2\int_{\mathbb{R}_{+}}x_{1}\int_{x_{1}}^{\infty}x_{2}F_{K}\left(\mu[x_{2},\infty)\right)\lambda\gamma x_{2}^{-1}e^{-\lambda x_{2}}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}) (226)
2+x1FK(μ[x1,))γeλx1ν(dx1)\displaystyle\leq 2\int_{\mathbb{R}_{+}}x_{1}F_{K}\left(\mu[x_{1},\infty)\right)\gamma e^{-\lambda x_{1}}\nu(\mathrm{d}x_{1}) (227)
=2γ2λ+FK(μ[x,))e2λxdx.\displaystyle=2\gamma^{2}\lambda\int_{\mathbb{R}_{+}}F_{K}\left(\mu[x,\infty)\right)e^{-2\lambda x}\mathrm{d}x. (228)

For any a>0, we divide the integral into two parts and bound each part separately. We denote b=\log(1+(\lambda a)^{-1}) and use the facts that \int_{0}^{a}e^{-2\lambda x}\mathrm{d}x\leq a and F_{K}(t)\leq t^{K}/K!\leq(3t/K)^{K}. So

BK,1\displaystyle B_{K,1} 2γ2λ[0ae2λxdx+FK(μ[a,))ae2λxdx]\displaystyle\leq 2\gamma^{2}\lambda\left[\int_{0}^{a}e^{-2\lambda x}\mathrm{d}x+F_{K}\left(\mu[a,\infty)\right)\int_{a}^{\infty}e^{-2\lambda x}\mathrm{d}x\right] (229)
2γ2λ[a+12λFK(γλlog(1+(λa)1))]\displaystyle\leq 2\gamma^{2}\lambda\left[a+\frac{1}{2\lambda}F_{K}\left(\gamma\lambda\log(1+(\lambda a)^{-1})\right)\right] (230)
2γ2λ[(λ(eb1))1+1λ(3γλbK)K].\displaystyle\leq 2\gamma^{2}\lambda\left[\left(\lambda(e^{b}-1)\right)^{-1}+\frac{1}{\lambda}\left(\frac{3\gamma\lambda b}{K}\right)^{K}\right]. (231)

Setting the two terms equal and using the fact that (e^{b}-1)^{-1}\approx e^{-b} when b is large, we get b=KW_{0}((3\gamma\lambda)^{-1}), where W_{0} is the Lambert W function, defined by

W0(y)=xxex=y.\displaystyle W_{0}(y)=x\iff xe^{x}=y. (232)

Therefore,

BK,14γ2eKW0((3γλ)1)14γ2eKW0((3γλ)1).\displaystyle B_{K,1}\leq\frac{4\gamma^{2}}{e^{KW_{0}((3\gamma\lambda)^{-1})}-1}\sim 4\gamma^{2}e^{-KW_{0}\left((3\gamma\lambda)^{-1}\right)}. (233)
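
Since W_{0} above is the principal branch of the Lambert W function, the geometric bound in Eq. 233 can be evaluated numerically; a minimal sketch with illustrative hyperparameter values (not from the paper) is:

```python
import numpy as np
from scipy.special import lambertw

gam, lam = 1.0, 2.0                              # illustrative hyperparameters
K = np.arange(5, 51, 5)

w0 = lambertw(1.0 / (3.0 * gam * lam)).real      # W_0((3*gamma*lambda)^{-1})
bound = 4.0 * gam**2 * np.exp(-K * w0)           # asymptotic form of the bound in Eq. 233
print(np.column_stack([K, bound]))               # bound decays geometrically in K
```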

Similarly,

12(K1)BK,2\displaystyle\frac{1}{2(K-1)}B_{K,2} (234)
=\displaystyle= +2x1x2𝔼[1ΓK𝟙(μ[x2,)ΓKμ[x1,))]ν(dx1)ν(dx2).\displaystyle\int_{\mathbb{R}_{+}^{2}}x_{1}x_{2}\mathbb{E}\left[\frac{1}{\Gamma_{K}}\mathds{1}\left(\mu[x_{2},\infty)\leq\Gamma_{K}\leq\mu[x_{1},\infty)\right)\right]\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}). (235)

Note that

𝔼[1ΓK𝟙(μ[x2,)ΓKμ[x1,))]\displaystyle\mathbb{E}\left[\frac{1}{\Gamma_{K}}\mathds{1}\left(\mu[x_{2},\infty)\leq\Gamma_{K}\leq\mu[x_{1},\infty)\right)\right] (236)
=\displaystyle= 1K1[FK1(μ[x1,))FK1(μ[x2,))].\displaystyle\frac{1}{K-1}\left[F_{K-1}\left(\mu[x_{1},\infty)\right)-F_{K-1}\left(\mu[x_{2},\infty)\right)\right]. (237)

Keeping only the positive part,

BK,2\displaystyle B_{K,2} 2+x1FK1(μ[x1,))x1γλeλx2dx2ν(dx1)\displaystyle\leq 2\int_{\mathbb{R}_{+}}x_{1}F_{K-1}\left(\mu[x_{1},\infty)\right)\int_{x_{1}}^{\infty}\gamma\lambda e^{-\lambda x_{2}}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}) (238)
=2γ2λ+FK1(μ[x,))e2λxdx.\displaystyle=2\gamma^{2}\lambda\int_{\mathbb{R}_{+}}F_{K-1}\left(\mu[x,\infty)\right)e^{-2\lambda x}\mathrm{d}x. (239)

This has the same form as BK,1B_{K,1}, so

BK,24γ2e(K1)W0((3γλ)1)14γ2e(K1)W0((3γλ)1).\displaystyle B_{K,2}\leq\frac{4\gamma^{2}}{e^{(K-1)W_{0}((3\gamma\lambda)^{-1})}-1}\sim 4\gamma^{2}e^{-(K-1)W_{0}\left((3\gamma\lambda)^{-1}\right)}. (240)

Next,

\displaystyle B_{K,3} (241)
\displaystyle=2\mathbb{E}\left[\int_{\mathbb{R}_{+}}x\mu^{\leftarrow}(\Gamma_{K})\frac{\mathrm{d}\nu}{\mathrm{d}\mu}\left(\mu^{\leftarrow}(\Gamma_{K})\right)\mathds{1}\left[\Gamma_{K}\leq\mu[x,\infty)\right]\nu(\mathrm{d}x)\right] (242)
\displaystyle=2\mathbb{E}\left[\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}x\mu^{\leftarrow}(\Gamma_{K})\left(1+\lambda\mu^{\leftarrow}(\Gamma_{K})\right)e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\gamma\lambda x^{-1}e^{-\lambda x}\mathrm{d}x\right] (243)
\displaystyle=2\gamma\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})\left(1+\lambda\mu^{\leftarrow}(\Gamma_{K})\right)e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\left(1-e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\right)\right] (244)
\displaystyle\leq 2\gamma\lambda\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})^{2}\left(1+\lambda\mu^{\leftarrow}(\Gamma_{K})\right)e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\right]. (245)

Note that $(1+x)e^{-x}\leq 1$ for $x\geq 0$, so

\displaystyle B_{K,3}\leq 2\lambda\gamma\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})^{2}\right] (246)
\displaystyle\leq 2\frac{\gamma}{\lambda}\mathbb{E}\left[e^{-(\gamma\lambda)^{-1}\Gamma_{K}}\right]=2\frac{\gamma}{\lambda}\left(\frac{\gamma\lambda}{1+\gamma\lambda}\right)^{K-1}. (247)

Since $B_{K,3}$ will dominate $B_{K,1}$ and $B_{K,2}$ asymptotically, there exists $K_{0}\in\mathbb{N}$ such that for $K>K_{0}$,

\displaystyle B_{K}\leq 3B_{K,3}\leq 6\frac{\gamma}{\lambda}\left(\frac{\gamma\lambda}{1+\gamma\lambda}\right)^{K-1}. (248)
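In practice one may want the smallest truncation level achieving a target error under this bound. The hypothetical helper below simply inverts the right-hand side of Eq. 248 (the function name and tolerance are ours, and the unspecified constant $K_{0}$ is ignored).

# Hypothetical helper: smallest K making the RHS of Eq. 248 at most tol (dense case).
import math

def truncation_level_dense(gamma_, lam, tol):
    rate = math.log1p(1.0 / (gamma_ * lam))  # equals -log(gamma lam / (1 + gamma lam))
    return max(1, math.ceil(1 + math.log(6.0 * gamma_ / (lam * tol)) / rate))

print(truncation_level_dense(2.0, 1.5, 1e-3))  # illustrative hyperparameters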

Sparse network

When $\alpha>0$,

\displaystyle\nu(\mathrm{d}\theta)=\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\theta^{-\alpha-1}e^{-\lambda\theta}\mathrm{d}\theta,\qquad\mu(\mathrm{d}\theta)=\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\theta^{-\alpha-1}\mathrm{d}\theta, (249)

and

\displaystyle\frac{\mathrm{d}\nu}{\mathrm{d}\mu}=e^{-\lambda\theta},\qquad\mu[x,\infty)=\gamma^{\prime}x^{-\alpha},\qquad\mu^{\leftarrow}(u)=(\gamma^{\prime}u^{-1})^{1/\alpha},\qquad\gamma^{\prime}=\gamma\frac{\lambda^{1-\alpha}}{\alpha\Gamma(1-\alpha)}. (250)

As in the example with $\alpha=0$,

\displaystyle B_{K,1}=\int_{\mathbb{R}_{+}^{2}}x_{1}x_{2}F_{K}\left(\mu[\max\{x_{1},x_{2}\},\infty)\right)\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}) (251)
\displaystyle=2\int_{\mathbb{R}_{+}}x_{1}\int_{x_{1}}^{\infty}F_{K}\left(\mu[x_{2},\infty)\right)\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}x_{2}^{-\alpha}e^{-\lambda x_{2}}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}) (252)
\displaystyle\leq 2\gamma\int_{\mathbb{R}_{+}}x_{1}F_{K}\left(\mu[x_{1},\infty)\right)\int_{x_{1}}^{\infty}\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}x_{2}^{-\alpha}e^{-\lambda x_{2}}\mathrm{d}x_{2}\nu(\mathrm{d}x_{1}). (253)

Note that the integrand with respect to $x_{2}$ is the density of the gamma distribution with shape $1-\alpha$ and rate $\lambda$, so the inner integral is at most 1. We split the outer integral at a point $a>0$ and bound the two pieces separately,

\displaystyle B_{K,1}\leq 2\gamma\int_{\mathbb{R}_{+}}F_{K}\left(\mu[x,\infty)\right)\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}x^{-\alpha}e^{-\lambda x}\mathrm{d}x (254)
\displaystyle\leq 2\gamma^{2}\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\left[\int_{0}^{a}x^{-\alpha}\mathrm{d}x+F_{K}\left(\mu[a,\infty)\right)\int_{a}^{\infty}x^{-\alpha}e^{-\lambda x}\mathrm{d}x\right] (255)
\displaystyle\leq 2\gamma^{2}\frac{1}{\Gamma(1-\alpha)}\left[\frac{\lambda^{1-\alpha}}{1-\alpha}a^{1-\alpha}+\Gamma(1-\alpha)\left(\frac{3\gamma^{\prime}a^{-\alpha}}{K}\right)^{K}\right]. (256)

By setting the two terms in the brackets equal, we get

\displaystyle a=\left[\frac{(1-\alpha)\Gamma(1-\alpha)}{\lambda^{1-\alpha}}\right]^{\frac{1}{(K-1)\alpha+1}}\left(\frac{3\gamma^{\prime}}{K}\right)^{\frac{K}{(K-1)\alpha+1}}\sim\left(\frac{3\gamma^{\prime}}{K}\right)^{\frac{1}{\alpha}}. (257)

So

\displaystyle B_{K,1}\leq\frac{4\gamma^{2}\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\frac{3\gamma^{\prime}}{K}\right)^{\frac{1-\alpha}{\alpha}}. (258)

As in the $\alpha=0$ case,

\displaystyle B_{K,2} (259)
\displaystyle=2(K-1)\int_{\mathbb{R}_{+}^{2}}x_{1}x_{2}\mathbb{E}\left[\frac{1}{\Gamma_{K}}\mathds{1}\left(\mu[x_{2},\infty)\leq\Gamma_{K}\leq\mu[x_{1},\infty)\right)\right]\nu(\mathrm{d}x_{1})\nu(\mathrm{d}x_{2}) (260)
\displaystyle\leq 2\int_{\mathbb{R}_{+}}x_{1}F_{K-1}\left(\mu[x_{1},\infty)\right)\int_{x_{1}}^{\infty}x_{2}\nu(\mathrm{d}x_{2})\nu(\mathrm{d}x_{1}) (261)
\displaystyle\leq 2\gamma\int_{\mathbb{R}_{+}}x_{1}F_{K-1}\left(\mu[x_{1},\infty)\right)\nu(\mathrm{d}x_{1}). (262)

This has the same form as $B_{K,1}$, and therefore

\displaystyle B_{K,2}\leq\frac{4\gamma^{2}\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\frac{3\gamma^{\prime}}{K-1}\right)^{\frac{1-\alpha}{\alpha}}. (263)

Finally, since both $e^{-\lambda x}<1$ and $e^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}<1$,

\displaystyle B_{K,3} (264)
\displaystyle=2\mathbb{E}\left[\int_{\mathbb{R}_{+}}\mu^{\leftarrow}(\Gamma_{K})xe^{-\lambda\mu^{\leftarrow}(\Gamma_{K})}\mathds{1}\left[x\leq\mu^{\leftarrow}(\Gamma_{K})\right]\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}x^{-1-\alpha}e^{-\lambda x}\mathrm{d}x\right] (265)
\displaystyle\leq 2\gamma\frac{\lambda^{1-\alpha}}{\Gamma(1-\alpha)}\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})\int_{0}^{\mu^{\leftarrow}(\Gamma_{K})}x^{-\alpha}\mathrm{d}x\right] (266)
\displaystyle\leq 2\gamma\frac{\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\mathbb{E}\left[\mu^{\leftarrow}(\Gamma_{K})^{2-\alpha}\right] (267)
\displaystyle=2\gamma\frac{\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\gamma^{\prime}\right)^{\frac{2-\alpha}{\alpha}}\frac{\Gamma(K-\frac{2-\alpha}{\alpha})}{\Gamma(K)}. (268)

By Stirling’s formula,

\displaystyle\frac{\Gamma(K-(2-\alpha)/\alpha)}{\Gamma(K)}\sim\frac{\sqrt{2\pi\left(K-\frac{2-\alpha}{\alpha}\right)}\left(\frac{K-(2-\alpha)/\alpha}{e}\right)^{K-(2-\alpha)/\alpha}}{\sqrt{2\pi K}\left(\frac{K}{e}\right)^{K}}\sim K^{-\frac{2-\alpha}{\alpha}}. (269)

So

\displaystyle B_{K,3}\leq 2\gamma\frac{\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\gamma^{\prime}\right)^{\frac{2-\alpha}{\alpha}}K^{-\frac{2-\alpha}{\alpha}}. (270)
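The gamma-ratio asymptotics in Eq. 269 are easy to verify numerically via log-gamma values; the sketch below uses an illustrative $\alpha$.

# Check of Eq. 269: Gamma(K - c) / Gamma(K) ~ K^{-c} with c = (2 - alpha) / alpha.
import numpy as np
from scipy.special import gammaln

alpha = 0.5
c = (2.0 - alpha) / alpha
for K in (10, 100, 1000, 10000):
    ratio = np.exp(gammaln(K - c) - gammaln(K))
    print(K, ratio, K**(-c), ratio / K**(-c))  # last column should approach 1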

$B_{K,2}$ dominates $B_{K,1}$ and $B_{K,3}$ asymptotically, so there exists $K_{0}\in\mathbb{N}$ such that for $K>K_{0}$,

\displaystyle B_{K}\leq\frac{12\gamma^{2}\lambda^{1-\alpha}}{(1-\alpha)\Gamma(1-\alpha)}\left(\frac{3\gamma^{\prime}}{K-1}\right)^{\frac{1-\alpha}{\alpha}}. (271)
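As in the dense case, this bound can be inverted to suggest a truncation level for a target error; the hypothetical helper below does so (names and tolerance are ours, and the constant $K_{0}$ is again ignored), and it also illustrates how much faster $K$ must grow under the polynomial rate as $\alpha$ increases.

# Hypothetical helper: smallest K making the RHS of Eq. 271 at most tol (sparse case, 0 < alpha < 1).
import math

def truncation_level_sparse(gamma_, lam, alpha, tol):
    gamma_p = gamma_ * lam**(1.0 - alpha) / (alpha * math.gamma(1.0 - alpha))  # gamma' in Eq. 250
    C = 12.0 * gamma_**2 * lam**(1.0 - alpha) / ((1.0 - alpha) * math.gamma(1.0 - alpha))
    return max(1, math.ceil(1 + 3.0 * gamma_p * (C / tol)**(alpha / (1.0 - alpha))))

print(truncation_level_sparse(2.0, 1.5, 0.5, 1e-3))  # illustrative hyperparameters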

Appendix D Truncated inference

Proof of Theorem 4.2.

Fix $K\in\mathbb{N}$ and $\epsilon>0$, and define the subset of the state space $A=\left\{X_{N+1}^{K+}=0,\;B(\Gamma_{1:K},\sigma)\leq\frac{\epsilon}{N+1}\right\}$. By assumption,

\displaystyle\hat{\Pi}\left(A\right)\geq 1-\eta. (272)

Further, by applying the bound from Theorem 4.1, we know that for states in $A$,

\displaystyle 1-\epsilon\leq p(X_{1:N+1}^{K+}\,|\,\theta_{1:K},\sigma)\leq 1. (273)

Let $p$ denote the RHS of Eq. 55, which is proportional to the density of $\Pi$, and let $\hat{p}$ denote the RHS of Eq. 55 with the term $p(X_{1:N}^{K+}\,|\,\theta_{1:K},\sigma)$ removed, which is proportional to the density of $\hat{\Pi}$:

\displaystyle p\leq\hat{p},\qquad\text{and within $A$,}\quad p\geq(1-\epsilon)\hat{p}. (274)

Define the normalization constants $Z,\hat{Z}$ such that $\int p/Z=\int\hat{p}/\hat{Z}=1$. Then the above bounds yield

\displaystyle\hat{Z}=\!\int\hat{p}\geq\!\int p=\!\int_{A}p+\!\int_{A^{c}}p\geq\!\int_{A}(1-\epsilon)\hat{p}=(1-\epsilon)\frac{\int_{A}\hat{p}}{\int\hat{p}}\!\int\hat{p}\geq(1-\epsilon)(1-\eta)\hat{Z}. (275)

Therefore, $(1-\epsilon)(1-\eta)\hat{Z}\leq Z\leq\hat{Z}$, and hence

\displaystyle\Pi(A)=\int_{A}p/Z\geq(1-\epsilon)\int_{A}\hat{p}/Z\geq(1-\epsilon)\int_{A}\hat{p}/\hat{Z}\geq(1-\epsilon)(1-\eta). (276)

The above results yield the total variation bound via

\displaystyle\frac{1}{2}\int\left|\frac{p}{Z}-\frac{\hat{p}}{\hat{Z}}\right|=\frac{1}{2}\int_{A}\left|\frac{p}{Z}-\frac{\hat{p}}{\hat{Z}}\right|+\frac{1}{2}\int_{A^{c}}\left|\frac{p}{Z}-\frac{\hat{p}}{\hat{Z}}\right| (277)
\displaystyle\leq\frac{1}{2}\int_{A}\left|\frac{p}{Z}-\frac{p}{\hat{Z}}\right|+\frac{1}{2}\int_{A}\left|\frac{p}{\hat{Z}}-\frac{\hat{p}}{\hat{Z}}\right|+\frac{1}{2}\left(\Pi(A^{c})+\hat{\Pi}(A^{c})\right) (278)
\displaystyle=\frac{1}{2}\left|\frac{1}{Z}-\frac{1}{\hat{Z}}\right|\int_{A}p+\frac{1}{2}\frac{1}{\hat{Z}}\int_{A}(\hat{p}-p)+\frac{1}{2}\left(\Pi(A^{c})+\hat{\Pi}(A^{c})\right) (279)
\displaystyle\leq\frac{1}{2}\left(1-(1-\epsilon)(1-\eta)\right)+\frac{1}{2}\epsilon+\frac{1}{2}\left(1-(1-\eta)(1-\epsilon)+\eta\right) (280)
\displaystyle=\frac{3(\epsilon+\eta)}{2}-\epsilon\eta. (281)
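The argument above only manipulates unnormalized densities and their normalization constants, so it can be checked on a toy finite state space. The sketch below uses arbitrary illustrative weights (nothing here comes from the network model itself): it builds $\hat{p}$ and $p$ satisfying the two assumptions and verifies the resulting total variation bound numerically.

# Toy finite-state check of the bound 3(eps + eta)/2 - eps * eta from the proof above.
# p_hat, p, and the set A are arbitrary illustrative constructions satisfying:
#   p <= p_hat everywhere,  p >= (1 - eps) p_hat on A,  Pi_hat(A^c) = eta.
import numpy as np

rng = np.random.default_rng(0)
n, eps = 50, 0.1
p_hat = rng.gamma(2.0, size=n)                              # unnormalized truncated density
A = np.argsort(p_hat)[5:]                                   # the 45 heaviest states form A
eta = 1.0 - p_hat[A].sum() / p_hat.sum()                    # Pi_hat(A^c)
p = p_hat * rng.uniform(0.0, 1.0, size=n)                   # p <= p_hat off A
p[A] = p_hat[A] * rng.uniform(1.0 - eps, 1.0, size=len(A))  # (1 - eps) p_hat <= p <= p_hat on A
tv = 0.5 * np.abs(p / p.sum() - p_hat / p_hat.sum()).sum()
print(tv, 1.5 * (eps + eta) - eps * eta)                    # tv should not exceed the bound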