
Structural Entropy of the Stochastic Block Models

Jie Han, Tao Guo, Qiaoqiao Zhou, Wei Han, Bo Bai, and Gong Zhang

J. Han, T. Guo, W. Han, B. Bai, and G. Zhang are with Theory Lab, Central Research Institute, 2012 Labs, Huawei Tech. Co., Ltd. ([email protected], [email protected], [email protected], [email protected], [email protected]). Q. Zhou is with the Department of Computer Science, School of Computing, National University of Singapore ([email protected]).
Abstract

With the rapid expansion of graphs and networks and the growing magnitude of data from all areas of science, effective treatment and compression schemes for context-dependent data are highly desirable. A particularly interesting direction is to compress the data while keeping only the “structural information” and ignoring the concrete labelings. In this direction, Choi and Szpankowski introduced structures (unlabeled graphs), which allowed them to compute the structural entropy of the Erdős–Rényi random graph model. Moreover, they also provided an asymptotically optimal compression algorithm that (asymptotically) achieves this entropy limit and runs in expected linear time.

In this paper, we consider the Stochastic Block Models with an arbitrary number of parts. Indeed, we define a partitioned structural entropy for Stochastic Block Models, which generalizes the structural entropy for unlabeled graphs and encodes the partition information as well. We then compute the partitioned structural entropy of the Stochastic Block Models, and provide a compression scheme that asymptotically achieves this entropy limit.

1 Introduction

Shannon’s “entropy” of information is a foundational concept of information theory [39, 9]. Given a discrete random variable $X$ with support set (that is, the set of possible outcomes) $x_{1},x_{2},\dots,x_{n}$, occurring with probabilities $p_{1},p_{2},\dots,p_{n}$, the entropy of $X$ is defined as

H(X):=-\sum_{i=1}^{n}p_{i}\log p_{i},

where the logarithm here and throughout this paper is of base 2. Note that the entropy of $X$ is a function of the probability distribution of $X$.

The entropy was originally introduced by Shannon in [33] as part of his theory of communication, where a data communication system consists of a data source $X$, a channel, and a receiver. The fundamental problem of communication is for the receiver to reliably recover the data generated by the source, based on the bits it receives through the channel. Shannon proved that the entropy of the source $X$ plays a central role: his source coding theorem shows that the entropy is the mathematical limit on how well the data can be losslessly compressed.

The question then arises: how do we compress data that has structure, e.g., data in social networks? In Shannon’s lesser-known 1953 paper [34], he argued for an extension of information theory, where data is considered as observations of a source, to “non-conventional data” (that is, lattices). Indeed, nowadays data appears in various formats and structures (e.g., sequences, expressions, interactions) and in drastically increasing amounts. In many scenarios, data is highly context-dependent, and the structural information and the context information are two conceptually different aspects. It is therefore desirable to develop novel theory and efficient algorithms for extracting useful information from non-conventional data structures. Roughly speaking, such data consists of structural information, which might be understood as the “shape” of the data, and context information, which should be recognized as the data labels.

It is well-known that complex networks (e.g., social networks) admit community structures [26]. That is, users within a group interact with each other more frequently than those outside the group. The Stochastic Block Model (SBM) [13] is a celebrated random graph model that has been widely used to study the community structures in graphs and networks. It provides a good benchmark to evaluate the performance of community detection algorithms and inspires the design of many algorithms for community detection tasks. The theoretical underpinnings of the SBM have been extensively studied and sharp thresholds for exact recovery have been successively established [2, 19, 3, 11]. We refer readers to [1] for a recent survey, where other interesting and important problems in SBM are also discussed.

In addition to the SBM discussed in [1], there are other angles from which to study compression of data with graph structures. Asadi et al. [6] investigated data compression on graphs with clusters. Zenil et al. [40] surveyed information-theoretic methods, in particular Shannon entropy and algorithmic complexity, for characterizing graphs and networks.

1.1 Compression of graphs

In recent years, graphical data and the network structures supporting them have become increasingly common and important in many branches of engineering and the sciences. To better represent and transmit graphical data, many works consider the problem of compressing a (random) graph up to isomorphism, i.e., compressing the structure of a graph. A graph $G$ consists of a finite set $V$ of vertices and a set $E$ of edges, each of which connects two vertices. A graph can be represented by a binary matrix (the adjacency matrix), which in turn can be viewed as a binary sequence. Thus, encoding a labeled graph (that is, one in which all vertices are distinguished) is equivalent to encoding a $\binom{|V|}{2}$-digit binary sequence, given a probability distribution on all $\binom{|V|}{2}$ possible edges. However, such a string does not reflect the internal symmetries conveyed by the graph automorphisms, and sometimes we are only interested in the local or global structures in the graph rather than the exact vertex labelings. The structural entropy is defined when the graphs are considered unlabeled, or simply called structures, where the vertices are viewed as indistinguishable. This natural definition aims to capture the information of the structure, and thus provides a fundamental measure for graph/structure compression schemes.
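As a concrete illustration of the binary-sequence view above, the following minimal Python sketch (our own illustration, not taken from the literature) maps a labeled graph, given by its adjacency matrix, to the $\binom{|V|}{2}$-digit binary sequence of its upper triangle and back.

import numpy as np

def graph_to_bits(adj):
    # Read the binom(n,2) entries above the diagonal in a fixed order,
    # so each labeled graph corresponds to exactly one bit string.
    n = adj.shape[0]
    return "".join(str(int(adj[i, j])) for i in range(n) for j in range(i + 1, n))

def bits_to_graph(bits, n):
    # Inverse map: rebuild the symmetric adjacency matrix from the bit sequence.
    adj = np.zeros((n, n), dtype=int)
    it = iter(bits)
    for i in range(n):
        for j in range(i + 1, n):
            adj[i, j] = adj[j, i] = int(next(it))
    return adj

# Example: a labeled triangle plus an isolated vertex on n = 4 vertices.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])
bits = graph_to_bits(A)                      # 6 = binom(4,2) bits
assert (bits_to_graph(bits, 4) == A).all()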

The problem actually has a strong theoretical background. Back in 1984, Turán [37] raised the question of finding an efficient coding method for general unlabeled graphs on $n$ vertices, for which a lower bound of $\binom{n}{2}-n\log n+O(n)$ bits is suggested. This lower bound follows from counting the number of unlabeled graphs [12]. The question was later answered by Naor [25] in 1990, who proposed a representation that is optimal up to the first two leading terms when all unlabeled graphs are equally likely. In a recent paper, Kieffer et al. [14] studied the structural complexity of random binary trees. There have also been some heuristic methods for real-world graph compression, see [4, 7, 28, 32, 35]. Rather recently, Choi and Szpankowski [8] studied the structural entropy of the Erdős–Rényi random graph $\mathcal{G}(n,p)$. They computed the structural entropy given that $p$ is not (very) close to 0 or 1, and also gave a compression scheme that matches their computation. Later, the structural entropy of other randomly generated graphs, e.g., preferential attachment graphs and web graphs, was also studied [18, 17, 31, 15].

However, it is well known that the Erdős–Rényi model is too simplistic to model real networks, in particular due to its strong homogeneity and absence of community structure. In this paper, we consider the compression of graphical structures of the SBM, which in general models real networks better and circumvents these issues of the ER model. In summary, our contributions are as follows:

  • We introduce the partitioned structural entropy which generalizes the structural entropy for unlabeled graphs and we show that it reflects the partition information of the SBM.

  • We provide an explicit formula for the partitioned structural entropy of the SBM.

  • We also propose a compression scheme that asymptotically achieves this entropy limit.

Semantic communications are considered a key component of future generation networks, where a natural problem is how to efficiently extract and transmit the “semantic information”. In the case of graph data, one may view the (partitioned) structures as the information that needs to be extracted, while the concrete labeling information is considered redundant. From this point of view, our result is a step toward the study of semantic compression/communication in appropriate contexts.

1.2 Related works

Finally, we would like to point out that there are other information metrics defined on graphs. The term “graph entropy” has been defined and used in several ways in the literature. For example, the graph entropy introduced by Körner in [16] denotes the number of bits one has to convey to resolve the ambiguity of a vertex in a graph. This notion also turns out to be useful in other areas, including combinatorics. The chromatic entropy introduced in [5] is the lowest entropy of any coloring of a graph, and it finds application in zero-error source coding. We remark that the structural entropy we consider is quite different from the Körner graph entropy and the chromatic entropy.

On the other hand, a concept of graph entropy (also called topological information content of a graph) was introduced by Rashevsky [29] and Trucco [36], and later by Mowshowitz [23, 20, 21, 22, 24, 10], which is defined as a function of (the structure of) a graph and an equivalence relation defined on its vertices or edges. Such a concept is a measure of the graph itself and does not involve any probability distribution.

2 Preliminaries

2.1 Structural entropy of unlabeled graphs

Now let us formally define the structural entropy given a probability distribution on unlabeled graphs.

Given an integer $n$, define $\mathcal{G}_{n}$ as the collection of all $n$-vertex labeled graphs.

Definition 2.1 (Entropy of Random Graph).

Given an integer $n$ and a probability distribution on $\mathcal{G}_{n}$, the entropy of a random graph $\mathcal{G}\in\mathcal{G}_{n}$ is defined as

H_{\mathcal{G}}=\mathbb{E}[-\log P(G)]=-\sum_{G\in\mathcal{G}_{n}}P(G)\log P(G)

where $P(G)\triangleq P(\mathcal{G}=G)$ is the probability of a graph $G$ in $\mathcal{G}_{n}$.

Then the random structure model $\mathcal{S}_{n}$, associated with the probability distribution on $\mathcal{G}_{n}$, is defined as the unlabeled version of $\mathcal{G}_{n}$. For a given $S\in\mathcal{S}_{n}$, the probability of $S$ can be computed as

P(S)=\sum_{G\cong S,\,G\in\mathcal{G}_{n}}P(G).

Here $G\cong S$ means that $G$ and $S$ have the same structure, that is, $S$ is isomorphic to $G$. Clearly, if all isomorphic labeled graphs have the same probability, then for any labeled graph $G\cong S$, one has

P(S)=N(S)\cdot P(G)

where $N(S)$ stands for the number of distinct labeled graphs that have the same structure as $S$.
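For very small graphs the relation $P(S)=N(S)\cdot P(G)$ can be checked by brute force, using the fact that $N(S)=n!/|Aut(S)|$: the $n!$ relabelings of a labeled graph $G\cong S$ collapse into groups of size $|Aut(S)|$. The sketch below is our own illustration (the helper names are hypothetical), feasible only for very small $n$.

from itertools import permutations
from math import factorial

def relabel(edges, perm):
    # Apply a vertex permutation to an edge list, returned as a canonical set.
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)

def count_labelings(edges, n):
    # Count (i) distinct labeled graphs with the same structure and
    # (ii) automorphisms, by enumerating all n! vertex permutations.
    base = relabel(edges, list(range(n)))
    images = {relabel(edges, p) for p in permutations(range(n))}
    autos = sum(relabel(edges, p) == base for p in permutations(range(n)))
    return len(images), autos

# Example: a path on 4 vertices (0-1-2-3) has |Aut| = 2 and 24/2 = 12 labelings.
n_labelings, n_autos = count_labelings([(0, 1), (1, 2), (2, 3)], 4)
assert n_labelings == factorial(4) // n_autos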

Definition 2.2 (Structural Entropy).

The structural entropy $H_{\mathcal{S}}$ of a random graph $\mathcal{G}$ is defined as the entropy of the random structure $\mathcal{S}$ associated with $\mathcal{G}_{n}$, that is,

H_{\mathcal{S}}=\mathbb{E}[-\log P(S)]=-\sum_{S\in\mathcal{S}}P(S)\log P(S)

where the sum is over all distinct structures.

The Erdős–Rényi random graph $\mathcal{G}(n,p)$, also called the binomial random graph, is a fundamental random graph model: it has $n$ vertices and each pair of vertices is connected with probability $p$, independently of other pairs. In 2012, Choi and Szpankowski [8] proved the following for Erdős–Rényi random graphs.

Theorem 2.3 (Choi and Szpankowski, [8]).

For large $n$ and all $p$ satisfying $n^{-1}\ln n\ll p$ and $1-p\gg n^{-1}\ln n$, the following holds:

  1. The structural entropy $H_{\mathcal{S}}$ of $\mathcal{G}(n,p)$ is

     H_{\mathcal{S}}=\binom{n}{2}h(p)-\log n!+O\left(\frac{\log n}{n^{\alpha}}\right)

     for some $\alpha>0$.

  2. For a structure $S$ of $n$ vertices and $\varepsilon>0$,

     P\left(\left|-\frac{1}{\binom{n}{2}}\log P(S)-h(p)+\frac{\log n!}{\binom{n}{2}}\right|<\varepsilon\right)>1-2\varepsilon,

     where $h(p)=-p\log p-(1-p)\log(1-p)$ is the entropy rate of a binary memoryless source.

Furthermore, they [8] also presented a compression algorithm for unlabeled graphs that asymptotically achieves the structural entropy up to an $O(n)$ error term.
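For orientation, the two leading terms of Theorem 2.3 are easy to evaluate numerically; the sketch below (our own illustration, ignoring the $O(\log n/n^{\alpha})$ error term) computes $\binom{n}{2}h(p)-\log n!$ in bits.

from math import comb, lgamma, log, log2

def h(p):
    # Binary entropy in bits.
    return -p * log2(p) - (1 - p) * log2(1 - p)

def log2_factorial(m):
    # log2(m!) via the log-gamma function, avoiding overflow for large m.
    return lgamma(m + 1) / log(2)

def er_structural_entropy(n, p):
    # Leading terms of Theorem 2.3: binom(n,2) h(p) - log n!.
    return comb(n, 2) * h(p) - log2_factorial(n)

# Example: n = 1000, p = 0.1; the saving over the labeled-graph entropy
# binom(n,2) h(p) is log n!, roughly n log n bits.
print(er_structural_entropy(1000, 0.1))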

2.2 Stochastic Block Model – Our result

It is well known that the ER model is too simplistic to model real networks, in particular due to its strong homogeneity and absence of community structure. The Stochastic Block Model was introduced on the assumption that vertices in a network connect independently, with probabilities based on their profiles, or equivalently, on their community assignment. For example, in the SBM with two communities and symmetric parameters, also known as the planted bisection model and denoted by $\mathcal{G}(n,p,q)$, the vertex set is partitioned into two sets $V_{1}$ and $V_{2}$; any pair of vertices inside $V_{1}$ or inside $V_{2}$ is connected with probability $p$, any pair of vertices across the two clusters is connected with probability $q$, and all these connections are independent.

As an illuminating example, consider a graph $G$ where there are $n/2$ users and $n/2$ devices: each pair of users and each pair of devices is connected with probability $p$, a user and a device are connected with probability $q$, and each of these connections is independent of all others. Suppose that we need to compress the information of $G$. In this context it is not appropriate to view $G$ as an unlabeled graph; in addition to the structural information, it is also important to keep the “community” information, i.e., the compression also needs to encode who is a user and who is a device.
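The users/devices example corresponds to sampling from the planted bisection model; a minimal sampler is sketched below (our own illustration, with the assumed convention that the first $n/2$ vertices are users and the rest are devices).

import random

def sample_planted_bisection(n, p, q, seed=None):
    # Sample an edge list of G(n, p, q): within-part pairs appear with
    # probability p, cross pairs with probability q, all independently.
    rng = random.Random(seed)
    half = n // 2
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            same_part = (u < half) == (v < half)
            if rng.random() < (p if same_part else q):
                edges.append((u, v))
    return edges

edges = sample_planted_bisection(n=10, p=0.8, q=0.1, seed=0)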

Definition 2.4 (Partition-respecting isomorphism, Partitioned Unlabeled Graphs).

Let $r\leq n$ be positive integers. Suppose $V$ is a set of $n$ vertices and $\mathcal{P}=\{V_{1},V_{2},\dots,V_{r}\}$ is a partition of $V$ into $r$ parts. The partition-respecting isomorphism, denoted by “$\cong_{\mathcal{P}}$”, is defined as follows: for any two labeled graphs $G$ and $G^{\prime}$, we write $G\cong_{\mathcal{P}}G^{\prime}$ if and only if they are isomorphic via an isomorphism $\phi:V\to V$ such that $\phi(V_{i})=V_{i}$ for $1\leq i\leq r$. Then $\Gamma_{\mathcal{P}}$ is defined as the collection of $n$-vertex graphs on $V$ where we ignore the labels of vertices inside each $V_{i}$, $1\leq i\leq r$; namely, $\Gamma_{\mathcal{P}}$ consists of the equivalence classes under partition-respecting isomorphism with respect to $\mathcal{P}$.

Note that every labeled graph $G$ corresponds to a unique structure $S\in\Gamma_{\mathcal{P}}$, and we use $G\cong_{\mathcal{P}}S$ to denote this relation. Furthermore, under the above definition, general unlabeled graphs correspond to the case $r=1$.

Definition 2.5 (Partitioned Structural Entropy).

Let $V$ be a set of $n$ vertices, where $n\in\mathbb{N}$. Suppose $\mathcal{P}=\{V_{1},V_{2},\dots,V_{r}\}$ is a partition of $V$ into $r$ parts and $\mathcal{S}_{n}$ is a probability distribution over all partitioned unlabeled graphs on $n$ vertices. Then the structural entropy $H_{\mathcal{S}}$ associated with $\mathcal{S}_{n}$ is defined by

H_{\mathcal{S}}=\mathbb{E}[-\log P(S)]=-\sum_{S\in\mathcal{S}_{n}}P(S)\log P(S).

In this paper we extend Theorem 2.3 to the structural entropy of the Stochastic Block Model with any given number of blocks, and provide a compression algorithm that asymptotically matches this structural entropy. We first give the result for the balanced bipartition case $\mathcal{G}(n,p,q)$.

Theorem 2.6.

Let $V=V_{1}\cup V_{2}$ be a set of $n$ vertices with $|V_{1}|=|V_{2}|=n/2$. Suppose $\mathcal{G}(n,p,q)$ is a probability distribution of graphs on $V$ where every edge inside $V_{1}$ or inside $V_{2}$ is present with probability $p$, every edge between $V_{1}$ and $V_{2}$ is present with probability $q$, and these edges are mutually independent. For large even $n$ and all $p,q$ satisfying $n^{-1}\ln n\ll p,q$ and $1-p\gg n^{-1}\ln n$, the following holds:

  (i) The partitioned structural entropy $H_{\mathcal{S}}$ of $\mathcal{G}(n,p,q)$ is

     H_{\mathcal{S}}=2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q)-2\log\left(\frac{n}{2}\right)!+O\left(\frac{\log n}{n^{\alpha}}\right)   (1)

     for some $\alpha>0$.

  (ii) For a balanced bipartitioned structure $S$ and $\varepsilon>0$,

     P\left(\left|-\frac{1}{\binom{n}{2}}\log P(S)-\frac{n-2}{2n-2}h(p)-\frac{n}{2n-2}h(q)+\frac{2\log(n/2)!}{\binom{n}{2}}\right|<3\varepsilon\right)>1-4\varepsilon,

     where $h(p)=-p\log p-(1-p)\log(1-p)$ is the entropy rate of a binary memoryless source.

Note that the structural entropy $H_{\mathcal{S}}$ here is larger than that in Theorem 2.3 (even if $p=q$), which reflects the fact that the SBM with a planted (bi-)partition carries a pre-specified partition and thus has fewer symmetries than $\mathcal{G}(n,p)$, the pure random model. (For $\mathcal{G}(n,p)$, when it is asymmetric, Theorem 2.3 saves a term of $\log n!$ compared with completely labeled graphs; this saving becomes $2\log\left(n/2\right)!$ for the planted balanced bipartition case in Theorem 2.6.)
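The comparison in the remark above can be made quantitative. The sketch below (our own illustration, again dropping the error term) evaluates the leading terms of (1); with $p=q$ the first two terms collapse to $\binom{n}{2}h(p)$, so the gap to Theorem 2.3 is exactly $\log n!-2\log(n/2)!>0$.

from math import comb, lgamma, log, log2

def h(p):
    # Binary entropy in bits.
    return -p * log2(p) - (1 - p) * log2(1 - p)

def log2_factorial(m):
    return lgamma(m + 1) / log(2)

def sbm2_structural_entropy(n, p, q):
    # Leading terms of (1): 2 binom(n/2,2) h(p) + (n^2/4) h(q) - 2 log (n/2)!.
    return (2 * comb(n // 2, 2) * h(p) + (n * n / 4) * h(q)
            - 2 * log2_factorial(n // 2))

# With p = q the difference to Theorem 2.3 is log n! - 2 log (n/2)!,
# i.e. log2 binom(n, n/2), which is roughly n bits for n = 1000.
n, p = 1000, 0.1
gap = log2_factorial(n) - 2 * log2_factorial(n // 2)
print(sbm2_structural_entropy(n, p, p), gap)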

3 Proof of Theorem 2.6

One key ingredient in the proof of Theorem 2.3 in [8] is the following lemma on the symmetry of $\mathcal{G}(n,p)$. A graph is called asymmetric if its automorphism group does not contain any permutation other than the identity; otherwise it is called symmetric.

Lemma 3.1 (Kim, Sudakov and Vu, 2002).

For all $p$ satisfying $n^{-1}\ln n\ll p$ and $1-p\gg n^{-1}\ln n$, a random graph $G\in\mathcal{G}(n,p)$ is symmetric with probability $O(n^{-w})$ for any positive constant $w$.

Proof of Theorem 2.6.

Note that every pair of vertices inside $V_{1}$ or inside $V_{2}$ should be considered indistinguishable, but not the pairs of vertices in $V_{1}\times V_{2}$. Recall that we write $G\cong_{\mathcal{P}}S$ for a graph $G$ and a structure $S$ if $S$ represents the structure of $G$ (with respect to the partition $\mathcal{P}$).

Let $\mathcal{G}:=\mathcal{G}(n,p,q)$. We first compute $H_{\mathcal{G}}$. Note that there are $\binom{n}{2}$ possible edges in $G\in\mathcal{G}$, and we can view $G$ as a binary sequence of length $\binom{n}{2}$, where each digit is a Bernoulli random variable. Moreover, for edges inside $V_{1}$ or $V_{2}$, the random variable, denoted by $X_{1}$, has expectation $p$, and for edges in $V_{1}\times V_{2}$ the random variable, denoted by $X_{2}$, has expectation $q$. Thus we have

H_{\mathcal{G}} = -\mathbb{E}\left[\log\left(X_{1}^{2\binom{n/2}{2}}X_{2}^{n^{2}/4}\right)\right]
= -2\binom{n/2}{2}\mathbb{E}[\log X_{1}]-\frac{n^{2}}{4}\mathbb{E}[\log X_{2}]
= 2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q).

Now write $\mathcal{S}_{n}$ for the probability distribution on $V$ over all partitioned unlabeled graphs inherited from $\mathcal{G}$, namely, given $S\in\Gamma_{\mathcal{P}}$, $P(S)=\sum_{G\cong_{\mathcal{P}}S}P(G)$. Let $H_{\mathcal{S}}$ be the partitioned structural entropy of $\mathcal{S}_{n}$. Therefore, compared with our goal, it remains to show that

H_{\mathcal{S}}-H_{\mathcal{G}}=-2\log\left(n/2\right)!+O\left(\frac{\log n}{n^{\alpha}}\right).   (2)

Note that in $\mathcal{G}(n,p,q)$, all labeled graphs $G\in\mathcal{G}$ such that $G\cong_{\mathcal{P}}S$ have the same probability $P(G)$. Thus, given a (labeled) graph $G\in\mathcal{G}$, we have $P(G)=P(S)/N(S)$, where $S\in\mathcal{S}_{n}$ is such that $G\cong_{\mathcal{P}}S$. So the graph entropy of $\mathcal{G}=\mathcal{G}(n,p,q)$ can be written as

H_{\mathcal{G}} = -\sum_{G\in\mathcal{G}}P(G)\log P(G)
= -\sum_{S\in\mathcal{S}_{n}}\sum_{G\cong_{\mathcal{P}}S,\,G\in\mathcal{G}}P(G)\log P(G)
= -\sum_{S\in\mathcal{S}_{n}}\sum_{G\cong_{\mathcal{P}}S,\,G\in\mathcal{G}}\frac{P(S)}{N(S)}\log\frac{P(S)}{N(S)}
= -\sum_{S\in\mathcal{S}_{n}}P(S)\log\frac{P(S)}{N(S)}
= H_{\mathcal{S}}+\sum_{S\in\mathcal{S}_{n}}P(S)\log N(S).   (3)

Define $S[W]$ to be $S$ restricted to $W$ for $W\subseteq V$. Now we split $S$ into $S_{1}$ and $S_{2}$, i.e., $S_{1}=S[V_{1}]$ and $S_{2}=S[V_{2}]$. Write $Aut(S_{i})$ for the automorphism group of $S_{i}$; then we naturally have

N(S)=\frac{(n/2)!\cdot(n/2)!}{|Aut(S_{1})|\,|Aut(S_{2})|}.

Combining this with (2) and (3), it remains to show that

\sum_{S\in\mathcal{S}_{n}}P(S)\log\big(|Aut(S_{1})|\,|Aut(S_{2})|\big)=O\left(\frac{\log n}{n^{\alpha}}\right).

In the summation above we only need to consider $S$ such that either $S_{1}$ or $S_{2}$ is symmetric, as otherwise $\log|Aut(S_{1})||Aut(S_{2})|=\log 1=0$. By Lemma 3.1, the probability that $S$ restricted to $V_{1}$ or to $V_{2}$ is symmetric is $O(n^{-1-\alpha})$ for some $\alpha>0$, and for such $S$ we use the trivial bound $\log|Aut(S_{1})||Aut(S_{2})|\leq 2\log(n/2)!\leq 2n\log n$. This gives the desired estimate in (i):

\sum_{S\in\mathcal{S}_{n}}P(S)\log\big(|Aut(S_{1})|\,|Aut(S_{2})|\big)\leq 2n\log n\cdot O(n^{-1-\alpha})=O\left(\frac{\log n}{n^{\alpha}}\right).

To show (ii), for a set $V$ of $n$ vertices and a balanced bipartition $\mathcal{P}=(V_{1},V_{2})$ of $V$, we define the typical set $T_{\varepsilon}^{n}$ as the set of structures $S$ on $n$ vertices satisfying (a) $S$ is asymmetric on $V_{1}$ and on $V_{2}$, respectively; and (b) for $G\cong_{\mathcal{P}}S$,

2^{-2\binom{n/2}{2}h(p)-\frac{n^{2}}{4}h(q)-\binom{n}{2}\varepsilon}\leq P(G)\leq 2^{-2\binom{n/2}{2}h(p)-\frac{n^{2}}{4}h(q)+\binom{n}{2}\varepsilon}.

Denote by $T_{1}^{n}$ and $T_{2}^{n}$ the sets of structures satisfying properties (a) and (b), respectively, so that $T_{\varepsilon}^{n}=T_{1}^{n}\cap T_{2}^{n}$. Firstly, by the asymmetry of $\mathcal{G}(n,p)$ (Lemma 3.1), we conclude that $P(T_{1}^{n})>1-2\varepsilon$ for large $n$. Secondly, we use a binary sequence of length $\binom{n}{2}$ to represent a (labeled) instance $G$ of $\mathcal{G}(n,p,q)$, where the first $\binom{n/2}{2}$ bits $\mathcal{L}_{1}$ represent the induced subgraph on $V_{1}$, the next $\binom{n/2}{2}$ bits $\mathcal{L}_{2}$ represent the induced subgraph on $V_{2}$, and the remaining $n^{2}/4$ bits $\mathcal{L}_{12}$ represent the bipartite graph on $V_{1}\times V_{2}$. Since all edges of $G$ are generated independently, both $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$ have in expectation $\binom{n/2}{2}p$ ones, and the AEP property of binary sequences implies that

2^{-\binom{n/2}{2}h(p)-\binom{n}{2}\varepsilon}\leq P(G[V_{1}]),\,P(G[V_{2}])\leq 2^{-\binom{n/2}{2}h(p)+\binom{n}{2}\varepsilon}

holds with probability at least $1-2\varepsilon$. Similarly, $\mathcal{L}_{12}$ has in expectation $(n^{2}/4)q$ ones, and the AEP property of binary sequences gives that with probability at least $1-\varepsilon$,

2^{-\frac{n^{2}}{4}h(q)-\binom{n}{2}\varepsilon}\leq P(G[V_{1},V_{2}])\leq 2^{-\frac{n^{2}}{4}h(q)+\binom{n}{2}\varepsilon}.

Since these edges are independent, we conclude that (b) holds with probability at least $1-3\varepsilon$. Thus, $P(T_{\varepsilon}^{n})\geq 1-4\varepsilon$. Now we can compute $P(S)$ for $S\in T_{\varepsilon}^{n}$. By (a), $P(S)=(n/2)!\,(n/2)!\,P(G)$ for any $G\cong_{\mathcal{P}}S$. Together with (b) and a straightforward computation, the assertion of (ii) follows. ∎

4 SBM Compression Algorithm

Given the computation of the structural entropy, a natural next step is to design efficient compression schemes that come close to or even (asymptotically) achieve this entropy limit. Choi and Szpankowski [8] presented such an algorithm (which they named Szip) for (unlabeled) random graphs; it uses in expectation at most $\binom{n}{2}h(p)-n\log n+O(n)$ bits and asymptotically achieves the structural entropy given in Theorem 2.3. Roughly speaking, Szip greedily peels off vertices from the graph and (efficiently) stores the neighborhood information. This procedure can simply be reversed, but the labeling of the recovered graph may differ from that of the original graph, which is why a saving in codeword length is achieved. Refinements and analysis are also provided in [8] to achieve the stated performance.

Here we give an algorithm that compresses SBM structures, uses the Szip algorithm as a building block, and asymptotically matches the structural entropy computation in Theorem 2.6. The algorithm consists of two stages: it first compresses $S[V_{1}]$ and $S[V_{2}]$ using Szip, and then compresses $S[V_{1},V_{2}]$ using an arithmetic compression algorithm with the help of the Szip decoding outputs.

[Figure 1 shows the block diagram of the scheme: $S[V_{1}]$ and $S[V_{2}]$ are compressed by Szip encoders into $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$; a Szip decoder is run at the encoder to obtain $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, under whose labeling $S[V_{1},V_{2}]$ is compressed by a labeled encoder into $\mathcal{L}_{12}$ (the cascade encoder); the parallel decoder runs Szip decoders and the labeled decoder to recover $\hat{S}[V_{1}]=S^{\prime}[V_{1}]$, $\hat{S}[V_{2}]=S^{\prime}[V_{2}]$ and $\hat{S}[V_{1},V_{2}]=S^{\prime}[V_{1},V_{2}]$.]
Figure 1: Illustration of compression algorithm

To give a brief description of the compression algorithm, we again use the balanced bipartition $V_{1}\cup V_{2}$ as an example. The encoding and decoding procedures of the algorithm are illustrated in Figure 1. The algorithm encodes the observed structure $S$ of $\mathcal{G}(n,p,q)$ into a binary string as follows. It uses Szip as a subroutine to compress $S[V_{1}]$ and $S[V_{2}]$ into binary sequences $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$. Then, as part of the encoder, we run the Szip decoder on $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$ to obtain decoded structures $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, respectively. We then compress $S[V_{1},V_{2}]$, viewed as a labeled bipartite graph under the vertex labeling of $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, into $\mathcal{L}_{12}$. This “Labeled Encoder” step can be done by treating the bipartite graph as a binary sequence of length $n^{2}/4$ and compressing it with a standard arithmetic encoder [27, 30, 38]. The concatenation of the Szip algorithms and the arithmetic encoder forms the cascade encoder of our algorithm and produces the codeword $(\mathcal{L}_{1},\mathcal{L}_{2},\mathcal{L}_{12})$. Upon receiving the codeword, we decode its parts in parallel using the Szip decoders and the arithmetic decoder. This completes our algorithm.

The main challenge in the design of our algorithm is how the decoder can maintain consistency between the bipartite graph $S[V_{1},V_{2}]$ and the decoded versions of $S[V_{1}]$ and $S[V_{2}]$. A key observation is that since Szip is a deterministic algorithm, although it may permute the vertex labelings, its output is invariant given the same input. Our solution is therefore to first run Szip (both encoding and decoding) at the encoder to obtain the structures $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, and then compress $S[V_{1},V_{2}]$ (as a labeled bipartite graph) under the vertex labeling of $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$. This guarantees that the decoded structures $\hat{S}[V_{1}]$, $\hat{S}[V_{2}]$ and $\hat{S}[V_{1},V_{2}]$ share the same vertex labeling as $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, namely, $S$ is recovered. A schematic sketch of this cascade is given below.
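In the sketch below, szip_encode, szip_decode, arith_encode, arith_decode and bipartite_bits are assumed interfaces (placeholders for the Szip routines of [8] and a standard arithmetic coder); only the wiring of Figure 1 is shown, not their implementations.

# Placeholders for the assumed building blocks; not implemented here.
def szip_encode(structure, p): raise NotImplementedError    # Szip encoder of [8]
def szip_decode(codeword, p): raise NotImplementedError     # Szip decoder of [8]
def arith_encode(bits, q): raise NotImplementedError        # arithmetic encoder
def arith_decode(codeword, q): raise NotImplementedError    # arithmetic decoder
def bipartite_bits(S12, S1_lab, S2_lab): raise NotImplementedError
    # n^2/4-bit string of S[V1,V2] written under the given labelings

def encode_sbm(S1, S2, S12, p, q):
    L1 = szip_encode(S1, p)          # compress S[V1]
    L2 = szip_encode(S2, p)          # compress S[V2]
    S1p = szip_decode(L1, p)         # re-decode at the encoder to learn the
    S2p = szip_decode(L2, p)         # labeling the decoder will end up with
    L12 = arith_encode(bipartite_bits(S12, S1p, S2p), q)
    return L1, L2, L12               # codeword (L1, L2, L12)

def decode_sbm(L1, L2, L12, p, q):
    S1_hat = szip_decode(L1, p)      # identical to S1p since Szip is deterministic
    S2_hat = szip_decode(L2, p)
    S12_bits = arith_decode(L12, q)  # already consistent with S1_hat, S2_hat
    return S1_hat, S2_hat, S12_bits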

Before discussing the performance of the algorithm, we first describe some useful properties of the arithmetic compression algorithm in the following lemma. We omit the proof of the lemma, which follows from the analysis in [27, 30, 38] and AEP properties in [39, 9].

Lemma 4.1.

Let $L$ be the codeword length of the arithmetic compression algorithm when compressing a binary sequence of length $m$ and entropy rate $h$. For large $m$, the following holds:

  (i) The expected codeword length asymptotically achieves the entropy of the message, i.e.,

     \mathbb{E}[L]=mh+O(\log m).   (4)
  (ii) For any $\epsilon>0$,

     P(|L-\mathbb{E}[L]|\leq\epsilon\log m)\geq 1-o(1).   (5)
  (iii) The arithmetic algorithm runs in time $O(m)$.

The following theorem characterizes the performance of our algorithm. It follows immediately from Theorem 2 in [8] (the performance of Szip) and Lemma 4.1; we omit the detailed proof here.

Theorem 4.2.

Let $V=V_{1}\cup V_{2}$ be a set of $n$ vertices with $|V_{1}|=|V_{2}|=n/2$. Given a partitioned unlabeled graph $S$ on $V$, let $L(S)$ be the codeword length given by our algorithm. For large $n$, our algorithm runs in time $O(n^{2})$ and satisfies the following:

  (i) The algorithm asymptotically achieves the structural entropy in (1) (note that $2\log(n/2)!=n\log n+O(n)$), i.e.,

     \mathbb{E}[L(S)]\leq 2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q)-n\log n+O(n).
  (ii) For any $\epsilon>0$,

     P(|L(S)-\mathbb{E}[L(S)]|\leq\epsilon n\log n)\geq 1-o(1).

5 General SBM with $r\geq 2$ blocks

In the previous sections, we discussed the structural entropy of the SBM and a compression algorithm that asymptotically achieves this structural entropy for the balanced bipartition case ($r=2$). The corresponding results, Theorem 2.6 and Theorem 4.2, can be easily generalized to the general $r$-partition case. We briefly describe the generalizations below.

5.1 Structural entropy

Our approach handles general SBMs similarly. In a general SBM with $r\geq 2$ parts, an $r\times r$ symmetric matrix $P$ describes the connection probabilities between and within the communities: two vertices $u\in V_{i}$ and $v\in V_{j}$ are connected by an edge with probability $P_{ij}$ ($i$ and $j$ are not necessarily distinct). To simplify the presentation, we only present the results in the special form where $P_{ij}=p$ if $i=j$ and $P_{ij}=q$ if $i\neq j$; we remark that similar results hold in the general case as well. We first give the result on the computation of the partitioned structural entropy of the SBM.

Theorem 5.1.

Fix $r$ reals $x_{1},x_{2},\dots,x_{r}$ in $(0,1)$ whose sum is 1. Let $V=V_{1}\cup V_{2}\cup\cdots\cup V_{r}$ be a set of $n$ vertices with a partition into $r$ parts such that $|V_{i}|=x_{i}n$. For large $n$ and all $p,q$ satisfying $n^{-1}\ln n\ll p,q$ and $1-p\gg n^{-1}\ln n$, the following holds:

  (i) The $r$-partitioned structural entropy $H_{\mathcal{S}}^{r}$ for a partitioned structure $\mathcal{S}$ on $V$ is

     H_{\mathcal{S}}^{r}=z(h(p)-h(q))+\binom{n}{2}h(q)-r\log\left(\frac{n}{r}\right)!+O\left(\frac{\log n}{n^{\alpha}}\right)   (6)

     for some $\alpha>0$, where $z:=\sum_{i=1}^{r}\binom{x_{i}n}{2}$.

  (ii) For a partitioned structure $S$ on $V$ and $\varepsilon>0$,

     P\left(\left|-\frac{1}{\binom{n}{2}}\log P(S)-\frac{z}{\binom{n}{2}}(h(p)-h(q))-h(q)+\frac{r\log(n/r)!}{\binom{n}{2}}\right|<3\varepsilon\right)>1-4\varepsilon.

5.2 Compression algorithm

The compression algorithm for general $r$ with vertex partition $\{V_{1},V_{2},\dots,V_{r}\}$ can be viewed as a union of the compression algorithms for the $S[V_{i}]$ and the $S[V_{i},V_{j}]$ ($1\leq i<j\leq r$). More precisely, the algorithm works as follows. It first compresses each $S[V_{i}]$ into $\mathcal{L}_{i}$ using Szip. It then runs the Szip decoder with input $\mathcal{L}_{i}$ to obtain the decoded structure $S^{\prime}[V_{i}]$. With the labelings of $S^{\prime}[V_{i}]$, $i=1,2,\dots,r$, we can compress $S[V_{1},V_{2},\dots,V_{r}]$ as a labeled $r$-partite graph into $\mathcal{L}$ using an arithmetic encoder. This completes the encoding procedure and gives the codewords $\mathcal{L}_{1},\dots,\mathcal{L}_{r},\mathcal{L}$, which we concatenate to obtain the final codeword. Decoding simply runs the Szip decoders and the labeled (arithmetic) decoders in parallel. The correctness of the decoding output can be argued as in the bipartition case.

The performance of the algorithm, analogous to Theorem 4.2, is as follows.

Theorem 5.2.

Fix $r$ reals $x_{1},x_{2},\dots,x_{r}$ in $(0,1)$ whose sum is 1. Let $V=V_{1}\cup V_{2}\cup\cdots\cup V_{r}$ be a set of $n$ vertices with a partition into $r$ parts such that $|V_{i}|=x_{i}n$. Given a partitioned unlabeled graph $S$ on $V$, let $L(S)$ be the codeword length given by our algorithm. For large $n$, our algorithm runs in time $O(n^{2})$ and satisfies the following:

  (i) The algorithm asymptotically achieves the structural entropy in (6), i.e.,

     \mathbb{E}[L(S)]\leq\sum_{i=1}^{r}\binom{x_{i}n}{2}(h(p)-h(q))+\binom{n}{2}h(q)-n\log n+O(n).
  (ii) For any $\epsilon>0$,

     P(|L(S)-\mathbb{E}[L(S)]|\leq\epsilon n\log n)\geq 1-o(1).

6 Conclusion

In this paper we defined the partitioned unlabeled graphs and partitioned structural entropy, which generalize the structural entropy for unlabeled graphs introduced by Choi and Szpankowski [8]. We then computed the partitioned structural entropy for Stochastic Block Models and gave a compression algorithm that asymptotically achieves this structural entropy limit. As mentioned earlier, we believe that in appropriate contexts the structural information of a graph or network can be interpreted as a kind of semantic information, in which case, the communication schemes may benefit from structural compressions which considerably reduce the cost.

References

  • [1] E. Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
  • [2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Trans. Inf. Theory, 62(1):471–487, 2015.
  • [3] E. Abbe and C. Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 670–688. IEEE, 2015.
  • [4] M. Adler and M. Mitzenmacher. Towards compressing web graphs. In Proceedings DCC 2001. Data Compression Conference, pages 203–212, 2001.
  • [5] N. Alon and A. Orlitsky. Source coding and graph entropies. IEEE Trans. Inf. Theory, 42(5):1329–1339, 1996.
  • [6] A. R. Asadi, E. Abbe, and S. Verdú. Compressing data on graphs with clusters. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 1583–1587, 2017.
  • [7] F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM KDD ’09, page 219–228, New York, NY, USA, 2009. Association for Computing Machinery.
  • [8] Y. Choi and W. Szpankowski. Compression of graphical structures: Fundamental limits, algorithms, and experiments. IEEE Trans. Inf. Theory, 58(2):620–638, 2012.
  • [9] T. M. Cover and J. A. Thomas. Elements of Information Theory 2nd Edition. Wiley-Interscience, USA, 2006.
  • [10] M. Dehmer and A. Mowshowitz. A history of graph entropy measures. Information Science, 181(1):57–78, Jan. 2011.
  • [11] B. Hajek, Y. Wu, and J. Xu. Information limits for recovering a hidden community. IEEE Trans. Inf. Theory, 63(8):4729–4745, 2017.
  • [12] F. Harary and E. M. Palmer. Graphical Enumeration. New York: Academic, 1973.
  • [13] P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109–137, 1983.
  • [14] J. C. Kieffer, E.-H. Yang, and W. Szpankowski. Structural complexity of random binary trees. In Proceedings of 2009 IEEE International Symposium on Information Theory (ISIT), pages 635–639, Seoul, 2009.
  • [15] I. Kontoyiannis, Y. H. Lim, K. Papakonstantinopoulou, and W. Szpankowski. Symmetry and the entropy of small-world structures and graphs. In Proceedings of 2021 IEEE International Symposium on Information Theory (ISIT), pages 3026–3031, 2021.
  • [16] J. Körner. Coding of an information source having ambiguous alphabet and the entropy of graphs. In 6th Prague conference on information theory, pages 411–425, 1973.
  • [17] T. Łuczak, A. Magner, and W. Szpankowski. Asymmetry and structural information in preferential attachment graphs. Random Structures & Algorithms, 55:696–718, 03 2019.
  • [18] T. Łuczak, A. Magner, and W. Szpankowski. Compression of preferential attachment graphs. In Proceedings of 2019 IEEE International Symposium on Information Theory (ISIT), pages 1697–1701, 2019.
  • [19] E. Mossel, J. Neeman, and A. Sly. Consistency thresholds for the planted bisection model. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 69–75, 2015.
  • [20] A. Mowshowitz. Entropy and the complexity of graphs ii: the information content of digraphs and infinite graphs. Bulletin of Mathematical Biophysics 30, pages 225–240, 1968.
  • [21] A. Mowshowitz. Entropy and the complexity of graphs iii: graphs with prescribed information content. Bulletin of Mathematical Biophysics 30, pages 387–414, 1968.
  • [22] A. Mowshowitz. Entropy and the complexity of graphs iv: entropy measures and graphical structure. Bulletin of Mathematical Biophysics 30, pages 533–546, 1968.
  • [23] A. Mowshowitz. Entropy and the complexity of the graphs i: an index of the relative complexity of a graph. Bulletin of Mathematical Biophysics 30, pages 175–204, 1968.
  • [24] A. Mowshowitz and M. Dehmer. Entropy and the complexity of graphs revisited. Entropy, 14(3):559–570, 2012.
  • [25] M. Naor. Succinct representation of general unlabeled graphs. Discr. Appl. Math., 28(3):303–307, 1990.
  • [26] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, 2005.
  • [27] R. C. Pasco. Source coding algorithms for fast data compression. Ph.D. dissertation, Stanford Univ., May 1976.
  • [28] L. Peshkin. Structure induction by lossless graph compression. In Proceedings of 2007 Data Compression Conference (DCC’07), pages 53–62, 2007.
  • [29] N. Rashevsky. Life information theory and topology. Bulletin of Mathematical Biophysics 17, pages 229–235, 1955.
  • [30] J. J. Rissanen. Generalized Kraft inequality and arithmetic coding. IBM Journal of Research and Development, 20(3):198–203, May 1976.
  • [31] M. Sauerhoff. On the entropy of models for the web graph, 2016.
  • [32] S. A. Savari. Compression of words over a partially commutative alphabet. IEEE Trans. Inf. Theory, 50(7):1425–1441, 2004.
  • [33] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 1948.
  • [34] C. E. Shannon. The lattice theory of information. Transactions of the IRE Professional Group on Information Theory, 1(1):105–107, 1953.
  • [35] J. Sun, E. Bollt, and D. Ben-Avraham. Graph compression - save information by exploiting redundancy. J. Statist. Mechan.: Theory Exper., pages 06001–06001, 2008.
  • [36] E. Trucco. A note on the information content of graphs. Bulletin of Mathematical Biophysics 18, pages 129–135, 1956.
  • [37] G. Turán. On the succinct representation of graphs. Discr. Appl. Math., 8(3):289–294, 1984.
  • [38] F. Willems, Y. Shtarkov, and T. Tjalkens. The context-tree weighting method: basic properties. IEEE Trans. Inf. Theory, 41(3):653–664, 1995.
  • [39] R. W. Yeung. Information Theory and Network Coding. Springer, 2008.
  • [40] H. Zenil, N. A. Kiani, and J. Tegnér. A review of graph and network complexity from an algorithmic information perspective. Entropy, 20(8), 2018.