
Structural Entropy of the Stochastic Block Models

Jie Han, Tao Guo, Qiaoqiao Zhou, Wei Han, Bo Bai, and Gong Zhang

J. Han, T. Guo, W. Han, B. Bai, and G. Zhang are with Theory Lab, Central Research Institute, 2012 Labs, Huawei Tech. Co., Ltd. ([email protected], [email protected], [email protected], [email protected], [email protected]). Q. Zhou is with the Department of Computer Science, School of Computing, National University of Singapore ([email protected]).
Abstract

With the rapid expansion of graphs and networks and the growing magnitude of data from all areas of science, effective treatment and compression schemes for context-dependent data are highly desirable. A particularly interesting direction is to compress the data while keeping only the “structural information” and ignoring the concrete labelings. In this direction, Choi and Szpankowski introduced structures (unlabeled graphs), which allowed them to compute the structural entropy of the Erdős–Rényi random graph model. Moreover, they also provided an asymptotically optimal compression algorithm that (asymptotically) achieves this entropy limit and runs in expected linear time.

In this paper, we consider the Stochastic Block Models with an arbitrary number of parts. Indeed, we define a partitioned structural entropy for Stochastic Block Models, which generalizes the structural entropy for unlabeled graphs and encodes the partition information as well. We then compute the partitioned structural entropy of the Stochastic Block Models, and provide a compression scheme that asymptotically achieves this entropy limit.

1 Introduction

Shannon’s “entropy” of information is a foundational concept of information theory [39, 9]. Given a discrete random variable $X$ with support set (that is, the set of possible outcomes) $x_{1},x_{2},\dots,x_{n}$, occurring with probabilities $p_{1},p_{2},\dots,p_{n}$, the entropy of $X$ is defined as

H(X):=-\sum_{i=1}^{n}p_{i}\log p_{i},

where the logarithm here and throughout this paper is of base 2. Note that the entropy of $X$ is a function of the probability distribution of $X$.

The entropy was originally introduced by Shannon in [33] as part of his theory of communication, where a data communication system consists of a data source $X$, a channel, and a receiver. The fundamental problem of communication is for the receiver to reliably recover the data generated by the source, based on the bits it receives through the channel. Shannon proved that the entropy of the source $X$ plays a central role: his source coding theorem shows that the entropy is the mathematical limit on how well the data can be losslessly compressed.

The question then arises: how do we compress data that has structure, e.g., data in social networks? In Shannon’s lesser-known 1953 paper [34], he argued for an extension of information theory, where data is considered as observations of a source, to “non-conventional data” (that is, lattices). Indeed, nowadays data appears in various formats and structures (e.g., sequences, expressions, interactions) and in drastically increasing amounts. In many scenarios, data is highly context-dependent, and the structural information and the context information are two conceptually different aspects. It is therefore desirable to develop novel theory and efficient algorithms for extracting useful information from non-conventional data structures. Roughly speaking, such data consists of structural information, which might be understood as the “shape” of the data, and context information, which should be recognized as the data labels.

It is well-known that complex networks (e.g., social networks) admit community structures [26]. That is, users within a group interact with each other more frequently than those outside the group. The Stochastic Block Model (SBM) [13] is a celebrated random graph model that has been widely used to study the community structures in graphs and networks. It provides a good benchmark to evaluate the performance of community detection algorithms and inspires the design of many algorithms for community detection tasks. The theoretical underpinnings of the SBM have been extensively studied and sharp thresholds for exact recovery have been successively established [2, 19, 3, 11]. We refer readers to [1] for a recent survey, where other interesting and important problems in SBM are also discussed.

In addition to the SBM discussed in [1], there are other angles from which to study compression of data with graph structures. Asadi et al. [6] investigated data compression on graphs with clusters. Zenil et al. [40] surveyed information-theoretic methods, in particular Shannon entropy and algorithmic complexity, for characterizing graphs and networks.

1.1 Compression of graphs

In recent years, graphical data and the network structures supporting them have become increasingly common and important in many branches of engineering and the sciences. To better represent and transmit graphical data, many works consider the problem of compressing a (random) graph up to isomorphism, i.e., compressing the structure of a graph. A graph $G$ consists of a finite set $V$ of vertices and a set $E$ of edges, each of which connects two vertices. A graph can be represented by a binary matrix (the adjacency matrix), which in turn can be viewed as a binary sequence. Thus, encoding a labeled graph (that is, one in which all vertices are distinguished) is equivalent to encoding a $\binom{|V|}{2}$-digit binary sequence, given a probability distribution on all $\binom{|V|}{2}$ possible edges. However, such a string does not reflect the internal symmetries conveyed by the graph automorphisms, and sometimes we are only interested in the local or global structures in the graph rather than the exact vertex labelings. The structural entropy is defined when the graphs are considered unlabeled, or simply called structures, where the vertices are viewed as indistinguishable. This natural definition aims to capture the information of the structure, and thus provides a fundamental measure for graph/structure compression schemes.
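As a concrete illustration of the binary-sequence view above, the following minimal Python sketch (our own illustration, not taken from the literature) maps a labeled graph, given by its adjacency matrix, to the $\binom{|V|}{2}$-digit binary sequence of its upper triangle and back.

import numpy as np

def graph_to_bits(adj):
    # Read the binom(n,2) entries above the diagonal in a fixed order,
    # so each labeled graph corresponds to exactly one bit string.
    n = adj.shape[0]
    return "".join(str(int(adj[i, j])) for i in range(n) for j in range(i + 1, n))

def bits_to_graph(bits, n):
    # Inverse map: rebuild the symmetric adjacency matrix from the bit sequence.
    adj = np.zeros((n, n), dtype=int)
    it = iter(bits)
    for i in range(n):
        for j in range(i + 1, n):
            adj[i, j] = adj[j, i] = int(next(it))
    return adj

# Example: a labeled triangle plus an isolated vertex on n = 4 vertices.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])
bits = graph_to_bits(A)                      # 6 = binom(4,2) bits
assert (bits_to_graph(bits, 4) == A).all()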

The problem actually has a strong theoretical background. Back in 1984, Turán [37] raised the question of finding an efficient coding method for general unlabeled graphs on $n$ vertices, for which a lower bound of $\binom{n}{2}-n\log n+O(n)$ bits is suggested. This lower bound follows from counting the number of unlabeled graphs [12]. The question was later answered by Naor [25] in 1990, who proposed a representation that is optimal up to the first two leading terms when all unlabeled graphs are equally likely. In a recent paper, Kieffer et al. [14] studied the structural complexity of random binary trees. There have also been some heuristic methods for real-world graph compression, see [4, 7, 28, 32, 35]. Rather recently, Choi and Szpankowski [8] studied the structural entropy of the Erdős–Rényi random graph $\mathcal{G}(n,p)$. They computed the structural entropy given that $p$ is not (very) close to 0 or 1, and also gave a compression scheme that matches their computation. Later, the structural entropy of other randomly generated graphs, e.g., preferential attachment graphs and web graphs, was also studied [18, 17, 31, 15].

However, it is well known that the Erdős–Rényi model is too simplistic to model real networks, in particular due to its strong homogeneity and absence of community structure. In this paper, we consider the compression of graphical structures of the SBM, which in general models real networks better and circumvents these issues of the ER model. In summary, our contributions are as follows:

  • We introduce the partitioned structural entropy which generalizes the structural entropy for unlabeled graphs and we show that it reflects the partition information of the SBM.

  • We provide an explicit formula for the partitioned structural entropy of the SBM.

  • We also propose a compression scheme that asymptotically achieves this entropy limit.

Semantic communications are considered a key component of future generation networks, where a natural problem is how to efficiently extract and transmit the “semantic information”. In the case of graph data, one may view the (partitioned) structures as the information that needs to be extracted, while the concrete labeling information is considered redundant. From this point of view, our result is a step toward the study of semantic compression/communication in appropriate contexts.

1.2 Related works

Finally, we would like to point out that there are other information metrics defined on graphs. The term “graph entropy” has been defined and used in several ways in the literature. For example, the graph entropy introduced by Körner in [16] denotes the number of bits one has to convey to resolve the ambiguity of a vertex in a graph. This notion also turns out to be useful in other areas, including combinatorics. The chromatic entropy introduced in [5] is the lowest entropy of any coloring of a graph, and it finds application in zero-error source coding. We remark that the structural entropy we consider is quite different from the Körner graph entropy and the chromatic entropy.

On the other hand, a concept of graph entropy (also called topological information content of a graph) was introduced by Rashevsky [29] and Trucco [36], and later by Mowshowitz [23, 20, 21, 22, 24, 10], which is defined as a function of (the structure of) a graph and an equivalence relation defined on its vertices or edges. Such a concept is a measure of the graph itself and does not involve any probability distribution.

2 Preliminaries

2.1 Structural entropy of unlabeled graphs

Now let us formally define the structural entropy given a probability distribution on unlabeled graphs.

Given an integer $n$, define $\mathcal{G}_{n}$ as the collection of all $n$-vertex labeled graphs.

Definition 2.1 (Entropy of Random Graph).

Given an integer $n$ and a probability distribution on $\mathcal{G}_{n}$, the entropy of a random graph $\mathcal{G}\in\mathcal{G}_{n}$ is defined as

H_{\mathcal{G}}=\mathbb{E}[-\log P(G)]=-\sum_{G\in\mathcal{G}_{n}}P(G)\log P(G)

where $P(G)\triangleq P(\mathcal{G}=G)$ is the probability of a graph $G$ in $\mathcal{G}_{n}$.

Then the random structure model $\mathcal{S}_{n}$, associated with the probability distribution on $\mathcal{G}_{n}$, is defined as the unlabeled version of $\mathcal{G}_{n}$. For a given $S\in\mathcal{S}_{n}$, the probability of $S$ can be computed as

P(S)=\sum_{G\cong S,\,G\in\mathcal{G}_{n}}P(G).

Here $G\cong S$ means that $G$ and $S$ have the same structure, that is, $S$ is isomorphic to $G$. Clearly, if all isomorphic labeled graphs have the same probability, then for any labeled graph $G\cong S$, one has

P(S)=N(S)\cdot P(G)

where $N(S)$ stands for the number of distinct labeled graphs that have the same structure as $S$.
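For very small graphs the relation $P(S)=N(S)\cdot P(G)$ can be checked by brute force, using the fact that $N(S)=n!/|Aut(S)|$: the $n!$ relabelings of a labeled graph $G\cong S$ collapse into groups of size $|Aut(S)|$. The sketch below is our own illustration (the helper names are hypothetical), feasible only for very small $n$.

from itertools import permutations
from math import factorial

def relabel(edges, perm):
    # Apply a vertex permutation to an edge list, returned as a canonical set.
    return frozenset(frozenset((perm[u], perm[v])) for u, v in edges)

def count_labelings(edges, n):
    # Count (i) distinct labeled graphs with the same structure and
    # (ii) automorphisms, by enumerating all n! vertex permutations.
    base = relabel(edges, list(range(n)))
    images = {relabel(edges, p) for p in permutations(range(n))}
    autos = sum(relabel(edges, p) == base for p in permutations(range(n)))
    return len(images), autos

# Example: a path on 4 vertices (0-1-2-3) has |Aut| = 2 and 24/2 = 12 labelings.
n_labelings, n_autos = count_labelings([(0, 1), (1, 2), (2, 3)], 4)
assert n_labelings == factorial(4) // n_autos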

Definition 2.2 (Structural Entropy).

The structural entropy $H_{\mathcal{S}}$ of a random graph $\mathcal{G}$ is defined as the entropy of the random structure $\mathcal{S}$ associated with $\mathcal{G}_{n}$, that is,

H_{\mathcal{S}}=\mathbb{E}[-\log P(S)]=-\sum_{S\in\mathcal{S}}P(S)\log P(S)

where the sum is over all distinct structures.

The Erdős–Rényi random graph $\mathcal{G}(n,p)$, also called the binomial random graph, is a fundamental random graph model: it has $n$ vertices and each pair of vertices is connected with probability $p$, independently of other pairs. In 2012, Choi and Szpankowski [8] proved the following for Erdős–Rényi random graphs.

Theorem 2.3 (Choi and Szpankowski, [8]).

For large $n$ and all $p$ satisfying $n^{-1}\ln n\ll p$ and $1-p\gg n^{-1}\ln n$, the following holds:

  1. The structural entropy $H_{\mathcal{S}}$ of $\mathcal{G}(n,p)$ is

     H_{\mathcal{S}}=\binom{n}{2}h(p)-\log n!+O\left(\frac{\log n}{n^{\alpha}}\right)

     for some $\alpha>0$.

  2. For a structure $S$ of $n$ vertices and $\varepsilon>0$,

     P\left(\left|-\frac{1}{\binom{n}{2}}\log P(S)-h(p)+\frac{\log n!}{\binom{n}{2}}\right|<\varepsilon\right)>1-2\varepsilon,

     where $h(p)=-p\log p-(1-p)\log(1-p)$ is the entropy rate of a binary memoryless source.

Furthermore, they [8] also presented a compression algorithm for unlabeled graphs that asymptotically achieves the structural entropy up to an $O(n)$ error term.
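For orientation, the two leading terms of Theorem 2.3 are easy to evaluate numerically; the sketch below (our own illustration, ignoring the $O(\log n/n^{\alpha})$ error term) computes $\binom{n}{2}h(p)-\log n!$ in bits.

from math import comb, lgamma, log, log2

def h(p):
    # Binary entropy in bits.
    return -p * log2(p) - (1 - p) * log2(1 - p)

def log2_factorial(m):
    # log2(m!) via the log-gamma function, avoiding overflow for large m.
    return lgamma(m + 1) / log(2)

def er_structural_entropy(n, p):
    # Leading terms of Theorem 2.3: binom(n,2) h(p) - log n!.
    return comb(n, 2) * h(p) - log2_factorial(n)

# Example: n = 1000, p = 0.1; the saving over the labeled-graph entropy
# binom(n,2) h(p) is log n!, roughly n log n bits.
print(er_structural_entropy(1000, 0.1))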

2.2 Stochastic Block Model – Our result

It is well known that the ER model is too simplistic to model real networks, in particular due to its strong homogeneity and absence of community structure. The Stochastic Block Model was introduced on the assumption that vertices in a network connect independently, with probabilities based on their profiles, or equivalently, on their community assignment. For example, in the SBM with two communities and symmetric parameters, also known as the planted bisection model and denoted by $\mathcal{G}(n,p,q)$, the vertex set is partitioned into two sets $V_{1}$ and $V_{2}$; any pair of vertices inside $V_{1}$ or inside $V_{2}$ is connected with probability $p$, any pair of vertices across the two clusters is connected with probability $q$, and all these connections are independent.

As an illuminating example, consider a graph $G$ where there are $n/2$ users and $n/2$ devices: each pair of users and each pair of devices is connected with probability $p$, a user and a device are connected with probability $q$, and each of these connections is independent of all others. Suppose that we need to compress the information of $G$. In this context it is not appropriate to view $G$ as an unlabeled graph; in addition to the structural information, it is also important to keep the “community” information, i.e., the compression also needs to encode who is a user and who is a device.
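The users/devices example corresponds to sampling from the planted bisection model; a minimal sampler is sketched below (our own illustration, with the assumed convention that the first $n/2$ vertices are users and the rest are devices).

import random

def sample_planted_bisection(n, p, q, seed=None):
    # Sample an edge list of G(n, p, q): within-part pairs appear with
    # probability p, cross pairs with probability q, all independently.
    rng = random.Random(seed)
    half = n // 2
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            same_part = (u < half) == (v < half)
            if rng.random() < (p if same_part else q):
                edges.append((u, v))
    return edges

edges = sample_planted_bisection(n=10, p=0.8, q=0.1, seed=0)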

Definition 2.4 (Partition-respecting isomorphism, Partitioned Unlabeled Graphs).

Let $r\leq n$ be positive integers. Suppose $V$ is a set of $n$ vertices and $\mathcal{P}=\{V_{1},V_{2},\dots,V_{r}\}$ is a partition of $V$ into $r$ parts. The partition-respecting isomorphism, denoted by “$\cong_{\mathcal{P}}$”, is defined as follows: for any two labeled graphs $G$ and $G^{\prime}$, we write $G\cong_{\mathcal{P}}G^{\prime}$ if and only if they are isomorphic via an isomorphism $\phi:V\to V$ such that $\phi(V_{i})=V_{i}$ for $1\leq i\leq r$. Then $\Gamma_{\mathcal{P}}$ is defined as the collection of $n$-vertex graphs on $V$ where we ignore the labels of vertices inside each $V_{i}$, $1\leq i\leq r$; namely, $\Gamma_{\mathcal{P}}$ consists of the equivalence classes under partition-respecting isomorphism with respect to $\mathcal{P}$.

Note that every labeled graph $G$ corresponds to a unique structure $S\in\Gamma_{\mathcal{P}}$, and we use $G\cong_{\mathcal{P}}S$ to denote this relation. Furthermore, under the above definition, general unlabeled graphs correspond to the case $r=1$.

Definition 2.5 (Partitioned Structural Entropy).

Let $V$ be a set of $n$ vertices, where $n\in\mathbb{N}$. Suppose $\mathcal{P}=\{V_{1},V_{2},\dots,V_{r}\}$ is a partition of $V$ into $r$ parts and $\mathcal{S}_{n}$ is a probability distribution over all partitioned unlabeled graphs on $n$ vertices. Then the structural entropy $H_{\mathcal{S}}$ associated with $\mathcal{S}_{n}$ is defined by

H_{\mathcal{S}}=\mathbb{E}[-\log P(S)]=-\sum_{S\in\mathcal{S}_{n}}P(S)\log P(S).

In this paper we extend Theorem 2.3 to the structural entropy of the Stochastic Block Model with any given number of blocks, and provide a compression algorithm that asymptotically matches this structural entropy. We first give the result for the balanced bipartition case $\mathcal{G}(n,p,q)$.

Theorem 2.6.

Let $V=V_{1}\cup V_{2}$ be a set of $n$ vertices with $|V_{1}|=|V_{2}|=n/2$. Suppose $\mathcal{G}(n,p,q)$ is a probability distribution of graphs on $V$ where every edge inside $V_{1}$ or inside $V_{2}$ is present with probability $p$, every edge between $V_{1}$ and $V_{2}$ is present with probability $q$, and these edges are mutually independent. For large even $n$ and all $p,q$ satisfying $n^{-1}\ln n\ll p,q$ and $1-p\gg n^{-1}\ln n$, the following holds:

  (i) The partitioned structural entropy $H_{\mathcal{S}}$ of $\mathcal{G}(n,p,q)$ is

     H_{\mathcal{S}}=2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q)-2\log\left(\frac{n}{2}\right)!+O\left(\frac{\log n}{n^{\alpha}}\right)   (1)

     for some $\alpha>0$.

  (ii) For a balanced bipartitioned structure $S$ and $\varepsilon>0$,

     P\left(\left|-\frac{1}{\binom{n}{2}}\log P(S)-\frac{n-2}{2n-2}h(p)-\frac{n}{2n-2}h(q)+\frac{2\log(n/2)!}{\binom{n}{2}}\right|<3\varepsilon\right)>1-4\varepsilon,

     where $h(p)=-p\log p-(1-p)\log(1-p)$ is the entropy rate of a binary memoryless source.

Note that the structural entropy $H_{\mathcal{S}}$ here is larger than that in Theorem 2.3 (even if $p=q$), which reflects the fact that the SBM with a planted (bi-)partition carries a pre-specified partition and thus has fewer symmetries than $\mathcal{G}(n,p)$, the pure random model. (For $\mathcal{G}(n,p)$, when it is asymmetric, Theorem 2.3 saves a term of $\log n!$ compared with completely labeled graphs; this saving becomes $2\log\left(n/2\right)!$ for the planted balanced bipartition case in Theorem 2.6.)
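The comparison in the remark above can be made quantitative. The sketch below (our own illustration, again dropping the error term) evaluates the leading terms of (1); with $p=q$ the first two terms collapse to $\binom{n}{2}h(p)$, so the gap to Theorem 2.3 is exactly $\log n!-2\log(n/2)!>0$.

from math import comb, lgamma, log, log2

def h(p):
    # Binary entropy in bits.
    return -p * log2(p) - (1 - p) * log2(1 - p)

def log2_factorial(m):
    return lgamma(m + 1) / log(2)

def sbm2_structural_entropy(n, p, q):
    # Leading terms of (1): 2 binom(n/2,2) h(p) + (n^2/4) h(q) - 2 log (n/2)!.
    return (2 * comb(n // 2, 2) * h(p) + (n * n / 4) * h(q)
            - 2 * log2_factorial(n // 2))

# With p = q the difference to Theorem 2.3 is log n! - 2 log (n/2)!,
# i.e. log2 binom(n, n/2), which is roughly n bits for n = 1000.
n, p = 1000, 0.1
gap = log2_factorial(n) - 2 * log2_factorial(n // 2)
print(sbm2_structural_entropy(n, p, p), gap)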

3 Proof of Theorem 2.6

One key ingredient in the proof of Theorem 2.3 in [8] is the following lemma on the symmetry of $\mathcal{G}(n,p)$. A graph is called asymmetric if its automorphism group does not contain any permutation other than the identity; otherwise it is called symmetric.

Lemma 3.1 (Kim, Sudakov and Vu, 2002).

For all $p$ satisfying $n^{-1}\ln n\ll p$ and $1-p\gg n^{-1}\ln n$, a random graph $G\in\mathcal{G}(n,p)$ is symmetric with probability $O(n^{-w})$ for any positive constant $w$.

Proof of Theorem 2.6.

Note that every pair of vertices inside $V_{1}$ or inside $V_{2}$ should be considered indistinguishable, but not the pairs of vertices in $V_{1}\times V_{2}$. Recall that we write $G\cong_{\mathcal{P}}S$ for a graph $G$ and a structure $S$ if $S$ represents the structure of $G$ (with respect to the partition $\mathcal{P}$).

Let $\mathcal{G}:=\mathcal{G}(n,p,q)$. We first compute $H_{\mathcal{G}}$. Note that there are $\binom{n}{2}$ possible edges in $G\in\mathcal{G}$, and we can view $G$ as a binary sequence of length $\binom{n}{2}$, where each digit is a Bernoulli random variable. Moreover, for edges inside $V_{1}$ or $V_{2}$, the random variable, denoted by $X_{1}$, has expectation $p$, and for edges in $V_{1}\times V_{2}$ the random variable, denoted by $X_{2}$, has expectation $q$. Thus we have

H_{\mathcal{G}} = -\mathbb{E}\left[\log\left(X_{1}^{2\binom{n/2}{2}}X_{2}^{n^{2}/4}\right)\right]
= -2\binom{n/2}{2}\mathbb{E}[\log X_{1}]-\frac{n^{2}}{4}\mathbb{E}[\log X_{2}]
= 2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q).

Now write $\mathcal{S}_{n}$ for the probability distribution on $V$ over all partitioned unlabeled graphs inherited from $\mathcal{G}$, namely, given $S\in\Gamma_{\mathcal{P}}$, $P(S)=\sum_{G\cong_{\mathcal{P}}S}P(G)$. Let $H_{\mathcal{S}}$ be the partitioned structural entropy of $\mathcal{S}_{n}$. Therefore, compared with our goal, it remains to show that

H_{\mathcal{S}}-H_{\mathcal{G}}=-2\log\left(n/2\right)!+O\left(\frac{\log n}{n^{\alpha}}\right).   (2)

Note that in $\mathcal{G}(n,p,q)$, all labeled graphs $G\in\mathcal{G}$ such that $G\cong_{\mathcal{P}}S$ have the same probability $P(G)$. Thus, given a (labeled) graph $G\in\mathcal{G}$, we have $P(G)=P(S)/N(S)$, where $S\in\mathcal{S}_{n}$ is such that $G\cong_{\mathcal{P}}S$. So the graph entropy of $\mathcal{G}=\mathcal{G}(n,p,q)$ can be written as

H_{\mathcal{G}} = -\sum_{G\in\mathcal{G}}P(G)\log P(G)
= -\sum_{S\in\mathcal{S}_{n}}\sum_{G\cong_{\mathcal{P}}S,\,G\in\mathcal{G}}P(G)\log P(G)
= -\sum_{S\in\mathcal{S}_{n}}\sum_{G\cong_{\mathcal{P}}S,\,G\in\mathcal{G}}\frac{P(S)}{N(S)}\log\frac{P(S)}{N(S)}
= -\sum_{S\in\mathcal{S}_{n}}P(S)\log\frac{P(S)}{N(S)}
= H_{\mathcal{S}}+\sum_{S\in\mathcal{S}_{n}}P(S)\log N(S).   (3)

Define $S[W]$ to be $S$ restricted to $W$ for $W\subseteq V$. Now we split $S$ into $S_{1}$ and $S_{2}$, i.e., $S_{1}=S[V_{1}]$ and $S_{2}=S[V_{2}]$. Write $Aut(S_{i})$ for the automorphism group of $S_{i}$; then we naturally have

N(S)=\frac{(n/2)!\cdot(n/2)!}{|Aut(S_{1})|\,|Aut(S_{2})|}.

Combining this with (2) and (3), it remains to show that

\sum_{S\in\mathcal{S}_{n}}P(S)\log\big(|Aut(S_{1})|\,|Aut(S_{2})|\big)=O\left(\frac{\log n}{n^{\alpha}}\right).

In the summation above we only need to consider $S$ such that either $S_{1}$ or $S_{2}$ is symmetric, as otherwise $\log|Aut(S_{1})||Aut(S_{2})|=\log 1=0$. By Lemma 3.1, the probability that $S$ restricted to $V_{1}$ or to $V_{2}$ is symmetric is $O(n^{-1-\alpha})$ for some $\alpha>0$, and for such $S$ we use the trivial bound $\log|Aut(S_{1})||Aut(S_{2})|\leq 2\log(n/2)!\leq 2n\log n$. This gives the desired estimate in (i):

\sum_{S\in\mathcal{S}_{n}}P(S)\log\big(|Aut(S_{1})|\,|Aut(S_{2})|\big)\leq 2n\log n\cdot O(n^{-1-\alpha})=O\left(\frac{\log n}{n^{\alpha}}\right).

To show (ii), for a set $V$ of $n$ vertices and a balanced bipartition $\mathcal{P}=(V_{1},V_{2})$ of $V$, we define the typical set $T_{\varepsilon}^{n}$ as the set of structures $S$ on $n$ vertices satisfying (a) $S$ is asymmetric on $V_{1}$ and on $V_{2}$, respectively; and (b) for $G\cong_{\mathcal{P}}S$,

2^{-2\binom{n/2}{2}h(p)-\frac{n^{2}}{4}h(q)-\binom{n}{2}\varepsilon}\leq P(G)\leq 2^{-2\binom{n/2}{2}h(p)-\frac{n^{2}}{4}h(q)+\binom{n}{2}\varepsilon}.

Denote by $T_{1}^{n}$ and $T_{2}^{n}$ the sets of structures satisfying properties (a) and (b), respectively, so that $T_{\varepsilon}^{n}=T_{1}^{n}\cap T_{2}^{n}$. Firstly, by the asymmetry of $\mathcal{G}(n,p)$ (Lemma 3.1), we conclude that $P(T_{1}^{n})>1-2\varepsilon$ for large $n$. Secondly, we use a binary sequence of length $\binom{n}{2}$ to represent a (labeled) instance $G$ of $\mathcal{G}(n,p,q)$, where the first $\binom{n/2}{2}$ bits $\mathcal{L}_{1}$ represent the induced subgraph on $V_{1}$, the next $\binom{n/2}{2}$ bits $\mathcal{L}_{2}$ represent the induced subgraph on $V_{2}$, and the remaining $n^{2}/4$ bits $\mathcal{L}_{12}$ represent the bipartite graph on $V_{1}\times V_{2}$. Since all edges of $G$ are generated independently, both $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$ have in expectation $\binom{n/2}{2}p$ ones, and the AEP property of binary sequences implies that

2^{-\binom{n/2}{2}h(p)-\binom{n}{2}\varepsilon}\leq P(G[V_{1}]),\,P(G[V_{2}])\leq 2^{-\binom{n/2}{2}h(p)+\binom{n}{2}\varepsilon}

holds with probability at least $1-2\varepsilon$. Similarly, $\mathcal{L}_{12}$ has in expectation $(n^{2}/4)q$ ones, and the AEP property of binary sequences gives that with probability at least $1-\varepsilon$,

2^{-\frac{n^{2}}{4}h(q)-\binom{n}{2}\varepsilon}\leq P(G[V_{1},V_{2}])\leq 2^{-\frac{n^{2}}{4}h(q)+\binom{n}{2}\varepsilon}.

Since these edges are independent, we conclude that (b) holds with probability at least $1-3\varepsilon$. Thus, $P(T_{\varepsilon}^{n})\geq 1-4\varepsilon$. Now we can compute $P(S)$ for $S\in T_{\varepsilon}^{n}$. By (a), $P(S)=(n/2)!\,(n/2)!\,P(G)$ for any $G\cong_{\mathcal{P}}S$. Together with (b) and a straightforward computation, the assertion of (ii) follows. ∎

4 SBM Compression Algorithm

Given the computation of the structural entropy, a natural next step is to design efficient compression schemes that come close to or even (asymptotically) achieve this entropy limit. Choi and Szpankowski [8] presented such an algorithm (which they named Szip) for (unlabeled) random graphs; it uses in expectation at most $\binom{n}{2}h(p)-n\log n+O(n)$ bits and asymptotically achieves the structural entropy given in Theorem 2.3. Roughly speaking, Szip greedily peels off vertices from the graph and (efficiently) stores the neighborhood information. This procedure can simply be reversed, but the labeling of the recovered graph may differ from that of the original graph, which is why a saving in codeword length is achieved. Refinements and analysis are also provided in [8] to achieve the stated performance.

Here we give an algorithm that compresses SBM structures, uses the Szip algorithm as a building block, and asymptotically matches the structural entropy computation in Theorem 2.6. The algorithm consists of two stages: it first compresses $S[V_{1}]$ and $S[V_{2}]$ using Szip, and then compresses $S[V_{1},V_{2}]$ using an arithmetic compression algorithm with the help of the Szip decoding outputs.

[Figure 1 shows the block diagram of the scheme: $S[V_{1}]$ and $S[V_{2}]$ are compressed by Szip encoders into $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$; a Szip decoder is run at the encoder to obtain $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, under whose labeling $S[V_{1},V_{2}]$ is compressed by a labeled encoder into $\mathcal{L}_{12}$ (the cascade encoder); the parallel decoder runs Szip decoders and the labeled decoder to recover $\hat{S}[V_{1}]=S^{\prime}[V_{1}]$, $\hat{S}[V_{2}]=S^{\prime}[V_{2}]$ and $\hat{S}[V_{1},V_{2}]=S^{\prime}[V_{1},V_{2}]$.]
Figure 1: Illustration of compression algorithm

To give a brief description of the compression algorithm, we again use the balanced bipartition $V_{1}\cup V_{2}$ as an example. The encoding and decoding procedures of the algorithm are illustrated in Figure 1. The algorithm encodes the observed structure $S$ of $\mathcal{G}(n,p,q)$ into a binary string as follows. It uses Szip as a subroutine to compress $S[V_{1}]$ and $S[V_{2}]$ into binary sequences $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$. Then, as part of the encoder, we run the Szip decoder on $\mathcal{L}_{1}$ and $\mathcal{L}_{2}$ to obtain decoded structures $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, respectively. We then compress $S[V_{1},V_{2}]$, viewed as a labeled bipartite graph under the vertex labeling of $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, into $\mathcal{L}_{12}$. This “Labeled Encoder” step can be done by treating the bipartite graph as a binary sequence of length $n^{2}/4$ and compressing it with a standard arithmetic encoder [27, 30, 38]. The concatenation of the Szip algorithms and the arithmetic encoder forms the cascade encoder of our algorithm and produces the codeword $(\mathcal{L}_{1},\mathcal{L}_{2},\mathcal{L}_{12})$. Upon receiving the codeword, we decode its parts in parallel using the Szip decoders and the arithmetic decoder. This completes our algorithm.

The main challenge in the design of our algorithm is how the decoder can maintain consistency between the bipartite graph $S[V_{1},V_{2}]$ and the decoded versions of $S[V_{1}]$ and $S[V_{2}]$. A key observation is that since Szip is a deterministic algorithm, although it may permute the vertex labelings, its output is invariant given the same input. Our solution is therefore to first run Szip (both encoding and decoding) at the encoder to obtain the structures $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, and then compress $S[V_{1},V_{2}]$ (as a labeled bipartite graph) under the vertex labeling of $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$. This guarantees that the decoded structures $\hat{S}[V_{1}]$, $\hat{S}[V_{2}]$ and $\hat{S}[V_{1},V_{2}]$ share the same vertex labeling as $S^{\prime}[V_{1}]$ and $S^{\prime}[V_{2}]$, namely, $S$ is recovered. A schematic sketch of this cascade is given below.
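In the sketch below, szip_encode, szip_decode, arith_encode, arith_decode and bipartite_bits are assumed interfaces (placeholders for the Szip routines of [8] and a standard arithmetic coder); only the wiring of Figure 1 is shown, not their implementations.

# Placeholders for the assumed building blocks; not implemented here.
def szip_encode(structure, p): raise NotImplementedError    # Szip encoder of [8]
def szip_decode(codeword, p): raise NotImplementedError     # Szip decoder of [8]
def arith_encode(bits, q): raise NotImplementedError        # arithmetic encoder
def arith_decode(codeword, q): raise NotImplementedError    # arithmetic decoder
def bipartite_bits(S12, S1_lab, S2_lab): raise NotImplementedError
    # n^2/4-bit string of S[V1,V2] written under the given labelings

def encode_sbm(S1, S2, S12, p, q):
    L1 = szip_encode(S1, p)          # compress S[V1]
    L2 = szip_encode(S2, p)          # compress S[V2]
    S1p = szip_decode(L1, p)         # re-decode at the encoder to learn the
    S2p = szip_decode(L2, p)         # labeling the decoder will end up with
    L12 = arith_encode(bipartite_bits(S12, S1p, S2p), q)
    return L1, L2, L12               # codeword (L1, L2, L12)

def decode_sbm(L1, L2, L12, p, q):
    S1_hat = szip_decode(L1, p)      # identical to S1p since Szip is deterministic
    S2_hat = szip_decode(L2, p)
    S12_bits = arith_decode(L12, q)  # already consistent with S1_hat, S2_hat
    return S1_hat, S2_hat, S12_bits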

Before discussing the performance of the algorithm, we first describe some useful properties of the arithmetic compression algorithm in the following lemma. We omit the proof of the lemma, which follows from the analysis in [27, 30, 38] and AEP properties in [39, 9].

Lemma 4.1.

Let $L$ be the codeword length of the arithmetic compression algorithm when compressing a binary sequence of length $m$ and entropy rate $h$. For large $m$, the following holds:

  (i) The expected codeword length asymptotically achieves the entropy of the message, i.e.,

     \mathbb{E}[L]=mh+O(\log m).   (4)
  (ii) For any $\epsilon>0$,

     P(|L-\mathbb{E}[L]|\leq\epsilon\log m)\geq 1-o(1).   (5)
  (iii) The arithmetic algorithm runs in time $O(m)$.

The following theorem characterizes the performance of our algorithm. It follows immediately from Theorem 2 in [8] (the performance of Szip) and Lemma 4.1; we omit the detailed proof here.

Theorem 4.2.

Let $V=V_{1}\cup V_{2}$ be a set of $n$ vertices with $|V_{1}|=|V_{2}|=n/2$. Given a partitioned unlabeled graph $S$ on $V$, let $L(S)$ be the codeword length given by our algorithm. For large $n$, our algorithm runs in time $O(n^{2})$ and satisfies the following:

  (i) The algorithm asymptotically achieves the structural entropy in (1) (note that $2\log(n/2)!=n\log n+O(n)$), i.e.,

     \mathbb{E}[L(S)]\leq 2\binom{n/2}{2}h(p)+\frac{n^{2}}{4}h(q)-n\log n+O(n).
  (ii) For any $\epsilon>0$,

     P(|L(S)-\mathbb{E}[L(S)]|\leq\epsilon n\log n)\geq 1-o(1).

5 General SBM with $r\geq 2$ blocks

In the previous sections, we discussed the structural entropy of the SBM and a compression algorithm that asymptotically achieves this structural entropy for the balanced bipartition case ($r=2$). The corresponding results, Theorem 2.6 and Theorem 4.2, can be easily generalized to the general $r$-partition case. We briefly describe the generalizations below.

5.1 Structural entropy

Our approach handles general SBMs similarly. In a general SBM with $r\geq 2$ parts, an $r\times r$ symmetric matrix $P$ describes the connection probabilities between and within the communities: two vertices $u\in V_{i}$ and $v\in V_{j}$ are connected by an edge with probability $P_{ij}$ ($i$ and $j$ are not necessarily distinct). To simplify the presentation, we only present the results in the special form where $P_{ij}=p$ if $i=j$ and $P_{ij}=q$ if $i\neq j$; we remark that similar results hold in the general case as well. We first give the result on the computation of the partitioned structural entropy of the SBM.

Theorem 5.1.

Fix $r$ reals $x_{1},x_{2},\dots,x_{r}$ in $(0,1)$ whose sum is 1. Let $V=V_{1}\cup V_{2}\cup\cdots\cup V_{r}$ be a set of $n$ vertices with a partition into $r$ parts such that $|V_{i}|=x_{i}n$. For large $n$ and all $p,q$ satisfying $n^{-1}\ln n\ll p,q$ and $1-p\gg n^{-1}\ln n$, the following holds:

  (i) The $r$-partitioned structural entropy $H_{\mathcal{S}}^{r}$ for a partitioned structure $\mathcal{S}$ on $V$ is

     H_{\mathcal{S}}^{r}=z(h(p)-h(q))+\binom{n}{2}h(q)-r\log\left(\frac{n}{r}\right)!+O\left(\frac{\log n}{n^{\alpha}}\right)   (6)

     for some $\alpha>0$, where $z:=\sum_{i=1}^{r}\binom{x_{i}n}{2}$.

  (ii) For a partitioned structure $S$ on $V$ and $\varepsilon>0$,

     P\left(\left|-\frac{1}{\binom{n}{2}}\log P(S)-\frac{z}{\binom{n}{2}}(h(p)-h(q))-h(q)+\frac{r\log(n/r)!}{\binom{n}{2}}\right|<3\varepsilon\right)>1-4\varepsilon.

5.2 Compression algorithm

The compression algorithm for general $r$ with vertex partition $\{V_{1},V_{2},\dots,V_{r}\}$ can be viewed as a union of the compression algorithms for the $S[V_{i}]$ and the $S[V_{i},V_{j}]$ ($1\leq i<j\leq r$). More precisely, the algorithm works as follows. It first compresses each $S[V_{i}]$ into $\mathcal{L}_{i}$ using Szip. It then runs the Szip decoder with input $\mathcal{L}_{i}$ to obtain the decoded structure $S^{\prime}[V_{i}]$. With the labelings of $S^{\prime}[V_{i}]$, $i=1,2,\dots,r$, we can compress $S[V_{1},V_{2},\dots,V_{r}]$ as a labeled $r$-partite graph into $\mathcal{L}$ using an arithmetic encoder. This completes the encoding procedure and gives the codewords $\mathcal{L}_{1},\dots,\mathcal{L}_{r},\mathcal{L}$, which we concatenate to obtain the final codeword. Decoding simply runs the Szip decoders and the labeled (arithmetic) decoders in parallel. The correctness of the decoding output can be argued as in the bipartition case.

The performance of the algorithm, analogous to Theorem 4.2, is as follows.

Theorem 5.2.

Fix $r$ reals $x_{1},x_{2},\dots,x_{r}$ in $(0,1)$ whose sum is 1. Let $V=V_{1}\cup V_{2}\cup\cdots\cup V_{r}$ be a set of $n$ vertices with a partition into $r$ parts such that $|V_{i}|=x_{i}n$. Given a partitioned unlabeled graph $S$ on $V$, let $L(S)$ be the codeword length given by our algorithm. For large $n$, our algorithm runs in time $O(n^{2})$ and satisfies the following:

  (i) The algorithm asymptotically achieves the structural entropy in (6), i.e.,

     \mathbb{E}[L(S)]\leq\sum_{i=1}^{r}\binom{x_{i}n}{2}(h(p)-h(q))+\binom{n}{2}h(q)-n\log n+O(n).
  (ii) For any $\epsilon>0$,

     P(|L(S)-\mathbb{E}[L(S)]|\leq\epsilon n\log n)\geq 1-o(1).

6 Conclusion

In this paper we defined the partitioned unlabeled graphs and partitioned structural entropy, which generalize the structural entropy for unlabeled graphs introduced by Choi and Szpankowski [8]. We then computed the partitioned structural entropy for Stochastic Block Models and gave a compression algorithm that asymptotically achieves this structural entropy limit. As mentioned earlier, we believe that in appropriate contexts the structural information of a graph or network can be interpreted as a kind of semantic information, in which case, the communication schemes may benefit from structural compressions which considerably reduce the cost.

References

  • [1] E. Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
  • [2] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Trans. Inf. Theory, 62(1):471–487, 2015.
  • [3] E. Abbe and C. Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, pages 670–688. IEEE, 2015.
  • [4] M. Adler and M. Mitzenmacher. Towards compressing web graphs. In Proceedings DCC 2001. Data Compression Conference, pages 203–212, 2001.
  • [5] N. Alon and A. Orlitsky. Source coding and graph entropies. IEEE Trans. Inf. Theory, 42(5):1329–1339, 1996.
  • [6] A. R. Asadi, E. Abbe, and S. Verdú. Compressing data on graphs with clusters. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 1583–1587, 2017.
  • [7] F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On compressing social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM KDD ’09, page 219–228, New York, NY, USA, 2009. Association for Computing Machinery.
  • [8] Y. Choi and W. Szpankowski. Compression of graphical structures: Fundamental limits, algorithms, and experiments. IEEE Trans. Inf. Theory, 58(2):620–638, 2012.
  • [9] T. M. Cover and J. A. Thomas. Elements of Information Theory 2nd Edition. Wiley-Interscience, USA, 2006.
  • [10] M. Dehmer and A. Mowshowitz. A history of graph entropy measures. Information Science, 181(1):57–78, Jan. 2011.
  • [11] B. Hajek, Y. Wu, and J. Xu. Information limits for recovering a hidden community. IEEE Trans. Inf. Theory, 63(8):4729–4745, 2017.
  • [12] F. Harary and E. M. Palmer. Graphical Enumeration. New York: Academic, 1973.
  • [13] P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social networks, 5(2):109–137, 1983.
  • [14] J. C. Kieffer, E.-H. Yang, and W. Szpankowski. Structural complexity of random binary trees. In Proceedings of 2009 IEEE International Symposium on Information Theory (ISIT), pages 635–639, Seoul, 2009.
  • [15] I. Kontoyiannis, Y. H. Lim, K. Papakonstantinopoulou, and W. Szpankowski. Symmetry and the entropy of small-world structures and graphs. In Proceedings of 2021 IEEE International Symposium on Information Theory (ISIT), pages 3026–3031, 2021.
  • [16] J. Körner. Coding of an information source having ambiguous alphabet and the entropy of graphs. In 6th Prague conference on information theory, pages 411–425, 1973.
  • [17] T. Łuczak, A. Magner, and W. Szpankowski. Asymmetry and structural information in preferential attachment graphs. Random Structures & Algorithms, 55:696–718, 03 2019.
  • [18] T. Łuczak, A. Magner, and W. Szpankowski. Compression of preferential attachment graphs. In Proceedings of 2019 IEEE International Symposium on Information Theory (ISIT), pages 1697–1701, 2019.
  • [19] E. Mossel, J. Neeman, and A. Sly. Consistency thresholds for the planted bisection model. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 69–75, 2015.
  • [20] A. Mowshowitz. Entropy and the complexity of graphs ii: the information content of digraphs and infinite graphs. Bulletin of Mathematical Biophysics 30, pages 225–240, 1968.
  • [21] A. Mowshowitz. Entropy and the complexity of graphs iii: graphs with prescribed information content. Bulletin of Mathematical Biophysics 30, pages 387–414, 1968.
  • [22] A. Mowshowitz. Entropy and the complexity of graphs iv: entropy measures and graphical structure. Bulletin of Mathematical Biophysics 30, pages 533–546, 1968.
  • [23] A. Mowshowitz. Entropy and the complexity of the graphs i: an index of the relative complexity of a graph. Bulletin of Mathematical Biophysics 30, pages 175–204, 1968.
  • [24] A. Mowshowitz and M. Dehmer. Entropy and the complexity of graphs revisited. Entropy, 14(3):559–570, 2012.
  • [25] M. Naor. Succinct representation of general unlabeled graphs. Discr. Appl. Math., 28(3):303–307, 1990.
  • [26] G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, 2005.
  • [27] R. C. Pasco. Source coding algorithms for fast data compression. Ph.D. dissertation, Stanford Univ., May 1976.
  • [28] L. Peshkin. Structure induction by lossless graph compression. In Proceedings of 2007 Data Compression Conference (DCC’07), pages 53–62, 2007.
  • [29] N. Rashevsky. Life information theory and topology. Bulletin of Mathematical Biophysics 17, pages 229–235, 1955.
  • [30] J. J. Rissanen. Generalized Kraft inequality and arithmetic coding. IBM Journal of Research and Development, 20(3):198–203, May 1976.
  • [31] M. Sauerhoff. On the entropy of models for the web graph, 2016.
  • [32] S. A. Savari. Compression of words over a partially commutative alphabet. IEEE Trans. Inf. Theory, 50(7):1425–1441, 2004.
  • [33] C. E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 1948.
  • [34] C. E. Shannon. The lattice theory of information. Transactions of the IRE Professional Group on Information Theory, 1(1):105–107, 1953.
  • [35] J. Sun, E. Bollt, and D. Ben-Avraham. Graph compression - save information by exploiting redundancy. J. Statist. Mechan.: Theory Exper., pages 06001–06001, 2008.
  • [36] E. Trucco. A note on the information content of graphs. Bulletin of Mathematical Biophysics 18, pages 129–135, 1956.
  • [37] G. Turán. On the succinct representation of graphs. Discr. Appl. Math., 8(3):289–294, 1984.
  • [38] F. Willems, Y. Shtarkov, and T. Tjalkens. The context-tree weighting method: basic properties. IEEE Trans. Inf. Theory, 41(3):653–664, 1995.
  • [39] R. W. Yeung. Information Theory and Network Coding. Springer, 2008.
  • [40] H. Zenil, N. A. Kiani, and J. Tegnér. A review of graph and network complexity from an algorithmic information perspective. Entropy, 20(8), 2018.