
Differentially Describing Groups of Graphs

Corinna Coupette,* Sebastian Dalleiger,* Jilles Vreeken (*equal contribution)
Abstract

How does neural connectivity in autistic children differ from neural connectivity in healthy children or autistic youths? What patterns in global trade networks are shared across classes of goods, and how do these patterns change over time? Answering questions like these requires us to differentially describe groups of graphs: Given a set of graphs and a partition of these graphs into groups, discover what graphs in one group have in common, how they systematically differ from graphs in other groups, and how multiple groups of graphs are related. We refer to this task as graph group analysis, which seeks to describe similarities and differences between graph groups by means of statistically significant subgraphs. To perform graph group analysis, we introduce Gragra, which uses maximum entropy modeling to identify a non-redundant set of subgraphs with statistically significant associations to one or more graph groups. Through an extensive set of experiments on a wide range of synthetic and real-world graph groups, we confirm that Gragra works well in practice.

Introduction

Differentially describing groups of graphs lies at the heart of many scientific and societal challenges. Neuroscientists, for example, might want to characterize brain activity in healthy subjects, elucidate how it differs from brain activity in subjects diagnosed with certain disorders or diseases (e.g., autism or Alzheimer’s), and investigate whether their findings are the same across different groups of subjects (e.g., children, adolescents, and adults; or men and women). Policymakers, security experts, and epidemiologists alike could seek to understand patterns of human mobility, be it to improve the resilience of traffic infrastructure to random failures and targeted attacks, or to curb the spread of infectious diseases. And international economists might want to investigate patterns of world trade, e.g., imports and exports between countries, and ask how these vary across different years and product classes.

We refer to the common task underlying these scenarios as graph group analysis: Given a set of graphs and a partition of this set into graph groups, succinctly summarize the commonalities and differences between graphs in the same group, between graphs in different groups, and between the relationships connecting the groups. In this paper, we formalize graph group analysis as a maximum likelihood modeling problem, using significant subgraphs as graph patterns to factorize our probability distribution. We introduce Gragra (Graph group analysis) as an algorithm to solve this problem, which jointly discovers a set of graph patterns and an assignment of these patterns to graph groups.

Figure 1: Gragra discovers common and contrastive graph patterns in noisy, heterogeneous groups of graphs, capturing, e.g., systematic similarities (left) and differences (right) between the functional brain networks of adolescents with and without autism spectrum disorder. Here, nodes represent centers of mass for brain regions from the AAL Atlas, and edge color classes correspond to significant subgraphs shared between (left) or specific to (right) groups, with individual edges signaling strong connectivity between regions.

As a real-world example of graph group analysis, consider Fig. 1. Here, we show the top shared (left) and specific (right) patterns identified in resting-state functional brain networks of adolescents with and without autism spectrum disorder, where nodes in the graphs correspond to brain regions, and edges signal strong connectivity between regions. On the right, patterns with red edges are characteristic of autistic adolescents, and patterns with blue edges are characteristic of non-autistic adolescents. They indicate over- and underconnectivity, respectively, in the brains of autistic adolescents when compared to typically developed controls. Although there is no consensus regarding the relationships between autism and neural connectivity (Hull et al. 2017), our method identifies graph patterns that permit neuroscientific interpretation: For example, the dark blue pattern in Fig. 1 indicates underconnectivity between the visual cortex, responsible for processing visual information, and the lingual gyrus, involved in vision and word processing.

Graph group analysis is related to graph classification (e.g., Lee, Rossi, and Kong 2018), but we are interested not only in what is different but also in what is similar among our graph groups. Our task further shares some of its motivation with significant subgraph mining (e.g., Sugiyama et al. 2015), graph summarization (e.g., Liu et al. 2018), and data clustering with graphs as data points (e.g., Mukherjee, Sarkar, and Lin 2017). However, we focus on a complete characterization of a set of graphs under a given partition—a cornerstone of scientific discovery involving graph data.

The remainder of the paper is structured as follows. After settling our basic notation, we describe the theoretical foundations of our method and introduce our algorithm. Having covered related work, we demonstrate through experiments on synthetic and real-world data that Gragra works well in practice, before rounding up with discussion and conclusions. All our data, code, and results are publicly available at https://doi.org/10.5281/zenodo.6342823.

Preliminaries

We consider a set $\mathcal{G}=\{G_1,\dots,G_{|\mathcal{G}|}\}$ of $|\mathcal{G}|$ node-aligned graphs $G_i=(V,E_i)$ with $n=|V|$ nodes and $m_i=|E_i|$ edges, omitting the subscripts when clear from context. A partition $\Pi=\{\mathcal{G}_1,\dots,\mathcal{G}_k\}$ is a set of $k$ non-empty subsets $\mathcal{G}_i\subseteq\mathcal{G}$, called graph groups, of cardinalities $c_i=|\mathcal{G}_i|$, whose disjoint union is $\mathcal{G}$. Our graphs can be undirected or directed, loopy or non-loopy, and unweighted, edge-labeled, or integer-weighted, where for the purposes of our model, we treat distinct edge labels or edge weights as a set $W$ of categories, and edges $e\in E_i$ are drawn from the set $\mathcal{E}=V\times V\times W$ of all possible weighted edges.

The empirical frequency of edge set $X\subseteq\mathcal{E}$ in group $\mathcal{G}_i$ is $q_i(X)=\left\lvert\{(V,E)\in\mathcal{G}_i\mid X\subseteq E\}\right\rvert/c_i$, and we denote by $V_X$ the set of nodes incident with at least one edge in $X$.
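As a concrete illustration, both quantities can be computed directly from edge sets; in the sketch below (helper names are ours), each graph is a set of $(u,v,w)$ tuples encoding source, target, and weight category:

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int, int]  # (source, target, weight category)

def empirical_frequency(pattern: FrozenSet[Edge],
                        group: List[FrozenSet[Edge]]) -> float:
    """q_i(X): fraction of graphs in the group containing every edge of X."""
    return sum(pattern <= edges for edges in group) / len(group)

def incident_nodes(pattern: FrozenSet[Edge]) -> FrozenSet[int]:
    """V_X: nodes touched by at least one edge of the pattern."""
    return frozenset(n for (u, v, _) in pattern for n in (u, v))
```

For a group of three graphs of which two contain a single-edge pattern, `empirical_frequency` returns 2/3.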

We base our probabilistic model on the maximum entropy principle, by which the distribution that best reflects a given set of constraints without introducing additional assumptions is the distribution with maximum Shannon entropy (Jaynes 1982). Thus, the expected frequency of $X$ in $\mathcal{G}_i$ under a given set of edge sets $S\subseteq 2^{\mathcal{E}}$ is

$$p_i(X\mid S)=\mathbb{E}_f[X]=\sum_{Y\in 2^{\mathcal{E}},\;X\subseteq Y}f(Y\mid S)\;,$$

where $2^{\mathcal{E}}$ is the power set of $\mathcal{E}$, and $f$ is the distribution maximizing the Shannon entropy $-\sum_X f_X\log f_X$, subject to the linear constraints $\mathbb{E}_f[X]=q_i(X)$ for all elements in $S$ (Csiszár 1975). That is,

$$f(X\mid S)=\theta_0\prod_{Y_i\in S,\;Y_i\subseteq X}\theta_i\;,$$

where $\theta_0$ and all $\theta_i$ are real-valued model parameters. Finding the distribution $f$ is a convex problem that involves computing the expected frequency $p_i(X\mid S)$ over exponentially many elements. This is intractable if done naïvely, but there exist practical approaches that factorize $p_i$ into a product of independent distributions (Mampaey, Vreeken, and Tatti 2012; Dalleiger and Vreeken 2020a, b).
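To make the maximum entropy machinery tangible, the following toy sketch fits $f$ by cyclic iterative scaling over an explicit power set. It is exponential in the number of edges and purely illustrative (the function name and update schedule are our own); practical implementations factorize $p_i$ as cited above:

```python
import itertools

def fit_maxent(edges, constraints, sweeps=500):
    """Fit f(Y) proportional to the product of theta_X over X contained in Y,
    subject to E_f[X] = q(X) for each (X, q) in `constraints`, via cyclic
    iterative scaling. Toy version: enumerates the full power set."""
    universe = [frozenset(s) for r in range(len(edges) + 1)
                for s in itertools.combinations(edges, r)]
    theta = {X: 1.0 for X, _ in constraints}

    def weight(Y):  # unnormalized probability of edge set Y
        w = 1.0
        for X in theta:
            if X <= Y:
                w *= theta[X]
        return w

    for _ in range(sweeps):
        for X, q in constraints:
            Z = sum(weight(Y) for Y in universe)
            p = sum(weight(Y) for Y in universe if X <= Y) / Z
            if 0 < p < 1 and 0 < q < 1:
                # multiplicative update pulls E_f[X] onto q(X)
                theta[X] *= q * (1 - p) / (p * (1 - q))

    Z = sum(weight(Y) for Y in universe)
    return {X: sum(weight(Y) for Y in universe if X <= Y) / Z for X in theta}
```

With constraints $q(\{a\})=0.5$ and $q(\{a,b\})=0.25$ over three edges, the fitted expected frequencies match the constraints to numerical precision.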

Theory

We now lay the theoretical foundations of our method, introducing our probabilistic model, our objective function, and our statistical test. At a high level, our goal in graph group analysis is to discover a set $S$ of graph patterns, i.e., edge sets of connected subgraphs, and an association matrix $A$ assigning graph patterns to graph groups, such that $S$ and $A$ together reveal the similarities and differences between graphs in the same group, between graphs in different groups, and between the relationships connecting the groups. A pattern is specific if we assign it to only one graph group, and it is shared if we assign it to several graph groups. We choose which patterns to include in our model based on the information we gain from them, testing whether this gain is statistically significant to rule out spurious results.

To avoid redundancy, we assign a pattern $X$ to a group $\mathcal{G}_i$ iff $X$ is informative for $\mathcal{G}_i$, given what we already know about all groups. More precisely, using $X$ as a column index of $A$ in a slight abuse of notation, we set $A_{iX}=1$ iff $X$ is informative for $\mathcal{G}_i$ under our current model $(S,A)$. We assess this by comparing the empirical frequency of $X$ in group $\mathcal{G}_i$, $q_i(X)$, to its expected frequency in that group under our current model, $p_i(X\mid S_i)$, where $S_i=\{X\in S\mid A_{iX}=1\}$, and $p_i$ is obtained from a practical approximation of the maximum entropy distribution with constraint set $S_i$. $X$ is informative for $\mathcal{G}_i$ iff $q_i(X)$ is significantly different from $p_i(X\mid S_i)$, as judged by a statistical test, and we add $X$ to $S$ (and column $X$ to $A$) if $X$ is informative for some $\mathcal{G}_i\in\Pi$.

To identify a suitable set of graph patterns $S$ and an adequate association matrix $A$, we exploit the interplay between two steps. First, we discover the best pattern $X$ to add to $S$, given the current $(S,A)$, and second, we identify the best assignment of $X$ to graph groups to update $A$, given the current $(S,A)$ and a new pattern $X$. We now describe each step in more detail.

Identifying Informative Graph Patterns

To measure the likelihood of a set $S\subseteq 2^{\mathcal{E}}$ of graph patterns, we use the Bayesian Information Criterion (BIC),

$$\operatorname{BIC}(S)=\ell(S)+\frac{k\cdot|S|}{2}\log|\mathcal{G}|\quad\text{(Schwarz 1978)},$$

where $k\cdot|S|$ is the number of coefficients in our model, and

$$\ell(S)=\sum_i\ell_i(S)=-\sum_i\sum_{G\in\mathcal{G}_i}\log p_i(G\mid S_i)\,,$$

is the negative log-likelihood of $S$ (with $S_i\subseteq S$ derived from $A$), assuming that the graphs in a group are independent and identically distributed. This allows us to identify a good set of graph patterns by minimizing the BIC score, i.e.,

$$\operatorname*{arg\,min}_{S\subseteq 2^{\mathcal{E}}}\ \operatorname{BIC}(S)\;.$$

Solving this problem exactly poses significant challenges in practice due to its combinatorial nature and the explosion in the number of solution candidates. Therefore, we employ a greedy search strategy, iteratively selecting the graph pattern $X\subseteq\mathcal{E}$ that best improves our current model. That is, for a given $(S,A)$, we select the graph pattern $X$ that maximizes our likelihood, or equivalently, maximizes the difference $\operatorname{BIC}(S)-\operatorname{BIC}(S\cup\{X\})$, which we write as

$$\Delta(X)=\ell(S)-\ell(S\cup\{X\})-\frac{k}{2}\log|\mathcal{G}|\;.$$

In a nutshell, the core of our approach is the procedure

$$S\leftarrow S\cup\Big\{\operatorname*{arg\,max}_{X\subseteq\mathcal{E},\;\Delta(X)>0}\Delta(X)\Big\}\;,\quad(1)$$

by which we iteratively and greedily insert into $S$ the pattern $X\subseteq\mathcal{E}$ that locally maximizes our information gain.
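In code, the BIC score and the information gain $\Delta(X)$ are straightforward to express once the negative log-likelihoods are available; the sketch below (helper names are ours) takes those likelihood values as given:

```python
import math

def bic_score(neg_log_likelihood: float, num_patterns: int,
              num_groups: int, num_graphs: int) -> float:
    """BIC(S) = l(S) + (k * |S|) / 2 * log|G|, with l(S) the negative
    log-likelihood summed over all graphs in all groups."""
    return (neg_log_likelihood
            + (num_groups * num_patterns) / 2 * math.log(num_graphs))

def information_gain(ell_s: float, ell_sx: float,
                     num_groups: int, num_graphs: int) -> float:
    """Delta(X) = l(S) - l(S u {X}) - k/2 * log|G|: the BIC improvement
    obtained by adding one pattern to the model."""
    return ell_s - ell_sx - num_groups / 2 * math.log(num_graphs)
```

By construction, `information_gain` equals the difference of the two BIC scores when the pattern count grows by one.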

Using a model selection criterion alone, however, we cannot tell if our information gain is due to random fluctuations or due to signal, especially if we only have a limited number of samples. Thus, to avoid modeling noise, we add $X$ to $S$ only if its information gain $\Delta(X)$ is statistically significant. Therefore, we test whether we can reject the null hypothesis

$$H_0:\;\operatorname{BIC}(S)=\operatorname{BIC}(S\cup\{X\})\;.$$

To this end, we use Vuong’s closeness test (Vuong 1989), a likelihood-ratio test designed for model selection problems under BIC. Vuong’s test statistic is $2\Delta(X)$, which is asymptotically $\chi^2$-distributed with $\operatorname{df}_\Delta(X)=\operatorname{df}p_i(\,\cdot\mid S\cup\{X\})-\operatorname{df}p_i(\,\cdot\mid S)$ degrees of freedom. To calculate $\operatorname{df}_\Delta(X)$, we count the coefficients $\theta$ that must be changed in every distribution if we insert $X$ into $S$. As we add one coefficient for $X$, and update at least $|X|$ edge coefficients per group, we arrive at $|X|+1$ additional degrees of freedom.
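The significance decision then amounts to comparing the $\chi^2$ survival function at $2\Delta(X)$ against the chosen level. A minimal stdlib-only sketch (our own simplification: it uses the closed-form survival function for even degrees of freedom and conservatively rounds an odd $|X|+1$ up, which can only make acceptance harder):

```python
import math

def chi2_sf_even(x: float, df: int) -> float:
    """Chi-squared survival function for even df:
    P(X >= x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    assert df > 0 and df % 2 == 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def is_significant(delta: float, pattern_size: int,
                   alpha: float = 1e-7) -> bool:
    """Accept a pattern if Vuong's statistic 2*Delta(X) is significant
    with |X|+1 degrees of freedom (rounded up to even, conservatively)."""
    df = pattern_size + 1
    if df % 2:
        df += 1
    return delta > 0 and chi2_sf_even(2 * delta, df) < alpha
```

As a sanity check, the survival function at the familiar 5% critical values (5.991 for 2 df, 9.488 for 4 df) is approximately 0.05.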

Discovering Differential Pattern Associations

Once we have selected a new pattern $X\subseteq\mathcal{E}$ to add to $S$, given the current $S$ and $A$, we identify a good assignment of $X$ to graph groups $\mathcal{G}_i\in\Pi$ to update $A$. Here, the significance of $\Delta(X)$, which is used to accept $X$ into $S$, signals that $X$ is informative for some $\mathcal{G}_i\in\Pi$, but it does not tell us for which $\mathcal{G}_i$. To assign $X$ to a graph group $\mathcal{G}_i$, we hence rely on the partial information gain of $X$ for $\mathcal{G}_i$,

$$\Delta_i(X)=\ell_i(S_i)-\ell_i(S_i\cup\{X\})-\frac{k}{2}\log|\mathcal{G}|\;.$$

Again, we use Vuong’s closeness test to decide whether $\Delta_i(X)$ is significant; and if it is, we set $A_{iX}=1$.

Algorithm

Having stated its theoretical foundations, we now introduce Gragra as an algorithm to differentially describe groups of graphs using sets of significant subgraphs. Gragra, whose pseudocode is given as Alg. 1, revolves around the procedure stated in Eq. (1), a greedy process that iteratively selects the graph pattern candidate that best enhances our model. A naïve implementation would require myriad searches through the exponentially sized space of all possible graphs with nodes from $V$, which is not only computationally infeasible in most cases but also unnecessary, as most candidates will be eliminated as uninformative by Vuong’s test. Hence, rather than exhaustively searching for the best graph patterns, we propose to grow graphs by systematically adding edges to candidates.

Input: groups of graphs $\mathcal{G}_1,\dots,\mathcal{G}_k$
Output: set of graph patterns $S$, association matrix $A$

1   $S\leftarrow\mathcal{E}$
2   $A\leftarrow$ empty binary matrix with $k$ rows and 0 columns
3   $C\leftarrow\{\{x,y\}\mid x,y\in\mathcal{E},\ x\neq y,\ V_{\{x\}}\cap V_{\{y\}}\neq\emptyset\}$
4   while $C\neq\emptyset$:
5       $\hat{X},C\leftarrow\textsc{Grow}(C)$
6       if $\exists i\in[k]$ s.t. $h_i(\hat{X})$ is significant:
7           resize $A$
8           $A_{i\hat{X}}=1\iff h_i(\hat{X})$ is significant, for all $i\in[k]$
9           $S\leftarrow S\cup\{\hat{X}\}$
10          estimate $p_i(\,\cdot\mid S_i)$ for all $i\in[k]$ with $A_{i\hat{X}}=1$
11  return $S\setminus\mathcal{E}$, $A$

Fn. $\textsc{Grow}(C)$:
12      $X\leftarrow\operatorname*{arg\,max}_{X\in C}\{h(X)\ \text{s.t.}\ h(X)\ \text{is significant}\}$
13      $C\leftarrow C\cup\big(((V_X\times V\times W)\cup(V\times V_X\times W))\setminus X\big)$
14      $C\leftarrow\{X\in C\mid h(X)\ \text{is significant}\}$
15      $\hat{X}\leftarrow\operatorname*{arg\,max}_{X\in C}\{h(X)\}$
16      if $h(\hat{X})>h(X)$: return $\textsc{Grow}(C)$
17      else: return $\hat{X}$, $C\setminus\{\hat{X}\}$

Algorithm 1: Gragra

To enable our model to infer all possible graphs, we initialize it with the set $\mathcal{E}$ of all possible edges. As our initial graph to grow, we then select the most promising graph pattern from our initial candidates, i.e., the connected triples

$$C=\{\{x,y\}\mid x,y\in\mathcal{E},\ x\neq y,\ V_{\{x\}}\cap V_{\{y\}}\neq\emptyset\}\;.$$

Starting with a graph pattern $X$, we explore all its expansions, $((V_X\times V\times W)\cup(V\times V_X\times W))\setminus X$, from which we select the best candidate pattern to grow further, as long as we gain information and $\Delta(X)$ is significant. We summarize these steps in the function Grow of Alg. 1.
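The candidate seeding and expansion steps can be sketched as follows; for simplicity, this version (function names ours) expands only over edges actually observed in the data, whereas Alg. 1 expands over all of $\mathcal{E}$:

```python
from typing import FrozenSet, Set, Tuple

Edge = Tuple[int, int, int]  # (source, target, weight category)

def initial_candidates(edges: Set[Edge]) -> Set[FrozenSet[Edge]]:
    """Connected edge pairs {x, y}: the seed candidates C of Alg. 1."""
    def nodes(e: Edge) -> Set[int]:
        return {e[0], e[1]}
    return {frozenset({x, y}) for x in edges for y in edges
            if x != y and nodes(x) & nodes(y)}

def expansions(pattern: FrozenSet[Edge],
               edges: Set[Edge]) -> Set[FrozenSet[Edge]]:
    """One-edge extensions of a pattern that keep it connected,
    restricted here to the observed edge set."""
    touched = {n for (u, v, _) in pattern for n in (u, v)}
    return {pattern | {e} for e in edges
            if e not in pattern and (e[0] in touched or e[1] in touched)}
```

On a toy edge set, only edge pairs sharing a node become candidates, and a pattern only grows along edges touching its incident nodes.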

Grow requires many inferences of $\Delta$, which involve inferring many more expected frequencies $p_i$, rendering exact computation impractical. We thus design a practical, pessimistic heuristic that only considers the information gain from graphs $G\in\mathcal{G}$ in which $X$ is fully present:

$$h(X)=\sum_i c_i\cdot q_i(X)\log\frac{q_i(X)}{p_i(X)}-\frac{k}{2}\log|\mathcal{G}|\;.\quad(2)$$

We use $h(X)$ instead of $\Delta(X)$ because it involves inferring only one expected frequency per graph group. A derivation of this heuristic can be found in the Online Appendix.
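Eq. (2) translates directly into code; the sketch below (the function name is ours) takes the per-group empirical and expected frequencies of a pattern as inputs:

```python
import math

def heuristic_gain(q, p, group_sizes, num_graphs):
    """h(X) = sum_i c_i * q_i(X) * log(q_i(X) / p_i(X)) - k/2 * log|G|.
    q, p: per-group empirical and expected pattern frequencies;
    group_sizes: the cardinalities c_i; num_graphs: |G|."""
    k = len(group_sizes)
    gain = sum(c * qi * math.log(qi / pi)
               for c, qi, pi in zip(group_sizes, q, p) if qi > 0)
    return gain - k / 2 * math.log(num_graphs)
```

When empirical and expected frequencies coincide, the gain reduces to the (negative) model-cost term, so such a pattern is never accepted.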

To summarize, Gragra proceeds as follows. Starting with an initial set of candidates $C$, we select and grow the best candidate, retaining all significant expansions, until we have grown $X$ to its fullest potential. Afterwards, we test if the information gain provided by $X$ is significant, and if so, we keep track of its graph group associations and insert $X$ into $S$.

The computational complexity of Gragra depends on the number of candidates, which can grow to at most $|2^{\mathcal{E}}|$. In practice, Gragra’s complexity depends on the number of times we grow graph patterns, which is data-dependent and bounded by the size $\gamma$ of the largest connected component observed in an input graph, as growing beyond that reduces the information gain. Multiplying $\gamma$ by the size of the initial candidate set, Gragra achieves a complexity of $\mathcal{O}\big(\binom{n}{3}|W|\gamma\big)$ for all practical purposes, where we assume that the complexity of inferring an expected frequency is bounded.

Related Work

To the best of our knowledge, we are the first to differentially describe groups of graphs through sets of significant subgraphs. Our method is inspired by advances in graph similarity description (Momo, Coupette and Vreeken 2021) and explainable pattern set mining using maximum-entropy modeling (DISC, Dalleiger and Vreeken 2020a, b). However, Momo focuses on pairs and unpartitioned sets of graphs; DISC is designed for itemset data, ignores graph structure, and does not scale on graphs; and neither method uses a statistical test to select patterns. Further related work broadly falls into two categories: statistical inference on network populations, and graph mining for groups of graphs.

Statistical Inference on Network Populations. In the statistics literature, the task of analyzing multiple graphs simultaneously is typically framed as an inference problem for network-valued random variables (Durante, Dunson, and Vogelstein 2017; Lovato et al. 2020; Lunagómez, Olhede, and Wolfe 2020). Here, Ghoshdastidar et al. (2020) establish limits for distinguishing two population distributions given small sample sizes, and Lunagómez, Olhede, and Wolfe (2020) propose notions of mean and dispersion for a single population of networks, where the population mean is itself a network. Maugis et al. (2020) use subgraph counts to test if all graphs in a sample are drawn from the same distribution, and Signorelli and Wit (2020) propose a model-based clustering approach to describe subpopulations within a population of networks. Finally, Durante, Dunson, and Vogelstein (2017) extend latent space approaches designed for single graphs to capture the probabilistic mechanism that generates multiple graphs from a single population distribution. Their model has been used to characterize and test for differences between groups of brain networks (Durante, Dunson et al. 2018)—an actively studied application for which numerous statistical methods, mostly focusing on testing for differences, have been developed (Ginestet et al. 2017; Wang et al. 2019; Lukemire et al. 2020; Kundu et al. 2021; Lovato et al. 2021; Lehmann et al. 2021).

Prior work in the statistics literature has focused on describing one network population or distinguishing two populations. In contrast, with Gragra, we aim to construct a differential description of any number of populations. Furthermore, we ask not only if these populations are different, but also how they are different and how they are similar.

Graph Mining for Groups of Graphs. In the graph mining literature, groups of graphs have been studied in contexts as diverse as significant subgraph mining (Llinares-López et al. 2015; Sugiyama et al. 2015), graph classification (Vogelstein et al. 2013; Yan et al. 2019; Lanciano, Bonchi, and Gionis 2020), graph clustering with graphs as data points (Mukherjee, Sarkar, and Lin 2017), anomalous graph detection (Gomes, Rao, and Neville 2018), and graph summarization for time series of graphs (Shah et al. 2015). Significant subgraph mining commonly considers small, node-labeled graphs with unaligned node sets, and hence, does not target our problem. However, our setup—i.e., medium-sized graphs with aligned node sets—has received heightened attention in the graph classification community, again inspired by challenges from neuroscience (Vogelstein et al. 2013; Yan et al. 2019; Lanciano, Bonchi, and Gionis 2020).

The methods that are closest to our work are contrast subgraphs (Lanciano, Bonchi, and Gionis 2020) and signal subgraphs (Vogelstein et al. 2013), both designed for two groups of node-aligned graphs. Contrast subgraphs discover the densest subgraph in the difference of the summary graphs of the input groups (obtained by adding the graphs in each group separately and then subtracting the results), where the size of this subgraph depends on a user-specified regularization parameter $\alpha$. Signal subgraphs assume edge independence as a prior to rank edges by the $p$-values of an edge-wise statistical test for distributional difference (e.g., Fisher’s exact test). Like signal subgraphs, Gragra combines ideas from structural and statistical pattern mining to produce interpretable results that—unlike contrast subgraphs—are based on a statistical foundation. Gragra is more exploratory and more flexible than both competitors, however, because it treats graph group description as an end in itself and can handle any number of graph groups.

Experiments

We now present an extensive evaluation of our algorithm. To this end, we implement Gragra in C++ and expose a Python interface to facilitate experimentation. We run our experiments on Intel E5-2643 CPUs with 128 or 256 GB RAM, testing at a conservative significance level of $1\times 10^{-7}$ (or $1\times 10^{-5}$ when operating with fewer than 50 samples), and make all data, code, and results publicly available at https://doi.org/10.5281/zenodo.6342823. Our experiments revolve around two questions:

1. Can Gragra reliably recover the ground truth from groups of synthetic graphs?
2. Does Gragra discover meaningful patterns in groups of real graphs?

Recovering Ground Truth from Synthetic Graphs

To assess the reliability of Gragra, we run it on groups of synthetic graphs with planted patterns. We consider three scenarios, namely,

1. summarizing one group of graphs,
2. differentially describing two groups of graphs, and
3. differentially describing four groups of graphs.

In all three scenarios, each graph group consists of 100 graphs with 100 nodes, and our configurations differ in their planted patterns (type, prevalence, and position) and noise levels. A detailed overview of our synthetic data configurations can be found in the Online Appendix.

For each scenario, we report the distribution of precision, recall, and F1 score for the edges of the planted patterns, computed separately for each group of graphs, across 100 graph group datasets sampled with different seeds. In all scenarios, we compare Gragra, which uses BIC with Vuong’s closeness test for pattern selection, with a variant using only BIC and no statistical test to select patterns ($\textsc{Gragra}_{\text{BIC}}$). For configurations in the second scenario, we also compare our results with those from contrast subgraphs (CSG) and signal subgraphs (SSG), described in the previous section.
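The evaluation metrics are standard; for concreteness, this is how precision, recall, and F1 would be computed over recovered versus planted pattern edges (a generic sketch, not the paper's evaluation code):

```python
def edge_recovery_scores(recovered, planted):
    """Precision, recall, and F1 of recovered pattern edges against the
    planted ground-truth edges (both given as sets of edges)."""
    tp = len(recovered & planted)
    precision = tp / len(recovered) if recovered else 0.0
    recall = tp / len(planted) if planted else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, recovering two of three planted edges plus one spurious edge yields precision, recall, and F1 of 2/3 each.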

As shown in Fig. 2, $\textsc{Gragra}_{\text{BIC}}$ delivers good results in the four-group scenario but generally has poor precision, treating noise as signal. CSG and SSG identify only contrastive patterns, and fail even for contrastive patterns if the individual edges in planted patterns have similar occurrence probabilities across groups. Gragra, however, reliably recovers the ground truth across scenarios and configurations, which allows us to hope that it will also work in practice.

Figure 2: Gragra reliably recovers the ground truth from synthetic data. We show precision, recall, and F1 score distributions for Gragra, $\textsc{Gragra}_{\text{BIC}}$, contrast subgraphs (CSG), and signal subgraphs (SSG), separately for all experiments in our three different settings: one-group setting (left), two-group setting (middle), and four-group setting (right). Subscripts of CSG labels correspond to different choices of their regularization parameter $\alpha$, and subscripts of SSG labels indicate different requirements for the $p$-values obtained from their edge-wise distributional difference test.

Discovering Meaningful Patterns in Real Graphs

To determine whether Gragra discovers meaningful patterns in groups of real graphs, we run 29 experiments on graph group data of various graph types from three domains: functional brain networks (undirected, unweighted), air transportation networks (directed, weighted), and international trade networks (directed, weighted). We compile basic statistics of these networks in the Technical Appendix, and present a quantitative overview of our results in Fig. 3. We observe that, in line with expectations derived from theory, more graphs or graphs with more potential edges, partitioned into fewer groups, generally yield more patterns.

Figure 3: Gragra discovers long graph patterns in datasets with different numbers of graph groups. Here, we show the length distribution of the patterns identified in each of our experiments on real-world data, where each box corresponds to a dataset. The first number below a dataset identifier states the number of graph groups $k$ in the dataset, the second number states the total number of patterns $|S|$, and the third number states the number of patterns $s$ shared between at least two graph groups.

Functional Brain Networks

Network neuroscience has emerged as a promising approach to understanding neurological disorders and diseases (Bullmore and Sporns 2009; Bassett and Sporns 2017; Fornito, Zalesky, and Breakspear 2015). One of its fundamental questions is whether certain disorders are systematically associated with structural or functional connectivity alterations in the brain (van den Heuvel and Sporns 2019). In particular, there is considerable uncertainty surrounding the neurological footprint of autism (and the delineation of its subtypes), and small sample sizes as well as covariates make many published findings hard to replicate (He, Byrge, and Kennedy 2020; King et al. 2019). This calls for methods that can detect signal in the presence of considerable noise and heterogeneity, identifying connectivity patterns that are statistically significantly associated with one or more groups of brain networks.

Motivated by this application, we obtain graphs from preprocessed functional connectomes provided by the Autism Brain Imaging Data Exchange (Craddock et al. 2013). In these graphs, each node corresponds to one of the 116 regions of interest from the automated anatomical labeling atlas (AAL, Rolls et al. 2020), and each edge indicates relatively strong connectivity between two regions, as measured by their blood-oxygen-level dependent signal correlation during resting-state functional magnetic resonance imaging. To facilitate comparisons, the data is processed and grouped as described by Lanciano, Bonchi, and Gionis (2020), but we remove the self-loops (corresponding to perfect self-correlations) that are present in their data.

We experiment with Gragra in four two-group settings (individuals with autism spectrum disorder [ASD] and typically developed controls [TD] in the categories adolescents, children, eyes closed during scan, and males), four one-group settings (autistic individuals in each category only), and one four-group setting (autistic and non-autistic children and adolescents), operating on graphs with $m\in[1\,320,1\,348]$ edges and graph groups $\mathcal{G}_i$ with $c_i\in[49,420]$ graphs. Our four-group experiment identifies significant overconnectivity across multiple brain regions as characteristic of ASD children versus all other groups, paralleling the neuroscience literature (Supekar et al. 2013; Nomi and Uddin 2015). However, as shown in Fig. 4, most of the patterns we identify in the two-group setting yield similar information gains across both groups (left), and there is significant structure to be exploited even within individual groups (right). This indicates that the differences between autistic and non-autistic brains in the settings under study are rather subtle, and that there is considerable heterogeneity also in the one-group data. To explore this heterogeneity and delineate neurosubtypes of autism (cf. Hong et al. 2020), our results could be used as inputs to multivariate subgroup discovery or clustering algorithms, where Gragra would effectively serve as a dimensionality reduction technique.

Figure 4: Gragra unveils shared and contrastive patterns in noisy and heterogeneous data. Here, we display the distribution of information gain differences per pattern in the two-group setting (left), and the distribution of information gains per pattern in the one-group setting (right), for our experiments on functional brain networks.
Figure 5: Gragra discovers large, meaningful graph patterns. Here, we depict some of the patterns discovered in the air transportation networks of national carriers (left, five patterns shown), major carriers (right, two patterns shown), and both carrier classes (middle, one pattern shown). Gray nodes represent airports, and node labels identify airports contained in at least one of the displayed patterns by their three-letter IATA codes. Directed edges represent flight segments, and edge colors are proportional to their weight bins, following different color maps (reds, blues, or grays) where necessary to make them visually distinguishable. All drawn patterns are among the top fifteen in terms of information gain for their respective experiment, and the pattern in the middle is the top shared pattern, corresponding to the United States air transportation backbone.
Figure 6: Gragra mines differential descriptions even when many graph groups are given as input. Here, we show the top five graph patterns identified in the international trade networks when split by product class and decade (fifteen graph groups in total). Nodes correspond to countries, which are represented by their ISO3 country codes. Directed edges correspond to trade flows between the countries, where the edge weights in all displayed patterns fall into the top weight bin. The patterns are labeled by rules identifying the graph groups in which they occur, with letters corresponding to the first letter of a product group, and numbers corresponding to the position of a ten-year interval. For example, the third pattern, labeled (¬A)(2,3), occurs in all product classes except for Animals, in the second and the third ten-year interval, i.e., in [99, 19).
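The rule labels in Fig. 6 can be derived mechanically from the set of groups a pattern occurs in. The sketch below is an illustrative reconstruction of that labeling convention, not Gragra's code; the product list and interval encoding follow the figure caption.

```python
# Hedged sketch: deriving a succinct rule label like "(¬A)(2,3)" from the
# set of (product class, interval) groups a pattern occurs in. This is an
# illustrative reconstruction of the labeling convention, not Gragra's code.

PRODUCTS = ["Animals", "Vegetables", "Food Products", "Minerals", "Chemicals"]

def rule_label(groups):
    """Summarize occurrence groups as a (product)(interval) rule string."""
    prods = {p for p, _ in groups}
    ivals = sorted({i for _, i in groups})
    if len(prods) == len(PRODUCTS):
        prod_part = "(*)"                       # pattern occurs in all classes
    elif len(prods) == len(PRODUCTS) - 1:
        missing = (set(PRODUCTS) - prods).pop() # all classes but one: negate it
        prod_part = f"(¬{missing[0]})"
    else:
        prod_part = "(" + ",".join(sorted(p[0] for p in prods)) + ")"
    ival_part = "(" + ",".join(str(i) for i in ivals) + ")"
    return prod_part + ival_part

# The third pattern of Fig. 6: all classes except Animals, intervals 2 and 3.
groups = [(p, i) for p in PRODUCTS if p != "Animals" for i in (2, 3)]
print(rule_label(groups))  # (¬A)(2,3)
```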

Air Transportation Networks

We obtain data on passenger flows between domestic airports in the United States for each month over the sixteen years from January 2005 to December 2020 from the website of the Bureau of Transportation Statistics (Bureau of Transportation Statistics 2021). Restricting our analysis to United States mainland airports and carriers classified as national (100 million to 1 billion USD revenue in the previous year) or major (over 1 billion USD revenue in the previous year), we create one air transportation network per year, month, and carrier class. To this end, for each year and month, we aggregate the passenger flows between two airports by carrier class and filter edges corresponding to fewer than 3 000 passengers, which leaves edges between n = 300 airports (identified by three-letter IATA codes). Excluding graphs with fewer than n-1 = 299 edges, we arrive at 374 graphs, whose edges we discretize into ten weight categories using equal-width binning.
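The per-(year, month, carrier class) edge pipeline described above can be sketched as follows. The record layout, function name, and example airports are assumptions for illustration; the aggregation, the 3 000-passenger filter, and the ten equal-width bins follow the description in the text.

```python
# Hedged sketch of the edge pipeline for one (year, month, carrier class):
# aggregate passenger counts per airport pair, drop thin edges, and
# discretize the surviving weights into ten equal-width bins.

from collections import defaultdict

def build_edges(segments, min_passengers=3000, n_bins=10):
    """segments: iterable of (origin, dest, passengers);
    returns {(origin, dest): weight_bin_index}."""
    flows = defaultdict(int)
    for origin, dest, pax in segments:
        flows[(origin, dest)] += pax          # aggregate parallel segments
    flows = {e: w for e, w in flows.items() if w >= min_passengers}
    if not flows:
        return {}
    lo, hi = min(flows.values()), max(flows.values())
    width = (hi - lo) / n_bins or 1           # equal-width bins over [lo, hi]
    return {e: min(int((w - lo) / width), n_bins - 1) for e, w in flows.items()}

edges = build_edges([("JFK", "LAX", 2500), ("JFK", "LAX", 2500),
                     ("ATL", "ORD", 9000), ("BOS", "DCA", 1000)])
print(edges)  # {('JFK', 'LAX'): 0, ('ATL', 'ORD'): 9}
```

Note that the two JFK-LAX segments are first summed to 5 000 passengers and only then pass the filter, while the BOS-DCA edge is dropped.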

We are interested in discovering patterns that are shared across all graphs, identifying structures of connected routes that are specific to individual carrier classes, and unveiling both seasonal and temporal trends. Therefore, we run Gragra in six different settings: on all graphs as one group, on the graphs corresponding to each carrier class separately, on all graphs with carrier classes as groups, on all graphs with quarters as groups (starting from December to capture the winter holiday season), and on all graphs with consecutive four-year intervals as groups. Thus, our setup contains graphs with m ∈ [335, 3 533] edges and graph groups 𝒢_i with c_i ∈ [86, 374] graphs. In Fig. 5, we depict a subset of our results from the experiments involving the distinction between carrier classes. Gragra reveals an air transportation backbone jointly serviced by both carrier classes (middle), and it uncovers routes that are characteristically served by national or major carriers (left and right). Overall, we find that patterns corresponding to national carrier routes are often smaller and cover shorter distances than those corresponding to major carrier routes, mirroring the relatively smaller role of national carriers in the air traffic market.
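The shifted quarter grouping can be expressed as a one-line mapping from calendar months to quarter indices, with December folded into the first quarter so that Q1 spans the winter holiday season (Dec-Feb). This is a minimal sketch of the grouping described above; the function name is an assumption.

```python
# Hedged sketch: assigning each monthly snapshot to a quarter group,
# with quarters starting in December so that Q1 = Dec-Feb (winter
# holiday season), as described in the text.

def quarter_group(month):
    """Map a calendar month (1-12) to a quarter index starting in December."""
    return ((month % 12) // 3) + 1  # Dec(12)->1, Jan(1)->1, ..., Sep-Nov->4

assert [quarter_group(m) for m in (12, 1, 2)] == [1, 1, 1]  # winter
assert [quarter_group(m) for m in (3, 4, 5)] == [2, 2, 2]   # spring
assert quarter_group(11) == 4                               # late autumn
```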

International Trade Networks

We obtain data on international trade flows from the website of the World Integrated Trade Solution (World Bank 2021), for the thirty years from 1989 to 2018 (inclusive). The raw data correspond to exports of goods between (mostly) countries, classified using the Harmonized System at the four-digit level (HS-4), whose trade values we aggregate per (source, destination, HS-4 code) triple. For each year and HS-4 code, we construct one directed, weighted graph with (roughly) countries as nodes and exports as edges, discretizing the edge weights into ten categories using equal-width binning. We eliminate all trade entities above the country level but retain trade entities below the country level (and countries that do not exist anymore) if they have an ISO3 code. Restricting our attention to the WITS product groups Animals, Vegetables, Food Products, Minerals, and Chemicals, we arrive at 3 976 graphs with n = 250 nodes and at least n-1 = 249 edges.
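The aggregation step above, which collapses duplicate (source, destination, HS-4 code) triples into one weighted edge per (year, HS-4) graph, can be sketched as follows. The record layout and example values are assumptions for illustration, not the raw WITS schema.

```python
# Hedged sketch: aggregating raw trade records into one weighted edge set
# per (year, HS-4 code), summing trade values over duplicate
# (source, destination, code) triples. The record layout is assumed.

from collections import defaultdict

def group_trade_flows(records):
    """records: iterable of (year, hs4, source, dest, value);
    returns {(year, hs4): {(source, dest): total_value}}."""
    graphs = defaultdict(lambda: defaultdict(float))
    for year, hs4, src, dst, value in records:
        graphs[(year, hs4)][(src, dst)] += value  # sum duplicate triples
    return {k: dict(v) for k, v in graphs.items()}

graphs = group_trade_flows([
    (2018, "0101", "DEU", "FRA", 5.0),
    (2018, "0101", "DEU", "FRA", 2.5),   # duplicate triple: summed
    (2018, "2804", "CHN", "USA", 9.0),
])
print(graphs[(2018, "0101")])  # {('DEU', 'FRA'): 7.5}
```

The resulting per-graph edge weights would then be discretized with the same equal-width binning used for the air transportation networks.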

Leveraging the richness of our data, we ask not only what graph patterns are characteristic of international trade as a whole, but also what structures emerge when we group trade networks by product class, ten-year interval, or product class and ten-year interval. As Gragra allows us to inspect our data at different scales, we further investigate the trade patterns it unveils when considering each product class separately, either treating all graphs from one product class as one group or splitting them by ten-year interval. Thus, we run our experiments on graphs with m ∈ [256, 11 415] edges and graph groups 𝒢_i with c_i ∈ [70, 3 976] graphs. In Fig. 6, we illustrate five patterns discovered in the experiments that explore all graphs together, grouped by product class and ten-year interval. Although the input consists of fifteen classes, Gragra discovers not only meaningful patterns but meaningful patterns with meaningful assignments to graph groups which, as highlighted by the pattern labels in Fig. 6, can be summarized succinctly. Across all experiments, we observe that the patterns yielding the largest information gains are often composed entirely of edges in the top two weight bins. This suggests that the ranking of exporter-importer pairs is most stable on the upper end of the trade-value spectrum, which aligns with interdisciplinary research findings that international trade is highly stratified (Sacks, Ventresca, and Uzzi 2001; Lloyd, Mahutga, and De Leeuw 2009; Fagiolo, Reyes, and Schiavo 2010).

Conclusion

We study the graph group analysis problem: Given a set of graphs and a partition of this set into graph groups, succinctly summarize the commonalities and differences between graphs in the same group, between graphs in different groups, and the relationships connecting the groups. We introduce Gragra as an algorithm to solve this problem, which uses maximum entropy modeling, paired with a model selection criterion and a statistical test, to jointly discover a set of significant subgraphs, called graph patterns, and an assignment of these patterns to graph groups. In our experiments, we demonstrate that Gragra differentially describes synthetic and real-world graph groups, even when faced with heterogeneity, noise, or large numbers of groups. As a byproduct, we introduce two novel datasets of node-aligned graphs, which may be of independent interest to the graph mining community.

However, our work also has limitations. First, we model edge weights as categories, which works well for binned edge weights in practice but is theoretically dissatisfying. A natural enhancement of Gragra would therefore be to handle real-valued edge weights, possibly using a maximum entropy model of the edge weight distribution. Second, we currently test all our graph patterns at the same alpha level. While this is theoretically defensible, given that we combine our statistical test with a model selection criterion, dynamically adjusting the alpha level might be an option worth exploring. Finally, Gragra is currently limited to groups of node-aligned graphs, and extending it to other graph types is an open opportunity for future work.
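As one concrete instance of such an alpha-level adjustment, the Holm-Bonferroni step-down procedure controls the family-wise error rate when many candidate patterns are tested. The sketch below is an illustration of this standard correction, not part of Gragra.

```python
# Hedged sketch of one standard option for dynamic alpha adjustment:
# Holm-Bonferroni step-down correction over a family of pattern tests.
# Illustrative only; not part of Gragra.

def holm_reject(p_values, alpha=0.05):
    """Return a rejection flag per p-value under Holm's step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject

print(holm_reject([0.001, 0.04, 0.03, 0.005]))
# [True, False, False, True]
```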

References

  • Bassett and Sporns (2017) Bassett, D. S.; and Sporns, O. 2017. Network neuroscience. Nature Neuroscience, 20(3): 353–364.
  • Bullmore and Sporns (2009) Bullmore, E.; and Sporns, O. 2009. Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3): 186–198.
  • Bureau of Transportation Statistics (2021) Bureau of Transportation Statistics. 2021. Data Bank 28DS - T-100 Domestic Segment Data (World Area Code). https://www.bts.gov/browse-statistical-products-and-data/bts-publications/data-bank-28ds-t-100-domestic-segment-data.
  • Coupette and Vreeken (2021) Coupette, C.; and Vreeken, J. 2021. Graph Similarity Description: How Are These Graphs Similar? In KDD, 185–195.
  • Craddock et al. (2013) Craddock, C.; Benhajali, Y.; Chu, C.; Chouinard, F.; Evans, A.; Jakab, A.; Khundrakpam, B. S.; Lewis, J. D.; Li, Q.; Milham, M.; et al. 2013. The neuro bureau preprocessing initiative: open sharing of preprocessed neuroimaging data and derivatives. Frontiers in Neuroinformatics, 7.
  • Csiszár (1975) Csiszár, I. 1975. I-divergence geometry of probability distributions and minimization problems. Annals Prob., 146–158.
  • Dalleiger and Vreeken (2020a) Dalleiger, S.; and Vreeken, J. 2020a. Explainable data decompositions. In AAAI, volume 34, 3709–3716.
  • Dalleiger and Vreeken (2020b) Dalleiger, S.; and Vreeken, J. 2020b. The Relaxed Maximum Entropy Distribution and its Application to Pattern Discovery. In ICDM, 978–983.
  • Durante, Dunson, and Vogelstein (2017) Durante, D.; Dunson, D. B.; and Vogelstein, J. T. 2017. Nonparametric Bayes modeling of populations of networks. JASA, 112(520): 1516–1530.
  • Durante, Dunson et al. (2018) Durante, D.; Dunson, D. B.; et al. 2018. Bayesian inference and testing of group differences in brain networks. Bayesian Analysis, 13(1): 29–58.
  • Fagiolo, Reyes, and Schiavo (2010) Fagiolo, G.; Reyes, J.; and Schiavo, S. 2010. The evolution of the world trade web: a weighted-network analysis. Journal of Evolutionary Economics, 20(4): 479–514.
  • Fornito, Zalesky, and Breakspear (2015) Fornito, A.; Zalesky, A.; and Breakspear, M. 2015. The connectomics of brain disorders. Nature Reviews Neuroscience, 16(3): 159–172.
  • Ghoshdastidar et al. (2020) Ghoshdastidar, D.; Gutzeit, M.; Carpentier, A.; and Von Luxburg, U. 2020. Two-sample hypothesis testing for inhomogeneous random graphs. Annals Stat., 48(4): 2208–2229.
  • Ginestet et al. (2017) Ginestet, C. E.; Li, J.; Balachandran, P.; Rosenberg, S.; Kolaczyk, E. D.; et al. 2017. Hypothesis testing for network data in functional neuroimaging. Annals Appl. Stat., 11(2): 725–750.
  • Gomes, Rao, and Neville (2018) Gomes, G.; Rao, V.; and Neville, J. 2018. Multi-level Hypothesis Testing for Populations of Heterogeneous Networks. In ICDM, 977–982.
  • He, Byrge, and Kennedy (2020) He, Y.; Byrge, L.; and Kennedy, D. P. 2020. Nonreplication of functional connectivity differences in autism spectrum disorder across multiple sites and denoising strategies. Human Brain Mapping, 41(5): 1334–1350.
  • Hong et al. (2020) Hong, S.-J.; Vogelstein, J. T.; Gozzi, A.; Bernhardt, B. C.; Yeo, B. T.; Milham, M. P.; and Di Martino, A. 2020. Toward neurosubtypes in autism. Biological Psychiatry, 88(1): 111–128.
  • Hull et al. (2017) Hull, J. V.; Dokovna, L. B.; Jacokes, Z. J.; Torgerson, C. M.; Irimia, A.; and Van Horn, J. D. 2017. Resting-state functional connectivity in autism spectrum disorders: a review. Frontiers in Psychiatry, 7: 205.
  • Jaynes (1982) Jaynes, E. 1982. On the rationale of maximum-entropy methods. Proc. IEEE, 70(9): 939–952.
  • King et al. (2019) King, J. B.; Prigge, M. B.; King, C. K.; Morgan, J.; Weathersby, F.; Fox, J. C.; Dean, D. C.; Freeman, A.; Villaruz, J. A. M.; Kane, K. L.; et al. 2019. Generalizability and reproducibility of functional connectivity in autism. Molecular Autism, 10(1): 1–23.
  • Kundu et al. (2021) Kundu, S.; Ming, J.; Nocera, J.; and McGregor, K. M. 2021. Integrative learning for population of dynamic networks with covariates. NeuroImage, 118181.
  • Lanciano, Bonchi, and Gionis (2020) Lanciano, T.; Bonchi, F.; and Gionis, A. 2020. Explainable Classification of Brain Networks via Contrast Subgraphs. In KDD, 3308–3318.
  • Lee, Rossi, and Kong (2018) Lee, J. B.; Rossi, R.; and Kong, X. 2018. Graph classification using structural attention. In KDD, 1666–1674.
  • Lehmann et al. (2021) Lehmann, B.; Henson, R.; Geerligs, L.; White, S.; et al. 2021. Characterising group-level brain connectivity: a framework using Bayesian exponential random graph models. NeuroImage, 225: 117480.
  • Liu et al. (2018) Liu, Y.; Safavi, T.; Dighe, A.; and Koutra, D. 2018. Graph summarization methods and applications: A survey. ACM CSUR, 51(3): 1–34.
  • Llinares-López et al. (2015) Llinares-López, F.; Sugiyama, M.; Papaxanthos, L.; and Borgwardt, K. 2015. Fast and memory-efficient significant pattern mining via permutation testing. In KDD, 725–734.
  • Lloyd, Mahutga, and De Leeuw (2009) Lloyd, P.; Mahutga, M. C.; and De Leeuw, J. 2009. Looking back and forging ahead: Thirty years of social network research on the world-system. Journal of World-Systems Research, 48–85.
  • Lovato et al. (2021) Lovato, I.; Pini, A.; Stamm, A.; Taquet, M.; and Vantini, S. 2021. Multiscale null hypothesis testing for network-valued data: Analysis of brain networks of patients with autism. Appl. Statist., 70(2): 372–397.
  • Lovato et al. (2020) Lovato, I.; Pini, A.; Stamm, A.; and Vantini, S. 2020. Model-free two-sample test for network-valued data. Computational Statistics and Data Analysis, 144: 106896.
  • Lukemire et al. (2020) Lukemire, J.; Kundu, S.; Pagnoni, G.; and Guo, Y. 2020. Bayesian joint modeling of multiple brain functional networks. JASA, 1–13.
  • Lunagómez, Olhede, and Wolfe (2020) Lunagómez, S.; Olhede, S. C.; and Wolfe, P. J. 2020. Modeling network populations via graph distances. JASA, 1–18.
  • Mampaey, Vreeken, and Tatti (2012) Mampaey, M.; Vreeken, J.; and Tatti, N. 2012. Summarizing Data Succinctly with the Most Informative Itemsets. ACM TKDD, 6: 1–44.
  • Maugis et al. (2020) Maugis, P.-A.; Olhede, S.; Priebe, C.; and Wolfe, P. 2020. Testing for equivalence of network distribution using subgraph counts. Journal of Computational and Graphical Statistics, 29(3): 455–465.
  • Mukherjee, Sarkar, and Lin (2017) Mukherjee, S. S.; Sarkar, P.; and Lin, L. 2017. On clustering network-valued data. In NIPS, 7074–7084.
  • Nomi and Uddin (2015) Nomi, J. S.; and Uddin, L. Q. 2015. Developmental changes in large-scale network connectivity in autism. NeuroImage: Clinical, 7: 732–741.
  • Rolls et al. (2020) Rolls, E. T.; Huang, C.-C.; Lin, C.-P.; Feng, J.; and Joliot, M. 2020. Automated anatomical labelling atlas 3. Neuroimage, 206: 116189.
  • Sacks, Ventresca, and Uzzi (2001) Sacks, M. A.; Ventresca, M. J.; and Uzzi, B. 2001. Global institutions and networks: Contingent change in the structure of world trade advantage, 1965-1980. American Behavioral Scientist, 44(10): 1579–1601.
  • Schwarz (1978) Schwarz, G. 1978. Estimating the Dimension of a Model. Annals Stat., 6(2): 461–464.
  • Shah et al. (2015) Shah, N.; Koutra, D.; Zou, T.; Gallagher, B.; and Faloutsos, C. 2015. Timecrunch: Interpretable dynamic graph summarization. In KDD, 1055–1064.
  • Signorelli and Wit (2020) Signorelli, M.; and Wit, E. C. 2020. Model-based clustering for populations of networks. Statistical Modelling, 20(1): 9–29.
  • Sugiyama et al. (2015) Sugiyama, M.; López, F. L.; Kasenburg, N.; and Borgwardt, K. M. 2015. Significant subgraph mining with multiple testing correction. In SDM, 37–45. SIAM.
  • Supekar et al. (2013) Supekar, K.; Uddin, L. Q.; Khouzam, A.; Phillips, J.; Gaillard, W. D.; Kenworthy, L. E.; Yerys, B. E.; Vaidya, C. J.; and Menon, V. 2013. Brain hyperconnectivity in children with autism and its links to social deficits. Cell Reports, 5(3): 738–747.
  • van den Heuvel and Sporns (2019) van den Heuvel, M. P.; and Sporns, O. 2019. A cross-disorder connectome landscape of brain dysconnectivity. Nature Reviews Neuroscience, 20(7): 435–446.
  • Vogelstein et al. (2013) Vogelstein, J. T.; Roncal, W. G.; Vogelstein, R. J.; and Priebe, C. E. 2013. Graph classification using signal-subgraphs: Applications in statistical connectomics. IEEE TPAMI, 35(7): 1539–1551.
  • Vuong (1989) Vuong, Q. H. 1989. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica: Journal of the Econometric Society, 307–333.
  • Wang et al. (2019) Wang, L.; Zhang, Z.; Dunson, D.; et al. 2019. Common and individual structure of brain networks. Annals Appl. Stat., 13(1): 85–112.
  • World Bank (2021) World Bank. 2021. World Integrated Trade Solution. https://wits.worldbank.org/Default.aspx?lang=en.
  • Yan et al. (2019) Yan, Y.; Zhu, J.; Duda, M.; Solarz, E.; Sripada, C.; and Koutra, D. 2019. Groupinn: Grouping-based interpretable neural network for classification of limited, noisy brain data. In KDD, 772–782.