Inverse Graph Identification: Can We Identify
Node Labels Given Graph Labels?
Abstract
Graph Identification (GI) has long been researched in graph learning and is essential in certain applications (e.g., social community detection). Specifically, GI requires predicting the label/score of a target graph given its collection of node features and edge connections. While this task is common, more complex cases arise in practice: for example, we may need to do the inverse and group similar users in a social network given the labels of different communities. This triggers an interesting thought: can we identify nodes given the labels of the graphs they belong to? This paper therefore defines a novel problem dubbed Inverse Graph Identification (IGI), as opposed to GI. Upon a formal discussion of the variants of IGI, we choose a particular case study: node clustering that makes use of graph labels and node features, with the assistance of a hierarchical graph that further characterizes the connections between different graphs. To address this task, we propose the Gaussian Mixture Graph Convolutional Network (GMGCN), a simple yet effective method that performs node-level message passing with a Graph Attention Network (GAT) under the protocol of GI and then infers the category of each node via a Gaussian Mixture Layer (GML). The training of GMGCN is further boosted by a proposed consensus loss that takes advantage of the structure of the hierarchical graph. Extensive experiments are conducted to test the rationality of the IGI formulation. We verify the superiority of the proposed method over other baselines on several benchmarks we have built. We will release our code along with the benchmark data to encourage more research attention to the IGI problem.
1 Introduction
In many scenarios, objects with their features are connected by their interactions into graphs. By analyzing these edge connections and node features across various graphs, the graph identification (GI) problem is formed: predict the information or properties of the graphs, such as their labels or scores. Many studies have been developed on GI, such as graph classification [li2019semi, shi2000normalized] and graph regression [pujara2013knowledge]. Formally, methods for GI aggregate the information from nodes and edges to predict or summarize the information of the whole graph. However, it is much more interesting to consider an inverse problem that has never been proposed: can we use the information of graphs to infer the information of nodes, or even of edges and sub-graphs? In other words, given the labels of graphs, how do we figure out the categories of nodes, edges, or sub-graphs?
This problem is interesting and very common in the real world. In social media, for example, hot events that are widely discussed can be modeled as graphs in which the involved users are nodes and co-following relationships among users are edges; it is then natural to ask how to pick out the malicious users from the mass of users given only the labels of these event topics, such as the authenticity of each topic [zhou2018fake, cao2018automatic]. In drug discovery, for another example, molecules can be modeled as graphs in which atoms are nodes and chemical bonds are edges; we may attempt to identify the roles of certain sub-graphs, or equivalently functional groups, in each molecule given its chemical or physical properties [gilmer2017neural, ma2020dual]. In programming languages, for a last example, control flows of programs form graphs with statements as nodes and control flows as edges; can we detect the problematic statements if we already know which programs have bugs [lu2005bugbench]?
All these problems can be unified as a general problem, Inverse Graph Identification (IGI), which identifies the nodes in graphs based on the information of the graphs. The main difficulty of the IGI task is that node labels are inaccessible, so all the information available for training comes only from the labels of graphs. Namely, it is a node identification task where the available training labels are much coarser-grained than the node labels we want to fit. It may seem that this problem can be resolved from the existing perspectives of node clustering [girvan2002community] or node identification [kipf2017semi]. However, both differ from the problem we attempt to address. Unlike node clustering, which only clusters based on node features or graph structures, IGI is further guided by the given graph labels, which, as we will demonstrate, closely influence the clustering results. It also differs from node identification because IGI provides no node labels for training. Another concept similar to IGI is Multiple-Instance Learning (MIL) [carbonneau2018multiple], which adopts global labels of bags to identify the labels of local instances. However, MIL assumes that instances are i.i.d. and involves no edge connections. Therefore, it is interesting and important to study the IGI problem as a set of new challenges and to seek new solutions for it.
In this paper, we formally define IGI and lay out its different problem statements as a set of new challenges. Meanwhile, we focus on a particular study of IGI: a node clustering task that makes use of graph labels and node features, with the assistance of a hierarchical graph [girvan2002community] that further characterizes the relations among graphs. To address this task, we propose a novel model based on the Gaussian mixture model (GMM) and the graph convolutional network (GCN), named Gaussian Mixture Graph Convolutional Network (GMGCN). First, the features of each node are updated through a Graph Attention Network (GAT). Then the node features are aggregated by a Gaussian mixture layer (GML) and a new attention pooling layer proposed in this paper. After obtaining the graph representations, we adopt a hierGCN to classify the graphs. In particular, we design a consensus loss that plays a key role in the training process. Finally, node clustering is carried out according to the parameters of the GML. The main contributions are as follows:
1. New problem: we introduce a new problem called Inverse Graph Identification (IGI), which tries to identify the nodes in graphs based on the labels of graphs, and we formally discuss the variants of IGI to attract more research attention to this problem.
2. New solution: we propose an effective model called GMGCN, based on GMM and GCN, to solve a particular case of the IGI problem. To the best of our knowledge, this is the first work to achieve node clustering on graph structures by integrating a GMM into a GCN.
3. New loss: we propose a consensus loss function that boosts model training through the principle of "same attraction, opposite repulsion". Experiments validate that this consensus loss greatly improves the model.
We validate the proposed GMGCN on various synthetic datasets based on different problem statements and a real-world dataset. The results demonstrate that the proposed method is suitable to solve the IGI task.
2 Related Works
Graph Identification. Recently, there has been increasing interest in the graph learning domain. Among existing works, the GCN is one of the most effective convolution models. A typical GCN framework is the message passing neural network (MPNN) proposed by Gilmer et al. [gilmer2017neural], which re-generalizes several neural network and graph convolutional approaches into a general "message-passing" architecture. Many kinds of GCNs [bruna2014spectral, defferrard2016convolutional, kipf2017semi] essentially deliver different message propagation functions. Among them, Graph Attention Networks (GAT) [velivckovic2017graph] first leveraged learnable self-attention layers to aggregate weighted neighbor information. All of these methods obtain suitable node representations for node identification tasks. To cope with the GI problem, however, a compact graph-level representation is needed. Pooling strategies have therefore been proposed to integrate information over the node representations [wu2020comprehensive], such as max/min pooling [defferrard2016convolutional] and SortPooling [zhang2018end]. In addition, Lin et al. [lin2017structured] propose a learnable attention pooling for weighted averaging of node representations. Although several studies have researched the graph identification problem, studies on Inverse Graph Identification are still lacking.
Graph Clustering. Graph clustering, a fundamental data analysis task that groups similar nodes into the same category, has goals similar to the IGI problem. Many real-world applications are cast as graph clustering [shi2000normalized, white2005spectral, hastings2006community, kim2006customer]. The major strategy is to run simple clustering algorithms such as $k$-means [jain2010data] or GMM [mclachlan1988mixture] on features produced by graph embedding [wang2017community]. With the great success of deep learning, more graph clustering studies resort to deep learning to learn embeddings that capture both node features and structural relationships [wu2020comprehensive]. Researchers employ stacked sparse autoencoders [tian2014learning], variational autoencoders [kipf2016variational], or a combination of autoencoder and GCN [bo2020structural] to obtain graph representations for clustering. Nevertheless, these graph clustering methods do not perform well on IGI because they pay no attention to graph labels.
Multiple-Instance Learning. Another task with a definition similar to IGI is Multiple-Instance Learning (MIL). MIL is a variant of inductive machine learning where each learning example consists of a bag of instances instead of a single feature vector [foulds2010review]. When obtaining local instance annotations is costly or impossible but global bag labels are available, MIL trains classifiers from this weakly labeled data. It has received considerable attention for both its theoretical interest and its applicability to real-world problems [dietterich1997solving, andrews2002multiple, cheplygina2019not, pinheiro2015image]. For example, Wu et al. [wu2015deep] propose a deep MIL method that adopts max pooling to find positive instances for image classification and annotation. Although MIL methods learn a classifier from bag labels to achieve instance classification, they ignore the structural information among instances and are therefore not suitable for IGI.
Gaussian Mixture Model. Finally, as a core part of the proposed GMGCN, the GMM is a parametric probability density model given by a weighted sum of Gaussian component densities [mclachlan1988mixture]. It is commonly used to find underlying clusters in data samples [bishop2006pattern]. The GMM parameters are usually estimated from training data with the iterative Expectation-Maximization (EM) algorithm [moon1996expectation]. In this paper, we instead integrate the GMM into a GCN as a solution for IGI and update the GMM parameters by stochastic gradient descent.
3 Notations and Problem Statement
Notations. We denote a set of graph instances as $\mathcal{G} = \{(G_i, y_i)\}_{i=1}^{N}$ with $N$ graphs, where $G_i$ refers to the $i$-th graph instance and $y_i \in \{1, \dots, C\}$ is the corresponding graph label out of $C$ different categories. We denote by $G_i = (V_i, E_i)$ the $i$-th graph instance of size $n_i$, with nodes $V_i$ and edges $E_i$; by $X_i$ the feature matrix of the nodes $V_i$; and by $A_i$ the adjacency matrix, which associates each edge in $E_i$ with an entry of $A_i$. We denote by $y_{ij}$ the potential label of node $v_{ij} \in V_i$, which is invisible to model training. In addition, we denote by $A_{hier}$ the adjacency matrix of the links among graphs if the graphs in $\mathcal{G}$ contain inter-connections; $A_{hier} = I$ if there are no relationships among graphs.
Problem Statement. The Inverse Graph Identification problem is defined as: given a set of graph-label pairs $\{(G_i, y_i)\}_{i=1}^{N}$ and node features $\{X_i\}_{i=1}^{N}$, how do we infer the label $y_{ij}$ of the $j$-th node in the $i$-th graph, where $i \in \{1, \dots, N\}$ and $j \in \{1, \dots, n_i\}$? Under this definition, several different cases can be extended from the problem as new tasks.
Case 1 (Infer Original Nodes)
Given the whole $\mathcal{G}$ and $\{X_i\}_{i=1}^{N}$ as training data, how to infer all the node labels in the graphs $G_i$?
Case 2 (Infer New Nodes)
Given a portion of the nodes in each $G_i$ as training data, how to infer the labels of the rest of the nodes in the graphs $G_i$?
Case 3 (Infer New Graphs)
Given a portion of the graphs in $\mathcal{G}$ as training data, how to infer the node labels in the rest of the graphs?
Case 4 (With/Without Hierarchical Graph)
Given the inter-connections among graphs as $A_{hier}$, how to infer the node labels? And how to infer the node labels if $A_{hier} = I$ (i.e., no inter-connections)?
There are more cases under the IGI problem; for example, how to infer a node score instead of a node label? We discuss more cases in Appendix A. In this paper, we focus on the IGI problem of inferring node labels. In the Experiments section, we examine the proposed method and the baselines under all the above cases.

4 Proposed Method
In this section, we first introduce the preliminaries related to the proposed model, then we discuss two key components of GMGCN: GML and consensus loss.
4.1 Preliminaries
Graph Attention Networks. Graph Attention Networks (GAT) [velivckovic2017graph] have been widely adopted in the field of graph convolution due to the learnable attention paid to neighbors during the node update process. A multi-head GAT convolutional layer (GATConv) is formulated as:

$$h_i' = \Big\Vert_{k=1}^{K} \, \sigma\Big( \sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} \, W^{k} h_j \Big) \qquad (1)$$

where $h_i'$ and $h_i$ refer to the hidden feature representation and the original feature vector of node $i$ in the graph, respectively; $\Vert$ refers to the concatenation operation, and $\sigma$ refers to the nonlinear activation function. $W^{k}$ is the weight matrix and $\mathcal{N}_i$ is the set of neighbors of node $i$. $\alpha_{ij}^{k}$ is the attention coefficient computed by the $k$-th attention mechanism as follows:

$$\alpha_{ij}^{k} = \frac{\exp\big(\mathrm{LeakyReLU}\big(a_k^{\top}\,[W^{k} h_i \,\Vert\, W^{k} h_j]\big)\big)}{\sum_{l \in \mathcal{N}_i} \exp\big(\mathrm{LeakyReLU}\big(a_k^{\top}\,[W^{k} h_i \,\Vert\, W^{k} h_l]\big)\big)} \qquad (2)$$

where the attention mechanism is a single-layer feedforward neural network parametrized by a weight vector $a_k$, with the LeakyReLU nonlinearity applied. In this paper, a two-layer GAT is formulated as follows:

$$H = \mathrm{GATConv}_2\big(\mathrm{GATConv}_1(X, A),\, A\big) \qquad (3)$$
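As a concrete reference, below is a minimal sketch of this 2-layer GAT encoder, assuming PyTorch Geometric's `GATConv`; the hidden sizes and the ELU activation are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv


class TwoLayerGAT(torch.nn.Module):
    """Two GATConv layers as in Eqs. (1)-(3): the first uses 2 attention
    heads with concatenated outputs, the second uses a single head."""

    def __init__(self, in_dim: int, hid_dim: int = 128, out_dim: int = 64):
        super().__init__()
        self.conv1 = GATConv(in_dim, hid_dim, heads=2, concat=True)
        self.conv2 = GATConv(2 * hid_dim, out_dim, heads=1, concat=False)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        h = F.elu(self.conv1(x, edge_index))  # Eq. (1) with K=2 heads
        return self.conv2(h, edge_index)      # second layer, as in Eq. (3)
```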
Graph Attention Pooling. Lin et al. [lin2017structured] propose graph attention pooling, which employs a self-attentive mechanism to learn node importance and then transforms a variable number of nodes into a fixed-length graph representation:

$$a = \mathrm{softmax}\big(w_2 \tanh(W_1 H^{\top})\big), \qquad e = a H \qquad (4)$$

The node attention scores $a$ are calculated by a 2-layer MLP without bias, parametrized by $W_1$ and $w_2$, which infers the importance of each node. The $\mathrm{softmax}$ function is adopted to normalize the importance of the nodes within the graph. Finally, the unified graph representation $e$ of the graph is obtained by multiplying the node attention scores $a$ with $H$.
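A minimal functional sketch of Eq. (4) in PyTorch follows; the parameter names `W1` and `w2` and their shapes are our own illustrative choices.

```python
import torch


def attention_pooling(H: torch.Tensor, W1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    """Eq. (4): H is the (n, d) node feature matrix; W1 (d, d_a) and w2 (d_a,)
    parametrize the bias-free 2-layer MLP that scores node importance."""
    scores = torch.tanh(H @ W1) @ w2   # one importance score per node
    a = torch.softmax(scores, dim=0)   # normalize importances over the graph
    return a @ H                       # weighted average: graph vector e, shape (d,)
```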
Hierarchical GCN. We employ a hierarchical GCN [li2019semi] to update the feature representations of the graphs in the hierarchical graph, as in Eq. 5:

$$H_{hier}' = \Big[\, H_{hier} \,\Big\Vert\, \sigma\big(A_{hier} H_{hier} W_{hier}\big) \Big] \qquad (5)$$

where the hierarchical feature matrix $H_{hier}$ consists of the feature representations of the graphs, and the original hierarchical feature matrix is concatenated with the updated feature matrix as a self-loop.
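A possible PyTorch Geometric sketch of Eq. (5), assuming a standard `GCNConv` as the propagation step and ReLU as the activation:

```python
import torch
from torch_geometric.nn import GCNConv


class HierGCN(torch.nn.Module):
    """One GCN layer over the hierarchical graph (Eq. (5)); the input graph
    representations are concatenated back as the described self-loop."""

    def __init__(self, dim: int, out_dim: int):
        super().__init__()
        self.conv = GCNConv(dim, out_dim)

    def forward(self, H: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        updated = torch.relu(self.conv(H, edge_index))
        return torch.cat([H, updated], dim=-1)  # concatenate original features
```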
4.2 Overall Framework
Fig. 1 gives the overall framework of the proposed GMGCN method for inferring the categories of the nodes in the graphs. It consists of three processes: node-level representation, node inference, and graph-level identification.
Gaussian Mixture Layer (GML). As the key component of GMGCN, the GML plays a crucial role in the node inference process. It is described as $\mathrm{GML}(h) = \phi(h)\,W$, where $W$ refers to the weight matrix and $\phi_k$ represents the $k$-th Gaussian weight function:

$$\phi_k(h) = \exp\Big( -\tfrac{1}{2}\, (h - \mu_k)^{\top} \Sigma_k^{-1} (h - \mu_k) \Big) \qquad (6)$$

where $\Sigma_k$ and $\mu_k$ are the learnable covariance matrix and mean vector of the Gaussian weight function $\phi_k$, respectively.
Based on the idea of the Gaussian Mixture Model (GMM), the GML can effectively distinguish different types of nodes according to the input features. However, by itself it cannot resolve the two major challenges of the IGI problem: modeling node relations and performing the graph identification that steers the final node inference. Therefore, as shown in Fig. 1, we feed the node representations from the 2-layer GAT into the GML; these representations provide the GML with local structural information about the nodes. Meanwhile, the outputs of the GML are aggregated by attention pooling and further fed into a hierarchical GCN for graph identification. The graph identification adjusts the hidden space of the GML via back-propagation through the hierGCN. Note that one can easily replace the hierGCN by a multilayer perceptron (MLP) to formulate Case 4 of IGI. The node representations and the graph identification thus address the two major challenges of IGI, respectively.
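The following sketch shows one way the GML of Eq. (6) could be implemented in PyTorch; restricting $\Sigma_k$ to a learnable diagonal covariance is our simplifying assumption, not the paper's specification.

```python
import torch


class GaussianMixtureLayer(torch.nn.Module):
    """Eq. (6) with K Gaussian components, followed by the weight matrix W."""

    def __init__(self, dim: int, K: int, out_dim: int):
        super().__init__()
        self.mu = torch.nn.Parameter(torch.randn(K, dim))       # mean vectors mu_k
        self.log_var = torch.nn.Parameter(torch.zeros(K, dim))  # diagonal of Sigma_k (log scale)
        self.W = torch.nn.Parameter(torch.randn(K, out_dim))    # weight matrix W

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        diff = h.unsqueeze(1) - self.mu                  # (n, K, dim)
        maha = (diff ** 2 / self.log_var.exp()).sum(-1)  # squared Mahalanobis distances
        phi = torch.exp(-0.5 * maha)                     # Gaussian weights phi_k(h), Eq. (6)
        return phi @ self.W                              # GML output
```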

GMGCN vs. AutoEncoder. The GMGCN model can be viewed as an encoder-decoder process with a pre-specified hidden space: the 2-layer GAT is the encoder, the hierGCN is a decoder that reconstructs graph labels instead of the nodes themselves, and the GML is the hidden space. Involving the GML naturally introduces the assumption that the nodes of each category are distributed as a Gaussian in the hidden space, so that nodes of multiple categories mix into a Gaussian mixture distribution. Consequently, this also implies a limitation: the proposed model can only resolve the IGI problem of categorizing nodes.
Consensus Loss. The core idea of the consensus loss is to pull the graph representations with the same labels closer together and push those with different labels farther apart in the hidden space. Before computing the consensus loss, we first compute the similarity matrix of the graph representations, obtained in the following two steps:
First, we compute the distance $d_{ik}$ between the graph representation $e_i$ from the graph attention pooling layer and the mean vector $\mu_k$ of the $k$-th Gaussian weight function:

$$d_{ik} = \big\Vert e_i - \mu_k \big\Vert_2 \qquad (7)$$

Second, we enhance the distance of each graph representation to its true mean vector by:

$$\hat{d}_{ik} = \begin{cases} \gamma \, d_{ik}, & k = y_i \\ d_{ik}, & \text{otherwise} \end{cases} \qquad (8)$$

where $\gamma$ is a discount hyperparameter, $y_i$ is the true label of graph $G_i$, and $\hat{d}_{ik}$ represents the enhanced distance between the graph representation $e_i$ and the mean vector $\mu_k$ of the $k$-th Gaussian weight function. The similarity matrix $S$ is then formulated as $S = \mathrm{softmax}(-\hat{D})$, where $\hat{D}$ consists of the $\hat{d}_{ik}$ and is calculated as illustrated in Fig. 2.
Finally, the consensus loss $\mathcal{L}_{con}$ is formulated as a cross-entropy loss between $S$ and the graph labels. In this way, the consensus loss pulls the representation of each graph toward the mean vector of the Gaussian function corresponding to its graph label, so that graphs with the same labels become closer and graphs with different labels become farther apart. Meanwhile, the graph attention pooling allows node representations whose potential labels oppose their graph labels to drift away from the graph representations by receiving less attention.
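Below is a sketch of the consensus loss under the assumption of one Gaussian component per graph class; the tensor names and the small numerical epsilon are ours.

```python
import torch
import torch.nn.functional as F


def consensus_loss(E: torch.Tensor, mu: torch.Tensor, y: torch.Tensor,
                   gamma: float = 0.5) -> torch.Tensor:
    """Eqs. (7)-(8) and L_con. E: (N, d) pooled graph representations;
    mu: (C, d) GML mean vectors; y: (N,) graph labels; gamma: discount."""
    D = torch.cdist(E, mu)                            # d_ik = ||e_i - mu_k||_2, Eq. (7)
    idx = y.unsqueeze(1)
    D = D.scatter(1, idx, D.gather(1, idx) * gamma)   # discount the true-class distance, Eq. (8)
    S = F.softmax(-D, dim=1)                          # similarity matrix S
    return F.nll_loss(torch.log(S + 1e-12), y)        # cross-entropy between S and graph labels
```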
Overall Loss. Besides the consensus loss, which distinguishes nodes in the hidden space, the GMGCN model also performs the graph classification task, for which we establish a cross-entropy loss $\mathcal{L}_{cls}$ between the predictions and the ground-truth labels over all graphs. The overall loss function of GMGCN is then defined as the linear interpolation $\mathcal{L} = \lambda\,\mathcal{L}_{cls} + (1-\lambda)\,\mathcal{L}_{con}$, where $\lambda$ is the trade-off coefficient between the classification loss $\mathcal{L}_{cls}$ and the consensus loss $\mathcal{L}_{con}$.
4.3 Inference of Nodes
We infer the categories of the nodes in a graph based on the cosine similarity between the node representations $h_{ij}$ obtained from the GML and the mean vector $\mu_k$ of each Gaussian weight function. The category of each node is determined as follows:

$$c_{ij} = \arg\max_{k} \frac{h_{ij}^{\top} \mu_k}{\Vert h_{ij} \Vert \, \Vert \mu_k \Vert} \qquad (9)$$
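A minimal PyTorch sketch of this inference rule:

```python
import torch
import torch.nn.functional as F


def infer_node_labels(H: torch.Tensor, mu: torch.Tensor) -> torch.Tensor:
    """Eq. (9): assign each node to the Gaussian component whose mean vector
    is most cosine-similar to its representation. H: (n, d); mu: (K, d)."""
    sim = F.normalize(H, dim=1) @ F.normalize(mu, dim=1).t()  # (n, K) cosine similarities
    return sim.argmax(dim=1)                                  # predicted category per node
```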
Table 1: NMI and ARI (mean(std)) on the synthetic dataset, for the single-graph setting (top) and the hierarchical-graph setting (bottom).

Single graph:

| Methods | NMI (original nodes) | ARI (original nodes) | NMI (new nodes) | ARI (new nodes) | NMI (new graphs) | ARI (new graphs) |
|---|---|---|---|---|---|---|
| Feature Clustering | 49.46(1.89) | 50.78(2.48) | 50.11(2.07) | 49.33(2.56) | 47.48(2.48) | 48.03(3.32) |
| Graph Clustering | 80.65(8.18) | 85.51(7.96) | 85.85(8.47) | 87.72(8.52) | 78.72(10.47) | 84.25(10.33) |
| MIL | 76.47(0.00) | 81.39(0.00) | 75.42(0.00) | 80.76(0.00) | 75.77(0.00) | 80.63(0.00) |
| ATTGCN | 14.25(9.76) | 12.45(11.64) | 19.69(9.54) | 13.44(11.38) | 13.25(9.60) | 12.22(11.10) |
| GMGCN-noncon | 49.58(37.15) | 56.57(31.17) | 60.46(26.25) | 57.72(31.87) | 54.91(25.69) | 56.88(30.34) |
| GMGCN | 93.28(0.77) | 96.29(0.43) | 96.70(0.43) | 97.55(0.34) | 93.49(1.12) | 96.46(0.62) |

Hierarchical graph:

| Methods | NMI (original nodes) | ARI (original nodes) | NMI (new nodes) | ARI (new nodes) | NMI (new graphs) | ARI (new graphs) |
|---|---|---|---|---|---|---|
| ATTGCN | 19.09(12.34) | 17.48(13.52) | 24.64(12.34) | 19.09(13.23) | 20.54(11.92) | 20.00(13.37) |
| GMGCN-noncon | 46.45(17.61) | 37.44(38.26) | 48.08(28.28) | 37.66(38.33) | 49.17(25.88) | 37.26(38.14) |
| GMGCN | 92.83(1.22) | 96.02(0.71) | 96.32(0.77) | 97.20(0.62) | 92.54(1.71) | 95.95(0.88) |
5 Experiments
In this section, we first conduct experiments on synthetic data to validate the effectiveness of our proposed model under the different problem cases. Then, we construct a real-world dataset based on the public rumor dataset PHEME (https://figshare.com/articles/PHEME_dataset_of_rumours_and_non-rumours/4010619) [zubiaga2016learning], on which the proposed method and the comparison methods are evaluated.
5.1 Baselines
Since the IGI problem is a new challenge without any baseline work yet, we compare the proposed GMGCN with three types of the most similar methods: clustering methods without structural information [jain2010data, mclachlan1988mixture, ester1996density, hinton2006reducing], graph clustering methods [kipf2016variational, bo2020structural], and MIL methods [ray2005supervised]. Due to page limitations, we only report the best results of each type of method in the sequel; detailed results for all methods are available in Appendix B.
Proposed Method. We also examine various settings of the proposed method:
- ATTGCN: GMGCN without the GML; we replace the cosine similarity by 2-head attention mechanisms in the attention pooling layer to distinguish the categories of the nodes during inference.
- GMGCN: the proposed method.
- GMGCN-noncon: GMGCN trained without the proposed consensus loss.
We apply Normalized Mutual Information (NMI) [strehl2002cluster] and Adjusted Rand Index (ARI) [hubert1985comparing] to evaluate the clustering results in this paper. The dimensions of GMGCN are set to d-128-64-16-64-c, where d is the input dimension and c is the number of graph categories in the dataset. We adopt 2 independent attention mechanisms for the first GATConv layer and 1 attention mechanism for the second GATConv layer. The parameters are updated by stochastic gradient descent via the Adam algorithm [kingma2014adam]. The discount hyperparameter is $\gamma = 0.5$. GMGCN is trained for 800 epochs. We run all methods 10 times to avoid extremes and report the average results.
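Both metrics are available in scikit-learn; for instance, a small evaluation helper could look like this.

```python
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def evaluate_clustering(true_labels, pred_labels):
    """Score predicted node clusters against held-out ground truth with the
    two metrics used throughout the experiments: NMI and ARI."""
    return (normalized_mutual_info_score(true_labels, pred_labels),
            adjusted_rand_score(true_labels, pred_labels))
```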




5.2 GMGCN on Synthetic Data
Synthetic Data Generation. We generate a synthetic hierarchical graph to examine the proposed IGI problem under its various cases. First, we generate two synthetic user groups, representing normal and abnormal users, respectively. The relationships between normal users are generated by the Barabási–Albert graph model [barabasi1999emergence], while the relationships between abnormal users are generated by a random graph algorithm. We use Gaussian distributions to randomly generate user features; the means and standard deviations for the two types of users are randomly sampled from [-5, 5] and [1, 10], respectively. Second, we randomly connect users between the two groups to simulate the relationships between the two different types of users. Third, we sample a portion of the abnormal (normal) users into the normal (abnormal) graphs so that each graph consists of two different types of users. We sample 100 users per graph and generate 50 normal graphs and 50 abnormal graphs. The label of each graph is determined by the majority label of the users in the graph. Finally, we connect two graphs if they have more than one common user.
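A sketch of the per-group generator with NetworkX and NumPy; the Barabási–Albert attachment parameter, the random-graph edge probability, and the feature dimension are illustrative guesses, as the text does not specify them.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)


def make_group(n: int = 100, abnormal: bool = False, feat_dim: int = 16):
    """One synthetic user group: Barabasi-Albert for normal users, a random
    G(n, p) graph for abnormal users, with Gaussian node features whose
    moments are sampled from the ranges given in Sec. 5.2."""
    if abnormal:
        g = nx.gnp_random_graph(n, p=0.1, seed=0)
    else:
        g = nx.barabasi_albert_graph(n, m=2, seed=0)
    mean = rng.uniform(-5, 5)   # per-group feature mean in [-5, 5]
    std = rng.uniform(1, 10)    # per-group feature std in [1, 10]
    feats = rng.normal(mean, std, size=(n, feat_dim))
    return g, feats
```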
Case 1: Transductive learning. All graphs are available during training. Table 1 shows that the proposed GMGCN method outperforms all baselines. In particular, GMGCN improves NMI and ARI by more than 10% over the best-performing baseline results, demonstrating that the graph labels greatly improve node identification.
Case 2 & Case 3: Inductive learning. Only portions of the graphs or nodes are available during training. For the testing of new nodes and new graphs, we hold out a portion of the nodes from each graph and a portion of the graphs, respectively. GMGCN also outperforms the other baselines, as shown in Table 1. Therefore, GMGCN works in both transductive and inductive settings.
Case 4: With hierarchical graph. As shown in Table 1, the hierarchical graph structure promotes the performance of ATTGCN but has no effect on GMGCN. This indicates that GMGCN relies more on node-level updates than on graph-level updates to obtain good node clustering results.
Attention vs. GML. We compare ATTGCN with GMGCN to verify the importance of the GML. The results show that the GML greatly helps GMGCN identify nodes, whereas the attention mechanism alone fails.
Consensus loss. To analyze the effect of the proposed consensus loss, we compare the performance of GMGCN and GMGCN-noncon. The results in Table 1 indicate that the consensus loss enables GMGCN to identify node categories well.
Consequently, GMGCN's strong performance across all cases implies that the proposed method is a suitable solution for the IGI problem.
Visualization. We also visualize the node representations of two graphs randomly sampled from the synthetic dataset in a 2D space via t-SNE [van2014accelerating]. Figs. 3(a)-3(d) show the original node features as well as the node representations learned by feature clustering, graph clustering, and GMGCN, respectively. Different shapes represent nodes from different graphs, and different colors refer to different node categories. As shown in Fig. 3(d), GMGCN clearly distinguishes the two node categories while the other methods fail. This demonstrates that the node representations learned by GMGCN are suitable for IGI.
Table 2: NMI (top) and ARI (bottom), mean(std), on the five PHEME datasets.

| Method | Charlie Hebdo | Ferguson | Germanwings Crash | Ottawa Shooting | Sydney Siege |
|---|---|---|---|---|---|
| Feature Clustering | 25.46(0.00) | 24.56(0.00) | 48.84(0.00) | 34.56(0.00) | 19.45(0.00) |
| Graph Clustering | 3.26(2.26) | 1.08(0.71) | 0.66(0.50) | 4.49(3.07) | 4.93(1.92) |
| MIL | 5.69(0.00) | 4.08(0.00) | 0.61(0.00) | 0.60(0.00) | 19.68(0.00) |
| GMGCN | 47.51(3.27) | 48.35(4.08) | 48.85(2.14) | 32.58(3.63) | 41.00(3.93) |

| Method | Charlie Hebdo | Ferguson | Germanwings Crash | Ottawa Shooting | Sydney Siege |
|---|---|---|---|---|---|
| Feature Clustering | 23.46(0.00) | 23.23(0.00) | 47.29(0.00) | 32.08(0.00) | 16.45(0.00) |
| Graph Clustering | 6.56(6.30) | 4.04(2.24) | 3.71(1.93) | 10.66(7.68) | 11.78(5.57) |
| MIL | 9.38(0.00) | 13.93(0.00) | 1.10(0.00) | 2.09(0.00) | 39.51(0.00) |
| GMGCN | 52.38(3.15) | 55.26(3.21) | 54.95(1.02) | 37.51(2.23) | 37.79(2.18) |
5.3 GMGCN on PHEME Data
Data Description. PHEME consists of five sets of real-life tweets, each related to a piece of breaking news. Each breaking-news set includes many rumor and non-rumor topics; a detailed data description is presented in [zubiaga2016learning]. Based on this dataset, we construct the hierarchical graph structure by treating each topic as a graph, where links between users are formed according to their follows/retweets. We connect two graphs if they have more than one common user. We label users appearing in more than M rumor topics as abnormal users and the rest as normal users. This dataset therefore falls squarely within the statement of the IGI problem. The proposed method achieves the best results for each tested value of M; detailed results are shown in Appendix B. In this section, we only show the results for one setting of M.
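The hierarchical-graph construction reduces to a pairwise intersection test over user sets; a sketch, assuming a precomputed mapping from topic ids to the sets of user ids involved (this preprocessing is an assumption, not part of the released PHEME data):

```python
import itertools


def hierarchical_edges(topic_users: dict) -> list:
    """Link two topic graphs when they share more than one common user.
    topic_users maps a topic id to the set of user ids in that topic."""
    edges = []
    for a, b in itertools.combinations(topic_users, 2):
        if len(topic_users[a] & topic_users[b]) > 1:
            edges.append((a, b))
    return edges
```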
Clustering Results. As shown in Table 2, GMGCN resolves this IGI problem well, with substantial average improvements in NMI and ARI on these five PHEME datasets compared with the best-performing baseline methods. This significant improvement implies that real-world applications exist that fulfill the statement of the IGI problem. In contrast, the other methods perform poorly because IGI is a genuinely new problem that differs from feature clustering, graph clustering, and MIL; due to their improper assumptions, they all fail to reveal the truth.
6 Discussion and Future Work
In this work, we explore an interesting and novel problem: can we identify nodes given the identifications of graphs? In short, can we invert Graph Identification? The proposed IGI problem has notions and statements entirely different from node clustering, graph clustering, and multiple-instance learning. To address this new problem, we propose GMGCN, which integrates GMM and GNNs to resolve certain cases of the IGI problem. Experimental results on synthetic and real-world data reveal the need to formulate IGI and the advantage of the proposed GMGCN over related baselines. Nevertheless, our study is only an initial step towards the IGI problem, and a variety of extensions remain open. For example, can we infer the labels of edges or sub-graphs instead of nodes? Can we detect anomalous nodes rather than conduct node clustering? We hope our study will open up a new vein of graph learning and encourage more specifications, solutions, and developments for IGI.
Broader Impact
Nowadays, many studies address Graph Identification (GI), such as graph classification and graph regression. However, the inverse problem of using the information of graphs to infer the information of nodes, or even of edges and sub-graphs, has never been raised. In this paper, we present a new challenge named Inverse Graph Identification (IGI). As opposed to GI, IGI mainly explores whether we can identify nodes given the labels of the graphs they belong to. We also analyze various tasks belonging to the IGI problem so that more researchers can engage with it. By studying the IGI problem, many graph learning problems can be solved in a more appropriate and efficient way. The broader impact of our research can be summarized below:
For the social media community: By researching social media tasks as IGI problems, different types of information spreaders can be identified more accurately. For example, media reporters can use the proposed model to find appropriate interviewees, and advertisers can discover users in social networks for targeted advertising.
For the biology and chemistry communities: Biologists can divide proteins into groups according to their different interactions and then use our method to find proteins with specific functions. Chemists can employ our method to discover functional groups with special properties.
For the network security community: By dividing people into groups, network security officers can utilize our method to discover suspicious users on a LAN, and can also find suspicious operations or applications by grouping different operations or software together.
References
- [1] Jia Li, Yu Rong, Hong Cheng, Helen Meng, Wenbing Huang, and Junzhou Huang. Semi-supervised graph classification: A hierarchical graph perspective. In The World Wide Web Conference, pages 972–982, 2019.
- [2] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on pattern analysis and machine intelligence, 22(8):888–905, 2000.
- [3] Jay Pujara, Hui Miao, Lise Getoor, and William Cohen. Knowledge graph identification. In International Semantic Web Conference, pages 542–557. Springer, 2013.
- [4] Xinyi Zhou and Reza Zafarani. Fake news: A survey of research, detection methods, and opportunities. arXiv preprint arXiv:1812.00315, 2018.
- [5] Juan Cao, Junbo Guo, Xirong Li, Zhiwei Jin, Han Guo, and Jintao Li. Automatic rumor detection on microblogs: A survey. arXiv preprint arXiv:1807.03505, 2018.
- [6] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 1263–1272. JMLR.org, 2017.
- [7] Hehuan Ma, Yu Rong, Wenbing Huang, Tingyang Xu, Weiyang Xie, Geyan Ye, and Junzhou Huang. Dual message passing neural network for molecular property prediction. arXiv preprint arXiv:2005.13607, 2020.
- [8] Shan Lu, Zhenmin Li, Feng Qin, Lin Tan, Pin Zhou, and Yuanyuan Zhou. Bugbench: Benchmarks for evaluating bug detection tools. In Workshop on the evaluation of software defect detection tools, volume 5, 2005.
- [9] Michelle Girvan and Mark EJ Newman. Community structure in social and biological networks. Proceedings of the national academy of sciences, 99(12):7821–7826, 2002.
- [10] Thomas N. Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Proceedings of the International Conference on Learning Representations, 2017.
- [11] Marc-André Carbonneau, Veronika Cheplygina, Eric Granger, and Ghyslain Gagnon. Multiple instance learning: A survey of problem characteristics and applications. Pattern Recognition, 77:329–353, 2018.
- [12] Joan Bruna, Wojciech Zaremba, Arthur Szlam, and Yann LeCun. Spectral networks and locally connected networks on graphs. In International Conference on Learning Representations (ICLR 2014), 2014.
- [13] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pages 3844–3852, 2016.
- [14] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
- [15] Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020.
- [16] Muhan Zhang, Zhicheng Cui, Marion Neumann, and Yixin Chen. An end-to-end deep learning architecture for graph classification. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- [17] Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130, 2017.
- [18] Scott White and Padhraic Smyth. A spectral clustering approach to finding communities in graphs. In Proceedings of the 2005 SIAM international conference on data mining, pages 274–285. SIAM, 2005.
- [19] Matthew B Hastings. Community detection as an inference problem. Physical Review E, 74(3):035102, 2006.
- [20] Su-Yeon Kim, Tae-Soo Jung, Eui-Ho Suh, and Hyun-Seok Hwang. Customer segmentation and strategy development based on customer lifetime value: A case study. Expert systems with applications, 31(1):101–107, 2006.
- [21] Anil K Jain. Data clustering: 50 years beyond k-means. Pattern recognition letters, 31(8):651–666, 2010.
- [22] Geoffrey J McLachlan and Kaye E Basford. Mixture models: Inference and applications to clustering, volume 38. M. Dekker New York, 1988.
- [23] Xiao Wang, Peng Cui, Jing Wang, Jian Pei, Wenwu Zhu, and Shiqiang Yang. Community preserving network embedding. In Thirty-first AAAI conference on artificial intelligence, 2017.
- [24] Fei Tian, Bin Gao, Qing Cui, Enhong Chen, and Tie-Yan Liu. Learning deep representations for graph clustering. In Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
- [25] Thomas N Kipf and Max Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016.
- [26] Deyu Bo, Xiao Wang, Chuan Shi, Meiqi Zhu, Emiao Lu, and Peng Cui. Structural deep clustering network. In Proceedings of The Web Conference 2020, pages 1400–1410, 2020.
- [27] James Foulds and Eibe Frank. A review of multi-instance learning assumptions. The Knowledge Engineering Review, 25(1):1–25, 2010.
- [28] Thomas G Dietterich, Richard H Lathrop, and Tomás Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artificial intelligence, 89(1-2):31–71, 1997.
- [29] Stuart Andrews, Thomas Hofmann, and Ioannis Tsochantaridis. Multiple instance learning with generalized support vector machines. In AAAI/IAAI, pages 943–944, 2002.
- [30] Veronika Cheplygina, Marleen de Bruijne, and Josien PW Pluim. Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis. Medical image analysis, 54:280–296, 2019.
- [31] Pedro O Pinheiro and Ronan Collobert. From image-level to pixel-level labeling with convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1713–1721, 2015.
- [32] Jiajun Wu, Yinan Yu, Chang Huang, and Kai Yu. Deep multiple instance learning for image classification and auto-annotation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3460–3469, 2015.
- [33] Christopher M Bishop. Pattern recognition and machine learning. springer, 2006.
- [34] Todd K Moon. The expectation-maximization algorithm. IEEE Signal processing magazine, 13(6):47–60, 1996.
- [35] Arkaitz Zubiaga, Maria Liakata, and Rob Procter. Learning reporting dynamics during breaking news for rumour detection in social media. arXiv preprint arXiv:1610.07363, 2016.
- [36] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In Kdd, volume 96, pages 226–231, 1996.
- [37] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. science, 313(5786):504–507, 2006.
- [38] Soumya Ray and Mark Craven. Supervised versus multiple instance learning: An empirical comparison. In Proceedings of the 22nd international conference on Machine learning, pages 697–704, 2005.
- [39] Alexander Strehl and Joydeep Ghosh. Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of machine learning research, 3(Dec):583–617, 2002.
- [40] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of classification, 2(1):193–218, 1985.
- [41] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- [42] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. science, 286(5439):509–512, 1999.
- [43] Laurens Van Der Maaten. Accelerating t-sne using tree-based algorithms. The Journal of Machine Learning Research, 15(1):3221–3245, 2014.
- [44] John Taylor. Introduction to error analysis, the study of uncertainties in physical measurements. 1997.
- [45] Cornelis Joost Van Rijsbergen. Information retrieval. 1979.
- [46] David Martin Powers. Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation. 2011.
- [47] Gary Doran and Soumya Ray. A theoretical and empirical analysis of support vector machine methods for multiple-instance classification. Machine learning, 97(1-2):79–102, 2014.
Appendix A Supplementary Materials
A.1 Problem Statement
In this section, we introduce some other cases under the IGI problem. Most of them remain open problems awaiting further research.
Notations. We denote a set of graph instances as $\mathcal{G} = \{(G_i, y_i)\}_{i=1}^{N}$ with $N$ graphs, where $G_i$ refers to the $i$-th graph instance and $y_i \in \{1, \dots, C\}$ is the corresponding graph label out of $C$ different categories. We denote by $G_i = (V_i, E_i)$ the $i$-th graph instance with $n_i$ nodes $V_i$ and $m_i$ edges $E_i$; by $X_i$ the feature matrix of the nodes $V_i$; by $Z_i$ the feature matrix of the edges $E_i$; and by $A_i$ the adjacency matrix, which associates each edge in $E_i$ with an entry of $A_i$. We denote by $y_{ij}$ the potential labels of the nodes, by $s_{ij}$ the potential scores of the nodes, by $y_{ij}^{e}$ the potential labels of the edges, and by $y_{i}^{sub}$ the potential labels of the sub-graphs, all of which are invisible to model training. In addition to the cases mentioned in the Notations and Problem Statement section, more cases under the IGI problem are introduced below:
Case 5 (Infer Nodes Based on Semi-supervised Learning)
Given the whole $\mathcal{G}$, $\{X_i\}_{i=1}^{N}$, and a portion of the node labels as training data, how to infer the labels of the rest of the nodes in the graphs $G_i$?
This is a very common situation. For example, in the PHEME data, we may know a portion of the users' labels; the IGI problem then falls into Case 5. To cope with Case 5, however, the proposed GMGCN needs a dedicated loss to gain extra information from the labeled nodes in the graphs.
Case 6 (Node Ranking)
Given a set of graph-label pairs $\{(G_i, y_i)\}_{i=1}^{N}$ and node features $\{X_i\}_{i=1}^{N}$ as training data, how to obtain the node score $s_{ij}$ of the $j$-th node in the $i$-th graph, where $i \in \{1, \dots, N\}$ and $j \in \{1, \dots, n_i\}$?
For example, one application could be to find the influential nodes such as the community leader in the graph.
Case 7 (Infer Edges)
Given a set of graph-label pairs $\{(G_i, y_i)\}_{i=1}^{N}$, node features $\{X_i\}_{i=1}^{N}$, and edge features $\{Z_i\}_{i=1}^{N}$ as training data, how to infer the label $y_{ij}^{e}$ of the $j$-th edge in the $i$-th graph, where $i \in \{1, \dots, N\}$ and $j \in \{1, \dots, m_i\}$?
This case could be applied to edge identification, for example, identifying labels of indirect jump instructions (including indirect jump, indirect call, and function return instructions) in a Control Flow Graph (CFG).
Case 8 (Infer Sub-graphs)
Given a set of graph-label pairs $\{(G_i, y_i)\}_{i=1}^{N}$ and node features $\{X_i\}_{i=1}^{N}$ as training data, how to infer all the sub-graph labels in the graphs $G_i$?
This case refers to a sub-graph identification task, for example, inferring the roles of the functional groups in each molecule given its chemical or physical properties.
Case 9 (Abnormal Detection)
Given a set of graph-label pairs $\{(G_i, y_i)\}_{i=1}^{N}$ and node features $\{X_i\}_{i=1}^{N}$ as training data, where most nodes belong to one category (normal nodes) and only a few belong to the other category (abnormal nodes), how to find all the abnormal nodes in the graphs $G_i$?
Table 3: Anomaly detection results (mean(std)) on the SocialGroups dataset.

| Method | ACC | NMI | ARI | F1 | Pre@n |
|---|---|---|---|---|---|
| K-Means | 99.13(0.00) | 0.00(0.00) | 0.00(0.00) | 49.78(0.00) | 0.00(0.00) |
| GMM | 89.31(29.48) | 0.00(0.00) | 0.00(0.00) | 44.89(14.68) | 0.00(0.00) |
| DBSCAN | 0.00(0.00) | 1.11(0.00) | 2.10(0.00) | 0.00(0.00) | 1.71(0.00) |
| AE | 56.99(0.26) | 5.86(0.06) | 0.46(0.08) | 39.47(0.13) | 2.00(0.13) |
| GAE | 56.99(0.26) | 5.86(0.06) | 0.46(0.08) | 39.47(0.13) | 2.00(0.13) |
| VGAE | 57.77(1.38) | 5.39(0.13) | 0.18(0.19) | 38.93(0.60) | 2.25(0.29) |
| SDCN | 91.58(0.00) | 0.00(0.00) | 0.00(0.00) | 47.66(0.00) | 0.00(0.00) |
| SIL | 93.45(0.00) | 0.01(0.00) | 0.25(0.00) | 48.75(0.00) | 4.59(0.00) |
| GMGCN | 76.09(7.55) | 7.09(2.23) | 0.02(0.14) | 45.98(1.21) | 0.71(0.80) |
The limitation of GMGCN on Case 9. The anomaly detection case under the IGI problem is a task that the proposed GMGCN method and related clustering methods fail to cope with. We observed this limitation of the proposed GMGCN method and the baselines on a SocialGroups dataset provided by an anonymous company. The SocialGroups dataset consists of 2039 online social groups, each with no more than 100 users; on average, there are 2.9 abnormal users per graph. Each baseline method is elaborated in the next section, and we adopt five metrics, namely Accuracy (ACC) [44], Normalized Mutual Information (NMI) [39], Adjusted Rand Index (ARI) [40], F1 score (F1) [45], and Precision@n (Pre@n) [46], where n varies with the number of abnormal nodes in each graph, to measure the effectiveness of anomaly detection. As shown in Table 3, every method performs very poorly, especially on the Pre@n metric. This indicates that for such extremely imbalanced data, the clustering methods, even GMGCN, are unable to effectively find the abnormal nodes in each graph. Therefore, further studies are needed on the anomaly detection case of the IGI problem, along with novel models to solve it effectively.
Of course, there are many more cases under the IGI problem, and we leave their exploration to future research.
A.2 Detailed Experimental Results
In this section, we present the detailed experimental results corresponding to the Experiments section. First, we introduce the baseline methods mentioned there. Then, we present the detailed results on the synthetic and PHEME datasets. Our code will be available on GitHub (https://github.com/IGIproblem/IGIproblem).
A.2.1 Baselines
Feature Clustering Methods:

- K-means [21]: A classical clustering method that partitions data points into clusters such that each data point belongs to the cluster with the nearest centroid.
- GMM [22]: A probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters.
- DBSCAN [36]: A density-based algorithm for discovering clusters in large spatial databases with noise.
- AE [37]: A two-stage deep clustering algorithm: we perform GMM on the representations learned by an autoencoder.

Graph Clustering Methods:

- GAE & VGAE [25]: Structural deep clustering models that combine GCN with the (variational) autoencoder to learn representations; we perform GMM on the representations learned by the graph autoencoder.
- SDCN [26]: A structural deep clustering network model that integrates structural information into deep clustering.

Multiple Instance Learning:

- SIL [38]: Single-Instance Learning (SIL) is an MIL approach that assigns each instance the label of its bag, creating a supervised learning problem but mislabeling negative instances in positive bags.
We implement k-means, GMM, and DBSCAN with scikit-learn (https://scikit-learn.org); AE, GAE, VGAE, and SDCN with PyTorch (https://pytorch.org/); and SIL with a Python implementation created by [47]. To be consistent with related works [26], the dimensions of the AE are set to d-500-500-2000-10, where d is the dimension of the input data, and the dimensions of GAE and VGAE are set to d-256-16. We train the AE for 100 epochs, GAE and VGAE for 400 epochs, and SDCN for 200 epochs.
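For reference, the second stage of the two-stage AE/GAE/VGAE baselines amounts to fitting a GMM on the learned embeddings; a minimal sketch with scikit-learn (the function name and default cluster count are ours):

```python
from sklearn.mixture import GaussianMixture


def cluster_embeddings(Z, n_clusters: int = 2):
    """Fit a GMM on learned embeddings Z of shape (n_samples, dim) and
    return hard cluster assignments for evaluation with NMI/ARI."""
    return GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(Z)
```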