Enhancing Link Prediction with Fuzzy Graph Attention Networks and Dynamic Negative Sampling

Jinming Xing¹, Ruilin Xing² ¹North Carolina State University ²Guangxi University [email protected], [email protected]

Abstract

Link prediction is crucial for understanding complex networks but traditional Graph Neural Networks (GNNs) often rely on random negative sampling, leading to suboptimal performance. This paper introduces Fuzzy Graph Attention Networks (FGAT), a novel approach integrating fuzzy rough sets for dynamic negative sampling and enhanced node feature aggregation. Fuzzy Negative Sampling (FNS) systematically selects high-quality negative edges based on fuzzy similarities, improving training efficiency. FGAT layer incorporates fuzzy rough set principles, enabling robust and discriminative node representations. Experiments on two research collaboration networks demonstrate FGAT’s superior link prediction accuracy, outperforming state-of-the-art baselines by leveraging the power of fuzzy rough sets for effective negative sampling and node feature learning.

Index Terms:

Link Prediction, Graph Neural Networks, Fuzzy Rough Sets, Negative Sampling

I Introduction

Link prediction has emerged as a crucial task in network analysis with extensive applications across diverse domains. In medical sciences, it aids in predicting protein-protein interactions and drug-target associations; in financial systems, it helps detect fraudulent transactions and assess credit risks; and in chemistry, it facilitates the discovery of novel molecular structures and chemical reactions. The ability to accurately predict potential connections in these complex networks has significant implications for scientific advancement and practical applications.

Graph Neural Networks (GNNs) have demonstrated remarkable success in link prediction tasks, primarily due to their inherent capability to capture and process structural information in graph-structured data. However, a critical limitation in existing GNN-based approaches lies in their negative sampling methodology. Contemporary methods typically employ random sampling strategies to select negative edges, disregarding the rich semantic and structural information encoded in node representations. This oversight significantly hampers the training process, resulting in slower convergence rates and suboptimal model performance. An ideal negative sampling mechanism should not only leverage node embeddings effectively but also adaptively select high-quality negative samples based on the model’s current state, ensuring both dynamic responsiveness and sampling accuracy.

While various methodologies have been explored to enhance link prediction accuracy, the potential of fuzzy rough sets—a mathematical framework for measuring fuzzy relations and handling uncertainty—remains largely unexplored in the context of GNNs and link prediction. This theoretical framework offers unique advantages in capturing imprecise relationships and handling ambiguous data structures, making it particularly suitable for network analysis tasks.

To address these limitations and leverage the untapped potential of fuzzy rough sets, we propose a novel fuzzy rough sets-based negative sampling strategy called Fuzzy Negative Sampling (FNS). This approach systematically evaluates candidate negative edges through their fuzzy lower approximation values, selecting the top K candidates as negative training instances. Furthermore, we introduce Fuzzy Graph Attention Network (FGAT), an enhanced graph neural architecture designed to aggregate neighboring node information in a more robust and effective manner.

The main contributions of this work can be summarized as follows:

•

We introduce FNS, a novel negative sampling framework that leverages fuzzy rough sets theory to identify high-quality negative edges, significantly improving the effectiveness of the training process in link prediction tasks.
•

We propose FGAT, an innovative graph attention network that incorporates fuzzy rough set principles to achieve more robust and discriminative node representations.
•

We conduct comprehensive experiments across two real-world datasets, demonstrating the effectiveness of our proposed framework.

II Related Work

Graph Neural Networks have demonstrated remarkable versatility across various graph-based learning tasks. In node classification, seminal works like GraphSAGE [1] and Graph Attention Networks (GAT) [2] have established foundational approaches for learning node representations through neighborhood aggregation. GCN [3] introduced convolutional operations on graphs, enabling efficient feature propagation across network structures. For graph classification tasks, hierarchical pooling mechanisms have been developed, with DiffPool [4] and TopKPool [5] proposing learnable strategies to generate graph-level representations. In the context of link prediction, SEAL [6] pioneered the use of local subgraphs for edge existence prediction, while VGAE [7] employed variational autoencoders for learning edge formation patterns. Recent advances include NGNN [8], which introduces neural architecture improvements specifically designed for link prediction tasks.

Fuzzy rough sets theory, initially proposed by Dubois and Prade [9], has evolved into a powerful framework for handling uncertainty and imprecision in data analysis. In feature selection, Jensen and Shen [10] developed fuzzy-rough attribute reduction algorithms that significantly outperform traditional approaches in identifying relevant features while maintaining information fidelity. The application of fuzzy rough sets in medical diagnosis has been exemplified by works such as [11], where they effectively handle the inherent uncertainty in patient data for more accurate disease classification. For uncertainty measurement, the framework has been extensively studied in theoretical works by [12] and [13], establishing mathematical foundations for quantifying various types of uncertainty in data relationships. Recent developments include hybrid approaches combining fuzzy rough sets with deep learning and applications in big data analytics [14] , demonstrating the framework’s adaptability to modern computational challenges.

Except the innovative applications of Graph Neural Networks and fuzzy rough set theory in handling complex data and multi-task learning, against the backdrop of rapid advancements in machine learning, applications like knowledge graph technology in intelligent question-answering systems have become a key area of development for graph learning methods, effectively integrating multiple data sources to support flexible knowledge processing[15]. Similarly, multi-model integration technology applied to automated generation systems has significantly enhanced content generation flexibility and quality, providing valuable insights for representation learning based on graph data[16]. In network security, Graph Neural Networks have demonstrated strong generalization capabilities in supporting Botnet detection through machine learning, precisely identifying abnormal behaviors and enhancing network defense levels[17].

In recent years, the scope of machine learning applications has continuously expanded, especially in image processing and VR fields. Blockchain-enhanced image retrieval systems, for example, have brought revolutionary improvements in data security and retrieval efficiency[18]. In VR and robotic interaction, AI-vision-powered intelligent systems have explored new methods for balancing real-world interaction and virtual immersion, making human-computer interaction more natural and seamless[19]. Furthermore, the integration of fuzzy rough set theory with deep learning has also achieved breakthroughs in traffic data prediction, enabling more accurate short-term forecasting through multi-source data fusion, effectively adapting to the dynamic demands of complex data environments[20]. Overall, these technological innovations not only showcase the broad applicability of graph learning methods and fuzzy rough sets across different task scenarios but also reinforce their theoretical and practical value in big data and intelligent applications.

III Methodology

Figure 1 illustrates our proposed framework, which consists of two main components:

•

Fuzzy Negative Sampling: A mechanism that selects high-quality negative edges based on fuzzy similarities, where negative edges with high fuzzy similarity are dynamically selected for the FGAT framework’s training.
•

FGAT Convolution Layer: A specially designed layer for effective neighbor node information aggregation. Multiple FGAT convolution layers are stacked to capture multi-hop information.

In the following sections, we first detail the computation of fuzzy similarities using fuzzy rough sets for high-quality negative edge selection. Subsequently, we elaborate on the FGAT architecture, followed by a comprehensive framework summary.

Refer to caption — Figure 1: The FGAT Framework

III-A Fuzzy Negative Sampling

A fuzzy information system is defined as a tuple $(U,A,V,f)$ , where $U$ represents a non-empty finite set of samples, $A$ denotes the finite set of sample attributes, $V$ represents the domain of all attributes in $A$ , expressed as $V=\bigcup_{i}V_{i}$ where $V_{i}$ is the domain of attribute $i$ , and $f$ is a mapping function $U\times A\rightarrow V$ [11].

For an attribute set $B\subseteq A$ and a fuzzy equivalence relation $R$ , we can compute a coverage of the universe $U$ . For a sample $x$ , we denote its coverage under the fuzzy equivalence relation $R$ as $[x]_{R}$ . The membership of a sample $y$ to the coverage $[x]_{R}$ is defined as $[x]_{R}(y)=R(x,y)$ , where $R(x,y)$ quantifies the similarity between samples $x$ and $y$ under relation $R$ . For any sample $x\in U$ and subset $X\subseteq A$ , the fuzzy lower and upper approximations of sample $x$ to $X$ are defined as [11]:

\begin{split}\underline{R_{S}}X(x)&=\inf_{y\in U}S(N(R(x,y)),X(y)),\\ \overline{R_{T}}X(x)&=\sup_{y\in U}T(R(x,y),X(y))\end{split}

(1)

where $S$ and $T$ represent fuzzy triangular conorm (S-norm) and fuzzy triangular norm (T-norm) respectively, and $N(x)=1-x$ .

Using the conventional min-max version of $T$ and $S$ norms, for a set of samples $d_{i}$ of class $i$ and corresponding attribute set $B\subseteq A$ , Equation 1 can be reformulated as:

\begin{split}\underline{R_{B}}d_{i}(x)&=\inf_{y\in U}\max(1-R(x,y),d_{i}(y)),\\ \overline{R_{B}}d_{i}(x)&=\sup_{y\in U}\min(R(x,y),d_{i}(y))\end{split}

(2)

To capture non-linear high-level similarities, $R$ typically employs kernel functions, including the Gaussian kernel: $k_{G}(x,y)=\exp(-\frac{||x-y||^{2}}{\delta})$ , exponential kernel: $k_{E}(x,y)=\exp(-\frac{||x-y||}{\delta})$ , and rational quadratic kernel: $k_{R}(x,y)=1-\frac{||x-y||^{2}}{||x-y||^{2}+\delta}$ .

During each training epoch, negative links are dynamically selected based on their quality scores. For any potential negative link with end nodes $(x,y)$ , the quality score is computed as:

Score(x,y)=\alpha\times\underline{R_{B}}d_{y}(x)+(1-\alpha)\times\underline{R_{B}}d_{x}(y)

(3)

where $\alpha$ is a hyperparameter.

While computing quality scores for all possible negative edges and selecting the top k would be optimal, this approach becomes computationally intractable for large dense graphs. For a graph with $N$ nodes and $E$ edges, there exist $N\times(N-1)-E$ potential directed negative edges. To address this computational challenge, we randomly select $2E$ negative edges and select the top $E$ edges among them. This strategy reduces the computational complexity from $N\times(N-1)-E$ to $2E$ while maintaining near-optimal performance.

The selected top $E$ negative edges are combined with the original positive edges to form the training dataset. To prevent class imbalance issues, we maintain an equal number of selected negative edges and original positive edges.

III-B FGAT Convolution Layer

The FGAT convolution layer integrates GAT convolution layers with linear layers, incorporating layer normalization for training acceleration and dropout mechanisms for effective regularization.

Given an undirected graph $G=(V,E)$ , where $V$ represents the set of nodes and $E$ denotes the set of edges, each node $v\in V$ is associated with a feature vector $\mathbf{h}_{v}\in\mathbb{R}^{F}$ , where $F$ represents the dimension of input features per node. The FGAT layer aims to compute updated node representations $\mathbf{h}_{v}^{\prime}\in\mathbb{R}^{F^{\prime}}$ , where $F^{\prime}$ denotes the output feature dimension, by performing weighted aggregation of features from each node’s neighborhood.

For a node pair consisting of node $v$ and its neighbor $u$ , the attention coefficient $e_{vu}$ is computed through:

e_{vu}=\text{LeakyReLU}\left(\mathbf{a}^{T}\left[\mathbf{W}\mathbf{h}_{v}\parallel\mathbf{W}\mathbf{h}_{u}\right]\right)

(4)

where:

•

$\mathbf{W}\in\mathbb{R}^{F^{\prime}\times F}$ represents a learnable weight matrix that transforms node features linearly.
•

$\parallel$ indicates vector concatenation.
•

$\mathbf{a}\in\mathbb{R}^{2F^{\prime}}$ denotes a learnable weight vector.
•

LeakyReLU serves as the activation function, typically configured with a small negative slope (e.g., 0.2).

The attention coefficients then undergo normalization across each node’s neighborhood using the softmax function:

\alpha_{vu}=\frac{\exp(e_{vu})}{\sum_{k\in\mathcal{N}(v)}\exp(e_{vk})}

(5)

where $\mathcal{N}(v)$ represents the neighborhood set of node $v$ .

The normalized attention scores $\alpha_{vu}$ facilitate the computation of updated node features $\mathbf{h}_{v}^{\prime}$ through weighted aggregation:

\mathbf{h}_{v}^{\prime}=\sigma\left(\sum_{u\in\mathcal{N}(v)}\alpha_{vu}\mathbf{W}\mathbf{h}_{u}\right)

(6)

where $\sigma$ represents a non-linear activation function, typically implemented as ReLU.

To enhance model robustness and representational capacity, the GAT layers employ multi-head attention mechanisms. Specifically, K independent attention heads operate in parallel, each generating distinct attention coefficients and feature representations. These representations are subsequently concatenated to produce the final output:

\mathbf{h}_{v}^{\prime}=\parallel_{k=1}^{K}\sigma\left(\sum_{u\in\mathcal{N}(v)}\alpha_{vu}^{(k)}\mathbf{W}^{(k)}\mathbf{h}_{u}\right)

(7)

where $\alpha_{vu}^{(k)}$ and $\mathbf{W}^{(k)}$ correspond to the attention coefficient and weight matrix of the k-th attention head, respectively.

Layer normalization [21] is incorporated to stabilize and expedite the training process by normalizing layer inputs. For an input vector $\mathbf{h}=[h_{1},h_{2},\dots,h_{d}]$ with $d$ features, the normalized output $\mathbf{\hat{h}}=[\hat{h}_{1},\hat{h}_{2},\dots,\hat{h}_{d}]$ is computed as:

\hat{h}_{i}=\frac{h_{i}-\mu}{\sqrt{\sigma^{2}+\epsilon}}

(8)

where $\mu=\frac{1}{d}\sum_{i=1}^{d}h_{i}$ represents the mean, $\sigma^{2}=\frac{1}{d}\sum_{i=1}^{d}(h_{i}-\mu)^{2}$ denotes the variance, and $\epsilon$ is a small constant ensuring numerical stability. The final output $y$ is obtained through the application of learnable scaling parameter $\gamma$ and bias term $\beta$ :

y_{i}=\gamma\hat{h}_{i}+\beta

(9)

III-C The FGAT Framework

As illustrated in Figure 1, the FGAT framework operates through a systematic process that begins with dynamic negative edge selection during each training epoch, utilizing the given adjacency matrix. These dynamically selected negative edges, along with the existing positive edges, are subsequently processed by the FGAT layer in conjunction with their corresponding node embeddings. To effectively capture long-range dependencies within the graph structure, multiple FGAT layers are cascaded, with residual connections implemented to enhance training stability and information flow. Following the iterative processing through these layers, we obtain updated node representations $H=\{h_{1},h_{2},\dots,h_{N}\}$ . The probability of link existence between any pair of nodes $x$ and $y$ is then computed as:

P_{r}^{link}(x,y)=\text{Sigmoid}(h_{x}h_{y}^{T})

(10)

The framework’s effectiveness stems from two key components: the fuzzy negative sampling technique, which efficiently identifies and selects high-quality negative edges, and the FGAT layer architecture, which performs iterative neighbor information aggregation. The empirical validation of this framework’s performance is documented in the experiments section, demonstrating its effectiveness in link prediction tasks.

IV Experiments

We conduct comparative evaluations of FGAT against several state-of-the-art baselines using two research collaboration network datasets. The experimental results demonstrate the superior performance of FGAT. In this section, we present detailed information about the datasets, experimental settings, evaluation metrics, and analysis of results.

IV-A Datasets

Our evaluation utilizes two research collaboration network datasets, summarized in Table I.

TABLE I: Datasets Summary

	Ca-netscience	Ca-sandi-auths
#Nodes	379	86
#Edges	914	124
#AvgDegree	2.4	1.4
Directed	TRUE	TRUE
%Training Edges	0.7	0.7
%Validation Edges	0.1	0.1
%Testing Edges	0.2	0.2

TABLE II: Experiment Results

Methods	Ca-netscience				Ca-sandi-auths
Methods	Precision	Recall	F1	ROC	Precision	Recall	F1	ROC
MLP	0.4931	0.5879	0.5363	0.5044	0.5581	1.0000	0.7164	0.6181
GCN	0.5903	0.7363	0.6553	0.7078	0.5417	0.5417	0.5417	0.5304
GraphSAGE	0.5502	0.8132	0.6563	0.6277	0.4651	0.8333	0.5970	0.5885
GAT	0.6034	0.7692	0.6763	0.6916	0.6053	0.9583	0.7419	0.7240
FGAT	0.6667	0.6593	0.6630	0.7422	0.6216	0.9583	0.7541	0.7170

The Ca-netscience dataset comprises 379 nodes and 914 edges, with an average node degree of 2.4. In contrast, Ca-sandi-auths exhibits a more sparse structure with fewer nodes, edges, and a lower average node degree. Both datasets are directed networks. For experimental purposes, we employ a 70-10-20 split ratio, where 70% of the data is used for training, 10% for validation (early stopping), and 20% for testing.

IV-B Experiment Settings and Evaluation Metrics

We benchmark FGAT against several prominent baseline models: MLP, GCN [3], GraphSAGE [1], and GAT [2]. All baseline models maintain their default parameter configurations. For FGAT implementation, we configure the embedding dimension to 128 and employ a stack of 4 FGAT convolution layers. The dataset partitioning follows the aforementioned 0.7:0.1:0.2 ratio for training, validation, and testing, respectively.

To ensure a comprehensive performance assessment, we employ multiple evaluation metrics: Precision, Recall, F1 score, and ROC score. This diverse set of metrics provides a multifaceted evaluation, enabling thorough analysis of each model’s capabilities across different performance aspects.

IV-C Results

The experimental results are presented in Table II, yielding several significant observations:

•

On the Ca-netscience dataset, MLP demonstrates the poorest performance, attributable to its inability to capture spatial information encoded in the adjacency matrix. GCN, GraphSAGE, and GAT exhibit comparable performance levels, with GraphSAGE achieving superior Recall scores and GAT excelling in F1 metrics. For the Ca-sandi-auths dataset, MLP achieves notable Recall but relatively inferior Precision, suggesting overfitting tendencies and limited generalization capability. GAT, leveraging its attention mechanism, achieves the highest ROC scores among baseline methods.
•

FGAT outperforms baseline methods across both datasets in terms of the average of four evaluation metrics. Specifically, it demonstrates an average improvement of 7.11% across all metrics on Ca-netscience, and a more substantial 15.55% improvement on Ca-sandi-auths. This superior performance can be attributed to two key factors: the fuzzy negative sampling mechanism, which enables focused learning on error-prone edges, and the FGAT layer architecture, which provides robust message aggregation capabilities.

V Conclusion

The proposed FGAT framework, combining FNS and a novel graph attention layer, significantly improves link prediction performance compared to existing methods. FNS effectively identifies informative negative edges by leveraging fuzzy rough sets, leading to more focused and efficient model training. The FGAT layer, integrating fuzzy set concepts, captures complex relationships in graph data, resulting in superior node representations for accurate link prediction. The paper’s findings highlight the potential of fuzzy rough sets in advancing GNNs for link prediction tasks and pave the way for future research exploring fuzzy set theory in graph-based learning.

References

[1] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” Advances in neural information processing systems, vol. 30, 2017.
[2] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, “Graph attention networks,” arXiv preprint arXiv:1710.10903, 2017.
[3] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
[4] Z. Ying, J. You, C. Morris, X. Ren, W. Hamilton, and J. Leskovec, “Hierarchical graph representation learning with differentiable pooling,” Advances in neural information processing systems, vol. 31, 2018.
[5] F. Diehl, “Edge contraction pooling for graph neural networks,” arXiv preprint arXiv:1905.10990, 2019.
[6] M. Zhang and Y. Chen, “Link prediction based on graph neural networks,” Advances in neural information processing systems, vol. 31, 2018.
[7] T. N. Kipf and M. Welling, “Variational graph auto-encoders,” arXiv preprint arXiv:1611.07308, 2016.
[8] M. Zhang and P. Li, “Nested graph neural networks,” Advances in Neural Information Processing Systems, vol. 34, pp. 15 734–15 747, 2021.
[9] D. Dubois and H. Prade, “Rough fuzzy sets and fuzzy rough sets,” International Journal of General System, vol. 17, no. 2-3, pp. 191–209, 1990.
[10] R. Jensen and Q. Shen, “Fuzzy–rough attribute reduction with application to web categorization,” Fuzzy sets and systems, vol. 141, no. 3, pp. 469–485, 2004.
[11] J. Xing, C. Gao, and J. Zhou, “Weighted fuzzy rough sets-based tri-training and its application to medical diagnosis,” Applied Soft Computing, vol. 124, p. 109025, 2022.
[12] C. Gao, J. Zhou, J. Xing, and X. Yue, “Parameterized maximum-entropy-based three-way approximate attribute reduction,” International Journal of Approximate Reasoning, vol. 151, pp. 85–100, 2022.
[13] J. Ye, J. Zhan, W. Ding, and H. Fujita, “A novel fuzzy rough set model with fuzzy neighborhood operators,” Information Sciences, vol. 544, pp. 266–297, 2021.
[14] W. Ji, Y. Pang, X. Jia, Z. Wang, F. Hou, B. Song, M. Liu, and R. Wang, “Fuzzy rough sets and fuzzy rough neural networks for feature selection: A review,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 11, no. 3, p. e1402, 2021.
[15] C. Lin, S. Chen, X. Yang, C. Li, C. Qu, and Q. Chen, “Research and application of knowledge graph technology for intelligent question answering,” in 2021 12th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP). IEEE, 2021, pp. 152–156.
[16] X. Yang, “Research on automatic composition based on multiple machine learning models,” in 2021 3rd International Conference on Artificial Intelligence and Advanced Manufacture, 2021, pp. 1206–1209.
[17] X. Yang, Z. Guo, and Z. Mai, “Botnet detection based on machine learning,” in 2022 International Conference on Blockchain Technology and Information Security (ICBCTIS). IEEE, 2022, pp. 213–217.
[18] S. Zhao, J. Xie, C. Lin, X. Nie, J. Ye, X. Yang, and P. Xu, “Image retrieval based on blockchain,” in 2022 International Conference on Blockchain Technology and Information Security (ICBCTIS). IEEE, 2022, pp. 210–212.
[19] X. Yang, Y. Zhan, Y. Iwasaki, M. Shi, S. Tang, and H. Iwata, “Balancing real-world interaction and vr immersion with ai vision robotic arm,” in 2023 IEEE International Conference on Mechatronics and Automation (ICMA). IEEE, 2023, pp. 2051–2057.
[20] X. Deng, H. Zhou, X. Yang, and C. Ye, “Short-term traffic condition prediction based on multi-source data fusion,” in International Conference on Data Mining and Big Data. Springer, 2021, pp. 327–335.
[21] R. Xiong, Y. Yang, D. He, K. Zheng, S. Zheng, C. Xing, H. Zhang, Y. Lan, L. Wang, and T. Liu, “On layer normalization in the transformer architecture,” in International Conference on Machine Learning. PMLR, 2020, pp. 10 524–10 533.