Beyond Generalization: A Survey of Out-Of-Distribution Adaptation on Graphs
Abstract
Distribution shifts on graphs, i.e., the discrepancies in data distribution between training and testing a graph machine learning model, are ubiquitous and often unavoidable in real-world scenarios. Such shifts may severely degrade model performance, posing significant challenges for reliable graph machine learning. Consequently, there has been a surge in research on graph Out-Of-Distribution (OOD) adaptation methods that aim to mitigate the distribution shifts and adapt the knowledge from one distribution to another. In this survey, we provide an up-to-date and forward-looking review of graph OOD adaptation methods, covering two main problem scenarios: training-time and test-time graph OOD adaptation. We start by formally formulating the two problems and then discuss different types of distribution shifts on graphs. Based on our proposed taxonomy for graph OOD adaptation, we systematically categorize the existing methods according to their learning paradigm and investigate the techniques behind them. Finally, we point out promising research directions and the corresponding challenges. We also provide a continuously updated reading list at https://github.com/kaize0409/Awesome-Graph-OOD-Adaptation.git
1 Introduction
Motivated by the prevalence of graph-structured data in various real-world scenarios, growing attention has been paid to graph machine learning, which seeks to efficiently capture relationships and dependencies among entities within graphs. In particular, Graph Neural Networks (GNNs) are able to effectively learn the representations on graphs through message-passing Kipf and Welling (2017); Wu et al. (2019a); Hamilton et al. (2017), which have demonstrated remarkable success across diverse applications, such as social networks, physics problems, and traffic networks Bi et al. (2023b); Liu et al. (2023); Zhu et al. (2021b).
While graph machine learning has achieved notable success, most existing efforts presume that test data follows the same distribution as training data, an assumption that often fails in the wild. The performance of traditional graph machine learning methods may substantially degrade when confronted with Out-Of-Distribution (OOD) samples, limiting their efficacy in high-stakes graph applications Li et al. (2022). Numerous methods have been proposed to tackle distribution shifts for Euclidean data Zhuang et al. (2020); Liang et al. (2023); Fang et al. (2022). However, the applicability of these methods to graphs is limited, as the interconnected entities on graphs violate the IID assumption underlying traditional machine learning methods. Moreover, the complex graph shift types present new challenges. These shifts can occur in different modalities, including features, structures, and labels, and can manifest in various forms such as variations in graph sizes, subgraph densities, and homophily Chen et al. (2022b). Given these obstacles, increasing research effort has been devoted to improving the reliability of graph machine learning against distribution shifts, ranging from graph OOD generalization Li et al. (2022); Chen et al. (2022b) to graph OOD adaptation Zhu et al. (2021a); Liu et al. (2023).
Compared to graph OOD generalization, which assumes the model has no access to target data and aims to achieve satisfactory generalization performance on any unseen distribution, graph OOD adaptation takes a step further by efficiently incorporating information from the target distribution. With the goal of training or tuning a model to perform well under the specific target distribution, graph OOD adaptation methods excel in scenarios where integrating information from the partially observable target data is crucial, such as transferring knowledge from the well-labeled air transport network in one region to the unlabeled air transport network in another region Zhu et al. (2021b), or dealing with distribution discrepancies in time-evolving citation networks Zhu et al. (2022). While several surveys have extensively investigated graph OOD generalization and its closely related techniques Li et al. (2022); Xia et al. (2022), a systematic review of graph OOD adaptation has been overlooked, despite the significance and fast growth of the area.
With recent progress on graph OOD adaptation, an up-to-date and forward-looking review of this critical problem is urgently needed. In this survey, we provide, to the best of our knowledge, the first formal and systematic review of the literature on graph OOD adaptation. We start by formally formulating the problems and discussing different graph distribution shift types in graph machine learning. Afterward, we propose a new taxonomy for graph OOD adaptation that classifies existing methods into two categories based on the model learning scenario: (1) training-time graph OOD adaptation, where the distribution adaptation happens during model training on both source and target distributions You et al. (2023); Zhu et al. (2023a), and (2) test-time graph OOD adaptation, where the adaptation is performed based on a model pre-trained on the source distribution Jin et al. (2023); Zhu et al. (2023b). For each of the problems, we further categorize the existing methods as model-centric approaches and data-centric approaches. Within each subline of research, we elaborate on the detailed techniques for mitigating distribution shifts on graphs. Based on the current progress on graph OOD adaptation, we also point out several promising research directions in this evolving field.
2 Graph Out-Of-Distribution Adaptation
2.1 Problem Definition
Let $\mathcal{V}$ denote the node set of a graph $G = (A, X)$, where $A$ is the adjacency matrix and $X$ is the node feature matrix. Denote a graph model characterized by parameters $\theta$ as $f_\theta$. For node-level or edge-level tasks, we adopt a local view and fragment the graph into a set of k-hop subgraphs around the focal node or edge to accommodate the non-IID nature of graph entities, following previous works Wu et al. (2022c); Zhu et al. (2021b). Consequently, for node-level tasks, the graph model can be written as $f_\theta: G_v \mapsto y_v$, where $G_v$ and $y_v$ denote the k-hop subgraph for learning the node representation and the label of the node $v$; for edge-level tasks, the graph model can be written as $f_\theta: G_e \mapsto y_e$, where $G_e$ and $y_e$ represent the k-hop subgraph for learning the edge representation and the label of the edge $e$; and for graph-level tasks, the model can be written as $f_\theta: G \mapsto y$, where $y$ is the label of the entire graph. As a whole, the model can be denoted as $f_\theta: G \mapsto Y$, where $Y$ represents the label (matrix) of the graph. Without loss of generality, we focus on node-level tasks in the following problem definition, which extends naturally to edge-level and graph-level tasks.
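The local view described above can be sketched in a few lines. The following is a minimal illustration, assuming an undirected graph stored as a plain adjacency dictionary; the function name `k_hop_subgraph` and the toy path graph are hypothetical and are not taken from any of the surveyed methods.

```python
from collections import deque

def k_hop_subgraph(adj_list, center, k):
    """Return the nodes and edges of the k-hop subgraph around `center`.

    adj_list: dict mapping each node to a list of its neighbours
              (undirected graph; a toy representation assumed for this sketch).
    """
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:          # do not expand beyond k hops
            continue
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = set(dist)
    # Keep every edge whose two endpoints both lie inside the subgraph.
    edges = {(u, v) for u in nodes for v in adj_list[u] if v in nodes and u < v}
    return nodes, edges

# Toy path graph 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nodes, edges = k_hop_subgraph(path, center=2, k=1)
```

Each focal node then contributes one (subgraph, label) instance, which is what allows the node-level problem to be stated over instances rather than over the whole graph.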
Problem Definition 1.
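(Training-time Graph OOD Adaptation): Let $\mathcal{V}_S$ denote the node set for instances from the source distribution $P_S(G, Y)$, and $\mathcal{V}_T = \mathcal{V}_T^{l} \cup \mathcal{V}_T^{u}$ denote the node set for labeled and unlabeled instances from the target distribution $P_T(G, Y)$. Given source instances $\{(G_v, y_v)\}_{v \in \mathcal{V}_S}$ and target instances $\{(G_v, y_v)\}_{v \in \mathcal{V}_T^{l}} \cup \{G_v\}_{v \in \mathcal{V}_T^{u}}$, and under the assumption that there exist distribution shifts between source and target, $P_S(G, Y) \neq P_T(G, Y)$, the goal of training-time graph OOD adaptation is to learn an optimal model $f_{\theta^*}$ based on the given instances, such that
$$\theta^* = \arg\min_{\theta} \; \mathbb{E}_{(G_v, y_v) \sim P_T(G, Y)} \big[ \ell\big(f_\theta(G_v), y_v\big) \big], \tag{1}$$
where $\ell$ is a task-specific loss function.
Problem Definition 2.
(Test-time Graph OOD Adaptation): Given a model $f_{\theta_S}$ pre-trained on source instances $\{(G_v, y_v)\}_{v \in \mathcal{V}_S}$, and target instances $\{(G_v, y_v)\}_{v \in \mathcal{V}_T^{l}} \cup \{G_v\}_{v \in \mathcal{V}_T^{u}}$, the goal of test-time graph OOD adaptation is to adapt the pre-trained model so that it achieves Equation 1 under the condition that $P_S(G, Y) \neq P_T(G, Y)$.
Further, we call the problem unsupervised if $\mathcal{V}_T^{l} = \emptyset$, namely, none of the target instances are labeled. Otherwise, when a proportion of target instances are labeled, we call it a semi-supervised problem. An illustration of training-time graph OOD adaptation and test-time graph OOD adaptation can be found in Figure 1.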

Graph Distribution Shift Types. In traditional machine learning, several studies have discussed and defined various types of distribution shifts Moreno-Torres et al. (2012); Kull and Flach (2014), of which the most widely used concepts are covariate shifts (shifts in $P(X)$) and concept shifts (shifts in $P(Y \mid X)$ or $P(X \mid Y)$). These concepts extend naturally to the graph setting by replacing the feature inputs $X$ with the graph inputs $G$.
• Covariate Shifts. Covariate shifts on graphs emphasize the changes in the distribution of graph inputs $P(G)$, which can be further decomposed and interpreted as structure shifts, size shifts, and feature shifts Li et al. (2022).
• Concept Shifts. Concept shifts on graphs highlight the shifts in the relationship between graph inputs and labels, $P(Y \mid G)$ or $P(G \mid Y)$. The concept shifts can be further decomposed to reveal more specific graph distribution shift types, such as the recently proposed conditional structure shift Liu et al. (2023).
Additionally, these concepts are often applied beyond the input space, in the latent representation space $H$, with covariate shifts describing the distribution shifts in latent representations $P(H)$, and concept shifts describing the changes in $P(Y \mid H)$ or $P(H \mid Y)$.
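As a rough illustration of how such shifts can show up in practice, the sketch below compares two simple statistics between a source graph and a target graph. A gap in mean degree points at a structure shift in the inputs, while a gap in edge homophily, which depends on labels as well as structure, hints at a change in how structure relates to labels. The helper names and the toy graphs are hypothetical, and these statistics are a coarse diagnostic rather than a formal test of any shift type.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def mean_degree(edges, num_nodes):
    """Average node degree of an undirected graph."""
    return 2 * len(edges) / num_nodes

# Hypothetical source and target graphs over 4 nodes each.
src_edges = [(0, 1), (2, 3), (0, 2)]
src_labels = {0: "a", 1: "a", 2: "b", 3: "b"}
tgt_edges = [(0, 2), (1, 3), (0, 3), (1, 2)]
tgt_labels = {0: "a", 1: "a", 2: "b", 3: "b"}

# Each entry holds (source value, target value).
shift_report = {
    "homophily": (edge_homophily(src_edges, src_labels),
                  edge_homophily(tgt_edges, tgt_labels)),
    "mean_degree": (mean_degree(src_edges, 4), mean_degree(tgt_edges, 4)),
}
```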

2.2 Discussion on Related Topics
Several topics are closely related to graph OOD adaptation, including: graph transfer learning, graph domain adaptation, graph OOD generalization, and fair and debiased graph learning. These topics, sharing similar goals with graph OOD adaptation but exhibiting nuanced differences, are further discussed in this subsection.
Graph Transfer Learning. In comparison to graph OOD adaptation, graph transfer learning encompasses a broader scope and does not specifically focus on addressing distribution shifts. It involves the transfer of knowledge across distribution changes and across distinct tasks, leveraging knowledge acquired from one graph-related domain or task to enhance performance in another context.
Graph Domain Adaptation. Following traditional domain adaptation, graph domain adaptation methods typically rely on the covariate shift assumption of an invariant relationship between graph inputs and labels, represented as $P_S(Y \mid G) = P_T(Y \mid G)$. The objective is then to address distribution shifts in the input graph space across domains, $P_S(G) \neq P_T(G)$. In contrast, graph OOD adaptation, the focus of this survey, is more general and comprehensive, involving distribution shifts that go beyond the covariate shift assumption.
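To make the covariate shift assumption concrete, the short derivation below shows why it is convenient: when the conditional label distribution is shared across domains, the target risk can be rewritten as a re-weighted source risk. The notation ($G$ for a graph input, $Y$ for its label, $f_\theta$ for the model, $\ell$ for a loss, $w$ for the density ratio), the use of sums for a discrete input space, and the support condition $P_S(G) > 0$ wherever $P_T(G) > 0$ are assumptions of this sketch rather than statements taken from the surveyed methods.

```latex
\begin{align*}
\mathbb{E}_{(G,Y)\sim P_T}\big[\ell(f_\theta(G), Y)\big]
 &= \sum_{G}\sum_{Y} P_T(G)\, P_T(Y \mid G)\, \ell(f_\theta(G), Y) \\
 &= \sum_{G}\sum_{Y} P_S(G)\, \frac{P_T(G)}{P_S(G)}\, P_S(Y \mid G)\, \ell(f_\theta(G), Y)
    && \text{since } P_S(Y \mid G) = P_T(Y \mid G) \\
 &= \mathbb{E}_{(G,Y)\sim P_S}\big[w(G)\, \ell(f_\theta(G), Y)\big],
    \qquad w(G) = \frac{P_T(G)}{P_S(G)}.
\end{align*}
```

Once $P_S(Y \mid G) \neq P_T(Y \mid G)$, the second step no longer holds, which is one way to see why methods built on this assumption may fail under concept shifts.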
Graph OOD Generalization. Graph OOD generalization and graph OOD adaptation pursue analogous objectives in developing models capable of handling OOD target data. In graph OOD generalization, where target data is usually assumed to be inaccessible, the primary focus is on training the model for broad generalizability, ensuring the model’s effectiveness on test graphs from any potential unseen distribution. In contrast, graph OOD adaptation fully leverages the observed target data and aims to adapt the model to the specific target distribution.
Fair and Debiased Graph Learning. To promote fairness among sensitive groups or mitigate GNN-induced bias issues, an ideal graph model should satisfy the following condition: $\hat{P}(Y \mid G, S = s) = \hat{P}(Y \mid G, S = s')$ for any two groups $s$ and $s'$, where $\hat{P}$ represents the predicted probability of the model, and $S$ indicates the latent group related to sensitive attributes such as gender Kose and Shen (2022), or bias-related structural information such as the node degree Ju et al. (2023). Although both share the goal of alleviating distribution discrepancies, fair and debiased graph learning strives to mitigate the discrepancies in the estimated posterior distributions between different groups to ensure fairness. Graph OOD adaptation, on the other hand, focuses on handling discrepancies in the population distribution between training and test data to improve model performance.
2.3 Taxonomy
As the problem definitions above show, training-time and test-time graph OOD adaptation differ significantly in their model learning scenarios: training-time adaptation starts from scratch, while test-time adaptation starts from a pre-trained model. Consequently, in the following two sections, we first categorize existing methods into training-time graph OOD adaptation and test-time graph OOD adaptation. Within each section, we follow related surveys Zhuang et al. (2020); Yu et al. (2023) and further classify methods into model-centric and data-centric approaches. Model-centric approaches center on the learning process or the design of the graph model, while data-centric approaches emphasize the manipulation of input graphs, such as adjusting input instances or transforming graph structure or features. Our taxonomy is shown in Figure 2.
3 Training-Time Graph OOD Adaptation
Generally, training-time graph OOD adaptation serves three primary objectives in different scenarios.
• Observation Bias Correction. For semi-supervised classification within a single graph, distribution shifts between training and test instances may arise from observation bias related to latent subpopulations Bi et al. (2023b), or from the time-evolving nature of graphs Zhu et al. (2022). Mitigating distribution shifts in this setting may enhance the model’s performance on test instances.
• Cross-graph Knowledge Transfer. In order to transfer knowledge from well-labeled graphs to graphs with limited labels, it is crucial to properly handle distribution shifts between graphs, since distinct graphs typically exhibit different data distributions.
• Negative Augmentation Mitigation. Graph data augmentation, which utilizes the augmented data as additional training data, is commonly used for improving model generalization or alleviating label scarcity issues. However, overly severe distribution shifts between the original and augmented data may lead to the negative augmentation problem Wu et al. (2022b); Liu et al. (2022). Therefore, controlling distribution shifts is essential to avoid inferior model performance and fully exploit the benefits of augmented data.
In this section, we discuss existing training-time graph OOD adaptation methods, highlighting the techniques for mitigating distribution shifts behind these methods. Additional information such as the task, objective, and supervision, can be found in Table 1.
3.1 Model-Centric Approaches
In this subsection, we introduce model-centric approaches for training-time graph OOD adaptation. These approaches can be further categorized into distributionally aligned representation learning which aims at learning aligned representations, and model regularization which focuses on achieving effective knowledge transfer through model regularization.
Distributionally Aligned Representation Learning. Generally, the deep graph model can be decomposed as $f_\theta = c \circ g$, where $g$ is a representation learner mapping the graph inputs to latent representations $H$, and $c$ is a classifier in the latent space. Existing literature on learning aligned representations can be further divided into domain-invariant representation learning and concept-shift aware representation learning.
• Domain-invariant Representation Learning is frequently employed for domain adaptation under the covariate shift assumption, in which an invariant relationship between latent representations and labels is assumed. Inspired by the theoretical generalization bound Ben-David et al. (2006), domain-invariant representation learning methods aim to train a representation learner $g$ such that the discrepancies between the induced marginal source distribution $P_S(H)$ and target distribution $P_T(H)$ can be reduced, and at the same time, to find a classifier $c$ in the latent space that achieves small empirical source risk. To achieve these two goals, the loss function for domain-invariant representation learning is usually formulated as:
$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}}(c \circ g) + \mathcal{L}_{\mathrm{reg}}(g), \tag{2}$$
where $\mathcal{L}_{\mathrm{cls}}$ is the empirical source classification loss and $\mathcal{L}_{\mathrm{reg}}$ denotes a regularization term that facilitates the alignment of the induced marginal distribution $P(H)$. Three strategies are mainly adopted: explicit distance minimization, adversarial training, and disentangled learning.
– (1) Explicit Distance Minimization directly employs the distance between marginal distributions as the regularization term in Equation 2. Methods vary in the choice of distance metric and the specific representations they aim to align. SR-GNN Zhu et al. (2021a) uses central moment discrepancy as the regularizer and aligns the distributions in the final layer of a traditional GCN. CDNE Shen et al. (2020b), GraphAE Guo et al. (2023) and GRADE Wu et al. (2023) aim to minimize the statistical discrepancies between source and target across all latent layers, with the regularization term defined as a sum of the distribution distances at different layers. Specifically, CDNE uses marginal maximum mean discrepancy and class-conditional maximum mean discrepancy, GraphAE considers the multiple kernel variant of maximum mean discrepancy as the distance metric, and GRADE defines and utilizes subtree discrepancy. JHGDA Shi et al. (2023) relies on a hierarchical pooling module to extract network hierarchies and minimizes statistical discrepancies in hierarchical representations via the exponential form of marginal and class-conditional maximum mean discrepancy. For non-trainable representations, for instance the latent embeddings in SimpleGCN Wu et al. (2019a), SR-GNN Zhu et al. (2021a) employs an instance weighting technique in which the learnable weight parameters are optimized through kernel mean matching to alleviate the distribution discrepancies.
– (2) Adversarial Learning aligns the representations by training the representation learner $g$ to generate embeddings that confuse the domain discriminator $d$. Correspondingly, the regularization term in Equation 2 is usually framed as a minimax game between $g$ and $d$ as:
$$\mathcal{L}_{\mathrm{reg}} = \max_{d} \Big[ -\mathcal{L}_{\mathrm{adv}}\big(d(g(G)),\, y_{\mathrm{dom}}\big) \Big],$$
which the representation learner $g$ minimizes as part of Equation 2, where $y_{\mathrm{dom}}$ denotes the domain label, and $\mathcal{L}_{\mathrm{adv}}$ can be chosen as a negative distance loss Dai et al. (2022), or a domain classification loss Wu et al. (2019b, 2020); Shen et al. (2020a); Guo et al. (2023); Qiao et al. (2023). Instead of framing it as a minimax problem, Zhang et al. (2019) explore using two symmetric and adversarial losses to train the representation learner and domain classifier, aiming to achieve bi-directional transfer. Typically, adversarial alignment takes place in the final hidden layer, with the exception of GraphAE Guo et al. (2023), which aligns representations in all hidden layers. In addition, SGDA Qiao et al. (2023) also takes the label scarcity issue of the source graph into account by employing a weighted self-supervised pseudo-labeling loss.
– (3) Disentangled Learning decomposes representations into several understandable components, with one of them being domain-invariant and related to semantic classification. The loss function for disentangled representation learning takes the form:
$$\mathcal{L} = \mathcal{L}_{\mathrm{cls}}(c \circ g_{\mathrm{inv}}) + \mathcal{L}_{\mathrm{sep}}(g_{\mathrm{inv}}, g_{\mathrm{oth}}) + \mathcal{L}_{\mathrm{recon}}(g_{\mathrm{inv}}, g_{\mathrm{oth}}),$$
where $c$ is the classifier, $g_{\mathrm{inv}}$ denotes the representation learner for acquiring domain-invariant classification-related information, $g_{\mathrm{oth}}$ denotes the representation learner(s) for the other components, $\mathcal{L}_{\mathrm{sep}}$ represents a regularization term for enhancing the separation between different components, and $\mathcal{L}_{\mathrm{recon}}$ denotes a reconstruction loss aiming to recover the original graph structure from the concatenated representation, thereby preventing information loss. Additional terms are introduced to facilitate the learning of disentangled representations, enabling specific components to exhibit desired characteristics. In ASN Zhang et al. (2021), the representation is decomposed into a domain-private part and a domain-invariant classification-related part. A domain adversarial loss is additionally included to facilitate the learning of invariant representations. Analogous to DIVA Ilse et al. (2020), DGDA Cai et al. (2024) assumes that the graph generation process is controlled independently by domain-invariant semantic latent variables, domain latent variables, and random latent variables. To learn representations with the desired characteristics, a domain classification loss and a noise reconstruction loss are used as additional losses.
• Concept-shift Aware Representation Learning extends beyond the scope of learning domain-invariant representations and takes the change of label function across domains into consideration. Domain-invariant representation learning, which minimizes the empirical source risk and the marginal distribution discrepancy, inherently relies on the covariate shift assumption of an invariant $P(Y \mid H)$, under which the inestimable adaptability term in the generalization bound is equal to zero Ben-David et al. (2006). However, as illustrated in Zhao et al. (2019), when there exist concept shifts in $P(Y \mid H)$ or $P(H \mid Y)$, namely, the label function changes, the inestimable adaptability term in the upper bound Ben-David et al. (2006) may be large and the performance of domain-invariant representation learning methods on the target is no longer guaranteed. A similar upper bound and an illustrative example are also provided in Liu et al. (2023), illustrating the insufficiency of domain-invariant representation learning.
To further accommodate the change in label function, SRNC Zhu et al. (2022) leverages graph homophily, incorporating a shift-robust classification GNN module and an unsupervised clustering GNN module to alleviate the distribution shifts in the joint distribution $P(H, Y)$. Notably, SRNC is also capable of handling the open-set setting where new classes emerge in the test data. In StruRW Liu et al. (2023), the authors identify and then mitigate the conditional structure shifts in $P(A \mid Y)$. They adaptively adjust the weights of edges in the source graph during training to align the distribution of a source node’s neighborhood with that of target nodes from the same pseudo-class under the contextual stochastic block model. However, how to further align the remaining shift components is left for future studies. Moreover, Zhu et al. (2023a) demonstrate, under the contextual stochastic block model, that the conditional shifts in latent space $P(Y \mid H)$ can be exacerbated by both graph heterophily and the graph convolution in GCN compared with the conditional shifts in input feature space $P(Y \mid X)$. Hence, they introduce GCONDA, which explicitly matches the distribution of $P(Y \mid H)$ across domains via Wasserstein distance regularization, and they additionally propose GCONDA++, which jointly minimizes the discrepancy in $P(H)$ and $P(Y \mid H)$.
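As a concrete instance of the alignment regularizer in Equation 2, the explicit distance minimization strategy can be sketched with a maximum mean discrepancy penalty between source and target embeddings. This is a minimal illustration under stated assumptions: a single Gaussian kernel, the biased estimator of the squared discrepancy, and a trade-off weight `lam`, all chosen for this sketch. It is not the implementation of any specific surveyed method.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian kernel between two embedding vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of the squared maximum mean discrepancy."""
    def mean_kernel(A, B):
        return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))
    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2 * mean_kernel(source, target))

def aligned_loss(source_cls_loss, h_source, h_target, lam=0.1):
    """Source classification loss plus lam times the alignment penalty.

    `lam` is a hypothetical trade-off weight introduced for this sketch.
    """
    return source_cls_loss + lam * mmd2(h_source, h_target)

# Toy latent representations (two nodes per domain, two dimensions).
h_src = [[0.0, 0.0], [0.1, 0.0]]
h_tgt_near = [[0.0, 0.1], [0.1, 0.1]]
h_tgt_far = [[2.0, 2.0], [2.1, 2.0]]
```

The penalty grows as the target embeddings move away from the source embeddings, so minimizing the combined loss pushes the representation learner toward domain-invariant representations. It does not, by itself, address shifts in the label function.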
Model Regularization. Instead of focusing on the process of learning aligned representations, some other methods achieve effective knowledge transfer under distribution shifts through model regularization. Building on a derived GNN-based generalization bound, You et al. (2023) propose SSReg and MFRReg, which regularize the spectral properties of GNNs to enhance transferability. They also extend their theoretical results to the semi-supervised setting with more challenging distribution shifts. Both KDGA Wu et al. (2022b) and KTGNN Bi et al. (2023b) employ knowledge distillation, regularizing the Kullback–Leibler divergence between the outputs of teacher and student models. In particular, KDGA aims to mitigate the negative augmentation problem by distilling the knowledge of a teacher model trained on augmented graphs into a partially parameter-shared student model on the original graph. KTGNN, on the other hand, considers the semi-supervised node classification problem for VS-Graph, in which vocal nodes are regarded as the source and silent nodes with incomplete features are regarded as the target. It applies a domain-adapted feature completion module and a domain-adapted message-passing mechanism to learn representations that capture domain differences. Then, a source classifier and a target classifier are constructed, and the knowledge of both classifiers is distilled into a transferable student classifier through the KL regularization.
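The distillation regularizer mentioned above can be sketched as an average Kullback–Leibler divergence between teacher and student predictions over nodes. The direction of the divergence (teacher relative to student), the absence of a temperature, and all function names are assumptions of this sketch and may differ from the choices made in KDGA or KTGNN.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(teacher_logits, student_logits):
    """Mean KL divergence between teacher and student predictions over nodes."""
    losses = [kl_divergence(softmax(t), softmax(s))
              for t, s in zip(teacher_logits, student_logits)]
    return sum(losses) / len(losses)

# Toy logits for two nodes and three classes.
teacher = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.0]]
student_same = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.0]]
student_off = [[0.0, 2.0, -1.0], [0.0, 0.0, 1.0]]
```

The loss is zero when the student reproduces the teacher's predictions and grows as they diverge, so adding it to the student's training objective keeps the student close to the teacher.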
Category | Name | Reference | Task Level | Objective | Supervision
Domain-invariant Representation Learning | DAGNN | Wu et al. (2019b) | graph | Cross-graph transfer | unsupervised
Domain-invariant Representation Learning | DANE | Zhang et al. (2019) | node | Cross-graph transfer | unsupervised
Domain-invariant Representation Learning | CDNE | Shen et al. (2020b) | node | Cross-graph transfer | semi-supervised
Domain-invariant Representation Learning | ACDNE | Shen et al. (2020a) | node | Cross-graph transfer | unsupervised
Domain-invariant Representation Learning | UDA-GCN | Wu et al. (2020) | node | Cross-graph transfer | unsupervised
Domain-invariant Representation Learning | DGDA | Cai et al. (2024) | graph | Cross-graph transfer | unsupervised
Domain-invariant Representation Learning | SR-GNN | Zhu et al. (2021a) | node | Observation bias correction | unsupervised
Domain-invariant Representation Learning | ASN | Zhang et al. (2021) | node | Cross-graph transfer | unsupervised
Domain-invariant Representation Learning | AdaGCN | Dai et al. (2022) | node | Cross-graph transfer | un/semi-supervised
Domain-invariant Representation Learning | GraphAE | Guo et al. (2023) | node | Cross-graph transfer | unsupervised
Domain-invariant Representation Learning | GRADE | Wu et al. (2023) | node / edge | Cross-graph transfer | unsupervised
Domain-invariant Representation Learning | JHGDA | Shi et al. (2023) | node | Cross-graph transfer | unsupervised
Domain-invariant Representation Learning | SGDA | Qiao et al. (2023) | node | Cross-graph transfer | unsupervised
Concept-shift Aware Representation Learning | SRNC | Zhu et al. (2022) | node | Cross-graph transfer / Observation bias correction | unsupervised
Concept-shift Aware Representation Learning | StruRW | Liu et al. (2023) | node | Cross-graph transfer | unsupervised
Concept-shift Aware Representation Learning | GCONDA++ | Zhu et al. (2023a) | node / graph | Cross-graph transfer / Observation bias correction | unsupervised
Model Regularization | KDGA | Wu et al. (2022b) | node | Negative augmentation mitigation | semi-supervised
Model Regularization | SS/MFR-Reg | You et al. (2023) | node / edge | Cross-graph transfer | un/semi-supervised
Model Regularization | KTGNN | Bi et al. (2023b) | node | Observation bias correction | semi-supervised
Instance Weighting | IW | Ye et al. (2013) | edge | Cross-graph transfer | semi-supervised
Instance Weighting | NES-TL | Fu et al. (2020) | node | Cross-graph transfer | semi-supervised
Instance Weighting | RSS-GCN | Wu et al. (2022a) | graph | Cross-graph transfer | unsupervised
Instance Weighting | DR-GST | Liu et al. (2022) | node | Negative augmentation mitigation | semi-supervised
Graph Transformation | FakeEdge | Dong et al. (2022) | edge | Observation bias correction | unsupervised
Graph Transformation | Bridged-GNN | Bi et al. (2023a) | node | Cross-graph transfer | semi-supervised
Graph Transformation | DC-GST | Wang et al. (2024) | node | Negative augmentation mitigation / Observation bias correction | unsupervised
3.2 Data-Centric Approaches
Instance Weighting. Instance weighting, which assigns different weights to instances, is a commonly used data-centric technique in traditional transfer learning Zhuang et al. (2020). Similar strategies are observed in methods for training-time graph OOD adaptation. Borrowing the idea from AdaBoost and TrAdaBoost, Ye et al. (2013) employ the instance weighting technique for the edge sign prediction task. The instance weights are adjusted in each iteration, with reduced weights assigned to misclassified dissimilar source instances to mitigate distribution shifts across graphs. Liu et al. (2022) recognize that the distribution shifts between the original data and the augmented data with pseudo-labels may impede the effectiveness of self-training. To mitigate the gap between the original distribution and the shifted distribution, they assign weights to augmented node instances based on information gain, paying more attention to nodes with high information gain rather than those with high confidence. Both RSS-GCN and NES-TL consider the multi-source transfer problem, where multiple graphs are available as sources. Since the source graphs may not be equally important for predictions on the target graph and some of them may be of poor quality, a weighting technique is employed to effectively combine the available source graphs. NES-TL Fu et al. (2020) proposes the NES index to quantitatively measure the structural similarity between two graphs, and uses the NES-based scores as weights to ensemble weak classifiers trained on instances from each source graph and labeled target instances. RSS-GCN Wu et al. (2022a) utilizes reinforcement learning to select high-quality source graphs for multi-source transfer, aiming to minimize the distribution divergence between selected source graphs and target graphs. Such a sample selection strategy can be considered a special case of binary instance weighting.
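The boosting-style re-weighting idea can be sketched as follows. The update mirrors the standard TrAdaBoost rule, in which misclassified source instances are down-weighted and misclassified target instances are up-weighted. It is meant as an illustration of the weighting principle, not as the exact procedure of Ye et al. (2013); the function name, the argument layout, and the final normalization step are assumptions of this sketch, and at least one source instance is assumed to be present.

```python
import math

def tradaboost_style_update(weights, is_source, is_wrong, target_error, n_rounds):
    """One round of instance re-weighting in the style of TrAdaBoost.

    weights:      current instance weights
    is_source:    True for source instances, False for labeled target instances
    is_wrong:     True where the current weak classifier misclassifies the instance
    target_error: weighted error of the weak classifier on target instances,
                  assumed to lie strictly between 0 and 0.5
    n_rounds:     total number of boosting rounds
    """
    n_source = sum(is_source)
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))
    beta_tgt = target_error / (1.0 - target_error)
    new_weights = []
    for w, src, wrong in zip(weights, is_source, is_wrong):
        if wrong and src:
            w *= beta_src      # shrink misclassified (dissimilar) source instances
        elif wrong:
            w /= beta_tgt      # boost misclassified target instances
        new_weights.append(w)
    total = sum(new_weights)
    return [w / total for w in new_weights]

updated = tradaboost_style_update(
    weights=[0.25, 0.25, 0.25, 0.25],
    is_source=[True, True, False, False],
    is_wrong=[True, False, True, False],
    target_error=0.25,
    n_rounds=10,
)
```

After the update, the misclassified source instance carries less weight than the correctly classified one, so source instances that disagree with the target distribution gradually lose influence over later rounds.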
Graph Transformation. Several works leverage graph transformation strategies, such as adding or removing edges, to alleviate distribution shifts. Dong et al. (2022) find that the dataset shift challenge in edge prediction arises from the presence of links in training and the absence of link observations in testing. To tackle this challenge, they propose FakeEdge, a subgraph-based link prediction framework that intentionally adds or removes the focal link within the subgraph. This adjustment decouples the dual role of edges as elements in representation learning and as labels of links in link prediction, thereby ensuring that the subgraph is consistent across training and testing. Additionally, Bi et al. (2023a) reframe the domain-level knowledge transfer problem as learning a sample-wise knowledge-enhanced posterior distribution. They first learn the similarities of samples from both source and target graphs and build bridges between each sample and its similar samples containing valuable knowledge for prediction. A GNN model is then employed to transfer knowledge across source and target samples on the constructed bridged-graph. More recently, DC-GST Wang et al. (2024) has been introduced to bridge the distribution shifts between augmented training instances and test instances in self-training. It incorporates a distribution-shift-aware edge predictor to improve the model’s generalizability when assigning pseudo-labels. Furthermore, it employs a distribution consistency criterion and a neighborhood entropy reduction criterion for the selection of pseudo-labeled nodes, aiming to identify nodes that are not only informative but also effective in mitigating the distribution discrepancy between source and target.
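The focal-link adjustment described for FakeEdge can be illustrated with a small helper that forces the candidate link to be always present or always absent in the extracted subgraph. This is a simplified sketch on plain edge sets; the function name and the two mode names are chosen here for illustration, and the actual framework operates inside a subgraph-based link prediction model.

```python
def with_focal_link(edges, focal, mode):
    """Return the subgraph edge set with the focal link forced present or absent.

    edges: set of undirected edges stored as sorted tuples (u, v) with u < v
    focal: the candidate link (u, v) being scored
    mode:  "plus" always adds the focal link, "minus" always removes it
    """
    u, v = sorted(focal)
    if mode == "plus":
        return set(edges) | {(u, v)}
    if mode == "minus":
        return set(edges) - {(u, v)}
    raise ValueError("mode must be 'plus' or 'minus'")

train_sub = {(0, 1), (1, 2), (0, 2)}   # focal link (0, 2) is observed during training
test_sub = {(0, 1), (1, 2)}            # same neighbourhood, focal link unobserved at test time

train_view = with_focal_link(train_sub, (0, 2), "minus")
test_view = with_focal_link(test_sub, (0, 2), "minus")
```

With either mode applied consistently, the training and test subgraphs around the same focal link become identical, which removes the train/test discrepancy caused by the link's own presence.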
4 Test-Time Graph OOD Adaptation
In this section, we concentrate on graph OOD adaptation during test time. In training-time graph OOD adaptation, both source and target instances need to be observed simultaneously. However, this may be unrealistic in various graph-related applications. For instance, in social networks, source data is typically confidential and inaccessible due to privacy protection purposes and data leakage concerns. Additionally, storing the complete source data on resource-limited devices may also be impractical. In contrast to training-time adaptation, test-time adaptation is not restricted by the availability of labeled source data and aims to adapt a pre-trained model to perform effectively on the target data. This form of adaptation, also known as source-free adaptation, plays a crucial role in scenarios where access to source data is restricted.
4.1 Model-Centric Approaches
Model Fine-tuning. Fine-tuning is a widely used approach to address graph distribution shifts during test time. However, effectively leveraging information from the pre-trained model presents challenges, generally in two scenarios. In the first scenario, the model is pre-trained to encode transferable and generalizable structural information, and task-related information and domain-specific node attributes are then added during fine-tuning. This scenario therefore requires target labels, which are often very limited, and may thus lead to overfitting. To tackle this challenge, Zhu et al. (2023b) propose GraphControl, which incorporates target data as conditional inputs, inspired by the success of ControlNet Zhang et al. (2023). The structural information is fed into a frozen pre-trained model, while a kernel matrix built on node features is fed into the trainable copy. The two components are connected through zero MLPs with gradually expanding parameters, aiming to prevent the harmful impact of noise in target node features while gradually integrating downstream information into the pre-trained model. In the second scenario, task-related information is encoded into the pre-trained model, and an unsupervised fine-tuning procedure is subsequently applied. Yet, when tuned on the unsupervised task, the model may lose the discriminatory power related to the main task or learn irrelevant information. SOGA Mao et al. (2024) utilizes a loss that maximizes the mutual information between the inputs and outputs of the model to enhance the discriminatory power. GT3 Wang et al. (2022) avoids overfitting to the downstream self-supervised task by adding regularization constraints between training and test output embeddings, enforcing their statistical similarity and avoiding substantial fluctuations. Furthermore, GAPGC Chen et al. (2022a) aims to address the over-confidence bias and the risk of capturing redundant information through an adversarial pseudo-group contrast strategy. From the information bottleneck perspective, GAPGC provides a lower-bound guarantee on the information relevant to the main task. Some relevant studies, such as GTOT-Tuning Zhang et al. (2022), which concentrates on transferring knowledge across tasks on the same graph during test time, also provide valuable insights and may have the potential for dealing with distribution shifts.
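To make the mutual-information objective mentioned above more concrete, the following is a minimal sketch of one common estimator, I(X; Y) ≈ H(mean prediction) − mean(H(prediction)), computed from per-node class probabilities. It is an illustration under stated assumptions and is not claimed to be the exact loss used in SOGA; the function names and the toy probability tables are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def mutual_information(probs):
    """Estimate I(X; Y) from per-node class probabilities.

    probs: list of rows, one softmax output per target node.
    Returns H(mean prediction) - mean(H(prediction)).
    """
    n = len(probs)
    k = len(probs[0])
    marginal = [sum(row[c] for row in probs) / n for c in range(k)]
    conditional = sum(entropy(row) for row in probs) / n
    return entropy(marginal) - conditional


# Hypothetical toy outputs of a model on two target nodes.
confident_diverse = [[0.98, 0.02], [0.02, 0.98]]  # confident and balanced
uncertain = [[0.5, 0.5], [0.5, 0.5]]              # no confidence at all
collapsed = [[0.98, 0.02], [0.98, 0.02]]          # confident but one class only

mi_diverse = mutual_information(confident_diverse)
mi_uncertain = mutual_information(uncertain)
mi_collapsed = mutual_information(collapsed)
```

Under this estimator, the objective is high only when predictions are individually confident and collectively spread across classes; both the uncertain and the collapsed cases score close to zero, which is why maximizing it is often used to preserve discriminatory power during unsupervised tuning.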
Parameter Sharing. Parameter sharing is a model design strategy that involves constructing models with both domain-shared and domain-specific parameters. In these approaches, domain-shared parameters are directly employed without retraining, and adjustments are made only to domain-specific parameters during test time. GraphGLOW Zhao et al. (2023) integrates a shared graph structure learner and dataset-specific GNN heads for classification tasks in the cross-graph transfer setting. During testing, only the dataset-specific GNN head is updated, while the structure learner from the pre-trained model is directly applied. GT3 Wang et al. (2022) structures the model to include two branches: a main task (classification) branch and a self-supervised branch. The two branches share initial layers and have unique task-specific layers and parameters afterwards. During the training phase, all parameters are optimized using a combination of self-supervised loss and main task loss. In the test phase, the main task branch is utilized for prediction: the unique parameters of the branch remain unchanged, and the parameters in the initial layers are tuned based on the self-supervised task on the target graph.
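The two-branch test-time procedure described for GT3 can be illustrated with a deliberately tiny sketch: a scalar "shared layer" feeds a main-task head and a self-supervised head, and only the shared parameter is updated on target data. Everything here is hypothetical (the class `TwoBranchModel`, the reconstruction-style self-supervised task, the learning rate, and the choice to keep the self-supervised head frozen); it is not the GT3 architecture or loss.

```python
from dataclasses import dataclass


@dataclass
class TwoBranchModel:
    w_shared: float  # shared initial layer (tuned at test time)
    w_main: float    # main-task head (kept frozen at test time)
    w_ssl: float     # self-supervised head (frozen here by assumption)

    def embed(self, x):
        return self.w_shared * x

    def predict(self, x):
        return self.w_main * self.embed(x)

    def ssl_loss(self, xs):
        # Toy self-supervised task: reconstruct the input from its embedding.
        return sum((self.w_ssl * self.embed(x) - x) ** 2 for x in xs) / len(xs)


def adapt_at_test_time(model, xs, lr=0.05, steps=50):
    """Gradient descent on the SSL loss w.r.t. the shared parameter only."""
    for _ in range(steps):
        grad = sum(
            2.0 * (model.w_ssl * model.embed(x) - x) * model.w_ssl * x
            for x in xs
        ) / len(xs)
        model.w_shared -= lr * grad
    return model


# Hypothetical pre-trained parameters and unlabeled target features.
model = TwoBranchModel(w_shared=0.4, w_main=1.5, w_ssl=2.0)
target_xs = [1.0, -2.0, 0.5, 1.5]

loss_before = model.ssl_loss(target_xs)
adapt_at_test_time(model, target_xs)
loss_after = model.ssl_loss(target_xs)
```

After adaptation, predictions still go through the unchanged `w_main` head, so any change in the output comes solely from the re-tuned shared parameter.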
4.2 Data-Centric Approaches
Feature Reconstruction. Feature reconstruction is a test-time feature manipulation strategy that addresses distribution shifts without adapting the model structure or retraining parameters. Ding et al. (2023) introduce FRGNN for semi-supervised node classification. They utilize an MLP to establish a mapping between the output and input space of the pre-trained GNN. Subsequently, using the encoded one-hot class vectors as inputs, the MLP generates class-representative features. The features of labeled test nodes are then substituted with the representative features of the corresponding classes, and the updated information is spread to the unlabeled test nodes through message passing, which is expected to mitigate the graph embedding bias between test nodes and training nodes.
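The substitute-then-propagate idea can be sketched as follows. This is a simplified illustration, not the FRGNN implementation: the class-representative vectors are given directly as a hypothetical dictionary `class_repr` (in FRGNN they are produced by an MLP from one-hot class vectors), and message passing is reduced to a single round of mean aggregation.

```python
def reconstruct_features(features, labeled, class_repr):
    """Replace features of labeled test nodes with their class representative."""
    return [
        list(class_repr[labeled[v]]) if v in labeled else list(f)
        for v, f in enumerate(features)
    ]


def mean_aggregate(features, neighbors):
    """One round of message passing: average each node with its neighbors."""
    out = []
    for v, feat in enumerate(features):
        group = [feat] + [features[u] for u in neighbors[v]]
        out.append(
            [sum(g[d] for g in group) / len(group) for d in range(len(feat))]
        )
    return out


# Hypothetical path graph 0 - 1 - 2 with uninformative (shifted) test features.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
features = [[0.2, 0.2], [0.3, 0.3], [0.4, 0.4]]
labeled = {0: 1}                                # node 0 is labeled as class 1
class_repr = {0: [0.0, 1.0], 1: [1.0, 0.0]}     # assumed class representatives

replaced = reconstruct_features(features, labeled, class_repr)
propagated = mean_aggregate(replaced, neighbors)
```

In this toy graph, the unlabeled node 1 starts with identical feature components and, after one aggregation round, is pulled toward the class-1 representative injected at its labeled neighbor.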
Graph Transformation. Apart from adjusting features, Jin et al. (2023) introduce a graph transformation framework called GTRANS to address distribution shifts during test time. The graph transformation is modeled as injecting perturbations into the graph structure and node features, and the perturbations are optimized via a parameter-free surrogate loss. The authors additionally provide theoretical analyses that guide the selection of surrogate loss functions.
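As a rough illustration of optimizing a test-time perturbation against a frozen model, the sketch below learns an additive feature perturbation `delta` for a single node by gradient descent on a surrogate loss. Several simplifications are assumed: only features are perturbed (no structure changes), the frozen model is a hypothetical two-class linear classifier `W`, gradients are approximated by finite differences, and prediction entropy is used purely as a stand-in surrogate. It is not claimed to be the loss chosen in GTRANS.

```python
import math

# Frozen pre-trained weights of a two-class linear model (hypothetical).
W = [[1.0, -1.0], [-1.0, 1.0]]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def surrogate_loss(x, delta):
    """Prediction entropy of the frozen model on the perturbed features."""
    xp = [a + b for a, b in zip(x, delta)]
    logits = [sum(w * v for w, v in zip(row, xp)) for row in W]
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)


def optimize_perturbation(x, lr=0.1, steps=100, eps=1e-5):
    """Learn delta by gradient descent; the model weights W are never updated."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        grad = []
        for i in range(len(delta)):
            up = delta[:]
            up[i] += eps
            down = delta[:]
            down[i] -= eps
            # Central finite difference approximates d(loss)/d(delta_i).
            grad.append(
                (surrogate_loss(x, up) - surrogate_loss(x, down)) / (2 * eps)
            )
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta


x = [0.3, 0.1]  # hypothetical test-node features
loss_before = surrogate_loss(x, [0.0, 0.0])
delta = optimize_perturbation(x)
loss_after = surrogate_loss(x, delta)
```

The design point this sketch is meant to convey is that the optimization variable is the data perturbation, so the pre-trained parameters stay untouched throughout.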
It is worth highlighting that instead of modifying the pre-trained model, data-centric test-time graph OOD adaptation focuses on adjusting the test data, and is especially beneficial when handling large-scale pre-trained models.
5 Future Directions
Theoretical Study. Future theoretical analyses could delve deeper into the feasibility and effectiveness of graph OOD adaptation, particularly in scenarios where the label function changes between training and test data. There is also a need to develop theories and methodologies specifically tailored for graph data or graph models, taking the intricate structural information inherent in graphs into consideration. Furthermore, it is worth exploring more diverse scenarios, such as universal domain adaptive node classification Chen et al. (2023), graph size adaptation Yehudai et al. (2021), and multi-source transfer. Notably, several generalization bounds have been derived from the graph transferability evaluation perspective Ruiz et al. (2020); Zhu et al. (2021b); Chuang and Jegelka (2022), and may assist in selecting high-quality source graphs in the multi-source transfer setting. However, identifying the optimal combination of source graphs with theoretical guarantees remains an open problem.
Test-time Graph OOD Adaptation. Test-time adaptation has garnered increasing attention in traditional machine learning, yet relatively few works address graph settings. The exploration and design of more graph-specific strategies remain crucial and promising. Besides, rigorous theoretical analysis of test-time adaptation remains an open problem Liang et al. (2023), and addressing this gap could inspire the development of innovative graph test-time OOD adaptation methods. Additionally, it is worth investigating whether recent advances in unsupervised test-time graph evaluation Zheng et al. (2023); Zheng et al. (2024) can facilitate test-time graph adaptation algorithms. Lastly, alleviating the computational cost of adapting a large pre-trained model, through data-centric graph transformation or graph prompt tuning Sun et al. (2023), also deserves more attention in the future.
Distribution Shifts on Complex Graphs. In contrast to the substantial efforts dedicated to addressing distribution shifts on regular graphs, studies on more complex graph types, such as spatial, temporal, spatial-temporal, heterogeneous, and dynamic graphs, have received comparatively less attention. Such complex graphs often exhibit diverse and dynamic patterns or involve entities and relationships of various types, introducing more intricate and nuanced distribution shifts. Furthermore, existing graph OOD adaptation methods have primarily been designed and evaluated on small networks, whereas these complex graph types, especially dynamic graphs, may be of large scale, highlighting the necessity for scalable and memory-efficient graph OOD adaptation methods. The comprehensive exploration and efficient mitigation of distribution shifts on complex graph types are pivotal for enhancing the capabilities of graph machine learning in broader scenarios, such as recommendation systems, healthcare systems, and traffic forecasting.
6 Conclusion
In this survey, we examine existing graph OOD adaptation methods, covering two problem scenarios including both training-time graph OOD adaptation and test-time graph OOD adaptation. Firstly, we establish problem definitions and explore different graph distribution shift types. Then, we discuss topics related to graph OOD adaptation and explain our categorization. Based on the proposed taxonomy, we systematically examine the techniques for mitigating distribution shifts in existing graph OOD adaptation methods. Finally, we highlight several challenges and future directions. We hope that this survey will help researchers better understand the current research progress in graph OOD adaptation.
References
- Ben-David et al. [2006] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. NeurIPS, 2006.
- Bi et al. [2023a] Wendong Bi, Xueqi Cheng, Bingbing Xu, Xiaoqian Sun, Li Xu, and Huawei Shen. Bridged-gnn: Knowledge bridge learning for effective knowledge transfer. In CIKM, 2023.
- Bi et al. [2023b] Wendong Bi, Bingbing Xu, Xiaoqian Sun, Li Xu, Huawei Shen, and Xueqi Cheng. Predicting the silent majority on graphs: Knowledge transferable graph neural network. In TheWebConf, 2023.
- Cai et al. [2024] Ruichu Cai, Fengzhu Wu, Zijian Li, Pengfei Wei, Lingling Yi, and Kun Zhang. Graph domain adaptation: A generative view. TKDD, 2024.
- Chen et al. [2022a] Guanzi Chen, Jiying Zhang, Xi Xiao, and Yang Li. Graphtta: Test time adaptation on graph neural networks. ICML, 2022.
- Chen et al. [2022b] Yongqiang Chen, Yonggang Zhang, Yatao Bian, Han Yang, MA Kaili, Binghui Xie, Tongliang Liu, Bo Han, and James Cheng. Learning causally invariant representations for out-of-distribution generalization on graphs. NeurIPS, 2022.
- Chen et al. [2023] Jushuo Chen, Feifei Dai, Xiaoyan Gu, Jiang Zhou, Bo Li, and Weiping Wang. Universal domain adaptive network embedding for node classification. In MM, 2023.
- Chuang and Jegelka [2022] Ching-Yao Chuang and Stefanie Jegelka. Tree mover’s distance: Bridging graph metrics and stability of graph neural networks. NeurIPS, 2022.
- Dai et al. [2022] Quanyu Dai, Xiao-Ming Wu, Jiaren Xiao, Xiao Shen, and Dan Wang. Graph transfer learning via adversarial domain adaptation with graph convolution. TKDE, 2022.
- Ding et al. [2023] Rui Ding, Jielong Yang, Feng Ji, Xionghu Zhong, and Linbo Xie. Distribution shift mitigation at test time with performance guarantees. arXiv, 2023.
- Dong et al. [2022] Kaiwen Dong, Yijun Tian, Zhichun Guo, Yang Yang, and Nitesh Chawla. Fakeedge: Alleviate dataset shift in link prediction. In LoG, 2022.
- Fang et al. [2022] Yuqi Fang, Pew-Thian Yap, Weili Lin, Hongtu Zhu, and Mingxia Liu. Source-free unsupervised domain adaptation: A survey. arXiv, 2022.
- Fu et al. [2020] Chenbo Fu, Yongli Zheng, Yi Liu, Qi Xuan, and Guanrong Chen. Nes-tl: Network embedding similarity-based transfer learning. TNSE, 2020.
- Guo et al. [2023] Gaoyang Guo, Chaokun Wang, Bencheng Yan, Yunkai Lou, Hao Feng, Junchao Zhu, Jun Chen, Fei He, and Philip Yu. Learning adaptive node embeddings across graphs. TKDE, 2023.
- Hamilton et al. [2017] Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. NeurIPS, 2017.
- Ilse et al. [2020] Maximilian Ilse, Jakub M Tomczak, Christos Louizos, and Max Welling. Diva: Domain invariant variational autoencoders. In MIDL, 2020.
- Jin et al. [2023] Wei Jin, Tong Zhao, Jiayuan Ding, Yozen Liu, Jiliang Tang, and Neil Shah. Empowering graph representation learning with test-time graph transformation. ICLR, 2023.
- Ju et al. [2023] Mingxuan Ju, Tong Zhao, Wenhao Yu, Neil Shah, and Yanfang Ye. Graphpatcher: Mitigating degree bias for graph neural networks via test-time augmentation. NeurIPS, 2023.
- Kipf and Welling [2017] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. ICLR, 2017.
- Kose and Shen [2022] O Deniz Kose and Yanning Shen. Fair node representation learning via adaptive data augmentation. arXiv, 2022.
- Kull and Flach [2014] Meelis Kull and Peter Flach. Patterns of dataset shift. In LMCE, 2014.
- Li et al. [2022] Haoyang Li, Xin Wang, Ziwei Zhang, and Wenwu Zhu. Out-of-distribution generalization on graphs: A survey. arXiv, 2022.
- Liang et al. [2023] Jian Liang, Ran He, and Tieniu Tan. A comprehensive survey on test-time adaptation under distribution shifts. arXiv, 2023.
- Liu et al. [2022] Hongrui Liu, Binbin Hu, Xiao Wang, Chuan Shi, Zhiqiang Zhang, and Jun Zhou. Confidence may cheat: Self-training on graph neural networks under distribution shift. In TheWebConf, 2022.
- Liu et al. [2023] Shikun Liu, Tianchun Li, Yongbin Feng, Nhan Tran, Han Zhao, Qiang Qiu, and Pan Li. Structural re-weighting improves graph domain adaptation. In ICML, 2023.
- Mao et al. [2024] Haitao Mao, Lun Du, Yujia Zheng, Qiang Fu, Zelin Li, Xu Chen, Shi Han, and Dongmei Zhang. Source free unsupervised graph domain adaptation. WSDM, 2024.
- Moreno-Torres et al. [2012] Jose G Moreno-Torres, Troy Raeder, Rocío Alaiz-Rodríguez, Nitesh V Chawla, and Francisco Herrera. A unifying view on dataset shift in classification. Pattern Recognit, 2012.
- Qiao et al. [2023] Ziyue Qiao, Xiao Luo, Meng Xiao, Hao Dong, Yuanchun Zhou, and Hui Xiong. Semi-supervised domain adaptation in graph transfer learning. IJCAI, 2023.
- Ruiz et al. [2020] Luana Ruiz, Luiz Chamon, and Alejandro Ribeiro. Graphon neural networks and the transferability of graph neural networks. NeurIPS, 2020.
- Shen et al. [2020a] Xiao Shen, Quanyu Dai, Fu-lai Chung, Wei Lu, and Kup-Sze Choi. Adversarial deep network embedding for cross-network node classification. In AAAI, 2020.
- Shen et al. [2020b] Xiao Shen, Quanyu Dai, Sitong Mao, Fu-lai Chung, and Kup-Sze Choi. Network together: Node classification via cross-network deep network embedding. TNNLS, 2020.
- Shi et al. [2023] Boshen Shi, Yongqing Wang, Fangda Guo, Jiangli Shao, Huawei Shen, and Xueqi Cheng. Improving graph domain adaptation with network hierarchy. In CIKM, 2023.
- Sun et al. [2023] Xiangguo Sun, Jiawen Zhang, Xixi Wu, Hong Cheng, Yun Xiong, and Jia Li. Graph prompt learning: A comprehensive survey and beyond. arXiv, 2023.
- Wang et al. [2022] Yiqi Wang, Chaozhuo Li, Wei Jin, Rui Li, Jianan Zhao, Jiliang Tang, and Xing Xie. Test-time training for graph neural networks. arXiv, 2022.
- Wang et al. [2024] Fali Wang, Tianxiang Zhao, and Suhang Wang. Distribution consistency based self-training for graph neural networks with sparse labels. WSDM, 2024.
- Wu et al. [2019a] Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In ICML, 2019.
- Wu et al. [2019b] Man Wu, Shirui Pan, Xingquan Zhu, Chuan Zhou, and Lei Pan. Domain-adversarial graph neural networks for text classification. In ICDM, 2019.
- Wu et al. [2020] Man Wu, Shirui Pan, Chuan Zhou, Xiaojun Chang, and Xingquan Zhu. Unsupervised domain adaptive graph convolutional networks. In TheWebConf, 2020.
- Wu et al. [2022a] Bo Wu, Xun Liang, Xiangping Zheng, Jun Wang, and Xiaoping Zhou. Reinforced sample selection for graph neural networks transfer learning. In BIBM, 2022.
- Wu et al. [2022b] Lirong Wu, Haitao Lin, Yufei Huang, and Stan Z Li. Knowledge distillation improves graph structure augmentation for graph neural networks. NeurIPS, 2022.
- Wu et al. [2022c] Qitian Wu, Hengrui Zhang, Junchi Yan, and David Wipf. Handling distribution shifts on graphs: An invariance perspective. ICLR, 2022.
- Wu et al. [2023] Jun Wu, Jingrui He, and Elizabeth Ainsworth. Non-iid transfer learning on graphs. In AAAI, 2023.
- Xia et al. [2022] Jun Xia, Yanqiao Zhu, Yuanqi Du, and Stan Z Li. A survey of pretraining on graphs: Taxonomy, methods, and applications. arXiv, 2022.
- Ye et al. [2013] Jihang Ye, Hong Cheng, Zhe Zhu, and Minghua Chen. Predicting positive and negative links in signed social networks by transfer learning. In TheWebConf, 2013.
- Yehudai et al. [2021] Gilad Yehudai, Ethan Fetaya, Eli Meirom, Gal Chechik, and Haggai Maron. From local structures to size generalization in graph neural networks. In ICML, 2021.
- You et al. [2023] Yuning You, Tianlong Chen, Zhangyang Wang, and Yang Shen. Graph domain adaptation via theory-grounded spectral regularization. In ICLR, 2023.
- Yu et al. [2023] Zhiqi Yu, Jingjing Li, Zhekai Du, Lei Zhu, and Heng Tao Shen. A comprehensive survey on source-free domain adaptation. arXiv, 2023.
- Zhang et al. [2019] Yizhou Zhang, Guojie Song, Lun Du, Shuwen Yang, and Yilun Jin. Dane: Domain adaptive network embedding. IJCAI, 2019.
- Zhang et al. [2021] Xiaowen Zhang, Yuntao Du, Rongbiao Xie, and Chongjun Wang. Adversarial separation network for cross-network node classification. In CIKM, 2021.
- Zhang et al. [2022] Jiying Zhang, Xi Xiao, Long-Kai Huang, Yu Rong, and Yatao Bian. Fine-tuning graph neural networks via graph topology induced optimal transport. In IJCAI, 2022.
- Zhang et al. [2023] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In ICCV, 2023.
- Zhao et al. [2019] Han Zhao, Remi Tachet Des Combes, Kun Zhang, and Geoffrey Gordon. On learning invariant representations for domain adaptation. In ICML, 2019.
- Zhao et al. [2023] Wentao Zhao, Qitian Wu, Chenxiao Yang, and Junchi Yan. Graphglow: Universal and generalizable structure learning for graph neural networks. KDD, 2023.
- Zheng et al. [2023] Xin Zheng, Miao Zhang, Chunyang Chen, Soheila Molaei, Chuan Zhou, and Shirui Pan. Gnnevaluator: Evaluating gnn performance on unseen graphs without labels. NeurIPS, 2023.
- Zheng et al. [2024] Xin Zheng, Dongjin Song, Qingsong Wen, Bo Du, and Shirui Pan. Online gnn evaluation under test-time graph distribution shifts. ICLR, 2024.
- Zhu et al. [2021a] Qi Zhu, Natalia Ponomareva, Jiawei Han, and Bryan Perozzi. Shift-robust gnns: Overcoming the limitations of localized graph training data. NeurIPS, 2021.
- Zhu et al. [2021b] Qi Zhu, Carl Yang, Yidan Xu, Haonan Wang, Chao Zhang, and Jiawei Han. Transfer learning of graph neural networks with ego-graph information maximization. NeurIPS, 2021.
- Zhu et al. [2022] Qi Zhu, Chao Zhang, Chanyoung Park, Carl Yang, and Jiawei Han. Shift-robust node classification via graph adversarial clustering. NeurIPS, 2022.
- Zhu et al. [2023a] Qi Zhu, Yizhu Jiao, Natalia Ponomareva, Jiawei Han, and Bryan Perozzi. Explaining and adapting graph conditional shift. arXiv, 2023.
- Zhu et al. [2023b] Yun Zhu, Yaoke Wang, Haizhou Shi, Zhenshuo Zhang, and Siliang Tang. Graphcontrol: Adding conditional control to universal graph pre-trained models for graph domain transfer learning. arXiv, 2023.
- Zhuang et al. [2020] Fuzhen Zhuang, Zhiyuan Qi, Keyu Duan, Dongbo Xi, Yongchun Zhu, Hengshu Zhu, Hui Xiong, and Qing He. A comprehensive survey on transfer learning. Proc. IEEE, 2020.