A Survey on Graph Condensation

Hongjia Xu¹ Liangliang Zhang² Yao Ma² Sheng Zhou¹ Corresponding author Zhuonan Zheng¹ Bu Jiajun¹ ¹Zhejiang University
²Rensselaer Polytechnic Institute
{xu_hj, zhousheng_zju, bjj, zhengzn}@zju.edu.cn, [email protected], [email protected]

Abstract

Analytics on large-scale graphs have posed significant challenges to computational efficiency and resource requirements. Recently, Graph condensation (GC) has emerged as a solution to address challenges arising from the escalating volume of graph data. The motivation of GC is to reduce the scale of large graphs to smaller ones while preserving essential information for downstream tasks. For a better understanding of GC and to distinguish it from other related topics, we present a formal definition of GC and establish a taxonomy that systematically categorizes existing methods into three types based on its objective, and classify the formulations to generate the condensed graphs into two categories as modifying the original graphs or synthetic completely new ones. Moreover, our survey includes a comprehensive analysis of datasets and evaluation metrics in this field. Finally, we conclude by addressing challenges and limitations, outlining future directions, and offering concise guidelines to inspire future research in this field.

1 Introduction

Graph data, representing relationships and interactions between entities, are ubiquitous in various domains including social networks, biological systems, and recommendation systems. Information and patterns in those scenarios have been modeled as nodes and edges, and there has been significant progress in the development of techniques for large-scale graph data mining and pattern recognition.

However, analyzing and processing large-scale graphs pose significant challenges to computational efficiency and resource requirements Duan et al. (2022). Recently, the dataset distillation Yu et al. (2023) has attracted increasing attention and achieved success mainly in vision datasets. Conventional dataset distillation relies on the idea that within categories defined by class labels, instances of the same class share similar key features, e.g. shape patterns in vision datasets. This implies the existence of ’prototypes’ or ’clustering centers’, and thus a significant amount of redundant information exists among instances belonging to the same category. Similarly in graph datasets, e.g., in node classification tasks, the features and topology of nodes within the same class are similar, and there may be numerous repetitive and similar subgraph structures in the graph. Thus a natural question arises: How can we effectively formulate small-scale graphs from large-scale graphs to facilitate various graph data mining tasks?

Graph Condensation (GC), has been emerged for distilling large-scale graphs into smaller yet informative new ones. By eliminating redundant information, GC makes the graph more manageable within the constraints of limited computation resources, thereby providing better support for graph data mining tasks and applications such as Continual learning Liu et al. (2023b) and Network Architecture Search (NAS) Gao et al. (2021), etc. Moreover, take node classification as an example, the reason a node can be well classified is that GNNs have learned to capture the unique pattern of nodes to distinguish them from other nodes in different classes. Analogous to the attention heat map in the field of image classification (e.g. in Lapuschkin et al. (2019)), the visualization of patterns that GNNs learned can be accomplished by GC.

While the concept aligns with vision dataset distillation, the specific challenges posed by the uniqueness of graph data motivate us to: (1) Address the lack of universal definitions and (2) Explore and synthesize the existing knowledge in this domain to a comprehensive survey.

Refer to caption — Figure 1: Overview of GC. GNNs stand for any graph machine learning model with different architectures like GCN, GAT, …, etc.

Scope of this paper.

To enhance the understanding of GC, we first propose a refined definition in section 2.2, which clarifies the unique aspects of GC and highlights its divergence from vanilla dataset distillation. Under our definition, some methods of related topics are included in our discussion due to a shared motivation, i.e., facilitating graph data analytics by reducing the volume of graph data. Thus, the proposed taxonomy of GC is designed to encompass diverse techniques with various applications, and our discussion extends to the consideration of various potential issues within GC. Last but not least, we present various application scenarios and outline future directions. Our contributions are as follows:

•

We present a formal definition of GC and systematically categorize existing methods into three types: graph-guided, model-guided, and hybrid. Through a detailed analysis, we classify the formulations of condensed graphs into modification and synthetic categories.
•

We provide a summary of essential datasets, and conduct a thorough analysis of evaluation metrics and applications.
•

We delve into the limitations and challenges of GC methods from a broader perspective, presenting future directions, and thus

inspires future work in GC.

2 Preliminary

2.1 Notations and Definition

For any matrix, the symbol $\top,-,+$ represents the operations of transpose, inverse, and pseudoinverse, respectively. Consider $\mathcal{S}=\{\mathcal{G}_{1},\cdots,\mathcal{G}_{\eta}\}$ denote a dataset of $\eta$ graph(s), where $\eta\in\mathbb{N}$ , $\mathcal{G}=\{\mathcal{V},\mathcal{E},{\bf A},{\bf X}\}$ , $\mathcal{V}$ and $\mathcal{E}$ denotes the set of vertices (nodes) and edges, ${\bf A}\in\mathbb{R}^{N\times N}$ is the adjacency matrix, and ${\bf X}\in\mathbb{R}^{N\times d}$ represents the feature matrix. ${\bf L}_{\mathcal{G}}$ is the corresponding Laplacian matrix of $\mathcal{G}$ . $\mathcal{Y}\in\{1,\cdots,C\}^{M}$ denotes the set of labels of the nodes or edges if $\eta=1$ , or the label of graphs if $\eta>1$ , $[\mathcal{Y}]\in\mathbb{N}^{M\times 1}$ be its vector form. The downstream tasks are specifically defined by $M=N$ for node classification, $M=N^{2}$ for link prediction, and $M=\eta$ for graph classification. $\operatorname{GNN}(;{\theta}_{\mathcal{S}}):\mathcal{S}\rightarrow\mathcal{Y}$ denotes the GNN parameterized with ${\theta}_{\mathcal{S}}$ was trained on dataset $\{\mathcal{S},\mathcal{Y}\}$ .

2.2 Problem Defination

Denote $\mathcal{S}_{c}=\{{\mathcal{G}_{c}}_{1},\cdots,{\mathcal{G}_{c}}_{\mu}\}$ a smaller dataset of $\mu$ graph(s), $\mu\leq\eta$ , and $\mathcal{G}_{c}=\{\mathcal{V}^{\prime},\mathcal{E}^{\prime},{\bf A}^{\prime},{\bf X}^{\prime}\}$ where the first dimension of $\mathcal{V}^{\prime},{\bf A}^{\prime}$ , and ${\bf X}^{\prime}$ are $N^{\prime}\left(N^{\prime}\ll N\right)$ , $\mathcal{Y}^{\prime}\in\{1,\cdots,C\}^{M^{\prime}}$ be the labels of $\mathcal{G}_{c}$ . The conventional definition of GC Gao et al. (2024) is to directly borrow the definition of the dataset distillation problem, considering graphs as input and output:

\mathcal{S}_{c}=\arg\min_{\mathcal{S}_{c}}\mathcal{L}_{\text{cond }}(GNN(\{\mathcal{S}_{c},\mathcal{Y}^{\prime}\},\{\mathcal{S},\mathcal{Y}\};\theta))

(1)

where $\mathcal{L}_{\text{cond }}$ is the optimization objective of condensation.

In contrast, we define Graph Condensation ( $GC():\mathcal{S}\rightarrow\mathcal{S}_{c}$ ), a class of methods aimed to scaling large-scale graphs to smaller yet informative NEW ones. We call:

	$\displaystyle\mathcal{G}_{c}$	$\displaystyle\text{ as the {Condensed} graph only if }\mathcal{V}^{\prime}\nsubseteq\mathcal{V}.$		(2)
	$\displaystyle\mathcal{S}_{c}$	$\displaystyle\text{ as the {Condensed} graphs only if }\mathcal{S}_{c}^{\prime}\nsubseteq\mathcal{S}.$		(2)

In this paper, we will focus on condensing a single graph, while methods for multiple will be briefly introduced. To ensure the condensed graphs are informative, their formulations are parameterized through an optimization process:

\mathcal{G}_{c}=f(\mathcal{G};{\bf{W}}),{\bf{W}}=\arg\min_{{\bf W}}\mathcal{O}(\phi({\mathcal{G}_{c}},\mathcal{Y}^{\prime}),\phi({\mathcal{G},\mathcal{Y}}))

(3)

•

Condensation Objective $\mathcal{O}$ describes the loss of graph information, which is quantified by function $\phi$ ;
•

${\bf W}$ is optimized by minimizing the objective $\mathcal{O}$ .
•

The Formulation function $f()$ describes how we formulate the condense graphs, which is parameterized by ${\bf{W}}$ ;

The three steps for formulating condensed graphs correspond to the three steps in the GC workflow as illustrated in fig. 1. Under our definition, specifying what information to preserve, denoted as $\phi$ , is crucial as the primary motivation is to reduce the scale of graph data while preserving sufficient information. The details of the optimization objectives be seen in section 3, and the formulations will be presented in section 4. We categorize current methods by the taxonomy of objectives and present the formulations of each in table 1.

3 Condensation Objective

We categorize the condensing objectives into three types: preserving certain properties of the graph (graph guided), retaining the GNNs’ capabilities for downstream tasks (model guided), or simultaneously accomplishing both (hybrid).

3.1 Graph Property Guided Methods

These kinds of objectives can be formulated as: Get similar & smaller graph $\mathcal{G}_{c}$ from original graph $\mathcal{G}$ , and the key is defining what are graph properties and how to evaluate the similarity between two graphs based on these properties. In this case, $\phi$ be graph property extractor, and $\mathcal{O}$ be the corresponding similarity/distance function. We further categorize these objectives into two categories: Spectral and Spacial, by the domain of extracted information. Specifically:

Spectral Property Guided Methods.

The Spectral GNNs Chen et al. (2023) are defined by operators in the Spectral domain. Similarly, we define the Spectral properties of a graph by requiring the graph Laplacian for calculations, i.e., $\phi(\mathcal{G})=\phi({\bf L}_{\mathcal{G}})$ . In this case, the objective becomes minimizing the Distance of two graphs in the Spectral domain (DSpe), i.e., $\mathcal{O}=DSpe(\phi({\bf L}_{\mathcal{G}_{c}}),\phi({\bf L}_{\mathcal{G}}))$ .

The direct use of Laplacian eigenvalue and eigenvectors can be seen in this domain, as well as the Laplacian Energy Distribution (LED) used in Yang et al. (2023), which was derived from graph Laplacian. However, the difference in graph scale before and after condensation may need cross-dimension metrics, such as having a distinct number of eigenvalues and eigenvectors. Specifically, Loukas (2019); Huang et al. (2021) calculate the differences of smallest $k$ eigenvalues and corresponding eigenvectors between two graphs’ Laplacian; Liu et al. (2023a) take $k$ eigenvectors with the smallest eigenvalues to map node features, and minimize the distances of class centers in the spectral space. This is because the smaller the eigenvalue is, the more informative it and its corresponding eigenvector are on graph laplacianDas (2004). Jin et al. (2020) Further propose graph lifting to rescale the small graph to a large one, and thus comparing all eigenvalues. Another issue is the efficiency concern, and all methods mentioned above have their Laplacian approximation design, which will not be discussed here.

Despite direct optimization of spectral properties, Jin et al. (2020); Deng et al. (2020) identify similar nodes to Aggregate in spectral embedding space, which can be seen as indirectly minimizing the spectral similarity of two graphs. Notably, minimizing Graph similarity metrics based on spectral-GNNs, e.g. in Liu et al. (2023b), is also considered as spectral property guided objectives with $\phi(\mathcal{G})=GNN(\mathcal{G})$ , and similarly, the use of spatial-GNNs for similarity metrics applies to the subsequent definition of Spacial ones. The GNNs we discussed here were not trained on downstream tasks and only served as information extractors.

Spacial Property Guided Methods.

The spatial domain of a graph is the original topology and node features, i.e., $\phi(\mathcal{G})=\phi(\{{\bf A},{\bf X}\})$ . Objective becomes minimizing the Distance of two graphs in the Spacial domain (DSpa), i.e., $\mathcal{O}=DSpa(\phi(\{{\bf A^{\prime}},{\bf X^{\prime}}\}),\phi(\{{\bf A},{\bf X}\}))$ . Specifically:

Graph Statistic properties ( $\phi(\mathcal{G})=Statistics({\bf A})$ ) like graph density, average degree and degree variance employed in Xu et al. (2023a), and feature homophily ( $\phi(\mathcal{G})=Statistics(\{{\bf A},{\bf X}\})$ ) in Kumar et al. (2023). Moreover, Structural Equivalence Matching (SEM) and Normalized Heavy Edge Matching (NHEM) used in Liang et al. (2021) can identify Topologically Redundant nodes. Selecting nodes by Ranking them through score functions can be effective in well-designed scenarios Wei et al. (2023), while simple selection without aggregation will not be further discussed for not fitting our definition.

A special class of Reconstruction like objectives is also considered this category of objectives because a successful reconstruction can be seen as a successful preservation of graph information. In this case, $\mathcal{O}=\mathcal{L}_{reconstruct}(\mathcal{G}_{c},\mathcal{G})$ . For example, metrics on reconstructed node features are used in Ma and Chen (2021); Kumar et al. (2023), and metrics on reconstructing the whole graph in Gao et al. (2023).

3.2 Model Capability Guided Methods

Since the ultimate objective is to achieve comparable performance via training models (include but not limited to GNNs) on smaller graphs $\mathcal{S}_{c}$ , the models trained on the original graphs $\mathcal{S}$ can be useful. By expecting GNNs to achieve comparable results as those trained on the original dataset $\mathcal{S}$ through training on the condensed ones $\mathcal{S}_{c}$ , we write:

		$\displaystyle\min_{\mathcal{G}_{c}}\mathcal{L}\left(\mathrm{Model}(S;{\boldsymbol{\theta}_{\mathcal{G}_{c}}}),\mathcal{Y}\right)$		(4)
	s.t	$\displaystyle\boldsymbol{\theta}_{\mathcal{G}_{c}}=\underset{\boldsymbol{\theta}}{\arg\min}\mathcal{L}\left(\mathrm{Model}\left(\mathcal{G}_{c};{\boldsymbol{\theta}}\right),\mathcal{Y}^{\prime}\right),$		(4)

where $\mathcal{L}$ is the task-specific loss (performance) function.

In this case, $\phi(\{\mathcal{G},\mathcal{Y}\})=\phi(\mathrm{Model}(;{\boldsymbol{\theta}_{\mathcal{G}}}))$ , and $\mathcal{O}=D(\phi(\mathrm{Model}(;{\boldsymbol{\theta}_{\mathcal{G}_{c}}})),\phi(\mathrm{Model}(;{\boldsymbol{\theta}_{\mathcal{G}}})))$ , $D$ is a distance function. We categorize all objectives that utilize such trained model as input as model-guided objectives. Specifically:

Gradient.

Starting from Jin et al. (2021), training trajectory matching, i.e., aligning the Gradient of models’ parameters, has become one of the mainstream objectives in this category as Jin et al. (2022); Li et al. (2023); Yang et al. (2023); Zheng et al. (2023); Gao et al. (2023); Gao and Wu (2023) did. Despite the specific model and tasks, these methods treat condensed graphs as optimization parameters to simulate steps of the models’ training between the original graph and the condensed graph, and Li et al. (2023); Mao et al. (2023) further introduce adversarial training for optimization. By doing so, the models trained on the condensed graph align well with the original models, maximizing the preservation of models’ performances on specific tasks.

Loss Value.

Instead of applying GNNs as a black-box model, Xu et al. (2023b); Wang et al. (2023) aim to obtain the exact solution of the classification model from data. Specifically, kernel ridge regression was selected as the classifier model, and the Loss value evaluating the performance of the condensed model on original data was optimized.

Embedding and Logits.

The outputs of an instance through trained network typically integrate crucial information for downstream tasks, and is thus considered informative. Specifically, matching models’ output Embeddings of training instances is used in Liu et al. (2022b, 2023b); Dickens et al. (2023), and Predicted Logits based uncertainty metric was used in Liu et al. (2023a); Xu et al. (2023a).

3.3 Hybrid Methods

It is worth mentioning that the aforementioned two types of objectives are not mutually conflicting. Therefore, the third category named hybrid methods combines both the graph properties and model capabilities as guidance during condensation simultaneously. There are methods such as Yang et al. (2023); Gao et al. (2023); Liu et al. (2023a); Xu et al. (2023a) that optimize the condensed graph from both graph-guided and model-guided objectives. Specifically, Yang et al. (2023) simultaneously match the train trajectory between two models and the Laplacian Energy Distribution between two graphs, Gao et al. (2023) optimize the training trajectory loss and reconstruction loss together, Liu et al. (2022b) perform eigenbasis and training trajectory matching in the same time and Xu et al. (2023a) take the model predicted uncertainty and empirically verified useful graph properties to rank graph training instances for selection.

3.4 Comparison of Objectives

Three types of objectives, namely graph-guided, model-guided, and hybrid, each with its advantages and drawbacks: To produce ’similar’ condensed graphs, graph-guided objectives focus on preserving the properties of the original graph. This is suitable for applications that require retaining the patterns from original graphs. However, they are not guided by downstream tasks and hence may not be the optimal solution. On the other hand, the model-guided objectives aim to maintain the performance of the model by optimizing the condensed graph. These methods are driven by motivation-oriented optimization and thus perform exceptionally well in predefined scenarios. However, it may result in overfitting, reducing the adaptability of condensed graphs to other models or tasks. Hybrid methods combine the advantages of both graph-guided and model-guided approaches, intending to retain model performance while preserving graph properties for scenarios that value both graph property and model performance. However, balancing between the two objectives as well as optimizing them can be challenging.

In conclusion, the choice of the appropriate objective depends on the specific requirements of the application. Graph-guided is more suitable for tasks emphasizing graph structure, model-guided applies to scenarios emphasizing model performance, and the hybrid method seeks a balance between the two. Considering the goals of the task and the characteristics of the graph, selecting the most suitable method requires careful consideration in practical applications.

4 Formulation of Condensed Graphs

Here comes the question: how do we formulate each component of the condensed graph $\mathcal{G}_{c}$ ? Since the condensed graphs $\{\mathcal{G}_{c},\mathcal{Y}^{\prime}\}=\{\{{\bf A}^{\prime},{\bf X}^{\prime}\},\mathcal{Y}^{\prime}\}$ , therefore, $f(G)$ pertains to formulating these three components. We write: ${\bf A}^{\prime}=f_{\bf A}(G;{\bf W})$ , ${\bf X}^{\prime}=f_{\bf X}(G;{\bf W})$ , and $\mathcal{Y}^{\prime}=f_{\mathcal{Y}}(G;{\bf W})$ to formulate each. As we conclude, there are two main classes: the Modification and the Synthetic formulation. Specifically:

4.1 Modification formulation

Modification approaches encompass actions such as node aggregation and deletion, etc., where the condensed graph is the product of modifying the original graph. This category of formulations can be uniformly formalized as aggregating nodes from $\mathcal{G}$ to $\mathcal{G}_{c}$ . Assuming each node $v^{\prime}_{i}\in V^{\prime}$ is aggregated from $k$ nodes in $\mathcal{G}$ , $k\in\mathbb{N}$ , then the most common scheme, e.g., Loukas (2019); Deng et al. (2020); Jin et al. (2020); Huang et al. (2021); Ma and Chen (2021); Kumar et al. (2023); Dickens et al. (2023); Gao et al. (2023) did, was:

	$\displaystyle f_{\bf A}(\mathcal{G};{\bf P})$	$\displaystyle={\bf P^{T}AP}\text{ },\text{ }f_{\bf X}(\mathcal{G};{\bf P})={\bf P^{+}X},$		(5)
	$\displaystyle f_{\bf\mathcal{Y}}(\mathcal{G};{\bf P})$	$\displaystyle=\arg\max{\bf P^{+}}{[\mathcal{Y}]}$		(5)

$P\in\mathbb{R}^{N\times N^{\prime}}$ is defined as a projection matrix, indicating that nodes $\mathcal{V}_{(i)}$ in $\mathcal{G}$ were aggregated to a new node $v^{\prime}_{i}$ in $\mathcal{G}_{c}$ :

{\bf P}_{i,j}=\begin{cases}1&\text{ if }v^{\prime}_{j}\in\mathcal{V}_{(i)}\\ 0&\text{ otherwise }\end{cases}

(6)

In a general definition, each row of ${\bf P}$ may contain an uncertain number of nonzero entries, ranging from none (the node is considered dropped, e.g., Luo et al. (2021)) to one (the node is aggregated once) and even multiple (communities have overlapping issues). No paper has yet delved into the discussion of community overlapping in this field, however, this scenario can also be included within our formulation.

4.2 Synthetic formulation

Synthetic approaches, on the other hand, take the condensed graphs as parameters and directly optimize them by minimizing specific objective functions. We further divide this formulation into three strategies: Predefined, Joint Optimization, and Sequential Optimization. Specifically:

Predefined.

This kind of strategy is undoubtedly the most straightforward yet most tricky one. Two popular strategies are used in the literature: predefine ${\bf A}^{\prime}={\bf I}$ ( ${\bf I}$ is the identity matrix) in Zheng et al. (2023); Liu et al. (2023b) and predefine $\mathcal{Y}^{\prime}=Sample(\mathcal{Y})$ . The former can be interpreted as the goal of graph condensation being solely to learn the prototype embeddings for each class, at which point the topology information has already been integrated and is no longer necessary. The latter can be explained as achieving the same label distribution between the graph before and after condensation by employing a uniform sampling of labels. The tricky initialization of parameters to optimize is also included as Predefined strategies, which will not be discussed further here.

Joint Optimization.

Methods in this category, e.g., Jin et al. (2022); Xu et al. (2023b); Liu et al. (2023a), are the most simple yet the most challenging ones, where the condensed graph (topology ${\bf A}^{\prime}$ and node features ${\bf X}^{\prime}$ ) is considered as parameters for the optimization objective, and the node labels $\mathcal{Y}^{\prime}$ were often predefined by sampling the original labels $\mathcal{Y}$ . In conclusion, the formulations are given by:

\displaystyle f_{\bf A}(;{\bf{A}^{\prime}})

\displaystyle={\bf A}^{\prime}\text{ },\text{ }f_{\bf X}(;{\bf{X}^{\prime}})={\bf X}^{\prime}\text{ },\text{ }f_{\mathcal{Y}}({\mathcal{Y}};)\subseteq{\mathcal{Y}}

(7)

In this scenario, ${\bf W}=\{{\bf A^{\prime}},{\bf X^{\prime}}\}$ are parameters to be optimized. So far in the literature, they invariably need to predefine the labels for the condensed graph, e.g., uniform sampling to keep label distribution unchanged. Therefore, this kind of strategy can be perceived as generating dual features for each class: node features and their topological connection.

Sequential Optimization.

The existence of this strategy is typically regarded as a compromise in the challenge of joint optimization: if the complete condensed graph, encompassing both matrix ${\bf A}^{\prime}$ and vector ${\bf X}^{\prime}$ , is regarded as optimization parameters, the dimensionality of the parameter space escalates significantly, introducing challenges to the convergence of optimization objective. Therefore, optimizing part of the condensed graph first, and constructing the rest parts to complete the condensation can be an efficient solution. Specifically, Jin et al. (2021); Liu et al. (2022b); Wang et al. (2023); Gao and Wu (2023); Yang et al. (2023) first generate the node embedding via objective optimization, and construct the topology by MLPs according to the generated node features (Gao and Wu (2023); Yang et al. (2023) further introduce original topology ${\bf A}$ to help generate the condensed graph topology ${\bf A}^{\prime}$ ); and Huang et al. (2021); Dickens et al. (2023) modify the condensed graph first, and determine the node label by the majority of aggregated original nodes. The formulation is given by:

\displaystyle f_{\bf A}({\bf{X}^{\prime}};{\bf\omega})

\displaystyle=g({\bf{X}^{\prime}};{\bf\omega})\text{ },\text{ }f_{\bf X}(;{\bf{X}^{\prime}})={\bf{X}^{\prime}}\text{ },\text{ }f_{\mathcal{Y}}({\mathcal{Y}};)\subseteq{\mathcal{Y}}

(8)

$g()$ can be MLPs, etc., and in this case, ${\bf W}=\{{\bf X^{\prime}},{\bf\omega}\}$ .

Among the existing literature, as the optimization of ${\bf X}^{\prime}$ also needs predefined labels, this formulation can be seen as generating prototypes for each class first, and subsequently predicting their relationships (i.e., topology); or aggregate hypernodes first, and determining their labels; or construct topology first, optimizing the node feature and labels as Pan et al. (2023) did, hence possessing greater interpretability.

Table 1: Taxonomy of surveyed methods

	Method	$\mathcal{O}$	$f$
Graph Guided	GC ¹	Spectral Properties	Modification
	ReduceG ²	Spectral Properties	Modification
	SCAL ³	Spectral Properties	Modification
	GraphZoom ⁴	KNN Aggregation	Modification
	SC ⁵	KNN Aggregation	Modification
	FGC ⁶	Graph Statistics	Modification
	CaT ⁷	Graph Similarity	Synthetic
	OTC ⁸	Ranking + Reconstruct	M+S
Model Guided	ConvMatch ⁹	Embedding Similarity	Modification
	GCDM ¹⁰	Embedding Similarity	Synthetic
	KiDD ¹¹	Loss Matching	Synthetic
	FedGKD ¹²	Loss Matching	Synthetic
	GC-SNTK ¹³	Loss Matching	Synthetic
	SFGC ¹⁴	Gradient Matching	Synthetic
	DosCond ¹⁵	Gradient Matching	Synthetic
	GCond ¹⁶	Gradient Matching	Synthetic
	GroC ¹⁷	Gradient Matching	Synthetic
	HCDC ¹⁸	Gradient Matching	Synthetic
	MSGC ¹⁹	Gradient Matching	Synthetic
Hybrid	Mcond ²⁰	Reconstruct + Gradient	M+S
	SGDD ²¹	Spectral + Gradient	Synthetic
	GCEM ²²	Spectral + Logits	Synthetic
Ps. ’M’, ’S’ stands for Modification and Synthetic respectively. Refs: ¹Loukas (2019), ²Bravo Hermsdorff and Gunderson (2019), ³Huang et al. (2021), ⁴Deng et al. (2020), ⁵Jin et al. (2020), ⁶Kumar et al. (2023), ⁷Liu et al. (2023b), ⁸Huang et al. (2021), ⁹Dickens et al. (2023), ¹⁰Liu et al. (2022b), ¹¹Xu et al. (2023b), ¹²Pan et al. (2023),¹³Wang et al. (2023), ¹⁴Zheng et al. (2023), ¹⁵Jin et al. (2022), ¹⁶Jin et al. (2021), ¹⁷Li et al. (2023), ¹⁸Ding et al. (2022),¹⁹Gao and Wu (2023),²⁰Gao et al. (2023), ²¹Yang et al. (2023),²²Liu et al. (2023a)

4.3 Comparison of formulations

Each of the formulations mentioned has its distinct method (or not being invented yet but can be done) for generating ${\bf A}^{\prime}$ , ${\bf X}^{\prime}$ , $\mathcal{Y}^{\prime}$ separately, despite that the Sequential Optimization Formulation must rely on the intermediate results of the other formulations. As we conclude, the Modification formulations exhibit the strongest computational efficiency and interpretability, but their applicability is limited, as each graph awaiting condensation requires the recalibration of the projection matrix. The Synthetic by joint optimization formulation is the simplest, defining the objective and optimizing directly, yet it is also the most challenging, the parameter search space may be so big that convergence is difficult. To address this issue, the Synthetic by Sequential Optimization is a 2-step scheme, incorporating the advantages of both easy-to-implement and objective-oriented. The Synthetic by Predefined formulation yields intuitive results and can be effective in carefully designed scenarios. The table 1 presents the objective-based category of methods included in this survey, and strategies on how to formulate the condensed graphs.

4.4 Strategies: $\mathcal{S}\rightarrow\mathcal{S}_{c}$

After introducing the condensation of a single graph, we will now explore the strategies for condensing multiple graphs. Currently, there are only a few methods addressing this application, so we will only provide a concise overview:

One-by-One strategy: If the number of graphs remains unchanged, i.e., every single graph is condensed independently, e.g., Jin et al. (2020), we categorize this scheme as the one-by-one strategy. Any method that can condense a single graph can be modified to condense multiple graphs by adopting this strategy. Joint Optimization strategy: Similar to the single-graph joint optimization, this strategy combines multiple condensed graphs as parameters of the optimization objective, e.g., Jin et al. (2022). By naming the number of graphs in a graph dataset to be $1$ , this category of methods would essentially degenerate into a single-graph optimization strategy. Selecting strategy: The core of this strategy is to rank each single graph by score functions, and select the top-ranked ones, e.g., Wang et al. (2021); Xu et al. (2023a). Given that these methods have limited relevance to our survey, we will not delve further.

5 Dataset and Evaluation

5.1 Dataset Statistics

We systematically organize and summarize the datasets employed in the discussed methods, categorizing them into two primary types: datasets featuring a single large graph and those comprising multiple graphs. The former is typically utilized for tasks such as node classification and edge prediction, while the latter is employed in graph classification. We present key attributes of the datasets, encompassing details such as the number of nodes, number of edges, features and classes, and the graph type (e.g., social network or molecular network) for datasets with a single large graph. Additionally, for datasets containing multiple subgraphs, we provide organization based on the number of subgraphs, average number of nodes, average number of edges, number of labels, and the type of graph. The detailed statistic of the the datasets can be found in our online resources ¹¹1 https://github.com/liangliang6v6/GraphCondensation.

5.2 Evaluation Metrics

GC aims to create a significantly smaller graph dataset while preserving sufficient information, thus it is crucial to evaluate how much this information is retained. The evaluation of GC methods is challenging compared to the straightforward performance evaluation of traditional GNNs, mainly due to their involvement in multiple aspects. From a holistic perspective, we summarize the evaluation of the entire GC process into two aspects: effectiveness and efficiency. Effectiveness evaluates how well the GC retains the original information, while efficiency includes both the condensation process and downstream task efficiency. Details can be found below:

5.2.1 Metrics Over Effectiveness

From the perspective of input and output, GC methods take the original graphs as input and the condensed graphs as output. To verify that the condensed graphs are informative, the effectiveness of GC is evaluated through the following different aspects: (1) The similarity between the original and condensed graphs is assessed in domains such as spectral and spacial characteristics. (2) The performance of condensed graphs in downstream tasks, while closely mirrors evaluations in traditional GNNs, a comparable performance can be considered as a success preservation of valuable information for downstream tasks. (3) Properties of the condensed graphs alone, e.g., applicability which involves integrating GC as a component within an existing system, with evaluation metrics aligning with target systems like Graph Embedding and Graph Continual Learning; The capabilities such as fairness, generalizability, etc. Specifically:

Similarity with Original Graphs.

Evaluation of how well condensed graphs replicate the spectral and spacial characteristics of the original graphs involves metrics such as Proportion of Low-Frequency Nodes, Mean of High-Frequency Areas, and various spectral domain metrics. The dissimilarity between recovered graph partitions and ground-truth structures is quantified using Normalized Mutual Information (NMI). Error metrics, such as Relative Eigenerror, Reconstruction Error, Dirichlet Energy, and Hyperbolic Error, also provide insights into the similarity evaluation.

Performance of Mining on Condensed Graphs.

The evaluation of GC methods entails a diverse set of metrics tailored to specific downstream tasks. In node and graph classification, widely adopted measures include Accuracy (ACC), with the Mean Classification ACC and Standard Deviation providing a more robust assessment. Furthermore, widely adopted performance evaluation metrics, such as the area under the ROC curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score, find extensive application across diverse downstream tasks. For link prediction task, common metrics include Hits@50 and Mean Reciprocal Rank (MRR).

Independent Graphs Property Evaluation.

The utilization of GC methods in various systems entails the consideration of diverse evaluation metrics. For bias measurement in GC Feng et al. (2023); Mao et al. (2023), fairness metrics, including Demographic Parity (also known as Statistical Parity) and Equal Opportunity, are employed. Considering the fairness and downstream task performance trade-off, they utilize the Pareto frontier Ishizaka and Nemery. The GC methods involved with models’ guidance introduce the concept of generalizability, signifying that the condensed dataset can achieve comparable performance across GNNs. Addtionally, guided GNNs may differ from downstream GNNs, demonstrating cross-architecture generalizability.

5.2.2 Metrics Over Efficiency

The fundamental motivation of GC is to facilitate graph mining tasks on large-scale original graphs with efficiency. Consequently, it is imperative to evaluate the computational resources saved by GC in the mining of condensed graphs. Meanwhile, although GC is an one-time effort, the process of GC itself should not take too much resources.

Within the condensation algorithm, most methods, e.g. Jin et al. (2021, 2022), analyze the computational complexity, primarily emphasizing time complexity and, to a lesser extent, space complexity. With respect to various datasets and condensation ratios (often expressed as a percentage), these approach entails directly measuring the time required for generating the condensed dataset, commonly referred to as condensation time. For a more in-depth analysis of this term, certain methods further subdivide the time, e.g., GCEMLiu et al. (2023a) proposes additional metrics: time spent on pre-processing stage; KiDDXu et al. (2023b) measures the time dedicated to forward and reverse gradient propagation based on varying batch sizes, etc. In addition, considering that the large size of original graph data can cause the Out of Memory (OOM) problems, some methods also measure the memory usage in this process.

The efficiency of training a new GNN in downstream tasks using condensed graphs is undoubtedly superior to that of original graphs, where the time and space saved by GC are empirically proportional to the condensation ratios. However, there are still some research Dickens et al. (2023); Wang et al. (2023); Gao et al. (2023) specifically focused on evaluating downstream task training time and memory usage. Notably, not all methods explicitly address the efficiency of GC, but the absence of it does not imply its impossibility to conduct such Jin et al. (2020); Huang et al. (2021). Considering both efficiency and effectiveness, we will introduces a tradeoff consideration in the following chapters.

6 Limitations and Challenges

Performance Gap Issue.

As we analyze, there is an obvious performance gap: the scale of the synthetically condensed graph is considerably smaller than that of the modifying strategies. For example, the commonly used condensing ratio of synthetic methods is around 1.3%, while the common condense ratio of modifying the original graph is 30, 50, 70%. Different strategies have their respective advantages, but there might be a significant performance gap when evaluated on the same metric. That is to say, the choice of appropriate methods is diverse and depends on the practical needs of application scenarios, which is currently hard to have a unifying strategy.

Efficiency Concern for Applications.

If the ultimate goal of GC is to train GNNs effectively on scaled datasets, it may be required to ensure that the time and resources invested in GC do not surpass the time saved by training on smaller graphs. However, since the GC process is a one-time effort, as long as the downstream task continues for a sufficient duration or is repeated frequently, this requirement can still be fulfilled. Thus, the downstream task scenarios must be included as part of the efficiency evaluation scheme.

Comprehensive Effectiveness Metrics.

Existing methods predominantly evaluate GC effectiveness based on the performance of condensed graphs in downstream tasks. However, conventional performance metrics, like classification ACC, may fall short in addressing critical issues such as fairness Mao et al. (2023), robustness against adversarial attacks Li et al. (2023) etc. While few efforts, e.g. Feng et al. (2023), have propose constraints focusing on group fairness, future exploration could broaden the scope to include other universal constraint or evaluation metrics for GC.

Unexplored Methodological Capabilities

The scopes of investigations on generalizability were exploring performances with (1) Various GNN architectures; (2) Model convergence; (3) Extreme condensation ratios and (4) multiple downstream tasks. The main idea is to use the model performance on the condensed graph as a metric to assess the quality of the graph condensation. In this context, the performance metric is considered independent of the semantics of the condensed graph, and thus we consider GC methods using this strategy of evaluation to have a lower interpretability.

7 Future Directions

While substantial progress has been achieved in GC, we conclude that numerous future explorations persist:

Interpretability of the Condensed Graphs.

Unlike the natural interpretability of dataset distillation in the field of computer vision Zhao et al. (2020), the output of condensed graphs requires further exploration of interpretability in the real world. The key identification of GC in our definition is the fact that the nodes or edges in the condensed graph may be newly generated. While these new elements may provide sufficient information for graph mining, their semantic meanings in the real world may be difficult to obtain directly. Therefore, we believe that enhancing the interpretability of elements in the condensed graph is an important research problem for expanding the application scenarios of GC.

Condensing More Complex Graphs.

Although GC has been successfully developed in various graphs, most of existing methods have primarily focused on undirected, homogeneous, static graphs. However, graphs in real-world scenarios are usually more complicated Kazemi et al. (2020), such as dynamic graphs(e.g., traffic flow graphs), heterogeneous graphs (e.g., user-item graphs), etc. Condensation on these complex graphs requires preserving richer information from the original graph, which in turn poses greater challenges. Due to the complexity and diversity of real-world graph data, more graph types should be considered as well.

Exploring the Correlation between Objectives.

Under our taxonomy, each single objective of GC can be categorized into two groups: graph-guided and model-guided, by specific information to preserve. These two types of objectives are not inherently conflicting, yet their mutual relation has not been conclusively investigated. For instance, it remains uncertain whether there exists theoretical assurance that the preservation of certain graph properties is sufficient for the retention of GNN performance, or the other way around.

More Graph Formulation.

While there are already numerous methods for condensed graph formulation, we believe there are still many potential and viable approaches waiting to be explored. For example, non-uniform sampling of labels during GC might be a viable solution to address label imbalance issues Zhao et al. (2021); node features can be predefined as mutually orthogonal one-hot vectors, similar to what was done in Ma et al. (2021), just to generate the topology and thus facilitate relationship learning, etc.

Tradeoff Framework.

Within the exploration of applications, we inevitably confront a crucial yet delicate question: How to determine the scale of the condensed graph to meet the predefined purpose of GC? Although some existing methods have recognized the tradeoff between effectiveness and efficiency (as evidenced by their ratio-performance figure), we argues that both effectiveness and efficiency should be comprehensively included in a tradeoff framework. This is crucial to specify the utility of GC and expand its application scope to more practical scenarios. By considering both aspects, we can better understand the benefits and limitations of GC techniques and make informed decisions about their applicability in real-world settings.

References

Baker et al. [2020] Daniel Baker, Vladimir Braverman, Lingxiao Huang, Shaofeng H-C Jiang, Robert Krauthgamer, and Xuan Wu. Coresets for clustering in graphs of bounded treewidth. In International Conference on Machine Learning, pages 569–579. PMLR, 2020.
Bravo Hermsdorff and Gunderson [2019] Gecia Bravo Hermsdorff and Lee Gunderson. A unifying framework for spectrum-preserving graph sparsification and coarsening. Advances in Neural Information Processing Systems, 32, 2019.
Čebirić et al. [2019] Šejla Čebirić, François Goasdoué, Haridimos Kondylakis, Dimitris Kotzinos, Ioana Manolescu, Georgia Troullinou, and Mussab Zneika. Summarizing semantic graphs: a survey. The VLDB journal, 28:295–327, 2019.
Chen et al. [2022] Jie Chen, Yousef Saad, and Zechen Zhang. Graph coarsening: from scientific computing to machine learning. SeMA Journal, pages 1–37, 2022.
Chen et al. [2023] Zhiqian Chen, Fanglan Chen, Lei Zhang, Taoran Ji, Kaiqun Fu, Liang Zhao, Feng Chen, Lingfei Wu, Charu Aggarwal, and Chang-Tien Lu. Bridging the gap between spatial and spectral domains: A unified framework for graph neural networks. ACM Computing Surveys, 56(5):1–42, 2023.
Das [2004] K Ch Das. The laplacian spectrum of a graph. Computers & Mathematics with Applications, 48(5-6):715–724, 2004.
Deng et al. [2020] C Deng, Z Zhao, Y Wang, Z Zhang, and Z Feng. Graphzoom: A multi-level spectral approach for accurate and scalable graph embedding. In The International Conference on Learning Representations, 2020.
Dickens et al. [2023] Charles Dickens, Eddie Huang, Aishwarya Reganti, Jiong Zhu, Karthik Subbian, and Danai Koutra. Graph coarsening via convolution matching for scalable graph neural network training. arXiv preprint arXiv:2312.15520, 2023.
Ding et al. [2022] Mucong Ding, Xiaoyu Liu, Tahseen Rabbani, and Furong Huang. Faster hyperparameter search on graphs via calibrated dataset condensation. In NeurIPS 2022 Workshop: New Frontiers in Graph Learning, 2022.
Duan et al. [2022] Keyu Duan, Zirui Liu, Peihao Wang, Wenqing Zheng, Kaixiong Zhou, Tianlong Chen, Xia Hu, and Zhangyang Wang. A comprehensive study on large-scale graph training: Benchmarking and rethinking. Advances in Neural Information Processing Systems, 35:5376–5389, 2022.
Feng et al. [2023] Qizhang Feng, Zhimeng Jiang, Ruiquan Li, Yicheng Wang, Na Zou, Jiang Bian, and Xia Hu. Fair graph distillation. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
Gao and Wu [2023] Jian Gao and Jianshe Wu. Multiple sparse graphs condensation. Knowledge-Based Systems, 278:110904, 2023.
Gao et al. [2021] Yang Gao, Hong Yang, Peng Zhang, Chuan Zhou, and Yue Hu. Graph neural architecture search. International Joint Conference on Artificial Intelligence, 2021.
Gao et al. [2023] Xinyi Gao, T Chen, Yilong Zang, Wentao Zhang, Quoc Viet Hung Nguyen, K Zheng, and Hongzhi Yin. Graph condensation for inductive node representation learning. arXiv preprint arXiv:2307.15967, 2023.
Gao et al. [2024] Xinyi Gao, Junliang Yu, Wei Jiang, Tong Chen, Wentao Zhang, and Hongzhi Yin. Graph condensation: A survey. arXiv preprint arXiv:2401.11720, 2024.
Grattarola et al. [2022] Daniele Grattarola, Daniele Zambon, Filippo Maria Bianchi, and Cesare Alippi. Understanding pooling in graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2022.
Hashemi et al. [2024] Mohammad Hashemi, Shengbo Gong, Juntong Ni, Wenqi Fan, B. Aditya Prakash, and Wei Jin. A comprehensive survey on graph reduction: Sparsification, coarsening, and condensation. arXiv preprint, 2024.
Huang et al. [2021] Zengfeng Huang, Shengzhong Zhang, Chong Xi, Tang Liu, and Min Zhou. Scaling up graph neural networks via graph coarsening. In Proceedings of the 27th ACM SIGKDD conference on knowledge discovery & data mining, pages 675–684, 2021.
Jin et al. [2020] Yu Jin, Andreas Loukas, and Joseph JaJa. Graph coarsening with preserved spectral properties. In International Conference on Artificial Intelligence and Statistics, pages 4452–4462. PMLR, 2020.
Jin et al. [2021] Wei Jin, Lingxiao Zhao, Shichang Zhang, Yozen Liu, Jiliang Tang, and Neil Shah. Graph condensation for graph neural networks. In International Conference on Learning Representations, 2021.
Jin et al. [2022] Wei Jin, Xianfeng Tang, Haoming Jiang, Zheng Li, Danqing Zhang, Jiliang Tang, and Bing Yin. Condensing graphs via one-step gradient matching. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022.
Kazemi et al. [2020] Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi, Peter Forsyth, and Pascal Poupart. Representation learning for dynamic graphs: A survey. The Journal of Machine Learning Research, 21(1):2648–2720, 2020.
Kumar et al. [2023] Manoj Kumar, Anurag Sharma, Shashwat Saxena, and Sandeep Kumar. Featured graph coarsening with similarity guarantees. In International Conference on Machine Learning. PMLR, 2023.
Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking clever hans predictors and assessing what machines really learn. Nature communications, 10(1):1096, 2019.
Li et al. [2023] Xinglin Li, Kun Wang, Hanhui Deng, Yuxuan Liang, and Di Wu. Attend who is weak: Enhancing graph condensation via cross-free adversarial training. arXiv preprint arXiv:2311.15772, 2023.
Liang et al. [2021] Jiongqian Liang, Saket Gurukar, and Srinivasan Parthasarathy. Mile: A multi-level framework for scalable graph embedding. In Proceedings of the International AAAI Conference on Web and Social Media, volume 15, pages 361–372, 2021.
Liu et al. [2018] Yike Liu, Tara Safavi, Abhilash Dighe, and Danai Koutra. Graph summarization methods and applications: A survey. ACM computing surveys (CSUR), 51(3):1–34, 2018.
Liu et al. [2022a] Chuang Liu, Yibing Zhan, Jia Wu, Chang Li, Bo Du, Wenbin Hu, Tongliang Liu, and Dacheng Tao. Graph pooling for graph neural networks: Progress, challenges, and opportunities. arXiv preprint arXiv:2204.07321, 2022.
Liu et al. [2022b] Mengyang Liu, Shanchuan Li, X Chen, and Le S. Graph condensation via receptive field distribution matching. arXiv preprint arXiv:2206.13697, 2022.
Liu et al. [2023a] Yang Liu, Deyu Bo, and Chuan Shi. Graph condensation via eigenbasis matching. arXiv preprint arXiv:2310.09202, 2023.
Liu et al. [2023b] Yilun Liu, Ruihong Qiu, and Zi Huang. Cat: Balanced continual graph learning with graph condensation. arXiv preprint arXiv:2309.09455, 2023.
Loukas [2019] Andreas Loukas. Graph reduction with spectral and cut guarantees. J. Mach. Learn. Res., 20(116):1–42, 2019.
Luo et al. [2021] Dongsheng Luo, Wei Cheng, Wenchao Yu, Bo Zong, Jingchao Ni, Haifeng Chen, and Xiang Zhang. Learning to drop: Robust graph neural network via topological denoising. In Proceedings of the 14th ACM international conference on web search and data mining, pages 779–787, 2021.
Ma and Chen [2021] Tengfei Ma and Jie Chen. Unsupervised learning of graph hierarchical abstractions with differentiable coarsening and optimal transport. In Proceedings of the AAAI conference on artificial intelligence, volume 35, pages 8856–8864, 2021.
Ma et al. [2021] Yao Ma, Xiaorui Liu, Neil Shah, and Jiliang Tang. Is homophily a necessity for graph neural networks? In International Conference on Learning Representations, 2021.
Mao et al. [2023] Runze Mao, Wenqi Fan, and Qing Li. Gcare: Mitigating subgroup unfairness in graph condensation through adversarial regularization. Applied Sciences, 13(16):9166, 2023.
Nguyen et al. [2022] Lam BQ Nguyen, Ivan Zelinka, Vaclav Snasel, Loan TT Nguyen, and Bay Vo. Subgraph mining in a large graph: A review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 12(4):e1454, 2022.
Pan et al. [2023] Qiying Pan, Ruofan Wu, LIU Tengfei, Tianyi Zhang, Yifei Zhu, and Weiqiang Wang. Fedgkd: Unleashing the power of collaboration in federated graph neural networks. In NeurIPS 2023 Workshop: New Frontiers in Graph Learning, 2023.
Rozemberczki et al. [2020] Benedek Rozemberczki, Oliver Kiss, and Rik Sarkar. Little Ball of Fur: A Python Library for Graph Sampling. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), page 3133–3140. ACM, 2020.
Wang et al. [2021] Yiwei Wang, Wei Wang, Yuxuan Liang, Yujun Cai, and Bryan Hooi. Curgraph: Curriculum learning for graph classification. In Proceedings of the Web Conference 2021, pages 1238–1248, 2021.
Wang et al. [2023] Lin Wang, Wenqi Fan, Jiatong Li, Yao Ma, and Qing Li. Fast graph condensation with structure-based neural tangent kernel. arXiv preprint arXiv:2310.11046, 2023.
Wei et al. [2023] Xiaowen Wei, Xiuwen Gong, Yibing Zhan, Bo Du, Yong Luo, and Wenbin Hu. Clnode: Curriculum learning for node classification. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, pages 670–678, 2023.
Xu et al. [2023a] Jiarong Xu, Renhong Huang, Xin Jiang, Yuxuan Cao, Carl Yang, Chunping Wang, and Yang Yang. Better with less: A data-active perspective on pre-training graph neural networks. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
Xu et al. [2023b] Zhe Xu, Yuzhong Chen, Menghai Pan, Huiyuan Chen, Mahashweta Das, Hao Yang, and Hanghang Tong. Kernel ridge regression-based graph dataset distillation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023.
Yang et al. [2023] Beining Yang, Kai Wang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Hao Tang, Yang You, and Jianxin Li. Does graph distillation see like vision dataset counterpart? In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
Yu et al. [2023] Ruonan Yu, Songhua Liu, and Xinchao Wang. Dataset distillation: A comprehensive review. arXiv preprint arXiv:2301.07014, 2023.
Zhao et al. [2020] Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. Dataset condensation with gradient matching. arXiv preprint arXiv:2006.05929, 2020.
Zhao et al. [2021] Tianxiang Zhao, Xiang Zhang, and Suhang Wang. Graphsmote: Imbalanced node classification on graphs with graph neural networks. In Proceedings of the 14th ACM international conference on web search and data mining, pages 833–841, 2021.
Zheng et al. [2023] Xin Zheng, Miao Zhang, Chunyang Chen, Quoc Viet Hung Nguyen, Xingquan Zhu, and Shirui Pan. Structure-free graph condensation: From large-scale graphs to condensed graph-free data. arXiv preprint arXiv:2306.02664, 2023.