
Community-Aware Temporal Walks: Parameter-Free Representation Learning on Continuous-Time Dynamic Graphs

He Yu¹    Jing Liu²
¹,²School of Artificial Intelligence, Xidian University, 2 South Taibai Road, Xi’an, Shaanxi 710071, China
¹,²Guangzhou Institute of Technology, Xidian University, Knowledge City, Guangzhou, Guangdong 510555, China
1[email protected], 2[email protected]
Abstract

Dynamic graph representation learning plays a crucial role in understanding evolving behaviors. However, existing methods often struggle with flexibility, adaptability, and the preservation of temporal and structural dynamics. To address these issues, we propose Community-aware Temporal Walks (CTWalks), a novel framework for representation learning on continuous-time dynamic graphs. CTWalks integrates three key components: a community-based parameter-free temporal walk sampling mechanism, an anonymization strategy enriched with community labels, and an encoding process that leverages continuous temporal dynamics modeled via ordinary differential equations (ODEs). This design enables precise modeling of both intra- and inter-community interactions, offering a fine-grained representation of evolving temporal patterns in continuous-time dynamic graphs. CTWalks theoretically overcomes locality bias in walks and establishes its connection to matrix factorization. Experiments on benchmark datasets demonstrate that CTWalks outperforms established methods in temporal link prediction tasks, achieving higher accuracy while maintaining robustness. The implementation of our proposed method is publicly available at https://github.com/leonyuhe/CTWalks.

1 Introduction

Continuous-Time Dynamic Graphs (CTDGs) rossi2020temporal ; xu2020inductive ; wang2021causal ; souza2022provably provide a comprehensive framework for modeling temporal interactions in real-world systems. By representing entities and their time-stamped interactions as nodes and edges, CTDGs capture the evolving dynamics of diverse systems, including social networks, biological processes, knowledge graphs, and recommendation platforms nickel2015review ; zhang2020gnnrec ; wang2019kgat ; zhang2021graph ; fout2017protein . Among the myriad tasks supported by CTDGs, temporal link prediction luo2022neighborhood ; yu2023towards —predicting future interactions based on historical data—stands out as a fundamental problem. Success in this task not only enhances our understanding of dynamic systems but also drives practical applications such as personalized recommendations and anomaly detection. Despite significant advancements in dynamic graph representation learning kazemi2020representation ; yang2023dynamic ; zhu2022encoder , existing methods face three critical challenges that limit their flexibility, adaptability, and expressiveness.

Figure 1: Traditional anonymous encodings fail to differentiate nodes with similar local structures but distinct community roles, such as $A$ and $F$. Without community context, the identical structures of $A$ and $F$ make it impossible to predict $D$'s relationship with either node.

First, current approaches rely heavily on heuristic sampling strategies to extract key graph features, including node relationships, edge interactions, and temporal patterns. However, these strategies often demand intensive hyperparameter tuning to achieve a balance between exploration and exploitation jin2023motif ; xia2021motif . Such reliance complicates the training process and introduces sensitivity to dataset-specific characteristics, limiting their generalizability.

Second, many existing methods fail to account for mesoscopic structures, such as communities fortunato2010community ; newman2004finding ; Liu2014Multiobjective ; Teng2021Overlapping , which bridge local (node-level) and global (graph-level) patterns. Communities are crucial in real-world networks for distinguishing nodes with similar local structures but differing roles within their communities. Without incorporating community-aware representations, traditional anonymous encodings struggle to differentiate such nodes, resulting in ambiguity when predicting interactions, as demonstrated in Figure 1.

Finally, existing methods struggle to handle the continuous nature of temporal dynamics in real-world networks. Conventional approaches often discretize time into fixed intervals or aggregate observations, which obscures important temporal information, particularly in irregularly-sampled data. Recent advances in continuous-time models, such as Neural Ordinary Differential Equations (Neural ODEs) chen2018neural , demonstrate the potential to preserve temporal continuity by defining hidden states at all time points through differential equations. However, integrating such continuous-time methodologies with dynamic graph structures remains an active area of research jin2022mtgode ; asikis2020neural ; chen2023signed ; sdgnn2022 , particularly in scenarios that require capturing both intra-community and inter-community temporal dependencies.

To tackle these challenges, we propose Community-Aware Temporal Walks (CTWalks), a novel, parameter-free, and community-aware framework for representation learning on CTDGs. By leveraging the interplay between intra-community and inter-community dynamics, CTWalks achieves robust scalability, expressiveness, and generalizability. Contributions of this work include:

1. Community Detection on CTDGs: CTWalks introduces an innovative method to perform community detection on CTDGs by transforming them into a weighted temporal graph. This transformation aggregates interactions into a static graph, where edge weights reflect interaction frequencies, thereby enabling the identification of node community labels reflecting temporal information.

2. Parameter-Free Temporal Sampling: CTWalks eliminates the need for handcrafted hyperparameters in sampling by adopting a community-driven mechanism. Temporal walks are adaptively performed within or across communities, encoding both structural dependencies and temporal dynamics. In the anonymization process, we embed community labels to restore contextual distinctions between nodes, reducing ambiguity and enhancing inductive capabilities.

3. Continuous-Time Dynamics via ODEs: To explicitly model temporal dependencies and capture essential dynamic laws, CTWalks employs a Neural ODE-based process. This approach learns spatiotemporal dynamics continuously, aligning modeled dynamics with irregularly sampled observations and providing a robust basis for downstream tasks.

4. Empirical Validation: Extensive experiments on benchmark datasets demonstrate CTWalks’ superior performance in the temporal link prediction task. Compared to several well-known methods, CTWalks achieves higher accuracy while maintaining competitive computational efficiency.

2 Related Work

Dynamic graphs kazemi2020representation ; yang2023dynamic ; zhu2022encoder ; Ma2024Higher provide a powerful framework for modeling systems that evolve over time, effectively capturing temporal interactions among nodes, edges, and their attributes. Depending on the representation of temporal information, dynamic graphs are broadly classified into Discrete-Time Dynamic Graphs (DTDGs) cong2021dynamic ; goyal2020dyngraph2vec ; pareja2020evolvegcn ; sankar2020dysat ; you2022roland and Continuous-Time Dynamic Graphs (CTDGs) rossi2020temporal ; xu2020inductive ; wang2021causal ; souza2022provably . DTDGs approximate temporal evolution using discrete graph snapshots, which provide a coarse-grained view of dynamic systems. In contrast, CTDGs represent interactions as a sequence of time-stamped events, which allows for high temporal fidelity, enabling the representation of systems with irregular or rapid interactions. Existing representation learning methods on dynamic graphs can be categorized into two primary paradigms: message-passing-based methods and sequence-model-based methods. Additionally, methods based on Continuous-Time Differential Equations (CTDEs) are discussed as a separate modeling approach chen2018neural ; rubanova2019latent ; xhonneux2020continuous .

Message-Passing-Based Methods extend graph neural networks (GNNs) scarselli2008graph ; Mo2025AutoSGNN to incorporate temporal information by propagating and aggregating features from neighboring nodes. For example, TGN rossi2020temporal integrates a memory module to store historical interactions, dynamically updating it through attention-based message passing. While effective in capturing temporal dependencies, TGN suffers from high computational overhead due to frequent memory updates and neighbor aggregation, limiting its scalability for large graphs. Other approaches, such as MTSN jin2023motif , aim to preserve structural motifs during message passing to enhance the capture of local patterns. However, the computational cost of motif construction and repeated matrix operations makes these methods impractical for large-scale applications.

Sequence-Model-Based Methods rely on recurrent neural networks (RNNs) elman1990finding or related architectures to capture temporal dependencies by sequentially updating node embeddings. For instance, JODIE kumar2019predicting employs coupled RNNs to jointly model static and dynamic embeddings, enabling tasks such as temporal link prediction. However, JODIE’s reliance on hyperparameter tuning, such as decay functions for time intervals, reduces its generalizability across datasets. Similarly, CAWs wang2021causal leverage causal anonymous walks to encode temporal sequences while preserving the temporal order of interactions.

Continuous-Time Differential Equation (CTDE)-based Methods chen2018neural provide a robust framework for modeling temporal dynamics in graphs by treating temporal interactions as continuous processes. For example, ODE-RNN rubanova2019latent leverages ODE solvers to capture the dynamics between observations, effectively bridging the gaps in irregularly sampled data. CGNN xhonneux2020continuous introduces continuous message-passing inspired by diffusion processes, addressing challenges like over-smoothing while enabling long-range dependency modeling. TGAT xu2020inductive employs temporal attention mechanisms coupled with Fourier-based time encoding to model temporal dependencies, showcasing the potential of attention in capturing fine-grained temporal patterns.

3 Preliminaries

Definition 1: Continuous-Time Dynamic Graph.

A continuous-time dynamic graph $\mathcal{G}$ is represented as a sequence of non-decreasing chronological interactions:

\mathcal{G}=\{((u_{1},v_{1}),t_{1}),((u_{2},v_{2}),t_{2}),\dots\},\quad 0\leq t_{1}\leq t_{2}\leq\cdots. (1)

Here, each pair $(u_{i},v_{i})$ represents an undirected link between nodes $u_{i}$ and $v_{i}$, with a corresponding timestamp $t_{i}$.

Definition 2: Temporal Link Prediction.

Temporal link prediction is the task of predicting whether an interaction between two nodes will occur at a specific time, based on historical interactions in the graph. Given the set of historical interactions before time $t$:

\mathcal{H}(t)=\{((u^{\prime},v^{\prime}),t^{\prime})\mid t^{\prime}<t\}, (2)

and two nodes $u$ and $v$, the goal is to predict the likelihood of an interaction between $u$ and $v$ at time $t$. Formally, this is expressed as:

\text{Predict:}\; P(((u,v),t)\mid\mathcal{H}(t)), (3)

where $P$ represents the probability of the interaction, conditioned on the historical interaction sequence.

Definition 3: Temporal Walk.

A temporal walk on a CTDG $\mathcal{G}$ is a sequence of node-time pairs:

W=\{(w_{1},t_{1}),(w_{2},t_{2}),\ldots,(w_{l},t_{l})\}, (4)

where:

  • $w_{1}=u$ and $t_{1}=t$: $W$ is rooted at node $u$ at time $t$.

  • $t_{1}>t_{2}>\ldots>t_{l}$: Timestamps are strictly decreasing.

  • $((w_{i-1},w_{i}),t_{i})$: Each step corresponds to a temporal edge in $\mathcal{G}$.

Definition 4: Community.

A community in a graph is a subset of nodes $C\subseteq V$ such that nodes within $C$ are densely connected to each other, while connections between nodes in $C$ and those outside $C$ are sparse.

Formally, community detection algorithms aim to partition the node set $V$ into $k$ disjoint subsets $\mathcal{C}=\{C_{1},C_{2},\ldots,C_{k}\}$, where:

C_{i}\cap C_{j}=\emptyset,\quad\forall i\neq j,\quad\text{and}\quad\bigcup_{i=1}^{k}C_{i}=V. (5)

The quality of a community partition is often quantified using metrics such as modularity newman2004finding . Communities represent mesoscopic structures in graphs, bridging the gap between local node-level interactions and global graph-level patterns.

4 The Proposed Method: CTWalks

CTWalks comprises three key components: sampling, anonymization, and encoding. The complete notation system used throughout CTWalks is detailed in Appendix A, while the algorithm’s computational complexity is analyzed in Appendix C.3.

4.1 Sampling

Weighted Temporal Graph Construction.

We propose an innovative method for community detection in CTDGs by introducing the concept of a weighted temporal graph $G_{w}$, which is derived from the input CTDG $\mathcal{G}$. The node set of $G_{w}$ contains every node that appears in $\mathcal{G}$, and the edge weights $w_{uv}$ in $G_{w}$ encode the frequency of interactions between nodes $u$ and $v$, defined as:

w_{uv}=\sum_{(u,v,t)\in\mathcal{G}}\mathbb{I}(u,v), (6)

where $\mathbb{I}$ is an indicator function counting occurrences of $(u,v)$ in $\mathcal{G}$. This transformation aggregates temporal interactions into a static graph while retaining the structural essence of the CTDG, thereby enabling efficient processing and community detection. By applying modularity optimization algorithms, such as Louvain newman2004fast ; blondel2008fast , on $G_{w}$, the graph is partitioned into $k$ communities $\mathcal{C}=\{C_{1},C_{2},\ldots,C_{k}\}$.
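To make the construction concrete, the following is a minimal sketch of weighted temporal graph construction and Louvain partitioning, assuming the CTDG is given as a list of (u, v, t) tuples and using networkx; the function names are illustrative rather than taken from the released code.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

def build_weighted_temporal_graph(events):
    """Aggregate time-stamped interactions (u, v, t) into a static weighted
    graph G_w whose edge weights count interaction frequency, as in Eq. (6)."""
    G_w = nx.Graph()
    for u, v, _t in events:
        if G_w.has_edge(u, v):
            G_w[u][v]["weight"] += 1
        else:
            G_w.add_edge(u, v, weight=1)
    return G_w

def detect_communities(G_w, seed=0):
    """Run modularity optimization (Louvain) on G_w; return {node: community id}."""
    parts = louvain_communities(G_w, weight="weight", seed=seed)
    return {node: cid for cid, comm in enumerate(parts) for node in comm}

# Toy usage: two dense groups joined by a single bridge edge.
events = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0),
          (3, 4, 4.0), (4, 5, 5.0), (3, 5, 6.0), (2, 3, 7.0)]
labels = detect_communities(build_weighted_temporal_graph(events))
```

In this sketch, bridging nodes would then be those with at least one neighbor carrying a different label, which yields the intra-community subgraphs $G_{C_{i}}$ and the inter-community subgraph $G_{I}$ used below.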

Sampling Strategy.

To effectively capture the temporal and structural patterns in $\mathcal{G}$, nodes are categorized based on their roles within the community structure:

  • Non-Bridging Nodes: Nodes confined within a single community $C_{i}$. Their temporal walks are restricted to the neighbors within the intra-community subgraph $G_{C_{i}}$.

  • Bridging Nodes: Nodes that connect multiple communities. Their temporal walks are restricted to the neighbors within the inter-community subgraph $G_{I}$, ensuring transitions only occur between bridging nodes.

At each step of a temporal walk, the next node $u$ is sampled based on its temporal relationship with the current node $v$ under the constraint $t^{\prime}<t$, where $t$ is the current timestamp. The transition probability is defined as:

P(u,t^{\prime}\mid v,t)=\frac{e^{-(t-t^{\prime})}}{\sum_{u^{\prime}\in\text{N}_{\text{valid}}(v)}e^{-(t-t^{\prime}_{u^{\prime}})}}, (7)

where:

  • $\text{N}_{\text{valid}}(v)$ represents the valid neighbor set of $v$, determined by its type:

    • For non-bridging nodes $v\in V_{C_{i}}$, $\text{N}_{\text{valid}}(v)=\text{N}(v)\cap G_{C_{i}}$, where $\text{N}(v)$ is the neighbor set of $v$.

    • For bridging nodes $v\in V_{I}$, $\text{N}_{\text{valid}}(v)=\text{N}(v)\cap G_{I}$.

  • $t^{\prime}$ is the timestamp associated with the interaction between $v$ and $u$, and $t^{\prime}_{u^{\prime}}$ in the denominator denotes the corresponding timestamp for each valid neighbor $u^{\prime}$, so the normalization runs over all temporally valid candidates.

Algorithm 1 Sampling Strategy

Input: Intra-community subgraphs $\{G_{C_{1}},G_{C_{2}},\ldots,G_{C_{k}}\}$, inter-community subgraph $G_{I}$, walk length $l$, number of walks per node $R$.
Output: Set of temporal walks $\mathcal{W}$.

1:  Initialize $\mathcal{W}\leftarrow\emptyset$
2:  for each node $v\in V$ do
3:     for each walk $r=1$ to $R$ do
4:        Initialize walk $W\leftarrow[(v,t_{1})]$, timestamp $t\leftarrow t_{1}$
5:        if $v\in V_{I}$ then
6:           $G_{\text{current}}\leftarrow G_{I}$
7:        else
8:           Find $C_{i}$ such that $v\in V_{C_{i}}$
9:           $G_{\text{current}}\leftarrow G_{C_{i}}$
10:        end if
11:        for step $i=1$ to $l-1$ do
12:           Identify valid neighbors $\text{N}(v)$ in $G_{\text{current}}$ with $t^{\prime}<t$
13:           if $\text{N}(v)=\emptyset$ then
14:              Break
15:           end if
16:           Sample next node $u$ using $P(u,t^{\prime}\mid v,t)$ in Eq. (7)
17:           Update $W\leftarrow W\cup(u,t^{\prime})$, $t\leftarrow t^{\prime}$
18:        end for
19:        $\mathcal{W}\leftarrow\mathcal{W}\cup\{W\}$
20:     end for
21:  end for
22:  Return $\mathcal{W}$

By utilizing community partitioning to guide walk strategies, CTWalks eliminates the need for additional parameters to control walk directions, simplifying implementation and improving adaptability to diverse graph structures and temporal dynamics. Non-bridging nodes focus on intra-community walks, capturing localized patterns, while bridging nodes explore inter-community relationships. The complete CTWalks sampling process is detailed in Algorithm 1.
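As a companion to Algorithm 1, the snippet below sketches a single CTWalks sampling step implementing the time-biased rule of Eq. (7) under the community constraints; the data layout (adjacency lists of (neighbor, timestamp) events) and function names are assumptions for illustration, not the released implementation.

```python
import math
import random

def sample_next(v, t, temporal_neighbors, bridging, community):
    """One sampling step: pick (u, t') with t' < t from the valid neighbor set of v,
    weighted by exp(-(t - t')) as in Eq. (7).
    temporal_neighbors: dict node -> list of (neighbor, timestamp) events,
    bridging: set of bridging nodes, community: dict node -> community label."""
    candidates = []
    for u, t_prime in temporal_neighbors.get(v, []):
        if t_prime >= t:
            continue  # temporal constraint: only earlier interactions are valid
        if v in bridging:
            valid = u in bridging                          # inter-community walk on G_I
        else:
            valid = community.get(u) == community.get(v)   # intra-community walk on G_{C_i}
        if valid:
            candidates.append((u, t_prime))
    if not candidates:
        return None  # walk terminates early, as in Algorithm 1
    weights = [math.exp(-(t - tp)) for _, tp in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]
```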

4.2 Anonymization

CTWalks replaces node identities with position-based representations while embedding both directionality and community context. For a temporal interaction $((u,v),t_{i})$, the anonymization process operates on nodes appearing in temporal walk sets originating from the source node $u$ and target node $v$.

Community- and Direction-Aware Representation.

The anonymized representation for a node $w$, based on temporal walk sets from the source node $u$ and target node $v$, integrates both directionality and the community labels of the root nodes. This is defined as:

A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v})=\big[A(w;\mathcal{S}_{u})\,||\,C_{u}\,||\,A(w;\mathcal{S}_{v})\,||\,C_{v}\big], (8)

where:

  • $\mathcal{S}_{u}$ and $\mathcal{S}_{v}$ are the sets of temporal walks originating from $u$ and $v$, respectively.

  • $A(w;\mathcal{S}_{u})$ and $A(w;\mathcal{S}_{v})$ aggregate $w$'s position-based occurrence information across all walks in $\mathcal{S}_{u}$ and $\mathcal{S}_{v}$, respectively.

  • $C_{u}$ and $C_{v}$ are the community labels of the root nodes $u$ and $v$, directly appended to their respective anonymization vectors.

Anonymized Walk Construction.

For a single temporal walk $\mathcal{W}=\{w_{1},w_{2},\dots,w_{l}\}$ with ascending timestamps $t_{1}<t_{2}<\dots<t_{l}$, the anonymized walk representation is defined as:

\mathcal{W}_{\text{anon}}=\{(A(w_{i}),t_{i})\mid i=1,2,\dots,l\}, (9)

where $A(w_{i})$ is the community- and direction-aware anonymized representation of node $w_{i}$, incorporating information from the sets of temporal walks $\mathcal{S}_{u}$ and $\mathcal{S}_{v}$, as well as the community labels $C_{u}$ and $C_{v}$. The anonymization process is detailed in Appendix G.
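A minimal sketch of this anonymization step, assuming each walk is a list of (node, time) pairs of fixed length and community labels are scalar ids; it follows Eq. (8), though the exact released encoding may differ.

```python
def position_counts(w, walks, length):
    """A(w; S): for node w, count its occurrences at each position across the walk set S."""
    counts = [0] * length
    for walk in walks:                         # each walk: [(node, time), ...]
        for pos, (node, _t) in enumerate(walk):
            if node == w:
                counts[pos] += 1
    return counts

def anonymize(w, walks_u, walks_v, c_u, c_v, length):
    """Community- and direction-aware representation of Eq. (8):
    [A(w; S_u) || C_u || A(w; S_v) || C_v]."""
    return (position_counts(w, walks_u, length) + [c_u]
            + position_counts(w, walks_v, length) + [c_v])
```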

4.3 Encoding

The encoding mechanism in CTWalks processes anonymized temporal walks $\mathcal{W}_{\text{anon}}$ by combining continuous temporal evolution and instantaneous updates, while integrating community-aware information. This ensures the final state $h_{l}$ captures temporal, structural, and community-aware information for downstream tasks.

The process begins by initializing the cumulative hidden state $h^{\prime}_{0}=\mathbf{0}$. Before the first step, the instantaneous hidden state $h_{1}$ is computed as:

h_{1}=g(h^{\prime}_{0},A(w_{1})). (10)

For each subsequent step $i$ ($1\leq i<l$), the encoding alternates between:

  1. Continuous Integration: Update $h^{\prime}_{i}$ by integrating the temporal evolution function $f(h,t)$ over $[t_{i},t_{i+1}]$:

     h^{\prime}_{i}=\int_{t_{i}}^{t_{i+1}}f(h_{i},t)\,dt. (11)

  2. Instantaneous Update: Compute $h_{i+1}$, incorporating $h^{\prime}_{i}$ and $A(w_{i+1})$:

     h_{i+1}=g(h^{\prime}_{i},A(w_{i+1})). (12)

We define $g$ and $f$ as parameterized models: $g$ updates instantaneous states, while $f$ governs continuous temporal integration. Details are provided in Appendix C.1.

Algorithm 2 Encoding of CTWalks

Input: Anonymized temporal walk $\mathcal{W}_{\text{anon}}=\{(A(w_{1}),t_{1}),\ldots,(A(w_{l}),t_{l})\}$.
Output: Final walk representation $h_{l}$.

1:  Initialize $h^{\prime}_{0}\leftarrow\mathbf{0}$
2:  Compute $h_{1}\leftarrow g(h^{\prime}_{0},A(w_{1}))$
3:  for $i=1$ to $l-1$ do
4:     Update $h^{\prime}_{i}$ via continuous integration
5:     Compute $h_{i+1}$ via instantaneous update
6:  end for
7:  Return $h_{l}$

By alternating between cumulative updates and instantaneous hidden states, the encoding ensures $h_{l}$ is temporally coherent, structurally rich, and embeds community context for robust downstream tasks. The detailed implementation of this process is provided in Algorithm 2, and an illustration is shown in Fig. 2. For a detailed discussion, please refer to Appendix H.
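The sketch below mirrors Algorithm 2 with a GRU cell as $g$ and a small neural network as $f$, integrated with an off-the-shelf ODE solver (here torchdiffeq, an assumption; any solver would do). Dimensions and module choices are illustrative, not the released configuration.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed solver; func(t, y) -> dy/dt

class WalkEncoder(nn.Module):
    """Alternate instantaneous GRU updates (g) with continuous-time
    evolution of the hidden state (f) between observations (Algorithm 2)."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.g = nn.GRUCell(in_dim, hid_dim)                     # instantaneous update
        self.f = nn.Sequential(nn.Linear(hid_dim, hid_dim), nn.Tanh(),
                               nn.Linear(hid_dim, hid_dim))      # temporal dynamics
        self.hid_dim = hid_dim

    def forward(self, feats, times):
        # feats: (l, in_dim) anonymized walk features; times: (l,) ascending timestamps
        h_prev = torch.zeros(1, self.hid_dim)
        h = self.g(feats[0:1], h_prev)                           # Eq. (10)
        for i in range(len(times) - 1):
            span = torch.stack([times[i], times[i + 1]])         # irregular interval
            h_prev = odeint(lambda t, y: self.f(y), h, span)[-1] # Eq. (11)
            h = self.g(feats[i + 1:i + 2], h_prev)               # Eq. (12)
        return h                                                 # final walk representation h_l
```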

4.4 Extension for Attributed Graphs

To encode node and edge attributes, the encoding process can be extended by modifying the instantaneous update function $g$ as follows:

h_{i}=g(h^{\prime}_{i-1},A(w_{i})\,||\,X_{w_{i}}\,||\,X_{e_{i}}), (13)

where:

  • $X_{w_{i}}$ represents the attributes of node $w_{i}$,

  • $X_{e_{i}}$ represents the attributes of edge $e_{i}=\{w_{i-1},w_{i}\}$,

  • $||$ denotes the concatenation operation.

In this extended formulation, node attributes $X_{w_{i}}$ and edge attributes $X_{e_{i}}$ introduce additional context for real-world attributed dynamic graphs. Several methods yang2013community ; yang2013overlapping ; du2017community focus on community detection in attributed graphs, leveraging node attributes to improve accuracy. This extension enhances the model's flexibility and expressiveness, enabling it to capture not only the structural and temporal information but also the rich attributes associated with nodes and edges.
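In the sketch above, this extension only changes the input fed to $g$; assuming torch tensors for the anonymized vector and the node/edge attributes, Eq. (13) amounts to a concatenation:

```python
import torch

def attributed_input(anon_vec, x_node, x_edge):
    """Eq. (13) input to g: A(w_i) || X_{w_i} || X_{e_i}."""
    return torch.cat([anon_vec, x_node, x_edge], dim=-1)
```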

Figure 2: Illustration of the encoding mechanism in CTWalks. Each node in the temporal walk is processed through instantaneous updates ($g$) and continuous temporal integration ($f$) along an irregular time trajectory. The final hidden state $h_{4}$ represents the walk encoding, incorporating both structural and community-aware information.

5 Theoretical Analysis

Dynamic graph sampling inherently involves both spatial structures and temporal sequences. To isolate the impact of temporal sequences and focus on spatial sampling, it is essential to establish an unbiased spatial sampling environment as a foundation. To this end, we analyze the potential biases introduced by traditional random walks on unweighted, undirected static graphs, which may limit their ability to comprehensively explore the graph structure. Furthermore, we examine whether a parameter-free walk strategy, guided by community partitions and distinguishing intra-community and inter-community transitions, can effectively address these biases.

By simplifying the problem to static graphs, we aim to eliminate the confounding effects of temporal dynamics and concentrate on evaluating the locality bias and the theoretical advantages of community-guided walks in an idealized spatial context.

Analysis 1: Overcoming Locality Bias

Let a traditional random walk generate a sequence $\{w_{0},w_{1},\ldots,w_{L_{w}-1}\}$, where $w_{t}\in V$. At step $t+1$, the walk transitions to a neighbor of $w_{t}$ with probability $1/d(w_{t})$, where $d(w_{t})$ is the degree of node $w_{t}$. Let $P_{uv}$ denote the one-step transition probability from node $u$ to $v$, defined as:

P_{uv}=\begin{cases}1/d(u),&\text{if }(u,v)\in E,\\ 0,&\text{otherwise.}\end{cases} (14)

Suppose $w_{0}=u$ and $w_{t}=v$, where $u,v\in V$ and $w_{0},w_{1},\ldots,w_{t-1}\neq v$. The probability that the walk starts at $u$ and visits $v$ for the first time at time $t$ is given by:

r^{t}_{uv}=\sum_{j\in\mathcal{N}(u)}P_{uj}\cdot r^{t-1}_{jv}=\frac{1}{d(u)}\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}, (15)

where $\mathcal{N}(u)$ is the set of neighbors of node $u$. This equation shows that $r^{t}_{uv}$ is the mean of the $(t-1)$-step probabilities $r^{t-1}_{jv}$ over all neighbors of $u$. The reliance on all neighbors dilutes the transition probability towards distant nodes, reinforcing locality bias.

Lemma 1. In CTWalks, let nodes $u$ and $v$ belong to two different communities, and let $u,v\in V_{c}$ be bridging nodes. The probability that the walk starts from $u$ and visits $v$ for the first time at time $t$ satisfies:

r^{t}_{uv}\geq\frac{1}{d(u)}\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}, (16)

with equality when all neighbors of $u$ are bridging nodes ($\mathcal{N}(u)=\mathcal{N}_{\text{inter}}(u)$).

Proof. Partition $\mathcal{N}(u)$ into two disjoint subsets:

  • $\mathcal{N}_{\text{inter}}(u)=\mathcal{N}(u)\cap V_{c}$, the set of inter-community neighbors (bridging nodes),

  • $\mathcal{N}_{\text{intra}}(u)=\mathcal{N}(u)-\mathcal{N}_{\text{inter}}(u)$, the set of intra-community neighbors.

The total probability of transitioning to neighbors of $u$ can be expressed as:

\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}=\sum_{j\in\mathcal{N}_{\text{inter}}(u)}r^{t-1}_{jv}+\sum_{j\in\mathcal{N}_{\text{intra}}(u)}r^{t-1}_{jv}. (17)

In CTWalks, intra-community transitions cannot reach nodes in different communities. Thus, for any $j\in\mathcal{N}_{\text{intra}}(u)$, $r^{t-1}_{jv}=0$. The total probability simplifies to:

\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}=\sum_{j\in\mathcal{N}_{\text{inter}}(u)}r^{t-1}_{jv}. (18)

The transition probability $r^{t}_{uv}$ from $u$ to $v$ in CTWalks is:

r^{t}_{uv}=\frac{1}{|\mathcal{N}_{\text{inter}}(u)|}\sum_{j\in\mathcal{N}_{\text{inter}}(u)}r^{t-1}_{jv}. (19)

Since $|\mathcal{N}_{\text{inter}}(u)|\leq d(u)$, it follows that:

\frac{1}{|\mathcal{N}_{\text{inter}}(u)|}\geq\frac{1}{d(u)}. (20)

Substituting this into Eq. (19) for $r^{t}_{uv}$, we obtain:

r^{t}_{uv}=\frac{1}{|\mathcal{N}_{\text{inter}}(u)|}\sum_{j\in\mathcal{N}_{\text{inter}}(u)}r^{t-1}_{jv}\geq\frac{1}{d(u)}\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}. (21)

Equality holds when all neighbors of $u$ are bridging nodes ($\mathcal{N}(u)=\mathcal{N}_{\text{inter}}(u)$).

Lemma 1 demonstrates that CTWalks reduces locality bias by prioritizing inter-community transitions for bridging nodes. Unlike traditional random walks, which distribute transition probabilities evenly among all neighbors, CTWalks separates intra- and inter-community contributions, enabling effective exploration of global structures.
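As a concrete check of Lemma 1 with assumed values, suppose $d(u)=5$ with only two bridging neighbors, each having $r^{t-1}_{jv}=0.1$, while the intra-community neighbors contribute $0$. Then

r^{t}_{uv}\big|_{\text{CTWalks}}=\tfrac{1}{2}(0.1+0.1)=0.10\;\geq\;\tfrac{1}{5}(0.1+0.1)=0.04=r^{t}_{uv}\big|_{\text{traditional}},

so restricting bridging nodes to inter-community transitions strictly increases the first-visit probability whenever $u$ also has intra-community neighbors.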

We also theoretically establish the connection between CTWalks and matrix factorization; details are provided in Appendix F.

6 Experiments

6.1 Experimental Setting

Baselines and Datasets.

CTWalks is evaluated against six state-of-the-art baselines for continuous-time dynamic graphs, including message-passing-based methods (DyRep trivedi2019dyrep , TGAT xu2020inductive , and TGN rossi2020temporal ) and sequence-model-based methods (CTDNE nguyen2018continuous , JODIE kumar2019predicting , and CAWs wang2021causal ). The evaluation is conducted on five real-world datasets spanning diverse domains: social networks, email communications, e-commerce, online education, and student forums. For instance, UCI leskovec2016snap tracks social interactions in a university network; Enron leskovec2016snap captures email exchanges among employees; Taobao zhu2018learning records user-item interactions with encoded action types such as clicks and purchases; MOOC leskovec2016snap logs student activities on educational platforms; and Wikipedia kumar2019predicting is a bipartite interaction network consisting of editor-page and user-post activities. The dataset statistics are summarized in Table 1. Additional details on the baselines and datasets are available in Appendix D.

Dataset Nodes Temporal Edges Duration (seconds) Interaction Intensity
UCI 1,899 59,835 16,621,303 $3.79\times 10^{-6}$
MOOC 7,144 411,749 2,572,086 $4.48\times 10^{-5}$
Enron 143 62,617 72,932,520 $6.50\times 10^{-5}$
Taobao 64,703 77,436 36,000 $6.64\times 10^{-5}$
Wikipedia 9,227 157,474 2,678,373 $1.27\times 10^{-5}$
Table 1: Dataset statistics. Interaction intensity is calculated as $2|E|/(|V|T)$, where $T$ is the total time range in seconds, and $|V|$ and $|E|$ are the numbers of nodes and temporal links.
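For example, the UCI row follows directly from this formula:

2\times 59{,}835/(1{,}899\times 16{,}621{,}303)=119{,}670/31{,}563{,}854{,}397\approx 3.79\times 10^{-6}.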
Data Preparation.

The data preparation process involves three key steps. First, all edges in the input temporal graph are sorted by their timestamps to ensure chronological consistency. This step facilitates training on past edges while testing on future edges. Next, the dataset is split into training, validation, and testing sets, adhering to a chronological ratio of 70% (training), 15% (validation), and 15% (testing). Finally, negative sampling is performed by generating edges absent in the original graph to form negative edge sets, ensuring a balanced dataset. Both positive and negative samples are used to train, validate, and test the model. Based on the training data, a Weighted Temporal Graph is constructed to capture the aggregate interaction strength between nodes over time. For nodes in the validation and testing datasets that are absent in the Weighted Temporal Graph:

  • Non-bridging nodes: If all neighbors of the node belong to the same community, the node is assigned to that community.

  • Bridging nodes: If the node’s neighbors belong to multiple communities, the node is assigned to a community with a probability proportional to the sum of the edge weights connecting the node to that community, defined as:

    P(C_{i}\mid v)=\frac{\sum_{u\in N(v)\cap C_{i}}w_{vu}}{\sum_{u\in N(v)}w_{vu}}, (22)

    where $P(C_{i}\mid v)$ is the probability of node $v$ belonging to community $C_{i}$, $N(v)$ represents the neighbors of $v$, and $w_{vu}$ is the edge weight between $v$ and $u$ (a short sketch of this assignment rule follows below).
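A small sketch of this assignment rule (Eq. (22)), assuming the community labels and aggregated edge weights from the training-time weighted temporal graph are available as plain dictionaries; names are illustrative.

```python
import random

def assign_unseen_node(v, neighbors, weights, community):
    """Assign a node missing from G_w to a community using its labeled neighbors.
    neighbors: iterable of v's neighbors, weights: dict (v, u) -> w_vu,
    community: dict node -> community id."""
    comms = {community[u] for u in neighbors}
    if len(comms) == 1:
        return comms.pop()                    # non-bridging case: unique community
    totals = {c: 0.0 for c in comms}          # bridging case: Eq. (22)
    for u in neighbors:
        totals[community[u]] += weights[(v, u)]
    cs, ws = zip(*totals.items())
    return random.choices(cs, weights=ws, k=1)[0]
```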

Evaluation Tasks.

We assess the performance of CTWalks and baselines on link prediction tasks, which are further categorized into transductive and inductive settings: In the transductive link prediction task, temporal links between all nodes are used for training up to a specific time point, with the remaining links allocated for validation and testing. In the inductive link prediction task, the model is evaluated on its ability to predict links involving nodes unseen during training. Two specific scenarios are considered: 1. New-Old: Interactions between unseen (new) and observed (old) nodes. 2. New-New: Interactions exclusively between unseen nodes. To construct these scenarios, 10% of the nodes are masked during training, and interactions associated with them are removed. Validation and testing sets are limited to interactions involving these masked nodes.

Figure 3: Illustration of the data preparation process for link prediction. The process includes sorting edges by timestamp, splitting into training, validation, and testing sets, and performing negative sampling to ensure balanced datasets.
Training Details.

For evaluation, we report results using two widely adopted metrics: Area Under the ROC Curve (AUC) and Average Precision (AP), which are detailed in Appendix D. These metrics effectively capture model performance across both transductive and inductive tasks. To ensure robustness, each experiment is repeated five times with different random seeds, and we report the average performance.

Task Methods UCI MOOC Enron Taobao Wikipedia
Inductive new vs. new DyRep 63.76 ± 4.67 74.07 ± 1.88 72.74 ± 0.44 87.62 ± 1.23 67.07 ± 1.26
TGAT 76.33 ± 1.38 70.04 ± 1.35 63.34 ± 1.82 76.99 ± 1.54 90.77 ± 0.95
TGN 79.37 ± 1.49 77.07 ± 1.88 80.92 ± 0.02 84.74 ± 0.44 63.76 ± 4.67
CTDNE 62.93 ± 0.72 72.01 ± 0.29 66.49 ± 1.21 78.01 ± 0.08 62.35 ± 1.10
JODIE 70.31 ± 0.53 75.31 ± 4.14 70.95 ± 0.85 81.53 ± 2.12 72.65 ± 0.63
CAWs 88.12 ± 0.88 90.10 ± 0.35 96.27 ± 1.16 86.34 ± 2.95 96.36 ± 1.48
CTWalks 90.71 ± 0.08 92.86 ± 0.56 95.20 ± 0.68 96.98 ± 0.41 94.16 ± 0.25
new vs. old DyRep 93.28 ± 3.01 88.23 ± 1.35 94.39 ± 0.59 89.36 ± 1.48 76.72 ± 1.36
TGAT 76.33 ± 1.03 67.40 ± 1.71 61.80 ± 1.31 65.24 ± 0.98 95.47 ± 1.18
TGN 87.13 ± 1.07 72.23 ± 1.20 77.98 ± 0.45 88.39 ± 0.32 90.28 ± 0.96
CTDNE 71.11 ± 0.74 69.10 ± 0.88 64.66 ± 0.41 68.33 ± 0.05 88.39 ± 1.08
JODIE 71.61 ± 0.37 88.20 ± 1.92 85.73 ± 1.36 80.53 ± 2.13 74.78 ± 0.22
CAWs 91.25 ± 0.18 90.30 ± 0.08 93.22 ± 1.28 88.76 ± 1.18 96.19 ± 0.88
CTWalks 95.70 ± 0.17 92.08 ± 0.20 92.52 ± 0.21 93.98 ± 0.44 95.15 ± 0.24
Transductive DyRep 95.23 ± 1.48 90.49 ± 0.24 96.71 ± 0.80 83.11 ± 1.13 77.40 ± 1.79
TGAT 77.67 ± 1.02 72.09 ± 1.51 60.88 ± 1.34 62.88 ± 1.46 96.36 ± 1.21
TGN 83.36 ± 1.23 73.09 ± 0.03 68.12 ± 1.21 87.71 ± 0.04 95.23 ± 0.25
CTDNE 76.89 ± 0.92 73.03 ± 0.32 89.36 ± 0.73 65.08 ± 0.02 92.43 ± 0.27
JODIE 74.63 ± 0.52 90.50 ± 0.85 60.36 ± 0.65 82.02 ± 0.31 89.30 ± 0.22
CAWs 95.25 ± 0.06 94.28 ± 0.29 91.64 ± 0.55 85.88 ± 0.37 98.67 ± 0.27
CTWalks 98.05 ± 0.09 94.33 ± 0.08 92.54 ± 0.58 92.06 ± 0.16 95.14 ± 0.25
Table 2: Transductive and inductive link prediction performance (AUC). We use bold font to highlight the best and second-best performances.

6.2 Experimental Results and Discussion

We report the AUC performance of CTWalks and six state-of-the-art baselines across inductive and transductive link prediction tasks on various datasets. Table 2 summarizes the results, highlighting the effectiveness of CTWalks in both inductive and transductive settings.

In the inductive setting, CTWalks achieves remarkable results, particularly for the challenging “new vs. new” links. On most datasets, CTWalks significantly outperforms the best-performing baseline CAWs, with improvements of up to 5.14% on the UCI dataset and 2.16% on Taobao. Similarly, in the “new vs. old” scenario, CTWalks consistently surpasses the baselines, achieving an AUC of 95.70 on UCI and 92.08 on MOOC, compared to the respective baseline bests of 93.28 by DyRep and 90.30 by CAWs. These results highlight CTWalks’ superior generalization ability, particularly for interactions involving previously unseen nodes.

In the transductive setting, CTWalks again demonstrates strong performance, achieving the best or second-best results across all datasets. On the Taobao dataset, CTWalks attains an AUC of 92.06. Similarly, on the MOOC dataset, CTWalks achieves an AUC of 94.33, showcasing its competitive edge even on attributed datasets.

The success of CTWalks can be attributed to its ability to integrate temporal dynamics and structural information effectively, leveraging its community-aware temporal walk encoding to capture intricate graph patterns. Its superior performance across diverse datasets highlights its generalizability and robustness, setting a benchmark for future research in dynamic graph learning.

6.3 Ablation Study

We conducted an ablation study to evaluate the contributions of key components in CTWalks, with results presented in Table 3. The study involved systematically removing essential mechanisms to understand their impact on the performance of CTWalks.

No. Ablation UCI Taobao MOOC
1. original method 92.94 ± 0.21 94.88 ± 0.36 92.07 ± 0.21
2. remove intra-community walk 89.28 ± 0.66 88.45 ± 0.41 85.69 ± 0.31
3. remove inter-community walk 86.90 ± 0.17 87.91 ± 0.39 86.25 ± 1.99
4. remove community walk 85.94 ± 0.85 86.01 ± 0.12 85.12 ± 0.20
5. remove anonymous community label 88.94 ± 0.61 87.01 ± 0.14 86.12 ± 0.30
6. remove continuous evolution 79.38 ± 2.05 84.44 ± 0.56 83.72 ± 0.44
Table 3: Ablation study on CTWalks in terms of AUC scores on all inductive test links.

The first three experiments assess the influence of the community-aware sampling mechanism. When the restrictions on intra-community and inter-community walks are removed, nodes are allowed to perform unrestricted temporal walks, with the selection of the next node determined by time-biased probabilities that prioritize temporally closer events. The resulting performance degradation across these variants underscores the significance of preserving community boundaries. This highlights the effectiveness of the proposed community-aware sampling strategy in capturing both local and global temporal dynamics. To further validate the advantages of community walks in enhancing network embedding learning, we conducted experiments on purely static graphs and compared against classical graph node embedding algorithms. Detailed analyses are provided in Appendix E.

The fourth experiment removes the use of community labels during the anonymized walk encoding. This change prevents the model from distinguishing between structurally similar walks based on their community context, reducing its ability to effectively generalize across different network structures.

Finally, by removing the continuous integration step, the cumulative hidden state $h^{\prime}_{i}$ is no longer updated through integration over temporal intervals. Instead, the encoding process relies solely on the instantaneous update function in Eq. (12), where the hidden state at each step is directly computed without accounting for continuous temporal evolution. This simplification significantly impacts performance, particularly on datasets with sparse interactions such as UCI, as it eliminates the ability to capture nuanced spatiotemporal dynamics over time intervals, underscoring the critical role of the continuous integration mechanism.

These results validate that each component contributes significantly to the overall performance of CTWalks, with the interplay of community-aware sampling and continuous evolution mechanisms being particularly crucial.

7 Discussion and Conclusion

The effectiveness of our model relies heavily on modularity-based community detection methods, such as Louvain, which ties its performance to the quality of the initial community partitions. In networks with overlapping or poorly-defined communities, this dependency may hinder the efficacy of the community-aware sampling mechanism. To address this, future work could explore adaptive sampling techniques that dynamically refine community boundaries or integrate multi-scale community detection methods to enhance robustness.

Additionally, incorporating continuous temporal dynamics via ODE solvers poses scalability challenges for extremely large graphs. To address these issues, approximate solutions such as logarithmic normalization of time intervals and parallel batch processing (see Appendix C.2) have been employed to reduce computational overhead. Nonetheless, future research should prioritize the development of lightweight integration techniques or hybrid approaches that strike a balance between computational efficiency and modeling fidelity.

CTWalks represents a community-aware, parameter-free framework for representation learning on CTDGs. Unlike existing methods, CTWalks simultaneously incorporates intra- and inter-community dynamics, enabling the effective modeling of mesoscopic structures within graphs. Our contributions span several dimensions: (1) a novel temporal walk sampling strategy that adaptively captures community-driven dynamics without requiring extensive parameter tuning, (2) an anonymized encoding process augmented with community labels for generalization, and (3) a continuous temporal integration mechanism to capture nuanced spatiotemporal dependencies. Experimental results on diverse datasets consistently demonstrate the superiority of CTWalks in both inductive and transductive tasks.

This work lays the groundwork for future research in dynamic graph learning. Directions for future exploration include designing scalable methods for handling larger and more complex CTDGs, developing robust community detection mechanisms, and addressing challenges associated with data bias and ethical use in real-world applications. With its potential to generalize across domains, CTWalks represents a significant step forward in understanding and modeling the evolving dynamics of complex systems.

References

  • (1) E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, and M. Bronstein, “Temporal graph networks for deep learning on dynamic graphs,” in ICML 2020 Workshop on Graph Representation Learning, 2020.
  • (2) D. Xu, C. Ruan, E. Körpeoglu, S. Kumar, and K. Achan, “Inductive representation learning on temporal graphs,” in 8th International Conference on Learning Representations, OpenReview.net, 2020.
  • (3) Y. Wang, Y.-Y. Chang, Y. Liu, J. Leskovec, and P. Li, “Inductive representation learning in temporal networks via causal anonymous walks,” in 9th International Conference on Learning Representations, OpenReview.net, 2021.
  • (4) A. H. Souza, D. Mesquita, S. Kaski, and V. Garg, “Provably expressive temporal graph networks,” in Advances in Neural Information Processing Systems (NeurIPS), 2022.
  • (5) M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich, “A review of relational machine learning for knowledge graphs,” Proceedings of the IEEE, vol. 104, no. 1, pp. 11–33, 2015.
  • (6) S. Zhang, Z. Yang, M. Zhao, and G. Li, “Gnnrec: Graph neural network-based recommendation,” arXiv preprint arXiv:2001.09061, 2020.
  • (7) X. Wang, X. He, Y. Cao, M. Liu, and T.-S. Chua, “Kgat: Knowledge graph attention network for recommendation,” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 950–958, 2019.
  • (8) M. Zhang, W. Hu, J. Wang, and Z. Zhang, “Graph representation learning for biological networks,” in ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2021.
  • (9) A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur, “Protein interface prediction using graph convolutional networks,” Advances in neural information processing systems, vol. 30, pp. 6533–6542, 2017.
  • (10) Y. Luo and P. Li, “Neighborhood-aware scalable temporal network representation learning,” in Learning on Graphs Conference, 2022.
  • (11) L. Yu, L. Sun, B. Du, and W. Lv, “Towards better dynamic graph learning: New architecture and unified library,” in Advances in Neural Information Processing Systems, 2023.
  • (12) S. M. Kazemi, R. Goel, K. Jain, I. Kobyzev, A. Sethi, P. Forsyth, and P. Poupart, “Representation learning for dynamic graphs: A survey,” Journal of Machine Learning Research, vol. 21, no. 70, pp. 1–73, 2020.
  • (13) L. Yang, S. Adam, and C. Chatelain, “Dynamic graph representation learning with neural networks: A survey,” arXiv preprint arXiv:2304.05729, 2023.
  • (14) Y. Zhu, F. Lyu, C. Hu, X. Chen, and X. Liu, “Encoder-decoder architecture for supervised dynamic graph learning: A survey,” arXiv preprint arXiv:2203.10480, 2022.
  • (15) X. Jin, Q. Li, J. Zhou, W. Fan, and J. Tang, “Motif-aware representation learning on continuous-time dynamic graphs,” arXiv preprint arXiv:2301.06434, 2023.
  • (16) M. Xia, S. Zhang, P. Zhou, F. Nie, and J. Huang, “Motif-preserving dynamic attributed network embedding,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 12, pp. 5750–5763, 2021.
  • (17) S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.
  • (18) M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical review E, vol. 69, no. 2, p. 026113, 2004.
  • (19) C. Liu, J. Liu, and Z. Jiang, “A multiobjective evolutionary algorithm based on similarity for community detection from signed social networks,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2274 – 2287, 2014.
  • (20) X. Teng, J. Liu, and M. Li, “Overlapping community detection in directed and undirected attributed networks using a multiobjective evolutionary algorithm,” IEEE Trans. Cybernetics, vol. 51, pp. 138 – 150, January 2021.
  • (21) R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, “Neural ordinary differential equations,” in Advances in Neural Information Processing Systems, vol. 31, 2018.
  • (22) M. Jin, Y. Zheng, Y.-F. Li, S. Chen, B. Yang, and S. Pan, “Multivariate time series forecasting with dynamic graph neural odes,” arXiv preprint arXiv:2202.08408, 2022.
  • (23) T. Asikis, L. Böttcher, and N. Antulov-Fantulin, “Neural ordinary differential equation control of dynamics on graphs,” arXiv preprint arXiv:2006.09773, 2020.
  • (24) L. Chen, K. Wu, J. Lou, and J. Liu, “Signed graph neural ordinary differential equation for modeling continuous-time dynamics,” arXiv preprint arXiv:2312.11198, 2023.
  • (25) Anonymous, “Streaming dynamic graph neural networks for continuous-time temporal graphs,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
  • (26) H. Ma, K. Wu, H. Wang, and J. Liu, “Higher-order knowledge transfer for dynamic community detection with great changes,” IEEE Transactions on Evolutionary Computation, vol. 28, no. 1, pp. 90 – 104, 2024.
  • (27) W. Cong, Y. Wu, Y. Tian, M. Gu, Y. Xia, M. Mahdavi, and C.-c. J. Chen, “Dynamic graph representation learning via graph transformer networks,” CoRR, vol. abs/2111.10447, 2021.
  • (28) P. Goyal, S. R. Chhetri, and A. Canedo, “dyngraph2vec: Capturing network dynamics using dynamic graph representation learning,” Knowledge-Based Systems, vol. 187, p. 104816, 2020.
  • (29) A. Pareja, G. Domeniconi, J. Chen, T. Ma, T. Suzumura, H. Kanezashi, T. Kaler, T. B. Schardl, and C. E. Leiserson, “Evolvegcn: Evolving graph convolutional networks for dynamic graphs,” in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 5363–5370, AAAI Press, 2020.
  • (30) A. Sankar, Y. Wu, L. Gou, W. Zhang, and H. Yang, “Dysat: Deep neural representation learning on dynamic graphs via self-attention networks,” in Proceedings of the Thirteenth ACM International Conference on Web Search and Data Mining, pp. 519–527, ACM, 2020.
  • (31) J. You, T. Du, and J. Leskovec, “Roland: graph learning framework for dynamic graphs,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2358–2366, ACM, 2022.
  • (32) Y. Rubanova, R. T. Q. Chen, and D. Duvenaud, “Latent ordinary differential equations for irregularly-sampled time series,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
  • (33) L.-P. Xhonneux, M. Qu, and J. Tang, “Continuous graph neural networks,” Advances in Neural Information Processing Systems, vol. 33, pp. 18508–18519, 2020.
  • (34) F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
  • (35) S. Mo, K. Wu, Q. Gao, X. Teng, and J. Liu, “Autosgnn: Automatic propagation mechanism discovery for spectral graph neural networks,” in AAAI, 2025.
  • (36) J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
  • (37) S. Kumar, X. Zhang, and J. Leskovec, “Predicting dynamic embedding trajectory in temporal interaction networks,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1269–1278, ACM, 2019.
  • (38) M. E. Newman, “Fast algorithm for detecting community structure in networks,” Physical review E, vol. 69, no. 6, p. 066133, 2004.
  • (39) V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.
  • (40) J. Yang, J. McAuley, and J. Leskovec, “Community detection in networks with node attributes,” ICDM, pp. 1151–1156, 2013.
  • (41) J. Yang and J. Leskovec, “Overlapping community detection at scale: a nonnegative matrix factorization approach,” Proceedings of the sixth ACM international conference on Web search and data mining, pp. 587–596, 2013.
  • (42) B. Du, Z. Wu, and J. Pei, “Community detection in attributed graphs: An embedding approach,” Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 589–598, 2017.
  • (43) R. Trivedi, M. Farajtabar, P. Biswal, and H. Zha, “Dyrep: Learning representations over dynamic graphs,” in International Conference on Learning Representations (ICLR), 2019.
  • (44) G. Nguyen, J. Lee, R. Rossi, N. Ahmed, E. Koh, and S. Kim, “Continuous-time dynamic network embeddings,” Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), pp. 373–381, 2018.
  • (45) J. Leskovec and A. Krevl, “Snap datasets: Stanford large network dataset collection.” http://snap.stanford.edu/data, 2014. Accessed: 2016-01-01.
  • (46) H. Zhu, X. Li, P. Zhang, G. Li, J. He, H. Li, and K. Gai, “Learning tree-based deep model for recommender systems,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1079–1088, ACM, 2018.
  • (47) K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734, 2014.
  • (48) R. T. Chen, B. Amos, and M. Nickel, “Neural spatio-temporal point processes,” in International Conference on Learning Representations, 2020.
  • (49) T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in Proceedings of the International Conference on Learning Representations (ICLR), 2013.
  • (50) A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864, ACM, 2016.
  • (51) B. Perozzi, R. Al-Rfou, and S. Skiena, “DeepWalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710, ACM, 2014.
  • (52) J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “LINE: Large-scale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077, 2015.
  • (53) L. F. Ribeiro, P. H. Saverese, and D. R. Figueiredo, “struc2vec: Learning node representations from structural identity,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 385–394, ACM, 2017.
  • (54) W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Proceedings of Advances in Neural Information Processing Systems, pp. 1024–1034, 2017.
  • (55) X. Wang, P. Cui, J. Wang, J. Pei, and W. Zhu, “Community preserving network embedding,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, pp. 203–209, 2017.
  • (56) R. A. Rossi and N. K. Ahmed, “The network data repository with interactive graph analytics and visualization,” 2015. Available at https://networkrepository.com.
  • (57) T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 3111–3119, 2013.
  • (58) O. Levy and Y. Goldberg, “Neural word embedding as implicit matrix factorization,” in Proceedings of Advances in Neural Information Processing Systems, pp. 2177–2185, 2014.
  • (59) J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang, “Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec,” in Proceedings of the 11th ACM International Conference on Web Search and Data Mining, pp. 459–467, 2018.
  • (60) Z. C. Lipton, J. Berkowitz, and C. Elkan, “A critical review of recurrent neural networks for sequence learning,” arXiv preprint arXiv:1506.00019, 2015.
  • (61) Z. Che, S. Purushotham, K. Cho, D. Sontag, and Y. Liu, “Recurrent neural networks for multivariate time series with missing values,” in Scientific reports, vol. 8, pp. 1–12, Nature Publishing Group, 2018.
  • (62) H. Mei and J. Eisner, “Neural hawkes process: A neurally self-modulating multivariate point process,” in Advances in Neural Information Processing Systems, pp. 6754–6764, 2017.

Appendix A Notations and Definitions

Symbol Definition
$\mathcal{G}=\{(e_{i},t_{i})\}_{i=1}^{N}$ A continuous-time dynamic graph (CTDG) with time-stamped interactions.
$G_{w}=(V,E_{w})$ The weighted temporal graph with aggregated interaction weights.
$w_{uv}$ Weight of the edge $(u,v)$, representing the total number of interactions.
$\mathcal{C}=\{C_{1},C_{2},\ldots,C_{k}\}$ Partitioned communities in the graph using modularity-based algorithms.
$G_{C_{i}}=(V_{C_{i}},E_{C_{i}})$ Subgraph corresponding to community $C_{i}$.
$G_{I}=(V_{I},E_{I})$ Inter-community subgraph comprising bridging nodes and edges.
$V_{I}$ Set of bridging nodes connecting multiple communities.
$\text{N}(v)$ Neighbor set of node $v$.
$W$ A temporal walk defined as $\{(w_{1},t_{1}),(w_{2},t_{2}),\ldots,(w_{l},t_{l})\}$.
$P_{\text{intra}}(u|v,t)$ Transition probability for intra-community walks based on temporal constraints.
$P_{\text{inter}}(u|v,t)$ Transition probability for inter-community walks based on temporal constraints.
$A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v})$ Anonymization operation with community label integration.
$\mathcal{W}_{\text{anon}}$ Anonymized temporal walk incorporating community-aware representations.
$h_{i}$ Instantaneous hidden state at step $i$ of the walk.
$h^{\prime}_{i}$ Cumulative hidden state at step $i$ updated through continuous integration.
$g(h^{\prime}_{i-1},A(w_{i}))$ Instantaneous update function for hidden states.
$f(h,t)$ Temporal evolution function for continuous integration.
$\int_{t_{i}}^{t_{i+1}}f(h_{i},t)\,dt$ Continuous integration over the temporal interval $[t_{i},t_{i+1}]$.
$h_{l}$ Final representation of the temporal walk $W$.
$\lambda=2|E|/(|V|T)$ Average link stream intensity, measuring interaction density over time.
Table 4: Key Notations and Definitions for CTWalks

Additional Notes:

  • The definitions aim to maintain clarity for each symbol and its application within the CTWalks framework.

  • Symbols like $g(h^{\prime}_{i-1},A(w_{i}))$ and $f(h,t)$ are parameterized models, where $g$ handles discrete updates and $f$ continuous temporal dynamics.

  • The temporal walks $W$ integrate both intra-community and inter-community dynamics, with anonymization enhancing generalization across unseen nodes.

  • Community-aware sampling eliminates the need for additional parameters to control local/global walks, enhancing model simplicity and robustness.

Appendix B Community Detection Algorithms

B.1 Louvain Method

The Louvain method newman2004fast ; blondel2008fast is an efficient modularity optimization algorithm widely used for community detection in undirected weighted graphs. It operates in two iterative phases:

  1. Local Optimization Phase: Initially, each node is treated as a separate community. The algorithm iteratively moves nodes between neighboring communities to maximize the modularity gain $\Delta Q$, defined as:

     \Delta Q=\frac{\sum_{\text{in}}+2k_{\text{in}}}{2m}-\left(\frac{\sum_{\text{tot}}+k_{\text{tot}}}{2m}\right)^{2}-\left(\frac{\sum_{\text{tot}}}{2m}\right)^{2}, (23)

     where $\sum_{\text{in}}$ is the sum of edge weights within a community, $k_{\text{in}}$ and $k_{\text{tot}}$ are the total weights of edges connected to the node inside the community and in the entire graph, respectively, and $m$ is the total edge weight in the graph.

  2. Community Aggregation Phase: Once no further modularity improvement is possible, each community is treated as a supernode, forming a new graph. The process is repeated until the modularity $Q$ no longer improves significantly.

The Louvain method is highly scalable and effective, making it suitable for large-scale graphs.
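
For concreteness, the sketch below shows how this preprocessing step can be realized with NetworkX, assuming a recent release that ships a Louvain implementation (louvain_communities); it aggregates interaction counts into the weighted graph G_w and returns the partition together with a node-to-community map. It is an illustration, not the released implementation.

```python
# A minimal sketch of the community-detection preprocessing step, assuming a
# recent NetworkX release that provides louvain_communities.
import networkx as nx

def detect_communities(temporal_edges, seed=0):
    """temporal_edges: iterable of (u, v, t) interaction tuples."""
    G_w = nx.Graph()
    for u, v, _t in temporal_edges:
        # w_uv counts the total number of interactions between u and v
        if G_w.has_edge(u, v):
            G_w[u][v]["weight"] += 1.0
        else:
            G_w.add_edge(u, v, weight=1.0)
    # Modularity-based partition C = {C_1, ..., C_k}
    communities = nx.community.louvain_communities(G_w, weight="weight", seed=seed)
    node2comm = {v: i for i, C in enumerate(communities) for v in C}
    return G_w, communities, node2comm
```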

B.2 Other Community Detection Algorithms

Several alternative community detection methods exist, each with unique strengths and weaknesses:

  • Spectral Clustering: Based on the eigenvectors of the graph Laplacian, this method identifies kk communities.

    • Advantages: Effective for small-scale graphs and capable of capturing complex community structures.

    • Disadvantages: Computationally expensive for large graphs.

  • Random Walk-Based Methods (e.g., Infomap): These minimize the description length of random walks to identify communities.

    • Advantages: Naturally handle weighted graphs and hierarchical community structures.

    • Disadvantages: High computational resource requirements.

  • Label Propagation: Communities are formed through the iterative propagation and convergence of node labels.

    • Advantages: Simple and fast, well-suited for dynamic graphs.

    • Disadvantages: Susceptible to local optima and instability.

  • Modularity Maximization: A family of algorithms that optimize modularity QQ. The Louvain method is a prominent example.

  • Deep Learning-Based Methods: Algorithms like DeepWalk and Node2Vec learn node embeddings to capture community structures.

    • Advantages: Integrate node attributes and graph topology information.

    • Disadvantages: Require significant computational resources and are sensitive to hyperparameters.

In our framework, the Louvain method is employed for community detection due to its efficiency and suitability for weighted graphs. This method provides a solid foundation for subsequent node community classification and boundary node detection. Future work may explore advanced algorithms tailored to dynamic weighted graphs for more accurate and robust community detection.

Appendix C Batching and Complexity Analysis

C.1 Function Definitions: gg and ff

In the CTWalks framework, two key functions, gg and ff, are defined to model discrete updates and continuous temporal evolution, respectively. Their complementary roles enable precise representation of both instantaneous and time-dependent dynamics within continuous-time dynamic graphs.

1. Instantaneous Update Function gg

The function gg handles the discrete, step-wise state updates at specific nodes along a temporal walk. For each node wiw_{i}, the hidden state hih_{i} is computed as:

hi=g(hi1,A(wi)),h_{i}=g(h^{\prime}_{i-1},A(w_{i})), (24)

where:

  • hi1h^{\prime}_{i-1} is the cumulative hidden state from the previous step.

  • A(wi)A(w_{i}) is the anonymized representation of node wiw_{i}.

  • gg is instantiated as a Gated Recurrent Unit (GRU) cho2014learning , designed to capture both structural and temporal information.

The GRU-based formulation of gg is given by:

zt=σ(Wzht1+Uzxt+bz),z_{t}=\sigma(W_{z}h_{t-1}+U_{z}x_{t}+b_{z}), (25)
rt=σ(Wrht1+Urxt+br),r_{t}=\sigma(W_{r}h_{t-1}+U_{r}x_{t}+b_{r}), (26)
h~t=tanh(Wh(rtht1)+Uhxt+bh),\tilde{h}_{t}=\tanh(W_{h}(r_{t}\odot h_{t-1})+U_{h}x_{t}+b_{h}), (27)
ht=zth~t+(1zt)ht1,h_{t}=z_{t}\odot\tilde{h}_{t}+(1-z_{t})\odot h_{t-1}, (28)

where:

  • ztz_{t}, rtr_{t}, and h~t\tilde{h}_{t} are the update gate, reset gate, and candidate hidden state, respectively.

  • xt=A(wi)x_{t}=A(w_{i}) represents the input node features.

  • σ()\sigma(\cdot) and tanh()\tanh(\cdot) denote the sigmoid and hyperbolic tangent activation functions, respectively.

  • \odot represents element-wise multiplication.
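
For illustration, g can be instantiated directly with a standard GRU cell in PyTorch, as sketched below; the dimension names anon_dim and hidden_dim are placeholders rather than values from the released implementation.

```python
# A minimal sketch of the instantaneous update g (Eqs. 24-28) as a GRU cell.
import torch
import torch.nn as nn

class InstantaneousUpdate(nn.Module):
    def __init__(self, anon_dim: int, hidden_dim: int):
        super().__init__()
        # nn.GRUCell realizes the gates z_t, r_t and candidate state h~_t internally
        self.cell = nn.GRUCell(input_size=anon_dim, hidden_size=hidden_dim)

    def forward(self, h_prev: torch.Tensor, a_wi: torch.Tensor) -> torch.Tensor:
        # h_prev: cumulative state h'_{i-1}, shape (batch, hidden_dim)
        # a_wi:   anonymized representation A(w_i), shape (batch, anon_dim)
        return self.cell(a_wi, h_prev)  # h_i = g(h'_{i-1}, A(w_i))
```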

2. Continuous Temporal Evolution Function ff

The function ff models the continuous evolution of the hidden state over time intervals between nodes. Unlike gg, which incorporates node-specific input features, ff focuses solely on temporal dynamics and acts on the output of gg. The cumulative hidden state hih^{\prime}_{i} at step ii is computed by integrating ff over the time interval [ti,ti+1][t_{i},t_{i+1}]:

hi=titi+1f(hi,t)𝑑t,h^{\prime}_{i}=\int_{t_{i}}^{t_{i+1}}f(h_{i},t)\,dt, (29)

where:

  • hih_{i} is the instantaneous hidden state computed by gg,

  • tit_{i} and ti+1t_{i+1} are the timestamps associated with the current and next nodes in the temporal walk,

  • f(h,t)f(h,t) is parameterized as a GRU-like model, similar to gg, but without external inputs xtx_{t}.

The formulation of ff is as follows:

zt=σ(Wzht1+bz),z_{t}=\sigma(W_{z}h_{t-1}+b_{z}), (30)
rt=σ(Wrht1+br),r_{t}=\sigma(W_{r}h_{t-1}+b_{r}), (31)
h~t=tanh(Wh(rtht1)+bh),\tilde{h}_{t}=\tanh(W_{h}(r_{t}\odot h_{t-1})+b_{h}), (32)
ht=zth~t+(1zt)ht1,h_{t}=z_{t}\odot\tilde{h}_{t}+(1-z_{t})\odot h_{t-1}, (33)

where:

  • ztz_{t}, rtr_{t}, and h~t\tilde{h}_{t} are the update gate, reset gate, and candidate hidden state, respectively,

  • WzW_{z}, WrW_{r}, and WhW_{h} are weight matrices,

  • bzb_{z}, brb_{r}, and bhb_{h} are biases,

  • σ()\sigma(\cdot) and tanh()\tanh(\cdot) denote the sigmoid and hyperbolic tangent activation functions.

By removing external inputs xtx_{t}, ff focuses purely on modeling the temporal continuity of the hidden state, allowing it to integrate spatiotemporal dependencies over time intervals.

Together, gg and ff enable CTWalks to seamlessly integrate spatial and temporal dynamics, delivering a robust framework for learning representations on continuous-time dynamic graphs.
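
The sketch below illustrates how g and f interleave along one temporal walk: a discrete update at every node followed by continuous evolution over [t_i, t_{i+1}], here integrated with torchdiffeq's odeint. The input-free gating of f follows Eqs. (30)-(33); class and function names are illustrative and not taken from the released code.

```python
# A sketch of the walk-encoding loop: g updates the state at each node and f
# evolves it continuously between consecutive timestamps.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class TemporalEvolution(nn.Module):
    """f(h, t): GRU-like gating without external inputs (Eqs. 30-33)."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.z = nn.Linear(hidden_dim, hidden_dim)
        self.r = nn.Linear(hidden_dim, hidden_dim)
        self.c = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, t, h):
        z = torch.sigmoid(self.z(h))
        r = torch.sigmoid(self.r(h))
        h_tilde = torch.tanh(self.c(r * h))
        return z * h_tilde + (1 - z) * h  # used as the ODE right-hand side

def encode_walk(g, f, anon_feats, timestamps, hidden_dim):
    # anon_feats: (l, anon_dim) anonymized features along the walk
    # timestamps: length-l list of increasing times t_1 < ... < t_l
    h_cum = torch.zeros(1, hidden_dim)                 # h'_0
    for i in range(len(timestamps)):
        h_i = g(h_cum, anon_feats[i].unsqueeze(0))     # h_i = g(h'_{i-1}, A(w_i))
        if i + 1 < len(timestamps):
            t_span = torch.tensor([timestamps[i], timestamps[i + 1]])
            h_cum = odeint(f, h_i, t_span, method="rk4")[-1]   # h'_i (Eq. 29)
        else:
            h_cum = h_i                                # h_l: final walk representation
    return h_cum
```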

C.2 Batching Analysis

Efficiently encoding temporal walks poses a significant challenge due to the irregular timestamps and varying lengths of walks. For CC temporal walks, each consisting of ll steps, solving C×lC\times l independent ordinary differential equations (ODEs) is computationally expensive. To address this, we employ a time reparameterization strategy, unifying the integration intervals of all walks into a standard range [0,1][0,1]. This unification allows batch processing of the walks, enabling significant computational efficiency chen2020neural .

Reparameterization for Unified Time Intervals

Consider the cc-th walk with timestamps [tstartc,tendc][t_{\text{start}}^{c},t_{\text{end}}^{c}], where tstartct_{\text{start}}^{c} is the time of the first node in the walk, and tendct_{\text{end}}^{c} is the time of the last node. We reparameterize the time tt within this interval using a new variable s[0,1]s\in[0,1], defined as:

s=ttstartctendctstartc,thus,t=s(tendctstartc)+tstartc.s=\frac{t-t_{\text{start}}^{c}}{t_{\text{end}}^{c}-t_{\text{start}}^{c}},\quad\text{thus,}\quad t=s\cdot(t_{\text{end}}^{c}-t_{\text{start}}^{c})+t_{\text{start}}^{c}. (34)

Under this transformation, the instantaneous hidden state hth_{t} at time tt is reformulated as h~s\tilde{h}_{s}, such that:

h~s=ht=hs(tendctstartc)+tstartc.\tilde{h}_{s}=h_{t}=h_{s\cdot(t_{\text{end}}^{c}-t_{\text{start}}^{c})+t_{\text{start}}^{c}}. (35)

This allows all walks, regardless of their original time intervals, to operate within a common temporal scale of [0,1][0,1].

Reformulated ODE for Batch Processing

The cumulative hidden state for the cc-th walk is computed by solving an ODE over the interval [tstartc,tendc][t_{\text{start}}^{c},t_{\text{end}}^{c}]. After reparameterization, the ODE is reformulated for h~s\tilde{h}_{s} as follows:

dh~sds=f(ht,t)dtds=f(h~s,s(tendctstartc)+tstartc)(tendctstartc),\frac{d\tilde{h}_{s}}{ds}=f(h_{t},t)\cdot\frac{dt}{ds}=f(\tilde{h}_{s},s\cdot(t_{\text{end}}^{c}-t_{\text{start}}^{c})+t_{\text{start}}^{c})\cdot(t_{\text{end}}^{c}-t_{\text{start}}^{c}), (36)

where dtds=tendctstartc\frac{dt}{ds}=t_{\text{end}}^{c}-t_{\text{start}}^{c}.

The initial and final states are transformed as:

h~0=htstartc,h~1=htendc.\tilde{h}_{0}=h_{t_{\text{start}}^{c}},\quad\tilde{h}_{1}=h_{t_{\text{end}}^{c}}. (37)

The solution for the cumulative hidden state at the end of the walk becomes:

htendc=ODESolve(htstartc,f,tstartc,tendc)=ODESolve(h~0,f~,0,1).h_{t_{\text{end}}^{c}}=\text{ODESolve}(h_{t_{\text{start}}^{c}},f,t_{\text{start}}^{c},t_{\text{end}}^{c})=\text{ODESolve}(\tilde{h}_{0},\tilde{f},0,1). (38)

Here, the reparameterized function f~\tilde{f} ensures that all walks are solved over a common interval, enabling efficient batch computation.

Parallelism Across Walks

While each walk still involves step-wise computations within the reparameterized interval, the unification of time intervals to [0,1][0,1] allows a single solver instance to process multiple walks in parallel. This is achieved by stacking the dynamics of all walks into a joint system:

dh~jointds=[f1(h~1,sΔt1+tstart1)Δt1f2(h~2,sΔt2+tstart2)Δt2fC(h~C,sΔtC+tstartC)ΔtC],\frac{d\tilde{h}_{\text{joint}}}{ds}=\begin{bmatrix}f_{1}(\tilde{h}_{1},s\cdot\Delta t_{1}+t_{\text{start}}^{1})\cdot\Delta t_{1}\\ f_{2}(\tilde{h}_{2},s\cdot\Delta t_{2}+t_{\text{start}}^{2})\cdot\Delta t_{2}\\ \vdots\\ f_{C}(\tilde{h}_{C},s\cdot\Delta t_{C}+t_{\text{start}}^{C})\cdot\Delta t_{C}\end{bmatrix}, (39)

where Δtc=tendctstartc\Delta t_{c}=t_{\text{end}}^{c}-t_{\text{start}}^{c}. The solver processes this joint system, reducing the computational overhead from CC independent solver calls to a single batched solver call.
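
The batched solve can be sketched as follows, again using torchdiffeq's odeint; the hidden states of all C walks are stacked, the dynamics are rescaled by Δt_c, and a single solver call over s ∈ [0, 1] advances every walk. The fixed-step rk4 solver with step size 0.125 is an illustrative stand-in for the Runge-Kutta 3/8 scheme reported in Appendix D.3.

```python
# A sketch of the batched evolution after time reparameterization (Eqs. 34-39).
import torch
from torchdiffeq import odeint

def batched_evolution(f, h0, t_start, t_end):
    # h0:      (C, hidden_dim) initial hidden states h_{t_start^c}
    # t_start: (C,) walk start times;  t_end: (C,) walk end times
    dt = (t_end - t_start).unsqueeze(-1)                # Δt_c, shape (C, 1)

    def joint_dynamics(s, h_tilde):
        # Recover per-walk original time t = s * Δt_c + t_start^c
        t = s * dt.squeeze(-1) + t_start
        # Chain rule (Eq. 36): dh~/ds = f(h~, t) * Δt_c
        return f(t, h_tilde) * dt

    s_span = torch.tensor([0.0, 1.0])
    h1 = odeint(joint_dynamics, h0, s_span,
                method="rk4", options={"step_size": 0.125})
    return h1[-1]                                       # h_{t_end^c} for every walk
```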

Batch Processing Efficiency

For walks with l=1l=1, the reparameterization directly enables batch processing since the time interval for all walks is normalized to [0,1][0,1]. However, for walks with l>1l>1, each step within the walk corresponds to a distinct interval, making true parallelism challenging. Solving multiple sub-intervals within each walk requires sequential processing.

To fully enable parallel processing for l>1l>1, further strategies, such as normalizing intermediate step intervals or discretizing the ODE solutions, could be employed. These approaches remain an open area for further optimization.

Handling Large Time Intervals

In real-world scenarios, large time intervals Δtc\Delta t_{c} can cause numerical instability. To mitigate this, we normalize the intervals while preserving relative temporal differences using a logarithmic transformation:

Δtscaled=log10(Δt+1),\Delta t_{\text{scaled}}=\log_{10}(\Delta t+1), (40)

ensuring that smaller intervals retain fine-grained dynamics while larger intervals are compressed to a manageable scale.

This batching mechanism combines time reparameterization and joint ODE solving, significantly improving computational efficiency and enabling scalable representation learning on dynamic graphs.

C.3 Complexity Analysis

We analyze the time complexity of the main components in CTWalks, including community detection, temporal walk sampling, anonymization, continuous evolution, and instantaneous updates. The detailed breakdown is as follows:

Community Detection.

Community detection is performed using a modularity-based algorithm, such as Louvain, which has a time complexity of O(|E|logn)O(|E|\log n), where |E||E| is the number of edges and nn is the number of nodes. This step is executed once as preprocessing.

Temporal Walk Sampling.

In each batch, BCBC walks are generated, where BB is the batch size (number of root nodes) and CC is the number of walks per root node. The sampling process involves ll steps per walk, with each step querying neighbors at a complexity of O(ks)O(k_{s}), where ksk_{s} is the average number of neighbors. The total complexity for temporal walk sampling is O(BCks)O(BCk_{s}).

Walk Anonymization.

For anonymization, each walk requires O(l)O(l) operations to identify unique nodes and assign positional encodings. Across BCBC walks, the total complexity becomes O(BCl)O(BCl).

Continuous Evolution.

Continuous evolution integrates the temporal evolution function over time intervals using an ODE solver, requiring FF function evaluations per step. For BCBC walks with ll steps, the total complexity is O(BClFdh2)O(BClFd_{h}^{2}). Using the batch processing optimization introduced in Appendix C.2, this is reduced to O(lFdh2)O(lFd_{h}^{2}).

Instantaneous Updates.

The instantaneous updates compute the hidden state at each step using a parameterized function, resulting in a complexity of O(ldh2)O(ld_{h}^{2}).

Subtotal (Sampling and Anonymization).

The combined complexity for sampling and anonymization, including temporal walk sampling and walk anonymization, is:

O(BC(ks+l)).O(BC(k_{s}+l)). (41)

Subtotal (Encoding).

The combined complexity for continuous evolution and instantaneous updates is:

O(lFdh2).O(lFd_{h}^{2}). (42)
Total Complexity.

Combining all components, the total time complexity of CTWalks per epoch is:

O(|E|logn)+O(BC(ks+l))+O(lFdh2).O(|E|\log n)+O(BC(k_{s}+l))+O(lFd_{h}^{2}). (43)

Table 5 summarizes the complexity of each component.

Component Time Complexity
Community Detection O(|E|logn)O(|E|\log n)
Temporal Walk Sampling O(BCks)O(BCk_{s})
Walk Anonymization O(BCl)O(BCl)
Subtotal (Sampling and Anonymization) O(BC(ks+l))O(BC(k_{s}+l))
Continuous Evolution O(lFdh2)O(lFd_{h}^{2})
Instantaneous Updates O(ldh2)O(ld_{h}^{2})
Subtotal (Encoding) O(lFdh2)O(lFd_{h}^{2})
Total O(|E|logn)+O(BC(ks+l))+O(lFdh2)O(|E|\log n)+O(BC(k_{s}+l))+O(lFd_{h}^{2})
Table 5: Time Complexity Analysis of CTWalks

Appendix D Additional Experimental Details

D.1 Dataset Details and Preprocessing

We evaluate our method on five real-world datasets across diverse domains. The details are as follows:

Data Preprocessing: The following steps ensure consistency and facilitate evaluation:

  1. All temporal edges are sorted chronologically to maintain temporal consistency.

  2. Edges are split into training (70%), validation (15%), and testing (15%) sets.

  3. Negative sampling is performed to generate edges absent in the original graph, ensuring a balanced dataset.
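
A minimal sketch of this preprocessing pipeline is given below; the uniform negative-sampling scheme shown is a common baseline choice and is not necessarily the exact sampler used in the released code.

```python
# A sketch of chronological splitting and per-edge negative sampling.
import random

def preprocess(edges, nodes, seed=0):
    """edges: list of (u, v, t) tuples; nodes: list of node ids."""
    rng = random.Random(seed)
    edges = sorted(edges, key=lambda e: e[2])           # chronological order
    n = len(edges)
    train = edges[: int(0.70 * n)]
    val = edges[int(0.70 * n): int(0.85 * n)]
    test = edges[int(0.85 * n):]

    existing = {(u, v) for u, v, _ in edges}

    def negatives(pos_edges):
        # one corrupted destination per positive edge, absent from the graph
        negs = []
        for u, _v, t in pos_edges:
            w = rng.choice(nodes)
            while (u, w) in existing:
                w = rng.choice(nodes)
            negs.append((u, w, t))
        return negs

    return train, val, test, negatives(val), negatives(test)
```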

D.2 Baselines and Hyperparameter Tuning

Baseline Methods:

To evaluate the performance of our proposed method, we compare it with six state-of-the-art baselines specifically designed for continuous-time dynamic graphs (CTDGs):

  • CTDNE: CTDNE extends static network embedding techniques to CTDGs by leveraging temporal random walks and a skip-gram model to learn node representations. This method captures the temporal dependency of interactions. (Repository: https://github.com/LogicJake/CTDNE)

  • JODIE: JODIE employs two mutually influenced recurrent neural networks (RNNs) to update the latent states of interacting nodes. The model is capable of predicting future embedding trajectories based on past interactions. (Repository: https://github.com/claws-lab/jodie)

  • DyRep: DyRep integrates sequence modeling with an attentive message-passing mechanism, incorporating 2-hop temporal neighborhood information to produce time-aware embeddings. The loss function is built upon temporal point processes. (Repository: https://github.com/uoguelph-mlrg/LDG)

  • TGAT: TGAT applies temporal encoding via random Fourier time encodings and attentively aggregates temporal neighborhood information to generate embeddings. The model supports multi-hop temporal message aggregation. (Repository: https://github.com/StatsDLMathsRecomSys/Inductive-representation-learning-on-temporal-graphs)

  • TGN: TGN proposes a memory-based message-passing framework, which updates node memories dynamically while combining key designs from JODIE and TGAT to enhance its capability to model temporal interactions. (Repository: https://github.com/twitter-research/tgn)

  • CAWs: CAWs encode and aggregate anonymous temporal walks using RNNs, followed by pooling mechanisms to generate temporal node embeddings. Its causality-aware anonymization enables it to capture complex temporal dynamics. (Repository: https://github.com/snap-stanford/CAW)

Hyperparameter Tuning:

To ensure a fair comparison, we carefully adapt each baseline to our unified evaluation framework. Key hyperparameters are tuned through grid search to optimize model performance. Below, we summarize the tuning strategies for each method:

  • TGAT: Neighborhood Sampling Degree is tuned from {8, 16, 32, 64}; Model Layers are searched over {1, 2, 3}; Attention Heads (default product attention) are tuned from {2, 4, 6}; and Embedding Dimensions are tuned from {16, 32, 64, 100}.

  • JODIE: Embedding Dimensions are tuned from {32, 64, 128, 256}.

  • DyRep: Neighborhood Sampling Degree is tuned from {10, 16, 32, 64}, and Attention Layers are tuned over {1, 2, 3}.

  • TGN: Attention Heads are tuned from {2, 4, 8}; Embedding Dimensions for nodes, time, and messages are tuned over {16, 32, 64, 100}; and Memory Dimensions are searched over {4, 16, 32, 64, 172}.

  • CAWs: Walk Length is tuned over {1, 2, 3}; Number of Walks is tuned from {16, 32, 64, 128}; and Time Decay Factor is tuned over {10^{-7}, 10^{-6}, 10^{-5}, 10^{-4}}. Walk pooling is set to summation for consistency and simplicity.

  • CTDNE: Skip-gram Window Size is tuned from {3, 5, 10}.

Method Hyperparameter Search Range Optimal Value
TGAT Neighborhood Sampling Degree {8, 16, 32, 64} 32
Model Layers {1, 2, 3} 2
Attention Heads {2, 4, 6} 4
Embedding Dimensions {32, 64, 128, 256} 256
JODIE Embedding Dimensions {32, 64, 128, 256} 256
DyRep Neighborhood Sampling Degree {10, 16, 32, 64} 16
Attention Layers {1, 2, 3} 2
TGN Attention Heads {2, 4, 8} 4
Embedding Dimensions {32, 64, 128, 256} 256
Memory Dimensions {4, 16, 32, 64, 172} 64
CAWs Walk Length {1, 2, 3} 3
Number of Walks {16, 32, 64} 64
Time Decay Factor {10^{-7}, 10^{-6}, 10^{-5}, 10^{-4}} 10^{-6}
CTDNE Skip-gram Window Size {3, 5, 10} 5
Table 6: Hyperparameter search ranges and optimal values for baseline methods.
Implementation Details:

All baselines are implemented using publicly available codebases. We ensure consistency in training and evaluation settings, including dataset splits and metrics, to enable fair comparisons. For additional implementation details, such as the specific training configurations and runtime environments, refer to Appendix D.3.

D.3 Implementation Details

Code Availability:

Our implementation of CTWalks is publicly available at https://github.com/leonyuhe/CTWalks. The repository includes detailed instructions for dataset preparation, model training, and hyperparameter tuning.

General Training Settings:

CTWalks is trained across all datasets using the following settings:

  • Optimizer: Adam with a learning rate of 10^{-4} and batch size of 32.

  • Early Stopping: Training stops if validation performance does not improve for three consecutive epochs.

  • Epoch Limit: Maximum 50 epochs.

Hyperparameter Tuning:

To ensure optimal performance, key hyperparameters controlling walk sampling are tuned via grid search. The following summarizes the hyperparameter configurations for all datasets:

  • Walk Length (ll): {1, 2, 3}

  • Number of Walks per Node (CC): {16, 32, 64}

  • ODE Solver: Fixed-step Runge-Kutta 3/8 method with a step size of 0.125.

Evaluation Tasks:

  • Transductive: Models are trained on all edges up to a specific time point and tested on future edges.

  • Inductive: Evaluates the ability to predict links involving unseen nodes. Two scenarios are considered:

    1. New-Old: Interactions between unseen and observed nodes.

    2. New-New: Interactions exclusively between unseen nodes.

Hardware Configuration: Experiments are conducted on an Ubuntu Linux server equipped with four NVIDIA GeForce RTX 3090 GPUs, two Intel(R) Xeon(R) Silver 4210 CPUs (2.20GHz), and 252GB of RAM. Each experiment is repeated five times, and the average results are reported.

Evaluation Metrics:

  • AUC (Area Under the ROC Curve): Measures the quality of predictions.

  • AP (Average Precision): Evaluates the precision-recall trade-off; detailed results are reported in Table 7.

Task Methods UCI MOOC Enron Taobao Wikipedia
Inductive new v.s. new DyRep 64.55 ±\pm 4.22 74.93 ±\pm 1.77 73.41 ±\pm 0.41 88.33 ±\pm 1.15 67.89 ±\pm 1.21
TGAT 77.20 ±\pm 1.32 71.15 ±\pm 1.27 64.11 ±\pm 1.69 77.45 ±\pm 1.45 91.32 ±\pm 0.89
TGN 80.28 ±\pm 1.39 78.31 ±\pm 1.77 81.78 ±\pm 0.02 85.34 ±\pm 0.41 64.56 ±\pm 4.33
CTDNE 63.75 ±\pm 0.67 72.91 ±\pm 0.27 67.14 ±\pm 1.15 78.77 ±\pm 0.08 63.09 ±\pm 1.05
JODIE 71.45 ±\pm 0.49 76.08 ±\pm 4.00 71.66 ±\pm 0.81 82.18 ±\pm 2.04 73.54 ±\pm 0.61
CAWs 88.78 ±\pm 0.82 90.89 ±\pm 0.33 96.95 ±\pm 1.09 86.99 ±\pm 2.80 96.91 ±\pm 1.41
CTWalks 91.26 ±\pm 0.08 93.72 ±\pm 0.52 95.91 ±\pm 0.62 97.75 ±\pm 0.38 94.94 ±\pm 0.24
new v.s. old DyRep 94.11 ±\pm 2.91 89.07 ±\pm 1.29 95.18 ±\pm 0.55 90.11 ±\pm 1.42 77.45 ±\pm 1.27
TGAT 77.10 ±\pm 0.99 68.40 ±\pm 1.62 62.11 ±\pm 1.24 65.93 ±\pm 0.93 96.12 ±\pm 1.10
TGN 87.78 ±\pm 1.01 73.02 ±\pm 1.14 78.56 ±\pm 0.42 89.09 ±\pm 0.30 90.96 ±\pm 0.89
CTDNE 71.89 ±\pm 0.70 70.10 ±\pm 0.83 65.44 ±\pm 0.38 69.02 ±\pm 0.05 89.06 ±\pm 1.02
JODIE 72.55 ±\pm 0.35 89.00 ±\pm 1.81 86.54 ±\pm 1.29 81.23 ±\pm 2.03 75.66 ±\pm 0.20
CAWs 92.10 ±\pm 0.16 91.14 ±\pm 0.07 94.10 ±\pm 1.22 89.56 ±\pm 1.10 96.99 ±\pm 0.81
CTWalks 96.35 ±\pm 0.16 93.81 ±\pm 0.18 93.17 ±\pm 0.19 94.78 ±\pm 0.41 96.01 ±\pm 0.22
Transductive DyRep 96.11 ±\pm 1.42 91.22 ±\pm 0.22 97.41 ±\pm 0.75 83.79 ±\pm 1.07 78.23 ±\pm 1.70
TGAT 78.11 ±\pm 0.99 72.95 ±\pm 1.43 61.32 ±\pm 1.27 63.76 ±\pm 1.38 96.89 ±\pm 1.11
TGN 84.01 ±\pm 1.16 74.21 ±\pm 0.03 69.01 ±\pm 1.14 88.21 ±\pm 0.04 95.93 ±\pm 0.23
CTDNE 77.55 ±\pm 0.86 74.03 ±\pm 0.29 90.08 ±\pm 0.68 65.78 ±\pm 0.02 93.11 ±\pm 0.26
JODIE 75.29 ±\pm 0.49 91.40 ±\pm 0.78 61.11 ±\pm 0.62 82.77 ±\pm 0.29 90.44 ±\pm 0.21
CAWs 95.97 ±\pm 0.05 95.11 ±\pm 0.27 92.32 ±\pm 0.51 86.88 ±\pm 0.34 99.21 ±\pm 0.26
CTWalks 98.65 ±\pm 0.08 95.88 ±\pm 0.07 93.35 ±\pm 0.54 92.98 ±\pm 0.13 97.88 ±\pm 0.23
Table 7: Transductive and inductive link prediction performance w.r.t. AP. We use bold font to highlight the best and second-best performances.

Appendix E Experiment on Purely Static Graphs

To further validate the benefits of community walks in enhancing network embedding learning, we conducted experiments on purely static graphs, focusing on the community structure without considering temporal information. Specifically, we applied our method, CTWalks, which incorporates community walks to generate node sequences, and subsequently used the word2vec mikolov2013efficient framework to generate embeddings from these sequences. By deliberately excluding temporal data and leveraging word2vec, our approach aims to highlight the effectiveness of CTWalks in capturing community structures and achieving competitive performance on static graphs compared to classical graph embedding algorithms.

Experimental Setup

We evaluated CTWalks (CTW) against six widely recognized baseline methods: node2vec (N2V) grover2016node2vec , DeepWalk (DW) perozzi2014deepwalk , LINE tang2015line , Struc2Vec (S2V) ribeiro2017struc2vec , GraphSage (GS) hamilton2017graphsage , and M-NMF (MN) wang2017community . To ensure a fair comparison, the implementations of these methods were obtained from publicly available repositories. Specifically, Node2vec, DeepWalk, and M-NMF were implemented using the Karate Club library (https://github.com/benedekrozemberczki/karateclub). The implementation of LINE was sourced from its official repository (https://github.com/tangjianpku/LINE). Similarly, Struc2Vec (https://github.com/leoribeiro/struc2vec) and GraphSage (https://github.com/williamleif/GraphSAGE) were obtained from their respective repositories. This approach ensured the use of standardized and well-maintained implementations for each baseline.

The experiments were conducted on six datasets from the Network Repository nr2015 and SNAP leskovec2016snap , spanning diverse domains such as social, biological, and ecological networks. All graphs were treated as static by disregarding temporal information. Each graph was split into 70% edges for training and 30% for testing. Node embeddings were learned on the training graph, while the testing edges were excluded to ensure unbiased evaluation.

Table 8: Dataset Statistics
Dataset |V||V| |E||E| davgd_{\text{avg}} KavgK_{\text{avg}} TfracT_{\text{frac}}
fb-pages-food 620 2091 6.7 0.33 0.22
ego-Facebook 4039 88234 43.69 0.60 0.26
soc-hamsterster 2000 16097 16.10 0.5375 0.2314
aves-weaver-social 117 304 5.20 0.6924 0.5747
bio-DM-LC 483 997 4.13 0.1047 0.0551
bio-WormNet-v3 2274 78328 68.89 0.8390 0.7211

Notation: |V||V|: Number of nodes; |E||E|: Number of edges; davgd_{\text{avg}}: Average node degree; KavgK_{\text{avg}}: Average clustering coefficient; TfracT_{\text{frac}}: Fraction of closed triangles.

The hyperparameters for CTW, DW, and N2V were set to Lw=80L_{w}=80 (walk length), R=10R=10 (number of walks per node), and d=128d=128 (embedding dimension). Edge embeddings were computed using the Hadamard product of node embeddings. The performance was evaluated using logistic regression and two metrics: AUC (Area Under the Curve) and Average Precision (AP). Each experiment was repeated ten times with different random seeds, and the average results are reported.
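
The sketch below outlines this evaluation pipeline with gensim's word2vec and scikit-learn; the community-aware walk generation itself is omitted, and the function signature is illustrative.

```python
# A sketch of the static-graph pipeline: walks -> skip-gram embeddings ->
# Hadamard edge features -> logistic regression scored by AUC and AP.
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate_static(walks, train_pos, train_neg, test_pos, test_neg, d=128):
    # walks: node-id sequences (R = 10 walks of length L_w = 80 per node)
    model = Word2Vec([[str(v) for v in w] for w in walks],
                     vector_size=d, window=10, min_count=0, sg=1, seed=0)
    emb = lambda v: model.wv[str(v)]

    def edge_feats(pairs):
        # Hadamard product of the two endpoint embeddings
        return np.stack([emb(u) * emb(v) for u, v in pairs])

    X_tr = np.vstack([edge_feats(train_pos), edge_feats(train_neg)])
    y_tr = np.concatenate([np.ones(len(train_pos)), np.zeros(len(train_neg))])
    X_te = np.vstack([edge_feats(test_pos), edge_feats(test_neg)])
    y_te = np.concatenate([np.ones(len(test_pos)), np.zeros(len(test_neg))])

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]
    return roc_auc_score(y_te, scores), average_precision_score(y_te, scores)
```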

E.1 Results and Analysis

Link Prediction.

The results of the link prediction task are presented in Table 9 (AUC) and Table 10 (AP). CTW consistently outperformed baseline methods across all datasets. Notably, CTW demonstrated significant improvements on datasets with strong community structures, such as soc-hamsterster and fb-pages-food. For instance:

  • On bio-WormNet-v3, CTW achieved an AUC of 0.9915, outperforming the second-best method, M-NMF, by 3.2%.

  • On ego-Facebook, CTW achieved an AP of 0.9891, showcasing robust performance in large-scale social networks.

  • On datasets with smaller community structures, such as aves-weaver-social, CTW achieved a notable improvement, with AUC gains of up to 5.1% over the best baseline.

Dataset CTW N2V DW LINE S2V GS MN
fb-pages-food 0.9319 0.8764 0.8632 0.8121 0.8573 0.8902 0.8964
ego-Facebook 0.9885 0.9452 0.9203 0.8947 0.9345 0.9541 0.9721
soc-hamsterster 0.9122 0.8610 0.8423 0.8045 0.8534 0.8812 0.8734
aves-weaver-social 0.9114 0.8732 0.8517 0.8245 0.8651 0.8832 0.8903
bio-DM-LC 0.9337 0.8810 0.8743 0.8547 0.8723 0.8901 0.9121
bio-WormNet-v3 0.9915 0.9543 0.9402 0.9134 0.9624 0.9742 0.9603
Table 9: Average AUC Scores for Link Prediction on Static Graphs. Bold numbers represent the best results.
Dataset CTW N2V DW LINE S2V GS MN
fb-pages-food 0.9421 0.8920 0.8814 0.8289 0.8712 0.9045 0.9102
ego-Facebook 0.9891 0.9534 0.9351 0.9107 0.9443 0.9642 0.9813
soc-hamsterster 0.9215 0.8735 0.8602 0.8224 0.8698 0.8914 0.8803
aves-weaver-social 0.9217 0.8801 0.8655 0.8371 0.8764 0.8942 0.9015
bio-DM-LC 0.9440 0.8954 0.8883 0.8651 0.8850 0.9025 0.9234
bio-WormNet-v3 0.9961 0.9602 0.9456 0.9175 0.9723 0.9815 0.9673
Table 10: Average AP Scores for Link Prediction on Static Graphs. Bold numbers represent the best results.

Appendix F Theoretical Analysis

We interpret CTWalks from the perspective of matrix factorization. We demonstrate that the two-layer random walk process in CTWalks corresponds to a novel matrix factorization form, integrating intra- and inter-community dynamics through a hierarchical transition mechanism. These analyses highlight CTWalks’ theoretical advantages in encoding community-aware structures, providing insights into its effectiveness in network representation learning.

Analysis 2: Matrix Factorization Perspective of CTWalks

The Skip-Gram with negative sampling (SGNS) framework mikolov2013distributed in network embedding methods such as DeepWalk perozzi2014deepwalk and node2vec grover2016node2vec has been shown to implicitly factorize a pointwise mutual information (PMI) matrix levy2014neural ; Qiu:2018 . CTWalks extends this perspective by incorporating a hierarchical, two-layer random walk that explicitly encodes both intra-community and inter-community transitions, resulting in a novel matrix factorization form.

CTWalks Transition Matrices

In CTWalks, the probability of transitioning between two nodes during the random walk is governed by two distinct components: intra-community transitions (within communities) and inter-community transitions (across communities). To model this, we introduce two matrices: 1) a block-diagonal matrix MI|V|×|V|M_{I}\in\mathbb{R}^{|V|\times|V|} that represents transitions within individual communities, and 2) an extended matrix MC|V|×|V|M_{C}\in\mathbb{R}^{|V|\times|V|} that captures transitions between communities via bridging nodes.

Intra-community transition matrix (MIM_{I}).

The matrix MIM_{I} represents the normalized intra-community transitions, structured as a block-diagonal matrix. Each block Di1AiD_{i}^{-1}A_{i} corresponds to a community CiC_{i}, where:

  • Ai|Vi|×|Vi|A_{i}\in\mathbb{R}^{|V_{i}|\times|V_{i}|} is the adjacency matrix of the subgraph Gi=(Vi,Ei)G_{i}=(V_{i},E_{i}), induced by the set of nodes ViV_{i} and edges EiE_{i} within community CiC_{i}.

  • Di1D_{i}^{-1} is the inverse degree matrix of AiA_{i}, ensuring row normalization of transition probabilities within each community.

The resulting block-diagonal structure of MIM_{I} is expressed as:

MI=(D11A1000D21A2000Dk1Ak).M_{I}=\begin{pmatrix}D_{1}^{-1}A_{1}&0&\cdots&0\\ 0&D_{2}^{-1}A_{2}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&D_{k}^{-1}A_{k}\end{pmatrix}. (44)

This formulation confines random walks within individual communities C1,C2,,CkC_{1},C_{2},\dots,C_{k}, effectively capturing the dense intra-community relationships while isolating transitions between distinct communities.

Inter-community transition matrix (MCM_{C}).

The matrix MCM_{C} captures transitions between inter-community bridging nodes. To construct MCM_{C}, we first normalize the adjacency matrix AcA_{c} of Gc=(Vc,Ec)G_{c}=(V_{c},E_{c}), where VcV_{c} is the set of bridging nodes. The normalized matrix is defined as:

Ac(n)=Dc1Ac,A_{c}^{(n)}=D_{c}^{-1}A_{c}, (45)

where DcD_{c} is the degree matrix of AcA_{c}, ensuring that the rows of Ac(n)A_{c}^{(n)} sum to one. Here, Ac(n)|Vc|×|Vc|A_{c}^{(n)}\in\mathbb{R}^{|V_{c}|\times|V_{c}|} represents the normalized transition probabilities restricted to the bridging nodes.

To integrate MCM_{C} with MIM_{I}, we align the node order in MCM_{C} to match the node ordering used in MIM_{I}. Specifically, the rows and columns of MCM_{C} are arranged such that:

  • Nodes within the same community are grouped together, maintaining the block-diagonal structure of MIM_{I},

  • Rows and columns corresponding to non-bridging nodes wVcw\notin V_{c} are filled with zeros.

This alignment ensures that both MCM_{C} and MIM_{I} are compatible for subsequent matrix operations, such as multi-step transition summation. The resulting MCM_{C} can be visualized as follows:

MC=(0000000Ac(n)[v2,v2]00Ac(n)[v2,v5]00000000000000Ac(n)[v5,v2]00Ac(n)[v5,v5]0000000).M_{C}=\begin{pmatrix}0&0&0&0&0&\cdots&0\\ 0&A_{c}^{(n)}[v_{2},v_{2}]&0&0&A_{c}^{(n)}[v_{2},v_{5}]&\cdots&0\\ 0&0&0&0&0&\cdots&0\\ 0&0&0&0&0&\cdots&0\\ 0&A_{c}^{(n)}[v_{5},v_{2}]&0&0&A_{c}^{(n)}[v_{5},v_{5}]&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&0&0&\cdots&0\end{pmatrix}. (46)

This design ensures that MCM_{C} and MIM_{I} remain consistent in terms of node alignment. By isolating inter-community transitions and aligning them to the global node order of MIM_{I}, MCM_{C} facilitates a seamless integration of intra-community and inter-community dynamics in the CTWalks framework.
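
The construction of M_I and M_C can be illustrated with NumPy under a fixed global node ordering, as sketched below; identifying bridging nodes as nodes with at least one neighbor outside their own community is an assumption made for this illustration.

```python
# A sketch of the intra-community matrix M_I (Eq. 44) and the aligned
# inter-community matrix M_C (Eqs. 45-46).
import numpy as np

def transition_matrices(A, labels):
    # A: (n, n) symmetric adjacency matrix; labels: (n,) community label per node
    same = labels[:, None] == labels[None, :]

    # Intra-community part: keep within-community edges only, then row-normalize
    A_intra = np.where(same, A, 0.0)
    deg_i = A_intra.sum(axis=1, keepdims=True)
    M_I = np.divide(A_intra, deg_i, out=np.zeros_like(A_intra), where=deg_i > 0)

    # Inter-community part: restrict to bridging nodes, row-normalize, and leave
    # rows/columns of non-bridging nodes as zeros (global node alignment)
    bridging = np.where(~same, A, 0.0).sum(axis=1) > 0
    A_c = np.where(np.outer(bridging, bridging), A, 0.0)
    deg_c = A_c.sum(axis=1, keepdims=True)
    M_C = np.divide(A_c, deg_c, out=np.zeros_like(A_c), where=deg_c > 0)
    return M_I, M_C
```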

Matrix Factorization of CTWalks

Lemma 2: Let MCM_{C} and MIM_{I} represent the inter-community and intra-community transition matrices, respectively. The embeddings learned by CTWalks using Skip-Gram with negative sampling correspond to a low-rank factorization of the following shifted Pointwise Mutual Information (PMI) matrix:

log(vol(G)(1Tr=1T(MCr+MIr))D1)logk,\log\left(\text{vol}(G)\cdot\left(\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)\right)D^{-1}\right)-\log k, (47)

where:

  • vol(G)\text{vol}(G) = ijAi,j\sum_{i}\sum_{j}A_{i,j} is the volume of graph GG,

  • TT is the context window size,

  • MCrM_{C}^{r} and MIrM_{I}^{r}: The rr-step transition probability matrices for MCM_{C} (inter-community transitions) and MIM_{I} (intra-community transitions), respectively,

  • D1D^{-1} is the inverse degree matrix of GG,

  • kk is the negative sampling parameter.

Proof. We begin with the Skip-Gram objective, which seeks to maximize the probability of observing a context node cc given a target node ww. Under the CTWalks framework, random walks are independently applied to the inter-community transition matrix MCM_{C} and the intra-community transition matrix MIM_{I}.

Step 1: Joint Probability.

The joint probability P(w,c)P(w,c) of observing a node ww with context cc is defined by the stationary distribution π(w)\pi(w) and the TT-step transition probabilities:

P(w,c)π(w)1Tr=1T(MCr+MIr)[w,c],P(w,c)\approx\pi(w)\cdot\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)[w,c], (48)

where π(w)\pi(w) is the stationary probability of node ww, given by:

π(w)=d(w)vol(G).\pi(w)=\frac{d(w)}{\text{vol}(G)}. (49)
Step 2: Marginal Probabilities.

The marginal probability P(w)P(w) of node ww under the stationary distribution is given by:

P(w)=π(w)=d(w)vol(G).P(w)=\pi(w)=\frac{d(w)}{\text{vol}(G)}. (50)

Similarly, the marginal probability P(c)P(c) of context cc is:

P(c)=π(c)=d(c)vol(G).P(c)=\pi(c)=\frac{d(c)}{\text{vol}(G)}. (51)
Step 3: Pointwise Mutual Information (PMI).

The PMI between target node ww and context node cc is defined as:

PMI(w,c)=logP(w,c)P(w)P(c).PMI(w,c)=\log\frac{P(w,c)}{P(w)P(c)}. (52)

Substituting P(w,c)P(w,c), P(w)P(w), and P(c)P(c) into Eq.(52), we obtain:

PMI(w,c)logπ(w)1Tr=1T(MCr+MIr)[w,c]π(w)π(c).PMI(w,c)\approx\log\frac{\pi(w)\cdot\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)[w,c]}{\pi(w)\pi(c)}. (53)

Simplifying:

PMI(w,c)log(1Tr=1T(MCr+MIr)[w,c]vol(G)d(c)).PMI(w,c)\approx\log\left(\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)[w,c]\cdot\frac{\text{vol}(G)}{d(c)}\right). (54)
Step 4: Matrix Representation.

We can represent the PMI matrix in the following matrix form:

log(vol(G)(1Tr=1T(MCr+MIr))D1).\log\left(\text{vol}(G)\cdot\left(\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)\right)D^{-1}\right). (55)
Step 5: SGNS Objective.

The Skip-Gram with negative sampling introduces a shift logk-\log k, where kk is the number of negative samples. Therefore, the final form of the factorization becomes:

log(vol(G)(1Tr=1T(MCr+MIr)D1))logk.\log\left(\text{vol}(G)\cdot\left(\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)D^{-1}\right)\right)-\log k. (56)

Lemma 2 demonstrates that the embeddings learned by CTWalks correspond to the low-rank factorization of the combined TT-step transition probabilities derived from MCM_{C} and MIM_{I}. This factorization effectively integrates inter-community and intra-community relationships, capturing both local and global relationships without requiring additional parameters.
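
For illustration, the shifted PMI matrix of Eq. (47) can be assembled directly from the two transition matrices; the small epsilon guarding the logarithm and the final positive-PMI clipping are standard numerical conventions rather than part of the lemma.

```python
# A sketch of the shifted PMI matrix factorized by CTWalks (Eq. 47).
import numpy as np

def shifted_pmi(A, M_C, M_I, T=5, k=5, eps=1e-12):
    n = A.shape[0]
    vol = A.sum()                                       # vol(G)
    D_inv = np.diag(1.0 / np.maximum(A.sum(axis=1), eps))
    acc = np.zeros_like(A, dtype=float)
    P_C, P_I = np.eye(n), np.eye(n)
    for _ in range(T):
        P_C, P_I = P_C @ M_C, P_I @ M_I                 # r-step powers M_C^r, M_I^r
        acc += P_C + P_I
    pmi = np.log(np.maximum(vol * (acc / T) @ D_inv, eps)) - np.log(k)
    return np.maximum(pmi, 0.0)                         # common positive-PMI clipping
```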

Significance of the Factorization.

By combining MCM_{C} and MIM_{I} in a structured manner, CTWalks enables an explicit encoding of both local community structures and global inter-community connectivity. This theoretical insight justifies the empirical superiority of CTWalks in capturing hierarchical and multi-scale graph properties in tasks such as link prediction and node classification.

Appendix G Detailed Explanation of Node Anonymization

Motivation for Anonymization.

The primary goal of anonymization is to achieve generalization across nodes, enabling the model to effectively operate on unseen nodes or structures during inference. By replacing explicit node identities with position-based representations:

  • The model avoids overfitting to specific node features.

  • Community labels enrich the anonymized representation, embedding mesoscopic graph structures.

  • Directionality-aware representations ensure that roles of nodes as sources or targets are preserved.

Process Overview.

Anonymization in CTWalks operates on temporal walks generated from a given interaction ((u,v),ti)((u,v),t_{i}). For each interaction, two sets of walks are constructed:

  • 𝒮u\mathcal{S}_{u}: Walks originating from the source node uu.

  • 𝒮v\mathcal{S}_{v}: Walks originating from the target node vv.

A unique set of nodes VinteractionV_{\text{interaction}} is obtained from 𝒮u𝒮v\mathcal{S}_{u}\cup\mathcal{S}_{v}. Each node wVinteractionw\in V_{\text{interaction}} is anonymized by aggregating its positional occurrences across all walks in the respective sets, incorporating directionality and community context.

Anonymization Steps.

1. For each node ww in VinteractionV_{\text{interaction}}, perform the following steps:

  • Position Vector Computation for 𝒮u\mathcal{S}_{u}:

    • For each walk W𝒮uW\in\mathcal{S}_{u}, compute ww’s positional occurrence vector:

      A(w;W)=[𝕀(w=W[1]),𝕀(w=W[2]),,𝕀(w=W[l])],A(w;W)=[\mathbb{I}(w=W[1]),\mathbb{I}(w=W[2]),\dots,\mathbb{I}(w=W[l])], (57)

      where ll is the walk length and 𝕀\mathbb{I} is an indicator function.

    • Aggregate ww’s positional vectors across all walks in 𝒮u\mathcal{S}_{u}:

      A(w;𝒮u)=W𝒮uA(w;W).A(w;\mathcal{S}_{u})=\sum_{W\in\mathcal{S}_{u}}A(w;W). (58)
  • Community Label Integration for 𝒮u\mathcal{S}_{u}:

    A(w;𝒮u,Cu)=[A(w;𝒮u)||Cu],A(w;\mathcal{S}_{u},C_{u})=[A(w;\mathcal{S}_{u})||C_{u}], (59)

    where CuC_{u} is the community label of the source node uu.

  • Repeat for 𝒮v\mathcal{S}_{v}: Similarly, compute:

    A(w;𝒮v,Cv)=[A(w;𝒮v)||Cv],A(w;\mathcal{S}_{v},C_{v})=[A(w;\mathcal{S}_{v})||C_{v}], (60)

    where CvC_{v} is the community label of the target node vv.

  • Combine Source and Target Representations: Incorporate directionality by concatenating the source and target representations:

    A(w;𝒮u,𝒮v,Cu,Cv)=[A(w;𝒮u,Cu)||A(w;𝒮v,Cv)].A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v})=[A(w;\mathcal{S}_{u},C_{u})||A(w;\mathcal{S}_{v},C_{v})]. (61)

2. Store the final anonymized representation A(w;𝒮u,𝒮v,Cu,Cv)A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v}) for all wVinteractionw\in V_{\text{interaction}}.

Anonymized Walk Construction.

After anonymizing individual nodes, the anonymized temporal walk is constructed. For a single walk W={w1,w2,,wl}W=\{w_{1},w_{2},\dots,w_{l}\} with timestamps t1<t2<<tlt_{1}<t_{2}<\dots<t_{l}, the anonymized walk is defined as:

Wanon={(A(w1),t1),(A(w2),t2),,(A(wl),tl)},W_{\text{anon}}=\{(A(w_{1}),t_{1}),(A(w_{2}),t_{2}),\dots,(A(w_{l}),t_{l})\}, (62)

where A(wi)A(w_{i}) is the direction- and community-aware anonymized representation of wiw_{i}.

Discussion.

This anonymization process achieves the following:

  • Generalization: By removing raw node identities and replacing them with position-based representations, the process ensures generalization across different nodes and graph structures.

  • Directionality: Separate representations for 𝒮u\mathcal{S}_{u} and 𝒮v\mathcal{S}_{v} preserve the source and target node roles, capturing directional interactions.

  • Community Awareness: Incorporating community labels CuC_{u} and CvC_{v} enhances context-awareness, allowing the model to leverage mesoscopic structural information.

This integration of positional, directional, and community-based information establishes a strong foundation for robust representation learning in CTWalks. The anonymization process is detailed in Algorithm 3.

Algorithm 3 Node Anonymization in CTWalks for a Given Interaction

Input: Temporal interaction ((u,v),ti)((u,v),t_{i}), walk sets 𝒮u\mathcal{S}_{u} and 𝒮v\mathcal{S}_{v}, community labels CuC_{u} and CvC_{v}.
Output: Anonymized representations {A(w;𝒮u,𝒮v,Cu,Cv)wVinteraction}\{A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v})\mid w\in V_{\text{interaction}}\}, where VinteractionV_{\text{interaction}} is the set of unique nodes appearing in 𝒮u\mathcal{S}_{u} and 𝒮v\mathcal{S}_{v}.

1:  Initialize Vinteractionunique nodes in 𝒮u𝒮vV_{\text{interaction}}\leftarrow\text{unique nodes in }\mathcal{S}_{u}\cup\mathcal{S}_{v}.
2:  Initialize AnonymizedList\text{AnonymizedList}\leftarrow\emptyset.
3:  for each node wVinteractionw\in V_{\text{interaction}} do
4:     Initialize A(w;𝒮u)𝟎A(w;\mathcal{S}_{u})\leftarrow\mathbf{0} (zero vector of length equal to the walk length).
5:     for each walk W𝒮uW\in\mathcal{S}_{u} do
6:        Compute position vector A(w;W)=[𝕀(w=W[1]),,𝕀(w=W[l])]A(w;W)=[\mathbb{I}(w=W[1]),\dots,\mathbb{I}(w=W[l])].
7:        Aggregate position vectors: A(w;𝒮u)A(w;𝒮u)+A(w;W)A(w;\mathcal{S}_{u})\leftarrow A(w;\mathcal{S}_{u})+A(w;W).
8:     end for
9:     Append community label: A(w;𝒮u,Cu)[A(w;𝒮u)||Cu]A(w;\mathcal{S}_{u},C_{u})\leftarrow[A(w;\mathcal{S}_{u})||C_{u}].
10:     Initialize A(w;𝒮v)𝟎A(w;\mathcal{S}_{v})\leftarrow\mathbf{0} (zero vector of length equal to the walk length).
11:     for each walk W𝒮vW\in\mathcal{S}_{v} do
12:        Compute position vector A(w;W)=[𝕀(w=W[1]),,𝕀(w=W[l])]A(w;W)=[\mathbb{I}(w=W[1]),\dots,\mathbb{I}(w=W[l])].
13:        Aggregate position vectors: A(w;𝒮v)A(w;𝒮v)+A(w;W)A(w;\mathcal{S}_{v})\leftarrow A(w;\mathcal{S}_{v})+A(w;W).
14:     end for
15:     Append community label: A(w;𝒮v,Cv)[A(w;𝒮v)||Cv]A(w;\mathcal{S}_{v},C_{v})\leftarrow[A(w;\mathcal{S}_{v})||C_{v}].
16:     Concatenate direction-aware representations:
A(w;𝒮u,𝒮v,Cu,Cv)[A(w;𝒮u,Cu)||A(w;𝒮v,Cv)].A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v})\leftarrow[A(w;\mathcal{S}_{u},C_{u})||A(w;\mathcal{S}_{v},C_{v})]. (63)
17:     Add A(w;𝒮u,𝒮v,Cu,Cv)A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v}) to AnonymizedList.
18:  end for
19:  Return AnonymizedList.
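
A Python transcription of Algorithm 3, kept deliberately close to the pseudocode, is sketched below; encoding each community label as a single appended scalar is an illustrative choice.

```python
# A direct transcription of Algorithm 3 (node anonymization for one interaction).
import numpy as np

def anonymize_interaction(S_u, S_v, C_u, C_v):
    """S_u, S_v: walk sets rooted at u and v (each walk is a length-l list of node ids)."""
    l = len(S_u[0])
    V_interaction = {w for W in S_u + S_v for w in W}   # unique nodes in both sets

    def position_vector(w, walks):
        # Aggregate positional occurrences of w across all walks (Eqs. 57-58)
        vec = np.zeros(l)
        for W in walks:
            vec += np.array([1.0 if w == W[i] else 0.0 for i in range(l)])
        return vec

    anonymized = {}
    for w in V_interaction:
        a_u = np.append(position_vector(w, S_u), C_u)   # [A(w; S_u) || C_u]
        a_v = np.append(position_vector(w, S_v), C_v)   # [A(w; S_v) || C_v]
        anonymized[w] = np.concatenate([a_u, a_v])      # direction-aware concatenation
    return anonymized
```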

Appendix H Comparison and Advantages of Continuous-Time Approaches in CTWalks

Continuous-time models have gained significant traction in recent years, offering an alternative to traditional discrete-time approaches for modeling irregularly sampled time series and graph data. Below, we explore different continuous-time methodologies and position CTWalks within this landscape, demonstrating its unique contributions and advantages.

Figure 4: Comparison of temporal dynamics modeling approaches. Standard RNN maintains constant hidden states; RNN-Decay introduces exponential decay; Neural ODE models smooth continuous trajectories; ODE-RNN combines discrete updates with continuous dynamics.

Overview of Continuous-Time Approaches

Standard RNNs. Recurrent Neural Networks (RNNs) excel at modeling high-dimensional, regularly sampled time series such as text and speech. However, they struggle with irregular sampling, as they inherently assume fixed time intervals between observations. This limitation necessitates preprocessing steps such as imputation or aggregation, which can lead to loss of critical temporal information lipton2016critical ; che2018recurrent .

RNN-Decay. To address the irregularity issue, RNN-Decay introduces exponentially decaying hidden states between observations. While this approach maintains some temporal dynamics, it assumes a simplistic decay function, which limits its ability to capture complex temporal dependencies mei2017neural .

Neural ODEs. Neural Ordinary Differential Equations (ODEs) chen2018neural define hidden state dynamics as continuous functions governed by ODEs, offering a principled way to model time as a continuous variable. Neural ODEs eliminate the need for fixed time steps, making them well-suited for irregularly sampled data. However, their reliance on purely continuous trajectories can oversimplify scenarios involving abrupt changes or discrete transitions.

ODE-RNNs. ODE-RNNs rubanova2019latent extend Neural ODEs by combining continuous dynamics with discrete updates. Between observations, hidden states evolve via ODE solvers, while at observation points, updates are applied using neural networks. This hybrid approach captures both continuous and discrete aspects of temporal data, making it particularly effective for sparse time series.

Figure 4 illustrates how continuous-time approaches introduce enhancements for dynamic graphs. Standard RNNs fail to model irregular intervals effectively, while RNN-Decay offers limited improvements. Neural ODEs provide continuous trajectories but lack discrete adaptability. ODE-RNNs bridge this gap, and CTWalks extends this framework further by embedding community context into temporal dynamics. Building on the strengths of ODE-RNNs, CTWalks incorporates continuous-time dynamics into graph representation learning. Unlike prior methods, CTWalks integrates a community-aware framework with ODE-driven sampling, offering the following advantages:

  • Preservation of Temporal Dynamics: By employing ODE solvers to model temporal transitions, CTWalks accurately captures both intra- and inter-community dynamics without requiring imputation or aggregation. This aligns with the principles of Neural ODEs but extends their applicability to dynamic graphs.

  • Hybrid Continuous-Discrete Framework: Similar to ODE-RNNs, CTWalks combines continuous-time evolution with discrete updates. This enables the model to adapt to abrupt changes in graph structure, such as the formation or dissolution of communities.

  • Community-Aware Sampling: The integration of community-aware mechanisms ensures that temporal walks respect community boundaries while capturing cross-community interactions. This is a key distinction from traditional ODE-based methods, which lack structural awareness.

By embedding continuous-time dynamics into a community-aware framework, CTWalks bridges temporal modeling with structural awareness, enabling more sophisticated applications in dynamic graph representation learning.