
ASD Classification on Dynamic Brain Connectome using Temporal Random Walk with Transformer-based Dynamic Network Embedding

Suchanuch Piriyasatit, Chaohao Yuan, and Ercan Engin Kuruoglu, Senior Member, IEEE

Suchanuch Piriyasatit, Chaohao Yuan, and Ercan Engin Kuruoglu (corresponding author) are with the Institute of Data and Information, Tsinghua Shenzhen International Graduate School, Shenzhen, Guangdong, China (z/-yt22@mails.tsinghua.edu.cn; yuanch22@mails.tsinghua.edu.cn; kuruoglu@sz.tsinghua.edu.cn). This work is supported by the Shenzhen Science and Technology Innovation Commission under Grant JCYJ20220530143002005, the Tsinghua University SIGS Start-up Fund under Grant QD2022024C, and the Shenzhen Ubiquitous Data Enabling Key Lab under Grant ZDSYS20220527171406015.
Abstract

Autism Spectrum Disorder (ASD) is a complex neurological condition characterized by varied developmental impairments, especially in communication and social interaction. Accurate and early diagnosis of ASD is crucial for effective intervention and benefits from richer representations of brain activity. The brain functional connectome, the set of statistical relationships between different brain regions measured through neuroimaging, provides crucial insights into brain function. Traditional static methods often fail to capture the dynamic nature of brain activity; in contrast, dynamic brain connectome analysis provides a more comprehensive view by capturing temporal variations in the brain. We propose BrainTWT, a novel dynamic network embedding approach that captures the temporal evolution of brain connectivity over time and also models the dynamics between different temporal network snapshots. BrainTWT employs temporal random walks to capture dynamics across temporal network snapshots and leverages the Transformer's ability to model long-term dependencies in sequential data, learning discriminative embeddings from these temporal sequences through temporal structure prediction tasks. Experimental evaluation on the Autism Brain Imaging Data Exchange (ABIDE) dataset demonstrates that BrainTWT outperforms baseline methods in ASD classification.

Index Terms:
dynamic graph embedding, temporal random walk, Transformer networks, brain functional connectivity, graph classification

I Introduction

Autism Spectrum Disorder (ASD) is a developmental disorder characterized by difficulties in social interaction, repetitive behaviors, and restricted interests, with symptoms varying widely across the spectrum. The classification of ASD based on brain characteristics has advanced significantly with the evolution of neuroimaging technologies and analytical techniques. Historically, the diagnosis of ASD relied heavily on behavioral assessments and clinical observations [31, 30]. However, with the introduction of imaging techniques like MRI and fMRI, researchers have been able to study brain structure and function in more detail [28, 43, 15].

Brain functional connectivity, or the functional connectome, refers to the statistical relationships between different regions of the brain, measured by the correlation of neural signals. This connectivity can be observed through neuroimaging techniques like fMRI, which detect blood flow changes associated with neural activity, indicating how different parts of the brain communicate during tasks or at rest. Functional connectivity provides insights into both normal brain function and the impacts of neurological disorders on the brain. The brain's functional states are not static but evolve over time [26]; studying dynamic functional connectivity is therefore necessary to understand the complex brain activity related to disorders like ASD. Studies [18, 29, 12] have highlighted how dynamic functional connectivity provides deeper insights into the temporal variability of brain networks, revealing patterns that static connectivity measures might miss.

Given a dynamic functional connectivity representation of the brain, detecting ASD can be approached as a dynamic network classification task, which allows the use of network embedding techniques. Network embedding techniques have become essential for capturing the complexities of network structures by transforming a network into a low-dimensional space for downstream tasks. A node-level embedding maps each node in a network to a vector representation for node-level tasks such as node classification, link prediction, and node clustering, while a graph-level (or network-level) embedding method transforms the whole network into a single vector representation that summarizes the entire network structure for graph-level tasks such as graph classification, graph similarity ranking, and anomaly detection.

Static network embedding methods often fail to capture the dynamic nature of the brain connectome. On the other hand, the majority of graph-level dynamic network embedding methods model the dynamic network in a snapshot-based manner, where an embedding for each temporal network snapshot is learned independently, without considering interactions or dynamics between snapshots. To address this, we propose BrainTWT, a novel graph-level dynamic network embedding method that, instead of treating each time snapshot of the dynamic brain network independently, captures the dynamics across different time snapshots and models attention over temporal node interactions through temporal structure prediction tasks. This is achieved by leveraging temporal random walks to capture the inter-snapshot dynamics of the dynamic brain network and utilizing a Transformer-based model to capture dependencies and interactions across network states over time. Experimental results on the Autism Brain Imaging Data Exchange (ABIDE) benchmark dataset [13] show that BrainTWT achieves better performance than the baselines for ASD classification.

II Related Work

Analysis of fMRI signals and functional connectivity data for brain disorder classification, including ASD, has advanced with the development of machine learning techniques. Initial studies focused on classical models such as Support Vector Machines (SVM) and Random Forests to analyze correlations between different brain regions evident in functional MRI (fMRI) data [1, 50]. Given the high-dimensional nature of fMRI data, deep learning frameworks, including Convolutional Neural Networks and Recurrent Neural Networks, were later employed to explore both spatial and temporal aspects of brain functional connectivity [22, 51, 3].

Another branch of approaches applied to functional connectivity uses network embedding methods. Network embedding methods can be divided into node-level and network-level (or graph-level) embeddings, each of which can be designed for static or dynamic networks:
Node-level Embedding: Node-level embedding methods learn a vector representation of each node in a network for node-level tasks such as node classification and link prediction. Various methods have been proposed for static networks. Matrix factorization-based methods such as Locally Linear Embedding (LLE) [46], Laplacian eigenmaps [6], SVD [16], HOPE [41], and GraRep [11] decompose the Laplacian or network adjacency matrix to create embeddings that preserve node structures and high-order proximities. Random walk-based methods such as DeepWalk [44] and node2vec [17] use random walks to sample the context of nodes and apply Skip-gram to learn the embeddings. Moreover, graph neural networks (GNNs), including graph convolutional networks (GCNs) [24], graph attention networks (GATs) [54], and graph auto-encoders (GAEs) [23], leverage message-passing mechanisms to generate network embeddings. To learn embeddings for dynamic networks, various methods build upon these static methods to effectively learn node embeddings in time-varying networks; CTDNE [35], for example, extends DeepWalk by using temporal random walks with Skip-gram. Dynamic graph neural networks such as EvolveGCN [42] and TGAT [55] have also been proposed.
Network-level or Graph-level Embedding: Graph-level embedding methods learn a vector representation of the entire network for graph-level tasks such as graph classification. Various approaches aggregate node-level embeddings, by max-, mean-, or sum-pooling, into one vector representing the whole network [27], while other methods directly learn an embedding for the whole network. Early graph kernel methods such as message passing kernels [47, 34], shortest path kernels [7, 37], and subgraph kernels [48, 25] focus on capturing graph isomorphism and substructure frequencies. Skip-gram- and random walk-based methods [33, 2] use sequences of nodes with Skip-gram to learn embeddings. Deep learning methods include recurrent neural network-based (RNN) methods [63], convolutional neural network-based (CNN) methods [49, 38], and GNN-based methods [32, 39]. Recently, Transformer-based methods [61] have gained increasing popularity; more in-depth reviews and analyses can be found in [58]. For dynamic networks, on the other hand, graph-level embedding methods that do not resort to pooling node embeddings have not been as widely explored. tdGraphEmbed [5] is the first graph-level method based on random walks with Skip-gram to learn an embedding of each time snapshot of a dynamic network. GraphERT [4] uses random walk sequences sampled independently from each temporal network snapshot and utilizes a Transformer-based model to learn the embeddings. However, both tdGraphEmbed and GraphERT sample random walks independently from each timestep, with the goal of obtaining a representation of each temporal network snapshot, and their downstream tasks compare different time snapshots within a single dynamic network (e.g., similarity ranking, anomaly detection). Although our goal differs, summarizing the entire dynamic network while considering inter-snapshot relationships, we are motivated by GraphERT to use Transformers to model these inter-snapshot relationships, so that the learned embedding carries information about the entire dynamic brain connectivity across time steps.
Applications of network embedding for brain networks and ASD classification: [59] explores various static network embedding methods applied to fMRI data for classifying ASD from static functional connectivity; the methods explored are random walk-based, including node2vec [17], struc2vec [45], and AWE [19]. t-BNE [9] proposes a tensor factorization-based method employing partially symmetric tensor factorization with side information guidance to capture meaningful patterns associated with brain disorders. Various methods leverage graph neural networks to learn brain functional connectivity; Hi-GCN [20] learns brain network embeddings using a hierarchical GCN to capture topological patterns in individual brain networks while considering relations with broader population-level characteristics. Other approaches consider dynamic brain networks: [14] explores age and sex differences in resting-state fMRI using a spatio-temporal graph convolutional network (ST-GCN) [56, 60]; the Sequential Monte Carlo GCN (SMC-GCN) [62] adopts particle filtering and allows for a statistical interpretation of dynamic functional connectivity; and [40] proposes a graph autoencoder-based embedding to learn dynamic brain networks for ASD classification.

III Problem Statement

Given a set of dynamic brain connectomes $\mathcal{G}=\{G_1,G_2,\ldots,G_M\}$, where $G_i=(V_i,E_i)$ represents the dynamic brain connectome of subject $i$, $V_i$ is the set of nodes shared across time steps, and $E_i\subseteq V_i\times V_i\times T_i$ is the set of all temporal edges in $G_i$, our goal is to find a mapping function $f:G\rightarrow\mathbb{R}^d$ that transforms each dynamic connectome $G_i$ into a $d$-dimensional vector space. The resulting embeddings are used as inputs to train a logistic classifier for ASD classification.

IV Methodology

Figure 1: The dynamic brain network is obtained from Pearson's correlation between the BOLD time series of each pair of regions of interest (ROIs) in the brain. The dynamic brain network is then converted into a series of temporal random walk sequences, capturing the temporal evolution of brain connectivity. Each temporal sequence is tokenized and embedded. The embedded sequences are processed through a Transformer model whose self-attention mechanism weights node interactions by their temporal and contextual significance. Parts of each sequence are masked, and the model predicts these masked nodes, refining the embeddings using contextual information. The model uses joint learning to optimize both the temporal dynamics and graph-level losses. The final embeddings are used as input features for the ASD classification task.

IV-A Sampling Temporal Sequences with Temporal Random Walk

We first convert a dynamic brain network $G_i$ of subject $i$ into a set of temporal random walk sequences $\{\mathcal{W}_1,\mathcal{W}_2,\ldots,\mathcal{W}_n\}$, where $n$ is the number of random walk samples per network. Note that we drop the network index in the random walk sequence notation for simplicity.

A temporal random walk modifies the traditional random walk by incorporating time into the walk, ensuring that transitions follow the chronological sequence of events in the dynamic network. Formally, a temporal walk $\mathcal{W}_k$ on a dynamic network $G_i$ is a sequence of nodes $\{v_1,v_2,\ldots,v_l\}$ of length $l$, where the edge $e_i$ connecting $v_i$ and $v_{i+1}$, for $1\leq i\leq l-1$, satisfies $\mathcal{T}(e_i)\leq\mathcal{T}(e_{i+1})$ for $1\leq i<l-1$, with $\mathcal{T}(e_i),\mathcal{T}(e_{i+1})\in\{1,\ldots,T\}$ denoting the timesteps of edges $e_i$ and $e_{i+1}$, respectively. This incorporates time information into the walk, ensuring that the walk moves forward in time and forms an ordered sequence of events in the network.

Consider a random walker positioned at node $v_j$ at time $t$. The set of temporal edges connected to node $v_j$ at time $t$ is defined as:

$\mathcal{N}_T(v_j,t)=\{e_{jk}^{t'}=(v_j,v_k,t')\in E_i \text{ where } t'\geq t\}.$

The probability of transitioning along an edge within this neighborhood is given by:

$\mathcal{P}_T(e_{jk}^{t'}\mid v_j,t)=\frac{\exp[t-t']}{\sum_{e\in\mathcal{N}_T(v_j,t)}\exp[t-\mathcal{T}(e)]}. \quad (1)$

This choice of the exponential function prioritizes edges that are closer in time to the current position, thereby reducing the likelihood of the walker making temporally distant jumps and preserving the continuity of time in the walk.
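To make the sampling concrete, below is a minimal Python sketch of a temporal random walk sampler implementing Eq. (1); the (u, v, t) edge format and all names are illustrative rather than the released implementation, and the default walk length follows the $l=20$ setting of Section V-B.

```python
# Sketch of temporal random walk sampling (Eq. 1); assumes temporal edges are
# given as (u, v, t) triples. Function and variable names are illustrative.
import math
import random
from collections import defaultdict

def build_adjacency(temporal_edges):
    """Map each node to its incident temporal edges as (neighbor, timestep)."""
    adj = defaultdict(list)
    for u, v, t in temporal_edges:
        adj[u].append((v, t))
        adj[v].append((u, t))  # undirected brain connectome
    return adj

def temporal_random_walk(adj, start_node, start_time, walk_len=20):
    """Walk forward in time, favoring temporally close edges (Eq. 1)."""
    walk, cur_node, cur_time = [start_node], start_node, start_time
    for _ in range(walk_len - 1):
        # candidate edges occurring at or after the current time
        candidates = [(v, t) for v, t in adj[cur_node] if t >= cur_time]
        if not candidates:
            break
        # unnormalized weights exp(t - t') decay with temporal distance
        weights = [math.exp(cur_time - t) for _, t in candidates]
        cur_node, cur_time = random.choices(candidates, weights=weights, k=1)[0]
        walk.append(cur_node)
    return walk
```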

IV-B Learn Temporal Dynamics with the Transformer Model

To learn temporal dynamics from the temporal random walk sequences, we mask a percentage of nodes in each sequence and then predict those masked nodes, using the Transformer to model the token embeddings. We follow the masked sequence prediction approach established in natural language processing [21].
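As an illustration of this masking step, below is a minimal sketch that prepares a walk for masked prediction, prepending the $<CLS>$ token described next; the special token IDs and offset scheme are assumptions of this sketch, while the 15% mask ratio follows Section V-B.

```python
# A minimal sketch of masked-sequence preparation (token IDs are illustrative).
import torch

CLS_ID, MASK_ID = 0, 1      # assumed special-token IDs
NODE_OFFSET = 2             # node v maps to token v + NODE_OFFSET

def prepare_masked_sequence(walk, mask_ratio=0.15):
    tokens = torch.tensor([CLS_ID] + [v + NODE_OFFSET for v in walk])
    labels = torch.full_like(tokens, -100)        # -100: ignored by the loss
    # choose positions to mask, never position 0 (the <CLS> token)
    n_mask = max(1, int(mask_ratio * (len(tokens) - 1)))
    pos = torch.randperm(len(tokens) - 1)[:n_mask] + 1
    labels[pos] = tokens[pos]                     # targets at masked positions
    tokens[pos] = MASK_ID
    return tokens, labels
```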

Before going through the self-attention layers, an input temporal walk sequence $\mathcal{W}_k=\{v_{k,1},v_{k,2},\ldots,v_{k,l}\}\in\mathbb{R}^l$ is tokenized and initialized as an input embedding $X_k=(<CLS>,\mathbf{v}_{k,1},\mathbf{v}_{k,2},\ldots,\mathbf{v}_{k,l})\in\mathbb{R}^{n\times d}$, where $n=l+1$ is the sequence length with the added special token $<CLS>$ marking the start of the sequence. What is important about the $<CLS>$ token is that its final hidden state, after passing through all the attention layers, can be used as the aggregate embedding of the temporal sequence [21]. Next, we describe the rest of the Transformer model [53]:

IV-B1 Self-Attention Mechanism

The self-attention mechanism dynamically determines the relevance, or "attention," that each token in the sequence should pay to every other token when constructing its own representation. Given an input sequence embedding $X_k=(<CLS>,\mathbf{v}_{k,1},\mathbf{v}_{k,2},\ldots,\mathbf{v}_{k,l})\in\mathbb{R}^{n\times d}$, the attention scores are computed using three learnable weight matrices $W^Q$, $W^K$, $W^V\in\mathbb{R}^{d\times d_k}$ as

$Q=XW^Q,\quad K=XW^K,\quad V=XW^V. \quad (2)$

The attention function then allows each node to dynamically attend to every node in the walk sequence, weighted by their computed attention scores,

$\text{Attention}(Q,K,V)=\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V. \quad (3)$

This updates each node embedding to reflect not only its current connections but also other connections along the temporal walk over time. Moreover, the Transformer can use multiple sets of these learnable weight matrices, i.e., $W_i^Q$, $W_i^K$, $W_i^V\in\mathbb{R}^{d\times d_k}$, and concatenate their attention outputs, an arrangement called multi-head attention, to capture multiple patterns in the data:

$\text{MultiHead}(Q,K,V)=\text{Concat}(\text{head}_1,\ldots,\text{head}_h)W^O, \quad (4)$

where $W^O\in\mathbb{R}^{hd_k\times d}$, $h$ is the number of attention heads, and

$\text{head}_i=\text{Attention}(QW_i^Q,KW_i^K,VW_i^V). \quad (5)$
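The computation in Eqs. (2)-(5) can be sketched as a standard multi-head self-attention module in PyTorch; this is a generic sketch rather than the paper's exact implementation.

```python
# Sketch of scaled dot-product and multi-head self-attention (Eqs. 2-5).
import math
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.d_k = d_model // n_heads
        self.n_heads = n_heads
        self.W_q = nn.Linear(d_model, d_model, bias=False)   # Eq. 2
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        self.W_o = nn.Linear(d_model, d_model, bias=False)   # Eq. 4

    def forward(self, x):                        # x: (batch, seq, d_model)
        B, N, _ = x.shape
        # project and split into heads: (batch, heads, seq, d_k)
        q = self.W_q(x).view(B, N, self.n_heads, self.d_k).transpose(1, 2)
        k = self.W_k(x).view(B, N, self.n_heads, self.d_k).transpose(1, 2)
        v = self.W_v(x).view(B, N, self.n_heads, self.d_k).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)   # Eq. 3
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, -1)       # concat heads
        return self.W_o(out)
```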

IV-B2 Position-wise Feed-Forward Networks

Following the attention mechanism, each position’s output is independently fed through a position-wise feed-forward network, which is essentially two linear transformations with a ReLU activation in between:

$\text{FFN}(x)=\max(0,xW_1+b_1)W_2+b_2, \quad (6)$

where $W_1\in\mathbb{R}^{d\times d_{ff}}$, $b_1\in\mathbb{R}^{d_{ff}}$, $W_2\in\mathbb{R}^{d_{ff}\times d}$, $b_2\in\mathbb{R}^d$.
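A corresponding sketch of the position-wise feed-forward network in Eq. (6), applied independently to each token position:

```python
# Position-wise feed-forward network (Eq. 6).
import torch.nn as nn

class PositionwiseFFN(nn.Module):
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),   # x W1 + b1
            nn.ReLU(),                  # max(0, .)
            nn.Linear(d_ff, d_model),   # . W2 + b2
        )

    def forward(self, x):               # x: (batch, seq, d_model)
        return self.net(x)
```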

IV-B3 Position Encodings

To account for the sequence order, positional encodings are added to the input embeddings. We use the positional encoding proposed in the original Transformer paper [53]:

$\text{PE}(pos,2i)=\sin\left(\frac{pos}{10000^{2i/d}}\right) \quad (7)$
$\text{PE}(pos,2i+1)=\cos\left(\frac{pos}{10000^{2i/d}}\right), \quad (8)$

where $pos$ is the token position and $i$ indexes the embedding dimension.
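The encodings in Eqs. (7)-(8) can be precomputed once for all positions, as in the sketch below (assuming an even embedding dimension, which holds for the $d=252$ setting of Section V-B):

```python
# Sinusoidal positional encoding (Eqs. 7-8) precomputed up to a maximum length.
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    pos = torch.arange(max_len).unsqueeze(1).float()      # (max_len, 1)
    j = torch.arange(0, d_model, 2).float()               # even dimensions 2i
    div = torch.pow(10000.0, j / d_model)                 # 10000^(2i/d)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos / div)                    # Eq. 7
    pe[:, 1::2] = torch.cos(pos / div)                    # Eq. 8
    return pe   # added to the input embeddings before the first layer
```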

TABLE I: 10-fold Cross-Validation Performance Comparison

Method | Accuracy | Sensitivity | Specificity | AUC
Random Forest [8] | 0.6038 ± 0.0572 | 0.3926 ± 0.0751 | 0.7860 ± 0.0547 | 0.6660 ± 0.0512
GCN [24] | 0.6246 ± 0.0593 | 0.5753 ± 0.1463 | 0.6659 ± 0.1498 | 0.6445 ± 0.0589
GAT [54] | 0.6302 ± 0.0433 | 0.5527 ± 0.1285 | 0.6960 ± 0.1217 | 0.6551 ± 0.0651
tdGraphEmbed [5] | 0.6325 ± 0.0432 | 0.5656 ± 0.0534 | 0.6902 ± 0.0509 | 0.6671 ± 0.0437
GraphERT [4] | 0.6338 ± 0.0694 | 0.4648 ± 0.0937 | 0.7794 ± 0.0945 | 0.6949 ± 0.0712
GSA-LSTM [10] | 0.6840 | 0.6440 | 0.6980 | 0.7050
BrainTWT (without temporal dynamics) | 0.5074 ± 0.0461 | 0.3453 ± 0.0821 | 0.6475 ± 0.0756 | 0.5093 ± 0.0664
BrainTWT (with temporal dynamics) | 0.7003 ± 0.0494 | 0.6504 ± 0.0733 | 0.7435 ± 0.0969 | 0.7527 ± 0.0445

IV-C Temporal Dynamics and Graph-level Loss Joint Learning

A joint loss similar to that of [4] is adopted. The main difference is our use of temporal random walks, which makes each sequence include nodes from different timesteps of the network in temporal order. Therefore, instead of learning network structure through random walk paths within each time snapshot independently, we learn the temporal dynamics of the entire network, considering its inter-snapshot relationships, and ultimately obtain a single embedding that represents the whole dynamic network.
The Temporal Dynamics Loss: A one-layer MLP classifier $C_{temporal}:\mathbb{R}^d\rightarrow\mathbb{R}^{|V|}$ with learnable weight $W_{TD}\in\mathbb{R}^{d\times|V|}$ is constructed to predict the masked tokens in the temporal walk sequences. Given a masked token's embedding output $E_i$ from the Transformer, the classifier predicts the actual node that corresponds to the token. The temporal dynamics loss is the cross-entropy loss of this classifier:

$\mathcal{L}_{TD}=-\frac{1}{n}\sum_{i=1}^{n}\sum_{(j,k)}\log\left((E_{i,j}\cdot W_{TD})_k\right), \quad (9)$

where $n$ is the number of random walk samples, $E_{i,j}$ is the Transformer's embedding output for the $j^{\text{th}}$ token in walk sequence $i$, and $k$ is the index of the actual node at the masked position.
The Graph-Level Loss: Intuitively, the learned embedding of each walk sequence should capture dynamics specific to its source temporal graph. Given the embedding of a random walk sequence, we want to predict the dynamic brain network from which it was sampled. Accordingly, another one-layer MLP classifier $C_{graph}:\mathbb{R}^d\rightarrow\mathbb{R}^{|\mathcal{G}|}$ with learnable weight $W_{GS}\in\mathbb{R}^{d\times|\mathcal{G}|}$ is constructed to predict the probability distribution over the dynamic brain networks from which a sequence may have been sampled.

The $<CLS>$ token embedding output $E_{CLS_i}$ from the Transformer for each random walk sequence is used as the aggregated representation of that walk sequence, since it has aggregated contextual information from the entire sequence through the self-attention mechanism. The graph-level loss is

$\mathcal{L}_{GS}=-\frac{1}{n}\sum_{i=1}^{n}\log\left((E_{CLS_i}\cdot W_{GS})_k\right), \quad (10)$

where $k$ is the index of the true dynamic network.
The Joint Loss is then as follows:

$\mathcal{L}_{\text{total}}=\lambda_1\cdot\mathcal{L}_{TD}+\lambda_2\cdot\mathcal{L}_{GS}, \quad (11)$

where $\lambda_1$ and $\lambda_2$ trade off the temporal dynamics and graph-level losses. This joint loss enables the model to learn temporal dynamics within each dynamic network while ensuring that the learned embeddings are discriminative and uniquely represent each network.
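A sketch of how the joint loss of Eqs. (9)-(11) could be computed for a batch of walk sequences; the tensor shapes, the ignore-index convention for unmasked positions, and the use of standard softmax cross-entropy are assumptions of this sketch.

```python
# Sketch of the joint loss (Eqs. 9-11). `hidden` is the Transformer output for
# a batch of walk sequences; W_TD / W_GS are the one-layer classifier weights.
import torch
import torch.nn.functional as F

def joint_loss(hidden, mask_labels, graph_ids, W_TD, W_GS, lam1=1.0, lam2=5.0):
    # hidden: (batch, seq_len, d); position 0 holds the <CLS> token
    # temporal dynamics loss: predict masked nodes (-100 labels are ignored)
    node_logits = hidden @ W_TD                         # (batch, seq_len, |V|)
    loss_td = F.cross_entropy(node_logits.flatten(0, 1),
                              mask_labels.flatten(), ignore_index=-100)
    # graph-level loss: predict the source dynamic network from <CLS>
    graph_logits = hidden[:, 0, :] @ W_GS               # (batch, |G|)
    loss_gs = F.cross_entropy(graph_logits, graph_ids)
    return lam1 * loss_td + lam2 * loss_gs              # Eq. 11
```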

After training, $W_{GS}(G_i)$ is the resulting dynamic graph embedding of the dynamic graph $G_i$. These embeddings, together with the corresponding ASD labels, are then used to train a logistic classifier.
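For illustration, a hypothetical sketch of this final step with scikit-learn, where W_GS is the learned graph-level classifier weight from above and asd_labels is an assumed label array:

```python
# Sketch: read per-network embeddings out of W_GS and fit the classifier.
from sklearn.linear_model import LogisticRegression

# W_GS: (d, |G|) torch tensor learned above; asd_labels: (|G|,) array of 0/1
embeddings = W_GS.detach().cpu().numpy().T   # one d-dim vector per subject
clf = LogisticRegression(max_iter=1000).fit(embeddings, asd_labels)
```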

V Experiment and Results

Our code is publicly available at https://github.com/suchanuchp/BrainTWT.

V-A Data Preparation

The resting-state fMRI data is from the Autism Brain Imaging Data Exchange (ABIDE) [13]. Subjects are filtered by experts to include only high-quality data, resulting in a total of 871 subjects: 468 healthy controls and 403 with ASD [1]. The data is preprocessed using a standardized pipeline [36], including slice timing correction, motion correction, and motion scrubbing. The BOLD time series for each region of interest (ROI) is extracted using the AAL atlas [52], which partitions the brain into 116 ROIs. To obtain a dynamic brain network for each subject, Pearson's correlation between the BOLD time series of each pair of ROIs is calculated, and a thresholding method keeps only the connections whose correlation exceeds the 80th percentile. This is done on a sliding window of length 50 and stride 5.
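The construction can be sketched as follows; the window, stride, and percentile follow the values above, while the function name and the (u, v, t) edge-list output format are illustrative.

```python
# Sketch of dynamic network construction from ROI time series: sliding-window
# Pearson correlation with an 80th-percentile threshold (window 50, stride 5).
import numpy as np

def dynamic_brain_network(bold, window=50, stride=5, pct=80):
    """bold: (T, n_rois) BOLD time series. Returns a list of (u, v, t) edges."""
    T, n_rois = bold.shape
    edges = []
    for t, start in enumerate(range(0, T - window + 1, stride)):
        corr = np.corrcoef(bold[start:start + window].T)   # (n_rois, n_rois)
        iu = np.triu_indices(n_rois, k=1)                  # upper triangle
        thresh = np.percentile(corr[iu], pct)              # keep top 20%
        for u, v in zip(*iu):
            if corr[u, v] > thresh:
                edges.append((u, v, t))
    return edges
```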

V-B Experimental Settings

We set the maximum random walk length $l=20$ and sample 30 walks per node. We mask 15% of the nodes for the temporal masked sequence prediction task. For the Transformer, we set the embedding dimension $d=252$, the number of attention heads $h=4$, and 6 hidden layers. To demonstrate the importance of temporal dynamics modeling, an ablation test compares BrainTWT without the temporal dynamics loss ($\lambda_1=0$) against BrainTWT with the temporal dynamics loss ($\lambda_1=1$, $\lambda_2=5$).

V-C Evaluation and Baseline Methods

We use stratified K-fold cross-validation with $K=10$ for evaluation. Stratified K-fold cross-validation divides the dataset into $K$ folds while maintaining an equal proportion of each class label in every fold, preserving the original distribution. Each fold serves once as a validation set while the model trains on the remaining folds. Mean scores from the cross-validation are reported along with standard deviations. The results are compared with six baseline models: Random Forest [8], GCN [24], GAT [54], tdGraphEmbed [5], GraphERT [4], and GSA-LSTM [10]. We use mean pooling on the node embeddings from GCN and GAT to obtain a network embedding. For tdGraphEmbed and GraphERT, we use mean pooling on the temporal network snapshot embeddings to obtain an aggregated embedding of the entire dynamic network. Additionally, we conduct a leave-one-site-out cross-validation on our model for reference, in which all subjects from one site are used for evaluation and all subjects from the other sites are used for training.
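A sketch of this evaluation protocol with scikit-learn, shown for the AUC metric only; the embeddings and labels arrays are assumed to come from the trained model (e.g., as extracted in Section IV-C).

```python
# Sketch of the stratified 10-fold evaluation protocol (AUC only).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = []
for train_idx, test_idx in skf.split(embeddings, labels):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embeddings[train_idx], labels[train_idx])
    scores = clf.predict_proba(embeddings[test_idx])[:, 1]
    aucs.append(roc_auc_score(labels[test_idx], scores))
print(f"AUC: {np.mean(aucs):.4f} ± {np.std(aucs):.4f}")
```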

V-D Results and Discussions

The results of 10-fold stratified cross-validation are shown in Table I. BrainTWT, when incorporating temporal dynamics learning, achieves relative increases of 2.38%, 0.99%, 6.52%, and 6.77% in accuracy, sensitivity, specificity, and AUC, respectively, compared to the second-best performing model, GSA-LSTM. In the ablation study, it also significantly outperforms its counterpart that excludes temporal dynamics learning, demonstrating the importance of the temporal dynamics loss for learning the network embedding. Furthermore, it outperforms the static baselines, Random Forest, GCN, and GAT, again showing the importance of temporal brain information in ASD classification. Lastly, it outperforms the snapshot-based dynamic methods, tdGraphEmbed and GraphERT, showing the importance of including inter-snapshot dynamics in the embeddings. Results of the leave-one-site-out validation for BrainTWT are reported in Table II; the highest accuracy is from the TRINITY site at 88.64% and the lowest is from STANFORD at 52.00%.

TABLE II: Leave-one-site-out results

Site | Subject Count | Acc. | Sen. | Spe. | AUC
CMU | 11 | 0.8181 | 0.8333 | 0.8000 | 0.8000
CALTECH | 15 | 0.8666 | 0.6000 | 1.0000 | 0.7200
KKI | 33 | 0.7576 | 0.7500 | 0.7619 | 0.7500
LEUVEN_1 | 28 | 0.6786 | 0.5714 | 0.7857 | 0.7806
LEUVEN_2 | 28 | 0.6786 | 0.5833 | 0.7500 | 0.7448
MAX_MUN | 46 | 0.5870 | 0.5263 | 0.6296 | 0.5867
NYU | 172 | 0.6453 | 0.8108 | 0.5204 | 0.7625
OHSU | 25 | 0.5200 | 0.4167 | 0.6154 | 0.6154
OLIN | 28 | 0.6429 | 0.7143 | 0.5714 | 0.7194
PITT | 50 | 0.6800 | 0.4583 | 0.8846 | 0.7180
SBL | 26 | 0.6538 | 0.5000 | 0.7857 | 0.6905
SDSU | 27 | 0.7407 | 0.5000 | 0.8421 | 0.8355
STANFORD | 25 | 0.5200 | 0.5833 | 0.4615 | 0.6154
TRINITY | 44 | 0.8864 | 0.9474 | 0.8400 | 0.9116
UCLA_1 | 64 | 0.6406 | 0.6486 | 0.6296 | 0.6947
UCLA_2 | 21 | 0.8571 | 1.0000 | 0.7000 | 0.8364
UM_1 | 86 | 0.5814 | 0.5882 | 0.5769 | 0.6572
UM_2 | 34 | 0.7647 | 0.5385 | 0.9048 | 0.7912
USM | 67 | 0.6418 | 0.5581 | 0.7917 | 0.7965
YALE | 41 | 0.7561 | 0.7273 | 0.7895 | 0.8062

VI Conclusion

In this study, we propose BrainTWT, a dynamic network embedding method for ASD classification that leverages temporal random walks and Transformer-based models to capture the dynamic evolution of brain connectivity over time. Our method outperforms traditional static and snapshot-based dynamic methods by effectively incorporating temporal dynamics and inter-snapshot relationships within dynamic brain functional connectomes. The evaluation on the ABIDE dataset confirms that BrainTWT significantly improves classification performance, demonstrating the importance of modeling temporal information in neurological disorder analysis. Future research can extend these techniques to other cognitive disorders and integrate multimodal neuroimaging data to enrich diagnostic potential. In the future, we will also exploit spectral graph information for dynamic network prediction [57].

References

  • [1] Alexandre Abraham, Michael P Milham, Adriana Di Martino, R Cameron Craddock, Dimitris Samaras, Bertrand Thirion, and Gael Varoquaux. Deriving reproducible biomarkers from multi-site resting-state data: An autism-based example. NeuroImage, 147:736–745, 2017.
  • [2] Bijaya Adhikari, Yao Zhang, Naren Ramakrishnan, and B Aditya Prakash. Sub2vec: Feature learning for subgraphs. In Advances in Knowledge Discovery and Data Mining: 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part II 22, pages 170–182. Springer, 2018.
  • [3] Shale Ahammed, Sijie Niu, Rishad Ahmed, Jiwen Dong, Xizhan Gao, and Yuehui Chen. Darkasdnet: classification of asd on functional mri using deep neural network. Frontiers in Neuroinformatics, 15:635657, 2021.
  • [4] Moran Beladev, Gilad Katz, Lior Rokach, Uriel Singer, and Kira Radinsky. Graphert–transformers-based temporal dynamic graph embedding. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pages 68–77, 2023.
  • [5] Moran Beladev, Lior Rokach, Gilad Katz, Ido Guy, and Kira Radinsky. tdgraphembed: Temporal dynamic graph-level embedding. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 55–64, 2020.
  • [6] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. Advances in Neural Information Processing Systems, 14, 2001.
  • [7] Karsten M Borgwardt and Hans-Peter Kriegel. Shortest-path kernels on graphs. In Fifth IEEE International Conference on Data Mining (ICDM’05), pages 8–pp. IEEE, 2005.
  • [8] Leo Breiman. Random forests. Machine Learning, 45:5–32, 2001.
  • [9] Bokai Cao, Lifang He, Xiaokai Wei, Mengqi Xing, Philip S Yu, Heide Klumpp, and Alex D Leow. t-bne: Tensor-based brain network embedding. In proceedings of the 2017 SIAM International Conference on Data Mining, pages 189–197. SIAM, 2017.
  • [10] Peng Cao, Guangqi Wen, Xiaoli Liu, Jinzhu Yang, and Osmar R Zaiane. Modeling the dynamic brain network representation for autism spectrum disorder diagnosis. Medical & Biological Engineering & Computing, 60(7):1897–1913, 2022.
  • [11] Shaosheng Cao, Wei Lu, and Qiongkai Xu. Grarep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pages 891–900. ACM, 2015.
  • [12] Yuzhe Chen, Dayu Qin, and Ercan Engin Kuruoglu. Topological and graph theoretical analysis of dynamic functional connectivity for autism spectrum disorder. In International Conference on Brain Informatics. Springer, 2024.
  • [13] Adriana Di Martino, Chao-Gan Yan, Qingyang Li, Erin Denio, Francisco X Castellanos, Kaat Alaerts, Jeffrey S Anderson, Michal Assaf, Susan Y Bookheimer, Mirella Dapretto, et al. The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Molecular psychiatry, 19(6):659–667, 2014.
  • [14] Soham Gadgil, Qingyu Zhao, Adolf Pfefferbaum, Edith V Sullivan, Ehsan Adeli, and Kilian M Pohl. Spatio-temporal graph convolution for resting-state fmri analysis. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part VII 23, pages 528–538. Springer, 2020.
  • [15] Daniele Giansanti. An umbrella review of the fusion of fMRI and ai in autism. Diagnostics, 13(23):3552, 2023.
  • [16] Gene H Golub and Christian Reinsch. Singular value decomposition and least squares solutions. In Handbook for Automatic Computation: Volume II: Linear Algebra, pages 134–151. Springer, 1971.
  • [17] Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855–864, 2016.
  • [18] Vatika Harlalka, Raju S Bapi, PK Vinod, and Dipanjan Roy. Atypical flexibility in dynamic functional connectivity quantifies the severity in autism spectrum disorder. Frontiers in Human Neuroscience, 13:6, 2019.
  • [19] Sergey Ivanov and Evgeny Burnaev. Anonymous walk embeddings. In International Conference on Machine Learning, pages 2186–2195. PMLR, 2018.
  • [20] Hao Jiang, Peng Cao, MingYi Xu, Jinzhu Yang, and Osmar Zaiane. Hi-gcn: A hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction. Computers in Biology and Medicine, 127:104096, 2020.
  • [21] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, volume 1. Minneapolis, Minnesota, 2019.
  • [22] Meenakshi Khosla, Keith Jamison, Amy Kuceyeski, and Mert R Sabuncu. 3d convolutional neural networks for classification of functional connectomes. In International Workshop on Deep Learning in Medical Image Analysis, pages 137–145. Springer, 2018.
  • [23] Thomas N. Kipf and Max Welling. Variational graph auto-encoders. In Neural Information Processing Systems Workshop on Bayesian Deep Learning, 2016.
  • [24] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations. International Conference on Learning Representations, 2017.
  • [25] Nils Kriege and Petra Mutzel. Subgraph matching kernels for attributed graphs. In Proceedings of the 29th International Conference on Machine Learning, pages 1313–1320. International Machine Learning Society, 2012.
  • [26] Kangjoo Lee, Jie Lisa Ji, Clara Fonteneau, Lucie Berkovitch, Masih Rahmati, Lining Pan, Grega Repovš, John H Krystal, John D Murray, and Alan Anticevic. Human brain state dynamics are highly reproducible and associated with neural and behavioral features. PLoS Biology, 22(9):e3002808, 2024.
  • [27] Zhi-Peng Li, Si-Guo Wang, Qin-Hu Zhang, Yi-Jie Pan, Nai-An Xiao, Jia-Yang Guo, Chang-An Yuan, Wen-Jian Liu, and De-Shuang Huang. Graph pooling for graph-level representation learning: a survey. Artificial Intelligence Review, 58(2):45, 2024.
  • [28] Meijie Liu, Baojuan Li, and Dewen Hu. Autism spectrum disorder studies using fmri data and machine learning: a review. Frontiers in Neuroscience, 15:697870, 2021.
  • [29] Lin Ma, Tengfei Yuan, Wei Li, Lining Guo, Dan Zhu, Zirui Wang, Zhixuan Liu, Kaizhong Xue, Yaoyi Wang, Jiawei Liu, et al. Dynamic functional connectivity alterations and their associated gene expression pattern in autism spectrum disorders. Frontiers in Neuroscience, 15:794151, 2022.
  • [30] Mark Mintz. Evolution in the understanding of autism spectrum disorder: historical perspective. The Indian Journal of Pediatrics, 84(1):44–52, 2017.
  • [31] Pat Mirenda. Autism spectrum disorder: Past, present, and future. Perspectives on Augmentative and Alternative Communication, 22(3):131–138, 2013.
  • [32] Christopher Morris, Martin Ritzert, Matthias Fey, William L Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. Weisfeiler and leman go neural: Higher-order graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4602–4609, 2019.
  • [33] Annamalai Narayanan, Mahinthan Chandramohan, Lihui Chen, Yang Liu, and Santhoshkumar Saminathan. subgraph2vec: Learning distributed representations of rooted sub-graphs from large graphs. arXiv preprint arXiv:1606.08928, 2016.
  • [34] Marion Neumann, Roman Garnett, Christian Bauckhage, and Kristian Kersting. Propagation kernels: efficient graph kernels from propagated information. Machine learning, 102:209–245, 2016.
  • [35] Giang H Nguyen, John Boaz Lee, Ryan A Rossi, Nesreen K Ahmed, Eunyee Koh, and Sungchul Kim. Dynamic network embeddings: From random walks to temporal random walks. In 2018 IEEE International Conference on Big Data (Big Data), pages 1085–1092. IEEE, 2018.
  • [36] Jared A Nielsen, Brandon A Zielinski, P Thomas Fletcher, Andrew L Alexander, Nicholas Lange, Erin D Bigler, Janet E Lainhart, and Jeffrey S Anderson. Multisite functional connectivity mri classification of autism: Abide results. Frontiers in human neuroscience, 7:599, 2013.
  • [37] Giannis Nikolentzos, Polykarpos Meladianos, François Rousseau, Yannis Stavrakas, and Michalis Vazirgiannis. Shortest-path graph kernels for document similarity. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1890–1900, 2017.
  • [38] Giannis Nikolentzos, Polykarpos Meladianos, Antoine Jean-Pierre Tixier, Konstantinos Skianis, and Michalis Vazirgiannis. Kernel graph convolutional neural networks. In Artificial Neural Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, October 4-7, 2018, Proceedings, Part I 27, pages 22–32. Springer, 2018.
  • [39] Giannis Nikolentzos and Michalis Vazirgiannis. Random walk graph neural networks. Advances in Neural Information Processing Systems, 33:16211–16222, 2020.
  • [40] Fuad Noman, Sin-Yee Yap, Raphaël C-W Phan, Hernando Ombao, and Chee-Ming Ting. Graph autoencoder-based embedded learning in dynamic brain networks for autism spectrum disorder identification. In 2022 IEEE International Conference on Image Processing (ICIP), pages 2891–2895. IEEE, 2022.
  • [41] Mingdong Ou, Peng Cui, Jian Pei, Ziwei Zhang, and Wenwu Zhu. Asymmetric transitivity preserving graph embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1105–1114. ACM, 2016.
  • [42] Aldo Pareja, Giacomo Domeniconi, Jie Chen, Tengfei Ma, Toyotaro Suzumura, Hiroki Kanezashi, Tim Kaler, Tao Schardl, and Charles Leiserson. Evolvegcn: Evolving graph convolutional networks for dynamic graphs. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 5363–5370, 2020.
  • [43] Mahie Patil, Nofel Iftikhar, and Latha Ganti. Neuroimaging insights into autism spectrum disorder: Structural and functional brain. Health Psychology Research, 12, 2024.
  • [44] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701–710. ACM, 2014.
  • [45] Leonardo FR Ribeiro, Pedro HP Saverese, and Daniel R Figueiredo. struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pages 385–394, 2017.
  • [46] Sam T Roweis and Lawrence K Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
  • [47] Nino Shervashidze, Pascal Schweitzer, Erik Jan Van Leeuwen, Kurt Mehlhorn, and Karsten M Borgwardt. Weisfeiler-lehman graph kernels. Journal of Machine Learning Research, 12(9), 2011.
  • [48] Nino Shervashidze, SVN Vishwanathan, Tobias Petri, Kurt Mehlhorn, and Karsten Borgwardt. Efficient graphlet kernels for large graph comparison. In Artificial Intelligence and Statistics, pages 488–495. PMLR, 2009.
  • [49] Martin Simonovsky and Nikos Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3693–3702, 2017.
  • [50] Vigneshwaran Subbaraju, Mahanand Belathur Suresh, Suresh Sundaram, and Sundararajan Narasimhan. Identifying differences in brain activities and an accurate detection of autism spectrum disorder using resting state functional-magnetic resonance imaging: A spatial filtering approach. Medical Image Analysis, 35:375–389, 2017.
  • [51] Rajat Mani Thomas, Selene Gallo, Leonardo Cerliani, Paul Zhutovsky, Ahmed El-Gazzar, and Guido Van Wingen. Classifying autism spectrum disorder using the temporal statistics of resting-state functional mri data with 3d convolutional neural networks. Frontiers in Psychiatry, 11:440, 2020.
  • [52] Nathalie Tzourio-Mazoyer, Brigitte Landeau, Dimitri Papathanassiou, Fabrice Crivello, Octave Etard, Nicolas Delcroix, Bernard Mazoyer, and Marc Joliot. Automated anatomical labeling of activations in spm using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage, 15(1):273–289, 2002.
  • [53] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
  • [54] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. In 6th International Conference on Learning Representations, 2018.
  • [55] Da Xu, Chuanwei Ruan, Evren Korpeoglu, Sushant Kumar, and Kannan Achan. Inductive representation learning on temporal graphs. In International Conference on Learning Representations, 2020.
  • [56] Sijie Yan, Yuanjun Xiong, and Dahua Lin. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
  • [57] Yi Yan, Ercan E Kuruoglu, and Mustafa A Altinkaya. Adaptive sign algorithm for graph signal processing. Signal Processing, 200:108662, 2022.
  • [58] Zhenyu Yang, Ge Zhang, Jia Wu, Jian Yang, Quan Z Sheng, Shan Xue, Chuan Zhou, Charu Aggarwal, Hao Peng, Wenbin Hu, et al. State of the art and potentialities of graph-level learning. ACM Computing Surveys, 57(2):1–40, 2024.
  • [59] Ali Yousefian, Farzaneh Shayegh, and Zeinab Maleki. Detection of autism spectrum disorder using graph representation learning algorithms and deep neural network, based on fmri signals. Frontiers in Systems Neuroscience, 16:904770, 2023.
  • [60] Bing Yu, Haoteng Yin, and Zhanxing Zhu. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. IJCAI, 2018.
  • [61] Chaohao Yuan, Kangfei Zhao, Ercan Engin Kuruoglu, Liang Wang, Tingyang Xu, Wenbing Huang, Deli Zhao, Hong Cheng, and Yu Rong. A survey of graph transformers: Architectures, theories and applications. arXiv preprint arXiv:2502.16533, 2025.
  • [62] Fengfan Zhao and Ercan Engin Kuruoglu. Sequential monte carlo graph convolutional network for dynamic brain connectivity. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7535–7539. IEEE, 2024.
  • [63] Xiaohan Zhao, Bo Zong, Ziyu Guan, Kai Zhang, and Wei Zhao. Substructure assembling network for graph classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.