
Geometric Deep Learning on Multiple Spectral Embeddings: Learning of Cortical Surface Data

Abstract

Neuronal cell bodies mostly reside in the cerebral cortex. The study of this thin and highly convoluted surface is essential for understanding how the brain works. The analysis of surface data is, however, challenging due to the high variability of the cortical geometry. This paper presents a novel approach for learning and exploiting surface data directly across surface domains. Current approaches rely on geometrical simplifications, such as spherical inflations, a popular but costly process. For instance, the widely used FreeSurfer takes up to 4 hours to parcellate brain surfaces by slowly deforming brain models towards labeled atlases. One-shot learning of surface data would provide a new family of fast algorithms for processing brain surfaces. However, the current limitation of existing state-of-the-art graph convolution networks is their inability to compare surface data across different surface domains, since surface bases are incompatible between brain geometries. This paper leverages recent advances in spectral graph matching to transfer surface data across aligned spectral domains. This novel approach enables a direct learning of surface data across compatible surface bases by exploiting spectral filters over intrinsic representations of surface neighborhoods. We illustrate the benefits of this approach with an application to brain parcellation, validated over 101 manually labeled brain surfaces. The results show a significant improvement in labeling accuracy over recent Euclidean approaches, along with a drastic speed improvement over conventional neuroimaging approaches.

1 Introduction

Neuroimage analysis consists of studying functional and anatomical information over the brain geometry. The thin outer layer of the brain cerebrum is of particular interest due to its key role in learning, vision and perception. Statistical frameworks on surfaces are, therefore, highly sought for studying various aspects of the brain. For instance, variations in surface data can reveal new biomarkers as well as potential relations with disease processes [Arbabshirani2017Single]. The challenge consists of learning surface data over highly complex and convoluted surfaces. Conventional approaches rely on geometrical simplifications, such as spherical inflation and slow mesh deformations [Tustison2014Largescale], a popular but costly process. For instance, the widely used FreeSurfer [fischl2004:automatically] takes up to a few hours of computation to parcellate brain surfaces on a standard computer. Fundamentally, current statistical frameworks exploit spatial information mostly derived from the Euclidean domain, for instance, based on vector fields, image or volumetric coordinates [Zhang2011ODVBA, Hua2013Unbiased]. Such information is highly variable across brain geometries and severely hinders the training of modern machine learning algorithms.

State-of-the-art learning approaches [Kamnitsas2017Efficient] have the potential to offer a drastic speed advantage over traditional surface-based methods, but operate on image or volumetric spaces. Geometric deep learning [Bronstein2017Geometric, Monti2017Geometric, Levie2017CayleyNets] recently proposed using convolutional filters on irregular graphs. This approach transposes the convolution theorem from the Fourier domain to spectral domains over graphs. Furthermore, Chebyshev polynomials avoid the explicit computation of graph Laplacian eigenvectors [Defferrard2016Convolutional], and local graph filtering is made possible within small neighborhoods [Monti2017Geometric]. The main concern of these methods is their inability to compare surface data across different surface domains [Bronstein2013Making, Kovnatsky2013Coupled, Ovsjanikov2012Functional, Eynard2015Multimodal]: Laplacian eigenbases are indeed incompatible across multiple geometries. One approach is to map local graph information onto geodesic patches and follow conventional spatial convolution via template matching [Masci2015Geodesic, Boscaini2016Learning]. Geodesic patches are, however, obtained using fixed local polar coordinates, and more recently, with parametric constructions of patches [Monti2017Geometric]. Fundamentally, spatial representations of surface data remain defined in Euclidean spaces, for instance, as polar representations of pixels or mesh vertices.

This paper leverages recent advances in spectral graph matching to transfer surface data across aligned spectral domains [Lombaert2015Brain]. Multiple spectral domains are brought into correspondence by aligning spectral embeddings across geometries. The transfer of spectral coordinates across domains provides a robust formulation for spectral methods that naturally handles differences across Laplacian eigenvectors, including sign flips, ordering and mixing of eigenvectors in higher frequencies. This spectral alignment strategy was exploited to learn surface data within the Random Forests framework [Lombaert2015Spectral], but was limited to pointwise information, ignoring local patterns within surface neighborhoods.

Existing attempts of graph convolutions on neuroimages [Parisot2017Spectral, Ktena2017Distance] exploit single graph structures and rely on Euclidean representations of points on brain surfaces. The learning of cortical data across multiple surfaces remains hindered by the incompatibility of spectral bases across geometries. Our approach consists of leveraging spectral coordinates within convolutional networks, bridging a gap between learning algorithms and geometrical representations. To the best of our knowledge, this is the first attempt at learning surface data via spectral graph convolutions in neuroimaging. This novel approach enables a direct learning of surface data across compatible surface bases by exploiting spectral filters over intrinsic representations of surface neighborhoods. We illustrate the learning capabilities of this approach with an application to brain parcellation. The validation over 101 manually labeled brain surfaces [Klein2017Mindboggling] shows a significant improvement of spectral graph convolutions over Euclidean approaches, from a Dice score of 59% to 82%. This performance is on par with the well-established FreeSurfer algorithm [fischl2004:automatically], which scores 83%, while gaining a drastic speed improvement, on the order of seconds. Our contributions are twofold. First, the transfer of spectral bases across domains enables the design of spectral filters in graph convolutional approaches. Second, our adaptive spectral filters can consequently learn cortical surface data across multiple geometries, as well as exploit local patterns of data within surface neighborhoods. The next section details the fundamentals of our spectral approach, followed by experiments evaluating the impact of our spectral strategy over standard Euclidean approaches for graph convolutions.

Figure 1: An overview of the algorithm - The first column shows the inputs to the graph convolutions: the sulcal depth $S_d$ and the corresponding spectral coordinates $\mathbf{U}_1, \mathbf{U}_2, \mathbf{U}_3$, color-coded on a brain surface. The output response after each convolution layer is visualized as feature maps. Moving along the depth of the architecture, coarse-to-fine geometric features are learned. The final predicted probabilities (Pred), shown as heat maps, are compared with the ground truth (GT) for two separate parcels. The brain surface is inflated for visualization only.

2 Method

An overview of the proposed method is shown in Fig. 1. In a first step, each cortical surface, modeled as a brain graph, is embedded in a spectral manifold using the graph Laplacian operator. The spectral embeddings of different brain surfaces are then aligned in this manifold using the Iterative Closest Point (ICP) algorithm. Finally, a geometric convolutional neural network (CNN) is used to map input features, corresponding to the spectral coordinates and sulcal depth of brain graph vertices, to a labeled graph.

2.1 Spectral basis construction

Let $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$ be a brain graph built from the set of $N$ vertices, with positions $\mathbf{x}_i\in\mathbb{R}^3$, and the edges of a surface model $S$. The normalized graph Laplacian operator, with weighted adjacency matrix $\mathbf{A}$ and diagonal degree matrix $\mathbf{D}$, is of the form $\mathbf{L}=\mathbf{I}-\mathbf{D}^{-1}\mathbf{A}$. The spectral decomposition of this graph is then $\mathbf{L}=\mathbf{U}\mathbf{\Lambda}\mathbf{U}^{-1}$. The vibration modes of the surface model $S$ are given by the sorted eigenvalues $\mathbf{\Lambda}=\mathrm{Diag}(\lambda_0,\ldots,\lambda_N)$ and their associated eigenvectors $\mathbf{U}=[\mathbf{u}_0,\ldots,\mathbf{u}_N]$. The $n$-dimensional spectral coordinates, in normalized scale, are $\widehat{\mathbf{U}}=\mathbf{\Lambda}^{\frac{1}{2}}\mathbf{U}$. Unlike coordinates in the Euclidean domain, these spectral coordinates offer an intrinsic way of navigating over the surface for learning.
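
As a concrete illustration, the sketch below computes this spectral embedding in Python with NumPy/SciPy (the implementation reported in this paper is in Matlab). The function name and the mesh inputs verts/faces are illustrative assumptions, not part of the original code.

import numpy as np
from scipy.sparse import coo_matrix, diags, identity
from scipy.sparse.linalg import eigsh

def spectral_embedding(verts, faces, n_modes=5):
    # Binary adjacency matrix A built from the triangle-mesh edges.
    n = len(verts)
    i = np.concatenate([faces[:, 0], faces[:, 1], faces[:, 2]])
    j = np.concatenate([faces[:, 1], faces[:, 2], faces[:, 0]])
    a = coo_matrix((np.ones(len(i)), (i, j)), shape=(n, n))
    a = (a + a.T).tocsr()
    a.data[:] = 1.0                      # make the adjacency binary
    # Symmetric normalized Laplacian; it shares its eigenvalues with
    # L = I - D^{-1} A, and its eigenvectors map back through D^{-1/2}.
    d_inv_sqrt = diags(1.0 / np.sqrt(np.ravel(a.sum(axis=1))))
    lap_sym = identity(n) - d_inv_sqrt @ a @ d_inv_sqrt
    # 'SM' is simple but slow on large meshes; shift-invert is faster.
    lam, u_sym = eigsh(lap_sym, k=n_modes + 1, which='SM')
    order = np.argsort(lam)
    lam, u_sym = lam[order], u_sym[:, order]
    u = d_inv_sqrt @ u_sym               # eigenvectors of I - D^{-1} A
    # Drop the constant eigenvector (lambda = 0) and scale the others
    # by sqrt(lambda), following U_hat = Lambda^{1/2} U.
    return u[:, 1:] * np.sqrt(lam[1:])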

The Laplacians of different brain graphs differ due to inconsistencies in the adjacency matrices across subjects. Hence, the spectral bases of multiple brain graphs are misaligned, and a learning algorithm operating on them directly would be incapable of generalizing. We handle this issue by realigning all spectral representations to an arbitrary reference. We compute a spectral transformation $T^{(f\rightarrow g)}$ that produces a new spectral basis $\mathbf{U}_T(f)$ for a brain graph with basis $\mathbf{U}(f)$, matched to a reference basis $\mathbf{U}(g)$, such that $T^{(f\rightarrow g)}=(\mathbf{U}(g))^{-1}\,\mathbf{U}_T(f)$. In practice, aligning the first five spectral coordinates with the Iterative Closest Point algorithm is sufficient for matching the spectral representations of two different brains. The aligned spectral coordinates $spe(n)$ of the $n$-th vertex of a brain $g$ are thus given by the $n^{th}$ row of the matrix $\widehat{\mathbf{U}}\,T^{(g\rightarrow ref)}$.
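
This alignment step can be sketched as follows, assuming spe and spe_ref hold the spectral coordinates returned above for a subject and for the reference. The search over eigenvector sign flips and permutations, which spectral matching also requires, is omitted for brevity; all names are hypothetical.

import numpy as np
from scipy.spatial import cKDTree

def align_spectral(spe, spe_ref, n_iters=20):
    tree = cKDTree(spe_ref)          # reference embedding, queried repeatedly
    aligned = spe.copy()
    for _ in range(n_iters):
        # Match every embedded vertex to its closest reference vertex.
        _, idx = tree.query(aligned)
        # Least-squares transform T mapping the original coordinates onto
        # their current matches, i.e. spe @ T ~ spe_ref[idx].
        t, *_ = np.linalg.lstsq(spe, spe_ref[idx], rcond=None)
        aligned = spe @ t
    return aligned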

2.2 Graph convolution on surfaces

We propose a deep learning framework to learn on graphs by adapting the traditional convolution principles defined on regular Euclidean grids. A point $x$ on a Cartesian grid has a well-defined neighborhood at a fixed distance. This is not the case on a surface: the geodesic distance between vertices of a surface model varies depending on the folds and valleys, so a fixed grid-based neighborhood cannot be defined for a graph structure. Hence, for nodes $i$ and $j$ of the brain graph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$, we propose to use a learnable Gaussian kernel to measure the influence of neighboring nodes $j$ on a given node $i$. A Gaussian graph kernel between node $i$ and node $j$ is defined as,

$t_{ijk} = g_{ij}\cdot\exp(-\sigma_{k}\|(X_{j}-X_{i})-\mu_{k}\|^{2})$ (1)

where $g_{ij}$ indicates the connectivity of nodes $i$ and $j$; $X_i$ is the spectral embedding of node $i$; and $\mu_k$ and $\sigma_k$ are the parameters of the $k$-th Gaussian filter kernel, learned over epochs.
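
A dense sketch of Eq. (1) is given below; in practice the adjacency is sparse and only connected pairs $(i,j)$ need evaluating. Here mu and sigma are plain arrays standing in for the trainable kernel parameters, which a deep learning framework would hold as learnable tensors.

import numpy as np

def gaussian_kernel_weights(x, adj, mu, sigma):
    # x: (N, d) spectral embeddings, adj: (N, N) binary connectivity g_ij,
    # mu: (K, d) kernel centers, sigma: (K,) kernel widths.
    diff = x[None, :, :] - x[:, None, :]                # X_j - X_i, (N, N, d)
    dist2 = ((diff[:, :, None, :] - mu) ** 2).sum(-1)   # (N, N, K)
    # t_ijk = g_ij * exp(-sigma_k * ||(X_j - X_i) - mu_k||^2)
    return adj[:, :, None] * np.exp(-sigma * dist2)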

A graph convolution filter $w_{pqk}^{(l)}$ between input feature map $q$ and output feature map $p$ is learned, producing $M_l$ filter response maps at layer $l$. The graph convolution operation is defined as,

$z_{ip}^{(l+1)} = \sum_{j=1}^{N}\sum_{q=1}^{M_{l}}\sum_{k=1}^{K_{l}} w_{pqk}^{(l)}\cdot t_{ijk}^{(l)}\cdot y_{jq}^{(l)} + b_{q}^{(l)}$ (2)

where $b_{q}^{(l)}$ is the bias for input feature $q$, and $y_{jq}^{(l)}$ is the input feature $q$ of node $j$.
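
Equation (2) can be written as a single tensor contraction. In this sketch the bias is applied per output map, a simplifying assumption, and the Leaky ReLU activation described below is folded into the same step; all names are illustrative.

import numpy as np

def graph_conv(y, t, w, b):
    # y: (N, M_l) input features, t: (N, N, K) kernel weights from Eq. (1),
    # w: (P, M_l, K) filter weights, b: (P,) bias per output map.
    # z_ip = sum over j, q, k of w_pqk * t_ijk * y_jq, plus bias.
    z = np.einsum('pqk,ijk,jq->ip', w, t, y) + b
    return np.maximum(0.01 * z, z)       # Leaky ReLU activation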

The proposed architecture contains three graph convolution layers ($l=1,2,3$) with feature maps $M_1=32$, $M_2=64$ and $M_3=32$. The size of the last layer corresponds to the number of parcels to be segmented (32 in this case). Each layer has four Gaussian kernels ($K=4$) operating on the connected neighbors of node $i$. A nonlinear activation function, here Leaky ReLU, is applied after each layer to obtain the filter response $y_{ip}^{(l)}=\max(0.01\,z_{ip}^{(l)},\,z_{ip}^{(l)})$.
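
Under these settings, the network is simply three such layers chained together. The sketch below stacks the two functions above with randomly initialized parameters standing in for trained weights; the input has 4 features per node, i.e. the sulcal depth and three aligned spectral coordinates.

import numpy as np

rng = np.random.default_rng(0)
sizes, K, dim = [4, 32, 64, 32], 4, 3    # M_0..M_3, kernels, spectral dims
layers = [dict(w=rng.normal(0.0, 0.1, (m_out, m_in, K)),
               b=np.zeros(m_out),
               mu=rng.normal(0.0, 0.1, (K, dim)),
               sigma=np.ones(K))
          for m_in, m_out in zip(sizes[:-1], sizes[1:])]

def forward(spe, adj, feats):
    # spe: (N, 3) aligned spectral coordinates, feats: (N, 4) node features.
    y = feats
    for layer in layers:
        t = gaussian_kernel_weights(spe, adj, layer['mu'], layer['sigma'])
        y = graph_conv(y, t, layer['w'], layer['b'])
    return y                              # (N, 32) per-parcel responses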

Since the parcels to be segmented are mutually exclusive, we apply a softmax layer after the last graph convolution layer to obtain, for each node, a probability of belonging to each parcel. The softmax function is given by $\exp(z_{ip}^{(l)})/\sum_{q}\exp(z_{iq}^{(l)})$. The error $E$ between the predicted $y_{ic}$ and the target $q_{ic}$ is measured using the cross-entropy loss, given by

$E = -\sum_{i=1}^{N}\sum_{c=1}^{C} q_{ic}\,\log(y_{ic})$ (3)

This loss is minimized by backpropagating the error using a standard gradient descent optimization scheme. Along with the graph convolution filter weights, the parameters $\mu_k$ and $\sigma_k$ of the Gaussian kernels are updated.
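
For completeness, a numerically stable sketch of the softmax and the cross-entropy loss of Eq. (3), with z the (N x C) responses of the last layer and q the one-hot target parcels:

import numpy as np

def softmax_cross_entropy(z, q):
    # Subtracting the row-wise max leaves the softmax unchanged but
    # prevents overflow in the exponentials.
    z = z - z.max(axis=1, keepdims=True)
    log_y = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(q * log_y).sum()             # E = -sum_ic q_ic log(y_ic)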

3 Results

We now evaluate the performance of our contributions. First, we highlight the advantage of moving graph learning frameworks from a conventional Euclidean domain to a spectral domain. Second, we assess the improvement of exploiting neighborhoods of surface data versus learning pointwise information on spectral embeddings. Our validation is performed on Mindboggle [Klein2017Mindboggling], the largest publicly available dataset of manually labeled brain MRI. It consists of 101 subjects collected from different sites, with cortical meshes varying from 102K to 185K nodes. Each brain surface contains 32 manually labeled parcels. In our experiments, the dataset was randomly split into training, validation and testing sets with a ratio of 70-10-20. We further compared our results with the established FreeSurfer software. Our implementation is in Matlab, and uses an i7 desktop computer with 16GB of RAM and an Nvidia Titan X GPU. The computation of the spectral embedding and basis alignment takes roughly 15 seconds. Each training epoch takes $\sim$5-6 minutes, while a single forward pass during testing for cortical parcellation takes approximately 3 seconds.

3.1 Euclidean domain versus Spectral domain

Figure 2: Cortical surface parcellation - The limitation of working in the Euclidean domain is seen (first) in inconsistent spatial regions with a low Dice overlap of 59%, whereas a significant improvement to 82% is found when shifting to the spectral domain. The remaining irregularities at the parcel boundaries (second) are regularized by applying a standard graph cut (third) as a post-processing step. The resulting boundaries are smooth and similar to the manual labels (last), with an improvement in Hausdorff distance from 12.30mm to 10.62mm. The brain surface is inflated for visualization only.

This experiment evaluates the improvement of moving the learning operations from the Euclidean domain to a spectral domain. To do so, we compare the classification accuracy on 32 cortical parcels when running our algorithm, respectively, in the Euclidean and spectral domains. Quantitative results are measured in terms of average Dice overlap, $2|A\cap B|/(|A|+|B|)$, and Hausdorff distances. Qualitative results are shown in Fig. 2.
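
As a reference, the Dice overlap used throughout this section can be computed per parcel as follows, with pred and truth being per-vertex label arrays; names are illustrative:

import numpy as np

def dice_overlap(pred, truth, label):
    a, b = (pred == label), (truth == label)
    # Dice = 2 |A intersect B| / (|A| + |B|)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())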

Euclidean domain – In our baseline, similarly to the latest approaches of graph convolution networks [Monti2017Geometric], we learn from input features in the Euclidean domain. Each cortical point is represented using its sulcal depth and spatial location, i.e., $f_i^{eu}=(S_d^i, X, Y, Z)$. The architecture remains the same as described earlier (Fig. 1). We train our network on the randomly split training set until the performance decreases on the validation set. The average Dice overlap across all parcels in our dataset is 59% ($\pm$20.5, min/max = 0/85.54%). The Hausdorff distance averaged across all parcels is 18.87mm. Fig. 2 clearly illustrates the current limitation of existing graph convolution approaches. Spatial ambiguities are inherent to the Euclidean domain, i.e., neighboring coordinates in space may not necessarily be neighbors on surfaces. This ambiguity may confuse training on highly convoluted surfaces such as the brain, and possibly explains the strong spatial irregularities observed in the parcel boundaries.

Spectral domain – Our contribution is to operate in a geometry-aware spectral domain. We now learn on input features represented in the spectral domain, where each cortical point consists of the sulcal depth and the first three spectral coordinates, i.e., $f_i^{sp}=(S_d^i, spe(1), spe(2), spe(3))$. We use the same architecture and data split as before. The average Dice overlap across all parcels improves to 82% ($\pm$5.32, min/max = 69.45/95.09%). Details for each parcel are shown in Fig. 3. The Hausdorff distance averaged across all parcels is now reduced to 12.30mm. This is a 40% improvement over learning in the conventional Euclidean domain. The qualitative results of Fig. 2 show that our cortical parcellation is very close to the manual parcellation. The boundaries, however, are irregular and require further regularization.

As an illustration of further refinement, we apply a standard graph cut [graphcut] regularization to both the Euclidean and spectral outputs. To do so, the graph cut algorithm is set using the surface adjacency matrix as graph edge weights, and the output node probabilities as graph node weights. The graph cut regularization further improves the overall classification accuracy from 68.24% to 71.08% in the Euclidean domain, and from 84.35% to 85.35% in the spectral domain. A similar improvement is observed in terms of Hausdorff distance, with a reduction from 18.87mm to 15.96mm in the Euclidean domain, and from 12.30mm to 10.62mm in the spectral domain. The Dice overlap scores remain, however, similar after regularization, perhaps due to a correct initial overlap of parcels by our algorithm.

3.2 Pointwise versus neighborhood information

Figure 3: Performance evaluation - The Dice overlap for different parcels indicates the limitation of spatial features compared to spectral features. The variability in Dice overlap across parcels is low when working in the spectral domain with neighborhood information (5.3%, versus 20.5% for the Euclidean approach, 14.3% for pointwise learning in the spectral domain, and 10.3% for FreeSurfer). This shows that the algorithm is unbiased towards any particular parcel.

In this experiment, we illustrate the benefits of exploiting neighborhoods of surface data over pointwise information in a spectral domain. To follow standard approaches of learning pointwise surface data [Lombaert2015Spectral], we set our weighted adjacency matrix to the identity matrix. This removes all neighboring connections, and the graph convolutions, therefore, require no updates of $\sigma$ and $\mu$. Training on pointwise information, $f_i^{sp}=(S_d^i, spe(1), spe(2), spe(3))$, results in an average Dice overlap of 75% ($\pm$14.3%, min/max = 16.16/92.32%). As a reminder, training on neighborhoods results in a Dice score of 82%. Exploiting neighborhoods of surface data therefore improves the performance of our algorithm. A closer look at the performance scores for each parcel (Fig. 3) also reveals a general improvement when exploiting neighborhoods over pointwise surface data. Results for all parcels are also on par with FreeSurfer's performance.

In the MindBoggle dataset, the manual parcellations were created by experts who corrected parcels initially generated by FreeSurfer. The agreement between the manually corrected parcels and FreeSurfer is a Dice score of 83%. Our algorithm produces an 82% Dice score with respect to the manual parcellation, while gaining a significant improvement in computation time, from hours to seconds for processing one subject. Our graph convolutional framework also results in a stable performance when compared to the Euclidean approach or to FreeSurfer, which respectively have standard deviations of 20.5% and 10.3% on Dice scores across all parcels; our spectral approach varies by only 5.3%.

4 Conclusion

In this paper, we presented a novel framework for learning surface data. The goal of this work was to build a geometry-aware deep learning framework operating directly on surface data. We proposed a solution for working across different surface domains by aligning spectral embeddings to a common reference. We also showed the effect of exploiting local patterns when learning surface data. The highly convoluted structure of the cortex makes it hard for learning algorithms in Euclidean space to generalize, a problem that appears qualitatively as irregularities in Fig. 2. Working directly in the spectral domain with geometric context improves the Dice overlap over working in the Euclidean domain, from 59% to 82%. The results are comparable with the state-of-the-art method, with a significant improvement in computation time ($\sim$18 seconds versus hours). Spectral embeddings hence provide better opportunities to analyze surface data, and the experiments substantiate the claims made in the paper. We showed the potential of this algorithm with brain parcellation as one application. Its use can be extended to cortical thickness estimation or disease prediction, and beyond cortical surfaces. In general, the proposed algorithm can be a powerful tool for analyzing medical data lying on surfaces.