1 Department of Diagnostic and Interventional Radiology, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
2 CIBM Centre d’Imagerie BioMédicale, Lausanne, Switzerland

Multi-dimensional topological loss for cortical plate segmentation in fetal brain MRI

Priscille de Dumast (corresponding author, [email protected]), Hamza Kebiri 1,2, Vincent Dunet 1, Mériam Koob 1, Meritxell Bach Cuadra 2,1

Abstract

The fetal cortical plate (CP) undergoes drastic morphological changes during in utero development. Therefore, CP growth and folding patterns are key indicators in the assessment of brain development and maturation. Magnetic resonance imaging (MRI) offers specific insights for the analysis of quantitative imaging biomarkers. Nonetheless, accurate and, more importantly, topologically correct MR image segmentation remains the key baseline for such analysis. In this study, we propose a deep learning segmentation framework for automatic and morphologically consistent segmentation of the CP in fetal brain MRI. Our contribution is twofold. First, we generalized a multi-dimensional topological loss function in order to enhance the topological accuracy. Second, we introduced the hole ratio, a new topology-based validation measure that quantifies the size of the topological defects relative to the size of the structure of interest. Using two publicly available datasets, we quantitatively evaluated our proposed method on 27 fetal brains with three complementary metrics, which are overlap-, distance- and topology-based. Our results show that our topology-integrative framework outperforms state-of-the-art training loss functions on super-resolution (SR) reconstructed clinical MRI, not only in shape correctness but also in the classical evaluation metrics (mean ± std: Dice similarity coefficient of 0.85 ± 0.01, average symmetric surface distance of 0.19 ± 0.01 mm and hole ratio of 0.06 ± 0.03). Furthermore, results on 31 additional out-of-domain SR reconstructions from clinical acquisitions were qualitatively assessed by three experts. The experts’ consensus ranked our TopoCP method as the best segmentation in 100% of the cases with a high inter-expert agreement.
Overall, both quantitative and qualitative results, on a wide range of gestational ages and number of cases, support the generalizability and added value of our topology-guided framework for fetal CP segmentation.

Keywords:
Fetal brain, cortical plate, deep learning, topology, magnetic resonance imaging

1 Introduction

1.1 Clinical context

During in utero development, the human fetal cortical plate (CP), i.e. the future cortex, undergoes drastic changes [Tierney and Nelson, 2009]. Indeed, the brain surface evolves from smooth at 10 weeks of gestation to convoluted at 35 weeks as the cortical gyrifications appear (see Figure 1). Concurrently, the surface area and the volume of the future cortex increase 50-fold and 40-fold, respectively, during the 2nd and 3rd trimesters of pregnancy [Vasung et al., 2016, Vasung et al., 2020].

Cortical folding consists of the chronological appearance of primary, secondary and lastly tertiary sulci [Garcia et al., 2018]. Nearly all gyri are in place at birth, even though the gyrification process carries on afterward [Lenroot and Giedd, 2006]. Cortical gyrification is considered to be a relevant marker of fetal brain maturation, as the chronological sequence of appearance of sulci is well known during the fetal period [Garel et al., 2003]. Conversely, abnormal fetal sulcation and gyration indicate disruption of one of the three main fetal stages of normal cortical development (i.e. cell proliferation, migration and cortical organization) [Barkovich et al., 2005, Leibovitz et al., 2022].

Altered cerebral cortex formation, induced by genetic mutations, vascular injuries, metabolic diseases, fetal infection or teratogenic causes, may lead to malformations of cortical development. An updated classification of this group of heterogeneous disorders was recently published in a consensus statement [Severino et al., 2020]. Those rare disorders usually manifest with developmental delay, seizures, and motor and sensory deficits [Leventer et al., 2008]. Given the consequences of abnormal brain gyrification, early diagnosis is crucial, and the analysis of cortical maturation is an asset for it.

Advanced neuroimaging techniques can greatly benefit the characterization of neurotypical and pathological fetal brain development. Ultrasound (US) is the first prenatal imaging modality for fetal screening. Unfortunately, its sensitivity to the surrounding maternal tissues easily degrades the image quality. Conversely, magnetic resonance imaging (MRI) does not share these limitations and hence serves as a key complementary imaging tool to look for additional information in equivocal situations or to confirm or rule out possible pathological US findings [Salomon et al., 2006, Griffiths et al., 2010, Prayer et al., 2017, Griffiths et al., 2019]. MRI is a non-invasive and reliable imaging method for the monitoring and follow-up of fetal brain development that relies on the tissue properties to generate image contrasts [Gholipour et al., 2014]. It is highly suitable for evaluating fetal brain morphometry and connectivity in neurotypical and pathological cases [Garel, 2004, Prayer et al., 2017]. In fetal brain MRI, T2-weighted (T2w) sequences highlight soft-tissue contrast and are therefore mainly used to assess morphological and structural development [Gholipour et al., 2014].

In clinical practice, fast MR sequences are run in order to “freeze” the unpredictable fetal motion in the plane of acquisition [Prayer et al., 2017]. While the series present an excellent in-plane resolution (1 to 2 mm²), the through-plane resolution is sub-optimal (slice thickness of 3 to 5 mm) [Gholipour et al., 2014]. In a clinical examination, several low-resolution (LR) series are acquired in three orthogonal planes to offer complementary structural information from the three dimensions. However, the strong anisotropy of the volumes and the remaining inter-slice motion hamper a correct realignment of the images in the anatomical planes and corrupt any three-dimensional measurements. Over the last decades, novel advanced image processing algorithms, based on super-resolution (SR) methods, were developed for the reconstruction of 3D volumes [Rousseau et al., 2006, Gholipour et al., 2010, Kuklisova-Murgasova et al., 2012, Tourbier et al., 2015, Ebner et al., 2020]. The underlying reconstruction principle is to combine a set of clinical LR acquisitions into a single high-resolution isotropic motion-free volume [Uus et al., 2022]. In addition to improved visualization, the availability of SR reconstructed volumes opens the way to more accurate 3D-based quantitative analysis of the fetal brain anatomy, such as fetal brain biometry [Pier et al., 2016, Kyriakopoulou et al., 2017, Khawam et al., 2021].

Quantitative analysis of imaging biomarkers has defined cortical development for the typically developing brain [Rajagopalan et al., 2011, Clouchoux et al., 2012, Wright et al., 2014, Xia et al., 2019], while other works evidenced discrepancies in cortical volumes and sulcal patterns in the pathological brain [Clouchoux et al., 2013, Egaña-Ugrinovic et al., 2013, Im et al., 2013, Tarui et al., 2018]. Nevertheless, these analyses require prior segmentation steps. While expert manual image annotation is considered the gold standard, it is time-consuming, tedious, and subject to inter- and intra-expert variability. For this reason, manual segmentation is not a reliable method in the long run.

Thus, robust automated fetal brain MRI segmentation methods are key for further analysis. Nonetheless, in contrast to adult brain segmentation, providing an age-invariant method for fetal brain segmentation remains challenging [Makropoulos et al., 2018].

1.2 Related works in cortical plate segmentation

Segmentation of the cortical plate is particularly challenging as it undergoes significant changes throughout gestation due to brain growth and maturation, respectively modifying the morphology and the image contrast (see Figure 1). Furthermore, the cortex being a thin structure (from 1 to 2 mm for fetuses between 21 and 40 weeks of gestational age [Vasung et al., 2016]) strongly affected by partial volume effects, anatomical topology is prone to be incorrectly captured by automatic segmentation methods.

Figure 1: Illustration of MR T2w contrast and in utero cortical folding changes throughout gestation for neurotypical fetal brain subjects of 22, 30, and 34.8 gestational weeks (GW).

Caldairou et al. [Caldairou et al., 2011] introduced the first topology-based segmentation of the fetal cortex, relying on geometrical constraints along with anatomical and topological priors. However, the sample size in that study was small (i.e. six fetuses) and topological correctness was not evaluated for the provided segmentations. More recently, deep learning (DL) methods have also focused on fetal brain MRI cortical gray matter segmentation. Based on a neonatal segmentation framework, a recent study introduced a hybrid segmentation process that minimizes the need for human interaction in the segmentation of the developing cortex [Fetit et al., 2020]. Also with DL models, a multiple-predictions approach with test-time augmentation was suggested to improve the robustness of the method [Hong et al., 2020]. Finally, a two-stage segmentation framework with an attention refinement was proposed [Dou et al., 2021]. Nevertheless, while the segmentation accuracy of these recent DL-based methods is promising, none of these CP segmentation works includes topological constraints or assesses the topology. Overall, these studies of automatic segmentation frameworks report high similarity to the ground truth segmentation in terms of overlap (Dice similarity coefficient of 0.87 in [Dou et al., 2021] and 0.90 in [Hong et al., 2020]) and low difference in terms of boundary distance-based metrics (average symmetric surface distance of 0.28 mm in [Dou et al., 2021] and mean surface distance of 0.18 mm in [Hong et al., 2020]). However, the illustrated results show a lack of topological consistency, notably with discontinuous/broken cortical ribbons (see Figure 5 in [Dou et al., 2021], Figures 5 and 6 in [Fetit et al., 2020]).

Here, we propose to integrate a topological constraint in a deep image segmentation framework to overcome the limitation of disjoint CP segmentation in fetal MRI. To our knowledge, only two works previously explored the topological fidelity of semantic medical image segmentation with DL. First, Hu et al. [Hu et al., 2019] proposed using a topological loss for neuronal membrane segmentation. Second, a study presented topological constraints for MR cardiac image segmentation [Clough et al., 2021] based on prior topological knowledge, such as the number of connected components or handles present in the structure of interest. Although theoretical CP topological features are known, such prior information could only be valid in a whole 3D volume analysis. Therefore, we took inspiration from [Hu et al., 2019], which does not share this limitation, to build a prior-free framework.

1.3 Contributions

In this work, we incorporate topological constraints and assess the topology of the CP segmentation in fetal brain MRI. To this end, our first contribution is the generalization of a topological loss function based on persistent homology into a multi-dimensional (in topological spaces) formulation. Using a publicly available dataset of SR reconstructed fetal brain MRI [Payette et al., 2021] along with a spatiotemporal atlas [Gholipour et al., 2017], we compare our topology-integrative optimization method (TopoCP) to other widely used loss functions (Baseline and Hybrid [Payette et al., 2022]) with a state-of-the-art U-Net segmentation network. We further assess our proposed method against semi-automatic manual annotations. For the first time, the topological correctness of the segmentation is assessed in the evaluation of automatic CP segmentation. In that respect, our second contribution is a new topology-based metric that quantifies the CP segmentation defects not only in number but also in size. We quantitatively compare automatic and semi-automatic segmentation with complementary metrics on two independent pure testing sets. As a complement, three fetal brain MRI experts further visually compare automatic segmentations on an additional out-of-distribution clinical dataset. The results show an overall significant improvement in the segmentation using our proposed topological loss function.

In Section 2, we introduce our multi-dimensional topological loss function, the overall segmentation framework and our original topology-based assessment metric, the hole ratio; in Section 3, we describe the experimental design, along with the materials and the experiments performed; in Section 4, we describe and discuss the results; and finally, we conclude this work in Section 5.

2 Methodology

We propose a topologically-guided deep learning framework for the cortical plate segmentation of the fetal brain MRI. This is done by including a topological constraint in the optimization of state-of-the-art deep-learning image segmentation strategies. First, we will introduce our custom loss function that we adapted for generalization from [Hu et al., 2019] (see Section 2.1). Second, we present the segmentation framework in which our custom loss function is integrated (see Section 2.2). Finally, we present topology-based metrics for further assessment of our method (see Section 2.3).

2.1 Topological loss

In semantic image segmentation, conventional optimization loss functions (e.g. the cross-entropy) often perform a pixel-wise comparison between the class prediction, summarized in a likelihood function $f$, and the one-hot encoded target vector. In this work, we aim to integrate the analysis of the global shape correctness of the prediction through the study of topology during the model optimization.

2.1.1 Computational topology

Topology defines the properties of an object that are preserved through deformation [Rote and Vegter, 2006]. In computational topology, local features are derived to infer the global properties of an object. Specifically, in a binary image, local information relies on the connectivity of a voxel to its neighbors in an object. By generalizing the connectivity information, one can infer global features such as the number of connected components or holes.

Topological structures are defined in the different topological dimensions. The number of topological structures in each dimension $k$ is counted with the $k$-dimensional Betti number (BNk). In the context of 3D binary images, one can count BN0 connected components, BN1 holes (also called handles or tunnels), and finally BN2 cavities. Figure 2 shows three 2D binary image patches of the cortex and their Betti numbers. Note that for a 2D binary image, BN2 is always 0.

Figure 2: Example patches of 2D binary images with the topological properties: number of connected components BN0, number of holes BN1, and number of cavities BN2.
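
To make the Betti numbers concrete, the following minimal sketch counts BN0 and BN1 for a small 2D binary mask with flood fills. It is only an illustration under stated assumptions: the function names are hypothetical, foreground pixels are taken as 8-connected and background pixels as 4-connected, and BN1 is obtained as the number of enclosed background regions. This is not the GUDHI-based computation used in this work.

```python
from collections import deque

N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def count_components(grid, value, neighbors):
    """Count connected regions of cells equal to `value` using BFS flood fill."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in neighbors:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == value and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def betti_numbers_2d(mask):
    """Return (BN0, BN1) of a 2D binary mask given as a list of lists of 0/1."""
    cols = len(mask[0])
    # Surround the mask with background so the outer background is one region.
    padded = ([[0] * (cols + 2)]
              + [[0] + list(row) + [0] for row in mask]
              + [[0] * (cols + 2)])
    bn0 = count_components(padded, 1, N8)      # foreground components
    bn1 = count_components(padded, 0, N4) - 1  # enclosed background regions
    return bn0, bn1

ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
two_blobs = [[1, 0, 1],
             [1, 0, 1]]
print(betti_numbers_2d(ring))       # (1, 1)
print(betti_numbers_2d(two_blobs))  # (2, 0)
```

The ring has one component enclosing one hole, whereas the two separate columns form two components without any hole.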

In practice, and similarly to [Hu et al., 2019], prior to the computation of topology, we pad the image patches twice with constant values (see Figure 2, Figure 3 (A) and Figure 4 (A)). We perform a first padding with the maximal value of the patch in order to work with closed structures. Then, we pad the patch with 0 value voxels to define a background.
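
The double padding described above can be sketched as follows. This is a minimal illustration, assuming a one-pixel border for each padding step (the border width is not stated here) and using hypothetical helper names.

```python
def pad_constant(patch, value):
    """Surround a 2D patch (list of lists) with a one-pixel border of `value`."""
    width = len(patch[0]) + 2
    return ([[value] * width]
            + [[value] + list(row) + [value] for row in patch]
            + [[value] * width])

def double_pad(patch):
    """Pad with the patch maximum (closes structures), then with 0 (background)."""
    patch_max = max(max(row) for row in patch)
    return pad_constant(pad_constant(patch, patch_max), 0.0)

likelihood = [[0.1, 0.9],
              [0.8, 0.2]]
for row in double_pad(likelihood):
    print(row)
```

A 2×2 patch becomes a 6×6 patch: the inner border carries the maximal value of the patch and the outer border is zero.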

2.1.2 Persistent homology

Persistent homology offers a workaround to analyze the topology of a continuous-valued $n$-dimensional image function. In the context of our semantic image segmentation, we consider the likelihood map $f\colon\Omega\subset\mathbb{R}^{n}\rightarrow[0,1]$ of a voxel to belong to the CP, as predicted by a DL-based model (Figure 3 (A)). In order to reduce the problem to a binary analysis as presented in Section 2.1.1, persistent homology tracks the topological structures of $f$ through a filtration $g_{\gamma}$ at different thresholds $\gamma\in[0,1]$:

$$g_{\gamma}\colon[0,1]\longrightarrow\{0,1\},\qquad x\longmapsto g_{\gamma}(x)=\begin{cases}1&\text{if }x\geq\gamma,\\ 0&\text{otherwise.}\end{cases}$$ (1)

Snapshots of the topology are reported into a persistence barcode (Figure 3 (B)). Each bar corresponds to a topological structure (e.g. connected components, handles) which is characterized by its appearance and disappearance threshold values $(\gamma_{birth};\gamma_{death})$. The persistence barcode can be filtered based on the structures’ persistence. The persistence of a structure is defined by its lifetime $\Delta\gamma=\gamma_{death}-\gamma_{birth}$. In persistent homology, the minimum persistence (mp) is the minimum lifetime accepted in the filtration of the topological structures. Figure 3 (B) shows an example of a likelihood image binarized at different thresholds $\gamma\in[0,1]$. Two persistence barcodes with $mp=0.001$ (top) and $mp=0.1$ (bottom) are presented. With a lower mp, we observe a considerable number of irrelevant structures. Indeed, the CP is a thin cerebral tissue (only a few voxels wide in SR volumes), and is therefore sensitive to broken connections. In the barcode, this translates into the appearance of many connected components (blue) with a short lifetime, i.e. low persistence.

Finally, the structure pairs $(\gamma_{birth};\gamma_{death})$ extracted from the persistence barcodes are considered as coordinates. These coordinates define critical points, plotted in the associated persistence diagram (see Figure 3 (C)). With a low mp (top), we observe many critical points close to the diagonal. This diagonal corresponds to $\gamma_{birth}=\gamma_{death}$, i.e. the structure does not exist.

The mp must be chosen to filter out noise structures without being too strict. Note that for a binary image, all topological structures have a persistence $\Delta\gamma=1$, with coordinates $(0;1)$ in the persistence diagram representation.
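
The effect of the minimum persistence can be illustrated on a toy barcode. In this sketch, the (birth, death) pairs are made-up values and the function name is hypothetical; whether the comparison with mp is strict or not is also an assumption.

```python
def filter_barcode(bars, min_persistence):
    """Keep bars whose lifetime (death - birth) is at least `min_persistence`."""
    return [(birth, death) for birth, death in bars
            if death - birth >= min_persistence]

# Hypothetical (birth, death) pairs of 0-dimensional structures.
bars = [(0.0, 1.0), (0.40, 0.45), (0.30, 0.305), (0.10, 0.90)]

print(len(filter_barcode(bars, 0.001)))  # 4
print(len(filter_barcode(bars, 0.1)))    # 2
```

With the lower threshold, the two short-lived bars survive and would have to be matched as noise structures; with the higher one, only the two long-lived bars remain.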

Figure 3: Illustration of persistent homology for a CP likelihood map $f$ (A). Panel (B) illustrates the progressive filtration of $f$ with a minimum persistence (mp) of 0.001 (top) and 0.1 (bottom). Topological structures are reported into a persistence barcode. Panel (C) shows the persistence diagram for both mp values. In the persistence barcodes and diagrams, blue bars and blue dots, respectively, represent the connected components. Similarly, red elements represent the 1-dimensional holes.

2.1.3 Topological loss function for fetal brain MRI

The topological loss function aims at directly comparing the persistent homology of the predicted likelihood map $f$ to the true target topology. We propose a topological loss function $\mathcal{L}_{topo}$ that is adapted from [Hu et al., 2019]. Our contribution lies in the multi-dimensional approach of the topological loss computation. While our focus in this work is on the 1-dimensional holes, we showed in Section 2.1.2 the importance the connected components can have in the persistent homology of the fetal CP. Therefore, unlike [Hu et al., 2019], which only considered 1-dimensional structures (i.e. 2D holes), our topological loss for the fetal CP segmentation additionally integrates the analysis of 0-dimensional homology structures.

We detail here the computation of $\mathcal{L}_{topo}$ between two 2D image patches, the target segmentation and the predicted likelihood map $f$ (see Figure 4 (A)); hence the dimensions involved are $k\in\{0;1\}$. First, persistent homology is computed, tracking all $k$-dimensional structures in both images (see in Figure 4 (B) the persistence barcodes partitioned by dimension). Second, the per-dimension persistence diagrams are matched between the ground truth and the prediction (Figure 4 (C)). In each dimension $k$, the $N$ most persistent predicted structures are matched to the $N$ ground truth structures, and the others are matched to the diagonal. Note that a sufficiently accurate likelihood is needed to prevent mismatched structures. Based on the implementation in [Hu et al., 2019] (https://github.com/HuXiaoling/TopoLoss), we compute an adapted Wasserstein distance [Cohen-Steiner et al., 2010] from the matched pairs of critical points in each dimension. The $k$-dimensional distance is our $\mathcal{L}_{topo-k}$, the $k$-dimensional topological loss function. Ultimately, the $\mathcal{L}_{topo-k}$ are combined such that:

$$\mathcal{L}_{topo}=\sum_{k=0}^{K}\omega_{k}\mathcal{L}_{topo-k}$$ (2)

where $\mathcal{L}_{topo-k}$ is the topological loss of the $k$-dimensional space, with a contribution weight of $\omega_{k}$.
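
The matching and the weighted sum of Equation 2 can be sketched on plain lists of critical points. This is a simplified illustration and not the implementation used in this work: the squared Euclidean cost, the projection of unmatched points onto their nearest diagonal point, and all function names are assumptions; ground truth structures are placed at $(0;1)$ as stated in Section 2.1.2 for binary images.

```python
def topo_loss_k(pred_pairs, n_true, true_point=(0.0, 1.0)):
    """Matching cost for one homology dimension (squared distances, assumed).

    pred_pairs: (birth, death) pairs from the predicted likelihood map.
    n_true:     number of ground truth structures, all at `true_point`.
    """
    # Rank predicted structures from most to least persistent.
    ranked = sorted(pred_pairs, key=lambda p: p[1] - p[0], reverse=True)
    matched, unmatched = ranked[:n_true], ranked[n_true:]
    tb, td = true_point
    # The most persistent structures are pulled toward the ground truth point.
    loss = sum((b - tb) ** 2 + (d - td) ** 2 for b, d in matched)
    # Remaining predicted structures are pushed to the nearest diagonal point.
    loss += sum((d - b) ** 2 / 2.0 for b, d in unmatched)
    # Ground truth structures without a partner are matched to the diagonal.
    loss += (n_true - len(matched)) * (td - tb) ** 2 / 2.0
    return loss

def topo_loss(pred_by_dim, n_true_by_dim, weights):
    """Weighted sum over homology dimensions, as in Equation 2."""
    return sum(w * topo_loss_k(pred_by_dim[k], n_true_by_dim[k])
               for k, w in enumerate(weights))

# Hypothetical diagrams: k = 0 (connected components) and k = 1 (holes).
pred = [[(0.0, 1.0), (0.4, 0.6)], [(0.2, 0.8)]]
print(topo_loss(pred, n_true_by_dim=[1, 1], weights=[1.0, 1.0]))
```

In this toy case, the spurious short-lived component in dimension 0 and the imperfect hole in dimension 1 both contribute to the loss, while the perfectly predicted component contributes nothing.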

Figure 4: Illustration of the topological loss computation process between a ground truth binary image and a likelihood map $f$ (A). (B) shows their corresponding persistence barcodes in each dimension ($k=0$ and $k=1$). (C) illustrates the per-dimension persistence diagram matching. (D) shows how the final $\mathcal{L}_{topo}$ is inferred, with $\omega_{0}=\omega_{1}=1$. In the persistence barcodes and diagrams, blue bars and blue dots, respectively, represent the connected components. Similarly, red elements represent the holes.

2.2 Segmentation framework

Our topological loss is an architecture-agnostic optimization function. In other words, it is independent of the deep-learning network architecture. In our segmentation framework, we use a state-of-the-art architecture, U-Net [Ronneberger et al., 2015], to compare different optimization methods. Two reference loss functions (Baseline and Hybrid) are implemented to evaluate the added value of our topological loss function (TopoCP) (see configurations in Section 2.2.3).

2.2.1 Model architecture

The well-established U-Net [Ronneberger et al., 2015] architecture is selected as it has recently proved its good accuracy in fetal brain MRI tissue segmentation [Khalili et al., 2019, Payette et al., 2021, Payette et al., 2022]. We use a 2D U-Net architecture that is composed of an encoding and a decoding path with skip connections. The encoding path in our study is composed of 5 repetitions of the following block: two 3x3 convolutional layers, followed by a rectified linear unit (ReLU) activation function and a 2x2 max-pooling downsampling layer. The number of feature maps is hence doubled after each block, from 32 to 512. In the expanding path, 2x2 upsampled encoded features are concatenated with the corresponding encoding path features, 3x3 convolved and passed through a ReLU. The network prediction is computed with a final 1x1 convolution. The number of trainable network parameters is 7,852,002.

2.2.2 Multiview patch-based approach

First, T2w images are masked in order to only consider intracranial space voxels in the CP segmentation. Second, to alleviate the computational cost due to the topological loss, sub-image patches of $64\times 64$ voxels are extracted from the 3D volume in the three orthogonal planes (axial, coronal and sagittal). Bringing information from the three orientations, our method thus implements a 2.5D, or multiview, patch-based approach. To increase the number of predictions per voxel, overlapping patches are extracted. Empirically, the step size of the sliding window for the patch extraction is set to 16 voxels. Finally, the intensities of all patches are standardized to have mean 0 and variance 1.
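
The sliding-window extraction and the intensity standardization can be sketched along one axis as follows. This is a minimal illustration with hypothetical helper names; how borders that do not fit a full window are handled is not stated here, so this sketch simply ignores them.

```python
from statistics import fmean, pstdev

def patch_starts(length, patch_size=64, step=16):
    """Start indices of overlapping windows that fit in an axis of `length` voxels."""
    if length < patch_size:
        return []
    return list(range(0, length - patch_size + 1, step))

def standardize(values):
    """Rescale a flat list of intensities to zero mean and unit variance."""
    mean, std = fmean(values), pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

print(patch_starts(128))  # [0, 16, 32, 48, 64]
print(standardize([1.0, 2.0, 3.0, 4.0]))
```

For a 2D slice, the patch grid is the Cartesian product of the start indices along both axes, and the same procedure is repeated for the axial, coronal and sagittal orientations.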

2.2.3 Training and optimization strategies

Input samples are randomly augmented at each epoch of the training phase. Extensive spatial (flipping and elastic deformation) and intensity-based (bias, blurring, gamma and noise) augmentations are performed. All augmentations have a probability of occurrence of 0.5, except flipping, which occurs with a probability of 0.2. Augmentations are performed with the TorchIO Python package (v0.18.75) [Pérez-García et al., 2021].

Two reference segmentation methods (Baseline and Hybrid) are trained with state-of-the-art optimization loss functions for fetal brain MRI, in order to compare with our method (TopoCP). Thus, we evaluate the following three configurations:

  • Baseline is trained using the distribution-based binary cross-entropy loss function $\mathcal{L}_{bce}$.

  • Hybrid is trained with a hybrid loss function combining the Dice loss $\mathcal{L}_{dice}$ and the binary cross-entropy loss $\mathcal{L}_{bce}$ such that:

    $$\mathcal{L}=\mathcal{L}_{bce}+\mathcal{L}_{dice}$$ (3)

    Such hybrid loss function proved efficient in multi-tissue fetal brain MRI segmentation, as it has been used by the Top 5 teams of the 2022 edition of the MICCAI FeTA challenge [Payette et al., 2022].

  • TopoCP is trained with the following loss combination:

    $$\mathcal{L}=(1-\lambda_{topo})\mathcal{L}_{bce}+\lambda_{topo}\mathcal{L}_{topo}$$ (4)

    where $\mathcal{L}_{bce}$ is the binary cross-entropy loss and $\mathcal{L}_{topo}$ is the topological term presented in Section 2.1.3. $\lambda_{topo}$ defines the weight of the contribution of $\mathcal{L}_{topo}$ in the final loss.
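
The three loss configurations above can be sketched on flat lists of labels and probabilities. This is an illustrative sketch only: the soft Dice formulation, the clipping constant and the function names are assumptions, and the topological term is passed in as a precomputed number rather than computed from persistent homology.

```python
import math

def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over flat lists of labels and probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)

def dice_loss(y_true, y_prob, eps=1e-7):
    """Soft Dice loss (assumed formulation): 1 - 2*overlap / (sum of both)."""
    inter = sum(t * p for t, p in zip(y_true, y_prob))
    return 1.0 - (2.0 * inter + eps) / (sum(y_true) + sum(y_prob) + eps)

def hybrid_loss(y_true, y_prob):
    """Equation 3: unweighted sum of both terms."""
    return bce_loss(y_true, y_prob) + dice_loss(y_true, y_prob)

def topocp_loss(y_true, y_prob, topo_term, lambda_topo):
    """Equation 4; `topo_term` is a precomputed topological loss value."""
    return ((1.0 - lambda_topo) * bce_loss(y_true, y_prob)
            + lambda_topo * topo_term)

labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.6, 0.8]
print(bce_loss(labels, probs), hybrid_loss(labels, probs))
print(topocp_loss(labels, probs, topo_term=0.1, lambda_topo=0.5))
```

Setting `lambda_topo` to 0 recovers the Baseline configuration, which makes the role of the weighting explicit.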

As the computation of the topological loss is expensive and needs sufficiently accurate probability maps to perform a relevant matching of the structures (see Section 2.1.3), we adopt the training strategy presented in [Hu et al., 2019]: 1) a common warm-up network is trained over 15 epochs using the binary cross-entropy loss $\mathcal{L}=\mathcal{L}_{bce}$; 2) Baseline, Hybrid and TopoCP are initialized with the pretrained warm-up weights. An early stopping strategy monitors the global validation loss $\mathcal{L}$ for the Baseline and Hybrid configurations, and the topological validation loss $\mathcal{L}_{topo}$ for our TopoCP configuration. All learning rates are set to 0.01. Training and evaluation were performed with TensorFlow v2.7 [Martin Abadi et al., 2015] and a GeForce RTX 2080TI GPU.

A 4-fold cross-validation approach is adopted to assess the learning performance of the different methods. In this way, we assess multiple values of $\lambda_{topo}$ in order to determine an optimal one (see Section 3.3.1).

2.2.4 Ensemble learning

In order to reduce the variance and increase the generalization power of our model, we adopt ensemble learning. In the final testing inference phase of each configuration, we perform a majority voting on the summed likelihoods from all 4 cross-validation networks. Finally, although the cortex theoretically consists of two connected components (left and right hemispheres), in practice, partial volume effects in the mid-sagittal plane most often lead to a single component. Thus, only the biggest connected component of the whole cortical volume ensemble prediction is kept.
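
The ensemble step can be sketched on small 2D maps. This is an illustration under stated assumptions: the vote is read here as keeping pixels whose summed likelihood exceeds half the number of networks, the component filter uses 8-connectivity in 2D whereas the actual prediction is a 3D volume, and the function names are hypothetical.

```python
from collections import deque

def ensemble_vote(likelihood_maps):
    """Sum per-network likelihoods; keep pixels above half the network count."""
    n = len(likelihood_maps)
    rows, cols = len(likelihood_maps[0]), len(likelihood_maps[0][0])
    return [[1 if sum(m[r][c] for m in likelihood_maps) > n / 2 else 0
             for c in range(cols)] for r in range(rows)]

def largest_component(mask):
    """Keep only the largest 8-connected foreground component of a 2D mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny in range(max(y - 1, 0), min(y + 2, rows)):
                    for nx in range(max(x - 1, 0), min(x + 2, cols)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    out = [[0] * cols for _ in range(rows)]
    for y, x in best:
        out[y][x] = 1
    return out

# Hypothetical likelihood maps from the 4 cross-validation networks.
maps = [[[0.9, 0.8, 0.2, 0.1, 0.7]],
        [[0.8, 0.6, 0.4, 0.2, 0.9]],
        [[0.7, 0.4, 0.1, 0.0, 0.6]],
        [[0.9, 0.7, 0.3, 0.1, 0.4]]]
voted = ensemble_vote(maps)
print(voted)                     # [[1, 1, 0, 0, 1]]
print(largest_component(voted))  # [[1, 1, 0, 0, 0]]
```

The isolated pixel on the right passes the vote but is discarded by the connected component filter.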

2.2.5 TopoCP parameters setting

As introduced in Section 2.1.2, persistent homology structures are filtered on a minimum persistence criterion. This criterion tunes the sensitivity of our loss to the noisy structures that survive the filtration step. Empirically, we observed that a higher minimum persistence makes the filtration stricter and may thus discard relevant structures. Conversely, reducing the persistence threshold produces an increasingly large number of noisy, irrelevant structures to be matched. Based on this empirical analysis, we set our minimum persistence to 0.01 for all experiments.

Equation 2 presents our global topological loss, in which different contributions can be assigned to each dimension. We analyzed the importance of both 0-dimensional and 1-dimensional topological terms on a reduced set of randomly sampled patches. Empirically, we observe that the importance of the 0-dimensional and 1-dimensional topological terms is patch-dependent. In some patches, $\mathcal{L}_{topo-0}$ is more affected than $\mathcal{L}_{topo-1}$, and vice versa in others. We therefore decided to give equal contribution to both terms, as both are important. In other words, all dimensions $k$ had the same weighting $\omega_{k}=1$.

Finally, the contribution of our global topological loss is weighted by $\lambda_{topo}$. We describe in Section 3.3.1 the cross-validation approach implemented to determine an optimal value.

2.3 Topological assessment of the fetal CP

Recent works [Yeghiazaryan and Voiculescu, 2018, Maier-Hein et al., 2022] evidenced the importance of considering complementary metrics for the assessment of semantic segmentation methods. Specifically, in this work, segmentation should be assessed for its closeness to the target topology.

We presented in Section 2.1.1 the BNk that define global features of a 3D binary image. To quantitatively compare the topology of binary images, we introduce the $k$-dimensional Betti number error (BNEk) as the absolute difference between the expected ground truth value and the value measured on the prediction. As the CP segmentation is filtered for its biggest connected component (see Section 2.2.4), BNE0 is irrelevant to consider. Additionally, this work specifically focuses on the presence of 1-dimensional holes. We therefore only consider BNE1. Besides, considering that the human cortex is a closed structure with no obstruction, its expected ground truth BN1 is 0.

While BNE1 focuses on the count of occlusions on the CP surface, this score does not provide any information on the holes themselves. Indeed, we often observe that the number of holes, a discrete value, is not necessarily correlated with the size of the broken connections (i.e. the obstruction) nor with the size of the structure of interest (i.e. the fetal CP).

In that respect, we introduce a new metric that aims to quantify the size of the holes. The hole ratio (HR) is the ratio of false negative voxels that are connected to a hole ($FN_{holes}$) over the true voxels of the region of interest, which are represented by the true positives ($TP$) and false negatives ($FN$).

$$HR=\frac{FN_{holes}}{TP+FN}$$ (5)

Our in-house implementation (available online, see Section 2.4), illustrated in Figure 5, successively identifies the location of one voxel per 1-dimensional hole, propagates these voxels into the mask of hole candidates, i.e. the FN, and finally computes the volume ratio presented in Equation 5. Note that this measure strongly relies on the topological correctness of the ground truth.
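
The propagation and ratio steps of this workflow can be sketched on 2D masks. This is not the in-house implementation: the hole locations are assumed to be given as seed pixels lying inside the false negative mask (their detection relies on persistent homology and is not reproduced), the propagation uses an 8-connected flood fill, and the function name is hypothetical.

```python
from collections import deque

def hole_ratio(gt, pred, seeds):
    """HR = FN_holes / (TP + FN) for 2D binary masks (lists of lists of 0/1).

    seeds: one (row, col) location per detected hole, assumed to be given.
    """
    rows, cols = len(gt), len(gt[0])
    # False negatives: ground truth pixels missed by the prediction.
    fn = [[1 if gt[r][c] and not pred[r][c] else 0 for c in range(cols)]
          for r in range(rows)]
    # Propagate each seed through the false negative mask.
    queue = deque(s for s in seeds if fn[s[0]][s[1]])
    reached = set(queue)
    while queue:
        y, x = queue.popleft()
        for ny in range(max(y - 1, 0), min(y + 2, rows)):
            for nx in range(max(x - 1, 0), min(x + 2, cols)):
                if fn[ny][nx] and (ny, nx) not in reached:
                    reached.add((ny, nx))
                    queue.append((ny, nx))
    n_true = sum(map(sum, gt))  # TP + FN: every ground truth pixel
    return len(reached) / n_true if n_true else 0.0

gt = [[1, 1, 1, 1, 1, 1]]
pred = [[1, 0, 0, 1, 0, 1]]
print(hole_ratio(gt, pred, seeds=[(0, 1)]))
```

In this toy example, only the two false negatives connected to the seed are counted; the isolated false negative further right is not attributed to the hole.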

Figure 5: Illustration of the workflow for the computation of the Hole Ratio (HR).

Figure 6(a) shows (B) and (C), two 3D renderings of the CP segmentation of the same image patch. (A) shows the 3D rendering of the ground truth, highlighting the region of interest. One can easily observe that segmentation (B) has one main hole, whereas segmentation (C) presents multiple medium-size holes. Additionally, quantitative results (see Table 6(b)) confirm the discrepancy between the quantity of holes as a number and the quantity of holes as a ratio of the region of interest.

Although BNE1 must be used with caution, it is still a relevant score to assess the CP topology in the absence of a topologically accurate ground truth segmentation. Nonetheless, we promote the use of an additional quantitative metric relative to the size of the occlusions to undertake a robust quantitative analysis of the broken connections in the CP segmentation.

Both topology-based measures (BNE1 and HR) rely on the cubical complex implementation of the GUDHI library (v3.5.0) [noa,].

(a) 3D rendering of (A) the ground truth subject cortical plate (dark green) and region of interest (light green), and two different segmentations (B) and (C).

| Metric | (B) | (C) |
|---|---|---|
| BNE1 | 1 | 4 |
| HR | 0.36 | 0.33 |

(b) Table of the topology-based metrics (BNE1: 1-dimensional Betti number error; HR: hole ratio) for the example cortical plate segmentations (B) and (C) shown in Figure 6(a).
Figure 6: Illustration of the discrepancy between the quantification of holes as a number and its quantification as a ratio over the region of interest.

2.4 Code availability

The implementation of the Baseline, Hybrid and TopoCP models, including the optimization loss functions, can be found in our GitHub repository (https://github.com/Medical-Image-Analysis-Laboratory/FetalCP_segmentation). The weights of the trained model are made available to perform inference. The implementation of the topology-based evaluation metric is also available at this link.

3 Experiment design

The overall experiment design to compare the three segmentation frameworks Baseline, Hybrid and our TopoCP is outlined in Figure 7.

Figure 7: Illustration of the overall experiment design. Panel (A) illustrates the different datasets and their splitting for training/testing purposes (see Section 3.1). Panel (B) illustrates the training phase. A common warm-up network is trained to initialize the three configurations Baseline, Hybrid and TopoCP, each optimized with its own optimization loss function (see Section 2.2). A cross-validation approach is used to determine an optimal hyperparameter $\lambda_{topo}$ (see Section 3.3.1). Panel (C) illustrates the testing phase. Predictions are inferred through the cross-validation networks and combined in a majority voting step. Methods are assessed and compared quantitatively with complementary performance metrics (see Sections 3.3.2, 3.3.3, 3.3.4, 3.3.5 and 3.3.6) and qualitatively by three experts (see Section 3.3.7).
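The majority voting step of panel (C) can be illustrated with a short voxel-wise sketch. It assumes that each cross-validation network outputs a binary mask of identical shape; the function name and the tie-breaking rule (ties go to background, which matters with an even number of networks) are assumptions of this example, not a description of the released code.

```python
import numpy as np


def majority_vote(predictions):
    """Voxel-wise majority vote over a list of binary masks of identical shape."""
    stack = np.stack([np.asarray(p, dtype=bool) for p in predictions])
    votes = stack.sum(axis=0)
    # Strict majority: ties (possible with an even number of networks)
    # are resolved to background, which is an assumption of this sketch.
    return votes > (len(predictions) / 2)
```

With four networks, a voxel is kept only when at least three of them label it as cortical plate.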

3.1 Datasets

A summary of clinical and atlas datasets is shown in Table 1.

| Phase | Dataset | Number of subjects (Neurotypical / Pathological) | Gestational age (weeks) | SR reconstruction | Image resolution (mm$^3$) |
|---|---|---|---|---|---|
| Training | FeTA [Payette et al., 2021] | 24 (13/11) | [20.9-34.8] (28.2 ± 3.6) | Simple-IRTK [Kuklisova-Murgasova et al., 2012] | $0.86^{3}$ |
| Testing | FeTA [Payette et al., 2021] | 9 (4/5) | [22.9-34.8] (27.4 ± 3.6) | Simple-IRTK [Kuklisova-Murgasova et al., 2012] | $0.86^{3}$ |
| | STA [Gholipour et al., 2017] | 18 (18/0) | [21-38] | Gholipour et al., 2017 [Gholipour et al., 2017] | $0.80^{3}$ |
| | CHUV | 33 (24/9) | [21-35] (29.6 ± 3.6) | MIALSRTK [Tourbier et al., 2019] | $0.80^{3}$ |
Table 1: Summary of the data used for training and quantitative and qualitative evaluation.

3.1.1 Clinical dataset: FeTA

We use the subset of the publicly available Fetal Tissue Annotation and Segmentation Dataset (FeTA v2.0) [Payette et al., 2021] with Simple-IRTK [Kuklisova-Murgasova et al., 2012] SR reconstructions at an isotropic resolution of 0.86 mm. After visual inspection of the images, seven volumes were excluded due to bad SR quality (3) and severe pathology (7) (e.g. major ventriculomegaly). The remaining 33 fetal brains were composed of 17 neurotypical and 16 pathological subjects, in the gestational age (GA) range of 20.9 to 34.8 weeks. Twenty-four (24) subjects (13 neurotypical and 11 pathological subjects in the GA range of 20.9 to 34.8 weeks, 28.2 ± 3.6) were randomly selected for the method development and the remaining nine (9) subjects (4 neurotypical and 5 pathological subjects in the GA range of 22.9 to 34.8 weeks, 27.4 ± 3.6) were retained for pure testing purposes. Note that details on the fetal brain pathologies are not disclosed in the dataset information.

Manual label annotations of the intracranial space tissues classified into seven categories (extra-axial cerebrospinal fluid spaces, the cortical gray matter (GM), the white matter, the ventricular system (lateral, third and fourth ventricles), the cerebellum, the deep gray matter and the brainstem) are provided for all SR reconstructed volumes. In this work, we exclusively consider the cortical GM label. Annotations were manually performed following an optimized protocol. Two experts respectively annotated the external border of the cortex cerebri and the external border of the white matter, on every 2nd to 3rd slice of the axial view. Individual structure annotations are post-processed with interpolation and smoothing prior to merging into a final label map. Ultimately, sparse interpolated annotations result in noisy label maps that often show topological inconsistencies. Figure 8 shows the extracted cortical GM from the final label maps (left) for (A) Subject 1, a 34.8 weeks of GA neurotypical subject and (B) Subject 2, a 28.1 weeks of GA pathological subject. Three-dimensional (3D) rendering evidences the presence of apertures in the final CP annotations.

As motivated in Section 2.3, topologically accurate ground truth segmentations are necessary to perform a valid topological assessment of an automatic method. In this respect, we perform further manual correction of the FeTA manual CP annotations. Four engineers refined the CP label maps of the 9 fetal brains of the clinical pure testing set. Editing of the label maps was performed using the ITK-SNAP [Yushkevich et al., 2006] software with a specific focus on the topological correctness and contour refinement of the label maps. Finally, all CP manual corrections were checked (and corrected if needed) by a pediatric radiologist with 17 years of experience. The right columns of panels (A) and (B) of Figure 8 show the corrected tissue annotations overlaid on the T2w image and their 3D rendering. In our further experiments, we refer to the corrected manual annotations as the ground truth.

The original FeTA dataset falls under the approval of the ethical committee of the Canton of Zurich, Switzerland (Decision numbers: 2017-00885, 2016-01019, 2017-00167).

Figure 8: Illustration of the FeTA original manual and the corrected ground truth CP segmentation for (A) Subject 1, a 34.8 weeks of GA neurotypical subject and (B) Subject 2, a 28.1 weeks of GA pathological subject. T2w images and CP segmentation overlaid (top) are respectively shown on an axial and coronal view for Subject 1 and Subject 2. 3D rendering is presented for all segmentations (bottom).

3.1.2 Atlas dataset: STA

The normative spatio-temporal atlas (STA) of the fetal brain [Gholipour et al., 2017] provides 3D high-quality isotropic volumes for all gestational ages between 21 and 38 weeks. Each atlas subject is constructed with the contribution of 6 to 23 SR-reconstructed individual fetal brains. The integration of multiple subjects per gestational age reduces the morphological variability. Therefore, the T2w atlas images appear smoother than clinical acquisitions.

The atlas comes with two label maps, respectively containing the cerebral tissue and structure labels, and a regional cortex parcellation. The initial tissue label maps of more than 50 classes are converted into 7-tissue label maps, matching those defined in the FeTA dataset. Minor adjustments are performed while synchronizing tissue and regional maps. For instance, voxels labelled as cortex in one map and as corpus callosum in the other are assigned to the white matter class. As opposed to the clinical dataset, atlas labels do not require further manual corrections as they were already manually refined and present decent topology [Gholipour et al., 2017].

This dataset is approved by the Boston Children’s Hospital Institutional Review Board and the Committee on Clinical Investigation and written informed consent was obtained from all participants.

3.1.3 Out-of-domain clinical dataset

Thirty-three fetal brain clinical MR examinations conducted in our institution at the Lausanne University Hospital (CHUV), Lausanne, Switzerland, were SR-reconstructed with the MIALSRTK pipeline [Tourbier et al., 2015, Tourbier et al., 2019]. SR volumes are further resampled to an isotropic resolution of 0.8 mm and an engineer coarsely realigned the volumes to the anatomical plane. This clinical set is composed of 24 neurotypical and 9 pathological subjects in the GA range of 21 to 35 weeks (29.6 ± 3.6). The quality of the SR reconstructions was assessed similarly as in [Khawam et al., 2021] into three categories: bad (non usable, very blurred), average (overall good with remaining partial volume effect/blurring), and excellent (good quality with no blurring). Overall, none of the clinical SR reconstructions is bad, 17 are average and 14 are excellent. No reference segmentations are available for this dataset.

The local ethics committee of the Canton of Vaud, Switzerland (CER-VD 2021-00124) approved the retrospective collection and analysis of MRI data and the prospective studies for the collection and analysis of the MRI data in presence of a signed form of either general or specific consent.

3.2 Assessment metrics

Although our segmentation framework infers the CP segmentation with a 2.5D multi-view strategy (2D image patches from the three orthogonal planes), we perform the quantitative evaluation in 3D, i.e. on the whole cortical volume. Automatic medical image segmentation requires the use of complementary metrics for the assessment of different aspects of the segmentation [Yeghiazaryan and Voiculescu, 2018, Maier-Hein et al., 2022]. Most commonly, overlap-based (e.g. Dice similarity coefficient, Jaccard similarity coefficient, Intersection over union) and distance-based (e.g. Xth percentile Hausdorff distance, average symmetric surface distance) metrics are reported [Yeghiazaryan and Voiculescu, 2018]. However, in this work we aim to assess the segmentation not only in terms of overlap and distance accuracy to the ground truth but also in terms of shape correctness. Therefore, we also consider topology-based metrics. Table 2 summarizes the metrics used in the training (for learning monitoring) and testing (for final evaluation) phases.

The Dice Similarity Coefficient (DSC) [Dice, 1945] is an overlap-based similarity metric. Robust to outliers, it is a widely-used metric to assess medical image segmentation accuracy. The Average Symmetric Surface Distance (ASSD) is the mean of the directed average surface distances [Yeghiazaryan and Voiculescu, 2018]. The latter is defined as the average of the distances of points from one surface to their closest points on the other object boundary. The ASSD is computed using the Python MedPy (https://loli.github.io/medpy/) implementation (v0.4.0).
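To make the two definitions concrete, the sketch below computes the DSC and a brute-force ASSD with NumPy only. It is not the MedPy implementation used in this work: the surface extraction rule (foreground voxels with at least one face-neighbour in the background), the function names and the default unit voxel spacing are assumptions of this example, and the pairwise distance matrix limits it to small volumes.

```python
import numpy as np


def dice(pred, gt):
    """Dice similarity coefficient between two binary masks."""
    pred, gt = np.asarray(pred, dtype=bool), np.asarray(gt, dtype=bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * (pred & gt).sum() / denom if denom else 1.0


def surface_points(mask, spacing):
    """Physical coordinates of foreground voxels touching the background
    (face connectivity); voxels on the array border count as surface."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        for shift in (1, -1):
            interior &= np.roll(padded, shift, axis=axis)
    border = padded & ~interior
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    return np.argwhere(border[core]) * np.asarray(spacing, dtype=float)


def assd(pred, gt, spacing=(1.0, 1.0, 1.0)):
    """Average symmetric surface distance (brute force, small volumes only)."""
    a, b = surface_points(pred, spacing), surface_points(gt, spacing)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Mean of the two directed average surface distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

Passing the voxel size as `spacing` (e.g. `(0.86, 0.86, 0.86)`) returns the ASSD in millimetres rather than in voxels.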

In the absence of a topologically accurate ground truth (see Section 3.3.1 for cross-validation details), we consider BNE1 to quantitatively assess the topology, while our proposed hole ratio HR is used in the pure testing phase (see details on topology metrics in Section 2.3).

Arrows in Table 2 indicate whether each metric is better maximized or minimized. Taking values between 0 and 1, DSC is a similarity metric that is better maximized (↑). Difference metrics (ASSD, BNE1 and HR) must be minimized (↓).

| Phase | Overlap | Boundary-distance | Topology |
|---|---|---|---|
| Training | DSC ↑ | ASSD ↓ | BNE1 ↓ |
| Testing | DSC ↑ | ASSD ↓ | HR ↓ |

Table 2: Summary of the metrics used during the training phase (for learning monitoring) and testing phase (for evaluation). Arrows indicate whether higher ↑ or lower ↓ scores are better.

3.3 Experiments

3.3.1 $\mathcal{L}_{topo}$ parameter settings

Our first experiment consists in setting the TopoCP $\lambda_{topo}$ parameter, which weights the contribution of our topological loss. As mentioned in Section 2.2.3, we use a cross-validation approach in which we assess multiple $\lambda_{topo}$ values in order to determine an optimal one. The ideal $\lambda_{topo}$ is a dataset-dependent hyperparameter. According to [Hu et al., 2019], $\lambda_{topo}$ must be chosen to avoid over-weighting $\mathcal{L}_{topo}$ relative to $\mathcal{L}_{bce}$. Indeed, while $\mathcal{L}_{bce}$ is defined at every voxel of the image, $\mathcal{L}_{topo}$ is only defined at some critical points. The values 0.0002, 0.005, 0.001, 0.05, 0.1 and 0.2 are the $N$ $\lambda_{topo}$ evaluated in the training phase. $TopoCP_{n}$ defines the set of 4 networks trained for cross-validation with $\lambda_{n}$. We consider DSC, ASSD and BNE1 for evaluation. The average performances over the folds are computed for each $TopoCP_{n}$ and ranked for each metric. The $TopoCP_{n}$ are finally ranked, based on the sum of the metric-wise rankings, to elect the optimal $\lambda_{topo}$. The latter is then used in the following experiments.
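The rank-sum election described above can be sketched in a few lines. The scores are the mean validation values reported in Table 3; the helper names are hypothetical, and the DSC ranks are copied from Table 3 rather than recomputed because the rounded DSC means contain a tie.

```python
def ranks_from_scores(scores, higher_is_better):
    """Rank candidates from 1 (best) to N (worst); assumes no tied scores."""
    order = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {candidate: rank for rank, candidate in enumerate(order, start=1)}


def select_by_rank_sum(*metric_ranks):
    """Sum the metric-wise ranks and return (best candidate, rank sums)."""
    totals = {c: sum(r[c] for r in metric_ranks) for c in metric_ranks[0]}
    return min(totals, key=totals.get), totals


# Mean validation scores reported in Table 3, keyed by lambda_topo
assd = {0.0002: 0.274, 0.001: 0.270, 0.005: 0.269, 0.01: 0.268, 0.2: 0.272}
bne1 = {0.0002: 22.1, 0.001: 21.0, 0.005: 20.6, 0.01: 22.5, 0.2: 21.6}
# DSC ranks are copied from Table 3 because the rounded means contain a tie
dsc_ranks = {0.0002: 5, 0.001: 2, 0.005: 3, 0.01: 1, 0.2: 4}

best_lambda, rank_sums = select_by_rank_sum(
    dsc_ranks,
    ranks_from_scores(assd, higher_is_better=False),
    ranks_from_scores(bne1, higher_is_better=False),
)
print(best_lambda, rank_sums)
```

On these values the smallest rank sum (6) is obtained for $\lambda_{topo}=0.005$, while 0.001 and 0.01 share the second place with a rank sum of 7, in line with the final ranking column of Table 3.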

3.3.2 Methods comparison

We compare our TopoCP method to the two reference segmentation methods Baseline and Hybrid on both the clinical and atlas test sets. We assess the three complementary metrics DSC, ASSD and HR. We perform paired Wilcoxon rank-sum tests to assess the statistical significance between TopoCP and the two reference configurations. Significance level is set to 0.05.
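As an illustration of a paired comparison between two configurations, the sketch below computes an exact two-sided p-value by enumerating all sign assignments. It assumes that the paired variant of the Wilcoxon test is the signed-rank test; this is an interpretation made for the example, not a statement about the statistical software used in this work, and the function name is hypothetical.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Two-sided exact p-value of the Wilcoxon signed-rank test for paired
    samples (zero differences dropped, tied magnitudes get average ranks)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Average ranks of the absolute differences
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    # Enumerate all 2^n sign assignments (tractable for 9 or 18 subjects)
    extreme = sum(
        abs(sum(r for r, s in zip(ranks, signs) if s) - centre) >= observed - 1e-12
        for signs in product((0, 1), repeat=n)
    )
    return extreme / 2 ** n
```

With 9 paired subjects, the smallest attainable two-sided p-value is 2/512 (about 0.004), reached when one method is better on every subject, so significance at the 0.05 level remains achievable on the FeTA test set.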

3.3.3 GA analysis

The STA set presents a large and steady range of GA with one subject per week from 21 to 38 weeks of GA. We observe the quantitative performances of DSC, ASSD and HR along gestation, i.e. as a function of the subject GA.

3.3.4 Spatial topological analysis

From the STA set, we group the regional labels into 5 classes corresponding to the brain lobes, namely the frontal lobe, the occipital lobe, the parietal lobe, the temporal lobe, and the insula lobe. Figure 9 shows 3D renderings of the final maps of the brain lobes for the atlas subjects of 21, 30 and 38 weeks of GA.
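Grouping regional labels into lobes amounts to a lookup-table relabelling, sketched below. The label identifiers in `REGION_TO_LOBE` are purely hypothetical placeholders; the actual identifiers depend on the label definition distributed with the atlas.

```python
import numpy as np

# Hypothetical mapping from regional parcellation label IDs to lobe IDs;
# the real IDs depend on the atlas label definition file.
LOBES = {1: "frontal", 2: "occipital", 3: "parietal", 4: "temporal", 5: "insula"}
REGION_TO_LOBE = {101: 1, 102: 1, 201: 2, 301: 3, 401: 4, 402: 4, 501: 5}


def group_into_lobes(parcellation, region_to_lobe=REGION_TO_LOBE):
    """Relabel a regional parcellation into lobe labels (0 = background)."""
    parcellation = np.asarray(parcellation)
    lobes = np.zeros_like(parcellation)
    for region_id, lobe_id in region_to_lobe.items():
        lobes[parcellation == region_id] = lobe_id
    return lobes
```

A lobe-wise metric can then be obtained by restricting both the ground truth and the prediction to the voxels where the returned map equals a given lobe ID, which is one possible way to set up the lobe-based analysis.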

We proceed to a lobe-based analysis of the HR topology metric to determine whether TopoCP benefits one, some, all or none of the lobes. We perform a paired Wilcoxon rank-sum test to assess the statistical significance between the TopoCP method and the reference configurations. The statistical significance level is set to 0.05.

Figure 9: 3D rendering of the CP for STA subjects of 21, 30 and 38 weeks of gestation. The cortical volumes are split into the 5 lobes of the brain: the frontal lobe (red), the occipital lobe (green), the parietal lobe (dark blue), the temporal lobe (yellow), and the insula lobe (light blue).

3.3.5 Group analysis: Neurotypical vs. Pathological

The FeTA testing set presents a good heterogeneity in the neurotypical (N=4) and pathological (N=5) subjects. In each group, we observe the variation of the DSC, the ASSD and the HR with TopoCP. No statistical analysis is performed due to the small sample sizes (groups of N=4 and N=5).

3.3.6 Manual annotations comparison

Ultimately, we evaluate the performances of TopoCP compared to the original FeTA manual annotations, using the topologically corrected segmentations as ground truth.

Note that these original FeTA annotations are sparse and interpolated, hence resulting in noisy references. Nevertheless, they are still used for training, as manual topological correction of 24 volumes would not be realistic (time/expertise effort). We quantitatively assess these segmentations with the DSC, the ASSD and the HR. The difference in segmentation correctness was tested with the paired Wilcoxon rank-sum test. The $p$-value level for statistical significance was set at 0.05.

3.3.7 Experts evaluation

Three experts in fetal brain MRI (two radiologists and one engineer) perform an independent and blind assessment of the three automatic segmentation methods on the clinical CHUV dataset. Each fetal brain MR exam is provided with the SR reconstructed volume, the subject’s GA at scan time, the subject’s group (i.e. Neurotypical or Pathological) and the three segmentations (from configurations Baseline, Hybrid and TopoCP) that are randomly anonymized with labels $A$, $B$ and $C$.

The experts are asked to rank the segmentation masks $A$, $B$ and $C$ as Best, Medium and Worst. Visualization of the images and their segmentations is done with the open-source ITK-SNAP [Yushkevich et al., 2006] software. Specifically, binary segmentations are visualized both in 2D, as an overlay on the T2w gray-scale SR images, and in 3D, with the ITK-SNAP integrated 3D viewer.

We assess the inter-rater reliability with the percentage agreement and an ordinal Gwet’s agreement coefficient (Gwet’s AC) that we interpret according to Altman’s benchmarking scale [Gwet, 2014]. We further consider a consensus evaluation as the majority voting of the experts’ evaluation.
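The percentage agreement and the consensus by majority voting can be sketched as follows. The percentage agreement is taken here as the fraction of cases with unanimous ratings, which is one possible definition, and the function names are hypothetical; Gwet’s AC itself is not reproduced in this sketch.

```python
from collections import Counter


def percentage_agreement(ratings):
    """Share of cases for which all raters give the same label."""
    return sum(len(set(case)) == 1 for case in ratings) / len(ratings)


def consensus(ratings):
    """Majority vote per case; None when no label has a strict majority."""
    result = []
    for case in ratings:
        label, count = Counter(case).most_common(1)[0]
        result.append(label if count > len(case) / 2 else None)
    return result
```

Each element of `ratings` is expected to hold the labels given by the three experts for one case and one rank, for instance the method each expert ranked as Best.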

4 Results and Discussion

4.1 $\lambda_{topo}$ hyper-parameter tuning

Table 3 shows the averaged validation scores of all three configurations, and specifically for each $\lambda_{topo}$ assessed in the TopoCP configuration. Our first observation is that, regardless of the value of the $\lambda_{topo}$ parameter, TopoCP performs better than both reference methods, as it reaches the best performances in all three complementary metrics. Overall, all $TopoCP_{n}$ give similar DSC (mean: 0.76) and ASSD (mean ± standard deviation: 0.27 ± 0.01) performances, although $\lambda_{topo}=0.01$ ranks highest for both the overlap- and boundary-distance-based scores. An increased inter-$TopoCP_{n}$ variability appears for the topology-based metric (BNE1), with mean scores from 20.6 to 22.5. In terms of the number of holes in the CP segmentation, $\lambda_{topo}=0.005$ gives the best performance. We observe a large BNE1 standard deviation for all $TopoCP_{n}$. Nonetheless, the best $\lambda_{topo}$ with respect to topology not only gives the minimum averaged BNE1, but also presents the smallest BNE1 standard deviation of 7.8 (range of standard deviations: 7.8 to 10). Therefore, $\lambda_{topo}=0.005$ is the most accurate and precise of the assessed $\lambda_{topo}$ in terms of topological fidelity. The substantial fluctuation in the topological metric shows the importance of the choice of the $\lambda_{topo}$ hyper-parameter.

Finally, our global ranking, derived from the metric-wise rankings, identifies $\lambda_{topo}=0.005$ as the ideal value. We observe that none of the extreme values considered (i.e. 0.0002 and 0.2) are in the top 3 best performing $\lambda_{topo}$. Therefore, although $\lambda_{topo}=0.005$ might not be the exact optimal $\lambda_{topo}$, it certainly falls in a relevant range and in the right order of magnitude.

| Configuration | $\lambda_{topo}$ | DSC ↑ | ASSD ↓ | BNE1 ↓ | Ranking ↓ |
|---|---|---|---|---|---|
| Baseline | | 0.748 ± 0.009 | 0.292 ± 0.02 | 29.8 ± 14.5 | |
| Hybrid | | 0.744 ± 0.004 | 0.297 ± 0.01 | 31 ± 13.4 | |
| TopoCP | 0.0002 | 0.758 ± 0.007 (5) | 0.274 ± 0.01 (5) | 22.1 ± 10 (4) | 5 |
| TopoCP | 0.001 | 0.761 ± 0.007 (2) | 0.270 ± 0.01 (3) | 21.0 ± 8.5 (2) | 2 |
| TopoCP | 0.005 | 0.760 ± 0.007 (3) | 0.269 ± 0.01 (2) | 20.6 ± 7.8 (1) | 1 |
| TopoCP | 0.01 | 0.762 ± 0.007 (1) | 0.268 ± 0.01 (1) | 22.5 ± 9.0 (5) | 2 |
| TopoCP | 0.2 | 0.760 ± 0.005 (4) | 0.272 ± 0.01 (4) | 21.6 ± 7.9 (3) | 4 |

Table 3: Table of the validation scores (mean ± standard deviation) of the Dice similarity coefficient (DSC), the average symmetric surface distance (ASSD) and the 1-dimensional Betti number error (BNE1). Arrows indicate whether the metric is better maximized ↑ or minimized ↓. The best scores between all $\lambda_{topo}$ are shown in bold. A ranking for each metric is shown in parentheses. The final ranking is formulated from the sum of metric-wise ranking scores. Baseline stands for the $\mathcal{L}_{bce}$ loss and Hybrid corresponds to $\mathcal{L}_{bce}+\mathcal{L}_{dice}$.

4.2 Methods comparison

Figure 10 illustrates the accuracy of the fetal CP segmentation for a pathological subject of 26.6 weeks of GA (Subject 1) and a neurotypical subject of 34.8 weeks of GA (Subject 2). The topologically corrected ground truth and the segmentations of the three configurations, Baseline, Hybrid and TopoCP, are compared. Qualitative 2D assessments (top rows) of the segmentations are presented as an overlay on the T2w image, on an axial view for Subject 1 and a coronal view for Subject 2. Additionally, 3D renderings of the CP segmentations are presented in the bottom rows. Overall, we observe that all configurations generate a thinner ribbon than the corrected ground truth. Specifically, TopoCP restores cortical connections that are broken in the Baseline and Hybrid segmentations (white arrows). The 3D rendering of the TopoCP CP segmentation seems to present fewer holes than the two reference configurations Baseline and Hybrid. In particular, for all methods and for young and old fetuses alike, the segmentation appears more challenging in the lower parts of the frontal and temporal lobes, although TopoCP seems to exhibit a more sensitive segmentation in these areas. TopoCP also appears to be more sensitive to the complexity of the CP morphology in older fetuses. In Subject 2, white circles show an improved segmentation in the hippocampal area and in the depth of a gyrification. White arrows show areas where the topological correctness is recovered with TopoCP, compared to the Baseline. In Subject 1, two connections are fixed in the frontal lobe, although one of them is already fixed in the Hybrid configuration.

Figure 10: Segmentation results on an axial view (top) and 3D rendering (bottom) of the cortical plate on FeTA subjects: (A) a neurotypical subject of 34.8 weeks of GA, and (B) a pathological subject of 28.1 weeks of GA. Comparison of (a) the manually corrected ground truth segmentation, (b) the results of the Baseline networks trained with $\mathcal{L}_{bce}$, (c) the results of the Hybrid configuration trained with $\mathcal{L}=\mathcal{L}_{bce}+\mathcal{L}_{dice}$ and (d) the results obtained with our custom method TopoCP trained with $\mathcal{L}=(1-\lambda_{topo})\mathcal{L}_{bce}+\lambda_{topo}\mathcal{L}_{topo}$, where $\lambda_{topo}=0.005$. White circles show representative areas where cortical gyrifications have a better in-depth segmentation with the TopoCP method. White arrows show fixed connections using our TopoCP method compared to the reference segmentations.

Quantitative results for both test sets (total of 27 cases), FeTA and STA, are respectively presented in Table 4(a) and Table 4(b). Tables show the mean ± standard deviation of the CP segmentation for each testing metric (DSC, ASSD and HR) in each configuration. Overall, the performance of the segmentation framework is improved when trained with our optimal TopoCP configuration (i.e. $\lambda_{topo}=0.005$) for all metrics in both datasets. TopoCP always performs significantly better than both the Baseline and Hybrid methods.

Furthermore, while all analyzed aspects of the segmentation, namely the overlap, the boundary-distance and the topology, are improved with TopoCP compared to the reference methods, we observe a drop in performance between the FeTA and STA evaluations. Indeed, DSC goes from 0.85 in FeTA to 0.79 in STA, ASSD from 0.19 to 0.41 and HR from 0.06 to 0.23. We believe this is due to the domain shift between FeTA and STA images (different reconstruction pipelines, different intensity-based processing, etc.), generating an inter-dataset variation in the data distribution. Such a domain gap between the two sets of images is not learned from the training data, which are only composed of FeTA images. Still, DL can generalize to some extent. Further training with multi-dataset images or the use of domain adaptation strategies could partially bridge the domain gap. Note that we do not address the data distribution generalization in this paper as it is beyond its scope. Furthermore, such a performance drop also occurs in the Baseline and Hybrid approaches.

| Configuration | DSC ↑ | ASSD ↓ | HR ↓ |
|---|---|---|---|
| Baseline | 0.82 ± 0.02 | 0.22 ± 0.05 | 0.093 ± 0.03 |
| Hybrid | 0.82 ± 0.02 | 0.23 ± 0.06 | 0.10 ± 0.04 |
| TopoCP | 0.85 ± 0.01 (*, +) | 0.19 ± 0.04 (*, +) | 0.06 ± 0.03 (*, +) |

(a) FeTA

| Configuration | DSC ↑ | ASSD ↓ | HR ↓ |
|---|---|---|---|
| Baseline | 0.77 ± 0.05 | 0.42 ± 0.14 | 0.25 ± 0.10 |
| Hybrid | 0.77 ± 0.05 | 0.42 ± 0.15 | 0.26 ± 0.11 |
| TopoCP | 0.79 ± 0.05 (*, +) | 0.41 ± 0.18 (*, +) | 0.23 ± 0.10 (*, +) |

(b) STA

Table 4: Tables of the metrics computed on the pure testing sets FeTA 4(a) and STA 4(b). Mean ± standard deviation for the Dice similarity coefficient (DSC), average symmetric surface distance (ASSD) and hole ratio (HR) are presented. Arrows indicate whether the metric is better maximized ↑ or minimized ↓. The best scores between all three configurations are shown in bold. $p$-values of the Wilcoxon rank-sum test between TopoCP and the reference configurations, Baseline (*) and Hybrid (+), are considered statistically significant for $p<0.05$.

4.3 Segmentation performance over gestation

Taking advantage of the steady GA distribution in the STA set, we perform an analysis of the metrics throughout gestation. Figure 11 shows the performance metrics as a function of the GA for 18 cases from 21 to 38 weeks of GA. Regardless of the configuration, we observe a trend in the performances based on the GA. Indeed, all metrics reach better performances for subjects younger than 30 weeks of GA. From week 23 to 31, all three methods (Baseline, Hybrid and TopoCP) seem to give equivalent scores. Outside this range (i.e. GA $<23$ and GA $>30$ weeks), TopoCP always performs better than the other two methods, except for one outlier subject of 38 weeks of GA.

We visually inspect the TopoCP segmentation mask of the STA 38-week-old subject to better understand the origin of this outlier. Specifically in the cerebellum, we observe the presence of false positives that are connected to the main cortical segmentation through brainstem false positives. The cerebellum is a ”little brain” composed of white matter encased in the cerebellar cortex. In terms of fetal brain T2w MR contrast and similarly to the CP, the cerebellar cortex appears as a thin dark ribbon surrounding white matter. Therefore, it is a challenging area to accurately differentiate in the segmentation of the fetal CP at a patch level. Similar mis-segmentations appear in younger fetuses; nevertheless, our post-processing step, which keeps the biggest connected component, filters out most of them. In this oldest subject, cerebellum errors are worsened by brainstem mis-segmentation. We hypothesize that such false positive errors in the cerebellum are due to misleading contextual information, caused by the reduced field of view of the patches. Therefore, we believe that increasing the patch size could help to overcome these mis-segmentations. Nevertheless, while the performances are particularly degraded in the distance metric, TopoCP still performs better in terms of DSC and HR compared to the other configurations.
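The post-processing step mentioned above, keeping the biggest connected component, can be sketched with a plain flood fill. The face connectivity and the function name are assumptions of this example; in practice a library routine such as `scipy.ndimage.label` would typically be preferred for full-size volumes.

```python
from collections import deque

import numpy as np


def largest_connected_component(mask):
    """Keep the biggest face-connected foreground component of a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    sizes = {}
    for seed in map(tuple, np.argwhere(mask)):
        if labels[seed]:
            continue                      # voxel already assigned to a component
        current = len(sizes) + 1
        labels[seed] = current
        queue, size = deque([seed]), 0
        while queue:
            voxel = queue.popleft()
            size += 1
            for axis in range(mask.ndim):
                for step in (-1, 1):
                    n = list(voxel)
                    n[axis] += step
                    n = tuple(n)
                    if 0 <= n[axis] < mask.shape[axis] and mask[n] and not labels[n]:
                        labels[n] = current
                        queue.append(n)
        sizes[current] = size
    if not sizes:
        return mask                       # empty mask: nothing to filter
    return labels == max(sizes, key=sizes.get)
```

This filter removes isolated false positives, but it cannot remove cerebellar false positives that are linked to the cortex through brainstem false positives, which is consistent with the behaviour observed for the 38-week subject.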

Overall, it is in the second half of the third trimester GAs (i.e. GA $>30$ weeks) that we observe an increased benefit from TopoCP, compared to other methods. Our topological loss has a stronger positive effect on the topological errors for old subjects with more complex topology, although the whole range of gestational age consistently benefits from TopoCP.

We derive two hypotheses on the variation of the performances throughout gestation. First, we recall that the training data present subjects in the range 20.9 to 34.8 weeks of GA with mean 28.2 and standard deviation 3.6. Therefore, the variations of young and old fetal brains are less represented in the learning process. Additionally, third trimester subjects present advanced sulcal patterns, resulting in a substantially more complex topology. Therefore, we postulate that this accentuates the unstable evolution of segmentation accuracy over gestation.

Figure 11: Evolution of the performance metrics (DSC: Dice similarity coefficient; ASSD: average symmetric surface distance; HR: holes ratio) on the STA images as a function of the gestational age (from 21 to 38 weeks of gestation).

4.4 Topology analysis per brain lobes

Figure 12 presents a comparison of HR at a lobe level between the configurations. This boxplot evidences the significant benefits ($p<0.05$) of TopoCP in most areas (frontal, occipital, temporal and insula lobes) compared to the Baseline configuration. In the parietal lobe, TopoCP performs better on average than the Baseline, although without statistical significance. Compared to the Hybrid configuration, TopoCP presents a significantly lower HR in all brain lobes. Regardless of the configuration, the parietal lobe is always the best segmented lobe in terms of HR, as opposed to the insula lobe.

Figure 12: Comparison of the hole ratio (HR) in the fetal brain lobes (frontal, occipital, parietal, temporal and insula) on the STA dataset for all configurations (Baseline, Hybrid and TopoCP). Dashed horizontal lines indicate the per-lobe mean HR for each configuration. $p$-values of paired Wilcoxon rank-sum tests are displayed comparing TopoCP to each reference method.

4.5 Groups analysis

Table 5 summarizes the performance metrics (mean ± standard deviation) in both neurotypical (NT) and pathological (PT) groups. We observe the same trend in this group-based analysis as in the overall method comparison (see Table 4 in Section 4.2). Regrettably, precise information on the pathology is not available in the FeTA dataset metadata. Therefore, we cannot draw any conclusion relative to possible cortical pathologies.

| Configuration | DSC ↑ (NT) | DSC ↑ (PT) | ASSD ↓ (NT) | ASSD ↓ (PT) | HR ↓ (NT) | HR ↓ (PT) |
|---|---|---|---|---|---|---|
| Baseline | 0.82 ± 0.01 | 0.82 ± 0.02 | 0.24 ± 0.07 | 0.21 ± 0.03 | 0.09 ± 0.04 | 0.09 ± 0.04 |
| Hybrid | 0.81 ± 0.02 | 0.82 ± 0.02 | 0.25 ± 0.07 | 0.22 ± 0.04 | 0.1 ± 0.05 | 0.1 ± 0.04 |
| TopoCP | 0.84 ± 0.01 | 0.85 ± 0.01 | 0.21 ± 0.06 | 0.18 ± 0.02 | 0.06 ± 0.03 | 0.06 ± 0.03 |

Table 5: Table of the metrics computed on the FeTA pure testing set for the neurotypical (NT) and pathological (PT) groups. Mean ± standard deviation for the Dice similarity coefficient (DSC), average symmetric surface distance (ASSD) and hole ratio (HR) are presented. Arrows indicate whether the metric is better maximized ↑ or minimized ↓. The best scores among the three configurations are shown in bold.

4.6 Robustness to noisy manual annotations

Figure 13 (top) shows a comparative T2w axial view of the topologically corrected ground truth segmentation (A), the original manual annotation provided in FeTA (B) and the TopoCP predicted segmentation (C). Overall, we observe an improved accuracy with the automatic method. Specifically, white arrows indicate cortical locations where TopoCP fixes topological inconsistencies, compared to the original manual annotations. The white circle focuses on the hippocampal area where the manual annotations are confused. Figure 13 (bottom) shows 3D renderings of the true cortical volume (green). In the manual and automatic TopoCP segmentations, the 1-dimensional holes are evidenced by the false negatives connected to 1-dimensional holes (light camel).

Table 6 shows a comparison of the performance metrics of the original FeTA manual annotations on 9 subjects and our TopoCP method. TopoCP is significantly better (*) than the original manual annotations in all metrics (DSC, ASSD and HR).

The automatic TopoCP segmentation method is able to learn segmentation features from noisy annotations. This improvement is reflected in all three overlap-, boundary-distance- and topology-based metrics.

Figure 13: Qualitative 2D and 3D assessment of CP segmentation on a 31.5 weeks of GA neurotypical subject. Comparison of (A) the corrected ground truth to (B) the original manual annotation and (C) our TopoCP automatic segmentation method. Segmentations are overlaid on a T2w axial view (top). Segmentation 3D renderings (bottom) highlight the true positives (green) and false negatives connected to 1-dimensional holes (light red).
Method | DSC ↑           | ASSD (mm) ↓     | HR ↓
TopoCP | 0.85 ± 0.01 (*) | 0.19 ± 0.04 (*) | 0.062 ± 0.03 (*)
Manual | 0.82 ± 0.02     | 0.23 ± 0.04     | 0.23 ± 0.06
Table 6: Metrics computed on the FeTA pure testing set. Mean ± standard deviation of the Dice similarity coefficient (DSC), average symmetric surface distance (ASSD) and hole ratio (HR) are presented. Arrows indicate whether a metric is better maximized (↑) or minimized (↓). The best scores between our TopoCP method and the original annotations are shown in bold. p-values of the Wilcoxon rank sum test between TopoCP and the original annotations are considered statistically significant (*) for p < 0.05.

4.7 Out-of-domain qualitative assessment

Table 7 summarizes how each expert classified the segmentation masks as Worst, Medium or Best. A consensus of the three experts’ assessments is presented in the bottom row.

The estimated Gwet’s AC between the three experts was 0.68 (standard error (SE): 0.10) for Worst, 0.68 (SE: 0.10) for Medium and 1 for Best segmentation classifications. According to Altman’s benchmarking scale, the estimated coefficients for Worst and Medium segmentations are considered to be either Moderate, Good or Very Good with a probability of 0.99. The Best segmentation classification has a Very Good agreement between the experts with a probability of 1. With a percentage agreement of 100%, the consensus of experts classifies TopoCP as the Best segmentation method in 100% of the cases. Inter-rater discrepancies are observed in the choice of Worst and Medium between the Baseline and Hybrid segmentations.
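As an illustration of how such an agreement coefficient can be obtained, the sketch below computes the multi-rater form of Gwet’s AC1 in plain Python. It is a minimal sketch and not the implementation used in this study: the function name `gwet_ac1`, the per-category organisation of the ratings and the toy ratings themselves are assumptions made for the example, and every case is assumed to be rated by at least two experts.

```python
from collections import Counter


def gwet_ac1(ratings, categories):
    """Gwet's AC1 for several raters (hypothetical helper, illustration only).

    ratings: list of per-case lists, one label per rater.
    categories: list of all labels a rater may choose.
    """
    n = len(ratings)
    q = len(categories)
    counts = [Counter(case) for case in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over cases.
    pa = 0.0
    for case, c in zip(ratings, counts):
        r = len(case)
        pa += sum(c[k] * (c[k] - 1) for k in categories) / (r * (r - 1))
    pa /= n

    # Chance agreement, based on the average prevalence of each category.
    prevalence = {
        k: sum(c[k] / len(case) for case, c in zip(ratings, counts)) / n
        for k in categories
    }
    pe = sum(p * (1 - p) for p in prevalence.values()) / (q - 1)

    return (pa - pe) / (1 - pe)


methods = ["Baseline", "Hybrid", "TopoCP"]

# Toy data (NOT the study ratings): which method each of three experts
# ranked as Worst, for four cases.
toy_worst = [
    ["Baseline", "Baseline", "Baseline"],
    ["Baseline", "Baseline", "Hybrid"],
    ["Hybrid", "Hybrid", "Hybrid"],
    ["Baseline", "Hybrid", "Baseline"],
]
ac1_worst = gwet_ac1(toy_worst, methods)

# Unanimous ratings, as observed for the Best category, give a coefficient of 1.
toy_best = [["TopoCP", "TopoCP", "TopoCP"] for _ in range(4)]
ac1_best = gwet_ac1(toy_best, methods)
```

With unanimous ratings the chance-agreement term vanishes and the coefficient equals 1, which is consistent with the value reported for the Best category.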

Although TopoCP is ranked as the Best segmentation in all cases, its predictions still present many segmentation errors. We emphasize that the distribution of this clinical set differs from that of the FeTA training set, as the two were generated with different SR methods. Nevertheless, while all three configurations produce altered segmentations due to the domain shift, TopoCP remains the best-performing method.

Rater                | Best (Baseline / Hybrid / TopoCP) | Medium (Baseline / Hybrid / TopoCP) | Worst (Baseline / Hybrid / TopoCP)
Radiologist 1        | 0 / 0 / 31                        | 25 / 6 / 0                          | 6 / 25 / 0
Radiologist 2        | 0 / 0 / 31                        | 26 / 5 / 0                          | 5 / 26 / 0
Engineer             | 0 / 0 / 31                        | 23 / 8 / 0                          | 8 / 23 / 0
Percentage agreement | 100 %                             | 78 %                                | 78 %
Gwet’s AC (SE)       | 1 (-)                             | 0.68 (0.10)                         | 0.68 (0.10)
Consensus            | 0 / 0 / 31                        | 26 / 5 / 0                          | 5 / 26 / 0
Table 7: Qualitative ranking by the three experts (two radiologists and one engineer) of the three segmentation configurations (Baseline, Hybrid and TopoCP) as Best, Medium and Worst. Percentage agreement between experts and Gwet’s AC with standard error (SE) are presented for each ranking category. Finally, a consensus ranking, obtained by majority voting over the experts’ evaluations, is presented in the bottom row.
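The majority-voting consensus of Table 7 can be sketched as follows. This is only an illustrative sketch: the helper name `consensus` and the per-case votes are hypothetical and are not taken from the study data, and the handling of a three-way tie (returning `None`) is an assumption, since the tie-breaking rule is not specified here.

```python
from collections import Counter


def consensus(votes):
    """Return the label chosen by a strict majority of experts, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


# Hypothetical votes (radiologist 1, radiologist 2, engineer) naming the
# method ranked as Worst for three cases; not the study data.
worst_votes = [
    ("Hybrid", "Hybrid", "Baseline"),
    ("Baseline", "Hybrid", "Hybrid"),
    ("Baseline", "Baseline", "Baseline"),
]
worst_consensus = [consensus(v) for v in worst_votes]
```

With three experts and two candidate methods in practice (Baseline and Hybrid), a strict majority always exists, so the tie case only matters if all three methods receive one vote each.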

5 Conclusion

In this work, we developed a topological loss function for the optimization of deep-learning-based segmentation of the fetal cortical plate in MRI. Our core contribution lies in the multi-dimensional approach of this generalized loss function. Jointly, we presented an original topology-based metric that quantifies 1-dimensional topological errors in terms of both count and size. We presented extensive quantitative and qualitative validation on a total of 58 fetal brains covering a wide range of GA (from 21 to 38 weeks), including both neurotypical and pathological subjects. We compared our TopoCP method to (i) state-of-the-art methods and (ii) semi-automatic noisy reference segmentations. Experiments have shown that integrating a topological constraint in the segmentation framework of the CP in fetal brain MRI significantly benefits not only the shape correctness, which is its primary aim, but also the overlap and distance accuracy.

Although our segmentation framework is implemented for 2D image patches, 3D information is integrated through the multi-view pipeline, which extracts patches from the three orthogonal orientations (axial, coronal and sagittal). While our approach cannot be considered 3D, the benefit of our multi-dimensional topological loss is nonetheless conveyed in the 3D metrics, including the topology-based one. We believe that the adoption of a truly 3D framework could only improve the overall performance, although we acknowledge that the computational cost of the topological loss in such a setting is an important shortcoming. Moreover, results show that the generalization of the learned topology is not hampered by the noisiness of the manual annotations used for training.
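The multi-view idea mentioned above, in which 2D inputs are taken from three orthogonal orientations of the same volume, can be illustrated with a short sketch. It only shows the extraction of the three orthogonal slices through one voxel, not the patch sampling or the fusion of predictions used in our pipeline. The function name `orthogonal_slices`, the nested-list representation and the `volume[x][y][z]` indexing are assumptions made for the example; which axis corresponds to the axial, coronal or sagittal view depends on the image orientation.

```python
def orthogonal_slices(volume, i, j, k):
    """Return the three orthogonal 2D slices passing through voxel (i, j, k).

    volume is a nested list indexed as volume[x][y][z] (assumed layout).
    """
    slice_x = [row[:] for row in volume[i]]                     # x fixed: (y, z) plane
    slice_y = [plane[j][:] for plane in volume]                 # y fixed: (x, z) plane
    slice_z = [[row[k] for row in plane] for plane in volume]   # z fixed: (x, y) plane
    return slice_x, slice_y, slice_z


# Toy 2 x 3 x 4 volume whose voxel values encode their own coordinates.
toy_volume = [
    [[100 * x + 10 * y + z for z in range(4)] for y in range(3)]
    for x in range(2)
]
view_x, view_y, view_z = orthogonal_slices(toy_volume, 1, 2, 3)
```

Each of the three slices has a different shape (here 3 x 4, 2 x 4 and 2 x 3), which is why 2D patches of a fixed size are typically sampled within each view rather than using whole slices.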

This study is the first to address both the specific improvement of the topological correctness of CP segmentation and the definition of a topological assessment. The reduced gap in topological and shape correctness ultimately means that minimal manual refinement is needed for further quantitative surface-based analysis. Future work will focus on a wider generalization of our method. Indeed, while our method is formulated to consider multiple dimensions, we only present a 2D application; the overall framework can be generalized to a 3D-based approach. Similarly, we focused here on a single-tissue segmentation problem, namely the cortical GM, although the method could be generalized to multi-tissue segmentation.

6 Acknowledgments

This work is supported by the Swiss National Science Foundation through grants 182602 and 141283. We acknowledge access to the facilities and expertise of the CIBM Center for Biomedical Imaging, a Swiss research center of excellence founded and supported by Lausanne University Hospital (CHUV), University of Lausanne (UNIL), Ecole polytechnique fédérale de Lausanne (EPFL), University of Geneva (UNIGE) and Geneva University Hospitals (HUG). We acknowledge Dr Hélène Lajous and Andrés Le Boeuf for their help in the topology correction of the manual annotations.

The authors have no relevant financial or non-financial interests to disclose.

References

  • [noa, ] GUDHI, Geometry Understanding in Higher Dimensions.
  • [Barkovich et al., 2005] Barkovich, A. J., Kuzniecky, R. I., Jackson, G. D., Guerrini, R., and Dobyns, W. B. (2005). A developmental and genetic classification for malformations of cortical development. Neurology, 65(12):1873–1887.
  • [Caldairou et al., 2011] Caldairou, B., Passat, N., Habas, P., Studholme, C., Koob, M., Dietemann, J.-L., and Rousseau, F. (2011). Segmentation of the cortex in fetal MRI using a topological model. In 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pages 2045–2048, Chicago, IL, USA. IEEE.
  • [Clouchoux et al., 2013] Clouchoux, C., du Plessis, A. J., Bouyssi-Kobar, M., Tworetzky, W., McElhinney, D. B., Brown, D. W., Gholipour, A., Kudelski, D., Warfield, S. K., McCarter, R. J., Robertson, R. L., Evans, A. C., Newburger, J. W., and Limperopoulos, C. (2013). Delayed Cortical Development in Fetuses with Complex Congenital Heart Disease. Cerebral Cortex, 23(12):2932–2943.
  • [Clouchoux et al., 2012] Clouchoux, C., Guizard, N., Evans, A. C., du Plessis, A. J., and Limperopoulos, C. (2012). Normative fetal brain growth by quantitative in vivo magnetic resonance imaging. American Journal of Obstetrics and Gynecology, 206(2):173.e1–173.e8.
  • [Clough et al., 2021] Clough, J., Byrne, N., Oksuz, I., Zimmer, V. A., Schnabel, J. A., and King, A. (2021). A Topological Loss Function for Deep-Learning based Image Segmentation using Persistent Homology. IEEE Transactions on Pattern Analysis and Machine Intelligence, pages 1–1.
  • [Cohen-Steiner et al., 2010] Cohen-Steiner, D., Edelsbrunner, H., Harer, J., and Mileyko, Y. (2010). Lipschitz Functions Have L p -Stable Persistence. Foundations of Computational Mathematics, 10(2):127–139.
  • [Dice, 1945] Dice, L. R. (1945). Measures of the Amount of Ecologic Association Between Species. Ecology, 26(3):297–302.
  • [Dou et al., 2021] Dou, H., Karimi, D., Rollins, C. K., Ortinau, C. M., Vasung, L., Velasco-Annis, C., Ouaalam, A., Yang, X., Ni, D., and Gholipour, A. (2021). A Deep Attentive Convolutional Neural Network for Automatic Cortical Plate Segmentation in Fetal MRI. IEEE Transactions on Medical Imaging, 40(4):1123–1133.
  • [Ebner et al., 2020] Ebner, M., Wang, G., Li, W., Aertsen, M., Patel, P. A., Aughwane, R., Melbourne, A., Doel, T., Dymarkowski, S., De Coppi, P., David, A. L., Deprest, J., Ourselin, S., and Vercauteren, T. (2020). An automated framework for localization, segmentation and super-resolution reconstruction of fetal brain MRI. NeuroImage, 206:116324.
  • [Egaña-Ugrinovic et al., 2013] Egaña-Ugrinovic, G., Sanz-Cortes, M., Figueras, F., Bargalló, N., and Gratacós, E. (2013). Differences in cortical development assessed by fetal MRI in late-onset intrauterine growth restriction. American Journal of Obstetrics and Gynecology, 209(2):126.e1–126.e8.
  • [Fetit et al., 2020] Fetit, A., Alansary, A., Cordero-Grande, L., Cupitt, J., Davidson, A., Edwards, A., Hajnal, J., Hughes, E., Kamnitsas, K., Kyriakopoulou, V., Makropoulos, A., Patkee, P., Price, A., Rutherford, M., and Rueckert, D. (2020). A deep learning approach to segmentation of the developing cortex in fetal brain MRI with minimal manual labeling. pages 241–261. PMLR.
  • [Garcia et al., 2018] Garcia, K. E., Kroenke, C. D., and Bayly, P. V. (2018). Mechanics of cortical folding: stress, growth and stability. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1759):20170321.
  • [Garel, 2004] Garel, C. (2004). MRI of the Fetal Brain. Springer Berlin Heidelberg, Berlin, Heidelberg.
  • [Garel et al., 2003] Garel, C., Elmaleh, M., Chantrel, E., Sebag, G., and Brisse, H. (2003). Fetal MRI: normal gestational landmarks for cerebral biometry, gyration and myelination. Child’s Nervous System, 19(7-8):422–425.
  • [Gholipour et al., 2014] Gholipour, A., Estroff, J. A., Barnewolt, C. E., Robertson, R. L., Grant, P. E., Gagoski, B., Warfield, S. K., Afacan, O., Connolly, S. A., Neil, J. J., Wolfberg, A., and Mulkern, R. V. (2014). Fetal MRI: A Technical Update with Educational Aspirations. Concepts in Magnetic Resonance. Part A, Bridging Education and Research, 43(6):237–266.
  • [Gholipour et al., 2010] Gholipour, A., Estroff, J. A., and Warfield, S. K. (2010). Robust Super-Resolution Volume Reconstruction From Slice Acquisitions: Application to Fetal Brain MRI. IEEE Transactions on Medical Imaging, 29(10):1739–1758.
  • [Gholipour et al., 2017] Gholipour, A., Rollins, C. K., Velasco-Annis, C., Ouaalam, A., Akhondi-Asl, A., Afacan, O., Ortinau, C. M., Clancy, S., Limperopoulos, C., Yang, E., Estroff, J. A., and Warfield, S. K. (2017). A normative spatiotemporal MRI atlas of the fetal brain for automatic segmentation and analysis of early brain growth. Scientific Reports, 7(1):476.
  • [Griffiths et al., 2010] Griffiths, P., Reeves, M., Morris, J., Mason, G., Russell, S., Paley, M., and Whitby, E. (2010). A Prospective Study of Fetuses with Isolated Ventriculomegaly Investigated by Antenatal Sonography and In Utero MR Imaging. American Journal of Neuroradiology, 31(1):106–111.
  • [Griffiths et al., 2019] Griffiths, P. D., Bradburn, M., Campbell, M. J., Cooper, C. L., Embleton, N., Graham, R., Hart, A. R., Jarvis, D., Kilby, M. D., Lie, M., Mason, G., Mandefield, L., Mooney, C., Pennington, R., Robson, S. C., and Wailoo, A. (2019). MRI in the diagnosis of fetal developmental brain abnormalities: the MERIDIAN diagnostic accuracy study. Health Technology Assessment (Winchester, England), 23(49):1–144.
  • [Gwet, 2014] Gwet, K. L. (2014). Handbook of inter-rater reliability: the definitive guide to measuring the extent of agreement among raters. Advanced Analytics, LLC, Gaithersburg, Md, fourth edition.
  • [Hong et al., 2020] Hong, J., Yun, H. J., Park, G., Kim, S., Laurentys, C. T., Siqueira, L. C., Tarui, T., Rollins, C. K., Ortinau, C. M., Grant, P. E., Lee, J.-M., and Im, K. (2020). Fetal Cortical Plate Segmentation Using Fully Convolutional Networks With Multiple Plane Aggregation. Frontiers in Neuroscience, 14:591683.
  • [Hu et al., 2019] Hu, X., Li, F., Samaras, D., and Chen, C. (2019). Topology-Preserving Deep Image Segmentation. In Wallach, H., Larochelle, H., Beygelzimer, A., Alché-Buc, F. d., Fox, E., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.
  • [Im et al., 2013] Im, K., Pienaar, R., Paldino, M. J., Gaab, N., Galaburda, A. M., and Grant, P. E. (2013). Quantification and discrimination of abnormal sulcal patterns in polymicrogyria. Cerebral Cortex (New York, N.Y.: 1991), 23(12):3007–3015.
  • [Khalili et al., 2019] Khalili, N., Lessmann, N., Turk, E., Claessens, N., Heus, R. d., Kolk, T., Viergever, M., Benders, M., and Išgum, I. (2019). Automatic brain tissue segmentation in fetal MRI using convolutional neural networks. Magnetic Resonance Imaging, 64:77–89.
  • [Khawam et al., 2021] Khawam, M., de Dumast, P., Deman, P., Kebiri, H., Yu, T., Tourbier, S., Lajous, H., Hagmann, P., Maeder, P., Thiran, J.-P., Meuli, R., Dunet, V., Bach Cuadra, M., and Koob, M. (2021). Fetal Brain Biometric Measurements on 3D Super-Resolution Reconstructed T2-Weighted MRI: An Intra- and Inter-observer Agreement Study. Frontiers in Pediatrics, 9:639746.
  • [Kuklisova-Murgasova et al., 2012] Kuklisova-Murgasova, M., Quaghebeur, G., Rutherford, M. A., Hajnal, J. V., and Schnabel, J. A. (2012). Reconstruction of fetal brain MRI with intensity matching and complete outlier removal. Medical Image Analysis, 16(8):1550–1564.
  • [Kyriakopoulou et al., 2017] Kyriakopoulou, V., Vatansever, D., Davidson, A., Patkee, P., Elkommos, S., Chew, A., Martinez-Biarge, M., Hagberg, B., Damodaram, M., Allsop, J., Fox, M., Hajnal, J. V., and Rutherford, M. A. (2017). Normative biometry of the fetal brain using magnetic resonance imaging. Brain Structure and Function, 222(5):2295–2307.
  • [Leibovitz et al., 2022] Leibovitz, Z., Lerman-Sagie, T., and Haddad, L. (2022). Fetal Brain Development: Regulating Processes and Related Malformations. Life, 12(6):809.
  • [Lenroot and Giedd, 2006] Lenroot, R. K. and Giedd, J. N. (2006). Brain development in children and adolescents: Insights from anatomical magnetic resonance imaging. Neuroscience & Biobehavioral Reviews, 30(6):718–729.
  • [Leventer et al., 2008] Leventer, R. J., Guerrini, R., and Dobyns, W. B. (2008). Malformations of cortical development and epilepsy. Dialogues in Clinical Neuroscience, 10(1):47–62.
  • [Maier-Hein et al., 2022] Maier-Hein, L., Reinke, A., Christodoulou, E., Glocker, B., Godau, P., Isensee, F., Kleesiek, J., Kozubek, M., Reyes, M., Riegler, M. A., Wiesenfarth, M., Baumgartner, M., Eisenmann, M., Heckmann-Nötzel, D., Kavur, A. E., Rädsch, T., Tizabi, M. D., Acion, L., Antonelli, M., Arbel, T., Bakas, S., Bankhead, P., Benis, A., Cardoso, M. J., Cheplygina, V., Cimini, B., Collins, G. S., Farahani, K., van Ginneken, B., Hashimoto, D. A., Hoffman, M. M., Huisman, M., Jannin, P., Kahn, C. E., Karargyris, A., Karthikesalingam, A., Kenngott, H., Kopp-Schneider, A., Kreshuk, A., Kurc, T., Landman, B. A., Litjens, G., Madani, A., Maier-Hein, K., Martel, A. L., Mattson, P., Meijering, E., Menze, B., Moher, D., Moons, K. G. M., Müller, H., Nickel, F., Nichyporuk, B., Petersen, J., Rajpoot, N., Rieke, N., Saez-Rodriguez, J., Gutiérrez, C. S., Shetty, S., van Smeden, M., Sudre, C. H., Summers, R. M., Taha, A. A., Tsaftaris, S. A., Van Calster, B., Varoquaux, G., and Jäger, P. F. (2022). Metrics reloaded: Pitfalls and recommendations for image analysis validation. arXiv preprint, version 1.
  • [Makropoulos et al., 2018] Makropoulos, A., Counsell, S. J., and Rueckert, D. (2018). A review on automatic fetal and neonatal brain MRI segmentation. NeuroImage, 170:231–248.
  • [Martin Abadi et al., 2015] Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng (2015). TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems.
  • [Payette et al., 2021] Payette, K., de Dumast, P., Kebiri, H., Ezhov, I., Paetzold, J. C., Shit, S., Iqbal, A., Khan, R., Kottke, R., Grehten, P., Ji, H., Lanczi, L., Nagy, M., Beresova, M., Nguyen, T. D., Natalucci, G., Karayannis, T., Menze, B., Bach Cuadra, M., and Jakab, A. (2021). An automatic multi-tissue human fetal brain segmentation benchmark using the Fetal Tissue Annotation Dataset. Scientific Data, 8(1):167.
  • [Payette et al., 2022] Payette, K., Li, H., de Dumast, P., Licandro, R., Ji, H., Siddiquee, M. M. R., Xu, D., Myronenko, A., Liu, H., Pei, Y., Wang, L., Peng, Y., Xie, J., Zhang, H., Dong, G., Fu, H., Wang, G., Rieu, Z., Kim, D., Kim, H. G., Karimi, D., Gholipour, A., Torres, H. R., Oliveira, B., Vilaça, J. L., Lin, Y., Avisdris, N., Ben-Zvi, O., Bashat, D. B., Fidon, L., Aertsen, M., Vercauteren, T., Sobotka, D., Langs, G., Alenyà, M., Villanueva, M. I., Camara, O., Fadida, B. S., Joskowicz, L., Weibin, L., Yi, L., Xuesong, L., Mazher, M., Qayyum, A., Puig, D., Kebiri, H., Zhang, Z., Xu, X., Wu, D., Liao, K., Wu, Y., Chen, J., Xu, Y., Zhao, L., Vasung, L., Menze, B., Cuadra, M. B., and Jakab, A. (2022). Fetal Brain Tissue Annotation and Segmentation Challenge Results. Publisher: arXiv Version Number: 1.
  • [Pier et al., 2016] Pier, D. B., Gholipour, A., Afacan, O., Velasco-Annis, C., Clancy, S., Kapur, K., Estroff, J. A., and Warfield, S. K. (2016). 3D Super-Resolution Motion-Corrected MRI: Validation of Fetal Posterior Fossa Measurements. Journal of Neuroimaging: Official Journal of the American Society of Neuroimaging, 26(5):539–544.
  • [Prayer et al., 2017] Prayer, D., Malinger, G., Brugger, P. C., Cassady, C., De Catte, L., De Keersmaecker, B., Fernandes, G. L., Glanc, P., Gonçalves, L. F., Gruber, G. M., Laifer-Narin, S., Lee, W., Millischer, A.-E., Molho, M., Neelavalli, J., Platt, L., Pugash, D., Ramaekers, P., Salomon, L. J., Sanz, M., Timor-Tritsch, I. E., Tutschek, B., Twickler, D., Weber, M., Ximenes, R., and Raine-Fenning, N. (2017). ISUOG Practice Guidelines: performance of fetal magnetic resonance imaging. Ultrasound in Obstetrics & Gynecology, 49(5):671–680.
  • [Pérez-García et al., 2021] Pérez-García, F., Sparks, R., and Ourselin, S. (2021). TorchIO: A Python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Computer Methods and Programs in Biomedicine, 208:106236.
  • [Rajagopalan et al., 2011] Rajagopalan, V., Scott, J., Habas, P. A., Kim, K., Corbett-Detig, J., Rousseau, F., Barkovich, A. J., Glenn, O. A., and Studholme, C. (2011). Local tissue growth patterns underlying normal fetal human brain gyrification quantified in utero. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 31(8):2878–2887.
  • [Ronneberger et al., 2015] Ronneberger, O., Fischer, P., and Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. In Navab, N., Hornegger, J., Wells, W. M., and Frangi, A. F., editors, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, volume 9351 of Lecture Notes in Computer Science, pages 234–241. Springer International Publishing, Cham.
  • [Rote and Vegter, 2006] Rote, G. and Vegter, G. (2006). Computational Topology: An Introduction. In Boissonnat, J.-D. and Teillaud, M., editors, Effective Computational Geometry for Curves and Surfaces, pages 277–312. Springer Berlin Heidelberg.
  • [Rousseau et al., 2006] Rousseau, F., Glenn, O. A., Iordanova, B., Rodriguez-Carranza, C., Vigneron, D. B., Barkovich, J. A., and Studholme, C. (2006). Registration-Based Approach for Reconstruction of High-Resolution In Utero Fetal MR Brain Images. Academic Radiology, 13(9):1072–1081.
  • [Salomon et al., 2006] Salomon, L., Ouahba, J., Delezoide, A.-L., Vuillard, E., Oury, J.-F., Sebag, G., and Garel, C. (2006). Third-trimester fetal MRI in isolated 10- to 12-mm ventriculomegaly: is it worth it? BJOG: An International Journal of Obstetrics and Gynaecology, 113(8):942–947.
  • [Severino et al., 2020] Severino, M., Geraldo, A. F., Utz, N., Tortora, D., Pogledic, I., Klonowski, W., Triulzi, F., Arrigoni, F., Mankad, K., Leventer, R. J., Mancini, G. M. S., Barkovich, J. A., Lequin, M. H., and Rossi, A. (2020). Definitions and classification of malformations of cortical development: practical guidelines. Brain, 143(10):2874–2894.
  • [Tarui et al., 2018] Tarui, T., Madan, N., Farhat, N., Kitano, R., Ceren Tanritanir, A., Graham, G., Gagoski, B., Craig, A., Rollins, C. K., Ortinau, C., Iyer, V., Pienaar, R., Bianchi, D. W., Grant, P. E., and Im, K. (2018). Disorganized Patterns of Sulcal Position in Fetal Brains with Agenesis of Corpus Callosum. Cerebral Cortex (New York, N.Y.: 1991), 28(9):3192–3203.
  • [Tierney and Nelson, 2009] Tierney, A. L. and Nelson, C. A. (2009). Brain Development and the Role of Experience in the Early Years. Zero to Three, 30(2):9–13.
  • [Tourbier et al., 2019] Tourbier, S., Bresson, X., Hagmann, P., Meuli, R., and Bach Cuadra, M. (2019). sebastientourbier/mialsuperresolutiontoolkit: MIAL Super-Resolution Toolkit v1.0.
  • [Tourbier et al., 2015] Tourbier, S., Bresson, X., Hagmann, P., Thiran, J.-P., Meuli, R., and Cuadra, M. B. (2015). An efficient total variation algorithm for super-resolution in fetal brain MRI with adaptive regularization. NeuroImage, 118:584–597.
  • [Uus et al., 2022] Uus, A. U., Egloff Collado, A., Roberts, T. A., Hajnal, J. V., Rutherford, M. A., and Deprez, M. (2022). Retrospective motion correction in foetal MRI for clinical applications: existing methods, applications and integration into clinical practice. The British Journal of Radiology, page 20220071.
  • [Vasung et al., 2016] Vasung, L., Lepage, C., Radoš, M., Pletikos, M., Goldman, J. S., Richiardi, J., Raguž, M., Fischi-Gómez, E., Karama, S., Huppi, P. S., Evans, A. C., and Kostovic, I. (2016). Quantitative and Qualitative Analysis of Transient Fetal Compartments during Prenatal Human Brain Development. Frontiers in Neuroanatomy, 10.
  • [Vasung et al., 2020] Vasung, L., Rollins, C. K., Velasco-Annis, C., Yun, H. J., Zhang, J., Warfield, S. K., Feldman, H. A., Gholipour, A., and Grant, P. E. (2020). Spatiotemporal Differences in the Regional Cortical Plate and Subplate Volume Growth during Fetal Development. Cerebral Cortex, 30(8):4438–4453.
  • [Wright et al., 2014] Wright, R., Kyriakopoulou, V., Ledig, C., Rutherford, M., Hajnal, J., Rueckert, D., and Aljabar, P. (2014). Automatic quantification of normal cortical folding patterns from fetal brain MRI. NeuroImage, 91:21–32.
  • [Xia et al., 2019] Xia, J., Wang, F., Benkarim, O. M., Sanroma, G., Piella, G., González Ballester, M. A., Hahner, N., Eixarch, E., Zhang, C., Shen, D., and Li, G. (2019). Fetal cortical surface atlas parcellation based on growth patterns. Human Brain Mapping, 40(13):3881–3899.
  • [Yeghiazaryan and Voiculescu, 2018] Yeghiazaryan, V. and Voiculescu, I. (2018). Family of boundary overlap metrics for the evaluation of medical image segmentation. Journal of Medical Imaging, 5(01):1.
  • [Yushkevich et al., 2006] Yushkevich, P. A., Piven, J., Hazlett, H. C., Smith, R. G., Ho, S., Gee, J. C., and Gerig, G. (2006). User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. NeuroImage, 31(3):1116–1128.