Recovering high-quality FODs from a reduced number of diffusion-weighted images using a model-driven deep learning architecture
Abstract
Fibre orientation distribution (FOD) reconstruction using deep learning has the potential to produce accurate FODs from a reduced number of diffusion-weighted images (DWIs), decreasing total imaging time. Diffusion acquisition invariant representations of the DWI signals are typically used as input to these methods to ensure that they can be applied flexibly to data with different b-vectors and b-values; however, this means the network cannot condition its output directly on the DWI signal. In this work, we propose a spherical deconvolution network, a model-driven deep learning FOD reconstruction architecture, that ensures intermediate and output FODs produced by the network are consistent with the input DWI signals. Furthermore, we implement a fixel classification penalty within our loss function, encouraging the network to produce FODs that can subsequently be segmented into the correct number of fixels and improve downstream fixel-based analysis. Our results show that the model-based deep learning architecture achieves competitive performance compared to a state-of-the-art FOD super-resolution network, FOD-Net. Moreover, we show that the fixel classification penalty can be tuned to offer improved performance with respect to metrics that rely on accurately segmented FODs. Our code is publicly available at https://github.com/Jbartlett6/SDNet.
Index Terms:
Diffusion MRI, model-based deep learning, FOD reconstruction
I Introduction
Fibre orientation distributions (FODs) relate signal attenuation in diffusion-weighted magnetic resonance images to the volume fractions and orientations of fibre populations in the brain [1, 2, 3]. Their flexibility and capacity to discern intra-voxel fibre populations facilitate a range of subsequent quantitative analyses; tractography algorithms can be used to obtain tractograms, and FOD segmentation can provide discrete fibre bundle elements (fixels) [4, 5].
Multi-shell, high angular resolution diffusion imaging datasets are required to fit FODs with sufficient angular detail, and for separating the contribution of different tissue types [6, 3]. The approximately linear relationship between the time a subject spends in the scanner and the number of diffusion-weighted images (DWIs) collected means acquiring such datasets is time consuming.
Deep learning can help to alleviate this issue by performing FOD reconstruction, the task of fitting high-fidelity FODs to a reduced number of DWI signals. To ensure their flexibility, such deep learning methods should be invariant to changes in diffusion MRI acquisition arising due to inter-facility variability or DWI volume corruption. Resampling techniques such as spherical harmonics (SH) [7, 8, 9] and nearest neighbour [10] interpolation have been explored to resample arbitrary DWI acquisitions onto a pre-defined spherical grid. Alternatively, an SH representation of the signal can be used as input to the network [11, 12, 13, 14]. FOD super-resolution methods [15, 16, 17] perform constrained spherical deconvolution (CSD) as a pre-processing step and take the SH representation of the FOD as input. Results in the literature vary due to the range of acquisitions and CSD algorithms used to fit the FODs such as: single-shell-single-tissue [15], two-tissue [16] and single-shell-three-tissue [17] FODs.
High computational costs and the risk of overfitting mean it is not feasible to process all signals in the spatial and diffusion-acquisition dimensions concurrently. By predicting the central FOD from a limited spatial neighbourhood of the input [11, 17, 13], a compromise can be found between reducing the computational burden and exploiting the abundance of spatial correlations present in the data. Such methods commonly utilise a 3D convolutional neural network (CNN) for feature extraction, followed by fully connected or transformer layers for FOD prediction [9].
It is common practice for FODs to be fit using CSD with a maximum SH order of eight [17, 11] in order to capture angular frequency content of the DWI signal at a maximum b-value of [6]. Some tractography algorithms require only the orientations of fibre populations in each voxel as input, so a number of FOD reconstruction algorithms predict only these quantities [7, 10]. Alternatively, an unsupervised loss function with sparsity inducing regularisation can be used to reconstruct FODs with an increased maximum order of 20 [8]. Whilst improving the angular separation, these methods change the FOD model, meaning it is likely that fixel-derived scalars, such as apparent fibre density and peak amplitude, also deviate. Therefore, it would be infeasible to apply such methods within a fixel-based analysis pipeline.
Model-based deep learning exploits domain knowledge of a process to inspire neural network architectures. Many approaches alternate between CNN-based denoising and data consistency blocks [18, 19, 20, 21]. Data consistency blocks use prior knowledge of an appropriate forward model to ensure a network produces solutions consistent with the input signal.
When calculating acquisition invariant representations of the DWI signal, fitting errors are incurred. We conjecture that such errors lead to the degradation of FOD reconstruction performance since the subsequently applied neural networks cannot directly condition their output on the true DWI signal, and model-based deep learning has the potential to lessen the impact of these errors by ensuring intermediate and output FODs are consistent with the DWI signal. In the context of FOD reconstruction, data consistency blocks minimise a linear combination of the CSD data consistency and an additional, deep learning based, regularisation term. Current implementations use a pre-trained autoencoder based regularisation term [15]; however, this means the network will not be optimised for FOD reconstruction performance. Model-based deep learning has to this point not been combined with techniques proven successful in end-to-end FOD reconstruction architectures.
In this paper Spherical Deconvolution Network (SDNet) is introduced, a model-based deep learning architecture that utilises spatial information from surrounding voxels and is optimised to perform FOD reconstruction of multi-shell data. Additionally, we propose a fixel classification penalty within our loss function to improve angular separation without distorting the shape of the reconstructed FODs, which can be tuned to suit the requirements of the reconstructed FODs. The efficacy is evaluated by extensive comparisons with a state-of-the-art FOD super-resolution method, FOD-Net, as well as an ablation study. Our results show that including model-based deep learning improves the performance of the network.
II Method
II-A Network Architecture
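Constrained spherical deconvolution is used to fit FODs to the DWI signal by optimising the following objective function:
$$\min_{\mathbf{c}} \; \frac{1}{2}\|\mathbf{A}\mathbf{c} - \mathbf{s}\|_2^2 + \mathcal{R}(\mathbf{c}) \tag{1}$$
where $\mathbf{c}$ are the SH coefficients of the FOD, $\mathbf{s}$ are the DWI signals, and $\mathbf{A}$ spherically convolves the FOD with the response functions of the tissue types being modelled. To facilitate a data-driven regularisation term, optimised for FOD reconstruction, we consider an arbitrary regularisation term, $\mathcal{R}$, in place of the ubiquitous non-negativity constraint. In the following we outline how the variable splitting methods used in Jia et al. [20], Duan et al. [21] can be adapted to solve (1).
First, we introduce an auxiliary splitting variable $\mathbf{w}$, converting (1) into the following equivalent form:
$$\min_{\mathbf{c}, \mathbf{w}} \; \frac{1}{2}\|\mathbf{A}\mathbf{c} - \mathbf{s}\|_2^2 + \mathcal{R}(\mathbf{w}) \quad \text{s.t.} \quad \mathbf{w} = \mathbf{c} \tag{2}$$
Using the penalty function method, we add these constraints back into the model, with penalty weight $\lambda$, and minimise the joint objective:
$$\min_{\mathbf{c}, \mathbf{w}} \; \frac{1}{2}\|\mathbf{A}\mathbf{c} - \mathbf{s}\|_2^2 + \mathcal{R}(\mathbf{w}) + \frac{\lambda}{2}\|\mathbf{c} - \mathbf{w}\|_2^2 \tag{3}$$
Eq. (3) can be solved for $\mathbf{c}$ and $\mathbf{w}$ using an alternating optimisation scheme:
$$\begin{aligned} \mathbf{c}^{k+1} &= \arg\min_{\mathbf{c}} \; \frac{1}{2}\|\mathbf{A}\mathbf{c} - \mathbf{s}\|_2^2 + \frac{\lambda}{2}\|\mathbf{c} - \mathbf{w}^{k}\|_2^2 \\ \mathbf{w}^{k+1} &= \arg\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{c}^{k+1} - \mathbf{w}\|_2^2 + \mathcal{R}(\mathbf{w}) \end{aligned} \tag{4}$$
The first convex optimisation can be solved using matrix inversion. The second equation is a denoising problem with arbitrary regularisation, the optimal form of which is unknown. In order to learn the regularisation to improve FOD reconstruction performance, the iterative process can be unrolled and the denoising step solved using a neural network, $\Lambda_{\theta}$:
$$\begin{aligned} \mathbf{c}^{k+1} &= \left(\mathbf{A}^{\top}\mathbf{A} + \lambda \mathbf{I}\right)^{-1}\left(\mathbf{A}^{\top}\mathbf{s} + \lambda \mathbf{w}^{k}\right) \\ \mathbf{w}^{k+1} &= \Lambda_{\theta}\left(\mathbf{c}^{k+1}\right) \end{aligned} \tag{5}$$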
The network architecture (Fig. 1) takes nine voxels in each spatial dimension for 30 different diffusion gradients, resulting in a $9 \times 9 \times 9 \times 30$ volume of DWI signals as input, and passes them through alternating DWI consistency and deep regularisation blocks. The network outputs a vector of SH coefficients, a high-fidelity prediction of the FOD at the central voxel of the input patch.
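The alternation between a closed-form data-consistency solve and a learned denoising step can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the released SDNet implementation: the names `A` (forward matrix), `s` (DWI signals), `w` (auxiliary variable), `lam` (penalty weight) and `denoiser` are assumptions, the number of cascades is arbitrary, and a simple shrinkage function stands in for the trained network.

```python
import numpy as np

def dwi_consistency(A, s, w, lam):
    """Closed-form minimiser of 0.5*||A c - s||^2 + 0.5*lam*||c - w||^2."""
    n_coef = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n_coef), A.T @ s + lam * w)

def unrolled_reconstruction(A, s, denoiser, lam=1.0, n_cascades=3):
    """Alternate data-consistency solves with a denoising step (assumed count)."""
    w = np.zeros(A.shape[1])
    for _ in range(n_cascades):
        c = dwi_consistency(A, s, w, lam)  # stay consistent with the DWI signal
        w = denoiser(c)                    # stand-in for the learned regulariser
    # The final block is a DWI consistency block, as stated in the text.
    return dwi_consistency(A, s, w, lam)

# Toy problem: 30 DWI signals, 47 SH coefficients (random forward matrix).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 47))
s = A @ rng.standard_normal(47)
c_hat = unrolled_reconstruction(A, s, denoiser=lambda c: 0.9 * c)
```

Because the quadratic penalty makes the system matrix positive definite for any positive `lam`, the solve remains well posed even when there are fewer DWI signals than SH coefficients.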
II-A1 DWI Consistency
Each DWI consistency block solves the matrix inversion in (5) independently for each voxel, maintaining spatial resolution. The initial DWI consistency block optimises only for the first three even orders of spherical harmonic coefficients to ensure robustness to aggressive DWI undersampling.
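A hedged sketch of how the initial block might restrict the fit to low SH orders is given below. It assumes a single-tissue layout in which coefficients are stored in order of increasing SH order, so that the first 15 entries correspond to orders 0, 2 and 4; the function names, the optional ridge term `lam` and the random forward matrix are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def n_even_sh(lmax):
    """Number of real, even-order SH coefficients up to order lmax."""
    return (lmax + 1) * (lmax + 2) // 2

def initial_dwi_consistency(A, S, lmax_init=4, lam=0.0):
    """Fit only the low-order coefficients in every voxel and zero the rest.

    A: (n_dwi, n_coef) forward matrix, S: (n_voxels, n_dwi) DWI signals.
    """
    k = n_even_sh(lmax_init)
    A_low = A[:, :k]                          # columns for orders 0, 2, 4
    lhs = A_low.T @ A_low + lam * np.eye(k)   # shared by all voxels
    C = np.zeros((S.shape[0], A.shape[1]))
    C[:, :k] = np.linalg.solve(lhs, A_low.T @ S.T).T
    return C

# Toy check: signals generated from low-order coefficients are recovered.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, n_even_sh(8)))
C_true = np.zeros((5, n_even_sh(8)))
C_true[:, :n_even_sh(4)] = rng.standard_normal((5, n_even_sh(4)))
C_fit = initial_dwi_consistency(A, C_true @ A.T)
```

With 30 signals and only 15 unknowns per voxel the system is overdetermined, which is one way to read the robustness argument made for the first block.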
II-A2 Deep Regularisation
Each deep regularisation block is applied to a concatenation of the previous two DWI consistency blocks, meaning the block is conditioned on both earlier representations. Validation tests showed these connections improve network performance (data not shown). The initial convolution kernels are applied with one layer of zero padding in each dimension, so as to maintain spatial resolution, and are followed by 3D batch normalisation layers and parametric rectified linear unit (PReLU) activation functions. The number of channels is increased in this manner until it has reached 448 (Fig. 1). No padding is applied in the final convolution kernel, which is followed by a PReLU function, reducing the resolution in each spatial dimension by two. Finally, a convolution kernel is applied to the 512-channel feature maps to obtain a 94-channel input to a gated linear unit (GLU) activation function, which is the output of the block. Residual connections, referencing the output of the previous DWI consistency block, are used to improve gradient flow through the network. The deep regularisation block reduces each spatial dimension of its input by two.
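The channel and resolution bookkeeping of the block can be illustrated with a short sketch. It assumes the standard GLU definition (first half of the channels gated by a sigmoid of the second half) and, purely for illustration, a kernel size of three with unit stride; neither the kernel size nor the helper names are stated in the text.

```python
import numpy as np

def glu(x, axis=0):
    """Gated linear unit: split channels in half, gate one half with the other."""
    a, b = np.split(x, 2, axis=axis)
    return a * (1.0 / (1.0 + np.exp(-b)))

def conv_out_size(n, kernel=3, padding=0):
    """Output length of a stride-1 convolution along one spatial dimension."""
    return n + 2 * padding - kernel + 1

features = np.zeros((94, 7, 7, 7))   # 94 channels on a 7 x 7 x 7 grid
gated = glu(features, axis=0)        # halves the channel count to 47
```

Under these assumptions a padded convolution keeps a nine-voxel dimension at nine, an unpadded one reduces it to seven, and the GLU maps 94 channels to the 47 SH coefficients of the multi-tissue FOD.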
II-B Loss Functions
In addition to the customary MSE loss, a fixel classification penalty is proposed to give greater control over the angular separation of the reconstructed FODs. The mechanics of this method can be considered similar to the microstructure sensitive loss proposed for DWI signal reconstruction in [22]. To overcome the inherent non-differentiable nature of the fast marching level set FOD segmentation algorithm [23], a fixel classification network is applied to predict the number of fixels each voxel contains. The output is passed into a cross-entropy component of the loss function. Since we are concerned with the white matter components of the FODs, the loss function and performance metrics are not functions of the grey matter and cerebrospinal fluid components of the FOD. For notational simplicity, from this point onwards $\mathbf{c}$ refers only to the white matter component of the FOD. The overall loss function is as follows:
$$\mathcal{L} = \frac{1}{N}\sum_{n=1}^{N}\left( \|\hat{\mathbf{c}}_n - \mathbf{c}_n\|_2^2 + \kappa \, H(\hat{\mathbf{y}}_n, \mathbf{y}_n) \right) \tag{6}$$
where $N$ is the number of data points in the mini-batch, $\hat{\mathbf{c}}_n$ and $\mathbf{c}_n$ are the reconstructed and fully sampled white matter FODs, $H$ is the cross-entropy, $\hat{\mathbf{y}}_n$ and $\mathbf{y}_n$ are the predicted logits and the one-hot encoding of the number of fixels respectively, and $\kappa$ is a hyperparameter to balance the two components of the loss function.
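As a rough illustration of how the two loss components combine, the sketch below adds a weighted cross-entropy term to a squared-error term averaged over the mini-batch. The name `kappa` for the balancing hyperparameter, the exact normalisation of the squared-error term, and the function names are assumptions made for this example; in the paper the logits come from the separately trained fixel classification network.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the class axis."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def combined_loss(c_hat, c, logits, y_onehot, kappa):
    """Squared-error term plus kappa times the cross-entropy penalty."""
    sq_err = np.mean(np.sum((c_hat - c) ** 2, axis=1))
    cross_entropy = np.mean(-np.sum(y_onehot * log_softmax(logits), axis=1))
    return sq_err + kappa * cross_entropy

# Toy mini-batch: 4 voxels, 45 white matter coefficients, 5 fixel classes.
rng = np.random.default_rng(2)
c = rng.standard_normal((4, 45))
c_hat = c + 0.1
logits = np.zeros((4, 5))            # uniform class prediction
y = np.eye(5)[[1, 2, 1, 3]]          # one-hot fixel counts
loss_no_penalty = combined_loss(c_hat, c, logits, y, kappa=0.0)
loss_penalised = combined_loss(c_hat, c, logits, y, kappa=0.5)
```

Setting `kappa` to zero recovers plain squared-error training, which matches the two-stage schedule in which the penalty is only switched on after an initial training stage.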
Number of Fixels | Count | Percentage before thresholding | Percentage after thresholding |
1 | 310994 | 49% | 49% |
2 | 200673 | 32% | 32% |
3 | 76672 | 12% | 12% |
4 | 24095 | 4% | 6.7% |
5 | 10975 | 2% | - |
6 | 4979 | 0.8% | - |
7 | 1800 | 0.3% | - |
When training the fixel classification network, the number of fixels in each voxel was thresholded to four (Tab. I), reducing the inclusion of spurious peaks and class imbalance. A simple, fully-connected architecture was used, with layers containing 45, 1000, 800, 600, 400, 200, 100, 5 neurons. Between each pair of layers there are ReLU activation and 1D batch normalisation functions, other than between the penultimate and final layers, where the batch normalisation is omitted. A softmax activation function, followed by cross-entropy loss, was then applied to the output of the network. The classification network was trained using the same training set as SDNet. Fully sampled FODs were used as the input, and the ground truth targets were calculated using the fast marching level set algorithm [23].
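The effect of thresholding on the class distribution can be reproduced from the counts in Tab. I. The snippet below is a small illustrative calculation; the helper name `threshold_fixels` is hypothetical.

```python
# Voxel counts per number of fixels, copied from Tab. I.
counts = {1: 310994, 2: 200673, 3: 76672, 4: 24095, 5: 10975, 6: 4979, 7: 1800}

def threshold_fixels(n_fixels, max_fixels=4):
    """Map any fixel count above max_fixels onto max_fixels."""
    return min(n_fixels, max_fixels)

merged = {}
for n_fixels, n_voxels in counts.items():
    label = threshold_fixels(n_fixels)
    merged[label] = merged.get(label, 0) + n_voxels

total = sum(counts.values())
share = {label: 100.0 * n / total for label, n in merged.items()}
```

The resulting `share` dictionary holds the percentage of voxels in each class after thresholding and can be compared with the last column of the table.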
II-C Implementation Details
To demonstrate the impact of the fixel classification penalty, experiments were carried out with and . The ADAM optimiser [24], with learning rate warm-up, was used for parameter optimisation, with an initial learning rate of , increasing to after iterations. To minimise hyperparameter tuning, was optimised simultaneously with the network weights. From validation experiments (data not included), we found that the most effective way to utilise the classification loss to train SDNet was to initially train the model with and then to increase to its final value after this initial training stage. To do so we trained SDNet with only MSE loss until convergence, then trained the network until convergence with .
III Experiments
III-A Dataset
A subset of the WU-Minn Human Connectome Project (HCP) dataset [25], consisting of 30 subjects, was split and used for training, validation, and testing. The HCP images have 1.25 mm isotropic resolution with 90 gradient directions for each of $b = 1000, 2000, 3000\ \mathrm{s/mm^2}$ and 18 $b = 0$ images. The HCP dataset was minimally pre-processed in accordance with [26].
Additionally, prior to applying SDNet, each subject’s data was normalised using MRtrix3’s dwinormalise function. The fully sampled FODs were fit to all 288 DWIs; first, the response functions were calculated using the method proposed in [27], then the FODs were calculated using MSMT-CSD [3]. White matter response functions and FODs were modelled with $l_{\max} = 8$ and the grey matter and cerebrospinal fluid component response functions and FODs were modelled with $l_{\max} = 0$, resulting in a total of 47 SH coefficients.
The sampling scheme of Caruyer et al. [28] used in the HCP is such that, for any $n$, selecting the first $n$ DWI volumes results in evenly spread b-vectors. To prepare the input data, the first 9 DWIs from each non-zero shell were selected with an additional 3 $b = 0$ images, resulting in a total of 30 DWI signals.
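The subset selection can be sketched as a single pass over the list of b-values. This is an illustrative sketch only: the function name, the threshold used to identify non-diffusion-weighted volumes, and the rounding of b-values to the nearest 1000 to identify shells are assumptions, not details from the paper.

```python
def select_subset(bvals, per_shell=9, n_b0=3, b0_threshold=50):
    """Indices of the first per_shell volumes of each shell plus n_b0 b=0 volumes."""
    taken = {}
    chosen = []
    for index, b in enumerate(bvals):
        # Assumed shell assignment: b=0 below the threshold, else nearest 1000.
        shell = 0 if b < b0_threshold else int(round(b / 1000.0)) * 1000
        limit = n_b0 if shell == 0 else per_shell
        if taken.get(shell, 0) < limit:
            taken[shell] = taken.get(shell, 0) + 1
            chosen.append(index)
    return chosen

# Synthetic acquisition: 18 b=0 volumes interleaved with 3 shells of 90 volumes.
bvals = []
for i in range(90):
    if i % 5 == 0:
        bvals.append(5)
    bvals.extend([995, 2005, 2990])

subset = select_subset(bvals)
```

Taking the first volumes of each shell relies on the incremental property of the sampling scheme, so that the retained directions remain evenly spread.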
Only patches in which the central voxel is classified as grey matter or white matter are used for training. The grey matter voxels were included to improve performance near the boundary of the two tissue types, as highlighted in [17]. The grey and white matter masks were calculated using the method outlined in [29], which is implemented using the FSL software package [30].
From this point onwards, SDNet refers to the network trained without the fixel classification penalty, and the network trained with the penalty is identified explicitly. To evaluate the performance of the introduced methods, SDNet with and without the fixel classification penalty, FOD-Net [17], and super-resolved MSMT CSD, referred to as MSMT CSD for notational simplicity, were all compared. In the original implementation, FOD-Net maps FODs fit using the single-shell-three-tissue CSD algorithm [27] to 32 DWIs (4 ) to the desired MSMT CSD obtained FODs. To allow a fair comparison between FOD-Net and the proposed networks, FOD-Net was trained using the same training set as SDNet. Since the final block in the SDNet architecture is a DWI consistency block, it cannot map to normalised FODs; therefore, the target training data is not normalised. It should be noted that the normalisation can still be performed as a post-processing step. Otherwise, the same configuration settings found in the GitHub repository released by the FOD-Net authors were used.
III-B Performance Metrics
To evaluate the performance of the FOD reconstruction algorithms, performance metrics were calculated voxel-wise then averaged over regions of interest. The regions considered were the white matter and intersections of individual tracts within the white matter. The tracts considered were: the corpus callosum (CC), the middle cerebellar peduncle (MCP), the corticospinal tract (CST), and the superior longitudinal fascicle (SLF). To understand how the algorithm performs in voxels containing different numbers of fibres, we considered the intersections of these tracts as in [17]. For voxels containing a single fibre, we considered voxels in the CC containing a single fixel, which we refer to as ROI-1-CC. For two crossing fibres, we considered voxels in the intersection of the MCP and CST containing two fixels, which we refer to as ROI-2-MCP. For three crossing fibres, we considered voxels in the intersection of the SLF, CST and CC containing three fixels, which we refer to as ROI-3-SLF. The white matter mask was calculated using the FSL five tissue type segmentation algorithm in MRtrix3. The segmentation masks for the white matter fibre tracts were obtained using TractSeg [31].
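The SSE between the reconstructed FODs, $\hat{\mathbf{c}}$, and the fully sampled FODs, $\mathbf{c}$, was computed as follows:
$$\mathrm{SSE} = \|\hat{\mathbf{c}} - \mathbf{c}\|_2^2 = \sum_{j} \left(\hat{c}_j - c_j\right)^2 \tag{7}$$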
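The angular correlation coefficient (ACC) [32] was computed as follows:
$$\mathrm{ACC} = \frac{\sum_{l=2}^{L}\sum_{m=-l}^{l} \hat{c}_{lm}\, c_{lm}}{\sqrt{\sum_{l=2}^{L}\sum_{m=-l}^{l} \hat{c}_{lm}^{2}}\;\sqrt{\sum_{l=2}^{L}\sum_{m=-l}^{l} c_{lm}^{2}}} \tag{8}$$
where $\hat{c}_{lm}$ and $c_{lm}$ are the SH coefficients of order $l$ and phase $m$ of the reconstructed and fully sampled FODs, the sums run over even orders up to the maximum order $L$, and the $l = 0$ term is excluded.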
We refer to SSE and ACC as FOD-based performance metrics, since they compare the SH representation of the FODs prior to any further processing.
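Both FOD-based metrics operate directly on SH coefficient vectors, so they reduce to a few lines of NumPy. The sketch below assumes real-valued coefficients with the order-zero term stored first, and uses the standard form of the angular correlation coefficient with that term excluded; the function names are illustrative.

```python
import numpy as np

def sse(c_hat, c):
    """Sum of squared errors between two SH coefficient vectors."""
    return float(np.sum((c_hat - c) ** 2))

def acc(c_hat, c):
    """Angular correlation coefficient, ignoring the order-zero coefficient."""
    u, v = c_hat[1:], c[1:]
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(3)
c = rng.standard_normal(45)          # white matter FOD with maximum order 8
scaled = 2.0 * c                     # same shape, different amplitude
```

The example highlights a design difference between the metrics: rescaling an FOD leaves the ACC at one while the SSE grows, so the two capture shape and amplitude errors differently.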
Fixel-based analysis requires each FOD to be segmented into fixels, each of which has associated apparent fibre density and peak amplitude [23]. To calculate the associated error metrics, peak amplitude and apparent fibre density vectors must be assembled. Each vector consists of the respective scalar for each fixel, ordered according to the peak amplitude, and is padded to a fixed length. The remaining metrics are referred to as fixel-based performance metrics since they require the FOD to be segmented into fixels prior to evaluation.
Fixel accuracy was defined for a region of interest as the proportion of voxels in which the FOD is segmented into the correct number of fixels.
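The vector assembly and the fixel accuracy can be sketched in plain Python. Descending order by peak amplitude, zero padding and a vector length of four are assumptions made for this example, as are the function names.

```python
def fixel_vector(peak_amplitudes, values, length=4):
    """Order per-fixel values by peak amplitude (largest first) and zero-pad."""
    order = sorted(range(len(peak_amplitudes)),
                   key=lambda i: peak_amplitudes[i], reverse=True)
    vector = [values[i] for i in order][:length]
    return vector + [0.0] * (length - len(vector))

def fixel_accuracy(n_predicted, n_true):
    """Proportion of voxels segmented into the correct number of fixels."""
    correct = sum(1 for p, t in zip(n_predicted, n_true) if p == t)
    return correct / len(n_true)

# Two fixels in one voxel: apparent fibre densities ordered by peak amplitude.
afd_vector = fixel_vector([0.2, 0.5], [1.0, 2.0])
accuracy = fixel_accuracy([1, 2, 3, 1], [1, 2, 2, 1])
```

Padding to a fixed length lets vectors from voxels with different numbers of fixels be compared element by element.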
The peak amplitude error (PAE) was calculated between the reconstructed, , and fully sampled FOD’s, , peak amplitude vectors:
(9) |
The apparent fibre density error (AFDE) was calculated between the reconstructed, , and fully sampled FOD’s, , apparent fibre density vectors:
(10) |
III-C Ablation Study
To investigate the impact of the DWI consistency block on the performance of the network, an ablation study was conducted. The network was trained without the DWI consistency blocks, and all other aspects of the architecture and network training remained the same. We compared this model to SDNet with the DWI consistency blocks included.
III-D Statistical Analysis
Shapiro-Wilk tests for normality () were applied for each performance metric and method; unless otherwise stated there is insufficient evidence to reject the null hypothesis that the groups are normally distributed.
Since the data was normally distributed, and each method was applied to the same set of test subjects, a repeated measures one-way ANOVA () was applied to each performance metric to determine whether there was a main effect between the conditions. Finally, to determine which methods contributed to the main effect, post-hoc t-tests with Bonferroni correction (adjusted for ) were used to identify effects between the FOD reconstruction algorithms.
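The post-hoc step can be illustrated with a paired t statistic and a Bonferroni-adjusted significance threshold. The sketch below is a simplified illustration using only the standard library: the significance level of 0.05 and the correction over all six pairwise comparisons between four methods are assumptions, and the per-subject values are made up.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """t statistic for paired samples, e.g. one metric value per test subject."""
    differences = [a - b for a, b in zip(x, y)]
    return mean(differences) / (stdev(differences) / sqrt(len(differences)))

methods = ["SDNet", "SDNet with penalty", "FOD-Net", "MSMT CSD"]
pairs = list(combinations(methods, 2))

alpha = 0.05                         # assumed significance level
alpha_adjusted = alpha / len(pairs)  # Bonferroni correction over all pairs

# Made-up per-subject errors for two methods on three subjects.
t_value = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

A paired test is the natural choice here because every method is evaluated on the same test subjects, so between-subject variability cancels in the differences.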
IV Results
IV-A Qualitative Results
The qualitative results comparing all methods (Fig. 2) show that the deep learning methods reconstructed FODs that more closely resembled the ground truth when compared to MSMT CSD. The primary difference is the presence of spurious peaks produced by MSMT CSD, whereas the deep learning based algorithms coherently captured the major tracts in this region due to their denoising effect.
The highlighted region in Fig. 2 (panels f.-j.) shows an area where FOD-Net produced distorted FODs compared to SDNet trained with and without the fixel classification penalty. MSMT CSD reconstructed particularly noisy FODs in this area, and the FOD-Net results bore some resemblance to them. The FODs produced by SDNet underestimated the amplitude in this region but more accurately distinguished between fibre populations and captured their directions. In this region, which contains dominant fibre populations with large angular separation, the impact of increasing the weight of the fixel classification penalty on the reconstructed FODs is minimal; only a small change in the direction of the fibres is observed. In the larger tracts in Fig. 2 (panels a.-e.), such as the green fibre population going upwards in the bottom left corner, all deep learning methods performed similarly.
The qualitative results comparing SDNet with and without the fixel classification penalty (Fig. 3) illustrate that the penalty better separated fibre populations. The fibre populations going from the lower left to the upper right of Fig. 3 (panels d.-f.) are separated from the larger fibre population when the penalty is used, but not by SDNet without the fixel classification penalty. The FODs reconstructed in the broader region, captured in Fig. 3 (panels a.-c.), show that larger fibre populations are reconstructed similarly with and without the penalty.
IV-B FOD-based Results
The SSE error maps (Fig. 4) show that lower SSE is achieved throughout the brain by all deep learning methods compared to MSMT CSD. SDNet generally achieved smaller errors than the other deep learning methods. This is particularly evident in, but not restricted to, the areas highlighted by the red arrows. The error maps produced by SDNet with the fixel classification penalty and FOD-Net are similar.
The average FOD-based performance results (Fig. 5 and Tab. II) show that SDNet reconstructed FODs with significantly lower SSE and higher ACC than the compared methods in all regions of interest considered. The training curves (Fig. 6) show that increasing the weight of the fixel classification penalty caused the ACC over the validation set to decrease.
In the white matter voxels, SDNet achieved the lowest SSE by a statistically significant margin over all compared methods, followed by SDNet with the fixel classification penalty and FOD-Net, between which there was no statistically significant difference in SSE. SDNet also achieved the strongest ACC performance in the white matter, where it improved over all other methods by a statistically significant margin. There was no statistically significant difference between SDNet with the fixel classification penalty and FOD-Net with respect to ACC in the white matter.
In all of ROI-1-CC, ROI-2-MCP, and ROI-3-SLF, SDNet achieved the strongest SSE and ACC results (Fig. 5 and Tab. II) by a statistically significant margin. FOD-Net and SDNet with the fixel classification penalty showed no statistically significant differences with respect to SSE and ACC in ROI-1-CC and ROI-2-MCP, but in ROI-3-SLF the penalised SDNet achieved a statistically significant improvement over FOD-Net with respect to both SSE and ACC. In all regions, all deep learning based FOD reconstruction methods outperformed MSMT CSD with respect to SSE and ACC by a statistically significant margin.
IV-C Fixel-based Results
The fixel-based performance results (Fig. 7 and Tab. II) show greater variation between regions and an increased dependence on the weight of the fixel classification penalty. The training curves (Fig. 6) show that increasing the penalty weight caused the fixel accuracy over the validation set to increase. In the white matter, SDNet with the fixel classification penalty achieved the strongest fixel accuracy by a significant margin, followed by SDNet without the penalty and FOD-Net, between which there was no statistically significant difference.
In ROI-1-CC, ROI-2-MCP, and ROI-3-SLF, we see that the fixel accuracy of the deep learning FOD reconstruction methods decreased as the number of fixels increased. In ROI-1-CC, SDNet achieved the strongest performance by a statistically significant margin, followed by FOD-Net and SDNet with the fixel classification penalty, between which there is no statistically significant difference in fixel accuracy in the same region.
As the number of fixels in the ROIs increased, the fixel accuracy of SDNet with the fixel classification penalty increased relative to other methods. In ROI-2-MCP, the penalised SDNet achieved the highest fixel accuracy but not by a statistically significant margin over FOD-Net. Both methods outperformed SDNet without the penalty by a statistically significant margin. In ROI-3-SLF this pattern continued as the performance of the penalised SDNet further improved, and it achieved a statistically significant fixel accuracy increase over the other deep learning methods. There was no statistically significant difference in fixel accuracy between FOD-Net and SDNet without the penalty in ROI-3-SLF. In all regions other than ROI-3-SLF, MSMT CSD performed worse than all other methods by a statistically significant margin.
For AFDE in the white matter, SDNet with the fixel classification penalty achieved the lowest error by a statistically significant margin, followed by FOD-Net and SDNet without the penalty, between which there is no statistically significant difference in AFDE in the white matter. For PAE in the white matter, SDNet with the fixel classification penalty achieved the lowest error, which was a statistically significant improvement over SDNet without the penalty but not FOD-Net. For both AFDE and PAE in the white matter, MSMT CSD achieved a higher error than all compared methods by a statistically significant margin.
Metric | Region | SDNet | SDNet with penalty (p vs SDNet) | FOD-Net (p vs SDNet, p vs SDNet with penalty) | MSMT CSD (p vs SDNet, p vs SDNet with penalty) |
SSE | White Matter | 0.011 ± 0.001 | 0.012 ± 0.001 (0.001) | 0.013 ± 0.001 (0.001, 0.034) | 0.041 ± 0.002 (0.001, 0.001) |
SSE | ROI-1-CC | 0.007 ± 0.001 | 0.008 ± 0.001 (0.001) | 0.008 ± 0.001 (0.001, 0.295) | 0.028 ± 0.002 (0.001, 0.001) |
SSE | ROI-2-MCP | 0.016 ± 0.001 | 0.018 ± 0.001 (0.001) | 0.017 ± 0.001 (0.001, 0.175) | 0.045 ± 0.002 (0.001, 0.001) |
SSE | ROI-3-SLF | 0.014 ± 0.001 | 0.015 ± 0.001 (0.005) | 0.017 ± 0.001 (0.001, 0.001) | 0.063 ± 0.002 (0.001, 0.003) |
ACC | White Matter | 92.209 ± 0.003 | 91.152 ± 0.003 (0.001) | 91.184 ± 0.003 (0.001, 0.484) | 79.268 ± 0.005 (0.001, 0.001) |
ACC | ROI-1-CC | 95.090 ± 0.002 | 93.994 ± 0.002 (0.001) | 94.297 ± 0.002 (0.001, 0.009) | 84.662 ± 0.004 (0.001, 0.001) |
ACC | ROI-2-MCP | 92.762 ± 0.003 | 91.746 ± 0.003 (0.001) | 92.046 ± 0.003 (0.001, 0.032) | 79.796 ± 0.006 (0.001, 0.001) |
ACC | ROI-3-SLF | 94.577 ± 0.005 | 94.233 ± 0.005 (0.001) | 93.291 ± 0.005 (0.001, 0.001) | 74.844 ± 0.011 (0.001, 0.001) |
Fix Acc | White Matter | 0.640 ± 0.011 | 0.664 ± 0.008 (0.001) | 0.645 ± 0.009 (0.037, 0.001) | 0.536 ± 0.006 (0.001, 0.001) |
Fix Acc | ROI-1-CC | 0.901 ± 0.002 | 0.851 ± 0.005 (0.001) | 0.867 ± 0.003 (0.001, 0.036) | 0.469 ± 0.010 (0.001, 0.001) |
Fix Acc | ROI-2-MCP | 0.754 ± 0.011 | 0.791 ± 0.010 (0.001) | 0.772 ± 0.009 (0.001, 0.018) | 0.548 ± 0.009 (0.001, 0.001) |
Fix Acc | ROI-3-SLF | 0.606 ± 0.032 | 0.648 ± 0.031 (0.001) | 0.588 ± 0.029 (0.023, 0.001) | 0.548 ± 0.009 (0.076, 0.163) |
PAE | White Matter | 0.155 ± 0.006 | 0.147 ± 0.005 (0.001) | 0.152 ± 0.005 (0.065, 0.011) | 0.244 ± 0.007 (0.001, 0.001) |
PAE | ROI-1-CC | 0.062 ± 0.002 | 0.072 ± 0.002 (0.001) | 0.069 ± 0.002 (0.001, 0.053) | 0.210 ± 0.007 (0.001, 0.001) |
PAE | ROI-2-MCP | 0.135 ± 0.003 | 0.136 ± 0.003 (0.393) | 0.136 ± 0.002 (0.843, 0.973) | 0.219 ± 0.004 (0.001, 0.001) |
PAE | ROI-3-SLF | 0.179 ± 0.009 | 0.178 ± 0.007 (0.779) | 0.194 ± 0.010 (0.001, 0.002) | 0.278 ± 0.006 (0.001, 0.001) |
AFDE | White Matter | 0.164 ± 0.005 | 0.151 ± 0.004 (0.001) | 0.160 ± 0.005 (0.012, 0.002) | 0.208 ± 0.006 (0.001, 0.001) |
AFDE | ROI-1-CC | 0.065 ± 0.001 | 0.074 ± 0.001 (0.001) | 0.073 ± 0.002 (0.002, 0.711) | 0.187 ± 0.007 (0.001, 0.001) |
AFDE | ROI-2-MCP | 0.107 ± 0.002 | 0.105 ± 0.001 (0.526) | 0.106 ± 0.001 (0.713, 0.489) | 0.171 ± 0.003 (0.001, 0.001) |
AFDE | ROI-3-SLF | 0.151 ± 0.007 | 0.149 ± 0.006 (0.462) | 0.165 ± 0.007 (0.001, 0.001) | 0.230 ± 0.006 (0.001, 0.001) |
In ROI-1-CC, ROI-2-MCP and ROI-3-SLF, both AFDE and PAE generally increased as the number of fixels increased. In ROI-1-CC, SDNet achieved the strongest results with respect to both AFDE and PAE, and in ROI-2-MCP all three deep learning methods performed similarly with respect to both AFDE and PAE. In ROI-3-SLF, SDNet with and without the fixel classification penalty achieved similar AFDE and PAE, with no statistically significant difference between them, but both achieved a statistically significant improvement compared to FOD-Net.
IV-D Ablation Study
Metric | SDNet | SDNet w/o DC | Percentage Change | p value |
SSE | | | 9.1% | 0.05 |
ACC | | | 0.57% | 0.05 |
Fix Acc | | | 2.3% | 0.05 |
AFDE | | | 1.3% | 0.05 |
PAE | | | 0.8% | 0.05 |
The results of the ablation study (Tab. III) clearly demonstrate that removing the DWI consistency blocks from the SDNet architecture caused the performance of the network to degrade significantly with respect to all metrics. The greatest relative degradation of performance occurred with respect to SSE; however, consistent reductions in performance were also observed for all other metrics.
V Discussion
SDNet is a model-based deep learning architecture that employs DWI consistency blocks to ensure intermediate FODs are consistent with the DWI signal, whilst making use of spatial information and multi-shell DWI data to reconstruct FODs. We compared our network to FOD-Net [17], a FOD super-resolution network, which fits FODs to the DWI signal prior to the network’s forward pass. Our results show that SDNet improved over FOD-Net in terms of FOD-based performance, and performed similarly with respect to most fixel-based metrics. We conjecture that FOD-Net loses some details of the DWI signal in the FOD fitting stage. Our qualitative results (Fig. 2) support this since the FODs reconstructed by FOD-Net more closely resembled the unstable input MSMT-CSD FODs, whereas by ensuring consistency with the DWI signal, SDNet more robustly reconstructed FODs which closely resembled the ground truth. The quantitative results collected from our comparison and ablation studies highlighted the improvement in FOD-based performance enabled by including DWI consistency blocks.
The ultimate goal of deep learning based FOD reconstruction is to produce FODs that are useful for quantitative analysis. FOD registration [33], a key component of longitudinal and group FOD analyses, relies on distance between SH coefficients to capture FOD similarity. By achieving a low SSE, the SH representations will bear increased similarity to the ground truth FODs. We therefore anticipate that SDNet will help ensure that FOD registration is minimally impacted by DWI undersampling, and so too the subsequent analysis.
Another factor that may impact such analyses is data containing abnormalities, such as pathologies. Such data will likely not be abundant in the datasets used for training deep learning based FOD reconstruction networks, and as a consequence, reduced performance caused by overfitting becomes probable. Since the DWI consistency blocks ensure that solutions will be consistent with the measured DWI data, we expect that SDNet will be less likely to overfit, and will therefore perform well compared to networks without DWI consistency blocks. However, further investigation is beyond the scope of the current work.
The outcome of such quantitative analysis is also dependent on the post-registration steps in the pipeline, which, in the case of a fixel-based analysis [4], will be predominantly impacted by the fixel-based performance. Comparing multiple FOD reconstruction algorithms revealed that strong FOD-based performance doesn’t directly translate to strong fixel-based performance. The disconnect between FOD and fixel-based performance is evident in the statistically significant difference in SSE over the white matter between SDNet and FOD-Net, but the absence of a statistically significant effect in fixel accuracy over the same set of voxels. This effect can be attributed to FOD segmentation’s dependence on the angular separation of the FOD lobes, which is dependent on the higher order SH coefficients, which only contribute a small amount to the SSE. This highlights that SSE loss alone may not be optimal for reconstructing FODs that are to be used in a fixel-based analysis pipeline.
By introducing an additional loss component, which penalises reconstructed FODs judged to be made up of the incorrect number of fixels, we have demonstrated that fixel-based performance can be improved. The impact of the proposed loss function is illustrated by the statistically significant increase in fixel accuracy in the white matter achieved by SDNet with the fixel classification penalty compared to SDNet without it and FOD-Net. The qualitative results (Fig. 3) highlighted the improved separation of fibres with low angular separation. It is also evident that the overall shape of the FOD is captured, as opposed to discrete, or Dirac-like FODs [8, 7, 10]. Furthermore, statistically significant improvements were recorded in fixel accuracy, PAE and AFDE by the penalised network across the white matter.
However, the introduction of the fixel classification penalty in ROI-1-CC led to a reduction in fixel-based performance. This highlighted a potential bias of SDNet towards over-estimating the number of fixels in each voxel. The inputs of FOD reconstruction networks are necessarily derived from a DWI acquisition with low angular resolution, so they do not contain sufficient information to reconstruct FODs that contain all fixels, as observed in Fig. 4. Therefore, the effect of the fixel classification penalty will generally be to correct these underestimations by encouraging the network to increase the number of fixels. Since ROI-1-CC contains only single fixel voxels, the fixel classification penalty may have increased the number of over-estimations in this region, which, when combined with the already strong performance of SDNet and FOD-Net, led to the observed decrease in performance. On the other hand, in ROI-3-SLF, a region containing 3 crossing fibres, the use of the fixel classification penalty improved performance compared to the other two deep learning methods, and despite worse performance in ROI-1-CC, resulted in an improvement in performance over the white matter voxels for all fixel-based performance metrics.
In the current work, the fixel classification network is trained on the ground truth data alone, which, depending on the efficacy of the FOD reconstruction algorithm, will have a different distribution to the reconstructed FODs. One possible approach to further improving performance is to devise an algorithm to jointly train the FOD reconstruction network and the fixel classification network, similar to the method used to train generative adversarial networks [34].
The fixel classification penalty component of the loss function appears to share some characteristics with the regularisation terms that are ubiquitous in model-based methods for solving ill-posed inverse problems. In particular, minimising a combination of SSE loss and the fixel classification penalty came at the cost of a higher SSE, and we have identified in our validation experiments that the extent of this sacrifice can be controlled by adjusting the weighting of the penalty (data not included). This suggests that the solution that obtains the lowest SSE may fail to capture certain desirable features of the FOD. In this work, we have highlighted this impact on the separation of fibre populations with similar orientations, but it is possible that other features, such as the continuity of fibre populations through space, could also be improved using similar methods.
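The analogy can be written out explicitly. The notation below is assumed for illustration and does not reproduce the paper's own definitions: A is a forward operator, y the measured signal, R a regulariser with weight λ, f_θ the reconstruction network, x_gt the ground-truth FOD, n_gt its fixel count, and P the fixel classification penalty with weight κ.

```latex
% Variational regularisation of an ill-posed inverse problem
\hat{x} = \arg\min_{x} \; \lVert A x - y \rVert_2^2 + \lambda \, R(x)

% Training loss with a fixel classification penalty (assumed form)
\mathcal{L}(\theta) = \lVert f_\theta(y) - x_{\mathrm{gt}} \rVert_2^2
    + \kappa \, P\big(f_\theta(y);\, n_{\mathrm{gt}}\big)
```

In both cases the second term pulls the solution away from the minimiser of the squared-error term alone, and the weight determines how much squared error is given up in exchange.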
VI Conclusion
In this work we have proposed SDNet, a model-based deep learning architecture optimised for FOD reconstruction. In addition to the learned regularisation blocks, which are trained directly in an end-to-end fashion and are therefore optimised for the task of FOD reconstruction, the network takes a neighbourhood of multi-shell DWI signals as input to an architecture containing multiple cascades. We further show that there is a trade-off between FOD-based and fixel-based performance, and propose a fixel classification penalty term in our loss function as a method of controlling the trade-off between these performance metrics. We show that, when compared to a state-of-the-art FOD super-resolution network, FOD-Net, gains in FOD-based and fixel-based performance were achieved by SDNet and by SDNet trained with the fixel classification penalty, respectively.
Acknowledgment
We would like to thank Xi Jia from University of Birmingham for the fruitful discussion on network architecture and parameter tuning in this research. The computations described in this research were performed using the Baskerville Tier 2 HPC service (https://www.baskerville.ac.uk/). Baskerville was funded by the EPSRC and UKRI through the World Class Labs scheme (EP/T022221/1) and the Digital Research Infrastructure programme (EP/W032244/1) and is operated by Advanced Research Computing at the University of Birmingham.
References
- Tournier et al. [2004] J.-D. Tournier, F. Calamante, D. G. Gadian, and A. Connelly, “Direct estimation of the fiber orientation density function from diffusion-weighted MRI data using spherical deconvolution,” Neuroimage, vol. 23, no. 3, pp. 1176–1185, 2004.
- Tournier et al. [2007] J.-D. Tournier, F. Calamante, and A. Connelly, “Robust determination of the fibre orientation distribution in diffusion MRI: non-negativity constrained super-resolved spherical deconvolution,” Neuroimage, vol. 35, no. 4, pp. 1459–1472, 2007.
- Jeurissen et al. [2014] B. Jeurissen, J.-D. Tournier, T. Dhollander, A. Connelly, and J. Sijbers, “Multi-tissue constrained spherical deconvolution for improved analysis of multi-shell diffusion MRI data,” NeuroImage, vol. 103, pp. 411–426, 2014.
- Raffelt et al. [2012] D. Raffelt et al., “Apparent fibre density: a novel measure for the analysis of diffusion-weighted magnetic resonance images,” Neuroimage, vol. 59, no. 4, pp. 3976–3994, 2012.
- Raffelt et al. [2017] D. A. Raffelt, J. D. Tournier, R. E. Smith, D. N. Vaughan, G. Jackson, G. R. Ridgway, and A. Connelly, “Investigating white matter fibre density and morphology using fixel-based analysis,” NeuroImage, vol. 144, pp. 58–73, 1 2017.
- Tournier et al. [2013] J. D. Tournier, F. Calamante, and A. Connelly, “Determination of the appropriate b value and number of gradient directions for high-angular-resolution diffusion-weighted imaging,” NMR in Biomedicine, vol. 26, pp. 1775–1786, 12 2013.
- Koppers and Merhof [2016] S. Koppers and D. Merhof, “Direct estimation of fiber orientations using deep learning in diffusion imaging,” in International Workshop on Machine Learning in Medical Imaging. Springer, 2016, pp. 53–60.
- Elaldi et al. [2021] A. Elaldi, N. Dey, H. Kim, and G. Gerig, “Equivariant spherical deconvolution: Learning sparse orientation distribution functions from spherical data,” in International Conference on Information Processing in Medical Imaging. Springer, 2021, pp. 267–278.
- Hosseini et al. [2022] S. Hosseini, M. Hassanpour, S. Masoudnia, S. Iraji, S. Raminfard, and M. Nazem-Zadeh, “CTtrack: A CNN+ transformer-based framework for fiber orientation estimation & tractography,” Neuroscience Informatics, vol. 2, no. 4, p. 100099, 2022.
- Karimi et al. [2021] D. Karimi, L. Vasung, C. Jaimes, F. Machado-Rivas, S. K. Warfield, and A. Gholipour, “Learning to estimate the fiber orientation distribution function from diffusion-weighted MRI,” NeuroImage, vol. 239, p. 118316, 2021.
- Lin et al. [2019] Z. Lin, T. Gong, K. Wang, Z. Li, H. He, Q. Tong, F. Yu, and J. Zhong, “Fast learning of fiber orientation distribution function for MR tractography using convolutional neural network,” Medical physics, vol. 46, no. 7, pp. 3101–3116, 2019.
- Nath et al. [2020] V. Nath, S. K. Pathak, K. G. Schilling, W. Schneider, and B. A. Landman, “Deep learning estimation of multi-tissue constrained spherical deconvolution with limited single shell DW-MRI,” in Medical Imaging 2020: Image Processing, vol. 11313. SPIE, 2020, pp. 162–171.
- kop [2017] “Reconstruction of diffusion anisotropies using 3D deep convolutional neural networks in diffusion imaging,” in Modeling, Analysis, and Visualization of Anisotropy. Springer, 2017, pp. 393–404.
- Jha et al. [2022] R. R. Jha, S. K. Pathak, V. Nath, W. Schneider, B. R. Kumar, A. Bhavsar, and A. Nigam, “VRfRNet: Volumetric ROI fODF reconstruction network for estimation of multi-tissue constrained spherical deconvolution with only single shell dMRI,” Magnetic Resonance Imaging, vol. 90, pp. 1–16, 2022.
- Patel et al. [2018] K. Patel, S. Groeschel, and T. Schultz, “Better fiber ODFs from suboptimal data with autoencoder based regularization,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 55–62.
- Lucena et al. [2021] O. Lucena, S. B. Vos, V. Vakharia, J. Duncan, K. Ashkan, R. Sparks, and S. Ourselin, “Enhancing the estimation of fiber orientation distributions using convolutional neural networks,” Computers in Biology and Medicine, vol. 135, p. 104643, 2021.
- Zeng et al. [2022] R. Zeng, J. Lv, H. Wang, L. Zhou, M. Barnett, F. Calamante, and C. Wang, “FOD-Net: A deep learning method for fiber orientation distribution angular super resolution,” Medical Image Analysis, vol. 79, p. 102431, 2022.
- Aggarwal et al. [2018] H. K. Aggarwal, M. P. Mani, and M. Jacob, “MoDL: Model-based deep learning architecture for inverse problems,” IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 394–405, 2018.
- Schlemper et al. [2017] J. Schlemper, J. Caballero, J. V. Hajnal, A. Price, and D. Rueckert, “A deep cascade of convolutional neural networks for MR image reconstruction,” in International Conference on Information Processing in Medical Imaging. Springer, 2017, pp. 647–658.
- Jia et al. [2021] X. Jia, A. Thorley, W. Chen, H. Qiu, L. Shen, I. B. Styles, H. J. Chang, A. Leonardis, A. De Marvao, D. P. O’Regan et al., “Learning a model-driven variational network for deformable image registration,” IEEE Transactions on Medical Imaging, vol. 41, no. 1, pp. 199–212, 2021.
- Duan et al. [2019] J. Duan, J. Schlemper, C. Qin, C. Ouyang, W. Bai, C. Biffi, G. Bello, B. Statton, D. P. O’Regan, and D. Rueckert, “VS-Net: Variable splitting network for accelerated parallel MRI reconstruction,” in International Conference on Medical Image Computing and Computer-assisted Intervention. Springer, 2019, pp. 713–722.
- Chen et al. [2023] G. Chen, Y. Hong, K. M. Huynh, and P.-T. Yap, “Deep learning prediction of diffusion MRI data with microstructure-sensitive loss functions,” Medical Image Analysis, p. 102742, 2023.
- Smith et al. [2013] R. E. Smith, J.-D. Tournier, F. Calamante, and A. Connelly, “SIFT: Spherical-deconvolution informed filtering of tractograms,” Neuroimage, vol. 67, pp. 298–312, 2013.
- Kingma and Ba [2014] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
- Van Essen et al. [2013] D. C. Van Essen, S. M. Smith, D. M. Barch, T. E. Behrens, E. Yacoub, K. Ugurbil, W.-M. H. Consortium et al., “The WU-Minn human connectome project: an overview,” Neuroimage, vol. 80, pp. 62–79, 2013.
- Sotiropoulos et al. [2013] S. N. Sotiropoulos, S. Jbabdi, J. Xu, J. L. Andersson, S. Moeller, E. J. Auerbach, M. F. Glasser, M. Hernandez, G. Sapiro, M. Jenkinson et al., “Advances in diffusion MRI acquisition and processing in the human connectome project,” Neuroimage, vol. 80, pp. 125–143, 2013.
- Dhollander et al. [2019] T. Dhollander, R. Mito, D. Raffelt, and A. Connelly, “Improved white matter response function estimation for 3-tissue constrained spherical deconvolution,” in Proc. Intl. Soc. Mag. Reson. Med, vol. 555, no. 10, 2019.
- Caruyer et al. [2013] E. Caruyer, C. Lenglet, G. Sapiro, and R. Deriche, “Design of multishell sampling schemes with uniform coverage in diffusion MRI,” Magnetic Resonance in Medicine, vol. 69, no. 6, pp. 1534–1540, 2013.
- Zhang et al. [2001] Y. Zhang, M. Brady, and S. Smith, “Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm,” IEEE Transactions on Medical Imaging, vol. 20, no. 1, pp. 45–57, 2001.
- Jenkinson et al. [2012] M. Jenkinson, C. F. Beckmann, T. E. Behrens, M. W. Woolrich, and S. M. Smith, “FSL,” Neuroimage, vol. 62, no. 2, pp. 782–790, 2012.
- Wasserthal et al. [2018] J. Wasserthal, P. Neher, and K. H. Maier-Hein, “TractSeg-Fast and accurate white matter tract segmentation,” NeuroImage, vol. 183, pp. 239–253, 2018.
- Anderson [2005] A. W. Anderson, “Measurement of fiber orientation distributions using high angular resolution diffusion imaging,” Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine, vol. 54, no. 5, pp. 1194–1206, 2005.
- Raffelt et al. [2011] D. Raffelt, J.-D. Tournier, J. Fripp, S. Crozier, A. Connelly, and O. Salvado, “Symmetric diffeomorphic registration of fibre orientation distributions,” Neuroimage, vol. 56, no. 3, pp. 1171–1180, 2011.
- Goodfellow et al. [2014] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” 6 2014. [Online]. Available: http://arxiv.org/abs/1406.2661