
Spatial von-Mises Fisher Regression for Directional Data

Zhou Lan  
Brigham and Women’s Hospital, Harvard Medical School
and
Arkaprava Roy*
Department of Biostatistics, University of Florida
and
for the Alzheimer’s Disease Neuroimaging Initiative
Abstract

Spatially varying directional data are routinely observed in several modern applications such as meteorology, biology, geophysics, and engineering. However, only a few approaches are available for covariate-dependent statistical analysis of such data. To address this gap, we propose a novel generalized linear model to analyze such data using a von Mises-Fisher (vMF) distributed error structure. Using a novel link function that relies on the transformation between Cartesian and spherical coordinates, we regress the vMF-distributed directional data on external covariates. This regression model enables us to quantify the impact of external factors on the observed directional data. Furthermore, we impose spatial dependence using an autoregressive model, appropriately accounting for the directional dependence in the outcome. This novel specification renders computational efficiency and flexibility. In addition, a comprehensive Bayesian inferential toolbox is thoroughly developed and applied to our analysis. Subsequently, applying our regression model to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data, we gain new insights into the relationship between cognitive impairment and the orientations of brain fibers, and we examine empirical efficacy through simulation experiments.


Keywords: Bayesian Angular Inference, Diffusion tensor imaging, Markov chain Monte Carlo, Tangent-normal decomposition

1 Introduction

Directional statistics, alternatively referred to as circular or angular statistics in two-dimensional cases or spherical statistics for higher dimensions, is a distinct subfield of multivariate statistics specifically tailored for analyzing directional data (Mardia and Jupp, 2009). Such data arise when measurements are expressed as angles or directions, inherently exhibiting spherical relationships. Instances of directional data can be found in diverse disciplines, including meteorology, biology, geophysics, and engineering, with examples ranging from wind direction and animal migration patterns to diffusion tensor imaging. Moreover, a variety of external features often exist that control the directional outcome.

In this paper, our proposed methodological developments focus on studying the association between the white matter fiber orientations in the brain, and subject-level covariates (e.g., age, gender, cognitive score, and genetic information), revealing the factors driving the fiber orientations (Schwartzman et al., 2005, 2008). In brain neuroimaging analyses, spatial dependencies become an important feature. Furthermore, the incorporation of spatial dependence is appealing in many directional data applications such as wind direction modeling (Neto et al., 2014; Murphy et al., 2022), brain fiber orientation modeling (Goodlett et al., 2009; Zhu et al., 2011), and modeling actigraph data from wearable devices (Banerjee et al., 2023). Hence, to effectively analyze this type of data, it is important to employ a strategy that can address: a) the inherent directional nature of the outcome; b) the complex spatial dependencies; and c) an effective and convenient summarization tool for the effects of the external factors associated with the outcome.

The fiber orientation data considered in this paper are derived from diffusion tensor imaging (DTI), a technique that has gained wide popularity for its increasing efficiency and accuracy in measuring microstructures (Soares et al., 2013). In the DTI imaging technique, a diffusion ellipsoid, which represents diffusivity in each voxel, is commonly used. Here, we focus on the principal eigenvector $\bm{E}_{1}$, interpreted as the tangent direction along the fiber bundle (rightmost panel in Figure 1). Thus, it is referred to as the principal diffusion direction throughout the rest of the paper and is one of the most critical markers for studying white matter fibers. The association between principal diffusion directions and subjects’ covariates has emerged as a promising perspective for studying neurodegenerative diseases (Schwartzman et al., 2005, 2008). With studies revealing that tissue properties may vary systematically along each tract, i.e., tract profiles of white matter properties (Yeatman et al., 2012), how the principal diffusion directions along each tract are driven by covariate effects becomes an important scientific question.

Refer to caption
Figure 1: Macro to micro-view of the brain’s anatomical structure: the left panel shows that the brain is connected through the streamlines of fiber tracts; the middle panel provides an example fiber tract; finally, the right panel shows the directional data characterizing the streamlines.

Traditional statistical methods, which rely on Euclidean space, are generally unsuitable for analyzing directional data due to their inability to account for the spherical relationships among values. To tackle these issues, directional statistics utilize a unique set of techniques (Gould, 1969) and probability distributions (Johnson and Wehrly, 1978). Several regression approaches have been developed for handling circular or 2-dimensional directional outcome data (Gould, 1969; Johnson and Wehrly, 1978; Fisher and Lee, 1992; Presnell et al., 1998). However, these approaches are not generalizable to spherical outcomes where the data sit on three- or higher-dimensional spheres (Mardia and Jupp, 2009; Paine et al., 2020). A regression method for analyzing such data was first introduced in Chang (1986), with subsequent developments including Downs (2003); Rosenthal et al. (2014); Scealy and Wood (2019); Paine et al. (2020), and a few other approaches have been proposed relying on the properties of homogeneous Riemannian manifolds (Hinkle et al., 2014; Cornea et al., 2017). Pewsey and García-Portugués (2021) provided a thorough review in this context. Nevertheless, none of these approaches are suitable for studying spatially varying directional outcomes.

In this paper, we develop a novel regression approach to infer the effects of several external factors on spatially varying directional data as an outcome. Notationally, let a $p$-variate directional observation measured at replication $i$ and location $v$, denoted as $\bm{E}_{iv}\in\mathcal{S}^{p-1}$, be the outcome of interest, and let a covariate vector $\bm{X}_{i}=[1,{X}_{i1},\ldots,{X}_{ic},\ldots,{X}_{iC}]$ represent the array of predictors. Here $\mathcal{S}^{p-1}$ stands for the unit sphere in $p$-dimensional Euclidean space. Our objective is to regress $\bm{E}_{iv}$ on $\bm{X}_{i}$ for statistical inference purposes. In summary, we propose a generalized regression model for unit vector-valued responses, incorporating a novel link function based on the transformation between Cartesian and spherical coordinates. This link function enables us to project the directional data onto Euclidean space, thus allowing us to efficiently model the covariate effects.

As discussed in Paine et al. (2020), regression models for angular data on $\mathcal{S}^{1}$ are not easily extended to $\mathcal{S}^{2}$ or higher-dimensional spheres. The fact that the intercept takes care of rotational invariance, as in Fisher and Lee (1992), is specific to angular data only. In our model, such a modification would not make the estimate rotation invariant since the response is in $\mathcal{S}^{2}$. Following model Structure 2 of Paine et al. (2020) and Section 4.3.1 of Scealy and Wood (2019), we add an unknown rotation matrix $\bm{Q}$ as a parameter that allows the inference to not depend on the coordinate system. Unlike Scealy and Wood (2019), we put a prior on this orthogonal matrix and sample it within our MCMC, since in a spatial setting their approach for pre-specifying this matrix may not be appropriate. As a result, the posterior samples of the regression coefficients become dependent on the posterior samples of the orthogonal matrix, and it is therefore not straightforward to use the regression coefficients directly for scientific inference. We instead consider a tangent-normal decomposition-based inference framework. Specifically, we summarize the change in the directional-valued mean upon a one-unit change in predictors in terms of the magnitude of the tangent-normal component and use those summaries to quantify the importance of different predictors. Further details are provided in Section 3. Moreover, the inclusion of this orthogonal matrix improves flexibility along the lines of Kent et al. (2006); Scealy and Wood (2019); Paine et al. (2020) and also the model performance, as shown in our simulation results in Section 4.

Another important component to be addressed is the aspect of complex spatial dependence. In many modern applications, directional data are spatially correlated, and the spatial dependence presents a distinct pattern: the spatial dependence of this type of directional data is usually along streamlines (Yeatman et al., 2012). Streamlines have different meanings in different applications. In animal migration, streamlines are the migration pathways (Mahan, 1991). In DTI, the streamlines are the fiber streamlines, i.e., the blue curves in the middle panel of Figure 1. In Goodlett et al. (2009); Zhu et al. (2011), the authors proposed methods that induce spatial dependence by considering arc length distances along a fiber while analyzing fiber orientation data. Such dependence can be observed in our motivating DTI data as well (see Section A.1 in the supplementary materials). Motivated by this, we impose the spatial dependence along the streamlines, relying on an autoregressive model. For our Alzheimer’s Disease Neuroimaging Initiative (ADNI) data analysis, the fiber streamlines are based on the tractography atlases from Yeh et al. (2018). These atlases offer inter-voxel connectivities along a given fiber tract, here the fornix (see Figure 2), and are estimated using high-quality diffusion-weighted images from healthy subjects. The tract profile of the fornix has been examined in several other studies (Valdés Cabrera et al., 2020; Cahn et al., 2021).

Refer to caption
Figure 2: Tractography atlas of the fornix.

In the rest of the paper, we first illustrate the model construction in Section 2. The proposed Bayesian angular inference is given in Section 3. To evaluate the performance of our model, we provide numerical studies in Section 4. Finally, we conclude and discuss in Section 5. Supplementary figures and computational materials are in supplementary materials.

2 Model Construction

In this section, we provide the details of the model construction. Section 2.1 describes the distribution that serves as the noise model for directional data. Subsequently, Section 2.2 introduces the link function, which allows us to relate a linear combination of the predictors to the response variables. Finally, we describe the model for spatial dependence in Section 2.3.

2.1 vMF Distribution

A random $p$-dimensional directional data point, $\bm{E}_{iv}$, lies on $\mathcal{S}^{p-1}$, as described in Section 1. The von Mises-Fisher (vMF) distribution (Mardia and Jupp, 2009) is a commonly used probability distribution for handling directional data, and spherical regression models are built based on the vMF distribution. However, building spherical regression models is not straightforward since the inference might depend on an arbitrary coordinate system in which the responses are defined. Following the approaches in Scealy and Wood (2019); Paine et al. (2020), we assume that $\bm{Q}^{T}\bm{E}_{iv}$ follows a vMF distribution, denoted as $\bm{Q}^{T}\bm{E}_{iv}\sim\text{vMF}\big(\bm{\mu}_{iv},\kappa\big)$, with the probability density function expressed as

f[𝑸T𝑬iv|𝝁iv,κ]=Cp(κ)exp(κ𝝁ivT𝑸T𝑬iv)=Cp(κ)exp(κivT𝑬iv),\displaystyle f\Big{[}\bm{Q}^{T}\bm{E}_{iv}|\bm{\mu}_{iv},\kappa\Big{]}=C_{p}(\kappa)\exp\Big{(}\kappa\bm{\mu}_{iv}^{T}\bm{Q}^{T}\bm{E}_{iv}\Big{)}=C_{p}(\kappa)\exp\Big{(}\kappa\mathcal{M}_{iv}^{T}\bm{E}_{iv}\Big{)}, (1)

In the expression, $C_{p}(\kappa)$ is the normalizing constant given by $C_{p}(\kappa)={\frac{\kappa^{p/2-1}}{(2\pi)^{p/2}I_{p/2-1}(\kappa)}}$, where $I_{a}$ stands for the modified Bessel function of the first kind of order $a$. In this paper, we primarily focus on the case with $p=3$, due to the nature of the DTI data; in that case, $C_{3}(\kappa)=\frac{\kappa}{2\pi(e^{\kappa}-e^{-\kappa})}$ and the density simplifies to $\frac{\kappa\exp(\kappa(\bm{\mu}^{T}\mathbf{x}-1))}{2\pi(1-\exp(-2\kappa))}$ for a point $\mathbf{x}\in\mathcal{S}^{2}$. Some pictorial illustrations are available in Straub (2017). The term $\bm{\mu}_{iv}$ is a directional vector, i.e., $\|\bm{\mu}_{iv}\|=1$, where $\|\cdot\|$ represents the $\ell_{2}$ norm, and $\kappa\geq 0$ is a concentration parameter. The term $\bm{Q}$ is set as an unknown parameter in our model and modeled as a special orthogonal matrix.
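As a quick numerical illustration of the $p=3$ density above (a sketch of ours, not the authors' implementation; function names are hypothetical), the density can be evaluated and checked to integrate to one over $\mathcal{S}^{2}$:

```python
import numpy as np

def vmf_density(x, mu, kappa):
    """von Mises-Fisher density on S^2 (p = 3), as in Eq. (1) with Q = I.

    Uses C_3(kappa) = kappa / (2*pi*(e^kappa - e^{-kappa})).
    """
    c3 = kappa / (2.0 * np.pi * (np.exp(kappa) - np.exp(-kappa)))
    return c3 * np.exp(kappa * np.dot(mu, x))

def integrate_over_sphere(kappa, n=2000):
    """Midpoint-rule check that the density integrates to one on S^2.

    With mu = (0, 0, 1) the density depends only on the polar angle,
    so the surface integral reduces to a 1-D integral with weight
    2*pi*sin(phi) dphi.
    """
    mu = np.array([0.0, 0.0, 1.0])
    phi = (np.arange(n) + 0.5) * np.pi / n          # polar-angle grid midpoints
    x = np.stack([np.sin(phi), np.zeros(n), np.cos(phi)], axis=1)
    dens = np.array([vmf_density(xi, mu, kappa) for xi in x])
    return np.sum(dens * 2.0 * np.pi * np.sin(phi)) * (np.pi / n)
```

The density is largest at the mode direction $\bm{\mu}$ and decays as the point moves away from it, at a rate governed by $\kappa$.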

The density is maximized at 𝑬iv=𝑸𝝁iv\bm{E}_{iv}=\bm{Q}\bm{\mu}_{iv}, the mode direction parameter. Thus, we further introduce the notation

iv=𝑸𝝁iv,\mathcal{M}_{iv}=\bm{Q}\bm{\mu}_{iv}, (2)

and it is the mode direction for the original response $\bm{E}_{iv}$; we name it the direct mode direction to differentiate it from $\bm{\mu}_{iv}$, which is called the rotated mode direction. In the next subsection, we propose a link function to relate predictors with $\bm{\mu}_{iv}$. Since the link function is known and pre-specified, the orthogonal matrix $\bm{Q}$ here provides additional flexibility by identifying an optimal coordinate system with respect to the link.

Refer to caption
(a) The graphical illustration describes the transformation between Cartesian and spherical coordinates. The azimuth angle and elevation angle are labeled in the Cartesian coordinate system.
Refer to caption
(b) The graphical illustration of the tangent-normal decomposition of covariate effects. The yellow arrow is the typical mode direction; the blue arrow is the specific mode direction; the green arrow is the deviation direction; the scalar $m_{v}\in(0,1)$ controls the magnitude of this effect.
Figure 3: The graphical illustrations of directional statistics.

2.2 Linking to Predictors

In accordance with the generalized linear regression (GLM) framework, we propose a relationship between a linear combination of the predictors and the response variables through a link function. It is also reasonable to allow the linear combination of the predictors to be unbounded, mirroring other GLM frameworks. Therefore, our proposed link function is a composition of two functions, each handling a different level of transformations. The following section provides detailed insight into the construction of the link function.

In this section, we focus on our link function for 3-dimensional spherical coordinates, i.e., when 𝑬iv\bm{E}_{iv} lies on 𝒮p1\mathcal{S}^{p-1} and p=3p=3. Let (x,y,z)(x,y,z) represent the Cartesian coordinates of a point on the sphere such that x2+y2+z2=1x^{2}+y^{2}+z^{2}=1. This point can be represented in spherical coordinates using two parameters, as depicted in Figure 3(a). The azimuth angle θ(π,π]\theta\in(-\pi,\pi] represents the counterclockwise angle in the x-y plane from the positive x-axis, and the elevation angle ϕ(π2,π2]\phi\in(-\frac{\pi}{2},\frac{\pi}{2}] is the angle from the x-y plane. The one-to-one transformation from Cartesian to spherical coordinates is as follows:

u:(x,y,z)(atan2(y,x),atan2(z,1z2))=(θ,ϕ),u:(x,y,z)\rightarrow\Big{(}\text{atan2}(y,x),\text{atan2}(z,\sqrt{1-z^{2}})\Big{)}=(\theta,\phi),

where $\text{atan2}(y,x)$ is defined by the limit expression $\lim_{z\rightarrow x^{+}}\arctan\big(\frac{y}{z}\big)+\frac{\pi}{2}\text{sgn}(y)\text{sgn}(x)(\text{sgn}(x)-1)$. While the spherical coordinates $(\theta,\phi)$ are located in Euclidean space, they are bounded. Hence, we first scale the spherical coordinates $(\theta,\phi)$ to fall within the interval $(0,1]$, then apply the logit function. The transformation is:

g:(θ,ϕ)(logit(θ+π2π),logit(ϕ+π/2π))=(θ~,ϕ~)×.g:(\theta,\phi)\rightarrow\Big{(}\text{logit}\big{(}\frac{\theta+\pi}{2\pi}\big{)},\text{logit}\big{(}\frac{\phi+\pi/2}{\pi}\big{)}\Big{)}=(\widetilde{\theta},\widetilde{\phi})\in\mathbb{R}\times\mathbb{R}.

Here, $\text{logit}(x)=\log\left(\frac{x}{1-x}\right)$. Although the logit transformation is ill-behaved on the boundary (i.e., at $x=1$), the regression model built on this transformation is still adequately flexible, as the set of such degenerate cases has measure zero. This transformation serves to map the bounded spherical coordinates to a new set of parameters $(\widetilde{\theta},\widetilde{\phi})$, which are unbounded. For clarity, the overall mapping from $(x,y,z)$ to $(\widetilde{\theta},\widetilde{\phi})$ is denoted by $\ell(\cdot)$, which is the composition of $g(\cdot)$ and $u(\cdot)$, i.e., $\ell(\cdot):=g(\cdot)\circ u(\cdot)$ (first $u$, then $g$). Thus, $\ell(x,y,z)=[\widetilde{\theta},\widetilde{\phi}]$ is the function that projects the directional data from $\mathcal{S}^{2}$ to $\mathbb{R}\times\mathbb{R}$.
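For $p=3$, the maps $u$, $g$, and their inverses can be sketched as follows (an illustrative implementation of ours with hypothetical names; the round trip also previews the bijectivity stated in Lemma 1):

```python
import numpy as np

def logit(x):
    return np.log(x / (1.0 - x))

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def link(x, y, z):
    """ell = g o u : Cartesian point on S^2 -> unbounded (theta~, phi~)."""
    theta = np.arctan2(y, x)                        # azimuth in (-pi, pi]
    phi = np.arctan2(z, np.sqrt(1.0 - z * z))       # elevation in (-pi/2, pi/2]
    return logit((theta + np.pi) / (2 * np.pi)), logit((phi + np.pi / 2) / np.pi)

def inverse_link(theta_t, phi_t):
    """Map (theta~, phi~) back to a unit vector on S^2."""
    theta = 2 * np.pi * expit(theta_t) - np.pi
    phi = np.pi * expit(phi_t) - np.pi / 2
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])
```

Applying `inverse_link` to `link(x, y, z)` recovers the original unit vector (away from the measure-zero boundary cases noted above).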

A point on a pp dimensional hypersphere, 𝒮p1\mathcal{S}^{p-1}, can be expressed using p1p-1 angles. The first p2p-2 angles (denoted as ϕ(p2)=[ϕ1(p2),,ϕp2(p2)]{\bm{\phi}}^{(p-2)}=[{{\phi}}_{1}^{(p-2)},\ldots,{{\phi}}_{p-2}^{(p-2)}]) are within the interval (π/2,π/2](-\pi/2,\pi/2], while the last angle θ{\theta} is within (π,π](-\pi,\pi]. The mapping function, when extended to the general dimension, is formulated as:

\displaystyle g:(\theta,\bm{\phi}^{(p-2)})\rightarrow\Big{(}\text{logit}\big{(}\frac{\theta+\pi}{2\pi}\big{)},\big{[}\text{logit}\big{(}\frac{{\phi}_{1}^{(p-2)}+\pi/2}{\pi}\big{)},\ldots,\text{logit}\big{(}\frac{{\phi}_{p-2}^{(p-2)}+\pi/2}{\pi}\big{)}\big{]}\Big{)}
=(θ~,[ϕ~1(p2),,ϕ~p2(p2)])×p2.\displaystyle=(\widetilde{\theta},\big{[}\widetilde{{\phi}}_{1}^{(p-2)},\ldots,\widetilde{{\phi}}_{p-2}^{(p-2)}\big{]})\in\mathbb{R}\times\mathbb{R}^{p-2}.

In our proposed generalized regression model, the link function $\ell(\cdot)$ plays a key role. The term $\ell(\bm{\mu}_{iv})=(\widetilde{\theta},\big[\widetilde{{\phi}}_{1}^{(p-2)},\ldots,\widetilde{{\phi}}_{p-2}^{(p-2)}\big])$ is treated as the linear predictor and is formulated as a function of the covariate vector $\bm{X}_{i}$. Specifically, we have:

𝔼(θ~,[ϕ~1(p2),,ϕ~p2(p2)])=(𝑿i𝜶v,[𝑿i𝜷v,1(p2),,𝑿i𝜷v,p2(p2)]).\mathbb{E}(\widetilde{\theta},\big{[}\widetilde{{\phi}}_{1}^{(p-2)},\ldots,\widetilde{{\phi}}_{p-2}^{(p-2)}\big{]})=(\bm{X}_{i}\bm{\alpha}_{v},[\bm{X}_{i}\bm{\beta}^{(p-2)}_{v,1},\ldots,\bm{X}_{i}\bm{\beta}^{(p-2)}_{v,p-2}]).

Lemma 1 illustrates that the link function ()\ell(\cdot) is a bijective function. This provides us with a direct mapping between the coordinates (θ~,[ϕ~1(p2),,ϕ~p2(p2)])(\widetilde{\theta},\big{[}\widetilde{{\phi}}_{1}^{(p-2)},\ldots,\widetilde{{\phi}}_{p-2}^{(p-2)}\big{]}) and the mode direction 𝝁iv\bm{\mu}_{iv}. This mapping is crucial in facilitating the interpretation of the regression coefficients.

Lemma 1.

The function $\ell(\cdot):\mathcal{S}^{p-1}\rightarrow\mathbb{R}\times\mathbb{R}^{p-2}$ is bijective.

Proof.

We know $\ell(\cdot):=g(\cdot)\circ u(\cdot)$, where $\circ$ is function composition. Since $u(\cdot)$ and $g(\cdot)$ are both bijective, $\ell(\cdot)$ is bijective. ∎

For the remainder of this article, in view of our DTI application and to simplify the exposition, we set p=3p=3, i.e.,

𝔼[θ~iv,ϕ~iv]T=[𝑿i𝜶v,𝑿i𝜷v]T,\mathbb{E}[\widetilde{\theta}_{iv},\widetilde{\phi}_{iv}]^{T}=[\bm{X}_{i}\bm{\alpha}_{v},\bm{X}_{i}\bm{\beta}_{v}]^{T},

and drop the superscript (p2)(p-2). Despite this simplification, it should be noted that it is straightforward to generalize all the steps we take here to accommodate any general pp.

2.3 Spatial Dependence

Spatial dependence is an essential component in many applications. In this section, we further illustrate the model specification which accounts for spatial dependence. By preserving $\mathbb{E}[\widetilde{\theta}_{iv},\widetilde{\phi}_{iv}]^{T}=[\bm{X}_{i}\bm{\alpha}_{v},\bm{X}_{i}\bm{\beta}_{v}]^{T}$, we let $[\widetilde{\theta}_{iv},\widetilde{\phi}_{iv}]^{T}=[\bm{X}_{i}\bm{\alpha}_{v}+\epsilon_{iv},\bm{X}_{i}\bm{\beta}_{v}+\xi_{iv}]^{T}$, where $\bm{\epsilon}_{i}=(\epsilon_{i1},\ldots,\epsilon_{iv},\ldots,\epsilon_{iV})$ and $\bm{\xi}_{i}=(\xi_{i1},\ldots,\xi_{iv},\ldots,\xi_{iV})$ follow mean-zero Gaussian distributions with covariance matrices $\bm{\Sigma}^{(\epsilon)}$ and $\bm{\Sigma}^{(\xi)}$, respectively.

The focus of our applications is principal directional data where the spatial dependence is along a given fiber streamline. Suppose we have $K$ fiber streamlines, and let $\mathbb{Z}_{k}$ be the set of consecutive indices within the $k$-th fiber streamline. We let the directional statistics be auto-correlated along a given streamline. For completeness, we first state the definition and notations related to Gaussian autoregressive modeling in Definition 1.

Definition 1 (AR($P$) Process).

Let $A_{t}=A_{t-1}\Phi_{1}+\ldots+A_{t-P}\Phi_{P}+\epsilon_{t}$ with $\epsilon_{t}\sim\mathcal{N}(0,\sigma^{2})$ for $t\in\mathbb{Z}$. We say that $\{A_{t}:t\in\mathbb{Z}\}$ follows a Gaussian AR($P$) process with correlation parameters $(\Phi_{1},\ldots,\Phi_{P})$ and variance $\sigma^{2}$, denoted as $\text{AR}_{P}\big([\Phi_{1},\ldots,\Phi_{P}],\sigma^{2}\big)$.
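Definition 1 can be illustrated by direct simulation; the sketch below (our own hypothetical helper, not part of the authors' code) generates a Gaussian AR($P$) path from given coefficients $\Phi_{1},\ldots,\Phi_{P}$ and variance $\sigma^{2}$:

```python
import numpy as np

def simulate_ar(phi, sigma2, n, burnin=500, rng=None):
    """Simulate A_t = sum_p phi[p-1] * A_{t-p} + eps_t, eps_t ~ N(0, sigma2).

    A burn-in period is discarded so the returned path is approximately
    stationary (assuming the coefficients are in the stationary region).
    """
    rng = np.random.default_rng(rng)
    P = len(phi)
    total = n + burnin + P
    a = np.zeros(total)
    eps = rng.normal(0.0, np.sqrt(sigma2), size=total)
    for t in range(P, total):
        # a[t-1], ..., a[t-P] paired with phi[0], ..., phi[P-1]
        a[t] = np.dot(phi, a[t - P:t][::-1]) + eps[t]
    return a[-n:]
```

For an AR(1) path with $\Phi_{1}=0.8$, the empirical lag-1 autocorrelation of a long simulated series is close to 0.8, as the definition implies.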

We thus let $\bm{\epsilon}_{i}=(\epsilon_{i1},\ldots,\epsilon_{iv},\ldots,\epsilon_{iV})$ and $\bm{\xi}_{i}=(\xi_{i1},\ldots,\xi_{iv},\ldots,\xi_{iV})$ follow multivariate normal distributions with autoregressive covariance matrices, denoted as

ϵik={ϵiv:vk}ARP([Φϵ1,,ΦϵP],τϵ2)\displaystyle\bm{\epsilon}_{ik}=\{\epsilon_{iv}:v\in\mathbb{Z}_{k}\}\sim\text{AR}_{P}\Big{(}[\Phi_{\epsilon 1},\ldots,\Phi_{\epsilon P}],\tau_{\epsilon}^{2}\Big{)} (3)
𝝃ik={ξiv:vk}ARP([Φξ1,,ΦξP],τξ2),\displaystyle\bm{\xi}_{ik}=\{\xi_{iv}:v\in\mathbb{Z}_{k}\}\sim\text{AR}_{P}\Big{(}[\Phi_{\xi 1},\ldots,\Phi_{\xi P}],\tau_{\xi}^{2}\Big{)},

where (Φϵ,1,,Φϵ,P,τϵ2)(\Phi_{\epsilon,1},\ldots,\Phi_{\epsilon,P},\tau_{\epsilon}^{2}) and (Φξ,1,,Φξ,P,τξ2)(\Phi_{\xi,1},\ldots,\Phi_{\xi,P},\tau_{\xi}^{2}) are the corresponding parameters. Note that our characterization of the AR(P)(P) process is based on the partial correlations but not the autoregressive coefficients directly.
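The map from partial autocorrelations to autoregressive coefficients implied by this characterization is the standard Durbin-Levinson recursion; a minimal sketch of ours (function name hypothetical) is:

```python
import numpy as np

def pacf_to_ar(pacf):
    """Durbin-Levinson: partial autocorrelations in (-1,1) -> AR coefficients.

    Recursion: phi_{k,k} = pacf[k-1];
               phi_{k,j} = phi_{k-1,j} - pacf[k-1] * phi_{k-1,k-j}.
    """
    phi = []
    for k, pk in enumerate(pacf, start=1):
        phi = [phi[j] - pk * phi[k - 2 - j] for j in range(k - 1)] + [pk]
    return np.array(phi)
```

For example, for $P=2$ the recursion gives $\Phi_{1}=\rho_{1}(1-\rho_{2})$ and $\Phi_{2}=\rho_{2}$, and any partial autocorrelations in $(-1,1)$ yield a stationary process.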

2.4 Prior Settings

We now put priors on the model parameters to proceed with our Bayesian inference. The matrix $\bm{Q}$ is first reparametrized using the Cayley transformation as $\bm{Q}=(\bm{I}-\bm{A})(\bm{I}+\bm{A})^{-1}$, where $\bm{A}$ is a skew-symmetric matrix. This parametrization is useful for efficient posterior computation. Setting $\bm{A}=\begin{bmatrix}0&a_{1}&a_{2}\\ -a_{1}&0&a_{3}\\ -a_{2}&-a_{3}&0\end{bmatrix}$, the priors for the three parameters $a_{1},a_{2},a_{3}$ are set as mean-zero normal with variance $100$. For the regression coefficients, we also assign AR-$P$ process priors, denoted as

𝜶k(c)={αv(c):vk}ARP([Φα,1,,Φα,P],σα2)\displaystyle\bm{\alpha}_{k}(c)=\{\alpha_{v}(c):v\in\mathbb{Z}_{k}\}\sim\text{AR}_{P}\Big{(}[\Phi_{\alpha,1},\ldots,\Phi_{\alpha,P}],\sigma_{\alpha}^{2}\Big{)} (4)
𝜷k(c)={βv(c):vk}ARP([Φβ,1,,Φβ,P],σβ2),\displaystyle\bm{\beta}_{k}(c)=\{\beta_{v}(c):v\in\mathbb{Z}_{k}\}\sim\text{AR}_{P}\Big{(}[\Phi_{\beta,1},\ldots,\Phi_{\beta,P}],\sigma_{\beta}^{2}\Big{)},

where 𝜶v=(αv(0),,αv(c),,αv(C))\bm{\alpha}_{v}=\Big{(}{\alpha}_{v}(0),\ldots,{\alpha}_{v}(c),\ldots,{\alpha}_{v}(C)\Big{)} and 𝜷v=(βv(0),,βv(c),,βv(C))\bm{\beta}_{v}=\Big{(}{\beta}_{v}(0),\ldots,{\beta}_{v}(c),\ldots,{\beta}_{v}(C)\Big{)}. The terms αv(0){\alpha}_{v}(0) and βv(0){\beta}_{v}(0) are the coefficients for the intercepts. The terms αv(c){\alpha}_{v}(c) and βv(c){\beta}_{v}(c) are the coefficients for the covariate cc. The variances are assigned with weakly informative inverse-gamma priors with shape and rate parameters set to 0.1, denoted as τϵ2,τξ2,σα2,σβ2𝒢𝒜(0.1,0.1)\tau_{\epsilon}^{-2},\tau_{\xi}^{-2},\sigma_{\alpha}^{-2},\sigma_{\beta}^{-2}\sim\mathcal{GA}(0.1,0.1). We put a diffuse prior for the concentration parameter, κ\kappa, denoted as κ1\kappa\propto 1. The concentration parameter κ\kappa can be viewed as a nugget effect. An autoregressive process with the partial autocorrelation parameters restricted to (1,1)(-1,1) can ensure stationarity (Barnett et al., 1996). Therefore, we assign uniform (1,1)(-1,1) prior on the partial autocorrelation parameters. Specifically, ρα,p,ρβ,p,ρϵ,p,ρξ,p𝒰(1,1)\rho_{\alpha,p},\rho_{\beta,p},\rho_{\epsilon,p},\rho_{\xi,p}\sim\mathcal{U}(-1,1) for p{1,2,,P}p\in\{1,2,\ldots,P\}, where ρα,p,ρβ,p,ρϵ,p,ρξ,p\rho_{\alpha,p},\rho_{\beta,p},\rho_{\epsilon,p},\rho_{\xi,p} are the corresponding partial autocorrelation parameters associated with the autoregressive process priors for 𝜶v(c)\bm{\alpha}_{v}(c)’s, 𝜷v(c)\bm{\beta}_{v}(c)’s, and the autoregressive random effects ϵik\bm{\epsilon}_{ik}’s, and 𝝃ik\bm{\xi}_{ik}’s, respectively. Applying the Durbin-Levinson recursion on the partial autocorrelations (ρα,p,ρβ,p,ρϵ,p,ρξ,p\rho_{\alpha,p},\rho_{\beta,p},\rho_{\epsilon,p},\rho_{\xi,p}), we get back the autoregressive coefficients (Φα,p,Φβ,p,Φϵ,p,Φξ,p\Phi_{\alpha,p},\Phi_{\beta,p},\Phi_{\epsilon,p},\Phi_{\xi,p}) using the R function inla.ar.pacf2phi() from the R package inla (Lindgren and Rue, 2015). 
The reverse transformation is applied using inla.ar.phi2pacf() from the same package. We utilize a Markov chain Monte Carlo (MCMC) algorithm, which includes Metropolis-Hastings-within-Gibbs steps, to generate the posterior samples needed for our Bayesian inference. The codes and usages of our codes are stated in Section B.1. of the supplementary materials.
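The Cayley reparametrization of $\bm{Q}$ can be sketched as follows (an illustrative check of ours, not the authors' sampler, confirming that the transform yields a special orthogonal matrix):

```python
import numpy as np

def cayley(a1, a2, a3):
    """Cayley transform Q = (I - A)(I + A)^{-1} for skew-symmetric A (3 x 3)."""
    A = np.array([[0.0,  a1,  a2],
                  [-a1, 0.0,  a3],
                  [-a2, -a3, 0.0]])
    I = np.eye(3)
    # I + A is always invertible because A has purely imaginary eigenvalues.
    return (I - A) @ np.linalg.inv(I + A)
```

Sampling the three unconstrained parameters $a_{1},a_{2},a_{3}$ and mapping through this transform is what allows standard Metropolis-Hastings moves to explore the rotation matrix.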

3 Bayesian Angular Inference

In this section, we present our novel Bayesian angular inference framework for directional data analysis. The interpretation of the regression coefficients in our model is not as straightforward as it is in linear regression. The proposed inferential framework aims to address this complexity. The main difficulty arises when trying to illustrate the effects of covariates. In Section 3.2, we illustrate our approach for characterizing the effects of the covariates, which relies on a suitable metric for performing inferences based on directional statistics. For clarity, Section 3.1 first introduces the angular expectation, which serves as this metric. These tools aim to improve our understanding and analysis of complex directional data.

3.1 Angular Expectation

For a random unit vector \mathcal{M}, the angular expectation is defined as

𝔸=argmin𝒎𝒮p1δ(𝒎,)π()𝑑,\mathbb{A}\mathcal{M}=\arg\min_{\bm{m}\in\mathcal{S}^{p-1}}\int\delta(\bm{m},\mathcal{M})\pi(\mathcal{M})d\mathcal{M},

where $\mathbb{A}$ is introduced as an operator returning the angular expectation, $p$ is the dimension of $\mathcal{M}$, and $\pi(\mathcal{M})$ stands for the distribution of $\mathcal{M}$. The angular expectation $\mathbb{A}$ is the main tool for our proposed Bayesian angular inference scheme. For a new subject with the covariate vector $\bm{X}_{\text{new}}$, the posterior predictive distribution of the direct mode direction $\mathcal{M}_{\text{new},v}$ for the $v$-th index (Equation 2) is $\big[\mathcal{M}_{\text{new},v}|\bm{X}_{\text{new}};\text{data}\big]$. For clarity, we use the subscript “new” to replace the original subscript “$i$” in the model for referring to a new subject. To summarize the above posterior of $\mathcal{M}_{\text{new},v}$, we compute the posterior angular expectation using the MCMC samples $\{\mathcal{M}_{\text{new},v}^{(t)}:t=1:T\}$ as

𝔸[new,v|𝑿new;data]argmin𝒎𝒮p11Tt=1Tδ(𝒎,new,v(t))=t=1Tnew,v(t)/Tt=1Tnew,v(t)/T.\displaystyle\mathbb{A}\Big{[}\mathcal{M}_{\text{new},v}|\bm{X}_{\text{new}};\text{data}\Big{]}\approx\arg\min_{\bm{m}\in\mathcal{S}^{p-1}}\frac{1}{T}\sum_{t=1}^{T}\delta(\bm{m},\mathcal{M}_{\text{new},v}^{(t)})=\frac{\sum_{t=1}^{T}\mathcal{M}_{\text{new},v}^{(t)}/T}{\|\sum_{t=1}^{T}\mathcal{M}_{\text{new},v}^{(t)}/T\|}.

Relating to the real applications, the above term profiles the expected fiber direction after the model-based adjustment for the covariate vector 𝑿new\bm{X}_{\text{new}}.
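As the display above shows, the Monte Carlo estimate of the angular expectation reduces to the normalized sample mean of the unit-vector draws; a minimal sketch of ours (with synthetic draws in the check, not real MCMC output):

```python
import numpy as np

def angular_expectation(samples):
    """Posterior angular expectation from MCMC draws of unit vectors.

    samples: array of shape (T, p), each row a unit vector.
    Returns the normalized mean direction, itself a unit vector.
    """
    m = np.mean(samples, axis=0)
    return m / np.linalg.norm(m)
```

Note that the plain coordinate-wise mean of unit vectors falls inside the sphere; the final normalization is what returns the summary to $\mathcal{S}^{p-1}$.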

3.2 Tangent-Normal Decomposition to Investigate Covariate Effects

A transparent inference framework illustrating the covariate effects is essential for any scientific study. In our proposed model, the coefficients $\bm{\alpha}_{v}$ and $\bm{\beta}_{v}$ do not offer a straightforward illustration of the covariate effects on the mode directions. This is also a limitation of many regression models for complicated responses. To overcome this, we incorporate the tangent-normal decomposition (Mardia and Jupp, 2009, Page 169) to summarize the covariate effects on directional data as follows. The tangent-normal decomposition is an integrated procedure, but we present it in three steps for a concise illustration.

Our proposed procedure starts by defining the following two covariate vectors: a) 𝑿T=[1,XT1,,XTc,,XTC]\bm{X}_{\text{T}}=[1,{X}_{\text{T}1},\ldots,{X}_{\text{T}c},\ldots,{X}_{\text{T}C}]: Covariate vector for the typical condition; b) 𝑿S=[1,XS1,,XSc,,XSC]{\bm{X}}_{\text{S}}=[1,{X}_{\text{S}1},\ldots,{X}_{\text{S}c},\ldots,{X}_{\text{S}C}]: Covariate vector for the specific condition. Thus, we replace the original subscript “ii” with “T” and “S” in our notation, to represent a subject in the typical and specific condition, respectively.

Next, we obtain the corresponding posterior predictive distributions of the mode directions given the typical or specific covariate vector. They are referred to as the typical and specific mode directions, respectively, expressed as follows: a) Typical Mode Direction: $\overline{\mathcal{M}}_{v}=\big[\mathcal{M}_{\text{T}v}|{\bm{X}}_{\text{T}};\text{data}\big]$; b) Specific Mode Direction: $\widetilde{\mathcal{M}}_{v}=\big[\mathcal{M}_{\text{S}v}|{\bm{X}}_{\text{S}};\text{data}\big]$. They can also be approximately learned from the MCMC output through the $\mathbb{A}$ operator defined above.

To quantify the covariate effect, we employ the tangent-normal decomposition on the specific mode direction $\widetilde{\mathcal{M}}_{v}$ as in Figure 3(b), where the typical mode direction $\overline{\mathcal{M}}_{v}$ is the tangent direction and the term $\bm{R}_{v}$ is the normal direction. Applying this to the principal diffusion directions, the term $\bm{R}_{v}$ can be interpreted as the deviation direction caused by the difference between the specific condition and the typical condition. The scalar $m_{v}\in(0,1)$, which controls the magnitude of this effect, can be used to quantify the importance of this covariate: a larger value of $m_{v}$ indicates a stronger deviation of the specific condition from the typical condition. Computationally, the tangent-normal decomposition can be applied to each MCMC sample $t$. This provides us with the posterior samples of $\bm{R}_{v}$ and $m_{v}$, denoted as $\{\bm{R}_{v}^{(t)}:t=1:T\}$ and $\{m_{v}^{(t)}:t=1:T\}$. We use those samples to obtain posterior estimates of $\mathbb{A}[\bm{R}_{v}|\text{data}]$ to spatially profile the deviation directions caused by the difference between the typical and specific conditions. Similarly, the posterior mean estimate of $m_{v}$ can also be computed and used to quantify the magnitude. We use the following notation to express the tangent-normal decomposition:

𝒟TN(¯v||~v)={𝑹v,mv}\displaystyle\mathcal{D}_{TN}(\overline{\mathcal{M}}_{v}||\widetilde{\mathcal{M}}_{v})=\{\bm{R}_{v},m_{v}\} (5)
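Computationally, the decomposition above reduces to projecting one unit vector onto another. Below is a minimal NumPy sketch; the function name and signature are ours, not from the paper:

```python
import numpy as np

def tangent_normal_decomposition(typical, specific):
    """Split unit vector `specific` as
        specific = sqrt(1 - m^2) * typical + m * R,
    where R is a unit vector orthogonal to `typical` (the deviation
    direction) and m in (0, 1) is the deviation magnitude."""
    typical = np.asarray(typical, float) / np.linalg.norm(typical)
    specific = np.asarray(specific, float) / np.linalg.norm(specific)
    tangent = np.dot(typical, specific) * typical   # component along `typical`
    normal = specific - tangent                     # orthogonal remainder
    m = np.linalg.norm(normal)
    R = normal / m if m > 1e-12 else np.zeros_like(typical)
    return R, m
```

For example, with typical direction (1, 0, 0) and specific direction (1, 1, 0)/√2, the decomposition returns R = (0, 1, 0) and m = sin 45° ≈ 0.707; applying it per MCMC draw gives the posterior samples of 𝑹v and mv described above.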

The illustration above gives a general description of the tangent-normal decomposition. As long as it is feasible to specify the typical and specific conditions, one may use {𝑹v,mv}\{\bm{R}_{v},m_{v}\} to infer the covariate effects induced by the difference between the two conditions. Below are two remarks that are important for the general use of our approach.

Remark 1.

The two covariate vectors can encode any two conditions under which we compare the specific condition to the typical condition, with the latter serving as the reference. Usually, 𝑿T{\bm{X}}_{\text{T}} is set to a common characterization of the cohort.

Remark 2.

𝒟TN(¯v||~v)\mathcal{D}_{TN}(\overline{\mathcal{M}}_{v}||\widetilde{\mathcal{M}}_{v}) is not symmetric in its two inputs, i.e., 𝒟TN(¯v||~v)𝒟TN(~v||¯v)\mathcal{D}_{TN}(\overline{\mathcal{M}}_{v}||\widetilde{\mathcal{M}}_{v})\not\equiv\mathcal{D}_{TN}(\widetilde{\mathcal{M}}_{v}||\overline{\mathcal{M}}_{v}). Thus, the specific and typical conditions cannot be exchanged.

While our approach applies to arbitrary pairs of conditions, one typically interprets one covariate effect at a time. We therefore provide two examples below illustrating how to do so. The key strategy is to have the typical and specific conditions differ only in the covariate cc^{\prime} of primary interest.

Example 1 (Continuous Covariate).

Let cc^{\prime} represent a continuous covariate. The terms XTcX_{\text{T}c^{\prime}} and XScX_{\text{S}c^{\prime}} correspond to the values in the typical and the specific cases, respectively. When investigating the impact of a single unit increase in the covariate cc^{\prime}, with XTc=aX_{\text{T}c^{\prime}}=a, we assign XSc=a+1X_{\text{S}c^{\prime}}=a+1. Concurrently, we ensure XSc=XTcX_{\text{S}c}=X_{\text{T}c} for all other cc variables.

Example 2 (Categorical Covariate).

Let cc^{\prime} represent a categorical covariate with MM levels, m{1,,M}m\in\{1,\ldots,M\}. The terms XTc,m{0,1}X_{\text{T}c^{\prime},m}\in\{0,1\} and XSc,m{0,1}X_{\text{S}c^{\prime},m}\in\{0,1\} denote the corresponding indicators for level mm. Specifically, XTc,j=1X_{\text{T}c^{\prime},j}=1 and XSc,j=1X_{\text{S}c^{\prime},j^{\prime}}=1 if the typical case is categorized under the jj-th level and the specific case under the jj^{\prime}-th level, with XTc,m=0X_{\text{T}c^{\prime},m}=0 for all mjm\neq j and XSc,m=0X_{\text{S}c^{\prime},m}=0 for all mjm\neq j^{\prime}. In assessing the impact of the jj^{\prime}-th level relative to a typical case in the jj-th level, we set XSc,j=1X_{\text{S}c^{\prime},j^{\prime}}=1 and XTc,j=1X_{\text{T}c^{\prime},j}=1, while ensuring XSc=XTcX_{\text{S}c}=X_{\text{T}c} for all other c{0:C}c\in\{0:C\} variables.
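To make the construction in the examples concrete, the sketch below builds typical and specific design vectors that differ only in one binary covariate (an APOE-like indicator). The covariate layout and numeric values are hypothetical illustrations, not taken from the data:

```python
import numpy as np

# Hypothetical design vectors with layout [intercept, age, gender, MMSE, APOE]:
# typical condition = APOE wild type, specific condition = APOE-4 carrier,
# all other covariates held equal between the two conditions.
x_typical = np.array([1.0, 72.5, 0.0, 27.3, 0.0])
x_specific = x_typical.copy()
x_specific[-1] = 1.0   # flip only the covariate of primary interest

# sanity check: the two conditions differ in exactly one entry
assert np.sum(x_typical != x_specific) == 1
```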

4 Numerical Studies

In this section, we demonstrate the efficacy of our model compared with traditional methods. We first evaluate model fit on synthetic data in Section 4.1, generating the synthetic data to closely mimic the real ADNI data.

The ADNI initiative was inaugurated in 2003 as a public-private partnership with the primary goal of assessing whether the amalgamation of serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical as well as neuropsychological assessments could aid in predicting the progression of Alzheimer’s disease. This paper specifically focuses on ADNI-2, a project that commenced in September 2011 and spanned five years (Aisen et al., 2010). The Centers for Disease Control and Prevention (https://www.cdc.gov/aging/aginginfo/alzheimers.htm) suggests that Alzheimer’s disease symptoms typically occur after the age of 60. Therefore, we have tailored our study cohort to focus on subjects over 60 years old. The subjects are further classified into four disease states: healthy controls (CN), early mild cognitive impairment (EMCI), late mild cognitive impairment (LMCI), and Alzheimer’s Disease (AD). We use g{CN,EMCI,LMCI,AD}g\in\{\text{CN},\text{EMCI},\text{LMCI},\text{AD}\} to denote the group index.

Additionally, we collect subject-level information including age (Xi,ageX_{i,age}), gender (Xi,genderX_{i,gender}), mini-mental state examination (MMSE) score (Xi,MMSEX_{i,MMSE}), and the Apolipoprotein E (APOE) profile (Xi,APOEX_{i,APOE}). The MMSE is a performance-based neuropsychological test score ranging from 0 to 3030. Subjects with cognitive impairments are typically associated with a lower MMSE score. The APOE is a genetic marker with three major variants (ϵ\epsilon-2, ϵ\epsilon-3, and ϵ\epsilon-4). The ϵ\epsilon-4 variant is considered the most significant known genetic risk factor for Alzheimer’s disease across various ethnic groups (Sadigh-Eteghad et al., 2012). Consequently, our analysis includes APOE-4 as a binary indicator, which denotes the presence of the ϵ\epsilon-4 variant in subjects. We also incorporate age and gender into our analysis, given that previous studies have established their correlation with AD (Podcasy and Epperson, 2022). In Section 4.1, we provide the model comparison based on both synthetic data and real ADNI data. In Section 4.2, we provide the tangent-normal decomposition within Bayesian angular inference to infer the covariate effects. In both sections, we specify the design matrix as

𝑿i=[1,Xi,age,Xi,gender,Xi,MMSE,Xi,APOE]\displaystyle\bm{X}_{i}=[1,X_{i,age},X_{i,gender},X_{i,MMSE},X_{i,APOE}] (6)

We further specify the regression terms as

𝔼(𝝁iv)=𝔼[θ~iv,ϕ~iv]T=[𝑿i𝜶gv,𝑿i𝜷gv]Tfor subject i in clinical group g.\displaystyle\mathbb{E}\ell\Big{(}\bm{\mu}_{iv}\Big{)}=\mathbb{E}[\widetilde{\theta}_{iv},\widetilde{\phi}_{iv}]^{T}=[\bm{X}_{i}\bm{\alpha}_{gv},\bm{X}_{i}\bm{\beta}_{gv}]^{T}\quad\text{for subject $i$ in clinical group $g$}. (7)

In this way, the covariate effects are clinical group-specific.
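To illustrate how the linear predictors in Equation (7) map back to a unit mode direction, the sketch below inverts a standard spherical-coordinate link. The exact coordinate convention of ℓ is not restated in this excerpt, so the parameterization (θ̃ as polar angle, φ̃ as azimuth) is our assumption, and the function and argument names are ours:

```python
import numpy as np

def mode_direction(X_i, alpha, beta):
    """Map a covariate vector to a unit mode direction via the inverse of
    an (assumed) spherical-coordinate link ell(mu) = (theta, phi):
        mu = (sin th cos ph, sin th sin ph, cos th).
    `alpha` and `beta` play the role of the voxel/group coefficient
    vectors in Eq. (7)."""
    theta = float(X_i @ alpha)   # polar-angle linear predictor
    phi = float(X_i @ beta)      # azimuth linear predictor
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])
```

By construction the returned vector has unit norm, so the regression always produces a valid direction regardless of the values of the linear predictors.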

4.1 Model Comparison

In this section, we carry out numerical analyses on both the ADNI data and synthetic principal diffusion direction data. The purpose of these studies is to demonstrate the superior performance of our proposed vMF regression relative to other methods. Here, we consider three competing methods: 1) Gaussian Regression 1: a multivariate regression model for the principal diffusion directions 𝑬iv\bm{E}_{iv}; 2) Gaussian Regression 2: a multivariate regression model for the transformed means (𝑬iv)\ell\Big{(}\bm{E}_{iv}\Big{)}; 3) Non-spatial vMF regression: 𝑸T𝑬ivvMF(iv,κ)\bm{Q}^{T}\bm{E}_{iv}\sim\text{vMF}\big{(}\mathcal{M}_{iv},\kappa\big{)} with (𝝁iv)=[θ~iv,ϕ~iv]T=[𝑿i𝜶gv,𝑿i𝜷gv]T\ell\Big{(}\bm{\mu}_{iv}\Big{)}=[\widetilde{\theta}_{iv},\widetilde{\phi}_{iv}]^{T}=[\bm{X}_{i}\bm{\alpha}_{gv},\bm{X}_{i}\bm{\beta}_{gv}]^{T}. Specific details are as follows. Gaussian Regression 1 simply treats the principal diffusion directions 𝑬iv\bm{E}_{iv} as normally distributed random variables, such that 𝑬iv𝒩(iv,𝚺1)\bm{E}_{iv}\sim\mathcal{N}(\mathcal{M}_{iv},\bm{\Sigma}_{1}), where iv(p)=𝑿i𝒖v(p)\mathcal{M}_{iv}^{(p)}=\bm{X}_{i}\bm{u}_{v}^{(p)} and iv(p)\mathcal{M}_{iv}^{(p)} is the pp-th element of iv\mathcal{M}_{iv} for p{1,2,3}p\in\{1,2,3\}, i.e., iv=(iv(1),iv(2),iv(3))\mathcal{M}_{iv}=(\mathcal{M}_{iv}^{(1)},\mathcal{M}_{iv}^{(2)},\mathcal{M}_{iv}^{(3)}). In Gaussian Regression 2, the terms (𝑬iv)=[Aiv,Biv]T\ell\Big{(}\bm{E}_{iv}\Big{)}=[A_{iv},B_{iv}]^{T} are assumed to follow a normal distribution, denoted as (𝑬iv)=[Aiv,Biv]T𝒩([θ~iv,ϕ~iv]T,𝚺2)\ell\Big{(}\bm{E}_{iv}\Big{)}=[A_{iv},B_{iv}]^{T}\sim\mathcal{N}([\tilde{\theta}_{iv},\tilde{\phi}_{iv}]^{T},\bm{\Sigma}_{2}), where θ~iv=𝑿i𝒂v\tilde{\theta}_{iv}=\bm{X}_{i}\bm{a}_{v} and ϕ~iv=𝑿i𝒃v\tilde{\phi}_{iv}=\bm{X}_{i}\bm{b}_{v}. In the non-spatial vMF regression, we simply remove the spatially dependent randomness from the proposed model.

We take the Bayesian route to inference for all these methods and place weakly informative priors on the unknown parameters. Specifically, we place normal priors with mean 𝟎\bm{0} and covariance matrix 1000×𝑰1000\times\bm{I} on the coefficients, denoted as 𝒖v(1),𝒖v(2),𝒖v(3),𝒂v,𝒃v𝒩(𝟎,1000×𝑰)\bm{u}_{v}^{(1)},\bm{u}_{v}^{(2)},\bm{u}_{v}^{(3)},\bm{a}_{v},\bm{b}_{v}\sim\mathcal{N}(\bm{0},1000\times\bm{I}). We further place an inverse Wishart prior with degrees of freedom 55 and scale matrix 𝑰\bm{I} on the covariance matrices, denoted as 𝚺11,𝚺21𝒲(5,𝑰)\bm{\Sigma}_{1}^{-1},\bm{\Sigma}_{2}^{-1}\sim\mathcal{W}(5,\bm{I}). Our inference is based on the posterior samples generated by the MCMC algorithm, where we collect 3,0003,000 post-burn-in samples after discarding the first 2,0002,000. MCMC convergence is monitored by Heidelberger and Welch’s convergence diagnostic (Heidelberger and Welch, 1981).

We generate synthetic principal diffusion direction data using our proposed vMF regression model. We design a synthetic fiber tract that includes K=12K=12 fibers and a total of 1,2561,256 voxels. We also have covariate vectors for 20 subjects for training and 20 subjects for validation. For each data replication, we fit our model to the training data to obtain parameter estimates, which are then used to predict the response in the validation data. The discrepancies between the predicted and true responses are used to measure model accuracy.

The tractography atlas, values of the covariate vectors, and voxel-wise coefficients are stored in the R file synthetic.Rdata, as detailed in Section B.2 of the supplementary materials. For the other parameters, we set P=5P=5 and generate partial correlation coefficients from 𝒰(1,1)\mathcal{U}(-1,1). The variances are specified as τϵ2=τξ2=σα2=σβ2=1\tau_{\epsilon}^{2}=\tau_{\xi}^{2}=\sigma_{\alpha}^{2}=\sigma_{\beta}^{2}=1, and the orthogonal matrix is specified as 𝑸=𝑰\bm{Q}=\bm{I}. We consider κ{20,30,40}\kappa\in\{20,30,40\}. A total of 50 replicated data sets are generated for each setting.
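For readers wishing to reproduce a comparable generator, the sketch below draws 3-D vMF samples around a mode direction using the closed-form inverse CDF of the cosine of the polar angle (available only in three dimensions). This is a standard sampler, not necessarily the one behind synthetic.Rdata, and the function name is ours:

```python
import numpy as np

def sample_vmf_3d(mu, kappa, size, rng=None):
    """Draw `size` samples from a 3-D von Mises-Fisher distribution with
    mode `mu` and concentration `kappa`, via the closed-form inverse CDF
    of w = cos(polar angle): w = 1 + log(u + (1-u)exp(-2k)) / k."""
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, float)
    mu = mu / np.linalg.norm(mu)
    u = rng.uniform(size=size)
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    phi = rng.uniform(0.0, 2.0 * np.pi, size=size)
    # orthonormal basis {e1, e2} for the plane orthogonal to mu
    a = np.array([1.0, 0.0, 0.0]) if abs(mu[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(mu, a)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(mu, e1)
    s = np.sqrt(np.clip(1.0 - w ** 2, 0.0, None))
    return (w[:, None] * mu
            + s[:, None] * (np.cos(phi)[:, None] * e1 + np.sin(phi)[:, None] * e2))
```

With κ = 40, as in one of the simulation settings, the draws concentrate tightly around the mode direction, mimicking low-noise principal diffusion directions.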

Prediction errors are quantified in terms of a) the separation angle and b) the root mean squared error. The separation angle is defined as i,vδ(^iv,𝑸^T𝑬iv)/Niv\sum_{i,v}\delta\Big{(}\widehat{\mathcal{M}}_{iv},\hat{\bm{Q}}^{T}\bm{E}_{iv}\Big{)}/N_{iv} for the vMF regression and i,vδ(^iv,𝑬iv)/Niv\sum_{i,v}\delta\Big{(}\widehat{\mathcal{M}}_{iv},\bm{E}_{iv}\Big{)}/N_{iv} for the Gaussian regression methods. The root mean squared error is defined as i,v^iv𝑸^T𝑬iv/Niv\sum_{i,v}\|\widehat{\mathcal{M}}_{iv}-\hat{\bm{Q}}^{T}\bm{E}_{iv}\|/N_{iv} for the vMF regression and i,v^iv/^iv𝑬iv/Niv\sum_{i,v}\|\widehat{\mathcal{M}}_{iv}/\|\widehat{\mathcal{M}}_{iv}\|-\bm{E}_{iv}\|/N_{iv} for the Gaussian case. Figure 4 reports the prediction errors across simulation replications. Our proposed method substantially outperforms the Gaussian alternatives and the non-spatial vMF regression model on these prediction errors. The spatial vMF regression with P=5P=5 performs best among P{2,,7}P\in\{2,\ldots,7\}, consistent with the true data being generated with P=5P=5. This suggests that the optimal lag can be chosen based on predictive performance in our real data analysis.
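Both error metrics can be computed directly from predicted and observed unit vectors. The sketch below follows the definitions above, with both arrays assumed already rotated into a common frame; note that the "root mean squared error" here is the average Euclidean distance, matching the text's definition, and the function names are ours:

```python
import numpy as np

def separation_angle(u, v):
    """Angle (radians) between two unit vectors, clipped for numerical safety."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def prediction_errors(M_hat, E):
    """Average separation angle and average Euclidean distance between
    predicted mode directions M_hat and observed directions E; rows index
    the (i, v) pairs. Predictions are normalized to unit length first."""
    M_hat = np.asarray(M_hat, float)
    E = np.asarray(E, float)
    M_hat = M_hat / np.linalg.norm(M_hat, axis=1, keepdims=True)
    angles = [separation_angle(m, e) for m, e in zip(M_hat, E)]
    rmse = np.mean(np.linalg.norm(M_hat - E, axis=1))
    return np.mean(angles), rmse
```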

Refer to caption
Refer to caption
Figure 4: Comparison of Prediction errors quantified in terms of separation angles and root-mean-squared errors between the observed and predicted means. For the proposed vMF model, errors for different choices of spatial lags from 2 to 7 are provided.

4.2 Bayesian Angular Inference for ADNI Data

In this section, we utilize our Bayesian angular inference method to analyze data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). We illustrate the effects of covariates via our proposed tangent-normal decomposition. Our techniques shed fresh light on the neurological progression of Alzheimer’s disease. To maintain a specific emphasis, we concentrate on the influence of the APOE-4 allele on brain anatomical structure, accounting for other covariates such as age, gender, and scores from the Mini-Mental State Examination (MMSE). We focus on the fornix in this section.

To achieve this goal, the typical covariate vector is given as follows

𝑿¯T(g)=[1,X¯age(g),X¯T,gender=0,X¯MMSE(g),XT,APOE=0]\displaystyle\bar{\bm{X}}_{T}^{(g)}=[1,\bar{X}_{age}^{(g)},\bar{X}_{T,gender}=0,\bar{X}_{MMSE}^{(g)},X_{T,APOE}=0] (8)

or

𝑿¯T(g)=[1,X¯age(g),X¯T,gender=1,X¯MMSE(g),XT,APOE=0]\displaystyle\bar{\bm{X}}_{T}^{(g)}=[1,\bar{X}_{age}^{(g)},\bar{X}_{T,gender}=1,\bar{X}_{MMSE}^{(g)},X_{T,APOE}=0] (9)

where Equations (8) and (9) are the typical covariate vectors for males and females, respectively. We consider the wild type of APOE-4 (i.e., XT,APOE=0X_{T,APOE}=0) as typical. The terms X¯age(g)\bar{X}_{age}^{(g)} and X¯MMSE(g)\bar{X}_{MMSE}^{(g)} are the averages of age and MMSE within group gg. We decompose the gender specification into male and female in order to provide gender-specific inference. The corresponding specific covariate vector is given as

𝑿¯S(g)=[1,X¯age(g),X¯S,gender=0,X¯MMSE(g),XS,APOE=1]\displaystyle\bar{\bm{X}}_{S}^{(g)}=[1,\bar{X}_{age}^{(g)},\bar{X}_{S,gender}=0,\bar{X}_{MMSE}^{(g)},X_{S,APOE}=1] (10)

or

𝑿¯S(g)=[1,X¯age(g),X¯S,gender=1,X¯MMSE(g),XS,APOE=1],\displaystyle\bar{\bm{X}}_{S}^{(g)}=[1,\bar{X}_{age}^{(g)},\bar{X}_{S,gender}=1,\bar{X}_{MMSE}^{(g)},X_{S,APOE}=1], (11)

where the APOE-4 status is considered as mutated (i.e., XS,APOE=1X_{S,APOE}=1).

To incorporate group-level analysis into our study, we use the superscript (g)(g) to denote group-specific terms. In this section, we focus on the effect of APOE-4. The posterior [mv(g)|data][m_{v}^{(g)}|\text{data}] reflects the strength of the covariate effect, and we consider mv(g)>0.65m_{v}^{(g)}>0.65 as indicating a large impact. We therefore calculate the posterior probabilities Pr(mv(g)>0.65|data)Pr(m_{v}^{(g)}>0.65|\text{data}). As an exploratory analysis, we flag as impacted those voxels whose posterior z-value exceeds the 0.9 quantile of the posterior z-values over voxels and groups. The visualization for the AD group is in Figure 5. Compared to the other clinical groups (see Figure A.2 in the supplementary materials), the AD group’s fornix anatomical structure is more impacted by APOE. Moreover, the APOE effect is more lateralized in males than in females: the left component of the fornix in males is more impacted by APOE.
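The voxel-wise screening described above can be sketched directly from the MCMC draws of mv. As a simplification of the paper's screen, the sketch below thresholds the posterior probabilities themselves rather than converting them to z-values; the function name and the array layout are our assumptions:

```python
import numpy as np

def impacted_voxels(m_draws, threshold=0.65, q=0.9):
    """m_draws: (T, V) array of post-burn-in posterior draws of m_v,
    one column per voxel. Returns the per-voxel estimate of
    Pr(m_v > threshold | data) and a boolean mask flagging voxels whose
    probability exceeds the q-quantile across voxels."""
    m_draws = np.asarray(m_draws, float)
    probs = (m_draws > threshold).mean(axis=0)   # Monte Carlo probability
    return probs, probs > np.quantile(probs, q)
```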

Refer to caption
Figure 5: Tangent-normal decomposition of the APOE-4 effect with the fornix as the region of interest. The upper panel is for females and the lower panel for males. Colors indicate posterior z-scores. For impacted voxels, red arrows denote typical mode directions and blue arrows denote specific mode directions.

5 Conclusion and Discussion

This paper presents a novel spatial generalized linear regression framework for modeling spatially varying directional data as an outcome. The framework leverages a vMF-distributed error model, shown to accurately capture local, covariate-dependent variations. Given the directional nature of the data, we employ an autoregressive framework to capture spatial variations. Numerical evaluations, performed on both real and synthetic data, demonstrate our proposed method’s superior performance. By applying our proposed Bayesian angular inference scheme to the ADNI data, we have gleaned several novel scientific insights.

Our current analysis is cross-sectional. However, the exploration of longitudinal changes, specific to our ADNI study, akin to (Wang and Guo, 2019; Kundu et al., 2019), could be intriguing for identifying underlying mechanistic changes over time. Nonetheless, such covariate-dependent longitudinal research poses significant challenges. For instance, brain images and other biomarkers are often measured asynchronously (Li et al., 2022). In future work, extending our generalized vMF regression to accommodate asynchronously collected longitudinal data would be an interesting direction to pursue. Furthermore, our proposed model can be seamlessly integrated into other popular inference frameworks, such as mediation analysis, providing additional avenues of interest. In the context of analyzing DTI data, it will be interesting to incorporate fiber-shape information for more holistic analysis (Yeh, 2020). Future research will explore such a direction.

One of the most important aspects of our proposed model is the incorporation of spatial dependence. Thus, one may consider other modeling choices for the directional distribution and link functions proposed in the literature (Scealy and Welsh, 2017; Scealy and Wood, 2019; Paine et al., 2020) and integrate spatial dependence; the proposed autoregressive spatial dependence can be applied under these other modeling choices seamlessly. It is thus an important future research avenue to compare these different modeling choices holistically for spatially dependent directional data. Another direction of future research is to consider a scale mixture of vMF distributions to characterize the data, which would allow different subjects to have different κ\kappa’s. A further generalization is possible in the specification of κ\kappa, which we have kept fixed across all spatial locations; it might be interesting to allow κ\kappa to vary slowly over space. Furthermore, unimodality might be a restrictive assumption.

References

  • Aisen et al. (2010) Aisen, P. S., Petersen, R. C., Donohue, M. C., Gamst, A., Raman, R., Thomas, R. G., Walter, S., Trojanowski, J. Q., Shaw, L. M. and Beckett, L. A. (2010), ‘Clinical core of the Alzheimer’s disease neuroimaging initiative: progress and plans’, Alzheimer’s & Dementia 6(3), 239–246.
  • Banerjee et al. (2023) Banerjee, S., Alaimo Di Loro, P., Mingione, M., Lipsitt, J., Batteate, C. and Jerrett, M. (2023), ‘Bayesian hierarchical modeling and analysis for actigraph data from wearable devices’.
  • Barnett et al. (1996) Barnett, G., Kohn, R. and Sheather, S. (1996), ‘Bayesian estimation of an autoregressive model using Markov chain Monte Carlo’, Journal of Econometrics 74(2), 237–254.
  • Cahn et al. (2021) Cahn, A. J., Little, G., Beaulieu, C. and Tétreault, P. (2021), ‘Diffusion properties of the fornix assessed by deterministic tractography shows age, sex, volume, cognitive, hemispheric, and twin relationships in young adults from the human connectome project’, Brain Structure and Function 226(2), 381–395.
  • Chang (1986) Chang, T. (1986), ‘Spherical regression’, The Annals of Statistics 14(3), 907–924.
  • Cornea et al. (2017) Cornea, E., Zhu, H., Kim, P., Ibrahim, J. G. and Initiative, A. D. N. (2017), ‘Regression models on Riemannian symmetric spaces’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(2), 463–482.
  • Downs (2003) Downs, T. (2003), ‘Spherical regression’, Biometrika 90(3), 655–668.
  • Fisher and Lee (1992) Fisher, N. I. and Lee, A. J. (1992), ‘Regression models for an angular response’, Biometrics pp. 665–677.
  • Goodlett et al. (2009) Goodlett, C. B., Fletcher, P. T., Gilmore, J. H. and Gerig, G. (2009), ‘Group analysis of DTI fiber tract statistics with application to neurodevelopment’, Neuroimage 45(1), S133–S142.
  • Gould (1969) Gould, A. L. (1969), ‘A regression technique for angular variates’, Biometrics pp. 683–700.
  • Heidelberger and Welch (1981) Heidelberger, P. and Welch, P. D. (1981), ‘A spectral method for confidence interval generation and run length control in simulations’, Communications of the ACM 24(4), 233–245.
  • Hinkle et al. (2014) Hinkle, J., Fletcher, P. T. and Joshi, S. (2014), ‘Intrinsic polynomials for regression on Riemannian manifolds’, Journal of Mathematical Imaging and Vision 50, 32–52.
  • Johnson and Wehrly (1978) Johnson, R. A. and Wehrly, T. E. (1978), ‘Some angular-linear distributions and related regression models’, Journal of the American Statistical Association 73(363), 602–606.
  • Kent et al. (2006) Kent, J., Mardia, K. and McDonnell, P. (2006), ‘The complex Bingham quartic distribution and shape analysis’, Journal of the Royal Statistical Society Series B: Statistical Methodology 68(5), 747–765.
  • Kundu et al. (2019) Kundu, S., Lukemire, J., Wang, Y. and Guo, Y. (2019), ‘A novel joint brain network analysis using longitudinal Alzheimer’s disease data’, Scientific reports 9(1), 1–18.
  • Li et al. (2022) Li, T., Li, T., Zhu, Z. and Zhu, H. (2022), ‘Regression analysis of asynchronous longitudinal functional and scalar data’, Journal of the American Statistical Association 117(539), 1228–1242.
  • Lindgren and Rue (2015) Lindgren, F. and Rue, H. (2015), ‘Bayesian spatial modelling with R-INLA’, Journal of Statistical Software 63, 1–25.
  • Mahan (1991) Mahan, R. P. (1991), Circular statistical methods: Applications in spatial and temporal performance analysis, Vol. 16, US Army Research Institute for the Behavioral and Social Sciences.
  • Mardia and Jupp (2009) Mardia, K. V. and Jupp, P. E. (2009), Directional statistics, Vol. 494, John Wiley & Sons.
  • Murphy et al. (2022) Murphy, E., Huang, W., Bessac, J., Wang, J. and Kotamarthi, R. (2022), ‘Joint modeling of wind speed and wind direction through a conditional approach’, arXiv preprint arXiv:2211.13612 .
  • Neto et al. (2014) Neto, J. H. V., Schmidt, A. M. and Guttorp, P. (2014), ‘Accounting for spatially varying directional effects in spatial covariance structures’, Journal of the Royal Statistical Society: Series C: Applied Statistics pp. 103–122.
  • Paine et al. (2020) Paine, P. J., Preston, S., Tsagris, M. and Wood, A. T. (2020), ‘Spherical regression models with general covariates and anisotropic errors’, Statistics and Computing 30, 153–165.
  • Pewsey and García-Portugués (2021) Pewsey, A. and García-Portugués, E. (2021), ‘Recent advances in directional statistics’, Test 30(1), 1–58.
  • Podcasy and Epperson (2022) Podcasy, J. L. and Epperson, C. N. (2022), ‘Considering sex and gender in Alzheimer disease and other dementias’, Dialogues in clinical neuroscience .
  • Presnell et al. (1998) Presnell, B., Morrison, S. P. and Littell, R. C. (1998), ‘Projected multivariate linear models for directional data’, Journal of the American Statistical Association 93(443), 1068–1077.
  • Rosenthal et al. (2014) Rosenthal, M., Wu, W., Klassen, E. and Srivastava, A. (2014), ‘Spherical regression models using projective linear transformations’, Journal of the American Statistical Association 109(508), 1615–1624.
  • Sadigh-Eteghad et al. (2012) Sadigh-Eteghad, S., Talebi, M. and Farhoudi, M. (2012), ‘Association of apolipoprotein e epsilon 4 allele with sporadic late onset Alzheimer’s disease. a meta-analysis’, Neurosciences Journal 17(4), 321–326.
  • Scealy and Welsh (2017) Scealy, J. and Welsh, A. (2017), ‘A directional mixed effects model for compositional expenditure data’, Journal of the American Statistical Association 112(517), 24–36.
  • Scealy and Wood (2019) Scealy, J. and Wood, A. T. (2019), ‘Scaled von Mises–Fisher distributions and regression models for paleomagnetic directional data’, Journal of the American Statistical Association .
  • Schwartzman et al. (2005) Schwartzman, A., Dougherty, R. F. and Taylor, J. E. (2005), ‘Cross-subject comparison of principal diffusion direction maps’, Magnetic Resonance in Medicine: An Official Journal of the International Society for Magnetic Resonance in Medicine 53(6), 1423–1431.
  • Schwartzman et al. (2008) Schwartzman, A., Mascarenhas, W. F. and Taylor, J. E. (2008), ‘Inference for eigenvalues and eigenvectors of Gaussian symmetric matrices’, The Annals of Statistics 36(6), 2886–2919.
  • Soares et al. (2013) Soares, J. M., Marques, P., Alves, V. and Sousa, N. (2013), ‘A hitchhiker’s guide to diffusion tensor imaging’, Frontiers in neuroscience 7.
  • Straub (2017) Straub, J. (2017), ‘Bayesian inference with the von-Mises-Fisher distribution in 3D’.
  • Valdés Cabrera et al. (2020) Valdés Cabrera, D., Stobbe, R., Smyth, P., Giuliani, F., Emery, D. and Beaulieu, C. (2020), ‘Diffusion tensor imaging tractography reveals altered fornix in all diagnostic subtypes of multiple sclerosis’, Brain and behavior 10(1), e01514.
  • Wang and Guo (2019) Wang, Y. and Guo, Y. (2019), ‘A hierarchical independent component analysis model for longitudinal neuroimaging studies’, NeuroImage 189, 380–400.
  • Yeatman et al. (2012) Yeatman, J. D., Dougherty, R. F., Myall, N. J., Wandell, B. A. and Feldman, H. M. (2012), ‘Tract profiles of white matter properties: automating fiber-tract quantification’, PloS one 7(11), e49790.
  • Yeh (2020) Yeh, F.-C. (2020), ‘Shape analysis of the human association pathways’, Neuroimage 223, 117329.
  • Yeh et al. (2018) Yeh, F.-C., Panesar, S., Fernandes, D., Meola, A., Yoshino, M., Fernandez-Miranda, J. C., Vettel, J. M. and Verstynen, T. (2018), ‘Population-averaged atlas of the macroscale human structural connectome and its network topology’, Neuroimage 178, 57–68.
  • Zhu et al. (2011) Zhu, H., Kong, L., Li, R., Styner, M., Gerig, G., Lin, W. and Gilmore, J. H. (2011), ‘FADTTS: functional analysis of diffusion tensor tract statistics’, NeuroImage 56(3), 1412–1425.