Covariate-Adjusted Inference for Differential Analysis of High-Dimensional Networks
Abstract
Differences between biological networks corresponding to disease conditions can help delineate the underlying disease mechanisms. Existing methods for differential network analysis do not account for dependence of networks on covariates. As a result, these approaches may detect spurious differential connections induced by the effect of the covariates on both the disease condition and the network. To address this issue, we propose a general covariate-adjusted test for differential network analysis. Our method assesses differential network connectivity by testing the null hypothesis that the network is the same for individuals who have identical covariates and only differ in disease status. We show empirically in a simulation study that the covariate-adjusted test exhibits improved type-I error control compared with naïve hypothesis testing procedures that do not account for covariates. We additionally show that there are settings in which our proposed methodology provides improved power to detect differential connections. We illustrate our method by applying it to detect differences in breast cancer gene co-expression networks by subtype.
1 Introduction
Complex diseases are often associated with aberrations in biological networks, such as gene regulatory networks and brain functional or structural connectivity networks [1]. Performing differential network analysis, or identifying connections in biological networks that change with disease condition, can provide insights into the disease mechanisms and lead to the identification of network-based biomarkers [19, 10].
Probabilistic graphical models are commonly used to summarize the conditional independence structure of a set of nodes in a biological network. A common approach to differential network analysis is to first estimate the graph corresponding to each disease condition and then assess between-condition differences in the graph. For instance, when using Gaussian graphical models, one can learn the network by estimating the inverse covariance matrix using the graphical LASSO [9]; one can then identify changes in the inverse covariance matrix associated with disease condition [49, 38, 15]. Alternatively, the condition-specific networks can be estimated using neighborhood selection [27]; in this approach, partial correlations among nodes are estimated by fitting a series of linear regressions in which one node is treated as the outcome, and the remaining nodes are treated as regressors. Changes in the network can then be delineated from differences in the regression coefficients by disease condition [2, 39]. More generally, the condition-specific networks are often modeled using exponential family pairwise interaction models [23, 40, 44, 43].
The approach to differential network analysis described above may lead to the detection of between-group differences in biological networks that are not necessarily meaningful, in particular when the condition-specific networks depend on covariates (e.g., age and sex). This is because between-group network differences can be induced by confounding variables, i.e., variables that are associated with both the within-group networks and the disease condition. In such cases, the network differences by disease condition may only reflect the association between the confounding variable and the disease. It is therefore important to account for the relationship between covariates and biological networks when performing differential network analysis.
In this paper, we propose a two-sample test for differential network analysis that accounts for within-group dependence of the networks on covariates. More specifically, we propose to perform covariate-adjusted inference using a class of pairwise interaction models for the within-group networks. Our approach treats each condition-specific network as a function of the covariates. It then performs a hypothesis test for equivalence of these functions. To accommodate the high-dimensional setting, in which the number of nodes in the network is large relative to the number of samples collected, we propose to estimate the networks using a regularized estimator and to perform hypothesis testing using a bias-corrected version of the regularized estimate [11].
Our proposal is related to existing literature on modeling networks as functions of a small number of variables. For example, there are various proposals for estimating high-dimensional inverse covariance matrices, conditional upon continuous low-dimensional features [50, 36]. Also related are methods for regularized estimation of high-dimensional varying coefficient models, wherein the regression coefficients are functions of a small number of covariates [35]. Our method is similar but places a particular emphasis on hypothesis testing in order to assess the statistical significance of observed changes in the network. Our approach lays the foundation for a general class of graphical models and is the first, to the best of our knowledge, to perform covariate-adjusted hypothesis tests for differential network analysis.
The rest of the paper is organized as follows. In Section 2, we begin with a broad overview of our proposed framework for covariate-adjusted differential network analysis in pairwise interaction exponential family models and introduce some working examples. In the following sections, we specialize our framework by considering two different approaches for estimation and inference: In Section 3, we describe a method that uses neighborhood selection [27, 7, 40], and in Section 4, we discuss an alternative estimation approach that utilizes the score matching framework of Hyvärinen [17, 18]. We assess the performance of our proposed methodology on synthetic data in Section 5 and apply it to a breast cancer data set from The Cancer Genome Atlas (TCGA) [37] in Section 6. We conclude with a brief discussion in Section 7.
2 Overview of the Proposed Framework
2.1 Differential Network Analysis without Covariate Adjustment
To formalize our problem, we begin by introducing some notation. We compare networks between two groups, labeled by $d \in \{1, 2\}$. We obtain measurements of $p$ variables $X^{(d)} = (X^{(d)}_1, \ldots, X^{(d)}_p)^\top$, corresponding to $p$ nodes in a graphical model [26], on $n_1$ subjects in group 1 and $n_2$ subjects in group 2. We define $\mathcal{X}$ as the sample space of $X^{(d)}$. Let $X^{(d)}_{i,j}$ denote the data for node $j$ for subject $i$ in group $d$, and let $\mathbf{X}^{(d)}_j = (X^{(d)}_{1,j}, \ldots, X^{(d)}_{n_d,j})^\top$ be an $n_d$-dimensional vector of measurements on node $j$ for group $d$.
Our objective is to determine whether the association between variables $X_j$ and $X_k$, conditional upon all other variables, differs by group. Our approach is to specify a model for $X^{(d)}$ such that the conditional dependence between any two nodes $j$ and $k$ can be represented by a single scalar parameter $\theta^{(d)}_{jk}$. If the association between nodes $j$ and $k$ is the same in both groups, $\theta^{(1)}_{jk} = \theta^{(2)}_{jk}$. Conversely, if $\theta^{(1)}_{jk} \neq \theta^{(2)}_{jk}$, we say nodes $j$ and $k$ are differentially connected. We assess for differential connectivity by performing a test of the null hypothesis
$$H_0: \theta^{(1)}_{jk} = \theta^{(2)}_{jk}. \qquad (1)$$
We consider a general class of exponential family pairwise interaction models. For $d \in \{1, 2\}$, we assume the density function for $X^{(d)}$ takes the form
$$p^{(d)}(x) = \exp\Big\{ \sum_{j=1}^{p} \theta^{(d)}_{jj}\, f(x_j) + \sum_{1 \leq j < k \leq p} \theta^{(d)}_{jk}\, g(x_j, x_k) - A\big( \Theta^{(d)} \big) \Big\}, \qquad (2)$$
where $f$ and $g$ are fixed and known functions, $\Theta^{(d)}$ is a symmetric $p \times p$ matrix with elements $\theta^{(d)}_{jk}$, and $A(\Theta^{(d)})$ is the log-partition function. The dependence between $X_j$ and $X_k$ is measured by $\theta^{(d)}_{jk}$, and nodes $j$ and $k$ are conditionally independent in group $d$ if and only if $\theta^{(d)}_{jk} = 0$.
This class of exponential family distributions is rich and includes several models that have been studied previously in the graphical modeling literature. One such example is the Gaussian graphical model, perhaps the most widely-used graphical model for continuous data. For $x \in \mathbb{R}^p$, the density function for mean-centered Gaussian random vectors can be expressed as
$$p^{(d)}(x) \propto \exp\Big( -\tfrac{1}{2}\, x^\top \Theta^{(d)} x \Big), \qquad (3)$$
and is thus a special case of (2) with $f(x_j) = -x_j^2/2$ and $g(x_j, x_k) = -x_j x_k$, where $\Theta^{(d)}$ is the inverse covariance matrix. The non-negative Gaussian density, which takes the form of (3) with the constraint that $x$ takes values in $[0, \infty)^p$, also belongs to the exponential family class. Another canonical example is the Ising model, commonly used for studying conditional dependencies among binary random variables. For $x \in \{0, 1\}^p$, the density function for the Ising model can be expressed as
$$p^{(d)}(x) = \exp\Big\{ \sum_{j=1}^{p} \theta^{(d)}_{jj}\, x_j + \sum_{1 \leq j < k \leq p} \theta^{(d)}_{jk}\, x_j x_k - A\big( \Theta^{(d)} \big) \Big\}.$$
Additional examples include the Poisson model, the exponential graphical model, and conditionally-specified mixed graphical models [40, 7].
When asymptotically normal estimates of $\theta^{(1)}_{jk}$ and $\theta^{(2)}_{jk}$ are available, one can perform a calibrated test of $H_0$ based on the difference between the estimates. In many cases, asymptotically normal estimates can be obtained using well-established methodology. For instance, when the log-partition function $A$ is available in closed form and is tractable, one can obtain estimates via (penalized) maximum likelihood. This is a standard approach in the Gaussian setting, in which case the log-partition function is easy to compute. However, this is not the case for other exponential family models, and likelihood-based estimation strategies are thus generally difficult to implement. In this paper, we consider two alternative strategies that have been proposed to overcome these computational challenges and are more broadly applicable.
The first approach we discuss is neighborhood selection [7, 27, 40]. Consider a sub-class of exponential family graphical models for which the conditional density function for any node given the remaining nodes belongs to a univariate exponential family model. Because the log-partition function in univariate exponential family models is available in closed form, it is computationally feasible to estimate each conditional density function. By estimating the conditional density functions, one can identify the neighbors of each node $j$, that is, the nodes upon which the conditional distribution of $X_j$ depends. This approach was first proposed as an alternative to maximum likelihood estimation for Gaussian graphical models [27]. To describe our approach, we focus on the Gaussian case, though this approach is more widely applicable and can be used for modeling dependencies among, e.g., Poisson, binomial, and exponential random variables as well [7, 40].
In Gaussian graphical models, the dependency of node $j$ on all other nodes can be determined based on the linear model
$$X^{(d)}_j = \beta^{(d)}_{j0} + \sum_{k \neq j} \beta^{(d)}_{jk}\, X^{(d)}_k + \epsilon^{(d)}_j, \qquad (4)$$
where $\epsilon^{(d)}_j$ is a mean-zero error term that is independent of $\{ X^{(d)}_k : k \neq j \}$.
The regression coefficients $\beta^{(d)}_{jk}$ measure the strength of linear association between nodes $j$ and $k$ conditional upon all other nodes and are zero if and only if nodes $j$ and $k$ are conditionally independent; $\beta^{(d)}_{j0}$ is an intercept term and is zero if all nodes are mean-centered. (We acknowledge a slight abuse of notation here, as the regression coefficients in (4) are not equivalent to the parameters $\theta^{(d)}_{jk}$ in (2). However, either estimand fully characterizes conditional independence.) In the low-dimensional setting (i.e., $p < n_d$), statistically efficient and asymptotically normal estimates of the regression coefficients can be readily obtained via ordinary least squares. In high dimensions (i.e., $p \geq n_d$), the ordinary least squares estimates are inconsistent, so to obtain consistent estimates we typically rely upon regularized estimators such as the LASSO and the elastic net [33, 51]. Regularized estimators are generally biased and have intractable sampling distributions and, as such, are unsuitable for performing formal statistical inference. However, several methods have recently emerged for obtaining asymptotically normal estimates by correcting the bias of regularized estimators [20, 12, 47].
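To fix ideas, the following sketch (our illustration, with simulated data and illustrative names, not code from the paper) shows low-dimensional neighborhood selection for a single node using ordinary least squares in R:

```r
# Sketch: neighborhood selection by OLS in the low-dimensional Gaussian case.
# Node 1 is regressed on the remaining nodes within one group; nonzero
# coefficients identify its neighbors.
set.seed(1)
n <- 200; p <- 10
X <- matrix(rnorm(n * p), n, p)
X[, 1] <- 0.6 * X[, 2] - 0.4 * X[, 3] + rnorm(n)  # nodes 2, 3 are neighbors
fit <- lm(X[, 1] ~ X[, -1])
round(coef(summary(fit)), 2)  # estimates and standard errors for beta_1k
```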
The second computationally efficient approach we consider is to estimate the density function using the score matching framework of Hyvärinen [17, 18]. Hyvärinen derives a loss function for estimation of density functions for continuous random variables that is based on the gradient of the log-density with respect to the observations. As such, the score matching loss does not depend on the log-partition function in exponential family models. Moreover, when the joint distribution for $X$ belongs to an exponential family model, the loss is quadratic in the unknown parameters, allowing for efficient computation. In low dimensions, the minimizer of the score matching loss is consistent and asymptotically normal. In high dimensions, one can obtain asymptotically normal estimates by minimizing a regularized version of the score matching loss to obtain an initial estimate [23, 44] and subsequently correcting for the bias induced by regularization [43].
2.2 Covariate-Adjusted Differential Network Analysis
We now consider the setting in which the within-group networks depend on covariates. We denote by $W^{(d)}$ a $q$-dimensional random vector of covariate measurements for group $d$, and we define $\mathcal{W}$ as the sample space of $W^{(d)}$. Let $W^{(d)}_{i,l}$ refer to the value of covariate $l$ for subject $i$ in group $d$, and let $W^{(d)}_i$ be a $q$-dimensional vector containing all covariates for subject $i$ in group $d$. We assume the number of covariates is small relative to the sample size (i.e., $q \ll \min(n_1, n_2)$).
To study the dependence of the within-group networks on the covariates, we specify a model for the nodes given the covariates that allows the inter-node dependencies to vary as a function of $W^{(d)}$. The model defines a function $\theta^{(d)}_{jk}(\cdot): \mathcal{W} \to \mathbb{R}$ that takes as input a vector of covariates $w$ and returns a measure of association between nodes $j$ and $k$ for subjects in group $d$ with covariates equal to $w$. One can interpret $\theta^{(d)}_{jk}(w)$ as a conditional version of $\theta^{(d)}_{jk}$, given the covariates.
We assume that $\theta^{(d)}_{jk}(\cdot)$ can be written as a low-dimensional linear basis expansion in $w$ of dimension $b$, that is,
$$\theta^{(d)}_{jk}(w) = \big\langle \psi(w),\, \gamma^{(d)}_{jk} \big\rangle, \qquad (5)$$
where $\psi: \mathcal{W} \to \mathbb{R}^b$ is a map from a set of covariates to its basis expansion, $\gamma^{(d)}_{jk}$ is a $b$-dimensional vector, and $\langle \cdot, \cdot \rangle$ denotes the vector inner product. Let $\psi_m(w)$ refer to the $m$-th element of $\psi(w)$. One can take the simple approach of specifying $\psi$ as a linear basis, $\psi_l(w) = w_l$ for $l = 1, \ldots, q$ (so that $b = q$), though more flexible choices such as polynomial or B-spline bases can also be considered. It may be preferable to specify $\psi$ so that $\theta^{(d)}_{jk}(\cdot)$ is an additive function of the covariates. This allows one to easily assess the effect of any specific covariate on the network by estimating the sub-vector of $\gamma^{(d)}_{jk}$ that is relevant to the covariate of interest.
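As an illustration of such basis maps (our sketch, with illustrative names), the following R snippet implements a linear and an additive cubic polynomial expansion for $q = 2$ covariates:

```r
# Sketch: two basis expansions psi(w) for q = 2 covariates. Both are additive
# in the covariates, so sub-vectors of gamma isolate each covariate's effect.
psi_linear <- function(w) c(w[1], w[2])            # b = 2
psi_cubic  <- function(w) c(w[1], w[1]^2, w[1]^3,  # b = 6
                            w[2], w[2]^2, w[2]^3)
psi_cubic(c(0.5, 0.2))
```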
When the association between nodes $j$ and $k$ does not depend on group membership, $\theta^{(1)}_{jk}(w) = \theta^{(2)}_{jk}(w)$ for all $w \in \mathcal{W}$, and $\gamma^{(1)}_{jk} = \gamma^{(2)}_{jk}$. In other words, if one subject from group 1 and another subject from group 2 have identically-valued covariates, the corresponding measure of association between nodes $j$ and $k$ is also the same. In the covariate-adjusted setting, we say that nodes $j$ and $k$ are differentially connected if there exists $w \in \mathcal{W}$ such that $\theta^{(1)}_{jk}(w) \neq \theta^{(2)}_{jk}(w)$, or equivalently, if $\gamma^{(1)}_{jk} \neq \gamma^{(2)}_{jk}$. We can thus assess differential connectivity between nodes $j$ and $k$ by testing the null hypothesis
$$H_0': \gamma^{(1)}_{jk} = \gamma^{(2)}_{jk}. \qquad (6)$$
Similar to the unadjusted setting, when asymptotically normal estimates of $\gamma^{(1)}_{jk}$ and $\gamma^{(2)}_{jk}$ are available, a calibrated test can be constructed based on the difference between the estimates.
We now specify a form for the conditional distribution of $X^{(d)}$ given $W^{(d)}$ as a generalization of the exponential family pairwise interaction model (2). We assume the conditional density for $X^{(d)}$ given $W^{(d)} = w$ can be expressed as
$$p^{(d)}(x \mid w) \propto \exp\Big\{ \sum_{j=1}^{p} \theta^{(d)}_{jj}(w)\, f(x_j) + \sum_{j=1}^{p} \sum_{l=1}^{q} \eta^{(d)}_{jl}\, w_l\, u(x_j) + \sum_{1 \leq j < k \leq p} \theta^{(d)}_{jk}(w)\, g(x_j, x_k) \Big\}, \qquad (7)$$
where $\theta^{(d)}_{jk}(w) = \langle \psi(w), \gamma^{(d)}_{jk} \rangle$ as in (5), and the proportionality is up to a normalizing constant that does not depend on $x$. Above, $u$ is a fixed and known function, and the main effects of the covariates on the nodes are represented by the scalar parameters $\eta^{(d)}_{jl}$. The conditional dependence between nodes $j$ and $k$, given all other nodes and given that $W^{(d)} = w$, is quantified by $\theta^{(d)}_{jk}(w)$, and $\theta^{(d)}_{jk}(w) = 0$ if and only if nodes $j$ and $k$ are conditionally independent at $w$. One can thus view $\theta^{(d)}_{jk}(w)$ in (7) as a conditional version of $\theta^{(d)}_{jk}$ in (2).
Either of the estimation strategies introduced in Section 2.1 can be used to perform covariate-adjusted inference. When the conditional distribution of each node given the remaining nodes and the covariates belongs to a univariate exponential family model, the covariate-dependent network can be estimated using neighborhood selection because the node conditional distributions can be estimated efficiently with likelihood-based methods. Alternatively, we can estimate the conditional density function (7) using score matching.
As a working example, we again consider estimation of covariate-dependent Gaussian networks using neighborhood selection. Suppose the conditional distribution of $X^{(d)}$ given $W^{(d)} = w$ takes the form
$$p^{(d)}(x \mid w) \propto \exp\Big\{ -\tfrac{1}{2}\, \big( x - \mu^{(d)}(w) \big)^\top \Theta^{(d)}(w)\, \big( x - \mu^{(d)}(w) \big) \Big\}. \qquad (8)$$
Then the dependencies of node $j$ on all other nodes can be determined based on the following varying coefficient model [14]:
$$X^{(d)}_j = \beta^{(d)}_{j0}\big( W^{(d)} \big) + \sum_{k \neq j} \beta^{(d)}_{jk}\big( W^{(d)} \big)\, X^{(d)}_k + \epsilon^{(d)}_j. \qquad (9)$$
The varying coefficient model is a generalization of the linear model that treats the regression coefficients as functions of the covariates. In (9), $\beta^{(d)}_{jk}(w)$ returns a regression coefficient that quantifies the linear relationship between nodes $j$ and $k$ for subjects in group $d$ with covariates equal to $w$. Then $X^{(d)}_j$ and $X^{(d)}_k$ are conditionally independent given all other nodes and given $W^{(d)} = w$ if and only if $\beta^{(d)}_{jk}(w) = 0$. The varying coefficients can thus be viewed as a conditional version of the regression coefficients in (4). (We have again abused the notation, as the varying coefficient functions in (9) are not equal to the parameter functions in (8), though both are zero for the same values of $w$.) As in (5), we model the varying coefficients as basis expansions, $\beta^{(d)}_{jk}(w) = \langle \psi(w), \gamma^{(d)}_{jk} \rangle$. The intercept term $\beta^{(d)}_{j0}(W^{(d)})$ accounts for the main effect of $W^{(d)}$ on $X^{(d)}_j$. We can remove this main effect term by first centering the nodes about their conditional mean given $W^{(d)}$ (which can be estimated by performing a linear regression of each node on the basis expansion of the covariates).
In Sections 3 and 4, we discuss construction of asymptotically normal estimators of $\gamma^{(d)}_{jk}$ in the low- and high-dimensional settings using neighborhood selection and score matching. Before proceeding, we first examine the connection between the null hypotheses $H_0$ and $H_0'$.
2.3 The Relationship between Hypotheses $H_0$ and $H_0'$
Hypotheses $H_0$ in (1) and $H_0'$ in (6) are related but not equivalent. It is possible that $H_0$ holds while $H_0'$ fails and vice versa. We provide an example below. Suppose we are using neighborhood selection to perform differential network analysis in the Gaussian setting, so we are comparing linear regression coefficients between the two groups. Suppose further that the within-group networks depend on a single scalar covariate $W^{(d)}$, and the nodes are centered about their conditional mean given $W^{(d)}$. One can show that the regression coefficients in (4) are equal to the average of their conditional versions in (9), that is, $\beta^{(d)}_{jk} = \mathbb{E}\{ \beta^{(d)}_{jk}(W^{(d)}) \}$. Now, suppose $H_0'$ holds. If $W^{(1)}$ and $W^{(2)}$ do not share the same distribution (e.g., the covariate tends to take higher values in group 1 than in group 2), the average conditional inter-node association may differ between groups, and $H_0$ may not hold. Although the conditional association between nodes, given the covariate, does not differ by group, the average conditional association does differ, as illustrated in Figure 1a. In such a scenario, the difference in the average conditional association is induced by the dependence of the covariate on group membership and the dependence of the inter-node association on the covariate. Thus, inequality of $\beta^{(1)}_{jk}$ and $\beta^{(2)}_{jk}$ does not necessarily capture a meaningful association between the network and group membership. Similarly, when $H_0$ holds, it is possible that $H_0'$ fails. For instance, suppose that the distribution of the covariate is the same in both groups, and the average conditional association is also the same in both groups. If the between-node association depends more strongly upon the covariates in one group than the other, $H_0'$ will be false. This example is depicted in Figure 1b. In this scenario, adjusting for covariates should provide improved power to detect differential connections. We note that for other distributions, it does not necessarily hold that the unconditional parameter equals the average of its conditional version, but regardless, there is generally no equivalence between hypotheses $H_0$ and $H_0'$.
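As a concrete instance of the first scenario (a worked example of ours, not taken from the paper), suppose the shared conditional association is linear in a scalar covariate while the covariate distributions differ across groups:
$$\beta^{(1)}_{jk}(w) = \beta^{(2)}_{jk}(w) = w, \qquad W^{(1)} \sim N(0, 1), \qquad W^{(2)} \sim N(1, 1).$$
Averaging over the covariate distribution then gives
$$\beta^{(1)}_{jk} = \mathbb{E}\big\{ \beta^{(1)}_{jk}\big( W^{(1)} \big) \big\} = 0 \;\neq\; 1 = \mathbb{E}\big\{ \beta^{(2)}_{jk}\big( W^{(2)} \big) \big\} = \beta^{(2)}_{jk},$$
so the unadjusted null $H_0$ fails even though the covariate-adjusted null $H_0'$ holds.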
3 Covariate-Adjusted Differential Network Analysis Using Neighborhood Selection
In this section, we describe in detail an approach for covariate-adjusted differential network analysis using neighborhood selection. To simplify our presentation, we focus on Gaussian graphical models, though this strategy is generally applicable to graphical models for which the node conditional distributions belong to univariate exponential family models.
3.1 Covariate-Adjustment via Neighborhood Selection in Low Dimensions
We first discuss testing the unadjusted null hypothesis $H_0$ in (1), where the parameters of interest are the regression coefficients $\beta^{(d)}_{jk}$ in (4). Suppose, for now, that we are in the low-dimensional setting, so the number of nodes $p$ is smaller than the sample sizes $n_1$, $n_2$.
It is well-known that the regression coefficients can be characterized as the minimizers of the expected least squares loss, that is,
$$\beta^{(d)}_j = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}} \; \mathbb{E}\Big[ \Big( X^{(d)}_j - \beta_0 - \sum_{k \neq j} \beta_k\, X^{(d)}_k \Big)^2 \Big].$$
One can obtain an estimate of $\beta^{(d)}_j$ by minimizing the empirical average of the least squares loss, taking
$$\hat\beta^{(d)}_j = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}} \; \frac{1}{n_d} \Big\| \mathbf{X}^{(d)}_j - \beta_0 \mathbf{1}_{n_d} - \sum_{k \neq j} \beta_k\, \mathbf{X}^{(d)}_k \Big\|_2^2,$$
where $\| \cdot \|_2$ denotes the $\ell_2$ norm and $\mathbf{1}_{n_d}$ is a vector of ones. The ordinary least squares estimate is available in closed form and is easy to compute. The estimates are unbiased and, under mild assumptions, approximately normally distributed for sufficiently large $n_d$, that is,
$$\sqrt{n_d}\, \big( \hat\beta^{(d)}_{jk} - \beta^{(d)}_{jk} \big) \rightsquigarrow N\big( 0,\; \sigma^{(d)2}_{jk} \big),$$
with $\sigma^{(d)2}_{jk}$ the asymptotic variance (though $\sigma^{(d)2}_{jk}$ can be calculated in closed form, we omit the expression for brevity).
We construct a test of $H_0$ based on the difference between the estimates of the group-specific regression coefficients, $\hat\beta^{(1)}_{jk} - \hat\beta^{(2)}_{jk}$. When $H_0$ holds, $\hat\beta^{(1)}_{jk} - \hat\beta^{(2)}_{jk}$ is approximately normally distributed with mean zero and variance $\sigma^{(1)2}_{jk}/n_1 + \sigma^{(2)2}_{jk}/n_2$. Given consistent estimates of the variances, we can use the test statistic
$$T_{jk} = \frac{\big( \hat\beta^{(1)}_{jk} - \hat\beta^{(2)}_{jk} \big)^2}{\hat\sigma^{(1)2}_{jk}/n_1 + \hat\sigma^{(2)2}_{jk}/n_2},$$
which follows a chi-squared distribution with one degree of freedom under the null for $n_1$ and $n_2$ sufficiently large. A p-value for $H_0$ can be calculated as
$$p_{jk} = P\big( \chi^2_1 > T_{jk} \big).$$
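The unadjusted test is only a few lines in R; the sketch below (simulated data, illustrative names only) computes it from two group-specific OLS fits:

```r
# Sketch: the unadjusted edge-wise test from two group-specific OLS fits.
# Data are simulated only to make the snippet self-contained.
set.seed(1)
n <- 200; p <- 10
X1 <- matrix(rnorm(n * p), n, p); X2 <- matrix(rnorm(n * p), n, p)
fit1 <- lm(X1[, 1] ~ X1[, -1]); fit2 <- lm(X2[, 1] ~ X2[, -1])
k <- 2                                   # row 2 = first regressor, node 2
b1 <- coef(summary(fit1))[k, ]; b2 <- coef(summary(fit2))[k, ]
T_jk <- (b1["Estimate"] - b2["Estimate"])^2 /
        (b1["Std. Error"]^2 + b2["Std. Error"]^2)
unname(pchisq(T_jk, df = 1, lower.tail = FALSE))   # p-value under H0
```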
In the low-dimensional setting, performing a covariate-adjusted test is similar to performing the unadjusted test. We can obtain an estimate of the coefficient vectors $\gamma^{(d)}_{jk}$ by minimizing the empirical average of the least squares loss
$$\hat\gamma^{(d)}_j = \operatorname*{arg\,min}_{\gamma} \; \frac{1}{n_d} \sum_{i=1}^{n_d} \Big( X^{(d)}_{i,j} - \sum_{k \neq j} \big\langle \psi\big( W^{(d)}_i \big),\, \gamma_k \big\rangle\, X^{(d)}_{i,k} \Big)^2. \qquad (10)$$
To simplify the presentation, we introduce additional notation that allows us to rewrite (10) in a condensed form. Let $Z^{(d)}$ be the $n_d \times (p-1)b$ matrix whose $i$-th row is given by
$$Z^{(d)}_i = \Big( X^{(d)}_{i,k}\, \psi\big( W^{(d)}_i \big)^\top \Big)_{k \neq j}, \qquad (11)$$
that is, the basis expansion of the covariates scaled by each remaining node's value.
We can now equivalently express (10) as
$$\hat\gamma^{(d)}_j = \operatorname*{arg\,min}_{\gamma \in \mathbb{R}^{(p-1)b}} \; \frac{1}{n_d} \big\| \mathbf{X}^{(d)}_j - Z^{(d)} \gamma \big\|_2^2. \qquad (12)$$
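One way to construct the matrix $Z^{(d)}$ in (11) is sketched below (our illustration, assuming the linear basis): each row is the Kronecker product of the remaining nodes' values with the basis expansion of the covariates, so that $Z^{(d)}\gamma$ reproduces $\sum_{k \neq j} \langle \psi(W^{(d)}_i), \gamma_k \rangle X^{(d)}_{i,k}$.

```r
# Sketch: build the design matrix Z of (11) for response node j.
build_Z <- function(X, W, j, psi) {
  len <- (ncol(X) - 1) * length(psi(W[1, ]))
  t(vapply(seq_len(nrow(X)),
           function(i) kronecker(X[i, -j], psi(W[i, ])),
           numeric(len)))
}
set.seed(1)
n <- 100; p <- 6
X <- matrix(rnorm(n * p), n, p)
W <- matrix(runif(n * 2), n, 2)
Z <- build_Z(X, W, j = 1, psi = function(w) c(w[1], w[2]))
dim(Z)   # n rows, (p - 1) * b columns, b coefficients per node
```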
Again, $\hat\gamma^{(d)}_j$ is unbiased and approximately normal for sufficiently large $n_d$, satisfying
$$\sqrt{n_d}\, \big( \hat\gamma^{(d)}_j - \gamma^{(d)}_j \big) \rightsquigarrow N\big( 0,\; \Omega^{(d)}_j \big),$$
where $\Omega^{(d)}_j$ is a positive definite matrix of dimension $(p-1)b \times (p-1)b$ (though a closed-form expression is available, we omit it here for brevity).
We construct a test of $H_0'$ based on $\hat\gamma^{(1)}_{jk} - \hat\gamma^{(2)}_{jk}$. Under the null hypothesis, $\hat\gamma^{(1)}_{jk} - \hat\gamma^{(2)}_{jk}$ is approximately normal with mean zero and variance $\Omega^{(1)}_{jk}/n_1 + \Omega^{(2)}_{jk}/n_2$, where $\Omega^{(d)}_{jk}$ denotes the $b \times b$ diagonal block of $\Omega^{(d)}_j$ corresponding to $\gamma^{(d)}_{jk}$. Given consistent estimates $\hat\Omega^{(d)}_{jk}$, we can test $H_0'$ using the test statistic
$$T'_{jk} = \big( \hat\gamma^{(1)}_{jk} - \hat\gamma^{(2)}_{jk} \big)^\top \Big( \hat\Omega^{(1)}_{jk}/n_1 + \hat\Omega^{(2)}_{jk}/n_2 \Big)^{-1} \big( \hat\gamma^{(1)}_{jk} - \hat\gamma^{(2)}_{jk} \big).$$
Under the null, the test statistic follows a chi-squared distribution with $b$ degrees of freedom, and a p-value can therefore be calculated as
$$p'_{jk} = P\big( \chi^2_b > T'_{jk} \big).$$
3.2 Covariate-Adjustment via Neighborhood Selection in High Dimensions
The methods described in Section 3.1 are only appropriate when the number of nodes is small relative to the sample size. Model (9) has $(p-1)b$ parameters per response node, so the least squares estimator of Section 3.1 provides stable estimates as long as $n_1$ and $n_2$ are larger than $(p-1)b$. However, in the high-dimensional setting, where the number of parameters exceeds the sample size, the ordinary least squares estimates behave poorly.
To fit the varying coefficient model (9) in the high-dimensional setting, we use a regularized estimator that relies upon an assumption of sparsity in the networks. The sparsity assumption requires that within each group only a small number of nodes are partially correlated, meaning that in (9), only a few of the vectors $\gamma^{(d)}_{jk}$ are nonzero. To leverage the sparsity assumption, we propose to use the group LASSO estimator [46]:
$$\hat\gamma^{(d)}_j = \operatorname*{arg\,min}_{\gamma \in \mathbb{R}^{(p-1)b}} \; \frac{1}{2 n_d} \big\| \mathbf{X}^{(d)}_j - Z^{(d)} \gamma \big\|_2^2 + \lambda \sum_{k \neq j} \| \gamma_k \|_2, \qquad (13)$$
where $\lambda > 0$ is a tuning parameter. The group LASSO provides a sparse estimate and sets some of the vectors $\hat\gamma^{(d)}_{jk}$ to be exactly zero, resulting in networks with few edges. The level of sparsity of $\hat\gamma^{(d)}_j$ is determined by $\lambda$, with higher values forcing more $\hat\gamma^{(d)}_{jk}$ to zero. We discuss selection of the tuning parameter in Section 5.1.
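A minimal sketch of fitting (13) with the gglasso package (the package used in the authors' implementation; data and dimensions here are illustrative) shows how the $b$ basis coefficients of each node form one penalty group:

```r
# Sketch: group LASSO fit for (13); one group per remaining node.
library(gglasso)
set.seed(1)
n <- 150; p <- 11; b <- 2
X <- matrix(rnorm(n * p), n, p)                 # node data for one group
W <- matrix(runif(n * 2), n, 2)                 # covariates
psi <- function(w) c(w[1], w[2])                # linear basis, b = 2
Z <- t(sapply(seq_len(n), function(i) kronecker(X[i, -1], psi(W[i, ]))))
grp <- rep(seq_len(p - 1), each = b)            # group labels for columns
fit <- gglasso(x = Z, y = X[, 1], group = grp, loss = "ls")
# Whole b-blocks of coefficients are zero for unselected nodes:
coef(fit, s = fit$lambda[10])[2:7]
```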
Though the group LASSO provides a consistent estimate of $\gamma^{(d)}_j$, the estimate is not approximately normally distributed. The group LASSO estimate of $\gamma^{(d)}_j$ retains a bias that diminishes at the same rate as the standard error. As a result, the group LASSO estimator has a non-standard sampling distribution that cannot be derived analytically and is therefore unsuitable for hypothesis testing.
We can obtain approximately normal estimates of $\gamma^{(d)}_j$ by correcting the bias of $\hat\gamma^{(d)}_j$, as was first proposed to obtain normal estimates for the classical $\ell_1$-penalized LASSO [12, 47]. These “de-biased” or “de-sparsified” estimators can be shown to be approximately normal in moderately large samples even in the high-dimensional setting; they are therefore suitable for hypothesis testing. Our approach is to use a de-biased version of the group LASSO. Bias correction in group LASSO problems is well studied [11, 16, 28], so we are able to perform covariate-adjusted inference by applying previously-developed methods.
The bias of the group LASSO estimate can be approximated as
$$\mathrm{bias}\big( \hat\gamma^{(d)}_j \big) \approx -\lambda\, \hat M^{(d)} \hat\kappa^{(d)}_j, \qquad (14)$$
where $\hat\kappa^{(d)}_j$ is a nonzero $(p-1)b$-dimensional vector, namely the sub-gradient of the group penalty $\sum_{k \neq j} \|\gamma_k\|_2$ evaluated at $\hat\gamma^{(d)}_j$ (recall that $b$ is the dimension of $\psi(w)$), and $\hat M^{(d)}$ is an approximate inverse of the Gram matrix $Z^{(d)\top} Z^{(d)} / n_d$. Our approach is to obtain an estimate of the bias and to use a de-biased estimator, defined as
$$\hat\gamma^{(d),\mathrm{de}}_j = \hat\gamma^{(d)}_j + \lambda\, \hat M^{(d)} \hat\kappa^{(d)}_j. \qquad (15)$$
For a suitable choice of $\hat M^{(d)}$, the bias-corrected estimator is approximately normal for a sufficiently large sample size under mild conditions, i.e.,
$$\sqrt{n_d}\, \big( \hat\gamma^{(d),\mathrm{de}}_j - \gamma^{(d)}_j \big) \rightsquigarrow N\big( 0,\; \Omega^{(d)}_j \big), \qquad (16)$$
where the variance $\Omega^{(d)}_j$ is a positive definite matrix, for which we obtain an estimate $\hat\Omega^{(d)}_j$. We provide a derivation for the bias correction and the form of our variance estimate in Appendix A.
Similar to Section 3.1, we test the null hypothesis $H_0'$ in (6) using the test statistic
$$T'_{jk} = \big( \hat\gamma^{(1),\mathrm{de}}_{jk} - \hat\gamma^{(2),\mathrm{de}}_{jk} \big)^\top \Big( \hat\Omega^{(1)}_{jk}/n_1 + \hat\Omega^{(2)}_{jk}/n_2 \Big)^{-1} \big( \hat\gamma^{(1),\mathrm{de}}_{jk} - \hat\gamma^{(2),\mathrm{de}}_{jk} \big). \qquad (17)$$
The test statistic asymptotically follows a chi-squared distribution with $b$ degrees of freedom under the null hypothesis.
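Given de-biased estimates and variance estimates, the test (17) reduces to a few lines. The following sketch assumes hypothetical inputs gamma1, gamma2, Omega1, Omega2 produced by a de-biasing step; it is not the authors' implementation:

```r
# Sketch: chi-squared test (17). gamma1/gamma2 are the b-vectors for edge
# (j, k) in each group; Omega1/Omega2 their estimated b x b covariances.
adjusted_test <- function(gamma1, gamma2, Omega1, Omega2, n1, n2) {
  d <- gamma1 - gamma2
  V <- Omega1 / n1 + Omega2 / n2
  T_stat <- drop(t(d) %*% solve(V, d))
  c(stat = T_stat,
    pval = pchisq(T_stat, df = length(d), lower.tail = FALSE))
}
```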
4 Covariate-Adjusted Differential Network Analysis Using Score Matching
In this section, we discuss covariate adjustment using the score matching framework introduced in Section 2. We first describe the score matching estimator in greater detail and then specialize the framework to estimation of pairwise exponential family graphical models in the low- and high-dimensional settings. As shown later in this section, for exponential family distributions with continuous support, the score matching loss function is a quadratic function of the parameters, providing a computationally-efficient framework for estimating graphical models.
4.1 The Score Matching Framework
We begin by providing a brief summary of the score matching framework [17, 18]. Let $X$ be a random vector generated from a distribution with density function $p_0$ supported on $\mathcal{X}$. For any candidate density $f$, we denote the gradient and Laplacian of the log-density by
$$\nabla \log f(x) = \Big( \frac{\partial}{\partial x_1} \log f(x), \ldots, \frac{\partial}{\partial x_p} \log f(x) \Big)^{\top}, \qquad \Delta \log f(x) = \sum_{j=1}^{p} \frac{\partial^2}{\partial x_j^2} \log f(x).$$
The score matching loss is defined as a measure of divergence between a candidate density function $f$ and the true density $p_0$:
$$J(f) = \frac{1}{2} \int_{\mathcal{X}} p_0(x)\, \big\| \nabla \log f(x) - \nabla \log p_0(x) \big\|_2^2\, dx. \qquad (18)$$
It is apparent that the score matching loss is minimized when $f = p_0$. A natural approach to constructing an estimator for $p_0$ would then be to minimize the empirical score matching loss given observations $X_1, \ldots, X_n$, defined as
$$\hat J(f) = \frac{1}{2n} \sum_{i=1}^{n} \big\| \nabla \log f(X_i) - \nabla \log p_0(X_i) \big\|_2^2.$$
Because the score matching loss function takes as input the gradient of the log density function, the loss does not depend on the normalizing constant. This makes score matching appealing when the normalizing constant is intractable.
The empirical loss seemingly depends on prior knowledge of $p_0$. However, if $p_0(x)$ and $p_0(x)\, \nabla \log f(x)$ both tend to zero as $x$ approaches the boundary of $\mathcal{X}$, a partial integration argument can be used to show that the score matching loss can be expressed as
$$J(f) = \int_{\mathcal{X}} p_0(x) \sum_{j=1}^{p} \Big[ \frac{\partial^2}{\partial x_j^2} \log f(x) + \frac{1}{2} \Big( \frac{\partial}{\partial x_j} \log f(x) \Big)^{2} \Big]\, dx + \text{const.}, \qquad (19)$$
where ‘const.’ is a term that does not depend on $f$. We can therefore estimate $p_0$ by minimizing an empirical version of the score matching loss that does not depend on $p_0$. We can express the empirical loss as
$$\hat J(f) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{p} \Big[ \frac{\partial^2}{\partial x_j^2} \log f(X_i) + \frac{1}{2} \Big( \frac{\partial}{\partial x_j} \log f(X_i) \Big)^{2} \Big].$$
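To see the loss in action in the simplest possible case (our worked example, for illustration), consider a univariate mean-zero Gaussian with precision $\theta$, so that $\log f(x) = -\theta x^2/2 + \text{const.}$ The empirical loss (19) becomes $\theta^2 \overline{x^2}/2 - \theta$, minimized at $\hat\theta = 1/\overline{x^2}$, with no normalizing constant ever computed:

```r
# Sketch: score matching for N(0, 1/theta). With log f(x) = -theta*x^2/2 + c,
# d/dx log f = -theta*x and d^2/dx^2 log f = -theta, so the empirical loss
# (19) is mean(-theta + theta^2 * x^2 / 2), minimized at theta = 1 / mean(x^2).
set.seed(1)
theta_true <- 4
x <- rnorm(1e5, sd = sqrt(1 / theta_true))
1 / mean(x^2)   # close to theta_true = 4
```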
The score matching loss is particularly appealing for exponential family distributions with continuous support, as it leads to a quadratic optimization problem [23]. However, when $X$ is non-negative, the arguments used to express (18) as (19) fail because $p_0(x)$ and $p_0(x)\, \nabla \log f(x)$ do not approach zero at the boundary of $[0, \infty)^p$. We can overcome this problem by instead considering the generalized score matching framework [44, 18] as an extension that is suitable for non-negative data. Let $h_1, \ldots, h_p : [0, \infty) \to [0, \infty)$ be positive and differentiable functions, let $h(x) = \big( h_1(x_1), \ldots, h_p(x_p) \big)^{\top}$, let $h'_j$ denote the derivative of $h_j$, and let $\circ$ denote the element-wise product operator. The generalized score matching loss is defined as
$$J_h(f) = \frac{1}{2} \int_{\mathcal{X}} p_0(x)\, \Big\| h(x)^{1/2} \circ \big( \nabla \log f(x) - \nabla \log p_0(x) \big) \Big\|_2^2\, dx, \qquad (20)$$
and is also minimized when $f = p_0$. As for the original score matching loss (18), the generalized score matching loss seemingly depends on prior knowledge of $p_0$. However, under mild technical conditions on $f$ and $h$ (see Appendix B.1), the loss in (20) can be rewritten as
$$J_h(f) = \int_{\mathcal{X}} p_0(x) \sum_{j=1}^{p} \Big[ h'_j(x_j)\, \frac{\partial}{\partial x_j} \log f(x) + h_j(x_j)\, \frac{\partial^2}{\partial x_j^2} \log f(x) + \frac{1}{2}\, h_j(x_j) \Big( \frac{\partial}{\partial x_j} \log f(x) \Big)^{2} \Big]\, dx + \text{const.} \qquad (21)$$
The generalized score matching loss thus no longer depends on $p_0$, and an estimator can be constructed by minimizing the empirical version of (21) with respect to $f$. To this end, the original generalized score matching estimator considered $h_j(x) = x^2$ [18]. In this case, it becomes necessary to estimate high-order moments of $X$, leading to poor performance of the estimator. It has been shown that by instead taking $h_j$ to be a slowly increasing function, such as $h_j(x) = \log(1 + x)$, one obtains improved theoretical results and better empirical performance [44].
4.2 Covariate-Adjustment in High-Dimensional Exponential Family Models via Score Matching
In this sub-section, we discuss construction of asymptotically normal estimators for the parameters of the exponential family pairwise interaction model (7) using the generalized score matching framework. To simplify our presentation, we consider the setting in which we are only interested in studying the connectedness between one node $j$ and all other neighboring nodes in the network. To this end, it suffices to estimate the conditional density of $X^{(d)}_j$ given all other nodes $X^{(d)}_{-j}$ and the covariates $W^{(d)}$. A similar approach to the one we describe below can also be used to estimate the entire joint density (7). For simplicity, we assume that in (7) the functions $f$, $g$, and $u$ are common across nodes and node pairs. For $d \in \{1, 2\}$, the conditional density can thus be expressed as
$$p^{(d)}\big( x_j \mid x_{-j}, w \big) \propto \exp\Big\{ \theta^{(d)}_{jj}(w)\, f(x_j) + \sum_{l=1}^{q} \eta^{(d)}_{jl}\, w_l\, u(x_j) + \sum_{k \neq j} \theta^{(d)}_{jk}(w)\, g(x_j, x_k) \Big\}, \qquad (22)$$
where the density is defined up to a normalizing constant that does not depend on $x_j$.
We first explicitly define the score matching loss for the conditional density function (22). Let $\dot f$ and $\ddot f$ denote the first and second derivatives of $f$, and similarly, let $\dot g$ and $\ddot g$ denote the first and second derivatives of $g$ with respect to $x_j$, and $\dot u$ and $\ddot u$ those of $u$. We define a non-negative weight function $h_j$, and let $h'_j$ denote the first derivative of $h_j$. For candidate parameters $\gamma$ and $\eta$, the conditional score $s_{\gamma,\eta}(x, w) = \frac{\partial}{\partial x_j} \log p_{\gamma,\eta}(x_j \mid x_{-j}, w)$ is linear in the parameters, with coefficients built from these derivatives. Specializing (21), the empirical generalized score matching loss for the conditional density of $X^{(d)}_j$ given all other nodes and the covariates can be expressed as
$$\hat J^{(d)}_j(\gamma, \eta) = \frac{1}{n_d} \sum_{i=1}^{n_d} \Big[ h'_j\big( X^{(d)}_{i,j} \big)\, s_{\gamma,\eta}\big( X^{(d)}_i, W^{(d)}_i \big) + h_j\big( X^{(d)}_{i,j} \big)\, \frac{\partial}{\partial x_j} s_{\gamma,\eta}\big( X^{(d)}_i, W^{(d)}_i \big) + \frac{1}{2}\, h_j\big( X^{(d)}_{i,j} \big)\, s^2_{\gamma,\eta}\big( X^{(d)}_i, W^{(d)}_i \big) \Big]. \qquad (23)$$
The true parameters $\gamma^{(d)}_j$ and $\eta^{(d)}_j$ can be characterized as the minimizers of the population version of the score matching loss, as discussed in Section 4.1.
The loss function in (23) is quadratic in the parameters $\gamma$ and $\eta$ and can thus be minimized efficiently. When the sample size is much larger than the number of unknown parameters, one can estimate $\gamma^{(d)}_j$ and $\eta^{(d)}_j$ by simply minimizing $\hat J^{(d)}_j$ with respect to the unknown parameters; because the loss is quadratic, the minimizer is available in closed form. Moreover, we can readily establish asymptotic normality of the parameter estimates using results from classical M-estimation theory [34]. To avoid cumbersome notation, we reserve the details for Appendix B.2.
When the sample size is smaller than the number of parameters, the minimizer of $\hat J^{(d)}_j$ is no longer consistent. Similar to Section 3.2, we use regularization to obtain a consistent estimator in the high-dimensional setting. We define the group-penalized generalized score matching estimator as
$$\big( \hat\gamma^{(d)}_j, \hat\eta^{(d)}_j \big) = \operatorname*{arg\,min}_{\gamma, \eta} \; \hat J^{(d)}_j(\gamma, \eta) + \lambda \sum_{k \neq j} \| \gamma_k \|_2, \qquad (24)$$
where $\lambda > 0$ is a tuning parameter. Similar to the group LASSO estimator (13), the regularization term in (24) induces sparsity in the estimate and sets some vectors $\hat\gamma^{(d)}_{jk}$ to be exactly zero. The tuning parameter controls the level of sparsity, with more vectors $\hat\gamma^{(d)}_{jk}$ set to zero for higher $\lambda$. In Appendix B.3, we establish consistency of the regularized score matching estimator assuming sparsity of $\gamma^{(d)}_j$ and some additional regularity conditions.
As is the case for the group LASSO estimator, the regularized score matching estimator has an intractable limiting distribution because its bias and standard error diminish at the same rate. We can obtain an asymptotically normal estimate by subtracting from the initial estimate an estimate of the bias. In Appendix B.4, we construct such a bias-corrected estimate $\hat\gamma^{(d),\mathrm{de}}_j$ that, for sufficiently large $n_d$, satisfies
$$\sqrt{n_d}\, \big( \hat\gamma^{(d),\mathrm{de}}_j - \gamma^{(d)}_j \big) \rightsquigarrow N\big( 0,\; \Omega^{(d)}_j \big)$$
for a positive definite matrix $\Omega^{(d)}_j$. Given bias-corrected estimates and a consistent estimate $\hat\Omega^{(d)}_j$ of $\Omega^{(d)}_j$, we can test the null hypothesis (6) using the test statistic
$$T'_{jk} = \big( \hat\gamma^{(1),\mathrm{de}}_{jk} - \hat\gamma^{(2),\mathrm{de}}_{jk} \big)^\top \Big( \hat\Omega^{(1)}_{jk}/n_1 + \hat\Omega^{(2)}_{jk}/n_2 \Big)^{-1} \big( \hat\gamma^{(1),\mathrm{de}}_{jk} - \hat\gamma^{(2),\mathrm{de}}_{jk} \big).$$
Under the null hypothesis, the test statistic follows a chi-squared distribution with $b$ degrees of freedom.
5 Numerical Studies
In this section, we examine the performance of our proposed test in a simulation study. We consider the neighborhood selection approach described in Section 3. Our simulation study has three objectives: (1) to assess the stability of our estimators for the covariate-dependent networks, (2) to examine the effect of sample size on statistical power and type-I error control, and (3) to illustrate that failing to adjust for covariates can in some settings result in poor type-I error control or reduced statistical power.
5.1 Implementation
We first discuss implementation of the neighborhood selection approach. The group LASSO estimate (13) does not exist in closed form, in contrast to the ordinary least squares estimate (12). To solve (13), we use the efficient algorithm implemented in the publicly available R package gglasso [42].
The group LASSO estimator requires selection of a tuning parameter $\lambda$, which controls the sparsity of the estimate. We select the tuning parameter by performing $K$-fold cross-validation. Since the selection of $\lambda$ is sensitive to the scale of the columns of $Z^{(d)}$ in (11), we scale the columns by their standard deviations prior to cross-validating. After fitting the group LASSO with the selected tuning parameter, we convert the estimates back to their original scale by dividing the estimates by the standard deviations of the columns of $Z^{(d)}$.
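The sketch below illustrates this tuning procedure with cv.gglasso; the fold count and dimensions are our placeholders rather than the values used in the paper:

```r
# Sketch: cross-validated tuning with scaled columns, then mapping the fitted
# coefficients back to the original scale. nfolds is a placeholder.
library(gglasso)
set.seed(1)
n <- 150; p <- 11; b <- 2
X <- matrix(rnorm(n * p), n, p)
W <- matrix(runif(n * 2), n, 2)
Z <- t(sapply(seq_len(n),
              function(i) kronecker(X[i, -1], c(W[i, 1], W[i, 2]))))
grp <- rep(seq_len(p - 1), each = b)
sds <- apply(Z, 2, sd)
Zs  <- sweep(Z, 2, sds, "/")                    # standardize columns
cvfit <- cv.gglasso(x = Zs, y = X[, 1], group = grp,
                    pred.loss = "L2", nfolds = 5)
gam <- coef(cvfit$gglasso.fit, s = cvfit$lambda.min)[-1] / sds
```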
5.2 Simulation Setting
In what follows, we describe our simulation setting. In short, we generate data from the varying coefficient model (9), where we treat nodes 2 through $p$ as predictors and node 1 as the response. We first randomly generate data for nodes 2 through $p$ in groups 1 and 2 from the same multivariate normal distribution. We then construct varying coefficient functions and generate data for two covariates so that one covariate acts as a confounding variable, and the other covariate should improve statistical power to detect differential associations after adjustment.
To simulate data for nodes 2 through $p$, we first generate a random graph with $p - 1$ nodes and an edge density of 0.05 from a power law distribution with power parameter 5 [30]. Denoting the edge set of the graph by $E$, we generate a symmetric matrix $\Theta^0$ whose entries $\theta^0_{jk} = \theta^0_{kj}$ are nonzero if and only if $(j, k) \in E$. Defining $e_{\min}$ as the smallest eigenvalue of $\Theta^0$, we set $\Theta = \Theta^0 + \big( |e_{\min}| + \delta \big) I$ for a small constant $\delta > 0$, where $I$ is the identity matrix, so that $\Theta$ is positive definite. We then draw $\big( X^{(d)}_{i,2}, \ldots, X^{(d)}_{i,p} \big)$ from a multivariate normal distribution with mean zero and covariance $\Theta^{-1}$, for $i = 1, \ldots, n_d$, for each group $d$.
We generate the first covariate, $W_1$, from a different distribution in each group, so that its distribution depends on group membership. We center and scale both $W^{(1)}_1$ and $W^{(2)}_1$ to the $[0, 1]$ interval. We generate the second covariate, $W_2$, in each group from a uniform distribution.
We consider two different choices for the varying coefficient functions $\beta^{(d)}_{1k}(\cdot)$:
• Linear polynomial: the nonzero varying coefficients are linear functions of the covariates, with $\beta^{(d)}_{1k}(\cdot) = 0$ for all remaining $k$.
• Cubic polynomial: the nonzero varying coefficients are cubic functions of the covariates, with $\beta^{(d)}_{1k}(\cdot) = 0$ for all remaining $k$.
In both cases, the coefficients are chosen to produce the four scenarios described next.
The first covariate confounds the association between nodes 2 and 1: the distribution of $W_1$ depends on group membership, and $W_1$ affects the association between genes 2 and 1. However, $\beta^{(1)}_{12}(w) = \beta^{(2)}_{12}(w)$ for all $w$. Thus, $H_0'$ in (6) holds while $H_0$ in (1) fails, as depicted in Figure 1a. Failing to adjust for $W_1$ should therefore result in an inflated type-I error rate for the hypothesis $H_0'$. Adjusting for the second covariate should improve the power to detect the differential connection between nodes 3 and 1. We have constructed the varying coefficients so that the average association between nodes 3 and 1 is the same in both groups, though this association depends more strongly on $W_2$ in one group than in the other. Thus, $H_0$ holds while $H_0'$ fails, as depicted in Figure 1b. The association between nodes 4 and 1 does not depend on either covariate, though the association differs by group, so one should be able to identify a differential connection using either the adjusted or unadjusted test. Node 5 is conditionally independent of all other nodes in both groups.
For each subject $i$ in group $d$, we generate the response node as
$$X^{(d)}_{i,1} = \sum_{k=2}^{p} \beta^{(d)}_{1k}\big( W^{(d)}_i \big)\, X^{(d)}_{i,k} + \epsilon^{(d)}_{i,1},$$
where $\epsilon^{(d)}_{i,1}$ follows a normal distribution with zero mean and unit variance. We use balanced sample sizes $n_1 = n_2$ and consider three increasing values of the common sample size. We fix the number of nodes $p$ across simulations. The graph for nodes 2 through $p$ contains 15 edges. Leaving the graph fixed, we generate 400 random data sets following the above approach.
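The sketch below illustrates the flavor of this data-generating process. The graph generator, edge-weight distribution, and constants are our placeholders (igraph's preferential attachment model standing in for the paper's power-law generator), not the authors' exact choices:

```r
# Sketch: sparse precision matrix from a random power-law-type graph, then
# multivariate normal node data. All constants are placeholders.
library(igraph); library(MASS)
set.seed(1)
p <- 20
g <- sample_pa(p - 1, power = 5, m = 1, directed = FALSE)  # stand-in graph
A <- as_adjacency_matrix(g, sparse = FALSE)
Wts <- matrix(runif((p - 1)^2, 0.4, 0.8), p - 1)           # placeholder weights
Theta0 <- A * (Wts + t(Wts)) / 2                           # symmetric weights
e_min <- min(eigen(Theta0, symmetric = TRUE)$values)
Theta <- Theta0 + (abs(e_min) + 0.5) * diag(p - 1)         # force pos. def.
Xnodes <- mvrnorm(200, mu = rep(0, p - 1), Sigma = solve(Theta))
```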
We consider two choices of the basis expansion $\psi$:
1. Linear basis: $\psi(w) = (w_1, w_2)^{\top}$;
2. Cubic polynomial basis: $\psi(w) = (w_1, w_1^2, w_1^3, w_2, w_2^2, w_2^3)^{\top}$.
Using a linear basis, $b = 2$, and model (9) has $2(p - 1)$ parameters. With the cubic polynomial basis, $b = 6$, and there are $6(p - 1)$ parameters.
We compare our proposed methodology with the approach for differential network analysis without covariate adjustment described in Section 3.1. In the unadjusted analysis, ordinary least squares estimation is justified because although $(p-1)b$ is large relative to the sample sizes, $p$ itself is smaller than $n_1$ and $n_2$.
5.3 Simulation Results
Figure 2 shows the Monte Carlo estimates of the expected estimation error for the de-biased group LASSO estimates $\hat\gamma^{(d),\mathrm{de}}_{1k}$ for each of the edges under study. We only report the error when the basis is correctly specified for the varying coefficient function, that is, when $\psi$ is the linear basis and the varying coefficients are linear functions, or when $\psi$ is the cubic basis and the varying coefficients are cubic functions. In both the linear and cubic polynomial settings, the average estimation error decreases with the sample size for all edges, as expected. We also find that in small samples, the estimation error is substantially lower in the linear setting than in the cubic setting. This suggests that estimates are less stable in more complex models.
In Table 1, we report Monte Carlo estimates of the probability of rejecting $H_0'$, the null hypothesis that nodes 1 and $k$ are not differentially connected given the covariates, for $k = 2, 3, 4, 5$, using both the adjusted and unadjusted tests at the $\alpha = 0.05$ significance level. As the purpose of the simulation study is to examine the behavior of the edge-wise test, we do not perform a multiple testing correction.
For the edge between nodes 1 and 2 (i.e., when $H_0$ fails, but $H_0'$ holds), the unadjusted test is anti-conservative, and the probability of falsely rejecting $H_0'$ increases with the sample size. When an adjusted test is performed using a linear basis, and when the varying coefficients are linear, the type-I error rate is slightly inflated but appears to approach the nominal level of 0.05 as the sample size increases. However, when the varying coefficients are cubic functions, and the linear basis is thus mis-specified, the type-I error rate is inflated, though it is still slightly lower than that of the unadjusted test. For both specifications of the varying coefficients, the covariate-adjusted test controls the type-I error rate near the nominal level when a cubic polynomial basis is used. For the edge between nodes 1 and 3 (i.e., when $H_0$ holds, but $H_0'$ fails), the unadjusted test exhibits low power to detect differential associations. The adjusted test provides greatly improved power when either a linear or cubic basis is used. For the edge between nodes 1 and 4 (i.e., when both $H_0$ and $H_0'$ fail), the unadjusted test and both adjusted tests are well-powered against the null. For the edge between nodes 1 and 5 (i.e., when genes 1 and 5 are conditionally independent in both groups), the unadjusted test and the adjusted test with a linear basis both control the type-I error near the nominal level. However, the covariate-adjusted test is conservative when a cubic basis is used.
Table 1: Rejection probabilities for $H_0'$. Within each method, the three columns correspond to the three increasing sample sizes considered.

| Setting | Edge | Unadjusted | | | Linear adjustment | | | Cubic adjustment | | |
|---|---|---|---|---|---|---|---|---|---|---|
| Linear | (1, 2) | 0.15 | 0.278 | 0.385 | 0.13 | 0.09 | 0.072 | 0.04 | 0.062 | 0.05 |
| | (1, 3) | 0.042 | 0.078 | 0.045 | 0.27 | 0.532 | 0.73 | 0.08 | 0.27 | 0.52 |
| | (1, 4) | 0.48 | 0.912 | 0.988 | 0.605 | 0.922 | 0.965 | 0.218 | 0.738 | 0.902 |
| | (1, 5) | 0.052 | 0.054 | 0.053 | 0.045 | 0.048 | 0.048 | 0.009 | 0.017 | 0.025 |
| Cubic | (1, 2) | 0.21 | 0.505 | 0.668 | 0.358 | 0.315 | 0.342 | 0.07 | 0.055 | 0.07 |
| | (1, 3) | 0.052 | 0.068 | 0.082 | 0.6 | 0.882 | 0.975 | 0.195 | 0.73 | 0.93 |
| | (1, 4) | 0.408 | 0.84 | 0.978 | 0.55 | 0.898 | 0.982 | 0.202 | 0.772 | 0.945 |
| | (1, 5) | 0.056 | 0.05 | 0.054 | 0.053 | 0.054 | 0.052 | 0.009 | 0.02 | 0.027 |
The simulation results corroborate our expectations and suggest that there are potential benefits to covariate adjustment. We find that when the sample size is large, the covariate-adjusted test behaves reasonably well with either choice of basis function. However, in small samples, the covariate-adjusted test is somewhat imprecise, and the type-I error rate can be slightly above or below the nominal level. Practitioners should therefore exercise caution when using our proposed methodology in very small samples.
6 Data Example
Breast cancer classification based on expression of the estrogen receptor (ER) is prognostic of clinical outcomes. Breast cancers can be classified as estrogen receptor-positive (ER+) or estrogen receptor-negative (ER-), with approximately 70% of breast cancers being ER+ [25]. In ER+ breast cancer, the cancer cells require estrogen to grow; ER+ status has been shown to be associated with favorable clinical outcomes compared with ER- breast cancer [6]. Identifying differences between the biological pathways of ER+ and ER- breast cancers can be helpful for understanding the underlying disease mechanisms.
It has been shown that age is associated with ER status and that age can be associated with gene expression [22, 41]. This warrants consideration of age as an adjustment variable in a comparison of gene co-expression networks between ER groups.
We perform an age-adjusted differential analysis of the ER+ and ER- breast cancer networks, using publicly available data from The Cancer Genome Atlas (TCGA) [37]. We obtain clinical measurements and gene expression data from a total of 806 ER+ patients and 237 ER- patients. We consider the set of genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) breast cancer pathway [21], and adjust for age as our only covariate. The average age in the ER+ group is 59.3 years (SD = 13.3), and the average age in the ER- group is 55.9 years (SD = 12.4). We use a linear basis for covariate adjustment. In the ER+ group, the sample size is considerably larger than the number of parameters, so we can fit the varying coefficient model (9) using ordinary least squares. We use the de-biased group LASSO to estimate the network for the ER- group because the sample size is smaller than the number of model parameters. We compare the results from the covariate-adjusted analysis with the unadjusted approach described in Section 3.1.
To assess differential connectivity between any two nodes $j$ and $k$, we can treat either node $j$ or node $k$ as the response in the varying coefficient model (9). We can then test either of the two corresponding null hypotheses of the form (6). Our approach is to set the p-value for the test of differential connectivity between nodes $j$ and $k$ as the minimum of the two resulting p-values, though we acknowledge that this strategy is anti-conservative.
Our objective is to identify all pairs of differentially connected genes, so we need to adjust for the fact that we perform a separate hypothesis test for each gene pair. We account for multiplicity by controlling the false discovery rate using the Benjamini-Yekutieli method [3].
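Operationally, the combination and correction steps amount to the following sketch (placeholder p-values; the nominal FDR level shown is illustrative):

```r
# Sketch: per-pair p-value combination and FDR control via Benjamini-Yekutieli.
# pval_jk / pval_kj stand in for the two directional p-values per gene pair.
set.seed(1)
pval_jk <- runif(1000); pval_kj <- runif(1000)   # placeholder p-values
pmin_pair <- pmin(pval_jk, pval_kj)              # minimum p-value per pair
padj <- p.adjust(pmin_pair, method = "BY")       # FDR under dependence
sum(padj < 0.05)                                 # number of discoveries
```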
[Figure 3: differentially connected gene pairs identified by the age-adjusted analysis.]
The differential networks obtained from the unadjusted and adjusted analyses are substantially different. We report 106 differentially connected edges from the adjusted analysis (shown in Figure 3), compared with only two such edges from the unadjusted analysis. This suggests that the relationship between the gene co-expression network and age may differ by ER group.
7 Discussion
In this paper, we have addressed challenges that arise when performing differential network analysis [32] in the setting where the network depends on covariates. Using both synthetic and real data, we showed that accounting for covariates can result in better control of type-I error and improved power.
We have proposed a parsimonious approach for covariate adjustment in differential network analysis. A number of improvements and extensions can be made to our current work. First, while this paper focuses on differential network analysis in exponential family models, our framework can be applied to other models in which the conditional dependence between any pair of nodes can be represented by a single scalar parameter. This includes semi-parametric models such as the nonparanormal model [24], as well as distributions defined over complex domains, which can be modeled using the generalized score matching framework [45]. Additionally, we only discuss testing edge-wise differences between the networks, though testing differences between sub-networks may also be of interest. When the sub-networks are low-dimensional, one can construct a chi-squared test using test statistics similar to those presented in Sections 3 and 4, because joint asymptotic normality of a low-dimensional set of the estimators can be readily established. Such an approach is not applicable to high-dimensional sub-networks, but it may be possible to construct a calibrated test using recent results on simultaneous inference in high-dimensional models [48, 43]. We can also improve the statistical efficiency of the network estimates by considering joint estimation procedures that borrow information across groups [13, 8, 31]. Finally, we assume that the relationship between the network and the covariates can be represented by a low-dimensional basis expansion. Investigating nonparametric approaches that relax this assumption could be a fruitful area of research.
Funding
The authors gratefully acknowledge the support of the NSF Graduate Research Fellowship Program under grant DGE-1762114 as well as NSF grant DMS-1561814 and NIH grant R01-GM114029. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
Availability of Data
The findings of this paper are supported by data from The Cancer Genome Atlas, which are accessible using the publicly available R package RTCGA.
Code Availability
An implementation of the proposed methodology is available at https://github.com/awhudson/CovDNA.
References
- [1] Barabási, A.L., Gulbahce, N., Loscalzo, J.: Network medicine: a network-based approach to human disease. Nature Reviews Genetics 12(1), 56–68 (2011)
- [2] Belilovsky, E., Varoquaux, G., Blaschko, M.B.: Testing for differences in Gaussian graphical models: Applications to brain connectivity. In: Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016)
- [3] Benjamini, Y., Yekutieli, D.: The control of the false discovery rate in multiple testing under dependency. Annals of Statistics pp. 1165–1188 (2001)
- [4] Breheny, P., Huang, J.: Penalized methods for bi-level variable selection. Statistics and its Interface 2(3), 369 (2009)
- [5] Bühlmann, P., van de Geer, S.: Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media (2011)
- [6] Carey, L.A., Perou, C.M., Livasy, C.A., Dressler, L.G., Cowan, D., Conway, K., Karaca, G., Troester, M.A., Tse, C.K., Edmiston, S., et al.: Race, breast cancer subtypes, and survival in the carolina breast cancer study. Journal of the American Medical Association 295(21), 2492–2502 (2006)
- [7] Chen, S., Witten, D.M., Shojaie, A.: Selection and estimation for mixed graphical models. Biometrika 102(1), 47–64 (2015)
- [8] Danaher, P., Wang, P., Witten, D.M.: The joint graphical lasso for inverse covariance estimation across multiple classes. Journal of the Royal Statistical Society: Series B 76(2), 373–397 (2014)
- [9] Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)
- [10] de la Fuente, A.: From ‘differential expression’ to ‘differential networking’: identification of dysfunctional regulatory networks in diseases. Trends in Genetics 26(7), 326–333 (2010)
- [11] van de Geer, S.: Estimation and testing under sparsity. Lecture Notes in Mathematics 2159 (2016)
- [12] van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R.: On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics 42(3), 1166–1202 (2014)
- [13] Guo, J., Levina, E., Michailidis, G., Zhu, J.: Joint estimation of multiple graphical models. Biometrika 98(1), 1–15 (2011)
- [14] Hastie, T., Tibshirani, R.: Varying-coefficient models. Journal of the Royal Statistical Society: Series B 55(4), 757–779 (1993)
- [15] He, H., Cao, S., Zhang, J.g., Shen, H., Wang, Y.P., Deng, H.: A statistical test for differential network analysis based on inference of Gaussian graphical model. Scientific Reports 9(1), 1–8 (2019)
- [16] Honda, T.: The de-biased group lasso estimation for varying coefficient models. Annals of the Institute of Statistical Mathematics pp. 1–27 (2019)
- [17] Hyvärinen, A.: Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research 6(24), 695–709 (2005)
- [18] Hyvärinen, A.: Some extensions of score matching. Computational Statistics & Data Analysis 51(5), 2499–2512 (2007)
- [19] Ideker, T., Krogan, N.J.: Differential network biology. Molecular Systems Biology 8(1) (2012)
- [20] Javanmard, A., Montanari, A.: Confidence intervals and hypothesis testing for high-dimensional regression. Journal of Machine Learning Research 15(1), 2869–2909 (2014)
- [21] Kanehisa, M., Goto, S.: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 28(1), 27–30 (2000)
- [22] Khan, S.A., Rogers, M.A., Khurana, K.K., Meguid, M.M., Numann, P.J.: Estrogen receptor expression in benign breast epithelium and breast cancer risk. Journal of the National Cancer Institute 90(1), 37–42 (1998)
- [23] Lin, L., Drton, M., Shojaie, A.: Estimation of high-dimensional graphical models using regularized score matching. Electronic Journal of Statistics 10(1), 806–854 (2016)
- [24] Liu, H., Lafferty, J., Wasserman, L.: The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research 10(80), 2295–2328 (2009)
- [25] Lumachi, F., Brunello, A., Maruzzo, M., Basso, U., Mm Basso, S.: Treatment of estrogen receptor-positive breast cancer. Current Medicinal Chemistry 20(5), 596–604 (2013)
- [26] Maathuis, M., Drton, M., Lauritzen, S., Wainwright, M.: Handbook of graphical models. CRC Press (2018)
- [27] Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the lasso. The Annals of Statistics 34(3), 1436–1462 (2006)
- [28] Mitra, R., Zhang, C.H.: The benefit of group sparsity in group inference with de-biased scaled group lasso. Electronic Journal of Statistics 10(2), 1829–1873 (2016)
- [29] Negahban, S.N., Ravikumar, P., Wainwright, M.J., Yu, B.: A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statistical Science 27(4), 538–557 (2012)
- [30] Newman, M.E.: The structure and function of complex networks. SIAM Review 45(2), 167–256 (2003)
- [31] Saegusa, T., Shojaie, A.: Joint estimation of precision matrices in heterogeneous populations. Electronic Journal of Statistics 10(2), 1341–1392 (2016)
- [32] Shojaie, A.: Differential network analysis: A statistical perspective. Wiley Interdisciplinary Reviews: Computational Statistics p. e1508 (2020)
- [33] Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B 58(1), 267–288 (1996)
- [34] van der Vaart, A.W.: Asymptotic statistics, vol. 3. Cambridge University Press (2000)
- [35] Wang, H., Xia, Y.: Shrinkage estimation of the varying coefficient model. Journal of the American Statistical Association 104(486), 747–757 (2009)
- [36] Wang, J., Kolar, M.: Inference for sparse conditional precision matrices. arXiv preprint arXiv:1412.7638 (2014)
- [37] Weinstein, J.N., Collisson, E.A., Mills, G.B., Shaw, K.R.M., Ozenberger, B.A., Ellrott, K., Shmulevich, I., Sander, C., Stuart, J.M.: The cancer genome atlas pan-cancer analysis project. Nature Genetics 45(10), 1113–1120 (2013)
- [38] Xia, Y., Cai, T., Cai, T.T.: Testing differential networks with applications to the detection of gene-gene interactions. Biometrika 102(2), 247–266 (2015)
- [39] Xia, Y., Cai, T., Cai, T.T.: Two-sample tests for high-dimensional linear regression with an application to detecting interactions. Statistica Sinica 28, 63–92 (2018)
- [40] Yang, E., Ravikumar, P., Allen, G.I., Liu, Z.: Graphical models via univariate exponential family distributions. Journal of Machine Learning Research 16(1), 3813–3847 (2015)
- [41] Yang, J., Huang, T., Petralia, F., Long, Q., Zhang, B., Argmann, C., Zhao, Y., Mobbs, C.V., Schadt, E.E., Zhu, J., et al.: Synchronized age-related gene expression changes across multiple tissues in human and the link to complex diseases. Scientific Reports 5(1), 1–16 (2015)
- [42] Yang, Y., Zou, H.: A fast unified algorithm for solving group-lasso penalize learning problems. Statistics and Computing 25(6), 1129–1141 (2015)
- [43] Yu, M., Gupta, V., Kolar, M.: Simultaneous inference for pairwise graphical models with generalized score matching. Journal of Machine Learning Research 21(91), 1–51 (2020)
- [44] Yu, S., Drton, M., Shojaie, A.: Generalized score matching for non-negative data. Journal of Machine Learning Research 20(76), 1–70 (2019)
- [45] Yu, S., Drton, M., Shojaie, A.: Generalized score matching for general domains. Information and Inference: A Journal of the IMA (2021)
- [46] Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B 68(1), 49–67 (2006)
- [47] Zhang, C.H., Zhang, S.S.: Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B 76(1), 217–242 (2014)
- [48] Zhang, X., Cheng, G.: Simultaneous inference for high-dimensional linear models. Journal of the American Statistical Association 112(518), 757–768 (2017)
- [49] Zhao, S.D., Cai, T.T., Li, H.: Direct estimation of differential networks. Biometrika 101(2), 253–268 (2014)
- [50] Zhou, S., Lafferty, J., Wasserman, L.: Time varying undirected graphs. Machine Learning 80(2), 295–319 (2010)
- [51] Zou, H., Hastie, T.: Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B 67(2), 301–320 (2005)
Appendix
A De-biased Group LASSO Estimator
In this subsection, we derive a de-biased group LASSO estimator. Our construction is essentially the same as the one presented in van de Geer [11].
With $Z \equiv Z^{(d)}$ as defined in (11), let $Z$ be an $n_d \times (p-1)b$ dimensional matrix, and write $\hat\Sigma = Z^\top Z / n_d$. For $k \neq j$, let $Z_k$ denote the $b$ columns of $Z$ corresponding to $\gamma_k$, and let $\hat\kappa$ denote the sub-gradient of the group penalty $\sum_{k \neq j} \|\gamma_k\|_2$ evaluated at $\hat\gamma^{(d)}_j$. We can express the sub-gradient as $\hat\kappa = \big( \hat\kappa^\top_k \big)^\top_{k \neq j}$, where $\hat\kappa_k = \hat\gamma^{(d)}_{jk} / \| \hat\gamma^{(d)}_{jk} \|_2$ if $\hat\gamma^{(d)}_{jk} \neq 0$, and is otherwise a vector with $\ell_2$ norm less than one. The KKT conditions for the group LASSO imply that the estimate satisfies
$$\frac{1}{n_d} Z^\top \big( \mathbf{X}^{(d)}_j - Z \hat\gamma^{(d)}_j \big) = \lambda \hat\kappa.$$
With some algebra, we can rewrite this as
$$\hat\Sigma \big( \hat\gamma^{(d)}_j - \gamma^{(d)}_j \big) + \lambda \hat\kappa = \frac{1}{n_d} Z^\top \epsilon^{(d)}_j,$$
where $\epsilon^{(d)}_j = \mathbf{X}^{(d)}_j - Z \gamma^{(d)}_j$ denotes the vector of residuals.
Let $\Sigma$ be defined as the population Gram matrix
$$\Sigma = \mathbb{E}\big[ Z_i Z_i^\top \big],$$
and let $\hat M$ be an estimate of $\Sigma^{-1}$. We can write $\hat\gamma^{(d)}_j - \gamma^{(d)}_j$ as
$$\hat\gamma^{(d)}_j - \gamma^{(d)}_j = \underbrace{- \lambda \hat M \hat\kappa}_{\text{(i)}} \;+\; \underbrace{\frac{1}{n_d} \hat M Z^\top \epsilon^{(d)}_j}_{\text{(ii)}} \;+\; \underbrace{\big( I - \hat M \hat\Sigma \big) \big( \hat\gamma^{(d)}_j - \gamma^{(d)}_j \big)}_{\text{(iii)}}. \qquad (25)$$
The first term in (25) is an approximation for the bias of the group LASSO estimate. This term is a function only of the observed data and not of any unknown quantities; its negative can therefore be directly added to the initial estimate $\hat\gamma^{(d)}_j$. If $\hat M$ is a consistent estimate of $\Sigma^{-1}$, the second term is asymptotically equivalent to
$$\frac{1}{n_d} \sum_{i=1}^{n_d} \Sigma^{-1} Z_i\, \epsilon^{(d)}_{i,j}.$$
Thus, term (ii) is asymptotically equivalent to a sample average of mean-zero i.i.d. random variables. The central limit theorem can then be applied to establish convergence in distribution to the multivariate normal distribution at an $n_d^{-1/2}$ rate for any low-dimensional sub-vector. The third term will also be asymptotically negligible if $\hat M$ is an approximate inverse of $\hat\Sigma$. This would suggest that an estimator of the form
$$\hat\gamma^{(d)}_j + \lambda \hat M \hat\kappa$$
will be asymptotically normal for an appropriate choice of $\hat M$.
Before describing our construction of $\hat M$, we find it helpful to consider an alternative expression for $\Sigma^{-1}$. We define the matrices $\Gamma_k$, $k \neq j$, as
$$\Gamma_k = \operatorname*{arg\,min}_{\Gamma} \; \mathbb{E}\big\| Z_{i,k} - \Gamma^\top Z_{i,-k} \big\|_2^2, \qquad (26)$$
where $Z_{i,k} \in \mathbb{R}^b$ denotes the block of $Z_i$ corresponding to node $k$, and $Z_{i,-k}$ denotes the remaining blocks. We also define the matrix $T_k$ as
$$T_k = \mathbb{E}\Big[ \big( Z_{i,k} - \Gamma_k^\top Z_{i,-k} \big) \big( Z_{i,k} - \Gamma_k^\top Z_{i,-k} \big)^\top \Big].$$
It can be shown that the block row of $\Sigma^{-1}$ corresponding to node $k$ can be expressed as
$$\big( \Sigma^{-1} \big)_{k \cdot} = T_k^{-1} \big( -\Gamma_{k,1}^\top, \ldots, I_b, \ldots, -\Gamma_{k,p-1}^\top \big),$$
where the identity block $I_b$ appears in position $k$ and $\Gamma_{k,l}$ denotes the block of $\Gamma_k$ corresponding to node $l$. We can thus estimate $\Sigma^{-1}$ by performing a series of regressions to estimate each matrix $\Gamma_k$.
Following the approach of van de Geer et al. [12], we use a group LASSO variant of the nodewise LASSO to construct $\hat M$. To proceed, we require some additional notation. For any matrix $\Gamma = \big( \Gamma_1^\top, \ldots, \Gamma_m^\top \big)^\top$ composed of blocks $\Gamma_l$ of $b$-dimensional rows, let $\| \Gamma \|_{2,1} = \sum_l \| \Gamma_l \|_F$, where $\| \cdot \|_F$ denotes the Frobenius norm. Let $\hat K_k$ be the subgradient of $\| \cdot \|_{2,1}$ evaluated at $\hat\Gamma_k$. We use the group LASSO to obtain estimates of $\Gamma_k$:
$$\hat\Gamma_k = \operatorname*{arg\,min}_{\Gamma} \; \frac{1}{2 n_d} \sum_{i=1}^{n_d} \big\| Z_{i,k} - \Gamma^\top Z_{i,-k} \big\|_2^2 + \lambda_k \| \Gamma \|_{2,1}, \qquad (27)$$
where $\lambda_k$ is a tuning parameter. We then estimate $T_k$ as
$$\hat T_k = \frac{1}{n_d} \sum_{i=1}^{n_d} \big( Z_{i,k} - \hat\Gamma_k^\top Z_{i,-k} \big) Z_{i,k}^\top.$$
Our estimate $\hat M$ takes the form
$$\hat M_{k \cdot} = \hat T_k^{-1} \big( -\hat\Gamma_{k,1}^\top, \ldots, I_b, \ldots, -\hat\Gamma_{k,p-1}^\top \big),$$
with the identity block in position $k$.
With this construction of $\hat M$, we can establish a bound on the remainder term (iii) in (25). To show this, we make use of the following lemma, which states a special case of the dual norm inequality for the group LASSO norm (see, e.g., Chapter 6 of van de Geer [11]).
Lemma 1.
Let $u = (u_1^\top, \ldots, u_m^\top)^\top$ and $v = (v_1^\top, \ldots, v_m^\top)^\top$ be vectors composed of $b$-dimensional blocks $u_l$ and $v_l$. Then
$$\big| \langle u, v \rangle \big| \;\leq\; \Big( \max_{l} \| u_l \|_2 \Big) \Big( \sum_{l} \| v_l \|_2 \Big).$$
The KKT conditions for (27) imply that for all $l \neq j, k$,
$$\Big\| \frac{1}{n_d} \sum_{i=1}^{n_d} \big( Z_{i,k} - \hat\Gamma_k^\top Z_{i,-k} \big) Z_{i,l}^\top \Big\|_F \;\leq\; \lambda_k. \qquad (28)$$
Lemma 1 and (28) imply that
$$\Big\| \big\{ \big( I - \hat M \hat\Sigma \big) \big( \hat\gamma^{(d)}_j - \gamma^{(d)}_j \big) \big\}_k \Big\|_2 \;\leq\; \big\| \hat T_k^{-1} \big\|_{\mathrm{op}}\; \lambda_k \sum_{l \neq j} \big\| \hat\gamma^{(d)}_{jl} - \gamma^{(d)}_{jl} \big\|_2,$$
where $\| \cdot \|_{\mathrm{op}}$ is the operator norm. With $\lambda_k \asymp \sqrt{\log p / n_d}$, $\hat\Gamma_k$ can be shown to be consistent under sparsity of $\Gamma_k$ (i.e., only a few blocks of $\Gamma_k$ are nonzero) and some additional regularity conditions. Additionally, it can be shown under sparsity of $\gamma^{(d)}_j$ (i.e., very few vectors $\gamma^{(d)}_{jk}$ are nonzero, say $s$ of them) and some additional regularity conditions that $\sum_{l \neq j} \| \hat\gamma^{(d)}_{jl} - \gamma^{(d)}_{jl} \|_2 = O_P(s \lambda)$. Thus, a scaled version of the remainder term (iii) is $o_P(n_d^{-1/2})$ if $s \lambda \lambda_k = o(n_d^{-1/2})$. We refer readers to Chapter 8 of Bühlmann and van de Geer [5] for a more comprehensive discussion of assumptions required for consistency of the group LASSO.
We now express the de-biased group LASSO estimator for $\gamma^{(d)}_j$ as
$$\hat\gamma^{(d),\mathrm{de}}_j = \hat\gamma^{(d)}_j + \lambda \hat M \hat\kappa. \qquad (29)$$
We have established that $\hat\gamma^{(d),\mathrm{de}}_j - \gamma^{(d)}_j$ can be written as
$$\hat\gamma^{(d),\mathrm{de}}_j - \gamma^{(d)}_j = \frac{1}{n_d} \hat M Z^\top \epsilon^{(d)}_j + \mathrm{rem.},$$
where the remainder term is asymptotically negligible. As stated above, the central limit theorem implies asymptotic normality of $\sqrt{n_d}\, \big( \hat\gamma^{(d),\mathrm{de}}_j - \gamma^{(d)}_j \big)$.
We now construct an estimate for the variance of $\hat\gamma^{(d),\mathrm{de}}_j$. Suppose the residual $\epsilon^{(d)}_{i,j}$ is independent of $Z_i$, and let $\sigma^{(d)2}_j$ denote the residual variance
$$\sigma^{(d)2}_j = \operatorname{Var}\big( \epsilon^{(d)}_{i,j} \big).$$
We can approximate the variance of $\hat\gamma^{(d),\mathrm{de}}_j$ as
$$\frac{\sigma^{(d)2}_j}{n_d}\, \hat M \hat\Sigma \hat M^\top. \qquad (30)$$
As $\sigma^{(d)2}_j$ is typically unknown, we instead use the estimate
$$\hat\sigma^{(d)2}_j = \frac{\big\| \mathbf{X}^{(d)}_j - Z \hat\gamma^{(d)}_j \big\|_2^2}{n_d - \widehat{\mathrm{df}}},$$
where $\widehat{\mathrm{df}}$ is an estimate of the degrees of freedom for the group LASSO estimate $\hat\gamma^{(d)}_j$. In our implementation, we use the estimate proposed by Breheny and Huang [4].
B Generalized Score Matching Estimator
In this section, we establish consistency of the regularized score matching estimator and derive a bias-corrected estimator.
B.1 Form of Generalized Score Matching Loss
B.2 Generalized Score Matching Estimator in Low Dimensions
In this section, we provide an explicit form for the generalized score matching estimator in the low-dimensional setting and state its limiting distribution. We first introduce some additional notation that allows the generalized score matching loss to be written in a condensed form. Recall the form of the conditional density for the pairwise interaction model in (22), and collect the unknown parameters into the vector $\vartheta = (\gamma^\top, \eta^\top)^\top$, with true value $\vartheta_0$. Because the conditional score is linear in $\vartheta$, the empirical score matching loss (23) is quadratic in $\vartheta$ and can be expressed as
$$\hat J^{(d)}_j(\vartheta) = \frac{1}{2}\, \vartheta^\top \hat\Lambda\, \vartheta - \hat\mu^\top \vartheta + \text{const.},$$
where $\hat\Lambda$ and $\hat\mu$ are, respectively, a matrix and a vector of sample averages of known functions of $\big( X^{(d)}_i, W^{(d)}_i \big)$ determined by (23).
We write the gradient of the empirical risk function as
$$\nabla \hat J^{(d)}_j(\vartheta) = \hat\Lambda\, \vartheta - \hat\mu.$$
Thus, the minimizer of the empirical loss takes the form
$$\hat\vartheta = \hat\Lambda^{-1} \hat\mu.$$
By applying Theorem 5.23 of van der Vaart [34],
$$\sqrt{n_d}\, \big( \hat\vartheta - \vartheta_0 \big) \rightsquigarrow N\big( 0,\; \Lambda^{-1} V \Lambda^{-1} \big),$$
where the matrices $\Lambda$ and $V$ are defined as
$$\Lambda = \mathbb{E}\big[ \hat\Lambda \big], \qquad V = \operatorname{Var}\big\{ \ell_i(\vartheta_0) \big\},$$
with $\ell_i(\vartheta_0)$ denoting the contribution of observation $i$ to the gradient $\nabla \hat J^{(d)}_j(\vartheta_0)$.
We estimate the variance of $\hat\vartheta$ as $\hat\Lambda^{-1} \hat V \hat\Lambda^{-1} / n_d$, where
$$\hat V = \frac{1}{n_d} \sum_{i=1}^{n_d} \ell_i\big( \hat\vartheta \big)\, \ell_i\big( \hat\vartheta \big)^\top.$$
B.3 Consistency of Regularized Generalized Score Matching Estimator
In this subsection, we argue that the regularized generalized score matching estimators $\hat\gamma^{(d)}_j$ and $\hat\eta^{(d)}_j$ from (24) are consistent. Let $\hat\vartheta = (\hat\gamma^\top, \hat\eta^\top)^\top$ and let $\vartheta_0$ denote the true parameter vector. We establish convergence rates in the group norm $\| \hat\vartheta - \vartheta_0 \|_{2,1} = \sum_k \| \hat\vartheta_k - \vartheta_{0,k} \|_2$, where the sum ranges over the parameter blocks. Our approach is based on proof techniques described in Bühlmann and van de Geer [5].
Our result requires a notion of compatibility between the penalty function and the loss $\hat J^{(d)}_j$. Such notions are commonly assumed in the high-dimensional literature. Below, we define the compatibility condition.
Definition 1 (Compatibility Condition).
Let $S$ be a set containing the indices of the nonzero blocks of $\vartheta_0$, with cardinality $s = |S|$, and let $S^c$ denote the complement of $S$. The group LASSO compatibility condition holds for the index set $S$ and constant $\phi > 0$ if for all $\vartheta$ satisfying $\sum_{k \in S^c} \| \vartheta_k \|_2 \leq 3 \sum_{k \in S} \| \vartheta_k \|_2$,
$$\Big( \sum_{k \in S} \| \vartheta_k \|_2 \Big)^2 \;\leq\; \frac{s\, \vartheta^\top \hat\Lambda\, \vartheta}{\phi^2}.$$
Theorem 2.
Let $\mathcal{T}$ be the set
$$\mathcal{T} = \Big\{ \max_k \big\| \big( \nabla \hat J^{(d)}_j(\vartheta_0) \big)_k \big\|_2 \leq \lambda_0 \Big\}$$
for some $\lambda_0 \leq \lambda / 2$. Suppose the compatibility condition also holds for $S$ and $\phi$. Then on the set $\mathcal{T}$,
$$\big\| \hat\vartheta - \vartheta_0 \big\|_{2,1} \;\leq\; \frac{4\, s\, \lambda}{\phi^2}.$$
Proof of Theorem 2.
The regularized score matching estimator necessarily satisfies the following basic inequality:
$$\hat J^{(d)}_j\big( \hat\vartheta \big) + \lambda \sum_k \| \hat\vartheta_k \|_2 \;\leq\; \hat J^{(d)}_j\big( \vartheta_0 \big) + \lambda \sum_k \| \vartheta_{0,k} \|_2.$$
With some algebra, using the fact that $\hat J^{(d)}_j$ is quadratic with Hessian $\hat\Lambda$, this inequality can be rewritten in terms of $\delta = \hat\vartheta - \vartheta_0$ as
$$\frac{1}{2}\, \delta^\top \hat\Lambda\, \delta \;\leq\; - \nabla \hat J^{(d)}_j(\vartheta_0)^\top \delta + \lambda \sum_k \Big( \| \vartheta_{0,k} \|_2 - \| \hat\vartheta_k \|_2 \Big).$$
By Lemma 1, on the set $\mathcal{T}$ and using $\lambda_0 \leq \lambda / 2$, we get
$$\frac{1}{2}\, \delta^\top \hat\Lambda\, \delta \;\leq\; \frac{\lambda}{2}\, \| \delta \|_{2,1} + \lambda \sum_k \Big( \| \vartheta_{0,k} \|_2 - \| \hat\vartheta_k \|_2 \Big).$$
On the right hand side, we apply the triangle inequality to get $\| \vartheta_{0,k} \|_2 - \| \hat\vartheta_k \|_2 \leq \| \delta_k \|_2$ for $k \in S$, and we observe that $\| \vartheta_{0,k} \|_2 - \| \hat\vartheta_k \|_2 = - \| \delta_k \|_2$ for $k \in S^c$. We then have
$$\frac{1}{2}\, \delta^\top \hat\Lambda\, \delta + \frac{\lambda}{2} \sum_{k \in S^c} \| \delta_k \|_2 \;\leq\; \frac{3 \lambda}{2} \sum_{k \in S} \| \delta_k \|_2,$$
which in particular implies $\sum_{k \in S^c} \| \delta_k \|_2 \leq 3 \sum_{k \in S} \| \delta_k \|_2$, so that the compatibility condition applies to $\delta$. Now,
$$\frac{1}{2}\, \delta^\top \hat\Lambda\, \delta + \frac{\lambda}{2}\, \| \delta \|_{2,1} \;\leq\; 2 \lambda \sum_{k \in S} \| \delta_k \|_2 \;\leq\; \frac{2 \lambda \sqrt{s}}{\phi} \sqrt{ \delta^\top \hat\Lambda\, \delta } \;\leq\; \frac{1}{2}\, \delta^\top \hat\Lambda\, \delta + \frac{2 \lambda^2 s}{\phi^2},$$
where we use the compatibility condition for the second inequality, and for the third inequality use the fact that
$$u v \leq \frac{u^2}{2} + \frac{v^2}{2}$$
for any $u, v \geq 0$. The conclusion follows immediately. ∎
If the event $\mathcal{T}$ occurs with probability tending to one, Theorem 2 implies
$$\big\| \hat\vartheta - \vartheta_0 \big\|_{2,1} = O_P(s \lambda).$$
We select $\lambda$ so that the event $\mathcal{T}$ occurs with high probability. For instance, suppose the elements of the gradient $\nabla \hat J^{(d)}_j(\vartheta_0)$ are sub-Gaussian, and consider the event
$$\mathcal{T}' = \Big\{ \big\| \nabla \hat J^{(d)}_j(\vartheta_0) \big\|_\infty \leq \lambda_0 / \sqrt{b} \Big\},$$
where $\| \cdot \|_\infty$ is the element-wise maximum norm. Observing that $\mathcal{T}' \subseteq \mathcal{T}$, it is only necessary to show that $\mathcal{T}'$ holds with high probability. It is shown in Corollary 2 of Negahban et al. [29] that there exist constants $c_1, c_2 > 0$ such that with $\lambda_0 \asymp \sqrt{\log p / n_d}$, $\mathcal{T}'$ holds with probability at least $1 - c_2 p^{-c_1}$. Thus, $\mathcal{T}$ occurs with probability tending to one as $n_d \to \infty$. For distributions with heavier tails, a larger choice of $\lambda$ may be required [44].
B.4 De-biased Score Matching Estimator
The KKT conditions for the regularized score matching loss imply that the estimator satisfies
$$\nabla \hat J^{(d)}_j\big( \hat\vartheta \big) + \lambda \hat\kappa = 0,$$
where $\hat\kappa$ is the sub-gradient of the penalty in (24) evaluated at $\hat\vartheta$. With some algebra, using $\nabla \hat J^{(d)}_j(\vartheta) = \hat\Lambda \vartheta - \hat\mu$, we can rewrite the KKT conditions as
$$\hat\Lambda \big( \hat\vartheta - \vartheta_0 \big) + \lambda \hat\kappa = - \nabla \hat J^{(d)}_j\big( \vartheta_0 \big).$$
Now, let $\Lambda$ be the matrix $\Lambda = \mathbb{E}[\hat\Lambda]$, and let $\hat M$ be an estimate of $\Lambda^{-1}$. We can now rewrite the KKT conditions as
$$\hat\vartheta - \vartheta_0 = \underbrace{- \lambda \hat M \hat\kappa}_{\text{(i)}} \;\underbrace{- \hat M\, \nabla \hat J^{(d)}_j(\vartheta_0)}_{\text{(ii)}} \;+\; \underbrace{\big( I - \hat M \hat\Lambda \big) \big( \hat\vartheta - \vartheta_0 \big)}_{\text{(iii)}}. \qquad (31)$$
As is the case for the de-biased group LASSO in Appendix A, the first term in (31) depends only on the observed data and can be directly subtracted from the initial estimate. The second term is asymptotically equivalent to
$$- \frac{1}{n_d} \sum_{i=1}^{n_d} \Lambda^{-1} \ell_i\big( \vartheta_0 \big), \qquad (32)$$
if $\hat M$ is a consistent estimate of $\Lambda^{-1}$. Using the fact that $\mathbb{E}\{ \ell_i(\vartheta_0) \} = 0$, it can be seen that (32) is an average of i.i.d. random quantities with mean zero. The central limit theorem then implies that any low-dimensional sub-vector is asymptotically normal. The last term is asymptotically negligible if $\hat M$ is an approximate inverse of $\hat\Lambda$ and if $\hat\vartheta$ is consistent for $\vartheta_0$. Thus, for an appropriate choice of $\hat M$, we expect asymptotic normality of an estimator of the form
$$\hat\vartheta + \lambda \hat M \hat\kappa.$$
Before constructing $\hat M$, we first provide an alternative expression for $\Lambda^{-1}$. Writing the conditional score as $s_{\gamma, \eta}(x, w) = u(x, w)^\top \vartheta$ for a vector of known functions $u$, the matrix $\hat\Lambda$ takes the form
$$\hat\Lambda = \frac{1}{n_d} \sum_{i=1}^{n_d} h_j\big( X^{(d)}_{i,j} \big)\, u\big( X^{(d)}_i, W^{(d)}_i \big)\, u\big( X^{(d)}_i, W^{(d)}_i \big)^\top,$$
so that $\Lambda = \mathbb{E}\big[ U_i U_i^\top \big]$ with $U_i = h_j^{1/2}\big( X^{(d)}_{i,j} \big)\, u\big( X^{(d)}_i, W^{(d)}_i \big)$. We define the matrices $\Gamma_k$ and $T_k$ exactly as in (26), with $U_i$ in place of $Z_i$:
$$\Gamma_k = \operatorname*{arg\,min}_{\Gamma} \; \mathbb{E}\big\| U_{i,k} - \Gamma^\top U_{i,-k} \big\|_2^2, \qquad T_k = \mathbb{E}\Big[ \big( U_{i,k} - \Gamma_k^\top U_{i,-k} \big) \big( U_{i,k} - \Gamma_k^\top U_{i,-k} \big)^\top \Big].$$
It can be shown that the block row of $\Lambda^{-1}$ corresponding to block $k$ can be expressed as
$$\big( \Lambda^{-1} \big)_{k \cdot} = T_k^{-1} \big( -\Gamma_{k,1}^\top, \ldots, I, \ldots, -\Gamma_{k,m}^\top \big),$$
with the identity block in position $k$. We can thus estimate $\Lambda^{-1}$ by estimating each of the matrices $\Gamma_k$ and $T_k$.
Similar to our discussion of the de-biased group LASSO in Appendix A, we use a group-penalized variant of the nodewise LASSO to construct $\hat M$. We estimate $\Gamma_k$ and $T_k$ as
$$\hat\Gamma_k = \operatorname*{arg\,min}_{\Gamma} \; \frac{1}{2 n_d} \sum_{i=1}^{n_d} \big\| \hat U_{i,k} - \Gamma^\top \hat U_{i,-k} \big\|_2^2 + \lambda_k \| \Gamma \|_{2,1}, \qquad \hat T_k = \frac{1}{n_d} \sum_{i=1}^{n_d} \big( \hat U_{i,k} - \hat\Gamma_k^\top \hat U_{i,-k} \big)\, \hat U_{i,k}^\top,$$
where $\lambda_k$ are tuning parameters, $\hat U_i$ is the sample version of $U_i$, and $\| \cdot \|_{2,1}$ is as defined in Appendix A. We then take $\hat M$ as
$$\hat M_{k \cdot} = \hat T_k^{-1} \big( -\hat\Gamma_{k,1}^\top, \ldots, I, \ldots, -\hat\Gamma_{k,m}^\top \big),$$
with the identity block in position $k$.
When $\Gamma_k$, $T_k$, and $\vartheta_0$ satisfy appropriate sparsity conditions and some additional regularity assumptions, $\hat M_{k \cdot}$ is a consistent estimate of $\big( \Lambda^{-1} \big)_{k \cdot}$ for each $k$ (see, e.g., Chapter 8 of Bühlmann and van de Geer [5] for a more comprehensive discussion). Using the same argument presented in Appendix A, we are able to obtain the following bound on a scaled version of the remainder term (iii):
$$\Big\| \big\{ \big( I - \hat M \hat\Lambda \big) \big( \hat\vartheta - \vartheta_0 \big) \big\}_k \Big\|_2 \;\leq\; \big\| \hat T_k^{-1} \big\|_{\mathrm{op}}\; \lambda_k\, \big\| \hat\vartheta - \vartheta_0 \big\|_{2,1}.$$
The remainder is $o_P(n_d^{-1/2})$ and hence asymptotically negligible if $s \lambda \lambda_k = o(n_d^{-1/2})$, where $\lambda$ is the tuning parameter for the regularized score matching estimator (see Theorem 2).
The de-biased estimate of $\vartheta_0$ can be expressed as
$$\hat\vartheta^{\mathrm{de}} = \hat\vartheta + \lambda \hat M \hat\kappa. \qquad (33)$$
The difference between the de-biased estimator and the true parameter can be expressed as
$$\hat\vartheta^{\mathrm{de}} - \vartheta_0 = - \frac{1}{n_d} \sum_{i=1}^{n_d} \hat M\, \ell_i\big( \vartheta_0 \big) + \mathrm{rem.},$$
where the remainder is asymptotically negligible. As discussed above, the central limit theorem implies asymptotic normality of $\sqrt{n_d}\, \big( \hat\vartheta^{\mathrm{de}} - \vartheta_0 \big)$. We can estimate the asymptotic variance of $\hat\vartheta^{\mathrm{de}}$ as
$$\frac{1}{n_d}\, \hat M\, \hat V\, \hat M^\top,$$
where we define
$$\hat V = \frac{1}{n_d} \sum_{i=1}^{n_d} \ell_i\big( \hat\vartheta \big)\, \ell_i\big( \hat\vartheta \big)^\top.$$