Statistical Inference for Qualitative Interactions with Applications to Precision Medicine and Differential Network Analysis
Abstract
Qualitative interactions occur when a treatment effect or measure of association varies in sign by sub-population. Of particular interest in many biomedical settings are absence/presence qualitative interactions, which occur when an effect is present in one sub-population but absent in another. Absence/presence interactions arise in emerging applications in precision medicine, where the objective is to identify a set of predictive biomarkers that have prognostic value for clinical outcomes in some sub-population but not others. They also arise naturally in gene regulatory network inference, where the goal is to identify differences in networks corresponding to diseased and healthy individuals, or to different subtypes of disease; such differences lead to identification of network-based biomarkers for diseases. In this paper, we argue that while the absence/presence hypothesis is important, developing a statistical test for this hypothesis is an intractable problem. To overcome this challenge, we approximate the problem in a novel inference framework. In particular, we propose to make inferences about absence/presence interactions by quantifying the relative difference in effect size, reasoning that when the relative difference is large, an absence/presence interaction occurs. The proposed methodology is illustrated through a simulation study as well as an analysis of breast cancer data from the Cancer Genome Atlas.
1 Introduction
An objective of many biomedical studies is to identify and test for interactions, which arise when a measure of effect or association between variables differs by sub-population. Precision medicine and genetic network inference provide examples of areas in which interactions are of interest. For instance, researchers in precision medicine seek to understand how patients’ characteristics are associated with heterogeneity in response to treatment. In genetics studies, it is of interest to determine how genetic networks, which summarize the associations between genes, depend on phenotype.
Interactions may lack clinical or scientific significance when differences in effect are small. In addition to detecting interactions, it is important to identify which are meaningful. For example, in precision medicine, the most important differences in treatment effect may be those in which some sub-populations of patients benefit from the treatment, while other sub-populations are harmed or unaffected. Additionally, one may want to identify differences among sub-populations in the set of biomarkers that have prognostic value for a health outcome — that is, to determine whether some biomarkers are predictive of the outcome in only a subset of the full population. Genetic network inference provides another example: When comparing sub-population level genetic networks, it may be of primary interest to identify pairs of genes that share an association in some sub-populations but share no association in others, or to identify pairs that have a positive association in one sub-population and a negative association another. This is known as differential network biology (Ideker and Krogan, 2012).
Such qualitative interactions are the focus of this paper. Qualitative interactions occur when a measure of effect differs in sign by sub-population. We consider two types of qualitative interactions: positive/negative interactions — also known in the literature as cross-over interactions (Gail and Simon, 1985) — and absence/presence interactions — sometimes referred to as pure interactions (VanderWeele, 2019). Positive/negative interactions occur when an effect is positive in one sub-population and negative in another, and absence/presence interactions occur when the effect is present in one population but absent in another.
Our objective is to formally test for qualitative interactions, given independent samples from each sub-population. Testing for positive/negative interactions is well-studied (Gail and Simon, 1985; Piantadosi and Gail, 1993; Pan and Wolfe, 1997; Silvapulle, 2001; Li and Chan, 2006), while testing for absence/presence interactions has received substantially less attention. Naïve approaches, to be discussed in the sequel, require an untenable minimum signal strength condition — that if an effect is present in any sub-population, it is large enough to be detected with absolute certainty. No approaches exist, to the best of our knowledge, that obviate this assumption.
In this paper, we propose a novel framework for inference about absence/presence interactions. Our proposed methodology allows for well-calibrated hypothesis testing under mild assumptions. We also introduce a numerical summary that measures the strength of absence/presence interactions, while accounting for the uncertainty associated with parameter estimation. Additionally, we describe methods for simultaneous inference about absence/presence and positive/negative interactions. The methodology we introduce provides an effective and flexible inference tool in precision medicine and genetic network analysis, as we illustrate in simulations and an analysis of breast cancer data from The Cancer Genome Atlas (TCGA).
2 Background
2.1 Notation
As we begin to formalize the problem, we first introduce some notation. We consider two sub-populations, labeled by . Let denote a measure of association in sub-population . When convenient, we write . We can consider various measures of association, such as: correlation coefficients, indicating the strength of linear relationship between two variables of interest; log odds ratios, describing the relationship between predictors and a binary outcome; and log hazard ratios, describing the association between predictors and a time-to-event outcome.
We assume that given i.i.d. samples of size and from each sub-population, -consistent and asymptotically normal estimates of are available, i.e.,
with denoting the asymptotic variance. For expositional simplicity, we assume balanced sample sizes (key results are stated more generally in the Appendix). We also assume is known, though we can instead use a consistent estimate, as is commonly done in practice.
We now formally state the null hypotheses of no positive/negative interactions and no absence/presence interaction, labeled and , respectively:
(1) | |||
(2) |
We let and denote the corresponding null and alternative regions of the parameter space, depicted in Figure 1. (Recall that the null region is the set of parameters such that the null hypothesis holds, and the alternative region is the complement of the null region.) The positive/negative null region is the union of the the non-negative and non-positive orthants, and the absence/presence null region is the union of all open orthants and the origin.
2.2 Testing Composite Null Hypotheses
Our goal is to use the estimate to perform tests of and such that the size is controlled asymptotically under mild assumptions. Recall that for a null hypothesis with accompanying null region , the size of a test is defined as
In words, the size is the largest possible type-I error rate that could be achieved under any probability distribution given that belongs to the null region.
Here, we describe our general approach for controlling the size at a pre-specified level . We first define a test statistic , a map from observable data to a real-valued number, with larger values of corresponding to more evidence against the null hypothesis. We then calculate the test statistic on the observed data, which we denote by . We write as the probability of observing a random test statistic at least as large as the observed value , assuming . We reject the null hypothesis if
One can think of as the p-value under a specific null distribution ; is then the largest of all such p-values. We reject when there is sufficient evidence to reject all hypotheses . We can view as a generalization of the usual p-value for simple null hypotheses to tests with composite null hypotheses, and will simply refer to as “p-value”. Tests of the above form are guaranteed to control the size (Casella and Berger, 2002).
2.3 Existing Methodology
We now review existing approaches to test for qualitative interactions. We first discuss testing positive/negative interactions before moving to absence/presence interactions.
Gail and Simon (1985) developed the most widely used procedure to test for positive/negative interactions. Though Gail and Simon proposed a general -sample test, we focus on the two-sample problem in this paper. We note that various -sample tests for positive/negative interactions have been proposed (Piantadosi and Gail, 1993; Silvapulle, 2001; Li and Chan, 2006), but these procedures are essentially equivalent to the Gail-Simon test in the two-sample setting.
Gail and Simon’s approach is to perform a likelihood ratio test based on the asymptotic sampling distribution of . The likelihood ratio test rejects for large values of
where is the standard normal density. By performing algebreic manipulations, one can show that the likelihood ratio test equivalently rejects the null for large values of
The test statistic can be interpreted as the shortest distance between and the null region, where the distance is inversely weighted by the asymptotic variances of the estimates, as illustrated in Figure 2.
Gail and Simon show that the test statistic can be calculated as
where denotes the indicator function. Furthermore, one can verify that for all , the p-value is easily calculated as
The likelihood ratio test is quite intuitive and rejects the null when a positive estimate of association is observed in one population, a negative estimate is observed in another population, and both associations are statistically significant.
Now, we discuss approaches to test for absence/presence interactions. While one might be tempted to perform a likelihood ratio test for absence/presence interactions, the likelihood ratio test fails in the sense that it never can reject the null. To see this, we first recognize that, similar to the positive/negative interaction test, the absence/presence likelihood ratio test would reject for large values of
Again, the test statistic is the shortest distance between and the null region . Because the alternative region has zero area, lies in the null region with probability one. Therefore, the test statistic is always 0, and the likelihood ratio test has no power.
One might alternatively attempt to test the absence/presence null by separately testing the null hypotheses for and rejecting the absence/presence null when is rejected for one and not rejected for the other. To control the size of a test of this form, we need to simultaneously control the type-I error under two scenarios: (1) there is an association in both sub-populations, and (2) there is no association in either population. When there is an association in both populations, a type-I error occurs when we incorrectly fail to reject one of . When there is no association in either population, we make a type-I error when we incorrectly reject one of . Thus, controlling the size of the test for using this approach requires simultaneous control of the type-I error rate and type-II error rate of tests for . If the tests for are consistent — that is, the type-II error rates tend to zero with sufficiently large samples — this approach is asymptotically valid, as only the type-I error rates for tests of need to be controlled. However, with even moderately large samples, we will not be able to correctly reject false with absolute certainty unless the true association is strong and hence easy to detect. This would make the test of unreliable in the presence of weak signal.
Specifically, we require . To see this, we construct a more formal argument. For simplicity, suppose . We consider tests of of the form
where is a constant that would be selected to control the size. The probability of rejecting the absence/presence null is
Suppose and are guaranteed to be greater than if they are both nonzero. Then, for , , and . Therefore, to control the size, we are only required to select so that the type-I error is controlled when ; this can be done by taking as the quantile of the standard normal distribution. However, if we allow , we can see a drastically inflated type-I error rate. For instance, for a small , let , . Then while , so . Thus, when small signal is permitted, tests of this form will be asymptotically anti-conservative.
Both approaches discussed above for testing absence/presence interactions fail for a similar reason: it is difficult to gather evidence supporting that a measure of association is exactly equal to zero. This is captured by the alternative region having zero area, causing the failure of the first approach. In the second approach, to obtain evidence supporting that an association is zero, we require that is only accepted when ; for this, we rely upon a minimum signal strength condition to guarantee that any non-zero association is detected.
3 Proposed Methodology
3.1 Refinement of Absence/Presence Hypothesis
To mitigate the challenges described in Section 2, we consider a refinement of the absence/presence null hypothesis. The key idea is that in practice, absence/presence interactions can be approximated by considering the settings where an association is at least moderately large in one population and negligible or near zero in the other; or when one association is substantially stronger than the other. This means that we can expand the alternative region to include neighborhoods of zero in a way that the absence/presence interpretation is preserved.
Recall that when there exists an absence/presence interaction, the ratio of the maximum of the absolute value of the to the minimum is infinite. We cannot test that the ratio is infinite because we will never have evidence to support that the denominator is exactly zero. However, we can test that the ratio is large because we may have evidence to support that the denominator is very small. Motivated by this intuition, we propose to test whether the relative difference between sub-population measures of association is greater than a large pre-specified constant . Formally, let and . We define the new relative difference null hypothesis as
Equivalently,
(3) |
The null region , illustrated in Figure 1, is the union of four linear subspaces — each residing in a separate orthant of ; the boundary of each subspace is the union of the spans of vectors with absolute direction and .
The relative difference null region can be viewed as a relaxation of the absence/presence null. For a large choice of , both our original and refined null hypotheses have the same interpretation: the greater measure of association is substantially larger than the lesser. However, we find it appealing that has a reasonable interpretation for any choice of ; that is, the multiplicative difference in strength of association is no larger than .
To motivate defining the refined null hypothesis in terms of relative differences rather than absolute differences, we argue that testing for relative differences has at least the following benefits. First, relative differences are unitless, so the relative difference null is compatible with unitless measures of association such as the Pearson correlation coefficient, which are often preferred in the analysis of biological data. Second, a reasonable can be selected without prior knowledge of ranges of strength of association.
Of course, the relative difference null hypothesis depends on the choice of . For small values of , the relative difference null may be too dissimilar from the absence/presence null to retain its interpretation; for large , the alternative region becomes very small, and the hypothesis may be overly conservative. In the following subsections, we first construct a test of the relative difference null hypothesis for a pre-specified and then describe an approach to identify the set of such that the test rejects the null hypothesis, as to circumvent tuning parameter selection.
3.2 Likelihood Ratio Test for Relative Difference Hypothesis
We now develop a testing procedure for the new relative difference null hypothesis. The relative difference null region, unlike the absence/presence null region, has non-zero area, so a likelihood ratio test will not fail in the same manner as the likelihood ratio test for absence/presence interactions. Similar to the previously discussed examples, the likelihood ratio test statistic is
and can be interpreted as the shortest (weighted) distance between and the null region . Clearly, the test statistic is zero whenever lies in the null region. Otherwise, is the shortest distance between and the closest of the four linear subspaces that define . The test statistic can be calculated as the distance between and its projection onto the span of the vector . The test statistic’s geometric interpretation is illustrated in Figure 2.
The likelihood ratio test statistic is straightforward to calculate. Let and be the strongest and weakest estimated absolute association. In the following lemma, we state that is equal to the difference between and divided by a normalizing constant. Thus, the test statistic can be viewed as a plug-in of into (3) with an additional normalizing constant and closely resembles the test statistic proposed by Fieller (1940) to conduct inference about ratios of means.
Lemma 1.
The likelihood ratio test statistic can be written as
where , and .
We now discuss how to obtain a p-value for the relative difference hypothesis. First, we obtain an observed test statistic , a realization of calculated from the data. Following the approach described in Section 2.2, we define the p-value as , the largest of all asymptotic tail probabilities such that belongs to the null region. To determine the maximum tail probability, we characterize the limiting distribution of assuming for all in the null region.
Though the null region contains an infinite number of values, can only attain one of three limiting distributions corresponding to the following three cases:
-
1.
The true association is in the interior of the null region, i.e., .
-
2.
The true association is on the boundary of the null region, but both associations are non-zero, i.e., .
-
3.
The true association is zero in both sub-populations, i.e., .
In Proposition 1, we describe the asymptotic behavior of for cases 1 and 2 above. We provide here some intuition for the result and reserve a formal argument for the Appendix In case 1, it is easy to argue that because is consistent, is a negative number with probability tending to one, and therefore never provides evidence against the null. In case 2, asymptotically follows a standard normal distribution. To see this, we note that because both associations are non-zero, consistency and asymptotic normality of imply that, for large , the sign and order of the estimates are deterministic. That the signs are asymptotically deterministic implies that is asymptotically normal (speaking loosely, ), and that order is asymptotically deterministic implies that and are asymptotically independent. Therefore, taking the difference between and , suitably standardized, is asymptotically equivalent to taking a difference between two independent normal random variables with equal means. Dividing by the asymptotic variances gives the claimed result.
Proposition 1.
In the interior of the null region, i.e., when , converges in distribution to . At all nonzero boundary points of the null region, i.e., when , converges in distribution to a standard normal random variable.
In case 3, when both associations are zero, the asymptotic distribution of the test statistic is more complicated. More specifically, implies that, asymptotically, follows a half-normal distribution instead of a normal distribution. Moreover, the order of remains random in the limit. This gives rise to a non-standard limiting distribution. In particular, unlike cases 1 and 2, the limiting distribution depends on the asymptotic variances of the sub-population estimates (and also the ratio of the sample sizes in the unbalanced case). We are nonetheless able to derive an analytic expression for the distribution function, stated in Proposition 2. We reserve this statement for the Appendix, as the expression is cumbersome.
By Propositions 1 and 2, we can calculate the p-value as the maximum of the tail probabilities in cases 2 and 3. That is,
(4) |
where denotes the standard normal distribution function and is the limiting distribution function for the test statistic when . Though the limiting distribution under is non-standard, tail probabilities can be calculated easily, as we show in the Appendix (Remark 1). The p-value is therefore simple to calculate.
We have derived an analytic approximation for the power of the likelihood ratio test. In Proposition 2, we more generally characterize the limiting distribution of the test statistic under hypotheses of the form . The local asymptotic power of the proposed test at the level is available by considering , where we define as the maximum of the quantiles of the limiting distribution of under scenarios 2 and 3 described above. An analytic finite-sample approximation of the power can be calculated as .
A contour plot of the local asymptotic power is given in Figure 3. We consider both cases of equal and unequal variance of the estimators. We observe that the likelihood ratio test has low power when the strongest effect is small, and power improves considerably when the strongest effect grows. Additionally, we find that in the presence of unequal variance, the test has greater power when the weakest effect is estimated with higher precision than the strongest effect.
3.3 Quantifying Relative Difference in Effect by Inverting Likelihood Ratio Test
Rather than perform a the likelihood ratio test for a pre-specified , one may prefer to directly estimate the relative difference in effect. Naïvely, one might consider estimating the relative difference in effect as . However, this estimate will behave poorly when and should therefore not be reported in practice.
To overcome this issue, we propose to quantify the relative difference in effect size by inverting the likelihood ratio test, similar to Fieller (1940). We define
as the largest such that the likelihood ratio test rejects the null hypothesis at the level. When the likelihood ratio test fails to reject for all , we will use the convention . We find it appealing that converges to if , and should approach but not exceed otherwise. We discuss calculation of in the Appendix.
3.4 Simultaneous Test of Qualitative Interactions
When it is of interest to identify both absence/presence and positive/negative qualitative interactions, it may be desirable to test for both simultaneously. In this section, we construct an omnibus test that achieves exact control of the size asymptotically.
We define the omnibus qualitative interaction null hypothesis as
The null region is the intersection of the positive/negative and relative difference null regions, as depicted in Figure 4. We observe that as , the omnibus null and alternative regions tend to the positive/negative null and alternative regions.
To construct the likelihood ratio test, we proceed using similar arguments to those presented in Section 3.2. The likelihood ratio statistic is the distance between the estimate and its projection onto the null region , inversely weighted by the asymptotic variance of . A simple expression for is given in Lemma 2.
Lemma 2.
The likelihood ratio statistic can be written as
Unsurprisingly, the likelihood ratio statistic for the omnibus test approaches the likelihood ratio statistic for the Gail-Simon likelihood ratio statistic for positive/negative interactions in the limit of large . The tests will, therefore, be nearly identical for sufficiently large .
To characterize the asymptotic behavior of the omnibus test statistic at each location under the null, we use similar arguments to those in the previous section. If belongs to the interior of the null region, converges in probability to zero. If belongs to the boundary of the null region and is non-zero, converges weakly to a uniform mixture of zero and the chi-squared distribution with one degree of freedom. If is zero, the limiting distribution of is non-standard, though it can be characterized nonetheless. Formal statements of asymptotic properties of are given in Propositions 3 and Remark 2 (the statement of Remark 2 is also cumbersome, and is reserved for the Appendix).
Proposition 3.
If belongs to the interior of the null region , converges in distribution to zero. If is on the boundary of the null region, but , as .
Calculating the p-value for the omnibus test is no more difficult than calculating the p-value for the absence/presence test. Defining , the p-value can be calculated as
(5) |
where is the value of the test statistic calculated on the observed data. We characterize the local asymptotic power of the omnibus likelihood ratio test in Proposition 4 in the Appendix.
4 Simulation Study
In a Monte Carlo simulation study, we examine how type-I error probabilities and statistical power of the likelihood ratio tests for qualitative interactions are affected by signal strength, sample size, and selection of . Additionally, we examine how depends on the true sub-population effects and the sample size.
We generate random observations in sub-population under the linear model:
Here, is the predictor of interest, is the response, and is white noise. The measure of association in which we are interested is the regression coefficient . We fix and consider . A total of synthetic data sets are randomly generated for each and .
For each synthetic data set, we perform both the relative difference likelihood ratio test and the omnibus qualitative interaction likelihood ratio test with and ; we use a significance level of . We additionally calculate with . Parameter estimation is performed with ordinary least squares, and model-based estimates of the standard error are used.
In Figure 5, we plot the Monte Carlo estimate of the rejection probability of the relative difference likelihood ratio test over the range of . We see that the test achieves control of size for both choices of and both sample sizes. We observe that power is largest when is near zero, as we would expect. Power is moderately strong when but fairly poor when . With , the likelihood ratio test has almost zero power when the sample size is small, though we see modest improvement when the sample size is larger.
Figure 5 also shows the estimated rejection probabilities of the omnibus test over . Control of size is achieved in both large and small samples, as expected. We note that for the omnibus test, power increases as tends to , and power is uniformly larger when than when .
In Figure 6, we plot the quantiles of the values from 1000 synthetic data sets for each . We expect that for a fixed , most should approach but not exceed as sample size increases; our simulations are consistent with this expectation. We note that when the sample size is small, tends to underestimate the relative difference in effect size.
5 Data Example
In this example, we investigate genetic differences in breast cancer sub-types. Classification of breast cancer based on expression of estrogen receptor (ER) is known to be associated with clinical outcomes. Approximately 70% of breast cancers are estrogen receptor positive (ER+) cancers, meaning that estrogen causes cancer cells to grow (Lumachi et al., 2013); breast cancers are otherwise estrogen receptor negative (ER-). Patients with ER+ breast cancer tend to experience better clinical outcomes than ER- patients (Carey et al., 2006).
We conduct an analysis using publicly available data from The Cancer Genome Atlas (TCGA) (Weinstein et al., 2013). We use clinical data and gene expression data from a total of 806 ER+ patients and 237 ER- patients.
We first investigate the differences between the genetic networks in ER+ and ER- breast cancer. Both ER+ and ER- breast cancer are expected to have similar pathways, but identifying differences between them may be key to understanding the underlying disease mechanism. We then conduct an analysis to assess whether any genes in a set known to be associated with breast cancer are strongly prognostic of disease outcomes in only one of the estrogen receptor groups.
5.1 Differential Network Analysis
Our objective is to determine whether there are any pairs of genes that are much more strongly associated in one estrogen receptor group than the other. We consider the set of genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000) breast cancer pathway and measure the association between gene expression levels using the Pearson correlation.
We test the relative difference null hypothesis for each pair of genes with . In Figure 7, we display the pairs of genes that are statistically significant at the level after a Bonferroni adjustment. We find that each of the genes progesterone (PGR), insulin-like growth factor 1 (IGF1R), and estrogen receptor 1 (ESR1) have multiple differential connections; each belongs to at least two pairs such that the association is twice as strong in the ER+ population than in the ER- population. These genes have been shown in the literature to be associated with sub-type and prognosis (Farabaugh et al., 2015; Reinert et al., 2019; Kurozumi et al., 2017).
5.2 Prognostic Value of Biomarkers
The goal of this analysis is to assess whether any of the KEGG genes have a stronger association with time to death in one estrogen receptor group than in the other. For each gene, we fit a univariate Cox proportional-hazards model with time to death as the outcome in both of the estrogen receptor groups separately; we measure association using the log hazard ratio. A total of 64 deaths occurred in the ER+ group, and 33 deaths occurred the ER- group. We calculate with for each gene.
In Figure 7, we compare the log hazard ratios of the ER+ and ER- groups in a scatterplot. Though the log hazard ratios for most genes are similar between subgroups, there are twelve genes with larger than one. A complete list is available in Table 1. The two genes with the strongest interactions are Growth Factor Receptor-bound Protein 2 (GRB2; ), which has a stronger association in the ER- group, and Adenomatous Polyposis Coli (APC; ), which has a stronger association in the ER+ group. Both genes have been hypothesized to be associated with breast cancer carcinogenesis (Daly et al., 1994; Jin et al., 2001).
Gene | ER+ Log HR (SE) | ER- Log HR (SE) | |
---|---|---|---|
GRB2 | -0.06 (0.31) | -1.66 (0.68) | 2.04 |
APC | 1.34 (0.32) | -0.09 (0.33) | 1.91 |
BAX | -1.05 (0.24) | 0.04 (0.36) | 1.53 |
PIK3CA | 1.13 (0.28) | 0.14 (0.32) | 1.51 |
SOS2 | 1.13 (0.36) | -0.1 (0.37) | 1.33 |
MAP2K2 | -0.87 (0.27) | 0.03 (0.35) | 1.22 |
GADD45G | -0.52 (0.13) | -0.07 (0.19) | 1.21 |
HES5 | 0.02 (0.2) | 0.51 (0.18) | 1.19 |
WNT2 | -0.36 (0.09) | 0 (0.17) | 1.14 |
DLL4 | 0.09 (0.2) | 0.68 (0.27) | 1.10 |
FRAT2 | -1.22 (0.31) | -0.45 (0.29) | 1.08 |
SOS1 | 1.19 (0.3) | -0.34 (0.42) | 1.01 |
6 Discussion
We have proposed a general framework for inference about absence/presence qualitative interactions. We argued that naïve procedures rely upon untenable conditions because the absence/presence hypothesis is ill-posed. We thus proposed to relax the problem in order to conduct well-calibrated inference that maintains the absence/presence interpretation and only requires mild assumptions.
In simulations, we found that our methodology has low power when signal is weak or sample sizes are small. To an extent, this is just a feature of the problem; naturally, one would require even more information to detect qualitative interactions than what is required to detect quantitative interactions. However, we provide no guarantee that our methodology is optimal, as tests for composite hypotheses based upon supremum p-values can be conservative in practice (Bayarri and Berger, 2000).
Nonetheless, our framework is interpretable and provides a natural approach for quantifying differences in strength of association by sub-population in general settings. Though we only considered measures of marginal association in our examples, our method can be used with conditional measures of association as well; we only require that asymptotically normal estimates are available. In particular, our approach remains valid in the high-dimensional setting, where asymptotically normal estimates can be obtained using the techniques of, e.g., van de Geer et al. (2014) and Zhang and Zhang (2014). Finally, our method has good utility in the analysis of genomics data, as we demonstrated in our real data example.
7 Acknowledgements
The authors gratefully acknowledge the support of the NSF Graduate Research Fellowship Program under grant DGE-1762114 as well as NSF grant DMS-1561814 and NIH grant R01-GM114029. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
References
- Bayarri and Berger (2000) Bayarri, M. and Berger, J. O. (2000). P values for composite null models. Journal of the American Statistical Association 95, 1127–1142.
- Carey et al. (2006) Carey, L. A., Perou, C. M., Livasy, C. A., Dressler, L. G., Cowan, D., Conway, K., Karaca, G., Troester, M. A., Tse, C. K., Edmiston, S., et al. (2006). Race, breast cancer subtypes, and survival in the carolina breast cancer study. Jama 295, 2492–2502.
- Casella and Berger (2002) Casella, G. and Berger, R. L. (2002). Statistical inference, volume 2. Duxbury Pacific Grove, CA.
- Daly et al. (1994) Daly, R. J., Binder, M. D., and Sutherland, R. L. (1994). Overexpression of the grb2 gene in human breast cancer cell lines. Oncogene 9, 2723–2727.
- Farabaugh et al. (2015) Farabaugh, S. M., Boone, D. N., and Lee, A. V. (2015). Role of igf1r in breast cancer subtypes, stemness, and lineage differentiation. Frontiers in endocrinology 6, 59.
- Fieller (1940) Fieller, E. C. (1940). The biological standardization of insulin. Supplement to the Journal of the Royal Statistical Society 7, 1–64.
- Gail and Simon (1985) Gail, M. and Simon, R. (1985). Testing for qualitative interactions between treatment effects and patient subsets. Biometrics 41, 361–372.
- Ideker and Krogan (2012) Ideker, T. and Krogan, N. J. (2012). Differential network biology. Molecular systems biology 8, 565.
- Jin et al. (2001) Jin, Z., Tamura, G., Tsuchiya, T., Sakata, K., Kashiwaba, M., Osakabe, M., and Motoyama, T. (2001). Adenomatous polyposis coli (apc) gene promoter hypermethylation in primary breast cancers. British journal of cancer 85, 69–73.
- Kanehisa and Goto (2000) Kanehisa, M. and Goto, S. (2000). Kegg: kyoto encyclopedia of genes and genomes. Nucleic acids research 28, 27–30.
- Kurozumi et al. (2017) Kurozumi, S., Matsumoto, H., Hayashi, Y., Tozuka, K., Inoue, K., Horiguchi, J., Takeyoshi, I., Oyama, T., and Kurosumi, M. (2017). Power of pgr expression as a prognostic factor for er-positive/her2-negative breast cancer patients at intermediate risk classified by the ki67 labeling index. BMC cancer 17, 354.
- Li and Chan (2006) Li, J. and Chan, I. S. (2006). Detecting qualitative interactions in clinical trials: an extension of range test. Journal of Biopharmaceutical statistics 16, 831–841.
- Lumachi et al. (2013) Lumachi, F., Brunello, A., Maruzzo, M., Basso, U., and Mm Basso, S. (2013). Treatment of estrogen receptor-positive breast cancer. Current medicinal chemistry 20, 596–604.
- Müller (2001) Müller, A. (2001). Stochastic ordering of multivariate normal distributions. Annals of the Institute of Statistical Mathematics 53, 567–575.
- Pan and Wolfe (1997) Pan, G. and Wolfe, D. A. (1997). Test for qualitative interaction of clinical significance. Statistics in medicine 16, 1645–1652.
- Piantadosi and Gail (1993) Piantadosi, S. and Gail, M. (1993). A comparison of the power of two tests for qualitative interactions. Statistics in Medicine 12, 1239–1248.
- Reinert et al. (2019) Reinert, T., Coelho, G. P., Mandelli, J., Zimermann, E., Zaffaroni, F., Bines, J., Barrios, C. H., and Graudenz, M. S. (2019). Association of esr1 mutations and visceral metastasis in patients with estrogen receptor-positive advanced breast cancer from brazil. Journal of oncology 2019,.
- Silvapulle (2001) Silvapulle, M. J. (2001). Tests against qualitative interaction: Exact critical values and robust tests. Biometrics 57, 1157–1165.
- van de Geer et al. (2014) van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R., et al. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. The Annals of Statistics 42, 1166–1202.
- VanderWeele (2019) VanderWeele, T. J. (2019). The interaction continuum. Epidemiology (Cambridge, Mass.) 30, 648.
- Weinstein et al. (2013) Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R. M., Ozenberger, B. A., Ellrott, K., Shmulevich, I., Sander, C., Stuart, J. M., Network, C. G. A. R., et al. (2013). The cancer genome atlas pan-cancer analysis project. Nature genetics 45, 1113.
- Zhang and Zhang (2014) Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76, 217–242.
Software Availability
Implementation of the proposed methodology and code to reproduce results from the data analysis are available at https://github.com/awhudson/QualitativeInteractions. The TCGA data are available using the public R package RTCGA.
Appendix
Theoretical Results for Relative Difference Likelihood Ratio Test
Below, we generalize Lemma 1 and Proposition 1 to the setting of unbalanced sample sizes .
Lemma 1.
The likelihood ratio test statistic can be written as
where , and .
Proof of Lemma 1.
We define and as
Now, we define as the projection of onto the null region. If is in the null region , the projection is equal to . Otherwise, is the projection of onto the space spanned by , with distance inversely weighted by . We proceed by finding an expression for .
Now, the weighted distance between and is
Hence, the likelihood ratio test rejects the null for large values of
or equivalently, for large positive values of , where
∎
Proposition 1.
Let , and assume as . Then,
-
i
In the interior of the null region, i.e., when , converges in distribution to .
-
ii
At all nonzero boundary points of the null region, i.e., when , converges in distribution to a standard normal random variable.
Proof of Proposition 1.
First we prove (i).
Suppose .
Then consistency of and the continuous mapping theorem imply that .
Now, because and both tend to zero, .
Therefore,
Now we prove (ii).
Suppose .
By applying the delta method,
where the matrix is
Now, Slutsky’s theorem implies that
The test statistic can be written as
with and as defined in Proposition 1. By the continuous mapping theorem, the denominator converges in probability to
Thus, Slutsky’s theorem implies that
∎
In Proposition 2, we characterize the local asymptotic behavior of the relative difference likelihood ratio test.
Proposition 2.
Let , and assume as . Suppose , and . Further, assume that is locally regular in the sense that for . Then as , for all , converges to
where follows a bivariate normal distribution with mean , unit variance and correlation , and follows a bivariate normal distribution with mean , unit variance and correlation , with are defined as
Remark 1.
Setting , we obtain the limiting distribution of when , which is critical for calculating the p-value .
Proof of Proposition 2.
We first re-write the tail probability as
Thus,
And,
Now,
Similarly,
∎
Theoretical Results for Simultaneous Test for Qualitative Interactions
In what follows, we generalize Lemma 2 and Proposition 3 to the case of unbalanced sample sizes.
Lemma 2.
The likelihood ratio statistic can be written as
Proof of Lemma 2.
If belongs to the null region, , the likelihood ratio statistic is clearly zero. If or , the likelihood ratio statistic is minimum of the distances between and its projections onto the span of and the span of . Algebra (similar to the proof of Lemma 1) gives the desired result, and similarly if or . ∎
Proposition 3.
Let , and assume . Then,
-
i
If belongs to the interior of the null region , converges in distribution to zero.
-
ii
If is on the boundary of the null region, but , as .
Proof of Proposition 3.
We first prove (i).
If belongs to the interior of the null region, consistency of and the continuous mapping theorem imply that converges in probability to zero.
Then, by Slutsky’s theorem, converges in distribution to zero.
To prove (ii), recall that we can write the null and alternative regions as
Now, we write
The second and third summands above are exactly equal to 0. To see this, note that in the second term and implies that , and similarly for the third term. In the first and last summands, we can ignore the term . To see this, note that in the first term and imply that ; a similar argument holds for the fourth term. Thus,
(6) |
Suppose, for the moment, . Then
Therefore,
Thus,
A similar argument follows if , , or . ∎
In Proposition 4, we characterize the local asymptotic behavior of the omnibus test for qualitative interactions.
Proposition 4.
Let , and assume as . Suppose with . Further, assume is locally regular in the sense that for . Then as , for all , converges to
where follows a bivariate normal distribution with mean , unit variance, and correlation , where
Remark 2.
Setting , we obtain the limiting distribution of when , which is critical for calculating the p-value .
Calculation of
Define as a realization of the likelihood ratio statistic for the relative difference hypothesis, calculated on the observed data. For a pre-specified , recall that is defined as the largest such that the likelihood ratio test rejects the relative difference null. Defining , we can evaluate as
Therefore, we are only required to calculate and .
Both and can be calculated with root-finding algorithms. To see this, we first observe that is monotone in , recalling that the numerator of decreases in while the denominator increases. Monotonicity of implies that is monotone in . Now, applying of Proposition 2,
where and follow bivariate normal distributions with mean zero, unit variance, and correlations and , as defined in Proposition 2, respectively. Both and are monotone decreasing in , so Theorem 8 in Müller (2001) implies that is monotone in . Thus is monotone in .
Monotonicity of and implies that if the likelihood ratio test rejects the null hypothesis for some , and are the unique roots of
respectively. These roots can be easily calculated via, e.g., the bisection method. In our implementation, we use the uniroot function in R.