This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Transformation models for ROC analysis

Ainesh Sewak
Universität Zürich &Torsten Hothorn
Universität Zürich
\Plainauthor

Ainesh Sewak, Torsten Hothorn \Plaintitle \Abstract Receiver operating characteristic (ROC) analysis is one of the most popular approaches for evaluating and comparing the accuracy of medical diagnostic tests. Although various methodologies have been developed for estimating ROC curves and its associated summary indices, there is no consensus on a single framework that can provide consistent statistical inference whilst handling the complexities associated with medical data. Such complexities might include covariates that influence the diagnostic potential of a test, ordinal test data, censored data due to instrument detection limits or correlated biomarkers. We propose a regression model for the transformed test results which exploits the invariance of ROC curves to monotonic transformations and naturally accommodates these features. Our use of maximum likelihood inference guarantees asymptotic efficiency of the resulting estimators and associated confidence intervals. Simulation studies show that the estimates based on transformation models are unbiased and yield coverage at nominal levels. The methodology is applied to a cross-sectional study of metabolic syndrome where we investigate the covariate-specific performance of weight-to-height ratio as a non-invasive diagnostic test. Software implementations for all the methods described in the article are provided in the \pkgtram \proglangR package. \Keywordstransformation model, ROC curve, AUC, diagnostic test, distribution regression, ordinal outcome, censoring, Youden Index, OVL, correlated biomarkers, limit of detection \Plainkeywordstransformation model, ROC curve, AUC, diagnostic tests, distribution regression, ordinal outcome, censoring, Youden Index, OVL, correlated biomarkers, limit of detection \Address Ainesh Sewak, Torsten Hothorn
Institut für Epidemiologie, Biostatistik und Prävention
Universität Zürich
Hirschengraben 84, CH-8001 Zürich, Switzerland
[email protected], [email protected]

1 Introduction

Estimating receiver operating characteristic (ROC) curves for evaluating the performance of medical diagnostic tests has been a main focus of statistical literature over the last decades (Green and Swets, 1966; Egan, 1975; Pepe, 2003; Zou et al., 2011; Inácio et al., 2021b). Diagnostic tests screen for the presence or absence of a disease. Characterizing their accuracy is essential to ensure appropriate prevention, treatment and monitoring of diseases. ROC curves are a valuable tool in determining the diagnostic potential of a test and continue to be extensively applied in biomedical studies as new tests or biomarkers are developed in radiology, oncology, genetics, and other related fields. Increasingly more applications can be expected due to advancements in technology and analyzing resulting data requires a computationally straightforward approach to provide accurate and consistent statistical inference.

Previous research has focused on extending statistical methodology for ROC curve estimation to address issues such as adjustment for covariates (Pepe, 1997, 1998, 2000; Faraggi, 2003), incorporating censoring due to instrument detection limits (Perkins et al., 2007; Ruopp et al., 2008; Bantis et al., 2017), comparing correlated diagnostic tests (Hanley and McNeil, 1983; DeLong et al., 1988; Zou and Hall, 2002; Molodianovitch et al., 2006; Bantis and Feng, 2016) and robustness to model misspecification (Bianco et al., 2020; Inácio et al., 2021a). In addition, a wide variety of parametric and nonparametric methods have been proposed within frequentist and Bayesian paradigms (Hsieh and Turnbull, 1996; Alonzo and Pepe, 2002; Erkanli et al., 2006; Branscum et al., 2008; Yao et al., 2010; González-Manteiga et al., 2011; Rodríguez-Álvarez et al., 2011; Cai and Pepe, 2002; Cai and Moskowitz, 2004; Inácio et al., 2013; Ghosal et al., 2022). However, there is no consensus on an analytic approach that can handle all these issues simultaneously.

An attractive feature of the ROC curve, which has scantly been used for its estimation, is that it remains invariant to monotonic transformations of the test results. Although transformations have been used to bring continuous test results into a form that approximately satisfies the assumptions of a suitable parametric model (Faraggi, 2003; Schisterman et al., 2004), estimation of a transformation function has been limited to the Box-Cox power transformation family (Box and Cox, 1964; Zou et al., 1998; Zou and Hall, 2000). For rank-based methods, the transformation function can be left unspecified but in all cases a restriction to normality has been previously imposed on the model for the ROC curve (Zou et al., 1997).

In this article we present a new unifying methodological framework for estimating ROC curves and its associated summary indices by modeling the relationship between the transformed test results and potential covariates. We employ transformation models to jointly estimate the transformation function and regression parameters (Hothorn et al., 2014, 2018). This approach specifies a parametric model for the ROC curve but remains distribution-free because we do not impose any strong assumptions on the transformation function. Using the estimated parameters, we show how to evaluate covariate effects on discriminatory performance of diagnostic tests. We focus on data obtained from case-control or cross-sectional studies of subjects with continuous or ordinal test results. Unlike nonparametric methods which are flexible but difficult to interpret and implement, transformation models excel on both fronts. Their output can be easily interpreted by clinicians, which is of important consideration in biomedical applications. \proglangR implementations of all methods discussed in this article are available, along with a set of supporting examples. The goals of this article are to introduce a general methodology for performing ROC analysis which is model-based but distribution-free; to apply maximum likelihood inference procedures to examine covariate effects with potentially censored observations; and to demonstrate methods for testing differences between correlated ROC curves of several tests at specific covariate levels within the same conceptual and computational framework.

1.1 Notation

Let the random variable YY denote the continuous result of a diagnostic test and let DD denote the disease status, with D=1D=1 if a subject is diseased and 0 if nondiseased. We denote quantities conditional on the disease status using subscripts. For example, Y1Y_{1} and Y0Y_{0} are the test results in the diseased and nondiseased populations with cumulative distribution functions (CDF) given by F1F_{1} and F0F_{0} and densities f1f_{1} and f0f_{0}, respectively. Suppose that the subject is diagnosed as diseased when their test result exceeds a threshold value, cc. By convention, we assume that larger values of the test result are more indicative of the disease. The probability of truly identifying a diseased and nondiseased subject are defined as sensitivity, \Prob(Y1>c)=1F1(c)\Prob(Y_{1}>c)=1-F_{1}(c), and specificity, \Prob(Y0c)=F0(c)\Prob(Y_{0}\leq c)=F_{0}(c), respectively. The set of pairs (1specificity,sensitivity)(1-\text{specificity},\text{sensitivity}) for all cc\in\mathbb{R} produce the ROC curve. By setting p=1F0(c)p=1-F_{0}(c), an equivalent representation of the ROC curve is

ROC(p)=1F1(F01(1p)).\displaystyle\text{ROC}(p)=1-F_{1}\left(F_{0}^{-1}(1-p)\right).

1.2 Summary indices

Summary indices of the ROC curve quantify the degree of separation between the distributions Y1Y_{1} and Y0Y_{0}. The most widely used index is the area under the ROC curve (AUC) defined by

AUC=\Prob(Y1>Y0)=01ROC(p)𝑑p.\displaystyle\text{AUC}=\Prob(Y_{1}>Y_{0})=\int_{0}^{1}\text{ROC}(p)dp.

The AUC represents the probability that the test results of a randomly selected diseased subject exceeds the one of a nondiseased subject and is directly related to the Mann-Whitney-Wilcoxon U-statistic (MWW) (Bamber, 1975; Hanley and McNeil, 1982). Alternative indices include the Youden index (Youden, 1950), JJ, which combines sensitivity and specificity over all possible thresholds to provide the maximum potential effectiveness of a diagnostic test, given by

J\displaystyle J =maxc[F0(c)F1(c)].\displaystyle=\max_{c\in\mathbb{R}}[F_{0}(c)-F_{1}(c)].

The Youden index is equivalent to the Smirnov (or the two-sample Kolmogorov-Smirnov) test statistic (Gail and Green, 1976; Komaba et al., 2022) and can be represented as half the L1L_{1} distance between two densities or as the complement of the overlapping coefficient (OVL) (Weitzman, 1970; Feller, 1991; Schmid and Schmidt, 2006; Martínez-Camblor, 2022):

J\displaystyle J =12|f0(y)f1(y)|𝑑y=1min[f1(y),f0(y)]𝑑y=1OVL.\displaystyle=\frac{1}{2}\int|f_{0}(y)-f_{1}(y)|dy=1-\int\min[f_{1}(y),f_{0}(y)]dy=1-\text{OVL}.

Additionally, the threshold corresponding to JJ, where sensitivity and specificity are maximized, denoted as cc^{*}, is often used in clinical practice as the optimal classification threshold to screen subjects.

1.3 Covariates

The result of a diagnostic test can be affected by both the disease status, as well as by covariates associated with the subjects. Consequently, the test result distributions and their separation may differ by levels of the covariates. In order to appropriately understand the accuracy of the test in subpopulations, we can use covariate-specific or conditional ROC curves (Tosteson and Begg, 1988; Pepe, 1997). Throughout we use boldface letters to denote vectors and lightface for scalars. Let 𝑿X denote a vector of covariates that are hypothesized to have an impact on the accuracy of the test. The conditional CDF in the diseased population is given by F1(y𝒙)=\Prob(Y1y𝑿=𝒙)F_{1}(y\mid\text{\boldmath$x$})=\Prob(Y_{1}\leq y\mid\text{\boldmath$X$}=\text{\boldmath$x$}) and analogously given for the nondiseased population. The covariate-specific ROC can be written as

ROC(p𝒙)=1F1(F01(1p𝒙)𝒙)\displaystyle\text{ROC}(p\mid\text{\boldmath$x$})=1-F_{1}\left(F_{0}^{-1}(1-p\mid\text{\boldmath$x$})\mid\text{\boldmath$x$}\right) (1)

with its counterpart conditional summary indices, AUC(𝒙)\text{AUC}(\text{\boldmath$x$}) and J(𝒙)J(\text{\boldmath$x$}), defined accordingly.

The covariate-specific ROC curve can be generated by modeling the conditional distribution of the test results. This is known as the induced or indirect methodology for modeling ROC curves (Pepe, 1997, 1998, 2003; Faraggi, 2003; Rodríguez-Álvarez et al., 2011). Alternatively, direct methodology is based on a regression model for the ROC curve itself (Pepe, 2000; Alonzo and Pepe, 2002; Cai and Moskowitz, 2004).

1.4 Overview

The article proceeds as follows. In Section 2, we propose a transformation modeling framework for parameterizing ROC curves from which we derive closed-form expressions for associated AUC and Youden summary indices. We then show how our approach addresses the main components of ROC analysis: summarizing the discriminatory capabilities of a diagnostic test for two-sample classification, adjusting for covariates, and comparing the efficacy of two (or more) correlated diagnostic tests. We discuss maximum likelihood estimation procedures for our model and corresponding inference. Specifically, we develop score confidence bands for the ROC curve and score confidence intervals for its summary indices, which remain invariant to reparametrizations and hence can be determined from the estimated parameters of the transformation model. In Section 4 we apply our approach to a cross-sectional study for detection of metabolic syndrome. Specifically, using the weight-to-height ratio as a non-invasive diagnostic test for metabolic syndrome, we investigate the age and gender-specific performance of the test. We conclude the article with a discussion.

2 Methods

2.1 Transformation model

The ROC curve is a composition of distribution functions and thus is invariant to strictly monotonically increasing transformations of YY. We propose a model for the conditional distribution of the transformed test result given the disease status and covariates. This transformation is obtained from the data and leads to a distribution-free framework to parameterize the covariate-specific ROC curve and its summary indices.

Suppose there exists a strictly monotonically increasing function hh such that the relationship between the transformed test result and the covariates follows a shift-scale model

h(Y)\displaystyle h(Y) =μd(𝒙)+σd(𝒙)Z,\displaystyle=\mu_{d}(\text{\boldmath$x$})+\sigma_{d}(\text{\boldmath$x$})Z,

where D=dD=d specifies the disease indicator (D=0D=0 for nondiseased and D=1D=1 for diseased), 𝑿=𝒙\text{\boldmath$X$}=\text{\boldmath$x$} a fixed set of covariates, μd(𝒙)\mu_{d}(\text{\boldmath$x$}) the shift term, σd(𝒙)\sigma_{d}(\text{\boldmath$x$}) the scale term and ZZ\in\mathbb{R} is a latent random variable with an a priori known absolutely continuous log-concave CDF, FZF_{Z}. Given that DD and 𝑿X are fixed, the conditional CDF for YY is

\Prob(YyD=d,𝑿=𝒙)=Fd(y𝒙)=FZ(h(y)μd(𝒙)σd(𝒙)).\displaystyle\begin{split}\Prob(Y\leq y\mid D=d,\text{\boldmath$X$}=\text{\boldmath$x$})&=F_{d}(y\mid\text{\boldmath$x$})=F_{Z}\left(\frac{h(y)-\mu_{d}(\text{\boldmath$x$})}{\sigma_{d}(\text{\boldmath$x$})}\right).\end{split} (2)

Equation 2 represents a general class of models called transformation models (Bickel and Doksum, 1981; Bickel, 1986; Bickel et al., 1993; Hothorn et al., 2018). The transformation function hh uniquely characterizes the distribution of YY, similar to the density or distribution function. Plugging in this conditional CDF of YY into Equation 1, hh cancels out and the covariate-specific ROC curve is given by

ROC(p𝒙)=1FZ(ζ(𝒙)FZ1(1p)δ(𝒙)),\displaystyle\begin{split}\text{ROC}(p\mid\text{\boldmath$x$})&=1-F_{Z}\left(\zeta(\text{\boldmath$x$})F_{Z}^{-1}(1-p)-\delta(\text{\boldmath$x$})\right),\end{split} (3)
where
δ(𝒙)\displaystyle\delta(\text{\boldmath$x$}) =μ1(𝒙)μ0(𝒙)σ1(𝒙),\displaystyle=\frac{\mu_{1}(\text{\boldmath$x$})-\mu_{0}(\text{\boldmath$x$})}{\sigma_{1}(\text{\boldmath$x$})},
ζ(𝒙)\displaystyle\zeta(\text{\boldmath$x$}) =σ0(𝒙)σ1(𝒙).\displaystyle=\frac{\sigma_{0}(\text{\boldmath$x$})}{\sigma_{1}(\text{\boldmath$x$})}.

Thus, the ROC curve is completely determined by the shift and scale terms of the model.

The binormal (Dorfman and Alf Jr, 1969) and bilogistic (Ogilvie and Creelman, 1968) ROC curves can be obtained by setting FZF_{Z} to the standard normal distribution function probit1=Φ\operatorname{probit}^{-1}=\Phi, or the standard logistic distribution function logit1(x)=expit(x)=(1+exp(x))1\operatorname{logit}^{-1}(x)=\operatorname{expit}(x)=(1+\exp(-x))^{-1}, in Equation 3, respectively. Similarly, the proportional hazard (Gönen and Heller, 2010) and reverse proportional hazard alternatives (Khan, 2022) for the ROC curve also fall within the purview of our transformation model with FZF_{Z} specified as cloglog1(x)=1exp(exp(x))\operatorname{cloglog}^{-1}(x)=1-\exp(-\exp(x)) (minimum extreme value distribution function) and loglog1(x)=exp(exp(x))\operatorname{loglog}^{-1}(x)=\exp(-\exp(-x)) (maximum extreme value distribution function), respectively.

However, to the best of our knowledge, the only literature where the transformation function hh is included in the model formulation of the ROC curve is Zou (1997), who jointly models the shift term and the parameters of a Box-Cox power transformation function. A key point of this paper is that we explicitly estimate hh jointly with μ(𝒙)\mu(\text{\boldmath$x$}) from the observed data and are not restricted to normality imposed by power transformation families. Thus, the methods we propose allow for proper propagation of uncertainty from the estimated transformation function h^\hat{h} into the estimates of the shift and scale terms of the model.

The ROC curve in Equation 3 follows a parametric model depending on FZF_{Z}, but is distribution-free as in Pepe (2000); Alonzo and Pepe (2002), because no assumptions are made about the transformation hh and consequently for the distribution of the test results. The approach to model the test results as a function of the disease status and covariates was originally proposed in the latent variable ordinal regression setting by Tosteson and Begg (1988) and extended by Pepe (1997, 2000, 2003) to modeling covariate effects directly on the ROC curve.

Tosteson and Begg (1988) pointed out that to ensure concavity of the induced ROC curve, the scale term must be omitted, i.e., σd(𝒙)=1\sigma_{d}(\text{\boldmath$x$})=1 for d={0,1}d=\{0,1\}. The ROC curve is termed proper if it is concave or, equivalently, if the derivative of the ROC curve is a monotonically decreasing function (Egan, 1975; Pan and Metz, 1997; Dorfman et al., 1997). A concave ROC curve is desirable as it yields the maximal sensitivity for a given value of specificity (McIntosh and Pepe, 2002). In this sense, as the decision criterion for classifying subjects is optimal when the ROC curve is concave, we focus the remaining work on the model involving only the shift term. Hence, the effect of covariates on the ROC curve is contained in the difference between the shift terms for diseased and nondiseased subjects, δ(𝒙)=μ1(𝒙)μ0(𝒙)\delta(\text{\boldmath$x$})=\mu_{1}(\text{\boldmath$x$})-\mu_{0}(\text{\boldmath$x$}). For a relaxation of this assumption, see Siegfried et al. (2022) who estimate the scale functions through regression models.

2.1.1 Two-sample case

To fix ideas, we first consider the case of two samples without covariates. Let the shift term take the form μd(𝒙)=δd\mu_{d}(\text{\boldmath$x$})=\delta d. The CDF of the test results in the nondiseased population is given by F0(y)=FZ(h(y))F_{0}(y)=F_{Z}(h(y)) and in the diseased population by F1(y)=FZ(h(y)δ)F_{1}(y)=F_{Z}(h(y)-\delta). Using Equation 3, the induced ROC curve can be expressed as

ROC(p)\displaystyle\text{ROC}(p) =1FZ(FZ1(1p)δ).\displaystyle=1-F_{Z}\left(F_{Z}^{-1}(1-p)-\delta\right). (4)

The model assumption implies that a monotone function hh exists to transform both Y1Y_{1} and Y0Y_{0} into the same distribution, ZFZZ\sim F_{Z} separated by a shift parameter, δ\delta. The induced ROC curve from this model doesn’t assume a particular distribution of the test result, rather, it quantifies the difference between the test result distributions on the scale of a user-defined FZF_{Z}. In this sense, the difference between the test result distributions is described by δ\delta. Each choice of FZF_{Z} leads to a different interpretation of δ\delta. For example, when FZF_{Z} is selected to be the standard normal distribution function, δ\delta is the difference in means of the transformed test results comparing the diseased and nondiseased groups, \E[h(Y1)h(Y0)]\E[h(Y_{1})-h(Y_{0})]. Similarly, when FZF_{Z} is the standard logistic distribution function, exp(δ)\exp(\delta) is the ratio of odds of having a positive test result comparing diseased and nondiseased groups. Closed-form expressions can be derived for summary indices of the ROC curve by solving the appropriate integrals. The expressions of AUC, JJ, the optimal threshold cc^{*}, sensitivity and specificity at cc^{*} are given for some well known choices of FZF_{Z} in Table 1.

Index FZF_{Z} probit1\operatorname{probit}^{-1} logit1\operatorname{logit}^{-1} cloglog1\operatorname{cloglog}^{-1} loglog1\operatorname{loglog}^{-1}
AUC Φ(δ2)\Phi\left(\frac{\delta}{\sqrt{2}}\right) {exp(δ)(exp(δ)1δ)(exp(δ)1)2δ01/2δ=0\left\{\begin{array}[]{ll}\frac{\exp(\delta)(\exp(\delta)-1-\delta)}{(\exp(\delta)-1)^{2}}&\delta\neq 0\\ 1/2&\delta=0\end{array}\right. expit(δ)\operatorname{expit}(\delta)
JJ 12Φ(|δ|2)1-2\Phi\left(\frac{-|\delta|}{2}\right) 12expit(|δ|2)1-2\operatorname{expit}\left(\frac{-|\delta|}{2}\right) exp(|δ|e|δ|1)exp(|δ|e|δ|1)\exp\left(\frac{-|\delta|}{e^{|\delta|}-1}\right)-\exp\left(\frac{|\delta|}{e^{-|\delta|}-1}\right)
cc^{*} h1(δ2)h^{-1}\left(\frac{\delta}{2}\right) h1(log(δ1eδ))h^{-1}\left(\log\left(\frac{\delta}{1-e^{-\delta}}\right)\right) h1(log(eδ1δ))h^{-1}\left(\log\left(\frac{e^{\delta}-1}{\delta}\right)\right)
Sens(c)\text{Sens}(c^{*}) Φ(δ2)\Phi\left(\frac{\delta}{2}\right) expit(δ2)\operatorname{expit}\left(\frac{\delta}{2}\right) exp(δeδ1)\exp\left(\frac{-\delta}{e^{\delta}-1}\right) 1exp(δeδ1)1-\exp\left(\frac{\delta}{e^{-\delta}-1}\right)
Spec(c)\text{Spec}(c^{*}) 1exp(δeδ1)1-\exp\left(\frac{\delta}{e^{-\delta}-1}\right) exp(δeδ1)\exp\left(\frac{-\delta}{e^{\delta}-1}\right)
Table 1: Closed-form expressions for the area under the receiver operating characteristic curve (AUC), Youden Index (JJ), optimal classification threshold (cc^{*}), sensitivity (Sens) and specificity (Spec) at cc^{*} in terms of the shift parameter δ\delta in the linear transformation model given by Fd(y)=FZ(h(y)δd)F_{d}(y)=F_{Z}(h(y)-\delta d).

2.1.2 Conditional ROC curve

The accuracy of a diagnostic test may be influenced by a set of covariates 𝑿X. To evaluate this effect on the ROC curve and its summary indices, we assume a linear transformation model with a shift term that takes the form

μd(𝒙)=δd+𝒙𝝃+d𝒙𝜸,\displaystyle\mu_{d}(\text{\boldmath$x$})=\delta d+\text{\boldmath$x$}^{\top}\text{\boldmath$\xi$}+d\text{\boldmath$x$}^{\top}\text{\boldmath$\gamma$}, (5)

where 𝝃,𝜸P\text{\boldmath$\xi$},\text{\boldmath$\gamma$}\in\mathbb{R}^{P} are the coefficient vectors for the covariates and interaction term, respectively. Under this model, the resulting covariate-specific ROC curve is

ROC(p𝒙)=1FZ(FZ1(1p)(δ+𝒙𝜸)),\displaystyle\text{ROC}(p\mid\text{\boldmath$x$})=1-F_{Z}\left(F_{Z}^{-1}(1-p)-(\delta+\text{\boldmath$x$}^{\top}\text{\boldmath$\gamma$})\right),

where the covariate effect on the ROC curve is given by the difference in shift terms between diseased and nondiseased subjects, δ(𝒙)=δ+𝒙𝜸\delta(\text{\boldmath$x$})=\delta+\text{\boldmath$x$}^{\top}\text{\boldmath$\gamma$}. Similarly, the covariate-specific AUC is given by

AUC(𝒙)=\Prob(Y0<Y1𝑿=𝒙)=a(δ(𝒙))=a(δ+𝒙𝜸),\displaystyle\text{AUC}(\text{\boldmath$x$})=\Prob(Y_{0}<Y_{1}\mid\text{\boldmath$X$}=\text{\boldmath$x$})=a\left(\delta(\text{\boldmath$x$})\right)=a\left(\delta+\text{\boldmath$x$}^{\top}\text{\boldmath$\gamma$}\right), (6)

where a:[0,1]a:\mathbb{R}\mapsto[0,1] is the AUC function from the first row of Table 1 for different choices of FZF_{Z}. The expressions for JJ, cc^{*}, sensitivity and specificity can analogously be adjusted to account for covariates, with δ\delta replaced by δ+𝒙𝜸\delta+\text{\boldmath$x$}^{\top}\text{\boldmath$\gamma$} in Table 1. Interpretation of the interaction coefficient is similar to Pepe (1998). For example, in the case of a continuous covariate X=xX=x\in\mathbb{R}, for each possible specificity value 1p(0,1)1-p\in(0,1), a unit increase in xx results in a γ\gamma-unit increase in the ROC curve (or an increase in the sensitivity) on the scale of FZF_{Z}. If γ\gamma is positive, an increase in xx corresponds to an increase in the ROC curve, indicating that a test is better able to discriminate the two populations for larger values of xx and, vice versa. Note that the ROC curve varies with the covariate contingent upon the presence of an interaction between dd and xx (Pepe, 2003). For γ=0\gamma=0, the covariate affects the distribution of the test results from the diseased and nondiseased population, but not the ROC curve. That is, for all levels of xx, the difference between the transformed distributions h(Y1)h(Y_{1}) and h(Y0)h(Y_{0}) is given by δ\delta, thus the ROC curve is unchanged. Analogous interpretations hold when we are interested in modeling a set of covariates 𝑿X, which could possibly include categorical covariates.

Standard regression techniques have also been proposed as an alternative to assess the effect of covariates on summary indices rather than deriving the induced ROC curve. For example, Dodd and Pepe (2003) model the partial AUC as a regression function of covariates. Our model equivalently results in a regression model for the AUC when the shift term is given from Equation 5 and a particularly chosen scale FZF_{Z}. Specifically, note that Equation 6 is the model proposed by Dodd and Pepe (2003), where δ+𝒙𝜸\delta+\text{\boldmath$x$}^{\top}\text{\boldmath$\gamma$} is in the form of a usual linear predictor and aa is a monotonically increasing inverse link function which defines the scale for the regression coefficients. This relationship provides another interpretation of the interaction term. For example, when FZ=cloglog1F_{Z}=\operatorname{cloglog}^{-1} or FZ=loglog1F_{Z}=\operatorname{loglog}^{-1}, exp(γ)\exp(\gamma) is a ratio of AUC odds comparing a subject with covariate value x+1x+1 to xx. The AUC odds conditional on xx are defined as \Prob(Y1>Y0X=x)/\Prob(Y1Y0X=x){\Prob(Y_{1}>Y_{0}\mid X=x)}/{\Prob(Y_{1}\leq Y_{0}\mid X=x)}. As will be shown in Section 2.2, an advantage of our approach is that we estimate the regression parameters of the transformation model using maximum likelihood estimation and do not rely on less efficient binary regression techniques. Thus, we retain parameter interpretation while allowing for efficient estimation of a variety of covariate-specific summary indices of the ROC curve.

We show that our method is additionally related to the probabilistic index model (PIM) of Thas et al. (2012); De Neve et al. (2019), who propose to model the AUC as a function of 𝑿𝑿\text{\boldmath$X$}^{*}-\text{\boldmath$X$}. Here, the superscript (*) indicates an alternate configuration of the modeled covariates. We discuss the PIM of the form

\Prob(Y<YD,D,X,X)=a(δ~(DD)+(𝑿𝑿)𝝃~+(D𝑿D𝑿)𝜸~)\displaystyle\Prob(Y<Y^{*}\mid D,D^{*},X,X^{*})=a\left(\text{$\tilde{\delta}$}(D^{*}-D)+(\text{\boldmath$X$}^{*}-\text{\boldmath$X$})^{\top}\text{\boldmath$\tilde{\xi}$}+(D^{*}\text{\boldmath$X$}^{*}-D\text{\boldmath$X$})^{\top}\text{\boldmath$\tilde{\gamma}$}\right)

where aa is the function as defined previously and the tilde (~\texttildelow) is used to distinguish the parameters of the PIM from those of the transformation model. Contrasting a diseased and nondiseased subject with the same covariate value, we let D=1D^{*}=1, D=0D=0, 𝑿=𝑿\text{\boldmath$X$}^{*}=\text{\boldmath$X$} and a=logit1a=\operatorname{logit}^{-1}, then the probabilistic index is given by expit(δ~+𝑿𝜸~)\operatorname{expit}(\text{$\tilde{\delta}$}+\text{\boldmath$X$}^{\top}\text{\boldmath$\tilde{\gamma}$}). Thus, this is equivalent to the AUC resulting from a linear transformation model with FZ=cloglog1F_{Z}=\operatorname{cloglog}^{-1} and a shift term as in Equation 5. Similarly, when FZ=probit1F_{Z}=\operatorname{probit}^{-1}, we have the following relationship between the parameters of the linear transformation model and the PIM, δ~=δ/2\text{$\tilde{\delta}$}=\delta/\sqrt{2} and 𝜸~=𝜸/2\text{\boldmath$\tilde{\gamma}$}=\text{\boldmath$\gamma$}/\sqrt{2} . As was the case for ROC curves modeled by transformation models, the interaction between DD and 𝑿X allows the AUC to vary with covariates. The main effect 𝝃~\tilde{\xi}, however, does not effect the AUC for configurations where the interest is to compare the distributions of Y1Y_{1} and Y0Y_{0} among subjects with a specific value of covariates.

We can also consider more general and potentially nonlinear formulations of the shift and scale terms in our framework. For the special case of FZ=probit1F_{Z}=\operatorname{probit}^{-1}, the AUC from a shift-scale transformation model (Siegfried et al., 2022) is given by

\Prob(Y0<Y1𝑿0=𝒙0,𝑿1=𝒙1)=Φ(μ1(𝒙1)μ0(𝒙0)σ0(𝒙0)2+σ1(𝒙1)2)\displaystyle\Prob(Y_{0}<Y_{1}\mid\text{\boldmath$X$}_{0}=\text{\boldmath$x$}_{0},\text{\boldmath$X$}_{1}=\text{\boldmath$x$}_{1})=\Phi\left(\frac{\mu_{1}(\text{\boldmath$x$}_{1})-\ \mu_{0}(\text{\boldmath$x$}_{0})}{\displaystyle\sqrt{\sigma_{0}(\text{\boldmath$x$}_{0})^{2}+\sigma_{1}(\text{\boldmath$x$}_{1})^{2}}}\right)

where 𝑿0\text{\boldmath$X$}_{0} and 𝑿1\text{\boldmath$X$}_{1} are the corresponding (potentially different) sets of covariates in the nondiseased and diseased populations, respectively. When the scale term depends only on a single set of covariates, σ0(𝒙0)=σ1(𝒙1)=σ(𝒙)\sigma_{0}(\text{\boldmath$x$}_{0})=\sigma_{1}(\text{\boldmath$x$}_{1})=\sigma(\text{\boldmath$x$}), despite varying sets of covariates in the shift terms, all the expressions hold from Table 1 with δ\delta replaced by μ1(𝒙1)μ0(𝒙0)σ(𝒙)\frac{\mu_{1}(\text{\boldmath$x$}_{1})-\mu_{0}(\text{\boldmath$x$}_{0})}{\sigma(\text{\boldmath$x$})}. However, such closed-form expressions cannot be derived for other choices of FZF_{Z} when the scale term depends on the disease indicator or on different sets of covariates. In such cases, AUCs and other summary indices can be derived using numeric techniques on the induced ROC curve.

2.1.3 Comparing correlated ROC curves

If two (or more) diagnostic tests are measured on the same set of subjects (paired design), the test results are likely to be correlated leading to paired or correlated ROC curves (Swets and Pickett, 1982). To compare the accuracy of these tests, an approach that is frequently used in practice is to consider a summary index of the ROC curve and evaluate a hypothesis for the equivalence of the indices. For example, DeLong et al. (1988) developed an asymptotically normal nonparametric statistic to compare correlated AUCs based on the theory of U-statistics; see also Hanley and McNeil (1983); Wieand et al. (1989); Thompson and Zucchini (1989); McClish (1989); Jiang et al. (1996); Bandos et al. (2005, 2006) who provide similar methods. The problem with using the AUC as a summary index is that diagnostic tests with different ROC curves could potentially have the same value for the AUC. To overcome this limitation, we construct a distribution-free test for differences in the marginal ROC curve parameters using multivariate transformation models that inherently account for the correlations between outcomes in their dependency structure. The null hypothesis represents the equality of entire ROC curves instead of comparing correlated summary indices. Additionally, our test allows for covariate adjusted comparisons to be made.

Let the continuous random vector 𝒀=(Y1\text{\boldmath$Y$}=(Y_{1}, Y2,,YJ)Y_{2},\dots,Y_{J}) denote the results of JJ diagnostic tests, possibly dependent on a set of covariates 𝑿X. This notation should not be confused with subscripts denoting quantities conditional on D=1D=1. To account for the correlation between the test results, we consider a multivariate transformation model with the strictly monotonically increasing conditional transformation function g:JJg:\mathbb{R}^{J}\mapsto\mathbb{R}^{J}. This function maps the unknown conditional distribution of 𝒀Y depending on the disease indicator and covariates, to a random vector with a known distribution 𝒁=g(𝒀d,𝒙)\text{\boldmath$Z$}=g(\text{\boldmath$Y$}\mid d,\text{\boldmath$x$}). Specifically, the vector 𝒁=(Z1,,ZJ)\text{\boldmath$Z$}=(Z_{1},\dots,Z_{J}) follows a zero mean multivariate normal distribution 𝒁NJ(𝟎,𝚺)\text{\boldmath$Z$}\sim N_{J}(\mathbf{0},\mathbf{\Sigma}) with covariance matrix 𝚺J×J\mathbf{\Sigma}\in\mathbb{R}^{J\times J}. Consequently, the marginal distribution of a multivariate normal vector is itself normally distributed with ZjN(0,ςj2)Z_{j}\sim N(0,\varsigma^{2}_{j}), where the diagonal elements of 𝚺\mathbf{\Sigma} give the variances ςj2\varsigma_{j}^{2} for j=1,,Jj=1,\dots,J. Thus, the conditional joint CDF of 𝒀Y is given by

\Prob(𝒀𝒚D=d,𝑿=𝒙)\displaystyle\Prob(\text{\boldmath$Y$}\leq\text{\boldmath$y$}\mid D=d,\text{\boldmath$X$}=\text{\boldmath$x$}) =\Prob(g(𝒀d,𝒙)g(𝒚d,𝒙))\displaystyle=\Prob(g(\text{\boldmath$Y$}\mid d,\text{\boldmath$x$})\leq g(\text{\boldmath$y$}\mid d,\text{\boldmath$x$}))
=\Prob(𝒁g(𝒚d,𝒙))\displaystyle=\Prob(\text{\boldmath$Z$}\leq g(\text{\boldmath$y$}\mid d,\text{\boldmath$x$}))
=Φ𝟎,𝚺(g1(y1d,𝒙),,gJ(yJd,𝒙))\displaystyle={\Phi}_{\mathbf{0},\mathbf{\Sigma}}\left(g_{1}(y_{1}\mid d,\text{\boldmath$x$}),\dots,g_{J}(y_{J}\mid d,\text{\boldmath$x$})\right)

where Φ𝟎,𝚺\Phi_{\mathbf{0},\mathbf{\Sigma}} is the joint CDF of a multivariate normal distribution with a zero mean vector and covariance matrix 𝚺\mathbf{\Sigma}. Letting

gj(yjd,𝒙)\displaystyle g_{j}(y_{j}\mid d,\text{\boldmath$x$}) =Φ0,ςj21(FZ(hj(yj)μd(𝒙)))\displaystyle=\Phi^{-1}_{0,\varsigma_{j}^{2}}\left(F_{Z}(h_{j}(y_{j})-\mu_{d}(\text{\boldmath$x$}))\right)

where Φ0,ς2\Phi_{0,\varsigma^{2}} is the CDF of a normal distribution with variance ς2\varsigma^{2}, the CDF of 𝒀Y coincides to the Gaussian copula and the marginal distribution of YjY_{j} is given by

\Prob(YjyjD=d,𝑿=𝒙)\displaystyle\Prob(Y_{j}\leq y_{j}\mid D=d,\text{\boldmath$X$}=\text{\boldmath$x$}) =Φ𝟎,𝚺(,,,gj(yjd,𝒙),,,)\displaystyle={\Phi}_{\mathbf{0},\mathbf{\Sigma}}\left(\infty,\dots,\infty,g_{j}(y_{j}\mid d,\text{\boldmath$x$}),\infty,\dots,\infty\right)
=Φ0,ςj2(gj(yjd,𝒙))\displaystyle=\Phi_{0,\varsigma_{j}^{2}}\left(g_{j}(y_{j}\mid d,\text{\boldmath$x$})\right)
=FZ(hj(yj)μd,j(𝒙)).\displaystyle=F_{Z}\left(h_{j}(y_{j})-\mu_{d,j}(\text{\boldmath$x$})\right).

By using the multivariate model specified we allow 𝒀Y to have a nonlinear dependence structure through the marginal transformation functions hjh_{j}. Analogous to our univariate transformation model, this multivariate model implies that the jjth marginal transformed test results follow

hj(Yj)=μd,j(𝒙)+Z\displaystyle h_{j}(Y_{j})=\mu_{d,j}(\text{\boldmath$x$})+Z

where the function μd,j\mu_{d,j} is the shift term for the jjth test result model and ZFZZ\sim F_{Z}, as described previously. The marginal ROC curve is then given by

ROCj(p𝒙)=1FZ(FZ1(1p)δj(𝒙))\displaystyle\text{ROC}_{j}(p\mid\text{\boldmath$x$})=1-F_{Z}\left(F_{Z}^{-1}(1-p)-\delta_{j}(\text{\boldmath$x$})\right)

where δj(𝒙)=μ1,j(𝒙)μ0,j(𝒙)\delta_{j}(\text{\boldmath$x$})=\mu_{1,j}(\text{\boldmath$x$})-\mu_{0,j}(\text{\boldmath$x$}) and values of marginal summary indices can be computed from the closed-form expressions given in Table 1 using the marginal ROC curve parameters. Specifically, δ\delta can be replaced by the covariate effect on the summary indices given by δj(𝒙)\delta_{j}(\text{\boldmath$x$}). From this result, we see that the null hypothesis of equal ROC curves at a given set of covariate values 𝒙x for the jjth and kkth test is equivalent to testing if δj(𝒙)δk(𝒙)=0\delta_{j}(\text{\boldmath$x$})-\delta_{k}(\text{\boldmath$x$})=0. Moreover, this hypothesis test is distribution-free as we make no assumptions about hjh_{j}.

Klein et al. (2022) establish conditions which decompose general transformation functions into the marginal ones described above. Specifically, the covariance matrix takes the form 𝚺=𝚲1𝚲\mathbf{\Sigma}=\mathbf{\Lambda}^{-1}\mathbf{\Lambda}^{-\top} where 𝚲\mathbf{\Lambda} is a lower triangular (J×J)(J\times J) matrix whose coefficients characterize the dependence structure given by the Gaussian copula. The matrix 𝚲\mathbf{\Lambda} could also vary between diseased and nondiseased subjects and potentially depend on a set of covariates 𝒙x. This allows the dependence structure of 𝒀Y to change as a function of 𝒙x. For the special case of two diagnostic tests, the correlation between Y1Y_{1} and Y2Y_{2} for a given set of covariates 𝒙x is given by λd(𝒙)λd(𝒙)2+1\frac{-\lambda_{d}(\text{\boldmath$x$})}{\sqrt{\lambda_{d}(\text{\boldmath$x$})^{2}+1}}. For more details and examples of multivariate transformation models, see Klein et al. (2022); Barbanti et al. (2022).

2.2 Univariate estimation

In this section we propose estimation methods for a transformation model with univariate test results. We provide an explicit parameterization of the transformation function and the shift term. We then maximize the likelihood contributions for a potentially exact continuous, right-, left- or interval-censored datum to jointly estimate the model parameters. This enables us to fully determine the ROC curve and its summary indices as well as handle test results which are ordinal or impacted by instrument detection limits.

2.2.1 Parameterization

We parameterize the transformation function as

h(yϑ)=𝒃(y)ϑ=m=0Mϑmbm(y)fory,\displaystyle h(y\mid\text{\boldmath$\vartheta$})=\text{\boldmath$b$}(y)^{\top}\text{\boldmath$\vartheta$}=\sum_{m=0}^{M}\vartheta_{m}b_{m}(y)\quad\text{for}\quad y\in\mathbb{R}, (7)

where 𝒃(y)=(b0(y),,bM(y))\text{\boldmath$b$}(y)=(b_{0}(y),\dots,b_{M}(y))^{\top} is vector of M+1M+1 basis functions with coefficients ϑM+1\text{\boldmath$\vartheta$}\in\mathbb{R}^{M+1}. Polynomials in Bernstein form offer a choice of basis that provides a flexible way of estimating the underlying function. The Bernstein basis polynomial of order MM is defined on the interval [l,u][l,u] as

bm(y)=(Mm)y~m(1y~)Mm,m=0,,M,\displaystyle b_{m}(y)={M\choose m}\tilde{y}^{m}(1-\tilde{y})^{M-m},\quad m=0,\dots,M, (8)

where y~=yluy[0,1]\tilde{y}=\frac{y-l}{u-y}\in[0,1]. The restriction ϑmϑm+1\vartheta_{m}\leq\vartheta_{m+1} for m=0,,M1m=0,\dots,M-1, guarantees the monotonicity of hh. Observe that the transformation function is linear in the parameters that define it and, any nonlinearity of the test results is modeled by the basis functions. If the order MM is chosen to be sufficiently large, Bernstein polynomials can uniformly approximate any real-valued continuous function on an interval (Farouki, 2012).

2.2.2 Likelihood

Denote the complete parameter vector as 𝜽=(𝜷,ϑ)\text{\boldmath$\theta$}=(\text{\boldmath$\beta$}^{\top},\text{\boldmath$\vartheta$}^{\top})^{\top}, where 𝜷=(δ,𝝃,𝜸)2P+1\text{\boldmath$\beta$}=(\delta,\text{\boldmath$\xi$}^{\top},\text{\boldmath$\gamma$}^{\top})^{\top}\in\mathbb{R}^{2P+1} are the vector of regression coefficients parameterizing the function μd\mu_{d} from Section 2.1 and ϑM+1\text{\boldmath$\vartheta$}\in\mathbb{R}^{M+1} are the basis coefficients. We follow the maximum likelihood approach proposed by Hothorn et al. (2018) to jointly estimate 𝜷\beta and ϑ\vartheta. The advantages of embedding the model in the likelihood framework are as follows. (i) All forms of random censoring (right, left, interval) as well as truncation can directly be incorporated into likelihood contributions defined in terms of the distribution function (Lindsey et al., 1996). (ii) If the given model is correctly specified, under regularity conditions, the maximum likelihood estimator (MLE) will be asymptotically the most efficient estimator (minimum variance and unbiased). (iii) The MLE is asymptotically normally distributed and has a sample variance that can be computed from the inverse of the Fisher information matrix. This can be used to generate confidence intervals for the estimated parameters. (iv) The MLE is equivariant which implies invariance of the score test (or the Lagrange multiplier test) to reparameterizations (Rao, 1948; Aitchison and Silvey, 1958; Silvey, 1959; Dagenais and Dufour, 1991). Specifically, as will be shown in Section 2.4.1, by inverting the score test, our method produces confidence bands for the ROC curve and appropriate score intervals for its summary indices.

The likelihood contribution of a single observation O=(Y,D,𝑿)O=(Y,D,\text{\boldmath$X$}) where Y(y¯,y¯]={y:y¯<yy¯}Y\in(\underline{y},\overline{y}]=\{y\in\mathbb{R}:\underline{y}<y\leq\overline{y}\} is given by

L(𝜽O)={fZ(h(yϑ)μd(𝒙𝜷))h(yϑ)y‘exact continuous’1FZ(h(y¯ϑ)μd(𝒙𝜷))y(y¯,)‘right censored’FZ(h(y¯ϑ)μd(𝒙𝜷))y(,y¯)‘left censored’FZ(h(y¯ϑ)μd(𝒙𝜷))FZ(h(y¯ϑ)μd(𝒙𝜷))y(y¯,y¯]‘interval censored’,L(\text{\boldmath$\theta$}\mid O)=\begin{cases}\begin{aligned} &f_{Z}(h(y\mid\text{\boldmath$\vartheta$})-\mu_{d}(\text{\boldmath$x$}\mid\text{\boldmath$\beta$}))h^{\prime}(y\mid\text{\boldmath$\vartheta$})&&y\in\mathbb{R}&\quad&\text{`exact continuous'}\\ &1-F_{Z}(h(\underline{y}\mid\text{\boldmath$\vartheta$})-\mu_{d}(\text{\boldmath$x$}\mid\text{\boldmath$\beta$}))&&y\in(\underline{y},\infty)&\quad&\text{`right censored'}\\ &F_{Z}(h(\overline{y}\mid\text{\boldmath$\vartheta$})-\mu_{d}(\text{\boldmath$x$}\mid\text{\boldmath$\beta$}))&&y\in(-\infty,\overline{y})&\quad&\text{`left censored'}\\ &\begin{split}F_{Z}(h(\overline{y}\mid\text{\boldmath$\vartheta$})-\mu_{d}(\text{\boldmath$x$}\mid\text{\boldmath$\beta$}))-\\ F_{Z}(h(\underline{y}\mid\text{\boldmath$\vartheta$})-\mu_{d}(\text{\boldmath$x$}\mid\text{\boldmath$\beta$}))\end{split}&&y\in(\underline{y},\overline{y}]&\quad&\text{`interval censored'},\\ \end{aligned}\end{cases}

where fZf_{Z} is the density function of ZZ and h(yϑ)h^{\prime}(y\mid\text{\boldmath$\vartheta$}) is the first derivative of the transformation function with respect to yy. Given a sample of NN independent and identically distributed observations OiO_{i} for i=1,,Ni=1,\dots,N, the log-likelihood is given by (𝜽)=i=1Nlog(Li(𝜽))\ell(\text{\boldmath$\theta$})=\sum_{i=1}^{N}\log(L_{i}(\text{\boldmath$\theta$})), where LiL_{i} is the likelihood contribution of observation ii. The (unconditional) maximum likelihood estimate of 𝜽\theta is the solution to the optimization problem

𝜽^=(𝜷^,ϑ^)=argmax𝜷,ϑ(𝜷,ϑ),\displaystyle\hat{\text{\boldmath$\theta$}}=(\text{\boldmath$\hat{\beta}$},\text{\boldmath$\hat{\vartheta}$})=\operatorname*{{arg\,max}}_{\text{\boldmath$\beta$},\text{\boldmath$\vartheta$}}\ell(\text{\boldmath$\beta$},\text{\boldmath$\vartheta$}),

subject to the monotonicity constraint ϑmϑm+1\vartheta_{m}\leq\vartheta_{m+1} for m=0,,M1m=0,\dots,M-1. The resulting ROC curve only depends on 𝜷\beta which is decoupled from the parameters needed to model the transformation function ϑ\vartheta. The score function is defined as the first derivative of the log-likelihood function with respect to each of the parameters and is given by

S(𝜽)=((𝜽)𝜷(𝜽)ϑ)=(S𝜷(𝜽)Sϑ(𝜽)).\displaystyle S(\text{\boldmath$\theta$})=\left(\begin{array}[]{l}\frac{\partial\ell(\text{\boldmath$\theta$})}{\partial\text{\boldmath$\beta$}}\\ \frac{\partial\ell(\text{\boldmath$\theta$})}{\partial\text{\boldmath$\vartheta$}}\end{array}\right)=\left(\begin{array}[]{l}S_{\text{\boldmath$\beta$}}(\text{\boldmath$\theta$})\\ S_{\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})\end{array}\right).

We perform constrained optimization using the likelihood and score contributions to determine the maximum likelihood estimates for 𝜷\beta and ϑ\vartheta. For computational details, see Hothorn (2020). The asymptotic variance of the MLE can further be estimated by the expected Fisher information matrix which is the variance-covariance matrix of the score function and is defined as

I(𝜽)=\E(2(𝜽)𝜷𝜷2(𝜽)𝜷ϑ2(𝜽)ϑ𝜷2(𝜽)ϑϑ)=(I𝜷,𝜷(𝜽)I𝜷,ϑ(𝜽)I𝜷,ϑ(𝜽)Iϑ,ϑ(𝜽)).\displaystyle I(\text{\boldmath$\theta$})=-\E\left(\begin{array}[]{ll}\frac{\partial^{2}\ell(\text{\boldmath$\theta$})}{\partial\text{\boldmath$\beta$}\partial\text{\boldmath$\beta$}^{\top}}&\frac{\partial^{2}\ell(\text{\boldmath$\theta$})}{\partial\text{\boldmath$\beta$}\partial\text{\boldmath$\vartheta$}^{\top}}\\ \frac{\partial^{2}\ell(\text{\boldmath$\theta$})}{\partial\text{\boldmath$\vartheta$}\partial\text{\boldmath$\beta$}^{\top}}&\frac{\partial^{2}\ell(\text{\boldmath$\theta$})}{\partial\text{\boldmath$\vartheta$}\partial\text{\boldmath$\vartheta$}^{\top}}\end{array}\right)=\left(\begin{array}[]{ll}I_{\text{\boldmath$\beta$},\text{\boldmath$\beta$}}(\text{\boldmath$\theta$})&I_{\text{\boldmath$\beta$},\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})\\ I_{\text{\boldmath$\beta$},\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})^{\top}&I_{\text{\boldmath$\vartheta$},\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})\end{array}\right).

The matrix is partitioned such that the submatrix I𝜷,𝜷(𝜽)I_{\text{\boldmath$\beta$},\text{\boldmath$\beta$}}(\text{\boldmath$\theta$}) corresponds to the parameter related to the disease indicator and covariates. The variation in ϑ\vartheta and its relationship to 𝜷\beta is not of direct relevance here but is nonetheless estimated.

2.2.3 Limit of detection

Instrument precision can affect the evaluation of diagnostic biomarkers. For example, when biomarker levels are at or below the limit of detection (LOD) yLODy_{\text{LOD}}, the observed value lies in an interval (,yLOD)(-\infty,y_{\text{LOD}}) and the resulting measurement is left censored. Often a replacement value is substituted for such measurements. Alternatively, only biomarker values above the LOD are used for the ROC analysis. It has been shown that these approaches lead to biased estimation (Lynn, 2001; Singh and Nocerino, 2002). Various adjustments to ROC curves and its summary indices have been proposed to handle such censored measurements (Mumford et al., 2006; Perkins et al., 2007, 2009; Ruopp et al., 2008; Bantis et al., 2017; Xiong et al., 2022). However, these methods typically do not account for covariates. Our framework naturally accounts for such observations in the likelihood function for left censored test results. Similarly, the right censored likelihood accounts for measurements which are affected by an upper limit of detection. Thus, our method provides a smooth covariate-specific ROC curve for all values of specificity with estimates and inference appropriately incorporating the observed information.

2.2.4 Ordinal test results

Many diagnostic tests of interest to clinical researchers are measured on ordinal scales. These are rank-ordered categorical variables for which quantitative differences between levels are unknown. Examples include the Memory Impairment Screen (MIS) for the measurement of cognition to distinguish Alzheimer’s disease and other dementias (Buschke et al., 1999) or the risk assessment of disease from radiological images such as with the Breast Imaging Reporting and Data System (BI-RADS) (American College of Radiology (2013), ACR). A simple reparameterization of the transformation function facilitates an extension of ROC curves to such ordinal test results.

Let the random variable Y{y1,,yK}Y\in\{y_{1},\dots,y_{K}\} denote the results of an ordered categorical diagnostic test with KK categories, where yk<yk+1y_{k}<y_{k+1} for k=1,,K1k=1,...,K-1 . Define the covariate-specific ROC curve as a discrete function of the subject categories

{(1F0(y𝒙),1F1(y𝒙)),y{y1,,yK}}.\displaystyle\left\{\left(1-F_{0}(y\mid\text{\boldmath$x$}),1-F_{1}(y\mid\text{\boldmath$x$})\right),y\in\{y_{1},\dots,y_{K}\}\right\}.

This set of points plots the covariate-specific 1specificity1-\text{specificity} and sensitivity coordinates for each of the KK categories. Suppose that the following transformation model defines the distribution of the ordinal test results in the diseased and nondiseased populations

Fd(yk𝒙)=FZ(h(yk)μd(𝒙)),\displaystyle F_{d}(y_{k}\mid\text{\boldmath$x$})=F_{Z}\left(h(y_{k})-\mu_{d}(\text{\boldmath$x$})\right),

where the transformation function assigns one parameter to each of the categories of the test result except the last one, h(yk)=ϑkh(y_{k})=\vartheta_{k}\in\mathbb{R} for k=1,,K1k=1,\dots,K-1. Note that since \Prob(YyK𝑿=𝒙)=1\Prob(Y\leq y_{K}\mid\text{\boldmath$X$}=\text{\boldmath$x$})=1, only K1K-1 parameters need to be specified. The monotonicity constraint ϑk<ϑk+1\vartheta_{k}<\vartheta_{k+1} is also necessary so that Fd(yk𝒙)<Fd(yk+1𝒙)F_{d}(y_{k}\mid\text{\boldmath$x$})<F_{d}(y_{k+1}\mid\text{\boldmath$x$}) for k=1,,K1k=1,\dots,K-1. Observe that this is a special case of the general shift-scale model in Equation 2. Under a linear shift term as given in Equation 5 and FZ=logit1F_{Z}=\operatorname{logit}^{-1}, this model is known as the proportional odds logistic regression (POLR).

Setting pk=1F0(yk𝒙)p_{k}=1-F_{0}(y_{k}\mid\text{\boldmath$x$}), the ROC curve under the ordinal transformation model is given by

ROC(pk)=1FZ(FZ1(1pk)δ(𝒙)),k=1,,K.\displaystyle\text{ROC}(p_{k})=1-F_{Z}\left(F_{Z}^{-1}(1-p_{k})-\delta(\text{\boldmath$x$})\right),\quad k=1,\dots,K.

The parameters of the model can be estimated by using the maximum likelihood scheme described previously with the interval censored likelihood contribution

FZ(ϑkμd(𝒙𝜷))FZ(ϑk1μd(𝒙𝜷)),\displaystyle F_{Z}(\vartheta_{k}-\mu_{d}(\text{\boldmath$x$}\mid\text{\boldmath$\beta$}))-F_{Z}(\vartheta_{k-1}-\mu_{d}(\text{\boldmath$x$}\mid\text{\boldmath$\beta$})),

where the observed subject has a test result of yky_{k} and covariate value 𝒙x. After estimating the parameters, it is possible to plot a continuous ROC curve as a function of all p[0,1]p\in[0,1]. This curve’s shape depends on the choice of FZF_{Z} and represents the underlying parametric form of the model. However, it does not represent the ROC curve for a continuous test. Specifically, the function is only defined at the KK discrete specificity values. Summary measures can be calculated using the expressions detailed in Table 1. Note that these would be based on a smooth interpolation given by the model rather a discrete one and lead to the interpretation of an ordinal test with a latent continuous scale.

2.3 Multivariate estimation

The estimation of a multivariate test result follows a similar approach to the univariate case. The marginal transformation functions are parameterized in terms of monotonically increasing basis polynomials of Bernstein form hj(yjϑj)=𝒃j(yj)ϑjh_{j}(y_{j}\mid\text{\boldmath$\vartheta$}_{j})=\text{\boldmath$b$}_{j}(y_{j})^{\top}\text{\boldmath$\vartheta$}_{j}, where 𝒃j:M+1\text{\boldmath$b$}_{j}:\mathbb{R}\mapsto\mathbb{R}^{M+1} is a vector of M+1M+1 basis functions with individual components bj,mb_{j,m} defined as in Equation 8 for j=1,,Jj=1,\dots,J and m=0,,Mm=0,\dots,M. The constraint ϑj,mϑj,m+1\vartheta_{j,m}\leq\vartheta_{j,m+1} for m=0,,M1m=0,\dots,M-1, guarantees the monotonicity of hjh_{j} and the smooth parameterization the existence of the derivative hj(yjϑj)=𝒃j(yj)ϑjh^{\prime}_{j}(y_{j}\mid\text{\boldmath$\vartheta$}_{j})=\text{\boldmath$b$}^{\prime}_{j}(y_{j})^{\top}\text{\boldmath$\vartheta$}_{j}.

With a slight abuse of notation compared to the univariate case, we also denote the vector of regression coefficients as 𝜷=(𝜷1,,𝜷J)J(2P+1)\text{\boldmath$\beta$}=(\text{\boldmath$\beta$}_{1}^{\top},\dots,\text{\boldmath$\beta$}_{J}^{\top})^{\top}\in\mathbb{R}^{J(2P+1)} and the vector of basis coefficients as ϑ=(ϑ1,,ϑJ)J(M+1)\text{\boldmath$\vartheta$}=(\text{\boldmath$\vartheta$}_{1}^{\top},\dots,\text{\boldmath$\vartheta$}_{J}^{\top})^{\top}\in\mathbb{R}^{J(M+1)}, with each component parameterizing a separate marginal conditional distribution function. The covariance matrix 𝚺\mathbf{\Sigma} captures the correlation structure between the transformed diagnostic tests and the vector 𝝇J(J+1)/2\text{\boldmath$\varsigma$}\in\mathbb{R}^{J(J+1)/2} contains all the elements of 𝚺\mathbf{\Sigma}. We denote the complete set of unknown model parameters for a transformation model with a multivariate test result as 𝜽=(𝜷,ϑ,𝝇)\text{\boldmath$\theta$}=(\text{\boldmath$\beta$}^{\top},\text{\boldmath$\vartheta$}^{\top},\text{\boldmath$\varsigma$}^{\top})^{\top}. The likelihood contribution of a single observation O=(𝒀,D,𝑿)O=(\text{\boldmath$Y$},D,\text{\boldmath$X$}), with 𝒀=𝒚=(y1,,yJ)\text{\boldmath$Y$}=\text{\boldmath$y$}=(y_{1},\dots,y_{J})^{\top} being an exact continuous vector of test results, is given by

L(𝜽O)=ϕ𝟎,𝚺(g(𝒚d,𝒙,𝜷,ϑ))j=1Jgj(yjd,𝒙,𝜷,ϑ),\displaystyle L(\text{\boldmath$\theta$}\mid O)={\phi}_{\mathbf{0},\mathbf{\Sigma}}\left(g(\text{\boldmath$y$}\mid d,\text{\boldmath$x$},\text{\boldmath$\beta$},\text{\boldmath$\vartheta$})\right)\prod_{j=1}^{J}g^{\prime}_{j}\left(y_{j}\mid d,\text{\boldmath$x$},\text{\boldmath$\beta$},\text{\boldmath$\vartheta$}\right),

where ϕ𝟎,𝚺\phi_{\mathbf{0},\mathbf{\Sigma}} is the joint density function of a multivariate normal distribution with a zero mean vector and covariance matrix 𝚺\mathbf{\Sigma}. With a choice of each component gjg_{j} from Equation 2.1.3, its derivative is given by

gj(yjd,𝒙,𝜷,ϑ)=fZ(hj(yjϑ)μd(𝒙𝜷))hj(yj𝜷)ϕ0,ςj2(Φ0,ςj21{FZ(hj(yjϑ)μd(𝒙𝜷))}).\displaystyle g^{\prime}_{j}(y_{j}\mid d,\text{\boldmath$x$},\text{\boldmath$\beta$},\text{\boldmath$\vartheta$})=\frac{f_{Z}\left(h_{j}(y_{j}\mid\text{\boldmath$\vartheta$})-\mu_{d}(\text{\boldmath$x$}\mid\text{\boldmath$\beta$})\right)h^{\prime}_{j}(y_{j}\mid\text{\boldmath$\beta$})}{\phi_{0,\varsigma_{j}^{2}}\left(\Phi^{-1}_{0,\varsigma_{j}^{2}}\left\{F_{Z}\left(h_{j}(y_{j}\mid\text{\boldmath$\vartheta$})-\mu_{d}(\text{\boldmath$x$}\mid\text{\boldmath$\beta$})\right)\right\}\right)}.

In the special case where FZ=probit1F_{Z}=\operatorname{probit}^{-1}, this derivative simplifies to ςjhj(yj)\varsigma_{j}h^{\prime}_{j}(y_{j}).

We derive the maximum likelihood estimate of 𝜽\theta from a sample of NN independent and identically distributed observations using constrained maximization algorithms following the strategy detailed in Section 2.2.2. For more details see Klein et al. (2022).

2.4 Confidence intervals

In the following section, we present three methods to calculate confidence bands for the ROC curve and confidence intervals for its summary indices. These quantities are functions of the parameters in the model G(𝜷):2P+1G(\text{\boldmath$\beta$}):\mathbb{R}^{2P+1}\rightarrow\mathbb{R} and to maintain nominal coverage for a confidence interval for G(𝜷)G(\text{\boldmath$\beta$}), appropriate methods are needed. The methods discussed include inversion of the score test, delta method and simulation from the asymptotic distribution of the estimate. The methods are ordered by their degree of theoretical justification. We start with score intervals which are invariant to parameter transformations but become computationally expensive when dealing with a large set of parameters. We then discuss estimating the variance using the multivariate delta method and conclude with a simple simulation method which is versatile without being computationally demanding.

2.4.1 Score intervals

In the two-sample univariate case where δ\delta defines the ROC curve, as in Equation 4, we can construct score intervals for δ\delta. Unlike the Wald and other commonly used intervals, score intervals are especially desirable as they are invariant to transformations of the parameters. That is, a score CI for G(δ)G(\delta) (e.g., the AUC a(δ)a(\delta)), provides the same level of coverage as would a score CI for δ\delta. In turn, under a correctly specified model, a score CI for δ\delta allows the construction of accurately covered uniform confidence bands for the ROC curve as well as intervals for its summary indices such as the AUC and the Youden index.

We first generate score confidence intervals for δ\delta by inverting the score test. In this case, the null hypothesis is given by H0:δ=δ0H_{0}:\delta=\delta_{0} where δ0\delta_{0} is a specific value of the parameter of interest. Under H0H_{0}, the restricted (conditional) maximum likelihood estimator for ϑ\vartheta can be obtained by

ϑ^(δ0)=argmaxϑ(δ0,ϑ),\displaystyle\text{\boldmath$\hat{\vartheta}$}(\delta_{0})=\operatorname*{{arg\,max}}_{\text{\boldmath$\vartheta$}}\ell(\delta_{0},\text{\boldmath$\vartheta$}),

or as a solution of the M+1M+1 score equations Sϑ(δ0,ϑ)=𝟎S_{\text{\boldmath$\vartheta$}}(\delta_{0},\text{\boldmath$\vartheta$})=\mathbf{0}. Note that this estimate is a function of δ0\delta_{0}. Letting 𝜽~=(δ0,ϑ^(δ0))\text{\boldmath$\tilde{\theta}$}=(\delta_{0},\text{\boldmath$\hat{\vartheta}$}(\delta_{0})), the quadratic (Rao) score statistic simplifies to

R(δ0)\displaystyle R(\delta_{0}) =S(𝜽~)I1(𝜽~)S(𝜽~)\displaystyle=S(\text{\boldmath$\tilde{\theta}$})^{\top}I^{-1}(\text{\boldmath$\tilde{\theta}$})S(\text{\boldmath$\tilde{\theta}$})
=(Sδ(𝜽~),𝟎)I1(𝜽~)(Sδ(𝜽~),𝟎)\displaystyle=(S_{\delta}(\text{\boldmath$\tilde{\theta}$})^{\top},\mathbf{0}^{\top})I^{-1}(\text{\boldmath$\tilde{\theta}$})(S_{\delta}(\text{\boldmath$\tilde{\theta}$})^{\top},\mathbf{0}^{\top})^{\top}
=Sδ(𝜽~)Aδ,δ(𝜽~)Sδ(𝜽~),\displaystyle=S_{\delta}(\text{\boldmath$\tilde{\theta}$})^{\top}A_{\delta,\delta}(\text{\boldmath$\tilde{\theta}$})S_{\delta}(\text{\boldmath$\tilde{\theta}$}),

where Aδ,δ(𝜽~)A_{\delta,\delta}(\text{\boldmath$\tilde{\theta}$}) denotes the submatrix corresponding to δ\delta of the inverse Fisher information matrix and is given by the Schur complement Iδ,δ(𝜽)Iδ,ϑ(𝜽)Iϑ,ϑ1(𝜽)Iδ,ϑ(𝜽)I_{\delta,\delta}(\text{\boldmath$\theta$})-I_{\delta,\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})I^{-1}_{\text{\boldmath$\vartheta$},\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})I_{\delta,\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})^{\top}. Under H0H_{0}, R(δ0)R(\delta_{0}) converges asymptotically to a chi-square distribution with 1 degree of freedom, R(δ0)Dχ12R(\delta_{0})\stackrel{{\scriptstyle D}}{{\longrightarrow}}\chi^{2}_{1}. This result is explained by Rao (2005). Thus, inverting the score statistic by enumerating values of δ0\delta_{0} allows for the construction of (1α)(1-\alpha) score confidence intervals for δ\delta defined as

{δ0R(δ0)<χ12(1α)},\displaystyle\{\delta_{0}\in\mathbb{R}\mid R(\delta_{0})<\chi^{2}_{1}(1-\alpha)\},

where χP2(1α)\chi^{2}_{P}(1-\alpha) is the (1α)(1-\alpha) quantile value of the χ12\chi^{2}_{1} distribution. Equivalently, we can use the square root of the score statistic to construct a (1α)(1-\alpha) score interval using quantiles of the standard normal distribution, {δ0Φ1(α/2)<R(δ0)Φ1(1α/2)}\{\delta_{0}\in\mathbb{R}\mid\Phi^{-1}(\alpha/2)<\sqrt{R(\delta_{0})}\leq\Phi^{-1}(1-\alpha/2)\}. Finally, we apply the function GG to both the lower and upper limits of the interval to construct score confidence bands for the ROC curve or score confidence intervals for its summary indices

The score statistic is given by R(δ0)=Sδ(𝜽~)2Aδ,δ(𝜽~)R(\delta_{0})=S_{\delta}(\text{\boldmath$\tilde{\theta}$})^{2}A_{\delta,\delta}(\text{\boldmath$\tilde{\theta}$}). Testing if there is a significant difference between the nondiseased and diseased populations coincides to the hypothesis test, H0:δ=0H_{0}:\delta=0. This is computationally efficient because only the distribution of R(0)R(0) needs to be computed. However, computing score CIs requires updating the restricted MLEs ϑ^(δ0)\text{\boldmath$\hat{\vartheta}$}(\delta_{0}) for an enumeration of δ0\delta_{0} values. This becomes computationally intractable when enumerating a higher dimensional grid of parameters.

2.4.2 Delta method

Since the MLE satisfies

n(𝜷^𝜷)𝐷NP+1(𝟎,A𝜷,𝜷(𝜽)),\displaystyle\sqrt{n}(\text{\boldmath$\hat{\beta}$}-\text{\boldmath$\beta$})\xrightarrow{D}N_{P+1}\left(\mathbf{0},A_{\text{\boldmath$\beta$},\text{\boldmath$\beta$}}(\text{\boldmath$\theta$})\right),

then by the multivariate delta method, G(𝜷^)G(\text{\boldmath$\hat{\beta}$}) also follows a normal distribution with

𝕍(G(𝜷^))=1nG(𝜷)A𝜷,𝜷(𝜽)G(𝜷),\displaystyle\mathbb{V}(G(\text{\boldmath$\hat{\beta}$}))=\frac{1}{n}\nabla G(\text{\boldmath$\beta$})^{\top}A_{\text{\boldmath$\beta$},\text{\boldmath$\beta$}}(\text{\boldmath$\theta$})\nabla G(\text{\boldmath$\beta$}),

where G(𝜷)\nabla G(\text{\boldmath$\beta$}) is the gradient of GG evaluated at 𝜷\beta and the inverse Fisher information matrix A𝜷,𝜷(𝜽)A_{\text{\boldmath$\beta$},\text{\boldmath$\beta$}}(\text{\boldmath$\theta$}) is given by the Schur complement I𝜷,𝜷(𝜽)I𝜷,ϑ(𝜽)Iϑ,ϑ1(𝜽)I𝜷,ϑ(𝜽)I_{\text{\boldmath$\beta$},\text{\boldmath$\beta$}}(\text{\boldmath$\theta$})-I_{\text{\boldmath$\beta$},\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})I^{-1}_{\text{\boldmath$\vartheta$},\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})I_{\text{\boldmath$\beta$},\text{\boldmath$\vartheta$}}(\text{\boldmath$\theta$})^{\top}. For example, when the shift term takes the linear form as in Equation 5 and GG defines the AUC function for FZ=probit1F_{Z}=\operatorname{probit}^{-1}, the entries of G(𝜷)\nabla G(\text{\boldmath$\beta$}) are given by

G(𝜷)δ=12C,G(𝜷)ξi=0andG(𝜷)γi=xi2C,\displaystyle\frac{\partial G(\text{\boldmath$\beta$})}{\partial\delta}=\frac{1}{\sqrt{2}}C,\quad\frac{\partial G(\text{\boldmath$\beta$})}{\partial\xi_{i}}=0\quad\text{and}\quad\frac{\partial G(\text{\boldmath$\beta$})}{\partial\gamma_{i}}=\frac{x_{i}}{\sqrt{2}}C,

where C=ϕ(δ+𝒙𝜸2)C=\phi\left(\frac{\delta+\text{\boldmath$x$}^{\top}\text{\boldmath$\gamma$}}{\sqrt{2}}\right), ϕ\phi is the density of the standard normal distribution and ii indexes the PP covariates. In general, the gradient can be estimated by calculating such derivatives and evaluating the resulting function at the MLE. Similarly, the variance-covariance matrix of the estimated parameters A𝜷,𝜷(𝜽)A_{\text{\boldmath$\beta$},\text{\boldmath$\beta$}}(\text{\boldmath$\theta$}) can be computed by inverting the numerically evaluated Hessian matrix. Thus, a (1α)(1-\alpha) level confidence interval for G(𝜷)G(\text{\boldmath$\beta$}) is given by

G(𝜷)±Φ1(α/2)𝕍^(G(𝜷^)).\displaystyle G(\text{\boldmath$\beta$})\pm\Phi^{-1}(\alpha/2)\sqrt{\hat{\mathbb{V}}(G(\text{\boldmath$\hat{\beta}$}))}.

2.4.3 Simulated intervals

When the function GG has complex derivatives, as would be the case for nonlinear shift terms μd(𝒙)\mu_{d}(\text{\boldmath$x$}) or when calculating optimal thresholds cc^{*} where GG includes the inverse of the transformation function, constructing confidence intervals using the delta method becomes infeasible. For these cases, we apply a simple simulation-based algorithm which utilizes the asymptotic normality of the MLE to calculate confidence intervals for the ROC curve and its summary indices, which are functions of the parameters of interest. The steps of the algorithm to construct (1α)(1-\alpha) level confidence intervals for G(𝜷^)G(\text{\boldmath$\hat{\beta}$}) can be summarized as follows:

  1. 1.

    Generate BB independent samples from the asymptotic multivariate normal distribution of the parameter estimates NP+1(𝜷^,1nA^𝜷,𝜷(𝚯^))N_{P+1}\left(\text{\boldmath$\hat{\beta}$},\frac{1}{n}\hat{A}_{\text{\boldmath$\beta$},\text{\boldmath$\beta$}}(\mathbf{\hat{\Theta}})\right) and denote as 𝜷^1,,𝜷^B\text{\boldmath$\hat{\beta}$}^{*}_{1},\dots,\text{\boldmath$\hat{\beta}$}^{*}_{B}.

  2. 2.

    For each sample b=1,,Bb=1,\dots,B, calculate the function of interest G(𝜷^b)G(\text{\boldmath$\hat{\beta}$}^{*}_{b}).

  3. 3.

    Construct the confidence interval (QG(𝜷^)(α/2),QG(𝜷^)(1α/2))\left(Q_{G(\text{\boldmath$\hat{\beta}$}^{*})}(\alpha/2),Q_{G(\text{\boldmath$\hat{\beta}$}^{*})}(1-\alpha/2)\right), where QG(𝜷^)Q_{G(\text{\boldmath$\hat{\beta}$}^{*})} is the empirical quantile function of the sample G(𝜷^1),,G(𝜷^B)G(\text{\boldmath$\hat{\beta}$}^{*}_{1}),\dots,G(\text{\boldmath$\hat{\beta}$}^{*}_{B}).

A similar algorithm is presented in Mandel (2013), who discuss its asymptotic validity and present several examples that show its empirical coverage adheres to nominal levels with results similar to the delta method.

3 Empirical evaluation

We conducted a simulation study to evaluate the performance of our estimators in the two-sample setting. We considered a data generating process such that nondiseased test results followed a standard normal distribution F0(y)=Φ(y)F_{0}(y)=\Phi(y) and the diseased results a distribution with the CDF F1(y)=FZ(FZ1(Φ(y))δ)F_{1}(y)=F_{Z}(F_{Z}^{-1}(\Phi(y))-\delta). To obtain different shapes of the ROC curve, we chose three choices of FZ{probit1,logit1,cloglog1}F_{Z}\in\{\operatorname{probit}^{-1},\operatorname{logit}^{-1},\operatorname{cloglog}^{-1}\} and varied δ\delta such that the AUC{0.5,0.65,0.8,0.95}\text{AUC}\in\{0.5,0.65,0.8,0.95\} or that J{0,0.25,0.5,0.8}J\in\{0,0.25,0.5,0.8\}, leading to a variety of configurations. Under this simulation paramaterization, the ROC curves followed the form of Equation 4 and summary indices could be calculated as a function of the estimated δ\delta from Table 1. The conventional binormal model corresponded to FZ=probit1=ΦF_{Z}=\operatorname{probit}^{-1}=\Phi, which induced proper binormal ROC curves. This was the only configuration where the test results for both groups were normally distributed. We included this configuration to ascertain the loss of power associated with our estimators when the standard binormal assumption held. For other choices of FZF_{Z} with AUC>0.5\text{AUC}>0.5, the resulting distributions of the diseased test results were non-normal, with variances and higher moments differing between the two groups. Specifically, the configuration of FZ=logit1F_{Z}=\operatorname{logit}^{-1} led to light tailed distributions for the diseased test results, whilst FZ=cloglog1F_{Z}=\operatorname{cloglog}^{-1} led to skewed, heavy-tailed distributions. The corresponding density functions for the data generating model with selected AUC values are given in Figure 1.

Refer to caption
Figure 1: Density functions for the model used to generate the data for the simulations. The nondiseased test results followed a standard normal distribution corresponding to an AUC=0.5\text{AUC}=0.5. The diseased test results varied with three choices of FZF_{Z}: probit1\operatorname{probit}^{-1}, logit1\operatorname{logit}^{-1} and cloglog1\operatorname{cloglog}^{-1} each of which had an AUC of 0.5, 0.65, 0.8 and 0.95.

For 10 000 replications of each configuration, we generated balanced data sets with sample sizes N0=N1{25,50,100}N_{0}=N_{1}\in\{25,50,100\}. The transformation models discussed in Section 2 were fitted to the simulated data sets assuming a parameterization of the transformation function given by a Bernstein basis polynomial of order M=6M=6. The true data-generating model had a nonlinear transformation function h=FZ1Φh=F_{Z}^{-1}\circ\Phi. Our model estimation procedure aimed to approximate this function alongside the shift parameter δ\delta. The functions implementing these models are available from the \pkgtram package (Hothorn, 2020), and included \codeBoxCox, \codeColr and \codeCoxph models which corresponded to FZF_{Z} selected as probit1\operatorname{probit}^{-1}, logit1\operatorname{logit}^{-1} and cloglog1\operatorname{cloglog}^{-1}, respectively.

Table 2 displays the bias and root-mean-square error (RMSE) for the AUC estimates using different methods under the various configurations. We found that all three methods had minimal bias and equivalent RMSEs for an AUC=0.5\text{AUC}=0.5, where the test results were unable to distinguish between the two groups. The \codeBoxCox and \codeColr models yielded approximately unbiased AUC estimates in all cases, even when they were misspecified for the true data generating process. However, estimates based on the proportional hazards model, \codeCoxph, were biased for data generating processes other than FZ=cloglog1F_{Z}=\operatorname{cloglog}^{-1}. We also computed nominal 90% score confidence intervals for the AUC using the approach described in Section 2.4.1. The observed coverage probabilities are summarized in Table 3. When FZF_{Z} was correctly specified, all methods had close to nominal coverage under all sample size configurations. In addition, the score confidence intervals from the \codeColr model were accurate even when it was misspecified for the true data generating process. This was unlike the \codeBoxCox and \codeCoxph models, whose coverage deteriorated for higher AUC values when they were misspecified.

We compared our approaches to a set of alternative methods for computing confidence intervals for the AUC and Youden index. The software details of all the methods used alongside their respective features and references are summarized in Supplementary Material Table S1. We detail the empirical coverage and average width of the confidence intervals for the AUC in Figures S1 and S2, respectively. Estimates based on transformation models (\proglangR packages \pkgtram, \pkgorm, \pkgpim) yielded coverage close to the nominal level and significantly outperformed the other methods when the model was correctly specified for the true data generating process. All other methods generally performed close to nominal levels for low to medium AUC values (0.5-0.8), but broke down for higher AUC values. Methods which used FZ=cloglog1F_{Z}=\operatorname{cloglog}^{-1} gave confidence intervals which were shorter in length (overconfident) and otherwise there were no notable differences between the methods.

Analogously, Figures S3 and S4 detail the coverage and length of the confidence intervals for the Youden index. The methods which were based on the overlap coefficient failed to cover the configuration where J=0J=0 because their lower limits were never below 0. Our methods estimated confidence intervals for δ\delta\in\mathbb{R} which naturally accounted for this scenario. Specifically, the lower limit of the CI for JJ is given by G(max{0,δl})G(\max\{0,\delta^{l}\}) and the upper limit by G(max{|δl|,|δu|})G(\max\{\lvert\delta^{l}\rvert,\lvert\delta^{u}\rvert\}), where GG is the Youden index function defined in Table 1 and (δl,δu)(\delta^{l},\delta^{u}) is the confidence interval for δ\delta. Remarkably, the transformation model with FZ=logit1F_{Z}=\operatorname{logit}^{-1} provided coverage at nominal levels for all simulation configurations with a relatively small confidence interval width. The approach of Franco-Pereira et al. (2021) (FP) was also accurate under model misspecification but was more involved. Namely, it consisted of estimating Box Cox transformation parameters under a binormal framework with bootstrap variance, all carried out on the logit scale and then back-transformed. In a setting with covariates, censoring or with J=0J=0 this methodology would be limited.

Figures S5 and S6 show the coverage and area of the confidence bands for the ROC curve. All the approaches based on transformation models covered the configuration with AUC=0.5\text{AUC}=0.5 accurately. However, the other approaches did not yield coverage close to nominal levels in this configuration with the exception of Martínez-Camblor et al. (2018), whose confidence bands had a significantly larger area. For all other configurations, only transformation models which were correctly specified for the true data-generating model provided accurate results.

FZF_{Z} probit1\text{probit}^{-1} logit1\text{logit}^{-1} cloglog1\text{cloglog}^{-1}
AUC Method N0=N1N_{0}=N_{1} Bias RMSE Bias RMSE Bias RMSE
0.5 \codeBoxCox 25 0.001 0.083 0.000 0.082 0.000 0.082
50 0.000 0.057 0.000 0.057 0.001 0.057
100 0.000 0.041 0.000 0.040 0.000 0.040
\codeColr 25 0.001 0.084 0.000 0.083 0.001 0.083
50 0.000 0.058 -0.001 0.058 0.001 0.058
100 0.000 0.041 0.000 0.041 0.000 0.041
\codeCoxph 25 0.001 0.077 0.000 0.076 0.000 0.077
50 0.000 0.052 0.000 0.052 0.001 0.052
100 0.000 0.036 0.000 0.036 0.000 0.036
0.65 \codeBoxCox 25 0.003 0.078 -0.002 0.078 0.004 0.078
50 0.001 0.055 -0.004 0.055 0.002 0.054
100 0.000 0.038 -0.006 0.040 0.002 0.038
\codeColr 25 0.001 0.078 0.002 0.078 0.002 0.079
50 0.000 0.055 0.001 0.055 0.000 0.055
100 0.000 0.038 0.000 0.039 0.001 0.038
\codeCoxph 25 -0.021 0.077 -0.028 0.080 0.006 0.071
50 -0.025 0.057 -0.032 0.061 0.003 0.049
100 -0.027 0.044 -0.034 0.050 0.002 0.034
0.8 \codeBoxCox 25 0.004 0.061 -0.002 0.066 0.003 0.063
50 0.002 0.043 -0.004 0.046 0.001 0.044
100 0.001 0.030 -0.006 0.033 0.000 0.031
\codeColr 25 0.000 0.061 0.002 0.063 0.002 0.062
50 -0.002 0.043 0.001 0.044 0.000 0.044
100 -0.002 0.030 0.000 0.031 0.000 0.031
\codeCoxph 25 -0.038 0.075 -0.049 0.086 0.007 0.054
50 -0.043 0.063 -0.056 0.075 0.003 0.038
100 -0.047 0.057 -0.060 0.070 0.001 0.027
0.95 \codeBoxCox 25 0.001 0.027 0.003 0.030 0.003 0.031
50 0.001 0.019 0.002 0.022 0.003 0.022
100 0.001 0.014 0.002 0.016 0.003 0.016
\codeColr 25 -0.006 0.028 0.000 0.028 0.001 0.028
50 -0.007 0.021 0.000 0.020 0.001 0.019
100 -0.007 0.015 0.000 0.014 0.001 0.014
\codeCoxph 25 -0.038 0.054 -0.042 0.065 0.003 0.023
50 -0.041 0.050 -0.049 0.062 0.002 0.016
100 -0.044 0.049 -0.054 0.061 0.002 0.011
Table 2: Bias and RMSE simulation results for estimation of the AUC. Data were generated using F0(y)=Φ(y)F_{0}(y)=\Phi(y) for nondiseased results and F1(y)=FZ(FZ1(Φ(y))δ)F_{1}(y)=F_{Z}(F_{Z}^{-1}(\Phi(y))-\delta) for the diseased results. We varied FZ{probit1F_{Z}\in\{\operatorname{probit}^{-1}, logit1\operatorname{logit}^{-1} and cloglog1}\operatorname{cloglog}^{-1}\}, AUC and sample size. The proposed linear transformation models \codeBoxCox, \codeColr and \codeCoxph were used from the \pkgtram package to estimate δ\delta and the transformation function h=FZ1Φh=F_{Z}^{-1}\circ\Phi.
FZF_{Z}
AUC Method N0=N1N_{0}=N_{1} probit1\text{probit}^{-1} logit1\text{logit}^{-1} cloglog1\text{cloglog}^{-1}
0.5 \codeBoxCox 25 0.888 0.890 0.890
50 0.895 0.894 0.894
100 0.896 0.900 0.894
\codeColr 25 0.890 0.890 0.893
50 0.898 0.894 0.895
100 0.895 0.901 0.896
\codeCoxph 25 0.874 0.875 0.873
50 0.887 0.887 0.886
100 0.895 0.897 0.886
0.65 \codeBoxCox 25 0.883 0.882 0.890
50 0.891 0.891 0.891
100 0.899 0.888 0.897
\codeColr 25 0.887 0.891 0.894
50 0.894 0.897 0.892
100 0.903 0.895 0.895
\codeCoxph 25 0.851 0.831 0.878
50 0.825 0.794 0.890
100 0.774 0.713 0.897
0.8 \codeBoxCox 25 0.892 0.868 0.883
50 0.893 0.876 0.885
100 0.898 0.874 0.893
\codeColr 25 0.902 0.893 0.895
50 0.905 0.898 0.898
100 0.908 0.899 0.906
\codeCoxph 25 0.762 0.685 0.892
50 0.650 0.538 0.892
100 0.462 0.304 0.896
0.95 \codeBoxCox 25 0.885 0.831 0.810
50 0.895 0.840 0.832
100 0.894 0.839 0.830
\codeColr 25 0.907 0.888 0.867
50 0.901 0.897 0.887
100 0.877 0.896 0.889
\codeCoxph 25 0.557 0.532 0.855
50 0.355 0.312 0.878
100 0.119 0.100 0.871
Table 3: Observed coverage probabilities of nominal 90% score confidence intervals for the AUC. Score intervals were calculated for δ\delta from the proposed linear transformation models \codeBoxCox, \codeColr and \codeCoxph. The score intervals for the AUC were then calculated by transforming the intervals for δ\delta using the closed-form expressions given in Table 1.

4 Application

The prevalence of obesity has increased consistently for most countries in the recent decade and this trend is a serious global health concern (Abarca-Gómez et al., 2017). Obesity contributes directly to increased risk of cardiovascular disease (CVD) and its risk factors, including type 2 diabetes, hypertension and dyslipidemia (Zalesin et al., 2008; Grundy, 2004). Metabolic syndrome (MetS) refers to the joint presence of several cardiovascular risk factors and is characterized by insulin resistance (Eckel et al., 2010). The National Cholesterol Education Program Adult Treatment Panel III (NCEP-ATP III) criteria is the most widely used definition for MetS, but it requires laboratory analysis of a blood sample. This has led to the search for non-invasive techniques which allow reliable and early detection of MetS.

Waist-to-height ratio (WHtR) is a well-known anthropometric index used to predict visceral obesity. Visceral obesity is an independent risk factor for development of MetS by means of the increased production of free fatty acids whose presence obstructs insulin activity (Bosello and Zamboni, 2000). This suggests that higher values of WHtR, reflecting obesity and CVD risk factors, are more indicative of incident MetS. Several studies have found that WHtR is highly predictive of MetS (Shao et al., 2010; Romero-Saldaña et al., 2016; Suliga et al., 2019). However, as waist circumference changes with age and gender (Stevens et al., 2010), it is also important to study whether or not the performance of WHtR at diagnosing MetS is impacted by these variables. Evaluation of WHtR as a predictor of MetS after adjusting for covariates is necessary so that more tailored interventions can be initiated to improve outcomes.

We illustrate the use of our methods to data from a cross sectional study designed to validate the use of WHtR and systolic blood pressure (SBP) as markers for early detection of MetS in a working population from the Balearic Islands (Spain). Detailed descriptions of the study methodology and population characteristics are reported in Romero-Saldaña et al. (2018). Briefly, data on 60 799 workers were collected during their work health periodic assessments between 2012 and 2016. Presence of MetS was determined by the NCEP-ATP III criteria and the sample consisted of 5 487 workers with MetS.

4.1 Two-sample analysis

FZF_{Z} δ\delta AUC JJ
Φ\Phi 1.492 (1.462, 1.521) 0.854 (0.849, 0.859) 0.544 (0.535, 0.553)
expit\operatorname{expit} 2.785 (2.730, 2.841) 0.871 (0.866, 0.875) 0.602 (0.593, 0.611)
cloglog1\text{cloglog}^{-1} 1.186 (1.157, 1.215) 0.766 (0.761, 0.771) 0.412 (0.403, 0.421)
loglog1\text{loglog}^{-1} 1.425 (1.397, 1.453) 0.806 (0.802, 0.810) 0.484 (0.475, 0.492)
Table 4: Estimates and 95% score confidence intervals of the shift paramater, AUC and JJ in the two-sample linear transformation model for WHtR as a marker of MetS.

We first examined the unconditional performance of WHtR as a marker to diagnose MetS, denoted YY and DD, respectively. We fit a linear transformation model which leads to the ROC curve of the form in Equation 4, where δ\delta is the shift parameter, for various choices of the inverse link function FZF_{Z}. Associated inference of the AUC and JJ are calculated using the closed-form expressions from Table 1. The resulting estimates are presented in Table 4. The AUCs are consistently bounded away from 0.5 indicating a good capacity of WHtR to discriminate between workers with and without MetS. This can also be seen from the estimated ROC curve plotted in Figure 2 which lies well above the diagonal line as well as the modeled densities which have a small degree of overlap. The confidence intervals and uniform confidence bands are quite small due to the large sample size.

Refer to caption
Figure 2: Estimates from the linear transformation model with a single shift parameter, h(Y)=δd+Zh(Y)=\delta d+Z where ZZ is chosen to be a standard logistic distribution. (A) Density functions of WHtR for the workers who were diagnosed with MetS (dotted line) and those who were not (solid line). (B) ROC curve for WHtR as a marker of MetS with 95% uniform score confidence bands are represented by gray shaded areas.

4.2 Conditional ROC analysis

Next, we investigated if the discriminatory ability of WHtR in separating workers with and without MetS varies with covariates. We considered a transformation model that included the main effects of covariates plus interaction terms with the disease indicator, which leads to the ROC curve given by

ROC(p𝒙)=1FZ(FZ1(1p)(δ+γ𝒙))\displaystyle\text{ROC}(p\mid\text{\boldmath$x$})=1-F_{Z}(F_{Z}^{-1}(1-p)-(\delta+\gamma^{\top}\text{\boldmath$x$}))

where the covariates 𝒙x included age, gender and tobacco consumption and the inverse link function FZ=logit1F_{Z}=\operatorname{logit}^{-1}. Figure 3 displays the covariate-specific ROC curves fitted to these data. The performance of WHtR appeared to be better for females compared to males and decreased with age. The effect of smoking, although significant in the model, does not seem to substantially alter the ROC curves given the other covariates are kept fixed. To inspect the covariate effect further, we calculated the age- and gender-specific AUCs and Youden indices from the model. Figure 4 clearly shows that the discriminatory capabilities of WHtR in distinguishing workers with MetS is consistently better for females and decreases with age.

Refer to caption
Figure 3: Estimated covariate-specific ROC curves for WHtR as a marker of MetS for female (solid line) and male workers (dashed line). Vertical panels represent a specific age (30, 40, 50) and horizontal panels smoking status.
Refer to caption
Figure 4: Age-based AUC and Youden indices where WHtR is used as a marker to detect MetS for non-smoking female (solid line) and male (dashed line) workers. 95% Wald pointwise confidence bands are represented by gray shaded areas.

4.3 Comparing correlated biomarkers

Type WHtR SBP Difference
δj\delta_{j} 1.874 (1.811, 1.945) 1.214 (1.151, 1.273) 0.660 (0.578, 0.750)
AUC 0.907 (0.900, 0.915) 0.805 (0.792, 0.816) 0.103 (0.090, 0.117)
JJ 0.651 (0.635, 0.669) 0.456 (0.435, 0.476) 0.195 (0.171, 0.222)
Table 5: Marginal estimates and 95% Wald confidence intervals of the shift paramater, AUC and Youden index from the two-sample bivariate normal transformation model for WHtR and SBP as markers of MetS for a 40-year non-smoking female worker. Wald confidence intervals are calculated by simulating from the asymptotic distribution of the MLE.
Refer to caption
Figure 5: Estimated marginal ROC curves for WHtR (solid line) and SBP (dot dashed line) as markers of MetS for a 40-year old non-smoking female worker. 95% uniform confidence bands are represented by gray shaded areas.

Finally, we compare the performance of WHtR to SBP, explicitly accounting for the within-patient clustered nature of these measurements. That is, both markers are measured on the same subject and hence their measurements are correlated. We consider a transformation model for the joint distribution of 𝒀=(WHtR,SBP)\text{\boldmath$Y$}=(\text{WHtR},\text{SBP}) that induces a marginal ROC curve for the jjth marker, given by

ROCj(p𝒙)=1FZ(FZ1(1p)(δj+γj𝒙))for j{WHtR, SBP},\displaystyle\text{ROC}_{j}(p\mid\text{\boldmath$x$})=1-F_{Z}(F_{Z}^{-1}(1-p)-(\delta_{j}+\gamma_{j}^{\top}\text{\boldmath$x$}))\quad\text{for }j\in\{\text{WHtR, SBP}\},

where δj\delta_{j} is the marginal shift parameter, γj\gamma_{j} are the marginal interaction coefficients and FZ=probit1F_{Z}=\operatorname{probit}^{-1}. Figure 5 shows the corresponding estimated marginal ROC curves for the two markers for a 40-year old non-smoking female worker. The uniform confidence band is generated by the simulation procedure detailed in Section 2.4.3. Whilst both markers perform well at predicting MetS, WHtR seems to be favorable with a ROC curve uniformly above SBP. We tested the hypothesis for equality of ROC curves H0:ROC1(p)=ROC2(p)H_{0}:\text{ROC}_{1}(p)=\text{ROC}_{2}(p) for all p[0,1]p\in[0,1], using the marginal coefficients from the joint model. This hypothesis test is equivalent to testing if the difference in the AUCs or Youden indices of the two markers is equal to 0. The estimates, differences and corresponding confidence intervals are detailed in Table 5. The difference of 0.103 between AUCs is significant (pp value <0.001<0.001) and indicates that the WHtR is better at discriminating incident MetS compared to SBP.

5 Discussion

This article presents a new modeling framework for ROC analysis that can be used to characterize the accuracy of medical diagnostic tests. Our model is based on estimating an unknown transformation function for the test results and yields a distribution-free yet model-based estimator for the ROC curve. Covariates that influence the diagnostic accuracy of tests can naturally be accommodated as regression parameters into the model and covariate-specific summary indices such as the AUC and Youden index are easily computed using closed-form expressions. We extend the same conceptual transformation framework to a multivariate setting for comparing correlated diagnostic tests after appropriately accounting for their dependency structure.

Our proposed approach has several features which distinguish it from contemporary methods of ROC analysis. Firstly, we employ maximum likelihood to jointly estimate all parameters defining the transformation function and regression coefficients. This implies the variation in the estimated transformation parameters is accounted for and appropriately propagated for the inference of the ROC curve. In turn, asymptotic efficiency is guaranteed for our estimators and we avoid reliance on resampling procedures for the construction of confidence intervals. Secondly, transformation models focus on estimating the conditional distribution function whose evaluation directly provides the likelihood contributions for interval, right- and left-censored data that commonly arises due to instrument detection limits. Additionally, with a simple reparameterization of the transformation function, our model extends to ROC curves for ordinal test results. Thirdly, no strong assumptions are made regarding the transformation function which results in a highly flexible model that retains interpretability of the regression coefficients. Fourthly, we develop a test for covariate-specific correlated ROC curves, a problem which has yet to be addressed in the literature. Lastly, software implementations for all the methods described in this article are available in the \pkgtram \proglangR package (see Supplementary Material for example code), thus enabling a unified framework for ROC analysis.

One aspect that warrants further investigation is model selection, specifically with regards to the choice of FZF_{Z}. In general, this may depend on the aim of the study and distinct features of the data. Our simulation explored the extent to which the bias and coverage of the resulting estimates were affected by model choice. We found that a model with FZ=logit1F_{Z}=\operatorname{logit}^{-1} provided accurate results even when it was misspecified for the true data generating process. This model also behaves very similarly to the semiparametric cumulative probability model (Tian et al., 2020), also known as a proportional odds ordinal logistic model, both which estimate a log-odds ratio δ\delta. The equivalence of the transformation model’s odds ratio to the MWW test statistic has been well studied (Wang and Tian, 2017). The MWW statistic has a bounded influence function and is robust to contaminations of the specified model (Hampel, 1974). Due to their equivalence, we hypothesize that the transformation model with FZ=logit1F_{Z}=\operatorname{logit}^{-1} is also endowed with the same robustness properties as the MWW and can be chosen when no a priori model is known. However, further research is needed to develop suitable model verification and selection methods.

Our proposed estimators assume that the difference between the diseased and nondiseased distributions is described by a shift-term, μd\mu_{d} on the scale of FZF_{Z}. A relaxation of this assumption would allow for separate transformation functions in each of the two groups. Namely, consider a model where the nondiseased results follow a distribution with the CDF F0(y)=FZ(h0(y))F_{0}(y)=F_{Z}(h_{0}(y)) and the diseased with the CDF F1(y)=FZ(h1(y))F_{1}(y)=F_{Z}(h_{1}(y)). Defining a new transformation function r=h1h01FZ1:[0,1]r=h_{1}\circ h_{0}^{-1}\circ F_{Z}^{-1}:[0,1]\mapsto\mathbb{R}, the smooth ROC curve with no shift assumptions is given by ROC(p)=1FZ(r(1p))\text{ROC}(p)=1-F_{Z}(r(1-p)). This model has more flexibility but sacrifices the properness property desirable for ROC curves. Furthermore, care has to be taken in defining the correct likelihood contributions for accurate inference of this model as uncertainty enters from both transformation functions.

In future work we plan to pursue various extensions of transformation models for ROC analysis to consider (1) penalty terms for high-dimensional covariates (Kook and Hothorn, 2021), (2) mixed effects for clustered observations (Tamási and Hothorn, 2021), and (3) covariate-dependent transformation functions through forest-based machine learning methods (Hothorn and Zeileis, 2021).

Funding

Financial support by Swiss National Science Foundation, grant number 200021_184603, is gratefully acknowledged.

References

  • Abarca-Gómez et al. (2017) Abarca-Gómez L, Abdeen ZA, Hamid ZA, Abu-Rmeileh NM, Acosta-Cazares B, Acuin C, Adams RJ, Aekplakorn W, Afsana K, Aguilar-Salinas CA, et al. (2017). “Worldwide trends in body-mass index, underweight, overweight, and obesity from 1975 to 2016: a pooled analysis of 2416 population-based measurement studies in 128· 9 million children, adolescents, and adults.” The Lancet, 390(10113), 2627–2642.
  • Aitchison and Silvey (1958) Aitchison J, Silvey S (1958). “Maximum-likelihood estimation of parameters subject to restraints.” The Annals of Mathematical Statistics, pp. 813–828.
  • Alonzo and Pepe (2002) Alonzo TA, Pepe MS (2002). “Distribution-free ROC analysis using binary regression techniques.” Biostatistics, 3(3), 421–432.
  • American College of Radiology (2013) (ACR) American College of Radiology (ACR) (2013). ACR BI-RADS atlas: Breast Imaging Reporting and Data System. 5th edition. American College of Radiology.
  • Bamber (1975) Bamber D (1975). “The area above the ordinal dominance graph and the area below the receiver operating characteristic graph.” Journal of Mathematical Psychology, 12(4), 387–415.
  • Bandos et al. (2005) Bandos AI, Rockette HE, Gur D (2005). “A permutation test sensitive to differences in areas for comparing ROC curves from a paired design.” Statistics in Medicine, 24(18), 2873–2893.
  • Bandos et al. (2006) Bandos AI, Rockette HE, Gur D (2006). “A permutation test for comparing ROC curves in multireader studies: a multi-reader ROC, permutation test.” Academic Radiology, 13(4), 414–420.
  • Bantis and Feng (2016) Bantis LE, Feng Z (2016). “Comparison of two correlated ROC curves at a given specificity or sensitivity level.” Statistics in Medicine, 35(24), 4352–4367.
  • Bantis et al. (2017) Bantis LE, Yan Q, Tsimikas JV, Feng Z (2017). “Estimation of smooth ROC curves for biomarkers with limits of detection.” Statistics in Medicine, 36(24), 3830–3843.
  • Barbanti et al. (2022) Barbanti L, Brandl R, Hothorn T (2022). “Multi-species count transformation models.” arXiv:2201.13095 [stat.ME], URL https://arxiv.org/abs/2201.13095.
  • Bianco et al. (2020) Bianco AM, Boente G, Gonzalez-Manteiga W (2020). “A robust approach for ROC curves with covariates.” arXiv:2007.00150 [stat.ME], URL https://arxiv.org/abs/2007.00150.
  • Bickel (1986) Bickel P (1986). “Efficient Testing in a Class of Transformation Models.” In Proceedings of the 45th Session of the International Statistical Institute, volume 23, pp. 63–23.
  • Bickel and Doksum (1981) Bickel PJ, Doksum KA (1981). “An analysis of transformations revisited.” Journal of the American Statistical Association, 76(374), 296–311.
  • Bickel et al. (1993) Bickel PJ, Klaassen CA, Bickel PJ, Ritov Y, Klaassen J, Wellner JA, Ritov Y (1993). Efficient and adaptive estimation for semiparametric models, volume 4. Johns Hopkins University Press Baltimore.
  • Bosello and Zamboni (2000) Bosello O, Zamboni M (2000). “Visceral obesity and metabolic syndrome.” Obesity Reviews, 1(1), 47–56.
  • Box and Cox (1964) Box GE, Cox DR (1964). “An analysis of transformations.” Journal of the Royal Statistical Society: Series B (Methodological), 26(2), 211–243.
  • Branscum et al. (2008) Branscum AJ, Johnson WO, Hanson TE, Gardner IA (2008). “Bayesian semiparametric ROC curve estimation and disease diagnosis.” Statistics in Medicine, 27(13), 2474–2496.
  • Buschke et al. (1999) Buschke H, Kuslansky G, Katz M, Stewart WF, Sliwinski MJ, Eckholdt HM, Lipton RB (1999). “Screening for dementia with the memory impairment screen.” Neurology, 52(2), 231–231.
  • Cai and Moskowitz (2004) Cai T, Moskowitz CS (2004). “Semi-parametric estimation of the binormal ROC curve for a continuous diagnostic test.” Biostatistics, 5(4), 573–586.
  • Cai and Pepe (2002) Cai T, Pepe MS (2002). “Semiparametric receiver operating characteristic analysis to evaluate biomarkers for disease.” Journal of the American Statistical Association, 97(460), 1099–1107.
  • Dagenais and Dufour (1991) Dagenais MG, Dufour JM (1991). “Invariance, nonlinear models, and asymptotic tests.” Econometrica: Journal of the Econometric Society, 59(6), 1601–1615.
  • De Neve et al. (2019) De Neve J, Thas O, Gerds TA (2019). “Semiparametric linear transformation models: Effect measures, estimators, and applications.” Statistics in Medicine, 38(8), 1484–1501.
  • DeLong et al. (1988) DeLong ER, DeLong DM, Clarke-Pearson DL (1988). “Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.” Biometrics, 44(3), 837–845.
  • Demidenko (2012) Demidenko E (2012). “Confidence intervals and bands for the binormal ROC curve revisited.” Journal of Applied Statistics, 39(1), 67–79.
  • Dodd and Pepe (2003) Dodd LE, Pepe MS (2003). “Partial AUC estimation and regression.” Biometrics, 59(3), 614–623.
  • Dorfman and Alf Jr (1969) Dorfman DD, Alf Jr E (1969). “Maximum-likelihood estimation of parameters of signal-detection theory and determination of confidence intervals-rating-method data.” Journal of Mathematical Psychology, 6(3), 487–496.
  • Dorfman et al. (1997) Dorfman DD, Berbaum KS, Metz CE, Lenth RV, Hanley JA, Dagga HA (1997). “Proper receiver operating characteristic analysis: the bigamma model.” Academic Radiology, 4(2), 138–149.
  • Eckel et al. (2010) Eckel RH, Alberti KG, Grundy SM, Zimmet PZ (2010). “The metabolic syndrome.” The Lancet, 375(9710), 181–183.
  • Egan (1975) Egan JP (1975). Signal Detection Theory and ROC-Analysis. Academic Press.
  • Erkanli et al. (2006) Erkanli A, Sung M, Jane Costello E, Angold A (2006). “Bayesian semi-parametric ROC analysis.” Statistics in Medicine, 25(22), 3905–3928.
  • Faraggi (2003) Faraggi D (2003). “Adjusting receiver operating characteristic curves and related indices for covariates.” Journal of the Royal Statistical Society: Series D (The Statistician), 52(2), 179–192.
  • Farouki (2012) Farouki RT (2012). “The Bernstein polynomial basis: A centennial retrospective.” Computer Aided Geometric Design, 29(6), 379–419.
  • Fay (2022) Fay MP (2022). \pkgasht: Applied Statistical Hypothesis Tests. \proglangR package version 0.9.7, URL https://CRAN.R-project.org/package=asht.
  • Feller (1991) Feller W (1991). An Introduction to Probability Theory and Its Applications. Wiley, New York.
  • Feng et al. (2020) Feng D, Manevski D, Perme MP (2020). \pkgauRoc: Various Methods to Estimate the AUC. \proglangR package version 0.2-1, URL https://CRAN.R-project.org/package=auRoc.
  • Franco-Pereira et al. (2021) Franco-Pereira AM, Nakas CT, Reiser B, Carmen Pardo M (2021). “Inference on the overlap coefficient: The binormal approach and alternatives.” Statistical Methods in Medical Research, 30(12), 2672–2684.
  • Gail and Green (1976) Gail MH, Green SB (1976). “A generalization of the one-sided two-sample Kolmogorov-Smirnov statistic for evaluating diagnostic tests.” Biometrics, pp. 561–570.
  • Ghosal et al. (2022) Ghosal S, Grantz KL, Chen Z (2022). “Estimation of multiple ordered ROC curves using placement values.” Statistical Methods in Medical Research, pp. 1470–1483.
  • Gönen and Heller (2010) Gönen M, Heller G (2010). “Lehmann family of ROC curves.” Medical Decision Making, 30(4), 509–517.
  • González-Manteiga et al. (2011) González-Manteiga W, Pardo-Fernández JC, van Keilegom I (2011). “ROC curves in non-parametric location-scale regression models.” Scandinavian Journal of Statistics, 38(1), 169–184.
  • Green and Swets (1966) Green DM, Swets JA (1966). Signal Detection Theory and Psychophysics, volume 1. Wiley New York.
  • Grundy (2004) Grundy SM (2004). “Obesity, metabolic syndrome, and cardiovascular disease.” The Journal of Clinical Endocrinology & Metabolism, 89(6), 2595–2600.
  • Hampel (1974) Hampel FR (1974). “The influence curve and its role in robust estimation.” Journal of the American Statistical Association, 69(346), 383–393.
  • Hanley and McNeil (1982) Hanley JA, McNeil BJ (1982). “The meaning and use of the area under a receiver operating characteristic (ROC) curve.” Radiology, 143(1), 29–36.
  • Hanley and McNeil (1983) Hanley JA, McNeil BJ (1983). “A method of comparing the areas under receiver operating characteristic curves derived from the same cases.” Radiology, 148(3), 839–843.
  • Harrell Jr (2001) Harrell Jr FE (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis, volume 608. Springer.
  • Harrell Jr (2022) Harrell Jr FE (2022). \pkgrms: Regression Modeling Strategies. \proglangR package version 6.3-0, URL https://CRAN.R-project.org/package=rms.
  • Hothorn (2020) Hothorn T (2020). “Most likely transformations: The \pkgmlt package.” Journal of Statistical Software, 92, 1–68.
  • Hothorn et al. (2014) Hothorn T, Kneib T, Bühlmann P (2014). “Conditional transformation models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1), 3–27.
  • Hothorn et al. (2018) Hothorn T, Möst L, Bühlmann P (2018). “Most likely transformations.” Scandinavian Journal of Statistics, 45(1), 110–134.
  • Hothorn and Zeileis (2021) Hothorn T, Zeileis A (2021). “Predictive distribution modeling using transformation forests.” Journal of Computational and Graphical Statistics, 30(4), 1181–1196.
  • Hsieh and Turnbull (1996) Hsieh F, Turnbull BW (1996). “Nonparametric and semiparametric estimation of the receiver operating characteristic curve.” The Annals of Statistics, 24(1), 25–40.
  • Inácio et al. (2013) Inácio V, Jara A, Hanson TE, de Carvalho M (2013). “Bayesian nonparametric ROC regression modeling.” Bayesian Analysis, 8(3), 623–646.
  • Inácio et al. (2021a) Inácio V, M Lourenço V, de Carvalho M, Parker RA, Gnanapragasam V (2021a). “Robust and flexible inference for the covariate-specific receiver operating characteristic curve.” Statistics in Medicine, 40(26), 5779–5795.
  • Inácio et al. (2021b) Inácio V, Rodríguez-Álvarez MX, Gayoso-Diz P (2021b). “Statistical evaluation of medical tests.” Annual Review of Statistics and Its Application, 8, 41–67.
  • Jensen et al. (2000) Jensen K, Müller HH, Schäfer H (2000). “Regional confidence bands for ROC curves.” Statistics in Medicine, 19(4), 493–509.
  • Jiang et al. (1996) Jiang Y, Metz CE, Nishikawa RM (1996). “A receiver operating characteristic partial area index for highly sensitive diagnostic tests.” Radiology, 201(3), 745–750.
  • Khan and Brandenburger (2020) Khan MRA, Brandenburger T (2020). \pkgROCit: Performance Assessment of Binary Classifier with Visualization. \proglangR package version 2.1.1, URL https://CRAN.R-project.org/package=ROCit.
  • Khan (2022) Khan RA (2022). “Resilience family of receiver operating characteristic curves.” arXiv:2203.13665 [stat.ME], [math.ST], URL https://arxiv.org/abs/2203.13665.
  • Klein et al. (2022) Klein N, Hothorn T, Barbanti L, Kneib T (2022). “Multivariate conditional transformation models.” Scandinavian Journal of Statistics, 49(1), 116–142.
  • Komaba et al. (2022) Komaba A, Johno H, Nakamoto K (2022). “A novel statistical approach for two-sample testing based on the overlap coefficient.” arXiv:2206.03166 [math.ST], URL https://arxiv.org/abs/2206.03166.
  • Konietschke et al. (2015) Konietschke F, Placzek M, Schaarschmidt F, Hothorn LA (2015). \pkgnparcomp: An \proglangR Software Package for Nonparametric Multiple Comparisons and Simultaneous Confidence Intervals.” Journal of Statistical Software, 64(9), 1–17. URL http://www.jstatsoft.org/v64/i09/.
  • Kook and Hothorn (2021) Kook L, Hothorn T (2021). “Regularized Transformation Models: The \pkgtramnet Package.” R Journal, 13(1), 581.
  • Lindsey et al. (1996) Lindsey JK, et al. (1996). Parametric Statistical Inference. Oxford University Press.
  • Lynn (2001) Lynn HS (2001). “Maximum likelihood inference for left-censored HIV RNA data.” Statistics in Medicine, 20(1), 33–45.
  • Mandel (2013) Mandel M (2013). “Simulation-based confidence intervals for functions with complicated derivatives.” The American Statistician, 67(2), 76–81.
  • Martínez-Camblor (2022) Martínez-Camblor P (2022). “About the use of the overlap coefficient in the binary classification context.” Communications in Statistics-Theory and Methods, pp. 1–11.
  • Martínez-Camblor et al. (2018) Martínez-Camblor P, Pérez-Fernández S, Corral N (2018). “Efficient nonparametric confidence bands for receiver operating-characteristic curves.” Statistical Methods in Medical Research, 27(6), 1892–1908.
  • McClish (1989) McClish DK (1989). “Analyzing a portion of the ROC curve.” Medical Decision Making, 9(3), 190–195.
  • McIntosh and Pepe (2002) McIntosh MW, Pepe MS (2002). “Combining several screening tests: optimality of the risk score.” Biometrics, 58(3), 657–664.
  • Molodianovitch et al. (2006) Molodianovitch K, Faraggi D, Reiser B (2006). “Comparing the areas under two correlated ROC curves: parametric and non-parametric approaches.” Biometrical Journal, 48(5), 745–757.
  • Mumford et al. (2006) Mumford SL, Schisterman EF, Vexler A, Liu A (2006). “Pooling biospecimens and limits of detection: effects on ROC curve analysis.” Biostatistics, 7(4), 585–598.
  • Ogilvie and Creelman (1968) Ogilvie JC, Creelman CD (1968). “Maximum-likelihood estimation of receiver operating characteristic curve parameters.” Journal of Mathematical Psychology, 5(3), 377–391.
  • Pan and Metz (1997) Pan X, Metz CE (1997). “The “proper” binormal model: parametric receiver operating characteristic curve estimation with degenerate data.” Academic Radiology, 4(5), 380–389.
  • Pepe (1997) Pepe MS (1997). “A regression modelling framework for receiver operating characteristic curves in medical diagnostic testing.” Biometrika, 84(3), 595–608.
  • Pepe (1998) Pepe MS (1998). “Three approaches to regression analysis of receiver operating characteristic curves for continuous test results.” Biometrics, pp. 124–135.
  • Pepe (2000) Pepe MS (2000). “An interpretation for the ROC curve and inference using GLM procedures.” Biometrics, 56(2), 352–359.
  • Pepe (2003) Pepe MS (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Medicine.
  • Pérez Fernández et al. (2018) Pérez Fernández S, Martínez Camblor P, Filzmoser P, Corral Blanco NO, et al. (2018). \pkgnsROC: an \proglangR package for non-standard ROC curve analysis.” The \proglangR Journal, 10 (2).
  • Perez-Jaume et al. (2017) Perez-Jaume S, Skaltsa K, Pallarès N, Carrasco JL (2017). \pkgThresholdROC: Optimum Threshold Estimation Tools for Continuous Diagnostic Tests in \proglangR.” Journal of Statistical Software, 82(4), 1–21. 10.18637/jss.v082.i04.
  • Perkins et al. (2007) Perkins NJ, Schisterman EF, Vexler A (2007). “Receiver operating characteristic curve inference from a sample with a limit of detection.” American Journal of Epidemiology, 165(3), 325–333.
  • Perkins et al. (2009) Perkins NJ, Schisterman EF, Vexler A (2009). “Generalized ROC curve inference for a biomarker subject to a limit of detection and measurement error.” Statistics in Medicine, 28(13), 1841–1860.
  • Rao (1948) Rao CR (1948). “Large sample tests of statistical hypotheses concerning several parameters with applications to problems of estimation.” In Mathematical Proceedings of the Cambridge Philosophical Society, volume 44, pp. 50–57. Cambridge University Press.
  • Rao (2005) Rao CR (2005). “Score Test: Historical Review and Recent Developments.” In Advances in Ranking and Selection, Multiple Comparisons, and Reliability, pp. 8–20. Birkhäuser, Boston, U.S.A.
  • Ridout and Linkie (2009) Ridout M, Linkie M (2009). “Estimating overlap of daily activity patterns from camera trap data.” Journal of Agricultural, Biological, and Environmental Statistics, 14(3), 322–337.
  • Robin et al. (2011) Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, Müller M (2011). \pkgpROC: an open-source package for \proglangR and \proglangS+ to analyze and compare ROC curves.” BMC Bioinformatics, 12, 77.
  • Rodríguez-Álvarez et al. (2011) Rodríguez-Álvarez MX, Roca-Pardiñas J, Cadarso-Suárez C (2011). “ROC curve and covariates: extending induced methodology to the non-parametric framework.” Statistics and Computing, 21(4), 483–499.
  • Romero-Saldaña et al. (2016) Romero-Saldaña M, Fuentes-Jiménez FJ, Vaquero-Abellán M, Álvarez-Fernández C, Molina-Recio G, López-Miranda J (2016). “New non-invasive method for early detection of metabolic syndrome in the working population.” European Journal of Cardiovascular Nursing, 15(7), 549–558.
  • Romero-Saldaña et al. (2018) Romero-Saldaña M, Tauler P, Vaquero-Abellán M, López-González AA, Fuentes-Jiménez FJ, Aguiló A, Álvarez-Fernández C, Molina-Recio G, Bennasar-Veny M (2018). “Validation of a non-invasive method for the early detection of metabolic syndrome: a diagnostic accuracy test in a working population.” BMJ Open, 8(10), e020476.
  • Ruopp et al. (2008) Ruopp MD, Perkins NJ, Whitcomb BW, Schisterman EF (2008). “Youden Index and optimal cut-point estimated from observations affected by a lower limit of detection.” Biometrical Journal, 50(3), 419–430.
  • Schisterman et al. (2004) Schisterman EF, Faraggi D, Reiser B (2004). “Adjusting the generalized ROC curve for covariates.” Statistics in Medicine, 23(21), 3319–3331.
  • Schmid and Schmidt (2006) Schmid F, Schmidt A (2006). “Nonparametric estimation of the coefficient of overlapping—theory and empirical application.” Computational Statistics & Data Analysis, 50(6), 1583–1596.
  • Shao et al. (2010) Shao J, Yu L, Shen X, Li D, Wang K (2010). “Waist-to-height ratio, an optimal predictor for obesity and metabolic syndrome in Chinese adults.” The Journal of Nutrition, Health & Aging, 14(9), 782–785.
  • Siegfried et al. (2022) Siegfried S, Kook L, Hothorn T (2022). “Distribution-Free Location-Scale Regression.” arXiv:2208.05302 [stat.ME], URL https://arxiv.org/abs/2208.05302.
  • Silvey (1959) Silvey SD (1959). “The Lagrangian multiplier test.” The Annals of Mathematical Statistics, 30(2), 389–407.
  • Singh and Nocerino (2002) Singh A, Nocerino J (2002). “Robust estimation of mean and variance using environmental data sets with below detection limit observations.” Chemometrics and Intelligent Laboratory Systems, 60(1-2), 69–86.
  • Stevens et al. (2010) Stevens J, Katz EG, Huxley RR (2010). “Associations between gender, age and waist circumference.” European Journal of Clinical Nutrition, 64(1), 6–15.
  • Suliga et al. (2019) Suliga E, Ciesla E, Głuszek-Osuch M, Rogula T, Głuszek S, Kozieł D (2019). “The usefulness of anthropometric indices to identify the risk of metabolic syndrome.” Nutrients, 11(11), 2598.
  • Swets and Pickett (1982) Swets JA, Pickett RM (1982). Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. New York: Academic Press.
  • Tamási and Hothorn (2021) Tamási B, Hothorn T (2021). \pkgtramME: Mixed-Effects Transformation Models Using Template Model Builder.” The R Journal, 13(1), 581–594.
  • Thas et al. (2012) Thas O, Neve JD, Clement L, Ottoy JP (2012). “Probabilistic index models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(4), 623–671.
  • Therneau (2022) Therneau TM (2022). A Package for Survival Analysis in \proglangR. \proglangR package version 3.3-1, URL https://CRAN.R-project.org/package=survival.
  • Thompson and Zucchini (1989) Thompson ML, Zucchini W (1989). “On the statistical analysis of ROC curves.” Statistics in Medicine, 8(10), 1277–1290.
  • Tian et al. (2020) Tian Y, Hothorn T, Li C, Harrell Jr FE, Shepherd BE (2020). “An empirical comparison of two novel transformation models.” Statistics in Medicine, 39(5), 562–576.
  • Tosteson and Begg (1988) Tosteson ANA, Begg CB (1988). “A general regression methodology for ROC curve estimation.” Medical Decision Making, 8(3), 204–215.
  • Wang and Tian (2017) Wang Y, Tian L (2017). “The Equivalence Between Mann-Whitney Wilcoxon Test and Score Test Based on the Proportional Odds Model for Ordinal Responses.” In 4th International Conference on Industrial Economics System and Industrial Security Engineering (IEIS), pp. 1–5. IEEE, Kyoto, Japan. 10.1109/IEIS.2017.8078606.
  • Weitzman (1970) Weitzman MS (1970). Measures of Overlap of Income Distributions of White and Negro Families in the United States, volume 3. US Bureau of the Census.
  • Wieand et al. (1989) Wieand S, Gail MH, James BR, James KL (1989). “A family of nonparametric statistics for comparing diagnostic markers with paired or unpaired data.” Biometrika, 76(3), 585–592.
  • Xiong et al. (2022) Xiong C, Luo J, Agboola F, Grant E, Morris JC (2022). “A family of estimators to diagnostic accuracy when candidate tests are subject to detection limits—Application to diagnosing early stage Alzheimer disease.” Statistical Methods in Medical Research, 31(5), 882–898.
  • Yao et al. (2010) Yao F, Craiu RV, Reiser B (2010). “Nonparametric covariate adjustment for receiver operating characteristic curves.” Canadian Journal of Statistics, 38(1), 27–46.
  • Youden (1950) Youden WJ (1950). “Index for rating diagnostic tests.” Cancer, 3(1), 32–35.
  • Zalesin et al. (2008) Zalesin KC, Franklin BA, Miller WM, Peterson ED, McCullough PA (2008). “Impact of obesity on cardiovascular disease.” Endocrinology and Metabolism Clinics of North America, 37(3), 663–684.
  • Zou (1997) Zou KH (1997). Analysis of Some Transformation Models for the Two-sample Problem with Special Reference to Receiver Operating Characteristic Curves. Ph.D. thesis, University of Rochester.
  • Zou and Hall (2000) Zou KH, Hall W (2000). “Two transformation models for estimating an ROC curve derived from continuous data.” Journal of Applied Statistics, 27(5), 621–631.
  • Zou and Hall (2002) Zou KH, Hall W (2002). “Semiparametric and parametric transformation models for comparing diagnostic markers with paired design.” Journal of Applied Statistics, 29(6), 803–816.
  • Zou et al. (1997) Zou KH, Hall WJ, Shapiro DE (1997). “Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests.” Statistics in Medicine, 16(19), 2143–2156.
  • Zou et al. (2011) Zou KH, Liu A, Bandos AI, Ohno-Machado L, Rockette HE (2011). Statistical Evaluation of Diagnostic Performance: Topics in ROC analysis. CRC Press.
  • Zou et al. (1998) Zou KH, Tempany CM, Fielding JR, Silverman SG (1998). “Original smooth receiver operating characteristic curve estimation from continuous data: statistical methods for analyzing the predictive value of spiral CT of ureteral stones.” Academic Radiology, 5(10), 680–687.

Appendix A Simulation study

Reference \proglangR package ROC AUC Youden index
Estimate CB Estimate CI Estimate CI
Hothorn (2020) \pkgtram
Harrell Jr (2001, 2022) \pkgrms
Thas et al. (2012) \pkgpim
Therneau (2022) \pkgsurvival
Robin et al. (2011) \pkgpROC
Fay (2022) \pkgasht
Konietschke et al. (2015) \pkgnparcomp
Khan and Brandenburger (2020) \pkgROCit
Feng et al. (2020) \pkgauRoc
Perez-Jaume et al. (2017) \pkgThresholdROC
Ridout and Linkie (2009) \pkgoverlap
Franco-Pereira et al. (2021) -
Pérez Fernández et al. (2018) \pkgnsROC
Table S1: Overview of the different methods used in the simulation study. References to the original publication along with \proglangR software details are given. The (✓) indicates if a method computes the specific metric. The metrics included estimates for the ROC curve, AUC and Youden index as well as corresponding confidence bands or confidence intervals.
Refer to caption
Figure S1: Empirical coverage probabilities of 90% confidence intervals for the AUC from the different methods used in the simulation study. The methods are denoted by their \proglangR package name from Table S1 followed by a short abbreviation to uniquely identify methods with multiple procedures. These included “pbt” (probit1\operatorname{probit}^{-1}), “lgt” (logit1\operatorname{logit}^{-1}), “clg” (cloglog1\operatorname{cloglog}^{-1}), “asy” (asymptotic), “boot” (bootstrap) and “bn” (binormal).
Refer to caption
Figure S2: Average width of 90% confidence intervals for the AUC from the different methods used in the simulation study. The methods are denoted by their \proglangR package name from Table S1 followed by a short abbreviation to uniquely identify methods with multiple procedures. These included “pbt” (probit1\operatorname{probit}^{-1}), “lgt” (logit1\operatorname{logit}^{-1}), “clg” (cloglog1\operatorname{cloglog}^{-1}), “asy” (asymptotic), “boot” (bootstrap) and “bn” (binormal).
Refer to caption
Figure S3: Empirical coverage probabilities of 90% confidence intervals for the Youden index from the different methods used in the simulation study. The methods are denoted by their \proglangR package name from Table S1 followed by a short abbreviation to uniquely identify methods with multiple procedures. These included “pbt” (probit1\operatorname{probit}^{-1}), “lgt” (logit1\operatorname{logit}^{-1}), “clg” (cloglog1\operatorname{cloglog}^{-1}), “norm” (bootstrap normal approximation interval), “basic” (basic bootstrap interval), “perc” (bootstrap percentile interval), “dta” (delta method), “bc” (BoxCox transformation), “boot” (bootstrap variance), “bn eq” (binormal equal variance) and “bn uneq” (binormal unequal variance).
Refer to caption
Figure S4: Average width of 90% confidence intervals for the AUC from the different methods used in the simulation study. The methods are denoted by their \proglangR package name from Table S1 followed by a short abbreviation to uniquely identify methods with multiple procedures. These included “pbt” (probit1\operatorname{probit}^{-1}), “lgt” (logit1\operatorname{logit}^{-1}), “clg” (cloglog1\operatorname{cloglog}^{-1}), “norm” (bootstrap normal approximation interval), “basic” (basic bootstrap interval), “perc” (bootstrap percentile interval), “dta” (delta method), “bc” (BoxCox transformation), “boot” (bootstrap variance), “bn eq” (binormal equal variance) and “bn uneq” (binormal unequal variance).
Refer to caption
Figure S5: Empirical coverage probabilities of 90% confidence bands for the ROC curve from the different methods used in the simulation study. The methods are denoted by their \proglangR package name from Table S1 followed by a short abbreviation to uniquely identify methods with multiple procedures. These included “pbt” (probit1\operatorname{probit}^{-1}), “lgt” (logit1\operatorname{logit}^{-1}), “clg” (cloglog1\operatorname{cloglog}^{-1}), “boot” (empirical bootstrap), “bn” (binormal), “psn” (Martínez-Camblor et al., 2018), “dek” (Demidenko, 2012) and “jms” (Jensen et al., 2000).
Refer to caption
Figure S6: Average areas of 90% confidence bands for the ROC curve from the different methods used in the simulation study. The methods are denoted by their \proglangR package name from Table S1 followed by a short abbreviation to uniquely identify methods with multiple procedures. These included “pbt” (probit1\operatorname{probit}^{-1}), “lgt” (logit1\operatorname{logit}^{-1}), “clg” (cloglog1\operatorname{cloglog}^{-1}), “boot” (empirical bootstrap), “bn” (binormal), “psn” (Martínez-Camblor et al., 2018), “dek” (Demidenko, 2012) and “jms” (Jensen et al., 2000).

Appendix B Computational details

This section provides \proglangR code for fitting transformation models and conducting ROC analysis for the metabolic syndrome (MetS) application detailed in the paper. We first consider the unconditional performance of waist-to-height ratio (WHtR) as a diagnostic test of MetS. Next, we examined if the discriminatory ability of WHtR varies with covariates. Finally, we compare the performance of an alternative diagnostic test for MetS, systolic blood pressure (SBP). The structure of the code follows the tables and figures presented in the main text.

B.1 Two-sample analysis

\MakeFramed
library("tram")
library("rdryad")

### MetS data; doi: 10.5061/dryad.cb51t54
dd <- dryad_files_download(79303)
dat <- read.csv(dd[[1]])
dat$D <- dat$MetS
dat$Y <- dat$WHtR
supp <- with(dat, c(min(Y), max(Y)))
nd <- data.frame(D = c(0,1))
ndD1 <- nd[2,, drop = FALSE]

### Two-sample ROC analysis
### Model with F_Z = logit^{-1}
(m_lgt <- Colr(Y ~ D, data = dat, support = supp))

### ROC curve with 95 percent Wald CI
r <- ROC(m_lgt, newdata = ndD1, conf.level = 0.95)
plot(r)
### Score test
(sc <- score_test(m_lgt, level = 0.95))

### Summary indices and Wald CIs
PI(m_lgt, newdata = ndD1, conf.level = 0.95)
TV(m_lgt, newdata = ndD1, conf.level = 0.95)
OVL(m_lgt, newdata = ndD1, conf.level = 0.95)

### Score CIs for summary indices
PI(sc$conf.int, link = "logistic")
TV(rev(sc$conf.int), link = "logistic")
OVL(sc$conf.int, link = "logistic")

### Modeled densities
grd <- seq(supp[1], supp[2], by=0.001)
dens <- predict(m_lgt, nd, type = "density", q=grd)
plot(grd, dens[,1], type = "l", ylim = c(0, 10),
     xlab = "Waist-to-height ratio (WHtR) cm/cm",
     ylab = "Density")
lines(grd, dens[,2], lty = 3)
legend("topright", legend=c("No MetS", "MetS"), lty = c(1, 3))
\endMakeFramed

B.2 Conditional ROC analysis

\MakeFramed
(m_cov <- Colr(Y ~ D * (Age + Gender + Smoke), data = dat, support = supp))

grd_age <- 20:70
nd1 <- expand.grid(D = 1, Age = grd_age, Gender = c(0, 1), Smoke = 0)
nd0 <- expand.grid(D = 0, Age = grd_age, Gender = c(0, 1), Smoke = 0)

### Covariate-specific ROC curve for 40 year old non-smokers
### Solid line: Female, Dashed line: Male
p <- seq(0, 1, by = 0.001)
rcov <- ROC(m_cov,
            newdata = subset(nd1, Age == 40),
            reference = subset(nd0, Age == 40),
            one2one = T, prob = p)
plot(rcov)
### Age-based AUC for non-smokers
### Solid line: Female, Dashed line: Male
auc <- PI(m_cov,
          newdata = nd1,
          reference = nd0,
          one2one = TRUE, conf.level = 0.95)
id_f <- which(nd1$Gender==0)
id_m <- which(nd1$Gender==1)

## Females
plot(grd_age, auc[id_f, "Estimate"],
     xlab = "Age (in years)", ylab = "AUC",
     ylim = c(0, 1), type = "l")
lines(grd_age, auc[id_f, "lwr"], lty = 3, col = "gray50")
lines(grd_age, auc[id_f, "upr"], lty = 3, col = "gray50")

## Males
lines(grd_age, auc[id_m, "Estimate"], lty = 2)
lines(grd_age, auc[id_m, "lwr"], lty = 3, col = "gray50")
lines(grd_age, auc[id_m, "upr"], lty = 3, col = "gray50")
legend("bottomright", legend =