Improved inference for MCP-Mod approach using time-to-event endpoints with small sample sizes

Márcio A. Diniz Biostatistics Research Center, Samuel Oschin Comprehensive Cancer Center, Cedars-Sinai Medical Center, Los Angeles, California, USA, e-mail to [email protected]. Diego I. Gallardo Department of Mathematics, Engineering School, University of Atacama, Copiapó, Chile, e-mail to [email protected]. Tiago M. Magalhães Department of Statistics, Institute of Exact Sciences, Federal University of Juiz de Fora, Juiz de Fora, Brazil, e-mail to [email protected].

Abstract

The Multiple Comparison Procedures with Modeling Techniques (MCP-Mod) framework has been recently approved by the U.S. Food and Administration and European Medicines Agency as fit-for-purpose for phase II studies. Nonetheless, this approach relies on the asymptotic properties of Maximum Likelihood (ML) estimators, which might not be reasonable for small sample sizes. In this paper, we derived improved ML estimators and correction for their covariance matrices in the censored Weibull regression model based on the corrective and preventive approaches. We performed two simulation studies to evaluate ML and improved ML estimators with their covariance matrices in (i) a regression framework (ii) the Multiple Comparison Procedures with Modeling Techniques framework. We have shown that improved ML estimators are less biased than ML estimators yielding Wald-type statistics that controls type I error without loss of power in both frameworks. Therefore, we recommend the use of improved ML estimators in the MCP-Mod approach to control type I error at nominal value for sample sizes ranging from 5 to 25 subjects per dose.

Keywords: MCP-Mod approach; small sample size; Weibull model; bias correction; covariance refinement.

1 Introduction

Adequate designs for early-phase trials is essential to a successful clinical drug development. Traditionally, investigators evaluate safety in phase I trials, proof-of-concept (PoC) in phase IIa trials, and efficacy in phase IIb trials. When drugs are promising in these early stages, phase III trials are designed accruing a large number of patients to provide a definitive evidence of efficacy. Nonetheless, Jardim et al. [1] showed that 20% of 80 cancer drug programs submitted between 2009 to 2015 to the Food and Drug Administration (FDA) did not have any data for phase II trials such that 31% obtained FDA approval; 46% had positive PoC with of 76% FDA-approval; and 34% had negative PoC with only 15% of FDA-approval. Therefore, attaining proof-of-concept is a predictor to a successful phase III trial.

A classic PoC is designed with a highest dose allowable based on Phase I clinical trial results to compare with placebo in a two-arm design or a historical threshold in a one-arm design. Once proof of concept is shown, dose range studies to identify a minimum effective dose (MED) or target dose (TD) are conducted following two possible approaches: multiple comparisons (MCP) and modeling (Mod). The MCP approach corresponds to evaluate contrasts among doses while preserving the family-wise error rate (FWER), which is robust to distribution assumptions but restricted to the set of doses under investigation; the Mod approach assumes a dose-response relationship allowing to estimate the response for a dose even if that dose is not under investigation but it heavily depends on choosing the appropriate functional form.

Bretz et al. [2] proposed a framework named Multiple Comparison Procedures with Modeling Techniques (MCP-Mod) unifying phase IIa and IIb trials into a seamless design while taking advantage of both traditional approaches for normally distributed endpoints. Later, Pinheiro et al. [3] extended this methodology to general parametric models using generalized least squares estimation allowing statisticians to consider more complex designs and other types of endpoints such as binary and time to event. Regulatory agencies have stated their approval of the MCP-Mod framework as an adequate and efficient methodology for design and analysis of phase II dose-finding studies that will guide dose selection for phase III trials.

The MCP-Mod framework is implemented in two steps: (i) MCP-step consists of a trend test to assess the presence of a dose response signal among a set of pre-specified candidate models while preserving FWER; (ii) Mod-step corresponds to estimate dose-response curves in order to identify the optimal dose that achieves a desired level of response in comparison to placebo among the models that were selected in the MCP-step. Therefore, the properties of the trend test defined as a Wald statistic used in the MCP-step and the maximum likelihood (ML) estimators with their covariance matrix used in the Mod-step are essential to the successful implementation of MCP-Mod approach. However, both steps rely on the the asymptotic properties of the ML estimators, which are only valid for large sample sizes.

In cancer mouse studies, time-to-death is an endpoint that could be used to guide the identification of the minimum effective dose (MED) with large expected effect sizes and small sample sizes. Nonetheless, the application of the MCP-Mod framework is limited due the underlying asymptotic assumptions. In this context, parametric survival models such as the censored Weibull regression model (WRM) was showed to be useful given that provides clinical meaningful interpretation based on event time ratios or hazard ratios[4]. Moreover, the asymptotic properties of the ML estimators can be studied and refined for small sample sizes. Recently, Magalhães et al. [5] obtained the skewness coefficient of the distribution of the maximum likelihood estimators for WRM, and Magalhães et al. [6] derived improved test statistics for LR, score and gradient tests but not for the Wald statistic.

Our main goal is to derive improved ML estimators for the regression parameters and its second-order covariance matrix for WRM, then use them as input to the generalized least squares procedure proposed by Pinheiro et al. [3] yielding a type I error probability closer to the nominal value when testing proof-of-activity and more accurate MED estimates. Moreover, we particularize the results from Cox and Snell [7] and Magalhães et al. [8] that are very general to WRM and they can be used in a much broader context than the MCP-Mod framework.

The remaining paper is organized as follows: in Section 2, we revisit the Weibull distribution and its properties; in Section 3, we review the main concepts of the MCP-Mod framework; in Section 4, we introduce the improved estimators; in Section 5, we conducted a simulation study to evalute the use of improved estimators in the MCP-Mod framework; finally, in Section 6, we presented some concluding remarks.

2 Weibull distribution

The Weibull distribution [9] is commonly used to analysis of time-to-event or lifetime data and a continuous random variable $T$ is called Weibull, if its probability density function (pdf) is

\displaystyle f(t;\lambda,\sigma)=\frac{1}{\sigma\lambda^{1/\sigma}}t^{1/\sigma-1}\exp\left\{-\left(t/\lambda\right)^{1/\sigma}\right\},\ t>0,

(1)

where $\sigma>0$ is the shape parameter and $\lambda>0$ is the scale parameter, it says $T\sim$ WE $(\lambda,\sigma)$ . Two particular models under this parametrization are obtained for $\sigma=1$ and $\sigma=1/2$ , which represents the exponential and the Rayleigh models with means $\lambda$ and $\lambda\sqrt{\pi/2}$ , respectively. In this work, we focused on those models. However, if $\sigma$ is unknown, we assume that it can be replaced by consistent estimate. The $\rho\%$ survival time is given by

\displaystyle t_{\rho}=\lambda[-\log(\rho)]^{\sigma}

(2)

The regression structure can be incorporated in (1) by making

\displaystyle\log(\lambda_{i})={\mbox{\boldmath{$x$}}}_{i}^{\top}{\mbox{\boldmath{$\beta$}}}

(3)

where $\beta$ is a p-vector of unknown parameters and ${\mbox{\boldmath{$x$}}}_{i}$ is a vector of predictors related to the $i$ th observation.

In lifetime data, there is the censoring restriction, i.e, if $T_{1},\ldots,T_{n}$ are a random sample from (1), instead of $T_{i}$ , we observe, under right censoring, $t_{i}=\min(T_{i},L_{i})$ , where $L_{i}$ is the censoring time, independent of $T_{i}$ and $\delta_{i}=1$ , if $T_{i}\leq L_{i}$ or $\delta_{i}=0$ , otherwise, $i=1,\ldots,n$ . Here, we consider an hybrid censoring scheme, where the study is finalized when a pre-fixed number $r\leq n$ out of $n$ observations have failed, as well as when a prefixed time, say $L_{1}=\cdots=L_{n}=L$ , has been reached. The type I censoring is a particular case for $r=n$ and the type II censoring appears when $L_{1},\ldots,L_{n}=+\infty$ . Additionally, we add the non-informative censoring assumption, i.e., the random variables $L_{i}$ does not depend on $\lambda$ . Usually, the regression modeling considers the distribution of $Y_{i}=\log(T_{i})$ instead of $T_{i}$ , which is an accelerated lifetime model form, see Kalbfleisch and Prentice[10]. The distribution of $Y_{i}$ is of the extreme value form with pdf given by

\displaystyle f(y_{i};{\mbox{\boldmath{$x$}}}_{i})=\frac{1}{\sigma}\exp\left\{\frac{y_{i}-\mu_{i}}{\sigma}-\exp\left(\frac{y_{i}-\mu_{i}}{\sigma}\right)\right\},\quad-\infty<y_{i}<\infty,

(4)

where $\mu_{i}=\log\lambda_{i}$ . From this moment, we assume that $\sigma$ is known, Then, the log-likelihood function derived from (4) is given by

\displaystyle\ell({\mbox{\boldmath{$\beta$}}})=\sum_{i=1}^{n}\left[\delta_{i}\left(-n\log\sigma+\frac{y_{i}-\mu_{i}}{\sigma}\right)-\exp\left(\frac{y_{i}-\mu_{i}}{\sigma}\right)\right].

The total score function and the total Fisher information matrix for $\beta$ are, respectively, ${\mbox{\boldmath{$U$}}}_{{\bm{\beta}}}=\sigma^{-1}{\mbox{\boldmath{$X$}}}^{\top}{\mbox{\boldmath{$W$}}}^{1/2}{\mbox{\boldmath{$v$}}}$ and ${\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}=\sigma^{-2}{\mbox{\boldmath{$X$}}}^{\top}{\mbox{\boldmath{$W$}}}{\mbox{\boldmath{$X$}}}$ , where ${\mbox{\boldmath{$X$}}}=({\mbox{\boldmath{$x$}}}_{1},\ldots,{\mbox{\boldmath{$x$}}}_{n})^{\top}$ , the model matrix, assuming rank $({\mbox{\boldmath{$X$}}})=p$ , ${\mbox{\boldmath{$W$}}}=$ diag $(w_{1},\ldots,w_{n})$ , $w_{i}=\mathds{E}\left[\exp\left(\frac{y_{i}-\mu_{i}}{\sigma}\right)\right]$ and ${\mbox{\boldmath{$v$}}}=(v_{1},\ldots,v_{n})^{\top}$ , $v_{i}=\left\{-\delta_{i}+\exp\left(\frac{y_{i}-\mu_{i}}{\sigma}\right)\right\}w_{i}^{-1/2}$ . It can observed that the value of $w_{i}$ depends on the mechanism of censoring. That means $w_{i}=q\times\left\{1-\exp\left[-L_{i}^{1/\sigma}\exp(-\mu_{i}/\sigma)\right]\right\}+(1-q)\times\left(r/n\right)$ , where $q=\mathds{P}\left(W_{(r)}\leq\log L_{i}\right)$ and $W_{(r)}$ denotes the $r$ th order statistic from $W_{1},\ldots,W_{n}$ . Note that $q=1$ and $q=0$ for types I and II censoring, respectively, as showed in Magalhães et al. [5]. The maximum likelihood estimator (MLE) of $\beta$ , $\widehat{{\mbox{\boldmath{$\beta$}}}}$ , is the solution of ${\mbox{\boldmath{$U$}}}_{{\bm{\beta}}}={\bf 0}$ . The $\widehat{{\mbox{\boldmath{$\beta$}}}}$ can not be expressed in closed-form. It is typically obtained by numerically maximizing the log-likelihood function using a Newton or quasi-Newton nonlinear optimization algorithm. Under mild regularity conditions and in large samples,

\displaystyle\widehat{{\bm{\beta}}}\sim\mbox{N}_{p}\left({\bm{\beta}},{\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}\right),

(5)

approximately. The classic Wald test[11] statistic is

\displaystyle W_{MLE}=\left({\bm{C}}\widehat{\bm{\beta}}-{\bm{C}}{\bm{\beta}}^{(0)}\right)^{\top}\left\{{\bm{C}}\widehat{\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}{\bm{C}}^{\top}\right\}^{-1}\left({\bm{C}}\widehat{\bm{\beta}}-{\bm{C}}{\bm{\beta}}^{(0)}\right),

(6)

where $C$ is a matrix of contrasts $m\times p$ . Under the null hypothesis $\mathcal{H}:{\mbox{\boldmath{$C$}}}{\mbox{\boldmath{$\beta$}}}={\mbox{\boldmath{$C$}}}{\mbox{\boldmath{$\beta$}}}^{(0)}$ , $W_{MLE}$ has a $\chi_{p}^{2}$ distribution up to an error of order $n^{-1}$ . The null hypothesis is rejected for a given nominal level, $\alpha$ say, if the test statistic exceeds the upper $100(1-\alpha)\%$ quantile of the $\chi_{p}^{2}$ distribution.

3 MCP-Mod General approach

We briefly summarize the two-stage procedure discussed by Pinheiro et al. [3] following the same notation. We consider that time-to-event responses $t_{ij}$ for doses $x_{i}$ given to $j$ th subject for $i=0,\ldots,p$ and $j=1,\ldots,n_{i}$ can be described by the Weibull distribution with scale parameters $\lambda_{i}$ and shape parameter $\sigma$ defined in (1) and (3). Then, we consider the parameter $\mu$ as our response for the dose-response model such that it could be defined as the median survival time (2) for $\rho=0.5$ or, alternatively, $\log(\lambda)$ .

Initially, a set of candidate dose-response models $\mu_{i}=f(x_{i},\mbox{\boldmath{$\theta$}})$ is considered such that each model can be rewritten as function of standardized model as below

\displaystyle f_{m}(x,\mbox{\boldmath{$\theta$}})=\theta_{0}+\theta_{1}f_{m}^{0}(x,\mbox{\boldmath{$\theta$}}^{0})

(7)

for $m=1,\ldots,M$ , where $(\theta_{0},\theta_{1})$ are unknown parameters. In this work, we consider $(M=5)$ standardized models: (i) linear: $f^{0}(x,\mbox{\boldmath{$\theta$}}^{0})=x$ ; (ii) emax : $f^{0}(x,\mbox{\boldmath{$\theta$}}^{0})=x/(x+ED_{50})$ where $ED_{50}$ can be interpreted as the dose that produces the desired response on 50% of subjects; (iii) exponential: $f^{0}(x,\mbox{\boldmath{$\theta$}}^{0})=\exp\{x/\delta\}-1$ , where $\delta$ is the exponential rate; (iv) logistic: $f^{0}(x,\mbox{\boldmath{$\theta$}}^{0})=1/(1+\exp\{(ED_{50}-x)/\delta\})$ (v) beta: $f^{0}(x,\mbox{\boldmath{$\theta$}}^{0})=\beta(\delta_{1},\delta_{2})(x/scal)^{\delta_{1}}(1-x/scal)^{\delta_{2}}$ .

3.1 MCP-step

In this step, a set of contrasts corresponding to the candidate models will be tested. Let $\mu$ denote the estimated dose-response parameter vector. For each candidate model, an optimal contrast $\mbox{\boldmath{$c$}}^{opt}$ that maximizes the probability of rejecting the hypothesis of non-signal dose-response is derived assuming that the candidate model is correct and guess estimates for $\mbox{\boldmath{$\theta$}}^{0}$ .

\displaystyle\mbox{\boldmath{$c$}}^{opt}\propto\mbox{\boldmath{$S$}}^{-1}\left(\mbox{\boldmath{$\mu$}}_{m}^{0}-\frac{{\mbox{\boldmath{$\mu$}}_{m}^{0}}^{\top}\mbox{\boldmath{$S$}}^{-1}\mbox{\boldmath{$1$}}}{\mbox{\boldmath{$1$}}\mbox{\boldmath{$S$}}^{-1}\mbox{\boldmath{$1$}}^{\top}}\right),

(8)

where $\mbox{\boldmath{$\mu$}}_{m}^{0}=(f_{m}^{0}(x_{1},\mbox{\boldmath{$\theta$}}^{0}),\ldots,f_{m}^{0}(x_{p},\mbox{\boldmath{$\theta$}}^{0}))$ and $S$ is the covariance matrix of $\mu$ . Assuming that $\hat{\mu}$ follows approximately $\mbox{N}_{p}(\mbox{\boldmath{$\mu$}},\mbox{\boldmath{$S$}})$ , the test of hypotheses for proof-of-concept can be translated to $H_{0}:\mbox{\boldmath{$c$}}^{opt}_{m}\mbox{\boldmath{$\mu$}}=0$ vs. $H_{1}:\mbox{\boldmath{$c$}}^{opt}_{m}\mbox{\boldmath{$\mu$}}>0$ for candidate model $m$ based on the Wald test statistic

\displaystyle W^{(m)}=(\mbox{\boldmath{$c$}}^{opt}_{m}\mbox{\boldmath{$\hat{\mu}$}})^{\top}\left\{\mbox{\boldmath{$C$}}^{opt}\mbox{\boldmath{$\hat{S}$}}{\mbox{\boldmath{$C$}}^{opt}}^{\top}\right\}^{-1}_{m,m}\mbox{\boldmath{$c$}}^{opt}_{m}\mbox{\boldmath{$\hat{\mu}$}}

(9)

where $\hat{S}$ is the estimated covariance matrix, $\mbox{\boldmath{$C$}}^{opt}$ is the matrix $m\times p$ of optimal contrast with $[\mbox{\boldmath{$A$}}]_{m,m}$ denoting the $m^{th}$ diagonal element of matrix $A$ for $m=1,\ldots,M$ . Critical values for tests are derived based on the joint distribution for $W=(W^{(1)},\ldots,W^{(M)})$ allowing one to calculate multiplicity adjusted p-values controlling the FWER at a prespecificed nominal type I error $\alpha$ .

3.2 Mod-step

In this step, the estimation of non-linear dose responses models is performed in two stages. In the first stage, the parameters $\mbox{\boldmath{$\mu$}}=[\mu_{1},\ldots,\mu_{p}]$ are estimated using standard software packages with analysis of variance (ANOVA) parametrization for the design matrix resulting into a separate parameter $\mu_{i}$ for each dose level $x_{i}$ for $i=1,\ldots,D$ . In particular for $\mu=\log(\lambda)$ , we have $\mbox{\boldmath{$\mu$}}=\mbox{\boldmath{$\beta$}}$ and $\mbox{\boldmath{$S$}}={\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}$ .

In the second stage, the non-linear dose-response model $f(x,\mbox{\boldmath{$\theta$}})$ is fitted by minimizing the generalized linear squares (GLS) criterion:

\displaystyle\ \Psi(\mbox{\boldmath{$\theta$}})=(\mbox{\boldmath{$\hat{\mu}$}}-\mbox{\boldmath{$f(x,\theta)$}})^{\prime}\mbox{\boldmath{$\hat{S}$}}^{-1}(\mbox{\boldmath{$\hat{\mu}$}}-\mbox{\boldmath{$f(x,\theta)$}})

(10)

with respect to $\theta$ .

Then, the minimum effective dose can be estimated as $\widehat{MED}=\{x|f(x,\mbox{\boldmath{$\hat{\theta}$}})>f(0,\mbox{\boldmath{$\hat{\theta}$}})+\Delta\}$ , where $\Delta$ is a clinical meaningful threshold and $\hat{\theta}$ minimizes (10).

4 Improved inference

4.1 Bias correction

Inferences based on maximum likelihood method depend strongly on asymptotic properties. Among these properties, the MLE is approximately non-biased, in other words, $\mathds{E}(\widehat{{\bm{\beta}}}-{\bm{\beta}})=\mathcal{O}(n^{-1})$ , which is essential to define the mean of the normal distribution of $\widehat{{\bm{\beta}}}$ . Therefore, likelihood inferences based on asymptotic approximation may not be reliable when sample sizes are small or moderate, and two approaches are available to correct the MLE.

The corrective approach.

The bias of the MLE can be written as $\mathds{E}(\widehat{{\bm{\beta}}}-{\bm{\beta}})=\mathds{B}({\bm{\beta}})+\mathcal{O}(n^{-2})$ , where $\mathds{B}({\bm{\beta}})$ is a term of order $\mathcal{O}(n^{-1})$ , a function of the derivatives of the log-likelihood function. Cox and Snell [7] proposed a bias-corrected maximum likelihood estimator (BCE), that can be expressed as $\widetilde{{\bm{\beta}}}=\widehat{{\bm{\beta}}}-\mathds{B}(\widehat{{\bm{\beta}}})$ , where $\mathds{B}({\bm{\beta}})$ is the term of order $n^{-1}$ , evaluated in $\widehat{{\bm{\beta}}}$ and $\mathds{E}(\widetilde{{\mbox{\boldmath{$\lambda$}}}}-{\mbox{\boldmath{$\lambda$}}})=\mathcal{O}(n^{-2})$ , i.e., less biased then MLE of ${\bm{\beta}}$ . The Cox and Snell’s method is known as a corrective approach because the MLE is calculated and then, the bias correction is applied. For the censored Weibull regression model, the expression of $\mathds{B}(\widehat{\bm{\beta}})$ has the form

\displaystyle\mathds{B}(\widehat{\bm{\beta}})=-\frac{1}{2\sigma^{3}}{\bm{P}}{\bm{Z}}_{d}\left({\bm{W}}+2\sigma{\bm{W}}^{\prime}\right){\bm{1}},

(11)

where ${\bm{P}}={\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}{\bm{X}}^{\top}$ , ${\bm{Z}}={\bm{X}}{\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}{\bm{X}}^{\top}$ , ${\bm{Z}}_{d}$ is a diagonal matrix with diagonal given by the diagonal of ${\bm{Z}}$ , ${\bm{W}}^{\prime}=$ diag $(w_{1}^{\prime},\ldots,w_{n}^{\prime})$ , $w_{i}^{\prime}=-\sigma^{-1}L_{i}^{1/\sigma}\exp\{-L_{i}^{1/\sigma}\exp(-\mu_{i}/\sigma)-\mu_{i}/\sigma\}$ and ${\bm{1}}$ is a $n$ -dimensional vector of ones.

The preventive approach.

As alternative to the corrective approach, Firth [12] proposed the following modification in the score vector:

\displaystyle{\bm{U}}_{{\bm{\beta}}}^{\star}={\bm{U}}_{{\bm{\beta}}}-{\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}\mathds{B}({\bm{\beta}}),

(12)

where $\mathds{B}({\bm{\beta}})$ is given by (11). The estimator $\check{{\bm{\beta}}}$ , solution of ${\bm{U}}_{{\bm{\beta}}}^{\star}={\bm{0}}$ , has a bias of order $\mathcal{O}(n^{-2})$ . This is a preventive approach because the procedure already computes a less biased estimator than the regular MLE.

4.2 Covariance correction

From the general result of Magalhães et al. [8] , we derived the specific matrix expression for the MLE and BCE second-order covariance matrices for the censored Weibull regression model and it is given by

\displaystyle\mbox{{\bf Cov}}_{\bm{2}}^{\bm{\tau}}({\bm{\beta}}^{\star})={\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}+{\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}\left\{{\bm{\Delta}}+{\bm{\Delta}}^{\top}\right\}{\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}+\mathcal{O}(n^{-3}),

(13)

where ${\mbox{\boldmath{$\Delta$}}}=-0.5{\mbox{\boldmath{$\Delta$}}}^{(1)}+0.25{\mbox{\boldmath{$\Delta$}}}^{(2)}+0.5\tau_{2}{\mbox{\boldmath{$\Delta$}}}^{(3)}$ with

	$\displaystyle\Delta^{(1)}$	$\displaystyle=\frac{1}{\sigma^{4}}{\bm{X}}^{\top}{\bm{W}}^{\star}{\bm{Z}}_{d}{\bm{X}},$
	$\displaystyle\Delta^{(2)}$	$\displaystyle=-\frac{1}{\sigma^{6}}{\bm{X}}^{\top}\left[{\bm{W}}{\bm{Z}}^{(2)}{\bm{W}}-2\sigma{\bm{W}}{\bm{Z}}^{(2)}{\bm{W}}^{\prime}-6\sigma^{2}{\bm{W}}^{\prime}{\bm{Z}}^{(2)}{\bm{W}}^{\prime}\right]{\bm{X}},$
	$\displaystyle\Delta^{(3)}$	$\displaystyle=\frac{1}{\sigma^{5}}{\bm{X}}^{\top}{\bm{W}}^{\prime}{\bm{W}}^{\star\star}{\bm{X}},$

${\bm{W}}^{\star}=$ diag $(w_{1}^{\star},\ldots,w_{n}^{\star})$ , $w_{i}^{\star}=w_{i}(w_{i}-2)-2\sigma w_{i}^{\prime}+\sigma\tau_{1}(w_{i}^{\prime}+2\sigma w_{i}^{\prime\prime})$ , ${\bm{Z}}^{(2)}={\bm{Z}}\odot{\bm{Z}}$ , with $\odot$ representing a direct product of matrices (Hadamard product), ${\bm{W}}^{\star\star}$ is a diagonal matrix, with ${\bm{Z}}({\bm{W}}+2\sigma{\bm{W}}^{\prime}){\bm{Z}}_{d}{\bm{1}}$ as its diagonal, ${\bm{W}}^{\prime\prime}=$ diag $(w_{1}^{\prime\prime},\ldots,w_{n}^{\prime\prime})$ , $w_{i}^{\prime\prime}=-\sigma^{-1}w_{i}^{\prime}\left[L_{i}^{1/\sigma}\exp(-\mu_{i}/\sigma)-1\right]$ , ${\bm{\tau}}=(\tau_{1},\tau_{2})=(1,1)$ indicating the second-order covariance matrix of the MLE ${\bm{\beta}}^{\star}=\widehat{{\bm{\beta}}}$ denoted by $\mbox{{\bf Cov}}_{\bm{2}}(\widehat{{\bm{\beta}}})$ and ${\bm{\tau}}=(0,-1)$ indicating the second-order covariance matrix of the BCE ${\bm{\beta}}^{\star}=\widetilde{{\bm{\beta}}}$ denoted by $\mbox{{\bf Cov}}_{\bm{2}}(\widetilde{{\bm{\beta}}})$ .

4.3 Wald-type test

Let $\mbox{{\bf Cov}}_{\bm{2}}^{-1}(\widehat{{\bm{\beta}}})$ and $\mbox{{\bf Cov}}_{\bm{2}}^{-1}(\widetilde{{\bm{\beta}}})$ the inverse of $\mbox{{\bf Cov}}_{\bm{2}}(\widehat{{\bm{\beta}}})$ and $\mbox{{\bf Cov}}_{\bm{2}}(\widetilde{{\bm{\beta}}})$ , respectively and considering also the partitions and the notation for the Fisher information matrix discussed in the introductory section, we can propose four modifications to the Wald test in (6):

$\displaystyle W_{MLE2}$	$\displaystyle=\left({\bm{C}}\widehat{\bm{\beta}}-{\bm{C}}{\bm{\beta}}^{(0)}\right)^{\top}\left\{{\bm{C}}{\widehat{\rm\bf C}}\mbox{{\bf ov}}_{\bm{2}}^{-1}(\widehat{{\bm{\beta}}}){\bm{C}}^{\top}\right\}^{-1}\left({\bm{C}}\widehat{\bm{\beta}}-{\bm{C}}{\bm{\beta}}^{(0)}\right),$	(14)
$\displaystyle W_{BCE}$	$\displaystyle=\left({\bm{C}}\widetilde{{\bm{\beta}}}-{\bm{C}}{\bm{\beta}}^{(0)}\right)^{\top}\left\{{\bm{C}}\widetilde{\bm{K}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}{\bm{C}}^{\top}\right\}^{-1}\left({\bm{C}}\widetilde{{\bm{\beta}}}-{\bm{C}}{\bm{\beta}}^{(0)}\right),$	(15)
$\displaystyle W_{BCE2}$	$\displaystyle=\left({\bm{C}}\widetilde{{\bm{\beta}}}-{\bm{C}}{\bm{\beta}}^{(0)}\right)^{\top}\left\{{\bm{C}}{\widetilde{\rm\bf C}}\mbox{{\bf ov}}_{\bm{2}}^{-1}(\widetilde{{\bm{\beta}}}){\bm{C}}^{\top}\right\}^{-1}\left({\bm{C}}\widetilde{{\bm{\beta}}}-{\bm{C}}{\bm{\beta}}^{(0)}\right),$	(16)
$\displaystyle W_{Firth}$	$\displaystyle=\left({\bm{C}}\check{{\bm{\beta}}}-{\bm{C}}{\bm{\beta}}^{(0)}\right)^{\top}\left\{{\bm{C}}\check{{\bm{K}}}_{{\bm{\beta}}{\bm{\beta}}}^{-1}{\bm{C}}^{\top}\right\}^{-1}\left({\bm{C}}\check{{\bm{\beta}}}-{\bm{C}}{\bm{\beta}}^{(0)}\right),$	(17)

where ${\widehat{\rm\bf C}}\mbox{{\bf ov}}_{\bm{2}}(\widehat{{\bm{\beta}}})$ is the matrix $\mbox{{\bf Cov}}_{\bm{2}}({\bm{\beta}})$ evaluated at $\widehat{{\bm{\beta}}}$ , $\widetilde{{\bm{K}}}_{{\bm{\beta}}{\bm{\beta}}}$ is the Fisher information evaluated at $\widetilde{{\bm{\beta}}}$ , ${\widetilde{\rm\bf C}}\mbox{{\bf ov}}_{\bm{2}}(\widetilde{{\bm{\beta}}})$ is the matrix $\mbox{{\bf Cov}}_{\bm{2}}(\widetilde{{\bm{\beta}}})$ evaluated at $\widetilde{{\bm{\beta}}}$ , $\check{{\bm{K}}}_{{\bm{\beta}}{\bm{\beta}}}$ is the Fisher information evaluated at $\check{{\bm{\beta}}}$ . Under $\mathcal{H}$ , $W_{MLE}$ , $W_{BCE}$ , $W_{BCE2}$ , $W_{Firth}$ follow a $\chi_{p}^{2}$ distribution.

4.4 Improved Estimator strategies

In the supplemental material and the next section, we studied the statistical properties of improved estimators for $\bm{\beta}$ and its covariance matrix with the following strategies: the classical ML estimator with the Fisher information as covariance matrix (MLE); the classical ML estimator with the corrected covariance matrix defined in (13) (MLE2); the bias corrected estimator (BCE) given in (11) with the Fisher information as covariance matrix; the bias corrected estimator with the corrected covariance matrix (BCE2); and the Firth estimator defined in (12) with its Fisher information as covariance matrix.

5 Simulation study

In our collaborative work, investigators wanted to establish a dose-response relationship between a new inhibitor agent for pancreatic cancer in combination with a given dose of gemcitabine in mouse models. Based on preliminary data, a survival median time of 4 months was estimated in control-treated KPC mice model such that previous studies showed no survival benefit with only gemcitabine [13]. Assuming a Weibull distribution with $\sigma=0.5$ , we calculated the placebo effect equal to 1.57 and the maximum effect of 2.26 considering a hazard ratio of 4. Investigators were interested in the minimum effective dose yielding a minimum hazard ratio of 2 corresponding to $\Delta=0.693$ .

Model	Constraints	Guess estimates/True parameters			True MED
Model	Constraints	$\theta_{0}$	$\theta_{1}$	$\theta_{2}$	True MED
Constant	-	$E_{0}=1.569$	-	-	-
Linear	-	$E_{0}=1.569$	$\delta=0.0139$	-	-
Emax	50% at $x_{4}$	$E_{0}=1.569$	$E_{Max}=2.079$	$ED_{50}=50.000$	25.00
Exponential	10% at $x_{1}$	$E_{0}=1.569$	$E_{1}=0.017$	$\delta=22.756$	84.51
Logistic	10% at $x_{3}$ 80% at $x_{4}$	$E_{0}=1.569$	$E_{Max}=1.391$	$ED_{50}=40.329$ $\delta=6.976$	40.37
Beta	30% at $x_{2}$	$E_{0}=1.569$	$E_{Max}=1.386$	$\delta_{1}=0.749$ $\delta_{2}=1.049$	10.61

Table 1: Scenarios (Cosntant, Emax, Exponential, Logistic and Beta) and candidate models (Linear, Emax, Exponential, Logistic and Beta) defined based on, respectively, true parameters and guess estimates. True parameters/guess estimates were calculated based on placebo effect of 1.57, maximum effect of 2.96 and constraints with scale parameter of 120 for Beta model. True MED was calculated based on

\Delta=0.693

Five scenarios (constant, emax, exponential, logistic and beta model) presented in Table 1 were studied. True parameters were defined based on the aforementioned placebo and maximum effects, and the percent of maximum effect that is achieved at given dose as discussed in Bornkamp et al.[14]. For each scenario, the doses 0, 5, 25, 50 and 100 (mg/kg) were considered such that true MED was calculated as continuous dose given in Table 1. Five candidate models (linear, emax, exponential, logistic and beta model) were considered with guess estimates defined as the true parameters to calculate the optimal contrasts in (8). For each scenario, the null hypothesis of non-signal of the new inhibitor agent was tested at 5% significance level, and we used Akaike Information Criteria (AIC) to choose the model to estimate MED when more than one model rejected the non-signal hypothesis. Furthermore, we assumed that the target dose is estimated as a continuous dose for any value within the range of the dose grid. In case, the target dose is estimated outside of the dose grid, then the closest dose is selected.

For both steps of the MCP-Mod framework, the five strategies (MLE, MLE2, BCE, BCE2 and Firth) were evaluated based on a Monte Carlo simulation study with 100,000 replicates for sample sizes ranging from 5 to 100 mice per dose and censoring rate of 10%, 25% and 50%. The following operating characteristics were studied: (i) convergence rate of GLS algorithm in the MCP-Mod General framework; (ii) Probability of incorrectly detecting a dose-response signal under the scenario with a constant dose response curve, i.e., type I error; (iii) Probability of correctly detecting a dose-response signal under scenarios with a non-constant dose-response curve (Emax, Exponential, Logistic, Beta), i.e., power; (iv) Probability of selecting the true model given that there is a signal; (v) Bias of MED estimate and (vi) RMSE of MED estimate.

5.1 Results

In Figure 1, convergence rates when calculating estimators and applying them as input for the MCP-Mod framework is presented for different censoring rates and true models. When censoring is 10%, there is no difference among estimators; when censoring is 25%, similar conclusion can be drawn but the proposed strategies have lower convergence rates than MLE for the true model Exponential and sample size of 5 subjects per dose: 94% for BCE and BCE, 97% for Firth and MLE2 ; when censoring is 50%, the lower convergence rates of the estimators Firth, MLE2, BCE and BCE2 are also observed in other scenarios with dose-relationship signal for sample sizes of 5 or 10 subjects per dose reaching 69% for BCE and BCE2 when true model is either Logistic or Exponential, 76% for BCE and BCE2 when true model is either Beta or Emax. These lower convergence rates are attained because small sample sizes in combination with high censoring rates result into doses with no events (in our case, deaths), therefore, very large estimates for the regression coefficients are obtained and, consequently, singular covariance matrix estimates.

For the MCP-step, type I error probability is displayed for different censoring rates and true models in Figure 2. When censoring is 10%, the strategy MLE shows empirical type I error probability inflated up to 0.086 when sample size is 5 subjects per dose, and reaches the nominal type error probability when sample size is 100 subjects per dose; the proposed strategies are slightly conservative such that the strategy Firth shows type I error probability uniformly closer to 0.05 than the other proposed strategies. When censoring is 25%, strategies based on refined estimators are more conservative than MLE; among the proposed strategies, the strategy Firth is consistently superior followed by BCE, BCE2 and MLE2 reaching the same performance than MLE strategy for a sample size of 50 subjects per dose. When censoring is 50%, all strategies are overly conservative including MLE; Firth strategy is uniformly superior followed by BCE, MLE BCE2 and MLE2 up to a sample size of 100 subjects per dose. For all censoring values and strategies, type I error probability converges to its nominal value for sample size of 100 subjects per dose.

In Figure 3A, the probability of correctly detecting dose-response signal in the MCP-step is showed as function of censoring rates and true models. It is expected that the strategy with inflated type I error show higher power than strategies with empirical type I error closer to its nominal value. A fair comparison would require us to re-adjust critical values to reject the null hypothesis such that the empirical type I probability was set at 0.05 for all strategies. Nonetheless, we did not adjust the critical value as this procedure would never be done in practice due computational costs. Therefore, we only discussed strategies that have probability of type I error less or equal to its nominal value of 0.05.

When censoring is 10%, the probability of correctly detecting dose-response signal is less the target of 0.8 with 5 subjects/dose such that the strategy Firth shows a higher power of at least 0.04 in comparison to the other proposed strategies; for sample size of 10 or larger, differences among strategies are negligible with power above its target for all strategies and true models. When censoring rate is 25%, the strategy Firth is superior to others by at least 0.06 for 5 subjects/dose; however, all strategies have power less than its target; for sample size of 10 or larger, differences among strategies are negligible including MLE that is uniformly less conservative than the proposed strategies; moreover, power is above its target value for sample size of 10 for all true models except for Exponential as true model. When censoring is 50%, the strategy Firth shows consistently higher power while MLE2 strategy shows consistently lower power when compared to others with differences among strategies minor for sample size of 15 or higher subjects/dose, except for the strategy MLE2 and Exponential as true model.

In Figure 3B, the probability of correctly selecting dose-response model using AIC is calculated given that we selected at least one model in the MCP-step, i.e., it is a conditional probability. When censoring is 10%, differences among strategies are negligible. When censoring is 25%, there is no clear pattern for sample size of 5 subjects/dose; for larger sample sizes, the strategy MLE shows a slight better performance than other strategies up to 0.03 for sample size from 10 to 20 subjects/dose. When censoring is 50%, there is no clear patterns for sample size of 5 and 10 subjects/dose; for larger sample sizes, the strategy MLE also shows higher probability no more than 0.02 for sample size of 15 to 25 subjects/dose.

For Mod-step, we calculated the relative bias and RMSE of $\widehat{MED}$ estimator in Figure 4A and B, respectively, for different censoring rates and true models. When censoring is 10% and 25%, MLE and proposed strategies based on improved estimators have negligible differences for bias and RMSE; when censoring is 50%, the strategies BCE and Firth presented lowest bias followed by MLE and BCE2 with noticeable poorer performance for MLE2 for sample sizes from 5 and 15 while similar conclusions can be drawn for RMSE only the Exponential as true model with sample sizes up to 15 subjects/dose and Logistic for sample size of 5 sujbects/dose. Otherwise, differences are negligible.

6 Concluding Remarks

We have derived improved inferences based on the Wald statistic for WRM particularizing general results from Cox and Snell [7] and Magalhães et al. [8], which complements previous results for improved inference based on likelihood ratio, Rao score and gradient statistics discussed in Magalhães and Gallardo [6]. Few authors have presented improved inference for survival models with small sample sizes under the classical approach: Cordeiro and Colosimo [15, 16] and Medeiros [17] derived those statistics, respectively, for censored exponential regression models (ERM), a particular case of censored WRM. Also for ERM, Lemonte [18] presented the second-order covariance matrix of the MLE.

We have also proposed strategies based on bias-corrected (BCE, BCE2 and Firth) and second-order covariance matrices (MLE2, BCE2) as input for the general MCP-Mod framework introduced by Pinheiros et al.[3], which addresses the issue of relying on the asymptotic properties of MLE, which might not be valid for small sample sizes. To the best of our knowledge, this work is the first attempt to apply refined estimators for small sample sizes in the general MCP-Mod framework. Two simulations studies were performed to study the properties of refined estimators in an usual context of regression models and relevant operating characteristics in the MCP-Mod framework.

In the simulation study presented for general censored Weibull regression model in the supplementary material, we showed numerical evidences that BCE and Firth estimator have lower bias and RMSE than MLE; second-order covariance matrices evaluated at MLE and BCE are closer to their respective empirical covariance matrices in comparison to the first-order covariance matrices; and Wald statistics derived from the combination between bias-corrected estimators and second-order covariance matrices yielded type I error probability closer to the nominal value than the standard Wald statistic with no loss of power.

In the simulation study for the MCP-Mod framework, we have found that refined estimators and second-order covariance matrices approximate type I error probability to its nominal value in the MCP-step, while there are negligible differences between MLE and refined estimators in the probability of correctly detecting the dose-response signal, probability correctly selecting the dose-response model, bias and RMSE when censoring rates are up to 25%. For censoring rate of 50%, we found convergence issues for the corrected estimators and second-order covariance matrices and poorer performance in the assessed operating characteristics.

In conclusion, we recommend the use of Firth as a strategy using refined estimators for small sample sizes in the MCP-Mod framework. In the context of basic science with limited sample sizes and large effect sizes, we do not expect large censoring rates in mouse experiments. In human trials, smaller effect sizes are pursued such that larger sample sizes are required. In this case, proposed strategies are not needed because all estimators present comparable performance for large sample sizes. Nonetheless, we showed that type I error probability is still inflated even for sample sizes of 25 subjects per dose, therefore, the use of proposed strategies would avoid to dedicate further efforts on non-promising drugs.

We hope that refined estimators allow statisticians to implement the MCP-Mod framework with small sample sizes accelerating the pre-clinical and clinical drug development process. An R-package, MCPModBC [19], is available on CRAN to fit the Weibull model with refined estimators and perform simulations for power considerations. Similar ideas can be applied to other distributions such as Binomial, Negative Binomial and Poisson, and they are currently under investigation. Furthermore, the impact of model misspecification requires further study.

7 Data Availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Refer to caption — Figure 1: Convergence rate when calculating MLE, MLE2, BCE, BCE2 and Firth estimators and applying them as input for the MCP-Mod framework as function of censoring rate and true model.

References

[1] Jardim Denis L, Groves Eric S, Breitfeld Philip P, Kurzrock Razelle. Factors associated with failure of oncology drugs in late-stage clinical development: a systematic review Cancer treatment reviews. 2017;52:12–21.
[2] Bretz Frank, Pinheiro José C, Branson Michael. Combining multiple comparisons and modeling techniques in dose-response studies Biometrics. 2005;61:738–748.
[3] Pinheiro José, Bornkamp Björn, Glimm Ekkehard, Bretz Frank. Model-based dose finding under model uncertainty using general parametric models Statistics in medicine. 2014;33:1646–1661.
[4] Carroll Kevin J. On the use and utility of the Weibull model in the analysis of survival data Controlled clinical trials. 2003;24:682–701.
[5] Magalhães Tiago M., Gallardo Diego I., Gómez H. W.. Skewness of maximum likelihood estimators in the Weibull censored data Symmetry. 2019;11:1351.
[6] Magalhães Tiago M., Gallardo Diego I.. Bartlett and Bartlett-type corrections for censored data from a Weibull distribution SORT - Statistics and Operations Research Transactions. 2020;44:127–140.
[7] Cox David R., Snell E. J.. A general definition of residuals Journal of the Royal Statistical Society. Series B (Methodological). 1968;30:248–275.
[8] Magalhães Tiago M., Botter Denise A., Sandoval Mônica C.. A general expression for second-order covariance matrices - an application to dispersion models Brazilian Journal of Probability and Statistics. 2021;35:37–49.
[9] Weibull Waloddi. A statistical distribution function of wide applicability Journal of Applied Mechanics. 1951;18:293–297.
[10] Kalbfleisch John D., Prentice Ross L.. The statistical analysis of failure time data. New Jersey: John Wiley & Sons2 ed. 2002.
[11] Wald Abraham. Test of statistical hypotheses concerning several parameter when the number of observations is large Transactions of the American Mathematical Society. 1943;54:426–482.
[12] Firth David. Bias reduction of maximum likelihood estimates Biometrika. 1993;80:27–38.
[13] Olive Kenneth P, Jacobetz Michael A, Davidson Christian J, et al. Inhibition of Hedgehog signaling enhances delivery of chemotherapy in a mouse model of pancreatic cancer Science. 2009;324:1457–1461.
[14] Bornkamp Björn, Pinheiro José, Bretz Frank, others . MCPMod: An R package for the design and analysis of dose-finding studies Journal of Statistical Software. 2009;29:1–23.
[15] Cordeiro Gaus M., Colosimo Enrico A.. Improved likelihood ratio tests for exponential censored data Journal of Statistical Computation and Simulation. 1997;56:303–315.
[16] Cordeiro Gaus M., Colosimo Enrico A.. Corrected score tests for exponential censored data Statistics & Probability Letters. 1999;44:365–373.
[17] Medeiros Francisco M. C., Lemonte Artur J.. Likelihood-based inference in censored exponential regression models Communications in Statistics - Theory and Methods. 2021;50:3214–3233.
[18] Lemonte Artur J.. Covariance matrix of maximum likelihood estimators in censored exponential regression models Communications in Statistics - Theory and Methods. 2022;51:1765–1777.
[19] Diniz Márcio A., Gallardo Diego I., Magalhães Tiago M.. MCPModBC: MCP-Mod with bias corrected estimators 2023. R package version 1.0.