
Efficient estimation of partially linear additive Cox models and variance estimation under shape restrictions

Junjun Lang KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200062, China Yukun Liu Corresponding author: [email protected] KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200062, China Jing Qin National Institute of Allergy and Infectious Diseases, National Institutes of Health, USA
Abstract

Shape-restricted inferences have exhibited empirical success in various applications with survival data. However, certain works fall short of providing a rigorous theoretical justification and an easy-to-use variance estimator with theoretical guarantees. Motivated by Deng et al. (2023), this paper delves into an additive and shape-restricted partially linear Cox model for right-censored data, where each additive component satisfies a specific shape restriction, encompassing monotone increasing/decreasing and convexity/concavity. We systematically investigate the consistency and convergence rates of the shape-restricted maximum partial likelihood estimator (SMPLE) of all the underlying parameters. We further establish the asymptotic normality and semiparametric efficiency of the SMPLE for the linear covariate effect. To estimate the asymptotic variance, we propose an innovative data-splitting variance estimation method that boasts exceptional versatility and broad applicability. Our simulation results and an analysis of the Rotterdam Breast Cancer dataset demonstrate that the SMPLE has comparable performance with the maximum likelihood estimator under the Cox model when the Cox model is correct, and outperforms the latter and Huang (1999)'s method when the Cox model is violated or the hazard is nonsmooth. Meanwhile, the proposed variance estimation method usually leads to reliable interval estimates based on the SMPLE and its competitors.

Keywords:  Shape restriction; Right-censored data; Additive model; Semiparametric efficiency; Variance estimation

1 Introduction

Shape restrictions (such as monotonicity and convexity) arise naturally in numerous practical scenarios. For instance, the growth curves of animals and plants in ecology and the dose-response in medicine must inherently exhibit non-decreasing characteristics (Chang et al., 2007; Wang and Ghosh, 2012). In the realm of economics, utility and production functions are often concave in income and prices (Matzkin, 1991; Varian, 1984), and cost functions are monotone increasing, concave in input prices, and may exhibit non-increasing or non-decreasing returns to scale (Horowitz and Lee, 2017). In genetic epidemiology studies, the cumulative risk of a disease for individuals possesses monotonicity (Qin et al., 2014). In reliability analysis, the bathtub curve describing the failure rate typically displays convexity.

Incorporating shape restrictions into statistical analysis, apart from its exceptional interpretability and ability to enforce domain-specific constraints, often results in an estimation procedure that is devoid of tuning parameters, enhancing its efficiency and robustness. Therefore, shape-restricted techniques have become an increasingly popular tool for statistical inference and learning in various settings over the past decades. A comprehensive review of shape-restricted nonparametric inference can be found in Groeneboom and Jongbloed (2014) and the references therein. Recently, Chen and Samworth (2016) developed an algorithm for the estimation of the generalized additive model in which each of the additive components is linear or subject to a shape restriction. Balabdaoui et al. (2019) considered the estimation of the index parameter in a single-index model with a monotonically increasing link function. Deng and Zhang (2020) studied minimax and adaptation rates in general multiple isotonic regression. Feng et al. (2022) systematically investigated the theoretical properties of the least squares estimator of an S-shaped regression function.

This paper focuses on statistical inference for right-censored survival data. Let $T$ denote the survival time and $(Z,X)\in\mathbb{R}^{p}\times\mathbb{R}^{d}$ denote a $(p+d)\times 1$ vector of covariates. We consider the partially linear Cox model (Sasieni, 1992, PLCM) for modelling the conditional hazard function, i.e.

\lambda_{T}(t\mid x,z)=\lambda(t)\exp(\beta^{\top}x+g(z)), \qquad (1)

where $\lambda(\cdot)$ is the unspecified baseline hazard function, $\beta\in\mathbb{R}^{d}$ is unspecified and $g(\cdot):\mathbb{R}^{p}\mapsto\mathbb{R}$ is an unknown function. This model reduces to the renowned Cox proportional hazards model (Cox, 1972, 1975) when the covariate $Z$ is absent, and it becomes the nonparametric Cox model (Sleeper and Harrington, 1990; O'Sullivan, 1993) in the absence of $X$.

Many nonparametric techniques have been developed for the estimation of the PLCM, in particular for the linear covariate effect. Examples include profile partial likelihood together with a kernel technique (Heller, 2001), maximum likelihood estimation with a deep neural network (Zhong et al., 2022), and a kernel machine representation method (Rong et al., 2024). However, these methods are either hampered by the curse of dimensionality, lack interpretability for $g(Z)$, or involve tuning parameters whose selection is not always straightforward. Alternatively, Huang (1999) proposed to model $g(Z)$ by a generalized additive model (Hastie and Tibshirani, 1986, 1990), which effectively avoids the curse of dimensionality and enforces an additive effect for the covariate $Z$. Specifically,

g(Z)=\sum_{j=1}^{p}g_{j}(Z_{(j)}), \qquad (2)

where for $1\leq j\leq p$, $Z_{(j)}$ is the $j$-th component of $Z$ and $g_{j}$ is an unknown function. Huang (1999) proposed the use of polynomial splines to fit the unknown additive components. This method entails a number of tuning parameters and yields convergence rates that lack conciseness and elegance. Furthermore, the spline method does not provide good interpretability for the additive covariate $Z_{(j)}$.

Our paper is motivated by the work of Deng et al. (2023), which studied a shape-restricted and additive PLCM. Specifically, under models (1) and (2), they assume that each $g_{j}$ is monotone increasing/decreasing or convex/concave. An active-set optimization algorithm was provided to calculate the shape-restricted maximum likelihood estimator. The shape-restriction strategy facilitates the utilization of prior knowledge regarding the effect of each covariate $Z_{(j)}$ on the log conditional hazard function and leads to a tuning-parameter-free estimation procedure. However, they proved only a consistency result and did not provide any asymptotic normality results. Qin et al. (2021) studied a PLCM with a single additive component subject to shape restrictions, but they did not establish any $\sqrt{n}$-consistency result. In addition, in general shape-restricted inference, even if asymptotic normality results can be established, it is generally challenging to construct reasonable estimators for the asymptotic variances with theoretical guarantees (Groeneboom and Hendrickx, 2017).

This paper makes two main contributions to the literature on additive and shape-restricted PLCMs for survival data. The first contribution is to provide powerful statistical guarantees for the shape-restricted maximum partial likelihood estimator (SMPLE) and the induced Breslow-type estimator of the baseline cumulative hazard function under the model assumptions of Deng et al. (2023). This includes a thorough convergence rate analysis for the estimators of the infinite-dimensional parameters, as well as the asymptotic normality and semiparametric efficiency of the estimator of the linear covariate effect. Our second contribution is to offer an easy-to-use estimator for the asymptotic variance of the linear covariate effect estimator. We show that this variance estimation method always provides consistent estimators once the corresponding asymptotic normality result holds. The method is very flexible and applicable for general purposes, especially in shape-restricted inference, where the theoretical guarantee of a bootstrap variance estimator is generally rather challenging (Groeneboom and Hendrickx, 2017). Our simulation results and an analysis of the Rotterdam Breast Cancer dataset demonstrate that the SMPLE has comparable performance with the maximum likelihood estimator under the Cox model when the Cox model is correct, and outperforms the latter and Huang (1999)'s method when the Cox model is violated and the hazard is nonsmooth. Meanwhile, the proposed variance estimation method usually leads to reliable interval estimates for the SMPLE and its competitors.

The rest of this paper is organized as follows. Section 2 introduces the notation, the data, and the shape-restricted maximum partial likelihood estimator (SMPLE). Section 3 investigates the convergence rates of the SMPLEs of all the unknown parameters, including $\beta$, the unknown additive components, and the baseline cumulative hazard function. Section 4 establishes the asymptotic normality and semiparametric efficiency of the SMPLE of $\beta$. A novel estimation method is also provided to estimate the asymptotic variance of the SMPLE of $\beta$. A simulation study and a real data analysis are presented in Sections 5 and 6, respectively. Section 7 contains concluding remarks. For clarity, all technical proofs are postponed to the supplementary material.

2 Methodology

2.1 Data and model assumptions

Let $T$ and $(X,Z)$ be the survival time and the vector of covariates, respectively, as in the introduction. Suppose that given $(X,Z)$, the conditional hazard function of $T$ satisfies model (1) with $g(Z)$ satisfying (2). The survival time $T$ may be right censored by a censoring time $C$, and we only observe $Y=\min(T,C)$. Throughout this paper, we use $\mathbf{1}(A)$ to denote the indicator function of a set $A$ and use a subscript 0 to highlight the true counterpart of a parameter. Let $\Delta=\mathbf{1}(T\leq C)$ be the non-censoring indicator. Given $n$ independent and identically distributed (iid) observations $(X_{i},Z_{i},Y_{i},\Delta_{i})$, $1\leq i\leq n$, from $(X,Z,Y,\Delta)$, we wish to infer $(\beta_{0},g_{0})$ and the baseline cumulative hazard function $\Lambda_{0}(y)=\int_{0}^{y}\lambda_{0}(t)\,dt$.

The identifiability of models (1) and (2) was investigated by Deng et al. (2023), following which we assume $\mathbb{E}\{g_{0,j}(Z_{(j)})\Delta\}=0$, $j=1,2,\ldots,p$, for identifiability. Furthermore, we assume that for $1\leq j\leq p$, $g_{0,j}(\cdot)$ satisfies one of four shape restrictions: monotone increasing, monotone decreasing, convex, and concave, which are encoded as shape types 1, 2, 3 and 4, respectively. For any additive function $g=\sum_{j=1}^{p}g_{j}(z_{(j)})$, we define ${\rm sha}(g)=({\rm sha}(g_{1}),\ldots,{\rm sha}(g_{p}))^{\top}$, where ${\rm sha}(h)\in\{1,2,3,4\}$ denotes the shape type of a univariate function $h$. We always denote $\boldsymbol{k}_{0}={\rm sha}(g_{0})$. Let $\mathcal{X}$ be the support of $X$, and for simplicity we assume that the support of $Z_{(j)}$ is $[0,1]$ for $j=1,2,\ldots,p$.

2.2 SMPLE

For any $(\beta,g)$, denote $\eta=(\beta,g)$ and $R_{\eta}(U)=X^{\top}\beta+g(Z)$, where $U=(X^{\top},Z^{\top})^{\top}$. Then $1/n$ times the usual partial log-likelihood is

L_{n}(\eta)=\frac{1}{n}\sum_{i=1}^{n}\Delta_{i}\left[R_{\eta}(U_{i})-\log\left(\sum_{j=1}^{n}\mathbf{1}(Y_{j}\geq Y_{i})\exp(R_{\eta}(U_{j}))\right)\right].
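In code, the partial log-likelihood can be evaluated directly from its definition. The sketch below is purely illustrative (plain Python, no optimization); `R`, `Y` and `Delta` hold the values $R_\eta(U_i)$, $Y_i$ and $\Delta_i$, and the function name is ours.

```python
import math

def partial_log_likelihood(R, Y, Delta):
    """1/n times the Cox partial log-likelihood for linear predictor
    values R[i] = beta^T x_i + g(z_i); Y and Delta are the observed
    times and non-censoring indicators."""
    n = len(R)
    total = 0.0
    for i in range(n):
        if Delta[i] == 1:
            # risk set at Y_i: all subjects still at risk at time Y_i
            risk = sum(math.exp(R[j]) for j in range(n) if Y[j] >= Y[i])
            total += R[i] - math.log(risk)
    return total / n
```

The SMPLE maximizes this criterion over the constrained parameter space; the snippet only shows how the objective itself is assembled.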

We propose to estimate $\eta$ by the shape-restricted maximum partial likelihood estimator (SMPLE),

\hat{\eta}:=(\hat{\beta},\hat{g})=\operatorname{argmax}_{\eta\in\mathbb{R}^{d}\times\mathcal{G}_{\boldsymbol{k}_{0}}}L_{n}(\eta), \qquad (3)

where

\mathcal{G}_{\boldsymbol{k}_{0}}=\bigg\{g:[0,1]^{p}\mapsto\mathbb{R}\mid g(z)=\sum_{j=1}^{p}g_{j}(z_{(j)}),~{\rm sha}(g)=\boldsymbol{k}_{0},~\mathbb{E}\left[\Delta g_{j}(Z_{(j)})\right]=0,~1\leq j\leq p\bigg\}

is the parameter space of $g$. With the SMPLE in (3), we estimate $\Lambda_{0}(y)$ by the Breslow-type estimator

\hat{\Lambda}(y;\hat{\eta})=\frac{1}{n}\sum_{j=1}^{n}\frac{\Delta_{j}}{S_{0,n}(Y_{j},\hat{\eta})}\mathbf{1}(y\geq Y_{j}), \qquad (4)

where $S_{0,n}(y,\eta)=(1/n)\sum_{i=1}^{n}\{\mathbf{1}(Y_{i}\geq y)\exp(R_{\eta}(U_{i}))\}$.
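The Breslow-type estimator (4) is likewise a short computation once fitted values $R_{\hat\eta}(U_i)$ are available. A minimal illustrative sketch in plain Python (function name ours):

```python
import math

def breslow_estimator(y, R, Y, Delta):
    """Breslow-type estimate of the cumulative baseline hazard at time y,
    given fitted linear predictor values R[i] = R_etahat(U_i)."""
    n = len(R)
    est = 0.0
    for j in range(n):
        if Delta[j] == 1 and Y[j] <= y:
            # S_{0,n}(Y_j, etahat): average risk-set contribution at Y_j
            S0 = sum(math.exp(R[i]) for i in range(n) if Y[i] >= Y[j]) / n
            est += 1.0 / (n * S0)
    return est
```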

The SMPLE defined in (3) can be calculated with the active-set algorithm introduced in Deng et al. (2023). Let $\hat{g}_{j}$ be functions satisfying $\hat{g}(Z)=\sum_{j=1}^{p}\hat{g}_{j}(Z_{(j)})$ for all $Z$. The function $\hat{g}(Z)$ is unique only at the observed $Z_{i}$'s and is therefore typically non-unique at other values of $Z$, which is akin to general shape-restricted regression estimators (Chen and Samworth, 2016). This implies that $\hat{g}_{j}(\cdot)$ is usually non-unique for $1\leq j\leq p$, and the solution set of $\hat{g}_{j}(\cdot)$ always contains a piecewise linear function (Deng et al., 2023). See Figure 3 for an illustration.
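To give a flavor of how shape constraints produce piecewise solutions, the toy routine below is the classical pool-adjacent-violators algorithm for least-squares monotone regression. It is not the active-set partial-likelihood algorithm of Deng et al. (2023); it only illustrates the pooling idea behind monotone fits.

```python
def pava(y):
    """Pool-adjacent-violators: least-squares fit of a monotone
    increasing sequence to y, returned as a piecewise-constant list."""
    blocks = []  # each block is [fitted level, number of pooled points]
    for yi in y:
        blocks.append([float(yi), 1])
        # pool adjacent blocks while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, n2 = blocks.pop()
            v1, n1 = blocks.pop()
            blocks.append([(v1 * n1 + v2 * n2) / (n1 + n2), n1 + n2])
    fit = []
    for v, n in blocks:
        fit.extend([v] * n)
    return fit
```

As with the SMPLE's additive components, the fitted function is flat (here) or piecewise linear between observed design points and is not uniquely determined elsewhere.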

3 Rate of convergence

The consistency of the SMPLEs $\hat{\beta}$, $\hat{g}$, and $\hat{g}_{j}$ was established by Deng et al. (2023). In this section, we establish their convergence rates. We make the following assumptions.

Assumption 1.

(i) The observed $\{Y_{i}\}_{i=1}^{n}$ lie in the interval $[0,\tau]$ for some $\tau>0$. (ii) Given $U$, $T$ and $C$ are conditionally independent. (iii) $\Lambda_{0}(\tau)<\infty$ and ${\rm pr}(C\geq\tau\mid U)\geq c>0$ almost surely for some constant $c$. (iv) $\mathbb{E}[\Delta X]=0$ and $\mathbb{E}[\Delta]>0$.

Let $\|\cdot\|$ denote the usual Euclidean norm and $\|f(\cdot)\|_{\infty}$ the supremum norm of a real-valued function $f$. For any constant $M>0$, define

\mathcal{K}_{M,\boldsymbol{k}_{0}}:=\left\{\eta\mid\eta=(\beta,g),~g\in\mathcal{G}_{\boldsymbol{k}_{0}},~\|\beta\|+\sum_{j=1}^{p}\|g_{j}(\cdot)\|_{\infty}\leq M\right\}.
Assumption 2.

The support $\mathcal{X}$ of $X$ is a bounded subset of $\mathbb{R}^{d}$ and there exists a positive constant $M_{0}>0$ such that $\eta_{0}\in\mathcal{K}_{M_{0},\boldsymbol{k}_{0}}$.

Assumption 3.

There exists a small positive constant $\epsilon$ such that ${\rm pr}(\Delta=1\mid U)>\epsilon$ almost surely with respect to the probability measure of $U$.

Assumption 4.

The joint density of $(Y,Z,\Delta)$ satisfies

0<\inf_{(y,z)\in[0,\tau]\times[0,1]^{p}}{\rm pr}(Y=y,Z=z,\Delta=1)\leq\sup_{(y,z)\in[0,\tau]\times[0,1]^{p}}{\rm pr}(Y=y,Z=z,\Delta=1)<\infty.
Assumption 5.

When ${\rm sha}(g_{0,j})\in\{3,4\}$, the density function of $Z_{(j)}$ with respect to the Lebesgue measure is uniformly bounded above and below on $[0,1]$.

Assumptions 1–2 are standard in the theoretical analysis of the traditional Cox model and its variants (Huang, 1999; Zhong et al., 2022). Assumption 3 ensures that the probability of being uncensored is positive regardless of the covariate values, and it is used to establish the convergence rate results in Theorem 1. Assumption 4 is used in the calculation of the semiparametric efficiency lower bound (Huang, 1999). In Assumption 5, the upper bound requirement is used to derive some entropy results needed in the proof of Proposition 1, and the lower bound requirement guarantees that the approximation errors of piecewise linear approximations to the convex/concave additive components are small enough in the proof of Theorem 3.

The proposition below establishes the consistency of $R_{\hat{\eta}}(\cdot)$ as an estimator of $R_{\eta_{0}}(\cdot)$, which roughly implies the consistency of $\hat{\eta}=(\hat{\beta},\hat{g})$. We could have proved the consistency of $\hat{\beta}$ and of each $\hat{g}_{j}$ separately; however, given the consistency of $R_{\hat{\eta}}(\cdot)$, those results are not needed in the proofs of the subsequent convergence rate results.

Proposition 1.

Suppose that models (1) and (2) and Assumptions 1, 2 and 5 are satisfied. As $n\rightarrow\infty$, we have

\|R_{\hat{\eta}}(\cdot)-R_{\eta_{0}}(\cdot)\|_{\infty}=o_{p}(1). \qquad (5)

Define $d^{2}(\eta,\eta_{0})=\mathbb{E}_{U}\{(R_{\eta}(U)-R_{\eta_{0}}(U))^{2}\}$, where $\mathbb{E}_{U}$ denotes the expectation with respect to $U$. Let $\|\cdot\|_{L_{2}}$ denote the $L_{2}(P)$ norm and $\rho(\boldsymbol{k}_{0})=0.5+0.5\cdot\mathbf{1}(\cup_{i=1}^{p}\{{\rm sha}(g_{0,i})\in\{1,2\}\})$. One of our main results establishes the convergence rate of the SMPLE $\hat{\eta}$.

Theorem 1.

Assume the same conditions as in Proposition 1. As $n\rightarrow\infty$, we have

d(\hat{\eta},\eta_{0})=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right).

Furthermore, if Assumptions 3–4 are satisfied and $I(\beta_{0})$ (defined in (7)) is non-singular, then for $1\leq j\leq p$,

\|\hat{\beta}-\beta_{0}\|=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right),\quad\|\hat{g}_{j}(Z_{(j)})-g_{0,j}(Z_{(j)})\|_{L_{2}}=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right).

According to Theorem 1, the rates of convergence of all $\hat{g}_{j}$ are $O_{p}(n^{-2/5})$ if none of the additive components of $g$ is monotone. Conversely, if any additive component of $g$ is monotone, then their convergence rates all slow down to $O_{p}(n^{-1/3})$. An explanation for this finding is that the complexity of the class of bounded and monotone functions is much larger than that of the class of bounded and convex (or concave) functions. These convergence rate results are free of the covariate dimensionality and exhibit a much more elegant form than those in Huang (1999) and Zhong et al. (2022). Theorem 1 also establishes the convergence rate of $\hat{\beta}$, although it is sub-optimal.

With Theorem 1, we are able to establish the uniform rate of convergence of the Breslow-type estimator $\hat{\Lambda}(y;\hat{\eta})$ in (4) of the baseline cumulative hazard function $\Lambda_{0}(y)$. It turns out that $\hat{\Lambda}(y;\hat{\eta})$ has the same convergence rate as $\hat{\eta}$, although their convergence rates are quantified by different distances.

Theorem 2.

Assume the same conditions as in Proposition 1. As $n\rightarrow\infty$, it holds that

\sup_{y\in[0,\tau]}\left|\hat{\Lambda}(y;\hat{\eta})-\Lambda_{0}(y)\right|=O_{p}\left(n^{-\frac{1}{2+\rho(\boldsymbol{k}_{0})}}\right).
Remark 1.

In practice, one may impose a combination of monotonicity and convexity/concavity constraints on the additive components according to domain knowledge. See Chen and Samworth (2016), Kuchibhotla et al. (2023) and Deng et al. (2023) for further motivation for additional shape constraints. Proposition 1 and Theorems 1–4 still hold when model (2) incorporates additive components that satisfy both monotonicity and convexity/concavity restrictions. An intuitive explanation for this result is that the parameter space $\mathcal{G}_{\boldsymbol{k}_{0}}$ is reduced by additional constraints on the additive components, and this can lead to convergence rates for the SMPLE that are at least as good.

4 Asymptotic normality and efficiency

Based on the convergence rate results in the previous section, we further show in this section that the SMPLE $\hat{\beta}$ in (3) of the linear covariate effect is asymptotically normal and semiparametrically efficient, in the sense that its asymptotic variance achieves the semiparametric efficiency lower bound (Bickel et al., 1993), or the information bound, for estimating $\beta_{0}$ under models (1) and (2).

We begin by presenting the information bound for $\beta_{0}$. Recall that $U=(X^{\top},Z^{\top})^{\top}$, and define

M(y)\equiv M(y\mid Y,\Delta,U)=\Delta\mathbf{1}(Y\leq y)-\int_{0}^{y}\mathbf{1}(Y\geq t)\exp\{R_{\eta_{0}}(U)\}d\Lambda_{0}(t),

which is a counting process martingale associated with the Cox model. The log-likelihood of model (1) based on one observation $(X,Z,Y,\Delta)$ is (up to a constant)

\ell(\beta,g,\Lambda)=\Delta\log\lambda(Y)+\Delta\{X^{\top}\beta+g(Z)\}-\Lambda(Y)\exp\{X^{\top}\beta+g(Z)\}. \qquad (6)

Consider parametric smooth sub-models $\{\lambda_{(\nu)}:\nu\in\mathbb{R}\}$ and $\{g_{j,(\nu)}:\nu\in\mathbb{R}\}$, $1\leq j\leq p$, with $\lambda_{(0)}=\lambda_{0}$ and $g_{j,(0)}=g_{0,j}$. Define $L_{2}(P_{Y})$ to be the set of functions $a(\cdot)$ satisfying $\mathbb{E}\{\Delta a^{2}(Y)\}<\infty$ and $a(y)=\partial\log\lambda_{(\nu)}(y)/\partial\nu|_{\nu=0}$ for some sub-model. Similarly, for $1\leq j\leq p$, define $L_{2}^{0}(P_{Z_{(j)}})$ to be the set of functions $h_{j}$ satisfying $\mathbb{E}\{\Delta h_{j}(Z_{(j)})\}=0$, $\mathbb{E}\{\Delta h_{j}^{2}(Z_{(j)})\}<\infty$ and $h_{j}(z_{(j)})=\partial g_{j,(\nu)}(z_{(j)})/\partial\nu|_{\nu=0}$ for some sub-model. The following lemma, which is Theorem 3.1 of Huang (1999), gives the information bound for $\beta_{0}$.

Lemma 1 (Theorem 3.1 of Huang (1999)).

Suppose that models (1) and (2) and Assumptions 1–4 are satisfied. Let $((\boldsymbol{a}^{\star})^{\top},(\boldsymbol{h}_{1}^{\star})^{\top},\cdots,(\boldsymbol{h}_{p}^{\star})^{\top})^{\top}$ be the unique vector-valued function in $L_{2}(P_{Y})^{d}\times L_{2}^{0}(P_{Z_{(1)}})^{d}\times\cdots\times L_{2}^{0}(P_{Z_{(p)}})^{d}$ that minimizes

\mathbb{E}\left\{\Delta\|X-\boldsymbol{a}(Y)-\boldsymbol{h}_{1}(Z_{(1)})-\cdots-\boldsymbol{h}_{p}(Z_{(p)})\|^{2}\right\}.
  • (1)

    The efficient score for the estimation of $\beta_{0}$ is

    \ell_{\beta_{0}}^{\star}(Y,\Delta,U)=\int_{0}^{\tau}\{X-\boldsymbol{a}^{\star}(y)-\boldsymbol{h}^{\star}(Z)\}dM(y),

    where $\boldsymbol{h}^{\star}(Z)=\sum_{j=1}^{p}\boldsymbol{h}_{j}^{\star}(Z_{(j)})$ and $\boldsymbol{a}^{\star}(y)=\mathbb{E}\{X-\boldsymbol{h}^{\star}(Z)\mid Y=y,\Delta=1\}$.

  • (2)

    The information bound for the estimation of $\beta_{0}$ is

    I(\beta_{0})=\mathbb{E}\left[\{\ell_{\beta_{0}}^{\star}(Y,\Delta,U)\}^{\otimes 2}\right]=\mathbb{E}\left[\Delta\{X-\boldsymbol{a}^{\star}(Y)-\boldsymbol{h}^{\star}(Z)\}^{\otimes 2}\right], \qquad (7)

    where $A^{\otimes 2}=AA^{\top}$ for any vector or matrix $A$.

Additional assumptions are needed to obtain the asymptotic normality and efficiency of $\hat{\beta}$. Denote $\boldsymbol{h}_{j}^{\star}=(\boldsymbol{h}_{j,1}^{\star},\cdots,\boldsymbol{h}_{j,d}^{\star})^{\top}$ for $1\leq j\leq p$.

Assumption 6.

When ${\rm sha}(g_{0,j})\in\{1,2\}$, there exist constants $\tilde{C}_{1}>0$ and $\tilde{C}_{2}>0$ such that $\|\boldsymbol{h}^{\star}_{j}(x_{1})-\boldsymbol{h}^{\star}_{j}(y_{1})\|\leq\tilde{C}_{1}|x_{1}-y_{1}|$ and $|g_{0,j}^{-1}(x_{2})-g_{0,j}^{-1}(y_{2})|\leq\tilde{C}_{2}|x_{2}-y_{2}|$ for all $x_{1},y_{1}\in[0,1]$ and $x_{2},y_{2}\in[-M_{0},M_{0}]$.

Assumption 7.

When ${\rm sha}(g_{0,j})\in\{3,4\}$, the function $\boldsymbol{h}_{j,\tilde{i}}^{\star}(\cdot)$ is $\bar{p}$-Hölder continuous for all $1\leq\tilde{i}\leq d$ and some $\bar{p}\in(\rho(\boldsymbol{k}_{0}),\,2]$.

Assumption 8.

When ${\rm sha}(g_{0,j})=3$, the function $g_{0,j}(t)-\tilde{C}t^{2}$ is convex for some constant $\tilde{C}>0$; when ${\rm sha}(g_{0,j})=4$, the function $g_{0,j}(t)+\tilde{C}t^{2}$ is concave for some constant $\tilde{C}>0$.

Assumption 6 is used to control the fluctuation of the score function corresponding to the monotone components in the direction of the projection defined in Lemma 1; Huang (2002) and Cheng (2009) adopted similar assumptions. Assumptions 7–8, which are analogues of Assumption B1 of Kuchibhotla et al. (2023), are used for approximations of the convex (concave) components.

Theorem 3.

Suppose that models (1) and (2) and Assumptions 1–8 are satisfied. Further assume that $I(\beta_{0})$ is non-singular. Then as $n\rightarrow\infty$, $\sqrt{n}(\hat{\beta}-\beta_{0})\stackrel{d}{\longrightarrow}N(0,I^{-1}(\beta_{0}))$. This implies that the asymptotic variance achieves the information bound and that $\hat{\beta}$ is asymptotically semiparametrically efficient among all regular estimators of $\beta_{0}$.

By Theorem 3, $\hat{\beta}$ has an asymptotically normal distribution with asymptotic variance $I^{-1}(\beta_{0})$. When making inference about $\beta_{0}$ based on this theorem, we need to construct a consistent estimator of $I^{-1}(\beta_{0})$. However, $I^{-1}(\beta_{0})$, or equivalently $I(\beta_{0})$, has a rather complicated form, making its plug-in estimator difficult to use. To overcome this difficulty, we propose a novel data-splitting method to estimate $I^{-1}(\beta_{0})$.

4.1 Data-splitting variance estimation and inference on β\beta

We introduce the proposed data-splitting variance estimation method in a general setting because it is generally applicable. Let $\theta_{0}\in\mathbb{R}^{d}$ be a functional of a statistical population $\mathscr{P}$ and $\hat{\theta}_{n}$ be an estimator of $\theta_{0}$ based on iid samples $\{O_{i}\}_{i=1}^{n}$ from $\mathscr{P}$. Suppose that $\sqrt{n}(\hat{\theta}_{n}-\theta_{0})\overset{d}{\longrightarrow}N(0,\Sigma)$, where $\Sigma$ is a positive semi-definite matrix. Let $k_{n}<n$ and $k_{n}\rightarrow\infty$. We partition the sample into $\lfloor k_{n}\rfloor$ subsamples, each of which has $m_{n}=\lfloor n/k_{n}\rfloor$ observations, and let $\hat{\theta}_{ni}$ denote the estimator of $\theta_{0}$ based on the $i$-th subsample, $1\leq i\leq\lfloor k_{n}\rfloor$. Our data-splitting estimator of the asymptotic variance $\Sigma$ is defined as

\hat{\Sigma}=\frac{m_{n}}{\lfloor k_{n}\rfloor}\sum_{i=1}^{\lfloor k_{n}\rfloor}(\hat{\theta}_{ni}-\bar{\hat{\theta}}_{n})(\hat{\theta}_{ni}-\bar{\hat{\theta}}_{n})^{\top}, \qquad (8)

where $\bar{\hat{\theta}}_{n}$ is the sample mean of $\hat{\theta}_{n1},\ldots,\hat{\theta}_{n\lfloor k_{n}\rfloor}$. For better stability, we may repeat the above splitting-and-estimation procedure many times and take the average of the resulting variance estimates as the final variance estimate.
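For a scalar estimator ($d=1$), the estimator (8) amounts to $m_n$ times the empirical variance of the subsample estimates. A minimal sketch in plain Python; the random shuffling step and the default $k_n=n^{\tilde\alpha}$ with $\tilde\alpha=0.5$ are illustrative choices of ours, not prescribed by the paper.

```python
import random

def split_variance(estimator, sample, alpha_tilde=0.5, seed=0):
    """Data-splitting estimate of the asymptotic variance Sigma of a
    scalar estimator theta_hat with sqrt(n)(theta_hat - theta0) -> N(0, Sigma).
    `estimator` maps a list of observations to a scalar estimate."""
    n = len(sample)
    k_n = int(n ** alpha_tilde)   # number of subsamples
    m_n = n // k_n                # subsample size
    rng = random.Random(seed)
    data = list(sample)
    rng.shuffle(data)             # one random partition of the sample
    ests = [estimator(data[i * m_n:(i + 1) * m_n]) for i in range(k_n)]
    mean = sum(ests) / k_n
    # m_n times the empirical variance of the subsample estimates, as in (8)
    return m_n * sum((e - mean) ** 2 for e in ests) / k_n
```

For instance, applied to the sample mean of iid standard normal data, the returned value should be close to the true asymptotic variance 1.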

Theorem 4.

Let $\hat{\theta}_{n}$ be an estimator of $\theta_{0}$ based on iid samples $\{O_{i}\}_{i=1}^{n}$ such that $\sqrt{n}(\hat{\theta}_{n}-\theta_{0})\overset{d}{\longrightarrow}N(0,\Sigma)$. Let $k_{n}=n^{\tilde{\alpha}}$ for some $\tilde{\alpha}\in(0,1)$, $m_{n}=\lfloor n^{1-\tilde{\alpha}}\rfloor$, and let $\hat{\Sigma}$ be the variance estimator in (8). Then $\hat{\Sigma}=\Sigma+o_{p}(1)$ as $n\rightarrow\infty$.

Theorem 4 guarantees the validity of the data-splitting variance estimator. This method is very easy to use and flexible enough for general purposes. Alternatively, one could construct a variance estimator by the bootstrap. However, the consistency of a bootstrap variance estimator often requires stronger conditions (Groeneboom and Hendrickx, 2017) and is often very difficult to prove, especially under shape restrictions.

As a specific application, we apply the data-splitting method to construct an estimator of the information bound, or equivalently of the asymptotic variance $I^{-1}(\beta_{0})$ of $\hat{\beta}$. Denote the resulting estimator by $\widehat{I^{-1}(\beta_{0})}$, which is consistent for $I^{-1}(\beta_{0})$ by Theorem 4. Therefore $n(\hat{\beta}-\beta_{0})^{\top}\{\widehat{I^{-1}(\beta_{0})}\}^{-1}(\hat{\beta}-\beta_{0})$ asymptotically follows $\chi^{2}(d)$, the chi-square distribution with $d$ degrees of freedom. For $\alpha\in(0,1)$, let $\chi^{2}_{1-\alpha}(d)$ be the $(1-\alpha)$ quantile of $\chi^{2}(d)$. A $(1-\alpha)$-level confidence region for $\beta_{0}$ can be constructed as

\{\beta:n(\hat{\beta}-\beta)^{\top}\{\widehat{I^{-1}(\beta_{0})}\}^{-1}(\hat{\beta}-\beta)\leq\chi^{2}_{1-\alpha}(d)\}. \qquad (9)

For testing the hypothesis $H_{0}:\beta=\beta_{0}$ versus $H_{1}:\beta\neq\beta_{0}$, we propose to reject $H_{0}$ at the significance level $\alpha$ if

n(\hat{\beta}-\beta_{0})^{\top}\{\widehat{I^{-1}(\beta_{0})}\}^{-1}(\hat{\beta}-\beta_{0})>\chi^{2}_{1-\alpha}(d). \qquad (10)

By Theorems 3 and 4, the confidence region (9) has an asymptotically correct $(1-\alpha)$ coverage probability, and the test defined by the rejection region (10) has an asymptotically correct type I error $\alpha$.
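When $d=1$, the region (9) reduces to the familiar Wald interval, since $\chi^{2}_{1-\alpha}(1)=z_{1-\alpha/2}^{2}$. A small illustrative helper in plain Python (the function name and arguments are ours):

```python
import math
from statistics import NormalDist

def wald_interval(beta_hat, avar_hat, n, alpha=0.05):
    """(1 - alpha) confidence interval for a scalar beta0 based on
    sqrt(n)(beta_hat - beta0) -> N(0, avar); avar_hat is the
    data-splitting estimate of the asymptotic variance."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half = z * math.sqrt(avar_hat / n)
    return beta_hat - half, beta_hat + half
```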

5 Simulations

In this section, we conduct simulations to assess the finite-sample performance of the proposed SMPLE $\hat{\beta}$ and the proposed confidence region (9) for the linear covariate effect $\beta$. To generate data, we take $X$ and $Z$ to be two scalar random variables, iid from the standard normal distribution, and take the conditional distribution of $T$ given $(X,Z)$ to be an exponential distribution with mean $1/\exp(\beta_{0}X+g_{0}(Z))$. Therefore the conditional hazard function of $T$ given $(X,Z)$ is $\exp(\beta_{0}X+g_{0}(Z))$. We set the censoring time $C$ to follow a uniform distribution on $(0,c)$. We consider three scenarios for $g_{0}$: (I) $g_{0}(z)=-2z$, (II) $g_{0}(z)=-|z|^{3}/2$ and (III) $g_{0}(z)=2|z|$. We set $\beta_{0}=-2$ and consider two choices of $c$, namely 5 and 10, and two sample sizes, 600 and 800. A larger $c$ results in a smaller censoring proportion.
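The data-generating mechanism described above can be sketched as follows (plain Python; the function name and defaults are ours):

```python
import math
import random

def generate_sample(n, beta0=-2.0, c=5.0, scenario="I", seed=1):
    """Simulate (X, Z, Y, Delta): X, Z iid N(0,1); T | (X,Z) exponential
    with hazard exp(beta0*X + g0(Z)); C ~ Uniform(0, c); Y = min(T, C)."""
    g0 = {"I": lambda z: -2 * z,
          "II": lambda z: -abs(z) ** 3 / 2,
          "III": lambda z: 2 * abs(z)}[scenario]
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, z = rng.gauss(0, 1), rng.gauss(0, 1)
        t = rng.expovariate(math.exp(beta0 * x + g0(z)))  # rate = hazard
        cens = rng.uniform(0, c)
        data.append((x, z, min(t, cens), int(t <= cens)))
    return data
```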

When implementing our SMPLE, we set $\boldsymbol{k}_{0}$ to be 3, 4, and 3 in Scenarios I–III, respectively. For comparison, we also consider the traditional Cox regression estimator (TCR) of $\beta$ and the partial likelihood estimator of $\beta$ with $r$-order polynomial splines under the partially linear additive model (Huang, 1999, SPLA-$r$), where $r$ may be 2, 3 or 4. We generate 1000 random samples to evaluate the performance of the above five estimation methods.

5.1 Point estimation

Table 1 presents 100 times the simulated root mean square errors (RMSEs) and absolute biases (BIASs) of these estimators. The model assumptions of SMPLE and SPLA-$r$ are correct in all three scenarios, whereas the standard Cox model is correctly specified only in Scenario I. As expected, in Scenario I, TCR has uniformly the best performance among the five estimators under comparison in terms of RMSE and BIAS; nevertheless, the SMPLE and SPLA-$r$ estimators have almost the same RMSEs and BIASs. When the standard Cox model is misspecified, as in Scenarios II and III, TCR has much larger RMSEs and BIASs than SMPLE and SPLA-$r$; equivalently, SMPLE and SPLA-$r$ have a clear advantage over TCR. Compared with SPLA-$r$, SMPLE is comparable and slightly inferior in Scenarios I and II, but has uniformly much smaller RMSEs and BIASs in Scenario III. A possible explanation for this phenomenon is that, although continuous in all three scenarios, the hazard function is smooth in Scenarios I and II but nonsmooth in Scenario III. As the sample size $n$ or the constant $c$ increases, we have more completely observed data, and consequently all estimators have improved performance when the underlying model assumption is correct. An exception is the performance of TCR in Scenarios II and III, where TCR has larger RMSEs and BIASs as $n$ or $c$ increases.

Figure 1 displays boxplots of the SMPLE and SPLA-$r$ ($r=2,3,4$) estimates (minus $\beta_{0}$) of $\beta$ when the sample size is $n=800$. TCR is excluded here because it has extremely large RMSEs and BIASs in Scenarios II and III. SMPLE and the three SPLA-$r$ estimators exhibit almost the same performance in Scenario I. In Scenario II, where the true hazard function is smooth, the four methods have similar variances, but the biases shrink steadily from SMPLE to SPLA-2, SPLA-3, and SPLA-4. In Scenario III, where the true hazard function is nonsmooth, the four methods again have similar variances; however, the three SPLA-$r$ estimators have much larger biases than SMPLE, whose bias is negligible.

Table 1: Simulated root mean square errors and absolute biases (in parentheses) of the five point estimators under comparison. All results have been multiplied by 100.
Scenario  c   n    SMPLE         TCR            SPLA-2         SPLA-3         SPLA-4
I         5   600  9.25 (7.29)   9.13 (7.17)    9.17 (7.21)    9.22 (7.27)    9.29 (7.32)
              800  8.04 (6.36)   7.95 (6.30)    7.99 (6.33)    8.01 (6.35)    8.03 (6.36)
          10  600  8.96 (7.19)   8.86 (7.07)    8.90 (7.11)    8.96 (7.17)    9.01 (7.21)
              800  7.65 (6.07)   7.57 (6.03)    7.61 (6.05)    7.62 (6.05)    7.64 (6.06)
II        5   600  10.29 (8.19)  68.89 (67.57)  9.46 (7.65)    9.46 (7.66)    9.70 (7.78)
              800  8.49 (6.68)   70.06 (69.02)  8.17 (6.53)    8.10 (6.44)    8.09 (6.36)
          10  600  9.89 (7.86)   78.58 (77.53)  9.21 (7.42)    9.18 (7.40)    9.30 (7.44)
              800  8.20 (6.46)   79.62 (78.79)  8.04 (6.48)    7.93 (6.38)    7.78 (6.15)
III       5   600  8.54 (6.87)   63.46 (63.15)  20.54 (18.83)  19.39 (17.63)  9.74 (7.95)
              800  7.28 (5.74)   63.61 (63.35)  20.92 (19.60)  19.98 (18.62)  9.28 (7.74)
          10  600  8.43 (6.78)   63.70 (63.40)  21.05 (19.40)  19.91 (18.21)  9.86 (8.04)
              800  7.17 (5.66)   63.71 (63.47)  21.25 (20.01)  20.31 (19.04)  9.34 (7.78)
Figure 1: Boxplots of the SMPLE, SPLA-2, SPLA-3 and SPLA-4 estimates (minus $\beta_{0}$) of $\beta$ when the sample size is $n=800$.
Table 2: Simulated coverage probabilities and average interval lengths (in parentheses) of the confidence intervals at the 95% confidence level based on the five estimators under comparison.
Scenario  c   n    SMPLE          TCR            SPLA-2         SPLA-3         SPLA-4
I         5   600  0.947 (0.412)  0.930 (0.384)  0.940 (0.396)  0.944 (0.409)  0.948 (0.422)
              800  0.941 (0.343)  0.937 (0.324)  0.937 (0.332)  0.941 (0.340)  0.949 (0.348)
          10  600  0.940 (0.388)  0.930 (0.364)  0.937 (0.374)  0.940 (0.385)  0.942 (0.395)
              800  0.945 (0.327)  0.935 (0.310)  0.941 (0.317)  0.949 (0.325)  0.948 (0.332)
II        5   600  0.947 (0.459)  0.001 (0.452)  0.941 (0.410)  0.951 (0.422)  0.951 (0.440)
              800  0.950 (0.382)  0.000 (0.396)  0.926 (0.346)  0.938 (0.355)  0.949 (0.368)
          10  600  0.943 (0.429)  0.000 (0.437)  0.929 (0.383)  0.935 (0.393)  0.938 (0.407)
              800  0.945 (0.355)  0.000 (0.383)  0.919 (0.321)  0.931 (0.329)  0.942 (0.340)
III       5   600  0.955 (0.384)  0.000 (0.261)  0.424 (0.339)  0.494 (0.350)  0.908 (0.364)
              800  0.955 (0.321)  0.000 (0.223)  0.260 (0.289)  0.302 (0.296)  0.863 (0.304)
          10  600  0.952 (0.373)  0.000 (0.254)  0.370 (0.330)  0.441 (0.340)  0.893 (0.353)
              800  0.948 (0.312)  0.000 (0.217)  0.233 (0.281)  0.279 (0.288)  0.847 (0.296)

5.2 Interval estimation

We next compare the proposed confidence interval (CI) for $\beta$ with those based on TCR and the three SPLA-$r$ estimators. To be specific, let $\tilde{\beta}$ denote a generic estimator of $\beta$ and let $\sigma^{2}$ denote its asymptotic variance. We use the proposed data-splitting method with $\tilde{\alpha}=0.35$ to estimate $\sigma^{2}$, and let $\tilde{\sigma}^{2}$ be the resulting estimator. A Wald-type confidence interval for $\beta$ at the 95% confidence level is then $\tilde{\beta}\pm n^{-1/2}\tilde{\sigma}u_{0.975}$, with length $2n^{-1/2}\tilde{\sigma}u_{0.975}$, where $u_{0.975}$ is the 0.975 quantile of the standard normal distribution. Table 2 presents the simulated coverage probabilities and average interval lengths of the Wald-type confidence intervals based on the five estimators.
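As a concrete illustration of this interval construction, the sketch below pairs a data-splitting variance estimate with the Wald interval. The exact splitting scheme is an assumption on our part (disjoint subsamples of size roughly $n^{\tilde{\alpha}}$, with the variance read off the between-subsample spread), and for simplicity the target estimator here is the sample mean rather than the SMPLE:

```python
import numpy as np
from statistics import NormalDist

def split_variance(data, estimator, alpha_tilde=0.35, seed=None):
    """Illustrative data-splitting variance estimate: split the n
    observations into m disjoint subsamples of size b ~ n^alpha_tilde,
    re-estimate on each, and rescale the between-subsample variance
    (Var of a size-b estimate is roughly sigma^2 / b)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    b = max(2, int(n ** alpha_tilde))          # subsample size
    m = n // b                                 # number of subsamples
    idx = rng.permutation(n)
    betas = [estimator(data[idx[k * b:(k + 1) * b]]) for k in range(m)]
    return b * np.var(betas, ddof=1)

def wald_ci(beta_tilde, sigma2_tilde, n, level=0.95):
    """Wald-type CI: beta_tilde +/- n^{-1/2} * sigma_tilde * u_{(1+level)/2}."""
    u = NormalDist().inv_cdf((1.0 + level) / 2.0)
    half = u * (sigma2_tilde / n) ** 0.5
    return beta_tilde - half, beta_tilde + half

# toy check with the sample mean, whose asymptotic variance is Var(X) = 1
data = np.random.default_rng(1).standard_normal(4000)
sigma2 = split_variance(data, np.mean, seed=1)
lo, hi = wald_ci(data.mean(), sigma2, len(data))
print(sigma2, (lo, hi))
```

In the simulation above, `sigma2` lands near the true asymptotic variance 1 of the standardized sample mean, and the interval covers the true value 0 with roughly the nominal frequency over repetitions.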

We observe that in all three scenarios, the SMPLE-based CI always has very accurate coverage probabilities. In contrast, the TCR-based CI has desirable coverage accuracy only in Scenario I, where the Cox model is correct, and its performance in Scenarios II and III is unacceptable. The CIs based on SPLA-$r$ have acceptable coverage accuracy in Scenarios I and II, despite slight undercoverage, but perform very poorly in Scenario III. Interestingly, the CI based on SPLA-$r$ with a larger $r$ has more accurate coverage. For more insight into these results, Figure 2 displays QQ-plots of the standardized estimates $\sqrt{n}\{\widehat{I^{-1}(\beta_{0})}\}^{-1/2}(\tilde{\beta}-\beta_{0})$ for $\tilde{\beta}$ being the SMPLE and SPLA-4 estimators. The distribution of the standardized SMPLE is always very close to the standard normal. In contrast, the distribution of the standardized SPLA-4 estimator is close to the standard normal in Scenarios I and II, but far from it in Scenario III. This may explain why the SMPLE-based CI always has desirable coverage accuracy while the SPLA-4-based CI suffers severe undercoverage in Scenario III.

In summary, when the Cox model is correct, the point and interval estimators based on the proposed SMPLE perform very closely to those based on TCR, the optimal estimation method under the Cox model; they have obvious advantages over those based on TCR or SPLA-$r$ when the Cox model is incorrect or the hazard function is nonsmooth. Another advantage of SMPLE over SPLA-$r$ is that it is free of tuning parameters, whereas the latter is not. Tuning parameters usually have a large influence on the subsequent analysis, yet their optimal choices are difficult to determine in practice.

Figure 2: QQ-plots of the standardized estimates $\sqrt{n}\{\widehat{I^{-1}(\beta_{0})}\}^{-1/2}(\tilde{\beta}-\beta_{0})$ of SMPLE (blue circle) and SPLA-4 (green cross) versus $N(0,1)$. Columns 1–3: Scenarios I–III; Row 1: $c=5$ and $n=600$; Row 2: $c=10$ and $n=600$; Row 3: $c=5$ and $n=800$; Row 4: $c=10$ and $n=800$.

6 A real application

We apply the proposed SMPLE method to analyze the Rotterdam Breast Cancer (RBC) dataset, publicly available from the R package survival. This dataset was used in Royston and Altman (2013) to perform external validation of a Cox prognostic model. The RBC dataset comprises 2982 primary breast cancer patients whose records were included in the Rotterdam tumor bank. We focus on potential factors affecting the survival time ($T$) of breast cancer patients, defined as the days from primary surgery to death. We consider four covariates: progesterone receptors (unit: fmol/l, $X$), age at surgery ($Z_{1}$), number of positive lymph nodes ($Z_{2}$), and estrogen receptors (unit: fmol/l, $Z_{3}$). Let $Z=(Z_{1},Z_{2},Z_{3})^{\top}$. All covariates have been rescaled to lie between 0 and 1. The impact of the level of progesterone receptors on the treatment of breast cancer is unclear. We model the conditional hazard of $T$ given $(X=x,Z=z)$ by the following partially linear additive Cox model

$$\lambda_{T}(t\mid x,z)=\lambda(t)\exp\{x\beta+f_{1}(z_{1})+f_{2}(z_{2})+f_{3}(z_{3})\},\qquad(11)$$

where $\lambda(\cdot)$, $f_{1}(\cdot)$, $f_{2}(\cdot)$ and $f_{3}(\cdot)$ are all left unspecified. Since elderly patients are typically predisposed to a higher risk of cancer deterioration or recurrence, we postulate that $f_{1}(\cdot)$ is monotonically increasing. Similarly, $f_{2}(\cdot)$ and $f_{3}(\cdot)$ are also assumed to be monotonically increasing. Inspired by our extensive simulation results and the discussion in Remark 1, we further impose a convexity or concavity restriction on $f_{1}(\cdot)$–$f_{3}(\cdot)$.
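Monotonicity restrictions of this kind are typically enforced through pool-adjacent-violators (PAVA) steps inside an active-set algorithm. The least-squares form below is only an illustrative building block, not the paper's partial-likelihood algorithm:

```python
def pava(y):
    """Pool-adjacent-violators: least-squares projection of the sequence y
    onto the cone of nondecreasing sequences.  Illustrative only; the SMPLE
    imposes monotonicity inside the partial likelihood, not least squares."""
    blocks = []                    # each block: [size, mean]
    for v in y:
        blocks.append([1, float(v)])
        # merge adjacent blocks while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][1] > blocks[-1][1]:
            n2, m2 = blocks.pop()
            n1, m1 = blocks.pop()
            blocks.append([n1 + n2, (n1 * m1 + n2 * m2) / (n1 + n2)])
    fit = []
    for size, mean in blocks:
        fit.extend([mean] * size)
    return fit

print(pava([3, 1, 2]))      # -> [2.0, 2.0, 2.0]
print(pava([1, 3, 2, 4]))   # -> [1.0, 2.5, 2.5, 4.0]
```

The output is piecewise constant on pooled blocks, which is why shape-restricted fits such as those in Figure 3 come out piecewise linear once convexity/concavity is also imposed.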

Because menopause generally affects the therapeutic effect and the mortality of breast cancer, we categorize the patients into two groups according to whether they have reached menopause. Group I (premenopausal) comprises 1312 samples with a censoring rate of 64.33%, while Group II (postmenopausal) consists of 1670 samples with a censoring rate of 51.86%. We apply the point and interval estimation methods considered in the simulation section to estimate the linear covariate effect under model (11). We set the confidence level to 95% when constructing confidence intervals. In the proposed variance estimation method, we take $\tilde{\alpha}=0.3$ and construct variance estimates by repeating the splitting-and-estimating procedure twenty times and averaging. Table 3 summarizes the analysis results.

Table 3: Results of point and interval estimation for the RBC dataset.
                Group I                            Group II
          Point estimate  Interval estimate  Point estimate  Interval estimate
SMPLE     -5.96           [-8.96, -2.95]     -0.62           [-1.23, -0.01]
TCR       -5.19           [-8.22, -2.16]     -0.70           [-1.41, 0.01]
SPLA-2    -5.33           [-8.48, -2.18]     -0.39           [-1.08, 0.30]
SPLA-3    -4.98           [-8.26, -1.69]     -0.40           [-1.13, 0.33]
SPLA-4    -5.15           [-8.66, -1.64]     -0.34           [-1.08, 0.39]

The point estimates of $\beta$ are about $-5$ or less in Group I, much smaller than those in Group II, which are $-0.7$ or larger. This implies that the effect of progesterone receptors ($X$) on the hazard rate of the survival time of breast cancer patients increases significantly from Group I to Group II, potentially stemming from the substantial reduction in progesterone levels in women's bodies after menopause. For patients who have not reached menopause, the 95% confidence intervals derived from all five methods exclude 0, suggesting a significant effect of progesterone receptors on the survival time of breast cancer patients. Among these methods, our proposed SMPLE produces the shortest confidence interval. Conversely, for postmenopausal patients, only the SMPLE-based confidence interval excludes 0, which, together with the point estimate $-0.62$, suggests that progesterone receptors still have a significant effect in prolonging the survival time of breast cancer patients. Once again, the SMPLE yields the shortest confidence interval. Given our reasonable shape-restriction assumptions and our simulation experience, especially in Scenario III, we believe the analysis results based on the SMPLE are more reliable, which also demonstrates its advantage over the competing methods.

Figure 3 depicts the estimates of $f_{1}(\cdot)$–$f_{3}(\cdot)$ in model (11) under our shape constraints. All the estimated curves are continuous, piecewise linear functions in both groups. Both the number of positive lymph nodes ($Z_{2}$) and estrogen receptors ($Z_{3}$) have very similar effects on the survival time $T$ in the two groups. In contrast, age at surgery ($Z_{1}$) has a markedly more pronounced effect on $T$ in Group II than in Group I. This discrepancy may stem from the fact that premenopausal women tend to be younger, so age has not yet exerted a substantial influence on their physical condition.

Figure 3: Curves of the fitted $f_{1}(\cdot)$–$f_{3}(\cdot)$ in model (11) by the proposed shape-restricted maximum likelihood estimation. Row 1: Group I; Row 2: Group II.

7 Concluding remarks

In this paper, we study an additive and shape-restricted partially linear Cox model. We systematically investigate the consistency and convergence rates of the SMPLEs of the additive components and the baseline cumulative hazard function. Notably, the convergence rates of the SMPLEs of the infinite-dimensional parameters, including the additive components and $\Lambda(\cdot)$, are independent of the covariate dimension, exhibiting an elegant form. We find that the SMPLE of $\beta$ is $\sqrt{n}$-consistent and asymptotically semiparametrically efficient.

To estimate the asymptotic variance, we propose a flexible variance estimation method based on data splitting. We prove that the resulting variance estimator is consistent whenever $\tilde{\alpha}\in(0,1)$, namely, whenever both the number of split samples and the sample size of each split sample go to infinity. With this variance estimator, valid confidence regions for the linear covariate effect can be constructed. We set $\tilde{\alpha}$ to around 0.3 in our numerical studies. Natural questions are whether there is an optimal choice of $\tilde{\alpha}$ and, if so, how to determine it. For the time being, we have no results on these questions, which are left for future research.

In addition to the efficiency study of the estimation of the linear covariate effect, it is also of compelling interest to develop the minimax lower bound for the estimation of the additive, shape-restricted components. We conjecture that, by leveraging the techniques in Guntuboyina and Sen (2015) and Zhong et al. (2022), the minimax risk under the $L_{2}(P)$ norm for a convex/concave component is (asymptotically) lower bounded by $n^{-2/5}$ up to a constant factor. Furthermore, if none of the additive components is monotonic, the SMPLEs of the convex/concave components are rate-optimal, as suggested by Theorem 1. However, a formal proof of this result is beyond the scope of this paper.

References

  • Balabdaoui et al. (2019) Balabdaoui, F., Groeneboom, P., and Hendrickx, K. (2019). Score estimation in the monotone single-index model. Scandinavian Journal of Statistics, 46(2), 517–544.
  • Bickel et al. (1993) Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1993). Efficient and adaptive estimation for semiparametric models, volume 4. Springer.
  • Chang et al. (2007) Chang, I.-S., Chien, L.-C., Hsiung, C. A., Wen, C.-C., and Wu, Y.-J. (2007). Shape restricted regression with random Bernstein polynomials. Lecture Notes–Monograph Series, pages 187–202.
  • Chen and Samworth (2016) Chen, Y. and Samworth, R. J. (2016). Generalized additive and index models with shape constraints. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(4), 729–754.
  • Cheng (2009) Cheng, G. (2009). Semiparametric additive isotonic regression. Journal of Statistical Planning and Inference, 139(6), 1980–1991.
  • Cox (1972) Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–202.
  • Cox (1975) Cox, D. R. (1975). Partial likelihood. Biometrika, 62(2), 269–276.
  • Deng et al. (2023) Deng, G., Xu, G., Fu, Q., Wang, X., and Qin, J. (2023). Active-set algorithm-based statistical inference for shape-restricted generalized additive Cox regression models. Journal of Statistical Computation and Simulation, 93(3), 416–441.
  • Deng and Zhang (2020) Deng, H. and Zhang, C.-H. (2020). Isotonic regression in multi-dimensional spaces and graphs. The Annals of Statistics, 48(6), 3672–3698.
  • Feng et al. (2022) Feng, O. Y., Chen, Y., Han, Q., Carroll, R. J., and Samworth, R. J. (2022). Nonparametric, tuning-free estimation of S-shaped functions. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(4), 1324–1352.
  • Groeneboom and Hendrickx (2017) Groeneboom, P. and Hendrickx, K. (2017). The nonparametric bootstrap for the current status model. Electronic Journal of Statistics, 11(2), 3446–3484.
  • Groeneboom and Jongbloed (2014) Groeneboom, P. and Jongbloed, G. (2014). Nonparametric Estimation under Shape Constraints: Estimators, Algorithms and Asymptotics. London, UK: Cambridge University Press.
  • Guntuboyina and Sen (2015) Guntuboyina, A. and Sen, B. (2015). Global risk bounds and adaptation in univariate convex regression. Probability Theory and Related Fields, 163(1), 379–411.
  • Hastie and Tibshirani (1986) Hastie, T. J. and Tibshirani, R. J. (1986). Generalized additive models. Statistical Science, 1(3), 297–310.
  • Hastie and Tibshirani (1990) Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. London: Chapman and Hall.
  • Heller (2001) Heller, G. (2001). The Cox proportional hazards model with a partly linear relative risk function. Lifetime Data Analysis, 7, 255–277.
  • Horowitz and Lee (2017) Horowitz, J. L. and Lee, S. (2017). Nonparametric estimation and inference under shape restrictions. Journal of Econometrics, 201(1), 108–126.
  • Huang (1999) Huang, J. (1999). Efficient estimation of the partly linear additive Cox model. The Annals of Statistics, 27(5), 1536–1563.
  • Huang (2002) Huang, J. (2002). A note on estimating a partly linear model under monotonicity constraints. Journal of Statistical Planning and Inference, 107(1-2), 343–351.
  • Kuchibhotla et al. (2023) Kuchibhotla, A. K., Patra, R. K., and Sen, B. (2023). Semiparametric efficiency in convexity constrained single index model. Journal of the American Statistical Association, 118(541), 272–286.
  • Matzkin (1991) Matzkin, R. L. (1991). Semiparametric estimation of monotone and concave utility functions for polychotomous choice models. Econometrica, 59(5), 1315–1327.
  • O’Sullivan (1993) O’Sullivan, F. (1993). Nonparametric estimation in the Cox model. The Annals of Statistics, 21(1), 124–145.
  • Qin et al. (2014) Qin, J., Garcia, T. P., Ma, Y., Tang, M.-X., Marder, K., and Wang, Y. (2014). Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint. The Annals of Applied Statistics, 8(2), 1182–1208.
  • Qin et al. (2021) Qin, J., Deng, G., Ning, J., Yuan, A., and Shen, Y. (2021). Estrogen receptor expression on breast cancer patients’ survival under shape-restricted Cox regression model. The Annals of Applied Statistics, 15(3), 1291–1307.
  • Rong et al. (2024) Rong, Y., Zhao, S. D., Zheng, X., and Li, Y. (2024). Kernel Cox partially linear regression: Building predictive models for cancer patients’ survival. Statistics in Medicine, 43(1), 1–15.
  • Royston and Altman (2013) Royston, P. and Altman, D. G. (2013). External validation of a Cox prognostic model: principles and methods. BMC Medical Research Methodology, 13, 1–15.
  • Sasieni (1992) Sasieni, P. (1992). Information bounds for the conditional hazard ratio in a nested family of regression models. Journal of the Royal Statistical Society: Series B (Methodological), 54(2), 617–635.
  • Sleeper and Harrington (1990) Sleeper, L. A. and Harrington, D. P. (1990). Regression splines in the cox model with application to covariate effects in liver disease. Journal of the American Statistical Association, 85(412), 941–949.
  • Varian (1984) Varian, H. R. (1984). The nonparametric approach to production analysis. Econometrica, 52(3), 579–597.
  • Wang and Ghosh (2012) Wang, J. and Ghosh, S. K. (2012). Shape restricted nonparametric regression with bernstein polynomials. Computational Statistics & Data Analysis, 56(9), 2729–2741.
  • Zhong et al. (2022) Zhong, Q., Mueller, J., and Wang, J.-L. (2022). Deep learning for the partially linear Cox model. The Annals of Statistics, 50(3), 1348–1375.